Brian Christian
Author of Algorithms to Live By: The Computer Science of Human Decisions
7 Works 2,417 Members 67 Reviews 1 Favorited
About the Author
Brian Christian is the author of the acclaimed bestsellers The Most Human Human and Algorithms to Live By (with Tom Griffiths), which have been translated into nineteen languages. A visiting scholar at the University of California, Berkeley, he lives in San Francisco.
Includes the name: Brian Christian (Author)
Works by Brian Christian
The Most Human Human: What Talking with Computers Teaches Us About What It Means to Be Alive (2011) 570 copies, 15 reviews
Algorithms at Work 2 copies
Tagged
AI (22)
algorithms (43)
artificial intelligence (35)
audible (22)
audio (7)
audiobook (14)
business (12)
cognitive science (8)
computer science (60)
computers (37)
computing (11)
currently-reading (16)
decision making (32)
ebook (21)
ethics (10)
goodreads (22)
goodreads import (11)
Kindle (19)
machine learning (7)
math (24)
neuroscience (6)
non-fiction (138)
owned (7)
philosophical anthropology (6)
philosophy (48)
popular science (7)
problem solving (10)
programming (7)
psychology (64)
read (12)
science (76)
self-help (17)
sociology (6)
tech (9)
technology (61)
to-read (389)
Turing test (9)
unread (7)
vintiquebooks (6)
wishlist (8)
Common Knowledge
- Birthdate
- 1984
- Gender
- male
- Nationality
- USA
- Birthplace
- Wilmington, Delaware, USA
- Places of residence
- San Francisco, California, USA
Little Silver, New Jersey, USA
- Education
- Brown University
University of Washington
High Technology High School, Lincroft, New Jersey, USA
- Occupations
- author
poet
- Short biography
- Brian Christian is the author of The Most Human Human, which was named a Wall Street Journal bestseller and a New Yorker favorite book of 2011, and has been translated into ten languages.
His writing has appeared in The New Yorker, The Atlantic, Wired, The Wall Street Journal, The Guardian, The Paris Review, and in scientific journals such as Cognitive Science. Christian has been featured on The Charlie Rose Show and The Daily Show with Jon Stewart and has lectured at Google, Microsoft, the Santa Fe Institute, and the London School of Economics. His work has won several awards, including fellowships at Yaddo and the MacDowell Colony, publication in Best American Science & Nature Writing, and an award from the Academy of American Poets.
Born in Wilmington, Delaware, Christian holds degrees in philosophy, computer science, and poetry from Brown University and the University of Washington. He lives in San Francisco.
http://www.brian-christian.com/bio-co...
Reviews
cpg | 4 other reviews | Aug 22, 2024 |
Such an interesting title, but the book seems like a letdown. "Algorithms to Live By" can be lived by only by those who know how to conduct a statistical analysis on their own. While the concepts in many chapters are interesting, the book as a whole is too technical and utterly impractical. Only those who read it as a statistics textbook will derive something valuable from it.
Statistics is such an interesting subject. I wish the book could have made it appealing to novice readers rather than driving them further away, which I'm sure it will.
Not a waste of time, but certainly not a must-read.
Flagged
RoshReviews | 46 other reviews | Jul 30, 2024 |
https://www.goodreads.com/review/show/6272678124
I am thoroughly impressed with Christian’s documentation of AI’s development and emergence from nascent geekery to world-altering capital-T Thing. This book was released in 2020, and a mere 3.5 years later basically every tech product you’re likely to see has had “AI” thrown at the front or back of its name. There is so much fear, uncertainty, and doubt around this technology that half of the conversations I’m in that involve it seem to want to resolve into people fleeing for the woods.
Christian does a good job of documenting the historical, psychological, ethical, and epistemological origins of AI. I was particularly drawn to the psychological analogies, many of which surprised me. I rented this from the library in physical form and so to save my notes for future reference had to painstakingly write page numbers on index cards and go back to scan/dictate the text to my Notes app, but I’m posting those here for my convenience.
—
Notes:
The Alignment Problem
P30 - In one of the first articles explicitly addressing the notion of bias in computing systems, the University of Washington's Batya Friedman and Cornell's Helen Nissenbaum had warned that "computer systems, for instance, are comparatively inexpensive to disseminate, and thus, once developed, a biased system has the potential for widespread impact. If the system becomes a standard in the field, the bias becomes pervasive." ^40 (Representation)
P49 - As Princeton's Arvind Narayanan puts it: "Contrary to the 'tech moves too fast for society to keep up' cliché, commercial deployments of tech often move glacially-just look at the banking and airline mainframes still running. ML [machine-learning] models being trained today might still be in production in 50 years, and that's terrifying." ^93 (Representation)
Feedback loops
“Machine learning is not, by default, fair or just in any meaningful way.” - Moritz Hardt (^3, Fairness)
“No machinery is more efficient than the human element that operates it.” (??)
“One of the most important things in any prediction is to make sure that you’re actually predicting what you think you’re predicting. This is harder than it sounds.”
P123 - Thorndike sees here the makings of a bigger, more general law of nature. As he puts it, the results of our actions are either "satisfying" or "annoying." When the result of an action is "satisfying," we tend to do it more. When on the other hand the outcome is "annoying," we'll do it less. The clearer the connection between action and outcome, the stronger the resulting change. Thorndike calls this idea, perhaps the most famous and durable of his career, "the law of effect."
As he puts it:
The Law of Effect is: When a modifiable connection between a situation and a response is made and is accompanied or followed by a satisfying state of affairs, that connection's strength is increased: When made and accompanied or followed by an annoying state of affairs its strength is decreased. The strengthening effect of satisfyingness (or the weakening effect of annoyingness) upon a bond varies with the closeness of the connection between it and the bond. ^7 (Reinforcement)
P127 - Continuing to develop machines that could learn, in other words—by human instruction or their own experience—would alleviate the need for programming. Moreover, it would enable computers to do things we didn't know how to program them to do.
P141 - “This is apparently the first application of this algorithm to a complex non-trivial task,” Tesauro wrote. Re: temporal-difference (TD) learning applied to backgammon (TD-Gammon), learning from its own guesses, steadily coming to learn what a winning position looks like. … “With zero knowledge built in, the network is able to learn from scratch to play the entire game at a fairly strong intermediate level of performance, which is clearly better than conventional commercial programs, and which in fact surpasses comparable networks trained on a massive human expert data set. This indicates that TD learning may work better in practice than one would expect based on current theory.”
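To make the TD idea concrete, here is a minimal tabular TD(0) sketch (mine, not the book's or Tesauro's code): the value of a state is nudged toward the reward plus the value of the next state, so the program learns from its own successive guesses.

```python
# Minimal tabular TD(0) sketch: the estimate for a state is pulled toward
# the reward plus the (discounted) estimate for the next state, so the
# learner bootstraps from its own guesses. All names and numbers here are
# illustrative, not from the book.
from collections import defaultdict

GAMMA = 0.9   # discount factor
ALPHA = 0.1   # learning rate

values = defaultdict(float)  # V(s), initialized to 0

def td0_update(state, reward, next_state, done):
    """Move V(state) toward the bootstrapped target r + gamma * V(next_state)."""
    target = reward if done else reward + GAMMA * values[next_state]
    values[state] += ALPHA * (target - values[state])

# Toy episode: states 'a' -> 'b' -> terminal, with a reward of 1 at the end.
episode = [("a", 0.0, "b", False), ("b", 1.0, "terminal", True)]
for s, r, s_next, done in episode:
    td0_update(s, r, s_next, done)
print(dict(values))  # V(b) rises first; V(a) follows on later passes
```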
P151 - Meanwhile, we take up another question. Reinforcement learning in its classical form takes for granted the structure of the rewards in the world and asks the question of how to arrive at the behavior—the "policy"—that maximally reaps them. But in many ways this obscures the more interesting—and more dire—matter that faces us at the brink of AI. We find ourselves rather more interested in the exact opposite of this question: Given the behavior we want from our machines, how do we structure the environment's rewards to bring that behavior about?
How do we get what we want when it is we who sit in the back of the audience, in the critic's chair—we who administer the food pellets, or their digital equivalent?
This is the alignment problem, in the context of a reinforcement learner. Though the question has taken on a new urgency in the last five to ten years, as we shall see it is every bit as deeply rooted in the past as reinforcement learning itself.
P160 - But Miyamoto had a problem. There are also good mushrooms, which you have to learn, not to dodge, but to seek. "This gave us a real headache," he explains. "We needed somehow to make sure the player understood that this was something really good." So now what? The good mushroom approaches you in an area where you have too little headroom to easily jump over it—you brace for impact, but instead of killing you, it makes you double in size. The mechanics of the game have been established, and now you are let loose. You think you are simply playing.
But you are carefully, precisely, inconspicuously being trained. You learn the rule, then you learn the exception. You learn the basic mechanics, then you are given free rein.
P161 - In both cases, the use of a curriculum – an easier version of the problem, followed by a harder version – succeeded in cases where trying to learn the more difficult problem by itself could not.
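A bare-bones illustration of the curriculum idea (my own toy example, not anything from the book): train on an easier version of the task until performance clears a bar, then move on to the harder version.

```python
# Toy curriculum loop: difficulty only increases once the learner clears the
# current stage. 'train_one_round' and the thresholds are placeholders for
# illustration only.
def train_one_round(difficulty, skill):
    """Pretend training step: skill improves more slowly on harder tasks."""
    return skill + 0.1 / difficulty

def run_curriculum(stages=(1, 2, 4), threshold=0.5):
    skill = 0.0
    for difficulty in stages:                 # easy -> hard
        while skill < threshold * difficulty:
            skill = train_one_round(difficulty, skill)
        print(f"cleared difficulty {difficulty} at skill {skill:.2f}")
    return skill

run_curriculum()
```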
P169 - "As a general rule," says Russell, "it is better to design performance measures according to what one actually wants in the environment, rather than according to how one thinks the agent should behave.”^50 Put differently, the key insight is that we should strive to reward states of the world, not actions of our agent. These states typically represent "progress" toward the ultimate goal, whether that progress is represented in physical distance or in something more conceptual like completed subgoals (chapters of a book, say, or portions of a mechanical assembly). (^50 Shaping).
P185 - Learned helplessness; as the celebrated aphorist Ashleigh Brilliant put it, “If you’re careful enough, nothing bad or good will ever happen to you.” ^11 (Curiosity)
P202 - All rewards are internal. ^61 (Curiosity).
P222 - Conwy Lloyd Morgan - “Five minutes’ demonstration is worth more than five hours’ talking where the object is to impart skill. It is of comparatively little use to describe or explain how a skilled feat is to be accomplished; it is far more helpful to show how it is done.” ^32 (Imitation)
P228 - At its root, the problem stems from the fact that the learner sees an expert execution of the problem, and an expert almost never gets into trouble. No matter how good the learner is, though, they will make mistakes – whether blatant or subtle. But because the learner never saw the expert get into trouble, they have also never seen the expert get out. In fact, when the beginner makes beginner mistakes, they may end up in a situation that is completely different from anything they saw during their observation of the expert. “That means,” says Sergey Levine, “that, you know, all bets are off.” (Cascading errors).
P247 - Eliezer Yudkowsky, cofounder of the Machine Intelligence Research Institute, wrote an influential 2004 manuscript in which he argues for imbuing machines not simply with our norms as we imperfectly embody them, but rather with what he calls our “coherent extrapolated volition.” “In poetic terms,” he writes, “our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wish we were.”
P251 - Warneken, along with his collaborator Michael Tomasello of Duke, was the first to systematically show, in 2006, that human infants as young as eighteen months old will reliably identify a fellow human facing a problem, will identify the human's goal and the obstacle in the way, and will spontaneously help if they can—even if their help is not requested, even if the adult doesn't so much as make eye contact with them, and even when they expect (and receive) no reward for doing so.^2 (Inference)
P261 - We are now, it is fair to say, well beyond the point where our machines can do only that which we can program into them in the explicit language of math and code.
P268 - Russell dubbed this new framework cooperative inverse reinforcement learning ("CIRL," for short).^40 In the CIRL formulation, the human and the computer work together to jointly maximize a single reward function - and initially only the human knows what it is.
“We were trying to think, what’s the simplest change we can make to the current math and the current theoretical systems that fixes the theory that leads to these sort of existential-risk problems?” says Hadfield-Menell. “What is a math problem where the optimal thing is what we actually want?”^41 (Inference)
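A toy sketch of the underlying move, heavily simplified and not the CIRL formulation itself: the machine keeps a belief over candidate reward functions and updates it from the human's observed choices, assuming the human tends to pick higher-reward options.

```python
# Toy Bayesian reward inference: the machine is uncertain which reward
# function the human has and updates a belief from observed human choices,
# modeling the human as softmax-rational. Candidate rewards and names are
# invented for illustration.
import math

candidates = {
    "likes_apples": {"apple": 1.0, "banana": 0.0, "cake": 0.2},
    "likes_cake":   {"apple": 0.1, "banana": 0.0, "cake": 1.0},
}
belief = {name: 0.5 for name in candidates}  # uniform prior

def update_belief(chosen, options, beta=3.0):
    """Multiply in P(choice | reward) for each candidate, then renormalize."""
    for name, reward in candidates.items():
        scores = [math.exp(beta * reward[o]) for o in options]
        belief[name] *= math.exp(beta * reward[chosen]) / sum(scores)
    total = sum(belief.values())
    for name in belief:
        belief[name] /= total

# Watching the human pick an apple over cake shifts belief toward "likes_apples".
update_belief("apple", ["apple", "banana", "cake"])
print(belief)
```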
P282 - He has the students play games where they must decide which side of various bets to take, figuring out how to turn their beliefs and hunches into probabilities, and deriving the laws of probability theory from scratch. They are games of epistemology: What do you know? And what do you believe? And how confident are you, exactly? "That gives you a very good tool for machine learning," says Gal, "to build algorithms—to build computational tools —that can basically use these sorts of principles of rationality to talk about uncertainty." (…) Gal: “I wouldn’t rely on a model that couldn’t tell me whether it’s actually certain about its predictions.” (re: uncertainty in models and models communicating uncertainty; ensembling; dropouts…).^14
There's a certain irony here, in that deep learning—despite being deeply rooted in statistics—has, as a rule, not made uncertainty a first-class citizen.
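A minimal sketch of one of the techniques gestured at here (Monte Carlo dropout; the tiny untrained model is purely illustrative): leave dropout switched on at prediction time, run several stochastic forward passes, and read the spread of the outputs as a rough uncertainty signal.

```python
# Monte Carlo dropout sketch: keep dropout active at inference and treat the
# variation across repeated forward passes as an uncertainty estimate.
# The network here is untrained and exists only to show the mechanics.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(4, 32),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # stays active because we keep the model in train mode
    nn.Linear(32, 1),
)

def predict_with_uncertainty(x, n_samples=100):
    model.train()  # deliberately NOT model.eval(), so dropout keeps sampling
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(n_samples)])
    return samples.mean(dim=0), samples.std(dim=0)

x = torch.randn(1, 4)
mean, std = predict_with_uncertainty(x)
print(f"prediction {mean.item():.3f} +/- {std.item():.3f}")
```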
Note from TB: thinking about uncertainty in prioritization. Weighing measures in a prioritization algorithm.
P292 - Another researcher who has been focused on these problems in recent years is DeepMind's Victoria Krakovna. Krakovna notes that one of the big problems with penalties for impact is that in some cases, achieving a specific goal necessarily requires high-impact actions, but this could lead to what's called "offsetting": taking further high-impact actions to counterbalance the earlier ones. This isn't always bad: if the system makes a mess of some kind, we probably want it to clean up after itself. But sometimes these "offsetting" actions are problematic. We don't want a system that cures someone's fatal illness but then—to nullify the high impact of the cure—kills them. ^43 (Uncertainty)
Note from TB: thinking about uncertainty in prioritization again, and how to measure / quantify “impact on PEH,” in algorithm. What is the impact of each stage of the prioritization process, from inflow to referral, etc.
P294 - Turner’s idea is that the reason we care about the Shanghai Stock Exchange, or the integrity of our cherished vase, or, for that matter, the ability to move boxes around the virtual warehouse, is that those things, for whatever reason, matter to us, and they matter to us because they are ultimately in some way or other tied to our goals. We want to save for retirement, put flowers in the vase, complete the Sokoban level. What if we model this idea of goals explicitly? His proposal goes by the name “attainable utility preservation”: giving the system a set of auxiliary goals in the game environment, and making sure that it can still effectively pursue these auxiliary goals after it’s done whatever points-scoring actions the game incentivizes. Fascinatingly, the mandate to preserve attainable utility seems to foster good behavior in the AI safety gridworlds even when the auxiliary goals are generated at random. ^49 (Uncertainty)
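A rough sketch of the flavor of the idea (simplified; not Turner's actual formulation): score an action by its task reward minus a penalty for how much it changes the agent's ability to achieve a set of auxiliary goals.

```python
# Attainable utility preservation, heavily simplified: penalize actions in
# proportion to how much they change the attainable value of auxiliary goals.
# 'attainable_value' is a stand-in for the auxiliary Q-values in the real
# formulation; states and goals here are invented for illustration.
def attainable_value(state, goal):
    """Toy stand-in: how well the agent could still achieve 'goal' from 'state'."""
    return state.get(goal, 0.0)

def aup_score(state, next_state, task_reward, aux_goals, penalty_weight=1.0):
    penalty = sum(
        abs(attainable_value(next_state, g) - attainable_value(state, g))
        for g in aux_goals
    )
    return task_reward - penalty_weight * penalty

# A high-reward action with big side effects scores worse than a modest-reward
# action that leaves the auxiliary goals attainable.
before = {"tidy_room": 1.0, "water_plants": 1.0}
smash  = {"tidy_room": 0.0, "water_plants": 0.0}
gentle = {"tidy_room": 1.0, "water_plants": 0.9}
aux = ["tidy_room", "water_plants"]
print(aup_score(before, smash, task_reward=2.0, aux_goals=aux))   # 0.0
print(aup_score(before, gentle, task_reward=1.5, aux_goals=aux))  # 1.4
```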
P295 - One of the most chilling and prescient quotations in the field of AI safety comes in a famous 1960 article on the "Moral and Technical Consequences of Automation" by MIT's Norbert Wiener: "If we use, to achieve our purposes, a mechanical agency with whose operation we cannot efficiently interfere once we have started it... then we had better be quite sure that the purpose put into the machine is the purpose which we really desire and not merely a colorful imitation of it."^51 It is the first succinct expression of the alignment problem.
No less crucial, however, is this statement's flip side: If we were not sure that the objectives and constraints we gave the machine entirely and perfectly specified what we did and didn't want the machine to do, then we had better be sure we can intervene. In the AI safety literature, this concept goes by the name of “corrigibility,” and—soberingly—it’s a whole lot more complicated than it seems.^52 (Uncertainty)
P299 - But, they found, there's a major catch. If the system's model of what you care about is fundamentally "misspecified"—there are things you care about of which it's not even aware and that don't even enter into the system's model of your rewards—then it's going to be confused about your motivation. For instance, if the system doesn't understand the subtleties of human appetite, it may not understand why you requested a steak dinner at six o'clock but then declined the opportunity to have a second steak dinner at seven o'clock. If locked into an oversimplified or misspecified model where steak (in this case) must be entirely good or entirely bad, then one of these two choices, it concludes, must have been a mistake on your part. It will interpret your behavior as "irrational," and that, as we've seen, is the road to incorrigibility, to disobedience.^63 (Uncertainty)
——
Notes
Representation
* 40 - Friedman and Nissenbaum, “Bias in Computer Systems.”
* 93 - Narayanan on Twitter: https://twitter.com/random_walker/sta...
Fairness
* 3 - Hardt, “How Big Data Is Unfair.”
Reinforcement
* 7 - Thorndike, The Psychology of Learning.
Shaping
* 50 - Russell and Norvig, Artificial Intelligence.
Curiosity
* 11 - See Henry Alford, “The Wisdom of Ashleigh Brilliant,” http://www.ashleighbrilliant.com/Bril..., excerpted from Alford, How to Live (New York: Twelve, 2009).
* 61 - Singh, Lewis, and Barto. For more discussion, see Oudeyer and Kaplan, “What Is Intrinsic Motivation?”
* Singh, Lewis, and Barto, “Where Do Rewards Come From?” In Proceedings of the Annual Conference of the Cognitive Science Society, 2601–06, 2009.
Imitation
* 32 - Morgan, “An Introduction to Comparative Psychology.”
Inference
* 2 - See also Meltzoff, “Understanding the Intentions of Others,” which showed that eighteen-month-olds can successfully imitate the intended acts that adults tried and failed to do, indicating that they ‘situate people within a psychological framework that differentiates between the surface behavior of people and a deeper level involving goals and intentions.’
* The citation for the Warneken paper: Warneken, Felix, and Michael Tomasello. “Altruistic Helping in Human Infants and Young Chimpanzees.” Science 311, no. 5765 (2006): 1301-03.
* 40 - Hadfield-Menell et al., “Cooperative Inverse Reinforcement Learning.” (“CIRL” is pronounced with a soft c, homophonous with the last name of strong AI skeptic John Searle (no relation). I have agitated within the community that a hard c “curl” pronunciation makes more sense, given that “cooperative” uses a hard c, but it appears the die is cast.).
* Note from TB: I agree w/ the hard c note.
* 41 - Dylan Hadfield-Menell, personal interview, March 15, 2018.
Uncertainty
* 14 - Yarin Gal, “Modern Deep Learning Through Bayesian Eyes” (lecture), Microsoft Research, December 11, 2015, https://www.microsoft.com/en-us/resea....
* 43 - As Eliezer Yudkowsky put it, “If you’re going to cure cancer, make sure the patient still dies!” See https://intelligence.org/2016/12/28/a.... See also Armstrong and Levinstein, “Low Impact Artificial Intelligence,” which uses the example of an asteroid headed for earth. A system constrained to only take “low-impact” actions might fail to divert it—or, perhaps even worse, a system capable of offsetting might divert the asteroid, saving the planet, and then blow the planet up anyway.
* See also the DeepMind Safety Research post “Designing agent incentives to avoid side effects”: designing-agent-incentives-to-avoid-side-effects-e1ac80ea6107.
* 49 - Turner, Hadfield-Menell, and Tadepalli, "Conservative Agency via Attainable Utility Preservation." See also Turner's "Reframing Impact" sequence at http://www.alignmentforum.org/s/7Cdoz... and additional discussion in his "Towards a New Impact Measure," https://www.alignmentforum.org/posts/yEa7kwoMpsBgaBCgb/towards-a-new-impact-measure; he writes, "I have a theory that AUP seemingly works for advanced agents not because the content of the attainable set's utilities actually matters, but rather because there exists a common utility achievement currency of power." See Turner, "Optimal Farsighted Agents Tend to Seek Power." For more on the notion of power in an AI safety context, including an information-theoretic account of "empowerment," see Amodei et al., "Concrete Problems in AI Safety," which, in turn, references Salge, Glackin, and Polani, "Empowerment: An Introduction," and Mohamed and Rezende, "Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning."
* 51 - Wiener, “Some Moral and Technical Consequences of Automation.”
* 52 - According to Paul Christiano, “corrigibility” as a tenet of AI safety began with the Machine Intelligence Research Institute’s Eliezer Yudkowsky, and the name itself came from Robert Miles. See Christiano’s “Corrigibility,” https://ai-alignment.com/corrigibilit....
* 63 - For more on corrigibility and model misspecification using this paradigm, see also, e.g., Carey, “Incorrigibility in the CIRL Framework.”
ThomasEB | 4 other reviews | Jul 4, 2024 |
The authors review tried-and-true algorithms from the fields of mathematics and computer science and show how they can be applied to real-world decisions that you and I may make every day. Not only is it a great demonstration of real-world applications, but the book is also quite fun to read.
kokeyama | 46 other reviews | May 25, 2024 |
Statistics
- Works
- 7
- Members
- 2,417
- Popularity
- #10,603
- Rating
- 4.0
- Reviews
- 67
- ISBNs
- 46
- Languages
- 7
- Favorited
- 1
I think the book will appeal to those interested in psychology; it feels like there was a lot of that. While there was definitely talk of ethics, I don't leave the book feeling very enlightened about practical ethics as it pertains to AI. And it feels like maybe another book on AI might deal with other branches of philosophy in ways that this book doesn't.
So this book may be for you. It wasn't for me.