The Rabbit Sends in a Little Bill (Alice, Ch. 4)

But it was the White Rabbit, trotting slowly back again, and looking anxiously about as it went as if it had lost something; and she heard it muttering to itself “The Duchess! The Duchess! Oh, my dear paws! Oh, my fur and whiskers! She’ll get me executed, as sure as ferrets are ferrets! Where can I have dropped them, I wonder?”

“What are you searching for, Mr. Rabbit?” asked Alice.

“Isn’t it obvious?! What else would I be searching for but my thoughts? I had collected them to present to the Duchess, but now they are all gone! Oh, what am I to do?” replied the Rabbit.

Alice asked, very solemnly, if she could help him somehow.

“Why, yes, of course! Your consciousness has now seeped well into Wonderland, and you should be able to find my thoughts. Let me give you the address,” replied the rabbit.

Always curious, Alice quickly asked, “But do thoughts have addresses?”

The rabbit replied in haste, “Yes, every thought is addressed by its preceding thought. And here is the address for the thoughts I’m missing: The Duchess was not happy with the AI model that I had created for her. It was a simple classifier, but she wanted me to make guesses and tell her what could possibly be wrong. So now, help me find my thoughts.”

Drawn by Lewis Carroll himself!

This was one thing Alice was pretty confident she could help with. After all, she had played around with so many models that she ought to know everything that could go wrong. She started with the usual suspects. But it turned out that the Rabbit had taken care of all of these things. Hence, Alice started talking about other potential problems the Duchess might have seen.

“The model might not be robust to adversarial attacks,” she said.

Before she could say any more, though, the white rabbit stopped her. He replied, “Images are very, very high-dimensional, but the real-life pictures the model gets as input are distributed on very, very low-dimensional submanifolds. So, if you specifically train a model to search for an artificial submanifold with incorrect predictions, you are very likely to find such examples. But the question is, do we really find such submanifolds in practice?”

An adversarial attack.
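
The Rabbit's point, that a deliberate search off the data manifold will almost always find a misclassified input, can be sketched with a toy example. Everything here is an assumption made for illustration: the linear classifier, its weights, the "image" of 1000 pixels and the budget `EPS` are all hypothetical, and the perturbation is a sign-of-the-gradient step in the style of the fast gradient sign method, which the story itself never names.

```python
# Toy setup (all values are illustrative assumptions, not the Rabbit's model).
DIM = 1000

# Hypothetical linear classifier: predict +1 if w . x > 0, else -1.
w = [0.01 if i % 2 == 0 else -0.01 for i in range(DIM)]

# A "clean" input whose pixels all sit near 0.5.
x = [0.51 if i % 2 == 0 else 0.49 for i in range(DIM)]


def score(weights, inputs):
    """Dot product of the weights and the input."""
    return sum(wi * xi for wi, xi in zip(weights, inputs))


def predict(weights, inputs):
    """Return +1 for a positive score, -1 otherwise."""
    return 1 if score(weights, inputs) > 0 else -1


def sign(v):
    """Return -1, 0 or +1 depending on the sign of v."""
    return (v > 0) - (v < 0)


# For a linear score, the gradient with respect to the input is just w,
# so stepping against sign(w) lowers the score as fast as possible
# for a fixed per-pixel budget EPS.
EPS = 0.02
x_adv = [xi - EPS * sign(wi) for xi, wi in zip(x, w)]

print(predict(w, x))      # prediction on the clean input
print(predict(w, x_adv))  # prediction on the perturbed input
print(max(abs(a - b) for a, b in zip(x, x_adv)))  # largest per-pixel change
```

In this toy case no pixel moves by more than `EPS`, yet the score shifts by `EPS` times the sum of the absolute weights, which is enough to cross the decision boundary. Whether inputs like `x_adv` ever turn up among real-life pictures is exactly the question the Rabbit leaves open.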

Alice did not know the answer to this, so she went in a completely different direction. Maybe there was an existential risk associated with the model. After all, Wonderland seemed to take its AI stuff very seriously.

She started with some AI safety ideas. She asked the rabbit if he had considered the following concrete problems in AI safety:

Avoiding negative side-effects
Avoiding reward hacking
Scalable oversight
Safe exploration
Robustness to distributional shift

“You see, maybe your AI model has a seemingly harmless objective but is inadvertently very harmful?” said Alice.

“For instance, a model trained to maximize the number of paperclips might convert all of the universe’s metals into paperclips, which is not what we wanted. Or, a model that is allowed to alter its inputs might learn to create a box of delusions and give itself fake rewards.”

The rabbit was keenly listening, and responded, “Well, that’s very unlikely, because it is a classifier that does not take any actions at all.”

Alice started thinking again. “Well, in that case, it may be susceptible to one of many kinds of biases (like algorithmic or institutional biases).”

For some reason, this got the white rabbit very flustered. He replied vigorously, “First of all, I don’t get all this commotion around fairness and biases. Don’t animals understand that some things are simply more difficult? Tell me something, Alice. If you were in a dark room, which would you have a harder time identifying: a white rabbit or a black rabbit?”

“I’m not sure, but I guess a black rabbit would be more difficult to spot,” replied Alice.

“Exactly. The models are not unfair; God is,” finished the rabbit.

Alice did not know how to even start responding to this. She tried another angle. She had read about work on having models identify out-of-syllabus inputs, for example, teaching GPT to identify nonsense.

But the rabbit said that there is no nonsense, or if there is, everything is nonsense. He said, “If I had a world of my own, everything would be nonsense. Nothing would be what it is, because everything would be what it isn’t. And contrariwise, what is, it wouldn’t be. And what it wouldn’t be, it would. You see?”

Alice was getting very anxious now and was very keen on just finding the issue and moving on to find the Duchess. She was thinking about some of the dangers of AI. Maybe the issue was that it used up much more compute than required and was harming the environment.

Dangerous AI. Yes, this is exactly what it looks like.

She was just thinking more about cooperative learning, or inverse learning and using animal feedback, etc., when suddenly a lizard walked towards them holding a black box. The lizard and the rabbit seemed to be friends because they waved at each other.

The rabbit said excitedly, “Hi, how are you, Bill?”

Bill the lizard was repeating something again and again, like a chant or a slogan. As he got near them, he put down the black box and waved back with both his hands. Alice found it very curious that Bill the lizard walked with two legs instead of four. She had also not seen a lizard walk on the ground.

Alice was starting to think that there may be something wrong with the metric or measure used by the white rabbit’s model. She had read long ago about Goodhart’s law, which said that “when a measure becomes a target, it ceases to be a good measure.”

But before she could even start, she was cut off by the loud and hoarse chant of the lizard.

Bill the lizard waving, after putting down the black box very carefully.

Bill the lizard was continuously repeating these four characters: R 1 0 G. Aaar, One, Zero, Gee . . . Aaar, One, Zero, Gee . . .

“What are you saying?” asked the white rabbit.

“Why, this is the first rule of handling this black box,” Bill replied, pointing to the box on the ground. “It stands for: Rule One: Zero Gradients. I don’t know what it means, but as long as I keep on repeating this, the black box stays in good shape.”

There was a small pause, after which Bill simply started chanting again. R10G. R10G.

“Yay! This is what my model is missing. I forgot to clear the gradients!” said the white rabbit, and with this he started dancing around with the lizard, ignoring Alice, who was just standing and watching this curious act.
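
For readers who have never met Rule One, here is a minimal sketch of what forgetting it can look like. It assumes a made-up one-parameter model fitted by plain gradient descent, with a gradient buffer that keeps accumulating unless it is reset, which is how frameworks such as PyTorch behave until `optimizer.zero_grad()` is called. The `train` function, the data and the learning rate are all hypothetical; the story does not say what the Rabbit's training loop looked like.

```python
def train(zero_gradients, steps=50, lr=0.05):
    """Fit y = w * x by gradient descent and return every value w took.

    Illustrative assumption: `grad` is a buffer that accumulates across
    steps and is only reset when `zero_gradients` is True.
    """
    xs = [1.0, 2.0, 3.0]
    ys = [2.0, 4.0, 6.0]  # generated with w = 2.0
    w = 0.0
    grad = 0.0
    history = []
    for _ in range(steps):
        if zero_gradients:
            grad = 0.0  # Rule One: Zero Gradients
        # Gradient of the mean squared error with respect to w,
        # added on top of whatever is already in the buffer.
        grad += sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
        history.append(w)
    return history


with_rule = train(zero_gradients=True)
without_rule = train(zero_gradients=False)

print(with_rule[-1])      # final w when the buffer is cleared each step
print(without_rule[-10:])  # last ten values of w when it is never cleared
```

With the buffer cleared each step, `w` settles next to the true value of 2.0. Without clearing, stale gradients keep being re-applied, and in this toy setup `w` swings back and forth around 2.0 without settling.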

This seemed to Alice a good opportunity for making her escape; so she set off at once, and ran till she was quite tired and out of breath, and till the lizard’s chant sounded quite faint in the distance.

“And yet what a dear little rabbit it was!” said Alice, as she leaned against a buttercup to rest herself, and fanned herself with one of the leaves: “I should have liked teaching it tricks very much, if—if I’d only been the right size to do it! Oh, dear! I’d nearly forgotten that I’ve got to grow up again! Let me see—how is it to be managed? I suppose I ought to eat or drink something or other; but the great question is, what?”

The great question certainly was, what? Alice looked all around her at the flowers and the blades of grass, but she did not see anything that looked like the right thing to eat or drink under the circumstances. There was a large mushroom growing near her, about the same height as herself; and when she had looked under it, on both sides of it, and behind it, it occurred to her that she might as well look and see what was on the top of it.

She stretched herself up on tiptoe and peeped over the edge of the mushroom, and her eyes immediately met those of a large blue caterpillar, that was sitting on the top with its arms folded, quietly smoking a long hookah, and taking not the smallest notice of her or of anything else.
