Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, “and what is the use of a book,” thought Alice, “without pictures or conversations?”
Her sister, though, seemed fully satisfied with reading this particular passage:
Then up rose Mrs. Cratchit, Cratchit’s wife, dressed out but poorly in a twice-turned gown, brave in ribbons, which are cheap and make a goodly show for sixpence; and she laid the cloth, assisted by Belinda Cratchit, second of her daughters, also brave in ribbons . . . [1]
So she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies.
So she sat on the riverbank with her laptop, scrolling through emails that she would have otherwise archived without a glance. In one of these emails highlighting a new research breakthrough, she saw a white rabbit with pink eyes!

The white rabbit was waving at Alice from inside the screen, and Alice, even though she had completed an undergraduate degree in Computer Science, waved back. It was not that big of a surprise to her that this rabbit could walk around and wave, for it was easily possible to generate such things with a suitably trained model. However, when the rabbit actually took a watch out of its waistcoat pocket, looked at it, and then hurried on, she was shocked. She quickly regained her senses and was just in time to see it pop down a large rabbit hole under a hedge. She was so tired that she wished that she could follow the rabbit down into the hole. She was also feeling really dumb for wishing so. However, as soon as she looked into the dark hole, she felt lighter.
And the more she looked into it, the lighter she got. Within a few seconds, she was being pulled towards the hole, as if she were made of light. She tried to yell for help, but either no voice came out of her mouth, or her sister was so lost in her book that she didn’t hear a thing.
The rabbit hole went straight on like a tunnel for some way, and then dipped down so suddenly that Alice had not a moment to think about stopping herself before she was falling down a very deep well.
Quite unfortunately, she did not take her laptop with her. Oh, how was she ever to get out! Alice tried to remember what that research breakthrough was about. It is difficult to remember things when you are falling down. She could recall only two phrases distinctly: “Wonderland”, and “the world’s largest attempt at learning ingenuity”.

She also remembered something about there being a novel generative model. Now, she knew about one very famous generative model from college, the GAN, or Generative Adversarial Network, for she had done a minor in Artificial Intelligence alongside her undergraduate degree.
“It is very simple”, thought Alice. “There’s a generator network (G), which is supposed to learn to create realistic fake samples (resembling draws from an unknown distribution), and there’s a discriminator network (D), which is supposed to learn to differentiate between a real sample and a fake one. Provided with sufficient real samples, we train both the discriminator and the generator together in a minimax, two-player game setting. The generator tries to fool the discriminator, and the discriminator tries not to be fooled by the generator.”
“The loss functions of the generator and the discriminator are set up so that they learn to minimize and maximize, respectively, the following:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x}[\log D(x)] + \mathbb{E}_{z}[\log(1 - D(G(z)))]$$

where x is a real sample, z is the noise used by G to create a fake sample G(z), and D(y) is the probability with which D classifies y as real.”, finished Alice.
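Alice’s recollection can be made concrete with a small numerical sketch. The Python below is only an illustration under loud assumptions: the "networks" are hypothetical one-line stand-ins (a fixed logistic `discriminator` and a `generator` that merely shifts its noise), the data is a made-up 1-D Gaussian, and nothing is trained. It estimates the value that D wants to be large and G wants to be small, for two different generators.

```python
import math
import random

def discriminator(y):
    """Toy D (an assumption): probability that the scalar sample y is real."""
    return 1.0 / (1.0 + math.exp(-y))

def generator(z, shift):
    """Toy G (an assumption): turns noise z into a fake sample by shifting it."""
    return z + shift

def value(real_samples, noise_samples, shift):
    """Monte Carlo estimate of E_x[log D(x)] + E_z[log(1 - D(G(z)))]."""
    real_term = sum(math.log(discriminator(x)) for x in real_samples)
    fake_term = sum(
        math.log(1.0 - discriminator(generator(z, shift))) for z in noise_samples
    )
    return real_term / len(real_samples) + fake_term / len(noise_samples)

rng = random.Random(0)
real = [rng.gauss(2.0, 1.0) for _ in range(1000)]   # "real" data centred at 2
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # noise fed to G

v_poor_g = value(real, noise, shift=0.0)  # fakes centred at 0, easy to spot
v_good_g = value(real, noise, shift=2.0)  # fakes centred at 2, like the real data

print(f"V with a poor generator:   {v_poor_g:.3f}")
print(f"V with a better generator: {v_good_g:.3f}")
```

With D held fixed, the generator whose fakes look more like the real data drives the value down, which is the "minimize" half of the game; a real GAN would alternate gradient steps on both networks rather than compare two hand-picked shifts.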
Down, down, down. There was nothing else to do, so Alice soon began talking again. “Dinah’ll miss me very much to-night, I should think!” (Dinah was the cat.) “I hope they’ll remember her saucer of milk at tea-time. Dinah my dear! I wish you were down here with me! There are no mice in the air, I’m afraid, but you might catch a bat, and that’s very like a mouse, you know. But do cats eat bats, I wonder?”
She tried to make sense of what was happening. It is also difficult to make sense when you are falling down.
“It is so curious”, she said to herself, “that I fell into a neural network. And that too, with a white rabbit. I better be looking around for that rabbit. It should be able to help me. Oh, I wish I was back home with Dinah.” Then suddenly, thump! thump! Down she came upon a heap of sticks and dry leaves, and the fall was over. Alice was not a bit hurt, and she jumped up onto her feet in a moment: she looked up, but it was all dark overhead; before her was another long passage, and the White Rabbit was still in sight, hurrying down it.
She ran as fast as she could towards the white rabbit with pink eyes. Usually, just a small run was enough to make her very tired, but for some reason, down this hole, she didn’t feel tired at all! It was when she reached the rabbit that she realized that she did not know how to address it. Fearful of losing the rabbit again, she yelled, at the top of her voice, “Excuse me, please, Mr. Rabbit!”
To her relief, the rabbit stopped at once and turned around on two legs facing Alice. “I am very late. Oh dear, the Duchess will be very angry with me.”, he said, rubbing his forehead with his hands.
“Please tell me what this place is, and how to get out of it.”, urged Alice, almost about to cry.
“You’re entering Wonderland. It is a huge generative model that is supposed to learn the art of being clever, original, and inventive. Humans are not supposed to enter Wonderland as inputs though. You are a very naughty child, going into unknown and dark places without telling anyone. I have no idea how you’ll be encoded, but that is what’s going on. As you go ahead, be prepared for all sorts of kooky changes. Your conscience might also get dispersed — not that I know much though. As for how to get out, you need to get thrown out of the generator as a fake sample; good luck with that. The Duchess might tell you more. I gotta go. Ciao!”
And before Alice could even open her mouth and ask how she could find the Duchess, the white rabbit was gone. She tried to run after him, and in a while, she found herself in a long, low hall, which was lit up by a row of lamps hanging from the roof.
There were doors all around the hall, but they were all locked; and when Alice had been all the way down one side and up the other, trying every door, she walked sadly down the middle, wondering how she was ever to get out again.
She went to a table across the hall, half hoping she might find a key on it. Instead, she found a little bottle on it, and around the neck of the bottle was a paper label, with the words “DRINK ME” beautifully printed on it in large letters.

This bottle was not marked “poison,” so Alice ventured to taste it, and finding it very nice, (it had, in fact, a sort of mixed flavor of cherry tart, custard, pineapple, roast turkey, toffee, and hot buttered toast) she very soon finished it off.
“I am getting smaller!”, thought Alice. And soon indeed, she was so small that she was able to lie down and enter through the small gap in one of the doors, which led to a huge (at least, huge for our poor little Alice) garden.
Alice wondered what this potion did to her — whether it reduced her receptive field in the network (whatever that meant for human input streams), or whether this was a way to reduce her dimensions before being passed through another set of layers. She was close to tears yet again.
So Alice started thinking about things that could cheer her up. There was one thing that always cheered her up — and that was thinking about maths. So Alice sat down, and tried to divert her mind from all the sadness, and instead tried to think about all the things she had read about information theory. She recalled:
One fine day, Shannon wanted to define the “information” of an event occurring. For example, he wanted to quantify how much information we get when we learn that a fair die rolled a five, or that George RR Martin finally finished writing the next book. This definition of information needs to satisfy several axioms:
- The only thing we have for an event is the probability of it happening. Hence, it needs to be a function of this probability.
- An event with probability 1 tells us nothing — and hence should have zero information.
- The less probable an event is, the more surprising it is and the more information it tells us.
- The total amount of information of two independent events should be the sum of the information of the individual events.
We start by defining the Information of any event x as some function f of its probability:

$$I(x) = f(P)$$

Here, P is the probability of the occurrence of the event. For any two independent events (x, y), with probabilities (P, Q), where the probability of them occurring together is just P*Q, the last axiom tells us that:

$$f(P \cdot Q) = f(P) + f(Q)$$

Mathematicians (Cauchy and others) had proved that under mild conditions (which are satisfied here), there’s just one function, up to a constant factor, that satisfies this property, and that is the logarithm. Since a less probable event must carry more information, the logarithm takes a negative sign. Thus, we can define Information as:

$$I(x) = -\log P$$

If we set the base of the logarithm to 2, we get the number of bits that are required to store this amount of information.
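The axioms are easy to check numerically. The following is a minimal Python sketch, assuming the base-2 definition above; the function name `information` and the example probabilities are illustrative choices, not part of the original text.

```python
import math

def information(p):
    """Self-information, in bits, of an event with probability p: -log2(p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -math.log2(p)

certain = information(1.0)               # a sure event tells us nothing
five = information(1 / 6)                # a fair die rolling a five, log2(6) bits
two_fives = information(1 / 6 * 1 / 6)   # two independent fives in a row

print(f"one five:  {five:.3f} bits")
print(f"two fives: {two_fives:.3f} bits")
```

The sure event comes out at zero bits, rarer events come out larger, and the two independent fives carry twice the information of one, as the last axiom demands.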
So, in our childhood, we read about the Distributive Property of Multiplication over Addition. What Cauchy and others proved, in essence, is that multiplication by a constant is the only well-behaved function that distributes over Addition. Which is a very good thing, thought Alice, since these properties were taught to us in such a boring way!
Alice remembered that we could extend this definition of the Information of one outcome to the average information of a probability distribution over several outcomes. This is just a fancy way of talking about the information given, on average, by one coin toss. For a distribution P over outcomes x, Entropy is thus defined as:

$$H(P) = -\sum_{x} P(x) \log P(x)$$
Alice had a lot of time, so she also calculated the entropy of a fair die (it is 2.585 bits). Now, you could also sample from this distribution P while adding up the information using a different distribution Q (in information theory land, this corresponds to using a different coding scheme for storing information). This is how we define the Cross-Entropy between two distributions:

$$H(P, Q) = -\sum_{x} P(x) \log Q(x)$$
Now, Kullback, Laplace, and others have independently, and in very different contexts, proved that the Cross-Entropy of a distribution is minimum when measured against the same distribution (in which case, it is just the Entropy). If we choose a different distribution, the Cross-Entropy “diverges” from the Entropy by an amount that is defined as the Kullback-Leibler (or KL) Divergence. With P the distribution we sample from and Q the one used to measure information, this is defined as follows:

$$D_{KL}(P \| Q) = H(P, Q) - H(P) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}$$
Alice fondly remembered that minimizing the Cross-Entropy between two distributions was required to solve a number of machine learning problems through maximum likelihood estimation (MLE). This formed the basis of most of the common objectives in Deep Learning.
“. . . like the loss function for this humongous generative model in which I’m stuck.”, finished Alice.
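Alice’s chain of definitions (Entropy, Cross-Entropy, KL Divergence, and the link to maximum likelihood) can be sketched in a few lines of Python. This is an illustration under stated assumptions: distributions are plain lists of probabilities, logarithms are base 2, and the `loaded_die` model and the handful of `rolls` are made-up numbers chosen only for the example.

```python
import math

def entropy(p):
    """H(P) = -sum P(x) log2 P(x), skipping zero-probability outcomes."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(P, Q) = -sum P(x) log2 Q(x): sample from P, measure information with Q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """D_KL(P || Q) = sum P(x) log2(P(x) / Q(x))."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

fair_die = [1 / 6] * 6
loaded_die = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]  # hypothetical "wrong" model

h_fair = entropy(fair_die)                   # log2(6), about 2.585 bits
h_cross = cross_entropy(fair_die, loaded_die)
kl = kl_divergence(fair_die, loaded_die)     # the gap between the two

# Maximum likelihood link: the average negative log-likelihood of observed
# rolls under a model equals the Cross-Entropy between the empirical
# distribution of those rolls and the model.
rolls = [1, 3, 3, 6, 2, 3]                   # made-up observations
empirical = [rolls.count(face) / len(rolls) for face in range(1, 7)]
avg_nll = -sum(math.log2(loaded_die[r - 1]) for r in rolls) / len(rolls)
h_empirical = cross_entropy(empirical, loaded_die)

print(f"H(fair) = {h_fair:.3f}, H(fair, loaded) = {h_cross:.3f}, KL = {kl:.3f}")
```

Measuring the fair die with the wrong model costs extra bits, and that extra cost is exactly the KL Divergence; choosing the model that minimizes the average negative log-likelihood is therefore the same as minimizing a Cross-Entropy.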
She was, however, not fully convinced about this whole Information and Cross-Entropy way of thinking about it. Many a time Alice’s professors had to reply to questions like “But aren’t Cross-Entropy and the idea of using a cost function too dependent on the assumption that a goal needs to be reached? And isn’t the necessity of randomness that creates all of the reality in this universe, including life and intelligence, against this very idea of there being a goal?”
All of this thinking had made Alice very hungry, and once again, she started sobbing. “I wish I could have something to eat.”
She started searching in the big hall outside the garden. Soon her eye fell on a very small cake, on which the words “EAT ME” were beautifully marked in currants. So she set to work and very soon finished off the cake.
[1] From A Christmas Carol by Charles Dickens.