AI is everywhere. ChatGPT is writing essays, debugging code, and having conversations that feel weirdly human. But a year ago, I kept wondering: how does this stuff actually work?
I mean, I knew the basics from college: neural networks, some machine learning. But those simple models felt like pocket calculators compared to whatever magic GPT was pulling off.
So I decided to figure it out. I signed up for Andrew Ng's Deep Learning Specialization on Coursera and spent my weekends learning.
For months, I worked through different types of neural networks. CNNs for recognizing images, RNNs for understanding sequences, LSTMs for remembering things. Each one was cool, but they still felt completely different from the AI I was actually using every day. I still felt a long way from understanding how GPT works.
In one of the final courses, something new appeared: a brief mention of the Transformer model, the foundation of modern GPT-style large language models. It referenced a paper published in 2017 titled "Attention Is All You Need."
That title alone intrigued me. So I dug deeper and, to my surprise, thanks to the building blocks I'd already covered in the course, understanding the paper at a high level wasn't that difficult.
The original Transformer was built for language translation: taking a sentence in one language and converting it into another.
Here's how it worked: there were two main parts. An "encoder" that read and understood the input sentence, and a "decoder" that wrote out the translation word by word. The magic was in something called "attention": instead of reading words one at a time, the model could look at all the words at once and figure out which ones mattered most.
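To make that concrete, here's a minimal NumPy sketch of the scaled dot-product attention the paper describes. The function and variable names are my own, and real Transformers stack multiple heads, learned projections, and masking on top of this core idea:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (sequence_length, d_k) arrays.
    # Each output row is a weighted blend of all value rows, where the
    # weights measure how much one word "attends" to every other word.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # relevance of every word to every other word
    weights = softmax(scores, axis=-1)   # each row sums to 1: the attention weights
    return weights @ V

# Toy self-attention over a 4-word "sentence" with 8-dimensional vectors
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(x, x, x).shape)  # (4, 8)
```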
But here's what confused me: GPT doesn't translate anything. It generates responses to whatever you tell it. So how can they both be "Transformers"?
That question made me dig way deeper. And what I found was fascinating: the Transformer isn't just one architecture. It's more like a blueprint, a toolbox that people have adapted with different tools for completely different jobs.
Encoder-only models are the really good listeners. Models like BERT are great at understanding text because they can look at everything at once.
How they work: they see the whole sentence before deciding what any word means. So when they see "bank" in "I went to the bank," they look at the entire sentence to figure out whether it's a place to get money or the side of a river.
What they’re good at:
- Search engines understanding what you're looking for
- Figuring out whether a review is positive or negative
- Answering questions about a text
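A quick way to watch that bidirectional reading in action, assuming you have Hugging Face's transformers library installed (`pip install transformers`; the model choice and prompt here are just illustrative):

```python
from transformers import pipeline

# BERT reads the whole sentence, including the words AFTER the blank,
# before predicting what the masked word should be.
fill = pipeline("fill-mask", model="bert-base-uncased")

for pred in fill("I went to the [MASK] to deposit my paycheck.")[:3]:
    print(pred["token_str"], round(pred["score"], 3))
# "bank" should rank highly: the later words "deposit my paycheck"
# pin down the financial sense, which a left-to-right-only model
# would not yet have seen when it reached the blank.
```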
Decoder-only models are the storytellers. GPT, Claude, and similar models are great at generating text. They work the way people do when writing: they look at what came before and guess what should come next.
How they work: they can only look backward, never forward. It's like writing with blinders on. They can't peek ahead to see where the story is going, so they have to build text one word at a time.
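Those "blinders" are literally a mask. Here's a tiny sketch (my own illustration in NumPy) of the causal mask that enforces backward-only attention:

```python
import numpy as np

# Causal mask for a 5-word sequence: row i marks which positions word i
# may attend to. Lower-triangular means "itself and earlier words only".
n = 5
print(np.tril(np.ones((n, n), dtype=int)))
# [[1 0 0 0 0]
#  [1 1 0 0 0]
#  [1 1 1 0 0]
#  [1 1 1 1 0]
#  [1 1 1 1 1]]
# In a real model, the zeros become -infinity in the attention scores,
# so the softmax assigns future words exactly zero weight.
```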
What they’re good at:
- Having conversations
- Writing stories or code
- Continuing your thoughts
This is why GPT is amazing at chatting but sometimes misses things that require grasping the full picture first.
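That one-word-at-a-time loop is easy to see with a small open model. A sketch assuming transformers and torch are installed, using greedy decoding (always taking the single most likely next token) for simplicity:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

ids = tokenizer.encode("The Transformer architecture is", return_tensors="pt")
with torch.no_grad():
    for _ in range(10):
        logits = model(ids).logits           # a score for every vocabulary token
        next_id = logits[0, -1].argmax()     # greedily pick the most likely next token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)  # append it and repeat

print(tokenizer.decode(ids[0]))
```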
Encoder-decoder models use both parts, the reader and the writer. Models like T5 first understand what you give them, then create something entirely different.
How they work: the encoder part reads and understands everything. Then the decoder part generates something new based on that understanding.
What they’re good at:
- Translating languages
- Summarizing long texts
- Turning images into captions
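The read-then-write split is just as easy to try. A sketch using the same pipeline helper with a small T5 checkpoint (again assuming transformers is installed; the input text is my own):

```python
from transformers import pipeline

# T5's encoder reads the whole input first; its decoder then writes a
# brand-new sequence, in this case a short summary.
summarize = pipeline("summarization", model="t5-small")

text = (
    "The Transformer was introduced in 2017 for machine translation. "
    "It replaced recurrence with attention, letting the model look at "
    "all the words in a sentence at once instead of one at a time."
)
print(summarize(text, max_length=20, min_length=5)[0]["summary_text"])
```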
Understanding these three types of Transformers was like finding the missing piece of a puzzle. Suddenly, the AI landscape made sense. ChatGPT wasn't some magical black box; it was a decoder-only Transformer, trained to predict the next word based on everything that came before.
Now, when I use ChatGPT or Claude, I'm not just amazed by what they can do; I actually understand how they're doing it. And that understanding makes them even more impressive, because it shows how the most powerful tools are often built from simple, versatile building blocks combined in creative ways.