Manny Bernabe • 2023-08-14
Notes from Uplimit’s “Building AI Products with OpenAI” w/ Ted Sanders, Sidharth Ramachandran. [Course link]. Images from Sidharth’s presentation on history of NLP and how we got here.
The computational world has experienced quite the boom. But what's truly exciting? The evolution in training techniques: think fine-tuning, zero-shot and few-shot learning, instruction fine-tuning, and the super cool reinforcement learning from human feedback (RLHF).
Take a step back and think about the OG methods in Natural Language Processing (NLP). Want to dissect the sentiment in product reviews? The go-to was splitting reviews into bite-sized words using the good ol' bag-of-words technique. Then, train a model to slap on labels like POSITIVE or NEGATIVE.
But, here's the hiccup: this method craved labels. And while supervised learning is pretty darn effective, needing a human to label every example puts a cap on how much data you can train on.
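To make the old-school recipe concrete, here's a minimal sketch of bag-of-words sentiment classification. The notes don't say which classifier the course used, so the Naive Bayes model, the function names, and the four toy reviews are all assumptions picked for illustration. Notice that every single training example needs a hand-written label.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase, split on whitespace, strip basic punctuation, count the words."""
    tokens = [w.strip(".,!?").lower() for w in text.split()]
    return Counter(t for t in tokens if t)


def train_naive_bayes(labeled_reviews):
    """Collect per-label word counts and per-label document counts."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in labeled_reviews:
        word_counts.setdefault(label, Counter()).update(bag_of_words(text))
        doc_counts[label] += 1
    return word_counts, doc_counts


def predict(text, word_counts, doc_counts):
    """Return the label with the highest log-probability (add-one smoothing)."""
    vocab = set().union(*word_counts.values())
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(counts.values())
        for word, n in bag_of_words(text).items():
            score += n * math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data (made up for this sketch): each review comes with a human label.
TRAIN = [
    ("Great phone, love the battery", "POSITIVE"),
    ("Love it, works great", "POSITIVE"),
    ("Terrible screen, broke in a week", "NEGATIVE"),
    ("Broke quickly, terrible support", "NEGATIVE"),
]

word_counts, doc_counts = train_naive_bayes(TRAIN)
print(predict("love this great battery", word_counts, doc_counts))
print(predict("terrible, it broke", word_counts, doc_counts))
```

Word order is thrown away entirely, which is exactly the "bag" in bag-of-words.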
Things got spicy when Large Language Models (LLMs) took on unsupervised (strictly speaking, self-supervised) learning, saying "bye" to those pesky hand-written labels.
Instead of just figuring out if someone's throwing shade or giving praise, the aim morphed into guessing a missing word. And hey, why stop there? Predict the next word, sentence, or even what's around the corner.
The real magic? This method's scale. It gobbles up a ton of unlabelled data, pumping up its power. And where's all this text coming from? Good question. Public gold mines like Wikipedia, for starters.
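Here's a rough sketch of why no labels are needed: the text supplies its own answers. A tiny bigram counter stands in for the language model here, which is a big simplification (real LLMs are neural networks, not count tables), and the three-sentence corpus and function names are made up for illustration.

```python
from collections import Counter, defaultdict


def next_word_pairs(text):
    """Turn raw text into (current word, next word) pairs. No human labels involved."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


def train_bigram(corpus):
    """Count which word follows which across the whole corpus."""
    model = defaultdict(Counter)
    for sentence in corpus:
        for prev, nxt in next_word_pairs(sentence):
            model[prev][nxt] += 1
    return model


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Stand-in for "a ton of unlabelled data" (toy sentences, made up).
CORPUS = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "a dog sat on the rug",
]

model = train_bigram(CORPUS)
print(next_word_pairs("the cat sat"))
print(predict_next(model, "sat"))
```

More raw text means more training pairs for free, which is the scaling story in a nutshell.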
Speaking of gold, enter Transfer Learning. It’s like borrowing genius from one area and sprinkling it over another. Boom! Now we have this Swiss Army knife of a model.
Remember BERT? Yeah, that brainchild from Google. It laid down the tracks, allowing specific task tuning with just a smidge of labeled data.
Zoom to modern NLP, and a sprinkle of examples is all it takes to point a model at a new task. This major shift hit the limelight in the groundbreaking “Language Models are Unsupervised Multitask Learners” (the GPT-2 paper). And from this? Ideas like zero-shot (just ask the AI buddy a question) and few-shot (give it a nudge with some examples in the prompt) took off, with few-shot prompting popularized by the follow-up GPT-3 paper, “Language Models are Few-Shot Learners”. [1]
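The difference between zero-shot and few-shot is easiest to see in the prompt itself. This sketch only builds the prompt strings; it doesn't call any model. The `Review:` / `Sentiment:` template and the helper names are assumptions for illustration, not a format taken from the course or the paper.

```python
def zero_shot_prompt(task, query):
    """Zero-shot: describe the task and ask, with no worked examples."""
    return f"{task}\n\nReview: {query}\nSentiment:"


def few_shot_prompt(task, examples, query):
    """Few-shot: same question, preceded by a handful of solved examples."""
    shots = "\n\n".join(
        f"Review: {text}\nSentiment: {label}" for text, label in examples
    )
    return f"{task}\n\n{shots}\n\nReview: {query}\nSentiment:"


TASK = "Classify the sentiment of each review as POSITIVE or NEGATIVE."
EXAMPLES = [
    ("Love it, works great", "POSITIVE"),
    ("Broke quickly, terrible support", "NEGATIVE"),
]

zero = zero_shot_prompt(TASK, "Battery died after two days")
few = few_shot_prompt(TASK, EXAMPLES, "Battery died after two days")
print(zero)
print("-----")
print(few)
```

Either string would be sent to the model as-is. No weights change in either case; the examples live in the prompt, not in the training set.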
These beastly models underwent even more polish. To get LLMs to play nice with human tasks, they were given a crash course on prompts and instructions (a.k.a. instruction fine-tuning). The result? They became champs at following commands. [2]
And then, OpenAI added the pièce de résistance with Reinforcement Learning from Human Feedback (RLHF). In this bonus training round, human raters compare model outputs and the model is tuned toward the ones they prefer, which gave ChatGPT its almost eerie human-like prowess. [3]
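One ingredient of RLHF can be sketched in a few lines. These notes don't go into the mechanics, so take this as an illustration based on the general recipe described in the InstructGPT work [3]: a reward model scores two responses, and a pairwise loss pushes the human-preferred one to score higher. The scores below are made-up numbers, and the later reinforcement learning step that actually updates the LLM is left out.

```python
import math


def preference_loss(reward_chosen, reward_rejected):
    """Pairwise loss: -log(sigmoid(reward_chosen - reward_rejected)).

    Small when the human-preferred response already scores higher,
    large when the reward model ranks the pair the wrong way round.
    """
    diff = reward_chosen - reward_rejected
    # -log(sigmoid(d)) equals log(1 + exp(-d)); branch to avoid overflow.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# Hypothetical reward-model scores for two responses to the same prompt.
agrees_with_human = preference_loss(reward_chosen=2.0, reward_rejected=0.0)
disagrees_with_human = preference_loss(reward_chosen=0.0, reward_rejected=2.0)
undecided = preference_loss(reward_chosen=1.0, reward_rejected=1.0)

print(agrees_with_human, undecided, disagrees_with_human)
```

Minimizing this loss over many human comparisons is how the "human touch" gets turned into a number the model can be optimized against.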
Wrapping it up, today's AI assistants are a trifecta: transferring knowledge from pre-training, nailing instructions, and vibing with human feedback.
Papers
[1] Language Models are Unsupervised Multitask Learners (appendix examples) - Radford et al., 2019 - https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
[2] Scaling Instruction-Finetuned Language Models - Chung et al., 2022 - https://arxiv.org/abs/2210.11416
[3] Aligning Language Models to Follow Instructions - OpenAI, 2022 - https://openai.com/research/instruction-following