

How to build more intelligent AI



all LLMs are overhyped… chatGPT, perplexity, claude, llama3, mistral, mixtral and wizardLM2 [my favorite LLM btw]

Part 0 : LLMs won’t Take “YOUR” Job Away

not now, not anytime soon and definitely not by chatGPT

GPTransformers

this is an ELI5 [explain like i’m 5] deep dive into how “Transformers” work

remember, all LLMs are simply the most advanced “autocomplete” algorithms

Part 1 : How do Transformers Work

this guide is going to skip over the fundamentals of how neural networks can learn almost anything…

PS: I recommend you read this blog post for an intuitive understanding of neural networks

Transformers have 3 major components:

Embedding

all computer algorithms understand numbers… okay, at least numbers in binary

but whenever you interact with an LLM you talk in words… or tokens to be more precise

for the LLM to show any signs of intelligence… it needs to understand the relationships between tokens, and since computers are good with math, we do this by plotting the tokens as vectors on a graph

words with similar meaning have similar direction

words with similar intensity have similar magnitude

now imagine having a pre-plotted graph for billions of tokens…

so for each token the transformer looks up the corresponding embedding
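
here’s a minimal sketch of that lookup in python [the words, vectors and dimensions below are made up for illustration… real embedding tables have one row per vocabulary token and thousands of dimensions]:

```python
import numpy as np

# toy embedding table... a real LLM stores billions of parameters here
embeddings = {
    "king":  np.array([0.9, 0.80, 0.1]),
    "queen": np.array([0.9, 0.75, 0.2]),
    "pizza": np.array([0.1, 0.20, 0.9]),
}

def cosine_similarity(a, b):
    # vectors pointing in a similar direction give a value close to 1
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# step 1 of the transformer: look up the embedding of each token
vectors = [embeddings[token] for token in ["king", "queen", "pizza"]]

print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # ~0.99, similar meaning
print(cosine_similarity(embeddings["king"], embeddings["pizza"]))  # ~0.30, unrelated
```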

Attention

if you’ve followed deep learning for even a while… chances are you’ve heard of the famous google paper “Attention Is All You Need”

in language, the meaning of a word often varies with the context it is used in…

taken separately, each word in the phrase “a leap of faith” would have its own standalone embedding

but clearly the individual embeddings do not contain the semantic meaning of the whole phrase

with some simple matrix multiplication magic, the transformer mixes the contextual meaning of the whole phrase into each token’s embedding
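
that “magic” is called scaled dot-product attention… here’s a toy single-head version in numpy, with random made-up weights, just to show the shape of the computation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(X, Wq, Wk, Wv):
    # project every token embedding into a query, key and value vector
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # how much should each token "pay attention to" every other token?
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    # blend the value vectors using those weights => context-aware embeddings
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))  # 4 tokens ("a leap of faith") with 8-d embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(attention(X, Wq, Wk, Wv).shape)  # (4, 8): same shape, now contextual
```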

Unembedding

now that the LLM has a contextual understanding of the phrase… the next step is a classic statistical algorithm, AKA a neural network, trained on huge amounts of data to figure out

  1. whether or not a next word should follow
  2. what that next word could be, based on all the training data it has seen

LLM chatbots do this on the prompt you give them… over and over, until the next word the LLM predicts is a NULL [an end-of-sequence token]
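
in code, that generation loop could look something like this [`model` here is a hypothetical function that runs one full transformer pass and returns a probability for every vocabulary token]:

```python
def generate(model, prompt_tokens, eos_token, max_len=100):
    tokens = list(prompt_tokens)
    for _ in range(max_len):
        probs = model(tokens)                   # one full transformer pass
        next_token = max(probs, key=probs.get)  # greedy: pick the likeliest token
        if next_token == eos_token:             # the "NULL" / end-of-sequence token
            break
        tokens.append(next_token)
    return tokens
```

real chatbots usually sample from `probs` with a bit of randomness [temperature] instead of always taking the single likeliest token… which is why you can get a different answer to the same prompt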

Part 2 : How does chatGPT help me with my homework

if you think about it… once you’ve understood how transformers work… you might say that LLMs are basically just autocomplete

but you’ve gotten a lot of assistance from chatGPT… it knows how to solve every problem… it knows everything… or at least it seems to know everything

lemme ask you a question…

let’s say you wanted pizza (prompt) and you went to a pizzeria (chatGPT), which of the below is best suited to be the LLM?

  1. waiter
  2. chef
  3. raw food
Click for Answer
1. waiter is the LLM
2. the chef isn't the LLM... the chef is whoever originally created the knowledge [eg: the blog author who wrote an article on the question you asked chatGPT]
3. raw food is the enormous raw data available on the internet

Part 3 : How to build more intelligent AI

now that we have established that current state-of-the-art AI [LLMs like chatGPT] are basically autocomplete machines…

what is intelligence?

the ability of a system/entity to make a mental map of every input it has experienced [PS: maps have O(1) lookup]
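
in python terms, that map could be a plain dict… lookup cost stays O(1) on average no matter how many experiences are stored:

```python
# a hash map of "experiences": average-case O(1) lookup,
# whether it holds ten entries or ten billion
experience = {
    "touched a hot stove": "pain",
    "heard thunder": "loud noise incoming",
}
print(experience["touched a hot stove"])  # instant recall: "pain"
```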

Humans possess intelligence across 21 dimensions [Fun Fact: humans have 21 senses, not 5].

Robots will at most demonstrate intelligence in four dimensions [visual, auditory, textual and motor-encoding].

LLMs have only recently expanded to two dimensions [image and textual].

TL;DR: research invested in Embedding Systems will give 100x returns


until next time! ✌ļø

or you could spot me in the wild 🤭 i mean instagram, twitter, linkedin and maybe even youtube where i may make video versions of these blog posts