all LLMs are overhypedā¦ chatGPT, perplexity, claude, llama3, mistral, mixtral and wizardLM2 [my favorite LLM btw]
Part 0 : LLMs wonāt Take āYOURā Job Away
not now, not anytime soon and definetely not by chatGPT
this is a ELi5 [explain like iām 5] deep dive into how āTransformersā work
remember all LLM are simply just most advanced āautocompleteā algorithms
Part 1 : How do Transformers Work
this guide is going to skip over the fundamentals of how neural networks can learn almost anythingā¦
PS: I recommend you read this blog post for an intuitive understanding of neural networks
Transformers have 3 major components:
Embedding
all computer algorithms understand numbersā¦ okay atleast numbers in binary
but whenever you interact with an LLM you talk in wordsā¦ or tokens to be more precise
for the LLM to show any signs of intelligenceā¦ it needs to be able to understand relationships between 2 tokens and since computers are good with math we do this by plotting the tokens as vectors on a graph
words with similar meaning have similar direction
words with similar intensity have similar magnitute
now imagine having a pre-plotted graph for billions of tokensā¦
so for each token the transformer looks up the corresponding embedding
Attention
if youāve followed deep learning for even a while.. chances are youāve heard of the famous google paper āAttention is all you needā
in language, the meaning of the word often varies with the context that it is used inā¦
separately the term āa leap of faithā would have embeddings like
but clearly individual embeddings does not contain the whole scemantic meaning of the whole phrase
with some simple matrix multiplication magic the transformer just concatinates the contextual meaning of the whole phrase into an embedded token
Unembedding
now that the LLM has a contextual understanding of the phraseā¦ next is using a classic statistical algorithm AKA neural networks which is trained on huge data to figure out
- if or not will a next word be present
- what could the next word be based on all the training data that it has seen
LLM chatbots do this on prompt you give itā¦ over and over until the next word the LLM predicts is a NULL
Part 2 - How does chatGPT helps me with my homework
if you think about itā¦ after youāve understood how transformers workā¦ you might say that LLMs are basically just autocomplete
but youāve gotten a lot of assistance from chatGPTā¦ it knows how to solve every problemā¦ it knows everythingā¦ or atleast it seems to know everything
lemme ask you a questionā¦
letās say you wanted pizza (prompt) and you went to a pizzeria (chatGPT), which of below is most suited to be the LLM?
- waiter
- chef
- raw food
Click for Answer
1. waiter is the LLM
2. chef doesn't belong in a pizzeria [eg: the blog author who wrote an article on question you asked chatGPT]
3. raw food is the enormous raw data available on the internet
Part 3 - How to build more intelligent AI
now that we have established that current state of the art AI - LLMs like chatGPT are basically autocomplete machines
what is intelligence?
the ability of a system/entity to make a mental map of every input it has experienced PS: maps have O(1) to lookup something
Humans possess intelligence across 21 dimensions [Fun Fact: humans have 21 not 5 senses].
Robots typically demonstrate intelligence in four dimensions [visual, auditory, textual and motor-encoding].
LLMs have only recently expanded to two dimensions [image and textual].
TL;DR research invested in Embedding Systems will give 100x returns
untill next time !ļø ā
or you could spot me in the wild š¤ i mean instagram, twitter, linkedin and maybe even youtube where i excalidraw those diagrams