I use Markov chains as an example of a "Small Language Model" when teaching about LLMs.
My favorite thing about them is that you can use them to demonstrate temperature. The math is basically the same, and it has a similar effect of creating more creativity in the response.
from math import log, exp
from random import choice, choices

# Likelihood of transitioning from curr_word to next_word
transitions: dict[str, dict[str, float]] = {...}

def next_word(current_word, temp=1.0):
    if current_word not in transitions:
        return choice(list(transitions.keys()))
    probabilities = transitions[current_word]
    next_words = list(probabilities.keys())
    pvals = list(probabilities.values())
    # Convert probabilities to logits and scale by temperature
    logits = [log(p) for p in pvals]
    scaled_logits = [logit / temp for logit in logits]
    # Numerically stable softmax
    max_logit = max(scaled_logits)
    exps = [exp(s - max_logit) for s in scaled_logits]
    sum_exps = sum(exps)
    softmax_probs = [exp_val / sum_exps for exp_val in exps]
    return choices(next_words, weights=softmax_probs)[0]

def generate_sequence(start_word, length, temp=1.0):
    sequence = [start_word]
    current_word = start_word
    for _ in range(length - 1):
        current_word = next_word(current_word, temp)
        sequence.append(current_word)
    return sequence
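The post doesn't show how transitions gets built. Assuming a simple bigram count over whitespace-tokenized text (the helper name is mine, not from the original), a sketch might look like:

```python
from collections import Counter, defaultdict

def train_transitions(text: str) -> dict[str, dict[str, float]]:
    """Count word bigrams, then normalize counts into probabilities."""
    words = text.split()
    counts: dict[str, Counter] = defaultdict(Counter)
    for curr, nxt in zip(words, words[1:]):
        counts[curr][nxt] += 1
    return {
        curr: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for curr, c in counts.items()
    }
```

Feeding the lyrics corpus through this gives the transitions dictionary the sampler above consumes.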
Here are some outputs when the transitions are trained on a lyrics dataset:
> print(" ".join(generate_sequence("When", 20, temp=0.1)))
When I know that I know that I was a little thing that I know that I don't know that
> print(" ".join(generate_sequence("When", 20, temp=0.5)))
When I don't know you know I can do I see And the river to the light in the time
> print(" ".join(generate_sequence("When", 20, temp=1.0)))
When are melting Little darling, I feel more And if I was very slow (In control) For our troubles And
It's a lot more nonsensical than an LLM, but highlights what the logit manipulation is doing.
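To see the logit manipulation numerically, here is the same log/scale/softmax math applied to a toy distribution (the probabilities are made up for illustration):

```python
from math import log, exp

def rescale(probs: list[float], temp: float) -> list[float]:
    """Apply temperature to a distribution via logits + softmax."""
    scaled = [log(p) / temp for p in probs]
    m = max(scaled)  # subtract max for numerical stability
    exps = [exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

probs = [0.7, 0.2, 0.1]
print(rescale(probs, 0.5))  # low temp sharpens toward the top choice
print(rescale(probs, 2.0))  # high temp flattens toward uniform
```

With temp=0.5 the top word's probability climbs above 0.9; with temp=2.0 it drops toward 0.5, which is why the high-temperature lyrics wander so much more.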