I use Markov chains as an example of a "Small Language Model" when teaching about LLMs.
My favorite thing about them is that you can use them to demonstrate temperature. The math is basically the same, and it has a similar effect of creating more creativity in the response.
from math import log, exp
from random import choice, choices

# Likelihood of transitioning from curr_word to next_word
transitions: dict[str, dict[str, float]] = {...}

def next_word(current_word, temp=1.0):
    if current_word not in transitions:
        return choice(list(transitions.keys()))
    probabilities = transitions[current_word]
    next_words = list(probabilities.keys())
    pvals = list(probabilities.values())
    # Convert probabilities to logits and scale by temperature
    logits = [log(p) for p in pvals]
    scaled_logits = [logit / temp for logit in logits]
    # Numerically stable softmax
    max_logit = max(scaled_logits)
    exps = [exp(s - max_logit) for s in scaled_logits]
    sum_exps = sum(exps)
    softmax_probs = [exp_val / sum_exps for exp_val in exps]
    return choices(next_words, weights=softmax_probs)[0]

def generate_sequence(start_word, length, temp=1.0):
    sequence = [start_word]
    current_word = start_word
    for _ in range(length - 1):
        current_word = next_word(current_word, temp)
        sequence.append(current_word)
    return sequence
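The post doesn't show how transitions gets built. Assuming a simple bigram count over whitespace-tokenized text (the helper name is mine, not from the original), a sketch might look like:

```python
from collections import Counter, defaultdict

def train_transitions(text: str) -> dict[str, dict[str, float]]:
    """Count word bigrams, then normalize counts into probabilities."""
    words = text.split()
    counts: dict[str, Counter] = defaultdict(Counter)
    for curr, nxt in zip(words, words[1:]):
        counts[curr][nxt] += 1
    return {
        curr: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for curr, c in counts.items()
    }
```

Feeding the lyrics corpus through this gives the transitions dictionary the sampler above consumes.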
Here are some outputs when the transitions are trained on a lyrics dataset:
> print(" ".join(generate_sequence("When", 20, temp=0.1)))
When I know that I know that I was a little thing that I know that I don't know that
> print(" ".join(generate_sequence("When", 20, temp=0.5)))
When I don't know you know I can do I see And the river to the light in the time
> print(" ".join(generate_sequence("When", 20, temp=1.0)))
When are melting Little darling, I feel more And if I was very slow (In control) For our troubles And
It's a lot more nonsensical than an LLM, but highlights what the logit manipulation is doing.
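To see the logit manipulation numerically, here is the same log/scale/softmax math applied to a toy distribution (the probabilities are made up for illustration):

```python
from math import log, exp

def rescale(probs: list[float], temp: float) -> list[float]:
    """Apply temperature to a distribution via logits + softmax."""
    scaled = [log(p) / temp for p in probs]
    m = max(scaled)  # subtract max for numerical stability
    exps = [exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

probs = [0.7, 0.2, 0.1]
print(rescale(probs, 0.5))  # low temp sharpens toward the top choice
print(rescale(probs, 2.0))  # high temp flattens toward uniform
```

With temp=0.5 the top word's probability climbs above 0.9; with temp=2.0 it drops toward 0.5, which is why the high-temperature lyrics wander so much more.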