Markov Chain Confusion et Clarté

This week’s task was to generate text using a predictive model, specifically a Markov chain. To be honest, or tbh as the kids say, a lot of this stuff is drifting over my head, though I grasp bits and pieces as we go along. The top-line logic makes sense, but I get a little lost in the weeds.

Currently I’m inspired by new language acquisition and my sad attempt to understand anything when people speak French to me. I squint my eyes, furrow my brows, and try to parse out enough words that I can guess the gist and quickly patch together a garbled response. This is immersion and I am barely conversational. Sigh. One day.

I chose Darwin’s De l’origine des espèces because I couldn’t find Le Petit Prince or L’Étranger. Also, I think I need to buy more books in French, non? And it might help me with my animal vocabulary, who knows?

For my idea, I wanted to see if I could take a text in another language, apply a Markov chain to it, and see whether the transformation happens to create any English words that I can parse out, illustrating how I feel when someone speaks French to me.

Curious thing is, the .txt file from Project Gutenberg is in all caps, so that should be interesting…

At this point I have the 10 most common character pairs in the text, but I’m still trying to figure out how to randomly add those pairs back into the text. Get ready to scroll a TON; there’s more analysis after the code:
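The pair-counting step can be sketched in Python roughly like this. This is a minimal illustration under my own assumptions, not the listing from the assignment: the function name `top_pairs` and the short sample string are made up for the example, and pairs are only counted inside a word, never across a space.

```python
from collections import Counter

def top_pairs(text, n=10):
    """Return the n most common adjacent letter pairs in text."""
    counts = Counter()
    for word in text.upper().split():
        # Keep letters only, so punctuation does not end up inside a pair.
        letters = "".join(ch for ch in word if ch.isalpha())
        for a, b in zip(letters, letters[1:]):
            counts[a + b] += 1
    return counts.most_common(n)

# Hypothetical sample; the real input would be the full Gutenberg text.
sample = "LES ESPECES ET LES RACES"
print(top_pairs(sample, 3))
```

Because the Gutenberg file is already in all caps, the `upper()` call changes nothing there; it just keeps the counts consistent for mixed-case input.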


I’m struggling to change the “src” from just a word to a .txt file or a variable, so I ended up manually entering the top 14 pairs (cheating, so shameful) and creating new words out of those. Ironically, they just look like a bunch of new French “re” verbs that I don’t understand. It’s essentially the opposite of what I was trying to do, but my comprehension got foggier and foggier as I moved through the steps.
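One way around the “src” problem, sketched in Python under a few assumptions: the source is treated as an ordinary string, so it can be a literal, a variable, or the contents of a file. The names `build_chain` and `make_word` and the filename `origine.txt` are hypothetical, and this is a simple letter-to-letter chain rather than the exact approach I used.

```python
import random
from collections import defaultdict

def build_chain(src):
    """Map each letter to the list of letters that follow it in src."""
    chain = defaultdict(list)
    for word in src.upper().split():
        letters = [ch for ch in word if ch.isalpha()]
        for a, b in zip(letters, letters[1:]):
            chain[a].append(b)
    return chain

def make_word(chain, length=6, rng=random):
    """Walk the chain from a random starting letter, up to `length` letters."""
    current = rng.choice(sorted(chain))
    word = current
    # Stop early if the current letter was never followed by anything.
    while len(word) < length and chain.get(current):
        current = rng.choice(chain[current])
        word += current
    return word

# src can be any string. To use a file instead (placeholder filename,
# and the encoding is an assumption about how the file was saved):
# with open("origine.txt", encoding="utf-8") as f:
#     src = f.read()
src = "DE L'ORIGINE DES ESPECES"
chain = build_chain(src)
print([make_word(chain) for _ in range(5)])
```

Every pair of neighboring letters in a generated word appeared somewhere in the source, and common pairs are picked more often because they appear more often in each letter's list.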

I was hoping that by isolating letter pairs, I could create recognizable words out of the French, but I got lost in the code and created words I understood even less.


