I’m at that point in my language acquisition where, if I squint really hard at the person speaking, I can parse out enough vocab words to string together enough comprehension to respond. I know full immersion will push me over the fence, but until then, I’m enjoying the challenge and the chance to feel extremely uncomfortable. It’s humbling.
Most of the code I’ve written this semester has been extremely messy: so messy and nonsensical that I can barely understand it. Hey, it’s just like my relationship to French! Would it be interesting to use predictive models to create French nonsense and then have a human try to make sense of it? When I saw the opportunity to add a human layer to my process, I experienced a burst of inspiration. I then thought of my good friend Hugo, a French teacher, artist, and linguistics master. What if he translated the French nonsense too? There would be this extreme contrast between his native French and fluent English, and my intermediate French and native English.
To explore this idea, I created three executions using different models: the first a letter-level Markov chain, the second a word-level Markov chain, and the third an RNN.
I wanted to use a basic French text, and I chose L’Étranger because it’s one of the first novels recommended to new French readers: why not read about existentialism with simple French vocab and sentence structure? Win-win! I also chose it because the title translates to “the stranger” or “the foreigner,” which is what the code feels like after it’s been trained. And finally, Camus has the kind of style where the writer disappears. His sentence structure is extremely simple and neutral, with no decoration. It’s perfect training material because the predictive models provide the “decoration.”
The logic was to use predictive models to create nonsense French, or Frenchlish with an English-trained RNN. The models would find common letter and word pairings and remodel an entirely new version of L’Étranger. These excerpts would then be individually translated, with no dictionary help, by a native English speaker and a native French speaker. We agreed not to look at each other’s texts until the end, so as not to influence one another. We would then be making something out of nonsense by projecting our individual language shortcomings onto French that didn’t make sense anyway.
Whew, that was dense. Ok, onto the output.
The output of the letter Markov chain was pretty absurd. The irony is, it looked pretty normal to me because I simply couldn’t tell whether a given string was a French word I didn’t know yet.
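For the curious, the letter-level version is only a few lines of Python. This is a sketch of the idea rather than my actual project code, and the `sample` string here stands in for the full text of the novel:

```python
import random
from collections import defaultdict

def build_letter_chain(text, order=2):
    """Map each n-letter context to every letter that follows it in the text."""
    chain = defaultdict(list)
    for i in range(len(text) - order):
        chain[text[i:i + order]].append(text[i + order])
    return chain

def generate(chain, order, length=200, seed=None):
    """Walk the chain, picking a random successor for the current context."""
    rng = random.Random(seed)
    context = rng.choice(list(chain.keys()))
    out = context
    for _ in range(length - order):
        followers = chain.get(context)
        if not followers:  # dead end: restart from a random context
            context = rng.choice(list(chain.keys()))
            followers = chain[context]
        out += rng.choice(followers)
        context = out[-order:]
    return out

# Stand-in for the novel's opening (assumption: the real model trained on the whole book).
sample = ("aujourd'hui maman est morte ou peut-etre hier je ne sais pas "
          "j'ai recu un telegramme de l'asile")
chain = build_letter_chain(sample, order=2)
print(generate(chain, order=2, length=80, seed=1))
```

Because it only ever looks at the last two letters, it produces strings that are locally plausible but globally gibberish, which is exactly why I couldn’t tell the fake words from real vocab I hadn’t learned.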
The word Markov chain, however, was wayyyy easier to understand because at least they were words! This output might be my favorite so far. The contrast is pretty clear: you can see that I get some of the same sentences Hugo does, just slightly off. He says “I don’t know anymore”; I say “I don’t know more”.
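The word-level chain is the same trick, just over tokens instead of letters, which is why the output is built entirely from real words. Again a sketch, not my project code, with a made-up stand-in string as the training text:

```python
import random
from collections import defaultdict

def build_word_chain(text, order=1):
    """Map each word context to every word that follows it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain

def generate_words(chain, order=1, n_words=30, seed=None):
    """Chain real words together by following observed word pairings."""
    rng = random.Random(seed)
    context = rng.choice(list(chain.keys()))
    out = list(context)
    for _ in range(n_words - order):
        followers = chain.get(context)
        if not followers:  # dead end: restart from a random context
            context = rng.choice(list(chain.keys()))
            followers = chain[context]
        out.append(rng.choice(followers))
        context = tuple(out[-order:])
    return " ".join(out)

# Hypothetical stand-in text; the real chain trained on the novel.
sample = ("je ne sais pas si je ne sais plus ce que je veux dire "
          "je veux savoir ce que maman veut dire")
chain = build_word_chain(sample)
print(generate_words(chain, n_words=15, seed=4))
```

Every word is a real word from the source, but the transitions between them are only locally valid, which produces exactly those almost-right sentences like “I don’t know more”.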
The final execution was the RNN. Again, this one produced mostly whole words, so it was much easier for me to translate. What was interesting was the repetition of phrases like “de la permiere fois de la premiere”. It was a simple but beautiful moment that translated to something like “of the first time of the first”. In a previous iteration, there was an accidental mistranslation during office hours where “mon avocat” became “my avocado”. The French word is spelled the same for both, but in this context it means “my lawyer”.
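That looping tic is a classic habit of small character-level RNNs: once a frequent phrase dominates the training data, the hidden state happily cycles through it. To show the mechanism, here’s a toy numpy character-level RNN in the spirit of Karpathy’s min-char-rnn, trained on a deliberately repetitive stand-in string (an assumption on my part; the actual project used a prebuilt RNN rather than hand-rolled numpy):

```python
import numpy as np

# Toy char-level RNN; the repetitive stand-in data mimics a phrase the model fixates on.
data = "de la premiere fois de la premiere fois " * 20
chars = sorted(set(data))
ix = {c: i for i, c in enumerate(chars)}
V, H, T, lr = len(chars), 32, 16, 0.05  # vocab, hidden size, sequence length, learning rate
rng = np.random.default_rng(0)
Wxh = rng.normal(0, 0.01, (H, V)); Whh = rng.normal(0, 0.01, (H, H))
Why = rng.normal(0, 0.01, (V, H)); bh = np.zeros(H); by = np.zeros(V)

def step(inputs, targets, hprev):
    """One forward/backward pass over a short character sequence, with an SGD update."""
    xs, hs, ps = {}, {-1: hprev}, {}
    loss = 0.0
    for t, (ci, ti) in enumerate(zip(inputs, targets)):
        x = np.zeros(V); x[ci] = 1.0           # one-hot input character
        hs[t] = np.tanh(Wxh @ x + Whh @ hs[t - 1] + bh)
        y = Why @ hs[t] + by
        p = np.exp(y - y.max()); p /= p.sum()  # softmax over next-character probabilities
        xs[t], ps[t] = x, p
        loss -= np.log(p[ti])
    # backpropagation through time
    dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
    dbh, dby, dhnext = np.zeros(H), np.zeros(V), np.zeros(H)
    for t in reversed(range(len(inputs))):
        dy = ps[t].copy(); dy[targets[t]] -= 1.0
        dWhy += np.outer(dy, hs[t]); dby += dy
        dh = Why.T @ dy + dhnext
        draw = (1.0 - hs[t] ** 2) * dh         # tanh derivative
        dWxh += np.outer(draw, xs[t]); dWhh += np.outer(draw, hs[t - 1]); dbh += draw
        dhnext = Whh.T @ draw
    for param, grad in ((Wxh, dWxh), (Whh, dWhh), (Why, dWhy), (bh, dbh), (by, dby)):
        np.clip(grad, -5, 5, out=grad)         # clip to keep gradients from exploding
        param -= lr * grad                     # in-place SGD update
    return loss, hs[len(inputs) - 1]

losses, hprev, p = [], np.zeros(H), 0
for _ in range(300):
    if p + T + 1 >= len(data):
        p, hprev = 0, np.zeros(H)  # wrap around and reset the hidden state
    inputs = [ix[c] for c in data[p:p + T]]
    targets = [ix[c] for c in data[p + 1:p + T + 1]]
    loss, hprev = step(inputs, targets, hprev)
    losses.append(loss)
    p += T
print(f"loss: {losses[0]:.1f} -> {losses[-1]:.1f}")
```

On data this repetitive, the loss drops quickly, and sampling from the trained model tends to reproduce the dominant phrase over and over, which is the same “of the first time of the first” effect.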
Hugo and I plan on performing the three models by first reading out the French, to show the differences in our proficiency in the language, and then reading our translations. We’ll then discuss our methodology. You can check out the presentation here.
You can find the code here.
And the RNN code is separate due to my skillz.