I aspire to being bonkers. All the best innovations and paradigm shifts that change our experience of life come from people who are a bit bonkers, the visionaries of our society. You know what a visionary is? Someone who sees what others don’t. I’m thankful that being bonkers in this part of the world does not result in church-state mandated house arrest, as it did in Galileo’s time for daring to publicly support the Copernican theory that the earth revolves around the sun and that we humans are not the center of the universe. We’ve got great studies out there that say the way we teach second languages is pretty ineffective, and yet the way we learn languages remains largely unchanged: a teacher, a classroom, and lists of words and grammar rules to memorize and regurgitate. Or worse, the pendulum swings to the idea that immersion is best, and that you’ll learn the language just by being exposed to it, with no leverage of what you learned from learning your first language at all. Neither approach is suited to how the bilingual brain actually works.

I’m a big fan of the scientific method. So I’m answering the question of ‘Well, if not that way, then what?’ Working as an independent researcher, I often have conversations running in my head that provide the direction of my studies. One of the questions I am most interested in is ‘What is the most useful vocabulary to learn to get up and running in another language?’ I mean really up and running, as in able to read Harry Potter and enjoy it. Why Harry Potter? Because they are great books, and so popular they’ve been translated into 68 languages. Why not the classic Don Quixote, which is mandatory reading for any university-level Spanish program? Well, this is where my artistic license comes in. I assert, by vote of copies sold worldwide, that people are much more interested in reading Harry Potter, and you’ll learn more if your learning material is interesting and fun. I assert that language learning can be fun, even a game. It’s been well documented that kids learn a lot of vocabulary, grammar, and spelling from reading. According to studies referenced in Learning Vocabulary in Another Language, you need to know about 95% of the running words in a text to meaningfully add to your vocabulary from reading it.

One approach to figuring out what vocabulary to learn first is to gather examples of language in use (a corpus) and do a frequency analysis of the words that pop up. Unsurprisingly, in English ‘the’, ‘and’, ‘to’, ‘I’ and ‘of’ are the top five. I faithfully started creating learning materials based on the de facto standard corpora available, and quickly got frustrated with words that made no sense until I dug a little. The 100 million word corpus I was working with was based on academic papers in chemistry, biology and linguistics, so a word like ‘membrane’ is purported to be a very useful word. Textbooks were little better: one of the only textbooks available in my local library for learning Serbian talked, in its first dialogue, about being an economics student traveling to Serbia. My reaction to these is something along the lines of ‘this is bonkers, what would a normal person want for vocabulary?’
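For the curious, here is a minimal sketch of what a frequency analysis looks like in Python. It is not the tooling behind my corpus, just an illustration; the `tokenize` and `top_words` names and the sample sentence are made up for this example, and the tokenizer is a deliberately crude assumption (lowercase everything, keep runs of letters).

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and pull out runs of letters (apostrophes allowed inside a word)."""
    return re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())

def top_words(text, n=5):
    """Return the n most frequent words in the text, with their counts."""
    return Counter(tokenize(text)).most_common(n)

# Made-up sample text; a real corpus would be millions of words read from files.
sample = "The dog and the cat ran to the park, and the dog barked."
print(top_words(sample, 3))
```

Run over a whole library of books instead of one sentence, the same count is what produces a frequency list.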

So what words are best to learn before you attempt Harry Potter? And how many words is that, anyway? My kids were in bilingual Spanish/English school for grades one to four. They read Because of Winn-Dixie simultaneously in both languages, and loved it in English, hated it in Spanish. In fact, by the end of grade four, they dug in their heels and said ‘no more bilingual school for me’. Stake to my heart! I, who have had so many doors open to me because I was willing to work with people whose first language was not English, and to learn their language to communicate more effectively, have kids who refuse to learn another language. Why did they hate reading a great book in Spanish? Because they didn’t understand more than 10% of the running words in the book. If they’d been allowed a dictionary, they would have been looking up 9 words in 10! I love the idea of having second language learners read great books, for the learning that has to happen outside of the classroom. No teacher can give you mastery of a language in just classroom time! But to be able to get useful learning from reading, you need a foundation of vocabulary before you attempt it. Otherwise you have a recipe for disappointment and frustration, and in most learners’ cases, giving up.
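To make ‘percent of running words’ concrete, here is a small sketch of how you might measure it for one reader and one page. The `known_share` function, the sample page and the tiny known-word set are all invented for illustration, and the tokenizer is the same crude assumption as before.

```python
import re

def tokenize(text):
    """Lowercase the text and pull out runs of letters (apostrophes allowed inside a word)."""
    return re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())

def known_share(text, known_words):
    """Fraction of running words (every occurrence counts) that the reader already knows."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(1 for token in tokens if token in known_words) / len(tokens)

# Invented example: a ten-word page and a reader who knows three words.
page = "The wizard waved his wand and the owl flew away"
known = {"the", "and", "his"}
share = known_share(page, known)
print(f"{share:.0%} known, {1 - share:.0%} to look up")
```

Note that ‘the’ counts twice here because it appears twice. That is what ‘running words’ means: every word on the page, repeats included.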

I started by creating a corpus of 10 million words in English, based on popular books at around the grade 5 level. That’s the target vocabulary for most English newspapers, by the way. You can see the books I’ve included in the English library on this site. There were weird words popping up because of the subject matter, so I tried 20 million, then 30 million, and now I’m up to 64 million words and 876 books. Don’t ask how long that took; you don’t want to know and I don’t want to tell you. It’s where the bonkers comes in.

How does that apply to learning another language, you ask? Well, my reasoning is this: English books are pretty plentiful at my local library, and they are affordable and widely available on Amazon, at your local book store, and at your local used book store. Similar vocabulary will likely apply in another language, right? We’ll see. I’ve been collecting those same books that are in my English corpus in Spanish for years, but in a focused way for the last two years. Spanish books are pretty available because of the market for them in the United States, if you know where to look. On every trip to Europe and Mexico I’ve headed to the local book stores, and the children’s book selection for grade 5 Spanish is universally dismal; the children’s section ends at something like ‘The Cat in the Hat’. I think I can safely leave it to your imagination how useful reading The Cat in the Hat is in preparing you for reading Harry Potter. Usually a book is printed only once in Spanish, so the classics can be hard to get, but not impossible. The increasing popularity of e-books is changing that landscape, however, and I’m taking full advantage of that for my Spanish corpus. I would love to say the same is true for French; in theory, I live in a bilingual country. Head to or any bookstore in western Canada and you’ll see prices that make you choke, similar to book prices in Europe! Thankfully the local library has a decent selection that I’m able to leverage. For now, collecting French children’s books remains on my someday, one day list.

So how many words does it take to get to 95%? A lot, but I’m refining that to something manageable for beginners. To get to 80% of running words in English, you need around 1600 words. If you look a little closer, and you count dog and dogs as one word, and eat, eats, eating, ate as one word (headwords), you have a palatable 567 words. For 95% of running words you need 15101 words, but really only 4672 headwords.
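Here is a rough sketch of how numbers like these can be worked out: sort words from most to least frequent, then count how many you need before their occurrences add up to the target share of running words. This is an illustration, not my actual pipeline. The function names are made up, and the little `HEADWORDS` table is a hypothetical stand-in for a real word-family list, which would be far larger.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and pull out runs of letters (apostrophes allowed inside a word)."""
    return re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())

def words_needed(counts, target):
    """Smallest number of top-frequency entries that cover `target` (0 to 1) of all running words."""
    total = sum(counts.values())
    covered = 0
    for rank, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered >= target * total:
            return rank
    return len(counts)

def group_by_headword(counts, headword_of):
    """Merge the counts of inflected forms into their headword; unlisted words stand for themselves."""
    grouped = Counter()
    for word, count in counts.items():
        grouped[headword_of.get(word, word)] += count
    return grouped

# Hypothetical toy table; a real one would come from a proper word-family list.
HEADWORDS = {"dogs": "dog", "eats": "eat", "eating": "eat", "ate": "eat"}

text = "the dog eats and the dogs ate and the dog is eating"
counts = Counter(tokenize(text))
print(words_needed(counts, 0.8))                                # counted as separate word forms
print(words_needed(group_by_headword(counts, HEADWORDS), 0.8))  # counted as headwords
```

Even in this toy sentence, grouping by headword shrinks the list you need to learn. The same effect at corpus scale is why the headword counts are so much smaller than the raw word counts.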

How many words does it take to get to 95% in those same Spanish books? I’m working on it. I’m at 13 million words in my Spanish corpus, with approximately 200 books that I’ve found but haven’t added yet. I’ll probably get up to 30 million in my Spanish corpus. I’m cheating shamelessly by leveraging my English corpus and mapping my most important English vocabulary to its counterpart in Spanish, which admittedly is not always apples to apples. I can tell you where I won’t cheat though: padding the corpus with academic papers, or with the more available and wildly less useful romance novels that abound in both English and Spanish, and that are of zero interest to my twelve-year-old boys and, I assert, to the male half of the population of planet earth. You should see what the most frequent words in those are!
