Update: November 16, 2012 — Yokaben is now Macaronics

Several years ago, I passed level two (N2) of the 日本語能力試験 or Japanese Language Proficiency Test. N2 means I know (or knew, at the time I took the test, anyway) at least 1,023 kanji (written characters) and 5,035 vocabulary words. So in theory, I should be able to read 90% of the text in a typical newspaper article. Still, when I visited Japan recently, I had trouble getting through even the simplest article.

I was traveling through Hiroshima on a Sunday afternoon, on my way to Miyajima, but most people on my train were dressed in Hiroshima Carp colors, and got off at the station directly in front of the stadium.

They proved to be a boisterous bunch that day, and after I got back to the hotel, I wondered how their team had done.

It turned out they won, but there was very little in the English-language sources about the game, in stark contrast to the local Japanese press, whose coverage included some fanciful wordplay about the player who hit a home run.[1]

As Lost in Translation comically exaggerated, having access to the original source makes a difference.

Machine translation was somewhat helpful, but those results left a lot to be desired, especially when dealing with nuance and context (it was interesting, for example, to see that Bing correctly translated バカ as “moron”, but Google rendered it as “docile child” instead).

What I really needed was a human editor, someone at least partially bilingual, who could fill in the gaps and clean up the obvious errors.

Crowd-sourcing, or more specifically, human-based computation, is a possible solution, though it needs hundreds, thousands, or more editors to make it work.

If it does reach that critical mass, it would open up an even larger audience: people would be able to read original texts in full, regardless of whether they are literate in the source language or not, and even if they have no desire to learn that language in the first place.

Macaronics[3] (originally Yokaben[2]) is an experiment to see whether or not it can be done.

[1] One way to pronounce the numbers “2” and “9” together is “niku” which is roughly how the Japanese say the first name of Nick Stavinoha. The author speculated that since the 29th is “Nick’s Day”, fans can expect a similar result on May 29 and through the rest of the season on the 29th of every month.

[2] I didn’t know what to call it, but when I was thinking up names, I heard someone talking about PubSub, which is a contraction of the words “publish” and “subscribe”.

Since what I was building was a way to “Read Write Learn”, I tried similar contractions. While it didn’t work in English, I got some unique syllables from the corresponding Japanese words:

Read : 読む (yomu) → yo
Write : 書く (kaku) → ka
Learn : 勉強 (benkyou) → ben

(Yes, I know that 勉強 really means study, and 学ぶ is a better translation of learn, but “yokamana” or “yokabu” didn’t quite have the same ring to it.)

[3] While researching names for another project, I came across the adjective macaronic, whose dictionary meaning seemed perfect for this, especially since I’d like to see it go beyond just two languages.

Also, yokaben as I’d constructed it (読書勉) is too close to dokusho (読書) and thus potentially confusing for native Japanese speakers.

