I'm in Japan for RubyKaigi, and I wrote a Ruby script for quickly translating Japanese text when the Internet connection is unreliable. The script loads Rikaikun's dictionary file, dict.dat, to provide a REPL-like translation experience: paste text, hit enter, see word definitions, repeat. If you paste entire phrases or sentences, the script greedily takes the longest dictionary match at each position, but that algorithm sucks for translating this way, especially when the sentence includes conjugated verbs, since the script only looks up exact dictionary entries.
Ruby 1.8 doesn't know what to do with the Japanese characters on the console, so I recommend using Ruby 1.9.
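As a standalone aside (not part of the script itself), here is a small sketch of one reason the Ruby version matters beyond console output: Ruby 1.9 and later index strings by character, while Ruby 1.8 indexes them by byte, so the substring slicing the script relies on would cut characters apart. The variable `word` is just a sample string, and the snippet assumes the source file is saved as UTF-8.

```ruby
# encoding: utf-8
word = "日本語"

puts word.length    # 3 on Ruby 1.9+ (Ruby 1.8 counts bytes and reports 9)
puts word.bytesize  # 9, because each of these characters takes 3 bytes in UTF-8
puts word[0, 2]     # the first two characters on Ruby 1.9+
```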
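# The regex below expects lines shaped like "WORD [READING] /definitions/";
# the bracketed reading is optional.
entries = File.readlines("dict.dat")
# Word => list of [reading, definition] pairs.
$kanji_dict = Hash.new { |hash, key| hash[key] = [] }
# Reading => list of [word, definition] pairs.
$hiragana_dict = Hash.new { |hash, key| hash[key] = [] }
entries.each do |entry|
  if /([^\[]+)( \[(.+)\])? \/(.+)\// =~ entry
    $kanji_dict[$1] << [$3, $4]
    $hiragana_dict[$3] << [$1, $4] if $3
  end
end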
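# Returns a list of [word, reading, definition] triples, or nil if the word is unknown.
def lookup(word)
  # key? avoids the hash's default block, which would store an empty array for every miss.
  if $kanji_dict.key?(word)
    $kanji_dict[word].map { |e| [word] + e }
  elsif $hiragana_dict.key?(word)
    $hiragana_dict[word].map { |(kanji, definition)| [kanji, word, definition] }
  end
end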
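# Finds the longest dictionary match that begins at index `start` of `query`.
# Returns [range_of_the_match, entries]. With no match, returns the range of the
# single character at `start` and an empty list, so the caller can skip past it.
def prefix_lookup(query, start=0)
  # lookup_length is the offset of the last character, counted from `start`.
  (query.length - start - 1).downto(0) do |lookup_length|
    sub_query_range = (start..start+lookup_length)
    sub_query = query[sub_query_range]
    entries = lookup(sub_query)
    if entries
      return [sub_query_range, entries]
    end
  end
  [(start..start), []]
end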
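loop do
  $stdout.write ">> "
  $stdout.flush
  query = gets
  break unless query # stop at end of input
  query = query.chomp
  5.times { puts }
  puts query
  while query && !query.empty?
    range, entries = prefix_lookup(query)
    if entries.length > 0
      puts "------------"
      entries.each do |(kanji, hiragana, definition)|
        puts "- #{kanji}#{hiragana ? " [#{hiragana}]" : ""} #{definition}"
      end
    end
    # Drop the matched (or skipped) prefix and continue with the rest.
    query = query[(range.end + 1)..-1]
  end
end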
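To see the longest-match behavior without a copy of dict.dat, here is a minimal, self-contained sketch of the same idea. The three dictionary lines are made up for illustration, written in the layout the script's regex expects, and `segment` is a hypothetical helper rather than part of the script above.

```ruby
# encoding: utf-8
# Hypothetical three-entry dictionary in the "WORD [READING] /definition/" layout.
SAMPLE_LINES = [
  "日本語 [にほんご] /(n) Japanese (language)/",
  "日本 [にほん] /(n) Japan/",
  "食べる [たべる] /(v1) to eat/"
]

# Word => list of [reading, definition] pairs, parsed with the script's regex.
DICT = Hash.new { |hash, key| hash[key] = [] }
SAMPLE_LINES.each do |line|
  DICT[$1] << [$3, $4] if /([^\[]+)( \[(.+)\])? \/(.+)\// =~ line
end

# Walk the text left to right. At each position take the longest substring
# that is a dictionary key; if nothing matches, skip one character.
def segment(text, dict)
  words = []
  position = 0
  while position < text.length
    length = (text.length - position).downto(1).find do |candidate|
      dict.key?(text[position, candidate])
    end
    words << text[position, length] if length
    position += length || 1
  end
  words
end

puts segment("日本語を食べる", DICT).join(" | ")
puts segment("日本語を食べました", DICT).join(" | ")
```

The first call finds 日本語 (preferring it over the shorter 日本) and 食べる. The second finds only 日本語, because the conjugated form 食べました never appears as a dictionary key, which is the weakness described at the top of this post.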
If you're looking for a practical translation solution, this script isn't too useful. There are better translation techniques:
- In Firefox
  - Get Rikaichan. Paste this into an HTML file and turn on Rikaichan: <textarea style="width: 100%; height:100%"></textarea>. Rikaichan works on text in text fields, so you can just hover over the text you paste into the box.
- In Chrome
  - Chrome's Rikaikun doesn't work on text fields, but a text area with a bit of JavaScript that inserts typed text into the page's HTML would work great. translate.google.com is an okay page that does this if you set it to translate Japanese to Japanese and have an Internet connection. Plus, you can conveniently change it to translate Japanese to English for hilarity whenever you feel like it.
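
For the Chrome idea, here is a rough sketch of a Ruby script that writes such a page to disk. The file name `rikai_scratchpad.html` and the element ids `input` and `mirror` are made up for this example, and I'm assuming Rikaikun will pick up text once it sits in an ordinary div.

```ruby
# Hypothetical output file name; any path works.
OUTPUT_PATH = "rikai_scratchpad.html"

# A page with a text area whose contents are mirrored into a plain div.
PAGE = <<-HTML
<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>Rikai scratchpad</title></head>
<body>
  <textarea id="input" style="width: 100%; height: 8em"></textarea>
  <div id="mirror" style="white-space: pre-wrap"></div>
  <script>
    var input = document.getElementById("input");
    var mirror = document.getElementById("mirror");
    // Copy whatever is typed or pasted into ordinary page text.
    input.addEventListener("input", function () {
      mirror.textContent = input.value;
    });
  </script>
</body>
</html>
HTML

File.open(OUTPUT_PATH, "w") { |file| file.write(PAGE) }
```

Run the script once, open the generated file in Chrome, paste Japanese text into the box, and hover over the mirrored copy below it.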
But this Ruby script is pretty hackable. My next attempt will probably involve running the Rikaichan translator in a command-line JavaScript interpreter, because Rikaichan is better at finding word boundaries.