Command-line Japanese Dictionary

I'm in Japan for RubyKaigi, and I wrote a Ruby script that makes it easy to quickly look up Japanese text when the Internet connection is unreliable. The script loads Rikaikun's dictionary file, dict.dat, to provide a REPL-like translation experience: paste text, hit enter, see word definitions, repeat. If you paste entire phrases or sentences, the script will find the longest match at each position, but the algorithm is bad at translating this way, especially when the sentence includes conjugated verbs, since the lookup is an exact match and a conjugated form usually won't be a dictionary entry.
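
Here's a rough sketch of that longest-match idea and of why conjugated verbs trip it up. It is a simplified stand-in rather than the script itself: TOY_DICT and greedy_segment are hypothetical names, and the dictionary entries are made up for illustration.

```ruby
# encoding: utf-8
# (The magic comment is needed on Ruby 1.9; Ruby 2.0+ assumes UTF-8.)

# Toy dictionary with made-up entries, standing in for dict.dat.
TOY_DICT = {
  "猫" => "cat",
  "が" => "(particle) subject marker",
  "食べる" => "to eat",
  "日本" => "Japan",
  "日本語" => "Japanese (language)"
}

# Returns [word, definition] pairs. At each position it takes the longest
# substring that is a dictionary key, and skips one character on a miss.
def greedy_segment(text, dict)
  matches = []
  position = 0
  while position < text.length
    length = (text.length - position).downto(1).find do |candidate|
      dict.key?(text[position, candidate])
    end
    if length
      word = text[position, length]
      matches << [word, dict[word]]
      position += length
    else
      position += 1
    end
  end
  matches
end

# The longer entry wins over its prefix "日本".
greedy_segment("日本語", TOY_DICT).map(&:first)         # => ["日本語"]

# "食べました" is a conjugated form of "食べる", so nothing matches it.
greedy_segment("猫が食べました", TOY_DICT).map(&:first) # => ["猫", "が"]
```

The second call shows the weakness: the noun and the particle are found, but the verb is silently dropped because only its dictionary form is a key.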

Ruby 1.8 doesn't know what to do with the Japanese characters on the console, so I recommend using Ruby 1.9.
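
For a sense of what changes in 1.9: strings carry an encoding, so length and slicing count characters rather than bytes. The sketch below uses a made-up one-line temporary file in place of dict.dat and assumes the dictionary is UTF-8. Passing the encoding explicitly is just one option, not something the script requires.

```ruby
# encoding: utf-8
require "tempfile"

word = "日本語"
word.length    # => 3 (characters)
word.bytesize  # => 9 (bytes in UTF-8)
word[0, 2]     # => "日本"

# Stand-in for dict.dat: a temporary file holding one made-up entry.
sample = Tempfile.new("dict_sample")
sample.write("日本語 [にほんご] /(n) Japanese (language)/\n")
sample.close

# Assumption: the dictionary file is UTF-8. Passing :encoding makes the
# strings come back tagged as UTF-8 regardless of the console's locale.
lines = File.readlines(sample.path, :encoding => "UTF-8")
lines.first.encoding  # => #<Encoding:UTF-8>
lines.first[0, 3]     # => "日本語"
```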

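# The regex below expects lines shaped like: word [reading] /definition/
# and treats the [reading] part as optional.
entries = File.readlines("dict.dat")

# Both hashes default to an empty list so entries can be appended directly.
$kanji_dict = Hash.new { |hash, key| hash[key] = [] }
$hiragana_dict = Hash.new { |hash, key| hash[key] = [] }

entries.each do |entry|
  # $1 = word, $3 = reading (nil if absent), $4 = definition text
  if /([^\[]+)( \[(.+)\])? \/(.+)\// =~ entry
    $kanji_dict[$1] << [$3, $4]
    $hiragana_dict[$3] << [$1, $4] if $3
  end
end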

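# Returns [kanji, reading, definition] triples for an exact match, trying the
# word index first and the reading index second. Returns nil on a miss.
# Note: a miss stores an empty list under the key (see the default blocks).
def lookup(word)
  if $kanji_dict[word].length > 0
    $kanji_dict[word].map { |e| [word] + e }
  elsif $hiragana_dict[word].length > 0
    $hiragana_dict[word].map { |(kanji, definition)| [kanji, word, definition] }
  end
end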

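def prefix_lookup(query, start=0)
  # Try the longest substring beginning at start first, then shrink it one
  # character at a time until something is found in the dictionary.
  (0..query.length-start-1).to_a.reverse.each do |lookup_length|
    sub_query_range = (start..start+lookup_length)
    sub_query = query[sub_query_range]
    entries = lookup(sub_query)
    if entries
      return [sub_query_range, entries]
    end
  end
  # Nothing matched: report a one-character range and no entries.
  [(start..start), []]
end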

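loop do
  $stdout.write ">> "
  $stdout.flush
  query = gets
  break unless query # gets returns nil at end of input (e.g. Ctrl-D)
  query = query.chomp
  5.times { puts }
  puts query
  # Look up the longest match at the front of the query, print it, then
  # drop the consumed characters and repeat on the remainder.
  until query.empty?
    range, entries = prefix_lookup(query)
    if entries.length > 0
      puts "------------"
      entries.each do |(kanji, hiragana, definition)|
        puts "- #{kanji}#{hiragana ? " [#{hiragana}]" : ""} #{definition}"
      end
    end
    # Without this step the loop never terminates. When nothing matched,
    # range covers one character, so exactly one character is skipped.
    query = query[(range.end + 1)..-1]
  end
end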
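
To see what the regular expression at the top of the script captures, here is a small sketch run against two sample lines. The sample lines and the parse_entry helper are made up for illustration and are not taken from dict.dat; only the pattern itself comes from the script.

```ruby
# encoding: utf-8
# (The magic comment is needed on Ruby 1.9; Ruby 2.0+ assumes UTF-8.)

# Same pattern as in the script: word, optional [reading], /definition/.
ENTRY_PATTERN = /([^\[]+)( \[(.+)\])? \/(.+)\//

# Hypothetical helper: returns the captured pieces as a hash, or nil when
# the line does not match.
def parse_entry(line)
  match = ENTRY_PATTERN.match(line)
  return nil unless match
  { :word => match[1], :reading => match[3], :definition => match[4] }
end

with_reading = parse_entry("猫 [ねこ] /(n) cat/\n")
with_reading[:word]        # => "猫"
with_reading[:reading]     # => "ねこ"
with_reading[:definition]  # => "(n) cat"

without_reading = parse_entry("ねこ /(n) cat/\n")
without_reading[:word]     # => "ねこ"
without_reading[:reading]  # => nil

parse_entry("not a dictionary line\n")  # => nil
```

The nil reading in the second case is why the script only adds an entry to $hiragana_dict when $3 is set.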

If you're looking for a practical translation solution, this script isn't too useful. There are better translation techniques:

In Firefox
Get Rikaichan. Paste this into an HTML file and turn on Rikaichan: <textarea style="width: 100%; height:100%"></textarea>. Rikaichan works on text in text fields, so you can just hover over the text you paste into the box.
In Chrome
Chrome's Rikaikun doesn't work on text fields, but a text area with a bit of JavaScript that inserts typed text into the page's HTML would work great. translate.google.com is an okay page that does this if you have an Internet connection and set it to translate Japanese to Japanese. You can also conveniently switch it to translate Japanese to English for hilarity whenever you feel like it.

But this Ruby script is pretty hackable. My next attempt will probably involve running the Rikaichan translator in a command-line JavaScript interpreter because it's better at finding word boundaries.


