Live Site Stats

Looking at my logs after the fact is fun, but to get a real-time sense of what's going on, I wrote this script. I pipe my Apache log file into it (with a command like ssh alltom tail -fn0 /path/to/access.log | ruby apachegrowl.rb), and it generates Growl messages as the requests come in.

#!/usr/bin/ruby

# apachegrowl.rb
# made by bored tom
# at http://alltom.com/pages/live-site-stats

# These spider IP lists were STOLEN
# from http://www.iplists.com/

google = %w{
209.185.108
...etc...
}

yahoo = %w{
141.185.209
...etc...
}

lycos = %w{
166.48.225.254
...etc...
}

msn = %w{
131.107.0
...etc...
}

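# True if the user's IP starts with the bot's (possibly partial) address,
# compared octet by octet
def match_ip(bot, user)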
  bot = bot.split(".")
  user = user.split(".").first(bot.length)
  bot == user
end

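# Reverse-resolve an IP with the host command; fall back to the IP itself
# when host reports "not found"
def get_host(ip)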
  res = `host #{ip}`
  return ip if res.match "not found"
  res.split.last
end

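# Pull a subscriber count out of a feed reader's user agent string;
# returns the count as a string, or nil if the agent doesn't report one
def subscribers(agent)
  agent.match(/([0-9]+) (subscriber|reader)/i)
  $1
end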

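# gets returns nil at end of input, so the loop exits cleanly
# (readline would raise EOFError instead)
while line = STDIN.gets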
  # The size field is "-" when no body was sent (e.g. a 304), so accept that too
  unless line.match(/([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+) [^ ]+ - \[[^\]]+\] "([A-Z]+) ([^ ]+) [^"]+" [0-9]+ (?:[0-9]+|-) "([^"]+)" "([^"]+)"/)
    puts "NOT A MATCH: #{line}"
    next
  end

  puts "MATCH: #{line}"
  ip, method, url, referrer, agent = $1, $2, $3, $4, $5
  subs = subscribers(agent)

  next if /^\/style/.match(url)
  next if /^\/images/.match(url)
  next if /^\/favicon\.ico/.match(url)
  next if /^\/robots\.txt/.match(url)

  if subs.nil?
    next if google.any? { |botip| match_ip(botip, ip) }
    next if yahoo.any?  { |botip| match_ip(botip, ip) }
    next if lycos.any?  { |botip| match_ip(botip, ip) }
    next if msn.any?    { |botip| match_ip(botip, ip) }
  end

  msg = "#{method} #{url} by #{get_host(ip)}"
  msg += " from #{referrer}" unless referrer == "-"
  msg += " (#{subs} subscription(s))" unless subs.nil?
  IO.popen("/usr/local/bin/growlnotify","w") { |io| io.write(msg) }
end
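
If you want to poke at the matching logic without a live log or Growl installed, the two pure pieces can be run on their own. What follows is only a minimal sketch, not part of apachegrowl.rb: the LOG_LINE constant, the parse_line helper, and the sample log line (with an address from a documentation range and a made-up feed reader) are all hypothetical names and data for illustration. The pattern is based on the one in the script, rewritten with named captures so the fields don't have to be read out of $1 through $5.

```ruby
# Sketch only: LOG_LINE, parse_line and the sample line are hypothetical,
# not part of apachegrowl.rb. match_ip is copied from the script.

# Based on the script's pattern, with named captures instead of $1..$5.
LOG_LINE = /(?<ip>[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+) [^ ]+ - \[[^\]]+\] "(?<method>[A-Z]+) (?<url>[^ ]+) [^"]+" [0-9]+ [0-9]+ "(?<referrer>[^"]+)" "(?<agent>[^"]+)"/

def match_ip(bot, user)
  bot = bot.split(".")
  user = user.split(".").first(bot.length)
  bot == user
end

# Returns a hash of the captured fields, or nil if the line doesn't match.
def parse_line(line)
  m = LOG_LINE.match(line)
  return nil if m.nil?
  { ip: m[:ip], method: m[:method], url: m[:url],
    referrer: m[:referrer], agent: m[:agent] }
end

# A made-up request in Apache's combined log format.
sample = '192.0.2.7 - - [10/Oct/2008:13:55:36 -0700] ' \
         '"GET /pages/live-site-stats HTTP/1.1" 200 2326 ' \
         '"-" "ExampleReader/1.0 (3 subscribers)"'

hit = parse_line(sample)
puts hit[:url]                       # the requested path
puts match_ip("192.0.2", hit[:ip])   # prefix matches the first three octets
puts match_ip("192.0.3", hit[:ip])   # third octet differs, so no match
```

A partial address like 192.0.2 acts as a prefix, which is why the spider lists can mix three-octet ranges with full four-octet addresses.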

The other day I watched someone move from my home page, to the archive, to my study abroad page, to my independent study project page, and then leave. It was nice to see someone was interested.

glTail.rb gives real-time visualization of traffic and some nice at-a-glance statistics, but it also sucks CPU worse than Firefox, constantly. It must never sleep. Still, if my laptop didn't sound like a rocket launch when the fans kicked in, and I had a second monitor to push it onto, I might use it.

Further Reading

I did not know about these similar attempts before I started down my own path. Some of them are interesting, though none is more than a few lines long, so they aren't as extensive as my Ruby script above.

