Saturday 24 January 2009

Broadcasting blog post notifications to Twitter with Ruby and Rake

During my latest blogging absence I had some time to tinker around with Ruby. For an introductory challenge I chose to implement a real-life feature which currently isn't supported by Blogger.com and screams siren-like for a one-button automation: broadcasting the latest blog entry to my Twitter account. As I didn't want to sign up for a Twitterfeed account and couldn't resort to the Twitter Tools plugin like WordPress users, I had to perform these broadcasting steps manually, until now. To see how this repetitive and time-stealing process was transformed into a semi-automated one by utilizing Ruby, a splash of Hpricot, Ruby's excellent Twitter API wrapper and Rake, read on.

Installing the required RubyGems

Prior to diving into the implementation details of the given scenario I had to install the required RubyGems as shown in the next console snippet. The installation of the twitter gem might take a while due to its dependency on several other gems.
sudo gem install hpricot rake twitter

Scraping the latest blog post details with Hpricot

The initial implementation step was to gather the relevant metadata (URL, title and used tags) of the latest blog post. I first tried to get it by grabbing the blog's RSS feed and extracting the metadata from there, but soon ran into the problem of getting an outdated feed from Feedburner. The next alternative was to scrape the needed metadata directly from the blog landing page. As I had gone this route before with the Zend_Dom_Query component of the Zend Framework, I decided to use something similar from the Ruby toolbox. Some Google hops later I was sold on Hpricot, an HTML parser for Ruby, and as you can see in the first code snippet, showing an extract of the Rakefile to come, this is done in just 13 lines of code.
doc = Hpricot(open(blog_landing_page, scrape_options))
latest_post_url = doc.at('h3.post-title > a')['href']
latest_post_title = doc.at('h3.post-title > a').inner_html
label_doc = Hpricot(doc.search('span.post-labels').first.to_s)
label_doc.search('span.post-labels > a').each do |label_link|
  label = label_link.inner_html.gsub(' ', '').downcase
  if label.include?('/')
    labels = label.split('/')
    labels.each { |sub_label| last_post_labels.push(sub_label) }
  else
    last_post_labels.push(label)
  end
end
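
The label handling in the snippet above can also be expressed as a small standalone method that is easy to try out without touching the network. This is only a minimal sketch and not part of the Rakefile; the method name normalize_labels and the sample labels are made up for illustration. Unlike the snippet above, it silently drops a label that is empty after stripping the spaces.

```ruby
# Hypothetical helper (an assumption, not taken from the Rakefile):
# turns raw label texts into lowercase hashtag candidates and splits
# combined labels such as 'Rake / Twitter' on the slash.
def normalize_labels(raw_labels)
  raw_labels.flat_map do |raw_label|
    raw_label.gsub(' ', '').downcase.split('/')
  end
end

puts normalize_labels(['Ruby', 'Rake / Twitter']).inspect
# prints ["ruby", "rake", "twitter"]
```

Keeping this step free of Hpricot calls means the scraping task only has to hand over the inner HTML of each label link.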

Outstanding tasks

With the metadata available the outstanding tasks to implement were:
  • to get a short URL for the actual blog post by utilizing the public API of a URL shortening service, e.g. is.gd
  • to build the tweet to broadcast by injecting the available metadata into a tweet template
  • to broadcast the notification tweet to the given Twitter account
  • to log the broadcasted blog title to prevent spamming or duplication scenarios
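
The tweet-building step from the list above can be sketched as a plain Ruby method. This is a hedged illustration under a few assumptions: the method name build_notification_tweet, the constant TWEET_LIMIT and the short URL http://is.gd/abc are placeholders chosen for this example, and the 140 character limit is the one Twitter enforced when this post was written.

```ruby
# Assumption: 140 characters, the Twitter limit at the time of writing.
TWEET_LIMIT = 140

# Hypothetical helper: injects title and short URL into the tweet template
# and appends hashtags only as long as the result stays within the limit.
def build_notification_tweet(title, short_url, labels, limit = TWEET_LIMIT)
  tweet = "Published a new blog post '#{title}' available at #{short_url}."
  raise ArgumentError, "Tweet exceeds #{limit} characters" if tweet.length > limit

  labels.each do |label|
    hashtag = " ##{label}"
    # Skip a hashtag that would push the tweet over the limit
    tweet += hashtag if tweet.length + hashtag.length <= limit
  end
  tweet
end

puts build_notification_tweet('Hello Rake', 'http://is.gd/abc', %w[ruby rake])
# prints Published a new blog post 'Hello Rake' available at http://is.gd/abc. #ruby #rake
```

Hashtags are treated as optional extras here: the base message has to fit, while labels that do not fit are simply left out.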
As a guy sold on build tools and eager to learn something new I subverted Rake, Ruby's number one build language, to glue the above mentioned tasks and their implementation together, to manage their sequential dependencies and to have a comfortable invocation interface. The nice thing about Rake is that it allows you to implement each task's unit of work in plain Ruby; there is no need to follow a given structure to implement custom tasks, as is the case for custom Phing tasks. As you will see in the forthcoming complete Rakefile, some of the tasks are getting quite long and complex; therefore some of them are candidates for refactoring, for example by extracting task units of work into helper/worker classes.
require 'rubygems'
require 'hpricot'
require 'open-uri'
require 'twitter'

task :default do
  Rake::Task['blog_utils:broadcast_notification'].invoke
end

namespace :blog_utils do

  # String keys are sent by open-uri as HTTP request headers
  scrape_options = { 'User-Agent' => "Ruby/#{RUBY_VERSION}" }
  blog_landing_page = 'http://raphaelstolt.blogspot.com'
  latest_post_short_url, latest_post_url, latest_post_title = nil
  notification_tweet = nil
  last_post_labels = []
  broadcast_log_file = File.dirname(__FILE__) + '/broadcasted_posts.log'
  twitter_credentials = { :user => 'raphaelstolt', :pwd => 'thatsasecret'}

  desc 'Scrape metadata of latest blog post from landing page'
  task :scrape_actual_post_metadata do
    doc = Hpricot(open(blog_landing_page, scrape_options))
    latest_post_url = doc.at('h3.post-title > a')['href']
    latest_post_title = doc.at('h3.post-title > a').inner_html
    label_doc = Hpricot(doc.search('span.post-labels').first.to_s)
    label_doc.search('span.post-labels > a').each do |label_link|
      label = label_link.inner_html.gsub(' ', '').downcase
      if label.include?('/')
        # Combined labels like 'ruby/rake' become separate hashtags
        labels = label.split('/')
        labels.each { |sub_label| last_post_labels.push(sub_label) }
      else
        last_post_labels.push(label)
      end
    end
  end

  desc 'Shorten the Url of the latest blog post'
  task :shorten_post_url => [:scrape_actual_post_metadata] do
    raise_message = 'No Url for latest blog post available'
    raise raise_message if latest_post_url.nil?
    url_shorten_service_call = "http://is.gd/api.php?longurl=#{latest_post_url}"
    latest_post_short_url = open(url_shorten_service_call, scrape_options).read
  end

  desc 'Check if the generated short Url references the latest blog post Url'
  task :check_shorten_url_references_latest do
    url_referenced_by_short_url = nil
    # open-uri follows the redirect; base_uri holds the final target Url
    open(latest_post_short_url, scrape_options) do |f|
      url_referenced_by_short_url = f.base_uri.to_s
    end
    raise_message = "Generated short Url '#{latest_post_short_url}' does not"
    raise_message << " reference actual blog post url '#{latest_post_url}'"
    raise raise_message unless url_referenced_by_short_url.eql?(latest_post_url)
  end

  desc 'Check if latest blog post has already been broadcasted'
  task :check_logged_broadcasts do
    logged_broadcasts = []
    if File.exist?(broadcast_log_file)
      File.open(broadcast_log_file, 'r') do |f|
        logged_broadcasts = f.readlines.collect { |line| line.chomp }
      end
    end
    raise_message = "Blog post '#{latest_post_title}' has already been "
    raise_message << "broadcasted"
    raise raise_message if logged_broadcasts.include?(latest_post_title)
  end

  desc 'Build notification tweet by injecting scraped metadata into template'
  task :build_notification_tweet => [:shorten_post_url,
                                     :check_shorten_url_references_latest] do
    raise_message = 'Required metadata to build tweet is not available'
    raise raise_message if latest_post_title.nil? || latest_post_short_url.nil?
    raise raise_message if last_post_labels.nil?

    notification_tweet = "Published a new blog post '#{latest_post_title}' "
    notification_tweet << "available at #{latest_post_short_url}."

    raise_message = 'Broadcast for latest blog post exceeds 140 characters'
    raise raise_message if notification_tweet.length > 140

    # Append hashtags only while the tweet stays within 140 characters
    last_post_labels.each do |tag|
      hashtag = " ##{tag}"
      notification_tweet << hashtag unless notification_tweet.length + hashtag.length > 140
    end
  end

  # Task name matches the one invoked by :default and on the command line
  desc 'Broadcast latest blog post notification to twitter'
  task :broadcast_notification => [:build_notification_tweet,
                                   :check_logged_broadcasts] do
    raise_message = "Notification tweet to broadcast is not available"
    raise raise_message if notification_tweet.nil?
    puts "Broadcasting '#{notification_tweet}'"
    http_auth = Twitter::HTTPAuth.new(twitter_credentials[:user], twitter_credentials[:pwd])
    Twitter::Base.new(http_auth).update(notification_tweet)
    #Twitter::Base.new(twitter_credentials[:user], twitter_credentials[:pwd]).post(notification_tweet)
    Rake::Task['blog_utils:log_broadcast_title'].invoke
  end

  desc 'Log broadcasted blog post title'
  task :log_broadcast_title do
    puts "Logging latest post title to #{broadcast_log_file}"
    File.open(broadcast_log_file, 'a') do |f|
      f.puts latest_post_title
    end
  end

end
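
As mentioned before the Rakefile, some task bodies are candidates for extraction into helper classes. The following is a minimal sketch of what that could look like for the two logging related tasks; the class name BroadcastLog and its methods are assumptions made for this illustration and do not exist in the Rakefile above. The example writes to a temporary directory so it can be run without touching a real log file.

```ruby
require 'tmpdir'

# Hypothetical helper class wrapping the broadcast log file.
class BroadcastLog
  def initialize(path)
    @path = path
  end

  # True when the title was logged by an earlier broadcast.
  def include?(title)
    return false unless File.exist?(@path)

    File.readlines(@path).map(&:chomp).include?(title)
  end

  # Appends the title so that later runs can detect the duplicate.
  def record(title)
    File.open(@path, 'a') { |f| f.puts(title) }
  end
end

Dir.mktmpdir do |dir|
  log = BroadcastLog.new(File.join(dir, 'broadcasted_posts.log'))
  puts log.include?('My post') # prints false
  log.record('My post')
  puts log.include?('My post') # prints true
end
```

With such a class the check and log tasks would shrink to one-line calls, and the duplicate detection could be tested without running Rake at all.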

Putting the Rake task(s) to work

The next step was to put the Rakefile into the Automations directory below my $HOME directory; after publishing a new blog post I'm now able to broadcast an automated notification by firing up the console and calling the Rake task as shown next.
sudo rake -f $HOME/Automations/Rakefile.rb blog_utils:broadcast_notification
And as I'm too lazy to type this lengthy command every time, I further added an alias to the $HOME/.profile file, which allows me to call the task via the associated alias, i.e. blogger2twitter, as shown in the .profile excerpt.
alias blogger2twitter='sudo rake -f $HOME/Automations/Rakefile.rb blog_utils:broadcast_notification'
After running the Rake task against this blog post the notification gets added to the given Twitter timeline as shown in the outro image.

Notification tweet screenshot

2 comments:

Anonymous said...

I like the fact that you built it using ruby, the code is very easy to follow. One thing though, the password "thatsasecret" doesn't seem to work :)

Raphael Stolt said...

Hi Federico, was just the perfect language/tool for this job and I really enjoyed using it.