Broadcasting blog post notifications to Twitter with Ruby and Rake
During my latest blogging absence I had some time to tinker around with Ruby. As an introductory challenge I chose to implement a real-life feature which currently isn't supported by Blogger.com and screams siren-like for a one-button automation: broadcasting the latest blog entry to my Twitter account. As I didn't want to sign up for a Twitterfeed account and couldn't resort to the Twitter Tools plugin like WordPress users can, I had to perform these broadcasting steps manually, until now. To see how this repetitive and time-stealing process was transformed into a semi-automated one by utilizing Ruby, a splash of Hpricot, Ruby's excellent Twitter API wrapper and Rake, read on, my dear.
Installing the required RubyGems
Prior to diving into the implementation details of the given scenario I had to install the required RubyGems as shown in the next console snippet. The installation of the twitter gem might take a while due to its dependency on several other gems.
sudo gem install hpricot rake twitter
Scraping the latest blog post details with Hpricot
The initial implementation step was to gather the relevant metadata (URL, title and used tags) of the latest blog post. I first tried to get it by grabbing the blog's RSS feed and extracting the metadata from there, but soon ran into the problem of getting an outdated feed from Feedburner. The next alternative was to scrape the needed metadata directly from the blog landing page. As I had gone this route before with the Zend_Dom_Query component of the Zend Framework, I decided to use something similar from the Ruby toolbox. Some Google hops later I was sold on Hpricot, an HTML parser for Ruby, and as you can see in the first code snippet, showing an extract of the Rakefile to come, this is done in just 13 lines of code.
doc = Hpricot(open(blog_landing_page, scrape_options))
latest_post_url = doc.at('h3.post-title > a')['href']
latest_post_title = doc.at('h3.post-title > a').inner_html
label_doc = Hpricot(doc.search('span.post-labels').first.to_s)
label_links = label_doc.search('span.post-labels > a').each do |label_link|
  label = label_link.inner_html.gsub(' ', '').downcase
  if label.include?('/')
    labels = label.split('/')
    labels.each { |label| last_post_labels.push(label) }
  else
    last_post_labels.push(label)
  end
end
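The label handling in the snippet above can also be tried in isolation, without Hpricot or network access. The following is a minimal sketch, assuming the label texts have already been extracted into an array; normalize_labels is a hypothetical helper name that does not appear in the Rakefile.

```ruby
# Hypothetical helper (not part of the original Rakefile): turns raw
# Blogger label texts into hashtag-friendly tokens.
def normalize_labels(raw_labels)
  raw_labels.flat_map do |raw_label|
    # Strip spaces and lowercase, as the scraping snippet does.
    label = raw_label.gsub(' ', '').downcase
    # A combined label such as "Ruby / Rake" becomes two separate tags;
    # a label without a slash is returned as a single-element array.
    label.split('/')
  end
end

puts normalize_labels(['Ruby / Rake', 'Twitter API']).inspect
# prints ["ruby", "rake", "twitterapi"]
```

Because split('/') returns a one-element array for labels without a slash, this variant gets by without the if/else branch of the scraping snippet.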
Outstanding tasks
With the metadata available, the outstanding tasks to implement were:
- to get a short URL for the actual blog post by utilizing the public API of a URL shortening service, e.g. is.gd
- to build the tweet to broadcast by injecting the available metadata into a tweet template
- to broadcast the notification tweet to the given Twitter account
- to log the broadcasted blog title to prevent spamming or duplication scenarios
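The tweet-building item from the list above can be sketched on its own before it is wired into Rake. This is only an illustration under a few assumptions: build_notification_tweet and TWEET_LIMIT are hypothetical names, and http://is.gd/example is a placeholder rather than a real short URL.

```ruby
TWEET_LIMIT = 140 # character limit the Rakefile checks against

# Hypothetical helper: fills the tweet template and appends only those
# hashtags that still fit into the limit.
def build_notification_tweet(title, short_url, labels)
  tweet = "Published a new blog post '#{title}' available at #{short_url}."
  # Without any hashtags the tweet already has to fit.
  raise ArgumentError, 'Tweet exceeds 140 characters' if tweet.length > TWEET_LIMIT

  labels.each do |label|
    hashtag = " ##{label}"
    tweet << hashtag if tweet.length + hashtag.length <= TWEET_LIMIT
  end
  tweet
end

# Placeholder values, for illustration only.
puts build_notification_tweet('Rake and Twitter', 'http://is.gd/example', %w[ruby rake])
```

Hashtags are treated as optional extras here: a tag that would push the tweet over the limit is skipped, while a title that is too long on its own aborts the broadcast.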
# Rakefile.rb
require 'rubygems'
require 'hpricot'
require 'open-uri'
require 'twitter'

task :default do
  Rake::Task['blog_utils:broadcast_notification'].invoke
end

namespace :blog_utils do
  # open-uri sends String keys as HTTP request headers
  scrape_options = { 'User-Agent' => "Ruby/#{RUBY_VERSION}" }
  blog_landing_page = 'http://raphaelstolt.blogspot.com'
  # State shared between the tasks below
  latest_post_short_url, latest_post_url, latest_post_title = nil
  notification_tweet = nil
  last_post_labels = []
  broadcast_log_file = File.dirname(__FILE__) + '/broadcasted_posts.log'
  twitter_credentials = { :user => 'raphaelstolt', :pwd => 'thatsasecret'}

  desc 'Scrape metadata of latest blog post from landing page'
  task :scrape_actual_post_metadata do
    doc = Hpricot(open(blog_landing_page, scrape_options))
    latest_post_url = doc.at('h3.post-title > a')['href']
    latest_post_title = doc.at('h3.post-title > a').inner_html
    label_doc = Hpricot(doc.search('span.post-labels').first.to_s)
    label_links = label_doc.search('span.post-labels > a').each do |label_link|
      label = label_link.inner_html.gsub(' ', '').downcase
      if label.include?('/')
        labels = label.split('/')
        labels.each { |label| last_post_labels.push(label) }
      else
        last_post_labels.push(label)
      end
    end
  end

  desc 'Shorten the Url of the latest blog post'
  task :shorten_post_url => [:scrape_actual_post_metadata] do
    raise_message = 'No Url for latest blog post available'
    raise raise_message if latest_post_url.nil?
    url_shorten_service_call = "http://is.gd/api.php?longurl=#{latest_post_url}"
    latest_post_short_url = open(url_shorten_service_call, scrape_options).read
  end

  desc 'Check if generated short Url references the latest blog post url'
  task :check_shorten_url_references_latest do
    url_referenced_by_short_url = nil
    # open-uri follows the redirect; base_uri holds the final target
    open(latest_post_short_url, scrape_options) do |f|
      url_referenced_by_short_url = f.base_uri.to_s
    end
    raise_message = "Generated short Url '#{latest_post_short_url}' does not"
    raise_message << " reference actual blog post url '#{latest_post_url}'"
    raise raise_message unless url_referenced_by_short_url.eql?(latest_post_url)
  end

  desc 'Check if latest blog post has already been broadcasted'
  task :check_logged_broadcasts do
    logged_broadcasts = []
    if File.exist?(broadcast_log_file)
      File.open(broadcast_log_file, 'r') do |f|
        logged_broadcasts = f.readlines.collect { |line| line.chomp }
      end
    end
    raise_message = "Blog post '#{latest_post_title}' has already been "
    raise_message << "broadcasted"
    raise raise_message if logged_broadcasts.include?(latest_post_title)
  end

  desc 'Build notification tweet by injecting scraped metadata into template'
  task :build_notification_tweet => [:shorten_post_url,
                                     :check_shorten_url_references_latest] do
    raise_message = 'Required metadata to build tweet is not available'
    raise raise_message if latest_post_title.nil? || latest_post_short_url.nil?
    raise raise_message if last_post_labels.nil?
    notification_tweet = "Published a new blog post '#{latest_post_title}' "
    notification_tweet << "available at #{latest_post_short_url}."
    raise_message = 'Broadcast for latest blog post exceeds 140 characters'
    raise raise_message if notification_tweet.length > 140
    last_post_labels.each do |tag|
      notification_tweet << " ##{tag}" unless notification_tweet.length +
        " ##{tag}".length > 140
    end
  end

  desc 'Broadcast latest blog post notification to twitter'
  task :broadcast_notification => [:build_notification_tweet,
                                   :check_logged_broadcasts] do
    raise_message = "Notification tweet to broadcast is not available"
    raise raise_message if notification_tweet.nil?
    puts "Broadcasting '#{notification_tweet}'"
    http_auth = Twitter::HTTPAuth.new(twitter_credentials[:user], twitter_credentials[:pwd])
    Twitter::Base.new(http_auth).update(notification_tweet)
    #Twitter::Base.new(twitter_credentials[:user], twitter_credentials[:pwd]).post(notification_tweet)
    Rake::Task['blog_utils:log_broadcast_title'].invoke
  end

  desc 'Log broadcasted blog post title'
  task :log_broadcast_title do
    puts "Logging latest post title to #{broadcast_log_file}"
    File.open(broadcast_log_file, 'a') do |f|
      f.puts latest_post_title
    end
  end
end
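The duplicate guard of the Rakefile boils down to a plain text log with one broadcasted title per line. As a rough sketch of that idea outside of Rake, the two log tasks could look like the following; already_broadcasted? and log_broadcast are hypothetical helper names, and the log lives in a temporary directory here instead of next to the Rakefile.

```ruby
require 'tmpdir'

# Hypothetical helper: true if the title is already listed in the log.
def already_broadcasted?(log_file, title)
  return false unless File.exist?(log_file)
  File.readlines(log_file).map(&:chomp).include?(title)
end

# Hypothetical helper: appends the title as a new line to the log.
def log_broadcast(log_file, title)
  File.open(log_file, 'a') { |f| f.puts title }
end

# Try it against a throwaway log file.
Dir.mktmpdir do |dir|
  log_file = File.join(dir, 'broadcasted_posts.log')
  puts already_broadcasted?(log_file, 'My first post') # prints false
  log_broadcast(log_file, 'My first post')
  puts already_broadcasted?(log_file, 'My first post') # prints true
end
```

Matching on the exact title keeps the check simple, at the price that a post renamed after its broadcast would be announced a second time.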
Putting the Rake task(s) to work
The next step was to put the Rakefile into the Automations directory below my $HOME directory; after publishing a new blog post I'm now able to broadcast an automated notification by firing up the console and calling the Rake task as shown next.
sudo rake -f $HOME/Automations/Rakefile.rb blog_utils:broadcast_notification
And as I'm too lazy to type this lengthy command every time, I further added an alias to the $HOME/.profile file, which allows me to call the task via the associated alias, i.e. blogger2twitter, as shown in the .profile excerpt.
alias blogger2twitter='sudo rake -f $HOME/Automations/Rakefile.rb blog_utils:broadcast_notification'
After running the Rake task against this blog post the notification gets added to the given Twitter timeline as shown in the outro image.
2 comments:
I like the fact that you built it using ruby, the code is very easy to follow. One thing though, the password "thatsasecret" doesn't seem to work :)
Hi Federico, was just the perfect language/tool for this job and I really enjoyed using it.