Nov 15, 2011

First step with Ruby: Kindle Clipping Extractor


In this post, I'm sharing my latest (and first) Ruby script. The script is available on Bitbucket here, and you can see its output in any of the Reading Notes posts on my blog here: Reading Notes. I'm still learning Ruby, so feel free to leave me a comment. I will keep updating the script as I get more comfortable with my new skill.

Starting Idea

If you are reading this post, there is a good chance you already know that I publish a post with my reading notes every week. I use Instapaper to bookmark everything I want to read and send it to my Kindle. So, to write my weekly post, I need to go through the "My Clippings.txt" file on my Kindle, then go back to Instapaper, find the link to each article, and put all this information together in a nice, readable format.
So the idea was to speed up this process.

Kindle part

First things first, I need to retrieve all my notes from the "My Clippings.txt" file. I started by trying to do it myself and the result was… not good. Then I found kindleclippings, a really nice gem that does exactly what I wanted: it parses the file and returns an array of all the notes, each with its information well organized into properties. So my job was to use it correctly… not too bad.
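The gem hides the details of the file format. To give a rough idea of what it has to deal with, here is a minimal sketch (not the gem's implementation) that assumes the common layout of "My Clippings.txt": entries separated by a line of ten "=" characters, a "Title (Author)" line, a metadata line, a blank line, then the text. That layout varies between Kindle firmware versions, and Clipping and parse_clippings are hypothetical names used only for this example.

```ruby
# Minimal sketch of parsing Kindle "My Clippings.txt" text, ASSUMING the
# common layout below (it varies between Kindle firmware versions):
#
#   Book Title (Author Name)
#   - Your Highlight on Location 120-125 | Added on ...
#
#   The highlighted or typed text
#   ==========
#
# Clipping is a hypothetical struct, NOT the kindleclippings gem's class.
Clipping = Struct.new(:book_title, :author, :type, :content)

def parse_clippings(text)
  text.split(/^==========\s*$/).map do |entry|
    # strip each line so both \n and \r\n line endings work
    lines = entry.strip.lines.map(&:strip)
    next if lines.size < 2

    # First line: "Title (Author)"; drop a possible UTF-8 byte-order mark
    header = lines[0].delete("\uFEFF")
    match = header.match(/\A(.*?)\s*\(([^()]*)\)\z/)
    title, author = match ? [match[1], match[2]] : [header, nil]

    # Second line: metadata; take the clipping type from its wording
    type = lines[1][/Highlight|Note|Bookmark/]

    # Everything after the metadata line (and the blank line) is the content
    content = lines[2..].join("\n").strip
    Clipping.new(title, author, type, content)
  end.compact
end
```

Calling parse_clippings(File.read('My Clippings.txt', encoding: 'UTF-8')) would return an array of these structs. With the gem, none of this is needed; the script simply uses it like this: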

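require 'kindleclippings'

# Parse the clippings file copied from the Kindle
parser = KindleClippings::Parser.new
clippings = parser.parse_file('My Clippings.txt')

#== Build the Html list ==
resume = "<ul>"

# One list item per note: book title (href is a placeholder) and note text
clippings.notes.each do |note|
  resume << "\n<li>\n<b><a href=\"#\">#{note.book_title}</a></b> - #{note.content}\n"

  # Append the book's highlights; GetBookHightlight is a helper
  # from the full script that is not shown in this post
  resume << "#{GetBookHightlight(clippings.by_book(note.book_title))}\n</li>"
end

resume << "\n</ul>"

puts resume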

As you can see, nothing complicated: I loop over the notes and build an unordered list (UL).
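One thing the loop above does not do is escape the text it puts into the HTML, so a note containing <, > or & could break the list. Here is a minimal sketch of the same list-building idea with escaping through ERB::Util.html_escape from Ruby's standard library. Note, build_notes_list and the links hash (mapping a book title to its article URL, standing in for what is found on Instapaper) are hypothetical names for this example.

```ruby
require 'erb'

# Hypothetical stand-in for the gem's note objects: only the two fields
# the list needs.
Note = Struct.new(:book_title, :content)

# Build the <ul> list, escaping every piece of text so characters such as
# <, > or & in a note cannot break the generated HTML.
# links (hypothetical) maps a book title to its article URL; titles
# without an entry fall back to the "#" placeholder.
def build_notes_list(notes, links = {})
  items = notes.map do |note|
    url   = ERB::Util.html_escape(links.fetch(note.book_title, '#'))
    title = ERB::Util.html_escape(note.book_title)
    text  = ERB::Util.html_escape(note.content)
    %(<li><b><a href="#{url}">#{title}</a></b> - #{text}</li>)
  end
  "<ul>\n#{items.join("\n")}\n</ul>"
end
```

Building the string from an array and joining it once keeps the markup in one place, which makes it easier to change the format later.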

Instapaper part

Getting the reference link of each article takes a little more work. I need to log into Instapaper with my account, then find the matching bookmark. Once the right bookmark is found, I need to extract the URL of the full article. At the same time, I move this bookmark to another folder to keep my "Unread" list short.

I'm using Watir to do my web scraping. This nice gem is very well done and can also be used for testing user interfaces, but that will be for another post.

So, first things first: I need to log in. Here again, nothing complex: I get the username and password and use them to log in.
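require 'watir' # Watir gem (older setups used the watir-webdriver gem)

# Open a browser, log into Instapaper and return the logged-in browser
def InstapaperOpen()
  browser = Watir::Browser.new
  browser.goto 'http://www.instapaper.com/user/login'

  # Read from $stdin explicitly: plain gets would read from files given in ARGV
  puts "What is your Username? "
  username = $stdin.gets.chomp

  puts "What is your Password? "
  password = $stdin.gets.chomp

  # Fill in the login form and submit it
  browser.text_field(:name => 'username').set(username)
  browser.text_field(:name => 'password').set(password)
  browser.button(:type => 'submit').click

  # The "Log out" link only appears once we are logged in
  abort("Cannot log in to Instapaper.") unless browser.link(:text => "Log out").exist?

  return browser
end

# Log out and close the browser window
def InstaPaperClose(browser)
  browser.link(:text => "Log out").click
  browser.close
end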
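The login method reads the password with gets, so it is echoed in the terminal while it is typed. Here is a minimal sketch of a prompt that hides it, using IO#noecho from Ruby's io/console standard library. ask is a hypothetical helper name; it falls back to a plain gets when the input is not an interactive terminal (for example when input is piped).

```ruby
require 'io/console'

# Hypothetical prompt helper. When secret is true and input is an
# interactive terminal, IO#noecho (from io/console) hides what is typed.
def ask(prompt, secret: false, input: $stdin, output: $stdout)
  output.print prompt
  line = if secret && input.respond_to?(:tty?) && input.tty?
           # Enter is not echoed either, so print the newline ourselves
           input.noecho(&:gets).tap { output.puts }
         else
           input.gets
         end
  line.to_s.chomp # nil at end of input becomes ""
end
```

In InstapaperOpen, the two prompts could then become username = ask("What is your Username? ") and password = ask("What is your Password? ", secret: true).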

Next, I need a method to search for and return the bookmark matching my Kindle note, and another one to move that bookmark to a different folder.


# Return the bookmark link whose text matches title, following the
# "Older items" link page by page; returns nil if no page has it
def SearchTitle(browser, title)
  rgTitle = Regexp.new(Regexp.escape(title))

  anchor = browser.link(:text => rgTitle)
  return anchor if anchor.exist?

  # Not on this page: open the older page and search again (recursive call)
  older = browser.link(:href => /\/u\/\d+/).span(:text => "Older items")
  return nil unless older.exist?

  older.click
  SearchTitle(browser, title)
end


# Move a bookmark (paper) to destinationFolder through its "Move" menu
def MovePaperTo(paper, destinationFolder)
  # Open the "Move" menu in the bookmark's secondary controls
  aDiv = paper.div(:class => 'secondaryControls')
  aLink = aDiv.link(:text => 'Move')
  aLink.click

  # Click the destination folder in that menu
  bntMoveTo = paper.link(:text => destinationFolder)
  bntMoveTo.click
end

So, in the SearchTitle method, I use a regex to look for an anchor <a> matching the title. If I don't find it, I check whether there is an "Older items >>" button to search deeper. I do this by calling the method recursively, until I find the matching bookmark or there is no more "Older items >>" button.
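The same search can also be written as a loop instead of a recursive call, which keeps the call stack flat however many pages there are. Here is a minimal sketch over a hypothetical in-memory model (Bookmark and Page structs standing in for what Watir sees on the Instapaper pages), so it runs without a browser; find_bookmark is an illustrative name, not part of the script.

```ruby
# Hypothetical page model: each page lists bookmarks and may point to an
# "Older items" page (nil when there is none).
Bookmark = Struct.new(:title, :url)
Page = Struct.new(:bookmarks, :older)

# Iterative version of "search this page, else follow Older items".
# Regexp.escape makes characters such as ? or ( in a title match literally.
def find_bookmark(page, title)
  pattern = Regexp.new(Regexp.escape(title))
  while page
    found = page.bookmarks.find { |b| b.title.match?(pattern) }
    return found if found
    page = page.older # move to the older page, nil ends the loop
  end
  nil
end
```

With Watir, page.older would correspond to clicking the "Older items" link and searching the newly loaded page.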

In MovePaperTo, I click the Move link in the bookmark's context menu, then the destination folder.

Finally, I put all this together (with some improvements) and got my final script! It was a nice way to practice the basics of Ruby. Of course it could be cleaner and more "Rubyist", but at least it works, and now I have something to practice my refactoring skills on…

Feel free to leave me your comments or suggestions here on this blog or on Bitbucket.

~Franky

