Oct 23, 2014

The making of: Franky's Notes Azure Search - part 1


For a long time now, I'm thinking about creating an API that will allow to search easily through my notes. When Azure Search came public few weeks ago, I knew it was what this project needed to come alive. In this post, I will share how I did it, and more importantly, show how incredibly easy it was to do.


What's Azure Search?


Currently in preview, Azure Search is a cloud-based search-as-a-service that provides a set of REST APIs defined in terms of HTTP requests and responses, in OData JSON format.

Getting Started


From the Azure Portal, let's create an Azure Search Service by clicking the plus button on the bottom left of the screen. Select the Search option, and fill-up the options.

Azure_portal_crete_Search_Service_2014-10-20_0931

Application to populate my Azure Search service


First, we will need some data. My weekly posts Reading Notes are generated with a Ruby script that I did few years ago. You can read more about it on First step with Ruby: Kindle Clipping Extractor. Basically, the script extracts my notes from my Kindle and build a collection of notes grouped in different categories to generate a markdown file. That can easily be done by adding a new Json output file. Here is a quick view this output.
{
  "json_class": "FrankyNotes",
  "categories": {
    "dev": [
      {
        "id": 77077357,
        "title": "Customize the MVC 5 Application Users’ using ASP.Net Identity 2.0",
        "author": "Dhananjay kumar",
        "url": "http://debugmode.net/2014/10/01/customize-the-mvc-5-application-users-using-asp-net-identity-2-0/",
        "note": "Need to get the fukk article",
        "tags": "dev,frankysnotes,readingnotes160",
        "date": "2014/10/17",
        "category": "dev"
      },
      {
        "id": 77156372,
        "title": "Custom Login Scopes, Single Sign-On, new ASP.NET Web API – updates to 
      [...]

Now that we have some data, we need to create an index and be able to add document in it. A console application will be perfect for this job. At the time of writing this post, two libraries exist to interact with the Microsoft Azure Search REST API. For this part of the project, we will use the RedDog.Search library available on Github, since it's a .Net library.

Note: To create an index or upload documents you will need an admin key.

Admin_Key

First, we need to create an Index. Let's keep it simple and just create the index with all the properties of the json object. Here the code of my function CreateNoteIndex.
public IndexManagementClient Client
{
    get
    {
        if (_client == null){
            _client = new IndexManagementClient(ApiConnection.Create("frankysnotes", "AdminKey"));
        }
        return _client;
    }
}

public async Task<string> CreateNoteIndex()
{
    var createResult = await Client.CreateIndexAsync(new Index("notes")
        .WithStringField("id", opt => opt.IsKey().IsRetrievable())
        .WithStringField("title", opt => opt.IsRetrievable().IsSearchable())
        .WithStringField("author", opt => opt.IsRetrievable().IsSearchable())
        .WithStringField("url", opt => opt.IsRetrievable().IsSearchable(false))
        .WithStringField("note", opt => opt.IsRetrievable().IsSearchable())
        .WithStringField("tags", opt => opt.IsRetrievable().IsFilterable().IsSearchable())
        .WithStringField("date", opt => opt.IsRetrievable().IsSearchable())
        .WithStringField("category", opt => opt.IsRetrievable().IsFilterable().IsSearchable())
        );
    if (createResult.IsSuccess)
    {
        return "Index Reseted successfully";
    }
}

To be able to search by note instead of by post, I decided to break down the file in multiple documents containing one note by document. After what, it was really easy to upload the documents into the index.
public async Task<string> AddNotes(string filepath)
{
    var docs = new List<IndexOperation>();
    FrankysNotes notes = DeserializeFNotes(filepath);

    foreach (var category in notes.categories)
    {
        foreach (var fNote in notes.categories[category])
        {
            var doc = ConvertfNote(fNote);
            docs.Add(doc);
        }
    }

    var result = await Client.PopulateAsync("notes", docs.ToArray<IndexOperation>());

    return "File uploaded successfully";
}


private FrankysNotes DeserializeFNotes(string filepath)
{
    var jsonStr = File.ReadAllText(filepath);
    var serializer = new JavaScriptSerializer();

    var notes = serializer.Deserialize<FrankysNotes>(jsonStr);
    return notes;
}

private IndexOperation ConvertfNote(FrankysNote fnote)
{
    var doc = new IndexOperation(IndexOperationType.Upload, "id", fnote.id)
                    .WithProperty("title", fnote.title)
                    .WithProperty("author", fnote.author)
                    .WithProperty("url", fnote.url)
                    .WithProperty("note", fnote.note)
                    .WithProperty("tags", fnote.tags)
                    .WithProperty("date", fnote.date)
                    .WithProperty("category", fnote.category);
    return doc;
}

To keep the code as clear as possible, I removed all validations and error management. The json file is deserialized, then looping through all notes I build a list of IndexOperation. And Finally I upload all the notes with Client.PopulateAsync("notes", docs.ToArray<IndexOperation>());

Wrapping up


Using the RedDog.Search library to push documents in Azure Search Index was extremely easy. In fact, it's that simplicity that pushed me to share my discovery. In the next part of the series, I will create a simple HTML page to do real query.

Stay tune...

~ Frank Boucher

References

No comments:

Post a Comment