Introducing Microsoft Azure Search and the RedDog.Search Client SDK

Related Post:
Managing Microsoft Azure Search with the RedDog Search Portal

Related Post on Just Azure: Getting Started With Microsoft Azure Search

Today Microsoft announced the public preview of Azure Search, which allows you to integrate advanced search capabilities into your own applications. Over the past year we had the opportunity to use Azure Search in a few of our projects, and it was great to see how it fulfilled every requirement. Let’s take a closer look at this new service.

High-Level Architecture

Azure Search comes in two types of Search Services: a free/shared service with a maximum of 3 indexes and a dedicated service. In a dedicated service you allocate one or more search units. The number of search units depends on the number of replicas and partitions you add to your service: every replica holds a copy of every partition, so the number of search units is the number of replicas multiplied by the number of partitions. An example:

  • 1 replica + 1 partition = 1 × 1 = 1 search unit
  • 6 replicas + 1 partition = 6 × 1 = 6 search units
  • 2 replicas + 2 partitions = 2 × 2 = 4 search units

To increase query throughput (queries per second, QPS), add more replicas. To increase the number and size of the documents the service can hold, add more partitions.
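To make the arithmetic concrete, here is a minimal sketch that computes the number of search units for a given configuration. It assumes that search units are counted as replicas multiplied by partitions, which is how Microsoft's capacity documentation describes it; the SearchUnits class and its Calculate method are hypothetical names and not part of any SDK.

```csharp
using System;

public static class SearchUnits
{
    // Assumption: search units = replicas x partitions.
    public static int Calculate(int replicas, int partitions)
    {
        if (replicas < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(replicas), "At least one replica is required.");
        }

        if (partitions < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(partitions), "At least one partition is required.");
        }

        return replicas * partitions;
    }
}
```

For example, SearchUnits.Calculate(6, 1) corresponds to a service that is scaled out for query throughput only.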

The Client SDK

The portal allows you to create new search services and scale the number of partitions/replicas.

To create indexes, upload data and run searches you can use the REST API. At the moment there’s no official Search SDK, so over the past few weeks I implemented a .NET client, which is available on GitHub and on NuGet.
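To give an idea of what the client wraps, the following sketch builds (but does not send) a raw search request against the REST API. It assumes the preview api-version 2014-07-31-Preview and the api-key request header; the RawSearchRequest class is a hypothetical helper written for this illustration and is not part of RedDog.Search.

```csharp
using System;
using System.Net.Http;

public static class RawSearchRequest
{
    // Assumption: the preview API version that was current when this post was written.
    public const string ApiVersion = "2014-07-31-Preview";

    // Builds a GET request that searches the given index for the given text.
    public static HttpRequestMessage Build(string serviceName, string apiKey, string indexName, string searchText)
    {
        var uri = new Uri(string.Format(
            "https://{0}.search.windows.net/indexes/{1}/docs?search={2}&api-version={3}",
            serviceName,
            Uri.EscapeDataString(indexName),
            Uri.EscapeDataString(searchText),
            ApiVersion));

        var request = new HttpRequestMessage(HttpMethod.Get, uri);

        // The service key (admin or query key) is passed in the api-key header.
        request.Headers.Add("api-key", apiKey);
        return request;
    }
}
```

The request can then be sent with an HttpClient instance and the JSON response parsed by hand, which is the plumbing the client takes care of.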

Index Management

The client comes with a fluent API which makes it really easy to add and configure the fields in your index. The following example shows the different types of fields you can add to an index (String, Integer, DateTime, …) and the options you can enable for these fields:

  • Key: the unique ID of the record in the index
  • Searchable: enables full-text search on the field
  • Filterable: enables filtering on the field (OData syntax)
  • Retrievable: whether the value of the field is returned in the search results
  • Sortable: enables sorting on the field
  • Suggestions: enables autocomplete searches

// Define an index with its fields.
var index = new Index("cars")
    .WithStringField("id", opt => opt.IsKey().IsRetrievable())
    .WithStringField("title", opt => opt.IsSearchable().IsFilterable().IsRetrievable())
    .WithStringField("description", opt => opt.IsSearchable().IsRetrievable())
    .WithStringField("make", opt => opt.IsSearchable().IsFilterable().IsRetrievable().IsSortable())
    .WithStringField("model", opt => opt.IsSearchable().IsFilterable().IsRetrievable().IsSortable())
    .WithIntegerField("year", opt => opt.IsRetrievable().IsSortable().IsFilterable())
    .WithDateTimeField("postedOn", opt => opt.IsFilterable().IsSortable().IsRetrievable())
    .WithGeographyPointField("location", opt => opt.IsFilterable().IsRetrievable().IsSortable())
    .WithStringCollectionField("options", opt => opt.IsFilterable().IsSearchable().IsRetrievable())
    .WithBooleanField("isStillForSale", opt => opt.IsRetrievable().IsFilterable());

// Add a scoring profile where hits on the title and the description will have a better score.
var byUserDescription = new ScoringProfile();
byUserDescription.Name = "byUserDescription";
byUserDescription.Text = new ScoringProfileText();
byUserDescription.Text.Weights = new Dictionary<string, double>();
byUserDescription.Text.Weights.Add("title", 1.5);
byUserDescription.Text.Weights.Add("description", 1.5);
index.ScoringProfiles.Add(byUserDescription);

// Add a scoring profile where the most recent ads will have a better score.
var mostRecentAds = new ScoringProfile();
mostRecentAds.Name = "mostRecentAds";
mostRecentAds.Functions.Add(new ScoringProfileFunction()
{
    Type = ScoringProfileFunctionType.Freshness,
    FieldName = "postedOn",
    Boost = 1.5,
    Interpolation = InterpolationType.Linear,
    Freshness = new ScoringProfileFunctionFreshness { BoostingDuration = TimeSpan.FromDays(1) }
});
index.ScoringProfiles.Add(mostRecentAds);

// Create the index.
var managementClient = new IndexManagementClient(ApiConnection.Create("serviceName", "serviceKey"));
var result = await managementClient.CreateIndexAsync(index);
if (!result.IsSuccess)
{
    // Do something with result.Error
}

The sample also shows the creation of Scoring Profiles. Whenever you run a search each record receives a score, and that score determines the ranking of the record within the search results. A scoring profile allows you to influence these scores. The previous example adds weights to specific fields (so hits on these fields get a better score) and a freshness function (to give recent items a better score).

In addition to creating an index you can update, delete and retrieve an index, list all indexes and get the statistics (number of records, size) of an index:

// Update an index.
await managementClient.UpdateIndexAsync(index);

// Delete the cars index.
await managementClient.DeleteIndexAsync("cars");

// Get a list of all indexes.
var indexes = await managementClient.GetIndexesAsync();

// Get the cars index.
var cars = await managementClient.GetIndexAsync("cars");

// Get the statistics for a specific index.
var carsStatistics = await managementClient.GetIndexStatisticsAsync("cars");

Index Population

You can upload records to an index, merge changes into existing records and delete records. These operations can be combined in batches of up to 1000 records. Merge and delete operations identify the record by its key field. Here is an example combining the different operations:

var results = await managementClient.PopulateAsync("cars", new[]
{
    new IndexOperation(IndexOperationType.Upload, "id", "392991932")
        .WithProperty("title", "Black Volvo for sale in Brussels")
        .WithProperty("description", "I really liked this car.")
        .WithProperty("make", "Volvo")
        .WithProperty("model", "S40")
        .WithProperty("year", 2005)
        .WithProperty("postedOn", new DateTimeOffset(2014, 8, 21, 0, 0, 0, TimeSpan.Zero))
        .WithProperty("location", new { type = "Point", coordinates = new[] { 50.8503396, 4.3517103 }})
        .WithProperty("options", new [] { "GPS", "Leather Seats" })
        .WithProperty("isStillForSale", true),
    new IndexOperation(IndexOperationType.Merge, "id", "165884993")
        .WithProperty("isStillForSale", false),
    new IndexOperation(IndexOperationType.Delete, "id", "498929")
});

if (!results.IsSuccess)
{
    // Do something with results.Error
}

foreach (var result in results.Body.Where(r => !r.Status))
{
    // Do something with result.errorMessage
}
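Since a batch holds at most 1000 records, a larger import has to be split up before it is sent. The following is a minimal sketch of such a split; the Batching class is a hypothetical helper and not part of RedDog.Search, and it works on any sequence, including a list of IndexOperation objects.

```csharp
using System;
using System.Collections.Generic;

public static class Batching
{
    // The batch limit mentioned above.
    public const int MaxBatchSize = 1000;

    // Splits a sequence into consecutive batches of at most batchSize items.
    public static IEnumerable<IReadOnlyList<T>> InBatches<T>(IEnumerable<T> items, int batchSize = MaxBatchSize)
    {
        if (items == null)
        {
            throw new ArgumentNullException(nameof(items));
        }

        if (batchSize < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(batchSize), "The batch size must be at least 1.");
        }

        return Iterate(items, batchSize);
    }

    private static IEnumerable<IReadOnlyList<T>> Iterate<T>(IEnumerable<T> items, int batchSize)
    {
        var batch = new List<T>(batchSize);
        foreach (var item in items)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }

        // Return the last, partially filled batch (if any).
        if (batch.Count > 0)
        {
            yield return batch;
        }
    }
}
```

Each batch can then be converted to an array and passed to PopulateAsync in a loop, checking IsSuccess and the per-record status after every call as in the example above.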

Queries

The following example shows a few of the features available in the query API. Imagine a website where users are looking to buy a car. First they choose the make (Volvo), then they use a slider to choose the minimum year of the car. These values end up in the search filter, which uses the OData query syntax (year gt 2002 and make eq 'Volvo'). Since the overview page only shows a few fields, there is no need to return every property of each car; to improve performance we only request the ones we’ll display. That’s what the Select method is for (it uses the OData $select statement). The results are also ordered by the postedOn field, and hits in the title are highlighted. Highlighting adds a highlight value to the search results in which each matched term is wrapped in an “em” tag, so the words the user searched for can be emphasized in the HTML.

For this kind of search you’ll want to tell the user how many results were found in total, even if the page only shows a few of them. That’s why I’m using the Count method (which uses the OData $count).

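// "connection" is an ApiConnection, for example one created with ApiConnection.Create as shown earlier.
var queryClient = new IndexQueryClient(connection);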
var query = new SearchQuery("gps leather")
    .Count(true)
    .Select("title,description,make,model,year,postedOn,isStillForSale")
    .OrderBy("postedOn")
    .Highlight("title")
    .Filter("year gt 2002 and make eq 'Volvo'");
var searchResults = await queryClient.SearchAsync("cars", query);
foreach (var result in searchResults.Body.Records)
{
    // Do something with the properties: result.Properties["title"], result.Properties["description"]
}
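The filter in the query above is hard-coded. In the scenario described, the year and the make come from user input, so the filter string has to be built at runtime. Here is a minimal sketch, assuming the standard OData rule that a single quote inside a string literal is escaped by doubling it; the CarFilter class is a hypothetical helper and not part of RedDog.Search.

```csharp
using System;
using System.Globalization;

public static class CarFilter
{
    // Builds an OData filter such as: year gt 2002 and make eq 'Volvo'
    public static string Build(int minimumYear, string make)
    {
        if (make == null)
        {
            throw new ArgumentNullException(nameof(make));
        }

        // Escape single quotes so user input cannot break out of the string literal.
        var escapedMake = make.Replace("'", "''");

        return string.Format(
            CultureInfo.InvariantCulture,
            "year gt {0} and make eq '{1}'",
            minimumYear,
            escapedMake);
    }
}
```

The result can be passed to the Filter method of the SearchQuery, for example .Filter(CarFilter.Build(2002, "Volvo")).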

Summary

As you can see it’s very easy to get started with Azure Search and integrate it into your applications. In the next posts I’ll cover how you can scale your service, do geo searches, manage your indexes with the RedDog Search Portal, do advanced queries, and more.

Enjoy!

About Sandrino Di Mattia

Sandrino Di Mattia is a Windows Azure Consultant at RealDolmen and a Windows Azure Insider. He lives and breathes Windows Azure.

  • silvano

    Dear Sandrino,

    nice articles!

    I was waiting long time for something similar from Microsoft.

    Will it be possible to index also PDF or Office Documents? I have an Azure application and have a lot of Data Documents and Structured Data, so would like to search the entire dataset.

    Thank you so much

    –silvano

  • Dan Friedman

    I feel like this article is too soon. We just found out that Azure Search even exists. We need articles that go over the basics of Azure Search before jumping into ways that may make the REST API easier, but I need to know “how much easier” will it be, before I start using a new SDK. Just my 2 cents

  • nm

    Thank you & very decent to start with. If we can have little bit more documentation and bit more features like returning Total Count of matching records with query result (will be useful in paging) that would be great.

  • Arun Kumar N R

    the dll is not signed and cannot refer in my assembly…can you please sign the assembly

  • Luke Mueller

    Great article Sandrino –

    Question for you – I can’t seem to specify an analyzer on a field – i keep getting “Bad Request” – does your API allow for specifying an analyzer such as “en.lucene” ?

    http://msdn.microsoft.com/en-us/library/azure/dn879793.aspx
    (seems like a bug)

    my index def (JSON):

    {
      "name": "catalog",
      "fields": [
        { "name": "id", "type": "Edm.String", "key": true, "searchable": false, "filterable": false, "sortable": false, "facetable": false, "retrievable": true, "suggestions": false },
        { "analyzer" : "en.lucene", "name": "title", "type": "Edm.String", "key": false, "searchable": true, "filterable": true, "sortable": true, "facetable": false, "retrievable": true, "suggestions": true },
        { "analyzer" : "en.lucene", "name": "postContent", "type": "Edm.String", "key": false, "searchable": true, "filterable": true, "sortable": true, "facetable": false, "retrievable": true, "suggestions": false },
        { "analyzer" : "en.lucene", "name": "seoDescription", "type": "Edm.String", "key": false, "searchable": true, "filterable": true, "sortable": true, "facetable": false, "retrievable": true, "suggestions": false }
      ]
    }

    • Luke Mueller

      and of course as soon as i post this, i read a bit deeper and see i’m using the older version of the api…

      from http://msdn.microsoft.com/en-us/library/dn798941.aspx :

      (Available as an experimental feature only in api-version=2014-10-20-Preview).

      • sandrinodimattia

        This is now also supported in the client and the portal.

  • Bas Partovi

    When querying, there is a limit of 50 results/records. How can I change that?