
Cognitive Search – Azure Search with AI

Cognitive Search

Today, at the Microsoft //build conference, we announced Cognitive Search. You may wonder what Cognitive Search is. To put it as simply as possible: it’s Azure Search powered by Cognitive Services (Azure Machine Learning APIs). Remember when you wanted to run some intelligence over your data with Cognitive Services? You had to create, e.g., a Text Analytics API resource. Then you had to write code that took your data from the database and sent a request to the API (remembering to use the proper key!). That code also had to serialize and deserialize the data and put the result back in your database.

Now, with Cognitive Search, you can achieve that by checking one checkbox. You just need to pick the field on which you want to run analytics. You also pick which cognitive services, or skills, to run (one cognitive service usually contains multiple skills). For now we support 6 skills:

  1. Key phrases
  2. People
  3. Places
  4. Organizations
  5. Language
  6. OCR (Optical Character Recognition)

We output results directly to your search index.
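Conceptually, each skill reads the field you picked and writes its result into an output field of the same search document. The toy model below is only meant to make that data flow concrete. Everything in it is hypothetical. The real skills run inside Azure Search, and `enrichDocument` and the two stand-in "skills" are invented for illustration.

```javascript
// Hypothetical model of the enrichment step: skill functions stand in for
// the real Cognitive Services calls, which actually run inside Azure Search.
function enrichDocument(doc, sourceField, skills) {
  const text = doc[sourceField] ?? "";
  const enriched = { ...doc }; // leave the input document untouched
  for (const { run, outputField } of skills) {
    // Each skill reads the source field and writes to its own output field.
    enriched[outputField] = run(text);
  }
  return enriched;
}

// Toy stand-ins, NOT the real skills: a naive "key phrase" extractor that
// returns capitalized words, and a language detector that always says "en".
const toySkills = [
  { outputField: "keyphrases", run: (text) => text.match(/\b[A-Z][a-z]+\b/g) ?? [] },
  { outputField: "language", run: () => "en" },
];

const listing = { id: "1", description: "Charming home near Lake Washington with a view" };
console.log(enrichDocument(listing, "description", toySkills));
```

In the real service you do not write any of this. You only choose the source field, the skills, and the output field names in the portal.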

Creating Intelligent Search Index

To take advantage of Cognitive Search, you need to create an Azure Search service in South Central US or West Europe. More regions are coming soon!

To create a search index powered by Cognitive Services, use the ‘Import data’ flow. Go to your Azure Search service and click the ‘Import data’ command:

Cognitive Search - step 1

Then pick your data source (SQL Server, Cosmos DB, Blob storage, etc.). I will choose the sample data source, which contains real estate data:

Cognitive Search - import data

Now you need to pick the field on which you want to run analytics; I will choose description. You also need to choose which cognitive services (skills) to run. Finally, provide the output field names, i.e. the fields to which we will write the analysis results:

Cognitive Search - skillset definition

In the next step you need to configure your index. Usually you want to make fields retrievable, searchable, and filterable. You may also consider making them facetable if you want to aggregate results. This is my sample configuration:
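The checkboxes in this step correspond to boolean attributes on each field of the index definition. Below is a sketch of such a definition in the shape used by the Azure Search REST API (Edm types plus retrievable/searchable/filterable/facetable/sortable flags). The field names follow the real estate sample and the keyphrases output field. The key field name listingId and the index name are assumptions.

```javascript
// Sketch of an index definition in the Azure Search REST API shape.
// "listingId" as the key and the index name are assumptions.
function field(name, type, attrs = {}) {
  // Default every attribute to off except retrievable, then override per field.
  return {
    name,
    type,
    retrievable: true,
    searchable: false,
    filterable: false,
    facetable: false,
    sortable: false,
    ...attrs,
  };
}

const indexDefinition = {
  name: "realestate-cognitive", // hypothetical index name
  fields: [
    field("listingId", "Edm.String", { key: true, filterable: true }),
    field("description", "Edm.String", { searchable: true }),
    field("beds", "Edm.Int32", { filterable: true, facetable: true, sortable: true }),
    field("baths", "Edm.Int32", { filterable: true, facetable: true, sortable: true }),
    // Output field filled by the key phrases skill; facetable so results can
    // be aggregated by phrase. Collections cannot be sortable.
    field("keyphrases", "Collection(Edm.String)", { searchable: true, filterable: true, facetable: true }),
  ],
};

console.log(JSON.stringify(indexDefinition, null, 2));
```

Making keyphrases facetable is what lets a search UI show counts per key phrase.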

Cognitive search - define index

In the last step you just need to configure the indexer, a tool that synchronizes your data source with your search index. In my case I will run the synchronization only once, as my sample data source will never change.

Cognitive Search - create indexer

After the indexer finishes, you can browse your data and the Cognitive Services results in Search explorer.

Cognitive Search - browse

You can also generate a more usable search UI for your data with AzSearch.js.

Generating UI to search data with AzSearch.js

If you don’t like browsing your data with Search explorer in the Azure Portal, which returns raw JSON, you can use AzSearch.js to quickly generate a UI over your data.

The easiest way to get started is to use the AzSearch.js generator. Before you start, enable CORS on your index:

Cognitive search - CORS
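The generated page calls the search service directly from the browser, on a different origin, which is why CORS has to be enabled. In the REST API index definition this setting is the corsOptions object. The sketch below adds it to a definition. Limiting allowedOrigins to wherever you host the page is safer than "*". The origin used here is a placeholder.

```javascript
// Sketch: return a copy of an index definition with CORS enabled.
// The allowed origin in the example is a placeholder.
function withCors(indexDefinition, allowedOrigins, maxAgeInSeconds = 300) {
  return {
    ...indexDefinition,
    corsOptions: { allowedOrigins: [...allowedOrigins], maxAgeInSeconds },
  };
}

const original = { name: "realestate-cognitive", fields: [] };
const updated = withCors(original, ["http://localhost:8080"]);
console.log(updated.corsOptions);
```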

Once you have your query key and index definition JSON, paste them into the generator together with your search service name, and click ‘Generate’. An HTML page with a simple search interface will be created.
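The query key is what the page sends with every request. This sketch shows the kind of request it makes against the index's /docs endpoint, built without AzSearch.js so the moving parts are visible. It is not a description of AzSearch.js internals. The service name, index name, key, and the api-version value are placeholders; use the api-version your service documents.

```javascript
// Sketch of a browser search request using a query key. Service, index,
// key, and api-version values are placeholders, not real settings.
function buildSearchRequest({ serviceName, indexName, queryKey, apiVersion }, searchText, top = 10) {
  const params = new URLSearchParams({
    "api-version": apiVersion,
    search: searchText,
    $count: "true",
    $top: String(top),
  });
  return {
    url: `https://${serviceName}.search.windows.net/indexes/${indexName}/docs?${params}`,
    // Query keys are read-only, so they are the kind of key a browser page can carry.
    headers: { "api-key": queryKey },
  };
}

const req = buildSearchRequest(
  { serviceName: "myservice", indexName: "realestate", queryKey: "QUERY_KEY", apiVersion: "2017-11-11" },
  "lake view"
);
console.log(req.url);
// In a browser: fetch(req.url, { headers: req.headers }).then((r) => r.json())
```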

Cognitive Search - AzSearch.js

This site is super easy to customize. Providing an HTML template for results turns the raw JSON into nicely formatted search results:

Cognitive search - AzSearch.js pretty

All I did was create an HTML template:

    const resultTemplate =
        `<div class="col-xs-12 col-sm-5 col-md-3 result_img">
            <img class="img-responsive result_img" src={{thumbnail}} alt="image not found" />
        </div>
        <div class="col-xs-12 col-sm-7 col-md-9">
            <h4>{{displayText}}</h4>
            <div class="resultDescription">
                {{{summary}}}
            </div>
            <div>
                sqft: <b>{{sqft}}</b>
            </div>
            <div>
                beds: <b>{{beds}}</b>
            </div>
            <div>
                baths: <b>{{baths}}</b>
            </div>
            <div>
                key phrases: <b>{{keyPhrases}}</b>
            </div>
        </div>`;
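The template uses Mustache-style tags. {{name}} inserts a value HTML-escaped, while {{{name}}} inserts it as-is. That is why summary uses triple braces: it can carry markup such as highlights. The toy renderer below only illustrates those two rules. It is not the templating library AzSearch.js actually uses.

```javascript
// Toy renderer illustrating Mustache-style semantics, NOT the library
// AzSearch.js uses: {{name}} is HTML-escaped, {{{name}}} is inserted raw.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function render(template, data) {
  // One pass with alternation: the triple-brace branch is tried first at each
  // position, and inserted values are never re-scanned for tags.
  return template.replace(
    /\{\{\{\s*(\w+)\s*\}\}\}|\{\{\s*(\w+)\s*\}\}/g,
    (_, rawKey, escapedKey) =>
      rawKey !== undefined ? String(data[rawKey] ?? "") : escapeHtml(data[escapedKey] ?? "")
  );
}

console.log(render("<h4>{{displayText}}</h4><div>{{{summary}}}</div>", {
  displayText: "12 Main St <Apt 3>",
  summary: "Close to <em>lake</em>",
}));
```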

Then pass it to the existing addResults call:

    automagic.addResults("results", { count: true }, resultTemplate);

I also created a resultsProcessor to do some custom transformations, e.g., join a few fields into one, truncate the description to 200 characters, and convert key phrases from an array into a comma-separated string:

    var resultsProcessor = function(results) {
        return results.map(function(result) {
            // Build a single display line from the address fields.
            result.displayText = result.number + " " + result.street + " " + result.city + ", " + result.region + " " + result.countryCode;
            // Truncate long descriptions to 200 characters.
            var summary = result.description || "";
            result.summary = summary.length < 200 ? summary : summary.substring(0, 200) + "...";
            // Turn the key phrases array into a comma-separated string.
            result.keyPhrases = (result.keyphrases || []).join(", ");
            return result;
        });
    };
    automagic.store.setResultsProcessor(resultsProcessor);

You can do similar customization with suggestions. You can also add highlights to your results and much more. Everything is described in the AzSearch.js README. We also have a starter app written in TypeScript and React, based on the sample real estate data, that takes advantage of more advanced features of AzSearch.js. If you have any questions or suggestions regarding AzSearch.js, let me know on Twitter!

Summary

Cognitive Search takes analyzing data with Azure Search to the next level. It takes away the burden of writing your own infrastructure for running AI-based analysis. For more advanced analysis, including OCR on your images, check out our docs. I am super excited to see it in action and to see the next improvements we are working on. Let us know what you think!

*This blog post was written on a Boeing 787 during my flight from Toronto to São Paulo, on my way to the QCon conference.


Add custom metadata to Azure blob storage files and search them with Azure Search

Did you know that you can add custom metadata to your blob containers, and even to individual blob files?

You can do it in the Azure Portal, using SDK or REST API.

The most common scenario is adding metadata during file upload. The code below uploads a sample invoice from disk and adds year, month, and day metadata properties.

    using System;
    using System.Linq;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;

    // Fill in your storage account name, key, and container name.
    const string StorageAccountName = "";
    const string AccountKey = "";
    const string ContainerName = "";

    string ConnectionString = $"DefaultEndpointsProtocol=https;AccountName={StorageAccountName};AccountKey={AccountKey};EndpointSuffix=core.windows.net";
    CloudStorageAccount storageAccount = CloudStorageAccount.Parse(ConnectionString);
    CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
    CloudBlobContainer container = blobClient.GetContainerReference(ContainerName);

    const string FileName = "Invoice_2017_01_01";
    using (var fileStream = System.IO.File.OpenRead($@"D:\dev\BlobMetadataSample\invoices\{FileName}.pdf"))
    {
        // "Invoice_2017_01_01" -> ["Invoice", "2017", "01", "01"]
        var fileNameParts = FileName.Split('_');
        var year = fileNameParts[1];
        var month = fileNameParts[2];
        var day = fileNameParts[3];

        var blob = container.GetBlockBlobReference(FileName);
        // Metadata set before the upload is sent together with the blob content.
        blob.Metadata.Add("year", year);
        blob.Metadata.Add("month", month);
        blob.Metadata.Add("day", day);
        blob.UploadFromStream(fileStream);

        // Re-read properties and metadata from the service to confirm they were stored.
        blob.FetchAttributes();
        var yearFromBlob = blob.Metadata.FirstOrDefault(x => x.Key == "year").Value;
        var monthFromBlob = blob.Metadata.FirstOrDefault(x => x.Key == "month").Value;
        var dayFromBlob = blob.Metadata.FirstOrDefault(x => x.Key == "day").Value;

        Console.WriteLine($"{blob.Name} ({yearFromBlob}-{monthFromBlob}-{dayFromBlob})");
    }

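If you just want to add metadata to an existing blob, you don't need blob.UploadFromStream(fileStream). Call blob.FetchAttributes() to load its current metadata, add your entries to blob.Metadata, and then call blob.SetMetadata(). SetMetadata replaces the blob's entire metadata set, so skipping FetchAttributes would drop any metadata that is already there.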

When you create a new index for blobs in Azure Search, we automatically detect these fields. If you already have an Azure Search index, you can add new fields whose names match the metadata keys. All changes will be synchronized with the next re-indexing.
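Once the metadata lands in index fields, you can filter on it. The sketch below covers two pieces. The first is field definitions named after the metadata keys from the upload example, typed Edm.String because blob metadata values are strings. The second is a helper that builds an OData $filter expression over those fields. The helper name and the facetable/filterable choices are illustrative assumptions.

```javascript
// Sketch: index fields named after the metadata keys, plus an OData $filter
// over them. Key names follow the upload example; the rest is illustrative.
const METADATA_KEYS = ["year", "month", "day"];

// Blob metadata values are strings, so the fields are Edm.String.
const metadataFields = METADATA_KEYS.map((name) => ({
  name,
  type: "Edm.String",
  retrievable: true,
  filterable: true,
  facetable: true,
}));

// OData string literals are single-quoted; embedded quotes are doubled.
const odataString = (value) => `'${String(value).replace(/'/g, "''")}'`;

function buildMetadataFilter(criteria) {
  return METADATA_KEYS
    .filter((key) => criteria[key] !== undefined)
    .map((key) => `${key} eq ${odataString(criteria[key])}`)
    .join(" and ");
}

console.log(buildMetadataFilter({ year: "2017", month: "01" }));
// year eq '2017' and month eq '01'
```

Passing that string as the $filter query parameter (with search=*) returns only the invoices from January 2017.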


I am joining Cloud AI team to work on Azure Search

Azure Search

It has been over 3 years since I joined the Azure Portal team. During that time I learned a lot about every aspect of web and mobile development. I delivered over 20 technical talks at conferences around the world and at local meetups. It was amazing to take the new Portal from preview to v1. In the meantime, during the //oneweek hackathon, a few other folks and I built a prototype of the Azure Mobile App. After getting feedback from Scott Guthrie, who said that “it would be super useful”, I started working on the app overnight.

I didn’t know much about mobile development at the time, but I wanted to learn. I didn’t know much about the complexities of Active Directory authentication and Azure Resource Manager APIs. I just knew that it would be super cool to have an app that let me check the status of my Azure resources while waiting for my lunch. Receiving a push notification and being able to scale a VM from my phone would also be tremendously valuable.

When I started working on the app full time, my dream came true. I could truly connect my passion with my work. I enjoyed the long hours and late nights we all put in to make it happen. The day Scott Hanselman presented the Azure App at the //build conference was one of the best days of my life.

Now that the Azure App is released and backed by a great team, I can move on to the next challenge.

Machine learning is becoming part of every aspect of our lives. Over the last few years, ML crossed the threshold it needed to become extremely useful. I always wanted to be part of it. I took a great Coursera class by Andrew Ng, started an overnight project, StockEstimator, and got involved in SeeingAI to learn what real-world machine learning looks like.

Now, I’m taking it to the next level. I am joining the Azure Search team to lead its user experience. I will be responsible for bringing the product to customers. While using my existing web development knowledge, I will have an amazing opportunity to learn more about Big Data, AI and ML.

Azure Search is a managed cloud search service that offers scalable full-text search over multiple languages, geo-spatial search, filtering and faceted navigation, type-ahead queries, hit highlighting, and custom analyzers. You can find more details in this talk by Pablo Castro (Azure Search manager and creator of the Open Data Protocol).

The cool thing about working for Microsoft is that you may end up working with one of the people who created the HTTP protocol. Henrik Frystyk Nielsen, a former student of Tim Berners-Lee who shared an office with Håkon Wium Lie (creator of CSS), joined my new team this month. What’s even cooler, he is sitting next to me 🙂

In my new office with Henrik:

Henrik Frystyk Nielsen and Jacob Jedryszek

If you want to learn more about all the cool stuff we are doing in the Cloud AI group, there is an awesome .NET Rocks podcast with Joseph Sirosh. Check it out!

There is also an awesome talk by Joseph from the last Connect(); conference. It includes the JFK files demo presented by Corom Thompson from my team (creator of How-Old.NET). In that demo Corom shows how you can use Azure Search and Cognitive Services to explore the JFK files. Super cool! You can see the demo in the video below and the code on GitHub.

There has never been a better time to work at the intersection of Cloud and Artificial Intelligence!


Azure Resource Manager Batch API

The latest Azure Mobile App update shows statuses on the resource list:

Azure App - Statuses on resources list

You probably want to ask why we didn’t have them before. Great question! Currently Azure Resource Manager (the public API we use to get your Azure resources) requires a separate call to get each resource’s status. That means that if you have 100-200 resources, you have to make 100-200 extra calls. Some people have almost 2000 resources in one subscription! Taking performance and data usage into consideration, this is not ideal.

Both iOS and Android let us mitigate this problem to some extent by querying statuses only for the resources that are currently visible. However, this still means 5-10 extra calls. It gets worse when you start scrolling, and very bad if you scroll through a list of 2000 resources.

Batch API

Some time ago ARM added a Batch API: you can send a POST request with up to 20 URIs in the body. The response contains up to 20 packaged responses that you have to extract. Using the Batch API, you can decrease the number of requests by up to 20x. This matters especially when a user has a lot of resources and keeps scrolling the list.
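To put the 20x figure in numbers: the 2000-resource subscription mentioned above drops from 2000 status calls to 100 batch calls. The sketch below shows the chunking and one possible body per batch. The body shape (requests, with httpMethod and url on each entry) is an assumption modeled on what the Azure Portal sends, so check it against the ARM documentation. The resource ids are fake and omit the api-version a real URL needs.

```javascript
const MAX_BATCH_SIZE = 20; // limit stated above

// Split a list into consecutive chunks of at most `size` items.
function chunk(items, size = MAX_BATCH_SIZE) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) batches.push(items.slice(i, i + size));
  return batches;
}

// ASSUMED body shape for one batch call; verify field names against ARM docs.
function toBatchBody(urls) {
  return { requests: urls.map((url) => ({ httpMethod: "GET", url })) };
}

// Fake resource ids, for illustration only.
const resourceIds = Array.from(
  { length: 2000 },
  (_, i) => `/subscriptions/sub/resourceGroups/rg/providers/Microsoft.Web/sites/site${i}`
);
const batches = chunk(resourceIds);
console.log(`${resourceIds.length} resources -> ${batches.length} batch requests`);
// 2000 resources -> 100 batch requests
```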

When implementing batch requests, you need to figure out the optimal interval for sending them. We started with 200ms, but then changed it to 50ms. Additionally, every time a new request comes in, we delay sending the batch request by another 50ms. On its own, this could delay sending indefinitely. To solve this, we always submit the batch as soon as the queue has 20 or more pending requests. But 20*50ms = 1000ms = 1s, which is a long time! We tweaked it again and changed the interval to 20ms. With the current implementation, we wait anywhere between 20ms and 400ms (20*20ms) to send a batch request.
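The dispatch policy above (a short timer that every new request restarts, plus an immediate send once 20 requests are queued) can be sketched as a small debounced queue. This is an illustration of the policy only. BatchQueue and sendBatch are hypothetical names, not the classes in the sample app, which is written in C#.

```javascript
// Sketch of the dispatch policy described above: each new request restarts a
// short timer, and a full batch is sent immediately. Names are hypothetical.
class BatchQueue {
  constructor(sendBatch, { intervalMs = 20, maxBatchSize = 20 } = {}) {
    this.sendBatch = sendBatch; // called with an array of queued items
    this.intervalMs = intervalMs;
    this.maxBatchSize = maxBatchSize;
    this.pending = [];
    this.timer = null;
  }

  enqueue(item) {
    this.pending.push(item);
    if (this.pending.length >= this.maxBatchSize) {
      this.flush(); // bound the delay: a full batch never waits for the timer
      return;
    }
    // Debounce: every new request pushes the send back by intervalMs.
    clearTimeout(this.timer);
    this.timer = setTimeout(() => this.flush(), this.intervalMs);
  }

  flush() {
    clearTimeout(this.timer);
    this.timer = null;
    if (this.pending.length === 0) return;
    this.sendBatch(this.pending.splice(0, this.maxBatchSize));
  }
}
```

For example, enqueuing 45 requests in a tight loop sends two full batches of 20 right away, and the remaining 5 go out 20ms after the last one arrives. Because a full batch is sent at once, the worst case is roughly 20 arrivals spaced just under 20ms apart, which is where the 400ms upper bound comes from.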

Implementing Batch API

You are probably going to ask: “It all sounds great, but how do I implement it?” For your convenience I created a small console application that demonstrates the ARM Batch API in action, and I put it on GitHub.

Xamarin.iOS and Xamarin.Android do not have System.Threading.Timer, so we created our own implementation, OneShotTimer (thanks William Moy!).

All the magic happens in ArmService. Its only public method, GetResource, adds the request to a ConcurrentQueue instead of sending a GET request directly. OneShotTimer and the BatchRequestDipatcher method are responsible for sending the actual HTTP request.

To run the console app, you need to provide an ARM token and, optionally, the resource ids you want to request. In the demo app I provided fake resource ids. They are fine for issuing requests, but you will not get any resources back.

To get an ARM token, go to the Azure Portal, open the F12 developer tools and inspect an ARM request. From the request headers, copy the Authorization header (a string starting with Bearer rAnDoMcHaRacTErS...):

Azure Portal - ARM token

You can also get resource ids from the F12 tools. The best way is to go to the All resources blade and find a batch request:

Azure Portal - resources ids

Once you paste the resource ids and ArmToken into Program.cs, you can run the app, and you should see the following output:

Batch requests with 5s randomness

Requests are sent at random times, anywhere from 0 to 5s after the program starts. This is done using Task.Delay:

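    // Share one Random: new Random() created in a tight loop can reuse the same
    // time-based seed on .NET Framework, giving every request the same delay.
    var random = new Random();
    // Note: continuations run on thread-pool threads, so resources should be a
    // thread-safe collection (e.g. ConcurrentBag).
    var tasks = _resourceIds.Select(async resourceId =>
    {
        await Task.Delay(random.Next(5000));   // simulate calling GetResource from different parts of UI
        var response = await _armService.GetResource(resourceId);
        resources.Add(response);
    });
    await Task.WhenAll(tasks);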

When you change the randomness from 5s to 0.5s, you can observe fewer batch requests (that is, more requests sent in a single batch):

Batch requests with 0.5s randomness

Summary

Using the Batch API to get resource statuses visibly improves performance in the mobile app. It is especially noticeable when using mobile data.

The Azure Resource Manager team plans to add an API that returns multiple resources with their statuses in a single request. This should improve performance even more in the future.

If you are facing similar problem with your app, consider implementing Batch API on your server!


Azure Mobile App on Azure Friday

Last month I had the pleasure of talking about the Azure App with Scott Hanselman on his show Azure Friday.

In this talk I presented the Azure App architecture, our CI/CD infrastructure, and how we took advantage of the various benefits of building the app with Xamarin. Check it out!

If you want to learn more details about how we built the Azure App, check out my blog post Under the hood of the Azure Mobile App.

My talk was pretty technical, deep into the meat. However, the same week, Michael Flanakin gave a high-level overview of the Azure App and our future plans:

Have you tried Azure App yet?

It is available on iOS and Android.

We would love to hear your feedback through our User Voice!