Building Cloud Search as a Service with AI
It's been almost a year since I joined the Azure Search team. A lot has changed since then. I joined right after the team doubled in size by merging with the Text Analytics team, with a mission to add intelligence to search. A few months later the entire Cognitive Services (Azure Machine Learning APIs) platform team joined us. Then we hired additional developers to build a scalable platform for both Cognitive Services and Azure Search. After that we also got a team of data scientists who are building the actual machine learning models. Now, as the Applied AI team, we are at the center of AI and Cloud at Microsoft.
Azure Search is a search-as-a-service cloud solution that gives developers APIs and tools for adding a rich search experience to web and mobile applications. You get things like autocomplete, suggestions, synonyms, results highlighting, facets, filters, sorting and paging for free. Functionality is exposed through a REST API or the .NET SDK. The biggest pain, infrastructure and availability, is managed by us.
On top of all of that, we also need a great developer experience. Everybody needs to be able to understand how to build that Search AI pipeline without spending hours reading docs. This is another thing we are working on. Email me or message me on Twitter if you are interested in that kind of stuff.
Where are we going?
We want to build the best Search as a Service platform, one that enables developers to add a Bing-like or Google-like search experience to their websites. No need to hire search experts who know what an inverted index is. No challenges with shard allocation or with implementing master election properly. No need for distributed systems expertise to scale this to large amounts of data. Last, but not least: no need to set up, own and manage the infrastructure. Everything is taken care of by the platform. By the Cloud.
Our team is also working on market-leading Machine Learning APIs. We are going to utilize these ML models and enable you to search through not only text, but also through your images, audio and videos.
There are a lot of challenges on that journey: processing large amounts of data, doing it in a reasonable time (performance/parallelization), and providing an efficient user experience throughout the process.
Where are we now?
We already have a fast, reliable and production-ready system for full-text search. You can provision it in no time, scale it by adding more replicas or partitions, and monitor it using the metrics we provide. You can query it with the .NET SDK or the REST API. We even have an open source UI generation tool that gets you started with the latter: AzSearch.js.
To learn more about the current capabilities of Azure Search, check out this awesome presentation by Bryan Soltis:
There are two ways to populate your search index: by simply inserting documents (records) into it, or by using an indexer, a mechanism that enables you to sync your search index with your data source (SQL or NoSQL database, blob storage, etc.).
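To give a feel for querying over REST, here is a minimal Python sketch that only builds the request and does not send it. The service name, index name, key and the default api-version are placeholders I picked for illustration, so check the REST API reference for the version your service supports.

```python
from urllib.parse import urlencode


def build_search_request(service, index, api_key, search_text, top=10,
                         api_version="2017-11-11"):
    """Build the URL and headers for a simple full-text query.

    Nothing is sent over the network. The api_version default is an
    assumption; use the version your service actually supports.
    """
    params = {"api-version": api_version, "search": search_text, "$top": top}
    # Keep "$" unescaped so the OData-style "$top" parameter stays readable.
    query_string = urlencode(params, safe="$")
    url = "https://{0}.search.windows.net/indexes/{1}/docs?{2}".format(
        service, index, query_string)
    headers = {"api-key": api_key}
    return url, headers


# Hypothetical service, index and key, for illustration only.
url, headers = build_search_request(
    "demo-service", "players", "<query-key>", "real madrid", top=5)
```

You would pass the resulting URL and headers to the HTTP client of your choice as a GET request.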
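The first option, inserting documents directly, boils down to posting a batch of documents to the index. Below is a rough Python sketch of how such a request could be assembled. The service name, index name, document fields and the api-version default are all made up for illustration, and the request is only built, never sent.

```python
import json
from urllib.request import Request


def build_upload_request(service, index, admin_key, documents,
                         api_version="2017-11-11"):
    """Build (but do not send) a POST request that uploads documents.

    Each document is tagged with the "upload" action and wrapped in a
    "value" array. The api_version default is an assumption.
    """
    batch = {
        "value": [{"@search.action": "upload", **doc} for doc in documents]
    }
    url = "https://{0}.search.windows.net/indexes/{1}/docs/index?api-version={2}".format(
        service, index, api_version)
    return Request(
        url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": admin_key},
        method="POST",
    )


# Hypothetical index and fields, for illustration only.
request = build_upload_request(
    "demo-service", "players", "<admin-key>",
    [{"id": "1", "name": "Example Player"}])
```

Sending it is a matter of passing the request to urllib.request.urlopen. With an indexer you skip this step entirely, because the service pulls the data from your data source for you.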
We have already started adding AI to our search pipeline by enabling you to run text analytics and OCR on your data. If you are using an indexer, you can create a skillset, which can detect people, entities, organizations, locations, key phrases, and language in textual data. On top of that you can use OCR to recognize text in your images and search through that text. You can also run the text analytics mentioned above on the recognized text. We call this approach Cognitive Search. Here is a quick video by Brian and Corom from our team, with a sneak peek of what's possible.
Last year we created a prototype of Cognitive Search, using the JFK files that went public. You can check out our JFK files website, the GitHub repo and the video below from the Connect(); conference in 2017, where Corom explains how he built a pipeline to achieve what is now possible by just checking a checkbox:
We announced Cognitive Search at the //build conference earlier this year. Together with the NBA we built a website that allows you to search through players' photos. You can search for players, their shoes or correlations between them:
A similar approach can be used for a variety of different scenarios, from filtering your family photos, through analyzing medical records, to deciding which crypto-currency to buy. Now, all those PDF and DOC files you have on your hard drive can be used to make an informed business decision.
There are a lot of companies using Azure Search in production. It's super exciting for me that Real Madrid is using Azure Search. It's been my favorite football club since I was a kid.
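As a rough illustration of the skillset idea, here is a Python sketch of what a skillset definition might look like as JSON: OCR on extracted images, language detection on the document text, and key phrase extraction on the text that OCR produced. The skill type names and field paths are written from my reading of the preview docs and may have changed, and the target names (ocrText, ocrKeyPhrases) are hypothetical, so treat this as a sketch rather than a reference.

```python
import json


def build_skillset(description="OCR + text analytics (illustrative)"):
    """Return a skillset definition as a plain dict.

    Skill type names and paths are assumptions based on the preview
    documentation; verify them against the current API reference.
    """
    images = "/document/normalized_images/*"
    return {
        "description": description,
        "skills": [
            {   # Recognize text inside each extracted image.
                "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
                "context": images,
                "inputs": [{"name": "image", "source": images}],
                "outputs": [{"name": "text", "targetName": "ocrText"}],
            },
            {   # Detect the language of the document's own text.
                "@odata.type": "#Microsoft.Skills.Text.LanguageDetectionSkill",
                "context": "/document",
                "inputs": [{"name": "text", "source": "/document/content"}],
                "outputs": [{"name": "languageCode",
                             "targetName": "languageCode"}],
            },
            {   # Run text analytics on the text recognized by OCR.
                "@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
                "context": images,
                "inputs": [{"name": "text", "source": images + "/ocrText"}],
                "outputs": [{"name": "keyPhrases",
                             "targetName": "ocrKeyPhrases"}],
            },
        ],
    }


skillset = build_skillset()
skillset_json = json.dumps(skillset, indent=2)
```

The interesting part is the chaining: the key phrase skill reads the output of the OCR skill, which is how text analytics ends up running on text recognized from images.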
How's the team?
My favorite thing about our team is the people. Every single person brings something different to the table, and there is something you can learn from each one of them: from distributed systems expertise, through API design, to building efficient monitoring infrastructure that enables us to maintain a production cloud service. One of our team members is Henrik Frystyk Nielsen, who is best known for his pioneering work on the World Wide Web and subsequent work on computer network protocols. Currently he works on encapsulating Machine Learning models into containers. Our manager, Pablo Castro, started not only Azure Search, but also the OData protocol and LINQ to Entities. Our Project Manager Lance Olson was one of the founders of .NET!
You can check out what people say about our team on Blind! Search for "Azure Search" ;) There is also a blog post written by Pablo a few years ago: Startup at Microsoft. A lot has changed since then. We went through a few rounds of "funding", and our team grew. However, we still believe in the core values expressed there. For example: every engineer on the team still talks to customers on a daily basis, either through social media or directly over email or Skype.
BTW: We are hiring!