Add custom metadata to Azure blob storage files and search them with Azure Search

Did you know that you can add custom metadata to your blob containers, and even to individual blob files?

You can do it in the Azure Portal, with the SDK, or through the REST API.

The most common scenario is adding metadata during file upload. The code below uploads a sample invoice from disk and adds year, month, and day metadata properties.

// Namespaces from the WindowsAzure.Storage NuGet package.
using System;
using System.Linq;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

const string StorageAccountName = "";
const string AccountKey = "";
const string ContainerName = "";

string ConnectionString = $"DefaultEndpointsProtocol=https;AccountName={StorageAccountName};AccountKey={AccountKey};";
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(ConnectionString);
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = blobClient.GetContainerReference(ContainerName);

const string FileName = "Invoice_2017_01_01";
using (var fileStream = System.IO.File.OpenRead($@"D:\dev\BlobMetadataSample\invoices\{FileName}.pdf"))
{
    // The file name has the form Invoice_<year>_<month>_<day>.
    var fileNameParts = FileName.Split('_');
    var year = fileNameParts[1];
    var month = fileNameParts[2];
    var day = fileNameParts[3];

    // Metadata set before the upload is stored together with the blob.
    var blob = container.GetBlockBlobReference(FileName);
    blob.Metadata.Add("year", year);
    blob.Metadata.Add("month", month);
    blob.Metadata.Add("day", day);
    blob.UploadFromStream(fileStream);

    // Read the values back to confirm what was attached.
    var yearFromBlob = blob.Metadata.FirstOrDefault(x => x.Key == "year").Value;
    var monthFromBlob = blob.Metadata.FirstOrDefault(x => x.Key == "month").Value;
    var dayFromBlob = blob.Metadata.FirstOrDefault(x => x.Key == "day").Value;

    Console.WriteLine($"{blob.Name} ({yearFromBlob}-{monthFromBlob}-{dayFromBlob})");
}

If you just want to add metadata to an existing blob, you can call blob.SetMetadata() instead of blob.UploadFromStream(fileStream).

When you create a new index for blob storage in Azure Search, we will automatically detect these fields. If you already have an Azure Search index, you can add new fields (the field names have to match the metadata keys), and all changes will be synchronized with the next re-indexing.
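The file-name parsing from the upload sample can also be pulled into a small helper that validates the name before any metadata is attached. This is only a sketch: the InvoiceMetadata class and its FromFileName method are hypothetical names, and it assumes every file follows the Invoice_yyyy_MM_dd pattern used above. It does not touch the storage SDK at all.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class InvoiceMetadata
{
    // Hypothetical helper: turns "Invoice_2017_01_01" into year/month/day pairs.
    public static IDictionary<string, string> FromFileName(string fileName)
    {
        var parts = fileName.Split('_');
        if (parts.Length != 4)
        {
            throw new FormatException($"Expected Prefix_yyyy_MM_dd, got '{fileName}'.");
        }

        // Reject names whose trailing parts do not form a real calendar date.
        var datePart = $"{parts[1]}_{parts[2]}_{parts[3]}";
        if (!DateTime.TryParseExact(datePart, "yyyy_MM_dd", CultureInfo.InvariantCulture,
                                    DateTimeStyles.None, out _))
        {
            throw new FormatException($"'{datePart}' is not a valid date.");
        }

        return new Dictionary<string, string>
        {
            ["year"] = parts[1],
            ["month"] = parts[2],
            ["day"] = parts[3],
        };
    }

    public static void Main()
    {
        foreach (var pair in FromFileName("Invoice_2017_01_01"))
        {
            Console.WriteLine($"{pair.Key} = {pair.Value}");
        }
    }
}
```

Each returned pair can then be copied onto the blob with blob.Metadata[pair.Key] = pair.Value before uploading or calling blob.SetMetadata().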

I am joining the Cloud AI team to work on Azure Search

Azure Search

It has been over 3 years since I joined the Azure Portal team. During that time I learned a lot about every aspect of web and mobile development. I delivered over 20 technical talks at different conferences around the world and at local meetups. It was amazing to take the new Portal from preview to v1. In the meantime, during the //oneweek hackathon, together with a few other folks, we built a prototype of the Azure Mobile App. After getting feedback from Scott Guthrie, who said that “it would be super useful”, I started working on the app overnight.

I didn’t know much about mobile development at the time, but I wanted to learn. I didn’t know much about the complexities of Active Directory authentication and Azure Resource Manager APIs. I just knew that it would be super cool to have an app that would allow me to check the status of my Azure resources while waiting for my lunch. Receiving a push notification and being able to scale a VM from my phone would also be tremendously valuable.

When I started working on the app full time, my dream came true. I could truly connect my passion with work. I enjoyed the long hours and late nights we all put in to make it happen. The day when Scott Hanselman presented the Azure App at the //build conference was one of the best days of my life.

Now that the Azure App is released and backed by a great team, I can move on to the next challenge.

Machine learning is becoming part of every aspect of our lives. Over the last few years, ML crossed the threshold necessary to be extremely useful. I always wanted to be part of it. I took a great Coursera class by Andrew Ng, I started an overnight project, StockEstimator, and I got involved in SeeingAI to learn what real-world machine learning looks like.

Now, I’m taking it to the next level. I am joining the Azure Search team to lead their User Experience. I will be responsible for bringing the product to customers. While using my existing web development knowledge, I will have an amazing opportunity to learn more about Big Data, AI and ML.

Azure Search is a managed cloud search service that offers scalable full-text search over multiple languages, geo-spatial search, filtering and faceted navigation, type-ahead queries, hit highlighting, and custom analyzers. You can find more details in this talk by Pablo Castro (Azure Search manager and creator of Open Data Protocol).

The cool thing about working for Microsoft is that you may end up working with one of the people who created the HTTP protocol. Henrik Frystyk Nielsen, a former student of Tim Berners-Lee who shared an office with Håkon Wium Lie (creator of CSS), joined my new team this month. What’s even cooler, he is sitting next to me 🙂

In my new office with Henrik:

Henrik Frystyk Nielsen and Jacob Jedryszek

If you want to learn more about all the cool stuff we are doing in the Cloud AI group, there is an awesome .NET Rocks Podcast with Joseph Sirosh. Check it out!

There is also an awesome talk by Joseph from the last Connect(); conference, which includes the JFK files demo presented by Corom Thompson from my team (creator of How-Old.NET). In that demo Corom showcases how you can use Azure Search and Cognitive Services to explore the JFK files. Super cool! You can see the demo in the video below, and the code on GitHub.

There has never been a better time to work at the intersection of Cloud and Artificial Intelligence!

WordPress on Azure: Exceeded ClearDB size = lock on INSERT/UPDATE (not able to log in to the admin panel)

My WordPress blog is hosted on Windows Azure, and I am using the only MySQL provider that is available on Azure: ClearDB.

Yesterday I couldn’t log in to the admin panel. I had no idea what was going on, because the blog itself was working. I was googling for a cause or solution, checking the Azure logs and the monitoring on the Azure Portal, and accidentally noticed that I had exceeded the ClearDB quota (20 MB). I did not receive any notification from ClearDB though. What is important: if you exceed this limit, they block INSERT and UPDATE operations. My guess is that WordPress tries to INSERT/UPDATE something in the database when you log in. That’s why I couldn’t log in.

I did not want to upgrade from the free instance to $9.99/month (the cheapest upgrade option). Fortunately I was able to connect to the database using MySQL Workbench and optimize it.

I removed post revisions:

DELETE FROM wp_posts WHERE post_type = 'revision';

And transients:

DELETE FROM wp_options WHERE option_name LIKE ('%\_transient\_%');

This allowed me to save a lot of space. From 20.28 MB, the database size went to 10.42 MB (transients occupied almost 8MB!):


ClearDB quota

After I did that, I was able to log in. However, the INSERT/UPDATE lock is not revoked immediately. I had to wait somewhere between 10 minutes and 2 hours. I went to the swimming pool in the meantime, so I am not sure exactly how long it took.

A useful SQL command to check your database size:

SELECT ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) "Size in MB"
FROM information_schema.TABLES
WHERE table_schema = '$DB_NAME';

You can also check each table size:

SELECT table_name AS "Tables",
round(((data_length + index_length) / 1024 / 1024), 2) "Size in MB"
FROM information_schema.TABLES
WHERE table_schema = '$DB_NAME'
ORDER BY (data_length + index_length) DESC;

For the future, I pinned the database size tile to my Azure Portal dashboard. Now I will see it every time I visit the portal. I also limited the number of post/page revisions to 2 by editing the wp-config.php file and inserting this line:

define('WP_POST_REVISIONS', 2);

This should be enough to stay under the quota for some time, but I will need a permanent solution. I am thinking about hosting my own MySQL database on a Linux VM on Azure (cost: $13+).

During the troubleshooting I found a very good blog post by John Papa: Tips for WordPress on Azure. I recommend checking it out if you have a WordPress blog on Azure. The article will help you optimize your WordPress database as well.

EDIT: The plugin Optimize Database after Deleting Revisions lets you clean up the database even more efficiently. I managed to slim my DB down by more than half again, to 4.11 MB, which gives an almost 80% size decrease from the original 20+ MB.

ClearDB quota with plugin

What is even cooler about this plugin is that you can create a schedule to run it automatically (Settings -> Optimize DB options):

Optimize Database after Deleting Revisions - Options
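The size reductions quoted in this post can be double-checked with a few lines of arithmetic. This is a minimal sketch: the three sizes are the ones reported above (20.28 MB, 10.42 MB, and 4.11 MB), and the SizeReduction class and PercentDecrease method are hypothetical names used only for illustration.

```csharp
using System;
using System.Globalization;

public static class SizeReduction
{
    // Percentage by which a size shrank, relative to the starting size.
    public static double PercentDecrease(double beforeMb, double afterMb)
    {
        return (beforeMb - afterMb) / beforeMb * 100.0;
    }

    public static void Main()
    {
        const double original = 20.28;      // before any cleanup
        const double afterManual = 10.42;   // after deleting revisions and transients
        const double afterPlugin = 4.11;    // after running the plugin

        Print("Manual cleanup", PercentDecrease(original, afterManual));
        Print("Plugin on top of manual cleanup", PercentDecrease(afterManual, afterPlugin));
        Print("Overall", PercentDecrease(original, afterPlugin));
    }

    private static void Print(string label, double percent)
    {
        // Invariant culture keeps the decimal separator stable across machines.
        Console.WriteLine(string.Format(CultureInfo.InvariantCulture, "{0}: {1:F1}%", label, percent));
    }
}
```

The overall figure comes out just under 80%, which matches the estimate in the text.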

Running the greatest VM on Azure

Recently Microsoft Azure introduced the New D-Series Virtual Machine Sizes. The “greatest” available VM has 16 cores and 112 GB RAM. In my imagination it looks like this:

super PC

I thought it would be cool to create one, and play with it for a while. Not for a month, because that would cost almost $1000 (~700-800 EURO):

Azure VMs pricing

However, what is cool about Azure is that you can scale a VM down when you are not using it. Even to the cheapest option – A0 Basic (~10 EURO / month). And using D14 for an hour costs only ~1 Euro.

When I was wondering which OS to install on the VM, I found that Azure already offers a Windows 10 preview VM:
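To get a feeling for what scaling down saves, here is a rough estimate in code. It is a sketch under stated assumptions: the rates are the approximate figures from this post (about 1 Euro per hour for D14, about 10 Euro per month for A0 Basic), a month is taken as 730 hours, and the VmCostEstimate class is a hypothetical name. Real prices vary by region and change over time.

```csharp
using System;
using System.Globalization;

public static class VmCostEstimate
{
    // Assumption: average number of hours in a month.
    private const double HoursPerMonth = 730.0;

    // Estimated monthly cost when the VM runs as D14 for some hours
    // and is scaled down to A0 Basic for the rest of the month.
    public static double MonthlyCostEur(double hoursOnD14,
                                        double d14EurPerHour = 1.0,
                                        double a0EurPerMonth = 10.0)
    {
        if (hoursOnD14 < 0 || hoursOnD14 > HoursPerMonth)
        {
            throw new ArgumentOutOfRangeException(nameof(hoursOnD14));
        }

        var a0EurPerHour = a0EurPerMonth / HoursPerMonth;
        var hoursOnA0 = HoursPerMonth - hoursOnD14;
        return hoursOnD14 * d14EurPerHour + hoursOnA0 * a0EurPerHour;
    }

    public static void Main()
    {
        foreach (var hours in new[] { 0.0, 20.0, 730.0 })
        {
            var cost = MonthlyCostEur(hours);
            Console.WriteLine(string.Format(CultureInfo.InvariantCulture,
                "{0} h on D14: ~{1:F2} EUR / month", hours, cost));
        }
    }
}
```

With these assumed rates, 20 hours of D14 per month lands at roughly 30 Euro, compared with about 730 Euro for keeping the big VM running all month.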

Azure Windows 10 VM


This is how it looks after installation:

Huge VM

Working on this VM was even better (faster) than on my PC at work (Xeon with 6 cores and 32GB RAM). To stress the VM I opened over 100 instances of Visual Studio:

100 Visual Studios on Azure VM

After opening 90 instances the VM slowed down. I opened 103 Visual Studios in total, and the VM didn’t crash.

The feeling of having the most powerful machine I have ever worked on is amazing, even though it is virtual. The most amazing thing is that it cost me only 1 Euro to play with it for an hour. I can get it in a few minutes, and get rid of it within seconds.

I use it from time to time as my playground, and scale it up or down according to my needs.

Later this year, the G-series of VMs will be available on Azure. The biggest in this series will be G5: 32 Cores + 448 GB RAM. That’s gonna be…awesome!


Azure tutorials to get started

Azure new portal

Recently I was exploring Microsoft Azure.

Here are the best resources I found, which I strongly recommend for getting familiar with this powerful cloud platform.

To play with Azure, get a FREE Trial ($200 to spend in 1 month) and enjoy!

If you are thinking about moving your blog to Azure, check my post Moving WordPress blog to Azure from Webio hosting.