Detecting a person's mood from an image with the Microsoft Emotion API

Introduction

In this post we are going to continue looking at Microsoft Cognitive Services. Much of the core code will be similar to the last blog post on OCR, but this time we are going to get emotion data from the faces in an image.

Sign up for Cognitive Services

Just like in the previous post on OCR, the Emotion API that we are going to use is part of Microsoft Cognitive Services. If you have not already, you will need to sign up to use these APIs here. We will be using the Emotion API, so sign up for that subscription. It is in preview, so it is free for 30,000 transactions a month; there are some other limits which you can read up on, but for our test it should be more than adequate. We will be using the keys from this subscription later on.

Get Mood from Emotion API

Before we call the Emotion API, we need to set up a data structure for the returned JSON. The data structures below should be pretty self-explanatory and will allow us to iterate through multiple faces later on.

public class ImageData
{
    public FaceRectangle faceRectangle { get; set; }
    public Scores scores { get; set; }
}

public class FaceRectangle
{
    public int left { get; set; }
    public int top { get; set; }
    public int width { get; set; }
    public int height { get; set; }
}

public class Scores
{
    public decimal anger { get; set; }
    public decimal contempt { get; set; }
    public decimal disgust { get; set; }
    public decimal fear { get; set; }
    public decimal happiness { get; set; }
    public decimal neutral { get; set; }
    public decimal sadness { get; set; }
    public decimal surprise { get; set; }
}

With our data structure in place we can now call the Emotion API at https://api.projectoxford.ai/emotion/v1.0/recognize. The rest of the code is the same as our previous OCR example, although this time we will deserialize to a list of ImageData objects.

public async Task<List<ImageData>> GetOCRData(string filename)
{
    // Requires: using System.Net.Http; using Newtonsoft.Json;
    var client = new HttpClient();
    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "5f6067eea83497fdfaa4ce");
    var uri = "https://api.projectoxford.ai/emotion/v1.0/recognize";
    HttpResponseMessage response;
    using (var content = new ByteArrayContent(GetBytesFromFilepath(filename)))
    {
        content.Headers.ContentType = 
            new System.Net.Http.Headers.MediaTypeHeaderValue("application/octet-stream");
        response = await client.PostAsync(uri, content).ConfigureAwait(false);
    }
    var data = JsonConvert.DeserializeObject<List<ImageData>>(
        await response.Content.ReadAsStringAsync());

    return data;
}
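
For reference, a successful call returns a JSON array with one entry per detected face, shaped roughly like this (the values are illustrative, not a real response):

[
    {
        "faceRectangle": { "left": 68, "top": 97, "width": 64, "height": 64 },
        "scores": {
            "anger": 0.003,
            "contempt": 0.001,
            "disgust": 0.002,
            "fear": 0.001,
            "happiness": 0.92,
            "neutral": 0.06,
            "sadness": 0.008,
            "surprise": 0.005
        }
    }
]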

We use a helper method, shown below, to get a byte array from the selected file path. Alternatively, we could have set the MediaTypeHeaderValue to "application/json" and sent a URL to a hosted image rather than the file bytes; a sketch of that variant follows the helper method.

private byte[] GetBytesFromFilepath(string filePath)
{
    // Load the image and re-encode it as a JPEG byte array for upload.
    using (Image img = Image.FromFile(filePath))
    using (var stream = new MemoryStream())
    {
        img.Save(stream, System.Drawing.Imaging.ImageFormat.Jpeg);
        return stream.ToArray();
    }
}
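
As noted above, we could instead send a URL to a hosted image rather than uploading the bytes. The sketch below is a hypothetical variant, not part of my original code; it assumes the endpoint accepts a JSON body of the form { "url": "..." }, which is how the preview Cognitive Services endpoints took image URLs.

public async Task<List<ImageData>> GetEmotionDataFromUrl(string imageUrl)
{
    // Requires: using System.Text; the { "url": ... } payload shape is an assumption.
    var client = new HttpClient();
    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "your-subscription-key");
    var uri = "https://api.projectoxford.ai/emotion/v1.0/recognize";
    var json = JsonConvert.SerializeObject(new { url = imageUrl });
    using (var content = new StringContent(json, Encoding.UTF8, "application/json"))
    {
        var response = await client.PostAsync(uri, content).ConfigureAwait(false);
        return JsonConvert.DeserializeObject<List<ImageData>>(
            await response.Content.ReadAsStringAsync());
    }
}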

Sample App – The Mood Elevator

The data returned from the API is a set of decimal scores, one for each emotion, each between 0 and 1. The score closest to 1 indicates the detected emotion.

To display the result of the returned emotion, I have created a basic MVC application that allows a user to browse and upload a file. I have then created a MoodAnalysis class to determine the mood value based on a set of criteria that I have chosen; in my case, I have assigned different mood steps to happiness score bands (0.9, 0.8 and so on). A sketch of what that class might look like is shown below.
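
My exact score breakdown is omitted to keep things simple, so the class below is only a minimal sketch; the thresholds and the elevator step names are illustrative assumptions, not the values I actually used.

public class MoodAnalysis
{
    // Map the returned scores to a step on the "mood elevator".
    // Thresholds are illustrative; tune them to your own needs.
    public int GetMood(Scores s)
    {
        if (s.happiness >= 0.9m) return 10;
        if (s.happiness >= 0.7m) return 8;
        if (s.surprise >= 0.5m) return 6;
        if (s.neutral >= 0.5m) return 5;
        if (s.sadness >= 0.5m) return 3;
        if (s.anger >= 0.5m || s.disgust >= 0.5m) return 1;
        return 5; // default to a neutral step
    }

    // Illustrative step names; substitute your own labels.
    public string ConvertStepToElevatorName(int step)
    {
        if (step >= 9) return "Grateful";
        if (step >= 7) return "Hopeful";
        if (step >= 5) return "Curious";
        if (step >= 3) return "Worried";
        return "Angry";
    }
}

The controller action below then wires the file upload to the Emotion API call and this analysis.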

public async Task<ActionResult> Index(HttpPostedFileBase file)
{
    string guid = Guid.NewGuid().ToString() + ".jpg";
    if (file != null && file.ContentLength > 0)
    {
        var fileName = Path.GetFileName(file.FileName);
        var path = Path.Combine(Server.MapPath("~/Content/upload"), guid);
        file.SaveAs(path);
        PhotoReader pr = new PhotoReader();
        var rtn = await pr.GetOCRData(path);
        MoodAnalysis ma = new MoodAnalysis();
        var face = rtn.FirstOrDefault(); // guard against images where no face was detected
        if (face != null)
        {
            int elScore = ma.GetMood(face.scores);
            ViewBag.EmotionScore = elScore;
            ViewBag.EmotionMessage = ma.ConvertStepToElevatorName(elScore);
        }
    }
    
    return View();
}

Using this information I am then able to display a result on a chart, as shown in the example below:

[Image: the Mood Elevator chart showing an "Angry" result]

Conclusion

In this brief introduction to the Emotion API we have uploaded a file and retrieved the emotion scores for analysis. I have tried not to overcomplicate things by not showing my full breakdown of the score results, but with the returned decimal values you will be able to experiment based on your application's needs. If you have any questions or comments, please add them below.

Using Microsoft Cognitive Services to OCR business cards

Introduction

A couple of days ago I attended the AzureCraft event in London, hosted by Microsoft and the UK Azure User Group. It was a really good event, and one of the many demos in the keynote (which overran by an hour) briefly touched on the OCR capability in Microsoft Cognitive Services. Although that example was there to show off Azure Functions, I quickly jotted down a note to try out the OCR capability. So after spending yesterday losing at golf (it was only 1 point, so I demand a recount!), today I have tried creating an OCR application for the first time. In this post I am going to cover the code for a basic application that lets you get information from a photo of a business card.

A caveat I would like to add before getting into the code: this is my first run through with this technology and the code has not been optimised yet; I just thought I would get it all down while it was fresh in my mind.

Sign up for Cognitive Services

The OCR function we are going to use is part of Microsoft Cognitive Services. You will need to sign up to use these APIs here. We will be using the Computer Vision API, so sign up for that subscription. It is in preview, so it is free for 5,000 transactions a month. We will be using the keys from this subscription later on.

Get OCR data from Computer Vision API

Our first action is to send a file to the service and get the JSON response with the OCR data. For this we create a GetOCRData method that takes the filename of our image file. We need to add some details to the DefaultRequestHeaders, including the subscription key that you get when signing up for the trial (don't try using mine, it won't work). There are several options for the content we send: if we just wanted to send a URL we could use application/json, but in our case we are going to push a local file to the service, so we need to use application/octet-stream.

Once we have posted our request, we can deserialize the return value. I used Newtonsoft.Json by installing it from NuGet, although there are alternatives that can be used.

public async Task<string> GetOCRData(string filename)
{
    var client = new HttpClient();
    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "8ecc1bf65ff1c4e831191");
    var uri = "https://api.projectoxford.ai/vision/v1.0/ocr";
    HttpResponseMessage response;
    using (var content = new ByteArrayContent(GetBytesFromFilepath(filename)))
    {
        content.Headers.ContentType =
            new System.Net.Http.Headers.MediaTypeHeaderValue("application/octet-stream");
        response = await client.PostAsync(uri, content).ConfigureAwait(false);
    }
    var x = JsonConvert.DeserializeObject(await response.Content.ReadAsStringAsync());

    return x.ToString();
}

Our code above uses the following helper method to get the bytes from a file:

private byte[] GetBytesFromFilepath(string filePath)
{
    // Load the image and re-encode it as a JPEG byte array for upload.
    using (Image img = Image.FromFile(filePath))
    using (var stream = new MemoryStream())
    {
        img.Save(stream, ImageFormat.Jpeg);
        return stream.ToArray();
    }
}

To allow me to test this easily I created a basic WPF application that contains a textbox, a button and a textarea. This lets me check the returned values as I test the app. I also have a series of unit tests, but I found the visual display to be a great help. With the code as it stands we get the response shown below.

[Image: the raw JSON response displayed in the OCR tester application]
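
For reference, the OCR endpoint returns JSON shaped roughly like the following (the values are illustrative; the fields match the classes we define next):

{
    "language": "en",
    "regions": [
        {
            "boundingBox": "28,36,460,114",
            "lines": [
                {
                    "boundingBox": "28,36,216,28",
                    "words": [
                        { "boundingBox": "28,36,92,28", "text": "John" },
                        { "boundingBox": "128,36,116,28", "text": "Smith" }
                    ]
                }
            ]
        }
    ]
}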

Interpreting the JSON

What we have so far just gives us the raw JSON response. We can at least see that the OCR is working, but it's not very user-friendly at this stage. To allow us to handle the returned data we are going to parse the JSON into an OCRData class. For this we need to create the OCRData class and several other classes to cover the Regions, Lines and Words which are returned by the web request.

public class OCRData
{
    public List<Region> regions { get; set; }
}

public class Region
{
    public string boundingBox { get; set; }
    public List<Line> lines { get; set; }
}

public class Line
{
    public string boundingBox { get; set; }
    public List<Word> words { get; set; }
}

public class Word
{
    public string boundingBox { get; set; }
    public string text { get; set; }
}

With these in place we can change our GetOCRData method to return a Task<OCRData> and switch JsonConvert.DeserializeObject to the generic overload that returns an OCRData instance, as shown in the sample code below.

public async Task<OCRData> GetOCRData(string filename)
{
    var client = new HttpClient();
    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "8ecc1bf6548a4bdf1c4e831191");
    var uri = "https://api.projectoxford.ai/vision/v1.0/ocr";
    HttpResponseMessage response;
    using (var content = new ByteArrayContent(GetBytesFromFilepath(filename)))
    {
        content.Headers.ContentType =
            new System.Net.Http.Headers.MediaTypeHeaderValue("application/octet-stream");
        response = await client.PostAsync(uri, content).ConfigureAwait(false);
    }
    var x = JsonConvert.DeserializeObject<OCRData>(await
        response.Content.ReadAsStringAsync());

    return x;
}
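
With the typed result in hand the data is easy to work with; for example, this small helper (not part of the original code) flattens the regions, lines and words back into plain text:

// Illustrative helper: flatten the typed OCR result back into plain text.
// Requires: using System.Linq; using System.Text;
private string FlattenText(OCRData data)
{
    var sb = new StringBuilder();
    foreach (var region in data.regions)
        foreach (var line in region.lines)
            sb.AppendLine(string.Join(" ", line.words.Select(w => w.text)));
    return sb.ToString();
}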

Populating a ContactCard object and Regex, Regex everywhere!

We now have an OCRData object being returned, but we still need to analyse this data and create a usable contact object from this information. To do this we create a ContactCard class that includes a number of properties you would probably find on a business card. I have also created an override of ToString to allow us to display the contact data easily in our OCR tester application.

public class ContactCard
{
    public string Name { get; set; }
    public string Company { get; set; }
    public string Position { get; set; }
    public string PhoneNo { get; set; }
    public string Email { get; set; }
    public string Website { get; set; }
    public string Facebook { get; set; }
    public string Twitter { get; set; }

    public override string ToString()
    {
        StringBuilder sb = new StringBuilder();
        sb.AppendLine("Name: " + Name);
        sb.AppendLine("Company: " + Company);
        sb.AppendLine("Position: " + Position);
        sb.AppendLine("Phone: " + PhoneNo);
        sb.AppendLine("Email: " + Email);
        sb.AppendLine("Website: " + Website);
        sb.AppendLine("Facebook: " + Facebook);
        sb.AppendLine("Twitter: " + Twitter);

        return sb.ToString();
    }
}

With the ContactCard object ready and our OCR data available, we can now expand our code. We create a new method called ReadBusinessCard which takes the filename of the image file. This method first calls our GetOCRData method and then creates a new ContactCard. For each property in ContactCard we then call a method to get the relevant data out of our OCRData object.

For the Name, Company and Position I have actually cheated and used fixed locations based on the sample cards I am using to test this app. The reason is that they are much harder to detect. My intention is to identify these using the LUIS natural language service, but that would overcomplicate this post, so it has been omitted. For those interested in the natural language service, my previous blog post on the Bot Framework has examples. A sketch of what these position-based helpers might look like is shown below.
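
These helpers are not shown in my original code, so here is a minimal, hypothetical sketch. It assumes the name, position and company are the first, second and third lines on the card, which holds for my sample cards but will not in general:

// Hypothetical helpers: assume fixed line positions on the sample cards.
// Requires: using System.Linq;
private string GetLineText(Region r, int index)
{
    if (r.lines.Count <= index) return "";
    return string.Join(" ", r.lines[index].words.Select(w => w.text));
}

private string GetName(Region r) { return GetLineText(r, 0); }     // first line
private string GetPosition(Region r) { return GetLineText(r, 1); } // second line
private string GetCompany(Region r) { return GetLineText(r, 2); }  // third line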

What I have done for this version is create a GetFromRegex method that allows us to send a pattern to the method and have a Regex check done on all the returned words. This then allows us to populate the contact card with phone numbers, emails, websites, Twitter handles and so on.

public async Task<ContactCard> ReadBusinessCard(string filename)
{
    OCRData data = await GetOCRData(filename).ConfigureAwait(false);
    ContactCard cc = new ContactCard();
    Region r = data.regions[0];
    cc.Name = GetName(r);
    cc.Company = GetCompany(r);
    cc.Position = GetPosition(r);
    cc.PhoneNo = GetFromRegex(r, @"^\d+$");
    cc.Email = GetFromRegex(r, @"^([a-z0-9_.-]+)@([\da-z.-]+)\.([a-z.]{2,6})$");
    cc.Website = GetFromRegex(r, @"^www\.", "facebook");
    cc.Facebook = GetFromRegex(r, @"^www\.facebook\.com");
    cc.Twitter = GetFromRegex(r, "^@");

    return cc;
}

private string GetFromRegex(Region r, string pattern, string notContains = null)
{
    foreach(Line l in r.lines)
    {
        foreach(Word w in l.words)
        {
            if (Regex.IsMatch(w.text, pattern, RegexOptions.IgnoreCase))
            {
                if (string.IsNullOrEmpty(notContains))
                    return w.text;
                else
                {
                    if (!w.text.Contains(notContains))
                        return w.text;
                }
            }
        }
    }
    return "";
}

Now I’m going to be completely honest, I don’t think the GetFromRegex method or my regex patterns are as good as they could be. However for the sake of this demonstration they allow me to show how we can break down the returned OCR text data into the sections of a contact card.
With the code ready now and getting the data, the next logical step would be to add this to a mobile app. However for the ease of the demo I have modified my OCR Tester application to display the returned ContactCard. From here we could easily add this information to a phones contact or other contact application, but I will leave that for today and look at that another time.

[Image: the populated ContactCard displayed in the OCR tester application]

Conclusion

This sample shows how powerful, and how simple to implement, OCR with the Cognitive Services can be. If you have any questions or suggestions for this post, please add them to the comments below.