Detecting a person's mood from an image with the Microsoft Emotion API

Introduction

In this post we are going to continue looking at the Microsoft Cognitive Services. Much of the core code will be similar to the last blog post on OCR, but this time we are going to get emotion data from the faces in an image.

Sign up for cognitive services

Just like in the previous post on OCR, the Emotion API that we are going to use is part of the Microsoft Cognitive Services. If you have not already, you will need to sign up to use these APIs here. We will be using the Emotion API, so sign up for that subscription. It is in preview, so it is free for 30,000 transactions a month; there are some other limits which you can read up on, but for our test it should be more than adequate. We will be using the keys from this subscription later on.

Get Mood from Emotion API

Before we call the Emotion API, we need to set up a data structure for the returned JSON. The data structures below should be pretty self-explanatory and will allow us to iterate through multiple faces later on.

public class ImageData
{
    public FaceRectangle faceRectangle { get; set; }
    public Scores scores { get; set; }
}

public class FaceRectangle
{
    public int left { get; set; }
    public int top { get; set; }
    public int width { get; set; }
    public int height { get; set; }
}

public class Scores
{
    public decimal anger { get; set; }
    public decimal contempt { get; set; }
    public decimal disgust { get; set; }
    public decimal fear { get; set; }
    public decimal happiness { get; set; }
    public decimal neutral { get; set; }
    public decimal sadness { get; set; }
    public decimal surprise { get; set; }
}

With our data structure in place we can now call the Emotion API at https://api.projectoxford.ai/emotion/v1.0/recognize. The rest of the code is the same as our previous OCR example, although this time we will deserialize to a list of ImageData objects.

public async Task<List<ImageData>> GetOCRData(string filename)
{
    var client = new HttpClient();
    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "5f6067eea83497fdfaa4ce");
    var uri = "https://api.projectoxford.ai/emotion/v1.0/recognize";
    HttpResponseMessage response;
    using (var content = new ByteArrayContent(GetBytesFromFilepath(filename)))
    {
        content.Headers.ContentType = 
            new System.Net.Http.Headers.MediaTypeHeaderValue("application/octet-stream");
        response = await client.PostAsync(uri, content).ConfigureAwait(false);
    }
    var data = JsonConvert.DeserializeObject<List<ImageData>>(
        await response.Content.ReadAsStringAsync());

    return data;
}

We added a helper method to get a byte array from a selected filepath. Alternatively, we could have set the MediaTypeHeaderValue to "application/json" and sent a uri to a hosted image rather than sending the file bytes; there is a sketch of that approach after the helper below.

private byte[] GetBytesFromFilepath(string filePath)
{
    // Load the image and re-save it as a JPEG byte array for the request body.
    using (Image img = Image.FromFile(filePath))
    using (var stream = new MemoryStream())
    {
        img.Save(stream, System.Drawing.Imaging.ImageFormat.Jpeg);
        return stream.ToArray();
    }
}
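For completeness, here is a minimal sketch of that alternative, posting a publicly reachable image URL instead of raw bytes. It assumes the endpoint accepts a JSON body of the form { "url": "..." }, and the method and variable names are purely illustrative.

public async Task<List<ImageData>> GetMoodFromUrl(string imageUrl)
{
    // Sketch only: requires Newtonsoft.Json and System.Text (for Encoding).
    var client = new HttpClient();
    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "your-subscription-key");
    var uri = "https://api.projectoxford.ai/emotion/v1.0/recognize";

    var json = JsonConvert.SerializeObject(new { url = imageUrl });
    using (var content = new StringContent(json, Encoding.UTF8, "application/json"))
    {
        var response = await client.PostAsync(uri, content).ConfigureAwait(false);
        return JsonConvert.DeserializeObject<List<ImageData>>(
            await response.Content.ReadAsStringAsync());
    }
}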

Sample App – The Mood Elevator

The data returned from the API is a set of decimal values, one per emotion, each between 0 and 1. The score closest to 1 indicates the detected emotion.

To display the result of the returned emotion, I have created a basic MVC application that allows a user to browse and upload a file. I have then created a MoodAnalysis class to determine the mood value based on a set of criteria that I have chosen. In my case, I have assigned different step values for happiness based on the decimal values 0.9, 0.8 and so on.
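A minimal sketch of what such a MoodAnalysis class could look like is shown below. The thresholds and step labels are illustrative assumptions rather than the exact values I settled on, but they show the shape of the two methods used by the controller.

public class MoodAnalysis
{
    // Map the returned scores to a step value; thresholds are illustrative only.
    public int GetMood(Scores scores)
    {
        if (scores.happiness >= 0.9m) return 10;
        if (scores.happiness >= 0.8m) return 9;
        if (scores.happiness >= 0.5m) return 8;
        if (scores.neutral >= 0.5m) return 6;
        if (scores.sadness >= 0.5m) return 3;
        if (scores.anger >= 0.5m) return 1;
        return 5;
    }

    // Convert a step value into a readable label for the view; labels are illustrative.
    public string ConvertStepToElevatorName(int step)
    {
        if (step >= 9) return "Grateful";
        if (step >= 7) return "Hopeful";
        if (step >= 5) return "Curious";
        if (step >= 3) return "Worried";
        return "Angry";
    }
}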

public async Task<ActionResult> Index(HttpPostedFileBase file)
{
    string guid = Guid.NewGuid().ToString() + ".jpg";
    if (file != null && file.ContentLength > 0)
    {
        var fileName = Path.GetFileName(file.FileName);
        var path = Path.Combine(Server.MapPath("~/Content/upload"), guid);
        file.SaveAs(path);
        PhotoReader pr = new PhotoReader();
        var rtn = await pr.GetOCRData(path);
        MoodAnalysis ma = new MoodAnalysis();
        int elScore = ma.GetMood(rtn.FirstOrDefault().scores);
        ViewBag.EmotionScore = elScore;
        ViewBag.EmotionMessage = ma.ConvertStepToElevatorName(elScore);
    }
    
    return View();
}

Using this information I am then able to display a result on a chart, as shown in the example below:

[Image: an angry result displayed on the mood elevator chart]

Conclusion

In this brief introduction to the Emotion API we have uploaded a file and retrieved the emotion scores for analysis. To keep things simple I have not shown the full breakdown of my score handling, but with the returned decimal values you will be able to experiment based on your application's needs. If you have any questions or comments, please add them below.

Using Microsoft Cognitive Services to OCR business cards

Introduction

A couple of days ago I attended the AzureCraft event in London, hosted by Microsoft and the UK Azure User Group. It was a really good event, and one of the many demos in the keynote (which overran by an hour) briefly touched on the OCR capability in the Microsoft Cognitive Services. Although that example was there to show off Azure Functions, I quickly jotted down a note to try out the OCR capability myself. So after spending yesterday losing at golf (it was only by 1 point so I demand a recount!), today I have tried creating an OCR application for the first time. In this post I am going to cover the code for a basic application that lets you get information from a photo of a business card.

A caveat I would like to add before getting into the code: this is my first run-through with this technology and the code has not been optimised yet. I just thought I would get it all down while it was fresh in my mind.

Sign up for cognitive services

The OCR function we are going to use is part of the Microsoft Cognitive Services. You will need to sign up to use these APIs here. We will be using the Computer Vision API, so sign up for that subscription. It is in preview, so it is free for 5,000 transactions a month. We will be using the keys from this subscription later on.

Get OCR data from Computer Vision API

Our first action is to send a file to the service and get the JSON response with the OCR data. For this we create a GetOCRData method that takes the filename of our image file. We need to add some details to the DefaultRequestHeaders, including the subscription key that you get when signing up for the trial (don't try using mine, it won't work). There are several options for the content we send: if we just wanted to send a Uri we could use application/json, however in our case we are going to push a local file to the service so we need to use application/octet-stream.

Once we have posted our request, we can deserialize the response. I used Newtonsoft.Json by installing it from NuGet, although there are alternatives that can be used.

public async Task<string> GetOCRData(string filename)
{
    var client = new HttpClient();
    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "8ecc1bf65ff1c4e831191");
    var uri = "https://api.projectoxford.ai/vision/v1.0/ocr";
    HttpResponseMessage response;
    using (var content = new ByteArrayContent(GetBytesFromFilepath(filename)))
    {
        content.Headers.ContentType =
            new System.Net.Http.Headers.MediaTypeHeaderValue("application/octet-stream");
        response = await client.PostAsync(uri, content).ConfigureAwait(false);
    }
    var x = JsonConvert.DeserializeObject(await response.Content.ReadAsStringAsync());

    return x.ToString();
}

Our code above uses the following helper method to get the bytes from a file.

private byte[] GetBytesFromFilepath(string filePath)
{
    // Load the image and re-save it as a JPEG byte array for the request body.
    using (Image img = Image.FromFile(filePath))
    using (var stream = new MemoryStream())
    {
        img.Save(stream, ImageFormat.Jpeg);
        return stream.ToArray();
    }
}

To allow me to easily test this I created a basic WPF application that contains a textbox, a button and a textarea. This lets me check the returned values as I test the app. I also have a series of unit tests, but found the visual display to be a great help. With the code as it stands we get the response shown below.

[Image: the raw JSON response displayed in the tester application]
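For reference, the button click handler in the tester application can be as simple as the sketch below. The control names and the PhotoReader class name are assumptions for illustration; wire it up to whatever your XAML defines.

private async void GetOCRButton_Click(object sender, RoutedEventArgs e)
{
    // Call the service with the file path entered in the textbox and show the raw JSON.
    var reader = new PhotoReader();
    ResultTextBox.Text = await reader.GetOCRData(FilePathTextBox.Text);
}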

Interpreting the JSON

What we have so far just gives us the raw JSON response. We can at least see that the OCR is working, but it is not very user-friendly at this stage. To allow us to handle the returned data we are going to parse the JSON into an OCRData class. For this we need to create the OCRData class, plus several other classes to cover the Regions, Lines and Words returned by the web request.

public class OCRData
{
    public List<Region> regions { get; set; }
}

public class Region
{
    public string boundingBox { get; set; }
    public List<Line> lines { get; set; }
}

public class Line
{
    public string boundingBox { get; set; }
    public List<Word> words { get; set; }
}

public class Word
{
    public string boundingBox { get; set; }
    public string text { get; set; }
}

With these in place we can change our GetOCRData method to return a Task<OCRData> and change the JsonConvert.DeserializeObject call to the generic overload that returns an OCRData type, as shown in the sample code below.

public async Task<OCRData> GetOCRData(string filename)
{
    var client = new HttpClient();
    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "8ecc1bf6548a4bdf1c4e831191");
    var uri = "https://api.projectoxford.ai/vision/v1.0/ocr";
    HttpResponseMessage response;
    using (var content = new ByteArrayContent(GetBytesFromFilepath(filename)))
    {
        content.Headers.ContentType =
            new System.Net.Http.Headers.MediaTypeHeaderValue("application/octet-stream");
        response = await client.PostAsync(uri, content).ConfigureAwait(false);
    }
    var x = JsonConvert.DeserializeObject<OCRData>(
        await response.Content.ReadAsStringAsync());

    return x;
}

Populating a ContactCard object and Regex, Regex everywhere!

We now have an OCRData object being returned, but we still need to analyse this data and create a usable contact object from it. To do this we create a ContactCard class that includes a number of properties likely to appear on a business card. I have also created an override for ToString to allow us to display the contact data easily in our OCR Tester application.

public class ContactCard
{
    public string Name { get; set; }
    public string Company { get; set; }
    public string Position { get; set; }
    public string PhoneNo { get; set; }
    public string Email { get; set; }
    public string Website { get; set; }
    public string Facebook { get; set; }
    public string Twitter { get; set; }

    public override string ToString()
    {
        StringBuilder sb = new StringBuilder();
        sb.AppendLine("Name: " + Name);
        sb.AppendLine("Company: " + Company);
        sb.AppendLine("Position: " + Position);
        sb.AppendLine("Phone: " + PhoneNo);
        sb.AppendLine("Email: " + Email);
        sb.AppendLine("Website: " + Website);
        sb.AppendLine("Facebook: " + Facebook);
        sb.AppendLine("Twitter: " + Twitter);

        return sb.ToString();
    }
}

With the ContactCard object ready and our OCR data available, we can now expand our code. We create a new method called ReadBusinessCard which takes the filename of the image file. This method first calls our GetOCRData method and then creates a new ContactCard. For each property in ContactCard we then call a method to get the relevant data out of our OCRData object.

For the Name, Company and Position I have actually cheated and given defined locations for these, based on the sample cards I am using to test this app. The reason for this is that they are much harder to detect. My intention is to identify these using the LUIS natural language service, but that would overcomplicate this post, so it has been omitted. For those interested in using that natural language service, my previous blog post on the Bot Framework has examples of it.
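To give an idea of what that cheat looks like, here is a minimal sketch. It simply assumes my sample cards put the name on the first line, the position on the second and the company on the third, which obviously will not hold for arbitrary cards.

// Sketch only: hard-coded line positions based on the layout of my sample cards.
// Requires System.Linq; a real implementation would classify these lines properly.
private string GetName(Region r)
{
    return string.Join(" ", r.lines[0].words.Select(w => w.text));
}

private string GetPosition(Region r)
{
    return string.Join(" ", r.lines[1].words.Select(w => w.text));
}

private string GetCompany(Region r)
{
    return string.Join(" ", r.lines[2].words.Select(w => w.text));
}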

What I have done for this version is create a GetFromRegex method that allows us to send a pattern to the method and have a Regex check done on all the returned words. This then allows us to populate the contact card with phone numbers, emails, websites, twitter handles etc.

public async Task<ContactCard> ReadBusinessCard(string filename)
{
    OCRData data = await GetOCRData(filename).ConfigureAwait(false);
    ContactCard cc = new ContactCard();
    Region r = data.regions[0];
    cc.Name = GetName(r);
    cc.Company = GetCompany(r);
    cc.Position = GetPosition(r);
    cc.PhoneNo = GetFromRegex(r, @"^\d+$");
    cc.Email = GetFromRegex(r, @"^([a-z0-9_.-]+)@([\da-z.-]+)\.([a-z.]{2,6})$");
    cc.Website = GetFromRegex(r, @"^www\.", "facebook");
    cc.Facebook = GetFromRegex(r, @"^www\.facebook\.com");
    cc.Twitter = GetFromRegex(r, "^@");

    return cc;
}

private string GetFromRegex(Region r, string pattern, string notContains = null)
{
    foreach(Line l in r.lines)
    {
        foreach(Word w in l.words)
        {
            if (Regex.IsMatch(w.text, pattern, RegexOptions.IgnoreCase))
            {
                if (string.IsNullOrEmpty(notContains))
                    return w.text;
                else
                {
                    if (!w.text.Contains(notContains))
                        return w.text;
                }
            }
        }
    }
    return "";
}

Now I'm going to be completely honest: I don't think the GetFromRegex method or my regex patterns are as good as they could be. However, for the sake of this demonstration they allow me to show how we can break down the returned OCR text data into the sections of a contact card.
With the code ready and returning data, the next logical step would be to add this to a mobile app. However, for the ease of the demo I have modified my OCR Tester application to display the returned ContactCard. From here we could easily add this information to a phone's contacts or another contact application, but I will leave that for another time.

[Image: the populated ContactCard displayed in the OCR Tester application]

Conclusion

This sample shows how powerful, and how simple to implement, OCR with the Cognitive Services can be. If you have any questions or suggestions for this post, please add them to the comments below.

Developing a Bot with the Microsoft Bot Framework

Introduction

In this post we are going to look at creating a very basic bot using the Microsoft Bot Framework. Although the bot itself will be simple, we will connect it to the LUIS natural language service to allow us to train the service to handle responses. From this example, it should be easy to move forward with more complex bots and features.

Create Project in Visual Studio

To create a bot application, you need to install the Bot Application template and the Bot Framework Emulator, which can both be found here.

When these are installed, create a new project using the Bot Application template. We have called ours MagicEightBall.

[Image: creating the MagicEightBall project from the Bot Application template]

The template provides the basic code needed to run a bot. The main area you will edit is the MessagesController class in the Controllers folder. The example code shown below returns a count of how many characters you have entered. It is a good idea to debug this code and test your emulator with it now.

public async Task<Message> Post([FromBody]Message message)
{
    if (message.Type == "Message")
    {
        // calculate something for us to return
        int length = (message.Text ?? string.Empty).Length;

        // return our reply to the user
        return message.CreateReplyMessage($"You sent {length} characters");
    }
    else
    {
        return HandleSystemMessage(message);
    }
}

Using Natural Language

To simplify the process of creating question and responses for our Bot we are going to use LUIS (Language Understanding Intelligent Service). This service allows you to train intents to understand what users have entered and then respond accordingly. The LUIS site can be found here.

Create a new LUIS application

Once you have registered for LUIS, create a new application for the Magic Eight Ball as shown below. The LUIS site has an excellent 10-minute video on creating and training an app; if you have not created one before, I highly recommend that you view it.

[Image: creating a new LUIS application]

Train the application to understand a question

The first thing you need to do is add an intent, which determines the questions that the user can ask. We have created a basic one called Question that we will train. We then add new utterances to give examples of the text that should be assigned to this intent. The LUIS service will learn from these and try to pick up other text that it considers to have the same meaning.

[Image: example utterances labelled against the Question intent]

If we were using entities to pass data to our code, we could assign part of our utterances to an entity. In the example below, the intent would be picked up from "Am I", whilst the Action entity would be assigned the value "funny". This allows you to handle various options and settings sent by users in a statement.

[Image: an utterance with "funny" labelled as the Action entity]

Connecting to LUIS service

The code sample below is a MagicEight class that extends LuisDialog. For each intent we have defined, we should have a method decorated with the attribute [LuisIntent("IntentName")], as shown below. This method will be called when the bot determines that this question/intent has been matched. For our example we do not care about any actions or details from the message, just that it was a valid question. We then send back a random response from the list of responses on a standard magic eight ball.

We also provide a method for the None intent by using the attribute [LuisIntent("")]. This allows us to provide a default response to users when the service cannot determine the intent.

Please note the App Id and Key shown in the example below is not valid and you will need to get these from a valid LUIS application.

[LuisModel("b514324f-d6d3-418e-a911-c7fasda6699e2", "a91e3e2044a987876291c54a153f3a6")]
[Serializable]
public class MagicEight : LuisDialog<object>
{
    private static Random rand = new Random(14891734);
    string[] responses = { "It is certain.", "It is decidedly so.",
        "Without a doubt.", "Yes, definately.", "You may rely on it.",
        "As I see it, yes.", "Most likely.", "Outlook good.", "Yes.",
        "Signs point to yes.", "Reply hazy try again.", "Ask again later.",
        "Better not tell you now.", "Cannot predict now",
        "Concentrate and ask again", "Don't count on it.",
        "My reply is no.", "My sources say no.", "Outlook not so good",
        "Very doubtful."};

    [LuisIntent("Question")]
    public async Task AnswerQuestion(IDialogContext context, LuisResult result)
    {
        // Pick a random response from the list.
        string message = responses[rand.Next(responses.Length)];
        await context.PostAsync(message);
        context.Wait(MessageReceived);
    }

    [LuisIntent("")]
    public async Task None(IDialogContext context, LuisResult result)
    {
        string message = "Sorry I did not understand";
        await context.PostAsync(message);
        context.Wait(MessageReceived);
    }
}
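For our Magic Eight Ball we ignore entities entirely, but if you had labelled the Action entity mentioned earlier, an intent handler could read it back out of the LuisResult. The sketch below is illustrative only and is not part of the sample; the intent name and the "Action" entity type are assumptions.

[LuisIntent("QuestionWithAction")]
public async Task AnswerQuestionWithAction(IDialogContext context, LuisResult result)
{
    // Look for an entity of type "Action" in the recognised result (requires System.Linq).
    var action = result.Entities.FirstOrDefault(e => e.Type == "Action");
    if (action != null)
    {
        // action.Entity holds the matched text, e.g. "funny".
        await context.PostAsync($"You asked whether you are {action.Entity}. Signs point to yes.");
    }
    else
    {
        await context.PostAsync("Ask me a yes or no question.");
    }
    context.Wait(MessageReceived);
}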

Our final task is to return to our MessagesController class and update the Post method to send new messages to our MagicEight class.

public async Task<Message> Post([FromBody]Message message)
{
    if (message.Type == "Message")
    {
        return await Conversation.SendAsync(message, () => new MagicEight());
    }
    else
    {
        return HandleSystemMessage(message);
    }
}

Test in Emulator

Now that we have the code complete, test the application in the emulator.

[Image: the Magic Eight Ball bot responding in the Bot Framework Emulator]

You may not always get the correct response straight away. This is because the LUIS application has not been trained much yet; the more work you put into labelling the utterances within the LUIS application, the more accurate it will become. Any utterance sent to the service will be available for you to label, or you can approve the automatically assigned label that the service determined.

You can now take what you have learnt in this post and expand it into more complex bots, and connect what you have created into web or Windows applications.