The motivation for this post came from a recent image identification project in which the application, given a dog image, identifies its breed. It turned out to be more interesting and enjoyable than I expected when I started the work, even though I ended up spending substantial time learning some new concepts, refreshing linear algebra and calculus, and reading related articles and research papers.
In this project I used AWS SageMaker, a machine learning (ML) platform, to build, train and deploy the models. It provides all the needed components, including prebuilt images suitable for ML projects, a Jupyter Notebook environment, and infrastructure to deploy with a single click from the notebook. Its image classification algorithm uses a ResNet CNN that can be trained from scratch, or trained using transfer learning when a large number of training images is not available.
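To make the "from scratch vs. transfer learning" choice concrete, here is a minimal sketch of the hyperparameters that SageMaker's built-in image classification algorithm accepts. The function name `build_hyperparameters` and every value except the 120 classes are assumptions for illustration, not the settings used in this project.

```python
def build_hyperparameters(num_training_samples, use_transfer_learning=True):
    """Return a hyperparameter dict for SageMaker's built-in image classifier.

    All values below are illustrative assumptions, not tuned settings.
    """
    return {
        "num_layers": 18,              # ResNet depth (assumed; deeper variants exist)
        # 1 = start from pretrained weights (transfer learning), 0 = train from scratch
        "use_pretrained_model": 1 if use_transfer_learning else 0,
        "image_shape": "3,224,224",    # channels, height, width (assumed)
        "num_classes": 120,            # number of dog breeds in the dataset
        "num_training_samples": num_training_samples,
        "mini_batch_size": 32,         # assumed
        "epochs": 10,                  # assumed
        "learning_rate": 0.001,        # assumed
    }


# 16000 is a hypothetical training-set size, used only for this example.
params = build_hyperparameters(num_training_samples=16000)
print(params["use_pretrained_model"])
```

A dictionary like this would typically be handed to the estimator in the notebook; switching `use_transfer_learning` to `False` is the only change needed to train from scratch instead.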
If you want to jump straight to the notebook code, it is here.
Neural Network (NN)
Neural networks draw inspiration from their counterparts, biological neural networks. Many ML algorithms are based on them, including the perceptron, Hopfield networks, CNN, RNN, LSTM, etc. This article briefly covers the perceptron and CNN.
In a neuron/perceptron, the weighted sum of signals from previous neurons is passed down through the axon and synaptic terminals and forms the basis for the output, which in turn becomes the input to the next layer. Back propagation and activation (threshold triggering) play critical roles in the process as well.
For a single-layer network, the weighted summation and the output can be written in a simplified form similar to the one below, where y is the output vector, x is the input vector and w is the weight matrix.
Fig 2. Dot product of matrix multiplication
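The weighted sum and threshold described above can be sketched in a few lines of plain Python. The numbers, and the helper names `step` and `layer_output`, are made up for illustration; a real project would use a numerical library instead of nested lists.

```python
def step(z, threshold=0.0):
    """Threshold activation: fire (1) when the weighted sum exceeds the threshold."""
    return 1 if z > threshold else 0


def layer_output(w, x, activation=step):
    """Compute y = activation(w . x) for a single layer.

    w holds one row of weights per output neuron; x is the input vector.
    """
    return [
        activation(sum(w_ij * x_j for w_ij, x_j in zip(row, x)))
        for row in w
    ]


# Hypothetical example: 3 inputs feeding 2 neurons.
x = [1.0, 0.5, -1.0]
w = [
    [0.2, 0.8, -0.5],   # weights of neuron 1, weighted sum = 1.1
    [-0.6, 0.1, 0.4],   # weights of neuron 2, weighted sum = -0.95
]
y = layer_output(w, x)
print(y)  # [1, 0]
```

Each row of w produces one element of y, so adding a neuron to the layer means adding a row to the weight matrix.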
Convolutional Neural Network (CNN)
CNNs have gained high popularity in image processing in the last few years thanks to their improved pattern recognition. Though a fully connected artificial neural network (ANN) can in principle solve any problem that a CNN can, the computation for moderately complex input, say a high-resolution image with millions of pixels, quickly becomes prohibitive. Applying convolution and pooling across multiple layers reduces it. In this setup the input is processed and down-sampled, and the generated output is used to adjust the weights (back propagation) over many iterations.
Fig 3. A written digit 3 of size 32×32 pixels passing through a CNN during training
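A rough parameter count shows why a fully connected layer becomes expensive on large images while a convolution layer does not. The image size, the number of hidden units and the filter shape below are assumptions chosen only to make the arithmetic easy to follow.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


def conv_params(kernel_h, kernel_w, in_channels, n_filters):
    """Weights plus one bias per filter; independent of the image size."""
    return (kernel_h * kernel_w * in_channels + 1) * n_filters


# Assumed input: a 1000x1000 RGB image, i.e. 3 million input values.
n_inputs = 1000 * 1000 * 3

print(dense_params(n_inputs, 1000))   # 3000001000
print(conv_params(3, 3, 3, 64))       # 1792
```

The convolution layer reuses the same small filters at every position of the image, which is where the saving comes from.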
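To show what convolution and pooling do to a 32×32 input, here is a minimal sketch in plain Python. The 5×5 edge-detecting kernel and the striped test image are assumptions for illustration and are not taken from the figure; a trained CNN learns its kernel values instead of having them hard-coded.

```python
def conv2d(image, kernel):
    """Stride-1 convolution without padding (cross-correlation, as in CNNs)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + a][j * size + b]
                for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Assumed test image: 32x32, black with a bright vertical stripe in columns 10..15.
image = [[1.0 if 10 <= col < 16 else 0.0 for col in range(32)] for _ in range(32)]
# Assumed kernel: responds to dark-to-bright transitions from left to right.
kernel = [[-1.0, -1.0, 0.0, 1.0, 1.0] for _ in range(5)]

features = conv2d(image, kernel)   # 28 x 28 feature map
pooled = max_pool(features)        # 14 x 14 after 2x2 pooling
print(len(features), len(features[0]), len(pooled), len(pooled[0]))  # 28 28 14 14
```

The feature map is strongest where the stripe begins, and pooling keeps that response while cutting the number of values by four.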
Dog Breed Identifier
In this project I used the dog images available from Stanford University. The dataset has more than 20,000 images across 120 breeds (for more info see http://vision.stanford.edu/aditya86/ImageNetDogs/). The images are of different sizes and need to be pre-processed before use in a CNN. Also, SageMaker recommends using the RecordIO data format.
Tip: do not download the dataset to your computer and then upload it to AWS, which may take a long time depending on your local network bandwidth. Instead, use an EC2 instance and work from there.
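One common route to RecordIO is to first write a list file with one tab-separated line per image (index, label, relative path) and then feed it to MXNet's im2rec tool. The sketch below only builds those lines. It assumes one folder per breed, and the file names and the helper `build_lst_lines` are hypothetical; resizing and the conversion itself are not shown.

```python
from pathlib import PurePosixPath


def build_lst_lines(relative_paths):
    """Turn 'breed_folder/image.jpg' paths into tab-separated lines: index, label, path."""
    # Assign a numeric label to each breed folder, in alphabetical order.
    breeds = sorted({PurePosixPath(p).parts[0] for p in relative_paths})
    label_of = {breed: idx for idx, breed in enumerate(breeds)}
    return [
        f"{i}\t{label_of[PurePosixPath(p).parts[0]]}\t{p}"
        for i, p in enumerate(sorted(relative_paths))
    ]


# Hypothetical file names, assuming one folder per breed.
paths = [
    "Chihuahua/img_001.jpg",
    "Blenheim_spaniel/img_002.jpg",
    "Chihuahua/img_003.jpg",
]
lst_lines = build_lst_lines(paths)
for line in lst_lines:
    print(line)
```

Keeping the label mapping in one place matters later, because the model only returns class indices and the breed names have to be looked up from the same mapping.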
You can find the code with details of each step on my GitHub repo. In the image below, the application correctly identified the dog as a Blenheim spaniel with 98.8% probability. There were also images where the highest probability was only 40%, with the other breeds' probabilities trailing off slowly. For fun and curiosity I also uploaded a cat image, which it categorized as a Chihuahua ;)! Since the model only knows dog breeds, any image will be assigned to some breed.
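The cat result follows from how the probabilities are produced. Assuming the final layer is a softmax, as is standard for this kind of classifier, the scores are always turned into probabilities that sum to 1 over the known breeds, so some breed always wins. The scores below are made up, and only three breeds are used for brevity.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, labels, k=3):
    """Return the k most likely (label, probability) pairs, best first."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


labels = ["Blenheim spaniel", "Chihuahua", "Pug"]  # 3 of the 120 breeds

confident = softmax([6.0, 1.0, 0.5])   # one score clearly dominates
unsure = softmax([1.2, 1.0, 0.9])      # scores close together, e.g. a cat image

print(top_k(confident, labels, k=1))
print(top_k(unsure, labels, k=1))
```

In the second case the top probability is low and the others trail closely behind, which is a useful signal that the image may not match any breed well. A simple threshold on the top probability could be used to reject such inputs.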
To do: add a web interface where a user can upload an image and identify the breed it belongs to.
If you are downloading publicly shared Google Drive files, try this utility.