Building an image classification model using Amazon SageMaker


In this blog post, we will use Amazon SageMaker to build an image classification model using a collection of images from this grocery store dataset. The solution architecture is shown below in Figure 1.

This post builds on concepts from a previous blog post, in which we built an image classification model using Amazon Rekognition. You can read that post here.

Figure 1. High Level Design of the Image Classification architecture

Overview of Amazon SageMaker

Amazon SageMaker is a fully managed service that enables developers and data scientists to build, train, and deploy machine learning models at scale. It provides a range of tools and services that make it easy to create custom machine learning algorithms and models, without requiring deep expertise in machine learning or large-scale data processing.

SageMaker has a wide range of features that make it a powerful tool for machine learning, including:

  1. Data preparation and labelling tools, including data cleansing, feature engineering, and data transformation.
  2. Machine learning model training, including pre-built algorithms and frameworks such as TensorFlow and MXNet.
  3. Model hosting: Once a model is trained, SageMaker makes it easy to host the model in a scalable and secure way, allowing for real-time predictions and batch processing. This blog will cover both processes.
  4. Automatic model tuning, which automatically optimises machine learning models for the best performance.
  5. Integration with other AWS services, including Amazon S3, making it easier to build and deploy machine learning applications. In this blog, the results of the batch transform were saved to an S3 bucket.

Source and image setup

The grocery store dataset was used to train and to create a model for this project. There are 5,125 images separated into 81 fine-grained classes. The coarse-grained classes are split into three different categories: fruit, packages and vegetables. 

An important factor in getting good results from any machine learning classification training is the split of data points between the train and test phases. We used the traditional 80/20 approach: 80% of the images were used to train the model and 20% were used to test it.
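
The 80/20 split can be sketched in a few lines of Python. This is a minimal illustration rather than the script used in this project: the file names are hypothetical stand-ins for the dataset's images, and the fixed seed is only there to make the shuffle repeatable.

```python
import random


def split_train_test(paths, train_fraction=0.8, seed=42):
    """Shuffle a list of image paths and split it into train and test lists."""
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical file names standing in for the 5,125 dataset images.
images = [f"img_{i:04d}.jpg" for i in range(5125)]
train, test = split_train_test(images)
print(len(train), len(test))
```

With 81 fine-grained classes, it may be worth applying the same split per class so that every class appears in both sets.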

Another important aspect is the size of the image. All of the images should have the same dimensions. In this project the images were 348×348 pixels.

Amazon SageMaker Storage

SageMaker has native integration with Amazon S3, so all of the images and files required for SageMaker were stored in an S3 bucket. The results generated by SageMaker were also stored there.

Training the image classification model

After the images have been prepared and uploaded to an S3 bucket, it is time to create the image classification model from Training Jobs under SageMaker/Training.

We trained the model using the Train with Image Format method. The algorithm used was the Image Classification MXNet model, which is pre-trained and maintained by AWS. The instance type used was ml.p3.2xlarge. You need an instance type with a GPU (Accelerated Computing); the job will fail with an error if you select an instance type that is ill suited to this activity.

The Train with Image Format method requires you to pass the value application/x-image to the content type, and to create four channels for the Input Data Configuration parameter:

  • train
  • validation
  • train_lst
  • validation_lst
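
For readers who prefer the API to the console, the four channels could be expressed as the InputDataConfig list accepted by the SageMaker create_training_job call. This is a sketch under assumptions: the S3 prefixes are hypothetical, and the compression and distribution settings are common defaults rather than values taken from this project.

```python
def image_channel(name, s3_uri):
    """Build one InputDataConfig entry for the Train with Image Format method."""
    return {
        "ChannelName": name,
        "ContentType": "application/x-image",
        "CompressionType": "None",  # assumption: images are not compressed
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    }


# Hypothetical prefixes; replace with the locations of your own data.
BASE = "s3://ml-models-poc/dataset"
CHANNELS = ("train", "validation", "train_lst", "validation_lst")
input_data_config = [image_channel(name, f"{BASE}/{name}/") for name in CHANNELS]

for channel in input_data_config:
    print(channel["ChannelName"], channel["DataSource"]["S3DataSource"]["S3Uri"])
```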


The train and validation channels contain information about the location of images and any other relevant information (e.g. compression type). The train_lst and validation_lst channels contain the location of a list file detailing the required contents of each image. A .lst file is a tab-separated file with three columns that contains a list of image files. The first column specifies the image index, the second column specifies the class label index for the image, and the third column specifies the relative path of the image file.
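
To make the three-column layout concrete, here is a minimal sketch that writes .lst content from a list of (class label index, relative path) pairs. It is an illustration only, not the Apache MXNet script; the function name and the image paths are hypothetical.

```python
def build_lst(rows):
    """Return .lst text: image index, class label index, relative path (tab-separated)."""
    lines = []
    for image_index, (class_index, relative_path) in enumerate(rows):
        lines.append(f"{image_index}\t{class_index}\t{relative_path}")
    return "\n".join(lines) + "\n"


# Hypothetical images: class 0 is Golden-Delicious and class 6 is Banana.
sample = [
    (0, "Golden-Delicious/img_001.jpg"),
    (6, "Banana/img_002.jpg"),
]
lst_text = build_lst(sample)
print(lst_text, end="")
```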

To create this list, we have used the script from the Apache MXNet project. Below, Figure 2 shows the job hyperparameters used in this training job.

Figure 2. Hyperparameters used in the training job.

Training Job Results

The training job was completed in 24 minutes. The output of this task is a single compressed file containing the generated model.

The following table demonstrates the results of the training job, including the accuracy level at each epoch.


[Table: train accuracy and validation accuracy per epoch]

Model Inference

After the model artefacts have been created in the previous step, a model needs to be deployed to Amazon SageMaker so that it can respond to inference requests. To create a new model for deployment, go to SageMaker/Inference/Models. The model needs two inputs:

– The artefact created in the previous step
– The image classification algorithm used to generate the model
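
The same two inputs appear in the request for the SageMaker create_model API call. The sketch below only builds the request dictionary; the artefact path, image URI and role ARN are placeholders, not values from this project, and an execution role is an additional input the API needs.

```python
def build_create_model_request(model_name, model_data_url, image_uri, role_arn):
    """Assemble the keyword arguments for the SageMaker create_model call."""
    return {
        "ModelName": model_name,
        "PrimaryContainer": {
            "Image": image_uri,  # the image classification algorithm container
            "ModelDataUrl": model_data_url,  # the artefact from the training job
        },
        "ExecutionRoleArn": role_arn,
    }


# Placeholder values; substitute the ones from your own account and region.
request = build_create_model_request(
    model_name="image-classifier-poc",
    model_data_url="s3://<bucket>/<training-output>/model.tar.gz",
    image_uri="<image-classification-container-uri>",
    role_arn="<sagemaker-execution-role-arn>",
)
print(sorted(request))

# With boto3 installed and credentials configured, the request could be sent as:
#   boto3.client("sagemaker").create_model(**request)
```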


Now it is time to see whether all of the hard work done in resizing and categorising the images, deciding on the most appropriate algorithm and fine-tuning the hyperparameters actually pays off. AWS provides four different ways to classify new images, and your workload will dictate which inference option is the most appropriate:

  • Real-time inference
  • Serverless inference
  • Batch transform
  • Asynchronous inference


In this blog, we have used the real-time inference and the batch transform deployment options. 

Real-time inference (one image per API call)

Real-time inference is ideal for online inferences that have low latency or high throughput requirements. Use real-time inference for a persistent and fully managed endpoint (REST API) that can handle sustained traffic, backed by the instance type of your choice. Real-time inference can support payload sizes up to 6 MB and processing times of 60 seconds. The endpoint must be available before an inference is run, and you are charged per hour of uptime according to the instance type ($0.30/hr for the ml.m4.xlarge instance type).

After creating this endpoint, we have used a Jupyter notebook (run from a notebook instance on SageMaker) to communicate with the endpoint. The below snippet loads the required libraries to run the code:

# Import the required libraries
import boto3
import json
import os
import io
from PIL import Image
import sagemaker
import pandas as pd
import numpy as np
import re

We also need to pass the endpoint name and the SageMaker runtime.

# The name of the SageMaker endpoint running in our account.
# Note this needs to be in the same region as this running code.
client = boto3.client('sagemaker-runtime')
endpointName = 'image-classifier'

We now load the test image and call the inference endpoint. The endpoint returns a JSON response, which we parse so we can display the result of the inference.

# Load the binary data of the test image into a Python variable.
with open('./test-images/Banana.jpg', 'rb') as f:  # opening a binary file
    data = f.read()

# Call the endpoint with the image in memory. The endpoint is invoked and a result is recorded.
response = client.invoke_endpoint(
    EndpointName=endpointName,
    ContentType='application/x-image',
    Body=data,
)
result = json.loads(response['Body'].read().decode())

Visualising the results

There are multiple ways to visualise results (e.g. graphs, tables, text). In this blog, we have reported the results using text only. In the snippet below, the first variable we set is labels: the model stores classes as numbers, so we convert them back to the names of the objects (e.g. 0 = Golden-Delicious, 1 = Granny-Smith, etc.).

The inference returns a probability between 0 and 1 for each of the 81 classes. We pick the class with the highest probability and print the result.


# Set the labels for each class.
labels = ['Golden-Delicious','Granny-Smith','Pink-Lady','Red-Delicious','Royal-Gala','Avocado','Banana','Kiwi','Lemon','Lime','Mango','Cantaloupe','Galia-Melon','Honeydew-Melon','Watermelon','Nectarine','Orange','Papaya','Passion-Fruit','Peach','Anjou','Conference','Kaiser','Pineapple','Plum','Pomegranate','Red-Grapefruit','Satsumas','Bravo-Apple-Juice','Bravo-Orange-Juice','God-Morgon-Apple-Juice','God-Morgon-Orange-Juice','God-Morgon-Orange-Red-Grapefruit-Juice','God-Morgon-Red-Grapefruit-Juice','Tropicana-Apple-Juice','Tropicana-Golden-Grapefruit','Tropicana-Juice-Smooth','Tropicana-Mandarin-Morning','Arla-Ecological-Medium-Fat-Milk','Arla-Lactose-Medium-Fat-Milk','Arla-Medium-Fat-Milk','Arla-Standard-Milk','Garant-Ecological-Medium-Fat-Milk','Garant-Ecological-Standard-Milk','Oatly-Natural-Oatghurt','Oatly-Oat-Milk','Arla-Ecological-Sour-Cream','Arla-Sour-Cream','Arla-Sour-Milk','Alpro-Blueberry-Soyghurt','Alpro-Vanilla-Soyghurt','Alpro-Fresh-Soy-Milk','Alpro-Shelf-Soy-Milk','Arla-Mild-Vanilla-Yoghurt','Arla-Natural-Mild-Low-Fat-Yoghurt','Arla-Natural-Yoghurt','Valio-Vanilla-Yoghurt','Yoggi-Strawberry-Yoghurt','Yoggi-Vanilla-Yoghurt','Asparagus','Aubergine','Cabbage','Carrots','Cucumber','Garlic','Ginger','Leek','Brown-Cap-Mushroom','Yellow-Onion','Green-Bell-Pepper','Orange-Bell-Pepper','Red-Bell-Pepper','Yellow-Bell-Pepper','Floury-Potato','Solid-Potato','Sweet-Potato','Red-Beet','Beef-Tomato','Regular-Tomato','Vine-Tomato','Zucchini']

# Show the prediction in text
index_of_prediction = np.argmax(result)
label_of_prediction = labels[index_of_prediction]
confidence = np.round(result[index_of_prediction]*100, decimals=4)

print("We are {}% confident this looks like a {}. The source image is a banana.".format(confidence, label_of_prediction))
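
Because the endpoint returns one probability per class, it can also be useful to look at the runners-up rather than only the best guess. The following is a small sketch using plain Python; the function name, the three-label list and the probabilities are made up for illustration and are not output from the model.

```python
def top_k(probabilities, class_names, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: probabilities[i],
        reverse=True,
    )
    return [(class_names[i], probabilities[i]) for i in ranked[:k]]


# Made-up example with three classes instead of 81.
demo_labels = ["Golden-Delicious", "Granny-Smith", "Banana"]
demo_probabilities = [0.05, 0.15, 0.80]

for label, probability in top_k(demo_probabilities, demo_labels, k=2):
    print("{:.1f}% {}".format(probability * 100, label))
```

A small gap between the first and second entries is a hint that the model is unsure about that image.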

Batch transform inference process (multiple files at a time)

Batch transform is suitable for offline processing when large amounts of data are available upfront and you don’t need a persistent endpoint, which you pay for regardless of usage. It can support payloads of gigabytes for large datasets and processing times of days.

To infer the contents of multiple images at the same time, we ran a batch transform job. The job needs the following configuration:

  • Model name: image-classifier-poc
  • Content type: application/x-image
  • Input data: s3://ml-models-poc/dataset/sample_images/
  • Output data: s3://ml-models-poc/output/
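
The configuration above maps onto the request for the SageMaker create_transform_job API call. The sketch below only assembles the request dictionary; the job name and the instance count are assumptions, while the other values come from the list above and the ml.m4.xlarge instance used in this post.

```python
def build_transform_request(job_name, model_name, input_uri, output_uri,
                            instance_type="ml.m4.xlarge", instance_count=1):
    """Assemble the keyword arguments for the SageMaker create_transform_job call."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_uri}
            },
            "ContentType": "application/x-image",
        },
        "TransformOutput": {"S3OutputPath": output_uri},
        "TransformResources": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,  # assumption: a single instance
        },
    }


transform_request = build_transform_request(
    job_name="image-classifier-batch-demo",  # hypothetical job name
    model_name="image-classifier-poc",
    input_uri="s3://ml-models-poc/dataset/sample_images/",
    output_uri="s3://ml-models-poc/output/",
)
print(transform_request["TransformInput"]["ContentType"])

# With boto3 installed and credentials configured, the job could be started with:
#   boto3.client("sagemaker").create_transform_job(**transform_request)
```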


The batch job evaluated 20 images and took 8 minutes on an ml.m4.xlarge instance (for inference jobs we can use a less powerful machine than the one used to train the model).
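
To put the two deployment options side by side, here is a rough back-of-the-envelope calculation. It assumes, purely for illustration, that the batch job is billed at the same $0.30/hr rate quoted earlier for ml.m4.xlarge and only for its 8-minute duration, and that a real-time endpoint stays up for a 30-day month. Actual pricing varies by region and usage type.

```python
HOURLY_RATE_USD = 0.30  # ml.m4.xlarge rate quoted earlier in this post


def instance_cost(hours, hourly_rate=HOURLY_RATE_USD):
    """Cost in USD of running one instance for the given number of hours."""
    return hours * hourly_rate


batch_job_cost = instance_cost(8 / 60)       # one 8-minute batch job
endpoint_month_cost = instance_cost(24 * 30)  # endpoint running for 30 days

print("Batch job: ${:.2f}".format(batch_job_cost))
print("Endpoint for 30 days: ${:.2f}".format(endpoint_month_cost))
```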

Visualising the results

In this code snippet, we retrieve the results from the S3 bucket by specifying the bucket name and folder path. We then loop over every object in that folder, open it, read it and parse it as JSON. A regular expression strips the folder path and the trailing extension from each object key, so the image name does not need to be hard-coded.

s3 = boto3.client('s3')

# specify the S3 bucket name
bucket_name = 'ml-models-poc'

# specify the folder path
folder_path = 'output/natural'

# get a list of all objects in the folder
objects = s3.list_objects_v2(Bucket=bucket_name, Prefix=folder_path)

# loop over the objects in the folder
for obj in objects['Contents']:
    key = obj['Key']

    # download the object from S3
    file_object = s3.get_object(Bucket=bucket_name, Key=key)
    # load the object's contents into a dictionary
    result = json.load(file_object['Body'])

    # collect and convert variables to plot the results of the batch job
    index_of_prediction = np.argmax(result['prediction'])
    confidence = np.round(result['prediction'][index_of_prediction]*100, decimals=4)
    label_of_prediction = labels[index_of_prediction]
    # create a regular expression to match the desired string
    pattern = r".*/(.*)\..*"
    # use the regular expression to extract the desired string
    image_name = re.sub(pattern, r"\1", key)
    print("We are {}% confident this looks like a {}. The source image is {}.".format(confidence, label_of_prediction, image_name))
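
Batch transform writes one output object per input file, named after the input with .out appended. If the output folder can contain anything else (for example a zero-byte folder placeholder), a small guard avoids trying to parse it as JSON. The helper below is a hypothetical alternative to the regular expression above, not part of the original notebook.

```python
import posixpath


def source_image_name(key, suffix=".out"):
    """Return the input file name for a batch transform output key, or None."""
    if not key.endswith(suffix):
        return None  # not a batch transform result, so skip it
    return posixpath.basename(key)[: -len(suffix)]


# Hypothetical keys as they might appear in the output folder.
for example_key in ("output/natural/Banana.jpg.out", "output/natural/"):
    print(repr(example_key), "->", source_image_name(example_key))
```

Inside the loop, a key for which the helper returns None could simply be skipped with continue.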


The results below indicate that the model classifies most images correctly, but there is plenty of room for improvement.


Amazon SageMaker provides excellent out-of-the-box capabilities and built-in algorithms that can accelerate your image classification project. We have demonstrated that we can train a model, deploy it and run inference against it successfully. The model classifies new images correctly in the majority of cases, but it certainly needs improvement before it is used in a production environment.

In this blog, the hyperparameters used were mostly defaults, so hyperparameter tuning is required in order to improve the results. By employing this technique we can optimise the algorithm for this specific dataset and thus improve the model’s performance, producing better results with fewer errors. Fine-tuning hyperparameters is an important aspect of machine learning models, but this is a topic for a different blog post.

Another crucial point in a successful project is to have your images correctly labelled and classified as it will increase the confidence and accuracy of the model.

Lastly, classifying an object that goes through ripening (e.g. fruits and vegetables) or has very few distinct features (e.g. a carton of milk) presents some interesting challenges in how you select the images that will be used in the creation of the model. In this case, making sure your model can be easily updated with new data and re-trained is pivotal to continuously improving your image classification model.
