Generative AI: Building systems, not just chatbots

The rise of Generative AI

I recall the moment when ChatGPT burst into the mainstream around late 2022. It seemed like everyone, from casual observers to avid enthusiasts, could not stop discussing the wonders of Generative AI (GenAI) and its potential to revolutionise the world.  

I will admit, I too got swept up in the excitement. I experimented with early iterations of Stability AI and Claude, tinkering away on throwaway projects via AWS Bedrock. It was intriguing to witness the concerted efforts of major LLM companies pushing the narrative that building chatbots was the way forward. But here is the thing: I was not sold on the idea of using LLMs only for developing chatbots, and I am sure I was not alone in that sentiment. 

OpenAI started it all

Following the debut of ChatGPT, OpenAI rolled out the Chat Completions API, a development that significantly altered the landscape for many. It provided programmatic access to the GPT models, essentially opening doors to a reality where developers could tap into these powerful resources without the hassle of managing and maintaining infrastructure. 

This was monumental because it meant that applications could now harness the powers of these models effortlessly. Suddenly, developers could create text-generating, text-classifying, and even text-translating applications with minimal coding. 

AWS Bedrock followed next

Being more inclined to build systems on AWS, I eagerly anticipated the preview release of Bedrock a few months later. This provided me access to OpenAI alternatives, enabling me to develop applications on the platform.  

However, a challenge arose: the library ecosystem surrounding LLMs, particularly those within the Bedrock framework, lagged behind OpenAI’s offerings.  

Despite the presence of promising libraries like LangChain and LlamaIndex, the capacity to build systems with robust support for structured output remained deficient. 

Even OpenAI’s efforts with JSON Mode and Function Calling fell short of expectations. Consequently, whether utilising OpenAI, Bedrock or similar models, many developers found themselves confined to crafting chatbot-like applications. 

Harnessing GenAI for building systems

Many developers, myself included, remained skeptical about how Generative AI could genuinely enhance the creation of more intelligent systems. I showed no interest in constructing chatbots or developing applications solely to generate text. 

As a developer focused on building systems, my priority lay in leveraging Generative AI to reliably produce structured output. I desired a solution that wouldn’t require me to cajole or coax the models into providing the desired structured responses, only to discover later that their reliability was questionable at best. Despite the possibility of eventually obtaining the desired outcome, the probabilistic nature of LLMs always felt uncertain. 

Building reliable systems with Instructor

A couple of weeks ago, I stumbled into Instructor, a Python library designed to empower developers to create AI systems beyond just chatbots. Built on the foundation of Pydantic, a widely acclaimed library for data validation in Python, Instructor offers a lightweight solution with immense potential. 

While initially rooted in the OpenAI chat completions paradigm, it has evolved to support a variety of models, including the state-of-the-art offerings within AWS Bedrock, such as Anthropic’s Claude 3 models: Opus, Sonnet and Haiku. 

What sets Instructor apart is its ability to enable the development of intelligent systems. By harnessing the capabilities of these models, users can obtain structured output reliably. Moreover, with its universal API, regardless of the model in use, developers can create systems that are model-agnostic and easily adaptable for future changes, requiring minimal code changes. 

Among its array of features, Instructor boasts functionalities like retry management and flexible LLM backends. However, perhaps its most important feature lies in its capacity to provide reliable structured output through response models and validation, facilitating seamless integration into diverse domains for new and existing applications. 

Example with Instructor, Claude 3 and AWS Bedrock

Let us delve into a common scenario: email classification aimed at generating tickets within a helpdesk system. 

This task mirrors a common challenge encountered across various businesses. In simpler domains, a rule-based system might suffice. However, in more intricate environments where rules are ambiguous, leveraging Instructor and Claude 3 models possibly offer a better option. 

Ordinarily, in the absence of automation, a human would be tasked with reading emails, categorising them, and subsequently creating tickets in the helpdesk system. The underlying aim is to expedite the processing of these requests or, alternatively, to achieve more with the personnel at hand. This enables the allocation of resources more efficiently, allowing key personnel to focus on more valuable tasks. 

Example with email classification using Claude 3 SOTA models

With the necessary dependencies—Instructor, Pydantic, Anthropic, and Boto3—installed, we are primed to start building our system. First, we instantiate the Instructor client, specifying the Claude 3 model of choice. 

It is worth noting that, Instructor has recently unveiled support for Anthropic models, including those within Bedrock. Utilising AnthropicBedrock, we simply patch instructor.from_anthropic with the desired model selection, and we are set. 

Link to code snippet 

The choice of which Claude 3 model to use rests with you, and you can opt for the one that aligns best with your specific use case. Personally, I tend to opt for the smallest (most economical) model capable of fulfilling the task at hand – Haiku, in this case. 

However, should you observe that the smaller model struggles to provide reliable responses or if you begin encountering retries, it may be prudent to transition to a more capable alternative. 

Link to code snippet 

I requested ChatGPT to generate 15 emails across three topics: IT Support, HR, and Personal. These generated emails serve as the data for this example. 

We can then classify the emails using the code below: 

Link to code snippet 

And that is all there is to classifying emails using Claude 3. 

Before LLMs, one would typically have to train a Natural Language Processing (NLP) model like Spacy or one of the NLP libraries in sklearn. However, training and maintaining such a model is a considerable undertaking. Instead, we can capitalise on LLMs, especially the state-of-the-art ones, which have undergone extensive training against a vast corpus of data. 

It is worth noting the creation of a Pydantic model named Email, which plays a pivotal role here. Within Instructor, it extends the Claude 3 call to incorporate a response model—in this case, the Pydantic Email model. 

In line 31 of the above code snippet, this is where the magic happens. With just that one line, we prompt, validate, and obtain structured output from the LLM. 

Why is this good?

This approach offers several advantages. Firstly, leveraging the Pydantic model allows for structured prompting 

Notably, the Email class includes a docstring and descriptions for each field. While subtle, these elements are transmitted to the LLM as prompts. Structured prompting enhances the efficiency and accuracy of interactions with the model. A good side effect is that your prompts are not free text, but actual Python code. 

Moreover, the inclusion of validation mechanisms is important. For instance, if the LLM responds with a classification not recognised within the EmailCategory enum—due to model hallucination—Instructor triggers a validation error through the Pydantic model.  

Subsequently, a retry attempt is initiated, wherein the request is resubmitted to the LLM. Typically, this results in the correct classification being provided. 

Upon successful validation, the response is returned as a Pydantic Email model. This completes the cycle, encompassing structured prompting, validation, and structured response, thereby ensuring robustness and reliability in the classification process. 

Not just Email Categorisation

Another noteworthy aspect is that, alongside requesting Claude 3 to categorise our emails, we are also tasking it with additional functions such as summarisation, sentiment detection, grammar and spelling correction, and entity identification within the email. 

Through the seamless integration of Instructor and the Claude 3 model, all these tasks can be done in a single invocation, eliminating the need to make multiple calls or engage multiple models. This streamlined approach not only enhances efficiency but also simplifies the workflow, offering a more cohesive and comprehensive solution. 


In summary, we have utilised a library called Instructor to facilitate the integration of intelligence in our email classifier. Consider the many use cases where intelligence can be seamlessly weaved into the systems that we build. 

Working with LLMs has become significantly more accessible with many libraries like this and advanced models like Claude 3 through AWS Bedrock, which continuously improve with each iteration. This convergence of powerful tools not only simplifies development but also opens a world of possibilities for enhancing the intelligence of our applications. 

Cevo Australia specialises in building systems within the AWS ecosystem. If you require assistance in developing systems using state-of-the-art models such as Claude 3, do not hesitate to contact us. We are here to help bring your projects to fruition with expertise and efficiency. 

Enjoyed this blog?

Share it with your network!

Move faster with confidence