AI is more than just Generative AI

When I tell people I work as an AI Consultant, they often assume my expertise is limited to just Generative AI. While I do have expertise in this area, it’s important to highlight that AI is much broader than that. It encompasses a wide range of fields, including Machine Learning and Statistical Models, which share common techniques and face similar challenges. 

To understand how this came to be, let us take a short journey through recent AI history and remind ourselves what people actually mean when they mention AI. 

Data Science & Statistical Machine Learning (2010-2017)

To keep this history lesson simple, let us start from around 2010, just before the pre-training of vision models began to emerge. 

The earlier phase of this short AI history lesson focuses on the smaller models used in typical machine learning use cases such as image classification, spam detection, time series forecasting and other prediction tasks. A wide variety of model types and architectures were in use, including neural networks, decision trees and many other statistical and machine learning models. 

Supervised machine learning in this era used these smaller models, and careful attention was paid to preparing datasets with their corresponding labels. Data science practitioners ensured the data was carefully curated, and ran an iterative process of experimentation and model training to discover the model parameters that maximise accuracy. The product of this workflow was a model that could then be served on a suitable platform to produce the predictions it was built for. 
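To make this concrete, here is a minimal sketch of that workflow using scikit-learn. The dataset, model type and parameter grid are purely illustrative; the point is the shape of the process: labelled data, an iterative search for good parameters, then a fitted model ready to serve.

```python
# A minimal sketch of the classic supervised ML workflow: a labelled
# dataset, an iterative train/evaluate search over candidate settings,
# and a final fitted model ready to serve predictions.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_digits(return_X_y=True)  # features and their labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Iterative experimentation: search parameter settings that maximise accuracy
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [50, 100], "max_depth": [10, None]},
    scoring="accuracy",
    cv=5,
)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("held-out accuracy:", search.best_estimator_.score(X_test, y_test))
```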

Image: VGG-16 CNN Deep Learning Architecture (TensorFlow/Keras) 

From around 2012 to 2017, the concept of pre-training these models on huge datasets became prevalent, starting with image models such as AlexNet and continuing with popular deep learning architectures such as VGGNet and ResNet. Around this time there was a flurry of model development activity, and transfer learning began to take hold in the computer vision field. 

Transfer learning is essentially the process of fine-tuning a pre-trained model, where a drastically smaller amount of data is required to adapt the model to your own dataset. 
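As a sketch of what that looks like in practice, the Keras snippet below reuses VGG16’s pre-trained feature extractor, freezes its weights, and trains only a small new head. The number of classes and the training data are assumptions for illustration.

```python
# A minimal transfer-learning sketch: reuse VGG16's pre-trained features,
# freeze them, and train only a small new head on a much smaller dataset.
import tensorflow as tf

num_classes = 2  # illustrative, e.g. hotdog / not hotdog

base = tf.keras.applications.VGG16(
    weights="imagenet",   # pre-trained on a huge dataset
    include_top=False,    # drop the original classification head
    input_shape=(224, 224, 3),
)
base.trainable = False    # freeze the pre-trained layers

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(num_classes, activation="softmax"),  # new head
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(your_small_dataset, epochs=5)  # fine-tune on far less data
```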

Foundation Models & Transfer Learning (Around 2017)

I remember mid-2017 as the time I decided to study and focus on machine learning. It was the comedy sitcom Silicon Valley, when Jian-Yang demoed the “Not Hotdog” app to his friends and investors. They were delighted when it correctly predicted that a hotdog was a Hotdog, and I remember their frustration and despair when a slice of pizza came back as Not Hotdog. That was classic, and the rest, as they say, is history. 

From around 2017, transfer learning started to become prevalent in natural language processing (NLP), following its successes in vision systems over the preceding few years. The term foundation model has since become commonplace, but it means the same thing as the pre-trained models of the vision era. 

NLP researchers found that transfer learning worked similarly well for language tasks. Attention then shifted to more efficient embedding models that could easily be fine-tuned on much smaller datasets and used for specific downstream tasks such as text classification and entity extraction. 

Vector embeddings also became known beyond data science practitioners. A lesser-known fact is that in the models of this and the previous era, a large part of the network was dedicated to generating this vector embedding representation, with a final layer, sometimes called the head, bolted onto the end of the model to perform the final task such as classification. 

Image: Vector embeddings form the foundations of retrieval systems 

Another lesser-known fact is that one can produce an embedding model from either the vision models or these NLP models simply by removing this final head. We are left with a model that generates embedding vectors, the functionality that enables semantic search and powers the Generative AI applications we know today. 
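Here is a minimal sketch of that idea in Keras. We build a small classifier (a VGG16 backbone plus a classification head, standing in for any trained model), then point a new model at the layer just before the head; the class count and input are assumptions for illustration.

```python
import tensorflow as tf

# Build a small classifier: a VGG16 backbone plus a classification head,
# standing in for any trained vision (or NLP) model that ends in a "head".
inputs = tf.keras.Input(shape=(224, 224, 3))
features = tf.keras.applications.VGG16(
    weights="imagenet", include_top=False, pooling="avg")(inputs)
outputs = tf.keras.layers.Dense(2, activation="softmax")(features)  # the head
classifier = tf.keras.Model(inputs, outputs)

# Remove the head: point a new model at the layer just before it, leaving
# a model that maps an image straight to an embedding vector.
embedding_model = tf.keras.Model(inputs, features)

image = tf.random.uniform((1, 224, 224, 3))   # stand-in for a real image
vector = embedding_model(image)               # shape (1, 512) embedding
print(vector.shape)
```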

Finally, the technology for pre-training NLP models continued to grow, with training pipelines becoming increasingly automated and exposed to internet-scale datasets. This produced models that became larger and better each time, the compute required grew exponentially, and the use of GPUs and accelerators became commonplace. 

Generative AI (2022+)

After the previous era’s experimentation with generating increasingly large text “autocomplete” models, large language models (LLMs) such as BERT and GPT-2, then finally GPT-3, emerged. 

Although several LLMs already existed before 2022, it was the release of ChatGPT in late 2022 that really pushed LLMs into global popularity, and at that stage it was GPT-3.5 that shared the spotlight. It’s still surprising to me how a simple web interface in ChatGPT single-handedly pushed LLMs and GPT into the mainstream. 

By now, transfer learning had become a cornerstone of machine learning as we know it today: it was commonplace for LLM companies to build pre-trained models that could easily be fine-tuned for specific use cases and domains. 

Image: Fine-tuning an LLM is the same as fine-tuning a traditional ML model 

This brings us to the present, where we use these pre-trained or fine-tuned LLMs simply by prompting them with natural language instructions. There have also been more complex architectures underneath these systems – many of them augment the LLM’s capability by pulling in other, more traditional models as “tools” in the LLM inference pipeline. 
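As a sketch, prompting an LLM can be as simple as the snippet below. It assumes the OpenAI Python client and an API key in the environment; the model name is illustrative. Notice that a task which once demanded a purpose-built classifier is now expressed as a plain instruction.

```python
# A minimal sketch of using a pre-trained LLM by prompting it with
# natural language. Assumes the OpenAI Python client is installed and
# OPENAI_API_KEY is set; the model name is illustrative.
from openai import OpenAI

client = OpenAI()  # reads the API key from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user",
               "content": "Classify this review as positive or negative: "
                          "'The battery lasts all day.'"}],
)
print(response.choices[0].message.content)
```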

Further, embedding representations have not gone away. They are still used in all the layers of the LLM, as well as in tasks such as semantic retrieval to augment the LLM’s knowledge base, in a technique we now know as Retrieval Augmented Generation (RAG). 
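The retrieval step at the heart of RAG is a direct descendant of the embedding models discussed earlier. Below is a minimal sketch of the mechanics: embed the documents and the query, rank by cosine similarity, and prepend the best match to the prompt. The embed() function here is a random stand-in for a real embedding model, such as the head-less models described above.

```python
# A minimal sketch of the retrieval step in RAG: embed documents and a
# query, rank by cosine similarity, and prepend the best match to the
# prompt. embed() is a stand-in for a real embedding model.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stand-in: a real system would call an embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=384)
    return v / np.linalg.norm(v)   # unit-length vector

docs = ["Refunds are processed within 5 business days.",
        "Our office is closed on public holidays."]
doc_vecs = np.stack([embed(d) for d in docs])

query = "How long do refunds take?"
scores = doc_vecs @ embed(query)   # cosine similarity of unit vectors
best = docs[int(np.argmax(scores))]

prompt = f"Answer using this context:\n{best}\n\nQuestion: {query}"
# `prompt` is then sent to the LLM, grounding its answer in retrieved text
```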

Summary and Key Takeaways

Historical modelling techniques continue to play a crucial role in various use cases, depending on factors such as model size, computational demands and the availability of labelled data. Approaches like carefully curated datasets, experimentation and transfer learning remain highly relevant in the development of modern LLMs. 

Building Generative AI applications today no longer requires traditional experts like data scientists, as was typical in earlier eras. These days, all we need are the same software engineering skills used in building traditional software products. 

While extremely capable, even generative models are still fundamentally parameterised software functions; they are not intelligent or sentient beings. As stated earlier, we need the same engineering principles as in building traditional software, plus GPUs and accelerators to help them run faster. 

As AI/ML has progressed, the core focus on learning rich data representations has remained a common thread across the different models, and vector embeddings and vector databases have not lost their utility in the Generative AI era; if anything, they are more relevant now than ever. 

In summary, while Generative AI has captured tremendous attention, it is important to understand it as the latest innovation emerging from a longer history of AI and machine learning techniques – each with its strengths for different use cases. The future will involve combining these different capabilities into more intelligent systems. 

Cevo Australia has expertise in building AI systems, ranging from traditional Machine Learning systems to Generative AI solutions for enterprises. For example, we have assisted our customers in developing an AI assistant using Amazon Q Business, creating a secure, scalable and dependable retrieval-augmented generation (RAG) system for enterprise use.
