Small Language Models vs. LLMs: Which One Reigns Supreme?
Why in the NEWS?
In 2024, researchers turned their attention to small language models (SLMs), as scaling up larger models had begun to offer only marginal gains.
Key Points:
Ilya Sutskever, former chief scientist at OpenAI, made a notable statement at the NeurIPS 2024 conference: “We have achieved the pinnacle of data, we now have to work with it.”
The remark came amid a growing sense that progress on large language models (LLMs) was slowing and that scaling had run its course.
As a result, the long-held assumption that a bigger AI model is necessarily a smarter one is beginning to change, driven largely by the rise of small language models (SLMs).
Major tech companies such as OpenAI and Google have made small models part of their AI line-ups, offering cheap and effective options for specific tasks.
What will you read next in this topic?
Shift to smaller models
Benefits and Drawbacks of Small and Large Models
Use cases of small and large models
Role of Small Models in India
Future of Small and Big Models
Shift to smaller models
OpenAI released GPT-3 with 175 billion parameters in 2020, a major step towards ever-larger AI models.
GPT-4, which followed, is reported to have around 1.7 trillion parameters, but in 2024 researchers began to realize that scaling on data collected from the internet offered diminishing returns.
This created room for smaller language models, which are now growing rapidly in popularity.
Companies like Google, Meta, and Anthropic launched compact models such as Gemini Nano, Llama 3 (8B), and Claude 3 Haiku, making it clear that smaller language models can now play an important role.
Benefits and Drawbacks of Small and Large Models
Small Language Models (SLMs):
Small models are cheaper and ideal for specific tasks.
For example, if a company only needs a model for a specific task like translation, customer support, or limited data processing, a smaller model can prove more efficient and more cost-effective than a larger one.
These models require less time, fewer resources, and less data to train.
Additionally, smaller models benefit from techniques such as knowledge distillation, in which a large “teacher” model helps train a compact “student” model, making them even more attractive.
For example, Microsoft launched a family of small language models called Phi, with the Phi-3-mini available with 3.8 billion parameters.
Even major device makers like Apple now run smaller AI models on their devices, where they can match larger models on many everyday tasks.
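The resource gap between small and large models can be made concrete with a back-of-the-envelope memory estimate. The sketch below is illustrative (the function name is ours, and it assumes 16-bit weights while ignoring activation and KV-cache memory, which real deployments also need):

```python
def fp16_weight_gb(params_billions: float) -> float:
    """Approximate memory (in GB) needed just to hold model weights
    at 16-bit precision, i.e. 2 bytes per parameter."""
    bytes_total = params_billions * 1e9 * 2  # 2 bytes per parameter
    return bytes_total / 1e9                 # convert bytes to gigabytes

# Phi-3-mini (3.8 billion parameters) vs GPT-3 (175 billion parameters)
print(f"Phi-3-mini weights: ~{fp16_weight_gb(3.8):.1f} GB")  # small enough for one consumer GPU
print(f"GPT-3 weights:      ~{fp16_weight_gb(175):.0f} GB")  # needs a multi-GPU server
```

By this rough measure, a 3.8-billion-parameter model needs only about 7.6 GB for its weights, while a 175-billion-parameter model needs roughly 350 GB, which is why small models can run on phones and single GPUs while large ones require data-centre hardware.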
Large Language Models (LLMs):
These models are trained at scale and are capable of solving complex problems.
They excel at tasks requiring broad knowledge, logical reasoning, coding, and other complex work.
However, training them is expensive and time-consuming.
Large models are used when a task demands broad, general-purpose capability approaching artificial general intelligence (AGI).
Use cases of small and large models
Small models are ideal for specialized tasks.
For example, small models are very effective when users are learning languages or performing simple translation tasks within apps such as WhatsApp and other Meta applications.
They handle basic tasks such as translation and simple calculations well, but their performance drops as task complexity increases; they are less capable at coding, logical reasoning, and other complex problems.
On the other hand, large models are capable of complex problems such as logical reasoning, extracting information from large data sets, and efficient coding.
However, these models are expensive and require more resources to run.
Role of Small Models in India
For developing countries like India, where there is a huge need for AI but resources are limited, small models can be the ideal solution.
Institutes like IIT Hyderabad are creating data sets for small language models, which can be used in healthcare, agriculture, education, and preserving cultural diversity in Indian languages.
Small models are well suited to India's huge population, as they require few resources while providing maximum utility.
Also, small models adapt well with local languages, which can help promote India's cultural and linguistic diversity.
In the same direction, Vishwam, an Indian AI project, is developing small models for use in sectors like agriculture, healthcare, and education.
Future of Small and Big Models
As AI technology continues to develop, both types of models have their place.
While larger models will be able to solve complex problems, smaller models will be ideal for specific use cases.
Vivek Raghavan, co-founder of Sarvam AI, says, “We want to build GenAI that can be used by a billion Indians,” indicating that smaller models could have a bigger role in the Indian context.
Smaller models can make AI accessible to the general public, while larger models will be available for broader and more complex tasks.
Q. "Phi" is a small language model released by which company?