
Over the next three years, 92% of organisations plan to increase their investment in artificial intelligence (AI), which is set to drive greater automation, faster decision-making and improved customer engagement.
The majority are embracing generative AI (gen AI) in the form of large language models (LLMs) for their ability to reason and generate valuable content and insights. Sometimes augmented with techniques like retrieval-augmented generation (RAG) or fine-tuning to further improve accuracy and fill knowledge gaps, generalised LLMs can deliver excellent results for enterprise organisations and line-of-business application owners.
However, the high operational costs of LLM services, risks around data privacy when using public platforms, and the complexity of fine-tuning large models for domain-specific accuracy all demand careful evaluation.
Small language models, or SLMs, are smaller counterparts to LLMs designed for specific, task-focused domains. They offer similar capabilities with significantly reduced computational requirements and will be an essential component of many organisations' AI strategies.
Open source models like Mistral 7B, Llama 3 and IBM’s Granite series are widely available and offer enterprises scalable, domain-specific AI capabilities without the infrastructure demands of larger models. These models are actively being fine-tuned for real-world use cases in areas such as finance, healthcare and customer service, demonstrating their maturity and enterprise readiness. They often deliver accuracy comparable to LLMs, with lower costs, enhanced data privacy and simplified deployment.
LLMs possess capabilities such as text generation, summarisation of multimodal content, language translation, content rewriting, data classification and categorisation, information analysis, and image creation. All of these abilities provide a powerful toolset to augment human creativity and improve problem solving. Enterprises commonly use LLMs to:
Drive automation and efficiency
LLMs can supplement, or entirely take on, language-related tasks like customer support, data analysis and content generation. This automation can reduce operational costs while freeing up human resources for more strategic tasks.
Generate insight
LLMs can quickly scan large volumes of text, enabling businesses to better understand market trends and customer feedback by scraping sources like social media, reviews and research papers.
Create a better customer experience
LLMs help businesses deliver highly personalised content to stakeholders, driving engagement and improving user experience. This could include implementing a chatbot to provide round-the-clock customer support, tailoring marketing messages to specific user personas or facilitating language translation.
Before investing in an LLM, organisations must assess whether it suits their use case and requirements, and weigh several key considerations:
Cost-efficiency
Training a large language model from scratch is beyond the reach of most organisations, demanding massive computational resources, vast amounts of data and deep machine learning expertise. While adopting pre-trained foundation models is more accessible, the costs associated with running and adapting these large models can still be significant.
Operating LLMs at enterprise scale typically involves fine-tuning for specific use cases, which introduces additional complexity and expense. These models process information in units called tokens — small chunks of text such as words or characters — with many cloud providers charging per token. As usage scales across an enterprise, the number of tokens processed grows rapidly, driving up operational costs.
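To make the token economics concrete, here is a back-of-the-envelope sketch. Every rate and volume in it is an illustrative assumption, not real provider pricing; substitute your own contract figures.

```python
# Illustrative cost comparison for per-token pricing. All rates and
# volumes below are hypothetical assumptions, not real provider prices.

LLM_COST_PER_1K_TOKENS = 0.03   # assumed rate for a large hosted model (USD)
SLM_COST_PER_1K_TOKENS = 0.002  # assumed rate for a small self-served model (USD)

requests_per_day = 50_000        # e.g. an enterprise chatbot
tokens_per_request = 1_500       # prompt + completion, averaged

monthly_tokens = requests_per_day * tokens_per_request * 30

def monthly_cost(rate_per_1k: float) -> float:
    """Monthly spend given a per-1,000-token rate."""
    return monthly_tokens / 1_000 * rate_per_1k

print(f"Monthly tokens:   {monthly_tokens:,}")
print(f"LLM service cost: ${monthly_cost(LLM_COST_PER_1K_TOKENS):,.0f}")
print(f"SLM service cost: ${monthly_cost(SLM_COST_PER_1K_TOKENS):,.0f}")
```

Even with these rough numbers, the gap compounds quickly: at 2.25 billion tokens a month, a 15x difference in per-token rate is the difference between a rounding error and a significant line item.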
Small language models, on the other hand, offer a more cost-effective alternative. They are lightweight enough to run efficiently in both public and private cloud environments, while still delivering high-quality results for domain-specific tasks. This makes them especially appealing for organisations seeking scalable, predictable AI costs without compromising on performance.
Privacy and security
Compliance, data privacy and security considerations are at the heart of successful gen AI adoption, especially for businesses that operate in highly regulated industries like finance and healthcare. Before deploying LLMs trained on public data, a company must understand the risks to any sensitive data that is shared with or stored by third-party providers.
Accuracy and mitigating bias
Foundation models, trained on broad datasets, generate generic responses, which can be problematic when specificity is critical for enterprise use cases. In industries where precision, compliance and domain expertise matter, relying on a general-purpose model may lead to inaccurate, irrelevant or overly general outputs.
Additionally, if an LLM is trained on biased or incomplete data, it can produce flawed responses, reinforcing misinformation or skewed perspectives. Enterprises must proactively monitor model outputs for accuracy and bias, ensuring AI-driven decisions align with business objectives and ethical standards. This requires continuous oversight and evaluation, often using a supporting model (sometimes referred to as a ‘guardian’ model) such as an SLM, in conjunction with tools that audit and improve model fairness and response accuracy, like the open source project TrustyAI.
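To illustrate the guardian pattern, here is a minimal hypothetical sketch: a small screening model scores the primary model's draft before it is released. The function names, stubbed bodies and threshold are placeholders of our own, not TrustyAI's actual API.

```python
# Hypothetical sketch of the 'guardian model' pattern: a small model
# scores a larger model's draft before it is released. All names and
# thresholds here are illustrative placeholders, not a real API.

def generate_draft(prompt: str) -> str:
    # Placeholder: in practice this would call the primary LLM.
    return f"Draft answer to: {prompt}"

def guardian_score(text: str) -> float:
    # Placeholder: in practice a small guardian SLM would return a
    # 0-1 risk score for bias, policy violations or unsupported claims.
    return 0.05

RISK_THRESHOLD = 0.2  # assumed tolerance; tune per use case

def answer(prompt: str) -> str:
    """Return the draft only if the guardian judges it low-risk."""
    draft = generate_draft(prompt)
    if guardian_score(draft) > RISK_THRESHOLD:
        # Block or escalate rather than returning a risky response.
        return "Response withheld for human review."
    return draft

print(answer("Summarise our Q3 loan-default figures."))
```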
Business agility
Enterprises need the flexibility of hybrid cloud to run AI models across public cloud and in-house infrastructure. However, larger LLM deployments can become unwieldy in hybrid cloud environments. Organisations need to take note of which models offer greater portability and integration capabilities, and which will present scaling challenges in terms of resource usage, locality and cost.
Model compression and efficient inference with vLLM and LLM Compressor (open source projects focussed on delivering faster and more efficient inference for gen AI models) offer one way to reduce costs and resource usage while accelerating model response times.
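As a concrete illustration, here is a minimal offline-inference sketch with vLLM; it assumes vLLM is installed with GPU support, and the model ID is an interchangeable example rather than a recommendation.

```python
# Minimal offline-inference sketch with vLLM. Assumes `pip install vllm`,
# a GPU with enough memory, and access to the model weights.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")  # swap in your model

params = SamplingParams(temperature=0.2, max_tokens=128)
outputs = llm.generate(
    ["Summarise the key risks of using public LLM services in healthcare."],
    params,
)
print(outputs[0].outputs[0].text)
```

LLM Compressor is designed to produce quantised checkpoints that vLLM can serve in the same way, trading a small amount of accuracy for lower memory use and faster responses.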
Most production AI systems take advantage of multiple AI models, combining traditional predictive models with generative AI models. Small language models deliver distinct advantages in this situation: smaller, task-focussed models can work standalone or in concert with other SLMs and LLMs, fulfilling complementary roles. The approach is similar to the microservice architectures we have adopted for scaling modern enterprise applications.
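As a hypothetical illustration of that pattern, the sketch below routes each request to the model best suited to it. The registry entries and the intent classifier are stand-ins for whatever model services an organisation actually runs.

```python
# Hypothetical sketch of composing task-focused models microservice-style:
# a lightweight router sends each request to the model suited to the task.
# The registry and classify_intent() are illustrative stand-ins.

MODEL_REGISTRY = {
    "summarise":  "slm-summariser",   # small model tuned for summaries
    "classify":   "slm-classifier",   # small model tuned for labelling
    "open_ended": "general-llm",      # fall back to a larger model
}

def classify_intent(request: str) -> str:
    # Placeholder: a tiny SLM or rules engine would decide this in practice.
    text = request.lower()
    if text.startswith("summarise"):
        return "summarise"
    if text.startswith("label"):
        return "classify"
    return "open_ended"

def route(request: str) -> str:
    """Return the name of the model service that should handle the request."""
    return MODEL_REGISTRY[classify_intent(request)]

print(route("Summarise this quarterly report."))  # -> slm-summariser
```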
SLMs are compact, efficient alternatives to larger LLMs, designed to deliver strong performance on specific tasks while requiring significantly fewer computational resources. For instance, a 10-billion parameter SLM can be optimised for enterprise applications at a fraction of the cost and complexity of a much larger 400-billion parameter LLM. SLMs can also make fuller use of available hardware, reducing power consumption and heat generation and lowering operational costs.
SLMs prioritise what truly matters for enterprises. They enable organisations to integrate AI with private, secured datasets, reducing exposure to third-party risks and ensuring compliance with industry regulations. Purpose-built for task- and domain-specific applications, ranging from financial fraud detection to healthcare diagnostics, SLMs can be fine-tuned for greater precision while maintaining efficiency.
Their smaller footprint also makes them more flexible and easier to deploy in hybrid cloud environments, allowing businesses to run AI where it delivers the most value — whether on-premises, in private clouds or at the edge.
Furthermore, adopting SLMs doesn’t require a massive technological overhaul.
It’s important to begin rollouts by defining a model’s purpose. This can help reveal where AI adoption makes sense, and where it does not. Focused use cases where SLMs excel include automating customer support, enhancing fraud detection and improving operational efficiency through fine-tuned insights. Once a model’s use case has been defined, SLMs like IBM’s Granite models can be adopted strategically, as sketched below.
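As a minimal starting point, this sketch loads a Granite model with the Hugging Face Transformers library. The exact model ID is an assumption on our part; check the ibm-granite organisation on Hugging Face for current releases.

```python
# Minimal sketch of trying an IBM Granite SLM locally. Assumes
# `pip install transformers torch accelerate` and that the model ID
# below exists; check the ibm-granite org on Hugging Face for the
# current release names.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-3.1-8b-instruct"  # assumed model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Classify this support ticket as billing, technical or other: ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```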
Top considerations for deployment include:
Optimising SLMs for security and compliance
Training SLMs on the enterprise’s private datasets can support a secured and controlled deployment. Red Hat’s AI security approach emphasises the value of using open source AI models to enhance trust and transparency, helping enterprises retain control over AI decision-making.
Hybrid cloud scalability and AI innovation
A recent Red Hat survey found that preparing for AI adoption is a top-three driver for cloud growth across Europe this year. Given today’s need for portability and consistent experience in any cloud environment, it is highly advantageous to use a hybrid cloud application platform that can support AI-based apps as ‘just another workload’. Some platforms include native capabilities for AI, like data acquisition and preparation, model training and fine-tuning, model serving and model monitoring. Red Hat OpenShift AI, for example, brings data scientists, engineers and app developers together in one place to create and deliver AI-enabled applications at scale across hybrid cloud environments.
This combination of open source AI and a hybrid cloud platform can be extremely powerful. For example, consulting firm Guidehouse and the US Department of Veterans Affairs collaborated with Red Hat to help address the issue of veteran deaths by suicide in the United States. Guidehouse built an AI solution on top of Red Hat OpenShift, using containers to quickly deploy and snap together models that could be published back into veterans’ electronic health record systems. Over in Türkiye, DenizBank adopted Red Hat OpenShift AI to provide the bank’s data scientists with greater autonomy and more consistent standards. This enables the bank to create AI models that help identify loans for customers or potential fraud.
By focusing on right-sized AI, enterprises can tailor solutions to their specific business needs, enabling compliance, cost-effectiveness and seamless integration into hybrid cloud environments. The future of enterprise AI isn’t solely about scaling up; it’s about scaling intelligently — implementing AI solutions that are efficient, secure, and sustainable within the enterprise ecosystem.