Debunking misconceptions about LLMs in data and analytics

Posted on 30th April 2024

Written by Antony Heljula, Director of Tech & Data Innovation, TPXimpact

I’m a big advocate for Generative AI (GenAI) and Large Language Models (LLMs) and their potential benefits. But with all the buzz, it’s easy to get caught up and misunderstand what these technologies can really do, especially in the world of data and analytics.

There’s certainly been some overselling going on, so it’s worth knowing what the common misconceptions are, and why people remain key to GenAI and LLMs producing the results we need.

“The copilot will sort it”

More and more software vendors are integrating copilots into their data and analytics platforms to turbocharge development, smooth out code migration, and dish out automated insights. This is all great, but these tools do not provide a one-size-fits-all solution and it’s important to remember that. It’s crucial to grasp their limitations before rolling them out to users and understand that you’ll still need developers in the trenches crafting applications.

Copilots can do a range of really helpful things, such as automatically building reports with built-in insights and anomaly detection. But while features like this are useful, the reality is you can’t just sit back and trust that these reports are flawless. You’ll need to double-check that all the filters and definitions are spot-on before you can rely on the numbers. This means every report churned out by copilots will need a human expert’s stamp of approval before it’s rolled out.

“The LLMs can do the maths for us”

LLMs are built on neural networks, which are loosely modelled on the astonishing capabilities of the human brain. But as amazing and as powerful as our brains are, we’re not calculators, and neither are LLMs.

Just like us, LLMs are prone to mathematical slip-ups, so banking on them for 100% accuracy is a big gamble. They do shine when it comes to handling basic number crunching on smaller data sets, like your run-of-the-mill sums and averages. But if you’re thinking they can tackle complex mathematical tasks like forecasting or intricate what-if analyses, you’ll be left disappointed.

If we want LLMs to work effectively with our data, it’s wise to have people spoon-feed them pre-summarised and pre-calculated information. That way, we minimise the chances of any mistakes occurring.
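As a rough illustration of what “pre-summarised and pre-calculated” could look like in practice, the sketch below does the arithmetic in ordinary code and only hands the finished figures to the model as text. The sample orders, the function names and the prompt wording are all assumptions made up for this example, and the step of actually sending the prompt to an LLM is deliberately left out.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample data; in practice this would come from a database query.
orders = [
    {"region": "Europe", "value": 120.0},
    {"region": "Europe", "value": 80.0},
    {"region": "Asia", "value": 200.0},
]


def summarise_by_region(rows):
    """Calculate order count, total and average per region in code, not in the LLM."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["region"]].append(row["value"])
    return {
        region: {
            "orders": len(values),
            "total": sum(values),
            "average": mean(values),
        }
        for region, values in grouped.items()
    }


def build_prompt(summary):
    """Turn the pre-calculated figures into text for the LLM to comment on."""
    lines = [
        f"{region}: order count {s['orders']}, "
        f"total {s['total']:.2f}, average {s['average']:.2f}"
        for region, s in sorted(summary.items())
    ]
    instruction = "Write a short commentary on these figures. Do not recalculate them."
    return instruction + "\n" + "\n".join(lines)


summary = summarise_by_region(orders)
prompt = build_prompt(summary)
print(prompt)
```

The point of the split is that the LLM only ever sees numbers that have already been calculated and checked, so its job is limited to wording the commentary.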

“We can move on from data analysts”

LLMs are becoming increasingly popular as a way to provide commentary on our data. This is because they excel at condensing information into neat summaries and even spotlighting interesting points and outliers.

However, they’re only as good as the data we feed them, and the amount of information we can share is relatively modest compared to most databases.

At the same time, if we ask LLMs to produce commentary that draws on their base model, such as the general internet knowledge they were trained on, we risk hallucinations where the platform makes up its own story.

We also can’t yet rely on them to be experts in our fields. They won’t always have the insider knowledge or grasp all the ins and outs that matter when explaining our data unless a human analyst steps in to fill in the blanks.

In summary, while LLMs can help us tell a story with our data, they’re not a one-stop shop and we’ll still need a trusty data analyst to manage and oversee this process.

“We can get rid of our databases”

Theoretically, we should be able to place an LLM on top of all our data and ask it questions, removing the need to write SQL queries. For those one-off inquiries, this can work well. Need to know the status of order 12345? LLMs should be able to answer that with minimal effort.

Challenges arise, though, because these solutions can only process small amounts of data at a time. So while they can handle questions about individual records, throw in a more complex query like “What’s my total order value?” and they hit roadblocks. They would need to scan through loads of records to find that answer, which would exceed the LLM’s input limits. We would still get a result, but it would probably be incorrect.

They would also not know how to calculate and apply precise business definitions that often exist within data models or be able to process complex relationships.
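To make the contrast concrete, here is a minimal sketch using Python’s built-in sqlite3 module and an in-memory table. The table layout, the sample rows and the business rule that cancelled orders do not count towards total order value are all assumptions invented for this example; the point is only that the database scans every row and applies the definition exactly, however many records there are.

```python
import sqlite3

# In-memory database with a hypothetical orders table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO orders (order_id, status, value) VALUES (?, ?, ?)",
    [
        (12345, "shipped", 120.0),
        (12346, "cancelled", 80.0),
        (12347, "shipped", 200.0),
    ],
)

# One-off lookup: the kind of single-record question an LLM front end copes with.
status = conn.execute(
    "SELECT status FROM orders WHERE order_id = ?", (12345,)
).fetchone()[0]

# Aggregate question: the database reads every row and applies the
# (assumed) business definition that cancelled orders are excluded.
total_order_value = conn.execute(
    "SELECT SUM(value) FROM orders WHERE status != 'cancelled'"
).fetchone()[0]

print(status, total_order_value)
conn.close()
```

With three rows the difference looks trivial, but the same query works unchanged on millions of rows, which is exactly where an LLM reading records through its input window would fall over.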

“Generative AI can generate our charts”

GenAI can create sample code, but this doesn’t mean we can rely on it to accurately generate the code that produces a chart. These platforms can only give us a starting point, with no guarantee that it will run or plot the data correctly.

It’s the same with images. GenAI can create imaginative images like a cat driving a car, but can’t generate something specific like a bar chart with 15 different bars and labels. For example, if you asked a GenAI platform to produce an image containing a company’s logo, it would generate something that looks similar to that organisation’s branding, but it wouldn’t be a perfect copy.

When it comes to data and analytics, these tools are text wizards, not miracle workers. You’ll still need the human touch to turn their output into something truly useful.

“LLMs can fix all our data quality issues”

LLMs can help us manage issues around data quality. Take “Eurpe” for instance; they’ll spot that typo and correct it with minimal fuss.

However, LLMs aren’t well suited to dealing with large volumes and are not subject matter experts who can natively understand all the nuances within your information.

Lastly, these tools are trained on what’s out there in the digital ether, so they’re not equipped to handle what doesn’t exist or the unknown. They will struggle to identify incomplete data.
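For the gaps described above, plain rule-based checks are one option that scales to large volumes and can flag what is missing. The sketch below is not an LLM technique: it swaps in fuzzy string matching from Python’s standard difflib module to correct the “Eurpe” typo against a reference list, plus a simple completeness check. The list of valid regions, the sample records, the required fields and the 0.8 similarity cutoff are all assumptions chosen for this example.

```python
from difflib import get_close_matches

# Assumed reference list of valid values; a real one would come from the data model.
VALID_REGIONS = ["Europe", "Asia", "Africa", "Americas", "Oceania"]

# Hypothetical records with one typo and two incomplete rows.
records = [
    {"id": 1, "region": "Eurpe", "value": 120.0},
    {"id": 2, "region": "Asia", "value": None},
    {"id": 3, "region": "", "value": 50.0},
]


def correct_region(name):
    """Return the closest valid region, or None if nothing is similar enough."""
    matches = get_close_matches(name, VALID_REGIONS, n=1, cutoff=0.8)
    return matches[0] if matches else None


def find_incomplete(rows, required=("region", "value")):
    """Return the ids of rows where any required field is missing or empty."""
    return [
        row["id"]
        for row in rows
        if any(row.get(field) in (None, "") for field in required)
    ]


print(correct_region("Eurpe"))
print(find_incomplete(records))
```

Checks like these still rely on a person to decide what counts as a valid value or a required field, which is where the subject matter expertise comes in.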

People and technology in harmony

GenAI and LLMs already have many impressive capabilities that can work wonders when used wisely. But it’s important to remember they can’t replace the vital jobs that developers, analysts, databases, and BI tools do. People, with their expertise and insights, remain the unsung heroes ensuring these technologies hit the mark and drive organisational success. It’s a team effort, with humans and AI working hand in hand to unlock the full potential of data and analytics.