Learning from a day in the life of a data scientist

Written by Matthew O'Kane, VP, EMEA Head of AI & Analytics, Cognizant

Here’s how organisations can reduce the friction data scientists face in developing and deploying models into production – and realise value more quickly.

A bank turned to one of its data scientists to predict the risk of customers missing their next collections payment, with the goal of reducing bad debt. The data scientist was fortunate enough to have access to a comprehensive data mart containing data across many dimensions – including customers, product holdings, transactions and collections activity.

He started by building a snapshot of every customer at a certain stage in collections and an indicator of whether they made a payment in the next month. He added many features (curated variables used in machine learning algorithms) to the dataset, building code to combine, aggregate and transform the raw data to be ready for model development.
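As a rough illustration of that snapshot step, the sketch below builds one row per customer, with features aggregated from before a snapshot date and an indicator taken from the month after it. It is a minimal sketch, not the original project's code: the payment records, the feature names and the fixed snapshot date (rather than a stage in collections) are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical raw records: (customer_id, payment_date, amount)
payments = [
    ("c1", date(2024, 1, 5), 100.0),
    ("c1", date(2024, 2, 20), 50.0),
    ("c2", date(2024, 1, 15), 80.0),
    ("c2", date(2024, 4, 10), 80.0),
]

def build_snapshot(payments, snapshot_date, lookback_days=90, outcome_days=30):
    """Return one row per customer: features from before the snapshot
    date, and a label from the outcome window that follows it."""
    window_start = snapshot_date - timedelta(days=lookback_days)
    outcome_end = snapshot_date + timedelta(days=outcome_days)
    rows = defaultdict(
        lambda: {"n_payments": 0, "total_paid": 0.0, "paid_next_month": 0}
    )
    for customer_id, paid_on, amount in payments:
        row = rows[customer_id]
        if window_start <= paid_on < snapshot_date:
            # Feature window: only data known at the snapshot date
            row["n_payments"] += 1
            row["total_paid"] += amount
        elif snapshot_date <= paid_on < outcome_end:
            # Outcome window: used for the label only, never for features
            row["paid_next_month"] = 1
    return dict(rows)

snapshot = build_snapshot(payments, snapshot_date=date(2024, 2, 15))
for customer_id, row in sorted(snapshot.items()):
    print(customer_id, row)
```

Keeping the feature window strictly before the snapshot date and the outcome window strictly after it is what stops information from the future leaking into the features.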

Once the dataset was complete, the data scientist tried multiple machine learning algorithms on the data, using a holdout set to check the predictive power of each resulting model. He decided on the best-performing model and created the necessary deployment code.
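The holdout check can be sketched in a few lines. The example below is illustrative only: the data is synthetic, the two candidate "models" (a majority-class baseline and a one-variable threshold rule) stand in for real machine learning algorithms, and the split is deterministic so the result is reproducible, where a real project would normally split at random.

```python
def make_rows(n=40):
    """Synthetic rows (x, y): x = recent missed payments, y = 1 if paid."""
    rows = []
    for i in range(n):
        x = i % 5
        y = 1 if x < 2 else 0
        if i % 7 == 0:  # inject some label noise
            y = 1 - y
        rows.append((x, y))
    return rows

rows = make_rows()
# Every fourth row is held out and never used for fitting
holdout = [r for i, r in enumerate(rows) if i % 4 == 0]
train = [r for i, r in enumerate(rows) if i % 4 != 0]

def accuracy(predict, data):
    """Share of rows where the prediction matches the label."""
    return sum(predict(x) == y for x, y in data) / len(data)

def fit_majority(train):
    """Baseline: always predict the most common training label."""
    majority = int(sum(y for _, y in train) * 2 >= len(train))
    return lambda x: majority

def fit_threshold_rule(train):
    """Pick the threshold t that maximises training accuracy
    for the rule 'predict payment if x < t'."""
    best_t = max(
        range(0, 6),
        key=lambda t: accuracy(lambda x: int(x < t), train),
    )
    return lambda x: int(x < best_t)

candidates = {
    "majority_baseline": fit_majority,
    "missed_payment_rule": fit_threshold_rule,
}
# Fit on the training rows only, then score on the untouched holdout
scores = {name: accuracy(fit(train), holdout) for name, fit in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

Because the holdout rows play no part in fitting, the comparison reflects how each candidate behaves on data it has not seen.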

To prepare for deployment, he worked with a data engineer to rewrite his chosen features for the model into a production data pipeline. He had to drop a few features in the model and rebuild it when they realised some data wouldn’t be available in the batch production process.

Finally, with the model ready for production, he worked with his business stakeholders in collections on a suitable action plan to use the model outputs. His colleagues were impressed by how well the model could predict customers who would make or miss their next payment. One asked him, “What should we do with customers who are very likely to make their next payment? Do we need to call them at all? And what about customers with a low probability of making their next payment? Should we call them or are they a lost cause?”

The data scientist wasn’t sure how to answer these questions – so they agreed on some simple rules to apply on top of the model and proceeded to change the business process. Overall, the end-to-end process took a few months, and the data scientist then moved on to his next project.

Reducing Friction for Data Scientists

I’m sure the above story sounds like a typical model development and deployment project in many organisations today. What may surprise you is that the data scientist was me – almost two decades ago!

Despite the progress made in machine learning and AI since that time, data scientists still face the same challenges in developing and deploying models into production – even while the number of use cases for machine learning across organisations grows exponentially and talent is in short supply. Only by reducing the friction that data scientists face will organisations realise more business value in a shorter period of time.

The story highlights three key areas of friction in a typical machine learning project:

Feature engineering: The process of manually combining, aggregating and transforming data ready for machine learning.
Organisations are now beginning to recognise how much of the effort in model development goes into feature engineering. A company-wide feature store reduces duplicated effort and makes it easier to discover new features. One bank created a feature store to power its fraud and credit risk modelling. The resulting store cut total model development time to two weeks and enabled full data lineage for regulatory reporting. If your organisation could build and deploy AI in two weeks, how much business impact could you achieve in a year?

Data and model deployment: The process of creating data and model pipelines that enable real-time capabilities.
Models are typically constrained by the data available at the point of scoring. Many machine learning use cases require data that is simply not available in a timely manner from batch environments such as Hadoop data lakes. Understanding what the customer has done in the last few hours, for example, will better fuel next-best action models and better customer intent predictors. Uber’s Michelangelo platform has helped the company deploy models that use the number of restaurant orders in the last 30 minutes to predict when your order will arrive. How many use cases in your organisation could benefit from data received in the past 24 hours?
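A feature such as "orders in the last 30 minutes" is a count over a sliding time window, kept up to date as events arrive. The sketch below shows one simple way to maintain it in memory. It is not how Michelangelo or any other platform implements this: the class name and interface are hypothetical, and it assumes events are recorded in time order.

```python
from collections import deque
from datetime import datetime, timedelta

class RecentEventCounter:
    """Counts events seen within a trailing time window.
    Assumes record() is called with non-decreasing timestamps."""

    def __init__(self, window=timedelta(minutes=30)):
        self.window = window
        self._events = deque()

    def record(self, timestamp):
        """Register one event, for example one restaurant order."""
        self._events.append(timestamp)

    def count(self, now):
        """Drop events that have aged out of the window, then count the rest."""
        cutoff = now - self.window
        while self._events and self._events[0] <= cutoff:
            self._events.popleft()
        return len(self._events)

start = datetime(2024, 1, 1, 12, 0)
orders = RecentEventCounter()
for minutes in (0, 10, 25):
    orders.record(start + timedelta(minutes=minutes))

print(orders.count(start + timedelta(minutes=29)))  # all three orders in window
print(orders.count(start + timedelta(minutes=35)))  # the 12:00 order has expired
```

A batch pipeline that refreshes overnight cannot supply this value at scoring time, which is why features like this depend on streaming or near-real-time data.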

Business outcome optimisation: Connecting models to the key performance indicators of the business and optimising decisions.
AI will only be useful if it directly improves business outcomes, but most machine learning ends when a prediction is made. To directly improve business outcomes, we need to build prescriptive models. These models, given a context and possible set of actions, estimate the impact of each action on key business outcomes. At Cognizant, we’re helping clients tackle these types of problems with Evolutionary AI.

In our collections example, this would entail understanding the impact of each customer decision on both customer experience and payments. A prescriptive model could then be deployed to optimise and improve these business objectives over time. How many use cases in your organisation are actually prescriptive in nature?
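To make the idea concrete, here is a minimal sketch of a prescriptive decision for the collections example. It does not represent Evolutionary AI or any real model: the actions, the uplift and experience figures, the assumption that contact helps most when the outcome is uncertain, and the weight placed on customer experience are all invented for illustration. In practice the estimates would be learned from data.

```python
from dataclasses import dataclass

ACTIONS = ("no_contact", "sms_reminder", "phone_call")

# Hypothetical effect sizes, invented for illustration only
ASSUMED_UPLIFT = {"no_contact": 0.00, "sms_reminder": 0.05, "phone_call": 0.15}
ASSUMED_EXPERIENCE = {"no_contact": 1.0, "sms_reminder": 0.9, "phone_call": 0.6}

@dataclass(frozen=True)
class Outcome:
    expected_payment: float  # expected amount recovered
    experience_score: float  # 0..1 proxy for customer experience

def estimate_outcome(pay_probability, balance, action):
    """Stand-in for a learned model of each action's effect.
    Assumes contact helps most when the outcome is uncertain."""
    headroom = 4 * pay_probability * (1 - pay_probability)
    p_after = min(1.0, pay_probability + ASSUMED_UPLIFT[action] * headroom)
    return Outcome(p_after * balance, ASSUMED_EXPERIENCE[action])

def best_action(pay_probability, balance, experience_weight=50.0):
    """Choose the action with the highest combined value across
    both business outcomes: payments and customer experience."""
    def value(action):
        outcome = estimate_outcome(pay_probability, balance, action)
        return outcome.expected_payment + experience_weight * outcome.experience_score
    return max(ACTIONS, key=value)

for p in (0.95, 0.50, 0.02):
    print(p, best_action(pay_probability=p, balance=500.0))
```

Under these invented numbers, the sketch leaves alone both the customers who are very likely to pay and those very unlikely to pay, and reserves calls for the uncertain middle. Whether that holds in reality is exactly what a prescriptive model would need to learn from the outcomes of past actions.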

We are lucky to live in an era when AI can truly change the world around us. But the only way to enable this exciting future is by reducing the friction in developing and deploying AI across the organisation. Now is the time to help data scientists scale AI and transform businesses.


Originally published here.
