Why there are no shortcuts to machine learning

As long as companies understand that good data science takes time in an enterprise, and give their data people room to learn and grow, they won’t need shortcuts

Big data remains a game for the 1 percent. Or the 15 percent, as new O’Reilly survey data suggests. According to the survey, most enterprises (85 percent) still haven’t cracked the code on AI and machine learning. A mere 15 percent of enterprises rank as “sophisticated,” having run models in production for more than five years. Importantly, these same companies tend to give more time and attention to critical areas like model bias and data privacy, whereas comparative newbies are still trying to find the On button.

Unfortunately, for those companies hoping to close the data science gap with automated shortcuts like Google’s AutoML or through paid consultants, the answer seems to be that getting data science right takes time. There are no shortcuts.

Smart companies focus on the deep end of data

First, it’s important to note that O’Reilly’s survey data comes from a self-selected bunch: people who have attended O’Reilly events or otherwise engaged with the company through webinars or other means. Such people have a proactive interest in data science, even if (as the survey data shows) most aren’t really doing much with it. For examining what deep big data experience looks like, however, it’s a useful demographic, with those dubbed “sophisticated” having run models in production for more than five years.

One interesting data point that emerges from the survey is how these people talk about themselves. Companies with extensive data experience call a data science spade a data science spade. Those stuck in 1990s “data mining” mindsets prefer “analyst,” as the figure shows.

[Figure 1 from the O’Reilly machine learning survey: how respondents describe their data roles. Source: O’Reilly]

Whatever companies choose to call their data professionals, the more experienced the enterprise with AI and machine learning, the more likely they are to rely on internal data science teams to build their models, as the figure shows.

[Figure 2 from the O’Reilly machine learning survey: who builds the models, by level of experience. Source: O’Reilly]

Virtually no one is looking to cloud machine learning services (at least, not yet), while companies with less than two years of production experience tend to rely on external consultants to build their machine learning models. This may feel like an opportunity for such companies to get the benefits of data science without investing in people, but that’s a fool’s-gold approach.

The more sophisticated a company is with data, the more its data science team both builds the models and evaluates the key metrics for a project’s success. Across all companies, product managers tend to define project success metrics (36 percent), with executive management (29 percent) and data science teams (21 percent) also involved.

But for experienced companies, while product managers still get cited most (34 percent), data science leads (27 percent) are roughly equal with executive staff (28 percent).

The least experienced companies tend to look to executive management (31 percent) and rarely to their data science leads (16 percent). That wouldn’t be a problem but for the fact that those data science teams are best positioned to figure out how to use the data and to measure its success.

Too often, it’s the blind leading the blind

This reliance on executive management to drive data science calls to mind the surveys that show executives calling themselves data-driven but then ignoring data that doesn't support decisions prompted by gut instinct (62 percent admit to doing this).

Enterprises that lack big-data savvy seem to pay lip service to data without understanding the nuances of effective data science. They simply lack the requisite experience to ensure that they’re gleaning meaningful, unbiased insights from their data.

More sophisticated enterprises grasp what Gartner’s Andrew White means when he talks about understanding machine learning models and how that can breed trust in the results:

What is new [with AI] is that AI is able to redraw the line — what was thought of as too complex and not routine can now be exploited with AI. AI can (so the promise goes) cope with more complex and more cognitive work than previous technologies.
This new reality will only survive the light of day if the outcome of the automated work left to AI makes sense. If the new-fangled black box takes decisions and changes outcomes that humans don’t understand, those humans will likely turn off the box. So, understanding the decision to some degree is very important.
However, understanding or interpreting a decision is quite different to understanding how the algorithm works. A human should be able to grasp the principles of inputs, choice, weights, and results, even if an algorithm combines many of these to an extent that we cannot even prove the process. If the gap between outcome and approximate inputs is too varied, trust in the algorithm will likely fail — that’s just human nature.
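
To make White’s distinction concrete, here is a minimal sketch in Python (an illustration under assumed tooling, not something the article or Gartner provides, using scikit-learn and a bundled sample dataset). A linear model exposes inputs, weights, and results a person can read directly; an ensemble of hundreds of trees offers only aggregate importances, so trust in it has to rest on whether its outcomes make sense.

```python
# Sketch: interpreting a decision vs. understanding the algorithm.
# Assumes scikit-learn; the dataset and models are illustrative choices.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X, y = data.data, data.target

# A linear model: a human can read the inputs, weights, and result directly.
linear = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)
weights = dict(zip(data.feature_names, linear[-1].coef_[0]))

# A forest of 300 trees: only aggregate feature importances are visible;
# nobody follows the full decision path, so trust rests on the outcomes.
forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
importances = dict(zip(data.feature_names, forest.feature_importances_))

print(sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)[:5])
print(sorted(importances.items(), key=lambda kv: kv[1], reverse=True)[:5])
```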

Getting to this level of understanding can’t be bought for the price of a consultant. Nor does it come ready-made in the cloud. Tools like Google’s AutoML purport to “enable developers with limited machine learning expertise to train high-quality models specific to their business needs.” This sounds great, but so many of the benefits that derive from data science require experience with data science. It’s not just a matter of tuning a model, but rather of knowing how to do so, which is born in the trial and error of experience.
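
As a rough illustration of that last point, here is a hedged sketch of model tuning with scikit-learn’s GridSearchCV (the dataset, metric, and parameter grid are assumptions made for the example, not anything the article prescribes). The mechanical sweep is exactly what AutoML-style tools automate; the judgment calls encoded in the arguments are where experience shows up.

```python
# A minimal model-tuning sketch. The grid search itself is easy to automate;
# choosing the metric, the validation scheme, and the candidate parameters
# is where data science experience matters. All names here are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [4, 8, None]},
    scoring="roc_auc",  # picking the right metric is a human decision
    cv=5,               # so is a validation scheme that reflects production
)
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))
```

Swap in a leaked feature or the wrong scoring metric and the search will still happily report a “best” model; spotting that the number is meaningless is the part no tool buys for you.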

Additionally, doing data science right requires a cultural mindset that, again, comes with experience. There are no shortcuts. In practice, this means that those companies that invested early in data science should find themselves ahead of peers that have not—a competitive differentiation that could well persist.

For those companies hoping to catch up, Gartner analyst Svetlana Sicular’s classic advice still rings true: “Companies should look within. Organizations already have people who know their own data better than mystical data scientists.” As long as companies understand that good data science takes time in an enterprise, and give these people room to learn and grow, they won’t need shortcuts.

This story, "Why there are no shortcuts to machine learning," was originally published by InfoWorld.