Evaluating the Opportunity for Embedded AI in Data Productivity Tools
AI-for-AI is gaining attention - but is the opportunity for embedding AI in data productivity tools overlooked? Let's do a gut check against the views of industry experts.
In a recent article by the BCG Henderson Institute, How to Win with Artificial
Intelligence, the authors posed the question:
Among the many companies investing in artificial intelligence, there is one surprisingly
exclusive group: companies that generate value from AI. And right now, at least, the odds
against gaining admission are sobering. According to a survey of more than 2,500
executives - conducted for a new report by MIT Sloan Management Review, BCG Gamma,
and BCG Henderson Institute - seven out of ten companies report minimal or no gains so
far from their AI initiatives. Why do some efforts succeed, but many more fail?
The new report by MIT Sloan Management Review, BCG Gamma, and BCG Henderson
Institute offers six suggestions to improve the rate at which organizations can succeed
with AI:
1. Integrate AI strategy with business strategy
2. Prioritize revenue growth over cost reduction
3. Take on large projects with big impact - even if they're risky
4. Align the production of AI with the consumption of AI
5. Treat AI as a major business transformation effort
6. Invest in AI talent, data governance, and process change
That's all pretty obvious. The only one that struck me was #3. Not that it is so novel, but rather that the authors take a stand on what is still a hotly debated issue: aim high with big, risky projects, versus start small. I wouldn't consider #3 a given. It depends on too many factors.
One area where AI has quietly made verifiable and useful progress is not applications themselves. We see rapidly expanding use of AI embedded in processes at the other end of the application pipeline: managing, interpreting, and provisioning information.
The point is, AI for digital transformation and creating gee-whiz customer-facing apps is a lot harder than engineering embedded AI into tools. It's a win-win because failure to provide clean and AI-ready data is the most often heard reason for lack of progress. Getting the data ready, at the scale and cadence needed to push AI through, has become an inhuman task. Using AI to get prepared for AI; it has a beautiful sound to it.
Scanning an article by Gil Press on Forbes.com, 120 AI Predictions for 2021, I'm getting some support for this idea. (Note: there are only sixty comments in the article; it is Part 1, I suppose.) I include these as examples; this is not a complete industry survey. Here are some highlights from the respondents, and my reactions.
Joe Hellerstein, Co-Founder and CSO of Trifacta. Joe is a Professor of Computer Science
at the University of California, Berkeley, whose work focuses on data-centric systems and
the way they drive computing. He is an ACM Fellow, an Alfred P. Sloan Research Fellow,
and the recipient of two ACM-SIGMOD "Test of Time" awards for his research.
Hellerstein sees the role of AI behind the data:
Expect also to see increased investment in data preparation—an integral component in
any data project that is still often regarded as the biggest bottleneck for many—driving
improvement in data quality and relieve IT from the pressures of preparing data.
Haoyuan Li, Founder and CTO, Alluxio:
What used to be statistical models now has converged with computer science and has
become AI and machine learning. So data, analytics, and AI teams can't be siloed from
one another any longer. They need to collaborate and work together to derive value from
the same data that they all use. In 2021, we'll see more organizations building dedicated
teams around the data stack.
MyPOV: And they’ll be using tools with AI-assisted development. These dedicated teams will need to drastically reduce the amount of code they write for data management, version control, and one-click deployment of algorithms (or tensors) to the cloud. AI/ML doesn't get enough attention for what it can do for the productivity of AI engineers.
Philippe Vincent, CEO, Virtana:
…enterprises … will require AIOps-based solutions that integrate infrastructure monitoring,
workload automation, and capacity planning into one platform. As such, vendors who fail
to adopt an AIOps model of service and enterprises who fail to invest in end-to-end
infrastructure visibility will be unable to deliver on customer requirements and
performance SLAs
MyPOV: Good thought, but it doesn’t really talk about AI.
Dan Sommer, Senior Director, Global Market Intelligence Lead, Qlik:
It's easier now than ever to do in-database indexing and analytics, and we have tools to
make sure data can be moved to the right place. The mysticism of data is gone:
consolidation and the rapid demise of Hadoop distributors in 2019 is a signal of this shift.
The next focus area will be very distributed, or 'wide data.' Data formats are becoming
more varied and fragmented, and as a result different types of databases suitable for
various flavors of data have more than doubled.
MyPOV: No human will be able to organize this smorgasbord without tools with
embedded machine learning, deep learning, and NLP.
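One small, concrete example of what such tooling might do is infer a column's likely format from value patterns and propose a transform for a human to confirm. The patterns and names below are my own illustration - a real product would learn them statistically rather than ship a hand-written list:

```python
import re
from collections import Counter

# Hypothetical format patterns; a real wrangling tool would learn or
# continually refine these, not hard-code them.
PATTERNS = {
    "date_iso": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
    "date_us":  re.compile(r"^\d{1,2}/\d{1,2}/\d{4}$"),
    "integer":  re.compile(r"^-?\d+$"),
    "currency": re.compile(r"^\$\d[\d,]*(\.\d{2})?$"),
}

def suggest_type(values):
    """Vote each value into the first pattern it matches and return the
    majority type plus its share of the column - a stand-in for the
    statistical inference done before proposing a transform."""
    votes = Counter()
    for v in values:
        for name, pat in PATTERNS.items():
            if pat.match(v.strip()):
                votes[name] += 1
                break
        else:
            votes["text"] += 1
    name, count = votes.most_common(1)[0]
    return name, count / len(values)

print(suggest_type(["2021-01-05", "2021-02-11", "02/14/2021", "n/a"]))
```

A tool built this way would surface the result as a suggestion ("parse this column as an ISO date?") rather than applying it silently - the human nudge.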
Sanjay Srivastava, Chief Digital Officer, Genpact:
We’ll see the rise of Digital Ethics Officers, who will be responsible for implementing
ethical frameworks to make decisions. This includes security, bias, intended use, and built-
in governance
MyPOV: This is the only comment that mentions ethics; otherwise, it has nothing to do with the topic. I just wanted to point out that fifty-nine out of sixty prognosticators overlooked the one thing that is going to be red-hot in 2021.
Yaffa Cohen-Ifrah, CMO and Head of Corporate Communications, Sapiens:
AI enables insurers to better utilize the troves of data at their disposal to benefit from vital
client insights that maximize their services and products. This results in satisfied customers
and a more efficient business.
MyPOV: Actually, it depends on the type of insurer. Long-tail risks, like life insurance, have data in aging applications, with records forty, fifty, or more years old. Utilizing these “troves of data” is very difficult, and ripe for AI solutions that can rationalize a difficult mix of semantics.
Sanjay Jupudi, President, Qentelli:
2021 will see more focus on explainable AI, to reduce any bias in the predictions. Data
scientists will become an integral part of the product teams and work closely with them to
create a data-first approach to app development, instead of focusing on making sense of
data generated by apps
MyPOV: I don’t understand exactly what he means by “data-first,” but I think it alludes to data pipelines informed by AI, so good.
Laurent Bride, CTO and COO, Talend:
Whether being used to automate repetitive tasks (data prep, etc.) or connecting pipelines
through contextual information from you and your peers, AI will begin to infiltrate all
areas of business functions.
MyPOV: I don’t believe anyone else mentioned pipelines. In public presentations, the pharmaceutical company GSK has shown that consolidating all of its clinical trial data involves more than 10,000 pipelines, all orchestrated by StreamSets. I have not heard explicitly that StreamSets uses AI in its product, but I suspect it does.
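Orchestration at that scale rests on a simple core idea: run pipeline steps in dependency order. A toy sketch of that core (purely illustrative - not how StreamSets works, with hypothetical step names; real platforms add scheduling, retries, and monitoring on top):

```python
from graphlib import TopologicalSorter  # Python 3.9+

def run_pipelines(steps, deps):
    """steps: name -> callable taking the results so far;
    deps: name -> set of prerequisite step names.
    Runs every step in a valid topological order."""
    results = {}
    for name in TopologicalSorter(deps).static_order():
        results[name] = steps[name](results)
    return results

# A minimal extract -> transform -> load chain.
steps = {
    "extract":   lambda r: [3, 1, 2],
    "transform": lambda r: sorted(r["extract"]),
    "load":      lambda r: {"rows": r["transform"]},
}
deps = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}
print(run_pipelines(steps, deps))
```

Embedding AI in such an orchestrator would mean learning from run history - predicting failures, suggesting dependencies, tuning schedules - rather than changing this execution core.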
Carl Vause, CEO, Soft Robotics:
With respect to Artificial Intelligence, just because data exists within an organization
doesn’t mean that data is in a usable, transferable format. 2021 is the year that businesses
will begin to understand that their data is not AI-ready, rendering their business processes
inefficient, ineffective or inaccurate.
MyPOV: Not mentioned is the need to get that data usable - demanding AI solutions.
Notably, here are a few examples of technology providers that are infusing their DataOps and/or information integration offerings with AI:
Informatica: Three years ago, Informatica announced a product called CLAIRE, which purported to be a suite of AI capabilities infused into its extensive product platform. It took another two years for me to understand, more or less, what they were doing. CLAIRE is now a standalone product with dedicated engineers, marketing, and management, which provides AI (ML, various types of neural nets, NLP) to support the entire product platform.
Unifi Software (now part of Boomi): Unifi's OneMind AI technology underlies functionality from data prep and data catalog recommendations, to the discovery of similar datasets, to Natural Language Query support. Based on a Knowledge Graph, it employs Recurrent Convolutional Neural Network, Hidden Markov Model, and Gene Sequencing algorithms.
Trifacta: Trifacta describes itself as a “data wrangler” (which so far is not an official technical term, but maybe watch this space). It uses a combination of machine learning and human nudging to query and organize data to produce various insights.
Paxata (now part of DataRobot): Paxata has AI-like capabilities, though they are not explicitly described as an AI engine. It can auto-discover dependent data preparation projects and datasets and automatically create multi-project data flows. It goes without saying that DataRobot, a company conceived as an AI company, will add AI to Paxata’s data integration and catalog capabilities.
Tamr: There are two main places where Tamr uses machine learning: entity consolidation (deduplication) and entity classification. An interesting aspect is that Tamr uses reinforcement learning when there isn’t sufficient training data to build a model initially. Entity classification (record classification) is a multi-step process with ML parts.
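The entity-consolidation idea can be sketched in miniature. Here a plain string-similarity score stands in for learned pairwise features, and a hand-picked threshold stands in for a trained model - a toy illustration of the concept, not Tamr's method:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    # Normalized string similarity in [0, 1]; a stand-in for the richer,
    # learned pairwise features a real system would compute.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def consolidate(records, threshold=0.7):
    """Greedy entity consolidation: merge each record into the first
    cluster whose representative is similar enough, else start a new
    cluster. The 0.7 threshold is arbitrary, chosen for this example."""
    clusters = []  # each cluster is a list of records referring to one entity
    for rec in records:
        for cluster in clusters:
            if similarity(rec, cluster[0]) >= threshold:
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters

dupes = ["Acme Corp", "ACME Corporation", "Globex Inc", "acme corp."]
print(consolidate(dupes))
```

Where this sketch uses one fixed threshold, a production system learns the match decision from labeled pairs - and, per Tamr's approach, can bootstrap that labeling when training data is scarce.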
My take
The bulk of conversations about AI involve people, applications, and automation – the
end of the application train. Opportunities for AI to energize the invisible parts of that train are maturing and will give a boost to the AI success (versus failure) rate.
Maybe a little tangential to my suggestion, Hackernoon opines:
When AI is applied to how we develop applications, it will transform the way we used to
manage the infrastructure. AIOps will replace DevOps, and it will enable your IT
department staff to conduct precise root cause analysis. Additionally, it will make it easy
for you to find useful insights and patterns from massive data set in no time. Large scale
enterprises and cloud vendors will benefit from the convergence of DevOps with AI.
I’d have to ask Irfan Ahmed Khan what happened to DataOps, which seems to have
come and gone in no time, but I suppose AIOps is just smarter DataOps.