This session was recorded in San Francisco on February 5th, 2019 and can be viewed at: https://youtu.be/l1d9he0ARPQ
Bio: Paul C. Zikopoulos is the VP of Cognitive BigData Systems at IBM. He’s an award-winning writer and speaker who has been consulted on the topic of BigData by the popular TV show “60 Minutes,” advises various universities on their graduate analytics programs, and has been named to over a dozen “Experts to Follow” lists in social media. You’ll also find Paul taking a very active role around Women in Technology (including serving as a seated board member for Women 2.0, a global brand for women in tech and entrepreneurship that works to close the gender gaps at tech companies). Paul has written 19 books and over 360 articles on data. He doesn’t think NoSQL is something you put on a resume if you don’t have SQL skills, and he knows JSON isn’t a person in his department. Ultimately, Paul is trying to figure out the world according to Chloë, his daughter, who he notes didn’t come with a handbook and is more complex than the topic of BigData itself, but more fun too. The rest of the bio? It would be BLAH, BLAH, BLAH, so find him on Twitter @BigData_paulz.
2. agricultural revolution
lasted about 8000 years
industrial revolution
then ... 120 years later
this was pretty cool
90 years after this
we put this dude here
and 22 years later
changed things forever
9 years later
3. high visible value
investment is obvious
well understood
gold (high value per byte data) can be near invisible within the dirt (low value per byte data)
Schema First Schema {Need, Read, Never}
4. your data is like a gym membership... it has no value unless you use it.
NETWORK EFFECTS
80% OF THE WORLD’S DATA CAN’T BE GOOGLED
ECONOMIES OF SCALE
9. 1st Epoch
at a cand outsors, whele havise took
i with holdiss, that he has, that,
intener.
Her arathishess of has seated.
”it, as teen, a seremest as inspant at
vind. it wolks.
10.
11. today AI is like this car’s owner…
those that participate are the privileged few
12. make AI a team sport, inviting everyone to participate and democratize it for the many…
13. requires different sets of skills
FINE-TUNE & DEPLOY
“rinse & repeat”
MAINTAIN ACCURACY & EXPLAIN THYSELF
iterate faster and do it again
assisted parameter selection and tuning
~80% of an AI project’s time is spent here
DATA PREPARATION
up and running over a quick lunch
time spent drops from 80% to 30%
extremely long training times curtailing broader proliferation
BUILD, TRAIN, OPTIMIZE
9 days to train a model becomes 4 hours
weeks to months
UP & RUNNING
imagine if everyday users could contribute business domain expertise, help with data preparation, and even build initial models so data science teams could focus on fine-tuning the models
data science skill needed as the hard stuff happens here … help with quick detection of sub-optimal hyperparameter or feature selection, ‘what if’ exploration ... this is where you want data science teams to spend their time
15. Titanic Kaggle Competition
34 min of data analysis
33 min of automated feature engineering & model building
95.7%
1hr 7min
4 hours data preparation
4 hours feature engineering
1hr code dev and data exploration
1hr creating 5 models to find best solution
95.5%
10 hours
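A quick way to check the totals on this slide is to add up the step timings. The sketch below only uses the numbers shown above; the labels "assisted" and "manual" and the helper `fmt` are illustrative names, not taken from the talk, and the pairing of each accuracy figure with each total follows the slide order.

```python
# Step timings as stated on the slide, converted to minutes.
# "assisted" and "manual" are illustrative labels (assumption).
assisted_minutes = 34 + 33               # data analysis + automated feature engineering & model building
manual_minutes = (4 + 4 + 1 + 1) * 60    # data prep + feature engineering + code dev + creating 5 models


def fmt(minutes: int) -> str:
    """Render a minute count in the slide's 'Xhr Ymin' style."""
    hours, mins = divmod(minutes, 60)
    return f"{hours}hr {mins}min"


# How many times longer the ten-hour workflow takes.
time_ratio = manual_minutes / assisted_minutes

# Accuracy difference in percentage points (95.7% vs 95.5%).
accuracy_gap = round(95.7 - 95.5, 1)

print(fmt(assisted_minutes), "vs", fmt(manual_minutes))
print(f"about {time_ratio:.1f}x less time, {accuracy_gap} points apart in accuracy")
```

On these figures the shorter workflow takes roughly a ninth of the time and lands within a fraction of a percentage point on accuracy.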
16. “Double the propensity for our banking customers to accept an offer
pinpoint credit and default risks with greater accuracy than
17. as models go deeper or get boosted, they provide a dramatic increase in accuracy.
that in turn requires much higher computation to train them
18. talent | time | trust
9 days
Shape
Attenuation
Boundary
morphology
54x
What will you do?
Create more models?
More accurate models?
BOTH??
4 hours
4 hours
4 hours
faster data ingest
2.x
faster feature engineering
1.5x
faster machine learning
30x
train more | build more | know more
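The 54x figure follows directly from the two training times on the slide (9 days versus 4 hours). A minimal sketch of that arithmetic, with illustrative variable names that are not from the talk:

```python
from datetime import timedelta

# Training times as stated on the slide.
before = timedelta(days=9)
after = timedelta(hours=4)

# Dividing two timedeltas yields a plain float ratio.
speedup = before / after

# Floor division gives the number of 4-hour training runs
# that fit into the original 9-day window ("create more models?").
runs_in_same_window = before // after

print(f"{speedup:.0f}x faster, {runs_in_same_window} runs in the original window")
```

The same window that used to produce one model could hold 54 training runs, which is the choice the slide poses: more models, more accurate models, or both.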
19. talent | time | trust
train more | build more | know more
forethought: open architecture
afterthought: closed architecture