Odsc machine-learning-guide-v1

Machine Learning Guide
The Open Data Community’s Top 20 Resources

FOREWORD

2018 was another year of intense growth, and intense attention, for the fields of machine learning and artificial intelligence. As technologies and techniques improve across industry and academia, this space draws more investment, but also more scrutiny.

As a result, in 2018 we saw the community of practitioners paying more attention to the impacts of their work and research on the wider world. All corners of the field saw a focus on technical approaches to resolving bias, moving away from operational black boxes, and taking up the ideas of transparency and explainability.

Now is the time to ask and answer these questions. The power of the tools and applications built from the research in the field is clear; we must now consider the world we wish to shape using these tools. They are excellent magnifying glasses, capable of revealing intricate patterns in the information all around us. They can help us better understand the world that we have, and more effectively build the world that we want. But like all tools, they must be handled carefully, or we risk harming ourselves and others. We also risk further reinforcing walls in today’s society that were built during a less informed and less analytical era.

The only way to combat this is to widen the conversation: to equip anyone who wants to learn with an understanding of the fundamental principles and technical knowledge of the field. Only by this sharing of knowledge, this open exchange of ideas, will we be able to make sure these tools truly benefit everyone. The work that was done in 2018, and that will be done in 2019, will define not just the machine learning community but how the world uses these technologies in our day-to-day lives.

In this report, you will find the 10 most popular talks from the 2018 Open Data Science Conferences and the 10 most popular blogs from OpenDataScience.com in 2018. I hope you take the information and inspiration herein and use it to further your journey into machine learning. Please share it widely with anyone you know who wants to be a part of the community of data scientists and engineers that will help shape the future of data science and machine learning.

Katherine Gorman
Executive Producer and Co-host of Talking Machines & Executive Producer, Collective Next

Contents
Top Blogs
Top ODSC Talks
From the Experts
Editor’s Note
What’s Next

Machine Learning Anthology | 2018
TOP 10 OPEN DATA SCIENCE BLOGS
In 2018 alone, we published nearly 400 articles
on data science. Machine learning was a common
and well-received topic in our community. Here
are the 10 most-read machine learning blogs.
Client-side Web Development and Machine Learning,
Caspar Wylie
See how and why machine learning and web development are beginning to work together
successfully, for example through JavaScript.
Read it here.
Machine Learning for Beginners — a How-to
Guide, Spencer Norris
Check out this tutorial to get a general outline of how different popular
machine learning algorithms work, plus some code recipes you can use
if you want to experiment.
Read it here.
How to Define a Machine Learning Problem Like a
Detective, Spencer Norris
What steps do you need to take to identify the best machine learning
problems to ask? Learn how to define a machine learning problem, but
through the eyes of a detective.
Read it here.
Efficient, Simplistic Training Pipelines for GANs in
the Cloud with Paperspace, Spencer Norris
Generative adversarial networks are making waves in the world of machine
learning. Learn about a package that wraps PyTorch implementations for ten
different types of GANs in an easy-to-use interface.
Read it here.
Crash Course: Pool-Based Sampling in Active
Learning, Spencer Norris
Active learning is a class of machine learning problems where labeled data is scarce or
expensive to obtain, so the algorithm chooses which examples should be labeled next. Read
more about one of the most common active learning setups in the field: pool-based sampling.
Read it here.
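As a rough illustration of pool-based sampling, the sketch below uses least-confident uncertainty sampling: at each step the current model queries the unlabeled point whose predicted probability is closest to 0.5. It assumes a toy one-dimensional threshold classifier and a hypothetical `oracle` function standing in for a human labeler; the names `fit_threshold`, `predict_proba`, and `pool_based_sampling` are illustrative and not taken from the article.

```python
import math
from typing import Callable, List, Tuple

Labeled = List[Tuple[float, int]]


def fit_threshold(labeled: Labeled) -> float:
    """Toy 1-D classifier: put the decision threshold midway between the
    largest negative example and the smallest positive example.
    Assumes both classes are present and separable."""
    max_neg = max(x for x, y in labeled if y == 0)
    min_pos = min(x for x, y in labeled if y == 1)
    return (max_neg + min_pos) / 2.0


def predict_proba(x: float, threshold: float, scale: float = 1.0) -> float:
    """P(y = 1 | x) from a logistic curve centred on the threshold."""
    return 1.0 / (1.0 + math.exp(-(x - threshold) / scale))


def pool_based_sampling(
    pool: List[float],
    labeled: Labeled,
    oracle: Callable[[float], int],
    budget: int,
) -> Tuple[float, Labeled]:
    """Spend `budget` label queries on the most uncertain pool points."""
    pool = list(pool)        # copy so the caller's pool is not mutated
    labeled = list(labeled)
    for _ in range(budget):
        if not pool:
            break
        threshold = fit_threshold(labeled)  # retrain on the current labels
        # Least-confident query: predicted probability closest to 0.5
        query = min(pool, key=lambda x: abs(predict_proba(x, threshold) - 0.5))
        pool.remove(query)
        labeled.append((query, oracle(query)))  # ask the hypothetical labeler
    return fit_threshold(labeled), labeled


# Hypothetical oracle: the "true" class boundary sits at 6.3.
def oracle(x: float) -> int:
    return 1 if x > 6.3 else 0


pool = [float(i) for i in range(1, 20)]
seed_labels = [(0.0, 0), (20.0, 1)]
threshold, labeled = pool_based_sampling(pool, seed_labels, oracle, budget=5)
print(threshold, len(labeled))
```

Because every query lands where the current model is least sure, five queries out of a pool of 19 are enough here to pin the boundary between 6 and 7; labeling points at random would usually take more. Real pool-based systems use richer models and other query strategies, but the select, query, retrain loop is the same.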
Understanding the Three Primary Types of Gradient Descent, Daniel Gutierrez
Gain insights into the three primary types of gradient descent, the most commonly used
optimization method deployed in machine learning and deep learning algorithms.
Read it here.
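The three types usually meant here are batch, stochastic, and mini-batch gradient descent, which differ only in how many training examples feed each parameter update. A minimal sketch, assuming a toy linear-regression problem with mean squared error loss; the function name `gradient_descent`, the learning rate, and the epoch count are illustrative choices, not taken from the article.

```python
import random
from typing import List, Optional, Tuple


def gradient_descent(
    data: List[Tuple[float, float]],
    lr: float = 0.1,
    epochs: int = 2000,
    batch_size: Optional[int] = None,
    seed: int = 0,
) -> Tuple[float, float]:
    """Fit y ~ w * x + b by minimising mean squared error.

    batch_size=None -> batch GD (all examples per update)
    batch_size=1    -> stochastic GD (one example per update)
    otherwise       -> mini-batch GD
    """
    n = len(data)
    size = n if batch_size is None else batch_size
    rng = random.Random(seed)
    samples = list(data)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(samples)  # new example order each epoch
        for start in range(0, n, size):
            batch = samples[start:start + size]
            m = len(batch)
            # Gradients of (1/m) * sum((w*x + b - y)**2) with respect to w and b
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / m
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / m
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Noise-free toy data on the line y = 2x + 1
data = [(i / 10, 2 * (i / 10) + 1) for i in range(11)]
for name, bs in [("batch", None), ("stochastic", 1), ("mini-batch", 4)]:
    w, b = gradient_descent(data, batch_size=bs)
    print(f"{name:>10}: w={w:.3f}, b={b:.3f}")
```

All three variants should end up close to w = 2 and b = 1. What changes is the number of updates per epoch (1, 11, and 3 for these 11 examples) and how noisy each update is, which is the usual trade-off between per-step cost and gradient accuracy.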
An Introduction to Reinforcement Learning Concepts, Diego Arenas
This recap of the reinforcement learning workshop from ODSC London 2018 will give you
the key takeaways you need to implement the framework yourself.
Read it here.
Gradient Boosting and XGBoost, Daniel Gutierrez
Learn the basics behind gradient boosting for statistical learning and the popular XGBoost
implementation, the most recent evolution of gradient boosting. See how XGBoost became
king of the hill for data scientists seeking accurate predictions.
Read it here.
What Overfitting is and How to Fix It, Spencer Norris
Overfitting is an interesting problem with fascinating solutions embedded in
the very structure of the algorithms you’re using. Here, we break down what
overfitting is and how we can provide an antidote to it in the real world.
Read it here.
Three Popular Clustering Methods and When to Use Each, Spencer Norris
Unsupervised machine learning can be very powerful in its own right, and
clustering is by far the most common expression of this group of problems.
Check out this quick rundown of three of the most popular clustering
approaches and the situations each is best suited to.
Read it here.
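As one example of a popular clustering approach (not necessarily one of the three the post compares), here is a minimal pure-Python sketch of k-means, using Lloyd’s algorithm. The `kmeans` function, its random initialization from input points, and the toy data are assumptions for illustration.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(
    points: List[Point], k: int, iters: int = 100, seed: int = 0
) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and
    centroid re-computation until the centroids stop moving."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points

    def nearest(p: Point) -> int:
        # Index of the closest current centroid (Euclidean distance)
        return min(range(k), key=lambda i: math.dist(p, centroids[i]))

    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:  # assignment step
            clusters[nearest(p)].append(p)
        new_centroids = []
        for i, members in enumerate(clusters):  # update step
            if members:
                xs, ys = zip(*members)
                new_centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, [nearest(p) for p in points]


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
centroids, labels = kmeans(points, k=2)
print(sorted(centroids), labels)
```

On this toy data the two groups of four points should end up in separate clusters. k-means assumes roughly spherical clusters of similar size and needs k chosen in advance, which is one reason other clustering approaches suit other situations.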

TOP 10 TALKS FROM ODSC CONFERENCES
Out of 300+ talks from ODSC conferences in 2018, here
are the 10 top-rated sessions covering machine learning.
Andreas Mueller
Introduction to Machine Learning &
Intermediate Machine Learning with scikit-learn
Start with the basics of machine learning before you dive into the tools,
applications, and advanced functions!
Watch part one here.
Watch part two here.
Bernard Marr
Artificial Intelligence and Machine Learning in Practice
This ODSC Europe keynote features Bernard Marr, internationally best-selling
business author and strategic advisor to companies and governments. Gain
insight into applied AI and machine learning for your organization.
Watch here.
Kirk Borne
A Tour of Machine Learning Algorithms: The Usual
Suspects in Some Unusual Applications
Walk through use cases for several machine learning algorithms to see how to
implement them in interesting and unexpected ways. Topics include predictive
modeling and anomaly discovery.
Watch here.
Jon Peck
OS for AI: How Serverless Computing Enables the
Next Gen of ML
This talk examines the need for, and implementations of, an “Operating System
for AI”: a common interface for using and combining algorithms, and a general
architecture for serverless machine learning that is discoverable, versioned,
scalable, and shareable.
Watch here.
Jared Lander
Machine Learning in R Parts I & II
This two-part course focuses on the available methods for implementing
machine learning algorithms in R and examines some of the underlying theory
behind the curtain.
Watch part one here.
Watch part two here.
Randy Olson
The Past, Present, and Future of Automated ML
In this talk, Randy Olson draws from his AutoML research to discuss the benefits
of AutoML and highlights some promising future directions of the field.
Watch here.
Jeffrey Yau
Multivariate Time Series Forecasting Using Statistical
and ML Models
This lecture discusses the formulation of Vector Autoregressive (VAR) models, one
of the most important classes of multivariate time series statistical models, as well
as neural network-based techniques.
Watch here.
Yuriy Guts
Target Leakage in Machine Learning
Target leakage is one of the most difficult problems in developing real-world
machine learning models. Hear real-life examples of data leakage at different
stages of the data science project lifecycle, along with various countermeasures
and best practices for model validation.
Watch here.
Crissman Loomis
Machine Learning in Chainer Python
When choosing a framework for working on neural networks, it is important to
pick one that is flexible and allows for customization. Chainer is a
neural network framework written almost entirely in Python. Gain the knowledge
you need to get started with Chainer, from data formatting and augmentation to
reinforcement learning and more.
Watch here.
Keith Santarelli & Eric Schles
Making the Most of Your Time Series: Signal
Processing for Machine Learning Applications
Learn about common signal processing techniques for machine learning on
time series data. This talk discusses a number of common signal processing
tools and shows how they can be implemented in various Python packages,
including tools that remove “noise” to reveal underlying trends.
Watch here.

FROM THE EXPERTS:
My hope is that new practitioners will start to have a clearer idea of how to break into the field. Currently
there is no standardization across different types of data science roles and titles. In addition, since the
field is multidisciplinary, those trying to learn have a difficult time prioritizing what they should learn and
don’t feel confident in starting to apply to jobs because the learning could theoretically go on forever. My
message to those people is to start applying so that they can receive feedback from the job market to
help them in their pursuit.
Kristen Kehrer
Founder, Data Moves Me LLC
Almost all ML models are based on statistics. No one, including your customers and employees, likes to
be treated as a statistic, which is what your ML processes will tend to do. Plan to mitigate that from the
beginning.
Adam Breindel
Independent Consultant and Instructor, ML/AI and Data Engineering
2018 was the year that machine learning, and by extension deep learning, took many industries by
storm. Machine learning has touched just about every problem domain, bringing accelerated diversity in
the types of problems being solved.
Daniel Gutierrez
Data Science Consultant
Letter from the Editor

One common thread I’ve seen at every stop in my career is the ever-increasing focus on using data to make informed decisions. Now, perched in a place where I can see day-to-day developments in the field, I’ve learned so much about various frameworks and languages, the unique ways that different industries are using AI, and, of course, how strong the open data science community is in sharing its knowledge.

There are definitely a few topics in particular that I’m paying extra attention to heading into 2019. Apache Spark’s MLlib is a frequent topic of conversation when I speak with data science experts, largely thanks to MLlib’s remarkable ability to scale implementations of many ML algorithms. At this rate, I can see the scikit-learn library for Python becoming a common job requirement for ML experts; it’s appearing more and more in social feeds, featured blogs, and the job listings that cross my desk. The TensorFlow framework is popping up in countless ML tutorials, so I’m hoping people stay active with it, even as newer libraries are released frequently. This should be a no-brainer, but start looking into automated machine learning if you haven’t already, as it can increase the pace at which you create more complex ML processes and algorithms. Let the machines work for you!

Data scientists need to make 2019 a strong year for addressing the common “black box” problem. As Daniel Gutierrez, a data science consultant and frequent author for OpenDataScience.com, put it, “I hope the trend of ‘explainability’ or ‘interpretability’ of AI will continue to be seen as critical to the continued acceptance of the technology.”

Since machine learning may open up more opportunities for vulnerabilities to appear, developers need to build their applications carefully, avoid leaving entryways for malicious attacks, and be well prepared to defend against potential adversarial attacks. It’s better to spend the time building up your defenses than to go through the headache of resolving issues caused by hackers and malware.

Whether you use the information provided in this anthology to learn a new tool or framework, or you’re now more invested in security and defense, I hope that all of the videos, blogs, and expert insights in this anthology prove useful to you. Whether you’re new to data science and machine learning or a seasoned vet, already working in applied data science or academia, or just a fan of new technology, the open data science community is a good place to share knowledge and expand your understanding of the most exciting topics in applied data science.

Alex Landa, ODSC content manager
“To be successful, machine learning
adopters must enable a flexible
infrastructure and be agile. It is critical
to experiment and accept failure in
exchange for quicker learning and less
money spent going after projects that
aren’t successful.”
Kate Strachnyi
Data Visualization Specialist; Host of Humans of Data Science;
Author of “The Disruptors: Data Science Leaders and Journey to Data Scientist”
Avoid learning everything at once. Content from the ML/data science community can make you feel like
you’ve got to know everything, like yesterday. Resist this. Don’t pick up a textbook and read it cover to
cover. If you’re interested in getting started, spend time thinking about one question from one subject area
that you’re interested in. Find some data, clean it, explore it, and use it to answer your questions. You
won’t know how. Research what you need to know and figure it out, step by step.
Brandon Dey
Technical Team Lead, Data Science Global Marketing, Fisher Investments
What do leading data scientists think
about the state of machine learning?
Open Data Science Highlights
Conferences in 5 countries
#1 blog on @ODSC Medium: Three Popular Clustering Methods and When to Use Each, Spencer Norris
70,000 subscriptions to our weekly newsletter
Learn.AI course with the highest enrollment: Machine Learning and NLP for Detecting Fake News
Watch here.

BE A PART OF THE ODSC COMMUNITY
There are many ways you can engage with
the Open Data Science Community today!
ODSC Events
East 2019: April 30-May 3
India 2019: August 7-10
West 2019: October 29-November 1
Europe 2019: November 19-22
Meetups
We hold meetups in 37 cities around
the world, designed to convene data
scientists for education, networking,
and even a little fun. See upcoming
events here.
Weekly Newsletter
Don’t miss any future articles on
data science and machine learning!
Sign up for our weekly newsletter
and get tutorials, insights, and the
latest news sent to you directly.
Webinars
We offer free webinars several times
a month, covering a variety of topics.
Follow this page to learn more about
upcoming webinars.
Become a Speaker
Are you a technical or business
expert in the world of data science
and AI? Consider speaking at one of
our events! Each event has its own
speaker submission page:
ODSC East 2019
ODSC Europe 2019
And more coming soon!
Partner with ODSC
We also offer opportunities for
partnerships! Have your product,
service, or research seen by
thousands of data scientists at an
event. Learn more here.
