The Snake in Your Data
How Python is Used Today by Data Science Teams
Matt Price
Principal Research Engineer
2019.09.24
2SLIDE
Agenda
● About ZeroFOX
● The Data Science Lifecycle
● Data Science at ZeroFOX
● Data Science Tools
● Prodigy Demo
● Q & A
3
About ZeroFOX
It’s a Digital World. Engage Securely.
Our Mission
ZeroFOX exists to protect digital engagement
Our Story
ZeroFOX was founded with the goal of creating
customer champions
With global reach and operation centers in the
United States, United Kingdom, Chile and India,
ZeroFOX provides best in class software, support
and services to organizations of all sizes.
Most Recognized. Most Awarded.
4
Social and Digital Channels
Your Organization
Domains | Executives | VIPs | Employees | Brands | Locations
AI-Driven Analysis
Automated Analysis | Alerts | Reporting
Human-Driven Analysis
ZeroFOX OnWatch™ | ZeroFOX Alpha Team
Remediation
Takedown-as-a-Service™
Complete Digital Visibility & Protection
The ZeroFOX
Platform
Identify
Risks on social and
digital platforms
Protect
What matters to
your organization
Remediate
Threats to your brand
and business
Protection
Identification
Analysis
Remediation
6SLIDE
The Data Science Lifecycle
● Each stage builds on the previous
stages
● Most of the effort goes into data
collection
● Iterative process
● Python is used throughout the
entire workflow
8SLIDE
ZeroFOX AI
[Diagram labels: Artificial Intelligence, Machine Learning, Deep Learning, NLP, CV]
Artificial Intelligence (AI)
The simulation of intelligent behavior
in machines
AI Techniques
Machine Learning (ML)
Study and use of algorithms and
statistical models that learn from data
Deep Learning
A technique within ML that uses
“large” Neural Networks
9SLIDE
ZeroFOX Data Science Architecture
● Tied into production data ingest
● Feedback loop from analysts
● Labeling is open to the entire
company
● Architecture is optimized for quick
iterations
11SLIDE
Python Tooling Categories
Data manipulation
Data structures and data transformations
Data visualization
Understanding what the data is
Modeling
Teaching machines to learn the underlying patterns in the data
Deployment
Integrating with the platform and making models available to the end customer
12SLIDE
Data Manipulation Tools
NumPy
● Multi-dimensional arrays and matrices
● High level mathematical functions
● Fast, vectorized operations
Pandas
● Multi-dimensional matrices wrapped in DataFrames
● Time series logic and operations
● Data analysis functions and tools
OpenCV
● CV and ML library
● Fast operations - focus on real-time video
● Low level operations
Pillow
● PIL fork
● General image processing library
● High level operations
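As a minimal sketch of the "fast, vectorized operations" and "time series logic" bullets, the example below scales a tiny NumPy array without a Python loop and counts rows per day with pandas. The image values, the `alert_id` column, and the timestamps are made up for illustration and are not from the talk.

```python
import numpy as np
import pandas as pd

# Vectorized NumPy: scale every pixel of a tiny grayscale "image" at once,
# with no explicit Python loop.
image = np.array([[0, 64], [128, 255]], dtype=np.uint8)
normalized = image.astype(np.float32) / 255.0

# pandas time series: hypothetical alert timestamps, counted per calendar day.
alerts = pd.DataFrame(
    {"alert_id": [1, 2, 3, 4]},
    index=pd.to_datetime(
        ["2019-09-23 09:00", "2019-09-23 17:30", "2019-09-24 08:15", "2019-09-24 12:00"]
    ),
)
daily_counts = alerts["alert_id"].resample("D").count()

print(normalized)
print(daily_counts)
```

The same division applied to a full-size image array stays a single expression, which is the practical appeal of vectorization.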
13SLIDE
ZeroFOX Data Science Architecture
[Architecture diagram annotated with NumPy, OpenCV, Pillow, and Pandas]
14SLIDE
Data Visualization Tools
Jupyter
● Interactive computing via notebooks
● Kernels run code and return output
● Focus on scientific computing
Matplotlib
● Plotting library
● Low level plotting interface
● Compatible with a number of GUI toolkits
Seaborn
● Built on top of matplotlib
● High level plotting interface
● Categorical variable support
Plotly
● Framework for building data visualization apps
● Open source and enterprise versions
● Interactive charts
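A minimal sketch of the low level plotting interface, assuming matplotlib is installed: it draws a bar chart of class balance and renders it to an in-memory PNG. The label names and counts are hypothetical, and the `Agg` backend is chosen here only so the example runs without a GUI toolkit.

```python
import io

import matplotlib

matplotlib.use("Agg")  # render off-screen so no GUI toolkit is required
import matplotlib.pyplot as plt

# Hypothetical label counts from an annotation session.
labels = ["benign", "phishing", "impersonation"]
counts = [120, 45, 30]

fig, ax = plt.subplots(figsize=(5, 3))
bars = ax.bar(labels, counts)
ax.set_xlabel("Label")
ax.set_ylabel("Number of examples")
ax.set_title("Class balance of labeled data")
fig.tight_layout()

# Write the chart to memory; pass a filename instead to save it to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

In a notebook the figure would be displayed inline instead of being written to a buffer.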
15SLIDE
ZeroFOX Data Science Architecture
[Architecture diagram annotated with Jupyter, Matplotlib, Seaborn, and Plotly]
16SLIDE
Modeling Tools
Prodigy
● Solves the labeling problem
● Enables active learning
● Programmatic workflow definitions
● Extremely flexible
Scikit-learn
● Machine learning and data analysis library
● Built on top of NumPy, SciPy, LIBSVM, and matplotlib
● A number of other scikits are available
Keras + TensorFlow
● High level deep learning library
● Serves as an interface to lower level backends
● TensorFlow supplies low level building blocks
● Pre-defined models
spaCy
● Production-focused NLP framework
● Deep learning models powered by Thinc
● Defines a pipeline that outputs annotated
documents
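The "active learning" bullet can be illustrated with a generic uncertainty-sampling loop in scikit-learn. This is a sketch of the general idea only, not Prodigy's implementation: the synthetic dataset, the seed-set size, and the query size of 10 are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a pool of examples; in practice most would be unlabeled.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Start with a small labeled seed set that contains both classes.
labeled = np.concatenate([np.where(y == 0)[0][:5], np.where(y == 1)[0][:5]])
pool = np.setdiff1d(np.arange(len(X)), labeled)

model = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])

# Uncertainty sampling: a predicted probability near 0.5 means the model is unsure.
proba = model.predict_proba(X[pool])[:, 1]
uncertainty = 1.0 - np.abs(proba - 0.5) * 2.0
query = pool[np.argsort(uncertainty)[::-1][:10]]  # the 10 most uncertain examples

print("Ask an annotator to label these rows next:", query.tolist())
```

After the annotator labels the queried rows, they would be moved from `pool` to `labeled` and the model retrained, which is what makes the process iterative.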
17SLIDE
ZeroFOX Data Science Architecture
[Architecture diagram annotated with Prodigy, Scikit-learn, Keras + TensorFlow, and spaCy]
18SLIDE
Deployment
Sanic
● Web server and framework focused on
high performance
● Secondarily focused on ease of use
● Flask-like framework API
● Decent extension ecosystem
● Python 3.6+ (heavily relies on async/await)
Django
● MVC web framework
● Focused on easing development of
database-driven websites
● Large extension ecosystem
● CRUD interface for administrative tasks
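To show why async/await matters for a high-performance model-serving endpoint, here is a small sketch that uses only the standard library's `asyncio` rather than a web framework. The `predict` and `handle_requests` functions and the length-based score are hypothetical placeholders, not ZeroFOX code or any framework's API.

```python
import asyncio


async def predict(text: str) -> dict:
    """Hypothetical model call; the sleep stands in for I/O such as a feature lookup."""
    await asyncio.sleep(0.01)
    return {"text": text, "score": min(len(text) / 100.0, 1.0)}


async def handle_requests(texts):
    # Each request is a coroutine; gather lets them wait on I/O concurrently
    # and returns the results in the same order as the inputs.
    return await asyncio.gather(*(predict(t) for t in texts))


results = asyncio.run(handle_requests(["short post", "a somewhat longer post"]))
print(results)
```

An async web framework applies the same pattern per HTTP request: while one handler awaits I/O, the event loop serves others.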
19SLIDE
ZeroFOX Data Science Architecture
[Architecture diagram annotated with Sanic and Django]
21SLIDE
Prodigy
● Created by Explosion.AI (Matthew Honnibal and Ines Montani)
○ Same company that develops spaCy and Thinc
● Designed to make annotating data simple but can do much more
● A commercial tool (Python package) that you purchase
● Why Prodigy?
○ Solves the “hardest” problem in applied data science
○ Can programmatically define entire model workflow in a recipe
○ Out of the box support for spaCy
○ Supports computer vision annotation
○ Exports trained models as Python packages
22SLIDE
Prodigy Live Demo
24SLIDE
Questions?
