Data Science at Scale - The DevOps Approach
DevOps Practices for Data Scientists and Engineers
Mihai Criveti
22nd September 2019
https://github.com/crivetimihai
http://galaxy.ansible.com/crivetimihai
1 Data Science Landscape
2 Process and Flow
3 The Data
4 Data Science Toolkit
5 Cloud Computing Solutions
6 The rise of DevOps
7 Reusable Assets and Practices
8 Skills Development
Speaker Bio
Mihai Criveti, IBM
Designs and builds multi-cloud
customer solutions for Cloud Native
applications, Big Data analytics and
Machine Learning.
Pursuing an MSc in Data Science at
UCD.
Leads the Cloud Native competency
for IBM Cloud Solutioning.
Passionate about Open Source
software, DevOps and Data Science.
1 Data Science Landscape
What is Data Science
Data Science
Multi-disciplinary field that brings together Computer Science, Statistics/Machine Learning, and
Data Analysis to understand and extract insights from ever-increasing amounts of data.
Machine Learning
The science of getting computers to act without being explicitly programmed.
Deep Learning
Family of machine learning methods based on learning data representations, as opposed to
task-specific algorithms. Can be supervised, semi-supervised or unsupervised.
AI
Intelligent machines that work and react like humans.
Moving Towards Big Data
Figure 1: Powerful models and big data support Machine Learning
Data Scientist Domain
Data Engineering:
• Linux, Cloud, Big Data Platforms.
• Streaming, big data pipelines.
Software Development:
• Coding skills, such as Python, R, SQL.
• Development practices: Agile, DevOps,
CI/CD and using GitOps effectively.
Figure 2: Data Scientist Venn Diagram
“While data scientists are recognized for their brilliant algorithms, up to 80% of their time
could be spent collecting, cleaning and organizing data.” 1
1 Forbes: Radical Change Is Coming To Data Science Jobs
Data Science Roles
Data Scientist / Analyst:
• Turn raw data into valuable insights that an organization needs in order to grow or compete.
• Analytical data experts with the skill to solve complex problems and the curiosity to explore
what problems need solving.
• They use data visualization, machine learning, deep learning, pattern recognition, natural
language processing and analytics.
• Always curious: “What can we learn from this data? What actions can we take after?”
Data Engineer / Data Architect:
• Prepare the “big data” infrastructure to be analysed by Data Scientists.
• Software engineers who design, build, integrate data from various resources, and manage big
data.
2 Process and Flow
Often, it’s a manual process
Figure 3: Pen and paper - planning
Data Science is awesome
Data Science is OSEMN (pronounced AWESOME!)2
- an interactive process that consists, largely, of
the following steps:
1. Inquire: ask a meaningful question.
2. Obtain: get the required data.
3. Scrub: clean the data.
4. Explore: learn about your data, try stuff.
5. Model: create a couple of models and test them.
6. iNterpret: gain insight from data, present it in a usable form (reports, dashboards,
applications, etc).
2 O’Reilly - “Data Science at the Command Line: Facing the Future with Time-Tested Tools”, Jeroen Janssens
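The Obtain, Scrub, Explore, Model and iNterpret steps can be sketched end to end in a few lines of Python. This is only an illustration using the standard library: the temperature and sales figures, the column names and the function names are all made up for the example, and a real project would more likely use Pandas and Scikit-Learn.

```python
import csv
import io
import statistics

# Hypothetical sample data (an assumption for this sketch): two rows are dirty.
RAW_CSV = """temperature,sales
18,120
21,150
,90
25,210
n/a,300
30,260
"""


def obtain(text):
    """Obtain: read rows from a CSV source (here an in-memory string)."""
    return list(csv.DictReader(io.StringIO(text)))


def scrub(rows):
    """Scrub: keep only rows where both fields parse as numbers."""
    clean = []
    for row in rows:
        try:
            clean.append((float(row["temperature"]), float(row["sales"])))
        except ValueError:
            continue  # drop rows with missing or malformed values
    return clean


def explore(points):
    """Explore: basic summary statistics for each column."""
    xs, ys = zip(*points)
    return {
        "rows": len(points),
        "mean_temperature": statistics.mean(xs),
        "mean_sales": statistics.mean(ys),
    }


def model(points):
    """Model: ordinary least-squares line, sales = slope * temperature + intercept."""
    xs, ys = zip(*points)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def interpret(slope):
    """iNterpret: turn the fitted parameter into a sentence for stakeholders."""
    return f"Each extra degree is associated with about {slope:.1f} more sales."


if __name__ == "__main__":
    points = scrub(obtain(RAW_CSV))
    print(explore(points))
    slope, intercept = model(points)
    print(interpret(slope))
```

Keeping each step in its own function makes it easier to re-run a single stage when the question changes, which is the loop back to Inquire.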
CRISP-DM - Cross-industry standard process for data mining
CRISP-DM is a widely used, general-purpose, tool-independent process model for data mining.
Stage                        Description
1. Business Understanding    Ask relevant questions, define objectives
2. Data Mining               Gather the necessary data
3. Data Cleaning             Scrub and fix data inconsistencies
4. Data Exploration          Form hypotheses about the data
5. Feature Engineering       Select / construct important features
6. Predictive Modeling       Train Machine Learning models
7. Data Visualization        Communicate findings to key stakeholders
8. Data Automation           Automate and deploy ML models
Figure 4: CRISP-DM
Design Thinking
Figure 5: Design Thinking
Data Development Lifecycle
Figure 6: Development, Data and Analytics Lifecycle
3 The Data
Data is fundamental
Figure 7: Data and the AI Ladder
Types of Data
Data can be:
• Structured: tables, spreadsheets, relational
databases.
• Unstructured: text, images, audio, video.
• Numerical / Quantitative: e.g. pulse, temperature.
• Categorical / Qualitative: e.g. hair colour.
• Big Data: massive data sets that cannot fit in memory
on a single machine.
Figure 8: Different types of data
Data becomes information when viewed in context or post-analysis.
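When a data set cannot fit in memory, one common approach is to stream it and keep only running aggregates. The sketch below is a minimal illustration with the standard library; the `pulse` column and the sample values are invented for the example, and in practice `lines` would be an open file handle or a network stream rather than an in-memory string.

```python
import csv
import io


def running_mean(lines, column):
    """Stream rows one at a time, holding only a count and a running total."""
    count, total = 0, 0.0
    for row in csv.DictReader(lines):
        total += float(row[column])
        count += 1
    return total / count if count else None


if __name__ == "__main__":
    # Hypothetical readings standing in for a file too large to load at once.
    sample = io.StringIO("pulse\n60\n72\n66\n")
    print(running_mean(sample, "pulse"))
```

Memory use stays constant regardless of the number of rows, because no row is retained after it has been added to the total.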
Obtaining Data
Private / Enterprise Data
• Private data can often be found in data warehouses and SQL databases.
• NoSQL stores, data lakes or HDFS, and document repositories.
• Private wikis, ERP and CRM platforms, object storage and, more often than not, spreadsheets.
Public Data
• Weather, social media, location / geographical data, stock data, public internet data
(scraping), wikis, Eurostat, Kaggle and government data portals are all sources of external data.
Data compliance, governance and security are key to a successful data strategy.
Data Portal
Figure 9: Open Data portals such as data.gov.ie
Common Data Formats
• XML, JSON, YAML
• CSV, TSV, Parquet, XLSX
• Markdown, HTML, DOCX
• TXT, PDF
• Audio, Video
• Data APIs that return JSON or XML
• Streaming data
• SQL and other database formats
• HDFS and other big data stores or encapsulated data on object storage
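Whatever the source format, the first job is usually to bring records into one common shape. The following sketch, which assumes made-up city temperature readings and hypothetical function names, parses the same two records from CSV, JSON and XML using only the Python standard library.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# The same hypothetical readings expressed in three formats.
CSV_TEXT = "city,temperature\nDublin,14.5\nCork,15.0\n"
JSON_TEXT = '[{"city": "Dublin", "temperature": 14.5}, {"city": "Cork", "temperature": 15.0}]'
XML_TEXT = (
    "<readings>"
    "<reading city='Dublin' temperature='14.5'/>"
    "<reading city='Cork' temperature='15.0'/>"
    "</readings>"
)


def from_csv(text):
    """CSV values arrive as strings, so numbers must be converted explicitly."""
    return [
        {"city": row["city"], "temperature": float(row["temperature"])}
        for row in csv.DictReader(io.StringIO(text))
    ]


def from_json(text):
    """JSON already carries numeric types; float() just normalises them."""
    return [
        {"city": item["city"], "temperature": float(item["temperature"])}
        for item in json.loads(text)
    ]


def from_xml(text):
    """XML attributes are strings, like CSV fields."""
    root = ET.fromstring(text)
    return [
        {"city": el.get("city"), "temperature": float(el.get("temperature"))}
        for el in root.iter("reading")
    ]


if __name__ == "__main__":
    print(from_csv(CSV_TEXT) == from_json(JSON_TEXT) == from_xml(XML_TEXT))
```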
The Big Data Challenge
Figure 10: Making sense of ever growing data sets through automation and machine learning
4 Data Science Toolkit
Tools Data Scientists use
• Mathematics - Linear Algebra, Statistics, Combinatorics
• Some of them use R - focusing on statistics
• A lot of them use Python - usually with Jupyter Notebook as a front-end
• Libraries such as Pandas and NumPy are very handy!
• Natural Language Processing with NLTK
• or Machine Learning libraries - Scikit-Learn, TensorFlow or PyTorch
• SQL and databases tend to be quite popular. After all, where does data live?
• NoSQL databases such as MongoDB are quite useful too…
• And a whole bunch of Big Data tools: Hadoop, Spark, Kafka, etc.
• They write papers too, so Markdown and LaTeX come in handy!
• Lots of code, so typical software development tools (git, IDEs, CI/CD, etc.)
• Processes (SCRUM, Agile, Lean, CRISP-DM, Design Thinking)
Tools for the IOSEMN process
+-----------------+ Project Management / Lifecycle
| INQUIRE | Git, Github, Gitlab (Project documentation)
+-----------------+ Documentation systems
v
+------------------+ Requests, APIs, sensors, surveys
| OBTAIN | SQL, CSV, JSON, XLS, NoSQL, Hadoop, Spark
+------------------+ Store / Cache data locally (SQLite, PostgreSQL)
v (Gather internal and external data)
+-----------------+ Jupyter Notebook
| SCRUB | Regular Expression (re), BeautifulSoup
+-----------------+ SQLite, ETL, Glue
Tools (continued)
+-----------------+ Jupyter Notebook
| EXPLORE | Pandas, Orange
+-----------------+ Matplotlib
^ v (Explore and understand the data)
+-----------------+ SciKit-Learn, Tensorflow
| MODEL | PyTorch, NumPy
+-----------------+ Machine Learning
RE-INQUIRE | (Model: predict, check accuracy, evaluate model)
^ +-----------------+ Jupyter Notebook, MatplotLib
+--------- | INTERPRET | Bokeh, D3.JS, XLSXWriter
+-----------------+ Dashboards, Reports, etc.
(Choose a good representation, interpret the results)
Jupyter Lab / Notebook
Figure 11: Jupyter Notebook
Graphing and Dashboards
Figure 12: Grafana: dashboard for time series analytics
Apache Superset Visualization
Figure 13: Apache Superset
Geospatial Data Visualization
Figure 14: Visualize geospatial data with deck.gl
Local Cloud - Docker Compose
version: '3'
services:
  jupyter:
    image: cmihai/jupyter:v1
    container_name: jupyter
    volumes:
      - ./notebooks:/notebooks
    ports:
      - '9000:9000'
    links:
      - postgres
      - redis
  postgres:
    image: postgres
    container_name: postgres
    ports:
      - '5432:5432'
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
    volumes:
      - pgdata:/var/lib/postgresql/data
  redis:
    image: redis:alpine
# Named volumes used by a service must be declared at the top level
volumes:
  pgdata:
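From a notebook running in the jupyter container, the database is reachable under its Compose service name, postgres. A minimal sketch of building a connection URL from the same settings follows. POSTGRES_HOST and POSTGRES_PORT are hypothetical variable names chosen for this example, not something the postgres image defines, and whether a driver such as psycopg2 or SQLAlchemy is installed in the notebook image is an assumption.

```python
import os
from urllib.parse import quote


def postgres_url(env=None):
    """Build a PostgreSQL connection URL from Compose-style settings."""
    env = os.environ if env is None else env
    user = env.get("POSTGRES_USER", "postgres")
    password = env.get("POSTGRES_PASSWORD", "postgres")
    # Hypothetical overrides; the default host is the Compose service name.
    host = env.get("POSTGRES_HOST", "postgres")
    port = env.get("POSTGRES_PORT", "5432")
    # The official postgres image names the default database after the user.
    database = env.get("POSTGRES_DB", user)
    # Percent-encode credentials so special characters do not break the URL.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


if __name__ == "__main__":
    print(postgres_url())
```

Reading the values from the environment keeps the credentials out of the notebook itself, so the same notebook can run against a different database without edits.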
Composable Environments
+-----------------+
| Jupyter |
PYTHON | ports:9000 +---------------------------------+
| vol: /notebooks | |
| (Anaconda 3) +-----------------+ |
+---------|-------+ | |
| | |
+---------v-------+ +-----v------+ +-----v-----+
| PostgreSQL | NOSQL: | REDIS | | MONGODB |
SQL | | | | | |
| | | | | |
| | | | | |
+-----------------+ +------------+ +-----------+
Machine Learning Frameworks
Figure 15: Architecture: Jupyter Notebook using Keras with Tensorflow
Open Data, Open Tools
Figure 16: Open tools analysing open medical data
Gartner Hype Cycle
Figure 17: Gartner Hype Cycle 2018
5 Cloud Computing Solutions
Cloud Computing
A model for enabling convenient, on-demand network access to a shared pool of configurable
computing resources that can be rapidly provisioned and released with minimal
management effort or service provider interaction.3
Characteristics
On-demand Self-service
Broad network access
Resource pooling
Rapid elasticity
Measured service
Service Models
IaaS Infrastructure as a Service
PaaS Platform as a Service
FaaS Functions as a Service
SaaS Software as a Service
BPaaS Business Process as a Service
Deployment Models
Private Cloud
Community Cloud
Public Cloud
Hybrid Cloud
Multi Cloud
3 The NIST Definition of Cloud Computing, Special Publication 800-145
Why take advantage of Cloud in Data Science projects
1. Big Data: analytics for ever increasing, varied data sets.
2. IoT: Easily gather and process data from smart devices (Internet of Things).
3. Collaborative: Set up data environments with ease, and collaborate with your team.
4. Scalable: Access to virtually unlimited resources, GPU computing, etc.
5. Automated: orchestrate and provision systems and tear them down when no longer needed.
Chances are, you’re already using it. GitHub? Kaggle notebooks? Google Docs? Dropbox? AWS
Free Tier? JupyterHub?
Machine Learning as a Service
Figure 18: IBM Watson Machine Learning Services
Cloud, Multi-Cloud, Data Lakes
Figure 19: Data Lake Architecture on AWS
Kubernetes: Container Orchestration at Scale
Figure 20: Kubernetes is Desired State Management
Kubernetes: container platform that enables portability across infrastructure providers
Kubernetes facilitates both declarative configuration and automation.
Container Orchestration Benefits:
• Cluster management: Federate hosts and manage them.
• Self-healing: Detect and replace unhealthy container Pods and hosts. Attempt to put the
cluster back to the wanted state.
• Replication: Ensure that the wanted number of Pod replicas is running.
• Service discovery: Locate and distribute client requests across running containers.
• Scheduling: Distribute containers across the worker nodes.
• Scaling: Add or remove containers to match workload.
• Persistent Storage: Manage persistent storage.
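Desired state management, self-healing and replication can be illustrated with a toy reconciliation loop. This is not how Kubernetes is implemented and it does not use the Kubernetes API; `ToyCluster` and its fields are invented names for a sketch of the idea that a controller repeatedly compares the observed state with the wanted state and corrects the difference.

```python
import itertools


class ToyCluster:
    """Toy model: a wanted replica count plus the pods currently observed."""

    def __init__(self, desired_replicas):
        self.desired_replicas = desired_replicas
        self.pods = {}  # pod name -> healthy flag
        self._ids = itertools.count()

    def reconcile(self):
        """Move the observed state towards the desired state."""
        # Self-healing: remove pods that report unhealthy.
        for name in [n for n, healthy in self.pods.items() if not healthy]:
            del self.pods[name]
        # Replication: start pods until the wanted count is reached.
        while len(self.pods) < self.desired_replicas:
            self.pods[f"pod-{next(self._ids)}"] = True
        # Scaling down: stop surplus pods, newest first.
        while len(self.pods) > self.desired_replicas:
            self.pods.popitem()


if __name__ == "__main__":
    cluster = ToyCluster(desired_replicas=3)
    cluster.reconcile()
    cluster.pods["pod-1"] = False  # simulate a failing pod
    cluster.reconcile()
    print(sorted(cluster.pods))
```

The caller only ever changes `desired_replicas` or reports health; the loop decides which pods to start or stop, which is the declarative style the slide refers to.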
6 The rise of DevOps
Collaborate to continuously deliver
Figure 21: Practices to implement DevOps
Cultural Transformation
• Culture: Build trust and align your team with better communication and transparency.
• Discover: Understand the problem domain and align on common goals.
• Think: Know your audience and meet its needs faster than the competition.
• Develop: Collaborate to build, continuously integrate and deliver high-quality code.
• Reason: Apply AI techniques so that you can make better decisions.
• Operate: Harness the power of the cloud to quickly get your minimum viable product (MVP)
into production, and monitor and manage your applications to a high degree of quality and
meet your service level agreements. Grow or shrink your resources based on demand.
• Learn: Gain insights from your users as they interact with your application.
End-to-End CD4ML Process by Martin Fowler
Figure 22: Continuous Delivery for Machine Learning end-to-end process
Build reproducible images with Packer
Figure 23: Packer building a VirtualBox image for RHEL 8 using Kickstart Automated Install
Secure your environment with OpenSCAP:
Figure 24: Automate Continuous Compliance and Remediation (ex: HIPAA, PCI)
Ansible: Provisioning and Configuration Management
Figure 25: Ansible: Application Deployment + Configuration Management + Continuous Delivery
What can I do with Ansible?
Figure 26: Automate your entire infrastructure and data pipeline using Ansible
Molecule: Test your Ansible Playbooks on Docker, Vagrant or Cloud
Molecule provides support for testing with multiple instances, operating systems and distributions,
virtualization providers, test frameworks and testing scenarios.
Instantly create a Vagrant or Docker machine and test your playbook:
    molecule create -s vagrant-centos-7
    molecule converge -s vagrant-centos-7
    molecule login
Integrate your tests as part of your CI/CD:
    molecule test
7 Reusable Assets and Practices
Reproducible Research and Giveback
Figure 27: https://paperswithcode.com - Reproducible Research
The Open Practice Library
Figure 28: openpracticelibrary.com: A community-driven
repository of practices and tools
An Outcome Delivery framework:
• Discovery - generate the Outcomes
• Options - identify how to get there
• Delivery - implement and put ideas to the test.
Learn what works and what doesn’t.
The Open Practice Library - Discovery
Figure 29: What problems are you trying to solve, for whom and why? How will you measure Outcomes?
The Open Practice Library - Options Pivot
Figure 30: What are the different options? What do you need to make this happen?
The Open Practice Library - Delivery
Figure 31: What was the measured impact? What did you learn?
The Open Practice Library - Foundation
Figure 32: Creating a team culture, an environment of collaboration and technical engineering practices
8 Skills Development
Skills Map
Figure 33: Cross-functional Skills Map
Example skills for Data Science scenarios:
1. Data Engineering: Infrastructure as Code (CloudFormation, Terraform), Shell Scripting, Cloud,
APIs (boto3), etc.
2. DevOps: putting together CI/CD pipelines, Jenkins, CodeStar*, CodePipeline, CodeBuild, etc.
3. Event Driven Architectures: putting an object on S3 triggers a lambda function that performs
ETL and loads the result into RedShift.
4. GPU computing: accelerate workloads for general-purpose scientific and engineering
computing. 4
5. Container platforms: develop microservices and set up environments with ease.
4 Forbes: From Deep Learning To Data Science: Everything You Need To Know
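The event-driven scenario in item 3 can be sketched as a Lambda-style handler. The sketch assumes the standard S3 event notification shape (a `Records` list carrying bucket name and object key) and keeps AWS out of the code: `fetch` and `load` are hypothetical callables standing in for an S3 read (for example via boto3) and a Redshift load, and the transform is a toy one.

```python
from urllib.parse import unquote_plus


def extract_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    return [
        # Object keys arrive URL-encoded in S3 events, so decode them.
        (rec["s3"]["bucket"]["name"], unquote_plus(rec["s3"]["object"]["key"]))
        for rec in event.get("Records", [])
    ]


def transform(text):
    """Toy transform: split CSV-like lines, trim and lower-case every field."""
    return [
        [field.strip().lower() for field in line.split(",")]
        for line in text.splitlines()
        if line.strip()
    ]


def make_handler(fetch, load):
    """Build a handler around injected I/O.

    fetch(bucket, key) -> str and load(rows) are hypothetical stand-ins for
    the S3 read and the Redshift load, so the logic can run without AWS.
    """

    def handler(event, context):
        processed = 0
        for bucket, key in extract_objects(event):
            load(transform(fetch(bucket, key)))
            processed += 1
        return {"processed": processed}

    return handler
```

Injecting `fetch` and `load` keeps the extract and transform logic testable on a laptop; the real clients would only be wired in at deployment time.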
Reasoning Skills
Figure 34: Avoid Cognitive Biases
Technical Books
Figure 35: Journey from Linux and Python to Big Data and Machine Learning
Soft Skills and Career
Figure 36: Career Development, Soft Skills, Stakeholder Management
Meetups and Events
Networking:
• Absorb ideas and filter them through your own experience at meetups and events.
• Build a strong network.
Giveback:
• Give back to the community by speaking at events, supporting, sponsoring or co-organising.
• Contribute to Open Source projects. Meetups and hackathons are a great place to start!
Social Eminence
• Be an active member of the Data Science community and build your social eminence!
Communities and Competitions
Figure 37: Compete on Kaggle or check out notebooks and datasets
Developing a Personal Brand and Building your Portfolio
Figure 38: Develop your own journey and personal brand
Great places to learn
Figure 39: MOOC: CognitiveClass.AI - Free Courses and Badges
Gamification with Badges
Figure 40: Collect free badges and accreditations
Example Courses
cognitiveclass.ai
Learning Paths and Badges on Containers, Kubernetes, SQL, Big Data, Python, R, Deep Learning,
Analytics, Hadoop and Spark.
katacoda.com
Interactive, hands-on courses on Containers, Machine Learning, DevOps, Software Engineering
Practices and more straight in your browser.
Data Science Courses
• edx.org - you can ‘audit’ courses for free.
• coursera.org - great machine learning courses (e.g. Andrew Ng's).
• fast.ai - free courses on Deep Learning, Computational Linear Algebra.
What they don’t teach you
Sometimes it’s just you, and there won’t be a:
• Data Engineer to build and automate your pipelines
• Cloud Architect to design your infrastructure
• Network Engineer
• DevOps Engineer to create your code deployment pipelines and help with CI/CD
• Systems Administrator to support your Linux environment
Go SaaS
• Could I go SaaS?
Build it yourself!
• How do I build all this myself?
Questions and Contact
Twitter: @CrivetiMihai
LinkedIn: https://www.linkedin.com/in/crivetimihai/
GitHub: crivetimihai
Blog: blog.boreas.ro
Ansible Galaxy: https://galaxy.ansible.com/crivetimihai

Contenu connexe

Tendances

Brokering Data: Accelerating Data Evaluation with Databricks White Label
Brokering Data: Accelerating Data Evaluation with Databricks White LabelBrokering Data: Accelerating Data Evaluation with Databricks White Label
Brokering Data: Accelerating Data Evaluation with Databricks White LabelDatabricks
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist SoftServe
 
What’s New with Databricks Machine Learning
What’s New with Databricks Machine LearningWhat’s New with Databricks Machine Learning
What’s New with Databricks Machine LearningDatabricks
 
Lambda Architecture in the Cloud with Azure Databricks with Andrei Varanovich
Lambda Architecture in the Cloud with Azure Databricks with Andrei VaranovichLambda Architecture in the Cloud with Azure Databricks with Andrei Varanovich
Lambda Architecture in the Cloud with Azure Databricks with Andrei VaranovichDatabricks
 
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...Databricks
 
Databricks Whitelabel: Making Petabyte Scale Data Consumable to All Our Custo...
Databricks Whitelabel: Making Petabyte Scale Data Consumable to All Our Custo...Databricks Whitelabel: Making Petabyte Scale Data Consumable to All Our Custo...
Databricks Whitelabel: Making Petabyte Scale Data Consumable to All Our Custo...Databricks
 
DataOps - Lean principles and lean practices
DataOps - Lean principles and lean practicesDataOps - Lean principles and lean practices
DataOps - Lean principles and lean practicesLars Albertsson
 
O'Reilly ebook: Operationalizing the Data Lake
O'Reilly ebook: Operationalizing the Data LakeO'Reilly ebook: Operationalizing the Data Lake
O'Reilly ebook: Operationalizing the Data LakeVasu S
 
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Big Data Spain
 
End to End Supply Chain Control Tower
End to End Supply Chain Control TowerEnd to End Supply Chain Control Tower
End to End Supply Chain Control TowerDatabricks
 
Closing Keynote
Closing KeynoteClosing Keynote
Closing KeynoteNeo4j
 
The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...
The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...
The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...Romeo Kienzler
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureDatabricks
 
Data Engineering for Data Scientists
Data Engineering for Data Scientists Data Engineering for Data Scientists
Data Engineering for Data Scientists jlacefie
 
Leveraging Spark to Democratize Data for Omni-Commerce with Shafaq Abdullah
Leveraging Spark to Democratize Data for Omni-Commerce with Shafaq AbdullahLeveraging Spark to Democratize Data for Omni-Commerce with Shafaq Abdullah
Leveraging Spark to Democratize Data for Omni-Commerce with Shafaq AbdullahDatabricks
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta LakeDatabricks
 
Next generation Polyglot Architectures using Neo4j by Stefan Kolmar
Next generation Polyglot Architectures using Neo4j by Stefan KolmarNext generation Polyglot Architectures using Neo4j by Stefan Kolmar
Next generation Polyglot Architectures using Neo4j by Stefan KolmarBig Data Spain
 
Netflix Data Engineering @ Uber Engineering Meetup
Netflix Data Engineering @ Uber Engineering MeetupNetflix Data Engineering @ Uber Engineering Meetup
Netflix Data Engineering @ Uber Engineering MeetupBlake Irvine
 
TechEvent Building a Data Lake
TechEvent Building a Data LakeTechEvent Building a Data Lake
TechEvent Building a Data LakeTrivadis
 

Tendances (20)

Brokering Data: Accelerating Data Evaluation with Databricks White Label
Brokering Data: Accelerating Data Evaluation with Databricks White LabelBrokering Data: Accelerating Data Evaluation with Databricks White Label
Brokering Data: Accelerating Data Evaluation with Databricks White Label
 
Data engineering
Data engineeringData engineering
Data engineering
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist
 
What’s New with Databricks Machine Learning
What’s New with Databricks Machine LearningWhat’s New with Databricks Machine Learning
What’s New with Databricks Machine Learning
 
Lambda Architecture in the Cloud with Azure Databricks with Andrei Varanovich
Lambda Architecture in the Cloud with Azure Databricks with Andrei VaranovichLambda Architecture in the Cloud with Azure Databricks with Andrei Varanovich
Lambda Architecture in the Cloud with Azure Databricks with Andrei Varanovich
 
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
 
Databricks Whitelabel: Making Petabyte Scale Data Consumable to All Our Custo...
Databricks Whitelabel: Making Petabyte Scale Data Consumable to All Our Custo...Databricks Whitelabel: Making Petabyte Scale Data Consumable to All Our Custo...
Databricks Whitelabel: Making Petabyte Scale Data Consumable to All Our Custo...
 
DataOps - Lean principles and lean practices
DataOps - Lean principles and lean practicesDataOps - Lean principles and lean practices
DataOps - Lean principles and lean practices
 
O'Reilly ebook: Operationalizing the Data Lake
O'Reilly ebook: Operationalizing the Data LakeO'Reilly ebook: Operationalizing the Data Lake
O'Reilly ebook: Operationalizing the Data Lake
 
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
 
End to End Supply Chain Control Tower
End to End Supply Chain Control TowerEnd to End Supply Chain Control Tower
End to End Supply Chain Control Tower
 
Closing Keynote
Closing KeynoteClosing Keynote
Closing Keynote
 
The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...
The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...
The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Data Engineering for Data Scientists
Data Engineering for Data Scientists Data Engineering for Data Scientists
Data Engineering for Data Scientists
 
Leveraging Spark to Democratize Data for Omni-Commerce with Shafaq Abdullah
Leveraging Spark to Democratize Data for Omni-Commerce with Shafaq AbdullahLeveraging Spark to Democratize Data for Omni-Commerce with Shafaq Abdullah
Leveraging Spark to Democratize Data for Omni-Commerce with Shafaq Abdullah
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
 
Next generation Polyglot Architectures using Neo4j by Stefan Kolmar
Next generation Polyglot Architectures using Neo4j by Stefan KolmarNext generation Polyglot Architectures using Neo4j by Stefan Kolmar
Next generation Polyglot Architectures using Neo4j by Stefan Kolmar
 
Netflix Data Engineering @ Uber Engineering Meetup
Netflix Data Engineering @ Uber Engineering MeetupNetflix Data Engineering @ Uber Engineering Meetup
Netflix Data Engineering @ Uber Engineering Meetup
 
TechEvent Building a Data Lake
TechEvent Building a Data LakeTechEvent Building a Data Lake
TechEvent Building a Data Lake
 

Similaire à Data Science at Scale - The DevOps Approach

Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-HadoopNagarjuna D.N
 
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...Tomasz Bednarz
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationDenodo
 
Ch1IntroductiontoDataScience.pptx
Ch1IntroductiontoDataScience.pptxCh1IntroductiontoDataScience.pptx
Ch1IntroductiontoDataScience.pptxAbderrahmanABID2
 
Big Data Driven Solutions to Combat Covid' 19
Big Data Driven Solutions to Combat Covid' 19Big Data Driven Solutions to Combat Covid' 19
Big Data Driven Solutions to Combat Covid' 19Prof.Balakrishnan S
 
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big datahktripathy
 
Lecture 1-big data engineering (Introduction).pdf
Lecture 1-big data engineering (Introduction).pdfLecture 1-big data engineering (Introduction).pdf
Lecture 1-big data engineering (Introduction).pdfahmedibrahimghnnam01
 
Big Data & Artificial Intelligence
Big Data & Artificial IntelligenceBig Data & Artificial Intelligence
Big Data & Artificial IntelligenceZavain Dar
 
Data science meetup - Spiros Antonatos
Data science meetup - Spiros AntonatosData science meetup - Spiros Antonatos
Data science meetup - Spiros AntonatosSpiros Antonatos
 
Innovation med big data – chr. hansens erfaringer
Innovation med big data – chr. hansens erfaringerInnovation med big data – chr. hansens erfaringer
Innovation med big data – chr. hansens erfaringerMicrosoft
 
Data science presentation
Data science presentationData science presentation
Data science presentationMSDEVMTL
 
From Science to Data: Following a principled path to Data Science
From Science to Data: Following a principled path to Data ScienceFrom Science to Data: Following a principled path to Data Science
From Science to Data: Following a principled path to Data ScienceInstitute of Contemporary Sciences
 
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...Denodo
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationDenodo
 
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...Ali Alkan
 
Advanced Analytics and Machine Learning with Data Virtualization (India)
Advanced Analytics and Machine Learning with Data Virtualization (India)Advanced Analytics and Machine Learning with Data Virtualization (India)
Advanced Analytics and Machine Learning with Data Virtualization (India)Denodo
 
Concepts, use cases and principles to build big data systems (1)
Concepts, use cases and principles to build big data systems (1)Concepts, use cases and principles to build big data systems (1)
Concepts, use cases and principles to build big data systems (1)Trieu Nguyen
 
Developing and deploying AI solutions on the cloud using Team Data Science Pr...
Developing and deploying AI solutions on the cloud using Team Data Science Pr...Developing and deploying AI solutions on the cloud using Team Data Science Pr...
Developing and deploying AI solutions on the cloud using Team Data Science Pr...Debraj GuhaThakurta
 

Similaire à Data Science at Scale - The DevOps Approach (20)

Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-Hadoop
 
Data Science at Scale - The DevOps Approach

  • 1. Data Science at Scale - The DevOps Approach DevOps Practices for Data Scientists and Engineers Mihai Criveti 22nd September 2019 https://github.com/crivetimihai http://galaxy.ansible.com/crivetimihai 1
  • 2. 1 Data Science Landscape 2 Process and Flow 3 The Data 4 Data Science Toolkit 5 Cloud Computing Solutions 6 The rise of DevOps 7 Reusable Assets and Practices 8 Skills Development 2
  • 3. Speaker Bio Mihai Criveti, IBM Designs and builds multi-cloud customer solutions for Cloud Native applications, Big Data analytics and Machine Learning. Pursuing a MSc in Data Science at UCD. Leads the Cloud Native competency for IBM Cloud Solutioning. Passionate about Open Source software, DevOps and Data Science. 3
  • 4. 1 Data Science Landscape
  • 5. What is Data Science Data Science Multi-disciplinary field that brings together Computer Science, Statistics/Machine Learning, and Data Analysis to understand and extract insights from ever-increasing amounts of data. Machine Learning The science of getting computers to act without being explicitly programmed. Deep Learning Family of machine learning methods based on learning data representations, as opposed to task-specific algorithms. Can be supervised, semi-supervised or unsupervised. AI Intelligent machines that work and react like humans. 4
  • 6. Moving Towards Big Data Figure 1: Powerful models and big data support Machine Learning 5
  • 7. Data Scientist Domain Data Engineering: • Linux, Cloud, Big Data Platforms. • Streaming, big data pipelines. Software Development • Coding skills, such as Python, R, SQL. • Development practices: Agile, DevOps, CI/CD and using GitOps effectively. Figure 2: Data Scientist Venn Diagram “While data scientists are recognized for their brilliant algorithms, up to 80% of their time could be spent collecting, cleaning and organizing data.” 1 1 Forbes: Radical Change Is Coming To Data Science Jobs 6
  • 8. Data Science Roles Data Scientist / Analyst: • Turn raw data into valuable insights that an organization needs in order to grow or compete. • Analytical data experts with the skill to solve complex problems and the curiosity to explore what problems need solving. • They use data, visualization, machine learning, deep learning, pattern recognition, natural language processing, analytics. • Always curious: "What can we learn from this data? What actions can we take after?" Data Engineer / Data Architect: • Prepare the "big data" infrastructure to be analysed by Data Scientists. • Software engineers who design, build, integrate data from various resources, and manage big data.
  • 10. Often, it’s a manual process Figure 3: Pen and paper - planning 8
  • 11. Data Science is awesome Data Science is OSEMN (pronounced AWESOME!)2 - an interactive process that consists, largely, of the following steps: 1. Inquire: ask a meaningful question. 2. Obtain: get the required data. 3. Scrub: clean the data. 4. Explore: learn about your data, try stuff. 5. Model: create a couple of models and test them. 6. iNterpret: gain insight from data, present it in a usable form (reports, dashboards, applications, etc). 2 O’Reilly - “Data Science at the Command Line, Facing the Future with Time-Tested Tools, Jeroen Janssens” 9
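The obtain, scrub, explore, model and interpret steps can be sketched as a chain of small functions. This is a minimal illustration using only the Python standard library; the inline CSV data, the function names and the mean-per-city "model" are assumptions made up for the example, not part of the original deck.

```python
import csv
import io
import statistics

# Hypothetical inline data standing in for a real source (file, API, database).
RAW_CSV = """city,temperature_c
Dublin,14.5
Cork,15.0
Galway,
Dublin,13.5
"""


def obtain(text):
    """Obtain: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def scrub(rows):
    """Scrub: drop rows with missing readings and convert strings to floats."""
    return [
        {"city": r["city"], "temperature_c": float(r["temperature_c"])}
        for r in rows
        if r["temperature_c"].strip()
    ]


def explore(rows):
    """Explore: summary statistics for a first look at the data."""
    temps = [r["temperature_c"] for r in rows]
    return {"count": len(temps), "mean": statistics.mean(temps)}


def model(rows):
    """Model: a deliberately trivial 'model' that predicts the mean per city."""
    by_city = {}
    for r in rows:
        by_city.setdefault(r["city"], []).append(r["temperature_c"])
    return {city: statistics.mean(vals) for city, vals in by_city.items()}


def interpret(predictions):
    """iNterpret: present the result in a readable form."""
    return [f"{city}: {temp:.1f} C" for city, temp in sorted(predictions.items())]


if __name__ == "__main__":
    rows = scrub(obtain(RAW_CSV))
    print(explore(rows))
    print(interpret(model(rows)))
```

Keeping each step as a separate function makes it easy to re-run a single stage when the question changes, which is the point of treating the process as a loop rather than a one-off script.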
  • 12. CRISP-DM - Cross-industry standard process for data mining
      CRISP-DM is a widely used, general-purpose, tool-independent process model for data mining.
      Stage                      Description
      1. Business Understanding  Ask relevant questions, define objectives
      2. Data Mining             Gather the necessary data
      3. Data Cleaning           Scrub and fix data inconsistencies
      4. Data Exploration        Form hypotheses about the data
      5. Feature Engineering     Select / construct important features
      6. Predictive Modeling     Train Machine Learning Models
      7. Data Visualization      Communicate findings to key stakeholders
      8. Data Automation         Automate and deploy ML models
      Figure 4: CRISP-DM
  • 13. Design Thinking Figure 5: Design Thinking 11
  • 14. Data Development Lifecycle Figure 6: Development, Data and Analytics Lifecycle 12
  • 16. Data is fundamental Figure 7: Data and the AI Ladder 13
  • 17. Types of Data Data can be: • Structured: tables, spreadsheets, relational databases. • Unstructured: text, images, audio, video. • Numerical / Quantitative: ex: pulse, temperature. • Categorical / qualitative: ex: hair colour. • Big Data: massive data sets that cannot fit in memory on a single machine. Figure 8: Different types of data Data becomes information when viewed in context or post-analysis. 14
  • 18. Obtaining Data Private / Enterprise Data • Private data can often be found in: Data warehouse, SQL Database. • NoSQL store, Data Lakes or HDFS, document repositories. • Private wikis, ERP and CRM platforms, object storage and, more often than not, spreadsheets. Public Data • Weather, social media, location / geographical data, stock data, public internet data (scraping), wikis, Eurostat, Kaggle, government data portals - are all sources of external data. Data compliance, governance and security are key to a successful data strategy.
  • 19. Data Portal Figure 9: Open Data portals such as data.gov.ie 16
  • 20. Common Data Formats • XML, JSON, YAML • CSV, TSV, Parquet, XLSX • Markdown, HTML, DOCX • TXT, PDF • Audio, Video • Data APIs that return JSON or XML • Streaming data • SQL and other database formats • HDFS and other big data stores or encapsulated data on object storage 17
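Several of the formats listed above can be read into one common shape (a list of dictionaries) before any analysis starts. The sketch below covers JSON, CSV and TSV with the standard library only; the function names and sample strings are assumptions for illustration, and formats such as Parquet or XLSX would need third-party libraries that are not shown here.

```python
import csv
import io
import json


def records_from_json(text):
    """Parse a JSON array of objects into a list of dicts."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of objects")
    return data


def records_from_delimited(text, delimiter=","):
    """Parse CSV text (or TSV, with a tab delimiter) into a list of dicts.

    Values stay as strings; type conversion is a separate scrubbing step.
    """
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


if __name__ == "__main__":
    json_text = '[{"name": "pulse", "value": 72}]'
    csv_text = "name,value\npulse,72\n"
    tsv_text = "name\tvalue\npulse\t72\n"
    print(records_from_json(json_text))
    print(records_from_delimited(csv_text))
    print(records_from_delimited(tsv_text, delimiter="\t"))
```

Note that JSON preserves the number type while the delimited formats return strings, which is one reason a scrub step usually follows the obtain step.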
  • 21. The Big Data Challenge Figure 10: Making sense of ever growing data sets through automation and machine learning 18
  • 22. 4 Data Science Toolkit
  • 23. Tools Data Scientists use • Mathematics - Linear Algebra, Statistics, Combinatorics • Some of them use R - focusing on statistics • A lot of them use Python - usually with Jupyter notebook as a front-end • Libraries such as Pandas and Numpy are very handy! • Natural Language Processing with NLTK • or Machine Learning libraries - Scikit-Learn, Tensorflow or PyTorch • SQL and databases tend to be quite popular. After all, where does data live? • NoSQL databases such as MongoDB are quite useful too… • And a whole bunch of Big Data tools: Hadoop, Spark, Kafka, etc. • They write papers too, so Markdown and LaTeX come in handy! • Lots of code, so typical software development tools (git, IDEs, CI/CD, etc.) • Processes (SCRUM, Agile, Lean, CRISP-DM, Design Thinking) 19
  • 24. Tools to IOSEMN process
      +-----------------+  Project Management / Lifecycle
      |     INQUIRE     |  Git, GitHub, GitLab (Project documentation)
      +-----------------+  Documentation systems
               v
      +-----------------+  Requests, APIs, sensors, surveys
      |     OBTAIN      |  SQL, CSV, JSON, XLS, NoSQL, Hadoop, Spark
      +-----------------+  Store / Cache data locally (SQLite, PostgreSQL)
               v           (Gather internal and external data)
      +-----------------+  Jupyter Notebook
      |      SCRUB      |  Regular Expression (re), BeautifulSoup
      +-----------------+  SQLite, ETL, Glue
  • 25. Tools (continued)
                 +-----------------+  Jupyter Notebook
                 |     EXPLORE     |  Pandas, Orange
                 +-----------------+  Matplotlib
                        ^     v       (Explore and understand the data)
                 +-----------------+  SciKit-Learn, Tensorflow
                 |      MODEL      |  PyTorch, NumPy
                 +-----------------+  Machine Learning
      RE-INQUIRE          |           (Model: predict, check accuracy, evaluate model)
          ^      +-----------------+  Jupyter Notebook, Matplotlib
          +----- |    INTERPRET    |  Bokeh, D3.JS, XLSXWriter
                 +-----------------+  Dashboards, Reports, etc.
                                      (Choose a good representation, interpret the results)
  • 26. Jupyter Lab / Notebook Figure 11: Jupyter Notebook 22
  • 27. Graphing and Dashboards Figure 12: Grafana: dashboard for time series analytics 23
  • 28. Apache Superset Visualization Figure 13: Apache Superset
  • 29. Geospatial Data Visualization Figure 14: Visualize geospatial data with deck.gl
  • 30. Local Cloud - Docker Compose
      version: '3'
      services:
        jupyter:
          image: cmihai/jupyter:v1
          container_name: jupyter
          volumes:
            - ./notebooks:/notebooks
          ports:
            - '9000:9000'
          links:
            - postgres
            - redis
        postgres:
          image: postgres
          container_name: postgres
          ports:
            - '5432:5432'
          environment:
            POSTGRES_USER: postgres
            POSTGRES_PASSWORD: postgres
          volumes:
            - pgdata:/var/lib/postgresql/data
        redis:
          image: redis:alpine
      # Assumption: the slide omits this section, but Compose requires
      # a named volume to be declared at the top level.
      volumes:
        pgdata:
  • 31. Composable Environments
      +-----------------+
      |     Jupyter     |  PYTHON
      |   ports:9000    +---------------------------------+
      | vol: /notebooks |                |                |
      |  (Anaconda 3)   |                |                |
      +---------|-------+                |                |
                |                        |                |
      +---------v-------+          +-----v------+   +-----v-----+
      |   PostgreSQL    |  NOSQL:  |   REDIS    |   |  MONGODB  |
      |       SQL       |          |            |   |           |
      +-----------------+          +------------+   +-----------+
  • 32. Machine Learning Frameworks Figure 15: Architecture: Jupyter Notebook using Keras with Tensorflow 28
  • 33. Open Data, Open Tools Figure 16: Open tools analysing open medical data 29
  • 34. Gartner Hype Cycle Figure 17: Gartner Hype Cycle 2018 30
  • 35. 5 Cloud Computing Solutions
  • 36. Cloud Computing A model for enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction.3 Characteristics: On-demand Self-service, Broad network access, Resource pooling, Rapid elasticity, Measured service. Service Models: IaaS (Infrastructure as a Service), PaaS (Platform as a Service), FaaS (Functions as a Service), SaaS (Software as a Service), BPaaS (Business Process as a Service). Deployment Models: Private Cloud, Community Cloud, Public Cloud, Hybrid Cloud, Multi Cloud. 3 The NIST Definition of Cloud Computing, Special Publication 800-145
  • 37. Why take advantage of Cloud in Data Science projects 1. Big Data: analytics for ever increasing, varied data sets. 2. IoT: Easily gather and process data from smart devices (Internet of Things). 3. Collaborative: Set up data environments with ease, and collaborate with your team. 4. Scalable: Access to virtually unlimited resources, GPU computing, etc. 5. Automated: orchestrate and provision systems and tear them down when no longer needed. Chances are, you’re likely already using it. Github? Kaggle notebooks? Google Docs? Dropbox? AWS Free Tier? JupyterHub? 32
  • 38. Machine Learning as a Service Figure 18: IBM Watson Machine Learning Services 33
  • 39. Cloud, Multi-Cloud, Data Lakes Figure 19: Data Lake Architecture on AWS 34
  • 40. Kubernetes: Container Orchestration at Scale Figure 20: Kubernetes is Desired State Management 35
  • 41. Kubernetes: container platform that enables portability across infrastructure providers. Kubernetes facilitates both declarative configuration and automation. Container Orchestration Benefits: • Cluster management: Federate hosts and manage them. • Self-healing: Detect and replace unhealthy container Pods and hosts. Attempt to put the cluster back to the wanted state. • Replication: Ensure that the wanted number of Pod replicas is running. • Service discovery: Locate and distribute client requests across running containers. • Scheduling: Distribute containers across the worker nodes. • Scaling: Add or remove containers to match workload. • Persistent Storage: Manage persistent storage.
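The self-healing, replication and scaling benefits all follow from the same idea: compare the observed state with the desired state and work out the actions that close the gap. The sketch below illustrates that idea in plain Python. It is not the Kubernetes API or how its controllers are implemented; the `Pod` class, the `reconcile` function and the action tuples are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    healthy: bool


def reconcile(desired_replicas, pods):
    """Return the actions needed to move the observed state to the desired state.

    Unhealthy Pods are deleted (self-healing), then Pods are created or
    removed until the healthy replica count matches (replication / scaling).
    """
    actions = []
    healthy = [p for p in pods if p.healthy]

    # Self-healing: every unhealthy Pod is scheduled for deletion.
    for p in pods:
        if not p.healthy:
            actions.append(("delete", p.name))

    missing = desired_replicas - len(healthy)
    if missing > 0:
        # Scale up: create one replacement per missing replica.
        actions.extend(("create", None) for _ in range(missing))
    elif missing < 0:
        # Scale down: remove the surplus healthy Pods.
        for p in healthy[desired_replicas:]:
            actions.append(("delete", p.name))
    return actions


if __name__ == "__main__":
    observed = [Pod("web-1", True), Pod("web-2", False)]
    print(reconcile(3, observed))
```

Because the function only looks at the current state and the target, running it repeatedly is safe: once the states match it returns an empty list.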
  • 42. 6 The rise of DevOps
  • 43. Collaborate to continuously deliver Figure 21: Practices to implement DevOps 37
  • 44. Cultural Transformation • Culture: Build trust and align your team with better communication and transparency. • Discover: Understand the problem domain and align on common goals. • Think: Know your audience and meet its needs faster than the competition. • Develop: Collaborate to build, continuously integrate and deliver high-quality code. • Reason: Apply AI techniques so that you can make better decisions. • Operate: Harness the power of the cloud to quickly get your minimum viable product (MVP) into production, and monitor and manage your applications to a high degree of quality and meet your service level agreements. Grow or shrink your resources based on demand. • Learn: Gain insights from your users as they interact with your application. 38
  • 45. End-to-End CD4ML Process by Martin Fowler Figure 22: Continuous Delivery for Machine Learning end-to-end process 39
  • 46. Build reproducible images with Packer Figure 23: Packer building a VirtualBox image for RHEL 8 using Kickstart Automated Install 40
  • 47. Secure your environment with OpenSCAP: Figure 24: Automate Continuous Compliance and Remediation (ex: HIPAA, PCI) 41
  • 48. Ansible: Provisioning and Configuration Management Figure 25: Ansible: Application Deployment + Configuration Management + Continuous Delivery 42
  • 49. What can I do with Ansible? Figure 26: Automate your entire infrastructure and data pipeline using Ansible 43
  • 50. Molecule: Test your Ansible Playbooks on Docker, Vagrant or Cloud
      Molecule provides support for testing with multiple instances, operating systems and distributions, virtualization providers, test frameworks and testing scenarios.
      Instantly create a Vagrant or Docker machine and test your playbook:
          molecule create -s vagrant-centos-7
          molecule converge -s vagrant-centos-7
          molecule login
      Integrate your tests as part of your CI/CD:
          molecule test
  • 51. 7 Reusable Assets and Practices
  • 52. Reproducible Research and Giveback. Figure 27: https://paperswithcode.com - Reproducible Research.
  • 53. The Open Practice Library. Figure 28: openpracticelibrary.com: A community-driven repository of practices and tools.
      An Outcome Delivery framework:
      • Discovery - generate the Outcomes
      • Options - identify how to get there
      • Delivery - implement and put ideas to the test. Learn what works and what doesn’t.
  • 54. The Open Practice Library - Discovery. Figure 29: What problems are you trying to solve, for whom and why? How will you measure Outcomes?
  • 55. The Open Practice Library - Options Pivot. Figure 30: What are the different options? What do you need to make this happen?
  • 56. The Open Practice Library - Delivery. Figure 31: What was the measured impact? What did you learn?
  • 57. The Open Practice Library - Foundation. Figure 32: Creating a team culture, an environment of collaboration and technical engineering practices.
  • 59. Skills Map. Figure 33: Cross-functional Skills Map.
  • 60. Example skills for Data Science scenarios:
      1. Data Engineering: Infrastructure as Code (CloudFormation, Terraform), Shell Scripting, Cloud, APIs (boto3), etc.
      2. DevOps: putting together CI/CD pipelines, Jenkins, CodeStar*, CodePipeline, CodeBuild, etc.
      3. Event Driven Architectures: putting an object on S3 triggers a Lambda function that performs ETL and loads the result into Redshift.
      4. GPU computing: accelerate workloads for general-purpose scientific and engineering computing. [4]
      5. Container platforms: develop microservices and set up environments with ease.
      [4] Forbes: From Deep Learning To Data Science: Everything You Need To Know
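The event-driven scenario in item 3 can be sketched as a small Lambda-style handler. This is an illustration under stated assumptions, not the deck's implementation: the CSV columns `id` and `amount`, the `transform` rule, and the injected `read_object` and `load_rows` callables are all hypothetical. In a real function, `read_object` would typically wrap boto3's `get_object` call and `load_rows` would write to Redshift; both are left out here so the sketch runs without AWS access.

```python
import csv
import io
from urllib.parse import unquote_plus


def s3_objects(event):
    """Yield (bucket, key) for every object named in an S3 event notification."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 notifications, so decode them.
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def transform(csv_text):
    """Toy ETL step (assumed schema): drop rows with an empty 'amount', cast the rest to float."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["amount"].strip():
            rows.append({"id": row["id"], "amount": float(row["amount"])})
    return rows


def make_handler(read_object, load_rows):
    """Build a handler from two injected callables (both hypothetical):

    read_object(bucket, key) -> str   returns the object's text content
    load_rows(rows)                   loads the transformed rows into the warehouse
    """
    def handler(event, context):
        loaded = 0
        for bucket, key in s3_objects(event):
            rows = transform(read_object(bucket, key))
            load_rows(rows)
            loaded += len(rows)
        return {"rows_loaded": loaded}
    return handler
```

Injecting the read and load steps keeps the extract, transform and load stages separate, so the transform logic can be unit-tested locally before the function is wired to a real bucket trigger.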
  • 61. Reasoning Skills. Figure 34: Avoid Cognitive Biases.
  • 62. Technical Books. Figure 35: Journey from Linux and Python to Big Data and Machine Learning.
  • 63. Soft Skills and Career. Figure 36: Career Development, Soft Skills, Stakeholder Management.
  • 64. Meetups and Events
      Networking:
      • Absorb ideas and filter them through your own experience at meetups and events.
      • Build a strong network.
      Giveback:
      • Give back to the community by speaking at events, supporting, sponsoring or co-organising.
      • Contribute to Open Source projects. Meetups and hackathons are a great place to start!
      Social Eminence:
      • Be an active member of the Data Science community and build your social eminence!
  • 65. Communities and Competitions. Figure 37: Compete on Kaggle or check out notebooks and datasets.
  • 66. Developing a Personal Brand and Building your Portfolio. Figure 38: Develop your own journey and personal brand.
  • 67. Great places to learn. Figure 39: MOOC: CognitiveClass.AI - Free Courses and Badges.
  • 68. Gamification with Badges. Figure 40: Collect free badges and accreditations.
  • 69. Example Courses
      cognitiveclass.ai: Learning Paths and Badges on Containers, Kubernetes, SQL, Big Data, Python, R, Deep Learning, Analytics, Hadoop and Spark.
      katacoda.com: Interactive, hands-on courses on Containers, Machine Learning, DevOps, Software Engineering Practices and more, straight in your browser.
      Data Science Courses:
      • edx.org - you can ‘audit’ courses for free.
      • coursera.org - great machine learning courses (e.g. Andrew Ng).
      • fast.ai - free courses on Deep Learning, Computational Linear Algebra.
  • 70. What they don’t teach you
      Sometimes it’s just you, and there won’t be:
      • a Data Engineer to build and automate your pipelines
      • a Cloud Architect to design your infrastructure
      • a Network Engineer
      • a DevOps Engineer to create your code deployment pipelines and help with CI/CD
      • a Systems Administrator to support your Linux environment.
      Go SaaS
      • Could I go SaaS?
      Build it yourself!
      • How do I build all this myself?
  • 71. Questions and Contact
      Twitter: @CrivetiMihai
      LinkedIn: https://www.linkedin.com/in/crivetimihai/
      GitHub: crivetimihai
      Blog: blog.boreas.ro
      Ansible Galaxy: https://galaxy.ansible.com/crivetimihai