Ph.D in Computer Science at ENS Paris/INRIA
Postdoctoral Fellow at Carnegie Mellon University
>500 citations, Best Paper Award at 2009 CVPR Conference
NEC Labs (Bell Labs) in Cupertino (Silicon Valley)
Senior Researcher at Intel (3 pending patents)
- Developed ML algorithms for face recognition
Invited speaker at CMU, Samsung, University of Tokyo, SNU, etc.
Co-founder of Solidware
Olivier Duchenne
Co-founder | Chief Machine Learning Scientist
8 years of experience in machine learning, computer vision and big data
Guidelines for Using Machine Learning on Real Data
Avoid Common Mistakes
Understand the Data Better
1. Big Enough Data?
2. Changing Data
Machine Learning and Data Science
From computer vision experience
to solving companies' issues:
Ex: car accident prediction (insurance),
default prediction (bank),
stock value prediction
Figure: Machine-learning-based predictive modeling. ML algorithms analyze historical data (PAST DATA, the training data set: internal data, ex: age, gender; external data, ex: web crawl; and the known target value) to detect patterns and build a prediction function. The prediction function is then applied to newly incoming data (internal and external data, unknown target value) to produce a predicted target value.
Basic Explanation of Machine Learning
1. Prediction function. Ex: a linear function, a neural net, …
2. The prediction function is parametrized. Ex: $f_{\alpha}(X) = \sum_i \alpha_i X_i$
3. The goal is to find the best prediction function, i.e. the best parameters.
4. We build an objective function that measures how good a prediction function is.
5. The objective function always has a data term. Ex: $\mathrm{obj}(\alpha) = \sum_s \big(f_{\alpha}(X_s) - Y_s\big)^2$
6. The algorithm searches for the parameters that optimize this objective function. Ex: closed-form solution, stochastic gradient descent, …
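A minimal sketch of steps 1 to 6, assuming a linear prediction function, the squared-error objective obj(α), and plain stochastic gradient descent as the optimizer. The toy data (generated by y = 2·X1 + 3·X2), the learning rate and the number of epochs are illustrative choices, not values from the talk.

```python
import random

def predict(alpha, x):
    # Linear prediction function: f_alpha(X) = sum_i alpha_i * X_i
    return sum(a * xi for a, xi in zip(alpha, x))

def objective(alpha, X, Y):
    # Data term: obj(alpha) = sum_s (f_alpha(X_s) - Y_s)^2
    return sum((predict(alpha, x) - y) ** 2 for x, y in zip(X, Y))

def sgd(X, Y, lr=0.01, epochs=500, seed=0):
    """Search for the parameters alpha that minimize obj(alpha)."""
    rng = random.Random(seed)
    alpha = [0.0] * len(X[0])
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a random order each epoch
        for s in order:
            err = predict(alpha, X[s]) - Y[s]
            # d/d(alpha_i) of (f_alpha(X_s) - Y_s)^2 is 2 * err * X_s,i
            alpha = [a - lr * 2 * err * xi for a, xi in zip(alpha, X[s])]
    return alpha

# Hypothetical toy training set generated by y = 2*X1 + 3*X2 (no noise).
X = [[1, 0], [0, 1], [1, 1], [2, 1], [1, 2]]
Y = [2, 3, 5, 7, 8]
alpha = sgd(X, Y)
print([round(a, 3) for a in alpha])  # close to [2.0, 3.0]
```

For this squared loss a closed-form least-squares solution also exists; SGD is shown because the same loop carries over to large data sets and to non-linear prediction functions such as neural nets.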
History of Machine Learning for Computer Vision
Figure: timeline moving from model-driven, through mixed, to data-driven approaches:
1970s: hand-designed model
1980s: alignment method
1990s: grid model
2000s: deformable model
2010s: convolutional network
Why didn't people use ML from the beginning?
Commonly assumed reasons:
1. “Better computers” are available now
2. “Better algorithms”
3. “Amount of data”
“We create so much data that 90% of the data in the world today has been created in the last two years alone”
- Petter Bae Brandtzæg, SINTEF ICT
How much data did CV researchers use?
2004: Caltech 101, 10K images
2005-2010: Pascal VOC, 2K → 30K objects
2010-2015: ImageNet, 10M → 15M images (http://www.image-net.org/)
Image sources: http://www.vision.caltech.edu/, http://doi.ieeecomputersociety.org/
The answer is… “Amount of Data”
Image source: Smartdatacollective.com
• Most advanced machine learning cannot be applied if there is not enough data
• A critical mass of data is necessary to use, for example, deep learning
• When the amount of data increases, the machine learning model, and therefore the prediction function, can become more complex and better
With enough data, is ANY algorithm okay?
Support vector machines, Bayesian networks, regression forest, sparse dictionary learning, artificial neural networks, k-nearest neighbors, deep learning, boosting
No, it depends on the company and the problem you are trying to solve.
Figure: three companies (A, B, C) and three candidate algorithms (deep learning, neural networks, logistic regression).
What Changed in the Machine Learning Domain
From the Past to the Present:
Why do we need lots of data?
Overfitting
Synonym: overgeneralizing
It is like visiting a new place for one day, seeing a mountain fire, and believing that there are fires there every day.
In real life, we rarely have the chance to work with clean & BIG data
An example: overfitting due to lack of data
Figure: probability to default per city (Seoul, Busan, Daejeon, Gwangju, … many more cities), y-axis from 0 to 0.14.
As there are many categories, some categories with little data show outlier results.
So, always use error bars
Suppose you want to detect an event that occurs on average with probability p = 5%, and you have many cities with ~50 samples each.
On average, about 1 city in 13 will observe this event 0 times, since $(1-p)^{50} = 0.95^{50} \approx 0.077$.
Without proper handling, these extreme cases will be completely wrong (an estimated probability of 0%).
This kind of error happens often.
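The 1-in-13 figure follows from the probability (1 − p)^n of never seeing the event in n independent samples. The sketch below reproduces it and attaches an error bar to a city's observed rate. The Wilson score interval is assumed here as one common binomial confidence interval; the talk does not say which interval the author uses.

```python
import math

def prob_zero_events(p, n):
    # Probability that an event of rate p is never observed in n independent samples.
    return (1 - p) ** n

def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a proportion k/n (z = 1.96 gives ~95% coverage)."""
    p_hat = k / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half = z / denom * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2))
    return max(0.0, center - half), min(1.0, center + half)

p0 = prob_zero_events(0.05, 50)
print(f"P(0 events in 50 samples) = {p0:.4f}, i.e. about 1 city in {1 / p0:.0f}")

# A city with 0 observed events out of 50 samples.
lo, hi = wilson_interval(0, 50)
print(f"0/50 observed -> 95% interval [{lo:.3f}, {hi:.3f}]")
```

The interval for 0/50 still contains the true rate of 5%, which is the point of the slide: the point estimate of 0% should never be used without its error bar.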
How to fight against overfitting
Data: more samples, fewer variables, artificial data extension
Algorithm: simpler objective function, regularization, bagging
Modeling: feature engineering, data normalization
Data
In computer vision, it is possible to extend the data.
Ex: hiring annotators, Amazon Mechanical Turk, Google reCAPTCHA
Companies often have a limited number of samples and cannot extend them.
Ex: a Korean bank that gives ~100K loans per year
How to count your data?
1. Count only the positives (detecting rare events requires more data)
Ex: image detection: it is easy to find a virtually unlimited number of negatives.
Companies often want to detect rare events (few positives).
Ex: predicting car accidents / ad clicks / defaults / online purchases
2. Difficulty of the task
• Learning addition ($y = 1 \cdot X_1 + 1 \cdot X_2$) requires ~100 samples
• Learning object recognition requires ~10M samples
3. Probabilistic event detection is harder.
Compare "What is in this image?" with "Will this user click on a car advertisement?": two identical clients can give opposite answers.
Client #1: male, 27 years old, lives in Seoul, salaried worker in the construction sector, previously clicked on a car advertisement → Yes
Client #2: male, 27 years old, lives in Seoul, salaried worker in the construction sector, previously clicked on a car advertisement → No
Algorithm
1. Many algorithms exist: GLM, boosting, lasso, regression forest, SVM, Gaussian process, Bayesian networks, deep learning, …
2. The complexity of their prediction functions differs.
3. The more complex the prediction function is, the more closely it fits the data.
Figure: purchase probability vs. age, fitted by three prediction functions of increasing complexity, from underfitting to overfitting.
Algorithm
1. Fewer parameters → less overfitting
2. More parameters → less underfitting
3. Ex: best of both worlds: deep conv nets
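To make "the more complex the prediction function, the more it fits the data" concrete, here is a sketch using k-nearest neighbors (one of the algorithms listed earlier) on hypothetical age/purchase data. k = 1 is the most flexible model and memorizes the training set; k = n predicts the same average for everyone. The data and the held-out points are invented for illustration.

```python
def knn_predict(train_x, train_y, x, k):
    # Average the targets of the k training points closest to x.
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k

def mse(train_x, train_y, xs, ys, k):
    # Mean squared error of the k-NN predictor on the points (xs, ys).
    errors = [(knn_predict(train_x, train_y, x, k) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)

# Hypothetical data: age -> purchased (1) or not (0), with some label noise.
ages = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
bought = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
held_out_ages = [22, 33, 48, 62]
held_out_bought = [0, 1, 1, 1]

for k in (1, 3, len(ages)):
    train_err = mse(ages, bought, ages, bought, k)
    test_err = mse(ages, bought, held_out_ages, held_out_bought, k)
    print(f"k={k:2d}  train MSE={train_err:.3f}  held-out MSE={test_err:.3f}")
```

On this toy data, k = 1 has zero training error but the worst held-out error (overfitting), k = n misses the age effect (underfitting), and the intermediate k = 3 does best on the held-out points.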
Algorithm: Avoiding the “Too Many Categories” problem
Figure: one category per city (Busan, Seoul, Daejeon, Daegu, Pohang, Incheon, Suwon, Ulsan), then the same cities after grouping and merging categories.
Figure: probability to default vs. log10(population), x-axis from 1 to 6.
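A minimal sketch of the merging remedy, assuming a simple rule: any city with fewer than `min_count` samples is pooled into one shared category. The threshold, the function names and the loan data are hypothetical; grouping by a numeric property such as log10(population), as in the figure, is an alternative.

```python
from collections import Counter, defaultdict

def merge_rare_categories(values, min_count, other="OTHER"):
    # Replace every category seen fewer than min_count times by one shared bucket.
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]

def rate_by_category(values, targets):
    # Empirical positive rate (e.g. default rate) per category.
    totals, positives = defaultdict(int), defaultdict(int)
    for v, t in zip(values, targets):
        totals[v] += 1
        positives[v] += t
    return {v: positives[v] / totals[v] for v in totals}

# Hypothetical loan data: city of each client and whether they defaulted (1) or not (0).
cities = ["Seoul"] * 40 + ["Busan"] * 30 + ["Pohang"] * 3 + ["Ulsan"] * 2
defaults = [1] * 2 + [0] * 38 + [1] * 2 + [0] * 28 + [1, 0, 0] + [0, 0]

print(rate_by_category(cities, defaults))
print(rate_by_category(merge_rare_categories(cities, min_count=10), defaults))
```

Before merging, Pohang (1/3) and Ulsan (0/2) produce extreme rates; after merging, their 5 pooled samples give a single, less noisy estimate of 1/5.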
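Regularization
$\min_{\theta} \sum_s \mathrm{loss}\big(gt_s, f_{\theta}(X_s)\big) + \lambda\,\Omega(\theta)$
$\min_{\theta} \sum_s \mathrm{loss}\big(gt_s, f_{\theta}(X_s)\big)$, s.t. $\Omega(\theta) < \lambda$
$\Omega(\theta) = \|\theta\|_2^2$ or $\|\theta\|_1$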
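For intuition, a one-parameter case can be solved exactly. Assuming a squared loss and f_θ(X) = θ·X (illustrative choices), setting the derivative of the penalized objective to zero gives θ = Σ X_s gt_s / (Σ X_s² + λ) for Ω(θ) = θ², and a soft-thresholded solution θ = sign(S)·max(|S| − λ/2, 0) / Σ X_s² with S = Σ X_s gt_s for Ω(θ) = |θ|. The sketch shows how λ shrinks θ toward 0.

```python
def ridge_1d(xs, ys, lam):
    # argmin_theta sum_s (theta*x_s - y_s)^2 + lam * theta^2
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def lasso_1d(xs, ys, lam):
    # argmin_theta sum_s (theta*x_s - y_s)^2 + lam * |theta|  (soft thresholding)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    shrunk = max(abs(sxy) - lam / 2, 0.0)
    return (shrunk if sxy >= 0 else -shrunk) / sxx

xs, ys = [1, 2, 3], [2, 4, 6]  # hypothetical data generated by y = 2x
for lam in (0, 14, 56):
    print(lam, ridge_1d(xs, ys, lam), lasso_1d(xs, ys, lam))
```

The L2 penalty only shrinks θ, while the L1 penalty can set it exactly to 0, which is why L1 regularization is also used for variable selection.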
Data Normalization
Removing variance that has no impact on the target value → helps the ML system focus on meaningful variance
Ex: DeepFace (Facebook, 2014): ~120M parameters, trained on ~4.4M face images
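As a small illustration of removing variance that does not affect the target, here is per-sample normalization of image intensities: subtracting each image's mean and dividing by its standard deviation removes global brightness and contrast changes. This is a generic technique assumed for illustration; it is not the alignment-based normalization used in DeepFace, and the pixel values are hypothetical.

```python
import math

def normalize(pixels):
    # Zero-mean, unit-variance normalization of one sample (e.g. a flattened image).
    mean = sum(pixels) / len(pixels)
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels))
    if std == 0:
        return [0.0] * len(pixels)  # constant image: only brightness, nothing else to keep
    return [(p - mean) / std for p in pixels]

face = [10, 50, 80, 30, 60]
brighter_face = [2 * p + 40 for p in face]  # same face, different contrast and brightness

a, b = normalize(face), normalize(brighter_face)
print(max(abs(x - y) for x, y in zip(a, b)))  # close to 0: the lighting difference is gone
```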
Bagging
1. Slightly modify the training set at random.
2. Train the model.
3. Repeat.
4. Average all the prediction functions.
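A sketch of the four steps, assuming bootstrap resampling (drawing n samples with replacement) as the random modification of the training set, which is the classic choice for bagging, and a 1-nearest-neighbour regressor as a hypothetical high-variance base learner. The age/purchase data are invented for illustration.

```python
import random

def fit_1nn(xs, ys):
    # Hypothetical base learner: 1-nearest-neighbour regression (high variance).
    data = list(zip(xs, ys))
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]

def bagging(xs, ys, fit, n_models=25, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        # 1. Randomly modify the training set: bootstrap sample (n draws with replacement).
        idx = [rng.randrange(n) for _ in range(n)]
        # 2. Train on the modified set.  3. Repeat.
        models.append(fit([xs[i] for i in idx], [ys[i] for i in idx]))
    # 4. Average all the prediction functions.
    return lambda x: sum(m(x) for m in models) / len(models)

ages = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
bought = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
bagged = bagging(ages, bought, fit_1nn)
print(bagged(33), bagged(48))
```

A single 1-NN model outputs only 0 or 1; the bagged average gives a smoother score between 0 and 1, which reduces the variance that drives overfitting.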
Changing Data
Data changes through time:
• Market changes
• Law/regulation changes
• Collected data changes
• Client filtering / marketing changes
The representation of the data changes:
• Variable names change
• Category names change
Why is time so different from other variables?
• Cyclic data changes → seasonality
• Trends have to be handled separately → interpolation vs. extrapolation
Figure: probability to buy a smartphone vs. age (interpolation between observed ages) and vs. time (extrapolation: future values marked “?”).
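Because every future date lies outside the range seen during training, predicting along time is always extrapolation. A sketch under that reading, with hypothetical monthly records: split the data chronologically instead of at random, and check whether a query value lies inside the training range. The ages of new clients usually do; dates never do.

```python
from datetime import date

def time_split(records, cutoff):
    # Train on the past, evaluate on the future: mimics how the model will be used.
    train = [r for r in records if r["date"] < cutoff]
    test = [r for r in records if r["date"] >= cutoff]
    return train, test

def is_extrapolation(value, train_values):
    # A query outside the range seen in training cannot be interpolated.
    return value < min(train_values) or value > max(train_values)

# Hypothetical monthly records for 2015.
ages = [25, 41, 33, 58, 29, 47, 36, 52, 44, 31, 39, 50]
records = [{"date": date(2015, m, 1), "age": ages[m - 1]} for m in range(1, 13)]

train, test = time_split(records, cutoff=date(2015, 10, 1))
train_ages = [r["age"] for r in train]
train_dates = [r["date"] for r in train]

print(len(train), len(test))
print([is_extrapolation(r["age"], train_ages) for r in test])
print([is_extrapolation(r["date"], train_dates) for r in test])
```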
Time is correlated with hidden variables
Figure: cost of car insurance (one type of insurance) over time, with a “New Law” event marked.
The causes of change can be unknown, but consistent.
Figures: cost of car insurance (one type of insurance) over time, showing seasonality.
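Cyclic changes can be handled explicitly. A minimal sketch, assuming a known period (e.g. 4 quarters or 12 months) and no trend in the data: estimate the average level at each position of the cycle and subtract it. The quarterly insurance costs are hypothetical; with a trend present, the profile should be estimated on de-trended data instead.

```python
from collections import defaultdict

def seasonal_profile(values, period):
    # Average value at each position of the cycle (e.g. each quarter of the year).
    sums, counts = defaultdict(float), defaultdict(int)
    for t, v in enumerate(values):
        sums[t % period] += v
        counts[t % period] += 1
    return [sums[k] / counts[k] for k in range(period)]

def deseasonalize(values, period):
    # Remove the cyclic component so that remaining changes (trends, jumps) stand out.
    profile = seasonal_profile(values, period)
    return [v - profile[t % period] for t, v in enumerate(values)]

# Hypothetical quarterly cost of car insurance: same seasonal pattern for 3 years.
cost = [100, 120, 90, 110] * 3
print(seasonal_profile(cost, period=4))
print(deseasonalize(cost, period=4))  # all zeros: only seasonality was present
```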
Changing Data Representation
• Collected data changes
• Category splitting, merging
• Variable names change
• Category names change
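When variable or category names change between data deliveries, one option, sketched here with hypothetical column and category names, is an explicit alias table that maps every historical name onto one canonical vocabulary before training. A category that was split can be mapped back to its parent in the same way.

```python
# Hypothetical renames collected from successive data deliveries.
COLUMN_ALIASES = {"cust_age": "age", "client_age": "age"}
CATEGORY_ALIASES = {
    "job": {"Salaryman": "Salaried worker", "Office worker": "Salaried worker"},
}

def harmonize(record):
    # Map renamed columns and categories onto one canonical vocabulary.
    out = {}
    for key, value in record.items():
        key = COLUMN_ALIASES.get(key, key)
        value = CATEGORY_ALIASES.get(key, {}).get(value, value)
        out[key] = value
    return out

old = {"cust_age": 27, "job": "Salaryman"}
new = {"client_age": 27, "job": "Office worker"}
print(harmonize(old) == harmonize(new))  # True
```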
Job applications: contact@solidware.io
Visit our booth
Thank you
Visit our website: solidware.io

SlideShare deck: [243] turning data into value (NAVER D2)
[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)
 
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
 
[213] Fashion Visual Search
[213] Fashion Visual Search[213] Fashion Visual Search
[213] Fashion Visual Search
 
[232] TensorRT를 활용한 딥러닝 Inference 최적화
[232] TensorRT를 활용한 딥러닝 Inference 최적화[232] TensorRT를 활용한 딥러닝 Inference 최적화
[232] TensorRT를 활용한 딥러닝 Inference 최적화
 
[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지
[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지
[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지
 
[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터
[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터
[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터
 
[223]기계독해 QA: 검색인가, NLP인가?
[223]기계독해 QA: 검색인가, NLP인가?[223]기계독해 QA: 검색인가, NLP인가?
[223]기계독해 QA: 검색인가, NLP인가?
 

Dernier

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 

Dernier (20)

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 

[243] turning data into value

  • 1.
  • 2. Olivier Duchenne, Co-founder | Chief Machine Learning Scientist. 8 years of experience in machine learning, computer vision and big data. Ph.D. in Computer Science at ENS Paris/INRIA; Postdoctoral Fellow at Carnegie Mellon University; >500 citations and a Best Paper Award at the 2009 CVPR conference; NEC Labs (Bell Labs) in Cupertino (Silicon Valley); Senior Researcher at Intel (3 pending patents), where he developed ML algorithms for face recognition; invited speaker at CMU, Samsung, Tokyo Univ, SNU, etc.; Co-Founder of Solidware.
  • 3. Machine Learning and Data Science. Guidelines for using machine learning on real data: avoid common mistakes and understand the data better. 1. Big enough data? 2. Changing data.
  • 4. From computer vision experience to solving companies' problems. Examples: car accident prediction (insurance), default prediction (banking), stock value prediction.
  • 5. Machine-learning-based predictive modeling. ML algorithms analyze historical data (PAST DATA, the training data set) to detect patterns. The training data contains internal data (e.g. age, gender), external data (e.g. a web crawl) and the known target value. The resulting prediction function is then applied to newly incoming data (internal and external data whose target value is unknown) to output a predicted target value.
  • 6. Basic explanation of machine learning. 1. We choose a prediction function, e.g. a linear function or a neural net. 2. The prediction function is parametrized, e.g. $f_\alpha(X) = \sum_i \alpha_i X_i$. 3. The goal is to find the best prediction function, i.e. the best parameters. 4. We build an objective function that measures how good a prediction function is. 5. The objective function always has a data term, e.g. $\mathrm{obj}(\alpha) = \sum_s \left( f_\alpha(X_s) - Y_s \right)^2$. 6. The algorithm searches for the parameters that optimize (here, minimize) this objective function, e.g. with a closed-form solution or stochastic gradient descent.
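Steps 1 to 6 can be sketched in a few lines. This is a minimal illustration, assuming a linear prediction function, the squared-error data term above, and plain stochastic gradient descent; the learning rate, epoch count and the toy data (generated from $y = X_1 + X_2$) are arbitrary choices for the example, not the author's setup.

```python
import random

def predict(alpha, x):
    # Linear prediction function f_alpha(X) = sum_i alpha_i * X_i
    return sum(a * xi for a, xi in zip(alpha, x))

def objective(alpha, xs, ys):
    # Data term: sum over samples s of (f_alpha(X_s) - Y_s)^2
    return sum((predict(alpha, x) - y) ** 2 for x, y in zip(xs, ys))

def sgd(xs, ys, lr=0.01, epochs=200, seed=0):
    rng = random.Random(seed)
    alpha = [0.0] * len(xs[0])
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for s in order:
            err = predict(alpha, xs[s]) - ys[s]
            # Gradient of (f_alpha(X_s) - Y_s)^2 w.r.t. alpha_i is 2 * err * X_s[i]
            alpha = [a - lr * 2 * err * xi for a, xi in zip(alpha, xs[s])]
    return alpha

# Toy, noise-free data: y = 1*X1 + 1*X2, 100 samples
rng = random.Random(42)
xs = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(100)]
ys = [x1 + x2 for x1, x2 in xs]
alpha = sgd(xs, ys)
print([round(a, 3) for a in alpha], objective(alpha, xs, ys))
```

Because the data is noise-free, the learned parameters should end up very close to $[1, 1]$; with real data the objective would not reach zero.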
  • 7. History of machine learning for computer vision, moving from model-driven through mixed to data-driven approaches: 1970s, hand-designed models; 1980s, alignment methods; 1990s, grid models; 2000s, deformable models; 2010s, convolutional networks.
  • 8. Why didn't people use ML from the beginning? Commonly assumed reasons: 1. "Better computers" are available now. 2. "Better algorithms". 3. "Amount of data": "We create so much data that 90% of the data in the world today has been created in the last two years alone" (Petter Bae Brandtzæg, SINTEF ICT).
  • 9. How much data did computer vision researchers use? 2004: Caltech 101, 10K images. 2005 to 2010: Pascal VOC, 2K to 30K objects. 2010 to 2015: ImageNet, 10M to 15M images (http://www.image-net.org/). Image sources: http://www.vision.caltech.edu/, http://doi.ieeecomputersociety.org/
  • 10. The answer is… "amount of data" (image source: Smartdatacollective.com). • The most advanced machine learning methods cannot be applied if there is not enough data. • A critical mass of data is necessary to use, for example, deep learning. • As the amount of data increases, the machine learning models, and therefore the prediction model, can become more complex and more accurate.
  • 11. With enough data, is any algorithm okay? Support vector machines, Bayesian networks, regression forests, sparse dictionary learning, artificial neural networks, k-nearest neighbors, deep learning, boosting, logistic regression. No: it depends on the company and the problem you are trying to solve.
  • 12. What changed in the machine learning domain, from the past to the present
  • 13. Why do we need lots of data? Overfitting (synonym: over-generalizing). It is like visiting a new place for one day, seeing a mountain fire, and believing that there are fires there every day. In real life, we rarely have the chance to work with clean and BIG data.
  • 14. An example of overfitting due to lack of data. [Bar chart: probability of default by city (Seoul, Busan, Daejeon, Gwangju, … many more cities)] Because there are many categories, some categories with little data show outlier results.
  • 15. So, always use error bars. [Bar chart: probability of default by city (Seoul, Busan, Daejeon, Gwangju, … many more cities)]
  • 16. Suppose you want to detect an event that occurs on average with probability p = 5%, and you have many cities with ~50 samples each. On average, about 1 city in 13 will have this event 0 times, since $(1 - 0.05)^{50} \approx 0.077 \approx 1/13$. Without proper handling, the predictions for these extreme cases will be completely wrong. This kind of error happens often.
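The arithmetic above and the advice to use error bars can be checked with a short sketch. It assumes independent samples and uses the Wilson score interval as one common choice of error bar for a proportion; the slides do not say which interval the author uses.

```python
import math

def prob_zero_events(p, n):
    # Probability that n independent samples with event rate p contain no event
    return (1 - p) ** n

def wilson_interval(k, n, z=1.96):
    # Wilson score interval for a binomial proportion (about 95% for z = 1.96)
    p_hat = k / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return max(0.0, center - half), min(1.0, center + half)

p, n = 0.05, 50
print(prob_zero_events(p, n))  # about 0.077, i.e. roughly 1 city in 13
# A city with 0 events out of 50: the naive rate is 0%, but the error bar is wide
print(0 / n, wilson_interval(0, n))
```

For a city with 0 events out of 50, the interval reaches roughly 7%, so the true 5% rate is still plausible even though the point estimate says 0%.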
  • 17. How to fight against overfitting. Data: more samples, fewer variables, artificial data extension. Algorithm: simpler objective function, regularization, bagging. Modeling: feature engineering, data normalization.
  • 18. Data. In computer vision, it is possible to extend the data, e.g. by hiring annotators or using Amazon Mechanical Turk or Google reCAPTCHA. Companies, by contrast, often have a limited number of samples and cannot extend it, e.g. a Korean bank that gives ~100K loans per year.
  • 19. How to count your data? 1. Count only positives (detecting rare events requires more data). E.g. in image detection, it is easy to find an infinite number of negatives. Companies often want to detect rare events (few positives), e.g. predicting car accidents, ad clicks, defaults or online purchases.
  • 20. How to count your data? 2. Difficulty of the task. • Learning addition ($y = 1 \cdot X_1 + 1 \cdot X_2$) requires ~100 samples. • Learning object recognition requires ~10M samples.
  • 21. How to count your data? 3. Probabilistic event detection is harder. Compare "What is in this image?" with "Will this user click on a car advertisement?" Client #1: male, 27 y.o., lives in Seoul, salaryman in the construction sector, has previously clicked on a car advertisement. Client #2: male, 27 y.o., lives in Seoul, salaryman in the construction sector, has previously clicked on a car advertisement. Outcomes: Yes / No. Two clients with identical features can behave differently, so only a click probability can be learned, and estimating a probability well takes more data.
  • 22. Algorithm. 1. Many algorithms exist: GLM, boosting, Lasso, regression forest, SVM, Gaussian process, Bayesian networks, deep learning, … 2. The complexity of their prediction functions differs. 3. The more complex the prediction function is, the more closely it fits the data. [Figure: purchase probability vs. age in three panels, from underfitting to overfitting]
  • 23. Algorithm. 1. Fewer parameters → less overfitting. 2. More parameters → less underfitting. 3. Best of both worlds, e.g. deep conv nets.
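The trade-off between underfitting and overfitting can be made concrete with a small experiment. This sketch swaps in k-nearest-neighbor regression because its complexity is controlled by a single knob: k = 1 memorizes the training set, while k equal to the training-set size predicts a constant. The "purchase probability vs. age" curve and noise level are hypothetical.

```python
import math
import random

def knn_predict(train_x, train_y, x, k):
    # Average the targets of the k training points closest to x (1-D inputs)
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k

def mse(train_x, train_y, xs, ys, k):
    preds = [knn_predict(train_x, train_y, x, k) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

def make_data(n, rng):
    # Hypothetical purchase-probability curve over age, plus Gaussian noise
    xs = [rng.uniform(20, 70) for _ in range(n)]
    ys = [1 / (1 + math.exp(-(x - 45) / 5)) + rng.gauss(0, 0.1) for x in xs]
    return xs, ys

rng = random.Random(0)
train_x, train_y = make_data(60, rng)
test_x, test_y = make_data(200, rng)
for k in (1, 5, 15, 60):
    # Columns: k, training error, test error
    print(k, round(mse(train_x, train_y, train_x, train_y, k), 4),
          round(mse(train_x, train_y, test_x, test_y, k), 4))
```

With k = 1 the training error is exactly zero, yet the test error is typically worse than for a moderate k; with k = 60 both errors are high because the model is too simple.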
  • 24. Avoiding the "too many categories" problem. [Figure: each city as a separate category: Busan, Seoul, Daejeon, Daegu, Pohang, Incheon, Suwon, Ulsan]
  • 25. Avoiding the "too many categories" problem: grouping and merging categories. [Figure: the cities Busan, Seoul, Daejeon, Daegu, Pohang, Incheon, Suwon and Ulsan grouped and merged]
  • 26. Avoiding the "too many categories" problem: replacing the city category with a numeric variable. [Chart: probability of default vs. log10(population), x-axis from 1 to 6]
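Both remedies from the last three slides fit in a few lines. This is a sketch under stated assumptions: the `min_count` threshold, the `OTHER` bucket name, the sample counts and the populations are all hypothetical values chosen for illustration.

```python
import math
from collections import Counter

def merge_rare_categories(values, min_count, other="OTHER"):
    # Categories with fewer than min_count samples share one bucket,
    # so every remaining estimate is backed by enough data
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]

def log_population(city, population):
    # Replace the city category by one numeric variable: log10(population)
    return math.log10(population[city])

cities = ["Seoul"] * 120 + ["Busan"] * 80 + ["Pohang"] * 7 + ["Ulsan"] * 5
merged = merge_rare_categories(cities, min_count=30)
print(Counter(merged))

# Hypothetical populations, for illustration only
population = {"Seoul": 10_000_000, "Pohang": 500_000}
print(log_population("Seoul", population))
```

The numeric variable lets cities with few samples borrow strength from cities of similar size, instead of each getting its own noisy estimate.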
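  • 27. Regularization. Penalized form: $\min_\theta \sum_s \mathrm{loss}\left(gt_s, f_\theta(X_s)\right) + \lambda \Omega(\theta)$. Constrained form: $\min_\theta \sum_s \mathrm{loss}\left(gt_s, f_\theta(X_s)\right)$ s.t. $\Omega(\theta) < \lambda$. Typical choices: $\Omega(\theta) = \|\theta\|_2$ or $\Omega(\theta) = \|\theta\|_1$.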
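The penalized form has a closed-form solution in the simplest case. This sketch assumes squared-error loss, a single parameter with no intercept, and the squared L2 penalty $\Omega(\theta) = \theta^2$ (ridge regression, a common variant of the L2 regularizer); minimizing $\sum_s (\theta x_s - y_s)^2 + \lambda \theta^2$ gives $\theta = \sum_s x_s y_s / (\sum_s x_s^2 + \lambda)$.

```python
def ridge_1d(xs, ys, lam):
    # Closed-form minimizer of sum_s (theta * x_s - y_s)^2 + lam * theta^2
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated by y = 2x
for lam in (0.0, 1.0, 14.0, 100.0):
    # Larger lam shrinks theta toward 0: 2.0, ~1.87, 1.0, ~0.25
    print(lam, ridge_1d(xs, ys, lam))
```

Increasing $\lambda$ trades a worse fit to the training data for a smaller, more stable parameter, which is how regularization limits overfitting.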
  • 28. Data normalization. Removing variance that has no impact on the target value helps the ML system focus on meaningful variance. Example: DeepFace (Facebook, 2014), a network with ~120M parameters trained on ~4.4M face images.
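One simple form of normalization, sketched here under the assumption that overall brightness and contrast carry no information about the target: standardize each image's pixels to zero mean and unit standard deviation. This is not DeepFace's own normalization (which aligns faces); it only illustrates removing a nuisance source of variance.

```python
import math

def normalize(pixels):
    # Per-image standardization: subtract mean brightness, divide by contrast (std)
    n = len(pixels)
    mean = sum(pixels) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)
    if std == 0:
        return [0.0] * n  # flat image: no contrast left to keep
    return [(p - mean) / std for p in pixels]

face = [10.0, 50.0, 30.0, 90.0]
brighter = [2 * p + 20 for p in face]  # same content under different lighting
print(normalize(face))
print(normalize(brighter))
```

After normalization, the two lighting conditions produce the same input, so the model no longer has to learn that brightness is irrelevant.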
  • 29. Bagging. 1. Randomly modify the training set slightly (for example, by resampling it). 2. Train a model. 3. Repeat. 4. Average all the prediction functions.
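The four steps map directly onto code. This minimal sketch assumes bootstrap resampling (drawing samples with replacement) as the random modification and an ordinary least-squares line as the base model; the number of models and the seed are arbitrary.

```python
import random

def fit_line(xs, ys):
    # Ordinary least squares for y = a * x + b
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx > 0 else 0.0  # degenerate sample: fall back to flat line
    return a, my - a * mx

def bagging(xs, ys, n_models=50, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # 1. Randomly modify the training set: bootstrap sample with replacement
        idx = [rng.randrange(len(xs)) for _ in xs]
        # 2. Train on the modified set (3. repeated n_models times)
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    # 4. Average the predictions of all models
    return lambda x: sum(a * x + b for a, b in models) / len(models)

rng = random.Random(1)
xs = [float(x) for x in range(20)]
ys = [2 * x + 1 + rng.gauss(0, 3) for x in xs]
predict = bagging(xs, ys)
print(predict(10.0))
```

Averaging many models trained on perturbed data reduces the variance of the final prediction, which matters most for complex, high-variance base models.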
  • 30. Changing data. Data changes through time: market changes, law/regulation changes, changes in the collected data, client filtering / marketing changes. The representation of the data changes: variable names change, category names change. Cyclic data changes: seasonality. Trends have to be handled separately: interpolation vs. extrapolation.
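  • 31. Why is time so different from other variables? [Figure: probability of buying a smartphone vs. age (interpolation), and vs. time with unknown future values "?" (extrapolation)] New clients usually fall inside the age range seen in the training data, but the future always lies outside the observed time range.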
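Because predicting the future is extrapolation, a validation set drawn from the whole time range can make a model look better than it will be in production. This sketch, on a hypothetical linearly trending target and a deliberately naive model that always predicts the training mean, compares a time-agnostic split with a split that validates only on the most recent period (a common practice, not something the slides prescribe).

```python
def time_split(records, cutoff):
    # Train on everything before the cutoff, validate on everything after it
    train = [r for r in records if r[0] < cutoff]
    valid = [r for r in records if r[0] >= cutoff]
    return train, valid

def interleaved_split(records, every=5):
    # A time-agnostic split: every k-th record goes to validation
    train = [r for i, r in enumerate(records) if i % every != every - 1]
    valid = [r for i, r in enumerate(records) if i % every == every - 1]
    return train, valid

def mean_model_mse(train, valid):
    # A model that ignores time: always predict the training-set mean target
    mean = sum(y for _, y in train) / len(train)
    return sum((y - mean) ** 2 for _, y in valid) / len(valid)

# Hypothetical trending target: y grows linearly with time t
records = [(t, float(t)) for t in range(100)]
print(mean_model_mse(*interleaved_split(records)))      # 837.5
print(mean_model_mse(*time_split(records, cutoff=80)))  # 2533.25
```

The same model looks three times worse once it has to extrapolate, which is closer to how it would perform on newly incoming data.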
  • 32. Time is correlated with hidden variables. [Chart: cost for car insurance (one type of insurance) over time, with a new law marked]
  • 33. The causes of change can be unknown, but consistent. [Chart: cost for car insurance (one type of insurance) over time]
  • 34. Seasonality. [Chart: cost for car insurance (one type of insurance) over time]
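A cyclic effect can be removed before modeling by subtracting each calendar month's average. This is a minimal sketch assuming monthly data and no trend (per the earlier slide, trends are handled separately); the winter surcharge in the toy data is hypothetical.

```python
from collections import defaultdict

def seasonal_means(months, values):
    # Average value for each calendar month (1 to 12) across all years
    sums, counts = defaultdict(float), defaultdict(int)
    for m, v in zip(months, values):
        sums[m] += v
        counts[m] += 1
    return {m: sums[m] / counts[m] for m in sums}

def deseasonalize(months, values):
    # Remove the seasonal component while keeping the overall level
    means = seasonal_means(months, values)
    overall = sum(values) / len(values)
    return [v - means[m] + overall for m, v in zip(months, values)]

# Two hypothetical years of monthly costs with a winter bump (Dec, Jan, Feb)
months = [m for _ in range(2) for m in range(1, 13)]
costs = [110.0 if m in (12, 1, 2) else 100.0 for m in months]
print(deseasonalize(months, costs))
```

Here the series is pure seasonality, so every deseasonalized value equals the overall mean; on real data, what remains after this step is the non-seasonal signal.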
  • 35. Changing data representation: the collected data changes, categories are split or merged, variable names change, category names change.
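One way to keep old and new data comparable is to map every export onto a single, stable representation before training. In this sketch all column names and category values are hypothetical; a split category is handled by mapping both new values back to the coarser old one, so both periods agree.

```python
# Hypothetical renames observed between an old and a new data export
COLUMN_RENAMES = {"cust_age": "age", "prod": "product"}
CATEGORY_MAP = {
    "product": {
        "Gold Plus": "Gold",      # old "Gold" was later split in two
        "Gold Basic": "Gold",
        "Silver (old)": "Silver", # renamed category
    }
}

def harmonize(record):
    # Map one raw record onto the stable column and category names
    out = {}
    for key, value in record.items():
        key = COLUMN_RENAMES.get(key, key)
        value = CATEGORY_MAP.get(key, {}).get(value, value)
        out[key] = value
    return out

print(harmonize({"cust_age": 30, "prod": "Gold Plus"}))
print(harmonize({"age": 41, "product": "Gold"}))
```

Merging back to the coarser category loses the finer distinction, but it is the only representation that exists in both periods.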
  • 36. Thank you. Job applications: contact@solidware.io. Visit our booth. Visit our website: solidware.io