Simon Chan
simon@prediction.io
Data Science London - April 24, 2013
Big Data Week
Machine Learning is....
computers learning to predict
from data
putting
Machine Learning
into practice
challenge #1
Scalability
Big Data Bottlenecks
Machine Learning Processing
PredictionIO has a
horizontally scalable
architecture
Async SDK
Client client = new Client(appkey);
// Adding user behaviors
req = client.getUserRateItemRequestBuilder(uid, iid, rate);
client.userRateItemAsFuture(req);
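The idea behind an async SDK call can be sketched in Python with the standard library. This is only an illustration, not the PredictionIO SDK: `AsyncClient`, `user_rate_item_as_future` and `send_rating` are hypothetical names, and the network call is simulated with a short sleep.

```python
from concurrent.futures import Future, ThreadPoolExecutor
import time


def send_rating(uid: str, iid: str, rate: int) -> dict:
    """Hypothetical stand-in for an HTTP request to an event server."""
    time.sleep(0.01)  # simulate network latency
    return {"uid": uid, "iid": iid, "rate": rate, "status": "created"}


class AsyncClient:
    """Toy client: every call returns a Future immediately."""

    def __init__(self, max_workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def user_rate_item_as_future(self, uid: str, iid: str, rate: int) -> Future:
        # Submit the request to a worker thread and return without waiting.
        return self._pool.submit(send_rating, uid, iid, rate)

    def close(self) -> None:
        self._pool.shutdown(wait=True)


client = AsyncClient()
futures = [client.user_rate_item_as_future("user1", f"item{i}", 5) for i in range(3)]
# The caller is free to do other work here; results are collected later.
results = [f.result() for f in futures]
client.close()
```

The calling thread never waits on a single request; it only blocks when it asks a future for its result.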
Play
Framework
‣ stateless - no server session
‣ non-blocking web request
Play: A Non-blocking Example
def index = Action {
  val futureInt = scala.concurrent.Future { slowDataProcess() }
  Async {
    futureInt.map(i => Ok(views.html.result.render(i)))
  }
}
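The same non-blocking pattern can be sketched with Python's `asyncio`, as a rough analogue rather than a description of Play itself. The handler name `index` mirrors the slide; `slow_data_process` is a hypothetical blocking function simulated with a sleep.

```python
import asyncio
import time


def slow_data_process() -> int:
    """Hypothetical blocking computation."""
    time.sleep(0.05)
    return 42


async def index() -> str:
    loop = asyncio.get_running_loop()
    # Offload the blocking call to the default thread pool so the
    # event loop stays free to serve other requests in the meantime.
    value = await loop.run_in_executor(None, slow_data_process)
    return f"result: {value}"


async def main() -> list:
    # Three requests overlap instead of running back to back.
    return await asyncio.gather(index(), index(), index())


responses = asyncio.run(main())
```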
MongoDB
‣ Read scaling: Replica Sets
‣ Write scaling: Sharding
‣ Indexes (e.g. geospatial)
{ geoSearch : "places", near : [33, 33],
maxDistance : 6, search : { uid : "user1" } }
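To make the "write scaling: sharding" bullet concrete, here is a minimal sketch of hash-based routing in Python. It illustrates the general idea only and is not MongoDB's actual partitioning algorithm; `shard_for`, `route_writes` and the choice of `uid` as the shard key are assumptions for this example.

```python
import hashlib
from collections import defaultdict


def shard_for(key: str, num_shards: int) -> int:
    """Map a shard key to a shard index with a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def route_writes(docs: list, num_shards: int) -> dict:
    """Group documents by the shard that would receive each write."""
    shards = defaultdict(list)
    for doc in docs:
        shards[shard_for(doc["uid"], num_shards)].append(doc)
    return dict(shards)


docs = [{"uid": f"user{i}", "iid": "itemXYZ"} for i in range(100)]
placement = route_writes(docs, num_shards=4)
```

Because the hash is stable, all writes for one user land on the same shard, while different users spread across shards.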
Hadoop
Hadoop
Cascading (Java)
Scalding (Scala)
MapReduce
- Native Java
public class WordCount {
  public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, Context context) throws ..... {
      String line = value.toString();
      StringTokenizer tokenizer = new StringTokenizer(line);
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) { sum += val.get(); }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "wordcount");
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.waitForCompletion(true);
  }
}
MapReduce
- Scalding
class ScaldingTestJob(args: Args) extends Job(args) {
  Tsv(args(0), 'text)
    .flatMap('text -> 'word) { text : String => text.split("\\s+") }
    .groupBy('word) { _.size }
    .write(Tsv(args(1)))
}
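Both word-count jobs above follow the same map, shuffle, reduce shape. A single-process Python sketch of that shape, with no Hadoop involved, might look like this; the function names are chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every whitespace-separated token."""
    for word in line.split():
        yield (word, 1)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Sum the counts collected for one word."""
    return (word, sum(counts))


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


counts = word_count(["big data big", "data science"])
# counts == {"big": 2, "data": 2, "science": 1}
```

On a cluster, the map and reduce calls run on different machines and the shuffle moves data across the network; the logic per key stays the same.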
Sample Code
### Sample PredictionIO Python SDK Code
import predictionio
client = predictionio.Client(appkey="<your app key>")
# Add Data
client.create_user(uid="user123")
client.create_item(iid="itemXYZ", itypes=(1,))
client.user_view_item(uid="user123", iid="itemXYZ")
# Get Prediction
rec = client.get_itemrec(engine="<engine name>", uid="user123", n=5)
Getting
Involved!
- @PredictionIO
- prediction.io - Newsletter
- github.com/predictionio
Q&A
Q: Selecting the right features is a big problem. Can PredictionIO solve this problem?
A: Not at this moment. That’s why we focus on collaborative filtering algorithms right now,
which don’t require the use of features. And we believe that the involvement of data
scientists is needed for many specific problems. PredictionIO is positioned as a tool to
make their work easier, not as a replacement.
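To illustrate why collaborative filtering needs no hand-picked features, here is a minimal item co-occurrence sketch in Python. It is one simple variant written for this example, not the algorithm PredictionIO ships; `cooccurrence`, `recommend` and the sample `views` data are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(views: dict) -> dict:
    """Count how often each pair of items was viewed by the same user."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in views.values():
        for a, b in combinations(sorted(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(uid: str, views: dict, n: int = 5) -> list:
    """Rank unseen items by their co-occurrence with the user's items."""
    counts = cooccurrence(views)
    seen = views[uid]
    scores = defaultdict(int)
    for item in seen:
        for other, count in counts[item].items():
            if other not in seen:
                scores[other] += count
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]


views = {
    "user1": {"a", "b"},
    "user2": {"a", "b", "c"},
    "user3": {"b", "c", "d"},
}
top_items = recommend("user1", views, n=2)
# top_items == ["c", "d"]
```

The only input is who viewed what; no item attributes or user profiles are involved.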
Q: How’s PredictionIO different from Weka?
A: Weka, like Mahout, is an ML algorithm library. You can see PredictionIO as a layer on top
of it, which helps you deploy algorithms into a production environment by providing a
complete infrastructure.
Q: How do you compare PredictionIO with RapidMiner?
A: RapidMiner is a great product for defining data engineering workflows visually.
PredictionIO focuses on a different problem, i.e. deploying ML solutions into a production
environment.
Q: How do the algorithm evaluation metrics work in PredictionIO?
A: At this moment, you can evaluate algorithms with offline metrics, such as Mean
Average Precision, based on your existing data.
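As a rough sketch of how an offline metric such as Mean Average Precision can be computed from held-out data, consider the Python below. Definitions of AP@k vary; this version normalises by `min(len(relevant), k)` and assumes each recommendation list has no duplicates, which is an assumption of this example rather than a statement about PredictionIO's implementation.

```python
def average_precision(recommended: list, relevant: set, k: int) -> float:
    """AP@k for one user: mean of precision values at each hit position."""
    if not relevant:
        return 0.0
    hits = 0
    score = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / min(len(relevant), k)


def mean_average_precision(recs: dict, truth: dict, k: int) -> float:
    """MAP@k: average of AP@k over every user with held-out data."""
    users = list(truth)
    if not users:
        return 0.0
    total = sum(average_precision(recs.get(u, []), truth[u], k) for u in users)
    return total / len(users)


predicted = {"u1": ["a", "b", "c"], "u2": ["x", "y", "z"]}
held_out = {"u1": {"a", "c"}, "u2": {"z"}}
score = mean_average_precision(predicted, held_out, k=3)
# u1: (1/1 + 2/3) / 2 = 5/6, u2: (1/3) / 1 = 1/3, so score = 7/12
```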
Q: What’s the business model?
A: We focus on making PredictionIO a useful open source product at this moment.

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 

PredictionIO - Scalable Machine Learning Architecture

  • 1. Simon Chan simon@prediction.io Data Science London - April 24, 2013 Big Data Week
  • 2. Machine Learning is... computers learning to predict from data
  • 5. Big Data Bottlenecks Machine Learning Processing
  • 6. PredictionIO has a horizontally scalable architecture
  • 8. Async SDK
        Client client = new Client(appkey);
        // Adding user behaviors
        req = client.getUserRateItemRequestBuilder(uid, iid, rate);
        client.userRateItemAsFuture(req);
  • 9. Play Framework
        ‣ stateless - no server session
        ‣ non-blocking web request
  • 10. Play: A Non-blocking Example
        def index = Action {
          val futureInt = scala.concurrent.Future { slowDataProcess() }
          Async {
            futureInt.map(i => Ok(views.html.result.render(i)))
          }
        }
  • 11. MongoDB
        ‣ Read scaling: Replica Sets
        ‣ Write scaling: Sharding
        ‣ Indexes (e.g. geospatial)
        { geoSearch : "places", near : [33, 33],
          maxDistance : 6, search : { uid : "user1" } }
  • 13. MapReduce - Native Java
        public class WordCount {

          public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            public void map(LongWritable key, Text value, Context context) throws ..... {
              String line = value.toString();
              StringTokenizer tokenizer = new StringTokenizer(line);
              while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
              }
            }
          }

          public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
              int sum = 0;
              for (IntWritable val : values) { sum += val.get(); }
              context.write(key, new IntWritable(sum));
            }
          }

          public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "wordcount");
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            job.setMapperClass(Map.class);
            job.setReducerClass(Reduce.class);
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            job.waitForCompletion(true);
          }
        }
  • 14. MapReduce - Scalding
        class ScaldingTestJob(args: Args) extends Job(args) {
          Tsv(args(0), 'text)
            .flatMap('text -> 'word) { text : String => text.split("\\s+") }
            .groupBy('word) { _.size }
            .write(Tsv(args(1)))
        }
  • 16. Sample PredictionIO Python SDK Code
        client = predictionio.Client(appkey="<your app key>")

        # Add Data
        client.create_user(uid="user123")
        client.create_item(iid="itemXYZ", itypes=(1,))
        client.user_view_item(uid="user123", iid="itemXYZ")

        # Get Prediction
        rec = client.get_itemrec(engine="<engine name>", uid="user123", n=5)
  • 17. Getting Involved! - @PredictionIO - prediction.io - Newsletter - github.com/predictionio
  • 18. Q&A
        Q: Selecting the right features is a big problem. Can PredictionIO solve this problem?
        A: Not at this moment. That's why we currently focus on collaborative filtering algorithms, which don't require the use of features. We also believe that the involvement of data scientists is needed for many specific problems. PredictionIO is positioned as a tool to make their work easier, not as a replacement.
        Q: How is PredictionIO different from Weka?
        A: Weka, like Mahout, is an ML algorithm library. You can see PredictionIO as a layer on top of it, which helps you deploy algorithms into a production environment by providing a complete infrastructure.
        Q: How do you compare PredictionIO with RapidMiner?
        A: RapidMiner is a great product for defining data engineering workflows visually. PredictionIO focuses on a different problem: deploying ML solutions into a production environment.
        Q: How do the algorithm evaluation metrics work in PredictionIO?
        A: At this moment, you can evaluate algorithms with some offline metrics, such as Mean Average Precision, based on your existing data.
        Q: What's the business model?
        A: We focus on making PredictionIO a useful open source product at this moment.
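
The Q&A names Mean Average Precision as an example of an offline metric computed from existing data. The talk does not show how PredictionIO calculates it, so the following is only a minimal sketch of one common definition (MAP@k), not PredictionIO's implementation. The function names, the dictionary-based inputs, and the choice to normalise by `min(len(relevant), k)` are assumptions made for illustration. It also assumes each ranked list contains no duplicate items.

```python
from typing import Hashable, Mapping, Sequence, Set


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """AP@k for one user: sum precision@i at every rank i that holds a relevant item."""
    if not relevant or k <= 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for i, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / i  # precision at rank i
    # Assumed normalisation: the best achievable number of hits within k slots.
    return precision_sum / min(len(relevant), k)


def mean_average_precision(
    recommendations: Mapping[Hashable, Sequence[Hashable]],
    held_out: Mapping[Hashable, Set[Hashable]],
    k: int = 5,
) -> float:
    """MAP@k: the mean of AP@k over every user that has held-out data."""
    users = list(held_out)
    if not users:
        return 0.0
    total = sum(
        average_precision(recommendations.get(user, []), held_out[user], k)
        for user in users
    )
    return total / len(users)


if __name__ == "__main__":
    # Hypothetical data: ranked recommendations versus items each user later interacted with.
    recs = {"u1": ["a", "b", "c"], "u2": ["x", "y", "z"]}
    truth = {"u1": {"a", "c"}, "u2": {"z"}}
    print(mean_average_precision(recs, truth, k=3))
```

In an offline evaluation, `held_out` would typically come from splitting existing behaviour data into a training part and a test part, with the recommendations generated from the training part only.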