Dave Wetzel, COO and CTO, MLS Listings
Sergey Ermolin, Solutions Architect, Intel
Real Estate Search Ranking
with BigDL Framework on
Microsoft Azure Platform
#ExpSAIS16
MLSListings Inc, Sunnyvale, California
MLSListings Business Use-Case:
Personalized Visual Search Ranking
Image similarity is an extra search parameter along with area, location, size, price, etc.
If you looked at this house... you will want to look at this one, too.
Business need: real estate search results need to be sorted based on the image similarity of the attached photos.
Implementation Example - 1
Implementation Example - 2
LIVE DEMO
High-Level Data Workflow
[Diagram components: OData Media API; Machine Learning Process; Property Details WebPage API; Ranked Images (Top 10)]
Deep Learning Data Flow: Training
[Diagram: Labeled Dataset of Real Estate Images (Bing Search) + BigDL VGG Architecture = BigDL Trained Model. Stage labels: Data Engineering, Deep Learning.]
Compute:
● Long compute: 8,500 images.
● 2 nodes, 28 cores per node; 3 minutes for a single pass.
● Model parameters change during training.
● Repeat until convergence.
● But: only needs to be done once!
Note-1: Images are *not* stored in the model.
Note-2: You can trade compute resources for time.
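As a rough illustration of Note-2, the training figures on this slide (8,500 images, 2 nodes with 28 cores each, 3 minutes per pass) imply a per-core throughput that can be worked out directly. The sketch below assumes ideal linear scaling across cores, which real clusters rarely achieve, and the function names are hypothetical, not part of BigDL.

```python
# Figures taken from the slide: 8,500 images, 2 nodes x 28 cores, 3 minutes per pass.
IMAGES = 8500
NODES = 2
CORES_PER_NODE = 28
PASS_SECONDS = 3 * 60


def images_per_core_second(images, nodes, cores_per_node, seconds):
    """Average throughput of one core during a single training pass."""
    return images / (nodes * cores_per_node * seconds)


def estimated_pass_seconds(images, total_cores, per_core_rate):
    """Pass time under the optimistic assumption of perfect linear scaling."""
    return images / (total_cores * per_core_rate)


rate = images_per_core_second(IMAGES, NODES, CORES_PER_NODE, PASS_SECONDS)

# Assumption: doubling the cluster to 4 nodes keeps the per-core rate unchanged,
# so the pass time would at best be halved.
doubled = estimated_pass_seconds(IMAGES, 4 * CORES_PER_NODE, rate)

print(f"per-core rate: {rate:.3f} images/s")
print(f"estimated pass time on 4 nodes: {doubled:.0f} s")
```

In practice, communication overhead between nodes means the real speed-up is usually smaller than this estimate.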
Deep Learning Data Flow: Inference
[Diagram: image + BigDL Trained Model = Feature Vector]
Feature Vector contents:
• Image Class (Front, Bdr, Bath, …)
• House Style Tag (Ranch, Victorian, …)
• House Levels (1, 2, …)
• Latent Features (25k entries)
Compute only:
● Short compute, real time.
● 1 node, 1 core per node.
● Model parameters unchanged.
● Only run once per image.
● But: needs to be done for every image in the searched dataset!
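The inference output described on this slide can be pictured as a small record per image, computed once and then reused. The sketch below is only an illustration: `ImageFeatures`, `FeatureStore`, and `fake_infer` are hypothetical names, and `fake_infer` stands in for a call to the trained model, returning a made-up three-element vector instead of the roughly 25k latent entries.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class ImageFeatures:
    image_class: str           # room type, e.g. "Front" or "Bath"
    house_style: str           # style tag, e.g. "Ranch" or "Victorian"
    house_levels: int          # number of levels, e.g. 1 or 2
    latent: Tuple[float, ...]  # latent feature vector


class FeatureStore:
    """Runs the supplied inference function at most once per image id."""

    def __init__(self, infer: Callable[[str], ImageFeatures]):
        self._infer = infer
        self._cache: Dict[str, ImageFeatures] = {}
        self.calls = 0  # number of times inference actually ran

    def get(self, image_id: str) -> ImageFeatures:
        if image_id not in self._cache:
            self.calls += 1
            self._cache[image_id] = self._infer(image_id)
        return self._cache[image_id]


def fake_infer(image_id: str) -> ImageFeatures:
    # Placeholder for the trained model; the values are invented for the example.
    return ImageFeatures("Front", "Ranch", 1, (0.1, 0.2, 0.3))


store = FeatureStore(fake_infer)
for image_id in ["img-1", "img-2", "img-1"]:
    store.get(image_id)

print(store.calls)  # 2: "img-1" was only inferred once
```

Caching by image id reflects the two bullets above: each image is processed once, but every image in the searched dataset still needs its own pass.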
Deep Learning Data Flow: putting it together
[Diagram. Training: Labeled Dataset of Real Estate Images (Bing Search) + BigDL VGG CNN = BigDL Trained Model. Inference: image + BigDL Trained Model = Feature Vector (Image Class, House Style Tag, House Levels, Latent Features).]
Deep Learning Data Flow: Real-time Ranking
[Diagram: Listing images → BigDL Trained Model → Feature Vectors. MLS Query Landing Page → BigDL Trained Model → Cosine Similarity + Tag + Class (Weighted) → Rank → Top 10 → MLS Listings Front End.]
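The ranking step on this slide combines cosine similarity of the latent features with tag and class matches, using weights. The slides do not give the weights or the exact scoring formula, so the sketch below is an assumption: it uses a weighted sum with made-up weights (0.6, 0.2, 0.2), and the dictionary keys and function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def score(query: Dict, candidate: Dict,
          w_cos: float = 0.6, w_tag: float = 0.2, w_class: float = 0.2) -> float:
    """Weighted sum of latent similarity, style-tag match, and image-class match.

    The weights are illustrative assumptions, not values from the talk.
    """
    cos = cosine_similarity(query["latent"], candidate["latent"])
    tag = 1.0 if query["style"] == candidate["style"] else 0.0
    cls = 1.0 if query["image_class"] == candidate["image_class"] else 0.0
    return w_cos * cos + w_tag * tag + w_class * cls


def top_k(query: Dict, candidates: List[Dict], k: int = 10) -> List[Dict]:
    """Return the k highest-scoring candidates, best first."""
    return sorted(candidates, key=lambda c: score(query, c), reverse=True)[:k]


query = {"latent": [1.0, 0.0], "style": "Ranch", "image_class": "Front"}
listings = [
    {"id": "b", "latent": [0.0, 1.0], "style": "Ranch", "image_class": "Front"},
    {"id": "a", "latent": [1.0, 0.0], "style": "Ranch", "image_class": "Front"},
    {"id": "c", "latent": [1.0, 0.0], "style": "Victorian", "image_class": "Bath"},
]
ranked_ids = [item["id"] for item in top_k(query, listings)]
print(ranked_ids)  # ['a', 'c', 'b']
```

With these weights a listing that looks identical but has a different style and room type ("c") still outranks one that only matches on tags ("b"); different weights would change that trade-off.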
Deep Learning Data Flow
[Diagram components: Labeled Image Dataset (Bing Search) + BigDL = Model; Listing Images + BigDL Model; Cosine Similarity + Tag + Class; Rank; Top 10; MLS Listings Front End]
• Room Type (Class)
• House Style (Tag)
• House Levels (Tag)
• Features (25k vec)
Implementation Example - 3
Implementation - BigDL
Building BigDL Graph
• Prepare training/validation data
• Image Transformer:
– Image scale/crop
– Channel color normalization
• Caffe model import
• Render BigDL as SparkML Transformer
• Create BigDL Linear SoftMax model
• Define Classifier, SparkML Transformer
• Set up SparkML Pipeline
Executing BigDL Graph
Refer to GitHub: https://github.com/intel-analytics/analytics-zoo
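Two of the steps listed above, the image transformer and the Linear SoftMax classifier, can be sketched in plain Python to show what they compute. This is not the BigDL or SparkML API: the function names are hypothetical, images are nested lists (height x width x channels), only cropping is shown (no rescaling), and the channel means and weights are invented for the example.

```python
import math
from typing import List

Image = List[List[List[float]]]  # height x width x channels


def center_crop(image: Image, size: int) -> Image:
    """Keep the central size x size window of the image."""
    height, width = len(image), len(image[0])
    top = (height - size) // 2
    left = (width - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def normalize_channels(image: Image, means: List[float]) -> Image:
    """Subtract a per-channel mean from every pixel."""
    return [[[value - means[c] for c, value in enumerate(pixel)]
             for pixel in row] for row in image]


def linear_softmax(features: List[float], weights: List[List[float]],
                   bias: List[float]) -> List[float]:
    """Linear layer followed by a numerically stable softmax."""
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# A 4x4 image with three channels, every value set to 10.0.
image = [[[10.0, 10.0, 10.0] for _ in range(4)] for _ in range(4)]
cropped = center_crop(image, 2)
normalized = normalize_channels(cropped, [1.0, 2.0, 3.0])

# Two-class classifier over a two-element feature vector (made-up values).
probs = linear_softmax([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
print(normalized[0][0])
print(probs)
```

In the pipeline described on the slide, these steps would be wrapped as SparkML Transformers and chained in a SparkML Pipeline rather than called directly.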
BigDL: High Performance Deep Learning for Apache Spark* on CPU Infrastructure
BigDL is an open-source distributed deep learning library for Apache Spark* that can run directly on top of existing Spark or Apache Hadoop* clusters.
• Feature parity and model exchange with TensorFlow*, Caffe*, Keras, Torch*
• Lower TCO and improved ease of use with existing infrastructure
• Deep learning on big data platform, enabling efficient scale-out
[Diagram: BigDL on top of Spark Core]
No need to deploy costly GPUs, duplicate data, or suffer through scaling headaches!
Designed and optimized for Intel® Xeon®
Ideal for DL models TRAINING and INFERENCE
Powered by Intel® MKL and multi-threaded programming
Models Interoperability Support
• Model Snapshot
• Long training work checkpoint
• Model deployment and sharing
• Fine-tune
• Caffe/Torch/TensorFlow Model Support
• Model file load
• Easy to migrate your Caffe/Torch/TensorFlow code base to Spark
• NEW - BigDL supports loading pre-defined Keras models (Keras 1.2.2)
[Diagram components: Caffe Model File; Torch Model File; TensorFlow Model File; BigDL; BigDL Model File; Storage; Load; Save]
Visualization for Learning
BigDL integration with TensorBoard
• TensorBoard is a suite of web applications from Google for visualizing and understanding deep learning applications
BigDL Analytics Zoo Stack (2018)
Making it easier to build end-to-end analytics + AI applications
• Reference Use Cases: Anomaly detection, sentiment analysis, fraud detection, chatbot, sequence prediction, etc.
• Built-In Algorithms and Models: Image classification, object detection, text classification, recommendations, GAN, etc.
• Feature Engineering and Transformations: Image, text, speech, 3D imaging, time series, etc.
• High-Level Pipeline APIs: DataFrames, ML Pipelines, Autograd, Transfer Learning, etc.
• Runtime Environment: Spark, BigDL, Python, etc.
Engineering Team
● Data scientist, proficient in Machine Learning / Deep Learning
● Software Engineer, experience with Apache Spark
● Technical project manager
Domain Expertise:
● Machine Learning / Deep Learning
● Python, Scala

● Software Engineer, Web API
● Software Engineer, Web UI
Domain Expertise:
● OData, .NET Core, MSSQL
● C#, HTML, JavaScript
“ROAD AHEAD”
“Fireplace in the living room”
1. Feature extraction and tagging
2. Image-based listings search
3. Feature verification based on listing images
4. Image-based compliance and quality check
LIVE DEMO
Infrastructure
MLS Listings Apps and Services
[Diagram components: Microsoft Azure; RESO OData API; homes.mlslistings.com; MLSL Azure Web App Service; Azure SQL DB; Data Feed Customers (Full Payload, VOW Payload, IDX Payload); OpenID Connect; BigDL]
Roles and Responsibilities
• Microsoft - Apache Spark Cloud Service Provider
• Intel - BigDL distributed Deep Learning Library
• MLSListings - RESO Web API Provider

• Microsoft: Microsoft's data science team in Mountain View, CA participated in project discussions and provided an Azure Data Science VM to deploy and train the deep learning model.
• Intel: Team members worked to integrate MLSListings' OData Media Services and deploy a custom real estate image similarity comparison solution on Azure using BigDL.
• MLSListings: The MLSListings team working on the new web portal provided the Media API and worked on the user interface to integrate with the BigDL API.
BigDL: Python API
• Supports deep learning model training, evaluation, and inference
• Supports Spark v1.6 - 2.2
• Supports Python 2.7/3.5/3.6
• Based on PySpark, the Python API in BigDL allows use of existing Python libraries (NumPy, SciPy, Pandas, scikit-learn, NLTK, Matplotlib, etc.)

Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience at Scale with Sergey Ermolin and Dave Wetzel

  • 1. Dave Wetzel, COO and CTO, MLS Listings Sergey Ermolin, Solutions Architect, Intel Real Estate Search Ranking with BigDL Framework on Microsoft Azure Platform #ExpSAIS16
  • 3. MLSListings Business Use-Case: Personalized Visual Search Ranking Image similarity is an extra search parameter along with area, location, size, price, etc. If you looked at this house….. You will want to look at this one, too Business need: real estate search results need to be sorted based on image similarities of attached photos
  • 7. High-Level Data Workflow 7#ExpSAIS16 OData Media API Machine Learning Process Property Details WebPage API Ranked Images Top 10
  • 8. 8 Labeled Dataset of Real Estate Images (Bing Search) BigDL VGG Arch BigDL Trained Model + = Data Engineering Deep Learning ● Long Compute. 8500 images ● 2 nodes, 28 cores/node. 3 minutes for a one single pass ● Model parameters are changing. ● Repeat until convergence ● But: only do once ! Note-1: Images are *not* stored in the model Note-2: you can trade compute resources for time. Compute TrainingDeep Learning Data Flow #ExpSAIS16
  • 9. Deep Learning Data Flow 9 Inference Feature Vector BigDL Trained Model Image Class (Front, Bdr, Bath,…) House Style Tag (Ranch, Victorian,…) House Levels (1, 2..) Latent Features (25k entries) + = ● Short Compute. Real Time ● 1 node, 1 core/node ● Model parameters unchanged. ● Only run once per image ● But: need to do for every image in the searched dataset ! Compute Only #ExpSAIS16
  • 10. 10 Labeled Dataset of Real Esate Images (Bing Search) BigDL VGG CNN BigDL Trained Model + = Feature Vector BigDL Trained Model Image Class (Front, Bdr, Bath,…) House Style Tag (Ranch, Victorian,…) House Levels (1, 2..) Latent Features (25k entries) + = TrainingInference Deep Learning Data Flow – putting it together #ExpSAIS16
  • 11. 11 Deep Learning Data Flow Listing images BigDL Trained Model Feature Vectors MLS Query Landing Page BigDL Trained Model Cosine Similarity + Tag + Class Rank Top 10 MLS Listings Front End Real-time Ranking #ExpSAIS16 (Weighed)
  • 12. 12 Deep Learning Data Flow Labeled Image Dataset (Bing Search) Big DL Model+ = Listing Images Big DL + Model Big DL + Model Cosine Similarity + Tag + Class Rank • Room Type (Class) • House Style (Tag) • House Levels (Tag) • Features (25k vec) Top 10 MLS Listings Front End #ExpSAIS16
  • 14. Implementation - BigDL Building BigDL Graph • Prepare Training/Validation data. • Image Transformer: – Image scale/crop – Channel color normalizing • Caffe Model Import • Render BigDL as SparkML Transformer • Create BigDL Linear SoftMax model • Define Classifier, SparkML Transformer • Set up SparkML Pipeline Executing BigDL Graph . Referethub https://github.com/intel-analytics/analytics-zoo
  • 15. BigDL: High Performance Deep Learning for Apache Spark* on CPU Infrastructure
      BigDL is an open-source distributed deep learning library for Apache Spark* that can run directly on top of existing Spark or Apache Hadoop* clusters.
      ● Feature parity and model exchange with TensorFlow*, Caffe*, Keras, Torch*
      ● Lower TCO and improved ease of use with existing infrastructure
      ● Deep learning on a big data platform, enabling efficient scale-out
      ● No need to deploy costly GPUs, duplicate data, or suffer through scaling headaches!
      ● Designed and optimized for Intel® Xeon®
      ● Ideal for DL model training and inference
      ● Powered by Intel® MKL and multi-threaded programming
      Diagram labels: BigDL, Spark Core
      https://github.com/intel-analytics/analytics-zoo
  • 16. Models Interoperability Support
      • Model snapshot
        – Long training work checkpoint
        – Model deployment and sharing
        – Fine-tune
      • Caffe/Torch/Tensorflow model support
        – Model file load
        – Easy to migrate your Caffe/Torch/Tensorflow code base to Spark
      • NEW - BigDL supports loading pre-defined Keras models (Keras 1.2.2)
      Diagram labels: Caffe Model File, Torch Model File, Tensorflow Model File, BigDL, BigDL Model File, Storage, Load, Save
      https://github.com/intel-analytics/analytics-zoo
  • 17. Visualization for Learning: BigDL integration with TensorBoard
      • TensorBoard is a suite of web applications from Google for visualizing and understanding deep learning applications.
      https://github.com/intel-analytics/analytics-zoo
  • 18. 2018 BigDL Analytics Zoo Stack: making it easier to build end-to-end analytics + AI applications
      ● Reference Use Cases: anomaly detection, sentiment analysis, fraud detection, chatbot, sequence prediction, etc.
      ● Built-In Algorithms and Models: image classification, object detection, text classification, recommendations, GAN, etc.
      ● Feature Engineering and Transformations: image, text, speech, 3D imaging, time series, etc.
      ● High-Level Pipeline APIs: DataFrames, ML Pipelines, Autograd, Transfer Learning, etc.
      ● Runtime Environment: Spark, BigDL, Python, etc.
      https://github.com/intel-analytics/analytics-zoo
  • 19. Engineering Team
      ● Data scientist, proficient in Machine Learning / Deep Learning
      ● Software engineer, experienced with Apache Spark
      ● Technical project manager
      Domain expertise:
      ● Machine Learning / Deep Learning
      ● Python, Scala

      ● Software engineer, Web API
      ● Software engineer, Web UI
      Domain expertise:
      ● OData, .NET Core, MSSQL
      ● C#, HTML, JavaScript
  • 20. “ROAD AHEAD”
      “Fireplace in the living room”
      1. Feature extraction and tagging
      2. Image-based listings search
      3. Feature verification based on listing images
      4. Image-based compliance and quality check
  • 24. MLS Listings Apps and Services
      Diagram labels: Microsoft Azure; RESO OData API; homes.mlslistings.com; MLSL Azure Web App Service; Azure SQL DB; Data Feed Customers; Full Payload, VOW Payload, IDX Payload; OpenID Connect; BigDL
  • 25. Roles and Responsibilities
      • Microsoft: Microsoft's data science team in Mountain View, CA participated in project discussions and provided an Azure Data Science VM to deploy and train the deep learning model.
      • Microsoft - Apache Spark Cloud Service Provider
      • Intel - BigDL distributed Deep Learning Library
      • MLSListings - RESO Web API Provider
      • Intel: Team members worked to integrate MLSListings’s OData Media Services to deploy a custom real estate image similarity comparison solution on Azure using BigDL.
      • MLSListings: MLSListings’s team working on the new web portal provided the Media API and worked on the user interface to integrate with the BigDL API.
  • 26. BigDL: Python API
      • Supports deep learning model training, evaluation, and inference
      • Supports Spark v1.6 - 2.2
      • Supports Python 2.7/3.5/3.6
      • Based on PySpark, the Python API in BigDL allows use of existing Python libraries (NumPy, SciPy, Pandas, Scikit-learn, NLTK, Matplotlib, etc.)