"Reproducible data science and business solutions" presentation by Antonio Rueda-Toicen for the Women and Diversity in Economics Group at the University of San Francisco, 21/04/2021.
2. We’ll be talking about
● Translation of business problems to technical solutions
● Secure medical records
● Problems in computer vision
○ Image quality enhancement aka ‘beautification’
○ Image similarity evaluation aka ‘matching’
○ Image classification aka ‘tagging’
3. About me
Antonio Rueda-Toicen
Senior Data Scientist at Parkling GmbH
● Work on computer vision
● Background in computer science & biomedical applications
● Previously worked in academia, now teach data science at DSR and Thinkful
● Currently host the Berlin Computer Vision Group (look us up on Meetup!)
7. What is ‘computer vision’?
[Figure panels: what a human sees; what the computer ‘sees’]
8. Why do we do computer vision at HomeToGo?
● We are a search engine for vacation rentals
● We have 17 million offers and hundreds of millions of images, the largest vacation rental inventory in the world
● Users want to envision the experience of a rental before booking
15. What does a change in image quality look like?
● iPhone 3GS camera: 3 MP, 2048 x 1536 image size (panels: Original, Blurred)
● Canon 70D (DSLR camera): 20 MP, 3648 x 2432 image size (panels: Original, Blurred)
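The slide compares original and blurred versions of the same photo. As a rough sketch of what "blurred" means numerically: the code below applies a 3x3 mean filter (a box blur) to a tiny grayscale grid and measures how much the contrast between neighbouring pixels drops. The slide does not say how its blurred images were produced, so the choice of a box blur, the `edge_strength` score, and all names here are assumptions for illustration only.

```python
def box_blur(image):
    """Apply a 3x3 mean filter to a grayscale image (list of rows).

    Pixels at the border average over the neighbours that exist.
    """
    height, width = len(image), len(image[0])
    blurred = []
    for r in range(height):
        row = []
        for c in range(width):
            neighbours = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(height, r + 2))
                for cc in range(max(0, c - 1), min(width, c + 2))
            ]
            row.append(sum(neighbours) / len(neighbours))
        blurred.append(row)
    return blurred


def edge_strength(image):
    """Mean absolute difference between horizontally adjacent pixels.

    A crude, hypothetical sharpness score: higher means crisper edges.
    """
    diffs = [
        abs(row[c + 1] - row[c])
        for row in image
        for c in range(len(row) - 1)
    ]
    return sum(diffs) / len(diffs)


# A 4x4 checkerboard: the sharpest possible pattern at this size.
original = [[255 if (r + c) % 2 == 0 else 0 for c in range(4)] for r in range(4)]
blurred = box_blur(original)

print(edge_strength(original), edge_strength(blurred))
```

Blurring averages away the differences between neighbouring pixels, so the score for the blurred grid comes out lower than for the original.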
22. Why do we need to match offers?
● Inventory understanding (we have a lot of it!)
● Providing the best deals for our users (sample use case: strike prices)
23. Evaluating similarity
● Semantic similarity can differ from perceptual similarity
● We use a variety of distance and similarity metrics
● We also ensemble different models in a deduplication pipeline
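The slide does not name the metrics used. As a minimal sketch of the difference between a similarity and a distance, the code below computes cosine similarity and Euclidean distance on made-up feature vectors. The vectors, the variable names, and the choice of these two metrics are assumptions, not the pipeline described in the talk.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two vectors: 0.0 means identical."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical feature vectors for three images.
image_a = [1.0, 2.0, 3.0]
image_b = [2.0, 4.0, 6.0]   # same direction as image_a, twice the magnitude
image_c = [3.0, -1.0, 0.5]  # points in a different direction

print(cosine_similarity(image_a, image_b), euclidean_distance(image_a, image_b))
print(cosine_similarity(image_a, image_c), euclidean_distance(image_a, image_c))
```

Note that `image_a` and `image_b` are maximally similar by cosine similarity yet have a non-zero Euclidean distance, which is one reason to look at more than one metric.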
26. How we evaluate our matching algorithms
True Positive = duplicate labeled as duplicate
True Negative = non-duplicate labeled as non-duplicate
False Positive = non-duplicate labeled as duplicate
False Negative = duplicate labeled as non-duplicate
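The four outcomes above can be counted directly from ground-truth and predicted labels. The sketch below does that and then derives precision and recall, two common summary metrics. The slide does not state which summary metrics are used, so precision and recall are an assumption here, and the labels are invented for illustration.

```python
def confusion_counts(actual, predicted):
    """Count the four outcomes for a duplicate detector.

    actual[i] is True if pair i really is a duplicate;
    predicted[i] is True if the algorithm labeled it as a duplicate.
    """
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for is_duplicate, labeled_duplicate in zip(actual, predicted):
        if is_duplicate and labeled_duplicate:
            counts["tp"] += 1  # duplicate labeled as duplicate
        elif not is_duplicate and not labeled_duplicate:
            counts["tn"] += 1  # non-duplicate labeled as non-duplicate
        elif not is_duplicate and labeled_duplicate:
            counts["fp"] += 1  # non-duplicate labeled as duplicate
        else:
            counts["fn"] += 1  # duplicate labeled as non-duplicate
    return counts


def precision_recall(counts):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    labeled_positive = counts["tp"] + counts["fp"]
    actual_positive = counts["tp"] + counts["fn"]
    precision = counts["tp"] / labeled_positive if labeled_positive else 0.0
    recall = counts["tp"] / actual_positive if actual_positive else 0.0
    return precision, recall


# Invented labels for eight candidate pairs.
actual = [True, True, True, False, False, False, False, True]
predicted = [True, True, False, False, True, False, False, True]

counts = confusion_counts(actual, predicted)
precision, recall = precision_recall(counts)
print(counts, precision, recall)
```

Precision answers "of the pairs we called duplicates, how many really were?", while recall answers "of the real duplicates, how many did we find?".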
34. Image Classification
● Outdoor
● Building
● Snow? Do we care about snow?
○ Enough of these images need to be shown to the algorithm
[Figure: what the computer “sees”]
35. Why do we do image classification?
● Inventory understanding
○ How many of our offers have pools, balconies, sea views?
○ Which images have better conversion rates?
● Targeted advertisement (SEO, CRM)
○ Newsletters
○ SEO landing pages
36. What do users care about?
● We do user research to define data taxonomies
● We also define which rules are convenient/feasible for our algorithms
○ E.g. ‘if the sky is visible but we are looking at it through a window, the image should be labeled as “indoor”’
42. Getting more out of the humans in the loop
“Anybody that is trying to solve the problem of image tagging within a company ends up rediscovering ‘active learning’, which is just using your model to guide your labeling. Why should we be labeling everything if the machine is only doing mistakes on these two hard classes?”
Jeremy Howard
● Services like Amazon SageMaker Ground Truth and human labeling in the Google Vision API platform make this easier
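One simple way to "use your model to guide your labeling" is uncertainty sampling: send human labelers only the images the model is least sure about. The sketch below assumes the model outputs a list of class probabilities per image. The function name `least_confident`, the image ids, and the probabilities are all hypothetical, and this is one of several possible active learning strategies, not necessarily the one used in the talk.

```python
def least_confident(predictions, budget):
    """Pick the images whose top predicted class has the lowest probability.

    predictions: dict mapping image id -> list of class probabilities.
    budget: how many images we can afford to send to human labelers.
    """
    scored = [
        (1.0 - max(probabilities), image_id)
        for image_id, probabilities in predictions.items()
    ]
    scored.sort(reverse=True)  # most uncertain first
    return [image_id for _, image_id in scored[:budget]]


# Hypothetical model outputs over three classes.
predictions = {
    "img_001": [0.98, 0.01, 0.01],  # confident, no need to label
    "img_002": [0.40, 0.35, 0.25],  # very uncertain
    "img_003": [0.55, 0.44, 0.01],  # torn between two classes
    "img_004": [0.90, 0.05, 0.05],
}

to_label = least_confident(predictions, budget=2)
print(to_label)
```

After the selected images are labeled, the model is retrained and the selection is repeated, so labeling effort concentrates on the hard classes instead of being spread over everything.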