
Computer Vision in Real Estate

Vassil Lunchev, CEO of Homeheed (https://www.homeheed.com/), presented at our July Meetup how to detect fake listings using #ComputerVision and #MachineLearning.

Imagine that you have 600,000 real estate listings with a total of 5,000,000 photos, and you know that many of these listings are fake. In his presentation, Vassil shared some of the challenges of detecting the fake ones, including the approach that works and the ones that do not. He also presented what kind of additional data is necessary to detect them.



  1. Computer Vision in Real Estate
     Presented by Vassil Lunchev, www.homeheed.com
  2. There is a problem in Sofia’s Real Estate
     • Demo 1
  3. Setup
     • There are many duplicates
     • We have 600,000 listings
     • Many listings are fake
     Goals
     • Cluster all listings into Homes
     • Classify homes as available or not-available (as of today)
  4. Approaches (for clustering)
     1. Location (GPS coordinates)
        • Works great for Booking, Airbnb, Expedia
        • Only 24% of the listings have GPS coordinates
        • Even if a listing has location, it is “wrong”
     2. Texts, numbers and categories
        • Price, m2, district, text description, …
        • (demo 2)
     3. Images
  5. The image based approach
     1. Find equal images
        • Dataset of 5,000,000 images
        • Keypoint matching
     2. Given 2 listings, classify equal or non-equal
        • 1 listing has about 10 images
        • Machine learning classification
     3. Is this home available today?
        • 2 main signals – history and reputation
        • Ground truth dataset
  6. Finding equal images
     • Demo 3
  7. Keypoint matching
     1. Detect keypoints
        • 500 keypoints per image
     2. Describe keypoints
        • 256 bits (32 bytes) per keypoint
     3. Match keypoints
        • Hamming distance < THRESHOLD
        • Locality Sensitive Hashing (LSH)
     4. Match images
        • RANSAC and homography
  8. Keypoint detection
  9. Keypoint detection
  10. Keypoint description
      256 bits: 1 0 0 0 1 1 0 1 0 1 0 0 1 1 0 0
  11. Keypoint matching
      Keypoint 1 (256 bits): 1 0 0 0 1 1 0 1 0 1 0 0 1 1 0 0
      Keypoint 2 (256 bits): 1 0 1 0 1 1 0 1 0 0 0 0 1 1 0 0
      XOR result:            0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0
      Hamming distance: 2 <= 3 (constant THRESHOLD) => These keypoints are equal
  12. Keypoint matching
      I think this keypoint is equal to that keypoint
  13. Image matching
      I think the left image is equal to this part of the right image
  14. The image based approach
      1. Find equal images
         • Dataset of 5,000,000 images
         • Keypoint matching
      2. Given 2 listings, classify equal or non-equal
         • 1 listing has about 10 images
         • Machine learning classification
      3. Is this home available today?
         • 2 main signals – history and reputation
         • Ground truth dataset
      ✓ Presentation up to now
  15. Given 2 listings, classify equal or non-equal
      • Random forest
      • Sources:
        • Image matches (each listing has about 10 images)
        • Uniqueness of the images
        • Numeric data (price, square meters, floor, year, …)
        • Category parameters (neighborhood, apartment type, build type, …)
        • Text data (bag of words from description)
      • Features are differences not absolute values
  16. Is this home available today? Disclaimers:
      • 3 shades of fakeness
        • Available means “you can get in that home today”
        • Fake means “this home has never existed”
        • Outdated means “This home is already rented/sold”
      • Classification per day
        • Is this available (without a date) can be both True and False (Schrödinger’s listing)
        • A user looking at a snapshot of a listing (just today) misses most of the information
  17. Is this home available today?
      2 main signals
      • Home history (new and removed listings)
      • Lister reputation (how much I trust this guy)
      Ground truth dataset
      • Manual labeling of auto generated candidates
      • The “book a showing” feature of Homeheed
  18. QUESTIONS…
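
The keypoint comparison from slides 10 and 11 can be sketched in a few lines of Python. This is an illustration only, not the code used at Homeheed: the function names are hypothetical, the descriptors are the 16-bit excerpts shown on the slide rather than full 256-bit descriptors, and the threshold of 3 is simply the value from the slide-11 example. Slide 7 writes the test as "< THRESHOLD" while slide 11 uses "<="; the sketch follows slide 11.

```python
THRESHOLD = 3  # assumed: the constant shown in the slide-11 example


def bits_to_int(bits: str) -> int:
    """Parse a space-separated bit string such as '1 0 0 1' into an integer."""
    return int(bits.replace(" ", ""), 2)


def hamming_distance(a: int, b: int) -> int:
    """XOR the two descriptors and count the bits that differ."""
    return bin(a ^ b).count("1")


def keypoints_equal(a: int, b: int, threshold: int = THRESHOLD) -> bool:
    """Treat two keypoints as equal when their descriptors differ in few bits."""
    return hamming_distance(a, b) <= threshold


# The two descriptors shown on slide 11 (16-bit excerpts).
kp1 = bits_to_int("1 0 0 0 1 1 0 1 0 1 0 0 1 1 0 0")
kp2 = bits_to_int("1 0 1 0 1 1 0 1 0 0 0 0 1 1 0 0")

print(hamming_distance(kp1, kp2))  # 2, as on the slide
print(keypoints_equal(kp1, kp2))   # True
```

Comparing every descriptor against every other one this way would not scale to 5,000,000 images with 500 keypoints each, which is presumably why slide 7 lists Locality Sensitive Hashing as the way to find candidate matches first.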

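Slide 15 notes that the classifier's features are differences between two listings rather than absolute values. A minimal sketch of that idea follows. Everything specific in it is an assumption made for illustration: the `Listing` fields, the `pair_features` helper, the sample values, and the use of word-set overlap as the text feature are not taken from the slides, which do not say how each difference is computed.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    # Hypothetical fields, loosely following the data sources named on slide 15.
    price: float
    square_meters: float
    floor: int
    neighborhood: str
    description: str


def pair_features(a: Listing, b: Listing) -> dict:
    """Build a feature row for a pair of listings from differences only."""
    words_a = set(a.description.lower().split())
    words_b = set(b.description.lower().split())
    union = words_a | words_b
    return {
        "price_diff": abs(a.price - b.price),
        "square_meters_diff": abs(a.square_meters - b.square_meters),
        "floor_diff": abs(a.floor - b.floor),
        # Categories cannot be subtracted, so record whether they agree.
        "same_neighborhood": int(a.neighborhood == b.neighborhood),
        # Assumed text feature: share of words the two descriptions have in common.
        "word_overlap": len(words_a & words_b) / len(union) if union else 1.0,
    }


# Made-up example values.
a = Listing(95000, 70, 3, "Center", "Sunny two bedroom apartment")
b = Listing(97000, 70, 3, "Center", "Sunny two bedroom flat")
features = pair_features(a, b)
print(features)
```

Rows like this, one per pair of listings and extended with the image-match counts, would then be passed to a classifier such as the random forest named on the slide.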

