FARROT - Filter Amazon Review Ratings Over Time

  1. FARROT: Filter Amazon Review Ratings Over Time. Andy Lai
  2. Problem: Amazon doesn't allow filtering review ratings and totals by state or time. http://youtu.be/w78X0IpjI5c
  3. UI DEMO: http://youtu.be/w78X0IpjI5c
  4. Data set: Stanford SNAP Amazon reviews (35 GB, 35M reviews); University of Illinois Amazon member info (142 MB, member location information). Sample member row: joeme 92 5/26 Cleveland, OH United States Joseph M. Kotow B00006HAXW OH
  5. Pipeline (diagram labels): ImportTsv; SNAP REVIEWS in 10 rows per review; UIC MEMBER LOCATION; TSV; HappyBase
  6. Pipeline (diagram labels): ImportTsv; SNAP REVIEWS in 10 rows per review; UIC MEMBER LOCATION; TSV; HappyBase
  7. Pipeline (diagram labels): ImportTsv; SNAP REVIEWS in 10 rows per review; UIC MEMBER LOCATION; TSV; HappyBase. Sample review: B00006HAXW; Rock Rhythm & Doo Wop Greatest Early Rock; unknown; A1RSDE90N6RSZF; Joseph M Kotow; 9/9; 5.0; 1042502400; Pittsburgh – Home of the OLDIES; I have all of the doo wop DVD’s and this one is as good or better than the 1st ones. Rem…
  8. Pipeline (diagram labels): PIG to CLEAN, JOIN and AGGREGATE rating reviews and totals; ImportTsv; SNAP REVIEWS in 10 rows per review; UIC MEMBER LOCATION; TSV; HappyBase. Sample review: B00006HAXW; Rock Rhythm & Doo Wop Greatest Early Rock; unknown; A1RSDE90N6RSZF; Joseph M Kotow; 9/9; 5.0; 1042502400; Pittsburgh – Home of the OLDIES; I have all of the doo wop DVD’s and this one is as good or better than the 1st ones. Rem…
  9. Pipeline (diagram labels): ImportTsv; SNAP REVIEWS in 10 rows per review; UIC MEMBER LOCATION; TSV; HappyBase
  10. HBase Schema. Table schemas: PRODUCTID_STATE, TOTAL REVIEWS, AVG RATING; PRODUCTID_STATE_BYYEAR_EPOCH, TOTAL REVIEWS, AVG RATING; PRODUCTID_STATE_BYMONTH_EPOCH, TOTAL REVIEWS, AVG RATING; PRODUCTID_STATE_BYDAY_EPOCH, TOTAL REVIEWS, AVG RATING. Example: B00003CWT6_CA_BYMONTH_1008115200000
  11. Retrospective. Design considerations: HBase was chosen because it is optimized for reads and range scans and because it scales; data was bucketed by state and by several time intervals, which improves query performance by avoiding the cost of recalculating aggregates, at the expense of storage; Java MR was used to convert multi-row reviews to tabular format. Future: scrape Amazon for new reviews; filter and display reviews.
  12. About me – Andy Lai: UC Berkeley (B.S. Electrical Engineering & Computer Science); SJSU (M.S. Engineering); Software Engineer (DB2, relational database); Interests:
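
The conversion from "10 rows per review" to a tabular (TSV) layout mentioned in the pipeline and retrospective slides can be sketched roughly as follows. This is not the author's Java MR job; it is a minimal single-process Python illustration. The field names in `FIELDS` are assumed from the public SNAP Amazon review format and do not appear on the slides, and `records_to_tsv` is a hypothetical helper name.

```python
import io

# Assumption: the ten "key: value" fields of one SNAP review, in file order.
FIELDS = [
    "product/productId", "product/title", "product/price",
    "review/userId", "review/profileName", "review/helpfulness",
    "review/score", "review/time", "review/summary", "review/text",
]


def _to_row(record):
    # Missing fields become empty columns; tabs inside values are replaced
    # so that every output line has exactly len(FIELDS) columns.
    return "\t".join(record.get(f, "").replace("\t", " ") for f in FIELDS)


def records_to_tsv(lines):
    """Yield one tab-separated line per review from multi-row input."""
    current = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current review.
            if current:
                yield _to_row(current)
                current = {}
            continue
        key, sep, value = line.partition(": ")
        if sep and key in FIELDS:
            current[key] = value
    if current:
        # Flush the last review if the input does not end with a blank line.
        yield _to_row(current)


if __name__ == "__main__":
    # Values taken from the sample review on the slides.
    sample = (
        "product/productId: B00006HAXW\n"
        "review/helpfulness: 9/9\n"
        "review/score: 5.0\n"
        "review/time: 1042502400\n"
    )
    for row in records_to_tsv(io.StringIO(sample)):
        print(row)
```

A one-line-per-review file of this shape is the kind of input that HBase's ImportTsv tool expects, which is presumably why the deck flattens the records before loading.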

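The bucketing described in the schema and retrospective slides (precomputed totals and averages per product, state, and time interval) might look something like the sketch below. The deck does this with Pig; the Python here is only an illustration. The slides do not say how the EPOCH component of the key is derived, so truncating to the start of the UTC year, month, or day in milliseconds is an assumption, as are the helper names `bucket_epoch_ms`, `row_key`, and `aggregate`.

```python
from collections import defaultdict
from datetime import datetime, timezone

GRANULARITIES = ("BYYEAR", "BYMONTH", "BYDAY")


def bucket_epoch_ms(epoch_s, granularity):
    """Truncate a Unix timestamp (seconds) to its UTC bucket start, in ms.

    Assumption: buckets start at midnight UTC of the year, month, or day.
    """
    t = datetime.fromtimestamp(epoch_s, tz=timezone.utc)
    midnight = t.replace(hour=0, minute=0, second=0, microsecond=0)
    if granularity == "BYYEAR":
        start = midnight.replace(month=1, day=1)
    elif granularity == "BYMONTH":
        start = midnight.replace(day=1)
    elif granularity == "BYDAY":
        start = midnight
    else:
        raise ValueError(f"unknown granularity: {granularity}")
    return int(start.timestamp()) * 1000


def row_key(product_id, state, granularity=None, epoch_s=None):
    """Build keys shaped like PRODUCTID_STATE or PRODUCTID_STATE_BYMONTH_EPOCH."""
    if granularity is None:
        return f"{product_id}_{state}"
    return f"{product_id}_{state}_{granularity}_{bucket_epoch_ms(epoch_s, granularity)}"


def aggregate(reviews):
    """Map each row key to (total reviews, average rating).

    reviews: iterable of (product_id, state, epoch_seconds, rating) tuples,
    i.e. reviews already joined with the member's state.
    """
    acc = defaultdict(lambda: [0, 0.0])  # key -> [count, rating sum]
    for product_id, state, epoch_s, rating in reviews:
        keys = [row_key(product_id, state)]
        keys += [row_key(product_id, state, g, epoch_s) for g in GRANULARITIES]
        for key in keys:
            acc[key][0] += 1
            acc[key][1] += rating
    return {key: (n, total / n) for key, (n, total) in acc.items()}


if __name__ == "__main__":
    reviews = [
        ("B00006HAXW", "OH", 1042502400, 5.0),  # sample review from the slides
        ("B00006HAXW", "OH", 1042588800, 3.0),  # hypothetical second review
    ]
    for key, (total, avg) in sorted(aggregate(reviews).items()):
        print(key, total, avg)
```

Because every key for a product and state shares a prefix, a client such as HappyBase can fetch one granularity with a prefix scan (`Table.scan(row_prefix=...)`). One caveat under this sketch's assumptions: HBase orders keys lexicographically, so millisecond epochs with different digit counts would not sort chronologically unless they were zero-padded.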