SlideShare une entreprise Scribd logo
1  sur  24
Télécharger pour lire hors ligne
7/20/16
1
Data and Algorithmic Bias
in the Web
Ricardo Baeza-Yates
California (NTENT), Catalonia (UPF), Chile (UChile)
WebVisions Barcelona, July 2016
Steve Jobs and Bias
Pervasive Optimistic Bias (Kahneman)
Reality Distortion Field
7/20/16
2
Every Website is an
Information Market
Good Design
Good Interaction
Right Incentives
7/20/16
3
All Data has Bias
§  Gender
§  Racial
§  Sexual
§  Religious
§  Social
§  Linguistic
§  Geographic
§  Political
§  Educational
§  Economic
§  Technological
§  from Noise or Spam
§  Validity (e.g. temporal)
§  Completeness
§  Gathering process
§  ….
However many people extrapolate
results to the whole population
(e.g., social media analysis)
In addition there is bias when
measuring bias as well as bias
towards measuring it!
Yes, We Live in a (Very) Biased World!
7/20/16
4
A Non-Technical Question
Algorithm
Biased
Data
Neutral?
Same
Bias
Not
Always!
Unbias the data
Tune the algorithm
Unbias the output
Bias awareness!
Big Data and Bias
15
§  The quality of any algorithm is bounded by the
quality of the data that uses
§  Data bias awareness
§  Algorithmic fairness
§  Key issues for machine learning
§  Uniformity of data properties
§  In the Web, distributions resemble a power law
§  Uniformity of error
§  Data sample methodology
§  E.g., sample size to see infrequent events or
sampling bias issues
7/20/16
5
Data bias
Activity bias
Selection bias
Sampling bias and size
Algorithmic bias
Interface
(Self) selection bias
Second order bias
Sparsity
Privacy
Algorithm
7/20/16
6
Quantity
Quality
User-
generated
Traditional
publishing
What is in the Web? How Much Data?
How Good is it? 168 million active web servers,
1083 million hostnames and
infinitely many pages!
26
What else is in the Web?
7/20/16
7
Noise and Spam
27
§  Noise may come from many places:
§  Instruments that measure (e.g., IoT)
§  How we interpret the data
§  Spam is everywhere
§  Fight both with the wisdom of the crowds
Data Bias and Redundancy
38
§  There is any dependency in the data?
§  There is any duplication?
§  Lexical duplication in the Web is around 25%
§  Semantic duplication is larger (more later)
§  Any other biases? Many!
§  Web structure (economic, cultural)
§  Web content (linguistic, geography, gender)
7/20/16
8
39
[Baeza-Yates, Castillo & López. Characteristics of the Web of Spain.
The Information Professional (Spanish), 2006, vol. 15, n. 1, pp. 6-17]
Economic Bias in Links
Number of linked domains
Exports(thousandsofUS$)
40
Baeza-Yates & Castillo, WWW2006
Exports/Imports vs. Domain Links
7/20/16
9
41
[Baeza-Yates, Castillo, Efthimiadis, TOIT 2007]
Website Structure
Minimal effortShame
42
Linguistic Bias
7/20/16
10
Geographical Bias
[E. Graells-Garrido and M. Lalmas,
“Balancing diversity to counter-measure
geographical centralization in microblogging
platforms”, ACM Hypertext’14]
Gender Bias
[E. Graells-Garrido et al,. “First Women, Second Sex: Gender Bias in Wikipedia”, ACM Hypertext’15]
Systemic bias?
Equal opportunity?
7/20/16
11
•  The Web already is influenced by small groups
•  "0.05% of the user population, attract almost
50% of all attention within Twitter" (50K users)
[Wu, Hofman, Mason & Watts, WWW 2011]
•  We explored this issue further with four different datasets:
1.  a large one from Twitter (2011),
2.  a small one from Facebook (2009),
3. Amazon reviews (2013), and
4.  Wikipedia editors (2015).
•  Digital desert: the content that is never seen
Activity Bias: Wisdom of a Few?
[Baeza-Yates & Saez-Trumper, ACM Hypertext 2015]
Examples
[Baeza-Yates & Saez-Trumper, ACM Hypertext 2015]
7/20/16
12
October 2015
Quality of Content?
51 Yahoo Confidential & Proprietary
•  Adding content implies adding wisdom?
•  We use Amazon’s reviews helpfulness
•  We computed the text entropy
•  Content-based-wise users
•  How many of those users are being paid?
7/20/16
13
Digital Desert
52 Yahoo Confidential & Proprietary
Weblands
of Wisdom
7/20/16
14
Bias in the Interface
Position bias
Ranking bias
Presentation bias
Social bias
Interaction bias
Presentation Bias
§  Interaction data will be biased to what is shown
§  In recommender systems, items recommended will get
more clicks than items not recommended
§  In search systems top ranked results will get more clicks
than other results
›  Ranking bias
›  Interaction bias
CTR
(log)
1 11 21 Rank
[Dupret & Piwowarski, SIGIR 2008]
[Chapelle & Zhang, WWW 2009]bias
7/20/16
15
[WHY AMAZON’S RATINGS MIGHT MISLEAD YOU; The Story of Herding Effects
Ting Wang and Dashun Wang, Big Data, 2014]
Social Bias
Extreme Algorithmic Bias
7/20/16
16
Second Order Bias in Web Content
[Baeza-Yates, Pereira & Ziviani,
Geneological Trees in the Web, WWW 2008]
Person
Web content is redundant
Clicks in results are biased to
the ranking and the interaction
Query
Ranking bias
Redundancy grows (35%)
Search results
New
Most measures in the Web follow a power law
The Long Tail: Sparsity
[Anatomy of the long tail: Ordinary People with Extraordinary Tastes,
Goel, Broder, Gabrilovich, Pang; WSDM 2010]
§  Why there is a long tail?
§  Sampling in the tail
§  When the crowd dominates
§  Empowering the tail
7/20/16
17
When the Crowd
Dominates
Kills the long tail
80
Personalization “facets”:
•  Language (not always)
•  Location
•  Semantic facets per user
•  Query intent prediction in search
Empowering the Tail
The Filter “Bubble”, Eli Pariser
•  Avoid the Poor get Poorer Syndrome
•  Avoid the Echo Chamber
•  How to expose opposite views?
81
Cold start problem solution:
Explore & Exploit
Solutions:
•  Diversity
•  Novelty
•  Serendipity
7/20/16
18
A Data Portrait is a visual
context where users can
explore how the system
understand their interests.
This context is used to
embed content-based
recommendations, displayed
visually to facilitate
exploration and user
engagement.
To combat homophily,
recommendations are
generated having political
diversity in mind.
Does it work? Yes, by using
intermediary topics that are shared!
But only when users are interested
in politics.
Demo at http://auroratwittera.cl/perfil/YahooLabs
[E. Graells-Garrido, M. Lalmas and R. Baeza-Yates, ACM UAI 2016]
•  Exploit the context (and deep learning!)
91% accuracy to predict the next app you will use
[Baeza-Yates et al, WSDM 2015]
•  Personalization vs. Contextualization
Recall that user interaction is another long tail
People
Interests
Aggregating in theTail
7/20/16
19
[De Choudhury et al, ACM HT 2010]
87[Quercia et al, ACM HT 2014]
Crowdsourcing Data: Good Paths
7/20/16
20
Regions from Pictures
[Thomee et al, Demo at CHI 2014]
AOL Query Logs Release Incident
§  No. 4417749 conducted hundreds of searches over
a three-month period on topics ranging from “numb
fingers” to “60 single men”.
§  Other queries: “landscapers in Lilburn, Ga,” several
people with the last name Arnold and “homes sold in
shadow lake subdivision gwinnett county georgia.”
§  Data trail led to Thelma Arnold, a 62-year-old widow
who lives in Lilburn, Ga., frequently researches her
friends’ medical ailments and loves her three dogs.
A Face Is Exposed for AOL Searcher No. 4417749,
By MICHAEL BARBARO and TOM ZELLER Jr,
The New York Times, Aug 9 2006
90
7/20/16
21
91
Risks of Privacy in Query Logs
§  Profile [Jones, Kumar, Pang, Tompkins, CIKM 2007]
•  Gender: 84%
•  Age (±10): 79%
•  Location (ZIP3): 35%
§  Vanity Queries [Jones et al, CIKM 2008]
•  Partial name: 8.9%
•  Complete: 1.2%
§  More information:
•  A Survey of query log privacy-enhancing techniques
from a policy perspective [Cooper, ACM TWEB 2008]
§  A good anonymization technique is still an open problem
7/20/16
22
Privacy Awareness
§ How our privacy changes when we change our social network?
§ Information gain to predict a private attribute based on public data
§ Each user may have a promiscuity score
§ Example: new friendship request
Promiscuity( me ) > Promiscuity( new)
Promiscuity( me ) ≥ Promiscuity( new ) + max-gain-I-allow
Promiscuity( me ) < Promiscuity( new ) + max-gain-I-allow
Related work by [Estivill-Castro & Nettleton; Singh, ASONAM 2015]
The Web Works Thanks to Bias!
§ Web traffic
›  Local caching
›  Proxy/Akamai caching
§ Search engines
›  Answer caching
›  Essential web pages
•  25% queries can be answered with less than 1% of the URLs!
[Baeza-Yates, Boldi, Chierichetti, WWW 2015]
§ E-Commerce
›  Large fraction of revenue comes from few popular items
Activity bias
(Self) selection bias
7/20/16
23
Web Data
§  A mirror of ourselves, the good, the bad and the ugly
§  The web amplifies everything, good or bad, but always
leaves traces
§  We have to be aware of the biases and contrarrest them
§  We have to be aware of our privacy
Big Data of People is huge…..
….. but is tiny compared to the future
Big Data of the Internet of Things (IoT)
It’s Hard to Get Data to Tell the Truth
§  The blindness of the averages
§  Look at distributions
§  Absolute vs. relative
§  Income per capita vs. Inequality
§  Local vs. global optimization
§  Teams competing without knowing, uncorrelated criteria
§  You can always see/torture data as you wish
›  61 analysts, 29 teams: 20 yes and 9 no (Univ. of Virginia, COS)
7/20/16
24
Contact: rbaeza@acm.org
www.baeza.cl
@polarbearby
ASIST 2012
Book of the
Year Award
Questions?
Biased Questions?

Contenu connexe

Tendances

HR Experts Share How Analytics are Shaping a #SmarterWorkforce
HR Experts Share How Analytics are Shaping a #SmarterWorkforceHR Experts Share How Analytics are Shaping a #SmarterWorkforce
HR Experts Share How Analytics are Shaping a #SmarterWorkforceIBM Smarter Workforce
 
2015 back-to-school and back-to-college survey results
2015 back-to-school and back-to-college survey results2015 back-to-school and back-to-college survey results
2015 back-to-school and back-to-college survey resultsDeloitte United States
 
Big Data & The Role Analytics Can Play In Our Organizations
Big Data & The Role Analytics Can Play In Our OrganizationsBig Data & The Role Analytics Can Play In Our Organizations
Big Data & The Role Analytics Can Play In Our OrganizationsAgile Technologies
 
Rapid fire with Douglas Van Praet
Rapid fire with Douglas Van PraetRapid fire with Douglas Van Praet
Rapid fire with Douglas Van PraetPraz Hari
 
The female millennial: A new era of talent
The female millennial: A new era of talentThe female millennial: A new era of talent
The female millennial: A new era of talentPwC
 
Full Study: Adobe State of Create 2016
Full Study: Adobe State of Create 2016Full Study: Adobe State of Create 2016
Full Study: Adobe State of Create 2016Adobe
 
Onboarding AI by Jana Eggers
Onboarding AI by Jana EggersOnboarding AI by Jana Eggers
Onboarding AI by Jana EggersGlobant
 
Connecting Learning to the Right Systems Webinar
Connecting Learning to the Right Systems WebinarConnecting Learning to the Right Systems Webinar
Connecting Learning to the Right Systems WebinarNetDimensions
 
WUD2008 - The Numbers Revolution and its Effect on the Web
WUD2008 - The Numbers Revolution and its Effect on the WebWUD2008 - The Numbers Revolution and its Effect on the Web
WUD2008 - The Numbers Revolution and its Effect on the WebRich Miller
 
Living in a data economy: Transforming the role of HR
Living in a data economy: Transforming the role of HRLiving in a data economy: Transforming the role of HR
Living in a data economy: Transforming the role of HRMartin Sutherland
 
The Future of Personalised Education
The Future of Personalised EducationThe Future of Personalised Education
The Future of Personalised EducationIBM Government
 
The Future of Work: Winning With an Agile Workforce
The Future of Work: Winning With an Agile WorkforceThe Future of Work: Winning With an Agile Workforce
The Future of Work: Winning With an Agile WorkforceCatalant Technologies
 
Driving Revenue w/ Social, Content, Marketing Automation - Scoop.It Meetup
Driving Revenue w/ Social, Content, Marketing Automation - Scoop.It Meetup Driving Revenue w/ Social, Content, Marketing Automation - Scoop.It Meetup
Driving Revenue w/ Social, Content, Marketing Automation - Scoop.It Meetup Jason Miller
 
Designing Mobile Experiences
Designing Mobile ExperiencesDesigning Mobile Experiences
Designing Mobile ExperiencesBrian Fling
 
The Customer Experience Revolution Coming to Everywhere Near You!
The Customer Experience Revolution Coming to Everywhere Near You!The Customer Experience Revolution Coming to Everywhere Near You!
The Customer Experience Revolution Coming to Everywhere Near You!Jennie Vickers
 
Roger hoerl say award presentation 2013
Roger hoerl say award presentation 2013Roger hoerl say award presentation 2013
Roger hoerl say award presentation 2013Roger Hoerl
 
Social Media is about People not Technology
Social Media is about People not TechnologySocial Media is about People not Technology
Social Media is about People not TechnologyFatmir Hyseni
 
JESS3 x Power to Fly Meet Neil Branding Presentation
JESS3 x Power to Fly Meet Neil Branding PresentationJESS3 x Power to Fly Meet Neil Branding Presentation
JESS3 x Power to Fly Meet Neil Branding PresentationJESS3
 

Tendances (20)

HR Experts Share How Analytics are Shaping a #SmarterWorkforce
HR Experts Share How Analytics are Shaping a #SmarterWorkforceHR Experts Share How Analytics are Shaping a #SmarterWorkforce
HR Experts Share How Analytics are Shaping a #SmarterWorkforce
 
2015 back-to-school and back-to-college survey results
2015 back-to-school and back-to-college survey results2015 back-to-school and back-to-college survey results
2015 back-to-school and back-to-college survey results
 
Big Data & The Role Analytics Can Play In Our Organizations
Big Data & The Role Analytics Can Play In Our OrganizationsBig Data & The Role Analytics Can Play In Our Organizations
Big Data & The Role Analytics Can Play In Our Organizations
 
Rapid fire with Douglas Van Praet
Rapid fire with Douglas Van PraetRapid fire with Douglas Van Praet
Rapid fire with Douglas Van Praet
 
The female millennial: A new era of talent
The female millennial: A new era of talentThe female millennial: A new era of talent
The female millennial: A new era of talent
 
Full Study: Adobe State of Create 2016
Full Study: Adobe State of Create 2016Full Study: Adobe State of Create 2016
Full Study: Adobe State of Create 2016
 
Onboarding AI by Jana Eggers
Onboarding AI by Jana EggersOnboarding AI by Jana Eggers
Onboarding AI by Jana Eggers
 
The Future of Work
The Future of Work The Future of Work
The Future of Work
 
Connecting Learning to the Right Systems Webinar
Connecting Learning to the Right Systems WebinarConnecting Learning to the Right Systems Webinar
Connecting Learning to the Right Systems Webinar
 
WUD2008 - The Numbers Revolution and its Effect on the Web
WUD2008 - The Numbers Revolution and its Effect on the WebWUD2008 - The Numbers Revolution and its Effect on the Web
WUD2008 - The Numbers Revolution and its Effect on the Web
 
Living in a data economy: Transforming the role of HR
Living in a data economy: Transforming the role of HRLiving in a data economy: Transforming the role of HR
Living in a data economy: Transforming the role of HR
 
The Future of Personalised Education
The Future of Personalised EducationThe Future of Personalised Education
The Future of Personalised Education
 
The Future of Work: Winning With an Agile Workforce
The Future of Work: Winning With an Agile WorkforceThe Future of Work: Winning With an Agile Workforce
The Future of Work: Winning With an Agile Workforce
 
Driving Revenue w/ Social, Content, Marketing Automation - Scoop.It Meetup
Driving Revenue w/ Social, Content, Marketing Automation - Scoop.It Meetup Driving Revenue w/ Social, Content, Marketing Automation - Scoop.It Meetup
Driving Revenue w/ Social, Content, Marketing Automation - Scoop.It Meetup
 
Designing Mobile Experiences
Designing Mobile ExperiencesDesigning Mobile Experiences
Designing Mobile Experiences
 
Digital Ethics
Digital EthicsDigital Ethics
Digital Ethics
 
The Customer Experience Revolution Coming to Everywhere Near You!
The Customer Experience Revolution Coming to Everywhere Near You!The Customer Experience Revolution Coming to Everywhere Near You!
The Customer Experience Revolution Coming to Everywhere Near You!
 
Roger hoerl say award presentation 2013
Roger hoerl say award presentation 2013Roger hoerl say award presentation 2013
Roger hoerl say award presentation 2013
 
Social Media is about People not Technology
Social Media is about People not TechnologySocial Media is about People not Technology
Social Media is about People not Technology
 
JESS3 x Power to Fly Meet Neil Branding Presentation
JESS3 x Power to Fly Meet Neil Branding PresentationJESS3 x Power to Fly Meet Neil Branding Presentation
JESS3 x Power to Fly Meet Neil Branding Presentation
 

En vedette

Organizing for Success with Digital Retail
Organizing for Success with Digital RetailOrganizing for Success with Digital Retail
Organizing for Success with Digital RetailJDA Software
 
ELK - What's new and showcases
ELK - What's new and showcasesELK - What's new and showcases
ELK - What's new and showcasesAndrii Gakhov
 
Education faculty sotl workshopc 25 may 2016
Education faculty sotl workshopc 25 may 2016Education faculty sotl workshopc 25 may 2016
Education faculty sotl workshopc 25 may 2016Brenda Leibowitz
 
Myth busting and the Nigerian Prince
Myth busting and the Nigerian PrinceMyth busting and the Nigerian Prince
Myth busting and the Nigerian PrinceDean Shareski
 
Airing of grievances
Airing of grievancesAiring of grievances
Airing of grievancesDean Shareski
 
Nuevas tecnologías de la información mariana garcia
Nuevas tecnologías de la información mariana garciaNuevas tecnologías de la información mariana garcia
Nuevas tecnologías de la información mariana garciaMariana Garcia Ballesteros
 
นางสาวกรุณา สุขโนนทอง
นางสาวกรุณา   สุขโนนทองนางสาวกรุณา   สุขโนนทอง
นางสาวกรุณา สุขโนนทองsuknontong
 
Advantages of native apps
Advantages of native appsAdvantages of native apps
Advantages of native appsJatin Dabas
 
Networked Fitness 2014 - What Is It And What Does It Mean For Health Clubs An...
Networked Fitness 2014 - What Is It And What Does It Mean For Health Clubs An...Networked Fitness 2014 - What Is It And What Does It Mean For Health Clubs An...
Networked Fitness 2014 - What Is It And What Does It Mean For Health Clubs An...Bryan K. O'Rourke
 
Psychological Improvement program
Psychological Improvement programPsychological Improvement program
Psychological Improvement programFarah Hoque
 
טיפוח והזנת העור מרכיבים טבעיים
טיפוח והזנת העור מרכיבים טבעייםטיפוח והזנת העור מרכיבים טבעיים
טיפוח והזנת העור מרכיבים טבעייםOrit Levav
 
Attracting Manufacturing Talent: How the Dream It. Do It. Recruitment Strateg...
Attracting Manufacturing Talent: How the Dream It. Do It. Recruitment Strateg...Attracting Manufacturing Talent: How the Dream It. Do It. Recruitment Strateg...
Attracting Manufacturing Talent: How the Dream It. Do It. Recruitment Strateg...360mnbsu
 
Vancouver Best Places to Work Roadshow | ATB Financial
Vancouver Best Places to Work Roadshow | ATB FinancialVancouver Best Places to Work Roadshow | ATB Financial
Vancouver Best Places to Work Roadshow | ATB FinancialGlassdoor
 
Are you a Feminist?
Are you a Feminist?Are you a Feminist?
Are you a Feminist?Farah Hoque
 
NEXT11 Sponsoring Opportunites
NEXT11 Sponsoring OpportunitesNEXT11 Sponsoring Opportunites
NEXT11 Sponsoring OpportunitesNEXT Conference
 

En vedette (20)

Reference 2.0
Reference 2.0Reference 2.0
Reference 2.0
 
Thirstier
ThirstierThirstier
Thirstier
 
Organizing for Success with Digital Retail
Organizing for Success with Digital RetailOrganizing for Success with Digital Retail
Organizing for Success with Digital Retail
 
ELK - What's new and showcases
ELK - What's new and showcasesELK - What's new and showcases
ELK - What's new and showcases
 
Education faculty sotl workshopc 25 may 2016
Education faculty sotl workshopc 25 may 2016Education faculty sotl workshopc 25 may 2016
Education faculty sotl workshopc 25 may 2016
 
Imperialismo
ImperialismoImperialismo
Imperialismo
 
Myth busting and the Nigerian Prince
Myth busting and the Nigerian PrinceMyth busting and the Nigerian Prince
Myth busting and the Nigerian Prince
 
Airing of grievances
Airing of grievancesAiring of grievances
Airing of grievances
 
Nuevas tecnologías de la información mariana garcia
Nuevas tecnologías de la información mariana garciaNuevas tecnologías de la información mariana garcia
Nuevas tecnologías de la información mariana garcia
 
นางสาวกรุณา สุขโนนทอง
นางสาวกรุณา   สุขโนนทองนางสาวกรุณา   สุขโนนทอง
นางสาวกรุณา สุขโนนทอง
 
Advantages of native apps
Advantages of native appsAdvantages of native apps
Advantages of native apps
 
Networked Fitness 2014 - What Is It And What Does It Mean For Health Clubs An...
Networked Fitness 2014 - What Is It And What Does It Mean For Health Clubs An...Networked Fitness 2014 - What Is It And What Does It Mean For Health Clubs An...
Networked Fitness 2014 - What Is It And What Does It Mean For Health Clubs An...
 
The Full Gospel
The Full GospelThe Full Gospel
The Full Gospel
 
Psychological Improvement program
Psychological Improvement programPsychological Improvement program
Psychological Improvement program
 
טיפוח והזנת העור מרכיבים טבעיים
טיפוח והזנת העור מרכיבים טבעייםטיפוח והזנת העור מרכיבים טבעיים
טיפוח והזנת העור מרכיבים טבעיים
 
Attracting Manufacturing Talent: How the Dream It. Do It. Recruitment Strateg...
Attracting Manufacturing Talent: How the Dream It. Do It. Recruitment Strateg...Attracting Manufacturing Talent: How the Dream It. Do It. Recruitment Strateg...
Attracting Manufacturing Talent: How the Dream It. Do It. Recruitment Strateg...
 
Vancouver Best Places to Work Roadshow | ATB Financial
Vancouver Best Places to Work Roadshow | ATB FinancialVancouver Best Places to Work Roadshow | ATB Financial
Vancouver Best Places to Work Roadshow | ATB Financial
 
Are you a Feminist?
Are you a Feminist?Are you a Feminist?
Are you a Feminist?
 
NEXT11 Sponsoring Opportunites
NEXT11 Sponsoring OpportunitesNEXT11 Sponsoring Opportunites
NEXT11 Sponsoring Opportunites
 
Online Marketing and SEO Workshop
Online Marketing and SEO WorkshopOnline Marketing and SEO Workshop
Online Marketing and SEO Workshop
 

Similaire à Data and Algorithmic Bias in the Web

Ux day2018 ricardo baeza yayes search-biases-semantics
Ux day2018   ricardo baeza yayes search-biases-semanticsUx day2018   ricardo baeza yayes search-biases-semantics
Ux day2018 ricardo baeza yayes search-biases-semanticsMultiplica
 
Creating a Data-Driven Government: Big Data With Purpose
Creating a Data-Driven Government: Big Data With PurposeCreating a Data-Driven Government: Big Data With Purpose
Creating a Data-Driven Government: Big Data With PurposeTyrone Grandison
 
Big Data Analytics and Open Data
Big Data Analytics and Open Data Big Data Analytics and Open Data
Big Data Analytics and Open Data Sharjeel Imtiaz
 
Using Big Data to Tell Your Story
Using Big Data to Tell Your StoryUsing Big Data to Tell Your Story
Using Big Data to Tell Your StoryBen Wright
 
Policy primer net303 study period 3, 2017
Policy primer net303  study period 3, 2017Policy primer net303  study period 3, 2017
Policy primer net303 study period 3, 2017Steve Mckee
 
Keynote baezayates
Keynote baezayatesKeynote baezayates
Keynote baezayatesPROS-UPV
 
Big data in the web
Big data in the webBig data in the web
Big data in the webcaise2013
 
Keynote baezayates
Keynote baezayatesKeynote baezayates
Keynote baezayatescaise2013vlc
 
Unpacking Open Data: power, politics and the importance of infrastructure
Unpacking Open Data: power, politics and the importance of infrastructureUnpacking Open Data: power, politics and the importance of infrastructure
Unpacking Open Data: power, politics and the importance of infrastructureTim Davies
 
Hadoop World 2011: The Hadoop Award for Government Excellence - Bob Gourley -...
Hadoop World 2011: The Hadoop Award for Government Excellence - Bob Gourley -...Hadoop World 2011: The Hadoop Award for Government Excellence - Bob Gourley -...
Hadoop World 2011: The Hadoop Award for Government Excellence - Bob Gourley -...Cloudera, Inc.
 
Informationliteracy
InformationliteracyInformationliteracy
InformationliteracyYvonne M
 
Know4 drr shadrock_roberts_may2015
Know4 drr shadrock_roberts_may2015Know4 drr shadrock_roberts_may2015
Know4 drr shadrock_roberts_may2015know4drr
 

Similaire à Data and Algorithmic Bias in the Web (20)

Ux day2018 ricardo baeza yayes search-biases-semantics
Ux day2018   ricardo baeza yayes search-biases-semanticsUx day2018   ricardo baeza yayes search-biases-semantics
Ux day2018 ricardo baeza yayes search-biases-semantics
 
Creating a Data-Driven Government: Big Data With Purpose
Creating a Data-Driven Government: Big Data With PurposeCreating a Data-Driven Government: Big Data With Purpose
Creating a Data-Driven Government: Big Data With Purpose
 
Webinar v.5.23.11
Webinar v.5.23.11Webinar v.5.23.11
Webinar v.5.23.11
 
Big Data Analytics and Open Data
Big Data Analytics and Open Data Big Data Analytics and Open Data
Big Data Analytics and Open Data
 
Using Big Data to Tell Your Story
Using Big Data to Tell Your StoryUsing Big Data to Tell Your Story
Using Big Data to Tell Your Story
 
Purdue IronHacks
Purdue IronHacksPurdue IronHacks
Purdue IronHacks
 
Policy primer net303 study period 3, 2017
Policy primer net303  study period 3, 2017Policy primer net303  study period 3, 2017
Policy primer net303 study period 3, 2017
 
data, big data, open data
data, big data, open datadata, big data, open data
data, big data, open data
 
Your organization and Big Data: Managing access, privacy, and security
Your organization and Big Data: Managing access, privacy, and securityYour organization and Big Data: Managing access, privacy, and security
Your organization and Big Data: Managing access, privacy, and security
 
Innovations in Data for Decision Making
Innovations in Data for Decision MakingInnovations in Data for Decision Making
Innovations in Data for Decision Making
 
Discovering and mapping your community needs
Discovering and mapping your community needsDiscovering and mapping your community needs
Discovering and mapping your community needs
 
Keynote baezayates
Keynote baezayatesKeynote baezayates
Keynote baezayates
 
Big data in the web
Big data in the webBig data in the web
Big data in the web
 
Keynote baezayates
Keynote baezayatesKeynote baezayates
Keynote baezayates
 
Gettind data used
Gettind data usedGettind data used
Gettind data used
 
Unpacking Open Data: power, politics and the importance of infrastructure
Unpacking Open Data: power, politics and the importance of infrastructureUnpacking Open Data: power, politics and the importance of infrastructure
Unpacking Open Data: power, politics and the importance of infrastructure
 
SLA RGC Universe
SLA RGC Universe SLA RGC Universe
SLA RGC Universe
 
Hadoop World 2011: The Hadoop Award for Government Excellence - Bob Gourley -...
Hadoop World 2011: The Hadoop Award for Government Excellence - Bob Gourley -...Hadoop World 2011: The Hadoop Award for Government Excellence - Bob Gourley -...
Hadoop World 2011: The Hadoop Award for Government Excellence - Bob Gourley -...
 
Informationliteracy
InformationliteracyInformationliteracy
Informationliteracy
 
Know4 drr shadrock_roberts_may2015
Know4 drr shadrock_roberts_may2015Know4 drr shadrock_roberts_may2015
Know4 drr shadrock_roberts_may2015
 

Plus de WebVisions

Christian Titze, "Hello From the Other Side: Adapting the Agile Agency to Cli...
Christian Titze, "Hello From the Other Side: Adapting the Agile Agency to Cli...Christian Titze, "Hello From the Other Side: Adapting the Agile Agency to Cli...
Christian Titze, "Hello From the Other Side: Adapting the Agile Agency to Cli...WebVisions
 
Amélie Lamont, "Design Anthropology 101"
Amélie Lamont, "Design Anthropology 101"Amélie Lamont, "Design Anthropology 101"
Amélie Lamont, "Design Anthropology 101"WebVisions
 
Nate Clinton, "Conversations with Machines"
Nate Clinton, "Conversations with Machines"Nate Clinton, "Conversations with Machines"
Nate Clinton, "Conversations with Machines"WebVisions
 
Thomas Phinney, “Fonts. Everything is Changing. Again.”
Thomas Phinney, “Fonts. Everything is Changing. Again.”Thomas Phinney, “Fonts. Everything is Changing. Again.”
Thomas Phinney, “Fonts. Everything is Changing. Again.”WebVisions
 
The Importance of Side Projects
The Importance of Side ProjectsThe Importance of Side Projects
The Importance of Side ProjectsWebVisions
 
Commit to the Crazy
Commit to the CrazyCommit to the Crazy
Commit to the CrazyWebVisions
 
Intuition and Reason in Design
Intuition and Reason in DesignIntuition and Reason in Design
Intuition and Reason in DesignWebVisions
 
Activism x Technology
Activism x TechnologyActivism x Technology
Activism x TechnologyWebVisions
 
Mike Monteiro, "This is the Golden Age of Design...and We're Screwed"
Mike Monteiro, "This is the Golden Age of Design...and We're Screwed"Mike Monteiro, "This is the Golden Age of Design...and We're Screwed"
Mike Monteiro, "This is the Golden Age of Design...and We're Screwed"WebVisions
 
Mark Wyner, "A New Dawn of the Human Experience"
Mark Wyner, "A New Dawn of the Human Experience"Mark Wyner, "A New Dawn of the Human Experience"
Mark Wyner, "A New Dawn of the Human Experience"WebVisions
 
Kevin Hoyt, "On the Verge of Genius: Smart Cities Workshop"
Kevin Hoyt, "On the Verge of Genius: Smart Cities Workshop"Kevin Hoyt, "On the Verge of Genius: Smart Cities Workshop"
Kevin Hoyt, "On the Verge of Genius: Smart Cities Workshop"WebVisions
 
Art + Commerce
Art + CommerceArt + Commerce
Art + CommerceWebVisions
 
Users are People Too
Users are People TooUsers are People Too
Users are People TooWebVisions
 
Happily Ever After: Pain-Free Prioritization
Happily Ever After: Pain-Free PrioritizationHappily Ever After: Pain-Free Prioritization
Happily Ever After: Pain-Free PrioritizationWebVisions
 
Taming Context in the Internet of Things
Taming Context in the Internet of ThingsTaming Context in the Internet of Things
Taming Context in the Internet of ThingsWebVisions
 
Mind Melds and BattleBots: Creating the Right Kind of Designer/Developer Dynamic
Mind Melds and BattleBots: Creating the Right Kind of Designer/Developer DynamicMind Melds and BattleBots: Creating the Right Kind of Designer/Developer Dynamic
Mind Melds and BattleBots: Creating the Right Kind of Designer/Developer DynamicWebVisions
 
Poetry for Robots: A Digital Humanities Experiment
Poetry for Robots: A Digital Humanities ExperimentPoetry for Robots: A Digital Humanities Experiment
Poetry for Robots: A Digital Humanities ExperimentWebVisions
 
Kent Nichols, "Downshifting Your Life to Rev Up Your Creativity"
Kent Nichols, "Downshifting Your Life to Rev Up Your Creativity"Kent Nichols, "Downshifting Your Life to Rev Up Your Creativity"
Kent Nichols, "Downshifting Your Life to Rev Up Your Creativity"WebVisions
 
Robert Stulle, "Stories From the Agile Agency"
Robert Stulle, "Stories From the Agile Agency"Robert Stulle, "Stories From the Agile Agency"
Robert Stulle, "Stories From the Agile Agency"WebVisions
 
Mona Patel, "Excuses, Excuses, Excuse Personas"
Mona Patel, "Excuses, Excuses, Excuse Personas"Mona Patel, "Excuses, Excuses, Excuse Personas"
Mona Patel, "Excuses, Excuses, Excuse Personas"WebVisions
 

Plus de WebVisions (20)

Christian Titze, "Hello From the Other Side: Adapting the Agile Agency to Cli...
Christian Titze, "Hello From the Other Side: Adapting the Agile Agency to Cli...Christian Titze, "Hello From the Other Side: Adapting the Agile Agency to Cli...
Christian Titze, "Hello From the Other Side: Adapting the Agile Agency to Cli...
 
Amélie Lamont, "Design Anthropology 101"
Amélie Lamont, "Design Anthropology 101"Amélie Lamont, "Design Anthropology 101"
Amélie Lamont, "Design Anthropology 101"
 
Nate Clinton, "Conversations with Machines"
Nate Clinton, "Conversations with Machines"Nate Clinton, "Conversations with Machines"
Nate Clinton, "Conversations with Machines"
 
Thomas Phinney, “Fonts. Everything is Changing. Again.”
Thomas Phinney, “Fonts. Everything is Changing. Again.”Thomas Phinney, “Fonts. Everything is Changing. Again.”
Thomas Phinney, “Fonts. Everything is Changing. Again.”
 
The Importance of Side Projects
The Importance of Side ProjectsThe Importance of Side Projects
The Importance of Side Projects
 
Commit to the Crazy
Commit to the CrazyCommit to the Crazy
Commit to the Crazy
 
Intuition and Reason in Design
Intuition and Reason in DesignIntuition and Reason in Design
Intuition and Reason in Design
 
Activism x Technology
Activism x TechnologyActivism x Technology
Activism x Technology
 
Mike Monteiro, "This is the Golden Age of Design...and We're Screwed"
Mike Monteiro, "This is the Golden Age of Design...and We're Screwed"Mike Monteiro, "This is the Golden Age of Design...and We're Screwed"
Mike Monteiro, "This is the Golden Age of Design...and We're Screwed"
 
Mark Wyner, "A New Dawn of the Human Experience"
Mark Wyner, "A New Dawn of the Human Experience"Mark Wyner, "A New Dawn of the Human Experience"
Mark Wyner, "A New Dawn of the Human Experience"
 
Kevin Hoyt, "On the Verge of Genius: Smart Cities Workshop"
Kevin Hoyt, "On the Verge of Genius: Smart Cities Workshop"Kevin Hoyt, "On the Verge of Genius: Smart Cities Workshop"
Kevin Hoyt, "On the Verge of Genius: Smart Cities Workshop"
 
Art + Commerce
Art + CommerceArt + Commerce
Art + Commerce
 
Users are People Too
Users are People TooUsers are People Too
Users are People Too
 
Happily Ever After: Pain-Free Prioritization
Happily Ever After: Pain-Free PrioritizationHappily Ever After: Pain-Free Prioritization
Happily Ever After: Pain-Free Prioritization
 
Taming Context in the Internet of Things
Taming Context in the Internet of ThingsTaming Context in the Internet of Things
Taming Context in the Internet of Things
 
Mind Melds and BattleBots: Creating the Right Kind of Designer/Developer Dynamic
Mind Melds and BattleBots: Creating the Right Kind of Designer/Developer DynamicMind Melds and BattleBots: Creating the Right Kind of Designer/Developer Dynamic
Mind Melds and BattleBots: Creating the Right Kind of Designer/Developer Dynamic
 
Poetry for Robots: A Digital Humanities Experiment
Poetry for Robots: A Digital Humanities ExperimentPoetry for Robots: A Digital Humanities Experiment
Poetry for Robots: A Digital Humanities Experiment
 
Kent Nichols, "Downshifting Your Life to Rev Up Your Creativity"
Kent Nichols, "Downshifting Your Life to Rev Up Your Creativity"Kent Nichols, "Downshifting Your Life to Rev Up Your Creativity"
Kent Nichols, "Downshifting Your Life to Rev Up Your Creativity"
 
Robert Stulle, "Stories From the Agile Agency"
Robert Stulle, "Stories From the Agile Agency"Robert Stulle, "Stories From the Agile Agency"
Robert Stulle, "Stories From the Agile Agency"
 
Mona Patel, "Excuses, Excuses, Excuse Personas"
Mona Patel, "Excuses, Excuses, Excuse Personas"Mona Patel, "Excuses, Excuses, Excuse Personas"
Mona Patel, "Excuses, Excuses, Excuse Personas"
 

Dernier

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 

Dernier (20)

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 

Data and Algorithmic Bias in the Web

  • 1. 7/20/16 1 Data and Algorithmic Bias in the Web Ricardo Baeza-Yates California (NTENT), Catalonia (UPF), Chile (UChile) WebVisions Barcelona, July 2016 Steve Jobs and Bias Pervasive Optimistic Bias (Kahneman) Reality Distortion Field
  • 2. 7/20/16 2 Every Website is an Information Market Good Design Good Interaction Right Incentives
  • 3. 7/20/16 3 All Data has Bias §  Gender §  Racial §  Sexual §  Religious §  Social §  Linguistic §  Geographic §  Political §  Educational §  Economic §  Technological §  from Noise or Spam §  Validity (e.g. temporal) §  Completeness §  Gathering process §  …. However many people extrapolate results to the whole population (e.g., social media analysis) In addition there is bias when measuring bias as well as bias towards measuring it! Yes, We Live in a (Very) Biased World!
  • 4. 7/20/16 4 A Non-Technical Question Algorithm Biased Data Neutral? Same Bias Not Always! Unbias the data Tune the algorithm Unbias the output Bias awareness! Big Data and Bias 15 §  The quality of any algorithm is bounded by the quality of the data that uses §  Data bias awareness §  Algorithmic fairness §  Key issues for machine learning §  Uniformity of data properties §  In the Web, distributions resemble a power law §  Uniformity of error §  Data sample methodology §  E.g., sample size to see infrequent events or sampling bias issues
  • 5. 7/20/16 5 Data bias Activity bias Selection bias Sampling bias and size Algorithmic bias Interface (Self) selection bias Second order bias Sparsity Privacy Algorithm
  • 6. 7/20/16 6 Quantity Quality User- generated Traditional publishing What is in the Web? How Much Data? How Good is it? 168 million active web servers, 1083 million hostnames and infinitely many pages! 26 What else is in the Web?
  • 7. 7/20/16 7 Noise and Spam 27 §  Noise may come from many places: §  Instruments that measure (e.g., IoT) §  How we interpret the data §  Spam is everywhere §  Fight both with the wisdom of the crowds Data Bias and Redundancy 38 §  There is any dependency in the data? §  There is any duplication? §  Lexical duplication in the Web is around 25% §  Semantic duplication is larger (more later) §  Any other biases? Many! §  Web structure (economic, cultural) §  Web content (linguistic, geography, gender)
  • 8. 7/20/16 8 39 [Baeza-Yates, Castillo & López. Characteristics of the Web of Spain. The Information Professional (Spanish), 2006, vol. 15, n. 1, pp. 6-17] Economic Bias in Links Number of linked domains Exports(thousandsofUS$) 40 Baeza-Yates & Castillo, WWW2006 Exports/Imports vs. Domain Links
  • 9. 7/20/16 9 41 [Baeza-Yates, Castillo, Efthimiadis, TOIT 2007] Website Structure Minimal effortShame 42 Linguistic Bias
  • 10. 7/20/16 10 Geographical Bias [E. Graells-Garrido and M. Lalmas, “Balancing diversity to counter-measure geographical centralization in microblogging platforms”, ACM Hypertext’14] Gender Bias [E. Graells-Garrido et al,. “First Women, Second Sex: Gender Bias in Wikipedia”, ACM Hypertext’15] Systemic bias? Equal opportunity?
  • 11. 7/20/16 11 •  The Web already is influenced by small groups •  "0.05% of the user population, attract almost 50% of all attention within Twitter" (50K users) [Wu, Hofman, Mason & Watts, WWW 2011] •  We explored this issue further with four different datasets: 1.  a large one from Twitter (2011), 2.  a small one from Facebook (2009), 3. Amazon reviews (2013), and 4.  Wikipedia editors (2015). •  Digital desert: the content that is never seen Activity Bias: Wisdom of a Few? [Baeza-Yates & Saez-Trumper, ACM Hypertext 2015] Examples [Baeza-Yates & Saez-Trumper, ACM Hypertext 2015]
  • 12. 7/20/16 12 October 2015 Quality of Content? 51 Yahoo Confidential & Proprietary •  Adding content implies adding wisdom? •  We use Amazon’s reviews helpfulness •  We computed the text entropy •  Content-based-wise users •  How many of those users are being paid?
  • 13. 7/20/16 13 Digital Desert 52 Yahoo Confidential & Proprietary Weblands of Wisdom
  • 14. 7/20/16 14 Bias in the Interface Position bias Ranking bias Presentation bias Social bias Interaction bias Presentation Bias §  Interaction data will be biased to what is shown §  In recommender systems, items recommended will get more clicks than items not recommended §  In search systems top ranked results will get more clicks than other results ›  Ranking bias ›  Interaction bias CTR (log) 1 11 21 Rank [Dupret & Piwowarski, SIGIR 2008] [Chapelle & Zhang, WWW 2009]bias
  • 15. 7/20/16 15 [WHY AMAZON’S RATINGS MIGHT MISLEAD YOU; The Story of Herding Effects Ting Wang and Dashun Wang, Big Data, 2014] Social Bias Extreme Algorithmic Bias
  • 16. 7/20/16 16 Second Order Bias in Web Content [Baeza-Yates, Pereira & Ziviani, Geneological Trees in the Web, WWW 2008] Person Web content is redundant Clicks in results are biased to the ranking and the interaction Query Ranking bias Redundancy grows (35%) Search results New Most measures in the Web follow a power law The Long Tail: Sparsity [Anatomy of the long tail: Ordinary People with Extraordinary Tastes, Goel, Broder, Gabrilovich, Pang; WSDM 2010] §  Why there is a long tail? §  Sampling in the tail §  When the crowd dominates §  Empowering the tail
  • 17. 7/20/16 17 When the Crowd Dominates Kills the long tail 80 Personalization “facets”: •  Language (not always) •  Location •  Semantic facets per user •  Query intent prediction in search Empowering the Tail The Filter “Bubble”, Eli Pariser •  Avoid the Poor get Poorer Syndrome •  Avoid the Echo Chamber •  How to expose opposite views? 81 Cold start problem solution: Explore & Exploit Solutions: •  Diversity •  Novelty •  Serendipity
  • 18. 7/20/16 18 A Data Portrait is a visual context where users can explore how the system understand their interests. This context is used to embed content-based recommendations, displayed visually to facilitate exploration and user engagement. To combat homophily, recommendations are generated having political diversity in mind. Does it work? Yes, by using intermediary topics that are shared! But only when users are interested in politics. Demo at http://auroratwittera.cl/perfil/YahooLabs [E. Graells-Garrido, M. Lalmas and R. Baeza-Yates, ACM UAI 2016] •  Exploit the context (and deep learning!) 91% accuracy to predict the next app you will use [Baeza-Yates et al, WSDM 2015] •  Personalization vs. Contextualization Recall that user interaction is another long tail People Interests Aggregating in theTail
  • 19. 7/20/16 19 [De Choudhury et al, ACM HT 2010] 87[Quercia et al, ACM HT 2014] Crowdsourcing Data: Good Paths
  • 20. 7/20/16 20 Regions from Pictures [Thomee et al, Demo at CHI 2014] AOL Query Logs Release Incident §  No. 4417749 conducted hundreds of searches over a three-month period on topics ranging from “numb fingers” to “60 single men”. §  Other queries: “landscapers in Lilburn, Ga,” several people with the last name Arnold and “homes sold in shadow lake subdivision gwinnett county georgia.” §  Data trail led to Thelma Arnold, a 62-year-old widow who lives in Lilburn, Ga., frequently researches her friends’ medical ailments and loves her three dogs. A Face Is Exposed for AOL Searcher No. 4417749, By MICHAEL BARBARO and TOM ZELLER Jr, The New York Times, Aug 9 2006 90
  • 21. 7/20/16 21 91 Risks of Privacy in Query Logs §  Profile [Jones, Kumar, Pang, Tompkins, CIKM 2007] •  Gender: 84% •  Age (±10): 79% •  Location (ZIP3): 35% §  Vanity Queries [Jones et al, CIKM 2008] •  Partial name: 8.9% •  Complete: 1.2% §  More information: •  A Survey of query log privacy-enhancing techniques from a policy perspective [Cooper, ACM TWEB 2008] §  A good anonymization technique is still an open problem
  • 22. 7/20/16 22 Privacy Awareness § How our privacy changes when we change our social network? § Information gain to predict a private attribute based on public data § Each user may have a promiscuity score § Example: new friendship request Promiscuity( me ) > Promiscuity( new) Promiscuity( me ) ≥ Promiscuity( new ) + max-gain-I-allow Promiscuity( me ) < Promiscuity( new ) + max-gain-I-allow Related work by [Estivill-Castro & Nettleton; Singh, ASONAM 2015] The Web Works Thanks to Bias! § Web traffic ›  Local caching ›  Proxy/Akamai caching § Search engines ›  Answer caching ›  Essential web pages •  25% queries can be answered with less than 1% of the URLs! [Baeza-Yates, Boldi, Chierichetti, WWW 2015] § E-Commerce ›  Large fraction of revenue comes from few popular items Activity bias (Self) selection bias
  • 23. 7/20/16 23 Web Data §  A mirror of ourselves, the good, the bad and the ugly §  The web amplifies everything, good or bad, but always leaves traces §  We have to be aware of the biases and contrarrest them §  We have to be aware of our privacy Big Data of People is huge….. ….. but is tiny compared to the future Big Data of the Internet of Things (IoT) It’s Hard to Get Data to Tell the Truth §  The blindness of the averages §  Look at distributions §  Absolute vs. relative §  Income per capita vs. Inequality §  Local vs. global optimization §  Teams competing without knowing, uncorrelated criteria §  You can always see/torture data as you wish ›  61 analysts, 29 teams: 20 yes and 9 no (Univ. of Virginia, COS)