Google Trends: Normalization & De-Normalization
Shahan A. Memon*, Saquib Razak*, Ingmar Weber+
* Carnegie Mellon University
+ Qatar Computing Research Institute
Google Trends
[..] shows how often a particular search-term is entered
relative to the total search-volume across various
regions of the world, (for different time periods starting
2004), and in various languages.[1]
Google Trends primarily shows two kinds of data: temporal (time-series) and spatial (location-specific).
Temporal versus Spatial Data
[Figure: temporal (time-series) and spatial (by-state) search intensity for the term “diabetes” in the United States, 2004-present.]
Google Trends: Normalization
Google Trends normalizes data across time and space.
Temporal Normalization
Google returns data such that the period with the highest relative search intensity is assigned an arbitrary reference value of 100. All other time periods are scaled relative to this maximum.
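As a minimal sketch of this kind of scaling, the snippet below rescales a series so that its largest value becomes 100. The function name `normalize_to_100` and the input numbers are hypothetical; this illustrates the idea described above and is not Google's actual implementation.

```python
def normalize_to_100(values):
    """Scale values so that the largest one becomes 100."""
    peak = max(values)
    if peak <= 0:
        raise ValueError("need at least one positive value")
    return [100.0 * v / peak for v in values]


# Hypothetical relative search intensities for four time periods
# (arbitrary units, made up for illustration).
intensities = [2, 4, 8, 6]
normalized = normalize_to_100(intensities)
print(normalized)  # [25.0, 50.0, 100.0, 75.0]
```

Only the ratios between periods survive this step; the original scale of the series is no longer visible in the output.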
Spatial Normalization
Similarly, when search data is retrieved across spatial units such as U.S. states, the state with the highest search intensity is assigned a value of 100, and all other spatial units are scaled relative to that peak.
Consequences of Google Trends Normalization
Characterizing a global slow-moving trend
Unlike fast-moving phenomena such as influenza or stock markets, slow-moving trends are measured at a sparse temporal resolution (mostly across months, seasons or years). Examples: rates of diabetes, obesity, unemployment, etc.
○ Fitting a global temporal-only model using Google Trends is hence infeasible because of the sparse set of data points (with yearly data, at most the number of years from 2004 to the present).
Solution: Fit a spatial model to apply across time?
Example: Fit a model using data points for each state and each year to predict next year’s trend.
Modeling temporal variation using spatial-only data
● Spatial data alone does not reveal information about changes in the overall search volume.
○ For example, if the overall search volume for the United States were to go up from 2014 to 2015 with everything else the same (i.e. with the ranking and relative proportions of the states unchanged), spatial data alone would not pick up that temporal trend. In fact, the spatial data for 2014 would look exactly like that for 2015.
● Moreover, the spatial data for each year is normalized independently.
○ Hence, a value of 100 for New Mexico in 2014 cannot be compared with a value of 100 for Mississippi in 2015.
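Both limitations of spatial-only data can be reproduced with a small sketch. All volumes below are hypothetical (real absolute volumes are hidden), and `normalize_to_100` is an illustrative helper, not a Google Trends API.

```python
def normalize_to_100(volumes):
    """Scale a {state: volume} mapping so that the largest value becomes 100."""
    peak = max(volumes.values())
    return {state: 100.0 * v / peak for state, v in volumes.items()}


# Hypothetical absolute search volumes for 2014.
volume_2014 = {"New Mexico": 400, "Mississippi": 300, "Texas": 200}

# Case 1: every state doubles in 2015; proportions are unchanged.
volume_2015_doubled = {s: 2 * v for s, v in volume_2014.items()}

# Case 2: a different state peaks in 2015, at a much higher volume.
volume_2015_shifted = {"New Mexico": 450, "Mississippi": 900, "Texas": 300}

spatial_2014 = normalize_to_100(volume_2014)
spatial_2015_doubled = normalize_to_100(volume_2015_doubled)
spatial_2015_shifted = normalize_to_100(volume_2015_shifted)

# The doubling of the overall volume is invisible in the spatial data.
print(spatial_2014 == spatial_2015_doubled)  # True

# Both peaks are reported as 100 although the hidden volumes are 400 and 900.
print(spatial_2014["New Mexico"], spatial_2015_shifted["Mississippi"])  # 100.0 100.0
```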
Modeling temporal variation using spatio-temporal data
Reconstructing the state-level contribution to the global value:
● Solution 1: Use the actual absolute search volume from each of the states. ABSOLUTE SEARCH VOLUME IS NOT AVAILABLE.
● Solution 2: Approximate the absolute search volume by de-normalizing Google Trends data to create a spatio-temporal index for each state.
Google Trends: De-Normalization
Google Trends: De-Normalization (steps)
1. Collect Spatial as well as Temporal data
2. Choose a reference year r
3. Rescale each value using the following formula:
$$
\underbrace{I_{y,k}}_{\text{new spatio-temporal index}}
= \underbrace{G_l(x_{y,k})}_{\text{spatial data value from Google Trends}}
\times \underbrace{\frac{G_t(z_y)}{G_t(z_r)}}_{\text{ratio of temporal increase from year } r \text{ to year } y}
\times \underbrace{\frac{\sum_{i=1}^{n} G_l(x_{r,i})}{\sum_{i=1}^{n} G_l(x_{y,i})}}_{\text{ratio of the sum of spatial data in year } r \text{ to that of year } y}
$$

Here $I_{y,k}$ denotes the new index for state $k$ in year $y$.

Terminology:
● $G$ signifies that the data was collected via Google Trends.
● $G_l$ signifies spatial data.
● $y$ represents the year.
● $n$ represents the number of states.
● $r$ represents the reference year.
● $G_t(z_y)$ represents the value of the corresponding keyword in year $y$ across time.
● $G_t(z_r)$ represents the value of the corresponding keyword in reference year $r$ across time.
● $\sum_{i=1}^{n} G_l(x_{r,i})$ represents the sum of the regional distribution of the corresponding keyword in the reference year $r$.
● $\sum_{i=1}^{n} G_l(x_{y,i})$ represents the sum of the regional distribution of the corresponding keyword in the year $y$.
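The three steps can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' code: the function name `denormalize`, the dictionary layout and the two-state data are assumptions made for the example. Each spatial value is multiplied by the temporal ratio (year y over reference year r) and by the ratio of the spatial sums (reference year r over year y).

```python
def denormalize(spatial, temporal, ref_year):
    """Rescale spatial values into a spatio-temporal index.

    spatial:  {year: {state: spatial value from Google Trends}}
    temporal: {year: temporal value from Google Trends}
    ref_year: the reference year r
    """
    ref_sum = sum(spatial[ref_year].values())
    index = {}
    for year, by_state in spatial.items():
        # Ratio of temporal increase from year r to year y.
        temporal_ratio = temporal[year] / temporal[ref_year]
        # Ratio of the sum of spatial data in year r to that of year y.
        sum_ratio = ref_sum / sum(by_state.values())
        index[year] = {
            state: value * temporal_ratio * sum_ratio
            for state, value in by_state.items()
        }
    return index


# Hypothetical two-state data, made up for illustration.
spatial = {2014: {"A": 100, "B": 50}, 2015: {"A": 80, "B": 100}}
temporal = {2014: 50, 2015: 100}

index = denormalize(spatial, temporal, ref_year=2014)
print(index[2014])  # the reference year is left unchanged
print(sum(index[2015].values()) / sum(index[2014].values()))
```

A useful sanity check on the result: the yearly sums of the new index grow by the same ratio as the temporal series, here a factor of 2 from 2014 to 2015 (up to floating-point rounding).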
Google Trends: De-Normalization (fictitious example)
1. Imagine a country “Utopia” with three states: Hogsmeade, Metropolis and La La Land.
2. Imagine the Google Trends spatial and temporal data look as follows, with reference year 2011:

States        2011    2012    2013
Hogsmeade     48.3    86.4    100     (spatial data)
Metropolis    100     100     80.3    (spatial data)
La La Land    64.5    12.3    17.8    (spatial data)
Utopia        44.9    72.5    100     (temporal data)
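Because the values are normalized, the absolute search volumes equal the reported values multiplied by hidden scale factors: a, b and c for the spatial data of 2011, 2012 and 2013, and u for the temporal data:

States        2011     2012     2013
Hogsmeade     48.3a    86.4b    100c
Metropolis    100a     100b     80.3c
La La Land    64.5a    12.3b    17.8c
Utopia        44.9u    72.5u    100u

In each year, the state volumes add up to the national volume:

2011: 48.3a + 100a + 64.5a = 44.9u
2012: 86.4b + 100b + 12.3b = 72.5u
2013: 100c + 80.3c + 17.8c = 100u

Factoring out the hidden scale factors:

2011: a (48.3 + 100 + 64.5) = 44.9u
2012: b (86.4 + 100 + 12.3) = 72.5u
2013: c (100 + 80.3 + 17.8) = 100u

In general notation (here $S$ denotes the spatial value of a state):

$$a \sum_{i=1}^{n} G_l(S_{r=2011,i}) = G_t(z_{r=2011}) \cdot u$$
$$b \sum_{i=1}^{n} G_l(S_{y=2012,i}) = G_t(z_{y=2012}) \cdot u$$
$$c \sum_{i=1}^{n} G_l(S_{y=2013,i}) = G_t(z_{y=2013}) \cdot u$$

where a, b, c and u are hidden from us.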
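Solving for the hidden factors:

$$a = \frac{G_t(z_{r=2011}) \cdot u}{\sum_{i=1}^{n} G_l(S_{r=2011,i})}$$
$$b = \frac{G_t(z_{y=2012}) \cdot u}{\sum_{i=1}^{n} G_l(S_{y=2012,i})}$$
$$c = \frac{G_t(z_{y=2013}) \cdot u}{\sum_{i=1}^{n} G_l(S_{y=2013,i})}$$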
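One step is left implicit here: dividing the factor of any year y by the factor of the reference year r cancels the unknown u. Writing b for the factor of year y and a for that of the reference year, a short derivation in the same notation gives:

```latex
\[
\frac{b}{a}
= \frac{G_t(z_y) \cdot u}{\sum_{i=1}^{n} G_l(S_{y,i})}
  \cdot \frac{\sum_{i=1}^{n} G_l(S_{r,i})}{G_t(z_r) \cdot u}
= \frac{G_t(z_y)}{G_t(z_r)}
  \cdot \frac{\sum_{i=1}^{n} G_l(S_{r,i})}{\sum_{i=1}^{n} G_l(S_{y,i})}
\]
```

The right-hand side contains only quantities reported by Google Trends: it is the product of the temporal ratio and the ratio of spatial sums used in the rescaling step, so the ratio b/a can be computed even though a, b and u cannot.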
Now de-normalize with respect to the reference year 2011. Our formula multiplies each spatial value by a factor made of two parts. Up to the common constant u, the part built from the reference-year data equals 1/a (2011 is our example reference year), and the part built from the data of the year being de-normalized equals a, b or c, depending on that year. The Google Trends spatial data

States        2011    2012    2013
Hogsmeade     48.3    86.4    100
Metropolis    100     100     80.3
La La Land    64.5    12.3    17.8

therefore becomes

States        2011        2012        2013
Hogsmeade     48.3 a/a    86.4 b/a    100 c/a
Metropolis    100 a/a     100 b/a     80.3 c/a
La La Land    64.5 a/a    12.3 b/a    17.8 c/a
By applying our de-normalization method, our indices hence differ from the hidden absolute search volume only by the constant factor “a”, which is the same for every state and every year:

Absolute Search Volume (Hidden)
States        2011      2012      2013
Hogsmeade     48.3 a    86.4 b    100 c
Metropolis    100 a     100 b     80.3 c
La La Land    64.5 a    12.3 b    17.8 c

Approximated Absolute Search Volume
States        2011        2012        2013
Hogsmeade     48.3 a/a    86.4 b/a    100 c/a
Metropolis    100 a/a     100 b/a     80.3 c/a
La La Land    64.5 a/a    12.3 b/a    17.8 c/a
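The claim that every index is off by the same constant can be checked with a small round-trip simulation. This sketch assumes the model used in this example (reported values are hidden absolute volumes divided by a per-year scale factor); the "true" volumes below are invented and are not the ones behind the Utopia table.

```python
def normalize_to_100(values):
    """Scale a list of values so that the largest one becomes 100."""
    peak = max(values)
    return [100.0 * v / peak for v in values]


years = [2011, 2012, 2013]
# Hypothetical hidden absolute search volumes, one list entry per year.
true_volume = {
    "Hogsmeade": [120.0, 300.0, 500.0],
    "Metropolis": [250.0, 350.0, 400.0],
    "La La Land": [160.0, 40.0, 90.0],
}
states = list(true_volume)

# What would be reported: spatial data normalized within each year ...
spatial = {}
for k, year in enumerate(years):
    column = normalize_to_100([true_volume[s][k] for s in states])
    spatial[year] = dict(zip(states, column))

# ... and the national totals normalized across years.
totals = [sum(true_volume[s][k] for s in states) for k in range(len(years))]
temporal = dict(zip(years, normalize_to_100(totals)))

# De-normalize with respect to 2011 and compare with the hidden truth.
ref = 2011
ref_sum = sum(spatial[ref].values())
ratios = []
for k, year in enumerate(years):
    factor = (temporal[year] / temporal[ref]) * (ref_sum / sum(spatial[year].values()))
    for s in states:
        index = spatial[year][s] * factor
        ratios.append(true_volume[s][k] / index)

# Every ratio is the same constant "a" (here 250 / 100 = 2.5).
print(min(ratios), max(ratios))
```

In this run the constant is the 2011 peak volume divided by 100, so all nine ratios agree up to floating-point rounding, which is what makes the indices comparable across states and years.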
De-Normalization (accounting for population and internet penetration)
Finally, to de-bias the data at the population level, we would need to adjust each value by the product of the population of each state and its Internet penetration, which approximates the number of Google search users in each state:

New spatio-temporal index =
  (spatial data value from Google Trends)
  × (ratio of temporal increase from year r to year y)
  × (ratio of the sum of spatial data in year r to that of year y)
  × (ratio of the population size of the state for the reference year r to the year y)
  × (ratio of the Internet penetration of the state for the reference year r to the year y)
Questions?
Email: SHAHAN A. MEMON at
samemon@cs.cmu.edu
