Data Science Innovation:
Transforming Instagram Data
Into Location Intelligence and Internet of Things
April 2014
Suresh.sood@uts.edu.au
or
linkedin.com/in/sureshsood
Topic Areas
1. Statistics/Data mining or Data Science?
2. Data Science workflows/discovery
3. Research informing our thinking about location intelligence
4. Data Science innovation and exploratory analysis
5. Motivations for Instagram project
6. Pattern mining trajectories/Data mining
7. Instagram analytics tools
8. NoSQL: MongoDB
9. Datafication 3 back end (walkthrough)
10. Location Social Recommender system
11. Q&A
Statistics, Data Mining or Data Science?
• Statistics
– precise deterministic causal analysis over precisely collected data
• Data Mining
– deterministic causal analysis over re-purposed data carefully sampled
• Data Science
– trending/correlation analysis over existing data using the bulk of
the population, i.e. big data
Adapted from:
NIST Big Data taxonomy draft report (see http://bigdatawg.nist.gov/show_InputDoc.php)
Data Science Workflows & Discovery
Useful References Informing our Thinking about
Location Intelligence
Silva et al. (2013) A comparison of Foursquare and Instagram to the study of city
dynamics and urban social behavior. Proceedings of the 2nd ACM SIGKDD
International Workshop on Urban Computing
Instagram and Foursquare datasets might be compatible in finding popular regions of
a city
Chaoming Song, et al. (2010), Limits of Predictability in Human Mobility, Science
There is a potential 93% average predictability in user mobility, an exceptionally high
value rooted in the inherent regularity of human behavior. Yet it is not the 93%
predictability that we find the most surprising. Rather, it is the lack of variability in
predictability across the population.
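Song et al. obtain their predictability figure as an upper bound from Fano's inequality: given a user's entropy S (in bits) over N distinct locations, the maximum predictability Π_max solves S = H(Π_max) + (1 − Π_max) log2(N − 1), where H is the binary entropy. A minimal sketch that solves this equation by bisection, assuming the entropy has already been estimated (the paper estimates it from each user's location history; that estimation step is not shown here). The function names are illustrative.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def fano_rhs(pi: float, n_locations: int) -> float:
    """Right-hand side of the Fano relation: H(pi) + (1 - pi) * log2(N - 1)."""
    return binary_entropy(pi) + (1.0 - pi) * math.log2(n_locations - 1)


def max_predictability(entropy_bits: float, n_locations: int, tol: float = 1e-9) -> float:
    """Solve entropy_bits = fano_rhs(pi, N) for pi in [1/N, 1] by bisection.

    fano_rhs falls monotonically from log2(N) at pi = 1/N to 0 at pi = 1,
    so every entropy in [0, log2(N)] has exactly one solution there.
    """
    if n_locations < 2:
        raise ValueError("need at least two distinct locations")
    if not 0.0 <= entropy_bits <= math.log2(n_locations):
        raise ValueError("entropy must lie in [0, log2(N)]")
    lo, hi = 1.0 / n_locations, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if fano_rhs(mid, n_locations) > entropy_bits:
            lo = mid  # RHS still too large: the solution lies at higher predictability
        else:
            hi = mid
    return (lo + hi) / 2.0
```

The two extremes behave as expected: zero entropy gives Π_max ≈ 1 (a perfectly regular user), and the maximum entropy log2(N) gives Π_max ≈ 1/N (random guessing among N places).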
Scellato et al. (2011), NextPlace: A Spatio-temporal Prediction Framework for
Pervasive Systems. Proceedings of the 9th International Conference on Pervasive
Computing (Pervasive'11)
Daily and weekly routines => Few significant places every day => Regularity in human
activities => Regularity leads to predictability
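The chain "routines => regularity => predictability" can be made concrete with a deliberately simple baseline: for each (weekday, hour) slot, predict the place a user visited most often in that slot. This is not the NextPlace algorithm, which predicts visit times with nonlinear time-series methods; it is a hypothetical sketch, assuming check-ins are available as (datetime, place) pairs.

```python
from collections import Counter, defaultdict
from datetime import datetime


def build_routine(checkins):
    """Map each (weekday, hour) slot to a Counter of places visited in that slot."""
    routine = defaultdict(Counter)
    for when, place in checkins:
        routine[(when.weekday(), when.hour)][place] += 1
    return routine


def predict_place(routine, when):
    """Return the most frequent place for the slot containing `when`, or None."""
    slot = routine.get((when.weekday(), when.hour))
    if not slot:
        return None
    return slot.most_common(1)[0][0]


# Three visits in the same weekly slot (09:00-09:59, same weekday each week)
checkins = [
    (datetime(2014, 4, 7, 9, 15), "office"),
    (datetime(2014, 4, 14, 9, 5), "office"),
    (datetime(2014, 4, 21, 9, 40), "cafe"),
]
routine = build_routine(checkins)
print(predict_place(routine, datetime(2014, 4, 28, 9, 30)))  # office
```

If such a naive slot-frequency predictor already scores well for a user, that user's routine is highly regular, which is the intuition behind the slide.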
De Domenico, M., Lima, A. and Musolesi, M. (2012) Interdependence and Predictability of Human
Mobility and Social Interactions. Proceedings of the Nokia Mobile Data Challenge
Workshop.
We have shown that it is possible to exploit the correlation between movement data and
social interactions in order to improve the accuracy of forecasting of the future geographic
position of a user. In particular, mobility correlation, measured by means of mutual
information, and the presence of social ties can be used to improve movement forecasting
by exploiting mobility data of friends. Moreover, this correlation can be used as an indicator of
the potential existence of physical or distant social interactions and vice versa.
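Mutual information between two users' movements can be estimated directly from their location histories. A minimal plug-in estimator, assuming both histories have been aligned to the same time slots and discretised into symbols (for example, the grid cell each user occupied per hour); that alignment step is an assumption here, and plug-in estimates are biased upward on short sequences.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from two equal-length symbol sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), with the probabilities written as counts / n
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi
```

Two users who always share a location have mutual information equal to the entropy of that location sequence; two users whose positions are statistically unrelated score close to zero.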
Sadilek, A. and Krumm, J. (2012) Far Out: Predicting Long-Term Human Mobility
Where are you going to be 285 days from now at 2pm? … we show that it is possible to
predict the location of a wide variety of hundreds of subjects even years into the future and
with high accuracy.
Useful References Informing our Thinking about
Location Intelligence
“One of the most fascinating aspects of location-based
data is the stability and predictability of patterns that can
be mined from seemingly unrelated data. A cluster of
random dots on a map can represent a daily
transportation route, the most popular dating spots or
the neighborhoods with the highest concentration of
gang violence. These patterns, analyzed over time and in
large numbers, begin to allow for informed predictions of
behaviors and events.
For government, this analytical capability enables better
resource allocation and more effective outcomes.”
Interview with G. Edward DeSeve, former White House ARRA chief administrator,
December 15, 2011. Seen in “The power of zoom: Transforming government
through location intelligence” by Deloitte Consulting LLP
Source: https://www.deloitte.com/assets/Dcom-UnitedStates/Local%20Assets/Documents/Federal/us_fed_govlab_power_of_zoom_report_100212.pdf
Useful References Informing our Thinking about
Location Intelligence
Useful NSW Govt resources on Location Intelligence
• NSW Globe – globe.six.nsw.gov.au
– Uses Google Earth to explore spatial data and images
• NSW Location Intelligence Strategy (April 2014)
– http://www.finance.nsw.gov.au/ict/sites/default/files/NSW Location Intelliegence Strategy.pdf
• NSW Government datasets
– http://data.nsw.gov.au/
Data Science Innovation
Data Science innovation is something an
organization has not done before or even
something nobody anywhere has done before. A
data science innovation focuses on discovering
and using new or untraditional data sources to
solve new problems.
Adapted from:
Franks, B. (2012) Taming the Big Data Tidal Wave, p. 255, John Wiley & Sons
The ANZ Heavy Traffic Index comprises flows
of vehicles weighing more than 3.5 tonnes
(primarily trucks) on 11 selected roads around
NZ. It is contemporaneous with GDP growth.
The ANZ Light Traffic Index is made up of light
or total traffic flows (primarily cars and
vans) on 10 selected roads around the
country. It gives a six-month lead on GDP
growth.
http://www.anz.co.nz/commercial-institutional/economic-markets-research/truckometer/
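Whether one series leads another can be checked by shifting the indicator against the target and looking for the lag with the strongest correlation. A minimal sketch in pure Python, assuming two equally spaced series (for example, an index and GDP growth sampled at the same intervals) and that neither series is constant; it does not reproduce ANZ's own methodology, and a correlation peak on its own does not establish a reliable forecasting lead.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def best_lead(indicator, target, max_lag):
    """Return (lag, r) for the lag in 0..max_lag at which indicator[t]
    correlates most strongly with target[t + lag]."""
    n = min(len(indicator), len(target))
    best = None
    for lag in range(max_lag + 1):
        x = indicator[: n - lag]
        y = target[lag:n]
        if len(x) < 3:
            break  # too few overlapping points to correlate
        r = pearson(x, y)
        if best is None or r > best[1]:
            best = (lag, r)
    return best
```

With monthly data, a "six-month lead" would show up as the best lag being 6; with quarterly data it would be 2.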
Discovery (Exploratory) Analytics
• Exploratory
– Unstructured
– Machine learning
– Data mining
– Complex analysis
– Data diversity
=> Richness of new sources
• Business Intelligence
– Dashboard
– Real-time decisioning
– Alerts
– Fresh data
– Response time
=> Speed of query
Data Science Innovation
New sources of information for data-driven applications and the Internet of Things
Number of journeys made
Distances travelled
Types of roads used
Speed
Time of travel
Levels of acceleration and braking
Any accidents which may occur
The Industrial Ecology Lab: towards an integrated Australian research platform
Black Box Insurance
• Telematics technology (black box) helps assess driving
behavior and deliver truly driver-centric premiums by
capturing:
– Number of journeys
– Distances travelled
– Types of roads
– Speed
– Time of travel
– Acceleration and braking
– Any accidents
• Benefits low-mileage, smooth and safe drivers
• Privacy vs. saving money on insurance (Canada)
– http://bit.ly/Black_box
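Several of the captured measures can be derived directly from raw telematics samples. A minimal sketch, assuming the black box logs (timestamp in seconds, speed in km/h) pairs and using a hypothetical harsh-braking threshold of 3 m/s²; insurers' actual thresholds and scoring formulas are not given in the source.

```python
def summarise_trip(samples, harsh_decel_ms2=3.0):
    """Summarise one trip from (t_seconds, speed_kmh) samples sorted by time.

    Returns distance in km (trapezoidal integration of speed), maximum speed,
    and the number of sampling intervals whose deceleration exceeds the
    (hypothetical) harsh_decel_ms2 threshold.
    """
    distance_km = 0.0
    harsh_intervals = 0
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        distance_km += (v0 + v1) / 2.0 * dt / 3600.0
        decel_ms2 = (v0 - v1) / 3.6 / dt  # km/h lost per second, converted to m/s^2
        if decel_ms2 > harsh_decel_ms2:
            harsh_intervals += 1
    max_speed = max((v for _, v in samples), default=0.0)
    return {
        "distance_km": distance_km,
        "max_speed_kmh": max_speed,
        "harsh_braking_events": harsh_intervals,
    }
```

Counting per sampling interval is a simplification: one long hard stop spanning several samples is counted several times, so a production system would merge consecutive intervals into a single event.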
Internet of Things
“trillion sensors”
Source: www.tsensorssummit.org
Smartphone, Google Glass or Apple Watch Will
Know What You Want Before You Do
“…from 2014 your phone [glasses or watch] will
anticipate your needs, do the research, tell you
what you want to know – sometimes
before the question even occurs to you…”
Chapman, Jake (2013), The Wired World in 2014
Push Notification Providers
1. Appboy
2. Urban Airship
3. StackMob
4. Parse
5. https://notifica.re
6. http://www.xtify.com
7. http://push.io
8. http://streamin.io
9. https://pushbots.com
10. http://appsfire.com
11. mBlox
12. http://quickblox.com/
13. https://www.mobdb.net
14. http://www.elementwave.com
15. Kahuna - http://www.usekahuna.com/
http://www.quora.com/What-are-some-alternatives-to-Urban-Airship-for-mobile-push
Mobile Relationship Management Workflow (Urban Airship)
What? When? Where?
Apple Passbook Styles
Urban Airship
Motivations for Instagram Project
• Trajectory data (not i.i.d. – independent and identically distributed)
• A new authentication approach based on trajectory
• Predictive capability of phones, glasses and watches
• Internet of Things (Sensors, RFID, Wheelchairs and Drones)
• Indoor GPS
• Car parking “anywhere”
• Location based services e.g. advertising
• Tourist recommender system
• Food analytics and traceability (farm fork)
• Mobile apps with trajectory data e.g. Foursquare, Instagram, Nike+ EveryTrial
• Insurance “pay as you drive”– telematics black box based insurance policy
Pattern Mining Trajectories
Group of trajectories
Trajectory patterns:
1. Hot regions (basic unit)
2. Trajectory patterns: relationships amongst regions
Opportunities: location-based networks
Destination prediction
Car-pooling
Personal route planning
Group buying
Loyalty
Credit card data
Adapted from: Chang, Wei, Yeh and Peng, “Discovering Personalised Routes from Trajectories”,
ACM LBSN ’11, Chicago, Illinois, USA, 1 November 2011
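The two-step idea on this slide (find hot regions, then mine relationships amongst them) can be sketched with a uniform grid. Cells visited by at least `min_support` distinct trajectories become hot regions, and each trajectory is rewritten as a sequence of hot regions whose transitions are counted. This grid-and-count approach is a simplification chosen for illustration, not the algorithm of Chang et al.; the cell size and support threshold are hypothetical parameters.

```python
from collections import Counter


def to_cell(lat, lng, cell_deg):
    """Snap a coordinate to an integer grid cell of size cell_deg degrees."""
    return (int(lat // cell_deg), int(lng // cell_deg))


def hot_regions(trajectories, cell_deg=0.01, min_support=2):
    """Cells visited by at least min_support distinct trajectories."""
    support = Counter()
    for traj in trajectories:
        # a set, so each trajectory counts at most once per cell
        support.update({to_cell(lat, lng, cell_deg) for lat, lng in traj})
    return {cell for cell, n in support.items() if n >= min_support}


def region_transitions(trajectories, hot, cell_deg=0.01):
    """Count transitions between consecutive distinct hot regions along each trajectory."""
    transitions = Counter()
    for traj in trajectories:
        seq = []
        for lat, lng in traj:
            cell = to_cell(lat, lng, cell_deg)
            if cell in hot and (not seq or seq[-1] != cell):
                seq.append(cell)
        transitions.update(zip(seq, seq[1:]))
    return transitions
```

Frequent transitions between hot regions are candidate routes, which is the input the listed opportunities (destination prediction, car-pooling, personal route planning) would build on.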
Open Source Artifact Highlighting 68 Data Mining Algorithms
First Australian Instagram Study Conducted by UTS:AAI
Why is Instagram Popular?
• Mobile photo sharing app + social network
• Mobile-first workflow:
– take or select a picture => crop/filter => geo-tag/hashtag/description/share
• Instagram is “Twitter but with photo updates”
• Status updates are transformed photos
• Pictures and accounts are public by default
• Pictures include:
– Geolocation, hashtags, comments and likes
• Mobile-app friendly vs. desktop
Instagram Analytics Tools (off the shelf)
• Statigram
– Lifetime likes
– Total comments
– New followers/last 7 days
– Most liked photos
• Simply Measured
– Total engagement Instagram, Facebook and Twitter
– Engaging photo/filter/location
– Top photos by date
– Active commenters
– Best time for engagement
– Best day for engagement
– Top filters
• Nitrogram
– Countries of followers
– Most engaging
– Most commented
– Likes and comments on a photo
MongoDB - An Innovation in Databases?
“MongoDB gets the job done”
“document-oriented NoSQL database”
“MongoDB is natural choice when dealing with JSON”
“Same data model in code = same model in database”
“Data structure store to model applications”
“In MongoDB an Instagram post can be stored in a single collection, exactly as represented in the program, as one
object. In a relational database an Instagram post would occupy multiple tables.”
“MongoDB understands geo-spatial co-ordinates and supports geo-spatial indexing”
“Initial MongoDB prototype on Red Hat OpenShift (public/private or community ‘Platform as a Service’)”
Recommendation engine integrating Mahout libraries and MongoDB (see Roadmap)
As discussed at “Journey to MongoDB: Trajectory Pattern Mining in Australian Instagram”
by Suresh Sood and Xinhua Zhu
Sydney MongoDB Meetup, 30 April 2013
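The "one post, one document" and geospatial-indexing points can be illustrated with plain Python dicts in the shape PyMongo accepts. The field names below are hypothetical and are not the project's data model (which is described in its separate documentation). GeoJSON stores coordinates as [longitude, latitude], and the `2dsphere` index with `$near`/`$maxDistance` (distance in metres) is available from MongoDB 2.4 onward.

```python
def make_post_document(post_id, user, caption, tags, lat, lng, likes, comments, created):
    """Build one self-contained document for a geotagged post (hypothetical schema)."""
    return {
        "_id": post_id,
        "user": user,
        "caption": caption,
        "tags": list(tags),
        # GeoJSON point: longitude first, then latitude
        "location": {"type": "Point", "coordinates": [lng, lat]},
        "likes": likes,
        "comments": comments,  # embedded list rather than a separate relational table
        "created": created,
    }


def near_query(lat, lng, max_metres):
    """Filter for posts within max_metres of (lat, lng); needs a 2dsphere index on 'location'."""
    return {
        "location": {
            "$near": {
                "$geometry": {"type": "Point", "coordinates": [lng, lat]},
                "$maxDistance": max_metres,
            }
        }
    }
```

Against a running server, these could be used with PyMongo roughly as `posts.create_index([("location", "2dsphere")])`, `posts.insert_one(doc)` and `posts.find(near_query(-33.8568, 151.2153, 500))`, where `posts` is a hypothetical collection handle.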
JSON Sources Driving Internet of Things
• RaZberry
– http://www.theregister.co.uk/Print/2013/09/16/zwave_pi_its_time_the_raspberry_pi_took_control/
• Teradata
– http://www.teradata.com.au/newsrelease.aspx?LangType=3081
• Google
– http://googledevelopers.blogspot.com.au/2012/10/got-big-json-bigquery-expands-data.html
• Rich query language
• Native secondary indexes
• Geospatial indexes & search
• Text indexes & search
• Aggregation framework (see the MongoDB documentation for Release 2.4.9)
• Map-Reduce (JavaScript) implementation
• Client-side analytics
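As an example of the aggregation framework, a popular-hashtag count can be expressed as a pipeline that unwinds each post's tag array, groups by tag and sorts by frequency. The sketch builds that pipeline, assuming posts store hashtags in a `tags` array (a hypothetical field name), plus an in-memory equivalent using `collections.Counter` so the logic can be checked without a server.

```python
from collections import Counter


def popular_hashtags_pipeline(limit=10):
    """Aggregation pipeline: one row per tag with its post count, most frequent first."""
    return [
        {"$unwind": "$tags"},  # one document per (post, tag) pair
        {"$group": {"_id": "$tags", "count": {"$sum": 1}}},
        {"$sort": {"count": -1}},
        {"$limit": limit},
    ]


def popular_hashtags_in_memory(posts, limit=10):
    """Same computation over a list of post dicts, for testing without MongoDB."""
    counts = Counter(tag for post in posts for tag in post.get("tags", []))
    return [{"_id": tag, "count": n} for tag, n in counts.most_common(limit)]
```

With PyMongo the pipeline would be passed as `posts.aggregate(popular_hashtags_pipeline())`, where `posts` is again a hypothetical collection handle. Posts without a `tags` field contribute nothing in either version.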
MongoDB Analytics Support of Instagram Project
Architectural Implementation using MongoDB
[Figure: Name Node; Mongo database distributed across shards holding data collections and collection stats; Map-Reduce]
Instagram via API
Client for Instagram project
datafication.com.au/instagram
Timeline based Trajectory Analysis
Google Map based Trajectory Analysis
Social Relationship Analysis
Location based Retrieval
Popular HashTag Analysis
Popular Image Analysis
Peak Usage Time Analysis
Active User Analysis
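Peak usage time analysis reduces to a histogram of post times by hour of day. A minimal sketch, assuming post creation times are available as Unix timestamps in seconds, and converting to local time with a fixed UTC offset supplied by the caller (here a hypothetical default of +10 hours for Sydney standard time, ignoring daylight saving).

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def posts_per_hour(timestamps, utc_offset_hours=10):
    """Count posts by local hour of day (0-23) using a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return Counter(datetime.fromtimestamp(ts, tz).hour for ts in timestamps)


def peak_hour(timestamps, utc_offset_hours=10):
    """Return the local hour with the most posts, or None if there are no posts."""
    counts = posts_per_hour(timestamps, utc_offset_hours)
    return counts.most_common(1)[0][0] if counts else None
```

A fixed offset keeps the sketch dependency-free; for data spanning daylight-saving changes, a named time zone would be the safer choice.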
Roadmap
Data collection => Individual (group) analysis: find preference and behavior patterns (including trajectory patterns) => Recommendation: recommend the right product (or service) to the right person (or group) at the right time and place
Manually / Automatically
MongoDB / Mahout or Mortar recommender => Recommended trajectories
• Trajectories
• Points of interest
• User profiles
• Image details
• Recommender engine (Mahout or Mortar)
Algorithms / MongoDB Connector for Hadoop, version 1.2.0
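The recommendation step on the roadmap is where the Mahout or Mortar recommender would sit. As a stand-in illustration only (not Mahout's implementation), here is a minimal item-based collaborative filter over user-to-place visit counts: places are compared by the cosine similarity of their visitor vectors, and a user is recommended unvisited places most similar to those already visited. The visit data and function names are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    if dot == 0:
        return 0.0
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm


def recommend_places(visits, user, top_n=3):
    """Item-based collaborative filtering; visits maps user -> {place: visit_count}."""
    # Invert to place -> {user: count} so places can be compared by their visitors
    by_place = defaultdict(dict)
    for u, places in visits.items():
        for place, count in places.items():
            by_place[place][u] = count
    seen = visits.get(user, {})
    scores = {}
    for candidate in by_place:
        if candidate in seen:
            continue
        # weight similarity to each visited place by how often the user went there
        score = sum(cosine(by_place[candidate], by_place[p]) * w for p, w in seen.items())
        if score > 0:
            scores[candidate] = score
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_n]
```

Time and place context ("right time and place") could be added by keying visits on (place, time slot) instead of place alone; that extension is not shown.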
Supporting Documentation
• Instagram project documentation
– Data Model and Data Collection Procedure (V2.0)
• MongoDB Aggregation and Data Processing, Release 2.4.9

 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 

Dernier (20)

专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
 
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 

Transforming Instagram Data into Location Intelligence

  • 1. Data Science Innovation: Transforming Instagram Data Into Location Intelligence and Internet of Things April 2014 Suresh.sood@uts.edu.au or linkedin.com/in/sureshsood
  • 2. Topic Areas 1. Statistics/Data mining or Data Science? 2. Data Science workflows/discovery 3. Research informing our thinking about location intelligence 4. Data Science innovation and exploratory analysis 5. Motivations for Instagram project 6. Pattern mining trajectories/Data mining 7. Instagram analytics tools 8. NoSQL: MongoDB 9. Datafication 3 back end (walk-through) 10. Location Social Recommender system 11. Q&A
  • 3. Statistics, Data Mining or Data Science? • Statistics – precise, deterministic causal analysis over precisely collected data • Data Mining – deterministic causal analysis over carefully sampled, re-purposed data • Data Science – trending/correlation analysis over existing data using the bulk of the population, i.e. big data Adapted from: NIST Big Data taxonomy draft report (see http://bigdatawg.nist.gov/show_InputDoc.php)
  • 5. Useful References Informing our Thinking about Location Intelligence • Silva et al. (2013), A comparison of Foursquare and Instagram to the study of city dynamics and urban social behavior, Proceedings of the 2nd ACM SIGKDD International Workshop on Urban Computing – Instagram and Foursquare datasets might be compatible for finding the popular regions of a city • Chaoming Song et al. (2010), Limits of Predictability in Human Mobility, Science – There is a potential 93% average predictability in user mobility, an exceptionally high value rooted in the inherent regularity of human behavior. Yet it is not the 93% predictability that we find the most surprising. Rather, it is the lack of variability in predictability across the population. • Scellato et al. (2011), NextPlace: A Spatio-temporal Prediction Framework for Pervasive Systems, Proceedings of the 9th International Conference on Pervasive Computing (Pervasive'11) – Daily and weekly routines => few significant places every day => regularity in human activities => regularity leads to predictability
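Slide 5 summarises the NextPlace premise that daily and weekly routines concentrate on a few significant places, and that this regularity makes movement predictable. NextPlace itself predicts when a user will visit significant places and for how long; the sketch below swaps in a much simpler first-order Markov model over a hypothetical sequence of place labels, purely to illustrate how a regular routine turns into predictions. The function names and the data are illustrative assumptions, not part of the cited work.

```python
from collections import Counter, defaultdict


def train_markov(places):
    """Count observed transitions from each place to the next one in a visit sequence."""
    transitions = defaultdict(Counter)
    for current, nxt in zip(places, places[1:]):
        transitions[current][nxt] += 1
    return transitions


def predict_next(transitions, current):
    """Most frequent successor of `current`, or None if it was never left."""
    successors = transitions.get(current)
    if not successors:
        return None
    return successors.most_common(1)[0][0]


def accuracy(transitions, places):
    """Fraction of consecutive pairs in `places` whose successor is predicted correctly."""
    pairs = list(zip(places, places[1:]))
    if not pairs:
        return 0.0
    hits = sum(predict_next(transitions, cur) == nxt for cur, nxt in pairs)
    return hits / len(pairs)


# Hypothetical check-in labels for one user: five regular days, then one deviation.
visits = ["home", "work", "gym"] * 5 + ["home", "work", "cafe"]
model = train_markov(visits)
print(predict_next(model, "work"))        # gym
print(round(accuracy(model, visits), 2))  # 0.94
```

The 0.94 is in-sample accuracy on the same visits used for training, so it overstates real performance; a fairer check would train on earlier visits and predict later ones.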
  • 6. Useful References Informing our Thinking about Location Intelligence • De Domenico, M., Lima, A. and Musolesi, M. (2012), Interdependence and Predictability of Human Mobility and Social Interactions, Proceedings of the Nokia Mobile Data Challenge Workshop – We have shown that it is possible to exploit the correlation between movement data and social interactions in order to improve the accuracy of forecasting of the future geographic position of a user. In particular, mobility correlation, measured by means of mutual information, and the presence of social ties can be used to improve movement forecasting by exploiting mobility data of friends. Moreover, this correlation can be used as an indicator of the potential existence of physical or distant social interactions and vice versa. • Sadilek, A. and Krumm, J. (2012), Far Out: Predicting Long-Term Human Mobility – Where are you going to be 285 days from now at 2pm? …we show that it is possible to predict location of a wide variety of hundreds of subjects even years into the future and with high accuracy.
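Slide 6 notes that mobility correlation can be measured by mutual information. A minimal sketch, assuming two users' locations have already been discretised into aligned time slots (the place labels below are hypothetical), computes the plug-in estimate of mutual information in bits; the estimator used in the cited paper may differ.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, between two aligned label sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # Sum of p(x,y) * log2(p(x,y) / (p(x) * p(y))) over observed pairs.
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


# Hypothetical locations in eight aligned time slots.
alice = ["home", "uni", "uni", "cafe", "home", "uni", "uni", "cafe"]
bob = ["flat", "uni", "uni", "cafe", "flat", "uni", "uni", "cafe"]     # mirrors alice
carol = ["gym", "gym", "work", "gym", "work", "gym", "work", "work"]  # unrelated

print(mutual_information(alice, bob))    # 1.5
print(mutual_information(alice, carol))  # 0.0
```

A user whose movements mirror another's shares all of that user's location entropy (here 1.5 bits), while a schedule that is statistically independent shares none.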
  • 7. Useful References Informing our Thinking about Location Intelligence “One of the most fascinating aspects of location-based data is the stability and predictability of patterns that can be mined from seemingly unrelated data. A cluster of random dots on a map can represent a daily transportation route, the most popular dating spots or the neighborhoods with the highest concentration of gang violence. These patterns, analyzed over time and in large numbers, begin to allow for informed predictions of behaviors and events. For government, this analytical capability enables better resource allocation and more effective outcomes.” Interview with G. Edward DeSeve, former White House ARRA chief administrator, December 15, 2011. Seen in “The power of zoom: Transforming government through location intelligence” by Deloitte Consulting LLP. Source: https://www.deloitte.com/assets/Dcom-UnitedStates/Local%20Assets/Documents/Federal/us_fed_govlab_power_of_zoom_report_100212.pdf
  • 8. Useful NSW Govt resources on Location Intelligence • NSW Globe – globe.six.nsw.gov.au – uses Google Earth to explore spatial data and images • NSW Location Intelligence Strategy (April 2014) – http://www.finance.nsw.gov.au/ict/sites/default/files/NSW Location Intelliegence Strategy.pdf • NSW Government datasets – http://data.nsw.gov.au/
  • 9. Data Science Innovation Data science innovation is something an organization has not done before, or even something nobody anywhere has done before. A data science innovation focuses on discovering and using new or untraditional data sources to solve new problems. Adapted from: Franks, B. (2012) Taming the Big Data Tidal Wave, p. 255, John Wiley & Sons
  • 10. The ANZ Heavy Traffic Index comprises flows of vehicles weighing more than 3.5 tonnes (primarily trucks) on 11 selected roads around NZ. It is contemporaneous with GDP growth. The ANZ Light Traffic Index is made up of light or total traffic flows (primarily cars and vans) on 10 selected roads around the country. It gives a six-month lead on GDP growth. http://www.anz.co.nz/commercial-institutional/economic-markets-research/truckometer/
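Slide 10 says the Light Traffic Index leads GDP growth by about six months. One way to check a claimed lead, sketched here with made-up monthly series rather than ANZ data, is to correlate the indicator at month t with the target at month t + k for several lags k and see where the correlation peaks. The use of Pearson correlation and the synthetic series are assumptions for illustration; ANZ's own methodology is not described on the slide.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def lagged_correlations(indicator, target, max_lag):
    """Correlate indicator[t] with target[t + k] for each lag k = 0..max_lag."""
    return {
        k: pearson(indicator[: len(indicator) - k], target[k:])
        for k in range(max_lag + 1)
    }


# Synthetic monthly series (assumption, not ANZ data): the target repeats the
# indicator six months later, so the correlation should peak at k = 6.
base = [math.sin(t / 2.0) + math.cos(t / 5.0) for t in range(66)]
traffic_index = base[6:]  # hypothetical leading indicator, 60 months
gdp_growth = base[:-6]    # hypothetical target series, 60 months

corrs = lagged_correlations(traffic_index, gdp_growth, max_lag=12)
best_lag = max(corrs, key=corrs.get)
print(best_lag)  # 6
```

On real data the peak would be lower and noisier, and a correlation at a lead does not by itself establish that the indicator forecasts out of sample.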
  • 11. Discovery (Exploratory) Analytics • Exploratory (richness of new sources) – Unstructured – Machine learning – Data mining – Complex analysis – Data diversity • vs. Business Intelligence (speed of query) – Dashboard – Real-time decisioning – Alerts – Fresh data – Response time
  • 12. Data Science Innovation: New sources of information for data-driven applications and the Internet of Things • Number of journeys made • Distances travelled • Types of roads used • Speed • Time of travel • Levels of acceleration and braking • Any accidents which may occur. The Industrial Ecology Lab – towards an integrated Australian research platform
  • 13. Black Box Insurance • Telematics technology (black box) helps assess driving behavior and deliver truly driver-centric premiums by capturing: – Number of journeys – Distances travelled – Types of roads – Speed – Time of travel – Acceleration and braking – Any accidents • Benefits low-mileage, smooth and safe drivers • Privacy vs. saving money on insurance (Canada) – http://bit.ly/Black_box
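Slide 13 lists acceleration and braking among the signals a telematics black box captures. A minimal sketch, assuming evenly spaced 1 Hz speed samples in km/h and a hypothetical harsh-braking threshold of 3 m/s² (the slide gives no insurer's actual thresholds or scoring model), estimates trip distance and counts harsh-braking episodes. `TripSummary` and `summarise_trip` are illustrative names.

```python
from dataclasses import dataclass

KMH_TO_MS = 1000 / 3600  # convert km/h to m/s


@dataclass
class TripSummary:
    distance_km: float
    max_speed_kmh: float
    harsh_brakes: int


def summarise_trip(speeds_kmh, interval_s=1.0, harsh_decel_ms2=3.0):
    """Summarise a trip from evenly spaced speed samples (threshold is hypothetical)."""
    distance_m = 0.0
    harsh_brakes = 0
    in_episode = False  # count each continuous harsh-braking episode once
    for prev, cur in zip(speeds_kmh, speeds_kmh[1:]):
        # Trapezoidal estimate of the distance covered in this interval.
        distance_m += (prev + cur) / 2 * KMH_TO_MS * interval_s
        decel_ms2 = (prev - cur) * KMH_TO_MS / interval_s
        if decel_ms2 >= harsh_decel_ms2:
            if not in_episode:
                harsh_brakes += 1
            in_episode = True
        else:
            in_episode = False
    return TripSummary(distance_m / 1000, max(speeds_kmh), harsh_brakes)


# Hypothetical 1 Hz speed trace (km/h) with two sharp decelerations.
speeds = [0, 10, 20, 30, 40, 40, 40, 25, 10, 10, 20, 30, 30, 15, 0]
trip = summarise_trip(speeds)
print(trip.harsh_brakes, trip.max_speed_kmh)  # 2 40
```

Counting episodes rather than samples means one long hard stop is scored once, not once per second.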
  • 14. Internet of Things “trillion sensors” Source: www.tsensorssummit.org
  • 15. Smartphone, Google Glass or Apple Watch Will Know What You Want Before You Do “…from 2014 your phone [glasses or watch] will anticipate your needs, do the research, tell you what you want to know – sometimes before the question even occurs to you…” Chapman, Jake (2013), The Wired World in 2014
  • 16. Push Notification Providers 1. Appboy 2. Urban Airship 3. StackMob 4. Parse 5. https://notifica.re 6. http://www.xtify.com 7. http://push.io 8. http://streamin.io 9. https://pushbots.com 10. http://appsfire.com 11. mBlox 12. http://quickblox.com/ 13. https://www.mobdb.net 14. http://www.elementwave.com 15. Kahuna – http://www.usekahuna.com/ Source: http://www.quora.com/What-are-some-alternatives-to-Urban-Airship-for-mobile-push
  • 17. Mobile Relationship Management Workflow (Urban Airship): What? When? Where?
  • 19. Motivations for Instagram Project • Trajectory data (not i.i.d. – independent and identically distributed) • A new authentication approach based on trajectory • Predictive capability of phones, glasses and watches • Internet of Things (sensors, RFID, wheelchairs and drones) • Indoor GPS • Car parking “anywhere” • Location-based services, e.g. advertising • Tourist recommender system • Food analytics and traceability (farm to fork) • Mobile apps with trajectory data, e.g. Foursquare, Instagram, Nike+, EveryTrail • Insurance “pay as you drive” – telematics black-box-based insurance policy
  • 20. Pattern Mining Trajectories Group of Trajectories Trajectory Patterns: 1. Hot regions (basic unit) 2. A trajectory pattern is a set of relationships amongst regions Opportunities: location-based networks, destination prediction, car-pooling, personal route planning, group buying, loyalty, credit card data Adapted from: Chang, Wei, Yeh and Peng, “Discovering Personalised Routes from Trajectories”, ACM LBSN’11, Chicago, Illinois, USA, 1 November 2011
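Slide 20 treats hot regions as the basic unit and trajectory patterns as relationships amongst regions. As a simplification of that idea (not a reproduction of Chang et al.'s method), the sketch below assumes geotagged points are snapped to a fixed latitude/longitude grid; the 0.01° cell size, the two-user support threshold and all coordinates are hypothetical. It keeps cells visited by enough distinct users as hot regions, then counts moves between hot regions across each user's time-ordered posts.

```python
import math
from collections import Counter


def to_cell(lat, lon, cell_deg=0.01):
    """Snap a coordinate to a grid cell (0.01 degrees is roughly 1 km)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def hot_regions(trajectories, min_users=2, cell_deg=0.01):
    """Grid cells visited by at least `min_users` distinct users."""
    users_per_cell = {}
    for user, points in trajectories.items():
        for lat, lon in points:
            users_per_cell.setdefault(to_cell(lat, lon, cell_deg), set()).add(user)
    return {cell for cell, users in users_per_cell.items() if len(users) >= min_users}


def region_transitions(trajectories, hot, cell_deg=0.01):
    """Count moves between different hot regions, ignoring points outside them."""
    counts = Counter()
    for points in trajectories.values():
        cells = (to_cell(lat, lon, cell_deg) for lat, lon in points)
        path = [c for c in cells if c in hot]
        for a, b in zip(path, path[1:]):
            if a != b:
                counts[(a, b)] += 1
    return counts


# Hypothetical geotagged points (lat, lon), and three users' posts in time order.
cbd = (-33.8655, 151.2095)
beach = (-33.8915, 151.2745)
campus = (-33.8835, 151.2005)
suburb = (-33.7505, 151.1505)
trajectories = {
    "u1": [cbd, beach, cbd],
    "u2": [cbd, beach, campus],
    "u3": [campus, cbd, beach, suburb],
}
hot = hot_regions(trajectories)
counts = region_transitions(trajectories, hot)
print(len(hot), counts[(to_cell(*cbd), to_cell(*beach))])  # 3 3
```

The most frequent transition (here the hypothetical CBD cell to the beach cell, taken by all three users) is the kind of region-to-region relationship that destination prediction or route planning could build on.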
  • 21. Open Source Artifact Highlighting 68 Data Mining Algorithms
  • 22. First Australian Instagram Study Conducted by UTS:AAI
  • 23. Why is Instagram Popular? • Mobile photo sharing app + social network • Mobile-first workflow: – take or select a picture => crop/filter => geo-tag/hashtag/description/share • Instagram is “Twitter but with photo updates” • Status updates are transformed photos • Default is pictures, and accounts are public • Pictures include: – geolocation, hashtags, comments and likes • Mobile-app friendly vs. desktop
  • 24. Instagram Analytics Tools (off the shelf) • Statigram – Lifetime likes – Total comments – New followers/last 7 days – Most liked photos • Simply Measured – Total engagement Instagram, Facebook and Twitter – Engaging photo/filter/location – Top photos by date – Active commenters – Best time for engagement – Best day for engagement – Top filters • Nitrogram – Countries of followers – Most engaging – Most commented – Likes and comments on a photo
  • 25. MongoDB – An Innovation in Databases? “MongoDB gets the job done” “document-oriented NoSQL database” “MongoDB is a natural choice when dealing with JSON” “Same data model in code = same model in database” “Data structure store to model applications” “In MongoDB an Instagram post can be stored in a single collection, exactly as represented in the program, as one object. In a relational database an Instagram post would occupy multiple tables.” “MongoDB understands geo-spatial co-ordinates and supports geo-spatial indexing” Initial MongoDB prototype on Red Hat OpenShift (public/private or community “Platform as a Service”) Recommendation engine integrating Mahout libraries and MongoDB (see Roadmap) As discussed at Journey to MongoDB: Trajectory Pattern Mining in Australian Instagram by Suresh Sood and Xinhua Zhu, Sydney MongoDB Meetup, 30 April 2013
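Slide 25 argues that an Instagram post can be kept in MongoDB as one object and that MongoDB supports geospatial indexing. A minimal sketch of what that could look like, built as plain Python dictionaries so it runs without a database: the field names and values are hypothetical (they are neither the Instagram API schema nor the project's documented data model), while the `2dsphere` index type, the GeoJSON `[longitude, latitude]` ordering and the `$near` / `$geometry` / `$maxDistance` query operators are standard MongoDB features (2dsphere indexes arrived in MongoDB 2.4).

```python
import json
from datetime import datetime, timezone

# One Instagram post kept as one nested document. Field names are hypothetical,
# not the Instagram API schema or the project's documented data model.
post = {
    "user": {"id": "u1", "username": "example_user"},
    "created_time": datetime(2014, 4, 12, 9, 30, tzinfo=timezone.utc),
    "caption": "Morning at the harbour #sydney",
    "tags": ["sydney"],
    "filter": "Valencia",
    "likes": 42,
    "comments": [{"from": "u2", "text": "Great shot"}],
    # GeoJSON point: MongoDB expects [longitude, latitude], in that order.
    "location": {"type": "Point", "coordinates": [151.2153, -33.8568]},
}

# Index specification for a 2dsphere (spherical geometry) index on "location".
geo_index = [("location", "2dsphere")]


def near_query(lon, lat, max_metres):
    """Filter for posts within max_metres of a point, nearest first."""
    return {
        "location": {
            "$near": {
                "$geometry": {"type": "Point", "coordinates": [lon, lat]},
                "$maxDistance": max_metres,  # metres when used with GeoJSON points
            }
        }
    }


query = near_query(151.2153, -33.8568, 500)
print(json.dumps(query))
```

Assuming a current pymongo and a running server, `collection.insert_one(post)`, `collection.create_index(geo_index)` and `collection.find(query)` would store the post, build the index and run the proximity search; the comments, tags and location all stay inside the one document rather than being split across tables.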
  • 26. JSON Sources Driving Internet of Things • RaZberry – http://www.theregister.co.uk/Print/2013/09/16/zwave_pi_its_time_the_raspberry_pi_took_control/ • Teradata – http://www.teradata.com.au/newsrelease.aspx?LangType=3081 • Google – http://googledevelopers.blogspot.com.au/2012/10/got-big-json-bigquery-expands-data.html
  • 27. MongoDB Analytics Support of Instagram Project • Rich query language • Native secondary indexes • Geospatial indexes & search • Text indexes & search • Aggregation framework (see MongoDB documentation for Release 2.4.9) • Map-Reduce (JavaScript) implementation • Client-side analytics
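Slide 27 lists the aggregation framework among the MongoDB features used, and slide 36 shows a peak usage time analysis. A minimal sketch, assuming each post stores `created_time` as a BSON date (the Instagram API of the period returned a Unix timestamp string, so it would need converting at collection time), groups posts by hour of day using the standard `$group`, `$hour`, `$sum` and `$sort` stages; a pure-Python equivalent is included so the logic can be checked without a server. `$hour` reports the UTC hour, so a Sydney-time analysis would need an offset.

```python
from collections import Counter
from datetime import datetime, timezone

# Aggregation pipeline: count posts per UTC hour, busiest hour first.
# Would be passed to collection.aggregate(pipeline) on a real collection.
pipeline = [
    {"$group": {"_id": {"$hour": "$created_time"}, "posts": {"$sum": 1}}},
    {"$sort": {"posts": -1}},
]


def posts_per_hour(posts):
    """Pure-Python equivalent of the pipeline, for checking without a server."""
    counts = Counter(p["created_time"].astimezone(timezone.utc).hour for p in posts)
    return [{"_id": hour, "posts": n} for hour, n in counts.most_common()]


# Hypothetical posts at 08:15, 08:15, 12:15 (x3) and 21:15 UTC.
sample = [
    {"created_time": datetime(2014, 4, 12, h, 15, tzinfo=timezone.utc)}
    for h in (8, 8, 12, 12, 12, 21)
]
print(posts_per_hour(sample)[0])  # {'_id': 12, 'posts': 3}
```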
  • 28. Architectural Implementation using MongoDB [Diagram labels: Instagram via API; Data Collection; Name Node; Mongo Database distributed across shards; Stats; Map Reduce]
  • 29. Client for Instagram project datafication.com.au/instagram
  • 31. Google Map based Trajectory Analysis
  • 36. Peak Usage Time Analysis
  • 38. Roadmap • Data collection • Individual (group) analysis: find preference and behavior patterns (including trajectory patterns) • Recommendation: recommend the right product (or service) to the right person (or group) at the right time and place • Manually => Automatically
  • 39. [Diagram: MongoDB; Mahout or Mortar Recommender; Recommended Trajectories] • Trajectories • Points of Interest • User profiles • Image details • Recommender engine (Mahout or Mortar) algorithms • MongoDB Connector for Hadoop version 1.2.0
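Slide 39 places a recommender engine (Mahout or Mortar) between MongoDB and the recommended trajectories. The sketch below is not Mahout's or Mortar's implementation; as a stand-in, it is a minimal pure-Python item-based collaborative filter, assuming binary user-to-point-of-interest visit data (all users and places are hypothetical), that scores each unvisited place by its cosine similarity to the places a user has already visited.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sets treated as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend(visits, user, top_n=2):
    """Rank places `user` has not visited by summed similarity to places they have."""
    # Invert user -> places into place -> set of visiting users.
    visitors = {}
    for u, places in visits.items():
        for place in places:
            visitors.setdefault(place, set()).add(u)
    seen = visits[user]
    scores = {}
    for candidate, cand_users in visitors.items():
        if candidate in seen:
            continue
        scores[candidate] = sum(cosine(cand_users, visitors[p]) for p in seen)
    # Highest score first; ties broken alphabetically for deterministic output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [place for place, score in ranked[:top_n] if score > 0]


# Hypothetical points of interest visited by four users.
visits = {
    "ann": {"harbour", "museum", "beach"},
    "ben": {"harbour", "museum", "beach", "markets"},
    "cat": {"markets", "park"},
    "dan": {"harbour", "museum"},
}
print(recommend(visits, "dan"))  # ['beach', 'markets']
```

Places with zero similarity (here the park, visited only by an unrelated user) are dropped rather than recommended at random.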
  • 40. Supporting Documentation • Instagram project documentation – Data Model and Data Collection Procedure (V2.0) • MongoDB Aggregation and Data Processing Release 2.4.9