SlideShare a Scribd company logo
1 of 3
Download to read offline
The Path to TrstRank
         Building One-Click Twitter Influence Metrics
                                    Since the launch of Twitter, people have clamored for ways to
                                    access and “slice and dice” its data. One of the most common
                                    ways people use the Twitter data corpus is to measure a person’s
                                    importance and influence. Klout is an example of one product that
                                    specializes in this kind of “influencer” data.
What is
TrstRank?                           A few years ago, we created our own special version of Klout,
                                    one that took advantage of our vast historical record of the
TrstRank is an Infochimps           relationships to create an accurate number describing how
developed dataset and API           influential a Twitter user is. It’s called TrstRank and it ranks a user
that provides Twitter influence     on a scale of 1-10, with 10 being the most influential you
metrics. This API provides          can get.
Twitter influence metrics with
the click of a button! TrstRank     Coming up with such a number like TrstRank is no small task.
measures Twitter user               Setting aside the issues of getting the data, there are some very
reputation, importance and          real Big Data problems surrounding the product that require
influence in a far more             special tools for getting it done efficiently. And when you’re a
robust way than counting the        bootstrapped startup, like we were at the time, you have to be
number of followers. It is a        resourceful if you are going to get by.
sophisticated measure of a
user’s relative importance          The biggest issue with pursuing a new data product like TrstRank
within the entire Twitter           is the same one any company faces when they decide to venture
network.                            into new territory - the high risks of wasting time and money.


                                    Wasting Time
                                    One of the first problems you run into as a small team trying your
                                    hand at data science is the excess time spent on server and ma-
                                    chine configuration, instead of focusing on modeling, algorithms,
                                    and manipulating the data.

© 2012 Infochimps, Inc. All rights reserved.                                                           1
Ramp-up time for even the first phase of a project like TrstRank
                                    can be a whole day or more of engineering time.


                                    Wasting Money
                                    From our earliest days Infochimps has been based on Amazon
                                    Web Services’ (AWS) cloud, taking advantage of the flexibility
                                    and scalability it provides. With AWS, you pay for what you use,
                                    so you are always inclined to eliminate waste. In our early days
                                    we even created decision trees for when to shut down a cluster or
                                    not, depending on how many hours it was to be up but not used.


                                    This can set conflicting goals for the data scientist who would
                                    prefer to leave a cluster up overnight, even if it’s unused, so they
                                    don’t have to deal with setting everything up again the next day!


                                    Enter Ironfan
                                    We created Ironfan to solve our own problems of how to save
                                    time and money during our data science operations in the cloud.
                                    When we came up with the idea for TrstRank, it was a simple
                                    operation to spin up a cluster for early analysis and experimenta-
                                    tion. We could validate some of our algorithms and ideas on a
                                    simple cluster before moving to something more heavyweight.

                                    Ironfan and TrstRank, Now
                                    Ironfan has continued as a key tool for our monthly TrstRank
                                    operation. We continue to scrape Twitter for follower information,
                                    and with the updated data every month we crunch the TrstRank
                                    numbers again.


                                    With Ironfan, we’re able to run a multiple step operation on
                                    8 billion tweets on clusters of 30 m1.xlarge EC2 machines,
                                    while only running the resources we need when they’re needed.
                                    TrstRank takes 72 hours to complete, with resources being paid
                                    for commensurately. Without Ironfan, we’d be looking at 2-3x the
                                    costs in time and money!



© 2012 Infochimps, Inc. All rights reserved.                                                         2
About Infochimps
                                    Our mission is to make the world’s data more accessible.
                                    Infochimps helps companies understand their data. We provide
                                    tools and services that connect their internal data, leverage the
                                    power of cloud computing and new technologies such as Hadoop,
                                    and provide a wealth of external datasets, which organizations
                                    can connect to their own data.


                                    Contact Us
                                    Infochimps, Inc.
                                    1214 W 6th St. Suite 202
                                    Austin, TX 78703


                                    1-855-DATA-FUN (1-855-328-2386)


                                    www.infochimps.com
                                    info@infochimps.com


                                    Twitter: @infochimps




                      Get a free Big Data consultation
                          Let’s talk Big Data in the enterprise!

     Get a free conference with the leading big data experts regarding your enterprise big data
     project. Meet with leading data scientists Flip Kromer and/or Dhruv Bansal to talk shop
     about your project objectives, design, infrastructure, tools, etc. Find out how other compa-
     nies are solving similar problems. Learn best practices and get recommendations — free.




© 2012 Infochimps, Inc. All rights reserved.                                                        8

More Related Content

More from Infochimps, a CSC Big Data Business (12)

Infographic: CIOs & Big Data
Infographic: CIOs & Big DataInfographic: CIOs & Big Data
Infographic: CIOs & Big Data
 
5 Big Data Use Cases for 2013
5 Big Data Use Cases for 20135 Big Data Use Cases for 2013
5 Big Data Use Cases for 2013
 
451 Research Impact Report
451 Research Impact Report451 Research Impact Report
451 Research Impact Report
 
[Webinar] Top Strategies for Successful Big Data Projects
[Webinar] Top Strategies for Successful Big Data Projects[Webinar] Top Strategies for Successful Big Data Projects
[Webinar] Top Strategies for Successful Big Data Projects
 
[Webinar] High Speed Retail Analytics
[Webinar] High Speed Retail Analytics[Webinar] High Speed Retail Analytics
[Webinar] High Speed Retail Analytics
 
Infochimps + CloudCon: Infinite Monkey Theorem
Infochimps + CloudCon: Infinite Monkey TheoremInfochimps + CloudCon: Infinite Monkey Theorem
Infochimps + CloudCon: Infinite Monkey Theorem
 
Taming the Big Data Tsunami using Intel Architecture
Taming the Big Data Tsunami using Intel ArchitectureTaming the Big Data Tsunami using Intel Architecture
Taming the Big Data Tsunami using Intel Architecture
 
The Other Way of Doing Big Data
The Other Way of Doing Big DataThe Other Way of Doing Big Data
The Other Way of Doing Big Data
 
Real-Time Analytics: The Future of Big Data in the Agency
Real-Time Analytics: The Future of Big Data in the AgencyReal-Time Analytics: The Future of Big Data in the Agency
Real-Time Analytics: The Future of Big Data in the Agency
 
The Power of Elasticsearch
The Power of ElasticsearchThe Power of Elasticsearch
The Power of Elasticsearch
 
Case Study: Digital Agency Turbocharges Social Listening and Insights with t...
Case Study: Digital  Agency Turbocharges Social Listening and Insights with t...Case Study: Digital  Agency Turbocharges Social Listening and Insights with t...
Case Study: Digital Agency Turbocharges Social Listening and Insights with t...
 
Meet the Infochimps Platform
Meet the Infochimps PlatformMeet the Infochimps Platform
Meet the Infochimps Platform
 

Recently uploaded

[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 

Recently uploaded (20)

[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 

The Path to TrstRank: Building One Click Twitter Influence Metrics

  • 1. The Path to TrstRank Building One-Click Twitter Influence Metrics Since the launch of Twitter, people have clamored for ways to access and “slice and dice” its data. One of the most common ways people use the Twitter data corpus is to measure a person’s importance and influence. Klout is an example of one product that specializes in this kind of “influencer” data. What is TrstRank? A few years ago, we created our own special version of Klout, one that took advantage of our vast historical record of the TrstRank is an Infochimps relationships to create an accurate number describing how developed dataset and API influential a Twitter user is. It’s called TrstRank and it ranks a user that provides Twitter influence on a scale of 1-10, with 10 being the most influential you metrics. This API provides can get. Twitter influence metrics with the click of a button! TrstRank Coming up with such a number like TrstRank is no small task. measures Twitter user Setting aside the issues of getting the data, there are some very reputation, importance and real Big Data problems surrounding the product that require influence in a far more special tools for getting it done efficiently. And when you’re a robust way than counting the bootstrapped startup, like we were at the time, you have to be number of followers. It is a resourceful if you are going to get by. sophisticated measure of a user’s relative importance The biggest issue with pursuing a new data product like TrstRank within the entire Twitter is the same one any company faces when they decide to venture network. into new territory - the high risks of wasting time and money. Wasting Time One of the first problems you run into as a small team trying your hand at data science is the excess time spent on server and ma- chine configuration, instead of focusing on modeling, algorithms, and manipulating the data. © 2012 Infochimps, Inc. All rights reserved. 1
  • 2. Ramp-up time for even the first phase of a project like TrstRank can be a whole day or more of engineering time. Wasting Money From our earliest days Infochimps has been based on Amazon Web Services’ (AWS) cloud, taking advantage of the flexibility and scalability it provides. With AWS, you pay for what you use, so you are always inclined to eliminate waste. In our early days we even created decision trees for when to shut down a cluster or not, depending on how many hours it was to be up but not used. This can set conflicting goals for the data scientist who would prefer to leave a cluster up overnight, even if it’s unused, so they don’t have to deal with setting everything up again the next day! Enter Ironfan We created Ironfan to solve our own problems of how to save time and money during our data science operations in the cloud. When we came up with the idea for TrstRank, it was a simple operation to spin up a cluster for early analysis and experimenta- tion. We could validate some of our algorithms and ideas on a simple cluster before moving to something more heavyweight. Ironfan and TrstRank, Now Ironfan has continued as a key tool for our monthly TrstRank operation. We continue to scrape Twitter for follower information, and with the updated data every month we crunch the TrstRank numbers again. With Ironfan, we’re able to run a multiple step operation on 8 billion tweets on clusters of 30 m1.xlarge EC2 machines, while only running the resources we need when they’re needed. TrstRank takes 72 hours to complete, with resources being paid for commensurately. Without Ironfan, we’d be looking at 2-3x the costs in time and money! © 2012 Infochimps, Inc. All rights reserved. 2
  • 3. About Infochimps Our mission is to make the world’s data more accessible. Infochimps helps companies understand their data. We provide tools and services that connect their internal data, leverage the power of cloud computing and new technologies such as Hadoop, and provide a wealth of external datasets, which organizations can connect to their own data. Contact Us Infochimps, Inc. 1214 W 6th St. Suite 202 Austin, TX 78703 1-855-DATA-FUN (1-855-328-2386) www.infochimps.com info@infochimps.com Twitter: @infochimps Get a free Big Data consultation Let’s talk Big Data in the enterprise! Get a free conference with the leading big data experts regarding your enterprise big data project. Meet with leading data scientists Flip Kromer and/or Dhruv Bansal to talk shop about your project objectives, design, infrastructure, tools, etc. Find out how other compa- nies are solving similar problems. Learn best practices and get recommendations — free. © 2012 Infochimps, Inc. All rights reserved. 8