Big Data Paris 2011: IsCool, Florian Douetteau
1. Big Data
+ Social
+ Games
@Is Cool
16/03/2012
2. Who is IsCool Entertainment?
Social game publisher based in Paris, France
#1 French publisher in terms of audience (450k Daily Active Users) & revenue
2.8 million fans
80 employees
9.1 million € revenue in 2010
4 live applications on Facebook
Agenda
• What do we do? Social Gaming
• What kind of (Big) Analytics do we do? Lots
• How do we do it? Hadoop, Python, R, Tableau, Gephi and stuff…
Florian Douetteau
CTO
@fdouetteau
3. Is Cool Games
IsCool, Delirious Collectible Game
Absolute Solitaire, The best solitaire game available online
Temple Of Mahjong, Collect, Play, Exchange
Belote Multijoueur, Play, Win, Meet
4. Games & Virtual Goods
Play the game & gain some virtual goods
Play again & gain more
Collaborate with other players & gain more
….
Possibly buy
To grow quicker
To help others
5. Virtual Goods Virtual Economy
Virtual goods must not be too easy to get
The game would not be fun!
No monetization
Virtual goods must not be too hard to get
People would churn because of frustration!
Virtual goods can usually be traded between players
Virtual and actual “price” of a good
Let’s trade: 1 Watch against 3 Hammers
6. Why is this Big Data ?
18 million user-generated actions per day
7 billion per year
9.8 TB of data to analyze
Number of object transactions per day
NYSE      3,600,000,000
IsCool    2,150,000,000
Nasdaq    1,600,000,000
Nikkei    1,500,000,000
Footsie     860,000,000
CAC 40      142,500,000
7. The Real Big Data Challenge
Collaborate for collective insights
Programmers’ perspective: Log files & work?
Game designer perspective: Nice charts?
BI veteran: Schema definition?
Business guy perspective: Revenue forecast?
Realtime?
What metrics?
Data scientist?
8. Specifics of Game Analytics
Virtual Goods
We are the factory AND the shop, and most of the products are free.
Social Networks
Network effects are key
Games
The product changes EVERY day!
Sudden wave of unexpected players from Guatemala!
People try to cheat!
9. Use Case 1 : Understanding Users
1: Defining engagement
Key drivers???
Tenure length
Visit frequency
Virality
Traffic
Paying user conversion
ARPPU
Score
Use of feature A,B,C…
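As a rough illustration of some of the drivers listed above, the sketch below computes tenure length, visit frequency and ARPPU from a tiny in-memory event log. The event layout (user id, visit day, amount spent) and the function names are assumptions made for this example; the slides do not describe the actual schema.

```python
from collections import defaultdict
from datetime import date

# Hypothetical event log: (user_id, visit_day, amount_spent_in_euros).
EVENTS = [
    ("alice", date(2011, 3, 1), 0.0),
    ("alice", date(2011, 3, 2), 5.0),
    ("alice", date(2011, 3, 10), 0.0),
    ("bob", date(2011, 3, 5), 0.0),
    ("bob", date(2011, 3, 6), 0.0),
    ("carol", date(2011, 3, 1), 10.0),
]


def engagement_metrics(events):
    """Per-user tenure (days), visit frequency (distinct visit days per tenure day) and spend."""
    days = defaultdict(set)
    spend = defaultdict(float)
    for user, day, amount in events:
        days[user].add(day)
        spend[user] += amount
    metrics = {}
    for user, visited in days.items():
        # Tenure counts both the first and the last day seen.
        tenure = (max(visited) - min(visited)).days + 1
        metrics[user] = {
            "tenure_days": tenure,
            "visit_frequency": len(visited) / tenure,
            "spend": spend[user],
        }
    return metrics


def arppu(metrics):
    """Average revenue per paying user: total spend divided by the number of payers."""
    payers = [m["spend"] for m in metrics.values() if m["spend"] > 0]
    return sum(payers) / len(payers) if payers else 0.0


if __name__ == "__main__":
    m = engagement_metrics(EVENTS)
    print(m["alice"])
    print("ARPPU:", arppu(m))
```

Virality, score and feature usage would need additional event types that this toy log does not model.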
10. Case Study 1 - Segment User Behaviours
2: Describing engagement patterns: Running a segment analysis
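The slide does not say which segmentation method was used. As one common choice, the sketch below runs a plain k-means over two hypothetical per-user features (visits per week, euros spent). The data, the feature choice and the starting centroids are all assumptions for illustration.

```python
import math

# Hypothetical per-user features: (visits_per_week, euros_spent).
USERS = [
    (1, 0.0), (2, 0.0), (1, 1.0),    # occasional, non-paying
    (6, 0.0), (7, 1.0), (6, 2.0),    # frequent, low spend
    (7, 20.0), (6, 25.0),            # frequent, high spend
]
# Starting centroids picked by hand so the run is deterministic.
INITIAL = [USERS[0], USERS[3], USERS[6]]


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    labels, centroids = kmeans(USERS, INITIAL)
    for k, c in enumerate(centroids):
        print("segment", k, "centroid", c, "size", labels.count(k))
```

In practice the features would be normalized first, since raw spend would otherwise dominate the distance.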
11. Use Case 2 : Understanding Users as a whole
10 million nodes
Around 1 000 billion edges
How does the graph evolve in time?
What are the communities?
12. Understanding Users as a Whole
Lots of small clusters (mostly 2 players)
Some mid size communities
A very large community
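One simple way to obtain a size distribution like the one above is to compute connected components of the player graph. The slides do not name the algorithm actually used, so the union-find sketch below is only an assumption, run on a made-up edge list.

```python
from collections import Counter

# Hypothetical "played together" edges between player ids.
EDGES = [
    ("a", "b"),
    ("c", "d"),
    ("e", "f"), ("f", "g"), ("g", "h"), ("h", "e"), ("h", "i"),
]


def component_sizes(edges):
    """Return connected-component sizes, largest first, using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two components

    sizes = Counter(find(node) for node in list(parent))
    return sorted(sizes.values(), reverse=True)


if __name__ == "__main__":
    print(component_sizes(EDGES))
```

Connected components are the coarsest notion of community; finding structure inside the very large component would need a dedicated community-detection method.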
13. Use Case 3 : Analyze the long-term effect of a feature
A/B Tests
Some features can be A/B tested
…and some cannot !
How to measure the uplift ?
Are players using the new feature…
More engaged?
Generate more virality ?
etc….
Complexity
Multiple variables to observe (other features, history)
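For the features that can be A/B tested, uplift on a single binary outcome can be estimated with a two-proportion z-test. The sketch below is a minimal version under that assumption. The counts are invented, and it does not address the harder case raised above (features that cannot be A/B tested, or several interacting variables).

```python
import math


def uplift(conv_a, n_a, conv_b, n_b):
    """Relative uplift of B over A, with a two-sided two-proportion z-test.

    conv_* are the numbers of users showing the outcome, n_* the group sizes.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return (p_b - p_a) / p_a, z, p_value


if __name__ == "__main__":
    # Hypothetical counts: group A without the feature, group B with it.
    relative, z, p_value = uplift(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
    print(f"relative uplift={relative:.1%} z={z:.2f} p={p_value:.3f}")
```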
14. … How
Over the last 3 years
• Tools changed
• Scale changed
• Focus changed
Analyzing the Offer
• Online Analytics Platform
• Commercial / Open Source ETL
• Commercial BI Visualization Software
• Commercial / Open Source databases (column stores)
•…
15. What we learned
Diversity
• There's no Hadoop+R magic (expertise, entry costs, maintenance)
• There’s no XYZ magical product
Relativity
• Windows / Linux? Cloud or on-premise?
• Do you have internal data mining experts (yes/no)?
• Do you have internal scalability experts (yes/no)?
• What is the _real_ budget? 0K? 10K? 100K? 1000K?
Superficiality
• Ability to display is more important than the result.
16. Mixed Approach
SaaS Analytics Platforms
For common, business metrics (virality,
traffic, engagement)
Corporate Level Visibility
Day-to-day
Internal Datawarehousing
Detailed Business Metrics
Virtual Economy Modeling
Long term behaviours
Business Level Visibility
Week-to-Week
Datamining tools
Ad-hoc analytics
Graph Analytics
17. Datawarehouse for the Big Data era
Hadoop/Hive (through Amazon’s Elastic Map Reduce)
• Used to reduce the amount of information: 10 GB a day => 1 GB a day
• High cost of development for "business" related processing
Open Source ETL (PyBabe)
• Pure Python ETL
• Good integration with AWS / S3
• Easy to integrate in our development environment
Columnar Database (InfiniDB, Open Source)
• Free (as in beer)
• Good performance for analytics tasks on a few hundred million lines ( SELECT … GROUP BY … ORDER … )
• Limited features and performance compared to commercial column stores
Dashboarding (Tableau Software)
• + Direct connection to the database
• + Excel fan biz guy can use it with no training!
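To make the SELECT … GROUP BY … ORDER … pattern concrete, the sketch below runs one such aggregation with Python's built-in sqlite3 module. SQLite is only a stand-in here, not the columnar store named on the slide, and the table, columns and rows are hypothetical.

```python
import sqlite3

# Hypothetical virtual-goods transactions: (day, item, quantity).
ROWS = [
    ("2011-03-01", "hammer", 3),
    ("2011-03-01", "watch", 1),
    ("2011-03-01", "hammer", 2),
    ("2011-03-02", "watch", 4),
]


def daily_totals(rows):
    """Aggregate raw transactions into one row per (day, item), biggest totals first within a day."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE transactions (day TEXT, item TEXT, quantity INTEGER)")
        conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
        return conn.execute(
            "SELECT day, item, SUM(quantity) AS total "
            "FROM transactions GROUP BY day, item "
            "ORDER BY day, total DESC"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in daily_totals(ROWS):
        print(row)
```

The same kind of roll-up, applied to raw logs, is what shrinks the daily volume before it reaches the dashboarding layer.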