EDHREC, Magic: TG
Recommendation Engine
(and data science on games)
Donald Miner @donaldpminer dminer@minerkasch.com
September 21st, 2015 - Data Science MD Meetup
Games & Stuff in Glen Burnie, MD
About Don
About Don, Planeswalker
Talk agenda
• Background
• EDHREC Overview
• EDHREC Data Analysis
• EDHREC Architecture
• Data Science Application UX Lessons Learned
• Related Work in Magic and Other Domains
• Virtues of Data Science on Games
Magic: The Gathering
• Trading card game
• First published in 1993
• 20 million players in 2015 (World of Warcraft has 7.1 million subscribers)
• Organized tournaments
• Secondary market
[Image captions: 1993; $27,000]
Elder Dragon Highlander / Commander
• One of the Magic “formats”
• Started independently of WOTC in the late 2000s
• Officially supported starting in 2011
• Typically multiplayer
• 100-card singleton deck (instead of a 60-card deck with up to 4 copies of each card)
• Each deck has a single “commander” (unique to this format)
Data Science
• Term coined around 2008
• Represents a shift in data analysis in industry
• A mix of computer science, machine learning, statistics, programming, visualization, and domain knowledge
EDHREC Overview
EDHREC Deck Recommendations
EDHREC Commander Stats
EDHREC Card Stats
EDHREC Recommendation Engine
EDHREC Algorithm 1.0
User-based Collaborative Filtering
Image from http://blog.comsysto.com/2013/04/03/background-of-collaborative-filtering-with-mahout/
Analogy:
Deck -> User
Card -> Item
Pros:
Better at picking up bigger themes in decks
Easy to implement
Cons:
Had issues discovering subtle deck themes
Had issues pointing out combos
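The deck-as-user analogy above can be sketched in a few lines of Python. This is only an illustration, not the EDHREC code: the slides do not say which similarity measure or neighborhood size Algorithm 1.0 used, so the Jaccard similarity, the `k` parameter, the function names, and the toy decks are all assumptions.

```python
from collections import defaultdict


def jaccard(a, b):
    """Similarity of two card sets: shared cards / all distinct cards."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def recommend_user_based(target, decks, k=2, top_n=3):
    """Recommend cards found in the k decks most similar to `target`.

    Each deck plays the role of a "user" and each card the role of an
    "item". The similarity measure and k are assumptions for this sketch.
    """
    neighbors = sorted(decks, key=lambda d: jaccard(target, d), reverse=True)[:k]
    scores = defaultdict(float)
    for deck in neighbors:
        similarity = jaccard(target, deck)
        for card in deck - target:  # only cards the target deck lacks
            scores[card] += similarity
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda c: (-scores[c], c))[:top_n]


# Made-up decks, far smaller than real 100-card Commander decks.
decks = [
    {"Sol Ring", "Sanguine Bond", "Exquisite Blood", "Swamp"},
    {"Sol Ring", "Sanguine Bond", "Vampire Nighthawk", "Swamp"},
    {"Sol Ring", "Lightning Bolt", "Mountain"},
]
target = {"Sol Ring", "Sanguine Bond", "Swamp"}
recs = recommend_user_based(target, decks, k=2)
print(recs)
```

Because whole decks are compared, a two-card combo contributes very little to deck-level similarity, which is consistent with the "Cons" listed above.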
Recommendation Engine 2.0 Algorithm
Step 1: Card Affinity Matrix (built from 31,000 decks)
Jaccard / Tanimoto coefficient (a similarity score, not a distance):
(Decks that contain Sanguine Bond AND Exquisite Blood) ÷ (Decks that contain Sanguine Bond OR Exquisite Blood)
Repeat for every card combination (15,000 cards)
This is the basis of the Card Analysis page
This matrix is built offline in batch
Image from http://blog.comsysto.com/2013/04/03/background-of-collaborative-filtering-with-mahout/
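As a rough sketch of Step 1, the AND ÷ OR ratio can be computed for every pair of cards by counting how many decks hold each card and each pair. The function name, the nested-dict layout of the result, and the toy decks are assumptions made for illustration; the slides do not show how EDHREC stores the matrix.

```python
from collections import Counter
from itertools import combinations


def card_affinity(decks):
    """Return affinity[a][b] = |decks with a AND b| / |decks with a OR b|."""
    card_counts = Counter()
    pair_counts = Counter()
    for deck in decks:
        cards = sorted(set(deck))  # sorted so each pair has one canonical order
        card_counts.update(cards)
        pair_counts.update(combinations(cards, 2))

    affinity = {}
    for (a, b), both in pair_counts.items():
        # Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
        either = card_counts[a] + card_counts[b] - both
        score = both / either
        affinity.setdefault(a, {})[b] = score
        affinity.setdefault(b, {})[a] = score
    return affinity


# Made-up decks for illustration only.
decks = [
    {"Sanguine Bond", "Exquisite Blood", "Sol Ring"},
    {"Sanguine Bond", "Exquisite Blood"},
    {"Sanguine Bond", "Sol Ring"},
    {"Sol Ring", "Lightning Bolt"},
]
affinity = card_affinity(decks)
print(affinity["Sanguine Bond"]["Exquisite Blood"])
```

Pairs that never share a deck are simply left out, which keeps the structure sparse. Looping over every pair within every deck is the kind of quadratic work that makes an offline batch job a natural fit.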
Recommendation Engine 2.0 Algorithm
Step 2: Calculate Scores
1. Select each row of the Tanimoto matrix corresponding to cards in Deck D
2. Sum the columns
3. Sort by score, display results
This gives you a sum of the Tanimoto coefficients
I really have no idea what this algorithm is called… I’m not sure if it’s novel or not
This is performed in real time
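The three steps above might look like the following in Python. This is a sketch under assumptions: the affinity values are invented, the nested-dict matrix layout and the function name are hypothetical, and skipping cards already in the deck is a choice made here that the slides do not state.

```python
from collections import defaultdict


def score_recommendations(deck, affinity, top_n=10):
    """Sum the Tanimoto coefficients of each candidate card over Deck D."""
    totals = defaultdict(float)
    for card in deck:  # 1. select the row for each card in the deck
        for other, coeff in affinity.get(card, {}).items():
            if other not in deck:  # assumption: don't recommend owned cards
                totals[other] += coeff  # 2. sum the columns
    # 3. sort by score, highest first (ties broken alphabetically)
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Invented coefficients standing in for the precomputed matrix.
affinity = {
    "Sanguine Bond": {"Exquisite Blood": 0.5, "Sol Ring": 0.5},
    "Sol Ring": {
        "Sanguine Bond": 0.5,
        "Exquisite Blood": 0.25,
        "Lightning Bolt": 0.25,
    },
}
deck = {"Sanguine Bond", "Sol Ring"}
ranked = score_recommendations(deck, affinity)
print(ranked)
```

Since the matrix is precomputed, scoring a deck only touches the rows for its own cards, which is cheap enough to run per request.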
Lessons learned:
Taking out the garbage
• A lot of garbage gets submitted to EDHREC:
  • Decks with fewer than 20 cards
  • Decks with invalid commanders
  • Decks with illegal cards
• The algorithms handle this well, and problem cards rarely show up
• However, pruning “worthless” decks significantly improves performance, because so many of the algorithms involved are O(N^2)
General advice: Think about which pieces of data are worthless in your data set
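A pruning pass along these lines could run before any of the batch jobs. The deck layout (`commander` and `cards` keys), the function name, and the two lookup sets are hypothetical. The slides name the three kinds of garbage but not how they are detected.

```python
MIN_CARDS = 20  # threshold taken from the "<20 cards" rule on the slide


def is_usable(deck, legal_commanders, banned_cards):
    """Return False for decks that should be pruned before analysis."""
    cards = deck["cards"]
    if len(cards) < MIN_CARDS:
        return False  # too small to be a real deck
    if deck["commander"] not in legal_commanders:
        return False  # invalid commander
    if any(card in banned_cards for card in cards):
        return False  # contains an illegal card
    return True


def filler(n):
    """Placeholder card names, used only to pad the example decks."""
    return [f"Card {i}" for i in range(n)]


# Hypothetical lookup sets; a real system would load these from card data.
legal_commanders = {"Oloro, Ageless Ascetic"}
banned_cards = {"Black Lotus"}

decks = [
    {"commander": "Oloro, Ageless Ascetic", "cards": filler(25)},
    {"commander": "Oloro, Ageless Ascetic", "cards": filler(5)},
    {"commander": "Sol Ring", "cards": filler(25)},
    {"commander": "Oloro, Ageless Ascetic", "cards": filler(25) + ["Black Lotus"]},
]
kept = [d for d in decks if is_usable(d, legal_commanders, banned_cards)]
print(len(kept), "of", len(decks), "decks kept")
```

With quadratic algorithms downstream, every deck dropped here removes work from every later pairwise step.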
Lessons learned:
Partitioning (too much or too little)
• Partitioning the user/deck space into subgroups is a great way to speed things up in recommendation engines
• The 31,000 EDHREC decks are partitioned into 27 partitions (one per possible color combination)
• Algorithms are typically run on a single partition (e.g., Red/Blue deck recommendations only come from other Red/Blue decks)
• However, themes that span color combinations suffer worse recommendations
• However, partitioning too deep causes problems
• I tried partitioning by commander, and that was awful: new commanders and themes that span commanders suffer
General advice: There is no good way to figure out a partition scheme, just try it out
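One way to picture the color partitioning is to key each deck by the set of colors it uses, so that "UR" and "RU" land in the same bucket. The deck layout, the single-letter color codes, and the function name are assumptions made for this sketch.

```python
from collections import defaultdict


def partition_by_colors(decks):
    """Group decks by color combination, ignoring the order of colors."""
    partitions = defaultdict(list)
    for deck in decks:
        partitions[frozenset(deck["colors"])].append(deck)
    return partitions


# Made-up decks; "U" = Blue, "R" = Red, "G" = Green.
decks = [
    {"name": "A", "colors": "UR", "cards": {"Lightning Bolt", "Counterspell"}},
    {"name": "B", "colors": "RU", "cards": {"Lightning Bolt", "Sol Ring"}},
    {"name": "C", "colors": "G", "cards": {"Llanowar Elves", "Sol Ring"}},
]
partitions = partition_by_colors(decks)

# Recommendations for a Red/Blue deck would draw only on this bucket.
same_colors = partitions[frozenset("UR")]
names = sorted(d["name"] for d in same_colors)
print(names)
```

The tradeoff described above is visible even here: deck C shares Sol Ring with deck B, but that overlap is invisible once the decks sit in different partitions.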
EDHREC Architecture
[Architecture diagram; labeled components include Batch Processes (cron) and Reddit Bot (praw)]
Redis
• In-memory key/value data store
• Stores website state
• Utilized as a cache
• Stores all of the decks
• Stores all of the pre-computed stats
• Stores all metadata about Magic cards
• EDHREC serializes most things to common internal JSON data formats
• Very fast
• Very easy to use
• Good support with Python
• Getting harder to do “analysis”
• Going to move to a Redshift SQL database for analytical things
CherryPy
• “A Minimalist Python Web Framework”
• Runs the website
• Pulls data from Redis and then renders the results as HTML
• Most of the data from Redis is cached in in-memory objects (IPC to Redis is too slow)
• EDHREC runs 6 of these in parallel behind an NGINX round-robin proxy
• Very easy to use, doesn’t get in your way
• Very easy to expose Python data science
• Running into problems with maintainability due to my own sloppiness
Python
• Programming language
• Plenty of good libraries for data analysis: NumPy and pandas in this case
• Can handle the “full stack” well (from data analysis to web front end)
• PRAW is a great framework for building Reddit bots
• Most things run every few hours
Amazon Web Services
• Infrastructure as a Service
• Easily spin up new servers with a pre-built operating system
• EDHREC runs on one m4.2xlarge: 8 CPUs, 32GB RAM, better network; 10 cents per hour ($72/month)
• Great for recovering from failures
• Easy to upgrade the machine
• Very good uptime so far
• Easy to back up to S3
Some observations about User Experience and AI applications
LOL! Look at the dumb bot!
Lesson learned:
Humans LOVE pointing out when something the AI is doing is strange or wrong, even if it gets it right 90% of the time.
Therefore, I am very conservative about what I end up publishing, as I’ve gotten burned a few times. That can be a shame sometimes.
(just a couple of examples)
The apocalypse is near
• “EDHREC is ruining EDH/Commander”
• “EDHREC is taking the fun out of deck construction”
• “EDHREC kills conversation”
MapQuest takes the fun out of planning trips!
• Mostly these are taken as compliments
• AI is going to meet resistance from people who liked the manual labor
• I don’t think the commentary is entirely off base… but...
Sometimes too much is too much
• Over-engineering and doing too much is an easy trap
• You want to make it better and provide more “intelligence”
• Instead, give users the ability to discover and find things:
  • Increases user engagement
  • Better results
• Philosophy: EDHREC is a tool, not a solution
• I’m starting to see my other data science projects this way
Lesson learned:
Spend more time on interactive “discovery tools” than on intelligent do-everything algorithms
Interesting related things to look at
RoboRosewater
• Rosewater is the name of the Magic lead designer
• RoboRosewater is a “backwards” neural network, trained on Magic cards
MTG Finance
Lots of analysis around Magic finance!
mtgstocks.com
Diablo 3 build clustering
Virtues of this whole thing
Community
• Most hobbies are defined by communities
• Technology can bring communities together
Self-Development
• Data has value, and getting data of value is hard
• Hobby-based data is relatively easy to acquire (compared to, say, data used by health care companies)
• A great way to do real data science on real data (as opposed to synthetic data standing in for a more valuable data set)
Profit!
• Hobbyists are passionate about their hobby and willing to spend money on it
• They will pay for and support services they like


Editor's notes

  1. Building a Magic: The Gathering card game recommendation engine and using data science on data about hobbies. In this talk, Don will give an overview of edhrec.com, a service that provides recommendations for a specific style of play in the Magic: The Gathering trading card game called Commander. The service takes user-created "decks", saves them in a database, and then provides recommendations on what other cards that user should be using in their deck. The website has been around for about a year and is visited by over 50,000 players a month as of September 2015. The talk is geared towards people who don't know anything about Magic or Commander, however, and most of the time will be spent discussing: the methods and approaches used, specifically recommendation engines and the common problems when using them in practice; lessons learned about the human factors of having a data-driven service that targets a passionate hobbyist population that doesn't know much about data science or even computer science; and the virtues of spending time on analyzing data for seemingly "toy" domains.