15.071 THE ANALYTICS EDGE
SPRING 2015
Class Time
Section A
Lecture: Mondays and Wednesdays, 1:00pm – 2:30pm, Room E51-315
Recitation: Fridays, 2:00pm – 3:00pm, Room E51-335
Section B
Lecture: Mondays and Wednesdays, 2:30pm – 4:00pm, Room E51-315
Recitation: Fridays, 3:00pm – 4:00pm, Room E51-335
Instructors
Dimitris Bertsimas, E40-147, dbertsim@mit.edu, (617) 253-4223
Allison O’Hair, E40-111, akohair@mit.edu, (617) 452-2116
Teaching Assistants
TBA
Course Description
In the last decade, the amount of data available to organizations has reached
unprecedented levels. Companies and individuals who can use this data together with
analytics give themselves an edge over the competition. In this class, we examine real
world examples of how analytics have been used to transform a business or industry.
These examples include Moneyball, eHarmony, the Framingham Heart Study, Twitter,
IBM Watson, and Netflix. Through these examples and many more, we cover the
following analytics methods and how to implement them: linear regression, logistic
regression, trees, text analytics, clustering, visualization, and optimization.
Readings
The readings are chapters from the following book:
Dimitris Bertsimas, Allison O’Hair and Bill Pulleyblank, The Analytics Edge,
Dynamic Ideas, March 2015.
We refer to this book below as the AE book. Electronic copies of some of the book
chapters are available on the Stellar course webpage (please do not distribute without
permission from the authors). We will also provide a copy of the “Analytics Edge R
Manual” on Stellar.
Contents
1. February 4, 2015 Lecture 1 – Introduction to the Analytics Edge
In the first lecture, we will discuss the logistics and goals of the class, the recent impact
of analytics, and the examples that will be covered during the semester. We will then
discuss analytics software, and start working in R. In preparation for this class, you will
need to install R on your personal computer (instructions on Stellar) and download the
datasets provided on Stellar. The reading for Lecture 1 is the first section of the Analytics
Edge R Manual titled “Introduction to R”.
2. February 9, 2015 Lecture 2 – Predicting Wine Quality
We’ll review linear regression, discuss how linear regression can be used to predict the
quality of wine, and cover linear regression in R. Download the dataset provided on
Stellar so you can follow along in class. The readings for Lecture 2 are the first section of
Chapter 1 of the AE book, titled “Predicting the Quality and Prices of Wine,” the first
section of Chapter 21 of the AE book, titled “Multiple Linear Regression,” and the
second section of the Analytics Edge R Manual titled “Linear Regression in R”.
3. February 11, 2015 Lecture 3 – Moneyball
We will discuss how the Oakland A’s used analytics to become a competitive baseball
team, and how these techniques can be applied to other sports. The reading for Lecture 3
is Chapter 4 of the AE book, titled “How to Evaluate Championship Players.”
4. February 17, 2015 Lecture 4 – The Framingham Heart Study
(NOTE: This class is on Tuesday due to Presidents’ Day)
We will discuss the Framingham Heart Study, which led to one of the top 10 cardiology
advances of the 1900s, and paved the way for clinical decision rules. Through this
example, we’ll start discussing the method of logistic regression, and we’ll use the
original Framingham Heart Study data to build logistic regression models in R. The
readings for Lecture 4 are Chapter 7 of the AE book titled “The Framingham Heart
Study”, the second section of Chapter 21 of the AE book, titled “Logistic Regression”,
and the “Logistic Regression in R” section of the Analytics Edge R Manual.
5. February 18, 2015 Lecture 5 – Quality of Healthcare
We will discuss how analytics can be used to model the expertise of a physician and
predict the quality of healthcare. Through this example, we will continue to discuss the
method of logistic regression. The reading for Lecture 5 is the second section of Chapter
1 of the AE book, titled “Assessing Quality in Healthcare”.
6. February 23, 2015 Lecture 6 – The Supreme Court
We discuss how a group of academics predicted the outcomes of the United States
Supreme Court. Through this example, we will discuss the analytical methods of CART
and Random Forests, and then use data for Supreme Court cases to build models in R.
The readings for Lecture 6 are the third section of Chapter 1 of the AE book, titled
“Forecasting Supreme Court Decisions,” the third section of Chapter 21 of the AE book,
titled “CART and Random Forests” and the “Trees in R” section of the Analytics Edge R
Manual.
7. February 25, 2015 Lecture 7 – D2Hawkeye
We will present the story of D2Hawkeye, a medical data mining company Dimitris
Bertsimas was involved in from 2001-2009, and present how analytics methods,
specifically CART, were used to predict medical knowledge for individual patients. The
reading for Lecture 7 is Chapter 8 of the AE book, titled “Predicting Healthcare Costs.”
8. March 2, 2015 Lecture 8 – Twitter Sentiment Detection
We present how tweets on the social networking site Twitter can be used to understand
public perception and analyze sentiment. Through this example, we’ll introduce the
method of text analytics, and use tweets in R to build models.
9. March 4, 2015 Lecture 9 – The eDiscovery Problem
In Lecture 9, we discuss how text analytics is being used to find files relevant to a
lawsuit. Specifically, we’ll discuss the story of Enron, and how analytics can be used to
detect relevant emails and provide evidence for a legal case.
10. March 9, 2015 Lecture 10 – Netflix and Clustering
We will discuss the Netflix Prize and recommendation systems in general. As an example
of a type of recommendation system, we introduce the method of clustering. The readings
for Lecture 10 are Chapter 13 of the AE book, titled “Recommendations Worth a
Million,” the fourth section of Chapter 21 of the AE book, titled “Clustering” and the
“Clustering in R” section of the Analytics Edge R Manual.
11. March 11, 2015 Lecture 11 – Patterns of Heart Attacks
We present how analytics have been used to understand the patterns of heart attacks. The
reading for Lecture 11 is Chapter 9 of the AE book, titled “Medical Monitoring and
Predictive Diagnosing.”
NO CLASS from March 16 – March 27 due to SIP week and Spring Break.
12. March 30, 2015 Lecture 12 – Fraud Detection
This week, we will discuss examples that have successfully combined many different
analytics methods to create an edge. We will first discuss how predictive methods and
clustering have been used to construct sophisticated algorithms for fraud detection. The
reading for Lecture 12 is Chapter 14 of the AE book, titled “Fraud Detection”.
13. April 1, 2015 Lecture 13 – IBM Watson
We will discuss how IBM built a computer that could beat the best human players at
Jeopardy, a game known for testing human knowledge and reasoning. The reading for
Lecture 13 is Chapter 3 of the AE book, titled “What is Watson?”.
14. April 6, 2015 Lecture 14 – The Power of Visualization
We will discuss the power of visualizations, specifically for WHO, the World Health
Organization. Through this example, we’ll learn how to create visualizations in R.
15. April 8, 2015 Lecture 15 – Data-Driven Policing
We will discuss the use of analytics and visualization in policing. Specifically, we’ll
create heat maps, or “hot spot” maps. These maps are currently being used by police
departments all over the country to allocate resources. The reading for Lecture 15 is
Chapter 15 of the AE book, titled “Predictive Policing”.
16. April 13, 2015 Lecture 16 – Sports Scheduling
We will discuss how professional sports use integer optimization to design sports
schedules, and how analytics methods can significantly outperform human scheduling.
Through this example, we’ll learn how to solve optimization models in a powerful
modeling language.
17. April 15, 2015 Lecture 17 – Revenue Management
We will discuss how optimization is used for revenue management, and how airlines and
casinos have relied on the power of analytics to create a competitive edge. The reading
for Lecture 17 is Chapter 17 of the AE book.
18. April 22, 2015 Lecture 18 – eHarmony
We will discuss how the online dating site eHarmony uses logistic regression and
optimization to predict the probability of love and find perfect matches. Through this
example, we’ll see how the results of a predictive model can be used in an optimization
model to make optimal decisions.
19. April 27, 2015 Lecture 19 – The MIT Blackjack Team
We will discuss how a group of MIT students made millions playing blackjack, and how
strategies were developed using data and simulation. The reading for Lecture 19 is
Chapter 6 of the AE book, titled “The MIT Blackjack Team.”
20. April 29, 2015 Lecture 20 – Emergency Room Operations
We will discuss how simulations and analytics can be used to understand the operations
in an emergency room, and to analyze the effects of different decisions on patient care
and hospital efficiency. The reading for Lecture 20 is Chapter 18 of the AE book.
21. May 4, 2015 Lecture 21 – Social Networks
We will discuss social networks, specifically how the social networks of gangs can be
used to better understand gang dynamics and combat crime. We will also discuss the use
of social networks in other applications. The reading for Lecture 21 is Chapter 16 of the
AE book.
22. May 6, 2015 Lecture 22 – Analytics in Finance
We will discuss the use of analytics in finance, including asset management and options
pricing. The readings for Lecture 22 are Chapters 19 and 20 of the AE book.
23. May 11, 2015 Student Project Presentations
During this lecture, selected students will make 15-minute presentations of their projects.
24. May 13, 2015 Student Project Presentations
During this lecture, selected students will make 15-minute presentations of their projects.
Recitations:
Recitations will be held on Fridays in Room E51-335 (2pm – 3pm for Section A, and
3pm – 4pm for Section B).
The recitations will be interactive sessions, covering additional examples on the analytics
methods learned in class, and how to create models in R. Attendance is strongly
encouraged.
Assignments:
There will be seven homework assignments, and a final project in teams of two.
The following are tentative due dates and topics for the homework assignments:
• February 17: Data analysis and linear regression in R.
• February 23: Logistic Regression.
• March 2: CART and Random Forests.
• March 9: Text analytics.
• March 30: Clustering.
• April 13: Visualization.
• April 27: Optimization.
All homework assignments are due by the beginning of class on the date assigned.
For the final project, by March 11, each team will submit a one-page proposal that
outlines a plan to apply analytical methods to a problem you identify using some of the
concepts and tools discussed in the course. It should include a description of: (1) the
problem, (2) the data that you have or plan to collect to solve the problem, (3) which
analytic techniques you plan to use, and (4) the impact or overall goal of the project (if
you could build a perfect model, what would it be able to do?). The teaching staff will be
available to answer questions over email, and will provide all students with electronic
feedback by March 20.
The week of April 13, each project team will set up a meeting with a member of the
teaching team to show your progress applying the analytical methods you have learned to
your project topic. This meeting is intended to help you progress on your project.
The final project submission consists of a written report of at most 4 pages (not including
appendices) that describes your analysis, as well as a 15-minute presentation (in
PowerPoint or PDF format) of your project. Unfortunately, due to time constraints, we will
not be able to have all student teams present in class. However, ALL TEAMS should be
prepared to give a 15-minute presentation on May 11 or May 13, and all teams are
required to submit their presentation for a grade.
To determine who will present on May 11 and May 13, by midnight on Thursday May 7,
each team will electronically submit a) a one-page abstract summarizing their project
(including the scope and idea of your project, what analytical methods/models you used,
and your results), and b) the presentation. The abstracts will be uploaded to the class
website. Students will vote by the end of the day on Sunday, May 10 about which
projects they would like to see presented in class. The teaching team will vote as well
(taking the abstracts and presentations into account), and the presenters will be notified in
real time during class on May 11 and May 13.
Office Hours:
Allison: Mondays and Wednesdays, 9:30am – 10:30am in E40-111.
Teaching Assistants: TBD
We are also always available by appointment and email.
Policy on Individual Work:
Each homework assignment must represent your own individual work. You may discuss
homework problems with other students, but the assignment you submit must be
entirely your own. Copying from another individual or from
any outside source (including past homework solutions) constitutes a violation of the
Policy on Individual Work. Any student who copies or knowingly allows his/her work to
be copied will receive an F grade for the assignment. If there is a second offense, the
student will receive an F grade in the course.
You may find it useful to discuss broad conceptual issues and general solution procedures with
others. If this is the case, then we enthusiastically recommend that you do so. The objective
here is to learn. In our opinion (and personal experience), the material of this class is best
learned through individual practice and exposure to a variety of application contexts.
Class Participation and Conduct
Your class participation will be evaluated subjectively, but will rely upon measures of
punctuality, attendance, familiarity with the readings, relevance and insight reflected in
classroom questions, and commentary. Relative differences in technical background will
not be a criterion. Although several lectures will be didactic, we will rely heavily upon
interactive discussion within the class. Students will be expected to be familiar with the
readings, even though they might not understand all of the material in advance. In
general, questions and comments are encouraged.
We will require you to bring and use a personal laptop in some class sessions. However,
if we are not using laptops together as a class, we expect your laptops to be closed or only
used for class materials.
Grading:
Grades for the course will be based upon participation (10%), homework assignments
(50%), and the final project (40%).
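As a rough illustration of how these weights combine, the sketch below computes a weighted course score. It assumes each component is scored on a 0 to 100 scale, which the syllabus does not state. The function name and the example scores are hypothetical, and this is not an official grade calculator.

```python
# Weights taken from the syllabus: participation 10%, homework 50%, project 40%.
WEIGHTS = {"participation": 0.10, "homework": 0.50, "project": 0.40}


def course_grade(scores):
    """Return the weighted course score for a dict of component scores.

    Assumption (not from the syllabus): each component is on a 0-100 scale.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    # Weighted sum over the three graded components.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {"participation": 80, "homework": 90, "project": 85}
print(round(course_grade(example), 2))
```

With these hypothetical scores the arithmetic is 0.10 × 80 + 0.50 × 90 + 0.40 × 85 = 8 + 45 + 34 = 87.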
Prerequisites:
It is highly recommended that students have taken 15.060 (Data, Models and Decisions),
or basic statistics and optimization courses.