1. Rohit Garg
Email: pgp12091.rohit@iimraipur.ac.in
Mobile: (+91) 97524 46191
Rohit Garg (12pgp091.rohit@iimraipur.ac.in)
METHODOLOGY – Blockbuster Movies
Disclaimer: The OSCAR list has been compiled from the data provided for more than
2,000 movies. It names the people who were part of movies that achieved an IMDB
rating of more than 7. These people may or may not have received an Oscar in real
life.
Data Preparation:
1. The data has Movie title, imdbID, Writer, Director, Actors, Genre, Runtime,
and imdbRating. (The data has 2,576 rows).
2. A movie can have more than one Writer, Director, Actor, and Genre. In Excel,
the ‘Text to Columns’ option was used to split the Writer/s, Director/s,
Actor/s, and Genre/s into separate columns, with ‘,’ as the delimiter.
3. Runtime was given in hours and minutes and was converted to minutes.
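The two preparation steps above were done in Excel; a minimal Python sketch of
the same idea is shown here for illustration. The runtime format ('1 h 44 min')
and the function names are assumptions, since the source only says the runtime
was in hours and minutes.

```python
import re

def split_multi(value, delimiter=","):
    """Split a delimited cell such as 'Action, Drama' into clean parts."""
    return [part.strip() for part in value.split(delimiter) if part.strip()]

def runtime_to_minutes(runtime):
    """Convert a runtime such as '1 h 44 min' or '95 min' to total minutes.

    The input format is an assumption, not taken from the original data.
    """
    hours = re.search(r"(\d+)\s*h", runtime)
    minutes = re.search(r"(\d+)\s*min", runtime)
    if not hours and not minutes:
        raise ValueError(f"unrecognised runtime: {runtime!r}")
    total = int(hours.group(1)) * 60 if hours else 0
    total += int(minutes.group(1)) if minutes else 0
    return total
```

For example, split_multi("Action, Drama") gives one entry per genre, and
runtime_to_minutes("1 h 44 min") gives the runtime as a single number of minutes.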
Analysis:
1. The analysis is divided into 4 parts: TRENDS (Data Confessions), OSCARS (the
Writer/s, Director/s, and Actor/s who were part of movies that achieved an
IMDB rating of more than 7), SORCERY (Prediction), and SIR ORACLE
(Recommendations).
2. Two datasets were created:
a. First dataset is for TRENDS.
b. Second dataset is for OSCARS, SORCERY, and SIR ORACLE.
Methodology (OSCARS, SORCERY, and SIR ORACLE):
(No methodology was needed for TRENDS, as it is purely data observation.)
1. This section highlights the methodologies that we employed for data
prediction and recommendations.
2. We employed a rather straightforward method built around one question:
“What should the producer of a movie do to get an IMDB rating of more than
7?” (A movie with an IMDB rating of more than 7 is considered successful.)
3. We prepared a dataset of the movies that had an IMDB rating of more than 7
from the 2,576 movies given. (This left 396 rows.)
4. Then we made a comprehensive list of all the Writers, Directors, and Actors
present in the 396 movies.
5. We made a pivot table of Genre vs. all the Writers, Directors, and Actors.
Then, for each Genre, we checked the number and percentage of movies they
were a part of. (If a person was not part of a Genre, the count is 0.)
6. The top Writers, Directors, and Actors were selected. These are the people
who had the highest percentage in a particular Genre. We called them OSCAR
winners.
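The pivot-and-select steps above can be sketched in Python as follows. This is
an illustration under stated assumptions, not the original Excel workbook: each
movie is assumed to be a dict with list-valued "Genre" and role fields, the
percentage is assumed to be a person's movie count divided by the number of
qualifying movies in that Genre, and the function name is hypothetical.

```python
from collections import Counter, defaultdict

def top_people_by_genre(movies, role, min_rating=7.0):
    """Return {genre: (best_share_percent, [people with that share])} for one role."""
    genre_totals = Counter()        # genre -> number of qualifying movies
    counts = defaultdict(Counter)   # genre -> person -> number of movies
    for movie in movies:
        if movie["imdbRating"] <= min_rating:
            continue  # keep only movies rated above the threshold
        for genre in movie["Genre"]:
            genre_totals[genre] += 1
            for person in set(movie[role]):
                counts[genre][person] += 1
    winners = {}
    for genre, people in counts.items():
        best = max(people.values())
        share = 100.0 * best / genre_totals[genre]  # assumed definition of "percentage"
        winners[genre] = (share, sorted(p for p, n in people.items() if n == best))
    return winners
```

Calling the function once per role ("Writer", "Director", "Actors") would give
one candidate OSCAR list per role and Genre. Ties are kept rather than broken,
since the source does not say how ties were handled.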
TRENDS (Please refer to info-graphics for output):
1. In this section we tried to determine whether IMDB rating has an effect on
Box Office collections, whether month of release has an effect on IMDB rating
and Box Office collections, and whether runtime has an effect on IMDB rating
and Box Office collections.
2. We kept only the rows where the BoxOffice amount is present and given in
millions of USD. (Rows with no amount were dropped.)
3. We kept only the rows where both the IMDB rating and the Tomato rating are
present.
4. We kept only the rows where the runtime is given. (This left 840 rows.)
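The three row filters above can be sketched as a single predicate. The BoxOffice
text format ('$12.3M') and the field name "tomatoRating" are assumptions made
for illustration; the source does not show how these columns were stored.

```python
def parse_box_office_millions(text):
    """Return the amount in millions of USD, or None if absent or in another unit.

    Assumes values look like '$12.3M'; this format is hypothetical.
    """
    if not text or not text.startswith("$") or not text.endswith("M"):
        return None
    try:
        return float(text[1:-1].replace(",", ""))
    except ValueError:
        return None

def keep_for_trends(row):
    """Apply the TRENDS filters to one row (a dict of column name -> value)."""
    return (
        parse_box_office_millions(row.get("BoxOffice")) is not None
        and row.get("imdbRating") is not None
        and row.get("tomatoRating") is not None
        and row.get("Runtime") is not None
    )
```

Filtering the full table is then a list comprehension such as
[row for row in rows if keep_for_trends(row)].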
OSCARS, SORCERY, and SIR ORACLE (Please refer to info-graphics for output):
1. OSCAR: Based on the methodology employed, we got a comprehensive list of
the best Writers, Directors, and Actors for each Genre.
2. SORCERY: The list of 50 upcoming movies was checked against the OSCAR list.
A Flag variable was created for each Writer, Director, and Actor. Flag
variable = “Strong” means the person was found in the OSCAR list and Flag
variable = “Weak” means not found. (If a movie has more than 1 Genre, we
checked all of its Genres.)
a. SORCERY (top movies): At least 1 Flag variable = “Strong”.
b. SORCERY (blockbusters): More than 1 Flag variable = “Strong”.
3. SIR ORACLE: We made recommendations based on TRENDS and on the OSCAR list.
Based on TRENDS, we can advise a producer on the release month, runtime, and
rating that will help the movie earn a jackpot. Based on the OSCAR list, we
can advise a producer on choosing the Writer/s, Director/s, and Actor/s for a
movie in a particular Genre. (If the movie falls into more than 1 Genre, the
options increase.)
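The SORCERY flagging rule can be sketched as below. The data layout is an
assumption: the OSCAR list is represented as a dict keyed by (genre, role), and
the function names are hypothetical. Under the rule in the text every
blockbuster is also a top movie; the sketch reports only the stronger label.

```python
ROLES = ("Writer", "Director", "Actors")

def flag_credits(movie, oscar_list):
    """Return a list of (role, person, flag) with flag 'Strong' or 'Weak'.

    oscar_list maps (genre, role) to the set of OSCAR names (assumed layout).
    A person is 'Strong' if listed for any of the movie's genres.
    """
    flags = []
    for role in ROLES:
        for person in movie[role]:
            strong = any(
                person in oscar_list.get((genre, role), set())
                for genre in movie["Genre"]
            )
            flags.append((role, person, "Strong" if strong else "Weak"))
    return flags

def classify(movie, oscar_list):
    """Label a movie by how many of its credits are flagged 'Strong'."""
    strong = sum(1 for _, _, flag in flag_credits(movie, oscar_list) if flag == "Strong")
    if strong > 1:
        return "blockbuster"   # more than 1 Strong flag
    if strong == 1:
        return "top movie"     # at least 1 Strong flag
    return "neither"
```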
Failed Methodologies (OSCARS, SORCERY, and SIR ORACLE):
1. This section describes the methodologies we also tried for data prediction
and recommendations, but whose results were not satisfactory.
2. The dataset was divided into smaller datasets based on Genre.
3. For each Genre we ran a regression analysis, with IMDB rating as the
dependent variable and Writer/s, Director/s, and Actor/s as the independent
variables. However, the R-square was low.
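For reference, R-square measures the share of the variance in the dependent
variable that the regression explains. A minimal sketch of the standard
calculation, given the actual ratings and a model's predicted ratings, is shown
below; the function name is illustrative and this is not the original analysis.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_tot = sum((y - mean) ** 2 for y in actual)                   # total variation
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))   # unexplained variation
    return 1.0 - ss_res / ss_tot
```

A value near 1 means the predictions track the ratings closely; a value near 0
means the model does little better than always predicting the mean rating.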
Please print on A4 sheet.