SE 3060 DATA WAREHOUSING AND MINING
Spring 2015
Class Project

Project Proposal; Deadline: Monday, March 2nd at 11:59PM
Final Project; Deadline: Friday, May 1st at 11:59PM

You can work alone or in pairs. If you wish to work in a pair, arrange it with another student and email me both names by 1/23/2015.

1 Project Description

The project is the core component of the course, accounting for 35% of the grade. Students are required to work on a project that involves writing code, running experiments, and submitting a project report. As part of the project, you are required to implement the idea and propose some extensions to the proposed algorithms. For validation purposes, students shall demonstrate the effectiveness of the algorithm on real data. At the end of the semester, students shall hand in a final project report that explains the problem and discusses the proposed approach and results. Students will also be asked to present a few slides on their project findings in the final week of classes.

What kind of Project:
- Evaluate existing data mining techniques on an interesting dataset.
- Develop your own algorithm for solving a given problem.
- Survey techniques for solving a specific problem, with experimental evaluation on several datasets.

Suggested course projects:
- Outlier detection
- Multi-objective clustering
- Market-basket analysis on real data
- A music recommendation system
- A data mining based approach to determining causal associations between drugs and conditions
- Book recommendation system
- Using the sentiment of Twitter tweets to predict stock price changes
- Text mining (finding interesting patterns in an unstructured textual corpus)
- Cancer prediction from biological data (classification task)
- Clustering gene expressions
- Other ideas can be found through the following link:
  o http://www.stat.columbia.edu/~madigan/DM08/ideas/ideas.htm

2 Project Proposal

For the project proposal, you are required to submit a writeup of at least one page explaining the motivation for the problem you are addressing and the different approaches proposed in the literature. Your proposal should cite at least 3 relevant references and include the following:
- What are you trying to do?
- What is the problem and why is it important?
- What are the existing techniques?
- What datasets will you be using?
- What measures of assessment will you use?
- What are you going to submit in your report?

3 Deliverables

You should submit your project on the course’s Moodle site.
- Submit proposal by March 1st. The proposal is worth 10 points of the project grade.
- A project report by the end of the semester.
  o Your final project submission must include:
    1. Zipped folder of all source code
    2. Report that explains everything you have done in the project
       Cover page (Project n.