Project Overview
This is a semester-long project that you must complete by the end of the semester. It is open-ended: suggestions will be provided, but you can choose any topic of interest to you that is related to data and analytics. I will then provide recommendations and feedback to help define the specific problem to be solved and the deliverables. Projects are individual. Group projects might be allowed, but in that case the scope of the project needs to be significantly larger to justify the group. In a group, all students will get the same grade. So there are trade-offs between group and individual projects that I will leave for you to consider.
Each proposed project needs to include the following subtasks (which are described in more detail in this writeup):
- Problem motivation and review (due 03/25/2022 at 11:59 pm): Why is the project you are proposing important, in your opinion? What is the current state of the art? Why are current solutions or answers unsatisfactory?
- Data collection (due 04/08/2022 at 11:59 pm): You will need to identify and collect the appropriate data to complete the proposed project. Why are these data appropriate for answering your questions?
- Descriptive analysis (due 04/08/2022 at 11:59 pm): You will need to perform an exploratory analysis of the dataset you collected. What do the data look like? What is their distribution? Are there missing data?
- In-depth analysis/modeling (due 04/29/2022 at 11:59 pm): In this part of the project you will need to analyze the data in depth to answer your question. This might require appropriate models or simply an in-depth analysis of the data. If you build models, they need to be evaluated appropriately. If you perform an in-depth analysis, the conclusions you draw need to be supported by convincing arguments based on the results.
- Visualizations (due 04/29/2022 at 11:59 pm): Concrete and clear visualizations of the results need to be provided so that even a non-expert in data analysis can grasp the conclusions and results.
- Report & code (due 04/29/2022 at 11:59 pm): You must deliver all of the above in a report and provide any source code you wrote during the project.
- Presentation (04/11/2022 and 04/18/2022 at 11:59 pm): During the last week of classes you will have to give a 15-minute presentation on your project.
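To make the descriptive analysis subtask more concrete, the following is a minimal sketch of the kind of first-pass summary it asks for: how many values are missing, and what the distribution of each numeric column roughly looks like. Everything in it is an assumption made for illustration only: the inline sample data, the column names `bike_trips` and `injuries`, and the helper `summarize` are hypothetical and not part of the assignment. It uses only the Python standard library; you are free to use any tools you prefer.

```python
import csv
import io
import statistics

# Hypothetical sample data, made up for illustration only.
# A real project would read its own collected dataset instead.
SAMPLE_CSV = """city,bike_trips,injuries
A,1200,14
B,800,
C,1500,21
D,,9
E,950,11
"""


def summarize(rows, column):
    """Return the row count, missing count, and basic statistics for a numeric column."""
    raw = [row[column].strip() for row in rows]
    # Empty cells are treated as missing values and excluded from the statistics.
    values = [float(v) for v in raw if v != ""]
    return {
        "n": len(raw),
        "missing": len(raw) - len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for column in ("bike_trips", "injuries"):
    print(column, summarize(rows, column))
```

A summary like this is only a starting point. It tells you where data are missing and whether the mean and median disagree (a hint of skew), which in turn suggests which plots and checks are worth doing next.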
Project Suggestions
The following are only suggestions; you are not required to choose any of these projects.
- How risky is biking in US cities? Which cities are the safest?
- How are housing prices affected by the transportation infrastructure?
- Can you predict the NCAA March Madness tournament?
- Can you predict which of a customer's previously bought items will be in their next order?
- What makes a movie a success? Can you predict the Oscar winners?
- Can you predict the popularity of songs and identify geographic correlations? Can you recommend songs for a Spotify station?
- Can you identify restaurants that potentially have health and sanitation issues? You can help the Department of Public Health to target hygiene inspections.
Sample Dataset Sources