This document outlines the Applied Data Science (Big Data) minor program at Fontys Eindhoven. The minor runs for 20 weeks and earns students 30 ECTS credits. It covers topics like data preprocessing, machine learning algorithms, visualization, and social/ethical implications of data science. Students work on an integral project and learn skills like Python, Hadoop/Spark, MongoDB, and data visualization tools. The minor is suited for students interested in programming, data, and using math and statistics to gain insights from large datasets.
16. In summary
Name: Applied Data Science (Big Data)
Starts: September
Duration: 20 weeks (30 ECTS)
Location: Fontys Eindhoven
Language: English
What: Professional Task (15 EC); Courses: Preprocessing (3 EC), Machine Learning (6 EC), Visualisation & Reporting (3 EC), Social Physics, Ethics & Law (3 EC)
Who: Experimenter with an open mind, loves programming, likes math, data- and human-oriented.
How: https://progresswww.nl/fontys
Questions: olaf.janssen@fontys.nl
17. In comparison
ERP / Business Intelligence | Applied Data Science
Business-oriented (SAP) | Human-oriented
Advisory reports | Working prototype
Projects per course | Integral project
Editor's notes
It is with some pride and great joy that I present the new Applied Data Science minor, which will start this fall at Fontys University ICT. These slides give a brief glimpse of what applied data science is and how the minor is set up.
You should all know by now that the amount of data generated by people (Twitter, Facebook, YouTube) and devices (smartphones, surveillance cameras, factory machines) is tremendous. Applied data science is the art of getting a grasp on this Big Data and turning cold, impersonal raw data into a personal, life-altering experience. </superlative mode>
Data can tell us a lot about the data itself: we see white-winged birds, larger birds, smaller birds… but that’s basically it.
When we have a lot-a-lot of data, patterns suddenly start to emerge. Here we can clearly discern two blobs. One blob can represent cat owners, while the other can represent dog owners. If we know that a visitor of a website falls in either blob, we can show them a nice page full of puppy pictures or planking cat videos, creating a more personal experience. But maybe we can predict other behavior as well.
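As a minimal sketch of how two such blobs could be found automatically, the snippet below runs a plain k-means clustering in pure Python. The data points, the meaning of the two axes and the names `visitors`, `kmeans` and `new_visitor` are all made up for illustration; in the minor you would more likely reach for a library implementation.

```python
def dist2(p, q):
    """Squared Euclidean distance between two 2D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, iterations=10):
    """Plain k-means on 2D points; returns (centroids, labels)."""
    # Farthest-first initialisation keeps this sketch deterministic.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    labels = []
    for _ in range(iterations):
        # Assignment step: every point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        # Update step: every centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return centroids, labels

# Hypothetical visitors described by two invented behaviour scores.
cats = [(1.0, 1.2), (1.5, 0.8), (0.9, 1.1), (1.2, 1.6), (1.4, 1.3)]
dogs = [(8.0, 8.3), (8.4, 7.9), (7.8, 8.1), (8.2, 8.6), (8.1, 7.7)]
visitors = cats + dogs

centroids, labels = kmeans(visitors, k=2)

# A new visitor is served the page that belongs to the nearest blob.
new_visitor = (7.5, 8.0)
blob = min(range(2), key=lambda c: dist2(new_visitor, centroids[c]))
print(labels, blob)
```

Note that the algorithm only finds the blobs; deciding that one blob means "cat owners" and the other "dog owners" is still up to you.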
And what about this little fellow? He’s a strange one. This outlier seems to like neither cats nor dogs. Maybe we should block him from our website! Do you also see that this picture is basically a data visualisation from a certain point of view? Imagine taking this picture from the left: the outlier would be hidden in one of the blobs.
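The point about the viewpoint can be made concrete with a small sketch. Assuming two blob centres and one odd visitor (all coordinates, the threshold and the helper `nearest_distance` are invented for this example), the visitor is far from both blobs when both dimensions are used, but blends into a blob as soon as we look along a single axis.

```python
import math

# Assumed blob centres and a hypothetical odd visitor.
centres = {"cat": (1.2, 1.2), "dog": (8.1, 8.1)}
visitor = (1.3, 8.0)

def nearest_distance(point, centres, dims):
    """Distance to the nearest centre, using only the coordinate indices in dims."""
    return min(
        math.sqrt(sum((point[d] - centre[d]) ** 2 for d in dims))
        for centre in centres.values()
    )

THRESHOLD = 2.0  # arbitrary cut-off for "too far from any blob"

full_view = nearest_distance(visitor, centres, dims=(0, 1))  # the picture as shown
side_view = nearest_distance(visitor, centres, dims=(0,))    # the picture "from the left"

print(full_view > THRESHOLD)  # flagged as an outlier in the full view
print(side_view > THRESHOLD)  # not flagged once a dimension is dropped
```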
This Big Data allows us to ask questions that we would normally not even consider asking, because they seem too ridiculous or too unlikely to ever be answered. Imagine the old world without large webshops: a recommendation like the one depicted would seem quite bizarre (and still does in regular retail stores).
Nonetheless, almost every webshop now offers such a service; Amazon’s recommendation service is a well-known example. Based on the books I buy, the items I look at, the items other people buy and who knows what extra information, Amazon is able to construct a profile of me and recommend items that I should want to buy. Apparently, Amazon thinks I should want to buy a ‘dull’ book about Enterprise Application Architecture and, at the same time, Mary Shelley’s Frankenstein. I’m getting a bit worried now about their opinion of me.
At any rate, this service will generate a lot of extra sales for Amazon while at the same time offering users a personal meaningful experience.
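How Amazon actually builds its recommendations is not described here, but the basic "people who bought this also bought that" idea can be sketched with simple co-occurrence counting. The order data and the functions `build_cooccurrence` and `recommend` below are hypothetical and only meant as an illustration.

```python
from collections import Counter
from itertools import permutations

# Hypothetical purchase histories, one set of titles per customer.
orders = [
    {"Enterprise Application Architecture", "Refactoring", "Clean Code"},
    {"Enterprise Application Architecture", "Frankenstein"},
    {"Frankenstein", "Dracula"},
    {"Enterprise Application Architecture", "Refactoring"},
]

def build_cooccurrence(orders):
    """Count how often each pair of items appears in the same basket."""
    co = {}
    for basket in orders:
        for a, b in permutations(basket, 2):
            co.setdefault(a, Counter())[b] += 1
    return co

def recommend(co, owned, n=2):
    """Suggest the n items most often bought together with the owned items."""
    scores = Counter()
    for item in owned:
        for other, count in co.get(item, {}).items():
            if other not in owned:
                scores[other] += count
    # Sort by score, then by title, so that ties are resolved predictably.
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]

co = build_cooccurrence(orders)
suggestions = recommend(co, {"Enterprise Application Architecture"})
print(suggestions)
```

A real service would combine many more signals (items viewed, ratings, time), which is exactly the "who knows what extra information" part.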
If we want to go somewhere new, we’re already accustomed to grabbing Google Maps or a similar service to get the quickest or shortest route to our destination. But what if you ask the question: what is the most beautiful or quietest route to that place? Well, you can: researchers from various companies have mined Google Street View and Flickr images to discover which streets people consider beautiful, and a new route planner was born, now with a more human touch.
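One way to picture such a route planner is as an ordinary shortest-path search in which only the cost of a street changes. The sketch below uses Dijkstra's algorithm on a tiny invented street graph; the beauty scores and the cost formula `minutes * (1.5 - beauty)` are assumptions made up for this example, not the method the researchers used.

```python
import heapq

# Hypothetical street graph: node -> list of (neighbour, minutes, beauty score 0..1).
streets = {
    "home":   [("ring", 5, 0.1), ("park", 8, 0.9)],
    "ring":   [("office", 5, 0.2)],
    "park":   [("office", 6, 0.8)],
    "office": [],
}

def best_route(graph, start, goal, cost):
    """Dijkstra's algorithm; cost(minutes, beauty) must never be negative."""
    queue = [(0.0, start, [start])]
    seen = set()
    while queue:
        total, node, path = heapq.heappop(queue)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for neighbour, minutes, beauty in graph[node]:
            if neighbour not in seen:
                step = cost(minutes, beauty)
                heapq.heappush(queue, (total + step, neighbour, path + [neighbour]))
    return None  # goal not reachable

# Same search, two different questions.
quickest = best_route(streets, "home", "office", cost=lambda minutes, beauty: minutes)
prettiest = best_route(streets, "home", "office",
                       cost=lambda minutes, beauty: minutes * (1.5 - beauty))
print(quickest, prettiest)
```

The interesting data science work is in estimating the beauty score per street; the routing itself stays the same.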
Text mining, learning from text, is also common practice now. Analysing Tweets is already well known. Apparently something happened with PSV a few days ago (April 20, 2015), and sentiment analysis shows that the activity was mostly positive, so I assume something positive happened. But just try to imagine the weird and possibly useful questions you can answer with this data (maybe in combination with other data).
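To give a feel for what sentiment analysis does, here is a deliberately naive sketch that scores text by counting words from two small word lists. The word lists and the example tweets are invented; real sentiment analysis typically uses trained models rather than a hand-made lexicon.

```python
# Tiny hand-made word lists; purely illustrative.
POSITIVE = {"great", "win", "champions", "proud", "love"}
NEGATIVE = {"bad", "lose", "sad", "angry", "hate"}

def sentiment(text):
    """Positive words minus negative words; above zero means mostly positive."""
    words = [word.strip(".,!?#@").lower() for word in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

# Hypothetical tweets.
tweets = [
    "PSV champions! So proud",
    "Great win tonight #PSV",
    "Sad about the traffic after the match",
]

scores = [sentiment(tweet) for tweet in tweets]
share_positive = sum(score > 0 for score in scores) / len(scores)
print(scores, share_positive)
```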
I could go on forever with cool examples of using machine learning: the self-driving car by Google, personalized medicine based on your DNA profile, TVs that can detect your emotions and understand which scenes you like and dislike, and detecting possible unrest in a city street and influencing it by changing the lighting (Living Labs Eindhoven).
There are so many applications, but searching online you see that most people are still just talking about Big Data and its possibilities rather than doing it. In fact, there is a huge need for data scientists to do something useful with all the data that is being stored and forgotten about. That is why Data Scientist has been called the sexiest job of the 21st century. And we hope to train you to be one of those sexy people.
So now to the minor. Our minor starts with the situation of having a huge load of data and not knowing what to do with it.
Shown here is the process you will go through during the minor, from data to output (web app, mobile app, interactive news story, management presentation).
You start with the data, which you have to store somewhere: 1 petabyte won’t fit on a single server, so it goes on a Hadoop cluster. Traditional SQL databases struggle with data at that scale, so you will learn about NoSQL solutions such as MongoDB. You also can’t analyse all that data at once, so you’ll learn about MapReduce, a programming model for summarizing data in parallel, to get a bite-size chunk of data that you can use Machine Learning on. In the analysis phase you will ask your data questions and answer them using Machine Learning algorithms from the scikit-learn Python library. You will also consider Social Physics to understand how ideas flow from person to person and how to influence people, and you will take into account the anonymity and privacy issues related to the data and the analysis. Python is the programming language for every step in the process. You won’t have to know it beforehand, but it will help of course.
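The summarization step can be illustrated on a single machine. The sketch below imitates the three stages of a map-reduce job (map, shuffle, reduce) in plain Python on an invented click log; the log format and the function names are assumptions, and on a real Hadoop cluster the same stages would run distributed over many servers.

```python
from collections import defaultdict

# Hypothetical click log, one "user page" entry per line.
log_lines = [
    "anna /cats", "bob /dogs", "anna /cats",
    "carl /dogs", "bob /dogs", "anna /dogs",
]

def map_phase(line):
    """Map: turn one raw record into (key, value) pairs."""
    user, page = line.split()
    yield page, 1

def shuffle(pairs):
    """Shuffle: group all values that belong to the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: summarize the values of one key into a single result."""
    return key, sum(values)

pairs = (pair for line in log_lines for pair in map_phase(line))
page_views = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(page_views)
```

The result is a small summary table (views per page) that fits in memory and can be fed into the analysis phase.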
Now that I’ve explained the content of the minor, it is time to talk about the form. The experience we want to give you is depicted here: diving and experimenting in a goldmine of data.
More practically, the 20 weeks of the semester are laid out more or less as shown here. There is a group task in which a group (~5 members) gets a big data set from a company or organisation that has this data but no clear idea what to do with it. Alongside this project, courses are given in the form of workshops: you will learn about all the techniques and methods by practicing them in small assignments. The results of these assignments, together with your effort in the group task, will form the basis of the final, individual assessment. There you can show, using a kind of portfolio, that you have understood enough of the subjects and have put in enough effort and practice to complete the minor successfully.
The first week is an inspirational week in which you will practice all the steps without having had any lectures on them, and you will get to know the companies involved.
Here is a summary of the more administrative details. Take special note of who should join this minor.
Here you see a small comparison between the ADS minor and the ERP/BI minor.