Data Mining and Data Visualization – Tools to Allow Students to do BIG STUFF with BIG DATA - Course Technology Computing Conference
Presenter: Dan Matthews, Trine University
When beginners first hear the term “data mining,” they wonder, “What kind of mining could a computer possibly do? It must be awfully hard. What would the end product of data mining look like?” Data mining (analytics) is becoming a core skill for an unprecedented number of professions. There are software environments that help make the process efficient for the data miner. Tableau is one of the systems I use in my data mining class. The software helps accelerate the process of converting data not just to information but to knowledge, with intuitive drag-and-drop technology that lets you stop worrying about how to connect to data and spend your time answering questions and forming relationships (knowledge) using critical thinking and creative association. With Tableau's speed and ease of use, students find themselves doing more complex analyses in less time.
Tableau has an academic program that gives full-time students professional-grade analytics software, in the form of Tableau Desktop, to help prepare them for working in an increasingly data-driven world. Students use Tableau Desktop for class work and extracurricular projects. Tableau also offers instructors free access to Tableau Desktop to equip them to teach the next generation of data scientists (miners) and analysts. In addition to software, Tableau recognizes that materials and support are essential to teaching with a tool, and to that end it offers a variety of solutions for different classrooms. Dozens of universities are using Tableau in data mining classes.
I want to share how I use the resources available to me to deliver quality instruction in this very important new technology discipline. I will define data mining (as best I can), discuss why the subject is so important, and discuss a variety of applications. Most of all, I will demonstrate some fun things students can do by mining the big data sets available in the cloud.
5. A DECENT DEFINITION
• The process of discovering meaningful new
correlations, patterns, and trends by sifting
through large amounts of stored data, using pattern
recognition technologies and statistical and
mathematical techniques.
6. A number of technology skills are needed:
Data Mining
Database Management
Machine Learning
Artificial Intelligence
Analysis of Algorithms
Statistics
Visualization
Data Warehousing
Security
Technology
Ethics
11. Visualization to gain insight and knowledge
David McCandless Data Visualization TED Talk
12. WEKA: the software
• Machine learning/data mining software written in Java
(distributed under the GNU General Public License)
• Used for research, education, and applications
• Complements “Data Mining” by Witten & Frank
• Main features:
– Comprehensive set of data pre-processing tools, learning
algorithms and evaluation methods
– Graphical user interfaces (incl. data visualization)
– Environment for comparing learning algorithms
14.
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
WEKA only deals with “flat” files
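To make the "flat file" idea concrete, here is a minimal Python sketch that reads the ARFF listing above into one record per row. It is not how WEKA itself loads files; `parse_arff` is a hypothetical helper written for illustration, and it assumes only numeric and nominal attributes, unquoted values, and "?" for a missing value. The trailing "..." from the slide is left out of the sample text.

```python
# The header and the four data rows shown on the slide (the "..." is omitted).
ARFF_TEXT = """\
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""


def parse_arff(text):
    """Parse a simple ARFF string into (relation, attributes, rows).

    Hypothetical helper for illustration only. Assumes numeric and
    nominal attributes, unquoted values, and '?' for missing values.
    """
    relation = None
    attributes = []  # (name, kind); kind is "numeric" or a list of labels
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        if in_data:
            values = [v.strip() for v in line.split(",")]
            if len(values) != len(attributes):
                raise ValueError("wrong number of fields: " + line)
            row = {}
            for (name, kind), value in zip(attributes, values):
                if value == "?":
                    row[name] = None  # missing value
                elif kind == "numeric":
                    row[name] = float(value)
                elif value in kind:
                    row[name] = value
                else:
                    raise ValueError("unknown label: " + value)
            rows.append(row)
        elif line.lower().startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif line.lower().startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                kind = [label.strip() for label in spec.strip("{}").split(",")]
            elif spec.lower() == "numeric":
                kind = "numeric"
            else:
                raise ValueError("unsupported attribute type: " + spec)
            attributes.append((name, kind))
        elif line.lower().startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(ARFF_TEXT)
for row in rows:
    print(row)
```

Every row comes out with the same six fields in the same order, which is what "flat" means here: one table, no nesting, and no links to other tables.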
16. Tableau 8
• Visual Analytics
– Forecasting
– Sets and visual groups
– Shared Filters
– Treemaps, bubble charts, word clouds
– New marks card
– Freeform dashboards
– Data Blending improvements
• Fast Performance
– Parallelized dashboards
– Faster quick filters
– Data Engine & Extract performance
– Fast graphics and calculations
– Performance recorder
• Any Data
– Salesforce.com
– Google Analytics & Google BigQuery
– Cloudera Impala, Cassandra, HortonWorks, Hadapt, Karmasphere
– SAP HANA
• Business Integration
– Data Extract API
– JavaScript API
– Data Server Security
– Server Auditing
– Distributed Data Engine
• Web & Mobile Authoring
– Web Authoring
– iPad and Android authoring
– Local rendering
– Subscriptions