This document summarizes various tools for visualizing and analyzing activity data. It discusses tools for data wrangling like Google Refine and DataWrangler. It also covers visualization libraries and platforms like Many Eyes, Matplotlib for time series data, and Graphviz for graphs and networks. Statistical analysis in R and graphics libraries like Protovis and Processing are also mentioned. The document provides links to examples of analyzing hierarchical data, text processing with Unix tools, and visualizing trends and autocorrelation in time series data.
1. Visualising Activity Data
Tony Hirst, Dept of Communication and Systems, The Open University
[Title image: "Scattered puzzle pieces next to solved fragment" by HoriaVarlan]
19. plot srcfile using ($1):(column(focusCar)-$2) with lines title "VET", \
        srcfile using ($1):(column(focusCar)-$3) with lines title "WEB", \
        srcfile using ($1):(column(focusCar)-$4) with lines title "HAM", \
        srcfile using ($1):(column(focusCar)-$5) with lines title "BUT", \
        srcfile using ($1):(column(focusCar)-$6) with lines title "ALO", \
        srcfile using ($1):(column(focusCar)-$7) with lines title "MAS", \
        srcfile using ($1):(column(focusCar)-$8) with lines title "SCH", \
        srcfile using ($1):(column(focusCar)-$9) with lines title "ROS", …
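The gnuplot command above charts, lap by lap, the gap between a chosen "focus car" column and every other driver's cumulative time. The same subtraction can be checked at the command line with awk. A minimal sketch, assuming a hypothetical whitespace-separated file laptimes.dat (lap number in column 1, cumulative times per driver from column 2 on) and focusCar=2:

```shell
# Hypothetical sample data: lap number, then cumulative times for two cars
# (column 2 = focus car, column 3 = a rival).
printf '1 90 91\n2 181 183\n' > laptimes.dat

focusCar=2   # column holding the focus car's cumulative time

# Gap of the rival in column 3 to the focus car,
# mirroring gnuplot's ($1):(column(focusCar)-$3)
awk -v f="$focusCar" '{ print $1, $f - $3 }' laptimes.dat
# → 1 -1
#   2 -2
```

A negative gap means the focus car is ahead; in the slide's plot each such series becomes one line per driver.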
21. Text processing with Unix tools [ m5tz63 ] [ lOVySX ]
Count number of lines in a file: wc -l L2sample.csv
View first few lines of a file: head L2sample.csv or head -n 4 L2sample.csv
View last few lines of a file: tail L2sample.csv or tail -n 15 L2sample.csv
Sample contiguous rows from the start or end of a file:
head -n 1 L2sample.csv > headers.csv
tail -n 20 L2sample.csv > subSample.csv
cat headers.csv subSample.csv > subSampleWithHeaders.csv
Sample contiguous rows from the middle of a file: head -n 15 L2sample.csv | tail -n 6 > middleSample.csv
Split a large file into smaller files: split -l 15 L2sample.csv subSamples
Search for lines containing a term: grep mendeley L2sample.csv
grep EBSCO L2sample.csv > rowsContainingEBSCO.csv
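The sampling recipes above compose head, tail, and cat. A runnable sketch against a small generated stand-in for L2sample.csv (30 lines, header included; the file contents here are invented for illustration):

```shell
# Build a hypothetical 30-line file to exercise the sampling recipes.
seq 1 30 | sed 's/^/row /' > L2sample.csv

wc -l < L2sample.csv                 # → 30

# Keep the header row plus the last 20 data rows, then recombine.
head -n 1  L2sample.csv > headers.csv
tail -n 20 L2sample.csv > subSample.csv
cat headers.csv subSample.csv > subSampleWithHeaders.csv
wc -l < subSampleWithHeaders.csv     # → 21

# Rows 10-15: take the first 15 lines, then the last 6 of those.
head -n 15 L2sample.csv | tail -n 6 > middleSample.csv
head -n 1 middleSample.csv           # → row 10
```

Note that `head -n 15 | tail -n 6` selects an inclusive middle window: lines 10 through 15.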
22. More text processing tricks
Extract columns: cut -f 3 L2sample.csv
cut -f 1,2,14,17 L2sample.csv > columnSample.csv
Sort data in a column: cut -f 40 L2sample.csv | sort
Identify distinct entries in a column: cut -f 40 L2sample.csv | sort | uniq
Count how many times each distinct term appears in a column: cut -f 40 L2sample.csv | sort | uniq -c
sort can also sort by column (-k) and reverse the order (-r): cut -f 40 L2_2011-04.csv | sort | uniq -c | sort -k 1 -r > uniqueSID.csv
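The cut | sort | uniq -c | sort pipeline above is a one-line frequency table. A minimal sketch on a hypothetical three-column tab-separated file (the slide's real export has many more columns; cut's default field delimiter is the tab):

```shell
# Invented stand-in data: the term of interest is in column 3.
printf 'a\tb\tEBSCO\na\tb\tmendeley\na\tb\tEBSCO\n' > tiny.tsv

# Count how often each distinct term appears, most frequent first
# (uniq -c prefixes each line with its count; the second sort orders by it).
cut -f 3 tiny.tsv | sort | uniq -c | sort -k 1 -r
```

Here the output lists EBSCO (count 2) before mendeley (count 1). For counts of mixed digit lengths, adding -n to the final sort gives a numeric rather than lexical ordering.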