Bash can be used for data analytics tasks like preparing and exploring data. The document demonstrates various Bash commands for working with CSV files containing app data. These include commands for viewing headers, counting rows, filtering, sorting, joining files, and aggregating data. Bash allows string manipulation and piping output between commands to programmatically analyze datasets from the command line.
16. apps.csv
package_app,name,category,version,rating_average,votes,star1,star2,star3,star4,star5
a8.kv.chilly,a8 chili slot machine lite,CARDS,1.2,3.7,42,10,2,4,2,24
[censored apps]
accessline.spy_camera,Hidden camera free version,MEDIA_AND_VIDEO,1.41,2.4,67,34,7,5,5,16
acciones.chile,Acciones Chile,FINANCE,1.0,4.2,24,1,1,0,11,11
acgs.topanime.evawp.photos,Evangelion HD Live Wallpaper,SPORTS,1.1,3.7,70,16,2,7,10,35
Adam.androiddev,Anti Dog Repellent / Whistle,TOOLS,2.4,3.1,748,288,30,51,77,302
[…]
A typical CSV file has a comma-separated list of attribute names on line 1, followed by one line per observation, each of which has a value for each attribute.
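The layout described above can be checked from the command line. The following is a minimal sketch, assuming a three-row sample file built from rows shown on the slide; the temporary directory and the variable names are illustrative only.

```shell
# Three-row sample in the layout of apps.csv (rows copied from the slide).
workdir=$(mktemp -d)
cat > "$workdir/apps.csv" <<'EOF'
package_app,name,category,version,rating_average,votes,star1,star2,star3,star4,star5
a8.kv.chilly,a8 chili slot machine lite,CARDS,1.2,3.7,42,10,2,4,2,24
acciones.chile,Acciones Chile,FINANCE,1.0,4.2,24,1,1,0,11,11
Adam.androiddev,Anti Dog Repellent / Whistle,TOOLS,2.4,3.1,748,288,30,51,77,302
EOF

# Line 1 holds the attribute names; print one name per line.
head -n 1 "$workdir/apps.csv" | tr ',' '\n'

# Number of attributes (columns) in the header.
num_columns=$(head -n 1 "$workdir/apps.csv" | tr ',' '\n' | wc -l | tr -d ' ')

# Number of observations: every line after the header.
num_rows=$(tail -n +2 "$workdir/apps.csv" | wc -l | tr -d ' ')

echo "columns=$num_columns rows=$num_rows"
```

The trailing tr -d ' ' only strips the padding that some wc implementations put in front of the count.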
35. … and what about Apps with >1 Version?
tail -n +2 apps.csv | cut -f 2 -d , | sort -u | wc -l
only keep the second column of the comma-delimited file (cut -f 2 -d ,)
sort alphabetically and remove duplicate lines (sort -u)
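To see what the count of unique names says about multiple versions, it helps to compare it with the total number of rows. This is a small sketch on a hypothetical four-column sample (the second version of the first app is made up for illustration).

```shell
# Hypothetical sample: the first app appears with two versions.
workdir=$(mktemp -d)
cat > "$workdir/apps.csv" <<'EOF'
package_app,name,category,version
a8.kv.chilly,a8 chili slot machine lite,CARDS,1.2
a8.kv.chilly,a8 chili slot machine lite,CARDS,1.3
acciones.chile,Acciones Chile,FINANCE,1.0
EOF

# All observations (header skipped).
total_rows=$(tail -n +2 "$workdir/apps.csv" | wc -l | tr -d ' ')

# Distinct app names: column 2, sorted, duplicates removed.
unique_names=$(tail -n +2 "$workdir/apps.csv" | cut -f 2 -d , | sort -u | wc -l | tr -d ' ')

# Rows that are additional versions of an app already counted.
extra_versions=$((total_rows - unique_names))

echo "rows=$total_rows unique_names=$unique_names extra_versions=$extra_versions"
```

If the two counts differ, at least one app name occurs on more than one row.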
40. What is the Maximum #Versions of an App?
tail -n +2 apps.csv | cut -f 2 -d , | sort | uniq -c | sort -n
sort, but keep all the lines (sort)
count the occurrences of each unique line, i.e., group identical lines and give the number of occurrences of each group (uniq -c)
sort numerically by count, so the app with the most versions ends up on the last line (sort -n)
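Because the numeric sort puts the largest count last, the maximum can be picked out with tail -n 1. The sketch below assumes a hypothetical sample in which one app has three versions; the awk and sed steps for splitting the count from the name are an addition, not part of the slide.

```shell
# Hypothetical sample: version numbers are made up for illustration.
workdir=$(mktemp -d)
cat > "$workdir/apps.csv" <<'EOF'
package_app,name,category,version
a8.kv.chilly,a8 chili slot machine lite,CARDS,1.0
a8.kv.chilly,a8 chili slot machine lite,CARDS,1.1
a8.kv.chilly,a8 chili slot machine lite,CARDS,1.2
acciones.chile,Acciones Chile,FINANCE,1.0
Adam.androiddev,Anti Dog Repellent / Whistle,TOOLS,2.3
Adam.androiddev,Anti Dog Repellent / Whistle,TOOLS,2.4
EOF

# Same pipeline as on the slide, keeping only the last (largest) line.
top_line=$(tail -n +2 "$workdir/apps.csv" | cut -f 2 -d , | sort | uniq -c | sort -n | tail -n 1)

# uniq -c prints "<count> <line>"; split the two parts.
max_versions=$(echo "$top_line" | awk '{print $1}')
top_app=$(echo "$top_line" | sed 's/^ *[0-9]* //')

echo "max_versions=$max_versions app=$top_app"
```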
47. Which App Category Contains Most of the Apps?
tail -n +2 apps.csv | cut -f 2,3 -d , | sort -u |
  cut -f 2 -d , |
  sort | uniq -c |
  sort -n
only keep app name and category (cut -f 2,3 -d ,)
keep one version per app name (sort -u)
throw away app name (cut -f 2 -d ,)
group and count per category (sort | uniq -c)
sort categories by count (sort -n)
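The effect of the deduplication step is easiest to see on a tiny input. A minimal sketch, assuming a hypothetical four-column sample in which one app has two versions: without sort -u the TOOLS category would be counted three times instead of twice.

```shell
# Hypothetical sample: package names, app names and versions are made up.
workdir=$(mktemp -d)
cat > "$workdir/apps.csv" <<'EOF'
package_app,name,category,version
pkg.a,App A,TOOLS,1.0
pkg.a,App A,TOOLS,1.1
pkg.b,App B,TOOLS,1.0
pkg.c,App C,FINANCE,2.0
EOF

# A pipe at the end of a line lets the pipeline continue on the next line.
top_line=$(tail -n +2 "$workdir/apps.csv" | cut -f 2,3 -d , | sort -u |
  cut -f 2 -d , |
  sort | uniq -c |
  sort -n | tail -n 1)

# Split "<count> <category>" into its parts.
top_count=$(echo "$top_line" | awk '{print $1}')
top_category=$(echo "$top_line" | awk '{print $2}')

echo "category=$top_category apps=$top_count"
```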
50. Let’s Take a Look at the Obfuscation Data
less apps_labels.csv
buffer the file to scroll up and down (vs. more)
53. What a Mess?!
tr '\r' '\n' < apps_labels.csv > apps_obfus.csv
fix Windows end-of-line issues by replacing each \r (carriage return) character with \n (newline)
More on line-ending: http://www.cyberciti.biz/faq/howto-unix-linux-convert-dos-newlines-cr-lf-unix-text-format/
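The sketch below shows the conversion on two hypothetical files, since the actual line endings of apps_labels.csv are not shown here. It assumes one file whose lines end in a bare \r and one whose lines end in the \r\n pair; for the second case deleting the \r with tr -d avoids the blank lines that a replacement would leave behind. The column name label and the value Plain are made up.

```shell
workdir=$(mktemp -d)

# Case 1 (assumed): lines end in a bare carriage return, so wc sees no newline at all.
printf 'package_app,label\rpkg.a,Obfuscated\rpkg.b,Plain\r' > "$workdir/apps_labels.csv"
lines_before=$(wc -l < "$workdir/apps_labels.csv" | tr -d ' ')

# Replace every \r by \n, as on the slide.
tr '\r' '\n' < "$workdir/apps_labels.csv" > "$workdir/apps_obfus.csv"
lines_after=$(wc -l < "$workdir/apps_obfus.csv" | tr -d ' ')

# Case 2 (assumed): lines end in \r\n; deleting \r keeps one newline per line.
printf 'pkg.a,Obfuscated\r\npkg.b,Plain\r\n' > "$workdir/crlf.csv"
tr -d '\r' < "$workdir/crlf.csv" > "$workdir/crlf_fixed.csv"
crlf_lines=$(wc -l < "$workdir/crlf_fixed.csv" | tr -d ' ')

echo "before=$lines_before after=$lines_after crlf_fixed=$crlf_lines"
```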
58. How to Merge the App Data with Obfuscation Results? (1)
TMP=$(head -n 1 apps.csv)
echo "${TMP},obfuscated" > apps_join.csv
tail -n +2 apps.csv | sort > sorted_apps.csv
tail -n +2 apps_obfus.csv | sort > sorted_apps_obfus.csv
store the result of a command in a variable (TMP=$(head -n 1 apps.csv))
store the column names first, with the new obfuscated column appended (echo "${TMP},obfuscated" > apps_join.csv)
merging requires sorted files (sort)
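Whether a file is already sorted on its first column can be checked with sort -c, which exits with a non-zero status on unsorted input. This is a sketch on a hypothetical three-column sample; sorting explicitly on the first comma-separated field (-t , -k 1,1) is an assumption made here so that the order matches the field join compares, and is not what the slide does.

```shell
# Hypothetical sample, deliberately out of order.
workdir=$(mktemp -d)
cat > "$workdir/apps.csv" <<'EOF'
package_app,name,category
pkg.b,App B,TOOLS
pkg.a,App A,FINANCE
EOF

# Store the header and start the output file with the extended column list.
header=$(head -n 1 "$workdir/apps.csv")
echo "${header},obfuscated" > "$workdir/apps_join.csv"

# Sort the data rows on the first comma-separated field.
tail -n +2 "$workdir/apps.csv" | sort -t , -k 1,1 > "$workdir/sorted_apps.csv"

# sort -c only checks the order; it writes no sorted output.
if sort -c -t , -k 1,1 "$workdir/sorted_apps.csv" 2>/dev/null; then sorted_ok=yes; else sorted_ok=no; fi
if tail -n +2 "$workdir/apps.csv" | sort -c -t , -k 1,1 2>/dev/null; then raw_ok=yes; else raw_ok=no; fi

echo "sorted file ok: $sorted_ok, raw rows ok: $raw_ok"
```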
63. How to Merge the App Data with Obfuscation Results? (2)
join -t , -1 1 -2 1 sorted_apps.csv sorted_apps_obfus.csv |
  cut -f -11,13 -d , >> apps_join.csv
comma-separated files (-t ,)
lines with the same value for the first column in file 1 and in file 2 should be merged (-1 1 -2 1)
join prints the join column only once (dropping the -2 column of file 2), but keeps the rest of the columns of file 2; here we only want the last column of file 2, so we remove the 12th column, keeping only the first 11 columns and the 13th (cut -f -11,13 -d ,)
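The column arithmetic can be traced on two rows. This is a sketch under stated assumptions: the layout of apps_obfus.csv is not shown in the slides, so a three-column file (package name, a made-up middle column, label) is assumed because the slide drops field 12 and keeps field 13; the values x1, x2 and Plain are hypothetical.

```shell
workdir=$(mktemp -d)

# Two already sorted rows from apps.csv (11 columns each).
cat > "$workdir/sorted_apps.csv" <<'EOF'
a8.kv.chilly,a8 chili slot machine lite,CARDS,1.2,3.7,42,10,2,4,2,24
acciones.chile,Acciones Chile,FINANCE,1.0,4.2,24,1,1,0,11,11
EOF

# Assumed layout: package_app, hypothetical middle column, obfuscation label.
cat > "$workdir/sorted_apps_obfus.csv" <<'EOF'
a8.kv.chilly,x1,Obfuscated
acciones.chile,x2,Plain
EOF

# Header written first, as in part (1).
echo "package_app,name,category,version,rating_average,votes,star1,star2,star3,star4,star5,obfuscated" > "$workdir/apps_join.csv"

# join yields 13 fields per line: 11 from file 1, then fields 2 and 3 of file 2.
# cut keeps fields 1-11 and 13, which drops the unwanted middle column.
join -t , -1 1 -2 1 "$workdir/sorted_apps.csv" "$workdir/sorted_apps_obfus.csv" |
  cut -f -11,13 -d , >> "$workdir/apps_join.csv"

cat "$workdir/apps_join.csv"
joined_fields=$(tail -n 1 "$workdir/apps_join.csv" | tr ',' '\n' | wc -l | tr -d ' ')
```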
66. Which Category has Most of the Obfuscated Code?
tail -n +2 apps_join.csv | grep -e ",Obfuscated" |
  cut -f 2,3 -d , | sort -u |
  cut -f 2 -d , |
  sort | uniq -c |
  sort -n
only consider lines that are obfuscated (grep -e ",Obfuscated")
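A small run shows how the grep filter changes the category counts. The sample below is hypothetical: it uses a shortened four-column apps_join.csv so that name and category stay in columns 2 and 3, and the label Plain for non-obfuscated apps is made up.

```shell
# Hypothetical, shortened apps_join.csv.
workdir=$(mktemp -d)
cat > "$workdir/apps_join.csv" <<'EOF'
package_app,name,category,obfuscated
pkg.a,App A,TOOLS,Obfuscated
pkg.b,App B,TOOLS,Obfuscated
pkg.c,App C,FINANCE,Obfuscated
pkg.d,App D,FINANCE,Plain
pkg.e,App E,CARDS,Plain
EOF

# Keep obfuscated rows only, then count distinct apps per category.
counts=$(tail -n +2 "$workdir/apps_join.csv" | grep -e ",Obfuscated" |
  cut -f 2,3 -d , | sort -u |
  cut -f 2 -d , |
  sort | uniq -c |
  sort -n)

echo "$counts"

# The last line holds the category with the most obfuscated apps.
top_category=$(echo "$counts" | tail -n 1 | awk '{print $2}')
top_count=$(echo "$counts" | tail -n 1 | awk '{print $1}')
```

Categories without any obfuscated app, such as CARDS in this sample, do not appear in the result at all.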
71. Bonus: How to Create a Comma-Separated List from a List of Words?
cut -f 3 -d , apps.csv | sort -u |
  paste -d , -s -
take input from pipe (-)
concatenate all lines (-s) … and put commas between them (-d ,)
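As a quick illustration, here is the same idea on a hypothetical four-row sample. One deliberate difference from the slide: the sketch skips the header with tail -n +2, because otherwise the column name category would also end up in the list.

```shell
# Hypothetical sample: package and app names are made up.
workdir=$(mktemp -d)
cat > "$workdir/apps.csv" <<'EOF'
package_app,name,category,version
pkg.a,App A,TOOLS,1.0
pkg.b,App B,CARDS,1.0
pkg.c,App C,FINANCE,2.0
pkg.d,App D,TOOLS,1.5
EOF

# Distinct categories, one per line, then glued into a single comma-separated line.
category_list=$(tail -n +2 "$workdir/apps.csv" | cut -f 3 -d , | sort -u |
  paste -d , -s -)

echo "$category_list"
```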