The Bash Dashboard (or: How to Use Bash for Data Analytics)
Bram Adams
Polytechnique Montreal
MCIS
Yes, this kind of stuff :-)
Last time I checked, every PC on earth had Excel installed, so what gives?
(quote by random grad student)
One word: automation!
Let me rephrase: Why Bash if one has Python or R?
(fictitious quote)
To better understand and
prepare your data before
deeper analysis!
Basic Constructs
echo "Bram" > file.txt         # >  replace file content
echo "Michel" >> file.txt      # >> append file content
echo "Giovanni" >> file.txt
cat file.txt | head -n 2       # |  pipe: send output of first command to input of second command
Output:
Bram
Michel
Example data 1: http://www.cs.wm.edu/semeru/data/tse-android/files/apps.csv
apps.csv
package_app,name,category,version,rating_average,votes,star1,star2,star3,star4,star5
a8.kv.chilly,a8 chili slot machine lite,CARDS,1.2,3.7,42,10,2,4,2,24
[censored apps]
accessline.spy_camera,Hidden camera free version,MEDIA_AND_VIDEO,1.41,2.4,67,34,7,5,5,16
acciones.chile,Acciones Chile,FINANCE,1.0,4.2,24,1,1,0,11,11
acgs.topanime.evawp.photos,Evangelion HD Live Wallpaper,SPORTS,1.1,3.7,70,16,2,7,10,35
Adam.androiddev,Anti Dog Repellent / Whistle,TOOLS,2.4,3.1,748,288,30,51,77,302
[…]
A typical CSV file has a comma-separated list of attribute names on line 1, followed by one line per observation, each of which has a value for each attribute.
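A minimal sketch of how the header line can be turned into a numbered list of attributes, which is handy for finding the column index to pass to `cut -f` later on. The header is the one shown above; the data row, the temporary directory, and the variable names are hypothetical stand-ins.

```shell
# Toy stand-in for apps.csv: header copied from the slides, data row is hypothetical.
workdir=$(mktemp -d)
cat > "$workdir/apps.csv" <<'EOF'
package_app,name,category,version,rating_average,votes,star1,star2,star3,star4,star5
demo.alpha,Alpha,TOOLS,1.0,4.0,10,1,1,1,2,5
EOF

# Put each attribute name on its own line and number it.
head -n 1 "$workdir/apps.csv" | tr ',' '\n' | cat -n

# Count the attributes (tr -d ' ' strips the padding some wc versions add).
ncols=$(head -n 1 "$workdir/apps.csv" | tr ',' '\n' | wc -l | tr -d ' ')
echo "number of columns: $ncols"
```

This only works for simple CSV files in which no value contains a comma itself.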
Example data 2: http://www.cs.wm.edu/semeru/data/MSR14-android-reuse/files/apps_labels.csv
apps_labels.csv
App package,Category,Type
air.com.huale.Basketball,ARCADE,Obfuscated
air.com.smch.climatekiten,BOOKS_AND_REFERENCE,Obfuscated
air.comicc.app9019,BOOKS_AND_REFERENCE,Obfuscated
ait.podka,MEDIA_AND_VIDEO,Obfuscated
ak.alizandro.smartaudiobookplayer,MUSIC_AND_AUDIO,Obfuscated
amor.developer.android,LIFESTYLE,Obfuscated
[…]
What Kind of Data does apps.csv Contain?
head -n 1 apps.csv    # show first line
Oh, does the File Contain the birthdayChocolate package?
grep -e "birthdayChocolate" apps.csv    # search for a literal string
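As a hedged aside: `grep -e` treats its argument as a regular expression, which behaves like a literal search here only because the string has no special characters. The sketch below, on hypothetical toy rows, uses `-F` to force a fixed-string match, `-c` to count matching lines, and `-q` to test for presence without printing anything.

```shell
# Hypothetical three-column toy file; only the package name comes from the slide.
workdir=$(mktemp -d)
printf '%s\n' \
  'package_app,name,category' \
  'demo.birthdayChocolate,Birthday Chocolate,LIFESTYLE' \
  'demo.other,Other App,TOOLS' > "$workdir/apps.csv"

# -F: fixed string instead of regular expression; -c: count matching lines.
matches=$(grep -c -F "birthdayChocolate" "$workdir/apps.csv")
echo "matching lines: $matches"

# -q: print nothing, only report via the exit status.
if grep -q -F "birthdayChocolate" "$workdir/apps.csv"; then
  found=yes
else
  found=no
fi
echo "found: $found"
```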
How Many Apps are There?
wc -l apps.csv    # number of lines in a file
Wait a Minute, What about the First Line?
tail -n +2 apps.csv | wc -l
tail -n +2: all the lines of a file starting with line 2 (i.e., removing line 1)
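A small sketch of the difference between counting all lines and counting only the data rows, on a hypothetical four-line file. It uses the `tail -n +2` spelling, since the older `tail +2` form is rejected by some current `tail` implementations.

```shell
# Hypothetical toy file: one header line plus three data rows.
workdir=$(mktemp -d)
printf 'id,name\n1,a\n2,b\n3,c\n' > "$workdir/toy.csv"

# All lines, header included (reading from stdin keeps the file name out of the output).
total=$(wc -l < "$workdir/toy.csv" | tr -d ' ')

# Data rows only: start output at line 2.
rows=$(tail -n +2 "$workdir/toy.csv" | wc -l | tr -d ' ')

echo "total lines: $total, data rows: $rows"
```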
… and what about Apps with >1 Version?
tail -n +2 apps.csv | cut -f 2 -d , | sort -u | wc -l
cut -f 2 -d ,: only keep second column of comma-delimited file
sort -u: sort alphabetically and remove duplicate lines
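To make the effect of `sort -u` concrete, here is a sketch on hypothetical rows in which one app (Alpha) appears with two versions. The row values and variable names are made up for illustration.

```shell
# Toy stand-in for apps.csv; header from the slides, rows hypothetical.
workdir=$(mktemp -d)
cat > "$workdir/apps.csv" <<'EOF'
package_app,name,category,version,rating_average,votes,star1,star2,star3,star4,star5
demo.alpha,Alpha,TOOLS,1.0,4.0,10,1,1,1,2,5
demo.alpha,Alpha,TOOLS,1.1,4.1,12,1,1,1,3,6
demo.beta,Beta,TOOLS,2.0,3.0,5,1,1,1,1,1
demo.gamma,Gamma,FINANCE,1.0,4.5,8,0,0,1,2,5
EOF

# Every data row counts once per app version.
rows=$(tail -n +2 "$workdir/apps.csv" | wc -l | tr -d ' ')

# Keep only the name column, drop duplicates, then count.
apps=$(tail -n +2 "$workdir/apps.csv" | cut -f 2 -d , | sort -u | wc -l | tr -d ' ')

echo "rows: $rows, distinct app names: $apps"
```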
What is the Maximum #Versions of an App?
tail -n +2 apps.csv | cut -f 2 -d , | sort | uniq -c | sort -n
sort: sort, but keep all the lines
uniq -c: count the number of occurrences of each unique line, i.e., group per line and give the number of occurrences of each group
sort -n: sort numerically (the maximum ends up on the last line)
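One possible way to pull just the maximum out of that listing is to keep the last line and its first field. This is a sketch on hypothetical rows; using `awk` to extract the count is an addition, not part of the slides.

```shell
# Toy stand-in for apps.csv; header from the slides, rows hypothetical.
workdir=$(mktemp -d)
cat > "$workdir/apps.csv" <<'EOF'
package_app,name,category,version,rating_average,votes,star1,star2,star3,star4,star5
demo.alpha,Alpha,TOOLS,1.0,4.0,10,1,1,1,2,5
demo.alpha,Alpha,TOOLS,1.1,4.1,12,1,1,1,3,6
demo.beta,Beta,TOOLS,2.0,3.0,5,1,1,1,1,1
demo.gamma,Gamma,FINANCE,1.0,4.5,8,0,0,1,2,5
EOF

# Group by app name and count the rows (versions) in each group.
counts=$(tail -n +2 "$workdir/apps.csv" | cut -f 2 -d , | sort | uniq -c | sort -n)
echo "$counts"

# The largest count is on the last line; its first field is the number.
max_versions=$(echo "$counts" | tail -n 1 | awk '{print $1}')
echo "maximum number of versions: $max_versions"
```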
Which App Category Contains Most of the Apps?
tail -n +2 apps.csv | cut -f 2,3 -d , | sort -u |
  cut -f 2 -d , |
  sort | uniq -c |
  sort -n
cut -f 2,3 -d ,: only keep app name and category
sort -u: keep one version per app name
cut -f 2 -d ,: throw away app name
sort | uniq -c: group and count per category
sort -n: sort categories per count
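The same pipeline can be tried end to end on a few hypothetical rows. Note that each continued line ends with `|`, so Bash keeps reading the pipeline on the next line. The rows and the `top_category` variable are illustrative assumptions.

```shell
# Toy stand-in for apps.csv; header from the slides, rows hypothetical.
workdir=$(mktemp -d)
cat > "$workdir/apps.csv" <<'EOF'
package_app,name,category,version,rating_average,votes,star1,star2,star3,star4,star5
demo.alpha,Alpha,TOOLS,1.0,4.0,10,1,1,1,2,5
demo.alpha,Alpha,TOOLS,1.1,4.1,12,1,1,1,3,6
demo.beta,Beta,TOOLS,2.0,3.0,5,1,1,1,1,1
demo.gamma,Gamma,FINANCE,1.0,4.5,8,0,0,1,2,5
EOF

# One (name, category) pair per app, then count apps per category.
per_category=$(tail -n +2 "$workdir/apps.csv" | cut -f 2,3 -d , | sort -u |
  cut -f 2 -d , |
  sort | uniq -c |
  sort -n)
echo "$per_category"

# The category with the most apps is on the last line, second field.
top_category=$(echo "$per_category" | tail -n 1 | awk '{print $2}')
echo "largest category: $top_category"
```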
Let’s Take a Look at the Obfuscation Data
less apps_labels.csv    # buffer file to scroll up and down (vs. more)
What a Mess?!
tr '\r' '\n' < apps_labels.csv > apps_obfus.csv
fix Windows end-of-line issues by replacing the \r character by \n
More on line-ending: http://www.cyberciti.biz/faq/howto-unix-linux-convert-dos-newlines-cr-lf-unix-text-format/
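A hedged sketch of two line-ending cases, using hypothetical one-row files. Translating `\r` to `\n` suits files whose lines end in a bare carriage return; if a file instead ends its lines with `\r\n`, deleting the `\r` with `tr -d` avoids creating empty lines. Which case applies to a given download is an assumption to check, for example with `wc -l` as below.

```shell
workdir=$(mktemp -d)

# Case 1: lines end with a bare carriage return (no \n at all).
printf 'App package,Category,Type\rdemo.alpha,TOOLS,Obfuscated\r' > "$workdir/cr.csv"
before=$(wc -l < "$workdir/cr.csv" | tr -d ' ')       # no newline characters yet
tr '\r' '\n' < "$workdir/cr.csv" > "$workdir/cr_fixed.csv"
cr_lines=$(wc -l < "$workdir/cr_fixed.csv" | tr -d ' ')

# Case 2: lines end with carriage return + newline; delete the \r instead.
printf 'App package,Category,Type\r\ndemo.alpha,TOOLS,Obfuscated\r\n' > "$workdir/crlf.csv"
tr -d '\r' < "$workdir/crlf.csv" > "$workdir/crlf_fixed.csv"
crlf_lines=$(wc -l < "$workdir/crlf_fixed.csv" | tr -d ' ')

echo "before: $before, after (CR): $cr_lines, after (CRLF): $crlf_lines"
```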
How to Merge the App Data with Obfuscation Results? (1)
TMP=$(head -n 1 apps.csv)                       # store result of command in variable
echo "${TMP},obfuscated" > apps_join.csv        # storing the column names first
tail -n +2 apps.csv | sort > sorted_apps.csv    # merging requires sorted files
tail -n +2 apps_obfus.csv |
  sort > sorted_apps_obfus.csv
How to Merge the App Data with Obfuscation Results? (2)
join -t , -1 1 -2 1 sorted_apps.csv sorted_apps_obfus.csv |
  cut -f -11,13 -d , >> apps_join.csv
-t ,: comma-separated files
-1 1 -2 1: lines with the same value for the first column in file 1 and in file 2 should be merged
cut -f -11,13 -d ,: join removes the specified -2 column, but keeps the rest of the columns of file 2; here we only want the last column of file 2, so we remove the 12th column (keeping only the first 11 columns and the 13th)
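The two merge steps can be sketched together on hypothetical toy files. Everything here beyond the commands from the slides is an assumption: the row values, the `Other` label, sorting on the key column only (`sort -t , -k 1,1`), and fixing the locale with `LC_ALL=C` so that `sort` and `join` agree on ordering.

```shell
workdir=$(mktemp -d)
export LC_ALL=C   # same collation order for sort and join

# Toy inputs; headers follow the slides, rows are hypothetical.
cat > "$workdir/apps.csv" <<'EOF'
package_app,name,category,version,rating_average,votes,star1,star2,star3,star4,star5
demo.gamma,Gamma,FINANCE,1.0,4.5,8,0,0,1,2,5
demo.alpha,Alpha,TOOLS,1.0,4.0,10,1,1,1,2,5
demo.beta,Beta,TOOLS,2.0,3.0,5,1,1,1,1,1
EOF
cat > "$workdir/apps_obfus.csv" <<'EOF'
App package,Category,Type
demo.gamma,FINANCE,Other
demo.alpha,TOOLS,Obfuscated
EOF

# Step 1: write the new header, then sort both bodies on the join key.
echo "$(head -n 1 "$workdir/apps.csv"),obfuscated" > "$workdir/apps_join.csv"
tail -n +2 "$workdir/apps.csv"       | sort -t , -k 1,1 > "$workdir/sorted_apps.csv"
tail -n +2 "$workdir/apps_obfus.csv" | sort -t , -k 1,1 > "$workdir/sorted_apps_obfus.csv"

# Step 2: join on column 1 of both files, drop column 12 (file 2's category).
join -t , -1 1 -2 1 "$workdir/sorted_apps.csv" "$workdir/sorted_apps_obfus.csv" |
  cut -f -11,13 -d , >> "$workdir/apps_join.csv"

cat "$workdir/apps_join.csv"
```

In this sketch demo.beta has no row in the second file, so it is absent from the result: by default `join` only outputs lines whose key appears in both inputs.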
Which Category has Most of the Obfuscated Code?
tail -n +2 apps_join.csv | grep -e ",Obfuscated" |
  cut -f 2,3 -d , | sort -u |
  cut -f 2 -d , |
  sort | uniq -c |
  sort -n
grep -e ",Obfuscated": only consider lines that are obfuscated
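One thing worth checking with such a filter is whether the pattern can match outside the intended column. The sketch below uses a simplified, hypothetical four-column file in which an app name happens to start with "Obfuscated"; anchoring the pattern to the end of the line with `$` restricts the match to the last column. The column layout, rows, and `Other` label are assumptions for illustration.

```shell
# Simplified hypothetical stand-in for apps_join.csv (fewer columns than the real file).
workdir=$(mktemp -d)
cat > "$workdir/apps_join.csv" <<'EOF'
package_app,name,category,obfuscated
demo.alpha,Alpha,TOOLS,Obfuscated
demo.notes,Obfuscated Notes,TOOLS,Other
EOF

# Unanchored: also matches the app whose name starts with "Obfuscated".
loose=$(tail -n +2 "$workdir/apps_join.csv" | grep -c -e ",Obfuscated")

# Anchored with $: only matches when the last column is "Obfuscated".
strict=$(tail -n +2 "$workdir/apps_join.csv" | grep -c -e ',Obfuscated$')

echo "unanchored matches: $loose, anchored matches: $strict"
```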
Bonus: How to Create a Comma-Separated List from a List of Words?
cut -f 3 -d , apps.csv | sort -u |
  paste -d , -s -
-: take input from pipe
-s: concatenate all lines
-d ,: … and put commas between them
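A tiny sketch of the `paste` step on its own, with a hypothetical list of category names fed in through `printf` instead of `cut`.

```shell
# Hypothetical word list with one duplicate; sort -u removes it.
categories=$(printf 'TOOLS\nFINANCE\nTOOLS\nCARDS\n' | sort -u | paste -s -d , -)
echo "$categories"
```

The result is a single line, which is convenient for pasting into a script or a report.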
If you’re Interested, Check Out
these Books for More (and less ;-))

Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 

The Bash Dashboard (Or: How to Use Bash for Data Analysis)

  • 1. The Bash Dashboard (or: How to Use Bash for Data Analytics). Bram Adams, Polytechnique Montreal, MCIS
  • 2. Yes, this kind of stuff :-)
  • 3. Last time I checked, every PC on earth had Excel installed, so what gives? (quote by random grad student)
  • 5. Let me rephrase: Why Bash if one has Python or R? (fictitious quote)
  • 6. To better understand and prepare your data before deeper analysis!
  • 7-11. Basic Constructs
      echo "Bram" > file.txt        (">": replace file content)
      echo "Michel" >> file.txt     (">>": append file content)
      echo "Giovanni" >> file.txt
      cat file.txt | head -n 2      ("|" is a pipe: send output of first command to input of second command)
      Output:
      Bram
      Michel
  • 14-16. apps.csv
      package_app,name,category,version,rating_average,votes,star1,star2,star3,star4,star5
      a8.kv.chilly,a8 chili slot machine lite,CARDS,1.2,3.7,42,10,2,4,2,24
      [censored apps]
      accessline.spy_camera,Hidden camera free version,MEDIA_AND_VIDEO,1.41,2.4,67,34,7,5,5,16
      acciones.chile,Acciones Chile,FINANCE,1.0,4.2,24,1,1,0,11,11
      acgs.topanime.evawp.photos,Evangelion HD Live Wallpaper,SPORTS,1.1,3.7,70,16,2,7,10,35
      Adam.androiddev,Anti Dog Repellent / Whistle,TOOLS,2.4,3.1,748,288,30,51,77,302
      […]
      A typical csv file has a comma-separated list of attribute names on line 1, followed by one line per different observation, each of which has a value for each attribute.
  • 20-22. What Kind of Data does apps.csv Contain?
      head -n 1 apps.csv
      head -n 1: show first line
  • 23-25. Oh, does the File Contain the birthdayChocolate package?
      grep -e "birthdayChocolate" apps.csv
      grep -e: search for a literal string
  • 26-28. How Many Apps are There?
      wc -l apps.csv
      wc -l: #lines in a file
  • 29-31. Wait a Minute, What about the First Line?
      tail -n +2 apps.csv | wc -l
      tail -n +2: all the lines of a file starting with line 2 (i.e., removing line 1)
  • 32-35. … and what about Apps with >1 Version?
      tail -n +2 apps.csv | cut -f 2 -d , | sort -u | wc -l
      cut -f 2 -d ,: only keep second column of comma-delimited file
      sort -u: sort alphabetically and remove duplicate lines
  • 36-40. What is the Maximum #Versions of an App?
      tail -n +2 apps.csv | cut -f 2 -d , | sort | uniq -c | sort -n
      sort: sort, but keep all the lines
      uniq -c: count #occurrences of each unique line, i.e., group per line and give #occurrences of each group
      sort -n: sort numerically (largest count last)
  • 41-47. Which App Category Contains Most of the Apps?
      tail -n +2 apps.csv | cut -f 2,3 -d , | sort -u | cut -f 2 -d , | sort | uniq -c | sort -n
      cut -f 2,3 -d ,: only keep app name and category
      sort -u: keep one version per app name
      cut -f 2 -d ,: throw away app name
      sort | uniq -c: group and count per category
      sort -n: sort categories per count
  • 48-50. Let’s Take a Look at the Obfuscation Data
      less apps_labels.csv
      less: buffer file to scroll up and down (vs. more)
  • 51-53. What a Mess?!
      tr '\r' '\n' < apps_labels.csv > apps_obfus.csv
      tr: fix Windows end-of-line issues by replacing the \r character by \n
      More on line-ending: http://www.cyberciti.biz/faq/howto-unix-linux-convert-dos-newlines-cr-lf-unix-text-format/
  • 54-58. How to Merge the App Data with Obfuscation Results? (1)
      TMP=`head -n 1 apps.csv`                                    (store result of command in variable)
      echo "${TMP},obfuscated" > apps_join.csv                    (storing the column names first)
      tail -n +2 apps.csv | sort > sorted_apps.csv                (merging requires sorted files)
      tail -n +2 apps_obfus.csv | sort > sorted_apps_obfus.csv
  • 59-63. How to Merge the App Data with Obfuscation Results? (2)
      join -t , -1 1 -2 1 sorted_apps.csv sorted_apps_obfus.csv | cut -f -11,13 -d , >> apps_join.csv
      -t ,: comma-separated files
      -1 1 -2 1: lines with same value for first column in file 1 and in file 2 should be merged
      cut -f -11,13 -d ,: join removes the specified -2 column, but keeps rest of columns of file 2; here we only want the last column of file 2, so we remove the 12th column (keeping only the first 11 columns and the 13th)
  • 64-66. Which Category has Most of the Obfuscated Code?
      tail -n +2 apps_join.csv | grep -e ",Obfuscated" | cut -f 2,3 -d , | sort -u | cut -f 2 -d , | sort | uniq -c | sort -n
      grep -e ",Obfuscated": only consider lines that are obfuscated
  • 67-71. Bonus: How to Create a Comma-Separated List from a List of Words?
      cut -f 3 -d , apps.csv | sort -u | paste -d , -s -
      -: take input from pipe
      -s: concatenate all lines…
      -d ,: … and put commas between them
  • 72. If you’re Interested, Check Out these Books for More (and less ;-))
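
The steps from the slides can be tried end to end on a tiny made-up data set. The sketch below is only an illustration under stated assumptions: the package names, the `extra` middle column, the `Plain` label, and the CR-only line endings of the label file are all hypothetical, since the real apps_labels.csv is not shown in the slides. It also sorts on the join column with `sort -t , -k 1,1` under `LC_ALL=C` rather than a plain `sort`, so that `sort` and `join` agree on the ordering.

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C            # byte-wise ordering, so sort and join agree

workdir=$(mktemp -d)       # scratch directory for the sample files
cd "$workdir"

# Hypothetical sample with the same 11 columns as apps.csv.
cat > apps.csv <<'EOF'
package_app,name,category,version,rating_average,votes,star1,star2,star3,star4,star5
com.example.alpha,Alpha,TOOLS,1.0,4.0,10,1,1,1,2,5
com.example.alpha,Alpha,TOOLS,1.1,4.2,12,1,1,1,3,6
com.example.beta,Beta,TOOLS,2.0,3.5,8,2,1,1,2,2
com.example.gamma,Gamma,FINANCE,1.0,4.8,20,0,0,1,2,17
com.example.delta,Delta,FINANCE,1.0,3.9,15,2,1,2,4,6
EOF

# Hypothetical label file: 3 columns, each line terminated by \r only.
printf '%s\r' \
  'package_app,extra,obfuscated' \
  'com.example.alpha,x,Obfuscated' \
  'com.example.beta,x,Obfuscated' \
  'com.example.gamma,x,Plain' \
  'com.example.delta,x,Obfuscated' > apps_labels.csv

# Count data rows (header skipped) and distinct app names.
total_rows=$(tail -n +2 apps.csv | wc -l | tr -d ' ')
unique_apps=$(tail -n +2 apps.csv | cut -f 2 -d , | sort -u | wc -l | tr -d ' ')

# Turn the \r line endings into \n.
tr '\r' '\n' < apps_labels.csv > apps_obfus.csv

# Write the new header first, then sort both bodies on the join column.
header=$(head -n 1 apps.csv)
echo "${header},obfuscated" > apps_join.csv
tail -n +2 apps.csv       | sort -t , -k 1,1 > sorted_apps.csv
tail -n +2 apps_obfus.csv | sort -t , -k 1,1 > sorted_apps_obfus.csv

# Join on column 1 of both files; keep the 11 app columns plus the label.
join -t , -1 1 -2 1 sorted_apps.csv sorted_apps_obfus.csv \
  | cut -f -11,13 -d , >> apps_join.csv

# Category with the most obfuscated apps (one count per app name).
top_category=$(tail -n +2 apps_join.csv | grep -e ",Obfuscated" \
  | cut -f 2,3 -d , | sort -u | cut -f 2 -d , \
  | sort | uniq -c | sort -n | tail -n 1 | awk '{print $2}')

echo "rows=${total_rows} apps=${unique_apps} top=${top_category}"
```

Because the join key is the package name only, an app with several versions in apps.csv gets the same label on every version row, which is why the final count goes through `sort -u` on name and category first.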