Red Hat Enterprise Linux Essentials
     Unit 7: Text Processing Tools
Objectives
Upon completion of this unit, you should be able to:

 Use tools for extracting, analyzing and manipulating
  text data
Tools for Extracting Text

 File Contents: less and cat

 File Excerpts: head and tail

 Extract by Column: cut

 Extract by Keyword: grep
Viewing File Contents
                             less and cat

 cat: dump one or more files to STDOUT

    Multiple files are concatenated together


 less: view file or STDIN one page at a time

    Useful commands while viewing:

       • /text searches for text

       • n/N jumps to the next/previous match

       • v opens the file in a text editor


 less is the pager used by man
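This is a minimal sketch of cat concatenating files. It assumes a scratch directory from mktemp and two hypothetical files, part1.txt and part2.txt. less is interactive, so it appears only in comments.

```shell
# Scratch directory for the hypothetical sample files; removed on exit.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

printf 'alpha\nbeta\n' > "$tmpdir/part1.txt"
printf 'gamma\n' > "$tmpdir/part2.txt"

# cat writes part1.txt and then part2.txt to STDOUT; the redirect
# captures the concatenation in a new file.
cat "$tmpdir/part1.txt" "$tmpdir/part2.txt" > "$tmpdir/combined.txt"
combined=$(cat "$tmpdir/combined.txt")
echo "$combined"

# less pages a file, or STDIN when no file is named:
#   less "$tmpdir/combined.txt"
#   ls -l /etc | less
```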
Viewing File Excerpts
                               head and tail

 head: Display the first 10 lines of a file

    Use -n to change the number of lines displayed


 tail: Display the last 10 lines of a file

    Use -n to change the number of lines displayed


    Use -f to "follow" subsequent additions to the file
       • Very useful for monitoring log files!
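This is a short sketch of head and tail on a hypothetical 20-line file generated with seq (GNU coreutils, as shipped with RHEL). tail -f never exits on its own, so it is shown only as a comment. /var/log/messages is the usual RHEL syslog file, but it is only an example here.

```shell
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Hypothetical 20-line sample file containing the numbers 1 to 20.
seq 1 20 > "$tmpdir/numbers.txt"

first_three=$(head -n 3 "$tmpdir/numbers.txt")              # lines 1-3
last_two=$(tail -n 2 "$tmpdir/numbers.txt")                 # lines 19-20
default_count=$(( $(head "$tmpdir/numbers.txt" | wc -l) ))  # 10 by default

echo "$first_three"
echo "$last_two"

# Follow a log file as new lines are appended (Ctrl-C to stop):
#   tail -f /var/log/messages
```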
Extracting Text by Keyword
                             grep
 Prints lines of files or STDIN where a pattern is matched
       $ grep 'john' /etc/passwd

       $ date --help | grep year

 Use -i to search case-insensitively

 Use -n to print line numbers of matches

 Use -v to print lines not containing pattern

 Use -A X (e.g. -A 2) to include the X lines after each match

 Use -B X (e.g. -B 2) to include the X lines before each match
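This sketch runs the grep options above against a small, made-up file in /etc/passwd format (users.txt is hypothetical), so the results do not depend on the real /etc/passwd.

```shell
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Hypothetical sample data in /etc/passwd format.
cat > "$tmpdir/users.txt" <<'EOF'
root:x:0:0:root:/root:/bin/bash
john:x:500:500:John Smith:/home/john:/bin/bash
JOHNNY:x:501:501:Johnny:/home/jj:/bin/sh
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
EOF

grep 'john' "$tmpdir/users.txt"        # case-sensitive: the john line only
grep -i 'john' "$tmpdir/users.txt"     # also matches the JOHNNY line
grep -v 'nologin' "$tmpdir/users.txt"  # every line WITHOUT the pattern
grep -A 1 'root' "$tmpdir/users.txt"   # the match plus 1 line after it
grep -B 1 'mail' "$tmpdir/users.txt"   # the match plus 1 line before it

numbered=$(grep -n 'john' "$tmpdir/users.txt")
case_insensitive=$(( $(grep -i 'john' "$tmpdir/users.txt" | wc -l) ))
inverted=$(( $(grep -v 'nologin' "$tmpdir/users.txt" | wc -l) ))
echo "$numbered"
```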
Extracting Text by Column
                                 cut

 Display specific columns of file or STDIN data

  $ cut -d: -f1 /etc/passwd

  $ grep root /etc/passwd | cut -d: -f7


 Use -d to specify the column delimiter (default is TAB)

 Use -f to specify the field(s) to print (e.g. -f1,6 or -f2-4)

 Use -c to cut by characters

  $ cut -c2-5 /usr/share/dict/words
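This is a minimal sketch of cut on hypothetical passwd-style data. It also shows how to select several fields at once, and that TAB is the default delimiter when -d is omitted.

```shell
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Hypothetical sample data in /etc/passwd format (7 colon-separated fields).
cat > "$tmpdir/users.txt" <<'EOF'
root:x:0:0:root:/root:/bin/bash
john:x:500:500:John Smith:/home/john:/bin/bash
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
EOF

names=$(cut -d: -f1 "$tmpdir/users.txt")                     # field 1: user names
root_shell=$(grep 'root' "$tmpdir/users.txt" | cut -d: -f7)  # field 7: login shell
name_home=$(cut -d: -f1,6 "$tmpdir/users.txt")               # fields 1 and 6, still ':'-joined
tab_field=$(printf 'a\tb\tc\n' | cut -f2)                    # no -d: TAB is the delimiter
chars=$(printf 'abcdefg\n' | cut -c2-5)                      # characters 2 through 5
echo "$names"
```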
Tools for Analyzing Text

 Text Stats: wc

 Sorting Text: sort

 Comparing Files: diff and patch

 Spell Check: aspell
Gathering Text Statistics
                       wc (word count)
 Counts words, lines, bytes and characters

 Can act upon a file or STDIN

       $ wc story.txt

       39   237   1901 story.txt

 Use -l for only line count

 Use -w for only word count

 Use -c for only byte count

 Use -m for character count (not displayed by default; differs from -c for multibyte text)
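This sketch runs wc on a hypothetical two-line story.txt whose counts can be checked by hand: 2 lines, 5 words, 24 bytes.

```shell
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Hypothetical two-line sample file.
printf 'one two three\nfour five\n' > "$tmpdir/story.txt"

wc "$tmpdir/story.txt"   # lines, words, bytes, then the file name

# Reading from STDIN (<) makes wc print the count without a file name;
# $(( )) strips any padding spaces.
lines=$(( $(wc -l < "$tmpdir/story.txt") ))
words=$(( $(wc -w < "$tmpdir/story.txt") ))
bytes=$(( $(wc -c < "$tmpdir/story.txt") ))
echo "$lines $words $bytes"
```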
Sorting Text
                               sort

 Sorts text to STDOUT - original file unchanged

       $ sort [options] file(s)
 Common options
    -r performs a reverse (descending) sort

    -n performs a numeric sort

    -f ignores (folds) case of characters in strings

    -u (unique) removes duplicate lines in output

    -t c uses c as the field separator (default: blank-separated fields)

    -k X sorts by field X (-k X,X ends the key at field X)
       • Can be used multiple times
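This sketch uses sort with -t, -k, -n and -r on hypothetical colon-separated records. It uses -k3,3, which ends the key at field 3 (a plain -k3 would run to the end of the line). It also shows why -n matters for numbers.

```shell
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Hypothetical colon-separated records: name:x:UID:GID
cat > "$tmpdir/users.txt" <<'EOF'
john:x:500:500
root:x:0:0
mail:x:8:12
EOF

by_name=$(sort "$tmpdir/users.txt" | cut -d: -f1)                      # whole-line text sort
by_uid=$(sort -t: -k3,3 -n "$tmpdir/users.txt" | cut -d: -f1)          # numeric sort on field 3
by_uid_desc=$(sort -t: -k3,3 -n -r "$tmpdir/users.txt" | cut -d: -f1)  # same key, reversed
# Without -n the UIDs compare character by character, so "500" sorts before "8".
text_uids=$(cut -d: -f3 "$tmpdir/users.txt" | sort)
echo "$by_uid"
```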
Eliminating Duplicate Lines
                        sort and uniq
 sort -u: removes duplicate lines from input

 uniq: removes duplicate adjacent lines from input
    Use -c to count the number of occurrences

    Sort first, so that duplicate lines become adjacent:

      $ sort userlist.txt | uniq -c
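This sketch contrasts uniq alone, sort -u, and sort | uniq -c. It uses a hypothetical userlist.txt whose repeated names are not adjacent.

```shell
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Hypothetical list of user names; the repeats are NOT adjacent.
printf 'bob\nalice\nbob\ncarol\nalice\nbob\n' > "$tmpdir/userlist.txt"

# uniq only collapses adjacent duplicates, so here it removes nothing.
uniq_only=$(( $(uniq "$tmpdir/userlist.txt" | wc -l) ))     # 6
# sort -u removes every duplicate.
distinct=$(( $(sort -u "$tmpdir/userlist.txt" | wc -l) ))   # 3
# Sorting first brings duplicates together; uniq -c then counts them.
counts=$(sort "$tmpdir/userlist.txt" | uniq -c)
echo "$counts"
```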
Comparing Files
                              diff
 Compares two files for differences
      $ diff foo.conf-broken foo.conf-works
      5c5
      < use_widgets = no
      ---
      > use_widgets = yes
    5c5: line 5 of the first file was changed into line 5 of the second

 Use gvimdiff for graphical diff
    Provided by vim-X11 package
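This sketch reproduces the diff output above. Only the file names come from the slide. The file contents are made up so that the two files differ on line 5 alone.

```shell
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Hypothetical 5-line config files that differ only on line 5.
printf 'a = 1\nb = 2\nc = 3\nd = 4\nuse_widgets = no\n'  > "$tmpdir/foo.conf-broken"
printf 'a = 1\nb = 2\nc = 3\nd = 4\nuse_widgets = yes\n' > "$tmpdir/foo.conf-works"

# diff exits 0 when the files match, 1 when they differ and 2 on error,
# so the status is captured rather than treated as a failure.
diff_status=0
diff_output=$(diff "$tmpdir/foo.conf-broken" "$tmpdir/foo.conf-works") || diff_status=$?
echo "$diff_output"
echo "exit status: $diff_status"
```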
Duplicating File Changes
                               patch
 diff output stored in a file is called a "patchfile"
    Use diff -u to produce a "unified" diff, the preferred format for patchfiles

 patch applies the changes recorded in a patchfile to another file (use with care!)

 • Use -b to automatically back up changed files

  $ diff -u foo.conf-broken foo.conf-works > foo.patch

  $ patch -b foo.conf-broken foo.patch
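This sketches the diff -u / patch -b workflow above, with made-up file contents. It assumes GNU patch, whose -b backup uses the .orig suffix by default.

```shell
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Hypothetical files; only their names come from the slide.
printf 'a = 1\nuse_widgets = no\n'  > "$tmpdir/foo.conf-broken"
printf 'a = 1\nuse_widgets = yes\n' > "$tmpdir/foo.conf-works"

# Record the change as a unified diff. diff exits 1 because the files
# differ, so "|| true" keeps that status from aborting a "set -e" script.
diff -u "$tmpdir/foo.conf-broken" "$tmpdir/foo.conf-works" > "$tmpdir/foo.patch" || true

# Apply the patchfile; -b saves the original as foo.conf-broken.orig.
patch -b "$tmpdir/foo.conf-broken" "$tmpdir/foo.patch"

cat "$tmpdir/foo.patch"
```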
Spell Checking with aspell

 Interactively spell-check files:
       $ aspell check letter.txt

 Non-interactively list misspelled words read from STDIN

       $ aspell list < letter.txt

       $ aspell list < letter.txt | wc -l
Tools for Manipulating Text
                           tr and sed
 Alter (translate) Characters: tr
    Converts characters in one set to corresponding characters in another
     set
    Only reads data from STDIN

       $ tr 'a-z' 'A-Z' < lowercase.txt

 Alter Strings: sed
    stream editor

    Performs search/replace operations on a stream of text

    Normally does not alter the source file; results go to STDOUT

    Use -i.bak to back up the source file (as file.bak) and edit it in place
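This sketches tr reading from STDIN. It assumes a hypothetical lowercase.txt. It also uses two tr options not covered on the slide: -s (squeeze repeats) and -d (delete).

```shell
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

printf 'hello world\n' > "$tmpdir/lowercase.txt"   # hypothetical input file

# tr reads only STDIN, so a file is supplied with < (or through a pipe).
upper=$(tr 'a-z' 'A-Z' < "$tmpdir/lowercase.txt")        # HELLO WORLD
lower=$(printf 'MiXeD\n' | tr '[:upper:]' '[:lower:]')   # mixed
squeezed=$(printf 'a    b  c\n' | tr -s ' ')             # a b c (runs of spaces squeezed)
no_digits=$(printf 'r2d2 c3po\n' | tr -d '0-9')          # rd cpo (digits deleted)
echo "$upper"
```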
sed
                              Examples
 Quote search and replace instructions!

 sed addresses
    sed 's/dog/cat/g' pets

    sed '1,50s/dog/cat/g' pets

    sed '/digby/,/duncan/s/dog/cat/g' pets

 Multiple sed instructions
    sed -e 's/dog/cat/' -e 's/hi/lo/' pets

    sed -f myedits pets
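This sketch runs the sed examples above against a made-up pets file; myedits is likewise hypothetical. The last command uses GNU sed's -i.bak in-place editing from the previous slide.

```shell
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Hypothetical "pets" file.
cat > "$tmpdir/pets" <<'EOF'
dog says hi to dog
digby the dog
my dog
duncan the dog
last dog
EOF

first_only=$(sed 's/dog/cat/' "$tmpdir/pets" | head -n 1)   # no g: first match per line only
every=$(sed 's/dog/cat/g' "$tmpdir/pets" | head -n 1)       # g: every match on the line
numeric=$(( $(sed '1,2s/dog/cat/g' "$tmpdir/pets" | grep cat | wc -l) ))  # lines 1-2 only
ranged=$(sed '/digby/,/duncan/s/dog/cat/g' "$tmpdir/pets")  # from /digby/ through /duncan/
multi=$(sed -e 's/dog/cat/' -e 's/hi/lo/' "$tmpdir/pets" | head -n 1)

# The same two instructions read from a script file.
printf 's/dog/cat/\ns/hi/lo/\n' > "$tmpdir/myedits"
from_file=$(sed -f "$tmpdir/myedits" "$tmpdir/pets" | head -n 1)

# Edit in place, keeping the original as pets.bak.
sed -i.bak 's/dog/cat/g' "$tmpdir/pets"
echo "$ranged"
```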
Introduction awk

   Field/column processor
   Supports egrep-compatible (POSIX extended) regular expressions
   Can print whole matching lines, like grep
   An awk program has 3 parts:
     BEGIN - optional, runs once before any input is read
     Body - the main action(s), run once for each input line
     END - optional, runs once after all input is read
 Multiple body actions are separated by semicolons,
  e.g. '{ print $1; print $2 }'
 awk automatically loops over every line of its input, whatever the
  source: STDIN, a pipe or a file
 Usage:
       awk '/optional_match/ { action }' file_name
       command | awk '/optional_match/ { action }'
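This sketches the three parts of an awk program on made-up input, plus a pattern that selects a single line.

```shell
# Hypothetical input: a name and a number on each line.
report=$(printf 'alice 3\nbob 5\ncarol 4\n' | awk '
  BEGIN { print "name value" }   # runs once, before any input is read
  { sum += $2; print $1, $2 }    # body: runs once per line; ";" separates actions
  END   { print "total", sum }   # runs once, after the last line
')
bob=$(printf 'alice 3\nbob 5\ncarol 4\n' | awk '/bob/ { print $2 }')   # pattern selects the line
echo "$report"
```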
Example awk

 Print a text file
    awk '{print }' /etc/passwd

    awk '{print $0}' /etc/passwd

 Print specific field
    awk -F':' '{print $1}' /etc/passwd

 Pattern matching (field 9 of an Apache access log is the HTTP status code)
    awk '$9 == 500 { print $0 }' /var/log/httpd/access_log

 Print lines containing vmintam, student or khanh
    awk '/vmintam|student|khanh/' /etc/passwd
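This sketches the awk examples above against made-up data. It uses a small passwd-style file and a two-line log in Apache common log format; the IP addresses are from the 192.0.2.0/24 documentation range. It also adds a numeric comparison on a field.

```shell
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Hypothetical passwd-style file (7 colon-separated fields per line).
cat > "$tmpdir/passwd" <<'EOF'
root:x:0:0:root:/root:/bin/bash
student:x:500:500::/home/student:/bin/bash
khanh:x:501:501::/home/khanh:/bin/bash
apache:x:48:48:Apache:/var/www:/sbin/nologin
EOF

users=$(awk -F':' '{print $1}' "$tmpdir/passwd")                    # field 1 of every line
matched=$(awk -F':' '/student|khanh/ {print $1}' "$tmpdir/passwd")  # | in a regex means OR
regular=$(awk -F':' '$3 >= 500 {print $1}' "$tmpdir/passwd")        # numeric test on field 3

# Hypothetical log lines; with awk's default whitespace splitting,
# field 9 is the HTTP status code.
printf '%s\n' \
  '192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] "GET /a HTTP/1.1" 200 512' \
  '192.0.2.2 - - [10/Oct/2023:13:55:37 +0000] "GET /b HTTP/1.1" 500 128' \
  > "$tmpdir/access_log"
errors=$(awk '$9 == 500 { print $1 }' "$tmpdir/access_log")   # client of each 500 response
echo "$errors"
```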
Example awk (cont.)

 Print the first line of a file
   awk 'NR==1{print;exit}' /etc/resolv.conf

 Simple arithmetic: total the first column
   awk '{total += $1} END {print total}' earnings.txt

 The shell cannot do floating-point arithmetic, but awk can:
   awk 'BEGIN {printf "%.3f\n", 2005.50 / 3}'

 Show the 10 most frequently used commands:
   history | awk '{print $2}' | sort | uniq -c | sort -rn | head
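This sketches awk arithmetic on hypothetical earnings figures. It includes an average computed with the built-in NR (number of records read), and a made-up resolv.conf-style input for the NR==1 example.

```shell
# Hypothetical earnings figures, one per line.
earnings='100
250.5
49.5'

total=$(printf '%s\n' "$earnings" | awk '{total += $1} END {print total}')                # 400
average=$(printf '%s\n' "$earnings" | awk '{sum += $1} END {printf "%.2f\n", sum / NR}')  # 133.33
third=$(awk 'BEGIN {printf "%.3f\n", 2005.50 / 3}')                                       # 668.500
# NR==1 selects the first record; exit stops reading the rest of the input.
first=$(printf 'search example.com\nnameserver 192.0.2.53\n' | awk 'NR==1{print;exit}')
echo "$total $average $third"
```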
Special Characters for Complex Searches
                 Regular Expressions
 ^ represents beginning of line

 $ represents end of line

 Character classes as in bash:
    [abc], [^abc]

    [[:upper:]], [^[:upper:]]

 Used by:
    grep, sed, less, others
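This sketches the anchors and character classes above, used with grep and sed on made-up lines.

```shell
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Hypothetical sample lines.
printf 'root user\nthe root\nRoot\nabc123\n' > "$tmpdir/sample.txt"

starts=$(grep '^root' "$tmpdir/sample.txt")                # ^ anchors at line start
ends=$(grep 'root$' "$tmpdir/sample.txt")                  # $ anchors at line end
capital=$(grep '^[[:upper:]]' "$tmpdir/sample.txt")        # first character is uppercase
not_r=$(( $(grep '^[^r]' "$tmpdir/sample.txt" | wc -l) ))  # first character is not r
quoted=$(sed 's/^/> /' "$tmpdir/sample.txt" | head -n 1)   # insert text at line start
echo "$starts"
```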