Pattern Matching Utilities, Shell, and Shell Scripting #282

@qingquan-li

1. Pattern Matching Utilities

A subset of text processing utilities/tools specifically designed to search for patterns in text using regular expressions or other pattern-based logic.

Examples: sed (uses patterns for substitution or deletion), awk (uses patterns for selecting and processing data), grep (explicit pattern searching).

sed (stream editor)

Name origin: Stream Editor

man sed: sed - stream editor for filtering and transforming text

Primary Purpose: Processes text line by line and applies transformations such as substitution, deletion, or insertion. Most often used for replacing text in files.

Syntax

sed [options] 'command' filename

Use options like -n to suppress default output or -i for in-place editing.
Common commands include: s (substitute), p (print), d (delete), a (append), and i (insert).

Examples

sed 's/old/new/g' file.txt      # Substitute all occurrences of "old" with "new" (prints the result; does not modify the original file)
sed -i 's/old//' file.txt       # Delete the first "old" on each line, in place (modifies the file; BSD/macOS sed requires -i '')
sed '3,5s/old/new/' file.txt    # Substitute the first "old" with "new" on each of lines 3 to 5

echo "Hello World" | sed 's/World/Universe/'  # print: Hello Universe

sed -n '5,30p' file.txt    # print lines 5 to 30
sed -n '/foo/p' file.txt   # print lines containing foo
sed -n '/foo/Ip' file.txt  # print lines containing foo, case insensitive (the I flag is a GNU sed extension)
sed -n '/foo/!p' file.txt  # print lines not containing foo

sed -i '2d' file.txt   # delete line 2 in place (overwrites file.txt)
sed '/foo/d' file.txt  # delete lines matching foo

sed G file.txt  # append a blank line after each line (double-space the file)
sed '/foo/a extra-line' file.txt  # append a line with "extra-line" after lines containing foo (one-line form is a GNU extension)

sed '5 i apple' file.txt  # insert a line with "apple" before line 5 (one-line form is a GNU extension)

sed -e 's/foo/bar/' -e '/delete/d' file.txt  # Apply multiple commands using -e

sed -f script.sed file.txt  # run sed commands from a script file. e.g., script.sed: s/old/new/
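
Because -i overwrites its input, it helps to keep a backup while experimenting. The sketch below is a minimal example under assumptions: demo.txt is a made-up file created in a scratch directory, and the suffix is attached directly to the flag (-i.bak), a form that both GNU sed and BSD/macOS sed should accept.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
printf 'old one\nkeep me\nold two old\n' > "$workdir/demo.txt"

# -i.bak edits demo.txt in place and keeps the original as demo.txt.bak.
sed -i.bak 's/old/new/g' "$workdir/demo.txt"

# Capture both versions so they can be compared.
edited=$(cat "$workdir/demo.txt")
backup=$(cat "$workdir/demo.txt.bak")
printf '%s\n' "$edited"

rm -rf "$workdir"
```

If the result looks wrong, the .bak file still holds the untouched original.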

awk (pattern scanning and processing language)

Name origin: Aho, Weinberger, and Kernighan

man awk: gawk - pattern scanning and processing language

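Primary Purpose: awk is a language for processing structured (field-based) text. It is more powerful than sed: it keeps the same line-at-a-time, pattern-driven model, but its actions are written in a C-like syntax with variables, arithmetic, and control flow. It can act like a mini relational database management system, selecting rows and columns from tabular text.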

Syntax

awk 'pattern {action}' filename

pattern: A condition or expression that determines which lines of the input file are processed. If omitted, the action is applied to all lines.

{action}: A block of code to execute for each line that matches the pattern. Actions can include printing, calculations, string manipulations, etc. If omitted, the default action is to print the line.

Examples

$ awk '{print}' datafile
Aaron   45  55  60  90
Bob     70  75  88  100
Chuck   75  80  85  100
Donald  80  70  70  95
$
$ awk '/Bob/ {print}' datafile # /Bob/: A pattern that matches lines containing "Bob".
Bob     70  75  88  100
$
$ # ~: Checks if a string or field matches a pattern (regular expression)
$ # The !~ operator is used to check for non-matching patterns
$ awk '$1 ~ /Bo/ {print}' datafile # Print lines where the first field contains "Bo"
Bob     70  75  88  100
$ cat prog.awk 
BEGIN { print "Starting to read" }
{ print }
END {print "Finished reading" }
$
$ awk -f prog.awk datafile
Starting to read
Aaron   45  55  60  90
Bob     70  75  88  100
Chuck   75  80  85  100
Donald  80  70  70  95
Finished reading
$ awk '{ print $1, $2 }' datafile 
Aaron 45
Bob 70
Chuck 75
Donald 80
$
$ awk '$5 > 99 { print $0 }' datafile # Print lines where the 5th field > 99. $0 is the entire line; it is optional here because a bare print also prints $0
Bob     70  75  88  100
Chuck   75  80  85  100
$
$ awk 'BEGIN { print "Names" } { print $1 } END { } ' datafile
Names
Aaron
Bob
Chuck
Donald
$ cat prog2.awk
# AWK supports C-like expressions
BEGIN { print "Average"; total = 0; count = 0; }
      { total = total + $2; ++count; }
END   { avg = total / count
        print total, avg }
$
$ awk -f prog2.awk datafile 
Average
270 67.5
$ cat prom3.awk             
BEGIN { 
    print "Total for Bob"
}
# /Bob/ is a pattern that matches any line containing the string "Bob"
/Bob/ {
    sum = $2 + $3 + $4 
    print sum
}
$
$ awk -f prom3.awk datafile
Total for Bob
233
$ # -F: Specifies the field separator (whitespace by default)
$ # -F',' sets the delimiter to a comma (useful for CSV files).
$ cat data.csv                    
ID,Name,Score
1,Bob,85
2,Alice,90
3,John,78
4,Mary,92
$ awk -F',' '{print $2}' data.csv
Name
Bob
Alice
John
Mary
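
The pieces above (fields, NF, patterns, and -F) combine naturally. The following is a minimal sketch, assuming the same datafile and data.csv contents shown earlier, recreated in a scratch directory so the example is self-contained. It computes each student's average over all score columns, then selects CSV rows by a numeric condition while skipping the header with NR > 1.

```shell
workdir=$(mktemp -d)
cat > "$workdir/datafile" <<'EOF'
Aaron   45  55  60  90
Bob     70  75  88  100
Chuck   75  80  85  100
Donald  80  70  70  95
EOF
printf 'ID,Name,Score\n1,Bob,85\n2,Alice,90\n3,John,78\n4,Mary,92\n' > "$workdir/data.csv"

# NF is the number of fields on the current line; fields 2..NF are scores.
averages=$(awk '{
    sum = 0
    for (i = 2; i <= NF; i++) sum += $i   # add up every score column
    printf "%s %.2f\n", $1, sum / (NF - 1)
}' "$workdir/datafile")
printf '%s\n' "$averages"

# NR is the current line number; NR > 1 skips the CSV header row.
top=$(awk -F',' 'NR > 1 && $3 >= 90 { print $2 }' "$workdir/data.csv")
printf '%s\n' "$top"

rm -rf "$workdir"
```

The first command prints one "name average" pair per line; the second prints only the names whose score is at least 90.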

grep (print lines that match patterns)

Name origin: Global Regular Expression Print

man grep: grep, egrep, fgrep, rgrep - print lines that match patterns

Primary Purpose: Scans files or streams for lines matching specific patterns or regular expressions.

Syntax

grep [options] pattern [file...]

[options]: Flags to modify the behavior of grep.

pattern: A string or regular expression to search for.

[file...]: The file(s) to search in. If no file is provided, grep reads from standard input.

Examples

$ cat filename
hello world
Hello again
Say hi to the world
helloworld
$
$ grep 'hello' filename  # Print lines containing "hello"
hello world
helloworld
$
$ grep -i "hello" filename  # Print lines containing "hello", ignoring case
hello world
Hello again
helloworld
$
$ grep -n 'hello' filename  # Print matching lines, each prefixed with its line number
1:hello world
4:helloworld
$
$ grep -v 'hello' filename  # Print lines that do not contain "hello"
Hello again
Say hi to the world
$
$ grep -w 'hello' filename  # Match only the whole word "hello" (not substrings like "helloworld")
hello world
$
$ # grep -r 'hello' /path/to/directory  # Search for "hello" in all files under a directory
$ grep -r 'hello' ./                
.//script.sed:s/hi/hello/
.//filename:hello world
.//filename:helloworld
$
$ grep -c 'hello' filename  # Count how many lines contain "hello"
2
$
$ grep -E 'hello|world' filename  # Extended regular expression: match lines containing "hello" or "world"
hello world
Say hi to the world
helloworld
$
$ grep 'hello' *.txt  # Search multiple files: print lines containing "hello" from every .txt file
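
grep also reports its result through the exit status: 0 when at least one line matched, 1 when none did. That makes it usable as a condition. The sketch below is a minimal example that recreates the sample file from above in a scratch directory; the variable names are illustrative only.

```shell
workdir=$(mktemp -d)
printf 'hello world\nHello again\nSay hi to the world\nhelloworld\n' > "$workdir/filename"

# -q suppresses output; only the exit status is used by the if.
if grep -q 'hello' "$workdir/filename"; then
    found="yes"
else
    found="no"
fi

# Options can be combined: -i ignores case, -c counts matching lines.
count=$(grep -ic 'hello' "$workdir/filename")

echo "found=$found count=$count"
rm -rf "$workdir"
```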

2. Shell

What It Is:

  • A command-line interpreter that provides an interface to interact with the operating system.
  • Processes user commands and executes them (e.g., running programs, managing files).

Key Features:

  • Provides built-in commands such as cd, export, and kill.
  • Can run external programs for file management (ls, rm), process management (ps), and text processing (sed, awk, grep).

Common Shells:

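  • Bash (Bourne Again Shell): The default shell on most Linux distributions.
  • Zsh: An extended Bourne-style shell with richer completion and customization; the default on macOS since Catalina.
  • Fish: A user-friendly command-line shell whose scripting syntax is intentionally not POSIX-compatible.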

Bash (Bourne Again Shell)

Bash is a specific implementation of a shell, widely used on Linux and available on macOS. It is a reimplementation and extension of the original Bourne Shell (sh) with additional features.

Key Features:

  • Supports advanced scripting features like arrays, associative arrays (Bash 4 and later), and functions.
  • Includes built-in commands such as cd, echo, read, and test.
  • Offers command-line editing, job control, and history management.

Example Use Case: Using Bash as an interactive shell:

$ echo "Hello, World!"
Hello, World!
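
The scripting features listed above can be sketched in a few lines. This is a minimal example, assuming Bash 4 or later (needed for declare -A); the names shells, origin, and describe are made up for illustration.

```shell
# Indexed array: an ordered list of values.
shells=(bash zsh fish)

# Associative array (Bash 4+): string keys mapped to values.
declare -A origin
origin[sed]="stream editor"
origin[grep]="global regular expression print"

# Function: $1 is the first argument passed by the caller.
describe() {
    local tool=$1
    echo "$tool: ${origin[$tool]}"
}

count=${#shells[@]}      # number of elements in the array
first=${shells[0]}       # arrays are indexed from 0
summary=$(describe sed)

echo "$count shells, first is $first"
echo "$summary"
```

None of this array syntax is part of the POSIX shell language, so a script like this should be run with bash rather than sh.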

3. Shell Scripting

What It Is:

  • Writing scripts (automated sequences of commands) to be executed by a shell.
  • Combines shell commands, utilities (like sed, awk, and grep), and control flow structures (like loops and conditionals).

Key Features:

  • Enables automation of repetitive tasks.
  • Supports variables, loops, conditionals, and functions.

Example: Find all .log files, replace the word "ERROR" with "WARNING", and save the output to new files (filename_processed.log).

Step 1: Create a script.sh file and give it execute permission

$ touch script.sh
$ chmod +x ./script.sh

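Step 2: Edit the script.sh file

#!/bin/bash
# Write a copy of each .log file with ERROR replaced by WARNING.
for file in *.log; do
  [ -e "$file" ] || continue                       # no .log files: the glob stays literal
  case $file in *_processed.log) continue ;; esac  # skip output from earlier runs
  sed 's/ERROR/WARNING/g' "$file" > "${file%.log}_processed.log"
done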

Step 3: Execute the script.sh file

$ ./script.sh

In this example:

  • sed processes the text.
  • The shell provides the environment to run commands.
  • Shell scripting with Bash combines the commands into an automated workflow.
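
Utilities and control flow can be combined further. The sketch below is a minimal variation on the script, under assumptions: it creates two made-up log files (app.log and clean.log) in a scratch directory, counts ERROR lines with grep, and runs sed only on files that contain at least one.

```shell
workdir=$(mktemp -d)
printf 'ERROR disk full\nINFO ok\nERROR timeout\n' > "$workdir/app.log"
printf 'INFO all good\n' > "$workdir/clean.log"

report=""
for file in "$workdir"/*.log; do
    # grep -c prints the number of matching lines; "|| true" ignores the
    # non-zero exit status grep returns when that number is 0.
    errors=$(grep -c 'ERROR' "$file" || true)
    if [ "$errors" -gt 0 ]; then
        sed 's/ERROR/WARNING/g' "$file" > "${file%.log}_processed.log"
    fi
    report="$report$(basename "$file"):$errors "
done
echo "$report"

processed=$(cat "$workdir/app_processed.log")
if [ -e "$workdir/clean_processed.log" ]; then clean_written="yes"; else clean_written="no"; fi
rm -rf "$workdir"
```

Only app.log gets a processed copy, because clean.log has no ERROR lines to rewrite.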

POSIX

POSIX (Portable Operating System Interface) is a family of IEEE standards that defines operating system interfaces based on the Unix operating system.

As a standard, POSIX helps maintain compatibility between operating systems. POSIX defines both the system and user-level application programming interfaces (APIs), along with command line shells, and utility interfaces, for software compatibility (portability) with variants of Unix and other operating systems.

A POSIX operating system is any operating system that adheres to the POSIX standards. Because those standards fix the APIs, the shell language, and the behavior of the standard utilities, applications written against them can run on different Unix-like platforms with minimal modifications.
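
For shell scripts, portability in practice means staying within the POSIX shell language. The following is a minimal sketch that uses only constructs defined there (functions, case, [ ], and suffix-removal parameter expansion); strip_ext is a hypothetical helper name chosen for this example.

```shell
# Uses only POSIX sh features, so it should behave the same under sh, dash, or bash.
# strip_ext is a hypothetical helper: it removes a trailing .log from its argument.
strip_ext() {
    case $1 in
        *.log) printf '%s\n' "${1%.log}" ;;   # POSIX suffix removal
        *)     printf '%s\n' "$1" ;;
    esac
}

# POSIX test syntax: [ ... ] rather than the Bash-only [[ ... ]].
if [ "$(strip_ext app.log)" = "app" ]; then
    result="portable"
else
    result="unexpected"
fi
echo "$result"
```

Features such as arrays or [[ ... ]] are Bash extensions and would tie the script to Bash.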

POSIX-certified operating systems include macOS and Oracle Solaris. Linux, FreeBSD, and OpenBSD are largely POSIX-compliant but are generally not formally certified.

Windows is not a POSIX operating system, but there are several ways to get a POSIX-like environment on Windows, for example:

  • Windows Subsystem for Linux (WSL): Lets developers run unmodified Linux binary executables on Windows 10 and 11. WSL 1 is a compatibility layer; WSL 2 runs a real Linux kernel in a lightweight virtual machine. In both cases, POSIX software running in the Linux environment can access Windows files.
  • PowerShell: A synthesis of Windows and Unix culture whose language grammar was influenced by the IEEE POSIX 1003.2 standard for Unix shells. It is not itself a POSIX shell, so Unix shell scripts do not run in it unmodified.
