Cognitionis
Hector Llorens Portfolio

Sed Basic Kit


The Stream Editor…

Good basics

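Replace whole words only, the sed equivalent of "grep -w". Imagine that I want to change "one" into "1" in the sentence "One day in Oneland one kid was alone". A plain case-insensitive sed 's/one/1/Ig' gives "1 day in 1land 1 kid was al1", but what I probably expected was "One day in Oneland 1 kid was alone". To get that, wrap the word in the word-boundary anchors \< and \>. Example: sed 's/\<one\>/1/g'. Add the I flag (s/\<one\>/1/Ig) if the capitalized "One" at the start of the sentence should be replaced as well. Both \< \> and the I flag are GNU sed extensions.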
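To make the difference concrete, here is a small sketch that runs the three variants on the sentence above. It assumes GNU sed, since \< \> and the I flag are extensions, and the variable names are only for illustration.

```shell
#!/bin/sh
# Assumes GNU sed: \< \> and the I flag are GNU extensions.
sentence='One day in Oneland one kid was alone'

# No word boundaries, case-insensitive: every "one" is replaced.
anywhere=$(printf '%s\n' "$sentence" | sed 's/one/1/Ig')
# Whole words only, case-sensitive: only the standalone lowercase "one".
whole=$(printf '%s\n' "$sentence" | sed 's/\<one\>/1/g')
# Whole words only, case-insensitive: the leading "One" is replaced too.
whole_nocase=$(printf '%s\n' "$sentence" | sed 's/\<one\>/1/Ig')

printf '%s\n' "$anywhere" "$whole" "$whole_nocase"
# 1 day in 1land 1 kid was al1
# One day in Oneland 1 kid was alone
# 1 day in Oneland 1 kid was alone
```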

POSIX Character Class Definitions

POSIX 1003.2 section 2.8.3.2 (6) defines a set of character classes that denote certain common ranges. They tend to look very ugly, but they have the advantage of taking the 'locale' into account, that is, any variant of the local language/coding system. Many utilities/languages provide short-hand ways of invoking these classes. Strictly speaking, the names used, and hence their contents, reference the LC_CTYPE POSIX definition (1003.2 section 2.5.2.1).

Value       Meaning
[:digit:]   Only the digits 0 to 9.
[:alnum:]   Any alphanumeric character: 0 to 9, A to Z or a to z.
[:alpha:]   Any alpha character: A to Z or a to z.
[:blank:]   Space and TAB characters only.
[:xdigit:]  Hexadecimal notation: 0-9, A-F, a-f.
[:punct:]   Punctuation symbols . , " ' ` ? ! ; : # $ % & ( ) * + - / < > = @ [ ] \ ^ _ { } | ~
[:print:]   Any printable character, including space.
[:space:]   Any whitespace character (space, TAB, NL, FF, VT, CR). Many systems abbreviate this as \s.
[:graph:]   Any visible character: printable characters excluding space. The closest short-hand is \S (not \W, which means "non-word character").
[:upper:]   Any alpha character A to Z.
[:lower:]   Any alpha character a to z.
[:cntrl:]   Control characters: NUL SOH STX ETX EOT ENQ ACK BEL BS TAB NL (LF) VT FF CR SO SI DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC IS4 IS3 IS2 IS1 DEL.

These are always used inside square brackets in the form [[:alnum:]] or combined as [[:digit:]a-d]
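As a quick illustration of both forms, a class on its own and a class combined with a range, here is a minimal sketch. The sample string and variable names are made up for the example.

```shell
#!/bin/sh
# Sample text with a run of two spaces after the comma.
line='Room 42b,  floor 3!'

# Delete every digit.
no_digits=$(printf '%s\n' "$line" | sed 's/[[:digit:]]//g')
# Squeeze runs of blanks (space or TAB) into a single space.
squeezed=$(printf '%s\n' "$line" | sed 's/[[:blank:]][[:blank:]]*/ /g')
# Class combined with an explicit range: mask digits and the letters a-d.
masked=$(printf '%s\n' "$line" | sed 's/[[:digit:]a-d]/#/g')

printf '%s\n' "$no_digits" "$squeezed" "$masked"
# Room b,  floor !
# Room 42b, floor 3!
# Room ###,  floor #!
```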

Advanced

:label places a label

b label branches to label

t label branches to label if an s/// has succeeded since the last input line was read (or since the last t)
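Before the longer example, a minimal sketch of the three commands may help. The label names (again, skip) and the sample inputs are only illustrative, and each command is passed with its own -e so that the labels are not glued to what follows them.

```shell
#!/bin/sh
# Loop with ":label" and "t": remove nested parenthesised groups.
# One s///g pass only removes groups that contain no other parentheses,
# so "t" jumps back to the label until no substitution succeeds.
nested='a(b(c)d)e'
flat=$(printf '%s\n' "$nested" | sed -e ':again' -e 's/([^()]*)//g' -e 't again')
printf '%s\n' "$flat"        # ae

# Unconditional "b": lines starting with "#" jump over the substitution.
commented=$(printf '#a\na\n' | sed -e '/^#/b skip' -e 's/a/A/g' -e ':skip')
printf '%s\n' "$commented"   # "#a" is untouched, "a" becomes "A"
```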

Clarifying example:

If you are trying to remove XML tags from a text file and you do it as in the "Not working example" in the script further down, it fails when a tag spans several lines, like:

<TimeML
id="1">

Those tags won't be removed from the text, because by default sed works on one line at a time.

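#!/bin/bash
# A tag that spans three lines; printf expands the \n escapes (plain echo in bash does not).
a=$(printf '<b>a <a>a</a> pepe <a\na="a"\n>a</a></b>')
echo "Not working example:"
echo "$a" | sed 's/<[^>]*>//g'
printf '\nWorking example:\n'
# Strip complete tags; while a "<" is still open, append the next line and retry.
echo "$a" | sed -e ':top' -e '/<.*>/{s/<[^<>]*>//g;t top' -e '}' -e '/</{N;b top' -e '}'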

If you want to see a professional sed XML tag remover see: XML.untag.sh

Remove everything in a file up to a given word, for example <TEXT> (the I flag for case-insensitive matching is a GNU sed extension):

sed -e ':mes' -e 's/.*<text>//I' -e '$!{N;b mes' -e '}'

Optimize for speed

   sed "s/foo/bar/g" filename         # standard replace command
   sed "/foo/s/foo/bar/g" filename    # executes more quickly

On line selection or deletion in which you only need to output lines
from the first part of the file, a "quit" command (q) in the script
will drastically reduce processing time for large files. Thus:

   sed -n "45,50p" filename           # print line nos. 45-50 of a file
   sed -n "51q;45,50p" filename       # same, but executes much faster
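A small sketch to check that the faster forms really produce the same output. It assumes the seq utility is available to stand in for a large file, and the variable names are illustrative. It also shows the empty regex //, which reuses the last regex and saves typing "foo" twice.

```shell
#!/bin/sh
# seq stands in for a large file (assumption: seq is installed).
slow=$(seq 1 100000 | sed -n '45,50p')
# Quit at line 51 instead of reading the remaining lines.
fast=$(seq 1 100000 | sed -n '51q;45,50p')
[ "$slow" = "$fast" ] && echo "same output"

# An empty regex in s/// reuses the address regex.
shortcut=$(printf 'foo foo\nbaz\n' | sed '/foo/s//bar/g')
printf '%s\n' "$shortcut"
# bar bar
# baz
```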

Commands

       d      Delete pattern space.  Start next cycle.

       D      Delete up to the first embedded newline in the pattern space. Start next cycle, but skip reading from the input if there is still data in the pattern space.

       h H    Copy/append pattern space to hold space.

       g G    Copy/append hold space to pattern space.

       x      Exchange the contents of the hold and pattern spaces.

       l      List out the current line in a "visually unambiguous" form.

       n N    Read/append the next line of input into the pattern space.

       p      Print the current pattern space.

       P      Print up to the first embedded newline of the current pattern space.
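As a rough sketch of how the hold space commands combine, the following prints the line that comes immediately before each matching line. The "ERROR" pattern and the sample input are invented for the example.

```shell
#!/bin/sh
input=$(printf 'start\nERROR one\nok\nstep\nERROR two\n')

# "h" keeps a copy of every line in the hold space. On a match, "x" swaps
# the previous line in, "p" prints it (skipped on line 1, where there is
# no previous line), and the second "x" swaps the current line back.
before=$(printf '%s\n' "$input" | sed -n -e '/ERROR/{x;1!p;x;}' -e 'h')
printf '%s\n' "$before"
# start
# step

# "G" appends the (empty) hold space, which double-spaces the input.
spaced=$(printf 'a\nb\n' | sed G)
printf '%s\n' "$spaced"
```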

Sed multiline example

#!/bin/sh
# Usage: script inputfile 'sed-command' [outputfile]
if [ "$#" -lt 2 ]
then
    exit 1
fi

# overwrite the input file if there is no 3rd argument
if [ -z "$3" ]
then
    outputfile="$1"
else
    outputfile="$3"
fi
sed -n '
# if the first line copy the pattern to the hold buffer
1h
# if not the first line then append the pattern to the hold buffer
1!H
# if the last line then ...
$ {
# copy from the hold to the pattern buffer
g
# do the search and replace
'"$2"'
# print
p
}
' "$1" > "$1.tmp"
mv -f "$1.tmp" "$outputfile"
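The core of that script is the 1h;1!H;${g;...;p;} idiom, which collects the whole input in the hold buffer so that one s/// can see the newlines. A minimal sketch of the idiom on its own, with made-up sample input:

```shell
#!/bin/sh
# Collect all lines in the hold buffer, then replace the embedded newlines.
joined=$(printf 'alpha\nbeta\ngamma\n' | sed -n '1h;1!H;${g;s/\n/, /g;p;}')
printf '%s\n' "$joined"    # alpha, beta, gamma

# A pattern that crosses a line break only matches after the lines are joined.
merged=$(printf 'foo\nbar\nbaz\n' | sed -n '1h;1!H;${g;s/foo\nbar/foobar/;p;}')
printf '%s\n' "$merged"
# foobar
# baz
```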

Here is a nice set of sed one-liners emulating Unix commands:

# IN UNIX ENVIRONMENT: convert DOS newlines (CR/LF) to Unix format
 sed 's/.$//'               # assumes that all lines end with CR/LF
 sed 's/^M$//'              # in bash/tcsh, press Ctrl-V then Ctrl-M
 UNIX         |  SED
 -------------+----------------------------------------------------------------
 cat          |  sed ''
 cat -s       |  sed '1s/^$//p;/./,/^$/!d'
 tac          |  sed '1!G;h;$!d'
 grep         |  sed '/patt/!d'
 grep -v      |  sed '/patt/d'
 head         |  sed '10q'
 head -1      |  sed 'q'
 tail         |  sed -e ':a' -e '$q;N;11,$D;ba'
 tail -1      |  sed '$!d'
 tail -f      |  sed -u '/./!d'
 cut -c 10    |  sed 's/\(.\)\{10\}.*/\1/'
 cut -d: -f4  |  sed 's/\(\([^:]*\):\)\{4\}.*/\2/'
 tr A-Z a-z   |  sed 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/'
 tr a-z A-Z   |  sed 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/'
 tr -s ' '    |  sed 's/ \+/ /g'
 tr -d '\012' |  sed 'H;$!d;g;s/\n//g'
 wc -l        |  sed -n '$='
 uniq         |  sed '$!N;/^\(.*\)\n\1$/!P;D'
 rev          |  sed '/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//'
 basename     |  sed 's,.*/,,'
 dirname      |  sed 's,[^/]*$,,'
 xargs        |  sed -e ':a' -e '$!N;s/\n/ /;ta'
 paste -sd:   |  sed -e ':a' -e '$!N;s/\n/:/;ta'
 cat -n       |  sed '=' | sed '$!N;s/\n/ /'
 grep -n      |  sed -n '/patt/{=;p;}' | sed '$!N;s/\n/:/'
 cp orig new  |  sed 'w new' orig
 hostname -s  |  hostname | sed 's/\..*//'
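A few rows of the table can be checked quickly on a three-line sample. This is only a sketch with made-up input; it exercises the tac, wc -l and basename rows.

```shell
#!/bin/sh
input=$(printf 'one\ntwo\nthree\n')

# tac: append the hold space to each line, save the result, print only at the end.
reversed=$(printf '%s\n' "$input" | sed '1!G;h;$!d')
# wc -l: "=" prints the line number, here only for the last line.
count=$(printf '%s\n' "$input" | sed -n '$=')
# basename: delete everything up to the last slash.
base=$(printf '%s\n' '/usr/local/bin/sed' | sed 's,.*/,,')

printf '%s\n' "$reversed" "$count" "$base"
# three
# two
# one
# 3
# sed
```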