Cognitionis
The little I know

Perl Basic Kit


Perl is a high-level, general-purpose, interpreted, dynamic programming language. It was originally developed in 1987 by Larry Wall, a linguist by training who was then working as a programmer at Unisys, as a general-purpose Unix scripting language for report processing. Perl borrows features from other programming languages including C, shell scripting (sh), AWK, sed and Lisp. The language provides powerful text processing facilities without the arbitrary data length limits of many contemporary Unix tools, which makes it easy to manipulate text files.

Version history

5.8.8, 5.10 Not the newest, but the most widely used versions and reasonably up to date (use strict…)

6 Expected to open a new era for Perl by making the language less cryptic (it was renamed Raku in 2019).

Installation:

1. Download the latest Perl release.
2. sh Configure -de -Dprefix=/home/user/perl-version
3. make
4. make test (optional)
5. make install

Install the desired modules: cpan -i Module::Name (e.g. cpan -i XML::Parser, and follow all instructions). See the CPAN section of this page.

Example:

#!/usr/bin/perl
print "Hello, world!\n";

Use #!/usr/bin/perl -w (or use warnings;) to enable warnings (better error localization, recommended).

Add use strict; to make perl stricter about variable declarations: every variable must be declared (with my or our) before it is used.
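As a small sketch of what strict buys you (the variable names here are made up for illustration), the script below declares its variable with my and then uses a string eval to show that an undeclared variable is rejected at compile time:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $greeting = "Hello, world!";   # declared with my, so strict is satisfied
print "$greeting\n";

# Compile a snippet that assigns to an undeclared variable. Under strict this
# fails at compile time, and the string eval traps the error in $@.
my $ok    = eval q{ use strict; $undeclared = 1; 1 };
my $error = $@;
print "strict rejected the snippet: $error" unless $ok;
```

The error message mentions the offending variable name, which is usually enough to spot a typo.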
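If you plan to use UTF-8 characters, be careful: even if your file encoding is UTF-8, you may still need use utf8; (it tells perl that the source code itself is UTF-8). Older code also used use encoding 'utf8'; which is now deprecated; for I/O, use open qw(:std :utf8); or binmode is the safer choice.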

---> See the excellent PDF on using Unicode (UTF-8) in Perl, in Dropbox/Programing/Perl
Very important: http://perldoc.perl.org/perlunicode.html

lc() also lowercases UTF-8 characters (ÀÑ -> àñ), provided the encoding is declared correctly.
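A minimal sketch of the lc() case, assuming the script file itself is saved as UTF-8 (the variable names are illustrative):

```perl
use strict;
use warnings;
use utf8;                               # the source contains UTF-8 literals

binmode(STDOUT, ':encoding(UTF-8)');    # encode characters on output

my $upper = "ÀÑ";
my $lower = lc $upper;                  # Unicode-aware because of use utf8
print "$lower\n";
```

Without use utf8 the literal would be treated as raw bytes, and lc() would leave the accented letters untouched.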

Advantages: Good performance, powerful regexps and text I/O handling, short code.
Disadvantages: Difficult to learn, cryptic to read.

Regular expressions

Regular expressions are mostly like those in sed.

// match
s/// replace

A more readable alternative is to use curly braces {} as delimiters:
m{} match
s{}{} replace

POSIX

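[[:alpha:]]  letters only (no exact shorthand; \w also matches digits and _)
[[:digit:]]  \d
[[:alnum:]]  letters and digits (\w without the underscore)
[[:punct:]]  punctuation (no shorthand)
[[:blank:]]  space and tab (closest shorthand: \h, Perl 5.10+)
[[:space:]]  \s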
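To make the difference between the POSIX classes and the shorthands concrete, here is a short sketch that counts the matches of each class in a sample string (the string is an arbitrary example):

```perl
use strict;
use warnings;

my $sample = "ab_1 \t!";

my @alpha = $sample =~ /([[:alpha:]])/g;   # letters only
my @word  = $sample =~ /(\w)/g;            # letters, digits and underscore
my @punct = $sample =~ /([[:punct:]])/g;   # punctuation, including _
my @blank = $sample =~ /([[:blank:]])/g;   # space and tab

printf "alpha=%d word=%d punct=%d blank=%d\n",
    scalar @alpha, scalar @word, scalar @punct, scalar @blank;
```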

options:

    m  Multiline mode - ^ and $ match internal lines
    s  match as a Single line - . matches \n
    i  case-Insensitive
    x  eXtended legibility - free whitespace and comments
    p  Preserve a copy of the matched string -
       ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be defined.
    o  compile pattern Once
    g  Global - all occurrences
    c  don't reset pos on failed matches when using /g

i ignore case
g global (more than one occurrence)
o compile only once regular expressions that contain variables (improves performance)
s includes \n as a character matched by the . operator
m multiline matching
x lets you write a long regular expression across several lines, ignoring blanks and newlines. Example:

    s{
       <a \s+ href="([^"]+)">
        (.*?)
       </a>
     }
     {link=$1, text=$2}xms;
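The substitution above can be tried on a sample string as follows; the HTML snippet and variable names are assumptions made for the demonstration:

```perl
use strict;
use warnings;

my $html = 'See <a href="http://example.org/">the site</a> now';

# Copy the string, then rewrite the link in the copy.
(my $out = $html) =~ s{
    <a \s+ href="([^"]+)">   # opening tag, capture the URL
    (.*?)                    # link text, non-greedy
    </a>
}{link=$1, text=$2}xms;

print "$out\n";   # See link=http://example.org/, text=the site now
```

With /x, whitespace and # comments inside the pattern are ignored, which is why the literal space in the tag has to be written as \s+.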

Backreference inside the pattern:     \groupnumber
Backreference in the replacement:     $groupnumber
Example:
s/(\d\d?([\.\/-])\d\d?\2\d\d(\d\d)?)/<TIMEX3 type=\"DATE\">$1<\/TIMEX3>/g

Use non-capturing groups, (?:...), whenever capturing is not needed.
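As a sketch, the date example can be run like this. It uses {} delimiters to avoid escaping the slashes and a non-capturing group for the optional century digits; the input sentence is invented:

```perl
use strict;
use warnings;

my $text = "Due 12/05/2010, not 1-2-99.";

# \2 forces the second separator to be the same character as the first.
(my $tagged = $text) =~
    s{(\d\d?([./-])\d\d?\2\d\d(?:\d\d)?)}{<TIMEX3 type="DATE">$1</TIMEX3>}g;

print "$tagged\n";
```

A string such as 12/05-2010 is left alone, because the backreference \2 requires both separators to match.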

Review:

http://perldoc.perl.org/perlreref.html

http://www.regular-expressions.info/

Perl Basics

Useful Perl cheat sheet: perldoc perlcheat
my: declares lexical (local) variables. If you included use strict at the top of your program,
perl checks that every variable is declared, usually with my. Variables made private
with my exist only within the enclosing block (curly braces). A subroutine body is a block,
so its my variables exist only within the body of the subroutine.

sub: subroutine (function)
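A minimal sketch of my scoping inside a subroutine; bump is a hypothetical function used only for illustration:

```perl
use strict;
use warnings;

my $count = 10;                # lexical variable at file scope

sub bump {
    my $count = shift;         # a different variable, private to this sub
    return $count + 1;
}

my $result = bump(1);
print "$result $count\n";      # 2 10
```

The outer $count is not touched, because the my inside the subroutine creates a new variable that disappears when the block ends.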

A variable is marked by a sigil: ($) for scalars, (@) for arrays, or (%) for hashes.
An array is a variable that holds an ordered list of values, and hashes are associative arrays.
my $somenumber = 4;
my $myname = "some string";
my @array = ("value00","value01","value02");

my %hash = ("Quarter", 25, "Dime", 10, "Nickel", 5);
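Reading the values back uses the $ sigil for single elements. A short sketch reusing the variables above:

```perl
use strict;
use warnings;

my @array = ("value00", "value01", "value02");
my %hash  = (Quarter => 25, Dime => 10, Nickel => 5);   # => is a "fat comma"

my $second = $array[1];        # single element: $ sigil, index from 0
my $length = scalar @array;    # number of elements
my $dime   = $hash{Dime};      # single hash value: $ sigil, {} subscript
my @coins  = sort keys %hash;  # keys come back in no particular order

print "$second $length $dime @coins\n";
```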

To use global (package) variables under "use strict" you have to declare them once per package, either with:

 use vars qw($scalar %hash @array);

or, since Perl 5.6, with our ($scalar, %hash, @array);
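A minimal sketch using our, the modern equivalent of use vars; the names $total and add are made up for the example:

```perl
use strict;
use warnings;

our $total = 0;        # package variable, visible in every sub of this package

sub add {
    $total += shift;   # no redeclaration needed inside the sub
    return $total;
}

add(2);
add(3);
print "$total\n";      # 5
```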

while (<>)  # read input
The <> operator returns false only once, at the end of the input. If you call it again after that, it assumes you are processing another @ARGV list, and if you haven't set @ARGV, it reads from STDIN.
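A self-contained sketch of the <> loop. To avoid depending on real input, it writes a temporary file with File::Temp and puts its name in @ARGV; in a normal script the file names would come from the command line:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small sample file so the example does not need external input.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "alpha\nbeta\n";
close $tmp;

@ARGV = ($path);          # <> reads the files named in @ARGV, or STDIN if empty

my @numbered;
while (my $line = <>) {   # perl adds an implicit defined() test here
    chomp $line;
    push @numbered, sprintf("%d: %s", $., $line);   # $. is the line number
}
print "$_\n" for @numbered;
```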

Execute external Commands (system)

There are many ways to execute external commands from Perl. The most common are:

  • system function
  • exec function
  • backticks (``) operator. $output=`ls -la` (like bash scripts)
  • open function (most powerful, you can capture even STDERR)

All of these methods behave differently, so you should choose which one to use depending on your particular need. In brief, these are the recommendations:

method      use if …
system()    you want to execute a command and don’t want to capture its output
exec        you don’t want to return to the calling perl script
backticks   you want to capture the output of the command
open        you want to pipe the command (as input or output) to your script
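The sketch below exercises three of the four methods. It assumes a Unix-like system with an echo command, and it uses $^X (the path of the running perl) as the child program so that no other tool is required:

```perl
use strict;
use warnings;

# system: run a command; only the wait status comes back.
my $status    = system($^X, '-e', 'exit 3');
my $exit_code = $status >> 8;          # the exit value is in the high byte

# backticks: capture the standard output of the command.
my $output = `echo hello`;
chomp $output;

# open with a pipe: read the output of the command line by line.
open(my $pipe, '-|', $^X, '-e', 'print qq{a\nb\n}')
    or die "cannot start child: $!";
my @piped = <$pipe>;
close $pipe;

print "exit=$exit_code output=$output lines=", scalar(@piped), "\n";
```

exec is left out because it replaces the current process, so nothing after it would run.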

XML and PERL (GOOD COMBINATION)

http://perl-xml.sourceforge.net/faq/

www.xml.com (perl)

SPECIAL VARIABLES

$_    default variable
$0    program name
$/    input separator
$\    output separator
$|    autoflush
$!    sys/libcall error
$@    eval error
$$    process ID
$.    line number
@ARGV command line args
@INC  include paths
@_    subroutine args
%ENV  environment
Global Special Variables
There are quite a few variables that are global in the fullest sense — they mean the same thing in every package. If you want a private copy of one of them, you must localize it in the current block.
Each entry lists the variable, its contents, and a mnemonic.

$_ The default input and pattern-searching space. The following pairs are equivalent:

while (<>) {…     # equivalent only in while!
while ($_ = <>) {…

/^Subject:/
$_ =~ /^Subject:/

y/a-z/A-Z/
$_ =~ y/a-z/A-Z/

chop
chop($_)

Mnemonic: underline is understood to be underlying certain undertakings.
$. The current input line number of the last filehandle that was read. Remember that only an explicit close on the filehandle resets the line number. Mnemonic: many programs use . to mean the current line number.
$/ The input record separator, newline by default. $/ may be set to a value longer than one character in order to match a multi-character delimiter. If $/ is undefined, no record separator is matched, and <FILEHANDLE> will read everything to the end of the current file. Mnemonic: / is used to delimit line boundaries when quoting poetry. Or, if you prefer, think of mad slashers cutting things to ribbons.
$\ The output record separator for the print operator. Mnemonic: you set $\ instead of adding \n at the end of the print.
$, The output field separator for the print operator. Mnemonic: what is printed when there is a , in your print statement.
$" This is similar to $, except that it applies to array values interpolated into a double-quoted string (or similar interpreted string). Default is space. Mnemonic: obvious, I think.
$# The output format for numbers displayed via the print operator (removed in Perl 5.10). Mnemonic: # is the number sign.
$$ The process number of the Perl running this script. Mnemonic: same as shells.
$? The status returned by the last pipe close, backtick (``) command or system operator. Note that this is the status word returned by the wait() system call, so the exit value of the subprocess is actually ($? >> 8). $? & 255 gives which signal, if any, the process died from, and whether there was a core dump. Mnemonic: similar to sh and ksh.
$* Set to 1 to do multi-line matching within a string, 0 to tell Perl that it can assume that strings contain a single line, for the purpose of optimizing pattern matches. Default is 0. Removed in Perl 5.10; use the /m and /s modifiers instead. Mnemonic: * matches multiple things.
$0 Contains the name of the file containing the Perl script being executed. Depending on your OS, it may or may not include the full pathname. Mnemonic: same as sh and ksh.
$[ The index of the first element in an array, and of the first character in a substring. Deprecated in modern Perl; leave it at 0. Mnemonic: [ begins subscripts.
$] The first part of the string printed out when you say perl -v. It can be used to determine at the beginning of a script whether the Perl interpreter executing the script is in the right range of versions. If used in a numeric context, $] returns version + patchlevel/1000. Mnemonic: is this version of Perl in the "right bracket"?
$; The subscript separator for multi-dimensional array emulation. If you refer to an associative array element as:
$foo{$a,$b,$c}
it really means:
$foo{join($;, $a, $b, $c)}
but don’t write
@foo{$a,$b,$c}
which means
($foo{$a},$foo{$b},$foo{$c})
Mnemonic: comma (the syntactic subscript separator) is a semi-semicolon. Yeah, it’s pretty lame, but $, is already taken for something more important.
$! If used in a numeric context, yields the current value of errno, with all the usual caveats. (This means that you shouldn’t depend on the value of $! to be anything in particular unless you’ve gotten a specific error return indicating a system error.) If used in a string context, yields the corresponding system error string. Mnemonic: what just went bang?
$@ The Perl syntax error or routine error message from the last eval, do-FILE, or require command. If set, either the compilation failed, or the die function was executed within the code of the eval. Mnemonic: where was the syntax error at?
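A few of these variables in action, as a sketch. The data is invented, and the in-memory filehandle (opening a reference to a string) is only there to keep the example self-contained:

```perl
use strict;
use warnings;

my @coins = ('dime', 'nickel');

# $" is the separator used when an array is interpolated into a string.
my $joined;
{
    local $" = '-';            # localized: restored at the end of the block
    $joined = "@coins";
}

# $; joins the subscripts of an emulated multi-dimensional hash key.
my %grid;
$grid{1, 2} = 'x';
my $same = exists $grid{ join($;, 1, 2) } ? 1 : 0;

# Undefining $/ makes <FILEHANDLE> read everything in one go.
open(my $fh, '<', \"one\ntwo\n") or die "cannot open in-memory file: $!";
my $all = do { local $/; <$fh> };
close $fh;

# $@ holds the message of the last failed eval.
my $ok  = eval { die "boom\n"; 1 };
my $err = $@;

print "$joined $same ", length($all), " $err";
```

Using local keeps the change to a global special variable confined to the enclosing block.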
Local Special Variables
These variables are always local to the current block, so you never need to mention them in a local(). All of them are associated with the last successful pattern match.
Each entry lists the variable, its contents, and a mnemonic.
$1..$9 Contains the subpattern from the corresponding set of parentheses in the last pattern matched. Mnemonic: like \1..\9.
$& Contains the string matched by the last pattern match. Mnemonic: like & in some editors.
$` The string preceding whatever was matched by the last pattern match, not counting patterns matched in nested blocks that have been exited already. Mnemonic: ` often precedes a quoted string in normal text.
$' The string following whatever was matched by the last pattern match, not counting patterns matched in nested blocks that have been exited already. For example:
$_ = 'abcdefghi';
/def/;
print "$`:$&:$'\n";    # prints abc:def:ghi
Mnemonic: ' often follows a quoted string in normal text.
$+ The last bracket matched by the last search pattern. This is useful if you don’t know which of a set of alternative patterns matched. For example:
/Version: (.*)|Revision: (.*)/ && ($rev = $+);
Mnemonic: be positive and forward looking.
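The two examples above can be run as a small script. This is a sketch with invented input strings; the values are copied into ordinary variables right after the match, since the next successful match overwrites them:

```perl
use strict;
use warnings;

my $string = 'abcdefghi';
my ($pre, $match, $post);
if ($string =~ /def/) {
    ($pre, $match, $post) = ($`, $&, $');   # copy before the next match
}

my $rev;
if ('Revision: 7' =~ /Version: (.*)|Revision: (.*)/) {
    $rev = $+;                               # whichever group actually matched
}

print "$pre:$match:$post rev=$rev\n";
```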

Perl Data Structures (@arrays, %hashes)

See Perl Data Structures CookBook (pdf)

@array = ("a", "b", "c");

$#array -> 2 (index of the last element; scalar(@array) gives the number of elements, 3)

$array[1] -> b

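shift: remove and return the first element

pop/push: remove or append an element at the end

splice: remove or replace a range of elements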
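A short sketch of these array functions, with the array contents after each step noted in the comments:

```perl
use strict;
use warnings;

my @array = ('a', 'b', 'c');
my $last_index = $#array;                        # 2

my $first  = shift @array;                       # 'a'; array is now (b, c)
push @array, 'd';                                # (b, c, d)
my $popped = pop @array;                         # 'd'; array is now (b, c)

# splice(ARRAY, OFFSET, LENGTH, LIST): remove LENGTH elements at OFFSET
# and put LIST in their place.
my @removed = splice(@array, 0, 1, 'x', 'y');    # removes 'b'; (x, y, c)

print "$last_index $first $popped @removed | @array\n";
```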

references and values… arrays of arrays

print "\t[ @{$array[$n]} ]\n";

note: the outer [] in the print are important.
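A minimal array-of-arrays sketch around that print statement; the data and the value of $n are invented:

```perl
use strict;
use warnings;

# Each element of @array is a reference to an anonymous array.
my @array = ([1, 2, 3], ['a', 'b']);

my $n    = 1;
my $line = "\t[ @{$array[$n]} ]\n";   # dereference row $n inside the string
my $cell = $array[0][2];              # row 0, column 2

print $line;
print "$cell\n";
```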

Control Sequences

while

for

last -> break

next -> continue

redo -> restart the current iteration without re-evaluating the loop condition (e.g. to start over after a splice…)
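The three loop controls can be sketched as follows; the numbers and the retry limit are arbitrary:

```perl
use strict;
use warnings;

my @seen;
for my $i (1 .. 10) {
    next if $i % 2;       # skip odd numbers (continue in C)
    last if $i > 6;       # leave the loop (break in C)
    push @seen, $i;
}

# redo repeats the same iteration without moving on to the next element.
my $tries = 0;
for my $item ('x') {
    $tries++;
    redo if $tries < 3;
}

print "@seen tries=$tries\n";   # 2 4 6 tries=3
```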

Install MODULES from CPAN

1. Install the packages that CPAN depends on
sudo apt-get install build-essential

2. Invoke the cpan command as a normal user

$ cpan

Once you press Enter to run "cpan", you will
be asked a few questions. To keep things simple,
answer "no" to the first question so that
the remaining ones are configured for you automatically.

3. Once the above is done, you will be presented with the cpan
prompt. Now enter the commands below

make install

install Bundle::CPAN

4. Now everything is set and you can install any Perl module you want. Examples of what I installed are below

cpan prompt>  install  IO::File
cpan prompt>  install  Net::SMTP_auth
cpan prompt>  install  Email::MIME::Attachment::Stripper
cpan prompt>  install  Mail::POP3Client
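To check from a script whether a module ended up installed, one option is to try loading it. module_available below is a hypothetical helper written for this sketch, not part of CPAN:

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded, else 0.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # IO::File -> IO/File.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module ('IO::File', 'Net::SMTP_auth') {
    printf "%-16s %s\n", $module,
        module_available($module) ? 'installed' : 'missing';
}
```

IO::File ships with Perl itself, so it should always be reported as installed; the others depend on what was installed from CPAN.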