Perl and BioPerl are important tools for biological researchers working in computational biology: Perl is a programming language, and BioPerl is a collection of Perl modules for biological data. The programs in this book provide basic knowledge of both theory and practical application. The introduction covers the basics of Perl and how to run programs on Windows.
Contents
Topic
Chapter 1: PERL
Chapter 2: PERL installation and execution (windows)
Program 1: Concatenation
Program 2: Reverse complement
Program 3: Game demonstrating artificial intelligence
Program 4: DNA Execution using package
Program 5: Array
Program 6: Reference variable
Program 7: String at specific location
Program 8: Arrow operator
Program 9: Hash
Program 10: Motif search using subroutines
Program 11: Subroutines using array
Program 12: 2D Matrix
Program 13: Matrix using for loop
Program 14: Transcription
Program 15: ORF Search
Program 16: Regular expression
Program 17: Pattern Search
Program 18: Mutation
Program 19: Mutation Percent
Program 20: Translation program using PERL
Program 21: Translation using Beginperlbioinfo.pm module
Program 22: Counting number of bases
Program 23: Calculating GC%
Program 24: Writing Geneticcode.pm
Program 25: Translation using Geneticcode.pm
Program 26: Calculating the length, total nucleotides and GC and AT counts
Chapter 3: CPAN
Chapter 4: BioPERL
Program 27: Creating a BioPERL module and program for translation
Program 28: Translation using BioPERL
Data 1: Blast annotated file
Program 29: Parsing BLAST
Program 30: Run query sequence from remote BLAST
Program 31: Conversion of sequence to FASTA format
Program 32: Minimum sequences in string
Program 33: Shorten IDs in FASTA format from databases
Program 34: Motif search
Program 35: Simple alignment of sequences
Program 36: Restriction enzymes
CHAPTER 1 PERL
1. PERL stands for Practical Extraction and Report Language
2. Perl was developed by Larry Wall in the late 1980s. It builds on sh, awk, sed and C.
3. PERL is a high level programming language
4. Perl is a language for scanning arbitrary text files, extracting and analyzing information from them, and printing reports based on that data.
5. Perl interpreters are available for the Macintosh, Windows, UNIX and LINUX operating systems. The Perl interpreter has a list of directories in which it searches for modules.
6. Perl has grown into a general-purpose programming language.
7. Perl programming supports procedural, structural, functional and Object Oriented Programming (OOPs) paradigms
8. The Perl homepage is http://www.perl.org
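9. A Perl script conventionally begins with the following lines:
#!/usr/bin/perl
use strict;
use warnings;
The first line (the "shebang" line) is a comment to Perl itself, but on Unix-like systems it names the interpreter to run and so identifies the file as a Perl script. The use strict and use warnings lines turn on stricter checking and diagnostic messages.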
10. The file extension in PERL is .pl (like <File name>.pl).
11. A script consists of a series of commands, called statements, that are interpreted by the Perl interpreter.
12. The special character sequence \n prints out as a newline character.
13. Variables provide temporary storage for numbers, strings, and other values. The name of a scalar variable is preceded by a dollar sign ($).
14. Data types: Variables that hold a single value are called scalar variables. Arrays and hashes are two types of variables that can hold multiple values.
15. Arithmetic Operators
+ addition
- subtraction
* multiplication
/ division
% modulus
** exponentiation
() grouping
+ Positive sign
- Negative sign
++ Autoincrement operator
-- Autodecrement operator
16. Logical Operators
|| or Logical OR
&& and Logical AND
! not Logical NOT, i.e. a negation
xor Logical XOR. Exclusive OR
17. Bitwise Operators
<< Binary shift left
>> Binary shift right
& Bitwise AND
| Bitwise OR
^ Bitwise XOR
~ Bitwise NOT
18. Equality Operators
== equal (numeric comparison)
!= not equal (numeric comparison)
eq equal (stringwise comparison)
ne not equal (stringwise comparison)
19. String Manipulation Operators
x String repetition operator
. String concatenation operator
20. Expression Description
++$var Prefix increment
$var++ Postfix increment
--$var Prefix decrement
$var-- Postfix decrement
21. Assignment Operators
= Assignment operator
+= -= *= /= %= **= Arithmetic manipulation with assignment
.= x= String manipulation with assignment
&&= ||= Logical manipulation with assignment
&= |= ^= <<= >>= Bitwise manipulation with assignment
22. Operator Precedence and Associativity (listed from highest to lowest precedence)
left Terms and list operators (leftward)
left ->
nonassoc ++ --
right **
right ! ~ \ + - (unary)
left =~ !~
left * / % x
left + - .
left << >>
nonassoc named unary operators
nonassoc < > <= >= lt gt le ge
nonassoc == != <=> eq ne cmp
left &
left | ^
left &&
left ||
nonassoc .. ...
right ?:
right = += -= *= etc. (assignment operators)
left , =>
nonassoc List operators (rightward)
right not
left and
left or xor
23. In double-quoted strings, a variable is replaced by its contents, a process called string interpolation.
24. Interpolation is applied only once: the value of the variable is inserted as it is and is not itself scanned for further variables.
25. To read input data, use the angle bracket operator (<>).
26. chomp removes the trailing newline from a string; chop removes the last character, whatever it is.
27. Open a file for reading using the open function. The first argument is the name of the filehandle (here MYFILE) and the second is the file to open, e.g. open MYFILE, 'C:\<filename>.txt';
28. Filehandle can be closed using the close function.
29. Conditional/ Numeric/String Operators
[illustration not visible in this excerpt]
30. Loops
while(CONDITION) {BLOCK}
until(CONDITION) {BLOCK}
for(INITIALIZATION ; CONDITION ; RE-INITIALIZATION ) {BLOCK}
foreach VAR (LIST) {BLOCK}
for VAR (LIST) {BLOCK}
do {BLOCK} while (CONDITION)
do {BLOCK} until (CONDITION)
31. Conditional statements support decision making: they specify one or more conditions to be tested by the program and the code to run depending on the result.
[illustration not visible in this excerpt]
[illustration not visible in this excerpt]
32. A pattern match is a special type of text comparison and the pattern description language is known as a regular expression.
33. Regular Expression Metacharacters
[illustration not visible in this excerpt]
34. Regular Expression Quantifiers
[illustration not visible in this excerpt]
35. Lists are a set of constants or variables enclosed in parentheses.
36. The split and join functions transform strings into arrays and join the elements of arrays back into strings.
37. Hashes, like arrays, hold multiple values, but each value is looked up by a key rather than by a numeric position. The elements of a hash are unordered.
38. The indexes of a hash are called its keys. Calling the keys function produces a list of all the keys in the hash.
39. Subroutines define customized functions that take arguments and return a result.
40. References allow data structures like lists of lists to be created in memory.
41. Modules are libraries of reusable code.
42. Pipes and processes allow for the control of external programs.
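Several of the points above (string interpolation, chomp and chop, split and join, and hash keys) can be sketched in one short script. This is only an illustration under stated assumptions: the gene name BRCA1, the variable names and the complement hash are example values chosen here, not taken from the programs in this book.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Points 23-24: double quotes interpolate, single quotes do not.
my $gene   = 'BRCA1';               # example value (assumption)
my $double = "Gene: $gene";         # variable is replaced by its contents
my $single = 'Gene: $gene';         # kept literally, no interpolation

# Point 26: chomp removes a trailing newline only;
# chop removes the last character, whatever it is.
my $line = "ATGC\n";
chomp $line;                        # now "ATGC"
my $chopped = $line;
chop $chopped;                      # now "ATG"

# Point 36: split a string into an array, then join it back.
my @bases  = split //, $line;       # one element per character
my $joined = join '-', @bases;

# Points 37-38: a hash maps keys to values; keys returns them
# in no fixed order, so they are sorted here for a stable listing.
my %complement  = (A => 'T', T => 'A', G => 'C', C => 'G');
my @sorted_keys = sort keys %complement;

print "$double\n$single\n$line $chopped\n$joined\n@sorted_keys\n";
```

Printing $single shows the text $gene unchanged, because interpolation is not applied a second time to the contents of a variable.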
CHAPTER 2 PERL INSTALLATION AND EXECUTION (WINDOWS)
1. Download PERL software
2. Install or copy the folder to a specific drive (e.g. C:\)
3. Open command prompt (cmd) from start menu
[illustration not visible in this excerpt]
4. Set the path
[illustration not visible in this excerpt]
5. Type the edit command and press Enter
[illustration not visible in this excerpt]
6. Type the program
[illustration not visible in this excerpt]
Alternatively, open Notepad and type the program code
[illustration not visible in this excerpt]
7. Go to the File menu and save the program in the C:\perl\bin folder as <filename>.pl
[illustration not visible in this excerpt]
8. Select File and click Exit
[illustration not visible in this excerpt]
9. Type perl hello.pl at the command prompt to execute the program
[illustration not visible in this excerpt]
10. The output will be displayed
[illustration not visible in this excerpt]
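The steps above run a file called hello.pl, but its listing is not visible in this excerpt. A minimal sketch of what such a file might contain is given below; the greeting text and the subroutine name greeting are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Minimal script: save as hello.pl and run with "perl hello.pl".
# The subroutine name and message are illustrative only.
sub greeting {
    return "Hello, World!\n";
}

print greeting();
```

Saved in the folder used in step 7, the script is started from the command prompt exactly as in step 9.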
Program 1: Concatenation
Scalar variables are denoted with the symbol $. Concatenation is the process of joining two or more strings (held in variables) into a single string. The present program combines two strings (sets of characters) in three ways: by interpolating both variables in a double-quoted string, by using the dot (.) operator, and by passing both variables to print as a list.
Program
$DNA1 = 'ATGTACACTAC';
$DNA2 = 'GATCATGT';
$DNA3 = "$DNA1$DNA2";
print "first string: $DNA1 \n";
print "second string: $DNA2 \n";
print "concatenation of two strings (version 1):\n\n";
print "$DNA3\n\n";
$DNA3 = $DNA1 . $DNA2;
print "concatenation of two strings (version 2):\n\n";
print "$DNA3\n\n";
print "concatenation of two strings (version 3):\n\n";
print $DNA1, $DNA2, "\n";
exit;
Execution
illustration not visible in this excerpt
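Two further ways of joining the same strings, the join function and the .= assignment operator from the operator list in Chapter 1, might be sketched as follows. The variable names $joined and $appended are introduced here for illustration.

```perl
use strict;
use warnings;

my $DNA1 = 'ATGTACACTAC';
my $DNA2 = 'GATCATGT';

# join concatenates a list of strings with a separator (empty here).
my $joined = join '', $DNA1, $DNA2;

# .= appends to an existing string in place.
my $appended = $DNA1;
$appended .= $DNA2;

print "$joined\n";
print "$appended\n";
print "length: ", length($joined), "\n";
```

Both approaches produce the same string as the three versions in Program 1; .= is convenient when a sequence is built up piece by piece, for example inside a loop.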
Program 2: Reverse complement
Reverse complementation is the process of reversing a DNA string and substituting each base with its complement: A with T, T with A, C with G and G with C. DNA consists of four bases, A, T, G and C, and forms a double helix of two strands: one running in the 5’ to 3’ direction and a complementary strand running 3’ to 5’. The reverse complement of one strand therefore gives the sequence of the other strand read 5’ to 3’, which makes it a basic operation in sequence analysis. The program first shows a wrong method, in which successive substitutions undo each other, and then the right method using the tr operator.
Program
$DNA = 'TCATCGTC';
print "DNA:\n\n";
print "$DNA\n\n";
# Reverse the sequence first.
$revcom = reverse $DNA;
print "WRONG METHOD reverse complement DNA:\n\n";
# Successive substitutions undo each other: every A becomes T, then
# every T (old and new) becomes A, and likewise for G and C.
$revcom =~ s/A/T/g;
$revcom =~ s/T/A/g;
$revcom =~ s/G/C/g;
$revcom =~ s/C/G/g;
print "$revcom\n\n";
print "RIGHT METHOD reverse complement DNA:\n\n";
# tr translates all characters in a single pass, so no base is converted twice.
$revcom = reverse $DNA;
$revcom =~ tr/ACGTacgt/TGCAtgca/;
print "$revcom\n\n";
exit;
Execution
illustration not visible in this excerpt
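The two methods can also be wrapped in subroutines so that their results are easy to compare. The sketch below is an illustration only; the subroutine names revcom and revcom_wrong are hypothetical and do not come from a module used in this book.

```perl
use strict;
use warnings;

# Return the reverse complement of a DNA string (upper or lower case).
sub revcom {
    my ($dna) = @_;
    my $rc = reverse $dna;          # scalar context: reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;   # single pass, each base changed once
    return $rc;
}

# The step-by-step substitution approach, kept to show how it fails.
sub revcom_wrong {
    my ($dna) = @_;
    my $rc = reverse $dna;
    $rc =~ s/A/T/g;   # every A becomes T
    $rc =~ s/T/A/g;   # every T, old and new, becomes A
    $rc =~ s/G/C/g;   # every G becomes C
    $rc =~ s/C/G/g;   # every C, old and new, becomes G
    return $rc;
}

my $DNA = 'TCATCGTC';
print "right: ", revcom($DNA), "\n";
print "wrong: ", revcom_wrong($DNA), "\n";
```

For the sequence TCATCGTC the tr version gives GACGATGA, while the substitution version gives GAGGAAGA, a string containing only A and G. Applying revcom twice returns the original sequence, which is a quick check that the method is sound.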
Program 3: Game demonstrating artificial intelligence
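The game program is a simple demonstration presented under the heading of Artificial Intelligence (AI). There are three players, whose names are entered in the @nouns array. In each round the program prints six randomly generated entries, each made up of a player name, a number from 1 to 5, another player name and a number from 6 to 10. The players add up the numbers in the results, and the player with the highest total wins the game. Press Enter for another round, or type quit to stop.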
Program
use strict;
use warnings;
my $count;
my $input;
my $number;
my $sentence;
my $story;
my @nouns = ('name1', 'name2', 'name3');
my @verbs = ('1', '2', '3', '4', '5');
my @prepositions = ('6', '7', '8', '9', '10');
srand(time|$$);
do {
$story = '';
for ($count = 0; $count < 6; $count++) {
$sentence = $nouns[int(rand(scalar @nouns))]. " " . $verbs[int(rand(scalar @verbs))] . " " . $nouns[int(rand(scalar @nouns))] . " " . $prepositions[int(rand(scalar @prepositions))] . '. ';
$story .= $sentence;
}
print "\n",$story,"\n add numbers to win";
print "\nType \"quit\" to quit, or press Enter to continue: ";
$input = <STDIN>;
}
until($input =~ /^\s*q/i);
exit;
Execution
illustration not visible in this excerpt
Program 4: DNA Execution using package
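The program demonstrates package namespaces. The variable $dna is first assigned 'ATTTTA' in the default package (main). After the package Mouse; statement, the unqualified name $dna refers to $Mouse::dna, which is assigned 'CGGGGGC'. The first print uses the current package, which is now Mouse, and the second names the Mouse package explicitly, so both print 'CGGGGGC'. The original string is not overwritten; it remains available as $main::dna.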
Program
$dna = 'ATTTTA';
package Mouse;
$dna = 'CGGGGGC';
print "DNA from current package is:\n\n";
print $dna, "\n\n";
print "DNA from Mouse package is:\n\n";
print $Mouse::dna, "\n\n";
Execution
illustration not visible in this excerpt
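That the two packages keep separate copies of $dna can be checked by switching back to the main package and printing both fully qualified names. This is a minimal sketch; use strict is left out on purpose, as in Program 4, so that unqualified package variables can be used.

```perl
# 'use strict' is omitted deliberately so that the unqualified
# package variable $dna can be used, as in Program 4.
$dna = 'ATTTTA';          # stored as $main::dna

package Mouse;
$dna = 'CGGGGGC';         # stored as $Mouse::dna

package main;             # switch back to the default package
print "main:  $main::dna\n";
print "Mouse: $Mouse::dna\n";
print "unqualified in main: $dna\n";
```

The first and third lines of output show ATTTTA and the second shows CGGGGGC, confirming that the assignment inside package Mouse did not touch the variable in main.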
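General Program
This short program adds a directory to the module search path with use lib and prints the list of directories (@INC) in which Perl searches for modules. The path is written in single quotes so that the backslashes are taken literally; inside double quotes, sequences such as \l would be treated as escapes and the path would be corrupted.
Code
use lib 'C:\Perl\lib';
print join("\n", @INC), "\n";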
Execution
illustration not visible in this excerpt
Program 5: Array
An array is an ordered list of scalar values (strings or numbers) and is denoted with the symbol @. Strings should be enclosed in single quotes; numbers need no quotes. The values are listed inside parentheses, separated by commas. The elements are kept in order and are accessed by their index.
Program
@contigs = ('atcca', 'atggcctg', 'tgccaa');
# print the elements separated by spaces (array interpolated in double quotes)
print "@contigs", "\n";
# print the elements as one concatenated string
print @contigs, "\n";
Execution
illustration not visible in this excerpt
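A few common array operations on the same data (counting, indexing, appending and looping) might look like the sketch below. The extra contig 'ggta' is a made-up value added only to demonstrate push.

```perl
use strict;
use warnings;

my @contigs = ('atcca', 'atggcctg', 'tgccaa');

my $count = scalar @contigs;      # number of elements
my $first = $contigs[0];          # indexing starts at 0
my $last  = $contigs[-1];         # a negative index counts from the end

push @contigs, 'ggta';            # made-up contig appended at the end

# Print each contig with its length.
foreach my $contig (@contigs) {
    print $contig, "\t", length($contig), "\n";
}
print "started with $count contigs, first=$first, last=$last\n";
```

Note that a single element is written with $ (for example $contigs[0]) because it is a scalar value, while the whole array is written with @.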
Program 6: Reference variable
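A reference is a scalar value that holds the memory location (address) of another variable. Printing the reference shows this location, while dereferencing it with ${$pepref} or $$pepref gives the value stored in the variable.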
Program
$pep = 'LEAF';
$pepref = \$pep;
print "Reference is:\n";
print $pepref, "\n";
print "output of reference:\n";
print ${$pepref}, "\n";
print "reference output also can be as: \n";
print $$pepref, "\n";
Execution
illustration not visible in this excerpt
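Because a reference points at the original variable rather than at a copy, assigning through the reference changes that variable. The sketch below illustrates this together with the built-in ref function; the replacement value 'LEAFY' is chosen here only for illustration.

```perl
use strict;
use warnings;

my $pep    = 'LEAF';
my $pepref = \$pep;

# ref() reports what kind of thing the reference points to.
my $kind = ref($pepref);
print "$kind\n";

# Assigning through the reference changes the original variable.
$$pepref = 'LEAFY';            # illustrative new value
print "$pep\n";
```

After the assignment, $pep itself holds the new value even though it was never assigned to directly.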
Program 7: String at specific location
Array indexes run from 0 to n-1. In the program below, element 0 of the array contains 'atgc', element 1 contains 'ctgaa', and element 2 contains 'aattggcc'. Individual elements can be accessed through a reference to the array, as the program shows.
Program
@array = ('atgc', 'ctgaa', 'aattggcc');
$ref = \@array;
print "reference string:\n";
print $ref, "\n";
print "reference address to:\n";
print "@{$ref}\n";
print "first value in array:\n";
print ${$ref}[0], "\n";
print "second value in array:\n";
print ${$ref}[1], "\n";
Execution
illustration not visible in this excerpt
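The same reference can be used to count the elements and to visit every index in turn. The following is a minimal sketch; the names $count, $last and $i are introduced here for illustration.

```perl
use strict;
use warnings;

my @array = ('atgc', 'ctgaa', 'aattggcc');
my $ref   = \@array;

my $count = scalar @{$ref};          # number of elements
my $last  = ${$ref}[ $#{$ref} ];     # $#{$ref} is the last index (n-1)

# Visit every index from 0 to n-1 through the reference.
for my $i (0 .. $#{$ref}) {
    print "$i: ${$ref}[$i]\n";
}
print "count: $count, last: $last\n";
```

The loop makes the 0 to n-1 numbering explicit: with three elements, the indexes printed are 0, 1 and 2.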
Program 8: Arrow operator
The arrow operator (->) dereferences a reference. Applied to an array reference, $ref->[0] gives the first element of the array and is equivalent to ${$ref}[0].
Program
@array = ('atgc', 'ctgaa', 'aattggcc');
$ref = \@array;
print "reference string:\n";
print $ref, "\n";
print "reference address to:\n";
print "@{$ref}\n";
print "first value in array:\n";
print $ref->[0], "\n";
print "second value in array:\n";
print $ref->[1], "\n";
Execution
illustration not visible in this excerpt
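The arrow operator is most useful with nested data, such as the lists of lists mentioned in Chapter 1. The sketch below uses anonymous arrays and an anonymous hash; the names $records and $lengths and the sequence labels seq1 and seq2 are hypothetical.

```perl
use strict;
use warnings;

# Anonymous array of arrays: each inner array is [name, sequence].
my $records = [
    ['seq1', 'atgc'],
    ['seq2', 'ctgaa'],
];

# Anonymous hash reference mapping a name to a sequence length.
my $lengths = { seq1 => 4, seq2 => 5 };

print $records->[1]->[1], "\n";    # explicit arrows at every level
print $records->[1][1], "\n";      # the arrow between subscripts is optional
print $lengths->{seq1}, "\n";      # arrow with a hash reference
```

Square brackets after the arrow select an array element and curly braces select a hash value, so the same operator covers both kinds of reference.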