Linux basics session 6
Warm-up exercise
Create a script that outputs the current timestamp and a random line from the file words.
Example output:
2021-03-06T00:06+01:00
Anthropocene
Solution: Create a new text file for the script and start editing it: nano date_name.sh. Insert the following content:
#!/bin/bash
date -Im
shuf words | head -1
Our script expects the file "words" to be in the directory from which it is run (here, the same directory as the script). It consists of two separate statements. The second one is a pipeline: it joins two programs with a pipe, but the shell still runs it as a single unit.
Don't forget to allow execution of the script by running chmod +x date_name.sh.
Now execute the script: ./date_name.sh
What if we want the date and the random word on the same line?
We can use command substitution: the enclosed command is executed separately in a subshell, and its output can be handled as a string value.
The command whose output we want is enclosed in $().
#!/bin/bash
echo "$(date -Im) $(shuf words | head -1)"
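The output captured by $() can also be stored in a variable and reused later. A minimal sketch follows; the variable names today and word_count and the three sample words are made up for this illustration.

```bash
#!/bin/bash
# Sketch: keep the output of a command in a variable for later use.
today=$(date +%Y-%m-%d)                               # e.g. 2021-03-06
word_count=$(printf 'alpha\nbeta\ngamma\n' | wc -l)   # counts the three sample lines
echo "On $today the list has $word_count words"
```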
How do you store the result in a text file?
This task is not about what is inside our script, but about how we execute it. It behaves like any other command, so we use output redirection.
Executing ./date_name.sh > out.txt stores the output in a text file. Note that > overwrites out.txt on every run; use >> to append instead.
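The difference between overwriting (>) and appending (>>) can be tried out safely with a sketch like the following. It writes to a temporary directory created with mktemp, so no existing file is touched; the echoed texts are placeholders.

```bash
#!/bin/bash
# Sketch: '>' truncates the target file, '>>' appends to it.
tmp_dir=$(mktemp -d)
out="$tmp_dir/out.txt"

echo "first run"  >  "$out"   # creates the file
echo "second run" >  "$out"   # overwrites: the file holds one line again
echo "third run"  >> "$out"   # appends: the file now holds two lines

cat "$out"
line_count=$(wc -l < "$out")
echo "lines in file: $line_count"

rm -rf "$tmp_dir"             # clean up the temporary directory
```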
Advanced Bash: Loops
A very common construction is an endless loop in a script. It is typically used when we want to process something repeatedly until the user decides to stop. It also serves as the main loop of interactive applications.
# endless loop
while true; do
    echo 'ahoj'
    sleep 1
done
Such a program executes the command every second and displays its output. When it is run from a terminal, the prompt does not come back, because the script occupies the current shell. Stop the execution by pressing Ctrl+C.
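An endless loop can also be left from inside the script with break. The following is only a sketch: the limit max_runs and the counter count are assumptions made for this illustration, standing in for whatever stop condition a real script would check.

```bash
#!/bin/bash
# Sketch: leave a 'while true' loop from inside with 'break'.
max_runs=3   # hypothetical limit, only for this illustration
count=0
while true; do
    count=$((count + 1))
    echo "run $count"
    if [ "$count" -ge "$max_runs" ]; then
        break    # leave the loop; execution continues after 'done'
    fi
    sleep 1
done
echo "finished after $count runs"
```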
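A for loop is used when we need to perform an operation over a set of input data. For processing in bash we usually prefer one input item per line (as we know from using pipes). The example below prints the number of characters of each item. Note that $( cat items.txt ) is split on any whitespace, not only on line breaks, so this form works only when the items themselves contain no spaces.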
# for loop
for i in $( cat items.txt ); do
    # quote "$i" so the shell passes the item on unchanged
    echo -n "$i" | wc -c
done
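When lines may contain spaces, a while loop with read is a common alternative, because it takes the input strictly line by line. A minimal sketch, assuming a throwaway file with three made-up items in place of items.txt:

```bash
#!/bin/bash
# Sketch: read a file line by line so that lines containing spaces stay whole.
items=$(mktemp)
printf '%s\n' 'apple' 'blood orange' 'kiwi' > "$items"   # sample data

line_total=0
while IFS= read -r line; do
    # ${#line} expands to the number of characters in $line
    echo "${#line}"
    line_total=$((line_total + 1))
done < "$items"

rm -f "$items"
echo "processed $line_total lines"
```

With the sample data this prints 5, 12 and 4, followed by the summary line. The for loop above would have treated "blood" and "orange" as two separate items.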
Advanced Bash: Conditions
When writing scripts we usually want different behavior under different conditions.
The following example uses a numeric comparison (-lt means "less than"):
#!/bin/bash
num_a=400
num_b=200
# conditions
if [ "$num_a" -lt "$num_b" ]; then
    echo "$num_a is less than $num_b!"
else
    # the else branch also covers the case where both numbers are equal
    echo "$num_a is greater than or equal to $num_b!"
fi
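An if/else pair distinguishes only two cases. To report equal numbers separately, an elif branch can be added. In the sketch below, compare_numbers is a hypothetical helper name introduced only for this illustration.

```bash
#!/bin/bash
# Sketch: three-way numeric comparison with if / elif / else.
# compare_numbers is a made-up helper name, not part of bash.
compare_numbers() {
    if [ "$1" -lt "$2" ]; then
        echo "$1 is less than $2"
    elif [ "$1" -gt "$2" ]; then
        echo "$1 is greater than $2"
    else
        echo "$1 is equal to $2"
    fi
}

compare_numbers 400 200
compare_numbers 200 400
compare_numbers 7 7
```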
Periodic tasks: CRON
Almost every Linux machine has a mechanism called cron that lets each user separately define a schedule for running a command or script.
Display the current cron jobs with crontab -l. Edit the crontab with crontab -e (you may be asked to choose an editor if you are doing this for the first time; choose nano).
A crontab line has the following format (see the cron documentation for more details).
Every field may use * and */x:
minute (0-59 or * for all or */10 for every 10th)
/ hour (0-23)
| / day of month (1-31)
| | / month of the year (1-12)
| | | / day of the week (0-7, where 0 and 7 both mean Sunday)
| | | | /
| | | | |
#m h dom mon dow command
* * * * * /home/vasek/projects/uochb/date_name.sh > /home/vasek/projects/uochb/date_name.log
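In this example the script is executed every minute (all fields are *) and its output is written to a .log file. Because > is used, the log is overwritten on every run. Please note that you need to use absolute file paths, as cron jobs do not start in the directory you are working in (they typically start in the user's home directory).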
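A variation, as a sketch only: run the script every 10 minutes, append to the log with >> instead of overwriting it, and capture error messages as well with 2>&1. The paths are the ones used in the example above. The snippet merely prints the crontab line; it does not install anything.

```bash
#!/bin/bash
# Sketch: build and print a crontab line; nothing is installed.
script=/home/vasek/projects/uochb/date_name.sh
log=/home/vasek/projects/uochb/date_name.log

# */10 in the minute field means "every 10th minute"
entry="*/10 * * * * $script >> $log 2>&1"
echo "$entry"
```

The printed line can then be pasted into the editor opened by crontab -e.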
Advanced Python scripting
Writing scripts in bash can be tricky, as variables are mostly strings and processing structured data is complicated. Python provides a wealth of libraries and a less error-prone syntax. Python is usually preinstalled on Linux, as some system tools rely on it.
It is a good habit to check which version of Python is in use. Try the following commands:
python --version
python3 --version
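On many systems the first command points to Python 2, which reached its end of life in January 2020, so please don't use it. On newer systems the plain python command may be missing or may point to Python 3, so it is safest to call python3 explicitly.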
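Both checks can be combined into a small script that also reports when an interpreter is missing. This is a sketch using the loop and condition constructs from the sections above; the counter checked exists only for this illustration.

```bash
#!/bin/bash
# Sketch: report which Python interpreters are available on this machine.
checked=0
for name in python python3; do
    if command -v "$name" >/dev/null 2>&1; then
        # 2>&1 because Python 2 prints its version to standard error
        echo "$name -> $(command -v "$name"): $("$name" --version 2>&1)"
    else
        echo "$name is not installed"
    fi
    checked=$((checked + 1))
done
```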
** Practical exercise **
Create the reverse complement of the given sequences. The input file is called Seqs.fasta and has the following format:
>3-307633-43481.39
GGAAGATAGTGTGTCGATAAAGGGACAATGCTGAATTTCCTCCCTGAGCCGGCGCGTAGGGGGGAGTGACTTGGGATGGGGGGTG
>4-292175-41296.53
GGAAGACTTGACGAGCGCACAGCGCTTGTTCAGAATATCACCCGTCATGTGTTCGTGAAGGGGGAGTGACTTGGGATGGGGGTTC
>5-204535-28909.34
GGAAGAAATGTAGAGGAAACAGTGACTCTGCAGAATATCCTCACTGCGTAGTGGGGCAGGGGGGAGTGACTTGGGATGGGGGCTA
>6-175161-24757.57
GGCAGAGATGGCAACGTCAACATGAGGATGCCGCATATCCCCAGTGCACACTTGGGCAGTGGGGAGTGACTTGGGATGGGGGAAA
The general process of software development is: Analyse -> Design -> Implement -> Test
Analyse:
Input (read data: two values for each record, name and sequence)
Process (append " REV-COM" to the name; convert the sequence with a transformation function)
Output (print to STDOUT in the original structure)
Design:
The revcom function accepts a string, reads it in reverse order, and replaces each character with its complement according to the following rules (any other character becomes '?'):
'C' -> 'G'
'G' -> 'C'
'A' -> 'T'
'T' -> 'A'
Implementation:
#!/usr/bin/env python3
import sys


def revcom(seq):
    '''
    Function returns Reverse Complement of given input sequence
    '''
    output = ''
    # walk the sequence backwards and complement each base
    for char in seq[::-1]:
        if char == 'C': output += 'G'
        elif char == 'G': output += 'C'
        elif char == 'A': output += 'T'
        elif char == 'T': output += 'A'
        else: output += '?'  # unknown character
    return output


# --- Input ---
data = []
with open("Seqs.fasta", "r") as f:
    while True:
        # each record is two lines: a name line and a sequence line
        item = {
            'name': f.readline().rstrip('\n'),
            'seq': f.readline().rstrip('\n'),
        }
        if not item['seq']: break  # no more records
        data.append(item)
print(data[0])  # debug print

# --- Process ---
for item in data:
    item['name'] += " REV-COM"
    item['seq'] = revcom(item['seq'])
print(data[0])  # debug print
sys.exit()  # debug: stop before the full output

# --- Output ---
for item in data:
    print(item['name'])
    print(item['seq'])
Test:
The script above prints only the first record, once in the original format and once in the output format; this is for testing purposes. Once you are happy with what you see, comment out the debug prints and the exit call so that the full output is printed.
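As an independent cross-check of the Python result, the same transformation can be sketched with standard shell tools: rev reverses a line and tr swaps each base for its complement. The function name revcom_shell is made up for this illustration, and the input is the first six bases of the first sequence in the sample file.

```bash
#!/bin/bash
# Sketch: reverse complement with standard tools, for cross-checking.
# revcom_shell is a hypothetical helper name.
revcom_shell() {
    # rev reverses the line; tr maps A->T, C->G, G->C, T->A
    printf '%s\n' "$1" | rev | tr 'ACGT' 'TGCA'
}

revcom_shell 'GGAAGA'   # prints TCTTCC
```

Unlike the Python function, this sketch leaves unknown characters unchanged instead of replacing them with '?'.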