Linux basics session 6

Course overview

Warm-up exercise

Create a script that outputs the current timestamp and a random line from the file words.

Example output:

2021-03-06T00:06+01:00
Anthropocene

Solution: Create a new text file for the script and open it for editing with nano date_name.sh. Insert the following content:

#!/bin/bash
date -Im
shuf words | head -1

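Our script expects the file "words" to be in the current working directory, i.e. the directory you run the script from, because the relative name words is resolved against it. The script consists of two distinct commands; the second one is a pipeline of two programs, but it still counts as one command.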

Don't forget to make the script executable by running chmod +x date_name.sh.

Now execute the script: ./date_name.sh
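For comparison with the Python part later in this session, here is a minimal sketch of the same task in Python. Like the bash version, it assumes a file named words in the current working directory; the function name timestamp_and_word is our own, not part of any library.

```python
import random
from datetime import datetime
from pathlib import Path


def timestamp_and_word(words_path="words"):
    """Return (timestamp, random line from words_path).

    The timestamp mimics `date -Im`: ISO 8601 with minute precision and the
    local UTC offset, e.g. 2021-03-06T00:06+01:00.
    """
    lines = Path(words_path).read_text().splitlines()
    # astimezone() attaches the local time zone so the offset is included
    stamp = datetime.now().astimezone().isoformat(timespec="minutes")
    # random.choice picks one line, like `shuf words | head -1`
    return stamp, random.choice(lines)
```

Usage: stamp, word = timestamp_and_word(), then print(stamp) and print(word).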

What if we want the date and the random word on the same line?

We can use command substitution: the command inside $() is executed in a subshell, and its output (with trailing newlines removed) is substituted as a string value.

#!/bin/bash
echo "$(date -Im) $(shuf words | head -1)"
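Python has no $() syntax, but the standard subprocess module can capture a command's output as a string in a similar way. A minimal sketch; the helper name capture is hypothetical, and like $() it strips trailing newlines from the output.

```python
import subprocess


def capture(*cmd):
    """Run a command and return its standard output as a string, like $(cmd)."""
    # check=True raises CalledProcessError if the command fails
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    # $() removes trailing newlines from the output; do the same here
    return result.stdout.rstrip("\n")
```

For example, capture("date", "-Im") returns the timestamp without its newline. A pipeline such as shuf words | head -1 needs a shell, e.g. capture("sh", "-c", "shuf words | head -1").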

How do you store the result in a text file? This task is not about what is inside our script but about how we execute it. The script behaves like any other command, so we use output redirection: ./date_name.sh > out.txt stores the output in out.txt (overwriting the file; use >> to append instead).

Advanced Bash: Loops

A very common construction in scripts is the endless loop. It is used when we want to process something repeatedly until the user decides to stop, and also as the main loop of interactive applications.

# endless loop
while true; do 
    echo 'ahoj';
    sleep 1;
done

Such a program runs the command every second and displays its output. When run from a terminal, the prompt disappears because the script runs in the foreground and occupies the terminal. Stop it by pressing Ctrl+C.
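The same endless loop in Python, as a minimal sketch: the function name repeat and its times parameter are our own additions, so the loop can also be run a fixed number of times (handy for testing). Pressing Ctrl+C raises KeyboardInterrupt, which the sketch catches so the loop ends without a traceback.

```python
import time


def repeat(action, interval=1.0, times=None):
    """Call action() every `interval` seconds.

    times=None loops forever, like `while true`; Ctrl+C stops the loop
    cleanly. Returns the number of completed calls.
    """
    count = 0
    try:
        while times is None or count < times:
            action()
            count += 1
            time.sleep(interval)
    except KeyboardInterrupt:
        pass  # user pressed Ctrl+C: leave the loop quietly
    return count
```

Usage: repeat(lambda: print('ahoj')) prints ahoj every second until you press Ctrl+C.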


A for loop is used when we need to perform an operation on each item of a set of input data. For processing in bash we usually prefer one input item per line (as we know from using pipes).

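# for loop: print the number of characters of each item in items.txt
# note: $( cat items.txt ) splits on any whitespace, so items must not contain spaces
for i in $( cat items.txt ); do
    echo -n "$i" | wc -c
done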
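The same task in Python, as a minimal sketch (the function name char_counts is ours). Unlike $( cat items.txt ), which splits on any whitespace, iterating over a file yields one line at a time, so items may contain spaces. Note that len() counts characters while wc -c counts bytes, so the results differ for non-ASCII text.

```python
def char_counts(path):
    """Return (item, number of characters) for each non-empty line of path."""
    counts = []
    with open(path) as f:
        for line in f:
            item = line.rstrip("\n")
            if item:  # skip empty lines
                counts.append((item, len(item)))
    return counts
```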

Advanced Bash: Conditions

When writing scripts, we usually want different behavior depending on some condition.

The following example uses numeric comparison:

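#!/bin/bash

num_a=400
num_b=200

# conditions: -lt means "less than", -gt means "greater than"
if [ "$num_a" -lt "$num_b" ]; then
    echo "$num_a is less than $num_b!"
elif [ "$num_a" -gt "$num_b" ]; then
    echo "$num_a is greater than $num_b!"
else
    echo "$num_a is equal to $num_b!"
fi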

Periodic tasks: cron

Most Linux systems have a mechanism called cron, which allows every user to define (separately) a schedule for executing a command or script.

Display the current cron jobs with crontab -l. Edit the crontab with crontab -e (the first time, you may be asked to choose an editor; choose nano).

Each crontab line has the following format; see the cron documentation (man 5 crontab) for more details:

# every field may use * (every value) and */x (every x-th value)
# .------------------ minute (0-59; */10 means every 10th minute)
# |   .-------------- hour (0-23)
# |   |   .---------- day of month (1-31)
# |   |   |   .------ month of the year (1-12)
# |   |   |   |   .-- day of the week (0-7; both 0 and 7 mean Sunday)
# |   |   |   |   |
# m   h  dom mon dow  command
  *   *   *   *   *   /home/vasek/projects/uochb/date_name.sh > /home/vasek/projects/uochb/date_name.log

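In this example the script is executed every minute (all fields are *) and its output is written to a .log file; > overwrites the file on every run, so use >> if you want to keep a history. Note that you need to use absolute file paths, because cron does not run the job from the directory you were in when you edited the crontab (it usually starts in your home directory). The same applies inside the script: shuf words looks for words in cron's working directory, so use an absolute path there as well, or cd into the script's directory at the start of the script.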
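To make the * and */x notation concrete, here is a simplified sketch in Python that expands a single crontab field into the values it matches, given the field's allowed range (for example 0-59 for minutes). The function name expand_field is hypothetical, and it deliberately ignores ranges (1-5), lists (1,15) and names (mon, jan), which real cron also accepts.

```python
def expand_field(field, low, high):
    """Expand one crontab field into the sorted list of values it matches.

    Simplified: supports only '*', '*/step' and a single number.
    """
    if field == "*":
        return list(range(low, high + 1))
    if field.startswith("*/"):
        # */step counts from the lowest allowed value of the field
        step = int(field[2:])
        return list(range(low, high + 1, step))
    value = int(field)
    if not low <= value <= high:
        raise ValueError(f"{value} is outside {low}-{high}")
    return [value]
```

For example, expand_field("*/10", 0, 59) gives the minutes 0, 10, 20, 30, 40 and 50, while the same field for day of month (1-31) gives 1, 11, 21 and 31.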

Advanced Python scripting

Writing scripts in bash can be tricky because variables are mostly strings and processing structured data is complicated. Python offers a wealth of libraries and a less error-prone syntax. It is usually preinstalled on Linux distributions because some system tools depend on it.

It is a good habit to check which version of Python is used; try the following commands:

python --version
python3 --version

On older systems the first command points to Python 2, which reached end of life in January 2020, so please don't use it; on some newer systems the python command does not exist at all. Use python3 instead.
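Besides checking by hand, a script can verify the interpreter version itself. A minimal sketch, assuming the script needs at least Python 3.6 (an arbitrary choice for illustration); the names MINIMUM and check_version are ours. It avoids f-strings so that Python 2 can still parse the file and show the message.

```python
import sys

# assumption: the oldest version this script supports
MINIMUM = (3, 6)


def check_version(version_info=sys.version_info, minimum=MINIMUM):
    """Return True if version_info (major, minor, ...) is at least minimum."""
    return tuple(version_info[:2]) >= minimum


# stop early with a clear message instead of a confusing error later
if not check_version():
    sys.exit("Python %d.%d or newer is required" % MINIMUM)
```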

** Practical exercise **

Create the reverse complement of the given sequences. The input file is called Seqs.fasta and has the following format:

>3-307633-43481.39
GGAAGATAGTGTGTCGATAAAGGGACAATGCTGAATTTCCTCCCTGAGCCGGCGCGTAGGGGGGAGTGACTTGGGATGGGGGGTG
>4-292175-41296.53
GGAAGACTTGACGAGCGCACAGCGCTTGTTCAGAATATCACCCGTCATGTGTTCGTGAAGGGGGAGTGACTTGGGATGGGGGTTC
>5-204535-28909.34
GGAAGAAATGTAGAGGAAACAGTGACTCTGCAGAATATCCTCACTGCGTAGTGGGGCAGGGGGGAGTGACTTGGGATGGGGGCTA
>6-175161-24757.57
GGCAGAGATGGCAACGTCAACATGAGGATGCCGCATATCCCCAGTGCACACTTGGGCAGTGGGGAGTGACTTGGGATGGGGGAAA

The general process of software development is: Analyse -> Design -> Implement -> Test

Analyse:

Input: read the data, two values for each record: name and sequence.

Process: append " REV-COM" to the name; convert the sequence with the transformation function.

Output: print to STDOUT in the original structure.
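The Input step can be sketched as a small generator that turns lines into (name, sequence) pairs. This is a minimal sketch, not the implementation used in this exercise: the name read_records is ours, it keeps the leading > in the name (as the original structure requires for the output), skips blank lines, and, as an extra assumption beyond the Seqs.fasta format shown above, joins sequences that are wrapped over several lines.

```python
def read_records(lines):
    """Yield (name, sequence) pairs from FASTA-style lines."""
    name, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # a new header: emit the previous record first
            if name is not None:
                yield name, "".join(seq)
            name, seq = line, []
        else:
            seq.append(line)
    if name is not None:
        yield name, "".join(seq)  # the last record in the file
```

Usage: with open("Seqs.fasta") as f: then for name, seq in read_records(f): process each record.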

Design:

The revcom function accepts a string, reverses it, and replaces each character with its complement according to the following rules (any other character becomes '?'):

'C' -> 'G'
'G' -> 'C'
'A' -> 'T'
'T' -> 'A'
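The rules map naturally onto a dictionary lookup. As an alternative design to the if/elif chain used in the implementation, here is a minimal sketch; the constant name COMPLEMENT is ours.

```python
# complement rules from the design step; any other character becomes '?'
COMPLEMENT = {"C": "G", "G": "C", "A": "T", "T": "A"}


def revcom(seq):
    """Return the reverse complement of seq."""
    # reversed() walks the sequence from the end; .get() applies the rules
    return "".join(COMPLEMENT.get(char, "?") for char in reversed(seq))
```

Keeping the rules in one dictionary makes them easy to extend (for example with lowercase letters) without adding more branches.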

Implementation:

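#!/usr/bin/env python3
import sys

def revcom(seq):
    '''
    Function returns Reverse Complement of given input sequence
    '''
    output = ''
    # walk the sequence backwards and complement each character
    for char in seq[::-1]:
        if char == 'C': output += 'G'
        elif char == 'G': output += 'C'
        elif char == 'A': output += 'T'
        elif char == 'T': output += 'A'
        else: output += '?'
    return output

# --- Input ---
data = []
with open("Seqs.fasta", "r") as f:  # the file is closed automatically
    while True:
        # each record is two lines: the ">name" header and the sequence;
        # strip() removes the newline without cutting off the last base
        # when the final line has no trailing newline
        name = f.readline().strip()
        seq = f.readline().strip()
        if not seq: break  # end of file reached
        data.append({'name': name, 'seq': seq})

print(data[0])  # debug: first record as read

# --- Process ---
for item in data:
    item['name'] += " REV-COM"
    item['seq'] = revcom(item['seq'])

print(data[0])  # debug: first record after processing

sys.exit()  # debug: stop here until the output above looks right

# --- Output ---
for item in data:
    print(item['name'])
    print(item['seq'])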

Test:

The script above prints only the first record, once as read and once after processing, and then stops; this is for testing purposes. When you are happy with what you see, comment out the debug prints and the exit call so the full output is printed.
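Checking by eye can be complemented with a few automated checks. A minimal sketch: the function name check_revcom is ours, and it takes any revcom implementation as an argument, so you can call check_revcom(revcom) right after the revcom definition in the script. The known example was computed by hand: GGAAGA reversed is AGAAGG, and its complement is TCTTCC.

```python
def check_revcom(revcom):
    """Run basic checks against a revcom implementation.

    Raises AssertionError on the first failing check, returns True otherwise.
    """
    # hand-computed example
    assert revcom("GGAAGA") == "TCTTCC", "known example"
    # the reverse complement of the reverse complement is the original
    seq = "GGAAGATAGTGTGTCGATAAAGGGACAATGCTGAATTTCC"
    assert revcom(revcom(seq)) == seq, "double application"
    # the output has the same length as the input
    assert len(revcom(seq)) == len(seq), "length preserved"
    # characters outside ACGT are marked with '?', as in the design
    assert revcom("AN") == "?T", "unknown character"
    return True
```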