
Shell Scripting


Introduction

When practicing computational materials science, you will often want to systematically tune parameters in your calculations. For example, you may want to test k-point convergence when using DFT, or probe the relationship between total energy and lattice parameter when using lattice methods. You could do this by hand: manually edit your input file(s) for each data point you'd like to collect, perform the calculation, then search the output file for the data you want and record it. However, this is very time consuming and greatly increases the chance of making an error you may not catch. A better alternative is to use shell scripting (the language used to navigate the terminal) to automate your submissions and your data extraction. You can think of a shell script as an extension of the terminal -- each line is effectively a new command issued to the terminal. Shell scripting is therefore extremely powerful for methodically manipulating files and directory structure.

There are four types of files that are of importance to you: input files, output files, submission files (or q files), and script files. Input files control the parameters of your experiment, output files contain its results, q files tell the cluster how to handle your calculation, and script files automate processes. In reality, your q file is just a script file with a few specific lines that tell the scheduler what to do.
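As a concrete illustration, a q file might look like the following. The directives shown are PBS-style placeholders (an assumption -- your cluster's scheduler and directive syntax may differ), and the mpirun line stands in for your actual executable:

```shell
#!/bin/bash
# PBS-style scheduler directives (an assumption -- check your cluster's docs):
#PBS -N my_job          # job name
#PBS -l nodes=1:ppn=4   # resources requested

# Everything below the directives is ordinary shell scripting.
# PBS sets PBS_O_WORKDIR to the submission directory; fall back to "." here
# so the script also runs outside the scheduler:
cd "${PBS_O_WORKDIR:-.}"
echo "starting calculation"
# mpirun -np 4 ./my_code input.in > output.out   # placeholder executable
```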

Automation Workflow

There are many ways you could set up your automation workflow, but I'll outline one below.

1) Create a base input file with the parameters for your experiment. For the parameter you want to vary, put in a placeholder string that does not otherwise appear in the file (e.g., XYZ).

2) In your q file, use for-loops in conjunction with the sed command (described below) to substitute a value for the placeholder from step 1).

a) Within this loop, use other shell commands (e.g., mv, cp) to manipulate files in your directory so nothing gets overwritten, and call mpirun to execute your calculations.

3) Write a data extraction script that pulls the required information from the output files. You could also fold this step into the for-loop in your q file, which makes it easier to pair each extracted value with the parameter you are tuning; however, that leaves you a bit more vulnerable to mistakes that force you to re-run calculations.

The advantage of this method is that each problem uses a single submission file, which helps keep the queue uncluttered and preserves your priority in it.
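The steps above can be sketched as a single loop. The file base.in and the mpirun line are stand-ins for your actual input file and executable; a one-line base.in is created here just so the sketch is self-contained:

```shell
# Step 1: a stand-in base input file containing the XYZ placeholder
cat > base.in <<'EOF'
lattice_parameter XYZ
EOF

# Step 2: loop over the parameter values you want to test
for a in 5.0 5.1 5.2
do
    sed "s/XYZ/$a/" base.in > "run_$a.in"    # substitute the placeholder
    # mpirun -np 4 ./my_code "run_$a.in" > "run_$a.out"   # placeholder run
done
```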

Useful commands

Automating Input

For-loops in bash iterate over a list of values, treating each value in turn as a variable while executing the loop body:

for var in 0 1 2 3
do
    echo "$var"
done

The above snippet prints each value in the list to your terminal. Here, the $ tells bash to expand var into its current value.
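For longer or evenly spaced lists, the seq command can generate the values so you don't have to type them all out:

```shell
# seq START STEP END prints an arithmetic sequence, one value per line;
# here it expands to 0 2 4 6
for var in $(seq 0 2 6)
do
    echo "testing value $var"
done
```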

sed is short for stream editor. It has quite a few functions, but you'll find substitution to be the most relevant. If we'd like to replace XYZ with ZYX in base.in, writing the result to a new file called ZYX.in, we could write:

sed "s/XYZ/ZYX/" base.in > ZYX.in

Putting it all together:
for var in 0 1 2 3
do
    sed "s/XYZ/$var/" base.in > "first_instance_replaced_$var.in"
done
This reads base.in, replaces the first occurrence of XYZ on each line with the value of var, and writes the result to a new file whose name contains that value; base.in itself is left unchanged.

To replace every instance of XYZ on a line, instead of just the first, add the g flag:
for var in 0 1 2 3
do
    sed "s/XYZ/$var/g" base.in > "all_instances_replaced_$var.in"
done

Automating Data Extraction

grep will print every line that contains a desired string in your file:

grep "foo" LJ.out

awk can be used in conjunction with grep to extract a column from the output:

grep "foo" LJ.out | awk '{print $2}'
will find all lines containing the word foo and print the second column of each. The | is called a pipe; it passes the output of the first command to the second. Obviously, knowing the structure of your output file is key here!
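A self-contained demonstration: the LJ.out contents below are made up, since a real output file's layout depends on your code:

```shell
# Fabricated stand-in for LJ.out
cat > LJ.out <<'EOF'
foo 1.0 -2.5
bar 9.9 0.0
foo 3.0 -4.5
EOF

grep "foo" LJ.out | awk '{print $2}'   # prints 1.0, then 3.0
```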

To expand on this, you can save the output of commands to variables. For instance:

m=$(grep "energy" LJ.out | awk '{print $3}')

echo "$m" >> outdata.csv

This appends the extracted value to outdata.csv, building up your dataset line by line.
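Combining the pieces, a data extraction script might loop over your runs and build a CSV pairing each parameter value with its result. The output files and the "total energy" line format here are hypothetical stand-ins for what your code actually writes:

```shell
# Fabricated output files standing in for real calculation results
printf 'total energy = -12.5 eV\n' > run_5.0.out
printf 'total energy = -13.1 eV\n' > run_5.1.out

echo "a,energy" > outdata.csv                       # CSV header
for a in 5.0 5.1
do
    e=$(grep "total energy" "run_$a.out" | awk '{print $4}')
    echo "$a,$e" >> outdata.csv                     # one row per parameter value
done
```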

Before executing a script, enter

chmod u+x script.sh

to give yourself execute permission on it. This isn't necessary for .q files that you submit to the queue system.

Other useful commands are head and tail, which take a specific number of lines from the start or end of a file. Google them.
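For example (demo.txt is created here just to illustrate):

```shell
printf 'line1\nline2\nline3\nline4\nline5\n' > demo.txt

head -n 2 demo.txt               # first two lines
tail -n 1 demo.txt               # last line
head -n 4 demo.txt | tail -n 2   # lines 3 and 4 only
```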

A good resource to learn basics of Bash scripting is (the first 3 chapters of) the "Advanced Bash-Scripting Guide" hosted by The Linux Documentation Project, here: http://www.tldp.org/LDP/abs/html/index.html
