Classes‎ > ‎458‎ > ‎

Post Processing

Post-processing output in Python

For this tutorial, please see the files in /home/bmeredig/pub/python on either encina or morales. Thanks to Matt Johnson for supplying the log.lammps example.

In this tutorial, we will discuss how you can use the programming language Python to easily process output files from your simulations (regardless of the type of simulation). We will use as an example a log.lammps file from LAMMPS.

Open loglammps.py, which is a Python script. It should look like this:

f=open("log.lammps","r")

for line in f.readlines():
    print(line)

f.close()

You run this file from the command line by doing:

 $ python loglammps.py 

Go ahead and try it. Python should spit out the entire log.lammps file line-by-line. Let's go through how the script works:

f=open("log.lammps","r")

Open the log file in "read" mode and designate it by the variable "f."

for line in f.readlines():
    print(line)

For every "line" (could use any variable name here) that's generated by running the readlines() function on f (reads the file line-by-line), print "line." This has the effect of printing every line of log.lammps.

f.close()

Close the file "f".

Note that Python uses indentations to denote the beginning and end of "for" statements. That's also true for "if", "while", etc.

OK, we've developed the ability to print every line of our file, but that's not very useful. Let's tweak the script:

f=open("log.lammps","r")

for line in f.readlines():
    split=str.split(line)
    print(split)

f.close()

Now we've called upon the str.split function, which will break up "line" into an array based on the spaces in "line." Here's how we access specific columns (namely, the first one) of the the file:

f=open("log.lammps","r")

for line in f.readlines():
    split=str.split(line)
    if len(split)>0:
        print(split[0])

f.close()

Here we're checking to make sure that all of our line arrays have a length greater than 0 (i.e., they shouldn't be blank) before we try to print them. If you run the above script, you should see that it prints out the timesteps of our simulation, along with some stuff we don't necessarily want. Let's cut out everything except the timesteps:

f=open("log.lammps","r")

linenum=1
for line in f.readlines():
    split=str.split(line)
    if len(split)>0 and 98 <= linenum <= 2099:
        print(split[0])
    linenum=linenum+1
f.close()

I got these line numbers from vi; I just opened log.lammps and looked for where the "interesting stuff" (actual simulation data) exists. The very beginning and end of log.lammps don't contain any data, so we exclude those line numbers.

We're not quite done yet; now we would like to make a comma-separated file of temperature,energy pairs for plotting in Excel. Let's see how to do that:

f=open("log.lammps","r")
out=open("temp-energy.csv","w")

linenum=1
for line in f.readlines():
    split=str.split(line)
    if len(split)>0 and 98 <= linenum <= 2099:
        out.write(split[1]+','+split[2]+'\n')
    linenum=linenum+1
f.close()
out.close()

Notice that we have opened a new file in "write" mode to hold our output. Instead of printing to the screen, we write to this "out" file a string expression:

split[1]+','+split[2]+'\n'

split[1] is the temperature; we add on to that a ',' using the + sign and then after the comma we put the energy, split[2]. Finally, we have to end our output with a newline character '\n'. You should now have a file, temp-energy.csv, that is nicely formatted and ready for plotting!

Depending on your output, you may want to perform some mathematical manipulation of your numbers before you write them to a file. You have to do a few extra steps for that:

f=open("log.lammps","r")
out=open("temp-energy.csv","w")

linenum=1
for line in f.readlines():
    split=str.split(line)
    if len(split)>0 and 98 <= linenum <= 2099:
        temp=float(split[1])
        newtemp=3/2*1.381*10**(-23)*temp
        out.write(str(newtemp)+','+split[2]+'\n')
    linenum=linenum+1
f.close()
out.close()

I have defined the variable "temp" as a floating point number equal to split[1]. split[1] is a string of text when read from the file and not a number. We have to convert it to a number to be able to do math with it. Then I defined "newtemp" as the quantity 3/2 k T. To write out "newtemp" to the output file, I have to convert it back to a string with str(newtemp) as shown above.

That's it! You have all the basics! The web is full of nice Python tutorials in case you want to do more advanced stuff in your scripts.

Comments