Step 2. Generate topology file of protein
In the previous step, we obtained the CG PDB (system.cg.pdb) from the AA PDB (step5_assemply.pdb) using the Python script and the JSON file.
As seen in the tutorial for lipid membrane systems, we have to prepare topology files in order to generate LAMMPS input files with the spica-tools command setup_lmp. Lipid topology files are either included in the force field file used in the tutorial for lipid membrane systems or generated with the spica-tools command json2top. Protein topology files, in contrast, are usually prepared with another spica-tools command, ENM. This command applies an elastic network that maintains the secondary structures of proteins in CG-MD simulations.
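To make the idea of an elastic network concrete, the following is a minimal Python sketch, not the actual implementation of the ENM command. It connects pairs of backbone beads that lie within a cutoff distance by harmonic bonds and takes each equilibrium length from the input structure. The function name, the cutoff of 9.0, and the minimum sequence separation of 3 are assumptions made for illustration. The default force constant of 1.195 is simply the value that appears in the example bondparam lines of this tutorial.

```python
import math


def build_elastic_network(coords, cutoff=9.0, k=1.195, min_separation=3):
    """Return elastic network bonds as (i, j, k, r0) tuples.

    coords: list of (x, y, z) backbone bead positions in chain order.
    The returned indices are 1-based positions in `coords`, not atom
    indices of a topology file.
    cutoff and min_separation are illustrative assumptions, not values
    confirmed for the ENM command.
    """
    bonds = []
    n = len(coords)
    for i in range(n):
        # Skip beads that are close neighbors along the chain.
        for j in range(i + min_separation, n):
            r0 = math.dist(coords[i], coords[j])
            if r0 <= cutoff:
                # The equilibrium length is the distance in the input structure.
                bonds.append((i + 1, j + 1, k, r0))
    return bonds
```

Because every equilibrium length is measured on the input structure, the network restrains the protein toward the conformation stored in the input PDB file.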
Before running the ENM command, we must prepare a CG PDB file that contains only the protein from the CG initial configuration. For example, we can use Linux commands to extract the protein data from the CG initial configuration (system.cg.pdb):
$ grep -v DOPC system.cg.pdb | grep -v TIP3 | grep -v CLA > protein.cg.pdb
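The same filtering can also be sketched in Python, which may help when the list of residue names to remove grows. This is only an illustrative alternative to the grep command above. The function names are hypothetical, and the sketch assumes that the residue name occupies columns 18-21 of each ATOM/HETATM record, as in PDB files with four-character residue names.

```python
# Residue names to drop, taken from the grep command above.
EXCLUDED_RESNAMES = {"DOPC", "TIP3", "CLA"}


def extract_protein_lines(pdb_lines, excluded=EXCLUDED_RESNAMES):
    """Return the PDB lines whose residue name is not in `excluded`."""
    kept = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            # Assumption: residue name in columns 18-21 (1-based).
            resname = line[17:21].strip()
            if resname in excluded:
                continue
        # Non-atom records (e.g. CRYST1, END) are kept unchanged.
        kept.append(line)
    return kept


def write_protein_pdb(src_path, dst_path, excluded=EXCLUDED_RESNAMES):
    """Read a CG PDB file and write only the non-excluded lines."""
    with open(src_path) as src:
        kept = extract_protein_lines(src.readlines(), excluded)
    with open(dst_path, "w") as dst:
        dst.writelines(kept)
```

Calling write_protein_pdb("system.cg.pdb", "protein.cg.pdb") would then play the role of the grep pipeline. Unlike grep -v, which drops any line containing the string anywhere, this sketch only inspects the residue name field.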
An example of the CG protein PDB file (protein.cg.pdb) is given here.
After preparing the CG protein PDB file, we apply the ENM command to generate the topology file of the CG protein. The command is:
$ cg_spica ENM protein.cg.pdb protein.cg.top
protein.cg.top is the CG topology file of the protein. It contains elastic network bonds whose equilibrium lengths are taken from the input PDB file. The file should look as follows:
atom 1 VAL GBT GBT 56.0385 0.1118 P
atom 2 VAL VAL VAL 43.0883 0.0000 P
atom 3 ARG GBM GBM 56.0385 0.0000 P
...
bondparam 3 14 1.195000 8.852091 # GBM-GBM
bondparam 9 20 1.195000 8.033982 # GBM-GBM
...
bond 1 3 # GBM-GBM
bond 3 6 # GBM-GBM
...
angle 1 3 6 # GBM GBM GBM
...
dihedralparam 10 11 12 13 50.0 1 180 0.0 # PH1-PH2-PH3-PH4
...
In the lines whose first column is bondparam, the second and third columns specify the indices of the backbone pair connected by the elastic network bond. The fourth and fifth columns give the force constant and the equilibrium length of the bond. This file can be downloaded here as an example.
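The column layout described above can be illustrated with a short Python sketch that reads one bondparam line. The function name is hypothetical and the sketch only assumes the five-column layout shown in this tutorial, with an optional trailing comment after "#". It is not part of spica-tools.

```python
def parse_bondparam(line):
    """Parse one 'bondparam' line into (i, j, force_constant, eq_length).

    Returns None for any line that is not a five-column bondparam entry.
    """
    # Drop the trailing comment, e.g. '# GBM-GBM'.
    fields = line.split("#", 1)[0].split()
    if len(fields) != 5 or fields[0] != "bondparam":
        return None
    i, j = int(fields[1]), int(fields[2])    # backbone pair indices
    force_constant = float(fields[3])        # fourth column
    eq_length = float(fields[4])             # fifth column
    return i, j, force_constant, eq_length
```

For example, parse_bondparam("bondparam 3 14 1.195000 8.852091 # GBM-GBM") yields the pair indices 3 and 14, the force constant 1.195, and the equilibrium length 8.852091.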
Note that the Python script generates the topology file for the whole protein (backbone segments) in the input PDB file. The input PDB file should therefore contain only a single protein molecule (monomer), even if your system has several different types of protein molecules. In that case, apply the script separately to each type of protein molecule to obtain its topology file. You should rename each topology file after it is generated, because this script always overwrites "protein.cg.top".
If the input PDB contains multiple protein molecules, the script will generate a topology file with an elastic network spanning all of the protein molecules, including intermolecular elastic network bonds. (You may do this on purpose, though it is not a typical case.)
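For a system with several types of protein molecules, the per-monomer procedure can be sketched as follows. The function name and the input file names in the usage note are hypothetical. The sketch only builds the command lines in the form shown earlier (cg_spica ENM input.pdb output.top) and assumes that the output name given on the command line is honored. If the output is always written to protein.cg.top, rename that file after each run instead.

```python
from pathlib import Path


def enm_commands(monomer_pdbs):
    """Build one 'cg_spica ENM' command per single-monomer CG PDB file.

    Each command gets its own output name (same stem, '.top' suffix),
    so the topology of one protein does not replace that of another.
    """
    commands = []
    for pdb in monomer_pdbs:
        top = Path(pdb).with_suffix(".top")
        commands.append(["cg_spica", "ENM", str(pdb), str(top)])
    return commands
```

With the hypothetical inputs proteinA.cg.pdb and proteinB.cg.pdb, this gives one command writing proteinA.cg.top and another writing proteinB.cg.top. Each command list could then be passed to subprocess.run(cmd, check=True).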