Step 2. Generate topology file of protein
In the previous step, the CG PDB (system.cg.pdb) was obtained from the
AA PDB (step5_assembly.pdb) using a Python script and a JSON file. As seen
in the tutorial for lipid membrane systems, we have to prepare topology
files in order to generate LAMMPS input files with the spica-tools command
setup_lmp. Lipid topology files are either included in the force field file
used in the tutorial for lipid membrane systems or generated with the
spica-tools command json2top. Protein topology files, in contrast, are
prepared with another spica-tools command, ENM, which applies an elastic
network to maintain the secondary structures of proteins in CG-MD
simulations.
To run the ENM command, we must prepare a CG PDB file that contains only
the protein from the CG initial configuration. For example, Linux commands
can extract the protein data from the CG initial configuration
(system.cg.pdb), which here contains DOPC lipids, TIP3 water, and CLA
chloride ions in addition to the protein:
$ grep -v DOPC system.cg.pdb | grep -v TIP3 | grep -v CLA > protein.cg.pdb
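The same filtering can be sketched in Python. This is a minimal illustration, not part of spica-tools: the function names extract_protein and filter_pdb_file are hypothetical, and the code assumes that residue names occupy columns 18-21 of ATOM/HETATM records (the 4-character convention used for names such as DOPC and TIP3). Unlike grep -v, which drops any line containing the string anywhere, this sketch checks only the residue name field.

```python
def extract_protein(pdb_lines, excluded_resnames=("DOPC", "TIP3", "CLA")):
    """Return the PDB lines that do not belong to the excluded residues.

    Assumption: residue names sit in columns 18-21 (0-based slice 17:21)
    of ATOM/HETATM records. Non-atom lines (e.g. CRYST1, END) are kept.
    """
    kept = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            resname = line[17:21].strip()
            if resname in excluded_resnames:
                continue  # skip lipid, water, and ion beads
        kept.append(line)
    return kept


def filter_pdb_file(src="system.cg.pdb", dst="protein.cg.pdb"):
    """Hypothetical helper: read src, drop excluded residues, write dst."""
    with open(src) as fin:
        lines = fin.read().splitlines()
    with open(dst, "w") as fout:
        fout.write("\n".join(extract_protein(lines)) + "\n")
```

Calling filter_pdb_file() with its default arguments would write protein.cg.pdb next to system.cg.pdb. If the system contains other lipid, solvent, or ion species, their residue names need to be added to excluded_resnames.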
An example of the CG protein PDB file (protein.cg.pdb) is given
here. Because the backbone bead types in SPICA FF ver2 depend on the
secondary structure of the protein, we must also prepare the all-atom PDB
file of the protein (protein.aa.pdb). We also need to install the DSSP
program for secondary structure assignment (see here for installation) and
add the dssp binary to PATH. After preparing the CG and AA protein PDB
files, we apply the ENM command to generate the topology file of the CG
protein. The command line is:
$ cg_spica ENM -aapdb protein.aa.pdb protein.cg.pdb protein.cg.top
protein.cg.top is the CG topology file of the protein. It includes elastic
network bonds whose equilibrium lengths are taken from the input PDB file.
The file should look like the following:
atom 1 VAL GBTL GBTL 56.0385 0.1118 P
atom 2 VAL VAL VAL 43.0883 0.0000 P
atom 3 ARG GBML GBML 56.0385 0.0000 P
...
bondparam 3 14 1.195000 8.852091 # GBM-GBM
bondparam 9 20 1.195000 8.033982 # GBM-GBM
...
bond 1 3 # GBM-GBM
bond 3 6 # GBM-GBM
...
angleparam 1 3 6 10.0 130.0000# GBM GBM GBM
...
dihedralparam 10 11 12 13 50.0 1 180 0.0 # PH1-PH2-PH3-PH4
...
For rows whose first column is bondparam, the second and third columns give
the indices of the backbone bead pair connected by the elastic network
bond, and the fourth and fifth columns give the force constant and the
equilibrium length of the bond, respectively. An example topology file can
be downloaded here.
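To inspect the elastic network in a generated topology file, the bondparam rows can be read with a few lines of Python. The sketch below is illustrative only: ElasticBond and read_bondparams are hypothetical names, not part of spica-tools, and the column meanings follow the description above. Rows whose second and third columns are not integer indices are skipped, on the assumption that they are not elastic network bonds.

```python
from typing import Iterable, List, NamedTuple


class ElasticBond(NamedTuple):
    i: int                 # index of the first backbone bead (column 2)
    j: int                 # index of the second backbone bead (column 3)
    force_constant: float  # column 4
    length: float          # equilibrium length (column 5)


def read_bondparams(lines: Iterable[str]) -> List[ElasticBond]:
    """Collect the bondparam rows of a topology file."""
    bonds = []
    for line in lines:
        # Drop the trailing comment (e.g. "# GBM-GBM") before splitting.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 5 or fields[0] != "bondparam":
            continue
        try:
            bonds.append(ElasticBond(int(fields[1]), int(fields[2]),
                                     float(fields[3]), float(fields[4])))
        except ValueError:
            # Assumption: rows without integer indices are not ENM bonds.
            continue
    return bonds
```

For example, passing the lines of protein.cg.top (read with open("protein.cg.top")) to read_bondparams returns one ElasticBond per elastic network bond, which makes it easy to count the bonds or list the longest ones.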
Note that if you want to prepare the protein topology file for
SPICA FF ver1, in which the backbone bead types do not depend on the
secondary structure of the protein, add the "-v1" flag to the
command line:
$ cg_spica ENM -v1 protein.cg.pdb protein.cg.v1.top
An example topology file of ver1 is given
here.
Note that the ENM command generates a topology file for the entire protein
(backbone segment) contained in the input PDB file. The input PDB file
should therefore contain only a single protein molecule (monomer). If the
system contains several different types of protein molecules, the command
must be applied to each protein molecule separately to obtain its topology
file. Rename each topology file after it is generated, because the script
writes to, and overwrites, "protein.cg.top".
If the input PDB file contains multiple protein molecules, the script
generates a topology file with a single elastic network spanning all of
them, including intermolecular elastic network bonds. (You may do this on
purpose, though it is not the typical case.)
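When a system contains several protein types, the per-molecule runs can be scripted. The following sketch only assembles the command lines shown in this step. The function enm_command and the file names proteinA and proteinB are hypothetical, and it assumes that the output file name given on the command line is honored. If it is not, rename protein.cg.top after each run as described above.

```python
def enm_command(cg_pdb, top, aa_pdb=None, v1=False):
    """Build the cg_spica ENM command line as a list of arguments.

    Uses only the flags shown in this tutorial: -aapdb (ver2) and -v1 (ver1).
    """
    cmd = ["cg_spica", "ENM"]
    if v1:
        cmd.append("-v1")
    else:
        if aa_pdb is None:
            raise ValueError("SPICA FF ver2 needs the all-atom PDB file")
        cmd += ["-aapdb", aa_pdb]
    cmd += [cg_pdb, top]
    return cmd


# Hypothetical monomer names: one CG/AA PDB pair per protein type.
commands = [
    enm_command(f"{name}.cg.pdb", f"{name}.cg.top", aa_pdb=f"{name}.aa.pdb")
    for name in ("proteinA", "proteinB")
]
# Each entry can then be run with subprocess.run(cmd, check=True).
```

Building the argument lists separately from running them makes it easy to print and check every command before any topology file is written.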