一个使用gromacs进行蛋白质模拟的入门教程

合集下载

GROMACS使用教程

GROMACS教程一 Gromacs基本模拟流程 (4)1 下载pdb文件 (4)2 用pdb2gmx 处理 pdb 文件 (5)3 建立盒子 (5)5 设置能量最小化 (6)6 用grompp程序进行文件处理 (9)7 使用 genion 和tpr文件添加离子 (9)8 用fws_ion.pdb来产生能量最小化的输入文件 (10)9 在后台运行能量最小化（在命令后加&） (10)二设置位置限制性动力学模拟 (11)三设置非限制性动力学模拟 (15)1 如何重启一个计算 (17)2 如何延长一个计算 (17)3 如何设置并行计算 (18)五模拟结果分析 (18)1 如何将特定帧的轨迹保存成*.pdb文件 (18)2 用ngmx观察轨迹文件（也可以用VMD观察轨迹文件) (19)3 比较常用的分析工具 (21)3.3 g_covar 计算斜方差 (24)3.4 g_energy 能量数据作图，如压力、体积、密度等 (24)3.5 g_gyrate 测量回旋半径 (26)3.6 g_rms 与 g_rmsdist 计算结构的RMSD 值 (26)3.7 g_rmsf 计算原子位置的根均方波动（ rmsf ） (27)3.8 do_dssp 计算模型的二级结构 (30)3.9 g_hbond 计算模拟过程中分子间的氢键的数目、距离或角度 (31)3.10 g_saltbr 分析模拟中残基间的盐桥 (32)GROMACS 是一个使用经典分子动力学理论研究蛋白质动力学的高端的高效的工具。

GROMACS是遵守GNU许可的免费软件，可以从以下站点下载：，并且可以在linux和 Windows上使用。

在本教程中，将研究一个从漏斗形蜘蛛的毒液中分离的毒素。

我们将使用显性溶剂动力学的方法来进行研究。

首先比较真空中和溶解的模型。

我们将把毒素肽溶在水盒子里，紧接着用牛顿运动定律加以平衡。

我们还将比较偿离子在显性溶剂动力学中的影响。

gromacs使用手册

gromacs使用手册摘要：1.引言2.gromacs 简介3.gromacs 的基本使用3.1 安装gromacs3.2 创建模拟系统3.3 运行模拟3.4 分析结果4.gromacs 的高级使用4.1 模拟参数设置4.2 力场参数设置4.3 脚本编写与批处理5.gromacs 的常见问题及解决方法6.总结正文：gromacs 使用手册gromacs 是一款开源的分子动力学模拟软件，广泛应用于生物大分子、药物分子、液晶等体系的模拟研究。

本手册将详细介绍gromacs 的使用方法及高级技巧。

1.引言分子动力学模拟是一种通过计算原子之间的相互作用力，来研究分子运动规律的方法。

gromacs 作为一款功能强大的分子动力学模拟软件，得到了广大科研工作者的青睐。

2.gromacs 简介gromacs 是一款基于GROMOS 力场的分子动力学模拟软件，支持多种体系、多种力场的模拟计算。

gromacs 采用高效的算法和技术，可以实现高效的大规模模拟计算。

3.gromacs 的基本使用3.1 安装gromacs用户可以根据gromacs 的官方文档，选择合适的安装方式，如使用编译器进行编译安装，或使用包管理器进行安装。

3.2 创建模拟系统首先需要构建分子模型，包括原子类型、坐标文件、相互作用参数等。

接着，通过gromacs 的脚本或命令行，设定模拟参数，如温度、压力、模拟时间等。

3.3 运行模拟根据设定的参数，运行gromacs 命令，开始模拟计算。

gromacs 会生成一系列模拟结果文件，包括轨迹文件、能量文件、坐标文件等。

3.4 分析结果使用gromacs 提供的分析工具或其他第三方软件，对模拟结果进行后处理，如计算均方根偏差（RMSD）、计算相互作用能等。

4.gromacs 的高级使用4.1 模拟参数设置根据实际需求，调整模拟参数，如采用更高级的力场、改变模拟方法等，以优化模拟效果。

4.2 力场参数设置通过修改力场参数文件，可以自定义力场参数，以适应不同体系的研究需求。

gromacs使用手册

gromacs使用手册【原创实用版】目录1.Gromacs 简介2.Gromacs 的功能和应用领域3.Gromacs 的使用方法和技巧4.Gromacs 的安装与配置5.Gromacs 的常见问题与解决方案6.Gromacs 的未来发展趋势正文1.Gromacs 简介Gromacs 是一个开源的生物大分子模拟软件，主要用于分子动力学模拟和静态模拟。

Gromacs 可以模拟各种生物大分子，如蛋白质、核酸和多糖等，被广泛应用于生物学、化学、材料科学等领域的研究中。

2.Gromacs 的功能和应用领域Gromacs 具有以下主要功能：(1) 分子动力学模拟：Gromacs 可以使用不同的力场和算法进行分子动力学模拟，以研究分子结构和功能的关系。

(2) 静态模拟：Gromacs 可以进行静态模拟，以研究分子的静态结构。

(3) 计算化学：Gromacs 可以进行量子化学计算和分子轨道计算。

(4) 溶剂模型：Gromacs 提供了多种溶剂模型，以模拟不同环境下分子的行为。

Gromacs 的应用领域包括：(1) 蛋白质结构和功能研究(2) 药物设计(3) 材料科学(4) 生物物理学3.Gromacs 的使用方法和技巧(1) 安装 Gromacs：Gromacs 的安装过程相对简单，只需按照官方提供的安装指南进行即可。

(2) 配置 Gromacs：Gromacs 需要配置一些环境变量和参数文件，以保证其正常运行。

(3) 使用 Gromacs：Gromacs 提供了命令行界面和图形界面两种使用方式。

(4) 技巧：在进行模拟时，需要注意选择合适的力场和算法，以及合理设置模拟参数。

4.Gromacs 的安装与配置(1) 安装 Gromacs：可以从 Gromacs 官网下载最新版本的安装包，然后按照官方提供的安装指南进行安装。

(2) 配置 Gromacs：需要配置环境变量和参数文件，以保证 Gromacs 正常运行。

gromacs使用手册

gromacs使用手册【原创版】目录1.Gromacs 简介2.Gromacs 的功能3.Gromacs 的使用方法4.Gromacs 的常见问题与解决方案5.总结正文1.Gromacs 简介Gromacs 是一个开源的生物大分子模拟软件，主要用于分子动力学模拟、势能计算、分子对接等。

它广泛应用于生物物理学、药物设计等领域，为研究者提供了强大的计算工具。

2.Gromacs 的功能Gromacs 具有以下主要功能：(1) 分子动力学模拟：Gromacs 能够模拟不同温度、压力条件下的分子动力学行为，帮助研究者深入了解分子结构和性质。

(2) 势能计算：Gromacs 可以计算分子间的相互作用势能，为分子对接、药物设计等领域提供重要参考数据。

(3) 分子对接：Gromacs 具有高效的分子对接功能，可以帮助研究者快速找到合适的分子结合模式。

(4) 蛋白质结构预测：Gromacs 可以通过分子动力学模拟，预测蛋白质的二级结构和三级结构。

3.Gromacs 的使用方法(1) 安装 Gromacs：首先需要从官方网站下载 Gromacs 源代码并进行编译安装。

(2) 准备模拟系统：根据研究目的，构建分子模型，并设定模拟参数，如温度、压力等。

(3) 运行模拟：使用 Gromacs 提交模拟任务，等待模拟结果。

(4) 分析结果：对模拟结果进行分析，提取所需的信息，如分子动力学轨迹、势能曲线等。

4.Gromacs 的常见问题与解决方案(1) 模拟过程中出现异常：可能是由于模拟参数设置不合理，或者系统配置不足导致。

需要检查模拟参数和系统配置，适当调整以解决问题。

(2) 模拟结果不准确：可能是由于分子模型构建不准确，或者模拟过程中出现误差。

需要检查分子模型，适当延长模拟时间以提高模拟精度。

(3) Gromacs 报错：可能是由于软件版本不兼容，或者安装过程中出现错误。

需要检查软件版本和安装过程，重新安装或升级软件。

5.总结Gromacs 是一个功能强大的生物大分子模拟软件，广泛应用于生物物理学、药物设计等领域。

gromacs使用手册

gromacs使用手册摘要：一、Gromacs简介二、Gromacs的安装与配置三、Gromacs的基本操作1.创建模拟配置文件2.运行模拟3.分析结果四、Gromacs的高级功能1.分子动力学模拟2.热力学计算3.对接与筛选五、Gromacs的优缺点六、Gromacs的未来发展正文：一、Gromacs简介Gromacs是一款用于分子动力学模拟的开源软件，广泛应用于生物化学、材料科学等领域。

它具有高效的计算性能、丰富的功能和友好的用户界面，为科学家提供了强大的分子模拟工具。

二、Gromacs的安装与配置要在计算机上安装Gromacs，首先需要确保满足系统的硬件和软件要求。

接下来，按照官方文档的指引进行安装和配置。

在配置过程中，用户可以根据自己的需求选择相应的模块和参数。

三、Gromacs的基本操作1.创建模拟配置文件要运行Gromacs，首先需要创建一个模拟配置文件（gro文件），其中包含了模拟系统的信息，如原子、盒子、温度、压力等。

通过编辑gro文件，用户可以设置模拟的具体参数。

2.运行模拟在完成gro文件设置后，使用Gromacs提供的脚本（如mdrun）运行模拟。

根据需要，用户可以选择不同的模拟模式，如NVT、NPT等。

3.分析结果Gromacs可以自动生成模拟过程中的数据文件（如gro、xtc、trr等），用户可以通过Gromacs提供的分析工具（如g_analysis）对这些文件进行处理和可视化。

四、Gromacs的高级功能1.分子动力学模拟Gromacs支持多种分子动力学算法，如Verlet积分器、Langevin动力学等。

用户可以根据研究需求选择合适的算法进行模拟。

2.热力学计算Gromacs可以用于计算系统的热力学性质，如比热、熵等。

这些计算有助于深入了解系统的热力学行为。

3.对接与筛选Gromacs提供了对接和筛选工具，用于寻找分子间的最佳结合位点。

这对于药物设计和蛋白质筛选等领域具有重要的应用价值。

一个使用gromacs进行蛋白质模拟的入门教程

Lysozyme in WaterJustin LemkulDepartment of Biochemistry, Virginia TechThis example will guide a new user through the process of setting up a simulation system containing a protein (lysozyme) in a box of water, with ions. Each step will contain an explanation of input and output, using typical settings for general use.This tutorial assumes you are using a GROMACS version in the series.GROMACS TutorialStep One: Prepare the TopologyWe must download the protein structure file we will be working with. For this tutorial, we will utilize hen egg white lysozyme (PDB code 1AKI). Go to the RCSB website and download the PDB text for the crystal structure.Once you have downloaded the structure, you can visualize the structure using a viewing program such as VMD, Chimera, PyMOL, etc. Once you've had a look at the molecule, you are going to want to strip out the crystal waters. Use a plain text editor like vi, emacs (Linux/Mac), or Notepad (Windows). Do not use word processing software! Delete the lines corresponding to these molecules (residue "HOH" in the PDB file). Note that such a procedure is not universally appropriate ., the case of a bound active site water molecule). For our intentions here, we do not need crystal water.Always check your .pdb file for entries listed under the comment MISSING, as these entries indicate either atoms or whole residues that are not present in the crystal structure. Terminal regions may be absent, and may not present a problem for dynamics. Incomplete internal sequences or any amino acid residues that have missing atoms will cause pdb2gmx to fail. These missing atoms/residues must be modeled in using other software packages. Also note that pdb2gmx is not magic. It cannot generate topologies for arbitrary molecules, just the residues defined by the force field (in the *.rtp files - generally proteins, nucleic acids, and a very finite amount of cofactors, like NAD(H) and ATP).Now that the crystal waters are gone and we have verified that all the necessary atoms are present, the PDB file should contain only protein atoms, and is ready to be input into the first GROMACS tool,pdb2gmx. The purpose of pdb2gmx is to generate three files:1.The topology for the molecule.2. A position restraint file.3. A post-processed structure file.The topology by default) contains all the information necessary to define the molecule within a simulation. This information includes nonbonded parameters (atom types and charges) as well as bonded parameters (bonds, angles, and dihedrals). We will take a more detailed look at the topology once it has been generated.Execute pdb2gmx by issuing the following command:pdb2gmx -f-o-water spceThe structure will be processed by pdb2gmx, and you will be prompted to choose a force field:Select the Force Field:From '/usr/local/gromacs/share/gromacs/top':1: AMBER03 force field (Duan et al., J. Comp. Chem. 24, 1999-2012, 2003)2: AMBER94 force field (Cornell et al., JACS 117, 5179-5197, 1995)3: AMBER96 force field (Kollman et al., Acc. Chem. Res. 29, 461-469, 1996)4: AMBER99 force field (Wang et al., J. Comp. Chem. 21, 1049-1074, 2000)5: AMBER99SB force field (Hornak et al., Proteins 65, 712-725, 2006)6: AMBER99SB-ILDN force field (Lindorff-Larsen et al., Proteins 78, 1950-58, 2010)7: AMBERGS force field (Garcia & Sanbonmatsu, PNAS 99, 2782-2787, 2002)8: CHARMM27 all-atom force field (with CMAP) - version9: GROMOS96 43a1 force field10: GROMOS96 43a2 force field (improved alkane dihedrals)11: GROMOS96 45a3 force field (Schuler JCC 2001 22 1205)12: GROMOS96 53a5 force field (JCC 2004 vol 25 pag 1656)13: GROMOS96 53a6 force field (JCC 2004 vol 25 pag 1656)14: OPLS-AA/L all-atom force field (2001 aminoacid dihedrals)15: [DEPRECATED] Encad all-atom force field, using full solvent charges16: [DEPRECATED] Encad all-atom force field, using scaled-down vacuum charges 17: [DEPRECATED] Gromacs force field (see manual)18: [DEPRECATED] Gromacs force field with hydrogens for NMRThe force field will contain the information that will be written to the topology. This is a very important choice! You should always read thoroughly about each force field and decide which is most applicable to your situation. For this tutorial, we will use the all-atom OPLS force field, so type 14 at the command prompt, followed by 'Enter'.There are many other options that can be passed to pdb2gmx. Some are listed here:-ignh: Ignore H atoms in the PDB file; especially useful for NMR structures. Otherwise, if H atoms are present, they must be in the correct order and named exactly how GROMACS expects them to be.-ter: Interactively assign charge states for N- and C-termini.-inter: Interactively assign charge states for Glu, Asp, Lys, Arg, and His; assign disulfides to Cys.You have now generated three new files: , , and . is a GROMACS-formatted structurefile that contains all the atoms defined within the force field ., H atoms have been added to the amino acids in the protein). The file is the system topology (more on this in a minute). The file contains information used to restrain the positions of heavy atoms (more on this later).Step Two: Examine the TopologyLet's look at what is in the output topology . Again, using a plain text editor, inspect its contents. After several comment lines (preceded by ;), you will find the following:#include ""This line calls the parameters within the OPLS-AA force field. It is at the beginning of the file, indicating that all subsequent parameters are derived from this force field. The next important line is [ moleculetype ], below which you will find ; Name nrexclProtein_A 3The name "Protein_A" defines the molecule name, based on the fact that the protein was labeled as chain A in the PDB file. There are 3 exclusions for bonded neighbors. More information on exclusions can be found in the GROMACS manual; a discussion of this information is beyond the scope of this tutorial.The next section defines the [ atoms ] in the protein. The information is presented as columns:[ atoms ]; nr type resnr residue atom cgnr charge mass typeB chargeB massB; residue 1 LYS rtp LYSH q +1 opls_287 1 LYS N 1 ; qtot2 opls_290 1 LYS H1 1 ; qtot3 opls_290 1 LYS H2 1 ; qtot4 opls_290 1 LYS H3 1 ; qtot5 opls_293B 1 LYS CA 1 ; qtot6 opls_140 1 LYS HA 1 ; qtot 1The interpretation of this information is as follows:nr: Atom numbertype: Atom typeresnr: Amino acid residue numberresidue: The amino acid residue nameNote that this residue was "LYS" in the PDB file; the use of .rtp entry "LYSH"indicates that the residue is protonated (the predominant state at neutralpH).atom: Atom namecgnr: Charge group numberCharge groups define units of integer charge; they aid in speeding upcalculationscharge: Self-explanatoryThe "qtot" descriptor keeps a running count of the total charge on the moleculemass: Also self-explanatorytypeB, chargeB, massB: Used for free energy perturbation (not discussed here) Subsequent sections include [ bonds ], [ pairs ], [ angles ], and [ dihedrals ]. Some of these sections are self-explanatory (bonds, angles, and dihedrals). The parameters and function types associated with these sections are elaborated on in Chapter 5 of the GROMACS manual. Special 1-4 interactions are included under "pairs" (section of the GROMACS manual).The remainder of the file involves defining a few other useful/necessary topologies, starting with position restraints. The "" file was generated by pdb2gmx; it defines a force constant used to keep atoms in place during equilibration (more on this later).; Include Position restraint file#ifdef POSRES#include ""#endifThis ends the "Protein_A" moleculetype definition. The remainder of the topology file is dedicated to defining other molecules and providing system-level descriptions. The next moleculetype (by default) is the solvent, in this case SPC/E water. Other typical choices for water include SPC, TIP3P, and TIP4P. We chose this by passing "-water spce" to pdb2gmx. For an excellent summary of the many different water models, click here, but be aware that not all of these models are present within GROMACS.; Include water topology#include ""#ifdef POSRES_WATER; Position restraint for each water oxygen[ position_restraints ]; i funct fcx fcy fcz1 1 1000 1000 1000#endifAs you can see, water can also be position-restrained, using a force constant (kpr) of 1000 kJ mol-1nm-2.Ion parameters are included next:; Include generic topology for ions#include ""Finally come system-level definitions. The [ system ] directive gives the name of the system that will be written to output files during the simulation. The [ molecules ] directive lists all of the molecules in the system.[ system ]; NameLYSOZYME[ molecules ]; Compound #molsProtein_A 1A few key notes about the [ molecules ] directive:1.The order of the listed molecules must exactly match the order of themolecules in the coordinate (in this case, .gro) file.2.The names listed must match the [ moleculetype ] name for each species,not residue names or anything else.If you fail to satisfy these concrete requirements at any time, you will get fatal errors from grompp (discussed later) about mismatched names, molecules not being found, or a number of others.Now that we have examined the contents of a topology file, we can continue building our system.Step Three: Defining the Unit Cell & Adding SolventNow that you are familiar with the contents of the GROMACS topology, it is time to continue building our system. In this example, we are going to be simulating a simple aqueous system. It is possible to simulate proteins and other molecules in different solvents, provided that good parameters are available for all species involved. There are two steps to defining the box and filling it with solvent:1.Define the box dimensions using editconf.2.Fill the box with water using genbox.You are now presented with a choice as to how to treat the unit cell. For the purpose of this tutorial, we will use a simple cubic box as the unit cell. As you becomemore comfortable with periodic boundary conditions and box types, I highly recommend the rhombic dodecahedron, as its volume is ~71% of the cubic box of the same periodic distance, thus saving on the number of water molecules that need to be added to solvate the protein.Let's define the box using editconf:editconf-f-o -c -d -bt cubicThe above command centers the protein in the box (-c), and places it at least nm from the box edge (-d. The box type is defined as a cube (-bt cubic). The distance to the edge of the box is an important parameter. Since we will be using periodic boundary conditions, we must satisfy the minimum image convention. That is, a protein should never see its periodic image, otherwise the forces calculated will be spurious. Specifying a solute-box distance of nm will mean that there are at least nm between any two periodic images of a protein. This distance will be sufficient for just about any cutoff scheme commonly used in simulations.Now that we have defined a box, we can fill it with solvent (water). Solvation is accomplished using genbox:genbox -cp-cs-o -pThe configuration of the protein (-cp) is contained in the output of the previous editconf step, and the configuration of the solvent (-cs) is part of the standard GROMACS installation. We are using , which is a generic equilibrated 3-point solvent model. You can use as the solvent configuration for SPC, SPC/E, or TIP3P water, since they are all three-point water models. The output is called , and we tell genboxthe name of the topology file so it can be modified. Note the changes to the [ molecules ] directive of :[ molecules ]; Compound #molsProtein_A 1SOL 10832What genbox has done is keep track of how many water molecules it has added, which it then writes to your topology to reflect the changes that have been made. Note that if you use any other (non-water) solvent, genbox will not make these changes to your topology! Its compatibility with updating water molecules is hard-coded.Step Four: Adding IonsWe now have a solvated system that contains a charged protein. The output of pdb2gmx told us that the protein has a net charge of +8e (based on its amino acid composition). If you missed this information in the pdb2gmx output, look at the last line of your [ atoms ] directive in ; it should read (in part) "qtot 8." Since life does not exist at a net charge, we must add ions to our system.The tool for adding ions within GROMACS is called genion. What genion does is read through the topology and replace water molecules with the ions that the user specifies. The input is called a run input file, which has an extension of .tpr; this file is produced by the GROMACS tool grompp (GROMACS pre-processor), which will also be used later when we run our first simulation. What grompp does is process the coordinate file and topology (which describes the molecules) to generate anatomic-level input (.tpr). The .tpr file contains all the parameters for all of the atoms in the system.To produce a .tpr file with grompp, we will need an additional input file, with the extension .mdp (molecular dynamics parameter file); grompp will assemble the parameters specified in the .mdp file with the coordinates and topology information to generate a .tpr file.An .mdp file is normally used to run energy minimization or an MD simulation, but in this case is simply used to generate an atomic description of the system. An example .mdp file (the one we will use) can be downloaded here.In reality, the .mdp file used at this step can contain any legitimate combination of parameters. I typically use an energy-minimization script, because they are very basic and do not involve any complicated parameter combinations.Assemble your .tpr file with the following:grompp -f -c-p -oNow we have an atomic-level description of our system in the binary file . We will pass this file to genion:genion -s-o-p -pname NA-nname CL-nn 8When prompted, choose group 13 "SOL" for embedding ions. You do not want to replace parts of your protein with ions.In the genion command, we provide the structure/state file (-s) as input, generate a .gro file as output (-o), process the topology (-p) to reflect the removal of water molecules and addition of ions, define positive and negative ion names (-pname and-nname, respectively), and tell genion to add only the ions necessary to neutralize the net charge on the protein by adding the correct number of negative ions (-nn 8). You could also use genion to add a specified concentration of ions in addition to simply neutralizing the system by specifying the -neutral and -conc options in conjunction. Refer to the genion man page for information on how to use these options. The names of the ions specified with -pname and -nname were force field-specific in previous versions of GROMACS, but have been standardized as of version . The specified atom names are always the elemental symbol in all capital letters, along with the [ moleculetype ]. Residue names may or may not append the sign of the charge (+/-). Refer to for proper nomenclature if you encounter difficulties.Your [ molecules ] directive should now look like:[ molecules ]; Compound #molsProtein_A 1SOL 10824CL 8Step Five: Energy MinimizationThe solvated, electroneutral system is now assembled. Before we can begin dynamics, we must ensure that the system has no steric clashes or inappropriate geometry. The structure is relaxed through a process called energy minimization (EM).The process for EM is much like the addition of ions. We are once again going to use grompp to assemble the structure, topology, and simulation parameters into a binary input file (.tpr), but this time, instead of passing the .tpr to genion, we will run the energy minimization through the GROMACS MD engine, mdrun.Assemble the binary input using grompp using this input parameter file: grompp -f -c -p-oMake sure you have been updating your file when running genbox and genion, or else you will get lots of nasty error messages ("number of coordinates in coordinate file does not match topology," etc).We are now ready to invoke mdrun to carry out the EM:mdrun -v-deffnm emThe -v flag is for the impatient: it makes mdrun verbose, such that it prints its progress to the screen at every step. The -deffnm flag will define the file names of the input and output. So, if you did not name your grompp output "," you will have to explicitly specify its name with the mdrun -s flag. In our case, we will get the following files:: ASCII-text log file of the EM process: Binary energy file: Binary full-precision trajectory: Energy-minimized structureThere are two very important factors to evaluate to determine if EM was successful. The first is the potential energy (printed at the end of the EM process, even without-v). Epot should be negative, and (for a simple protein in water) on the order of 105-106, depending on the system size and number of water molecules. The second important feature is the maximum force, Fmax, the target for which was set in - "emtol = " - indicating a target Fmax of no greater than 1000 kJ mol-1nm-1. It is possible to arrive at a reasonable Epot with Fmax > emtol. If this happens, your system may not be stable enough for simulation. Evaluate why it may be happening, and perhaps change your minimization parameters (integrator, emstep, etc).Let's do a bit of analysis. The file contains all of the energy terms that GROMACS collects during EM. You can analyze any .edr file using the GROMACS tools g_energy: g_energy -f-oAt the prompt, type "10 0" to select Potential (10); zero (0) terminates input. You will be shown the average of Epot, and a file called "" will be written. To plot this data, you will need the Xmgrace plotting tool. The resulting plot should look something like this, demonstrating the nice, steady convergence of Epot:Now that our system is at an energy minimum, we can begin real dynamics.Step Six: EquilibrationEM ensured that we have a reasonable starting structure, in terms of geometry and solvent orientation. To begin real dynamics, we must equilibrate the solvent and ions around the protein. If we were to attempt unrestrained dynamics at this point, the system may collapse. The reason is that the solvent is mostly optimized within itself, and not necessarily with the solute. It needs to be brought to the temperaturewe wish to simulate and establish the proper orientation about the solute (the protein). After we arrive at the correct temperature (based on kinetic energies), we will apply pressure to the system until it reaches the proper density. Remember that file that pdb2gmx generated a long time ago? We're going to use it now. The purpose of is to apply a position restraining force on the heavy atoms of the protein (anything that is not a hydrogen). Movement is permitted, but only after overcoming a substantial energy penalty. The utility of position restraints is that they allow us to equilibrate our solvent around our protein, without the added variable of structural changes in the protein.Equilibration is often conducted in two phases. The first phase is conducted under an NVT ensemble (constant Number of particles, Volume, and Temperature). This ensemble is also referred to as "isothermal-isochoric" or "canonical." The timeframe for such a procedure is dependent upon the contents of the system, but in NVT, the temperature of the system should reach a plateau at the desired value. If the temperature has not yet stabilized, additional time will be required. Typically, 50-100 ps should suffice, and we will conduct a 100-ps NVT equilibration for this exercise. Depending on your machine, this may take a while (just over an hour on a dual-core MacBook). Get the .mdp file here.We will call grompp and mdrun just as we did at the EM step:grompp -f-c -p -omdrun -deffnm nvtA full explanation of the parameters used can be found in the GROMACS manual, in addition to the comments provided. Take note of a few parameters in the .mdp file:gen_vel = yes: Initiates velocity generation. Using different random seeds (gen_seed) gives different initial velocities, and thus multiple (different) simulations can be conducted from the same starting structure.tcoupl = V-rescale: The velocity rescaling thermostat is an improvement upon the Berendsen weak coupling method, which did not reproduce a correct kinetic ensemble.pcoupl = no: Pressure coupling is not applied.Let's analyze the temperature progression, again using g_energy:g_energy -fType "15 0" at the prompt to select the temperature of the system and exit. The resulting plot should look something like the following:From the plot, it is clear that the temperature of the system quickly reaches the target value (300 K), and remains stable over the remainder of the equilibration. For this system, a shorter equilibration period (on the order of 50 ps) may have been adequate.Step Seven: Equilibration, Part 2The previous step, NVT equilibration, stabilized the temperature of the system. Prior to data collection, we must also stabilize the pressure (and thus also the density) of the system. Equilibration of pressure is conducted under an NPT ensemble, wherein the Number of particles, Pressure, and Temperature are allconstant. The ensemble is also called the "isothermal-isobaric" ensemble, and most closely resembles experimental conditions.The .mdp file used for a 100-ps NPT equilibration can be found here. It is not drastically different from the parameter file used for NVT equilibration. Note the addition of the pressure coupling section, using the Parrinello-Rahman barostat.A few other changes:continuation = yes: We are continuing the simulation from the NVT equilibration phasegen_vel = no: Velocities are read from the trajectory (see below)We will call grompp and mdrun just as we did for NVT equilibration. Note that we are now including the -t flag to include the checkpoint file from the NVT equilibration; this file contains all the necessary state variables to continue our simulation. To conserve the velocities produced during NVT, we must include this file. The coordinate file (-c) is the final output of the NVT simulation.grompp -f -c-t-p -omdrun -deffnm nptLet's analyze the pressure progression, again using g_energy:g_energy -f -oType "16 0" at the prompt to select the pressure of the system and exit. The resulting plot should look something like the following:The pressure value fluctuates widely over the course of the 100-ps equilibration phase, but this behavior is not unexpected. The running average of these data are plotted as the red line in the plot. Over the course of the equilibration, the average value of the pressure is bar.Let's take a look at density as well, this time using g_energy and entering "22 0" at the prompt.g_energy -f -oAs with the pressure, the running average of the density is also plotted in red. The average value over the course of 100 ps is kg m-3, close to the experimental value of 1000 kg m-3. The parameters for the SPC/E water model closely replicate experimental values for water. The density values are very stable over time, indicating that the system is well-equilibrated now with respect to pressure and density.Step Eight: Production MDUpon completion of the two equilibration phases, the system is now well-equilibrated at the desired temperature and pressure. We are now ready to release the position restraints and run production MD for data collection. The process is just like we have seen before, as we will make use of the checkpoint file (which in this case now contains preserve pressure coupling information) to grompp. We will run a 1-ns MD simulation, the script for which can be found here.grompp -f -c -t -p -omdrun -deffnm md_0_1The mdrun step is definitely best run in parallel on a cluster, since it will take several hours to complete. The proper command for executing such a job would be: mpirun -np X mdrun_mpi -deffnm md_0_1where X indicates the number of processors to be used for the simulation. Near the end of the grompp output, you should have seen a line like:Estimate for the relative computational load of the PME mesh part:The estimate for PME load will dictate how many processors should be dedicated to the PME calculation, and how many for the PP calculations. Refer to the GROMACS 4 publication and the manual for details. For a cubic box, the optimal setup will have a PME load of (3:1 PP:PME - we're in luck!); for a dodecahedral box, the optimal PME load is (2:1 PP:PME). When executing mdrun, the program should automatically determine the best number of processors to assign for the PP and PME calculations. Thus, make sure you indicate an appropriate number of nodes for your calculation (the value of -np X), so that you can get the best performance. For this system, I achieved roughly 14 ns/day performance on 24 CPU's (18 PP, 6 PME).Step Nine: AnalysisNow that we have simulated our protein, we should run some analysis on the system. What types of data are important? This is an important question to ask before running the simulation, so you should have some ideas about the types of data you will wantto collect in your own systems. For this tutorial, a few basic tools will be introduced.The first is trjconv, which is used as a post-processing tool to strip out coordinates, correct for periodicity, or manually alter the trajectory (time units, frame frequency, etc). For this exercise, we will use trjconv to account for any periodicity in the system. The protein will diffuse through the unit cell, and may appear to "jump" across to the other side of the box. To account for such actions, issue the following:trjconv -s-f -o -pbc mol-ur compactSelect 0 ("System") for output. We will conduct all our analyses on this "corrected" trajectory. Let's look at structural stability first. GROMACS has a built-in utility for RMSD calculations called g_rms. To use g_rms, issue this command:g_rms -s-f-o -tu nsChoose 4 ("Backbone") for both the least squares fit and the group for RMSD calculation. The -tu flag will output the results in terms of ns, even though the trajectory was written in ps. This is done for clarity of the output (especially if you have a long simulation - 1e+06 ps does not look as nice as 100 ns). The output plot will show the RMSD relative to the structure present in the minimized, equilibrated system:If we wish to calculate RMSD relative to the crystal structure, we could issue the following:g_rms -s -f -o -tu nsThe result will be something like:Both plots show the RMSD levels off to ~ nm (1 Å), indicating that the structure is very stable. Subtle differences between the plots indicate that the structure at t = 0 ns is slightly different from this crystal structure. This is to be expected, since it has been energy-minimized, and because the position restraints are not 100% perfect, as discussed previously.The radius of gyration of a protein is a measure of its compactness. If a protein is stably folded, it will likely maintain a relatively steady value of Rg. If a protein unfolds, its Rg will change over time. Let's analyze the radius of gyration for lysozyme in our simulation:g_gyrate -s -f-oWe can see from the reasonably invariant Rg values that the protein remains very stable, in its compact (folded) form over the course of 1 ns at 300 K. This result is not unexpected, but illustrates an advanced capacity of GROMACS analysis that comes built-in.SummaryYou have now conducted a molecular dynamics simulation with GROMACS, and analyzed some of the results. This tutorial should not be viewed as comprehensive. There are many more types of simulations that one can conduct with GROMACS (free energy calculations, non-equilibrium MD, and normal modes analysis, just to name a few).。

gromacs分子计算模拟流程

gromacs分子计算模拟流程English Answer:GROMACS Molecular Simulation Workflow.GROMACS is an open-source molecular simulation software package that is used to simulate the behavior of biomolecules, such as proteins, DNA, and lipids. The GROMACS workflow typically consists of the following steps:1. System preparation: This involves creating a molecular system that includes the molecule(s) of interest, as well as the surrounding solvent and any other necessary components. The system is typically prepared using a molecular editor, such as VMD or PyMOL.2. Force field selection: A force field is a set of equations that describe the interactions between atoms and molecules. The force field is selected based on the system being simulated.3. Simulation setup: This involves setting up the simulation parameters, such as the temperature, pressure, and time step. The simulation is also set up to output the desired data, such as the coordinates of the molecules and their energies.4. Simulation: The simulation is performed using a molecular dynamics engine. The engine integrates the equations of motion to update the positions and velocities of the molecules.5. Analysis: The simulation data is analyzed to extract information about the behavior of the system. This can be done using a variety of tools, such as VMD, PyMOL, and GROMACS utilities.Chinese Answer:GROMACS分子计算模拟流程。

gromacs使用手册

gromacs使用手册（原创实用版）目录1.Gromacs 简介2.Gromacs 的功能3.Gromacs 的使用方法4.Gromacs 的应用实例5.Gromacs 的未来发展正文1.Gromacs 简介Gromacs 是一个开源的生物大分子模拟软件，主要用于分子动力学和势能计算。

它由 Warren P.Moran 等人开发，最早的版本于 1996 年发布。

Gromacs 名字来源于“Groningen 分子模拟器”，这反映出它的起源地——荷兰格罗宁根大学。

2.Gromacs 的功能Gromacs 具有多种功能，包括：- 分子动力学模拟：Gromacs 可以使用不同的力场进行分子动力学模拟，包括经典的力场如 CHARMM、OPLS、AMBER 等，也可以使用更高级的力场如 GB/SA、ITS/SA 等。

- 势能计算：Gromacs 可以计算生物大分子的势能，包括分子间的相互作用能、分子内能等。

- 模拟过程中的能量计算：Gromacs 可以在模拟过程中实时计算各种能量，如动能、势能、内能等。

- 轨迹文件的输出：Gromacs 可以输出轨迹文件，方便用户进行后续的分析。

- 模拟过程中的原子运动轨迹可视化：Gromacs 具有可视化工具，可以实时显示原子在模拟过程中的运动轨迹。

3.Gromacs 的使用方法Gromacs 的使用方法相对简单，用户只需要按照以下步骤进行操作：- 准备输入文件：用户需要根据需要编写输入文件，包括分子结构文件、模拟参数文件、力场文件等。

- 运行 Gromacs：用户需要在终端中输入 Gromacs 命令，指定输入文件和输出文件。

- 查看输出文件：模拟完成后，用户可以在指定的输出文件中查看模拟结果。

4.Gromacs 的应用实例Gromacs 广泛应用于生物大分子的模拟研究，包括蛋白质结构预测、药物设计、生物材料研究等。

一个使用gromacs进行蛋白质模拟的入门教程

GROMACS TutorialLysozyme in WaterJustin LemkulDepartment of Biochemistry, Virginia TechThis example will guide a new user through the process of setting up a simulation system containing a protein (lysozyme) in a box of water, with ions. Each step will contain an explanation of input and output, using typical settings for general use.This tutorial assumes you are using a GROMACS version in the 4.5.x series.GROMACS TutorialStep One: Prepare the TopologyWe must d ownload the protein structure file we will be working with. For this tutorial, we will utilize hen egg white lysozyme (PDB code 1AKI). Go to the RCSB website and d ownload the PDB text for the crystal structure.Once you have d ownl oaded the structure, you can visualize the structure using a viewing program such as VMD, Chimera, PyMOL, etc. Once you've had a look at the molecule, you are going to want to strip out the crystal waters. Use a plain text editor like vi, emacs (Linux/Mac), or Notepad (Wind ows). Do not use word processing software! Delete the lines corresponding to these molecules (residue "HOH" in the PDB file). Note that such a procedure is not universally appropriate (i.e., the case of a bound active site water molecul e). For our intentions here, we do not need crystal water.Always check your .pdb fil e for entries listed under the comment MISSING, as these entries indicate either atoms or whole residues that are not present in the crystal structure. Terminal regions may be absent, and may not present a probl em for dynamics. Incompl ete internal sequences or any amino acid residues that have missing atoms will cause pdb2gmx to fail. These missing atoms/residues must be modeled in using other software packages. Also note that pdb2gmx is not magic. It cannot generate topologies for arbitrary molecul es, just the residues defined by the force fiel d (in the *.rtp fil es - generally proteins, nucl eic acids, and a very finite amount of cofactors, like NAD(H) and ATP).Now that the crystal waters are gone and we have verified that all the necessary atoms are present, the PDB file shoul d contain only protein atoms, and is ready to be input into the first GROMACS tool, pdb2gmx. The purpose of pdb2gmx is to generate three files:1.The topol ogy for the molecule.2. A position restraint file.3. A post-processed structure file.The topol ogy (topol.top by default) contains all the information necessary to define the mol ecule within a simulation. This information includ es nonbonded parameters (atom types and charges) as well as bond ed parameters (bonds, angles, and dihedrals). We will take a more detail ed l ook at the topol ogy once it has been generated.Execute pdb2gmx by issuing the following command:pdb2gmx -f 1AKI.pdb-o 1AKI_processed.gro-water spceThe structure will be processed by pdb2gmx, and you will be prompted to choose a force field:Select the Force Field:From '/usr/l ocal/gromacs/share/gromacs/top':1: AMBER03 force fiel d (Duan et al., J. Comp. Chem. 24, 1999-2012, 2003)2: AMBER94 force fiel d (Cornell et al., JACS 117, 5179-5197, 1995)3: AMBER96 force fiel d (Kollman et al., Acc. Chem. Res. 29, 461-469, 1996)4: AMBER99 force fiel d (Wang et al., J. Comp. Chem. 21, 1049-1074, 2000)5: AMBER99SB force field (Hornak et al., Proteins 65, 712-725, 2006)6: AMBER99SB-ILDN force field (Lind orff-Larsen et al., Proteins 78, 1950-58, 2010)7: AMBERGS force fiel d (Garcia & Sanbonmatsu, PNAS 99, 2782-2787, 2002)8: CHARMM27 all-atom force fiel d (with CMAP) - version 2.09: GROMOS96 43a1 force field10: GROMOS96 43a2 force field (improved alkane dihedrals)11: GROMOS96 45a3 force field (Schul er JCC 2001 22 1205)12: GROMOS96 53a5 force field (JCC 2004 vol 25 pag 1656)13: GROMOS96 53a6 force field (JCC 2004 vol 25 pag 1656)14: OPLS-AA/L all-atom force fiel d (2001 aminoacid dihedrals)15: [DEPRECATED] Encad all-atom force field, using full solvent charges16: [DEPRECATED] Encad all-atom force field, using scaled-d own vacuum charges17: [DEPRECATED] Gromacs force field (see manual)18: [DEPRECATED] Gromacs force field with hydrogens for NMRThe force field will contain the information that will be written to the topology. This is a very important choice! You should always read thoroughly about each force fiel d and d ecide which is most applicabl e to your situation. For this tutorial, we will use the all-atom OPLS force field, so type 14 at the command prompt, foll owed by 'Enter'.There are many other options that can be passed to pdb2gmx. Some are listed here:-ignh: Ignore H atoms in the PDB fil e; especially useful for NMR structures. Otherwise, if H atoms are present, they must be in the correct order and named exactly how GROMACS expects them to be.-ter: Interactively assign charge states for N- and C-termini.-inter: Interactively assign charge states for Glu, Asp, Lys, Arg, and His; assign disulfides to Cys.You have now generated three new fil es: 1AKI_processed.gro, topol.top, and posre.itp. 1AKI_processed.gro is a GROMACS-formatted structure file that contains all the atoms d efined within the force field (i.e., H atoms have been ad ded to the amino acids in the protein). The topol.top fil e is the system topol ogy (more on this in a minute). The posre.itp file contains information used to restrain the positions of heavy atoms (more on this later).Step Two: Examine the TopologyLet's l ook at what is in the output topol ogy (topol.top). Again, using a plain text editor, inspect its contents. After several comment lines (preceded by ;), you will find the foll owing:#include "oplsaa.ff/forcefield.itp"This line calls the parameters within the OPLS-AA force field. It is at the beginning of the file, indicating that all subsequent parameters are derived from this force fiel d. The next important line is [ moleculetype ], bel ow which you will find; Name nrexclProtein_A 3The name "Protein_A" d efines the mol ecul e name, based on the fact that the protein was labeled as chain A in the PDB file. There are 3 exclusions for bonded neighbors. More information on exclusions can be found in the GROMACS manual; a discussion of this information is beyond the scope of this tutorial.The next section d efines the [ atoms ] in the protein. The information is presented as columns:[ atoms ]; nr type resnr residue atom cgnr charge mass typeB chargeB massB; residue 1 LYS rtp LYSH q +2.01 opls_287 1 LYS N 1 -0.3 14.0067 ; qtot -0.32 opls_290 1 LYS H1 1 0.33 1.008 ; qtot 0.033 opls_290 1 LYS H2 1 0.33 1.008 ; qtot 0.364 opls_290 1 LYS H3 1 0.33 1.008 ; qtot 0.695 opls_293B 1 LYS CA 1 0.25 12.011 ; qtot 0.946 opls_140 1 LYS HA 1 0.06 1.008 ; qtot 1The interpretation of this information is as foll ows:•nr: Atom number•type: Atom type•resnr: Amino acid residue number•residue: The amino acid residue nameNote that this residue was "LYS" in the PDB file; the use of .rtp entry "LYSH" indicates that the residue is protonated (the predominant state at neutral pH).•atom: Atom name•cgnr: Charge group numberCharge groups d efine units of integer charge; they aid in speeding up calculations•charge: Self-explanatoryThe "qtot" descriptor keeps a running count of the total charge on the molecule•mass: Also self-explanatory•typeB, chargeB, massB: Used for free energy perturbation (not discussed here)Subsequent sections includ e [ bonds ], [ pairs ], [ angles ], and [ dihedrals ]. Some of these sections are self-explanatory (bonds, angles, and dihedrals). The parameters and function types associated with these sections are elaborated on in Chapter 5 of the GROMACS manual. Special 1-4 interactions are included under "pairs" (section 5.3.4 of the GROMACS manual).The remainder of the fil e involves defining a few other useful/necessary topologies, starting with position restraints. The "posre.itp" file was generated by pdb2gmx; it defines a force constant used to keep atoms in place during equilibration (more on this later).; Include Position restraint file#ifdef POSRES#include "posre.itp"#endifThis ends the "Protein_A" moleculetype d efinition. The remaind er of the topol ogy file is dedicated to defining other molecules and providing system-level descriptions. The next mol eculetype (by default) is the solvent, in this case SPC/E water. Other typical choices for water include SPC, TIP3P, and TIP4P. We chose this by passing "-water spce" to pdb2gmx. For an excellent summary of the many different water models, click here, but be aware that not all of these models are present within GROMACS.; Include water topology#include "oplsaa.ff/spce.itp"#ifdef POSRES_WATER; Position restraint for each water oxygen[ position_restraints ]; i funct fcx fcy fcz1 1 1000 1000 1000#endifAs you can see, water can also be position-restrained, using a force constant (kpr) of 1000 kJ mol-1 nm-2.Ion parameters are includ ed next:; Include generic topology for ions#include "oplsaa.ff/ions.itp"Finally come system-level definitions. The [ system ] directive gives the name of the system that will be written to output files during the simulation. The [ molecul es ] directive lists all of the mol ecul es in the system.[ system ]; NameLYSOZYME[ molecules ]; Compound #molsProtein_A 1A few key notes about the [ molecules ] directive:1.The order of the listed mol ecules must exactly match the ord er of the mol ecul es in thecoordinate (in this case, .gro) file.2.The names listed must match the [ moleculetype ] name for each species, not residuenames or anything else.If you fail to satisfy these concrete requirements at any time, you will get fatal errors from grompp (discussed later) about mismatched names, mol ecules not being found, or a number of others.Now that we have examined the contents of a topol ogy file, we can continue building our system.Step Three: Defining the Unit Cell & Adding SolventNow that you are familiar with the contents of the GROMACS topology, it is time to continue building our system. In this example, we are going to be simulating a simple aqueous system. It is possibl e to simulate proteins and other mol ecul es in different solvents, provid ed that good parameters are availabl e for all species involved.There are two steps to defining the box and filling it with solvent:1.Define the box dimensions using editconf.2.Fill the box with water using genbox.You are now presented with a choice as to how to treat the unit cell. For the purpose of this tutorial, we will use a simpl e cubic box as the unit cell. As you become more comfortable with periodic boundary conditions and box types, I highly recommend the rhombic dodecahedron, as its volume is ~71% of the cubic box of the same periodic distance, thus saving on the number of water molecules that need to be add ed to solvate the protein.Let's d efine the box using editconf:editconf-f 1AKI_processed.gro-o 1AKI_newbox.gro-c -d 1.0 -bt cubicThe above command centers the protein in the box (-c), and places it at least 1.0 nm from the box edge (-d 1.0). The box type is defined as a cube (-bt cubic). The distance to the edge of the box is an important parameter. Since we will be using periodic boundary conditions, we must satisfy the minimum image convention. That is, a protein should never see its periodic image, otherwise the forces calculated will be spurious. Specifying a solute-box distance of 1.0 nm will mean that there are at least 2.0 nm between any two periodic images of a protein. This distance will be sufficient for just about any cutoff scheme commonly used in simulations.Now that we have defined a box, we can fill it with solvent (water). Solvation is accomplished using genbox:genbox -cp 1AKI_newbox.gro-cs spc216.gro-o 1AKI_solv.gro-p topol.topThe configuration of the protein (-cp) is contained in the output of the previous editconf step, and the configuration of the solvent (-cs) is part of the standard GROMACS installation. We are using spc216.gro, which is a generic equilibrated 3-point solvent model. You can use spc216.gro as the solvent configuration for SPC, SPC/E, or TIP3P water, since they are all three-point water models. The output is called 1AKI_solv.gro, and we tell genbox the name of the topology file (topol.top) so it can be modified. Note the changes to the [ mol ecul es ] directive of topol.top:[ mol ecul es ]; Compound #molsProtein_A 1SOL 10832What genbox has d one is keep track of how many water mol ecul es it has ad ded, which it then writes to your topol ogy to refl ect the changes that have been mad e. Note that if you use any other (non-water) solvent, genbox will not make these changes to your topol ogy! Its compatibility with updating water molecules is hard-coded.Step Four: Adding IonsWe now have a solvated system that contains a charged protein. The output of pdb2gmx told us that the protein has a net charge of +8e (based on its amino acid composition). If you missed this information in the pdb2gmx output, look at the last line of your [ atoms ] directive in topol.top; it shoul d read (in part) "qtot 8." Since life d oes not exist at a net charge, we must ad d ions to our system.The tool for adding ions within GROMACS is called genion. What genion does is read through the topology and replace water molecules with the ions that the user specifies. The input is call ed a run input fil e, which has an extension of .tpr; this file is produced by the GROMACS tool grompp (GROMACS pre-processor), which will also be used later when we run our first simulation. What grompp does is process the coordinate fil e and topology (which describes the mol ecules) to generate an atomic-level input (.tpr). The .tpr file contains all the parameters for all of the atoms in the system.To produce a .tpr file with grompp, we will need an additional input fil e, with the extension .mdp (molecular dynamics parameter file); grompp will assemble the parameters specified in the .mdp fil e with the coordinates and topol ogy information to generate a .tpr file.An .mdp fil e is normally used to run energy minimization or an MD simulation, but in this case is simply used to generate an atomic description of the system. An example .mdp file (the one we will use) can be d ownload ed here.In reality, the .mdp file used at this step can contain any legitimate combination of parameters.I typically use an energy-minimization script, because they are very basic and d o not involve any complicated parameter combinations.Assembl e your .tpr file with the foll owing:grompp -f ions.mdp-c 1AKI_solv.gro-p topol.top -o ions.tprNow we have an atomic-level description of our system in the binary fil e ions.tpr. We will pass this file to genion:genion -s ions.tpr-o 1AKI_solv_ions.gro-p topol.top-pname NA-nname CL-nn 8When prompted, choose group 13 "SOL" for embedding ions. You d o not want to replace parts of your protein with ions.In the genion command, we provid e the structure/state file (-s) as input, generate a .gro file as output (-o), process the topology (-p) to reflect the removal of water molecules and addition of ions, d efine positive and negative ion names (-pname and -nname, respectively), and tell genion to ad d only the ions necessary to neutralize the net charge on the protein by adding the correct number of negative ions (-nn 8). You could also use genion to add a specified concentration of ions in addition to simply neutralizing the system by specifying the-neutral and -conc options in conjunction. Refer to the genion man page for information on how to use these options.The names of the ions specified with -pname and -nname were force fiel d-specific in previous versions of GROMACS, but have been standardized as of version 4.5. The specified atom names are always the el emental symbol in all capital letters, along with the [ mol eculetype ]. Residue names may or may not append the sign of the charge (+/-). Refer to ions.itp for proper nomenclature if you encounter difficulties.Your [ molecules ] directive should now look like:[ molecules ]; Compound #molsProtein_A 1SOL 10824CL 8Step Five: Energy MinimizationThe solvated, electroneutral system is now assembled. Before we can begin dynamics, we must ensure that the system has no steric clashes or inappropriate geometry. The structure is relaxed through a process called energy minimization (EM).The process for EM is much like the addition of ions. We are once again going to use grompp to assemble the structure, topology, and simulation parameters into a binary input file (.tpr), but this time, instead of passing the .tpr to genion, we will run the energy minimization through the GROMACS MD engine, mdrun.Assembl e the binary input using grompp using this input parameter file:grompp -f minim.mdp-c 1AKI_solv_ions.gro -p topol.top-o em.tprMake sure you have been updating your topol.top file when running genbox and genion, or else you will get lots of nasty error messages ("number of coordinates in coordinate file does not match topology," etc).We are now ready to invoke mdrun to carry out the EM:mdrun -v-deffnm emThe -v flag is for the impatient: it makes mdrun verbose, such that it prints its progress to the screen at every step. The -deffnm flag will define the file names of the input and output. So, if you did not name your grompp output "em.tpr," you will have to explicitly specify its name with the mdrun -s flag. In our case, we will get the foll owing files:•em.log: ASCII-text log file of the EM process•em.edr: Binary energy file•em.trr: Binary full-precision trajectory•em.gro: Energy-minimized structureThere are two very important factors to evaluate to d etermine if EM was successful. The first is the potential energy (printed at the end of the EM process, even without -v). Epot shoul d be negative, and (for a simple protein in water) on the ord er of 105-106, depending on the system size and number of water molecul es. The second important feature is the maximum force, Fmax, the target for which was set in minim.mdp - "emtol = 1000.0" - indicating a target Fmax of no greater than 1000 kJ mol-1 nm-1. It is possible to arrive at a reasonable Epot with Fmax > emtol. If this happens, your system may not be stable enough for simulation. Evaluate why it may be happening, and perhaps change your minimization parameters (integrator, emstep, etc).Let's do a bit of analysis. The em.edr fil e contains all of the energy terms that GROMACS collects during EM. You can analyze any .edr fil e using the GROMACS tools g_energy:g_energy -f em.edr-o potential.xvgAt the prompt, type "10 0" to select Potential (10); zero (0) terminates input. You will be shown the average of Epot, and a file called "potential.xvg" will be written. To pl ot this data, you will need the Xmgrace plotting tool. The resulting plot should look something like this, demonstrating the nice, steady convergence of Epot:Now that our system is at an energy minimum, we can begin real dynamics.Step Six: EquilibrationEM ensured that we have a reasonabl e starting structure, in terms of geometry and solvent orientation. To begin real dynamics, we must equilibrate the solvent and ions around the protein. If we were to attempt unrestrained dynamics at this point, the system may collapse. The reason is that the solvent is mostly optimized within itself, and not necessarily with the solute. It needs to be brought to the temperature we wish to simulate and establish the proper orientation about the solute (the protein). After we arrive at the correct temperature (based on kinetic energies), we will apply pressure to the system until it reaches the proper density.Remember that posre.itp fil e that pdb2gmx generated a long time ago? We're going to use it now. The purpose of posre.itp is to apply a position restraining force on the heavy atoms of the protein (anything that is not a hydrogen). Movement is permitted, but only after overcoming a substantial energy penalty. The utility of position restraints is that they all ow us to equilibrate our solvent around our protein, without the ad ded variabl e of structural changes in the protein.Equilibration is often conducted in two phases. The first phase is conducted und er an NVT ensemble (constant Number of particles, Volume, and Temperature). This ensemble is also referred to as "isothermal-isochoric" or "canonical." The timeframe for such a procedure is dependent upon the contents of the system, but in NVT, the temperature of the system should reach a plateau at the desired value. If the temperature has not yet stabilized, additional time will be required. Typically, 50-100 ps shoul d suffice, and we will conduct a100-ps NVT equilibration for this exercise. Depending on your machine, this may take a while (just over an hour on a dual-core MacBook). Get the .mdp file here.We will call grompp and mdrun just as we did at the EM step:grompp -f nvt.mdp-c em.gro -p topol.top-o nvt.tprmdrun -deffnm nvtA full explanation of the parameters used can be found in the GROMACS manual, in addition to the comments provid ed. Take note of a few parameters in the .mdp file:•gen_vel = yes: Initiates velocity generation. Using different random seeds (gen_seed) gives different initial velocities, and thus multiple (different) simulations can be conducted from the same starting structure.•tcoupl = V-rescale: The velocity rescaling thermostat is an improvement upon the Berendsen weak coupling method, which did not reproduce a correct kinetic ensemble.•pcoupl = no: Pressure coupling is not applied.Let's analyze the temperature progression, again using g_energy:g_energy -f nvt.edrType "15 0" at the prompt to sel ect the temperature of the system and exit. The resulting plot should look something like the foll owing:From the plot, it is cl ear that the temperature of the system quickly reaches the target value (300 K), and remains stabl e over the remainder of the equilibration. For this system, a shorter equilibration period (on the ord er of 50 ps) may have been adequate.Step Seven: Equilibration, Part 2The previous step, NVT equilibration, stabilized the temperature of the system. Prior to data collection, we must also stabilize the pressure (and thus also the d ensity) of the system. Equilibration of pressure is conducted und er an NPT ensemble, wherein the Number of particl es, Pressure, and Temperature are all constant. The ensembl e is also call ed the "isothermal-isobaric" ensemble, and most closely resembles experimental conditions.The .mdp file used for a 100-ps NPT equilibration can be found here. It is not drastically different from the parameter file used for NVT equilibration. Note the addition of the pressure coupling section, using the Parrinello-Rahman barostat.A few other changes:•continuation = yes: We are continuing the simulation from the NVT equilibration phase •gen_vel = no: Velocities are read from the trajectory (see below)We will call grompp and mdrun just as we did for NVT equilibration. Note that we are now including the -t flag to include the checkpoint file from the NVT equilibration; this file contains all the necessary state variables to continue our simulation. To conserve the velocities produced during NVT, we must includ e this file. The coordinate file (-c) is the final output of the NVT simulation.grompp -f npt.mdp -c nvt.gro-t nvt.cpt -p topol.top -o npt.tprmdrun -deffnm nptLet's analyze the pressure progression, again using g_energy:g_energy -f npt.edr -o pressure.xvgType "16 0" at the prompt to select the pressure of the system and exit. The resulting plot should look something like the foll owing:The pressure value fluctuates widely over the course of the 100-ps equilibration phase, but this behavior is not unexpected. The running average of these data are plotted as the red line in the plot. Over the course of the equilibration, the average value of the pressure is 1.05 bar.Let's take a l ook at density as well, this time using g_energy and entering "22 0" at the prompt.g_energy -f npt.edr -o density.xvgAs with the pressure, the running average of the density is also plotted in red. The average value over the course of 100 ps is 998.3 kg m-3, close to the experimental value of 1000 kg m-3. The parameters for the SPC/E water model cl osely replicate experimental values for water. The density values are very stable over time, indicating that the system is well-equilibrated now with respect to pressure and d ensity.Step Eight: Production MDUpon completion of the two equilibration phases, the system is now well-equilibrated at the desired temperature and pressure. We are now ready to rel ease the position restraints and run production MD for data coll ection. The process is just like we have seen before, as we will make use of the checkpoint file (which in this case now contains preserve pressure coupling information) to grompp. We will run a 1-ns MD simulation, the script for which can be found here.grompp -f md.mdp-c npt.gro-t npt.cpt-p topol.top -o md_0_1.tprmdrun -deffnm md_0_1The mdrun step is d efinitely best run in parallel on a cluster, since it will take several hours to complete. The proper command for executing such a job woul d be:mpirun -np X mdrun_mpi -deffnm md_0_1where X indicates the number of processors to be used for the simulation. Near the end of the grompp output, you should have seen a line like:Estimate for the relative computational l oad of the PME mesh part: 0.25The estimate for PME l oad will dictate how many processors shoul d be dedicated to the PME cal culation, and how many for the PP cal culations. Refer to the GROMACS 4 publication and the manual for details. For a cubic box, the optimal setup will have a PME load of 0.25 (3:1 PP:PME - we're in luck!); for a dodecahedral box, the optimal PME l oad is 0.33 (2:1 PP:PME). When executing mdrun, the program shoul d automatically determine the best number of processors to assign for the PP and PME cal culations. Thus, make sure you indicate an appropriate number of nod es for your calculation (the value of -np X), so that you can get the best performance. For this system, I achieved roughly 14 ns/day performance on 24 CPU's (18 PP, 6 PME).Step Nine: AnalysisNow that we have simulated our protein, we shoul d run some analysis on the system. What types of data are important? This is an important question to ask before running the simulation, so you shoul d have some ideas about the types of data you will want to collect in your own systems. For this tutorial, a few basic tools will be introduced.The first is trjconv, which is used as a post-processing tool to strip out coordinates, correct for periodicity, or manually alter the trajectory (time units, frame frequency, etc). For this exercise, we will use trjconv to account for any periodicity in the system. The protein will diffuse through the unit cell, and may appear to "jump" across to the other side of the box. To account for such actions, issue the foll owing:trjconv -s md_0_1.tpr-f md_0_1.xtc-o md_0_1_noPBC.xtc -pbc mol-ur compactSelect 0 ("System") for output. We will conduct all our analyses on this "corrected" trajectory. Let's l ook at structural stability first. GROMACS has a built-in utility for RMSD cal culations call ed g_rms. To use g_rms, issue this command:g_rms -s md_0_1.tpr-f md_0_1_noPBC.xtc-o rmsd.xvg-tu nsChoose 4 ("Backbone") for both the least squares fit and the group for RMSD cal culation. The -tu flag will output the results in terms of ns, even though the trajectory was written in ps. This is d one for clarity of the output (especially if you have a long simulation - 1e+06 ps d oes not look as nice as 100 ns). The output plot will show the RMSD relative to the structure present in the minimized, equilibrated system:If we wish to calculate RMSD relative to the crystal structure, we coul d issue the following:g_rms -s em.tpr -f md_0_1_noPBC.xtc-o rmsd_xtal.xvg -tu nsThe result will be something like:。

gromacs5.x入门教程中文版,超详细

GROMACS 教程第一步：准备拓扑一些GROMACS 基础知识随着版本5.0的GROMACS 的发布，所有的工具本质上是二进制名为“gmx”的模块。

这与以前的版本不同，其中每个工具都被调用为自己的命令。

在5.0中，这仍然通过符号链接工作，但是在将来版本中将会消失，所以最好习惯于新的做事方式。

要获取有关任何GROMACS 模块的帮助信息，您可以调用以下任一命令：gmx help (module)或者gmx (module) -h其中（module ）被您想要发出的命令的实际名称所替代。

信息将打印到终端，包括可用的算法，选项，所需的文件格式，已知的错误和限制等。

对于GROMACS 的新用户，调用常用命令的帮助信息是了解每个命令可以执行的一个很好的方法。

现在，到有趣的东西！溶菌酶教程我们必须下载我们将要工作的蛋白质结构文件。

在本教程中，我们将使用鸡蛋白溶菌酶（PDB 代码1AKI ）。

转到RCSB 网站并下载PDB 文本的晶体结构。

下载结构后，您可以使用VMD ，Chimera ，PyMOL 等查看程序来可视化结构。

一旦看到分子，您将要去除晶体水域。

使用纯文本编辑器，如vi ，emacs （Linux / Mac ）或记事本（Windows ）。

不要使用文字处理软件！删除与这些分子对应的行（PDB 文件中的残基“HOH”）。

注意，这种方法不是普遍适用的（即，结合的活性位点水分子的情况）。

对于我们这里的意图，我们不需要水晶。

请务必检查您的.pdb 文件中的评论MISSING 中列出的条目，因为这些条目表示不存在于晶体结构中的原子或全部残基。

终端区域可能不存在，并且可能不存在动态问题。

不完整的内部序列或任何具有缺失原子的氨基酸残基将导致pdb2gmx 失败。

这些缺失的原子/残基必须使用其他软件包进行建模。

还要注意，pdb2gmx 不是魔术。

它不能为任意分子产生拓扑，只是由力场定义的残基（在* .rtp 文件中通常是蛋白质，核酸和非常有限量的辅因子，如NAD （H ）和ATP ）。

Gromacs模拟基本流程

Gromacs 文件类型
1 .edr 包含能量信息的便携文件。 portable file that contains the energies 2 .gro 分子结构文件，当 pdb2gmx 程序被执行以生成分子拓扑文件的时候，它同时将结构文件 (.pdb) 转换为 gromacs 结构文件 (.gro)。gro 文件和 pdb 文件最大的不同在于 gro 还能保存速度信息。当然，如果你不需要速度的话，在程序中你可以一直使用 pdb 文件。genbox 程序用来生成一个盒子，盒子里面拥有溶剂分子环绕的肽。首先 editconf 程序应该用来定义一个围绕分子并具有合适尺寸的盒子。genbox 将一个溶质分子(肽)溶解到溶剂（在此处是水）中去。genbox 输出的是一个肽溶解于水的 gromacs 结构文件 (.gro)。genbox 程序同时更改分子拓扑文件 (.top，由 pdb2gmx 生成) 来添加溶剂至拓扑。此文件的从左到右各列信息分别是：剩余的数量，剩余的名字，原子名字，原子数量，X、Y、Z 坐标(单位 nm)，X、Y、Z 速度(单位 nm/ps) 3 .itp 包含拓扑文件，这些文件要被包含在系统拓扑文件内。 include topology (ascii),The itp file extension stands for include toplogy. These files are included in topology files ( with the top extension ) 4 .mdp 分子动力学参数文件包含关于分子动力学模拟的全部信息。例如，时间步长，步数，温度，压力等等。操作这样一个文件最容易的方法是修改一个简单的 .mdp 例子文件。这里可以在线获得例子文件。预处理模拟控制输出控制邻居搜索控制•温度耦合压力耦合速度生成 5 .ndx 有时你可能需要一个索引文件，文件中记录对一组原子执行的特殊动作（例如，温度连接，加速，冷冻）。通常默认的索引组已经足够了。 6 .pdb Protein Data Bank(.pdb) ， Files with the .pdb extension are molecular structure files in the protein databank file format. The protein databank file format describes the positions of atoms in a molecular structure. Coordinates are read from the ATOM and HETATM records, until the file ends or an ENDMDL record is encountered. GROMACS programs can read and write a simlation box in the CRYST1 entry. The pdb format can also be used as a trajectory format: several structures, seperated by ENDMDL, can be read from or written to one file.

蛋白质和表面活性剂混合体系的MD模拟_1_

蛋白质和表面活性剂混合体系的MD模拟冯剑工作室滁州学院 2011/1/28本实例在网络材料基础上编辑而成，主要用于向本科生介绍如何使用GROMACS来进行分子动力学模拟。

介绍软件版本使用GROMACS 4.0.7分子动力学软件包。

蛋白质为Lysozyme（1AKI）、表面活性剂为十二烷基硫酸钠（SDS）。

模拟使用的蛋白质分子结构数据可以从RCSB Protein Data Bank（/pdb/home/home.do）网站下载。

该网站的蛋白质和核酸结构数据是实验获得。

结构解决较好的分子可以用于模拟。

该网站没有的分子，模拟中又需要。

则首先从网络检索，获得已经被使用的结构数据。

确实查不到，尤其是那些有机小分子，可以通过有关分子结构绘制软件直接绘制，如：ChemSketch、ChemBioOffice等。

某些商业模拟软件带有分子绘制模块，如Accelrys的Material Studio和DS Visualizer。

手工绘制的分子可以使用有关分子力学或量子化学软件优化结构。

一、准备拓扑文件将1aki.pdb转换为gro文件。

其中的top文件为topol.top。

重原子约束文件为lysozyme.itp。

水模型选择spce。

本例力场选择gromos 43a1力场。

具体的命令如下：pdb2gmx -f 1aki.pdb -o 1aki.gro -i lysozyme.itp –p topol.top -water spce运行界面如下，输入0选择43a1力场（不同力场适用场合和条件有差别，应根据模拟对象具体选择）。

在部分pdb文件中给出晶体信息CRYST1 59.062 68.451 30.517 90.00 90.00 90.00 P 21 21 21 4其中x、y和z的大小为Å。

生成的gro文件中坐标单位为nm。

Gro文件最后一行为分子x、y和z方向的最大宽度。

查看生成的top文件，[atom]块中的注释qtot最后给出了净电荷为+8。

gromacs计算方法

gromacs计算方法一、简介GROMACS是一款广泛应用于生物分子模拟和分子动力学研究的软件包。

它基于Molecular Dynamics（分子动力学）原理，通过模拟分子的运动来研究其结构和性质。

GROMACS适用于各种生物分子，如蛋白质、核酸、脂质等，并支持多种常用的生物分子模拟软件库。

二、软件安装与配置1. 下载并安装GROMACS软件包，确保正确配置环境变量。

2. 准备模拟系统文件，包括原子坐标、力场参数、初始构型等。

3. 确保系统文件符合GROMACS支持的文件格式要求。

三、分子动力学模拟1. 导入系统文件到GROMACS中，设置初始参数和时间步长。

2. 运行分子动力学模拟，观察系统的运动变化。

3. 分析模拟结果，包括能量、构型、键长等。

四、常用命令与参数1. `mdrun`：主运行命令，用于启动模拟。

* `-s` 指定系统文件路径。

* `-dt` 设置时间步长。

* `-potential` 指定力场参数。

* `-append` 用于将模拟结果追加到已有文件。

2. `analyze`：分析模拟结果，包括能量、构型等。

* `energy`：计算系统能量。

* `rmsdist`：计算指定原子间的平均距离。

3. `modify`：修改系统参数，如温度和压力等。

* `unpair`：删除两个原子间的非键相互作用。

* `unit kg`：将系统单位设为质量单位。

4. `plot`：生成模拟结果的图表。

* `friction=0.05 dt`：生成时间步长与摩擦力的关系图。

五、输出文件与数据处理1. GROMACS生成的主要输出文件包括.log、.out和.tpr等。

* .log文件记录模拟过程中的日志信息。

* .out文件包含每个时间步长的详细信息，如能量、构型等。

* .tpr文件记录了整个模拟过程的运动轨迹。

2. 常用的数据处理工具包括Excel、Matplotlib和VMD等。

* Excel：用于数据整理和统计分析。

gromacs模拟基本流程

gromacs模拟基本流程下载温馨提示:该文档是我店铺精心编制而成，希望大家下载以后，能够帮助大家解决实际的问题。

文档下载后可定制随意修改，请根据实际需要进行相应的调整和使用，谢谢!并且，本店铺为大家提供各种各样类型的实用资料，如教育随笔、日记赏析、句子摘抄、古诗大全、经典美文、话题作文、工作总结、词语解析、文案摘录、其他资料等等，如想了解不同资料格式和写法，敬请关注!Download tips: This document is carefully compiled by theeditor. I hope that after you download them,they can help yousolve practical problems. The document can be customized andmodified after downloading,please adjust and use it according toactual needs, thank you!In addition, our shop provides you with various types ofpractical materials,such as educational essays, diaryappreciation,sentence excerpts,ancient poems,classic articles,topic composition,work summary,word parsing,copy excerpts,other materials and so on,want to know different data formats andwriting methods,please pay attention!GROMACS 是一款常用的分子动力学模拟软件，以下是使用 GROMACS 进行模拟的基本流程：1. 准备输入文件：拓扑文件：描述分子的化学结构和连接方式。

GROMACS使用教程

GROMACS教程一Gromacs基本模拟流程 (3)1 下载pdb文件 (3)2 用pdb2gmx 处理pdb 文件 (3)3 建立盒子 (3)5 设置能量最小化 (4)6 用grompp程序进行文件处理 (6)7 使用genion 和tpr文件添加离子 (6)8 用fws_ion.pdb来产生能量最小化的输入文件 (6)9 在后台运行能量最小化（在命令后加&） (7)二设置位置限制性动力学模拟 (7)三设置非限制性动力学模拟 (9)1 如何重启一个计算 (11)2 如何延长一个计算 (11)3 如何设置并行计算 (11)五模拟结果分析 (12)1 如何将特定帧的轨迹保存成*.pdb文件 (12)2 用ngmx观察轨迹文件（也可以用VMD观察轨迹文件) (12)3 比较常用的分析工具 (14)3.3 g_covar 计算斜方差 (16)3.4 g_energy 能量数据作图，如压力、体积、密度等 (16)3.5 g_gyrate 测量回旋半径 (17)3.6 g_rms 与g_rmsdist 计算结构的RMSD 值 (17)3.7 g_rmsf 计算原子位置的根均方波动（rmsf ） (18)3.8 do_dssp 计算模型的二级结构 (20)3.9 g_hbond 计算模拟过程中分子间的氢键的数目、距离或角度 (21)3.10 g_saltbr 分析模拟中残基间的盐桥 (21)GROMACS 是一个使用经典分子动力学理论研究蛋白质动力学的高端的高效的工具。

GROMACS是遵守GNU许可的免费软件，可以从以下站点下载：，并且可以在linux和Windows上使用。

在本教程中，将研究一个从漏斗形蜘蛛的毒液中分离的毒素。

我们将使用显性溶剂动力学的方法来进行研究。

首先比较真空中和溶解的模型。

我们将把毒素肽溶在水盒子里，紧接着用牛顿运动定律加以平衡。

我们还将比较偿离子在显性溶剂动力学中的影响。

gromacs分子动力学操作

GROMACS分子动力学操作1.简介G R OM AC S是一款广泛应用于分子动力学模拟的软件，被广泛用于研究生物、化学和材料科学领域的分子系统。

本文档将介绍GR OM AC S的一些基本操作，包括准备输入文件、运行模拟、分析模拟结果等。

2.准备输入文件2.1.文件格式G R OM ACS支持多种输入文件格式，常用的包括PD B（蛋白质数据银行）格式和G RO格式。

PD B格式适用于分子结构的初始构建，而G RO格式则适用于具体模拟。

2.2.拓扑文件在进行分子动力学模拟之前，需要准备一个拓扑文件。

拓扑文件描述了分子的力场参数，包括原子类型、键长、键角、二面角等信息。

可以使用G RO MA CS提供的工具如p db2g mx来生成拓扑文件。

3.运行模拟3.1.系统能量最小化在进行分子动力学模拟之前，一般需要对系统进行能量最小化。

能量最小化的目的是使体系趋于较稳定的构型，以便开始后续的模拟。

可以使用G RO MA CS中的mdr u n工具进行能量最小化。

3.2.动力学模拟动力学模拟是分子动力学模拟的核心步骤，它模拟了分子系统随时间演化的过程。

可以使用G RO M A CS中的mdr u n工具来进行动力学模拟。

在模拟过程中需要指定模拟的时间步长、温度、压强等参数。

4.分析模拟结果4.1.轨迹文件在动力学模拟过程中，G RO MA C S会生成一个包含分子轨迹信息的文件。

可以使用GR OM AC S提供的工具来对轨迹文件进行分析，比如计算分子的动力学性质、计算分子间相互作用等。

4.2.动力学性质分析可以使用GR OM AC S提供的工具来计算分子的一些动力学性质，比如体系的能量、压强、温度等。

这些性质可以帮助我们理解分子的行为和性质。

5.总结本文档简要介绍了GR O MA CS分子动力学操作的一些基本步骤，包括准备输入文件、运行模拟和分析模拟结果。

通过学习和应用这些操作，我们可以更好地理解和研究分子系统的行为和性质。

一个使用gromacs进行蛋白质模拟的入门教程

一个使用g r o m a c s进行蛋白质模拟的入门教程-CAL-FENGHAI.-(YICAI)-Company One1GROMACS TutorialLysozyme in WaterJustin LemkulDepartment of Biochemistry, Virginia TechThis example will guide a new user through the process of setting up a simulation system containing a protein (lysozyme) in a box of water, with ions. Each step will contain an explanation of input and output, using typical settings for general use. This tutorial assumes you are using a GROMACS version in the series.GROMACS TutorialStep One: Prepare the TopologyWe must download the protein structure file we will be working with. For this tutorial, we will utilize hen egg white lysozyme (PDB code 1AKI). Go to the RCSB website and download the PDB text for the crystal structure.Once you have downloaded the structure, you can visualize the structure using a viewing program such as VMD, Chimera, PyMOL, etc. Once you've had a look at the molecule, you are going to want to strip out the crystal waters. Use a plain text editor like vi, emacs (Linux/Mac), or Notepad (Windows). Do not use word processing software! Delete the lines corresponding to these molecules (residue "HOH" in the PDB file). Note that such a procedure is not universally appropriate ., the case of a bound active site water molecule). For our intentions here, we do not need crystal water.Always check your .pdb file for entries listed under the comment MISSING, as these entries indicate either atoms or whole residues that are not present in the crystal structure. Terminal regions may be absent, and may not present a problem for dynamics. Incomplete internal sequences or any amino acid residues that have missing atoms will cause pdb2gmx to fail. These missing atoms/residues must be modeled in using other software packages. Also note that pdb2gmx is not magic. It cannot generate topologies for arbitrary molecules, just the residues defined by the force field (in the *.rtp files - generally proteins, nucleic acids, and a very finite amount of cofactors, like NAD(H) and ATP).Now that the crystal waters are gone and we have verified that all the necessary atoms are present, the PDB file should contain only protein atoms, and is ready to be input into the first GROMACS tool, pdb2gmx. The purpose of pdb2gmx is to generate three files:1.The topology for the molecule.2. A position restraint file.3. A post-processed structure file.The topology by default) contains all the information necessary to define the molecule within a simulation. This information includes nonbonded parameters (atom types and charges) as well as bonded parameters (bonds, angles, anddihedrals). We will take a more detailed look at the topology once it has been generated.Execute pdb2gmx by issuing the following command:pdb2gmx -f-o-water spceThe structure will be processed by pdb2gmx, and you will be prompted to choose a force field:Select the Force Field:From '/usr/local/gromacs/share/gromacs/top':1: AMBER03 force field (Duan et al., J. Comp. Chem. 24, 1999-2012, 2003)2: AMBER94 force field (Cornell et al., JACS 117, 5179-5197, 1995)3: AMBER96 force field (Kollman et al., Acc. Chem. Res. 29, 461-469, 1996)4: AMBER99 force field (Wang et al., J. Comp. Chem. 21, 1049-1074, 2000)5: AMBER99SB force field (Hornak et al., Proteins 65, 712-725, 2006)6: AMBER99SB-ILDN force field (Lindorff-Larsen et al., Proteins 78, 1950-58, 2010) 7: AMBERGS force field (Garcia & Sanbonmatsu, PNAS 99, 2782-2787, 2002)8: CHARMM27 all-atom force field (with CMAP) - version9: GROMOS96 43a1 force field10: GROMOS96 43a2 force field (improved alkane dihedrals)11: GROMOS96 45a3 force field (Schuler JCC 2001 22 1205)12: GROMOS96 53a5 force field (JCC 2004 vol 25 pag 1656)13: GROMOS96 53a6 force field (JCC 2004 vol 25 pag 1656)14: OPLS-AA/L all-atom force field (2001 aminoacid dihedrals)15: [DEPRECATED] Encad all-atom force field, using full solvent charges16: [DEPRECATED] Encad all-atom force field, using scaled-down vacuum charges 17: [DEPRECATED] Gromacs force field (see manual)18: [DEPRECATED] Gromacs force field with hydrogens for NMRThe force field will contain the information that will be written to the topology. This is a very important choice! You should always read thoroughly about each force field and decide which is most applicable to your situation. For this tutorial, we will use the all-atom OPLS force field, so type 14 at the command prompt, followed by'Enter'.There are many other options that can be passed to pdb2gmx. Some are listed here: -ignh: Ignore H atoms in the PDB file; especially useful for NMR structures. Otherwise, if H atoms are present, they must be in the correct order and named exactly how GROMACS expects them to be.-ter: Interactively assign charge states for N- and C-termini.-inter: Interactively assign charge states for Glu, Asp, Lys, Arg, and His; assign disulfides to Cys.You have now generated three new files: , , and . is a GROMACS-formatted structure file that contains all the atoms defined within the force field ., H atoms have been added to the amino acids in the protein). The file is the system topology (more on this in a minute). The file contains information used to restrain the positions of heavy atoms (more on this later).Step Two: Examine the TopologyLet's look at what is in the output topology . Again, using a plain text editor, inspect its contents. After several comment lines (preceded by ;), you will find the following: #include ""This line calls the parameters within the OPLS-AA force field. It is at the beginning of the file, indicating that all subsequent parameters are derived from this force field. The next important line is[moleculetype ], below which you will find; Name nrexclProtein_A 3The name "Protein_A" defines the molecule name, based on the fact that the protein was labeled as chain A in the PDB file. There are 3 exclusions for bonded neighbors. More information on exclusions can be found in the GROMACS manual; a discussion of this information is beyond the scope of this tutorial.The next section defines the[ atoms ]in the protein. The information is presented as columns:[ atoms ]; nr type resnr residue atom cgnr charge mass typeB chargeB massB ; residue 1 LYS rtp LYSH q +1 opls_287 1 LYS N 1 ; qtot2 opls_290 1 LYS H1 1 ; qtot3 opls_290 1 LYS H2 1 ; qtot4 opls_290 1 LYS H3 1 ; qtot5 opls_293B 1 LYS CA 1 ; qtot6 opls_140 1 LYS HA 1 ; qtot 1The interpretation of this information is as follows:nr: Atom numbertype: Atom typeresnr: Amino acid residue numberresidue: The amino acid residue nameNote that this residue was "LYS" in the PDB file; the use of .rtp entry "LYSH"indicates that the residue is protonated (the predominant state at neutralpH).atom: Atom namecgnr: Charge group numberCharge groups define units of integer charge; they aid in speeding upcalculationscharge: Self-explanatoryThe "qtot" descriptor keeps a running count of the total charge on themoleculemass: Also self-explanatorytypeB, chargeB, massB: Used for free energy perturbation (not discussedhere)Subsequent sections include[ bonds ], [ pairs ], [ angles ],and[ dihedrals ]. Some of these sections are self-explanatory (bonds, angles, and dihedrals). The parameters and function types associated with these sections are elaborated on in Chapter 5 of the GROMACS manual. Special 1-4 interactions are included under "pairs" (section of the GROMACS manual).The remainder of the file involves defining a few other useful/necessary topologies, starting with position restraints. The "" file was generated by pdb2gmx; it defines a force constant used to keep atoms in place during equilibration (more on this later). ; Include Position restraint file#ifdef POSRES#include ""#endifThis ends the "Protein_A" moleculetype definition. The remainder of the topology file is dedicated to defining other molecules and providing system-level descriptions. The next moleculetype (by default) is the solvent, in this case SPC/E water. Other typical choices for water include SPC, TIP3P, and TIP4P. We chose this by passing "-water spce" to pdb2gmx. For an excellent summary of the many different water models, click?here, but be aware that not all of these models are present within GROMACS.; Include water topology#include ""#ifdef POSRES_WATER; Position restraint for each water oxygen[ position_restraints ]; i funct fcx fcy fcz1 1 1000 1000 1000#endifAs you can see, water can also be position-restrained, using a force constant (kpr) of 1000 kJ mol-1nm-2.Ion parameters are included next:; Include generic topology for ions#include ""Finally come system-level definitions. The[ system ]directive gives the name of the system that will be written to output files during the simulation.The[ molecules ]directive lists all of the molecules in the system.[ system ]; NameLYSOZYME[ molecules ]; Compound #molsProtein_A 1A few key notes about the[ molecules ]directive:1.The order of the listed molecules mustexactlymatch the order of themolecules in the coordinate (in this case, .gro) file.2.The names listed must match the[ moleculetype ]name for each species, notresidue names or anything else.3.If you fail to satisfy these concrete requirements at any time, you will get fatal errors from grompp (discussed later) about mismatched names, molecules not being found, or a number of others.Now that we have examined the contents of a topology file, we can continue building our system.Step Three: Defining the Unit Cell & Adding SolventNow that you are familiar with the contents of the GROMACS topology, it is time to continue building our system. In this example, we are going to be simulating a simple aqueous system. It is possible to simulate proteins and other molecules in different solvents, provided that good parameters are available for all species involved.There are two steps to defining the box and filling it with solvent:1.Define the box dimensions using editconf.2.Fill the box with water using genbox.You are now presented with a choice as to how to treat the unit cell. For the purpose of this tutorial, we will use a simple cubic box as the unit cell. As you become more comfortable with periodic boundary conditions and box types, I highly recommend the rhombic dodecahedron, as its volume is ~71% of the cubic box of the same periodic distance, thus saving on the number of water molecules that need to be added to solvate the protein.Let's define the box using editconf:editconf-f-o -c -d -bt cubicThe above command centers the protein in the box (-c), and places it at least nm from the box edge (-d . The box type is defined as a cube (-bt cubic). The distance to the edge of the box is an important parameter. Since we will be using periodic boundary conditions, we must satisfy the minimum image convention. That is, aprotein should never see its periodic image, otherwise the forces calculated will be spurious. Specifying a solute-box distance of nm will mean that there are at least nm between any two periodic images of a protein. This distance will be sufficient for just about any cutoff scheme commonly used in simulations.Now that we have defined a box, we can fill it with solvent (water). Solvation is accomplished using genbox:genbox -cp-cs-o -pThe configuration of the protein (-cp) is contained in the output of the previous editconf step, and the configuration of the solvent (-cs) is part of the standard GROMACS installation. We are using , which is a generic equilibrated 3-point solvent model. You can use as the solvent configuration for SPC, SPC/E, or TIP3P water, since they are all three-point water models. The output is called , and we tell genbox the name of the topology file so it can be modified. Note the changes to the[ molecules ]directive of :[ molecules ]; Compound #molsProtein_A 1SOL 10832What genbox has done is keep track of how many water molecules it has added, which it then writes to your topology to reflect the changes that have been made. Note that if you use any other (non-water) solvent, genbox will not make these changes to your topology! Its compatibility with updating water molecules is hard-coded.Step Four: Adding IonsWe now have a solvated system that contains a charged protein. The output ofpdb2gmx told us that the protein has a net charge of +8e(based on its amino acid composition). If you missed this information in the pdb2gmx output, look at the last line of your[ atoms ]directive in ; it should read (in part) "qtot 8." Since life does not exist at a net charge, we must add ions to our system.The tool for adding ions within GROMACS is called genion. What genion does is read through the topology and replace water molecules with the ions that the user specifies. The input is called a run input file, which has an extension of .tpr; this file is produced by the GROMACS tool grompp (GROMACSpre-processor), which will also be used later when we run our first simulation. What grompp does is process thecoordinate file and topology (which describes the molecules) to generate an atomic-level input (.tpr). The .tpr file contains all the parameters for all of the atoms in the system.To produce a .tpr file with grompp, we will need an additional input file, with the extension .mdp (moleculardynamics parameter file); grompp will assemble the parameters specified in the .mdp file with the coordinates and topology information to generate a .tpr file.An .mdp file is normally used to run energy minimization or an MD simulation, but in this case is simply used to generate an atomic description of the system. An example .mdp file (the one we will use) can be downloadedhere.In reality, the .mdp file used at this step can contain any legitimate combination of parameters. I typically use an energy-minimization script, because they are very basic and do not involve any complicated parameter combinations.Assemble your .tpr file with the following:grompp -f -c-p -oNow we have an atomic-level description of our system in the binary file . We will pass this file to genion:genion -s-o-p -pname NA-nname CL-nn 8When prompted, choose group 13 "SOL" for embedding ions. You do not want to replace parts of your protein with ions.In the genion command, we provide the structure/state file (-s) as input, generate a .gro file as output (-o), process the topology (-p) to reflect the removal of water molecules and addition of ions, define positive and negative ion names (-pname and -nname, respectively), and tell genion to add only the ions necessary to neutralize the net charge on the protein by adding the correct number of negative ions (-nn 8). You could also use genion to add a specified concentration of ions in addition to simply neutralizing the system by specifying the -neutral and -conc options in conjunction. Refer to the genion man page for information on how to use these options.The names of the ions specified with -pname and -nname were force field-specific in previous versions of GROMACS, but have been standardized as of version . The specified atom names are always the elemental symbol in all capital letters, along with the[ moleculetype ]. Residue names may or may not append the sign of the charge (+/-). Refer to for proper nomenclature if you encounter difficulties.Your[ molecules ]directive should now look like:[ molecules ]; Compound #molsProtein_A 1SOL 10824CL 8Step Five: Energy MinimizationThe solvated, electroneutral system is now assembled. Before we can begin dynamics, we must ensure that the system has no steric clashes or inappropriate geometry. The structure is relaxed through a process called energy minimization (EM).The process for EM is much like the addition of ions. We are once again going to use grompp to assemble the structure, topology, and simulation parameters into a binary input file (.tpr), but this time, instead of passing the .tpr to genion, we will run the energy minimization through the GROMACS MD engine, mdrun.Assemble the binary input using grompp usingthisinput parameter file:grompp -f -c -p-oMake sure you have been updating your file when running genbox and genion, or else you will get lots of nasty error messages ("number of coordinates in coordinate file does not match topology," etc).We are now ready to invoke mdrun to carry out the EM:mdrun -v-deffnm emThe -v flag is for the impatient: it makes mdrun verbose, such that it prints its progress to the screen at every step. The -deffnm flag will define the file names of the input and output. So, if you did not name your grompp output "," you will have to explicitly specify its name with the mdrun -s flag. In our case, we will get the following files:: ASCII-text log file of the EM process: Binary energy file: Binary full-precision trajectory: Energy-minimized structureThere are two very important factors to evaluate to determine if EM was successful. The first is the potential energy (printed at the end of the EM process, even without -v). Epot?should be negative, and (for a simple protein in water) on the order of 105-106, depending on the system size and number of water molecules. The second important feature is the maximum force, Fmax, the target for which was set in - "emtol = " - indicating a target Fmax?of no greater than 1000 kJ mol-1nm-1. It is possible to arrive at a reasonable Epotwith Fmax> emtol. If this happens, your system may not be stable enough for simulation. Evaluate why it may be happening, and perhaps change your minimization parameters (integrator, emstep, etc).Let's do a bit of analysis. The file contains all of the energy terms that GROMACS collects during EM. You can analyze any .edr file using the GROMACS tools g_energy:g_energy -f-oAt the prompt, type "10 0" to select Potential (10); zero (0) terminates input. You will be shown the average of Epot, and a file called "" will be written. To plot this data, you will need the?Xmgrace plotting tool. The resulting plot should look something like this, demonstrating the nice, steady convergence of Epot:Now that our system is at an energy minimum, we can begin real dynamics.Step Six: EquilibrationEM ensured that we have a reasonable starting structure, in terms of geometry and solvent orientation. To begin real dynamics, we must equilibrate the solvent and ions around the protein. If we were to attempt unrestrained dynamics at this point, the system may collapse. The reason is that the solvent is mostly optimized within itself, and not necessarily with the solute. It needs to be brought to the temperature we wish to simulate and establish the proper orientation about the solute (the protein). After we arrive at the correct temperature (based on kinetic energies), we will apply pressure to the system until it reaches the proper density.Remember that file that pdb2gmx generated a long time ago We're going to use it now. The purpose of is to apply apositionrestraining force on the heavy atoms of the protein (anything that is not a hydrogen). Movement is permitted, but only after overcoming a substantial energy penalty. The utility of position restraints is that they allow us to equilibrate our solvent around our protein, without the added variable of structural changes in the protein.Equilibration is often conducted in two phases. The first phase is conducted under anNVTensemble (constant Number of particles, Volume, and Temperature). This ensemble is also referred to as "isothermal-isochoric" or "canonical." The timeframe for such a procedure is dependent upon the contents of the system, but inNVT, the temperature of the system shouldreach a plateau at the desired value. If the temperature has not yet stabilized, additional time will be required. Typically, 50-100 ps should suffice, and we will conduct a 100-psNVTequilibration for this exercise. Depending on your machine, this may take a while (just over an hour on a dual-core MacBook). Get the .mdp file?here.We will call grompp and mdrun just as we did at the EM step:grompp -f-c -p -omdrun -deffnm nvtA full explanation of the parameters used can be found in the GROMACS manual, in addition to the comments provided. Take note of a few parameters in the .mdp file:gen_vel = yes: Initiates velocity generation. Using different random seeds(gen_seed) gives different initial velocities, and thus multiple (different)simulations can be conducted from the same starting structure.tcoupl = V-rescale: The velocity rescaling thermostat is an improvement upon the Berendsen weak coupling method, which did not reproduce a correctkinetic ensemble.pcoupl = no: Pressure coupling is not applied.Let's analyze the temperature progression, again using g_energy:g_energy -fType "15 0" at the prompt to select the temperature of the system and exit. The resulting plot should look something like the following:From the plot, it is clear that the temperature of the system quickly reaches the target value (300 K), and remains stable over the remainder of the equilibration. For this system, a shorter equilibration period (on the order of 50 ps) may have been adequate.Step Seven: Equilibration, Part 2The previous step,NVTequilibration, stabilized the temperature of the system. Prior to data collection, we must also stabilize the pressure (and thus also the density) of the system. Equilibration of pressure is conducted under anNPT ensemble, wherein the Number of particles, Pressure, and Temperature are all constant. The ensemble is also called the "isothermal-isobaric" ensemble, and most closely resembles experimental conditions.The .mdp file used for a 100-psNPTequilibration can be found here. It is not drastically different from the parameter file used forNVTequilibration. Note the addition of the pressure coupling section, using the Parrinello-Rahman barostat.A few other changes:continuation = yes: We are continuing the simulation fromthe NVT equilibration phasegen_vel = no: Velocities are read from the trajectory (see below)We will call grompp and mdrun just as we did forNVTequilibration. Note that we are now including the -t flag to include the checkpoint file from theNVTequilibration; this file contains all the necessary state variables to continue our simulation. To conserve the velocities produced duringNVT, we must include this file. The coordinate file (-c) is the final output of the?NVTsimulation.grompp -f -c-t-p -omdrun -deffnm nptLet's analyze the pressure progression, again using g_energy:g_energy -f -oType "16 0" at the prompt to select the pressure of the system and exit. The resulting plot should look something like the following:The pressure value fluctuates widely over the course of the 100-ps equilibration phase, but this behavior is not unexpected. The running average of these data are plotted as the red line in the plot. Over the course of the equilibration, the average value of the pressure is bar.Let's take a look at density as well, this time using g_energy and entering "22 0" at the prompt.g_energy -f -oAs with the pressure, the running average of the density is also plotted in red. The average value over the course of 100 ps is kg m-3, close to the experimental value of 1000 kg m-3. The parameters for the SPC/E water model closely replicate experimental values for water. The density values are very stable over time, indicating that the system is well-equilibrated now with respect to pressure and density.Step Eight: Production MDUpon completion of the two equilibration phases, the system is now well-equilibrated at the desired temperature and pressure. We are now ready to release the position restraints and run production MD for data collection. The process is just like we have seen before, as we will make use of the checkpoint file (which in this case now contains preserve pressure coupling information) to grompp. We will run a 1-ns MD simulation, the script for which can be found?here.grompp -f -c -t -p -omdrun -deffnm md_0_1The mdrun step is definitely best run in parallel on a cluster, since it will take several hours to complete. The proper command for executing such a job would be:mpirun -np X mdrun_mpi -deffnm md_0_1where X indicates the number of processors to be used for the simulation. Near the end of the grompp output, you should have seen a line like:Estimate for the relative computational load of the PME mesh part:The estimate for PME load will dictate how many processors should be dedicated to the PME calculation, and how many for the PP calculations. Refer to the GROMACS 4publicationand the manual for details. For a cubic box, the optimal setup will have a PME load of (3:1 PP:PME - we're in luck!); for a dodecahedral box, the optimal PME load is (2:1 PP:PME). When executing mdrun, the program should automatically determine the best number of processors to assign for the PP and PME calculations. Thus, make sure you indicate an appropriate number of nodes for your calculation (the value of -np X), so that you can get the best performance. For this system, I achieved roughly 14 ns/day performance on 24 CPU's (18 PP, 6 PME).Step Nine: AnalysisNow that we have simulated our protein, we should run some analysis on the system. What types of data are important This is an important question to ask before running the simulation, so you should have some ideas about the types of data you will want to collect in your own systems. For this tutorial, a few basic tools will be introduced.The first is trjconv, which is used as a post-processing tool to strip out coordinates, correct for periodicity, or manually alter the trajectory (time units, frame frequency, etc). For this exercise, we will use trjconv to account for any periodicity in the system. The protein will diffuse through the unit cell, and may appear to "jump" across to the other side of the box. To account for such actions, issue the following:trjconv -s-f -o -pbc mol-ur compactSelect 0 ("System") for output. We will conduct all our analyses on this "corrected" trajectory. Let's look at structural stability first. GROMACS has a built-in utility for RMSD calculations called g_rms. To use g_rms, issue this command:g_rms -s-f-o -tu nsChoose 4 ("Backbone") for both the least squares fit and the group for RMSD calculation. The -tu flag will output the results in terms of ns, even though the trajectory was written in ps. This is done for clarity of the output (especially if youhave a long simulation - 1e+06 ps does not look as nice as 100 ns). The output plot will show the RMSD relative to the structure present in the minimized, equilibrated system:If we wish to calculate RMSD relative to the crystal structure, we could issue the following:g_rms -s -f -o -tu nsThe result will be something like:Both plots show the RMSD levels off to ~ nm (1 ), indicating that the structure is very stable. Subtle differences between the plots indicate that the structure at t = 0 ns is slightly different from this crystal structure. This is to be expected, since it has been energy-minimized, and because the position restraints are not 100% perfect, as discussed previously.The radius of gyration of a protein is a measure of its compactness. If a protein is stably folded, it will likely maintain a relatively steady value of?Rg. If a protein unfolds, its?Rg will change over time. Let's analyze the radius of gyration for lysozyme in our simulation:g_gyrate -s -f-oWe can see from the reasonably invariant?Rg values that the protein remains very stable, in its compact (folded) form over the course of 1 ns at 300 K. This result is not unexpected, but illustrates an advanced capacity of GROMACS analysis that comes built-in.SummaryYou have now conducted a molecular dynamics simulation with GROMACS, and analyzed some of the results. This tutorial should not be viewed as comprehensive. There are many more types of simulations that one can conduct with GROMACS (free energy calculations, non-equilibrium MD, and normal modes analysis, just to。

gromacs蛋白质力场选择

gromacs蛋白质力场选择作为一名职业写手，我经常需要在Gromacs中使用蛋白质力场进行模拟。

蛋白质力场的选择对于模拟结果的准确性和可靠性至关重要。

本文将介绍如何在Gromacs中选择合适的蛋白质力场，以及力场选择对模拟结果的影响。

首先，我们需要了解力场参数。

力场参数描述了原子间相互作用，包括键长、键角、非共价相互作用等。

这些参数对于模拟结果的真实性具有重要意义。

在Gromacs中，有许多预定义的力场，如Amber、Charmm和GROMOS等。

了解这些力场的特点和适用范围有助于我们选择合适的力场。

接下来，我们介绍如何在Gromacs中下载和安装力场。

首先，访问相应力场的官方网站，下载力场文件。

然后，根据Gromacs的官方文档，配置力场并在Gromacs中启用力场。

在配置力场时，需要注意力场的版本和Gromacs的版本要保持一致，以避免在运行过程中出现问题。

在Gromacs中选择合适的蛋白质力场是关键步骤。

我们需要根据模拟的目的和蛋白质的特性来选择合适的力场。

例如，如果我们要研究蛋白质的结构稳定性，可以选择具有较高精度的力场，如Amber99sb；如果我们要研究蛋白质的动力学特性，可以选择具有较高计算效率的力场，如Charmm22。

此外，还可以根据研究需求对力场进行参数调整，以获得更好的模拟结果。

在Gromacs中应用蛋白质力场时，需要注意以下几点。

首先，蛋白质结构优化是力场应用的一个重要方面。

通过使用力场，可以对蛋白质结构进行修正，使其更接近天然状态。

其次，蛋白质动力学模拟是力场的另一个重要应用。

通过模拟蛋白质的动力学过程，可以深入了解蛋白质的功能和调控机制。

此外，蛋白质复合物研究也是力场应用的一个热点领域。

通过在Gromacs中使用力场，可以研究蛋白质复合物的结构和功能。

力场选择对模拟结果具有重要影响。

合适的力场可以使模拟结果更接近真实情况，从而提高研究的可靠性。

然而，不合适的力场可能导致模拟结果的偏差，甚至完全失真。

1、下载文档前请自行甄别文档内容的完整性，平台不提供额外的编辑、内容补充、找答案等附加服务。
2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
3、如文档侵犯您的权益，请联系客服反馈,我们会尽快为您处理(人工客服工作时间：9:00-18:30)。

Lysozyme in WaterJustin LemkulDepartment of Biochemistry, Virginia TechThis example will guide a new user through the process of setting up a simulation system containing a protein (lysozyme) in a box of water, with ions. Each step will contain an explanation of input and output, using typical settings for general use.This tutorial assumes you are using a GROMACS version in the 4.5.x series.Step One: Prepare the TopologyWe must download the protein structure file we will be working with. For this tutorial, we will utilize hen egg white lysozyme (PDB code 1AKI). Go to the RCSB website and download the PDB text for the crystal structure.Once you have downloaded the structure, you can visualize the structure using a viewing program such as VMD, Chimera, PyMOL, etc. Once you've had a look at the molecule, you are going to want to strip out the crystal waters. Use a plain text editor like vi, em acs (Linux/Mac), or Notepad (Windows). Do not use word processing software! Delete the lines corresponding to these molecules (residue "HOH" in the PDB file). Note that such a procedure is not universally appropriate (i.e., the case of a bound active site water molecule). For our intentions here, we do not need crystal water.Always check your .pdb file for entries listed under the comment MISSING, as these entries indicate either atoms or whole residues that are not present in the crystal structure. Terminal regions may be absent, and may not present a problem for dynamics. Incomplete internal sequences or any amino acid residues that have missing atoms will cause pdb2gmx to fail. These missing atoms/residues must be modeled in using other software packages. Also note that pdb2gmx is not magic. It cannot generate topologies for arbitrary molecules, just the residues defined by the force field (in the *.rtp files - generally proteins, nucleic acids, and a very finite amount of cofactors, like NAD(H) and ATP).Now that the crystal waters are gone and we have verified that all the necessary atoms are present, the PDB file should contain only protein atoms, and is ready to be input into the first GROMACS tool, pdb2gmx. The purpose of pdb2gmx is to generate three files:1.The topology for the molecule.2. A position restraint file.3. A post-processed structure file.The topology (topol.top by default) contains all the information necessary to define the molecule within a simulation. This information includes nonbonded parameters (atom types and charges) as well as bonded parameters (bonds, angles, and dihedrals). We will take a more detailed look at the topology once it has been generated.Execute pdb2gmx by issuing the following command:pdb2gmx -f 1AKI.pdb-o 1AKI_processed.gro-water spceThe structure will be processed by pdb2gmx, and you will be prompted to choose a forcefield:Select the Force Field:From '/usr/local/gromacs/share/gromacs/top':1: AMBER03 force field (Duan et al., J. Comp. Chem. 24, 1999-2012, 2003)2: AMBER94 force field (Cornell et al., JACS 117, 5179-5197, 1995)3: AMBER96 force field (Kollman et al., Acc. Chem. Res. 29, 461-469, 1996)4: AMBER99 force field (Wang et al., J. Comp. Chem. 21, 1049-1074, 2000)5: AMBER99SB force field (Hornak et al., Proteins 65, 712-725, 2006)6: AMBER99SB-ILDN force field (Lindorff-Larsen et al., Proteins 78, 1950-58, 2010)7: AMBERGS force field (Garcia & Sanbonmatsu, PNAS 99, 2782-2787, 2002)8: CHARMM27 all-atom force field (with CMAP) - version 2.09: GROMOS96 43a1 force field10: GROMOS96 43a2 force field (improved alkane dihedrals)11: GROMOS96 45a3 force field (Schuler JCC 2001 22 1205)12: GROMOS96 53a5 force field (JCC 2004 vol 25 pag 1656)13: GROMOS96 53a6 force field (JCC 2004 vol 25 pag 1656)14: OPLS-AA/L all-atom force field (2001 aminoacid dihedrals)15: [DEPRECATED] Encad all-atom force field, using full solvent charges16: [DEPRECATED] Encad all-atom force field, using scaled-down vacuum charges17: [DEPRECATED] Gromacs force field (see manual)18: [DEPRECATED] Gromacs force field with hydrogens for NMRThe force field will contain the information that will be written to the topology. This is a very important choice! You should always read thoroughly about each force field and decide which is most applicable to your situation. For this tutorial, we will use the all-atom OPLS force field, so type 14 at the command prompt, followed by 'Enter'.There are many other options that can be passed to pdb2gmx. Some are listed here:-ignh: Ignore H atoms in the PDB file; especially useful for NMR structures. Otherwise, if H atoms are present, they must be in the correct order and named exactly how GR OMACS expects them to be.-ter: Interactively assign charge states for N- and C-termini.-inter: Interactively assign charge states for Glu, Asp, Lys, Arg, and His; assign disulfides to Cys.You have now generated three new files: 1AKI_processed.gro, topol.top, and posre.itp. 1AKI_processed.gro is a GROMACS-formatted structure file that contains all the atoms defined within the force field (i.e., H atoms have been added to the amino acids in the protein). The topol.top file is the system topology (more on this in a minute). The posre.itp file contains information used to restrain the positions of heavy atoms (more on this later).Step Two: Examine the TopologyLet's look at what is in the output topology (topol.top). Again, using a plain text editor, inspect its contents. After several comment lines (preceded by ;), you will find the following:#include "oplsaa.ff/forcefield.itp"This line calls the parameters within the OPLS-AA force field. It is at the beginning of the file, indicating that all subsequent parameters are derived from this force field. The next important line is [ moleculetype ], below which you will find; Name nrexclProtein_A 3The name "Protein_A" defines the molecule name, based on the fact that the protein was labeled as chain A in the PDB file. There are 3 exclusions for bonded neighbors. More information on exclusions can be found in the GROMACS manual; a discussion of this information is beyond the scope of this tutorial.The next section defines the [ atoms ] in the protein. The information is presented as columns:[ atoms ]; nr type resnr residue atom cgnr charge mass typeB chargeB massB; residue 1 LYS rtp LYSH q +2.01 opls_287 1 LYS N 1 -0.3 14.0067 ; qtot -0.32 opls_290 1 LYS H1 1 0.33 1.008 ; qtot 0.033 opls_290 1 LYS H2 1 0.33 1.008 ; qtot 0.364 opls_290 1 LYS H3 1 0.33 1.008 ; qtot 0.695 opls_293B 1 LYS CA 1 0.25 12.011 ; qtot 0.946 opls_140 1 LYS HA 1 0.06 1.008 ; qtot 1The interpretation of this information is as follows:∙nr: Atom number∙type: Atom type∙resnr: Amino acid residue number∙residue: The amino acid residue nameNote that this residue was "LYS" in the PDB file; the use of .rtp entry "LYSH" indicates that the residue is protonated (the predominant state at neutral pH).∙atom: Atom name∙cgnr: Charge group numberCharge groups define units of integer charge; they aid in speeding up calculations∙charge: Self-explanatoryThe "qtot" descriptor keeps a running count of the total charge on the molecule∙mass: Also self-explanatory∙typeB, chargeB, massB: Used for free energy perturbation (not discussed here)Subsequent sections include [ bonds ], [ pairs ], [ angles ], and [ dihedrals ]. Some of these sections are self-explanatory (bonds, angles, and dihedrals). The parameters and function types associated with these sections are elaborated on in Chapter 5 of the GROMACS manual. Special 1-4 interactions are included under "pairs" (section 5.3.4 of the GROMACS manual).The remainder of the file involves defining a few other useful/necessary topologies, starting with position restraints. The "posre.itp" file was generated by pdb2gmx; it defines a force constant used to keep atoms in place during equilibration (more on this later).; Include Position restraint file#ifdef POSRES#include "posre.itp"#endifThis ends the "Protein_A" moleculetype definition. The remainder of the topology file is dedicated to defining other molecules and providing system-level descriptions. The next moleculetype (by default) is the solvent, in this case SPC/E water. Other typical choices for water include SPC, TIP3P, and TIP4P. We chose this by passing "-water spce" to pdb2gmx. For an excellent summary of the many different water models, click here, but be aware that not all of these models are present within GROMACS.; Include water topology#include "oplsaa.ff/spce.itp"#ifdef POSRES_WATER; Position restraint for each water oxygen[ position_restraints ]; i funct fcx fcy fcz1 1 1000 1000 1000#endifAs you can see, water can also be position-restrained, using a force constant (kpr) of 1000 kJ mol-1 nm-2.Ion parameters are included next:; Include generic topology for ions#include "oplsaa.ff/ions.itp"Finally come system-level definitions. The [ system ] directive gives the name of the system that will be written to output files during the simulation. The [ molecules ] directive lists all of the molecules in the system.[ system ]; NameLYSOZYME[ molecules ]; Compound #molsProtein_A 1A few key notes about the [ molecules ] directive:1.The order of the listed molecules must exactly match the order of the molecules inthe coordinate (in this case, .gro) file.2.The names listed must match the [ moleculetype ] name for each species, not residuenames or anything else.If you fail to satisfy these concrete requirements at any time, you will get fatal errors from grompp (discussed later) about mismatched names, molecules not being found, or a number of others.Now that we have examined the contents of a topology file, we can continue building our system.Step Three: Defining the Unit Cell & Adding SolventNow that you are familiar with the contents of the GROMACS topology, it is time to continue building our system. In this example, we are going to be simulating a simple aqueous system. It is possible to simulate proteins and other molecules in different solvents, provided that good parameters are available for all species involved.There are two steps to defining the box and filling it with solvent:1.Define the box dimensions using editconf.2.Fill the box with water using genbox.You are now presented with a choice as to how to treat the unit cell. For the purpose of this tutorial, we will use a simple cubic box as the unit cell. As you become more comfortable with periodic boundary conditions and box types, I highly recommend the rhombicdodecahedron, as its volume is ~71% of the cubic box of the same periodic distance, thus saving on the number of water molecules that need to be added to solvate the protein.Let's define the box using editconf:editconf-f 1AKI_processed.gro-o 1AKI_newbox.gro-c -d 1.0 -bt cubicThe above command centers the protein in the box (-c), and places it at least 1.0 nm from the box edge (-d 1.0). The box type is defined as a cube (-bt cubic). The distance to the edge of the box is an important parameter. Since we will be using periodic boundary conditions, we must satisfy the minimum image convention. That is, a protein should never see its periodic image, otherwise the forces calculated will be spurious. Specifying a solute-box distance of 1.0 nm will mean that there are at least 2.0 nm between any two periodic images of a protein. This distance will be sufficient for just about any cutoff scheme commonly used in simulations.Now that we have defined a box, we can fill it with solvent (water). Solvation is accomplished using genbox:genbox -cp 1AKI_newbox.gro-cs spc216.gro-o 1AKI_solv.gro-p topol.topThe configuration of the protein (-cp) is contained in the output of the previous editconf step, and the configuration of the solvent (-cs) is part of the standard GROMACS installation. We are using spc216.gro, which is a generic equilibrated 3-point solvent model. You can use spc216.gro as the solvent configuration for SPC, SPC/E, or TIP3P water, since they are all three-point water models. The output is called 1AKI_solv.gro, and we tell genbox the name of the topology file (topol.top) so it can be modified. Note the changes to the [ molecules ] directive of topol.top:[ molecules ]; Compound #molsProtein_A 1SOL 10832What genbox has done is keep track of how many water molecules it has added, which it then writes to your topology to reflect the changes that have been made. Note that if you use any other (non-water) solvent, genbox will not make these changes to your topology! Its compatibility with updating water molecules is hard-coded.Step Four: Adding IonsWe now have a solvated system that contains a charged protein. The output of pdb2gmx told us that the protein has a net charge of +8e (based on its amino acid composition). If you missed this information in the pdb2gmx output, look at the last line of your [ atoms ] directive in topol.top; it should read (in part) "qtot 8." Since life does not exist at a net charge, we must add ions to our system.The tool for adding ions within GROMACS is called genion. What genion does is read through the topology and replace water molecules with the ions that the user specifies. The input is called a run input file, which has an extension of .tpr; this file is produced by the GROMACS tool grompp(GROMACS pre-processor), which will also be used later when we run our first simulation. What grompp does is process the coordinate file and topology (which describes the molecules) to generate an atomic-level input (.tpr). The .tpr file contains all the parameters for all of the atoms in the system.To produce a .tpr file with grompp, we will need an additional input file, with the extension .mdp (molecular dynamics parameter file); grompp will assemble the parameters specified in the .mdp file with the coordinates and topology information to generate a .tpr file.An .mdp file is normally used to run energy minimization or an MD simulation, but in this case is simply used to generate an atomic description of the system. An example .mdp file (the one we will use) can be downloaded here.In reality, the.mdp file used at this step can contain any legitimate combination of parameters. I typically use an energy-minimization script, because they are very basic and do not involve any complicated parameter combinations.Assemble your .tpr file with the following:grompp -f ions.mdp-c 1AKI_solv.gro-p topol.top -o ions.tprNow we have an atomic-level description of our system in the binary file ions.tpr. We will pass this file to genion:genion -s ions.tpr-o 1AKI_solv_ions.gro-p topol.top-pname NA-nname CL-nn 8When prompted, choose group 13 "SOL" for embedding ions. You do not want to replace parts of your protein with ions.In the genion command, we provide the structure/state file (-s) as input, generate a .gro file as output (-o), process the topology (-p) to reflect the removal of water molecules and addition of ions, define positive and negative ion names (-pname and -nname, respectively), and tell genion to add only the ions necessary to neutralize the net charge on the protein by adding the correct number of negative ions (-nn 8). You could also use genion to add a specified concentration of ions in addition to simply neutralizing the system by specifyingthe -neutral and -conc options in conjunction. Refer to the genion man page for information on how to use these options.The names of the ions specified with -pname and -nname were force field-specific in previous versions of GROMACS, but have been standardized as of version 4.5. The specified atom names are always the elemental symbol in all capital letters, along with the [ moleculetype ]. Residue names may or may not append the sign of the charge (+/-). Refer to ions.itp for proper nomenclature if you encounter difficulties.Your [ molecules ] directive should now look like:[ molecules ]; Compound #molsProtein_A 1SOL 10824CL 8Step Five: Energy MinimizationThe solvated, electroneutral system is now assembled. Before we can begin dynamics, we must ensure that the system has no steric clashes or inappropriate geometry. The structure is relaxed through a process called energy minimization (EM).The process for EM is much like the addition of ions. We are once again going to use grompp to assemble the structure, topology, and simulation parameters into a binary input file (.tpr), but this time, instead of passing the .tpr to genion, we will run the energy minimization through the GROMACS MD engine, mdrun.Assemble the binary input using grompp using this input parameter file:grompp -f minim.mdp-c 1AKI_solv_ions.gro -p topol.top-o em.tprMake sure you have been updating your topol.top file when running genbox and genion, or else you will get lots of nasty error messages ("number of coordinates in coordinate file does not match topology," etc).We are now ready to invoke mdrun to carry out the EM:mdrun -v-deffnm emThe -v flag is for the impatient: it makes mdrun verbose, such that it prints its progress to the screen at every step. The -deffnm flag will define the file names of the input and output. So, if you did not name your grompp output "em.tpr," you will have to explicitly specify its name with the mdrun -s flag. In our case, we will get the following files:∙em.log: ASCII-text log file of the EM process∙em.edr: Binary energy file∙em.trr: Binary full-precision trajectory∙em.gro: Energy-minimized structureThere are two very important factors to evaluate to determine if EM was successful. The first is the potential energy (printed at the end of the EM process, even without -v). Epot should be negative, and (for a simple protein in water) on the order of 105-106, depending on the system size and number of water molecules. The second important feature is the maximum force, Fmax, the target for which was set in minim.mdp- "emtol = 1000.0" - indicating a target Fmax of no greater than 1000 kJ mol-1 nm-1. It is possible to arrive at a reasonable Epot with Fmax > emtol. If this happens, your system may not be stable enough for simulation. Evaluate why it may be happening, and perhaps change your minimization parameters (integrator, emstep, etc).Let's do a bit of analysis. The em.edr file contains all of the energy terms that GROMACS collects during EM. You can analyze any .edr file using the GROMACS tools g_energy:g_energy -f em.edr-o potential.xvgAt the prompt, type "10 0" to select Potential (10); zero (0) terminates input. You will be shown the average of Epot, and a file called "potential.xvg" will be written. To plot this data, you will need the Xmgrace plotting tool. The resulting plot should look something like this, demonstrating the nice, steady convergence of Epot:Now that our system is at an energy minimum, we can begin real dynamics.Step Six: EquilibrationEM ensured that we have a reasonable starting structure, in terms of geometry and solvent orientation. To begin real dynamics, we must equilibrate the solvent and ions around the protein. If we were to attempt unrestrained dynamics at this point, the system may collapse. The reason is that the solvent is mostly optimized within itself, and not necessarily with the solute. It needs to be brought to the temperature we wish to simulate and establish the proper orientation about the solute (the protein). After we arrive at the correct temperature (based on kinetic energies), we will apply pressure to the system until it reaches the proper density.Remember that posre.itp file that pdb2gmx generated a long time ago? We're going to use it now. The purpose of posre.itp is to apply a position restraining force on the heavy atoms of the protein (anything that is not a hydrogen). Movement is permitted, but only after overcoming a substantial energy penalty. The utility of position restraints is that they allow us to equilibrate our solvent around our protein, without the added variable of structural changes in the protein.Equilibration is often conducted in two phases. The first phase is conducted under an NVT ensemble (constant Number of particles, Volume, and Temperature). This ensemble is also referred to as "isothermal-isochoric" or "canonical." The timeframe for such a procedure is dependent upon the contents of the system, but in NVT, the temperature of the system should reach a plateau at the desired value. If the temperature has not yet stabilized,additional time will be required. Typically, 50-100 ps should suffice, and we will conduct a 100-ps NVT equilibration for this exercise. Depending on your machine, this may take a while (just over an hour on a dual-core MacBook). Get the .mdp file here.We will call grompp and mdrun just as we did at the EM step:grompp -f nvt.mdp-c em.gro -p topol.top-o nvt.tprmdrun -deffnm nvtA full explanation of the parameters used can be found in the GROMACS manual, in addition to the comments provided. Take note of a few parameters in the .mdp file:∙gen_vel = yes: Initiates velocity generation. Using different random seeds (gen_seed) gives different initial velocities, and thus multiple (different) simulations can be conducted from the same starting structure.∙tcoupl = V-rescale: The velocity rescaling thermostat is an improvement upon the Berendsen weak coupling method, which did not reproduce a correct kinetic ensemble.∙pcoupl = no: Pressure coupling is not applied.Let's analyze the temperature progression, again using g_energy:g_energy -f nvt.edrType "15 0" at the prompt to select the temperature of the system and exit. The resulting plot should look something like the following:From the plot, it is clear that the temperature of the system quickly reaches the target value (300 K), and remains stable over the remainder of the equilibration. For this system, a shorter equilibration period (on the order of 50 ps) may have been adequate.Step Seven: Equilibration, Part 2The previous step, NVT equilibration, stabilized the temperature of the system. Prior to data collection, we must also stabilize the pressure (and thus also the density) of the system. Equilibration of pressure is conducted under an NPT ensemble, wherein the Number of particles, Pressure, and Temperature are all constant. The ensemble is also called the "isothermal-isobaric" ensemble, and most closely resembles experimental conditions.The .mdp file used for a 100-ps NPT equilibration can be found here. It is not drastically different from the parameter file used for NVT equilibration. Note the addition of the pressure coupling section, using the Parrinello-Rahman barostat.A few other changes:∙continuation = yes: We are continuing the simulation from the NVT equilibration phase ∙gen_vel = no: Velocities are read from the trajectory (see below)We will call grompp and mdrun just as we did for NVT equilibration. Note that we are now including the -t flag to include the checkpoint file from the NVT equilibration; this file contains all the necessary state variables to continue our simulation. To conserve the velocities produced during NVT, we must include this file. The coordinate file (-c) is the final output of the NVT simulation.grompp -f npt.mdp -c nvt.gro-t nvt.cpt -p topol.top -o npt.tprmdrun -deffnm nptLet's analyze the pressure progression, again using g_energy:g_energy -f npt.edr -o pressure.xvgType "16 0" at the prompt to select the pressure of the system and exit. The result ing plot should look something like the following:The pressure value fluctuates widely over the course of the 100-ps equilibration phase, but this behavior is not unexpected. The running average of these data are plotted as the red line in the plot. Over the course of the equilibration, the average value of the pressure is 1.05 bar.Let's take a look at density as well, this time using g_energy and entering "22 0" at the prompt.g_energy -f npt.edr -o density.xvgAs with the pressure, the running average of the density is also plotted in red. The average value over the course of 100 ps is 998.3 kg m-3, close to the experimental value of 1000 kg m-3. The parameters for the SPC/E water model closely replicate experimental values for water. The density values are very stable over time, indicating that the system is well-equilibrated now with respect to pressure and density.Step Eight: Production MDUpon completion of the two equilibration phases, the system is now well-equilibrated at the desired temperature and pressure. We are now ready to release the position restraints and run production MD for data collection. The process is just like we have seen before, as we will make use of the checkpoint file (which in this case now contains preserve pressure coupling information) to grompp. We will run a 1-ns MD simulation, the script for which can be found here.grompp -f md.mdp-c npt.gro-t npt.cpt-p topol.top -o md_0_1.tprmdrun -deffnm md_0_1The mdrun step is definitely best run in parallel on a cluster, since it will take several hours to complete. The proper command for executing such a job would be:mpirun -np X mdrun_mpi -deffnm md_0_1where X indicates the number of processors to be used for the simulation. Near the end of the grompp output, you should have seen a line like:Estimate for the relative computational load of the PME mesh part: 0.25The estimate for PME load will dictate how many processors should be dedicated to the PME calculation, and how many for the PP calculations. Refer to the GROMACS 4 publication and the manual for details. For a cubic box, the optimal setup will have a PME load of 0.25 (3:1 PP:PME - we're in luck!); for a dodecahedral box, the optimal PME load is 0.33 (2:1 PP:PME). When executing mdrun, the program should automatically determine the best number of processors to assign for the PP and PME calculations. Thus, make sure you indicate an appropriate number of nodes for your calculation (the value of -np X), so that you can get the best performance. For this system, I achieved roughly 14 ns/day performance on 24 CPU's (18 PP, 6 PME).Step Nine: AnalysisNow that we have simulated our protein, we should run some analysis on the system. What types of data are important? This is an important question to ask before running the simulation, so you should have some ideas about the types of data you will want to collect in your own systems. For this tutorial, a few basic tools will be introduced.The first is trjconv, which is used as a post-processing tool to strip out coordinates, correct for periodicity, or manually alter the trajectory (time units, frame frequency, etc). For this exercise, we will use trjconv to account for any periodicity in the system. The protein will diffuse through the unit cell, and may appear to "jump" across to the other side of the box. To account for such actions, issue the following:trjconv -s md_0_1.tpr-f md_0_1.xtc-o md_0_1_noPBC.xtc -pbc mol-ur compactSelect 0 ("System") for output. We will conduct all our analyses on this "corrected" trajectory. Let's look at structural stability first. GROMACS has a built-in utility for RMSD calculations called g_rms. To use g_rms, issue this command:g_rms -s md_0_1.tpr-f md_0_1_noPBC.xtc-o rmsd.xvg-tu nsChoose 4 ("Backbone") for both the least squares fit and the group for RMSD calculation. The -tu flag will output the results in terms of ns, even though the trajectory was written in ps. This is done for clarity of the output (especially if you have a long simulation - 1e+06 ps does not look as nice as 100 ns). The output plot will show the RMSD relative to the structure present in the minimized, equilibrated system:If we wish to calculate RMSD relative to the crystal structure, we could issue the following:g_rms -s em.tpr -f md_0_1_noPBC.xtc-o rmsd_xtal.xvg -tu nsThe result will be something like:。