
dpgen simplify: second-pass dataset reduction

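Problems:

1.       The Carbon potential file cannot accurately describe the graphite interlayer spacing. The dataset has 204,200 bch in total.

2.       The NaSPO potential file cannot be compressed. The dataset has 242,423 bch in total.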

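Both are probably caused by a bloated dataset, so I decided to slim the data down with the simplify command. It feels like a re-sampling process: it saves the LAMMPS time by using dp test instead, but it still needs the fp step (which can be commented out, or enabled if new VASP parameters are needed), so it suits calculations that need different fp parameters. The first pick is random, and the dp sampling step is a bit slow: with the compressed pb it takes 6 hours.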
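To make the re-sampling idea concrete, here is a minimal sketch of how one iteration could sort frames by force model deviation and pick the next batch, using the `model_devi_f_trust_lo`, `model_devi_f_trust_hi` and `iter_pick_number` values from simplify.json. The function names and the sample deviations are hypothetical, and this is a conceptual illustration rather than dpgen's own code.

```python
import random


def classify_frames(max_devi_f, trust_lo, trust_hi):
    """Split frame indices by their maximum force model deviation."""
    accurate, candidate, failed = [], [], []
    for idx, devi in enumerate(max_devi_f):
        if devi < trust_lo:
            accurate.append(idx)    # models already agree on this frame
        elif devi < trust_hi:
            candidate.append(idx)   # worth adding to the training set
        else:
            failed.append(idx)      # deviation too large to trust
    return accurate, candidate, failed


def pick_candidates(candidate, iter_pick_number, seed=0):
    """Pick at most iter_pick_number candidate frames at random."""
    rng = random.Random(seed)
    shuffled = candidate[:]
    rng.shuffle(shuffled)
    return sorted(shuffled[:iter_pick_number])


# Hypothetical deviations for six frames, with the trust levels used in this post.
devi = [0.10, 0.30, 0.50, 0.26, 0.44, 0.05]
accurate, candidate, failed = classify_frames(devi, trust_lo=0.25, trust_hi=0.45)
picked = pick_candidates(candidate, iter_pick_number=2)
print(accurate, candidate, failed, picked)
```

With a low `trust_lo`, more frames land in the candidate group, which is why the value is started low and raised gradually.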

The dataset can also be improved by redoing fp: https://tutorials.deepmodeling.com/en/latest/CaseStudies/Transfer-learning/Transfer-learning.html

https://docs.deepmodeling.com/projects/dpgen/en/latest/simplify/simplify-jdata.html



https://zhuanlan.zhihu.com/p/456504860
http://bohrium-doc.dp.tech/docs/software/DP-GEN_simplify

Simplify — DP-GEN documentation

Do it first; get things moving and then see.

1.       Collect all the data: https://hikunluo.blogspot.com/2022/12/dpgen.html

2.       Prepare the two json files for simplify.

    nohup dpgen simplify simplify.json machine.json 1>log 2>err&

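    (base) [kluo@condo2017 SimCarbon]$ cat simplify.json

{

     "type_map": ["C"],

     "mass_map": [12.0107],

     "init_data_prefix": "",

     "init_data_sys":  [],

     "pick_data":  "/work/qan-free/kluo/Carbondpgen/collforSim",

     "labeled": true,    #whether the initial dataset is already labeled (true) or not (false); use true when simplifying a dataset and false for transfer learning.

     "init_pick_number":100, #the initial set really does seem to be picked at random, without testing against the initial potential file, but it was re-labeled.

     

     "iter_pick_number":1000,  #sets how many frames are picked per iteration; keep it below 1/10 of the total dataset. Alternatively, aim for about 10,000 frames to go with 1,000,000 training steps, so pick 1000 per iteration and try to finish in 10 iterations.

     "model_devi_f_trust_lo":0.25, #start at roughly 0.5x the previous value, then raise it little by little.

     "model_devi_f_trust_hi":0.45,

    

     "sys_configs": [null], #this raised an error; change it back to "sys_configs": [],

     "sys_batch_size": ["auto","auto","auto","auto","auto","auto","auto","auto","auto","auto","auto","auto","auto","auto","auto","auto","auto","auto","auto","auto"], #length must be greater than or equal to the number of datasets

     "training_iter0_model_path":  ["/work/qan-free/kluo/Carbondpgen/SimCarbon/novaIter32/00[0-3]"],

     "training_init_model":   false, #true is what inherits the earlier training (transfer learning: "Transfer learning is doing 'init-model' based on the original one."), whereas simplify needs false. The initial model here was copied from another machine, so many of its parameters cannot actually be reused and I had to turn this off. To see whether iterations after the first inherit the earlier dataset, check the "DEEPMD INFO    ---Summary of DataSystem: training" block and the "DEEPMD INFO    found 3 system(s):" line in the DeePMD output.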

 

     "_comment": " that's all ",

     "numb_models": 4,

     "_dp_compress": true,

     "default_training_param": {

         "model": {

             "type_map": ["C"],

             "descriptor": {

            "type":             "hybrid",

            "list" : [

              {

                 "type": "se_e3",

                 "sel": [20],  #to initialize from an old model, this must be the old model's fixed value; the automatic setting cannot be used.

                 "rcut_smth": 0.5,

                 "rcut": 2.5,

                 "neuron": [4,8,16],

                 "resnet_dt": false,

                 "seed": 1

             },

              {

                 "type": "se_e2_a",

                 "sel": [125],

                 "rcut_smth": 1.0,

                 "rcut": 5.0,

                 "neuron": [20,40,80],

                 "resnet_dt": false,

                 "axis_neuron": 12,

                 "seed": 1

             }

                     ]},

             "fitting_net": {

                 "neuron": [120,120,120],

                 "resnet_dt": true,

                 "seed": 1

             }

         },

         "learning_rate": {

             "type": "exp",

             "start_lr": 0.001,

             "stop_lr":      3.51e-8,

             "decay_steps": 2500

         },

         "loss": {

             "type":                "ener",

             "start_pref_e": 0.02,

             "limit_pref_e": 2,

             "start_pref_f": 1000,

             "limit_pref_f": 1,

             "start_pref_v": 0.01,

             "limit_pref_v": 1

         },

         "training": {

             "validation_data":{

             "systems": [

                        "/work/qan-free/kluo/Carbondpgen/validation/CAf",

                        "/work/qan-free/kluo/Carbondpgen/validation/CAp",

                "/work/qan-free/kluo/Carbondpgen/validation/CAt",

                "/work/qan-free/kluo/Carbondpgen/validation/COf",

                "/work/qan-free/kluo/Carbondpgen/validation/COp",

                "/work/qan-free/kluo/Carbondpgen/validation/COt",

                "/work/qan-free/kluo/Carbondpgen/validation/HBf",

                "/work/qan-free/kluo/Carbondpgen/validation/HBp",

                "/work/qan-free/kluo/Carbondpgen/validation/HBt",

                "/work/qan-free/kluo/Carbondpgen/validation/HCf",

                "/work/qan-free/kluo/Carbondpgen/validation/HCp",

                "/work/qan-free/kluo/Carbondpgen/validation/HCt",

                "/work/qan-free/kluo/Carbondpgen/validation/HZf",

                "/work/qan-free/kluo/Carbondpgen/validation/HZp",

                "/work/qan-free/kluo/Carbondpgen/validation/HZt"],

 

            "batch_size": 1,

            "numb_btch":               1,

            "_comment":                "that's all"

        },

             "stop_batch": 500000,

             "disp_file": "lcurve.out",

             "disp_freq": 500,

             "numb_test": 4,

             "save_freq": 10000,

             "save_ckpt": "model.ckpt",

             "disp_training": true,

             "time_training": true,

             "profiling": false,

             "profiling_file": "timeline.json",

             "_comment": "that's all"

         }

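     },

    

     "fp_style": "vasp",   #set this to "none" for a true simplify with no fp step ("When fp_style is set to none: No fp."): https://docs.deepmodeling.com/projects/dpgen/en/latest/simplify/simplify-jdata.html

     "fp_skip_bad_box":  "true",

     "_shuffle_poscar": false,

     "fp_task_max": 50, #as before, this applies to each sampled system, not to the whole dataset (after re-collecting, the new dataset merges frames with the same number of atoms into one system). So control the total amount of data with the pick settings above and try not to let this step become the limit: max = pick is enough.

     "fp_task_min": 0, #so choose 0 here to make sure every picked frame counts.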

     "ratio_failed": 0.9,

     "fp_accurate_threshold": 0.99,

     "fp_accurate_soft_threshold": 0.01,

     "fp_pp_path": "./",

     "fp_pp_files": ["POTCAR_C"],

     "fp_incar": "./INCAR_Carbon",

     "_comment": "that's all"

}
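The listing above carries my notes after `#`, which plain JSON does not allow, so the notes must be removed before dpgen reads the file. Below is a minimal sketch of a helper that strips them, assuming every note starts with `#`, stays on one line, and sits outside a JSON string. The name `strip_annotations` and the sample text are hypothetical and not part of dpgen.

```python
import json


def strip_annotations(text):
    """Remove '#...' trailing notes that sit outside JSON strings."""
    cleaned = []
    for line in text.splitlines():
        in_string = False
        escaped = False
        out = []
        for ch in line:
            if in_string:
                out.append(ch)
                if escaped:
                    escaped = False      # character after a backslash is literal
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False    # closing quote
            elif ch == '"':
                in_string = True         # opening quote
                out.append(ch)
            elif ch == "#":
                break                    # the rest of the line is a note
            else:
                out.append(ch)
        cleaned.append("".join(out).rstrip())
    return "\n".join(cleaned)


# Small annotated sample in the same style as the listing above.
annotated = '''{
  "labeled": true,    #true when simplifying a dataset
  "init_pick_number":100, #random first pick
  "iter_pick_number":1000,
  "custom_flags": ["#SBATCH --ntasks=16"]
}'''
params = json.loads(strip_annotations(annotated))
print(params)
```

Tracking whether the scanner is inside a string keeps values such as `"#SBATCH --ntasks=16"` in machine.json intact.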

(base) [kluo@condo2017 SimCarbon]$ cat machine.json

{

                "api_version": "1.0",

                "_deepmd_version": "2.0.3",

                "train" :[

                                {

                                                "command": "dp",

                                                "machine": {

                                                                "batch_type": "slurm",

                                                                "context_type": "local",

                                                                "local_root" : "./",

                                                                "remote_root": "./work"

                                                },

                                                "resources": {

                                                                "number_node": 1,

                                                                "cpu_per_node": 16,

                                                                "gpu_per_node": 0,

                                "_queue_name": "cpu-s1-matersimul-0",

                                "custom_flags": ["###SBATCH --account=cpu-s1-matersimul-0", "#SBATCH --job-name='dptrain'", "#SBATCH --cpus-per-task=1", "#SBATCH --hint=compute_bound","#SBATCH --ntasks=16", "#SBATCH --mail-type=ALL", "#SBATCH --time=24:00:00", "#SBATCH --mail-user=kluo@iastate.edu"],

                                "source_list": ["/work/qan-free/kluo/Carbondpgen/dp203.sh"],

                                                                "_module_list": ["intel/mkl/64/2019/5.281", "intel/mpi/64/2019/5.281"],

                                "_time_limit": "240:0:0",

                                "group_size": 1

                                                }

                                }

                ],

                "model_devi":[

                                {

                                                "command": "dp",

                                                "machine": {

                                                                "batch_type": "slurm",

                                                                "context_type": "local",

                                                                "local_root" : "./",

                                                                "remote_root": "./work"

                                                },

                                                "resources": {

                                "number_node": 1,

                                "cpu_per_node": 16,

                                "gpu_per_node": 0,

                                "_queue_name": "cpu-s1-matersimul-0",

                                "custom_flags": ["###SBATCH --account=cpu-s1-matersimul-0", "#SBATCH --job-name='lmps'", "#SBATCH --cpus-per-task=1", "#SBATCH --hint=compute_bound","#SBATCH --ntasks=8", "#SBATCH --time=1:00:00"],

                                                                "source_list": ["/work/qan-free/kluo/Carbondpgen/dp203.sh"],

                                "_module_list": ["intel/mkl/64/2019/5.281", "intel/mpi/64/2019/5.281"],

                                "_time_limit": "240:0:0",

                                "group_size": 100

                        }

                                }

                ],

                "fp":[

                                {

                                                "command": "srun -n 16 /shared/hpc/vasp/5.4.1/bin/vasp_std",

                                                "machine": {

                                "batch_type": "slurm",

                                "context_type": "local",

                                "local_root" : "./",

                                "remote_root": "./work"

                        },

                        "resources": {

                                "number_node": 1,

                                "cpu_per_node": 16,

                                "gpu_per_node": 0,

                                "_queue_name": "cpu-s1-matersimul-0",

                                "custom_flags": ["###SBATCH --account=cpu-s1-matersimul-0", "#SBATCH --job-name='fp'", "#SBATCH --cpus-per-task=1", "#SBATCH --hint=compute_bound","#SBATCH --ntasks=16", "#SBATCH --time=8:00:00"],

                                "source_list": ["/home/kluo/shpy/vasp.sh"],

                                "_module_list": ["intel/mkl/64/2019/5.281", "intel/mpi/64/2019/5.281"],

                                "_time_limit": "240:0:0",

                                "group_size": 2

                        }

                                }

                ]

}
