跳至主要内容

Constrained Density Functional Theory (CDFT) 的实现策略

 density functional theory - Calculating the band structure with different unit cell - Matter Modeling Stack Exchange

涉及能带所以应该用原胞才不会出错


Mg dpgen TransL 加电子计算,需要脚本,在原始脚本上修改:
https://chat.openai.com/share/6e370492-98eb-4aed-b5e1-4a4c25bd9f18

import argparse

import os

import numpy as np


def get_nband(e: float, ncore: int) -> int:

    return int(ncore * np.ceil(e / ncore))


def read_poscar(file: str, e_count: dict, n_t: int) -> (float, int):

    with open(file, 'r') as f:

        lines = f.readlines()


    line_name = lines[5].split()

    line_count = [int(i) for i in lines[6].split()]

    assert len(line_name) == len(line_count)


    total_e = sum(e_count[atom] * count for atom, count in zip(line_name, line_count))

    total_atom = sum(line_count)


    nELECT_e = total_e + (total_atom / n_t)

    return nELECT_e, total_e


def modify_main_incar(current_folder: str, nband: int, nelect: float):

    incar_path = os.path.join(current_folder, "INCAR")

    if not os.path.exists(incar_path):

        raise FileNotFoundError(f"{incar_path} not found")


    with open(incar_path, 'r') as file:

        lines = file.readlines()


    with open(incar_path, 'w') as file:

        for line in lines:

            if any(tag in line for tag in ['NBANDS', 'NELECT']):

                continue

            file.write(line)

        file.write(f"NBANDS = {nband}\n")

        file.write(f"NELECT = {nelect:.6f}\n")


if __name__ == "__main__":

    parser = argparse.ArgumentParser()

    parser.add_argument("folder", help="Folder containing POSCAR")

    parser.add_argument("--name", nargs='+', type=str, help="Names of elements")

    parser.add_argument("--count", nargs='+', type=int, help="Electron count for each element")

    parser.add_argument("--nt", type=int, required=True, help="Number of atoms per 1 electron")

    parser.add_argument("--ncore", type=int, required=True, help="Number of cores used in the calculation")


    args = parser.parse_args()


    e_count = dict(zip(args.name, args.count))

    nELECT_e, total_e = read_poscar(os.path.join(args.folder, "POSCAR"), e_count, args.nt)

    n_band = get_nband(total_e * 0.6, args.ncore)

    modify_main_incar(args.folder, n_band, nELECT_e)

尝试在dpgen machine.json里修改vasp命令实现完全自动化    
            "command": "python vasp_calculate.py ./ --name Mg --count 2 --nt 50 --ncore 64 && srun -n 64 /work/qan/codes/VASP/bin/vasp_std1",
####



问题:
CdTe在光照激发后的性质有什么变化呢?

参考文献:1. Wang, H., Morozov, S. I., Goddard, W. A. & An, Q. Light irradiation induced brittle-to-ductile and ductile-to-brittle transition in inorganic semiconductors. Phys. Rev. B 99, 161202 (2019).

解决问题的关键点是:构建激发态下的计算框架,这里需要用到CDFT即人为的提供一种约束环境。

下面做两个例子:最少原子24和最大原子214结构。需要保持e/atom相同而且尽量少的激发电子接近实际情况。所以以最大原子214激发一个电子为准。24原子激发24/214=0.11215个电子。

为了实现CdTetransfer learning,建议只处理单K-point Gamma,所以将小结构都扩到到适合单k

约束环境的构建具体过程如下(这里以最大结构一个电子的激发为准)

1.       得电子(1e)和失去电子(1h)结构的电子能带分布情况EIGENVAL:两个文件夹1e, 1h

a.       INCAR里需要调整的参数

b.       ISIF=2 #单点能计算3#几何优化

c.       NSW=2

d.       NBANDS = 144 #假设的能带数,36core整除且<电子数的

e.       NELECT = 216.11215 #OUTCAR里可以看总电子数+1e

f.        得到EIGENVAL,复制重命名到Ocche文件夹里:EIGENVALe  EIGENVALh

2.       依据上面两种极端情况采用物理公式计算出电子激发后(1e1h)的电子能带分布情况(K点上的分布): Ocche

a.       需要修改Ocche.sh里面的两处VBM=110一般的电子数或者真实的价带顶那个带数

b.       $ ./Ocche.sh即可,即可得到OCCHE.out #(1e1h)的电子在K点上的分布

3.       INCAR里添加上这个电子状态计算即可,建议先SCF,再OPT:两个文件夹

a.       最后一行添加FERWE = 213*1*********

b.       ISMEAR=-2

c.       NELECT = 216 #回复平衡了

4.       算完能量极大,不正常。发现是VBM选错了,改成108后还是报错。我的这个缺陷结构基态的时候就很复杂,108能带之上的109110两个能带上都有电子分布,导致加1e1h时都是这两个带上变化,没有明显的能带之间的跳跃变化。所以无法用之前的脚本直接剪辑嫁接就行。

5.       最后在GPT4的辅助下修改了原始脚本,改成直接能带数据平均化处理了。

/work/qan/kluo/CDFT/3C112/test/AveFERWE.sh

#!/bin/bash

# 函数:计算平均能带数据
calculate_avg_band_data() {
  input_file1="$1"
  input_file2="$2"
  output_file="$3"

  paste "$input_file1" "$input_file2" | awk '{
    printf "%.0f %.6f %.6f\n", ($1 + $4) / 2, ($2 + $5) / 2, ($3 + $6) / 2
  }' > "$output_file"
}

# 函数:获取能带占据情况
get_band_occupations() {
  input_file="$1"

  ones=$(awk '$3 == 1 { count++ } END { print count }' "$input_file")
  zeros=$(awk '$3 == 0 { count++ } END { print count }' "$input_file")
  non_zero_values=$(awk '0 < $3 && $3 < 1 { printf "%.6f ", $3 }' "$input_file")

  echo "${ones}*1 ${non_zero_values} ${zeros}*0"
}

# 设置文件名
fileh="EIGENVALh"
filee="EIGENVALe"

# 获取能带数和K点数
Nbands=$(awk 'NR==6{print $3}' "$fileh")
NK=$(awk 'NR==6{print $2}' "$fileh")
line=$(($Nbands + 2))

# 循环处理每个K点
result_file="OCCHE.out"
echo -n "" > "$result_file"
for ((k = 0; k < $NK; k++)); do
  cutstart=$(echo "scale=4;$k*$line+7" | bc)
  cutend=$(echo "scale=4;$cutstart+$line-1" | bc)

  occ_fileh="occ$(($k+1))h"
  occ_filee="occ$(($k+1))e"
  sed -n ''$cutstart','$cutend'p' "$fileh" | sed -n '3,$p' > "$occ_fileh"
  sed -n ''$cutstart','$cutend'p' "$filee" | sed -n '3,$p' > "$occ_filee"

  avg_occ_file="occ$(($k+1))eh"
  calculate_avg_band_data "$occ_fileh" "$occ_filee" "$avg_occ_file"

  band_occupations=$(get_band_occupations "$avg_occ_file")
  echo "$band_occupations"
  echo "$band_occupations" >> "$result_file"

  # 删除临时文件
  rm "$occ_fileh" "$occ_filee" "$avg_occ_file"
done

# 将结果文件的所有行合并为一行,并添加标题
sed -i ':a;N;$!ba;s/\n/ /g' "$result_file"
sed -i '1iFERWE =' "$result_file"


打算将上面的步骤集成脚本化,所以需要规范每一步:

/work/qan/kluo/CDFTpython/work/task0 24atoms(读取POSCAR)

/work/qan/kluo/CDFTpython/work/task1 56atoms(读取POSCAR)

(base) [kluo@nova task0]$ ls

INCAR  KPOINTS  POSCAR  POTCAR  slurm.nova

1.       确定激发的电子和空穴数值:N/Maxatoms,这里就是1/56=0.017857143

2.       每个文件夹了创建两个子文件夹e, h

a.       (base) [kluo@nova task0]$ mkdir h

b.       (base) [kluo@nova task0]$ mkdir e

3.       Copy 当前文件到子文件夹里

a.       cp * h/

b.       cp * e/

4.       修改INCAR:添加两个参数

a.       读取POSCAR 24atoms 所以 确定添加电子数 24*1/56=0.428571429

b.       确定总的电子数(Telect):POSCAR里的原子*POTCAR电子数=12*12+12*6=216 所以NELECT=216.428571429

c.       固定需要计算的能带数NBANDSTelect*1.2/2={Telect*0.6}取整除节点核数36=144

d.       h文件夹里 修改INCAR: NELECT= 215.5714286 NBANDS=144

5.       提交这两个子文件的任务

a.       得到EIGENVAL 分别cp EIGENVAL EIGENVALh cp EIGENVAL EIGENVALe

6.       利用脚本平均化

a.       (base) [kluo@nova task0]$ bash AveFERWE.sh

b.       得到 /work/qan/kluo/CDFTpython/work/task0/OCCHE.out 复制里面的内容到 INCAR FERWE = *****

c.       修改主INCAR: NBANDS = 144 NELECT = 216   ISMEAR=-2

在Ruizhou的帮助下编写了自动化脚本处理这个过程

import argparse
import os
import numpy as np
import re

regexs = {
'kpt': "\s+(.\d+\.\d+E[+-]\d+)\s+(.\d+\.\d+E[+-]\d+)\s+(.\d+\.\d+E[+-]\d+)\s+.*",
'enval': "\s+(\d+)\s+(-?.\d+\.\d+)" }


def get_nband(e:float,ncore:int):
    # return the least number of multiple of ncore that is larger than e
    return int(ncore * np.ceil(e/ncore))

# 1. read poscar

# 2. make task for e and h

# 3. run subtasks
def read_poscar(file,e_count:dict,n_t:int,ncore:int):
    """
    e_count : electron count for each element eg. {'Cd':12,'Te':6}
    n_t : number of atoms per 1 electron eg. 200
    ncore : number of cores used in the calculation eg. 36
    """
    with open(file, 'r') as f:
        lines = f.readlines()

    line_name = lines[5].split()
    line_count = lines[6].split()
    line_count = [int(i) for i in line_count]
    # ensure the name of element is in the same order as the count
    assert len(line_name) == len(line_count)

    total_e = 0

    atom_count = {ele_name:line_count[i] for i,ele_name in enumerate(line_name)}
    for atom_name in atom_count.keys():
        if atom_name not in e_count:
            raise ValueError(f"Element {atom_name} is not in e_count")
        total_e += e_count[atom_name] * atom_count[atom_name]
   
    total_atom = sum(atom_count.values())

    nELECT_e = total_e +  ( total_atom / n_t )
    nELECT_h = total_e -  ( total_atom / n_t )

    n_band = get_nband(total_e*0.6,ncore)

    return nELECT_e,nELECT_h,n_band,total_e


def make_subtasks(current_folder,nELECT_e,nELECT_h,n_band):
    # copy POTCAR POSCAR INCAR KPOINTS to subfolders
    subtask_e = os.path.join(current_folder,"e")
    subtask_h = os.path.join(current_folder,"h")
    os.makedirs(subtask_e,exist_ok=True)
    os.makedirs(subtask_h,exist_ok=True)
    for vasp_file in ['POTCAR','INCAR','KPOINTS','POSCAR']:
        os.system(f"cp {os.path.join(current_folder,vasp_file)} {subtask_e}")
        os.system(f"cp {os.path.join(current_folder,vasp_file)} {subtask_h}")
    # modify INCAR
    with open(os.path.join(subtask_e,"INCAR"),'a') as f:
        # add NELECT=xxx and NBANDS =xxx to INCAR
        # formt nelect to 6 digits
        f.write(f"\n# --- append ---\nNELECT = {nELECT_e:.6f}\nNBANDS = {n_band}\n")
    with open(os.path.join(subtask_h,"INCAR"),'a') as f:
        f.write(f"\n# --- append ---\nNELECT = {nELECT_h:.6f}\nNBANDS = {n_band}\n")

def run_subtasks(current_folder,vasp_executable):
    subtask_e = os.path.join(current_folder,"e")
    subtask_h = os.path.join(current_folder,"h")
    # make sure the subtasks exist
    assert os.path.exists(subtask_e)
    assert os.path.exists(subtask_h)

    os.system(f"cd {subtask_e}; {vasp_executable} > sub.e.out")
    os.system(f"cd {subtask_h}; {vasp_executable} > sub.h.out")



# Helper function to parse blocks of data from the file
def parse_blocks(filepath, Nbands):
    with open(filepath, 'r') as file:
        block = []
        line_counter = 0
        for line in file:
            if line_counter < 6:  # Skip the first 6 lines
                line_counter += 1
                continue
            if line.strip():  # if line is not empty
                block.append(line)
                if len(block) == Nbands + 2:  # A block ends when it has Nbands + 2 lines
                    yield block
                    block = []
            elif block:
                yield block
                block = []
        if block:
            yield block  # yield the last block even if it's not followed by an empty line

# Function to get band and K points
def get_band_and_k_points(filepath):
    with open(filepath, 'r') as file:
        lines = file.readlines()
    NK, _, Nbands = map(int, lines[5].split())
    return Nbands, NK

# Function to calculate average band data
def calculate_avg_band_data(block1, block2):
    data1 = np.array([[float(value) for value in line.split()] for line in block1[1:]])
    data2 = np.array([[float(value) for value in line.split()] for line in block2[1:]])
    return (data1 + data2) / 2

# Function to get band occupations
def get_band_occupations(data):
    ones = np.count_nonzero(data[:, 2] == 1)
    zeros = np.count_nonzero(data[:, 2] == 0)
    non_zero_values = data[(0 < data[:, 2]) & (data[:, 2] < 1), 2]
    return f"{ones}*1 {' '.join(map(str, non_zero_values))} {zeros}*0"



def extract_bandinfo(current_folder):
    file_h =  os.path.join(current_folder,"h","EIGENVAL")
    file_e =  os.path.join(current_folder,"e","EIGENVAL")
    if not os.path.exists(file_h):
        raise FileNotFoundError(f"{file_h} not found")
    if not os.path.exists(file_e):
        raise FileNotFoundError(f"{file_e} not found")
    # Parse blocks of data
    Nbands, NK = get_band_and_k_points(file_e)
    blocks_h = list(parse_blocks(file_h, Nbands))
    blocks_e = list(parse_blocks(file_e, Nbands))

    # Loop through each K point
    result = []
    for k in range(min(len(blocks_h), len(blocks_e))):  # use the number of blocks instead of NK
        block_h = blocks_h[k]
        block_e = blocks_e[k]

        avg_data = calculate_avg_band_data(block_h, block_e)
        band_occupations = get_band_occupations(avg_data)
        result.append(band_occupations)
    return result

def read_eigenval(file):
    """
    read vasp eigenval file
    """

    with open(file,'r') as f:
        f.readline() # N/A
        f.readline() # N/A
        f.readline() # N/A
        f.readline() # Cartesian/Direct

        # System name w/ stripped newline character
        name = f.readline().rstrip()
        print("System name: " + name)

        first = f.readline() # Possibly interesting information
        (nelect,nkpts,nbands) = first.split()
        print("# of electrons:  " + nelect)
        print("# of k-points:   " + nkpts)
        print("# of bands:      " + nbands)

        kpts = []
        bands = []
        for i in range(int(nbands)):
            bands.append([])
        j = 0 # mark band number

        print("Finding k-point, energy pairs for each band...")

        for line in f:
            kpt = re.match(regexs['kpt'], line)
            enval = re.match(regexs['enval'], line)

            if kpt != None:
                (kx,ky,kz) = kpt.groups()
                k = np.sqrt(float(kx)**2 + float(ky)**2 + float(kz)**2)
                k = k
               
            if enval != None:
                e = float(enval.groups(0)[1])
                bands[j%int(nbands)].append([k,e])
                j += 1
       
        # print results
        for i in range(int(nbands)):
            print("Band " + str(i+1) + ":")
            for pair in bands[i]:
                print(pair)
            print("")
   


def modify_main_incar(current_folder,nband,nelect,bandinfo):
    incar_path = os.path.join(current_folder,"INCAR")
    if not os.path.exists(incar_path):
        raise FileNotFoundError(f"{incar_path} not found")
    lines = []
    with open(incar_path,'r') as f:
        lines = f.readlines()
    # 1. comment out old tags
    tags = ['ISMEAR','NBANDS','NELECT','FERWE']
    for i in range(len(lines)):
        for tag in tags:
            if tag in lines[i]:
                lines[i] = "# "+lines[i]
           
    # 2. add bandinfo
    lines.append("#---- Modifyied ----\n")
    lines.append("ISMEAR = -2\n")
    lines.append(f"NBANDS = {nband} \n")
    lines.append(f"NELECT = {nelect} \n")
    lines.append(f"FERWE = {' '.join(bandinfo)} \n")
    # 2. write modified INCAR lines
    with open(incar_path,'w') as f:
        f.writelines(lines)

   



def test_read_poscar():
    filepath = "/Users/zrzrv5/Downloads/CDFTpython/work/task0/POSCAR"
    print(read_poscar(filepath,{'Cd':12,'Te':6},200,36))
def test_read_eigenval():
    # filepath = "/Users/zrzrv5/Downloads/CDFTpython/work/task0/EIGENVAL"
    # read_eigenval(filepath)
    fileh = "/Users/zrzrv5/Downloads/CDFTpython/work/task0/e/EIGENVAL"  # replace with the correct path for your h file
    filee = "/Users/zrzrv5/Downloads/CDFTpython/work/task0/h/EIGENVAL"  # replace with the correct path for your e file

    # Parse blocks of data
    Nbands, NK = get_band_and_k_points(filee)
    blocks_h = list(parse_blocks(fileh, Nbands))
    blocks_e = list(parse_blocks(filee, Nbands))

    # Loop through each K point
    result = []
    for k in range(min(len(blocks_h), len(blocks_e))):  # use the number of blocks instead of NK
        block_h = blocks_h[k]
        block_e = blocks_e[k]

        avg_data = calculate_avg_band_data(block_h, block_e)

        band_occupations = get_band_occupations(avg_data)
        print(band_occupations)
        result.append(band_occupations)
    print(result)

def main():
    parser = argparse.ArgumentParser()

    parser.add_argument("folder",help="folder containing POSCAR, INCAR, POTCAR, KPOINTS")
    parser.add_argument('--name',nargs='+', type=str,help="electron count for each element eg. {'Cd':12,'Te':6}")
    parser.add_argument('--count',nargs='+', type=int,help="electron count for each element eg. {'Cd':12,'Te':6}")

    parser.add_argument('--nt', type=int,required=True,help="number of atoms per 1 electron eg. 200")
    parser.add_argument("--ncore", type=int,required=True,help="number of cores used in the calculation eg. 36")
    parser.add_argument("--exe", type=str,required=True,help="vasp executable eg. vasp_std")
    args = parser.parse_args()

    # 1. read poscar
    # make sure folder exists
    init_poscar = os.path.join(args.folder,"POSCAR")
    assert os.path.exists(init_poscar)

    e_count = {args.name[i]:args.count[i] for i in range(len(args.name))}

    nELECT_e,nELECT_h,n_band,total_e = read_poscar(init_poscar,e_count,args.nt,args.ncore)
    # 2. make subtasks
    make_subtasks(args.folder,nELECT_e,nELECT_h,n_band)
    # 3. run subtasks
    run_subtasks(args.folder,args.exe)
    # 4. average e and h
    result = extract_bandinfo(args.folder)
    # 5. modify INCAR, prepare for main calculation
    modify_main_incar(args.folder,n_band,total_e,result)



if __name__ == "__main__":
    main()

   
# test_read_poscar()
# test_read_eigenval()



   

尝试在dpgen machine.json里修改vasp命令实现完全自动化    
            "command": "python /work/qan/kluo/CDFTpython/CDFT.py ./ --name Cd Te --count 12 6 --nt 200 --ncore 64 --exe \"srun -n 64 /work/qan/codes/VASP/bin/vasp_std\" && srun -n 64 /work/qan/codes/VASP/bin/vasp_std",
####

   

评论

此博客中的热门博文

lammps 压痕划痕模拟设置参考

  Molecular dynamics study on the effect of electric current on electrically-assisted scratching for crystal copper - IOPscience 原因深入分析如下: ✅ 切削 / 摩擦 / 划痕:局部剧烈变形 → 热量集中 这类过程模拟的是工具与材料 接触区域的强烈局部非平衡过程 ; 如果对整个系统控温,会 严重抹平局部发热、滑移带的应变能耗散 等重要现象; 所以 只在边界区域(如底部、侧边)设 thermostat,起到“热沉”作用 ; 文献经典设置就是: 底部固定 ; 边缘 slab 控温 ; 接触区完全不控温,自由演化 。 ✅ 拉伸 / 压缩 / 剪切:全局加载 → 热传导充分 是材料整体在受力,不存在特别“集中”的能量输入区域; 局部发热相对温和,且在 bulk 系统中可以通过自身结构进行导热 ; 实验中常常是等温加载(准静态过程); 所以 很多文献就直接用整体 fix nvt 控温 ,保持恒温环境,简化模拟; 注意有些更精细的研究会改为: 只在两端 slab 控温,中间 Newtonian 自由演化 。 📚 二、典型模拟场景下的控温策略总览 场景类别 控温方式 控温范围 控温方法 控温目的 注意事项 ✅ 平衡态热力学性质 (如热容、扩散、应力) 整体控温 全体系 fix nvt , fix npt 模拟室温等温状态 标准EMD方法 ✅ 热导率(Green-Kubo) 整体控温 全体系 fix nvt (前期平衡), 后期 nve 采集能流自相关函数 采样期不能控温 ✅ 热导率(NEMD) 区域控温 热源/热沉 fix langevin , fix heat 人为施加温差形成热流 中间区自由演化 ✅ 拉伸 / 压缩 / 剪切 整体控温(常用)或局部控温 全体系或上下 slab fix nvt 或 langevin slab 保持恒温,避免非真实升温 全控温可抹平热应变 ✅ 应力松弛 / 加热冷却过程 整体控温 全体系 fix nvt 或温度渐变 等温退火、升温或冷却 控温方式决定退火速率 ✅ 位错运动 / 缺陷扩散 局部控温 边界或部分 slab ...

lammps Pdamp,Tdamp的设置经验

 一张小抄(固体/位错/二维材料都适用) fix             11    all npt temp 0.1 0.1 0.5 tri 0.0 0.0 5   drag 2 tchain 3 pchain 3 保持 Pdamp ≫ Tdamp(通常 10× 左右)。 固体/低温:Tdamp 取 0.5–1 ps,Pdamp 取 5–15 ps;需要更稳就再加大 Pdamp。 所以推荐如下: 0.1K 用1 10 300K 固体 用0.5 5  高温用0.2 2 液体用0.1 1 液体/高温:Tdamp 0.2–0.5 ps,Pdamp 2–5 ps 往往够。 2D(石墨烯等,只控平面 x/y):Pdamp 常用 10–20 ps 起步,z 固定或 z NULL。 观察到体积/压力振荡大:增加 Pdamp 或加 drag 2–3,必要时把 dt 临时降到 0.5 fs。 drag 2、tchain/pchain 3 保留没坏处,确实能再抑制一点振荡;不是硬性必须,但在固体+低温+(可能还有 tri 或剪切)的组合里,“更稳”>“更快”,我一般会开着。

dpgen训练经验

最新的dpgen参考PtCuP /work/qan/kluo/PtCuP 0.1K的第0代采样很重要 可以多重复几次确保99以上的准确率,它是后续高温的基础 单点能计算  ISYM = 2 nohup dpgen run param.json machine.json 1>log 2>err& nohup dpgen init_bulk param.json machine.json  1>log 2>err& 初始数据集产生,只能一个POSCAR的计算 但是可以同时提多个任务,但是每个任务都需要 POTCAR POSCAR 一 一对应才行。  elements 和POSCAR POTCAR保持一致。 " type_map " : [ "Ti" , "C" , "V" , "Cr" , "Nb" , "Mo" ],都写全才行 POSCAR 不需要 改成特定顺序 程序最终生成数据集的时候会根据 type_map自动统一匹配 usage: dpgen [-h] {init_surf,init_bulk,auto_gen_param,init_reaction,run,run/report,collect,simplify,autotest,db,gui} ... dpgen is a convenient script that uses DeepGenerator to prepare initial data, drive DeepMDkit and analyze results. This script works based on several sub-commands with their own options. To see the options for the sub-commands, type "dpgen sub-command -h". positional arguments:   {init_surf,init_bulk,auto_gen_param,init_reaction,run,run/report,collect,simpli...