跳至主要内容

Constrained Density Functional Theory (CDFT) 的实现策略

 density functional theory - Calculating the band structure with different unit cell - Matter Modeling Stack Exchange

涉及能带所以应该用原胞才不会出错


Mg dpgen TransL 加电子计算,需要脚本,在原始脚本上修改:
https://chat.openai.com/share/6e370492-98eb-4aed-b5e1-4a4c25bd9f18

import argparse

import os

import numpy as np


def get_nband(e: float, ncore: int) -> int:

    return int(ncore * np.ceil(e / ncore))


def read_poscar(file: str, e_count: dict, n_t: int) -> (float, int):

    with open(file, 'r') as f:

        lines = f.readlines()


    line_name = lines[5].split()

    line_count = [int(i) for i in lines[6].split()]

    assert len(line_name) == len(line_count)


    total_e = sum(e_count[atom] * count for atom, count in zip(line_name, line_count))

    total_atom = sum(line_count)


    nELECT_e = total_e + (total_atom / n_t)

    return nELECT_e, total_e


def modify_main_incar(current_folder: str, nband: int, nelect: float):

    incar_path = os.path.join(current_folder, "INCAR")

    if not os.path.exists(incar_path):

        raise FileNotFoundError(f"{incar_path} not found")


    with open(incar_path, 'r') as file:

        lines = file.readlines()


    with open(incar_path, 'w') as file:

        for line in lines:

            if any(tag in line for tag in ['NBANDS', 'NELECT']):

                continue

            file.write(line)

        file.write(f"NBANDS = {nband}\n")

        file.write(f"NELECT = {nelect:.6f}\n")


if __name__ == "__main__":

    parser = argparse.ArgumentParser()

    parser.add_argument("folder", help="Folder containing POSCAR")

    parser.add_argument("--name", nargs='+', type=str, help="Names of elements")

    parser.add_argument("--count", nargs='+', type=int, help="Electron count for each element")

    parser.add_argument("--nt", type=int, required=True, help="Number of atoms per 1 electron")

    parser.add_argument("--ncore", type=int, required=True, help="Number of cores used in the calculation")


    args = parser.parse_args()


    e_count = dict(zip(args.name, args.count))

    nELECT_e, total_e = read_poscar(os.path.join(args.folder, "POSCAR"), e_count, args.nt)

    n_band = get_nband(total_e * 0.6, args.ncore)

    modify_main_incar(args.folder, n_band, nELECT_e)

尝试在dpgen machine.json里修改vasp命令实现完全自动化    
            "command": "python vasp_calculate.py ./ --name Mg --count 2 --nt 50 --ncore 64 && srun -n 64 /work/qan/codes/VASP/bin/vasp_std1",
####



问题:
CdTe在光照激发后的性质有什么变化呢?

参考文献:1. Wang, H., Morozov, S. I., Goddard, W. A. & An, Q. Light irradiation induced brittle-to-ductile and ductile-to-brittle transition in inorganic semiconductors. Phys. Rev. B 99, 161202 (2019).

解决问题的关键点是:构建激发态下的计算框架,这里需要用到CDFT即人为的提供一种约束环境。

下面做两个例子:最少原子24和最大原子214结构。需要保持e/atom相同而且尽量少的激发电子接近实际情况。所以以最大原子214激发一个电子为准。24原子激发24/214=0.11215个电子。

为了实现CdTetransfer learning,建议只处理单K-point Gamma,所以将小结构都扩到到适合单k

约束环境的构建具体过程如下(这里以最大结构一个电子的激发为准)

1.       得电子(1e)和失去电子(1h)结构的电子能带分布情况EIGENVAL:两个文件夹1e, 1h

a.       INCAR里需要调整的参数

b.       ISIF=2 #单点能计算3#几何优化

c.       NSW=2

d.       NBANDS = 144 #假设的能带数,36core整除且<电子数的

e.       NELECT = 216.11215 #OUTCAR里可以看总电子数+1e

f.        得到EIGENVAL,复制重命名到Ocche文件夹里:EIGENVALe  EIGENVALh

2.       依据上面两种极端情况采用物理公式计算出电子激发后(1e1h)的电子能带分布情况(K点上的分布): Ocche

a.       需要修改Ocche.sh里面的两处VBM=110一般的电子数或者真实的价带顶那个带数

b.       $ ./Ocche.sh即可,即可得到OCCHE.out #(1e1h)的电子在K点上的分布

3.       INCAR里添加上这个电子状态计算即可,建议先SCF,再OPT:两个文件夹

a.       最后一行添加FERWE = 213*1*********

b.       ISMEAR=-2

c.       NELECT = 216 #回复平衡了

4.       算完能量极大,不正常。发现是VBM选错了,改成108后还是报错。我的这个缺陷结构基态的时候就很复杂,108能带之上的109110两个能带上都有电子分布,导致加1e1h时都是这两个带上变化,没有明显的能带之间的跳跃变化。所以无法用之前的脚本直接剪辑嫁接就行。

5.       最后在GPT4的辅助下修改了原始脚本,改成直接能带数据平均化处理了。

/work/qan/kluo/CDFT/3C112/test/AveFERWE.sh

#!/bin/bash

# 函数:计算平均能带数据
calculate_avg_band_data() {
  input_file1="$1"
  input_file2="$2"
  output_file="$3"

  paste "$input_file1" "$input_file2" | awk '{
    printf "%.0f %.6f %.6f\n", ($1 + $4) / 2, ($2 + $5) / 2, ($3 + $6) / 2
  }' > "$output_file"
}

# 函数:获取能带占据情况
get_band_occupations() {
  input_file="$1"

  ones=$(awk '$3 == 1 { count++ } END { print count }' "$input_file")
  zeros=$(awk '$3 == 0 { count++ } END { print count }' "$input_file")
  non_zero_values=$(awk '0 < $3 && $3 < 1 { printf "%.6f ", $3 }' "$input_file")

  echo "${ones}*1 ${non_zero_values} ${zeros}*0"
}

# 设置文件名
fileh="EIGENVALh"
filee="EIGENVALe"

# 获取能带数和K点数
Nbands=$(awk 'NR==6{print $3}' "$fileh")
NK=$(awk 'NR==6{print $2}' "$fileh")
line=$(($Nbands + 2))

# 循环处理每个K点
result_file="OCCHE.out"
echo -n "" > "$result_file"
for ((k = 0; k < $NK; k++)); do
  cutstart=$(echo "scale=4;$k*$line+7" | bc)
  cutend=$(echo "scale=4;$cutstart+$line-1" | bc)

  occ_fileh="occ$(($k+1))h"
  occ_filee="occ$(($k+1))e"
  sed -n ''$cutstart','$cutend'p' "$fileh" | sed -n '3,$p' > "$occ_fileh"
  sed -n ''$cutstart','$cutend'p' "$filee" | sed -n '3,$p' > "$occ_filee"

  avg_occ_file="occ$(($k+1))eh"
  calculate_avg_band_data "$occ_fileh" "$occ_filee" "$avg_occ_file"

  band_occupations=$(get_band_occupations "$avg_occ_file")
  echo "$band_occupations"
  echo "$band_occupations" >> "$result_file"

  # 删除临时文件
  rm "$occ_fileh" "$occ_filee" "$avg_occ_file"
done

# 将结果文件的所有行合并为一行,并添加标题
sed -i ':a;N;$!ba;s/\n/ /g' "$result_file"
sed -i '1iFERWE =' "$result_file"


打算将上面的步骤集成脚本化,所以需要规范每一步:

/work/qan/kluo/CDFTpython/work/task0 24atoms(读取POSCAR)

/work/qan/kluo/CDFTpython/work/task1 56atoms(读取POSCAR)

(base) [kluo@nova task0]$ ls

INCAR  KPOINTS  POSCAR  POTCAR  slurm.nova

1.       确定激发的电子和空穴数值:N/Maxatoms,这里就是1/56=0.017857143

2.       每个文件夹了创建两个子文件夹e, h

a.       (base) [kluo@nova task0]$ mkdir h

b.       (base) [kluo@nova task0]$ mkdir e

3.       Copy 当前文件到子文件夹里

a.       cp * h/

b.       cp * e/

4.       修改INCAR:添加两个参数

a.       读取POSCAR 24atoms 所以 确定添加电子数 24*1/56=0.428571429

b.       确定总的电子数(Telect):POSCAR里的原子*POTCAR电子数=12*12+12*6=216 所以NELECT=216.428571429

c.       固定需要计算的能带数NBANDSTelect*1.2/2={Telect*0.6}取整除节点核数36=144

d.       h文件夹里 修改INCAR: NELECT= 215.5714286 NBANDS=144

5.       提交这两个子文件的任务

a.       得到EIGENVAL 分别cp EIGENVAL EIGENVALh cp EIGENVAL EIGENVALe

6.       利用脚本平均化

a.       (base) [kluo@nova task0]$ bash AveFERWE.sh

b.       得到 /work/qan/kluo/CDFTpython/work/task0/OCCHE.out 复制里面的内容到 INCAR FERWE = *****

c.       修改主INCAR: NBANDS = 144 NELECT = 216   ISMEAR=-2

在Ruizhou的帮助下编写了自动化脚本处理这个过程

import argparse
import os
import numpy as np
import re

regexs = {
'kpt': "\s+(.\d+\.\d+E[+-]\d+)\s+(.\d+\.\d+E[+-]\d+)\s+(.\d+\.\d+E[+-]\d+)\s+.*",
'enval': "\s+(\d+)\s+(-?.\d+\.\d+)" }


def get_nband(e:float,ncore:int):
    # return the least number of multiple of ncore that is larger than e
    return int(ncore * np.ceil(e/ncore))

# 1. read poscar

# 2. make task for e and h

# 3. run subtasks
def read_poscar(file,e_count:dict,n_t:int,ncore:int):
    """
    e_count : electron count for each element eg. {'Cd':12,'Te':6}
    n_t : number of atoms per 1 electron eg. 200
    ncore : number of cores used in the calculation eg. 36
    """
    with open(file, 'r') as f:
        lines = f.readlines()

    line_name = lines[5].split()
    line_count = lines[6].split()
    line_count = [int(i) for i in line_count]
    # ensure the name of element is in the same order as the count
    assert len(line_name) == len(line_count)

    total_e = 0

    atom_count = {ele_name:line_count[i] for i,ele_name in enumerate(line_name)}
    for atom_name in atom_count.keys():
        if atom_name not in e_count:
            raise ValueError(f"Element {atom_name} is not in e_count")
        total_e += e_count[atom_name] * atom_count[atom_name]
   
    total_atom = sum(atom_count.values())

    nELECT_e = total_e +  ( total_atom / n_t )
    nELECT_h = total_e -  ( total_atom / n_t )

    n_band = get_nband(total_e*0.6,ncore)

    return nELECT_e,nELECT_h,n_band,total_e


def make_subtasks(current_folder,nELECT_e,nELECT_h,n_band):
    # copy POTCAR POSCAR INCAR KPOINTS to subfolders
    subtask_e = os.path.join(current_folder,"e")
    subtask_h = os.path.join(current_folder,"h")
    os.makedirs(subtask_e,exist_ok=True)
    os.makedirs(subtask_h,exist_ok=True)
    for vasp_file in ['POTCAR','INCAR','KPOINTS','POSCAR']:
        os.system(f"cp {os.path.join(current_folder,vasp_file)} {subtask_e}")
        os.system(f"cp {os.path.join(current_folder,vasp_file)} {subtask_h}")
    # modify INCAR
    with open(os.path.join(subtask_e,"INCAR"),'a') as f:
        # add NELECT=xxx and NBANDS =xxx to INCAR
        # formt nelect to 6 digits
        f.write(f"\n# --- append ---\nNELECT = {nELECT_e:.6f}\nNBANDS = {n_band}\n")
    with open(os.path.join(subtask_h,"INCAR"),'a') as f:
        f.write(f"\n# --- append ---\nNELECT = {nELECT_h:.6f}\nNBANDS = {n_band}\n")

def run_subtasks(current_folder,vasp_executable):
    subtask_e = os.path.join(current_folder,"e")
    subtask_h = os.path.join(current_folder,"h")
    # make sure the subtasks exist
    assert os.path.exists(subtask_e)
    assert os.path.exists(subtask_h)

    os.system(f"cd {subtask_e}; {vasp_executable} > sub.e.out")
    os.system(f"cd {subtask_h}; {vasp_executable} > sub.h.out")



# Helper function to parse blocks of data from the file
def parse_blocks(filepath, Nbands):
    with open(filepath, 'r') as file:
        block = []
        line_counter = 0
        for line in file:
            if line_counter < 6:  # Skip the first 6 lines
                line_counter += 1
                continue
            if line.strip():  # if line is not empty
                block.append(line)
                if len(block) == Nbands + 2:  # A block ends when it has Nbands + 2 lines
                    yield block
                    block = []
            elif block:
                yield block
                block = []
        if block:
            yield block  # yield the last block even if it's not followed by an empty line

# Function to get band and K points
def get_band_and_k_points(filepath):
    with open(filepath, 'r') as file:
        lines = file.readlines()
    NK, _, Nbands = map(int, lines[5].split())
    return Nbands, NK

# Function to calculate average band data
def calculate_avg_band_data(block1, block2):
    data1 = np.array([[float(value) for value in line.split()] for line in block1[1:]])
    data2 = np.array([[float(value) for value in line.split()] for line in block2[1:]])
    return (data1 + data2) / 2

# Function to get band occupations
def get_band_occupations(data):
    ones = np.count_nonzero(data[:, 2] == 1)
    zeros = np.count_nonzero(data[:, 2] == 0)
    non_zero_values = data[(0 < data[:, 2]) & (data[:, 2] < 1), 2]
    return f"{ones}*1 {' '.join(map(str, non_zero_values))} {zeros}*0"



def extract_bandinfo(current_folder):
    file_h =  os.path.join(current_folder,"h","EIGENVAL")
    file_e =  os.path.join(current_folder,"e","EIGENVAL")
    if not os.path.exists(file_h):
        raise FileNotFoundError(f"{file_h} not found")
    if not os.path.exists(file_e):
        raise FileNotFoundError(f"{file_e} not found")
    # Parse blocks of data
    Nbands, NK = get_band_and_k_points(file_e)
    blocks_h = list(parse_blocks(file_h, Nbands))
    blocks_e = list(parse_blocks(file_e, Nbands))

    # Loop through each K point
    result = []
    for k in range(min(len(blocks_h), len(blocks_e))):  # use the number of blocks instead of NK
        block_h = blocks_h[k]
        block_e = blocks_e[k]

        avg_data = calculate_avg_band_data(block_h, block_e)
        band_occupations = get_band_occupations(avg_data)
        result.append(band_occupations)
    return result

def read_eigenval(file):
    """
    read vasp eigenval file
    """

    with open(file,'r') as f:
        f.readline() # N/A
        f.readline() # N/A
        f.readline() # N/A
        f.readline() # Cartesian/Direct

        # System name w/ stripped newline character
        name = f.readline().rstrip()
        print("System name: " + name)

        first = f.readline() # Possibly interesting information
        (nelect,nkpts,nbands) = first.split()
        print("# of electrons:  " + nelect)
        print("# of k-points:   " + nkpts)
        print("# of bands:      " + nbands)

        kpts = []
        bands = []
        for i in range(int(nbands)):
            bands.append([])
        j = 0 # mark band number

        print("Finding k-point, energy pairs for each band...")

        for line in f:
            kpt = re.match(regexs['kpt'], line)
            enval = re.match(regexs['enval'], line)

            if kpt != None:
                (kx,ky,kz) = kpt.groups()
                k = np.sqrt(float(kx)**2 + float(ky)**2 + float(kz)**2)
                k = k
               
            if enval != None:
                e = float(enval.groups(0)[1])
                bands[j%int(nbands)].append([k,e])
                j += 1
       
        # print results
        for i in range(int(nbands)):
            print("Band " + str(i+1) + ":")
            for pair in bands[i]:
                print(pair)
            print("")
   


def modify_main_incar(current_folder,nband,nelect,bandinfo):
    incar_path = os.path.join(current_folder,"INCAR")
    if not os.path.exists(incar_path):
        raise FileNotFoundError(f"{incar_path} not found")
    lines = []
    with open(incar_path,'r') as f:
        lines = f.readlines()
    # 1. comment out old tags
    tags = ['ISMEAR','NBANDS','NELECT','FERWE']
    for i in range(len(lines)):
        for tag in tags:
            if tag in lines[i]:
                lines[i] = "# "+lines[i]
           
    # 2. add bandinfo
    lines.append("#---- Modifyied ----\n")
    lines.append("ISMEAR = -2\n")
    lines.append(f"NBANDS = {nband} \n")
    lines.append(f"NELECT = {nelect} \n")
    lines.append(f"FERWE = {' '.join(bandinfo)} \n")
    # 2. write modified INCAR lines
    with open(incar_path,'w') as f:
        f.writelines(lines)

   



def test_read_poscar():
    filepath = "/Users/zrzrv5/Downloads/CDFTpython/work/task0/POSCAR"
    print(read_poscar(filepath,{'Cd':12,'Te':6},200,36))
def test_read_eigenval():
    # filepath = "/Users/zrzrv5/Downloads/CDFTpython/work/task0/EIGENVAL"
    # read_eigenval(filepath)
    fileh = "/Users/zrzrv5/Downloads/CDFTpython/work/task0/e/EIGENVAL"  # replace with the correct path for your h file
    filee = "/Users/zrzrv5/Downloads/CDFTpython/work/task0/h/EIGENVAL"  # replace with the correct path for your e file

    # Parse blocks of data
    Nbands, NK = get_band_and_k_points(filee)
    blocks_h = list(parse_blocks(fileh, Nbands))
    blocks_e = list(parse_blocks(filee, Nbands))

    # Loop through each K point
    result = []
    for k in range(min(len(blocks_h), len(blocks_e))):  # use the number of blocks instead of NK
        block_h = blocks_h[k]
        block_e = blocks_e[k]

        avg_data = calculate_avg_band_data(block_h, block_e)

        band_occupations = get_band_occupations(avg_data)
        print(band_occupations)
        result.append(band_occupations)
    print(result)

def main():
    parser = argparse.ArgumentParser()

    parser.add_argument("folder",help="folder containing POSCAR, INCAR, POTCAR, KPOINTS")
    parser.add_argument('--name',nargs='+', type=str,help="electron count for each element eg. {'Cd':12,'Te':6}")
    parser.add_argument('--count',nargs='+', type=int,help="electron count for each element eg. {'Cd':12,'Te':6}")

    parser.add_argument('--nt', type=int,required=True,help="number of atoms per 1 electron eg. 200")
    parser.add_argument("--ncore", type=int,required=True,help="number of cores used in the calculation eg. 36")
    parser.add_argument("--exe", type=str,required=True,help="vasp executable eg. vasp_std")
    args = parser.parse_args()

    # 1. read poscar
    # make sure folder exists
    init_poscar = os.path.join(args.folder,"POSCAR")
    assert os.path.exists(init_poscar)

    e_count = {args.name[i]:args.count[i] for i in range(len(args.name))}

    nELECT_e,nELECT_h,n_band,total_e = read_poscar(init_poscar,e_count,args.nt,args.ncore)
    # 2. make subtasks
    make_subtasks(args.folder,nELECT_e,nELECT_h,n_band)
    # 3. run subtasks
    run_subtasks(args.folder,args.exe)
    # 4. average e and h
    result = extract_bandinfo(args.folder)
    # 5. modify INCAR, prepare for main calculation
    modify_main_incar(args.folder,n_band,total_e,result)



if __name__ == "__main__":
    main()

   
# test_read_poscar()
# test_read_eigenval()



   

尝试在dpgen machine.json里修改vasp命令实现完全自动化    
            "command": "python /work/qan/kluo/CDFTpython/CDFT.py ./ --name Cd Te --count 12 6 --nt 200 --ncore 64 --exe \"srun -n 64 /work/qan/codes/VASP/bin/vasp_std\" && srun -n 64 /work/qan/codes/VASP/bin/vasp_std",
####

   

评论

此博客中的热门博文

dpgen数据收集分类续算

dpgen 推荐安装升级方法:删除之前工作目录的dpgen文件夹 Install from source code:  git clone https://github.com/deepmodeling/dpgen && pip install ./dpgen 想收集TL的数据集且按化学成分分类:/work/qan/kluo/NaSPO/run/TransL dpgen collect ./ ./collect -p simplify.json -m 不起作用,只能用 param.json格式 所以单独新建文件夹,构造虚拟的 param.json其中初始数据集地址,直接复制最后一个train的input.json.再用 dpgen collect ./ ./collect -p param.json -m就可以收集初始的数据了,但是这时候没有分类,尽管dpgen collect -m 有这个功能但没有作用。 所以,构建虚拟的simplify任务,记住是精简过程,所以       "labeled" : true ,       "init_pick_number" : 0 ,       "iter_pick_number" : 0 , .....       "training_init_model" : false , 运行之后,第一步就得到了 所有的分类数据集。 Q: /work/qan/kluo/NaSPO/run 最近几轮的准确率上不去,猜测是数据集太大 232334 batch ,而每一代的学习步长 step 只有 50 万,以及不够了。所以出现了:一些模型的准确率一直上不去,就算修改上下限也没有改观。或者一些模型学好了,另一些模型就被遗忘了,很明显是学习步长不够了。 A: 解决办法如下:将现有数据收集分类,分拣出确实学习率低的重写单独重点学习。其次,以后用 dpgen 自动采样产生数据集的时候,时刻关注准确率的变化,一旦上不去了赶紧分家。 D: 1.        收集数据 a.       ...

dpgen simplify 数据精简二次处理

问题: 1.        Carbon 的势能文件无法准确描述石墨层间距 共有 204,200 bch 2.        NaSPO 的势能文件无法压缩 共有 242,423 bch 都可能是 data 太臃肿导致的,所以决定精简用 simplify 命令试试(感觉是重新采样的过程,只是节省了 lammps 的时间用 dp test 代替了,还是需要 fp 过程(也可以注释掉,如果需要新的 VASP 参数可以启用),更适合需要换 fp 参数的计算)。第一次取样是随机的, dp 采样过程有点慢 压缩的 pb 需要 6 小时。 还可以重新 fp 改善数据集 https://tutorials.deepmodeling.com/en/latest/CaseStudies/Transfer-learning/Transfer-learning.html https://docs.deepmodeling.com/projects/dpgen/en/latest/simplify/simplify-jdata.html https://zhuanlan.zhihu.com/p/456504860 http://bohrium-doc.dp.tech/docs/software/DP-GEN_simplify Simplify — DP-GEN documentation 先做,动起来在看 1.        收集所有数据 https://hikunluo.blogspot.com/2022/12/dpgen.html 2.        准备 simplify 的两个 json 文件。      nohup dpgen simplify simplify.json machine.json 1 > log 2 > err &      (base) [kluo@condo2017 SimCarbon]$ cat simplify.json { ...

python相关

 在IDLE中运行pip install pandas是不行的。你需要在命令提示符(Command Prompt)或终端(Terminal)中运行该命令。下面是详细步骤: 使用命令提示符安装 pandas 打开命令提示符: 按 Win + R 键打开运行窗口,输入 cmd,然后按回车键。 或者你可以在开始菜单中搜索“命令提示符”并打开它。 而且运行python脚本的一般步骤: 1. 打开IDLE shell这是个交互窗口 不能直接将脚本内容直接复制到这里,但是应该可以用python +脚本路径运行 2. File -new file-粘贴脚本内容 3.这个窗口上有run