Constrained Density Functional Theory (CDFT) 的实现策略

density functional theory - Calculating the band structure with different unit cell - Matter Modeling Stack Exchange

涉及能带所以应该用原胞才不会出错

Mg dpgen TransL 加电子计算，需要脚本，在原始脚本上修改：
https://chat.openai.com/share/6e370492-98eb-4aed-b5e1-4a4c25bd9f18

import argparse

import os

import numpy as np

def get_nband(e: float, ncore: int) -> int:

return int(ncore * np.ceil(e / ncore))

def read_poscar(file: str, e_count: dict, n_t: int) -> (float, int):

with open(file, 'r') as f:

lines = f.readlines()

line_name = lines[5].split()

line_count = [int(i) for i in lines[6].split()]

assert len(line_name) == len(line_count)

total_e = sum(e_count[atom] * count for atom, count in zip(line_name, line_count))

total_atom = sum(line_count)

nELECT_e = total_e + (total_atom / n_t)

return nELECT_e, total_e

def modify_main_incar(current_folder: str, nband: int, nelect: float):

incar_path = os.path.join(current_folder, "INCAR")

if not os.path.exists(incar_path):

raise FileNotFoundError(f"{incar_path} not found")

with open(incar_path, 'r') as file:

lines = file.readlines()

with open(incar_path, 'w') as file:

for line in lines:

if any(tag in line for tag in ['NBANDS', 'NELECT']):

continue

file.write(line)

file.write(f"NBANDS = {nband}\n")

file.write(f"NELECT = {nelect:.6f}\n")

if __name__ == "__main__":

parser = argparse.ArgumentParser()

parser.add_argument("folder", help="Folder containing POSCAR")

parser.add_argument("--name", nargs='+', type=str, help="Names of elements")

parser.add_argument("--count", nargs='+', type=int, help="Electron count for each element")

parser.add_argument("--nt", type=int, required=True, help="Number of atoms per 1 electron")

parser.add_argument("--ncore", type=int, required=True, help="Number of cores used in the calculation")

args = parser.parse_args()

e_count = dict(zip(args.name, args.count))

nELECT_e, total_e = read_poscar(os.path.join(args.folder, "POSCAR"), e_count, args.nt)

n_band = get_nband(total_e * 0.6, args.ncore)

modify_main_incar(args.folder, n_band, nELECT_e)

尝试在dpgen machine.json里修改vasp命令实现完全自动化    

            "command": "python vasp_calculate.py ./ --name Mg --count 2 --nt 50 --ncore 64 && srun -n 64 /work/qan/codes/VASP/bin/vasp_std1",

#### 

问题：CdTe在光照激发后的性质有什么变化呢？

参考文献：1. Wang, H., Morozov, S. I., Goddard, W. A. & An, Q. Light irradiation induced brittle-to-ductile and ductile-to-brittle transition in inorganic semiconductors. Phys. Rev. B 99, 161202 (2019).

解决问题的关键点是：构建激发态下的计算框架，这里需要用到CDFT即人为的提供一种约束环境。

下面做两个例子：最少原子24和最大原子214结构。需要保持e/atom相同而且尽量少的激发电子接近实际情况。所以以最大原子214激发一个电子为准。24原子激发24/214=0.11215个电子。

为了实现CdTe的transfer learning,建议只处理单K-point Gamma点,所以将小结构都扩到到适合单k的

约束环境的构建具体过程如下(这里以最大结构一个电子的激发为准)：

1. 得电子(1e)和失去电子(1h)结构的电子能带分布情况EIGENVAL：两个文件夹1e, 1h

a. INCAR里需要调整的参数

b. ISIF=2 #单点能计算3#几何优化

c. NSW=2

d. NBANDS = 144 #假设的能带数,被36core整除且<电子数的

e. NELECT = 216.11215 #在OUTCAR里可以看总电子数+1e

f. 得到EIGENVAL，复制重命名到Ocche文件夹里：EIGENVALe EIGENVALh

2. 依据上面两种极端情况采用物理公式计算出电子激发后(1e1h)的电子能带分布情况(在K点上的分布): Ocche

a. 需要修改Ocche.sh里面的两处VBM=110一般的电子数或者真实的价带顶那个带数

b. $ ./Ocche.sh即可，即可得到OCCHE.out #(1e1h)的电子在K点上的分布

3. 在INCAR里添加上这个电子状态计算即可，建议先SCF，再OPT:两个文件夹

a. 最后一行添加FERWE = 213*1*********

b. ISMEAR=-2

c. NELECT = 216 #回复平衡了

4. 算完能量极大，不正常。发现是VBM选错了，改成108后还是报错。我的这个缺陷结构基态的时候就很复杂，108能带之上的109，110两个能带上都有电子分布，导致加1e和1h时都是这两个带上变化，没有明显的能带之间的跳跃变化。所以无法用之前的脚本直接剪辑嫁接就行。

5. 最后在GPT4的辅助下修改了原始脚本，改成直接能带数据平均化处理了。

/work/qan/kluo/CDFT/3C112/test/AveFERWE.sh

#!/bin/bash

# 函数：计算平均能带数据
calculate_avg_band_data() {
  input_file1="$1"
  input_file2="$2"
  output_file="$3"

  paste "$input_file1" "$input_file2" | awk '{
    printf "%.0f %.6f %.6f\n", ($1 + $4) / 2, ($2 + $5) / 2, ($3 + $6) / 2
  }' > "$output_file"
}

# 函数：获取能带占据情况
get_band_occupations() {
  input_file="$1"

  ones=$(awk '$3 == 1 { count++ } END { print count }' "$input_file")
  zeros=$(awk '$3 == 0 { count++ } END { print count }' "$input_file")
  non_zero_values=$(awk '0 < $3 && $3 < 1 { printf "%.6f ", $3 }' "$input_file")

  echo "${ones}*1 ${non_zero_values} ${zeros}*0"
}

# 设置文件名
fileh="EIGENVALh"
filee="EIGENVALe"

# 获取能带数和K点数
Nbands=$(awk 'NR==6{print $3}' "$fileh")
NK=$(awk 'NR==6{print $2}' "$fileh")
line=$(($Nbands + 2))

# 循环处理每个K点
result_file="OCCHE.out"
echo -n "" > "$result_file"
for ((k = 0; k < $NK; k++)); do
  cutstart=$(echo "scale=4;$k*$line+7" | bc)
  cutend=$(echo "scale=4;$cutstart+$line-1" | bc)

  occ_fileh="occ$(($k+1))h"
  occ_filee="occ$(($k+1))e"
  sed -n ''$cutstart','$cutend'p' "$fileh" | sed -n '3,$p' > "$occ_fileh"
  sed -n ''$cutstart','$cutend'p' "$filee" | sed -n '3,$p' > "$occ_filee"

  avg_occ_file="occ$(($k+1))eh"
  calculate_avg_band_data "$occ_fileh" "$occ_filee" "$avg_occ_file"

  band_occupations=$(get_band_occupations "$avg_occ_file")
  echo "$band_occupations"
  echo "$band_occupations" >> "$result_file"

  # 删除临时文件
  rm "$occ_fileh" "$occ_filee" "$avg_occ_file"
done

# 将结果文件的所有行合并为一行，并添加标题
sed -i ':a;N;$!ba;s/\n/ /g' "$result_file" 
sed -i '1iFERWE =' "$result_file"

打算将上面的步骤集成脚本化，所以需要规范每一步：

/work/qan/kluo/CDFTpython/work/task0 24atoms(读取POSCAR)

/work/qan/kluo/CDFTpython/work/task1 56atoms(读取POSCAR)

(base) [kluo@nova task0]$ ls

INCAR KPOINTS POSCAR POTCAR slurm.nova

1. 确定激发的电子和空穴数值：N/Maxatoms,这里就是1/56=0.017857143

2. 每个文件夹了创建两个子文件夹e, h

a. (base) [kluo@nova task0]$ mkdir h

b. (base) [kluo@nova task0]$ mkdir e

3. Copy 当前文件到子文件夹里

a. cp * h/

b. cp * e/

4. 修改INCAR：添加两个参数

a. 读取POSCAR 24atoms 所以确定添加电子数 24*1/56=0.428571429

b. 确定总的电子数（Telect）：POSCAR里的原子*POTCAR电子数=12*12+12*6=216 所以NELECT=216.428571429

c. 固定需要计算的能带数NBANDS：Telect*1.2/2={Telect*0.6}取整除节点核数36=144

d. 在h文件夹里修改INCAR: NELECT= 215.5714286 NBANDS=144

5. 提交这两个子文件的任务

a. 得到EIGENVAL 分别cp EIGENVAL EIGENVALh cp EIGENVAL EIGENVALe

6. 利用脚本平均化

a. (base) [kluo@nova task0]$ bash AveFERWE.sh

b. 得到 /work/qan/kluo/CDFTpython/work/task0/OCCHE.out 复制里面的内容到主INCAR里 FERWE = *****

c. 修改主INCAR: NBANDS = 144 NELECT = 216 ISMEAR=-2

在Ruizhou的帮助下编写了自动化脚本处理这个过程

import argparse
import os
import numpy as np
import re

regexs = {
'kpt': "\s+(.\d+\.\d+E[+-]\d+)\s+(.\d+\.\d+E[+-]\d+)\s+(.\d+\.\d+E[+-]\d+)\s+.*",
'enval': "\s+(\d+)\s+(-?.\d+\.\d+)" }


def get_nband(e:float,ncore:int):
    # return the least number of multiple of ncore that is larger than e
    return int(ncore * np.ceil(e/ncore))

# 1. read poscar 

# 2. make task for e and h

# 3. run subtasks
def read_poscar(file,e_count:dict,n_t:int,ncore:int):
    """
    e_count : electron count for each element eg. {'Cd':12,'Te':6}
    n_t : number of atoms per 1 electron eg. 200 
    ncore : number of cores used in the calculation eg. 36
    """
    with open(file, 'r') as f:
        lines = f.readlines()

    line_name = lines[5].split()
    line_count = lines[6].split()
    line_count = [int(i) for i in line_count]
    # ensure the name of element is in the same order as the count
    assert len(line_name) == len(line_count)

    total_e = 0

    atom_count = {ele_name:line_count[i] for i,ele_name in enumerate(line_name)}
    for atom_name in atom_count.keys():
        if atom_name not in e_count:
            raise ValueError(f"Element {atom_name} is not in e_count")
        total_e += e_count[atom_name] * atom_count[atom_name]
    
    total_atom = sum(atom_count.values())

    nELECT_e = total_e +  ( total_atom / n_t )
    nELECT_h = total_e -  ( total_atom / n_t )

    n_band = get_nband(total_e*0.6,ncore)

    return nELECT_e,nELECT_h,n_band,total_e


def make_subtasks(current_folder,nELECT_e,nELECT_h,n_band):
    # copy POTCAR POSCAR INCAR KPOINTS to subfolders
    subtask_e = os.path.join(current_folder,"e")
    subtask_h = os.path.join(current_folder,"h")
    os.makedirs(subtask_e,exist_ok=True)
    os.makedirs(subtask_h,exist_ok=True)
    for vasp_file in ['POTCAR','INCAR','KPOINTS','POSCAR']:
        os.system(f"cp {os.path.join(current_folder,vasp_file)} {subtask_e}")
        os.system(f"cp {os.path.join(current_folder,vasp_file)} {subtask_h}")
    # modify INCAR
    with open(os.path.join(subtask_e,"INCAR"),'a') as f:
        # add NELECT=xxx and NBANDS =xxx to INCAR
        # formt nelect to 6 digits
        f.write(f"\n# --- append ---\nNELECT = {nELECT_e:.6f}\nNBANDS = {n_band}\n")
    with open(os.path.join(subtask_h,"INCAR"),'a') as f:
        f.write(f"\n# --- append ---\nNELECT = {nELECT_h:.6f}\nNBANDS = {n_band}\n")

def run_subtasks(current_folder,vasp_executable):
    subtask_e = os.path.join(current_folder,"e")
    subtask_h = os.path.join(current_folder,"h")
    # make sure the subtasks exist
    assert os.path.exists(subtask_e)
    assert os.path.exists(subtask_h)

    os.system(f"cd {subtask_e}; {vasp_executable} > sub.e.out")
    os.system(f"cd {subtask_h}; {vasp_executable} > sub.h.out")



# Helper function to parse blocks of data from the file
def parse_blocks(filepath, Nbands):
    with open(filepath, 'r') as file:
        block = []
        line_counter = 0
        for line in file:
            if line_counter < 6:  # Skip the first 6 lines
                line_counter += 1
                continue
            if line.strip():  # if line is not empty
                block.append(line)
                if len(block) == Nbands + 2:  # A block ends when it has Nbands + 2 lines
                    yield block
                    block = []
            elif block:
                yield block
                block = []
        if block:
            yield block  # yield the last block even if it's not followed by an empty line

# Function to get band and K points
def get_band_and_k_points(filepath):
    with open(filepath, 'r') as file:
        lines = file.readlines()
    NK, _, Nbands = map(int, lines[5].split())
    return Nbands, NK

# Function to calculate average band data
def calculate_avg_band_data(block1, block2):
    data1 = np.array([[float(value) for value in line.split()] for line in block1[1:]])
    data2 = np.array([[float(value) for value in line.split()] for line in block2[1:]])
    return (data1 + data2) / 2

# Function to get band occupations
def get_band_occupations(data):
    ones = np.count_nonzero(data[:, 2] == 1)
    zeros = np.count_nonzero(data[:, 2] == 0)
    non_zero_values = data[(0 < data[:, 2]) & (data[:, 2] < 1), 2]
    return f"{ones}*1 {' '.join(map(str, non_zero_values))} {zeros}*0"



def extract_bandinfo(current_folder):
    file_h =  os.path.join(current_folder,"h","EIGENVAL")
    file_e =  os.path.join(current_folder,"e","EIGENVAL")
    if not os.path.exists(file_h):
        raise FileNotFoundError(f"{file_h} not found")
    if not os.path.exists(file_e):
        raise FileNotFoundError(f"{file_e} not found")
    # Parse blocks of data
    Nbands, NK = get_band_and_k_points(file_e)
    blocks_h = list(parse_blocks(file_h, Nbands))
    blocks_e = list(parse_blocks(file_e, Nbands))

    # Loop through each K point
    result = []
    for k in range(min(len(blocks_h), len(blocks_e))):  # use the number of blocks instead of NK
        block_h = blocks_h[k]
        block_e = blocks_e[k]

        avg_data = calculate_avg_band_data(block_h, block_e)
        band_occupations = get_band_occupations(avg_data)
        result.append(band_occupations)
    return result

def read_eigenval(file):
    """
    read vasp eigenval file 
    """

    with open(file,'r') as f:
        f.readline() # N/A
        f.readline() # N/A
        f.readline() # N/A
        f.readline() # Cartesian/Direct

        # System name w/ stripped newline character
        name = f.readline().rstrip() 
        print("System name: " + name)

        first = f.readline() # Possibly interesting information
        (nelect,nkpts,nbands) = first.split()
        print("# of electrons:  " + nelect)
        print("# of k-points:   " + nkpts)
        print("# of bands:      " + nbands)

        kpts = []
        bands = []
        for i in range(int(nbands)):
            bands.append([]) 
        j = 0 # mark band number

        print("Finding k-point, energy pairs for each band...")

        for line in f:
            kpt = re.match(regexs['kpt'], line)
            enval = re.match(regexs['enval'], line)

            if kpt != None:
                (kx,ky,kz) = kpt.groups()
                k = np.sqrt(float(kx)**2 + float(ky)**2 + float(kz)**2)
                k = k
                
            if enval != None:
                e = float(enval.groups(0)[1])
                bands[j%int(nbands)].append([k,e])
                j += 1
        
        # print results
        for i in range(int(nbands)):
            print("Band " + str(i+1) + ":")
            for pair in bands[i]:
                print(pair)
            print("")
    


def modify_main_incar(current_folder,nband,nelect,bandinfo):
    incar_path = os.path.join(current_folder,"INCAR")
    if not os.path.exists(incar_path):
        raise FileNotFoundError(f"{incar_path} not found")
    lines = []
    with open(incar_path,'r') as f:
        lines = f.readlines()
    # 1. comment out old tags
    tags = ['ISMEAR','NBANDS','NELECT','FERWE']
    for i in range(len(lines)):
        for tag in tags:
            if tag in lines[i]:
                lines[i] = "# "+lines[i]
            
    # 2. add bandinfo
    lines.append("#---- Modifyied ----\n")
    lines.append("ISMEAR = -2\n")
    lines.append(f"NBANDS = {nband} \n")
    lines.append(f"NELECT = {nelect} \n")
    lines.append(f"FERWE = {' '.join(bandinfo)} \n")
    # 2. write modified INCAR lines
    with open(incar_path,'w') as f:
        f.writelines(lines)

    



def test_read_poscar():
    filepath = "/Users/zrzrv5/Downloads/CDFTpython/work/task0/POSCAR"
    print(read_poscar(filepath,{'Cd':12,'Te':6},200,36))
def test_read_eigenval():
    # filepath = "/Users/zrzrv5/Downloads/CDFTpython/work/task0/EIGENVAL"
    # read_eigenval(filepath)
    fileh = "/Users/zrzrv5/Downloads/CDFTpython/work/task0/e/EIGENVAL"  # replace with the correct path for your h file
    filee = "/Users/zrzrv5/Downloads/CDFTpython/work/task0/h/EIGENVAL"  # replace with the correct path for your e file

    # Parse blocks of data
    Nbands, NK = get_band_and_k_points(filee)
    blocks_h = list(parse_blocks(fileh, Nbands))
    blocks_e = list(parse_blocks(filee, Nbands))

    # Loop through each K point
    result = []
    for k in range(min(len(blocks_h), len(blocks_e))):  # use the number of blocks instead of NK
        block_h = blocks_h[k]
        block_e = blocks_e[k]

        avg_data = calculate_avg_band_data(block_h, block_e)

        band_occupations = get_band_occupations(avg_data)
        print(band_occupations)
        result.append(band_occupations)
    print(result)

def main():
    parser = argparse.ArgumentParser()

    parser.add_argument("folder",help="folder containing POSCAR, INCAR, POTCAR, KPOINTS")
    parser.add_argument('--name',nargs='+', type=str,help="electron count for each element eg. {'Cd':12,'Te':6}")
    parser.add_argument('--count',nargs='+', type=int,help="electron count for each element eg. {'Cd':12,'Te':6}")

    parser.add_argument('--nt', type=int,required=True,help="number of atoms per 1 electron eg. 200")
    parser.add_argument("--ncore", type=int,required=True,help="number of cores used in the calculation eg. 36")
    parser.add_argument("--exe", type=str,required=True,help="vasp executable eg. vasp_std")
    args = parser.parse_args()

    # 1. read poscar
    # make sure folder exists
    init_poscar = os.path.join(args.folder,"POSCAR")
    assert os.path.exists(init_poscar)

    e_count = {args.name[i]:args.count[i] for i in range(len(args.name))}

    nELECT_e,nELECT_h,n_band,total_e = read_poscar(init_poscar,e_count,args.nt,args.ncore)
    # 2. make subtasks
    make_subtasks(args.folder,nELECT_e,nELECT_h,n_band)
    # 3. run subtasks
    run_subtasks(args.folder,args.exe)
    # 4. average e and h
    result = extract_bandinfo(args.folder)
    # 5. modify INCAR, prepare for main calculation
    modify_main_incar(args.folder,n_band,total_e,result)



if __name__ == "__main__":
    main()

    
# test_read_poscar()
# test_read_eigenval()



    

尝试在dpgen machine.json里修改vasp命令实现完全自动化    
            "command": "python /work/qan/kluo/CDFTpython/CDFT.py ./ --name Cd Te --count 12 6 --nt 200 --ncore 64 --exe \"srun -n 64 /work/qan/codes/VASP/bin/vasp_std\" && srun -n 64 /work/qan/codes/VASP/bin/vasp_std",
#### 

   

hikun的地下仓库

搜索此博客

Constrained Density Functional Theory (CDFT) 的实现策略

标签

评论

发表评论

此博客中的热门博文

dpgen数据收集分类续算

dpgen simplify 数据精简二次处理

python相关