PIC scripts

utilities.PIC_scripts.PIC_check_simulations

PIC simulation checker.

With this script, we count how many of the simulations run with HTCondor have failed.

We save the directory name of each failed simulation in a CSV file called failed_folders.csv.

To display the help message:

python PIC_check_simulations.py --help

This lists all the relevant arguments that can be used.

Authors:

Celsa Pardo Araujo (pardo@csic.es)

check_simulations(args)

Check how many simulations run with HTCondor have failed, and save the output folder name of each failed simulation to the failed_folders.csv file.

Parameters:

  • args (argparse.Namespace), required: An argparse.Namespace object containing the following attributes:
      • output_dir_simulation (pathlib.Path): Output directory where the simulation outputs are.
      • dir_failed_folder_csv (str): Path to the directory where the CSV file containing the list of failed folders is saved.
Source code in utilities/PIC_scripts/PIC_check_simulations.py
def check_simulations(args: argparse.Namespace) -> None:
    """
    Checking how many simulations run using HTCondor have failed.
    We save the name of the output folder for each of the failed simulations in the failed_folders.csv file.

    Args:
        args (argparse.Namespace): An argparse.Namespace object containing the following attributes:

            - output_dir_simulation (pathlib.Path): Output directory where the simulation outputs are.
            - dir_failed_folder_csv (str): Path to the directory where the csv file containing the list of failed
                folders is saved.
    """

    output_simulations_path = args.output_dir_simulation
    list_directories = os.listdir(output_simulations_path)

    failed_folder_csv_path = args.dir_failed_folder_csv

    count_error = 0
    fail_simulation = []

    for folder in list_directories:
        simulations_directory = pathlib.Path().joinpath(
            output_simulations_path, folder
        )
        if os.path.isdir(simulations_directory):
            files_in_directory = os.listdir(simulations_directory)

            # If the simulations have finished successfully, then 8 files will be located within each output folder.
            # Hence, by checking the file number, we can identify those simulations that have failed.
            if len(files_in_directory) < 8:
                count_error += 1
                fail_simulation.append(os.path.basename(simulations_directory))
    df = pd.DataFrame(data={"folder": fail_simulation})
    df.to_csv(
        pathlib.Path().joinpath(failed_folder_csv_path, "failed_folders.csv")
    )
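The failure test above is purely a file count: a completed run leaves 8 files in its output folder, so any folder with fewer is flagged. A minimal, self-contained sketch of the same rule (the 8-file threshold is the convention inherited from the script; folder names below are illustrative):

```python
import pathlib
import tempfile

EXPECTED_FILES = 8  # a successfully completed simulation writes 8 output files


def find_failed(output_root: pathlib.Path) -> list[str]:
    """Return the names of simulation folders holding fewer than EXPECTED_FILES files."""
    failed = []
    for entry in sorted(output_root.iterdir()):
        if entry.is_dir() and len(list(entry.iterdir())) < EXPECTED_FILES:
            failed.append(entry.name)
    return failed


# Demo on a throwaway tree: one complete run, one truncated run.
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    ok, bad = root / "000000", root / "000001"
    ok.mkdir()
    bad.mkdir()
    for i in range(EXPECTED_FILES):
        (ok / f"file{i}.dat").touch()
    (bad / "file0.dat").touch()
    print(find_failed(root))  # ['000001']
```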

utilities.PIC_scripts.PIC_generate_htcondor_failed

Generating HTCondor files for failed simulations at the PIC.

With this script, we create the structure needed to rerun the simulations that failed when running the whole set of simulations with HTCondor. Note that in this case, we run the simulations one by one, i.e., one simulation per job, to prevent MaxWallTime problems.

Note that before running this script, we first need to run the PIC_check_simulations.py script to generate the failed_folders.csv file.

To display the help message:

python PIC_generate_htcondor_failed.py --help

This lists all the relevant arguments that can be used.

Authors:

Celsa Pardo Araujo (pardo@csic.es)

generate_htcondor_failed(args)

Create the submit file and wrapper file necessary to relaunch the failed simulations.

Parameters:

  • args (argparse.Namespace), required: An argparse.Namespace object containing the following attributes:
      • output_dir_simulation (pathlib.Path): Output directory where the simulation outputs are.
      • number_sim_job (int): Number of simulations to run per job.
      • dyn_data (pathlib.Path): Path to the file where the dynamically evolved population database is stored.
      • type_simulation (str): Type of simulation to run at the PIC with HTCondor. Choose between dyn and magrot.
Source code in utilities/PIC_scripts/PIC_generate_htcondor_failed.py
def generate_htcondor_failed(args: argparse.Namespace) -> None:
    """
    Create the submit file and wrapper file necessary to relaunch the failed simulations.

    Args:
        args (argparse.Namespace): An argparse.Namespace object containing the following attributes:

            - output_dir_simulation (pathlib.Path): Output directory where the simulation outputs are.
            - number_sim_job (int): Number of simulations to run per job.
            - dyn_data (pathlib.Path): Path to the file where the dynamically evolved population database is stored.
            - type_simulation (str): Type of simulation that we want to run in the PIC with HTCondor.
                Choose between dyn or magrot.
    """
    output_simulations_path = pathlib.Path(args.output_dir_simulation)

    # If we want to copy and check the error.txt and out.txt of the failed simulations from HTCondor
    # in order to assess the reason for failure, the following variable should be set to True.
    check_simulations_failed = False

    # Reading the `failed_folders.csv` file, where the simulations that have failed are saved using the
    # `PIC_check_simulations.py` script.

    data_failed_folder = pd.read_csv(
        pathlib.Path().joinpath(
            args.dir_failed_folder_csv, "failed_folders.csv"
        )
    )
    list_failed_folder = data_failed_folder["folder"]

    directory_path = output_simulations_path.parents[0]
    failed_simulation_path = pathlib.Path().joinpath(
        directory_path, "failed_simulations"
    )
    output_failed_simulations = pathlib.Path().joinpath(
        failed_simulation_path, "output_simulations"
    )
    htcondor_failed_submit_path = pathlib.Path().joinpath(
        failed_simulation_path, "htcondor_submit"
    )
    htcondor_failed_output_path = pathlib.Path().joinpath(
        failed_simulation_path, "htcondor_output"
    )

    failed_simulation_path.mkdir(exist_ok=True)
    output_failed_simulations.mkdir(exist_ok=True)
    htcondor_failed_submit_path.mkdir(exist_ok=True)
    htcondor_failed_output_path.mkdir(exist_ok=True)

    for i, simulation in enumerate(list_failed_folder):
        simulation = f"{simulation:06}"
        # Copy the folder with the failed simulations output to the failed_simulation folder.
        simulation_folder = pathlib.Path().joinpath(
            output_simulations_path, simulation
        )

        shutil.copytree(
            simulation_folder,
            pathlib.Path().joinpath(output_failed_simulations, simulation),
            dirs_exist_ok=True,
        )

        # Copying the err.txt and out.txt files of the failed simulations from the htcondor_output folder
        # if this is needed to check why the simulations have failed.
        # Note that each of the out.txt and err.txt files contains the stdout and stderr from more than one simulation.
        # Specifically, there will be as many as the args.number_sim_job.

        if check_simulations_failed:
            out_txt_number = int(int(simulation) / args.number_sim_job) + 1

            htcondor_output_path = pathlib.Path().joinpath(
                directory_path, "HTCondor_output"
            )

            out_txt_path = pathlib.Path().joinpath(
                htcondor_output_path, str(out_txt_number) + "-out.txt"
            )
            err_txt_path = pathlib.Path().joinpath(
                htcondor_output_path, str(out_txt_number) + "-error.txt"
            )
            out_failed_txt_path = pathlib.Path().joinpath(
                htcondor_failed_output_path, str(out_txt_number) + "-out.txt"
            )
            err_failed_txt_path = pathlib.Path().joinpath(
                htcondor_failed_output_path, str(out_txt_number) + "-error.txt"
            )

            shutil.copy(out_txt_path, out_failed_txt_path)
            shutil.copy(err_txt_path, err_failed_txt_path)

    # Creating the new submit and argument.txt files for relaunching the failed simulations.
    failed_arguments_path = pathlib.Path().joinpath(
        htcondor_failed_submit_path, "arguments_.txt"
    )
    failed_submit_path = pathlib.Path().joinpath(
        htcondor_failed_submit_path, "job.submit"
    )
    failed_wrapper_path = pathlib.Path().joinpath(
        htcondor_failed_submit_path, "wrapper.sh"
    )

    list_arguments = [
        str(output_failed_simulations)
        + "/"
        + f"{path:06}"
        + " "
        + str(output_failed_simulations)
        + "/"
        + f"{path:06}"
        + "/override.json"
        for path in list_failed_folder
    ]
    np.savetxt(failed_arguments_path, list_arguments, fmt="%s")

    # Writing the HTCondor submit files, which specify the relevant arguments, output/error paths and queue structure.
    with open(failed_submit_path, "w") as f:
        f.write("universe        = vanilla \n")
        f.write("executable      = " + str(failed_wrapper_path) + "\n")
        f.write("arguments       = $(arg1) $(arg2) \n")
        f.write("output          = $(arg1)/out.txt \n")
        f.write("error           = $(arg1)/error.txt \n")
        f.write("log             = $(arg1)/log.txt \n")
        f.write("Queue arg1 arg2 from " + str(failed_arguments_path) + "\n")
        f.close()

    # We will create the wrapper file according to the args.type_simulation argument. For example,
    # if it is equal to 'dyn', we generate a wrapper file that will execute the dynamical simulations.
    if args.type_simulation == "dyn":
        exec_command = "python /data/magnesia/software/ML-Poppyns/mlpoppyns/simulator/simulate_population_dyn.py --save_dir $1 --parameter_override $2  \n"

    elif args.type_simulation == "magrot":
        exec_command = (
            "python /data/magnesia/software/ML-Poppyns/mlpoppyns/simulator/simulate_population_magrot_det.py --dyn_data "
            + str(args.dyn_data)
            + " --save_dir $1 --parameter_override $2 \n"
        )
    else:
        raise ValueError(
            "The specified simulation type is not feasible. Choose between dyn or magrot."
        )

    with open(failed_wrapper_path, "w") as f:
        f.write("#!/bin/bash \n")
        f.write("\n")
        f.write(
            "export PATH=/data/astro/software/centos7/conda/mambaforge_4.14.0/bin:$PATH\n"
        )
        f.write("conda init bash\n")

        f.write(
            "source /data/astro/software/centos7/conda/mambaforge_4.14.0/etc/profile.d/conda.sh\n"
        )
        f.write(
            "conda activate /data/magnesia/scratch_ssd/conda/envs/pop_syn\n"
        )
        f.write(
            "# We copy the mlpoppyns module in the working node to avoid problems with the path while running the simulations in the server.\n"
        )
        f.write("cp -R /data/magnesia/software/ML-Poppyns/mlpoppyns .\n")
        f.write(exec_command)
        f.write("conda deactivate")
        f.close()
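Two small conventions in the code above are worth spelling out: folder names are the simulation index zero-padded to six digits (`f"{simulation:06}"`), and a given simulation's stdout/stderr live in the HTCondor output file numbered `int(index / number_sim_job) + 1`, since each `<n>-out.txt` bundles `number_sim_job` consecutive simulations. A sketch of both mappings:

```python
def simulation_folder_name(index: int) -> str:
    # Output folder names are the simulation index zero-padded to six digits.
    return f"{index:06}"


def htcondor_output_number(index: int, number_sim_job: int) -> int:
    # Each <n>-out.txt / <n>-error.txt bundles number_sim_job consecutive
    # simulations; file numbering starts at 1.
    return index // number_sim_job + 1


print(simulation_folder_name(42))      # 000042
print(htcondor_output_number(42, 10))  # 5  (simulations 40-49 share file 5)
```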

utilities.PIC_scripts.PIC_generate_htcondor_submit

Generation of the HTCondor utilities.

We generate all the utilities needed to run the whole set of simulations in chunks using HTCondor. Each job runs one chunk of simulations with a different set of initial parameters.

Note that this script needs the output from the parameter_sweeper.py script.

The submit files for each job, the arguments.txt and a wrapper are saved in the path specified with the command line argument --output_dir_htcondor.

To display the help message:

python PIC_generate_htcondor_submit.py --help

This lists all the relevant arguments that can be used.

Authors:

Celsa Pardo Araujo (pardo@csic.es)

generate_job_submit(path_output, path_arguments)

Create all the submit files.

Parameters:

  • path_output (pathlib.Path), required: Output directory for the submit file.
  • path_arguments (pathlib.Path), required: Path for the simulation arguments.
Source code in utilities/PIC_scripts/PIC_generate_htcondor_submit.py
def generate_job_submit(
    path_output: pathlib.Path, path_arguments: pathlib.Path
) -> None:
    """
    Create all the submit files.

    Args:
        path_output (pathlib.Path): Output directory for the submit file.
        path_arguments (pathlib.Path): Path for the simulation arguments.
    """
    # Path where each submit file will be saved.
    path_submit = pathlib.Path().joinpath(path_output, "job.submit")

    # Writing the HTCondor submit file, specifying the relevant arguments, output/error paths and queue structure.
    with open(path_submit, "w") as f:
        f.write("universe        = vanilla \n")
        f.write("executable      = " + str(path_output) + "/wrapper.sh \n")
        f.write(
            "output          = " + str(path_output) + "/$(ProcId)-out.txt \n"
        )
        f.write(
            "error           = " + str(path_output) + "/$(ProcId)-error.txt \n"
        )
        f.write("Queue arguments from " + str(path_arguments) + "\n")
        f.close()
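Filled in, the job.submit this function writes comes out as follows (the week-folder path shown is hypothetical):

```
universe        = vanilla
executable      = /scratch/htcondor/week-0/wrapper.sh
output          = /scratch/htcondor/week-0/$(ProcId)-out.txt
error           = /scratch/htcondor/week-0/$(ProcId)-error.txt
Queue arguments from /scratch/htcondor/week-0/arguments_week.txt
```

Each line of arguments_week.txt becomes one queued job, with the line's content passed to wrapper.sh as its argument.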

generate_wrapper(type_simulation, dyn_path, path_wrapper)

Create all the wrapper files.

Parameters:

  • type_simulation (str), required: String with the type of simulation we want to run.
  • path_wrapper (pathlib.Path), required: Output directory for the wrapper file.
  • dyn_path (pathlib.Path), required: Path to where the dynamically evolved population database is stored.
Source code in utilities/PIC_scripts/PIC_generate_htcondor_submit.py
def generate_wrapper(
    type_simulation: str,
    dyn_path: pathlib.Path,
    path_wrapper: pathlib.Path,
) -> None:
    """
    Create all the wrapper files.

    Args:
        type_simulation (str): String with the type of simulation we want to run.
        path_wrapper (pathlib.Path): Output directory for the wrapper file.
        dyn_path (pathlib.Path): Path to where the dynamically evolved population database is stored.
    """

    # Writing the `wrapper.sh` file where we loop over the lines of the `"/arguments_job" + str(j + 1) + ".txt"` file.
    # We will create the wrapper file according to the args.type_simulation. For example,
    # if it is equal to 'dyn', we generate a wrapper file that will execute the dynamical simulations.

    if type_simulation == "dyn":
        exec_command = "python /data/magnesia/software/ML-Poppyns/mlpoppyns/simulator/simulate_population_dyn.py --save_dir ${a[0]} --parameter_override ${a[1]}  \n"

    elif type_simulation == "magrot":
        exec_command = (
            "python /data/magnesia/software/ML-Poppyns/mlpoppyns/simulator/simulate_population_magrot_det.py --dyn_data "
            + str(dyn_path)
            + " --save_dir ${a[0]} --parameter_override ${a[1]} \n"
        )
    else:
        raise ValueError(
            "The specified simulation type is not feasible, choose between dyn or magrot."
        )

    with open(path_wrapper, "w") as f:
        f.write("#!/bin/bash \n")
        f.write("\n")
        f.write(
            "export PATH=/data/astro/software/centos7/conda/mambaforge_4.14.0/bin:$PATH\n"
        )
        f.write("conda init bash\n")

        f.write(
            "source /data/astro/software/centos7/conda/mambaforge_4.14.0/etc/profile.d/conda.sh\n"
        )
        f.write(
            "conda activate /data/magnesia/scratch_ssd/conda/envs/pop_syn\n"
        )
        f.write(
            "# We copy the mlpoppyns module in the working node to avoid problems with the path while running the simulations in the server.\n"
        )
        f.write("cp -R /data/magnesia/software/ML-Poppyns/mlpoppyns .\n")
        f.write("filename=$1 \n")
        f.write("while read line; do \n")
        f.write("#Reading each line. \n")
        f.write("myarr[$index]=$line \n")
        f.write("#Extract each element from the lines. \n")
        f.write("a=(${myarr[$index]})\n")
        f.write(exec_command)
        f.write("done < $filename\n")
        f.write("conda deactivate")
        f.close()
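The loop the wrapper embeds is the standard bash while-read pattern: each line of the per-job argument file is word-split into an array, so that ${a[0]} feeds --save_dir and ${a[1]} feeds --parameter_override. A slightly simplified standalone demonstration, with echo standing in for the simulator call:

```shell
#!/bin/bash
# Demo of the wrapper's read loop: each line of the argument file is
# word-split into an array a, one simulator invocation per line.
printf '%s\n' \
    "/out/000001 /out/000001/override.json" \
    "/out/000002 /out/000002/override.json" > args.txt

while read -r line; do
    a=($line)   # split the line on whitespace
    echo "save_dir=${a[0]} override=${a[1]}"
done < args.txt
rm args.txt
```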

submit_generator(args)

Generate HTCondor submit files for running simulations in batches.

This function takes command-line arguments and generates the necessary files for submitting simulations to an HTCondor cluster. It creates a directory structure with one folder per week, where each week's folder contains argument files for individual jobs and a submit file.

Parameters:

  • args (argparse.Namespace), required: An argparse.Namespace object containing the following attributes:
      • output_dir_htcondor (str): Path to the directory where the generated files will be saved.
      • output_dir_simulation (str): Path to the directory containing the simulation parameter files.
      • n_sim_job (int): Number of simulations to run per job.
      • n_sim_week (int): Number of simulations to run per week.
      • dyn_data (str): Path to the dynamically evolved population database file (if using the simulate_population_magrot_det simulator).
      • type_simulation (str): Type of simulation to run, either 'dyn' or 'magrot'.
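A sketch of the resulting layout under --output_dir_htcondor, with two jobs per week shown for illustration (the number of arguments_job<j>.txt files depends on n_sim_week and n_sim_job):

```
output_dir_htcondor/
├── week-0/
│   ├── wrapper.sh
│   ├── job.submit
│   ├── arguments_week.txt    # one line per arguments_job<j>.txt path
│   ├── arguments_job1.txt    # up to n_sim_job argument lines each
│   └── arguments_job2.txt
└── week-1/
    └── ...
```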
Source code in utilities/PIC_scripts/PIC_generate_htcondor_submit.py
def submit_generator(args: argparse.Namespace) -> None:
    """
    Generate HTCondor submit files for running simulations in batches.

    This function takes command-line arguments and generates the necessary files
    for submitting simulations to an HTCondor cluster. It creates a directory
    structure with one folder per week, where each week's folder contains
    argument files for individual jobs and a submit file.

    Args:
        args (argparse.Namespace): An argparse.Namespace object containing the following attributes:

            - output_dir_htcondor (str): Path to the directory where the
                generated files will be saved.
            - output_dir_simulation (str): Path to the directory containing the
                 simulation parameter files.
            - n_sim_job (int): Number of simulations to run per job.
            - n_sim_week (int): Number of simulations to run per week.
            - dyn_data (str): Path to the dynamically evolved population database
                 file (if using the `simulate_population_magrot_det` simulator).
            - type_simulation (str): Type of simulation to run, either 'dyn' or 'magrot'.
    """

    common_path = cfg["path_to_output"]

    simulation_output_path = pathlib.Path(args.output_dir_simulation)

    output_htcondor_path = pathlib.Path(args.output_dir_htcondor)

    output_htcondor_path.mkdir(parents=True, exist_ok=True)

    # Reading the .txt file generated by `parameter_sweeper.py` that contains the simulation arguments.
    simulation_arguments = np.genfromtxt(
        pathlib.Path().joinpath(
            simulation_output_path, "simulation_arguments.txt"
        ),
        dtype=str,
    )

    n_sim_total = len(simulation_arguments)

    # Determine the number of weeks required to run all the simulations, assuming that
    # args.n_sim_week simulations can be run each week.
    if n_sim_total % args.n_sim_week == 0:
        n_week = int(n_sim_total / args.n_sim_week)
    else:
        n_week = int(n_sim_total / args.n_sim_week) + 1

    for i in range(n_week):
        # Creating one folder per week.
        week_folder_path = pathlib.Path().joinpath(
            output_htcondor_path, "week-" + str(i)
        )
        week_folder_path.mkdir(parents=True, exist_ok=True)

        # Taking the chunk of simulations that we want to run this week.
        sim_week = simulation_arguments[
            i * args.n_sim_week : (i + 1) * args.n_sim_week
        ]

        # We need to count how many simulations we have to run each week, since for
        # `n_sim_total % args.n_sim_week != 0`, we will end up with fewer simulations
        # than args.n_sim_week in the last week.
        n_sim_folder = len(sim_week)

        # Calculate the number of `argument_job<j>.txt` files required to have
        # `args.n_sim_job` simulations per job, i.e., `args.n_sim_job` lines in each `argument_job<j>.txt` file.

        if n_sim_folder % args.n_sim_job == 0:
            n_args = int(n_sim_folder / args.n_sim_job)
        else:
            n_args = int(n_sim_folder / args.n_sim_job) + 1

        # Create the `wrapper.sh` file.
        path_wrapper = pathlib.Path().joinpath(week_folder_path, "wrapper.sh")

        generate_wrapper(args.type_simulation, args.dyn_data, path_wrapper)

        # Create the `.submit` file.
        path_arguments = pathlib.Path().joinpath(
            week_folder_path, "arguments_week.txt"
        )
        generate_job_submit(week_folder_path, path_arguments)

        list_arguments_week = []

        # Looping through the `n_args` to create each `"/arguments_job" + str(j + 1) + ".txt"` file.
        for j in range(n_args):
            chunk_sim_job = sim_week[
                j * args.n_sim_job : (j + 1) * args.n_sim_job
            ]

            path_arguments_job = pathlib.Path().joinpath(
                common_path,
                str(week_folder_path) + "/arguments_job" + str(j + 1) + ".txt",
            )

            np.savetxt(path_arguments_job, chunk_sim_job, fmt="%s")

            # List of all the paths of each `"/arguments_job" + str(j + 1) + ".txt"` file.
            list_arguments_week.append(path_arguments_job)

        np.savetxt(path_arguments, list_arguments_week, fmt="%s")
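The week/job bookkeeping above is two ceiling divisions: the number of chunks is total // size, plus one when the division leaves a remainder. An equivalent one-liner, sketched standalone:

```python
def n_chunks(total: int, per_chunk: int) -> int:
    # Ceiling division: one extra chunk when total is not an exact multiple.
    return -(-total // per_chunk)


# 250 simulations at 100 per week -> 3 weeks; the last week holds only 50.
print(n_chunks(250, 100))  # 3
print(n_chunks(200, 100))  # 2
```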

utilities.PIC_scripts.PIC_manage_failed_simulation

Management of failed simulations at the PIC.

After the failed simulations have been launched again and finished successfully, we use this script to transfer the new output back to the original folders.

To display the help message:

python PIC_manage_failed_simulation.py --help

This lists all the relevant arguments that can be used.

Authors:

Celsa Pardo Araujo (pardo@csic.es)

manage_failed_simulations(args)

Copying the successfully relaunched output back to the original folders.

Parameters:

  • args (argparse.Namespace), required: An argparse.Namespace object containing the following attributes:
      • simulation_dir (pathlib.Path): Output directory where the simulation outputs are.
      • failed_simulation_dir (pathlib.Path): Output directory where the failed simulation outputs are.
Source code in utilities/PIC_scripts/PIC_manage_failed_simulation.py
def manage_failed_simulations(args: argparse.Namespace) -> None:
    """
    Copying the successfully relaunched output back to the original folders.

    Args:
        args (argparse.Namespace): An argparse.Namespace object containing the following attributes:

            - simulation_dir (pathlib.Path): Output directory where the simulation outputs are.
            - failed_simulation_dir (pathlib.Path): Output directory where the failed simulation outputs are.
    """

    output_simulations_path = args.simulation_dir
    failed_simulation_path = args.failed_simulation_dir

    output_failed_simulations = pathlib.Path().joinpath(
        failed_simulation_path, "output_simulations"
    )

    list_directories_ = os.listdir(output_failed_simulations)

    for folder in list_directories_:
        folder_failed_path = pathlib.Path().joinpath(
            output_failed_simulations, folder
        )
        folder_original_path = pathlib.Path().joinpath(
            output_simulations_path, folder
        )

        # We avoid moving the override.json file to prevent permission issues.
        # Copying it is not needed, as it is identical to the original.
        for files in os.listdir(folder_failed_path):
            if files != "override.json":
                files_path = pathlib.Path().joinpath(folder_failed_path, files)
                files_original_path = pathlib.Path().joinpath(
                    folder_original_path, files
                )
                shutil.copy(files_path, files_original_path)
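The copy-back step above is a filtered file copy: everything in each failed folder goes back to the original folder except override.json. A self-contained sketch of the same pattern (file names below are illustrative, not the simulator's actual outputs):

```python
import pathlib
import shutil
import tempfile


def copy_back(src: pathlib.Path, dst: pathlib.Path, skip: str = "override.json") -> list[str]:
    """Copy every regular file from src to dst except `skip`; return the copied names."""
    copied = []
    for item in sorted(src.iterdir()):
        if item.is_file() and item.name != skip:
            shutil.copy(item, dst / item.name)
            copied.append(item.name)
    return copied


# Demo on a throwaway tree.
with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp) / "failed"
    dst = pathlib.Path(tmp) / "original"
    src.mkdir()
    dst.mkdir()
    for name in ("result.h5", "log.txt", "override.json"):
        (src / name).touch()
    print(copy_back(src, dst))  # ['log.txt', 'result.h5']
```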