PIC scripts

utilities.PIC_scripts.PIC_check_simulations

PIC simulation checker.

With this script, we count how many of the simulations run with HTCondor have failed.

We save the directory name of each failed simulation in a CSV file called failed_folders.csv.

To display the help message:

python PIC_check_simulations.py --help

This lists all the relevant arguments that can be used.

Authors:

Celsa Pardo Araujo (pardo@csic.es)

check_simulations(args)

Check how many simulations run with HTCondor have failed, and save the output folder name of each failed simulation to the failed_folders.csv file.

Parameters:

  • args (argparse.Namespace), required: An argparse.Namespace object containing the following attributes:
      • output_dir_simulation (pathlib.Path): Output directory where the simulation outputs are.
      • dir_failed_folder_csv (str): Path to the directory where the CSV file containing the list of failed folders is saved.
Source code in utilities/PIC_scripts/PIC_check_simulations.py
def check_simulations(args: argparse.Namespace) -> None:
    """
    Checking how many simulations run using HTCondor have failed.
    We save the name of the output folder for each of the failed simulations in the failed_folders.csv file.

    Args:
        args (argparse.Namespace): An argparse.Namespace object containing the following attributes:

            - output_dir_simulation (pathlib.Path): Output directory where the simulation outputs are.
            - dir_failed_folder_csv (str): Path to the directory where the csv file containing the list of failed
                folders is saved.
    """

    output_simulations_path = args.output_dir_simulation
    list_directories = os.listdir(output_simulations_path)

    failed_folder_csv_path = args.dir_failed_folder_csv

    count_error = 0
    fail_simulation = []

    for folder in list_directories:
        simulations_directory = pathlib.Path().joinpath(
            output_simulations_path, folder
        )
        if os.path.isdir(simulations_directory):
            files_in_directory = os.listdir(simulations_directory)

            # If the simulations have finished successfully, then 8 files will be located within each output folder.
            # Hence, by checking the file number, we can identify those simulations that have failed.
            if len(files_in_directory) < 8:
                count_error += 1
                fail_simulation.append(os.path.basename(simulations_directory))
    df = pd.DataFrame(data={"folder": fail_simulation})
    df.to_csv(
        pathlib.Path().joinpath(failed_folder_csv_path, "failed_folders.csv")
    )
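The failure test above is purely a file count: a completed run leaves 8 files in its output folder, so any folder with fewer is flagged. A minimal, self-contained sketch of the same rule (the 8-file threshold is the convention inherited from the script; folder names below are illustrative):

```python
import pathlib
import tempfile

EXPECTED_FILES = 8  # a successfully completed simulation writes 8 output files


def find_failed(output_root: pathlib.Path) -> list[str]:
    """Return the names of simulation folders holding fewer than EXPECTED_FILES files."""
    failed = []
    for entry in sorted(output_root.iterdir()):
        if entry.is_dir() and len(list(entry.iterdir())) < EXPECTED_FILES:
            failed.append(entry.name)
    return failed


# Demo on a throwaway tree: one complete run, one truncated run.
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    ok, bad = root / "000000", root / "000001"
    ok.mkdir()
    bad.mkdir()
    for i in range(EXPECTED_FILES):
        (ok / f"file{i}.dat").touch()
    (bad / "file0.dat").touch()
    print(find_failed(root))  # ['000001']
```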

utilities.PIC_scripts.PIC_generate_htcondor_failed

Generating HTCondor files for failed simulations at the PIC.

With this script, we create the structure needed to rerun the simulations that failed when running the whole set of simulations with HTCondor. Note that in this case, we run the simulations one by one, i.e., one simulation per job, to prevent MaxWallTime problems.

Note that before running this script, we first need to run the PIC_check_simulations.py script to generate the failed_folders.csv file.

To display the help message:

python PIC_generate_htcondor_failed.py --help

This lists all the relevant arguments that can be used.

Authors:

Celsa Pardo Araujo (pardo@csic.es)

generate_htcondor_failed(args)

Create the submit file and wrapper file necessary to relaunch the failed simulations.

Parameters:

  • args (argparse.Namespace), required: An argparse.Namespace object containing the following attributes:
      • output_dir_simulation (pathlib.Path): Output directory where the simulation outputs are.
      • number_sim_job (int): Number of simulations to run per job.
      • dyn_data (pathlib.Path): Path to the file where the dynamically evolved population database is stored.
      • type_simulation (str): Type of simulation to run at the PIC with HTCondor. Choose between dyn and magrot.
Source code in utilities/PIC_scripts/PIC_generate_htcondor_failed.py
def generate_htcondor_failed(args: argparse.Namespace) -> None:
    """
    Create the submit file and wrapper file necessary to relaunch the failed simulations.

    Args:
        args (argparse.Namespace): An argparse.Namespace object containing the following attributes:

            - output_dir_simulation (pathlib.Path): Output directory where the simulation outputs are.
            - number_sim_job (int): Number of simulations to run per job.
            - dyn_data (pathlib.Path): Path to the file where the dynamically evolved population database is stored.
            - type_simulation (str): Type of simulation that we want to run in the PIC with HTCondor.
                Choose between dyn or magrot.
    """
    output_simulations_path = pathlib.Path(args.output_dir_simulation)

    # If we want to copy and check the error.txt and out.txt of the failed simulations from HTCondor
    # in order to assess the reason for failure, the following variable should be set to True.
    check_simulations_failed = False

    # Reading the `failed_folders.csv` file, where the simulations that have failed are saved using the
    # `PIC_check_simulations.py` script.

    data_failed_folder = pd.read_csv(
        pathlib.Path().joinpath(
            args.dir_failed_folder_csv, "failed_folders.csv"
        )
    )
    list_failed_folder = data_failed_folder["folder"]

    directory_path = output_simulations_path.parents[0]
    failed_simulation_path = pathlib.Path().joinpath(
        directory_path, "failed_simulations"
    )
    output_failed_simulations = pathlib.Path().joinpath(
        failed_simulation_path, "output_simulations"
    )
    htcondor_failed_submit_path = pathlib.Path().joinpath(
        failed_simulation_path, "htcondor_submit"
    )
    htcondor_failed_output_path = pathlib.Path().joinpath(
        failed_simulation_path, "htcondor_output"
    )

    failed_simulation_path.mkdir(exist_ok=True)
    output_failed_simulations.mkdir(exist_ok=True)
    htcondor_failed_submit_path.mkdir(exist_ok=True)
    htcondor_failed_output_path.mkdir(exist_ok=True)

    for i, simulation in enumerate(list_failed_folder):
        simulation = f"{simulation:06}"
        # Copy the folder with the failed simulations output to the failed_simulation folder.
        simulation_folder = pathlib.Path().joinpath(
            output_simulations_path, simulation
        )

        shutil.copytree(
            simulation_folder,
            pathlib.Path().joinpath(output_failed_simulations, simulation),
            dirs_exist_ok=True,
        )

        # Copying the err.txt and out.txt files of the failed simulations from the htcondor_output folder
        # if this is needed to check why the simulations have failed.
        # Note that each of the out.txt and err.txt files contains the stdout and stderr from more than one simulation.
        # Specifically, there will be as many as the args.number_sim_job.

        if check_simulations_failed:
            out_txt_number = int(int(simulation) / args.number_sim_job) + 1

            htcondor_output_path = pathlib.Path().joinpath(
                directory_path, "HTCondor_output"
            )

            out_txt_path = pathlib.Path().joinpath(
                htcondor_output_path, str(out_txt_number) + "-out.txt"
            )
            err_txt_path = pathlib.Path().joinpath(
                htcondor_output_path, str(out_txt_number) + "-error.txt"
            )
            out_failed_txt_path = pathlib.Path().joinpath(
                htcondor_failed_output_path, str(out_txt_number) + "-out.txt"
            )
            err_failed_txt_path = pathlib.Path().joinpath(
                htcondor_failed_output_path, str(out_txt_number) + "-error.txt"
            )

            shutil.copy(out_txt_path, out_failed_txt_path)
            shutil.copy(err_txt_path, err_failed_txt_path)

    # Creating the new submit and argument.txt files for relaunching the failed simulations.
    failed_arguments_path = pathlib.Path().joinpath(
        htcondor_failed_submit_path, "arguments_.txt"
    )
    failed_submit_path = pathlib.Path().joinpath(
        htcondor_failed_submit_path, "job.submit"
    )
    failed_wrapper_path = pathlib.Path().joinpath(
        htcondor_failed_submit_path, "wrapper.sh"
    )

    list_arguments = [
        str(output_failed_simulations)
        + "/"
        + f"{path:06}"
        + " "
        + str(output_failed_simulations)
        + "/"
        + f"{path:06}"
        + "/override.json"
        for path in list_failed_folder
    ]
    np.savetxt(failed_arguments_path, list_arguments, fmt="%s")

    # Writing the HTCondor submit files, which specify the relevant arguments, output/error paths and queue structure.
    with open(failed_submit_path, "w") as f:
        f.write("universe        = vanilla \n")
        f.write("executable      = " + str(failed_wrapper_path) + "\n")
        f.write("arguments       = $(arg1) $(arg2) \n")
        f.write("output          = $(arg1)/out.txt \n")
        f.write("error           = $(arg1)/error.txt \n")
        f.write("log             = $(arg1)/log.txt \n")
        f.write("Queue arg1 arg2 from " + str(failed_arguments_path) + "\n")
        f.close()

    # We will create the wrapper file according to the args.type_simulation argument. For example,
    # if it is equal to 'dyn', we generate a wrapper file that will execute the dynamical simulations.
    if args.type_simulation == "dyn":
        exec_command = "python /data/magnesia/software/ML-Poppyns/mlpoppyns/simulator/simulate_population_dyn.py --save_dir $1 --parameter_override $2  \n"

    elif args.type_simulation == "magrot":
        exec_command = (
            "python /data/magnesia/software/ML-Poppyns/mlpoppyns/simulator/simulate_population_magrot_det.py --dyn_data "
            + str(args.dyn_data)
            + " --save_dir $1 --parameter_override $2 \n"
        )
    else:
        raise ValueError(
            "The specified simulation type is not feasible. Choose between dyn or magrot."
        )

    with open(failed_wrapper_path, "w") as f:
        f.write("#!/bin/bash \n")
        f.write("\n")
        f.write(
            "export PATH=/data/astro/software/centos7/conda/mambaforge_4.14.0/bin:$PATH\n"
        )
        f.write("conda init bash\n")

        f.write(
            "source /data/astro/software/centos7/conda/mambaforge_4.14.0/etc/profile.d/conda.sh\n"
        )
        f.write(
            "conda activate /data/magnesia/scratch_ssd/conda/envs/pop_syn\n"
        )
        f.write(
            "# We copy the mlpoppyns module in the working node to avoid problems with the path while running the simulations in the server.\n"
        )
        f.write("cp -R /data/magnesia/software/ML-Poppyns/mlpoppyns .\n")
        f.write(exec_command)
        f.write("conda deactivate")
        f.close()
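Two small conventions in the code above are worth spelling out: folder names are the simulation index zero-padded to six digits (`f"{simulation:06}"`), and a given simulation's stdout/stderr live in the HTCondor output file numbered `int(index / number_sim_job) + 1`, since each `<n>-out.txt` bundles `number_sim_job` consecutive simulations. A sketch of both mappings:

```python
def simulation_folder_name(index: int) -> str:
    # Output folder names are the simulation index zero-padded to six digits.
    return f"{index:06}"


def htcondor_output_number(index: int, number_sim_job: int) -> int:
    # Each <n>-out.txt / <n>-error.txt bundles number_sim_job consecutive
    # simulations; file numbering starts at 1.
    return index // number_sim_job + 1


print(simulation_folder_name(42))      # 000042
print(htcondor_output_number(42, 10))  # 5  (simulations 40-49 share file 5)
```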

utilities.PIC_scripts.PIC_generate_htcondor_submit

Generation of the HTCondor utilities.

We generate all the utilities needed to run the whole set of simulations in chunks using HTCondor. Each job runs one chunk of simulations with a different set of initial parameters.

Note that this script needs the output from the parameter_sweeper.py script.

The submit files for each job, the arguments.txt and a wrapper are saved in the path specified with the command line argument --output_dir_htcondor.

To display the help message:

python PIC_generate_htcondor_submit.py --help

This lists all the relevant arguments that can be used.

Authors:

Celsa Pardo Araujo (pardo@csic.es)

generate_job_submit(path_output, path_arguments)

Create all the submit files.

Parameters:

  • path_output (pathlib.Path), required: Output directory for the submit file.
  • path_arguments (pathlib.Path), required: Path for the simulation arguments.
Source code in utilities/PIC_scripts/PIC_generate_htcondor_submit.py
def generate_job_submit(
    path_output: pathlib.Path, path_arguments: pathlib.Path
) -> None:
    """
    Create all the submit files.

    Args:
        path_output (pathlib.Path): Output directory for the submit file.
        path_arguments (pathlib.Path): Path for the simulation arguments.
    """
    # Path where each submit file will be saved.
    path_submit = pathlib.Path().joinpath(path_output, "job.submit")

    # Writing the HTCondor submit file, specifying the relevant arguments, output/error paths and queue structure.
    with open(path_submit, "w") as f:
        f.write("universe        = vanilla \n")
        f.write("executable      = " + str(path_output) + "/wrapper.sh \n")
        f.write(
            "output          = " + str(path_output) + "/$(ProcId)-out.txt \n"
        )
        f.write(
            "error           = " + str(path_output) + "/$(ProcId)-error.txt \n"
        )
        f.write("Queue arguments from " + str(path_arguments) + "\n")
        f.close()
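Filled in, the job.submit this function writes comes out as follows (the week-folder path shown is hypothetical):

```
universe        = vanilla
executable      = /scratch/htcondor/week-0/wrapper.sh
output          = /scratch/htcondor/week-0/$(ProcId)-out.txt
error           = /scratch/htcondor/week-0/$(ProcId)-error.txt
Queue arguments from /scratch/htcondor/week-0/arguments_week.txt
```

Each line of arguments_week.txt becomes one queued job, with the line's content passed to wrapper.sh as its argument.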

generate_wrapper(type_simulation, dyn_path, path_wrapper)

Create all the wrapper files.

Parameters:

  • type_simulation (str), required: String with the type of simulation we want to run.
  • path_wrapper (pathlib.Path), required: Output directory for the wrapper file.
  • dyn_path (pathlib.Path), required: Path to where the dynamically evolved population database is stored.
Source code in utilities/PIC_scripts/PIC_generate_htcondor_submit.py
def generate_wrapper(
    type_simulation: str,
    dyn_path: pathlib.Path,
    path_wrapper: pathlib.Path,
) -> None:
    """
    Create all the wrapper files.

    Args:
        type_simulation (str): String with the type of simulation we want to run.
        path_wrapper (pathlib.Path): Output directory for the wrapper file.
        dyn_path (pathlib.Path): Path to where the dynamically evolved population database is stored.
    """

    # Writing the `wrapper.sh` file where we loop over the lines of the `"/arguments_job" + str(j + 1) + ".txt"` file.
    # We will create the wrapper file according to the args.type_simulation. For example,
    # if it is equal to 'dyn', we generate a wrapper file that will execute the dynamical simulations.

    if type_simulation == "dyn":
        exec_command = "python /data/magnesia/software/ML-Poppyns/mlpoppyns/simulator/simulate_population_dyn.py --save_dir ${a[0]} --parameter_override ${a[1]}  \n"

    elif type_simulation == "magrot":
        exec_command = (
            "python /data/magnesia/software/ML-Poppyns/mlpoppyns/simulator/simulate_population_magrot_det.py --dyn_data "
            + str(dyn_path)
            + " --save_dir ${a[0]} --parameter_override ${a[1]} \n"
        )
    else:
        raise ValueError(
            "The specified simulation type is not feasible, choose between dyn or magrot."
        )

    with open(path_wrapper, "w") as f:
        f.write("#!/bin/bash \n")
        f.write("\n")
        f.write(
            "export PATH=/data/astro/software/centos7/conda/mambaforge_4.14.0/bin:$PATH\n"
        )
        f.write("conda init bash\n")

        f.write(
            "source /data/astro/software/centos7/conda/mambaforge_4.14.0/etc/profile.d/conda.sh\n"
        )
        f.write(
            "conda activate /data/magnesia/scratch_ssd/conda/envs/pop_syn\n"
        )
        f.write(
            "# We copy the mlpoppyns module in the working node to avoid problems with the path while running the simulations in the server.\n"
        )
        f.write("cp -R /data/magnesia/software/ML-Poppyns/mlpoppyns .\n")
        f.write("filename=$1 \n")
        f.write("while read line; do \n")
        f.write("#Reading each line. \n")
        f.write("myarr[$index]=$line \n")
        f.write("#Extract each element from the lines. \n")
        f.write("a=(${myarr[$index]})\n")
        f.write(exec_command)
        f.write("done < $filename\n")
        f.write("conda deactivate")
        f.close()
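The loop the wrapper embeds is the standard bash while-read pattern: each line of the per-job argument file is word-split into an array, so that ${a[0]} feeds --save_dir and ${a[1]} feeds --parameter_override. A slightly simplified standalone demonstration, with echo standing in for the simulator call:

```shell
#!/bin/bash
# Demo of the wrapper's read loop: each line of the argument file is
# word-split into an array a, one simulator invocation per line.
printf '%s\n' \
    "/out/000001 /out/000001/override.json" \
    "/out/000002 /out/000002/override.json" > args.txt

while read -r line; do
    a=($line)   # split the line on whitespace
    echo "save_dir=${a[0]} override=${a[1]}"
done < args.txt
rm args.txt
```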

submit_generator(args)

Generate HTCondor submit files for running simulations in batches.

This function takes command-line arguments and generates the necessary files for submitting simulations to an HTCondor cluster. It creates a directory structure with one folder per week, where each week's folder contains argument files for individual jobs and a submit file.

Parameters:

  • args (argparse.Namespace), required: An argparse.Namespace object containing the following attributes:
      • output_dir_htcondor (str): Path to the directory where the generated files will be saved.
      • output_dir_simulation (str): Path to the directory containing the simulation parameter files.
      • n_sim_job (int): Number of simulations to run per job.
      • n_sim_week (int): Number of simulations to run per week.
      • dyn_data (str): Path to the dynamically evolved population database file (if using the simulate_population_magrot_det simulator).
      • type_simulation (str): Type of simulation to run, either 'dyn' or 'magrot'.
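A sketch of the resulting layout under --output_dir_htcondor, with two jobs per week shown for illustration (the number of arguments_job<j>.txt files depends on n_sim_week and n_sim_job):

```
output_dir_htcondor/
├── week-0/
│   ├── wrapper.sh
│   ├── job.submit
│   ├── arguments_week.txt    # one line per arguments_job<j>.txt path
│   ├── arguments_job1.txt    # up to n_sim_job argument lines each
│   └── arguments_job2.txt
└── week-1/
    └── ...
```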
Source code in utilities/PIC_scripts/PIC_generate_htcondor_submit.py
def submit_generator(args: argparse.Namespace) -> None:
    """
    Generate HTCondor submit files for running simulations in batches.

    This function takes command-line arguments and generates the necessary files
    for submitting simulations to an HTCondor cluster. It creates a directory
    structure with one folder per week, where each week's folder contains
    argument files for individual jobs and a submit file.

    Args:
        args (argparse.Namespace): An argparse.Namespace object containing the following attributes:

            - output_dir_htcondor (str): Path to the directory where the
                generated files will be saved.
            - output_dir_simulation (str): Path to the directory containing the
                 simulation parameter files.
            - n_sim_job (int): Number of simulations to run per job.
            - n_sim_week (int): Number of simulations to run per week.
            - dyn_data (str): Path to the dynamically evolved population database
                 file (if using the `simulate_population_magrot_det` simulator).
            - type_simulation (str): Type of simulation to run, either 'dyn' or 'magrot'.
    """

    common_path = cfg["path_to_output"]

    simulation_output_path = pathlib.Path(args.output_dir_simulation)

    output_htcondor_path = pathlib.Path(args.output_dir_htcondor)

    output_htcondor_path.mkdir(parents=True, exist_ok=True)

    # Reading the .txt file generated by `parameter_sweeper.py` that contains the simulation arguments.
    simulation_arguments = np.genfromtxt(
        pathlib.Path().joinpath(
            simulation_output_path, "simulation_arguments.txt"
        ),
        dtype=str,
    )

    n_sim_total = len(simulation_arguments)

    # Determine the number of weeks required to run all the simulations, assuming that
    # args.n_sim_week simulations can be run each week.
    if n_sim_total % args.n_sim_week == 0:
        n_week = int(n_sim_total / args.n_sim_week)
    else:
        n_week = int(n_sim_total / args.n_sim_week) + 1

    for i in range(n_week):
        # Creating one folder per week.
        week_folder_path = pathlib.Path().joinpath(
            output_htcondor_path, "week-" + str(i)
        )
        week_folder_path.mkdir(parents=True, exist_ok=True)

        # Taking the chunk of simulations that we want to run this week.
        sim_week = simulation_arguments[
            i * args.n_sim_week : (i + 1) * args.n_sim_week
        ]

        # We need to count how many simulations we have to run each week, since for
        # `n_sim_total % args.n_sim_week != 0`, we will end up with fewer simulations
        # than args.n_sim_week in the last week.
        n_sim_folder = len(sim_week)

        # Calculate the number of `argument_job<j>.txt` files required to have
        # `args.n_sim_job` simulations per job, i.e., `args.n_sim_job` lines in each `argument_job<j>.txt` file.

        if n_sim_folder % args.n_sim_job == 0:
            n_args = int(n_sim_folder / args.n_sim_job)
        else:
            n_args = int(n_sim_folder / args.n_sim_job) + 1

        # Create the `wrapper.sh` file.
        path_wrapper = pathlib.Path().joinpath(week_folder_path, "wrapper.sh")

        generate_wrapper(args.type_simulation, args.dyn_data, path_wrapper)

        # Create the `.submit` file.
        path_arguments = pathlib.Path().joinpath(
            week_folder_path, "arguments_week.txt"
        )
        generate_job_submit(week_folder_path, path_arguments)

        list_arguments_week = []

        # Looping through the `n_args` to create each `"/arguments_job" + str(j + 1) + ".txt"` file.
        for j in range(n_args):
            chunk_sim_job = sim_week[
                j * args.n_sim_job : (j + 1) * args.n_sim_job
            ]

            path_arguments_job = pathlib.Path().joinpath(
                common_path,
                str(week_folder_path) + "/arguments_job" + str(j + 1) + ".txt",
            )

            np.savetxt(path_arguments_job, chunk_sim_job, fmt="%s")

            # List of all the paths of each `"/arguments_job" + str(j + 1) + ".txt"` file.
            list_arguments_week.append(path_arguments_job)

        np.savetxt(path_arguments, list_arguments_week, fmt="%s")
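The week/job bookkeeping above is two ceiling divisions: the number of chunks is total // size, plus one when the division leaves a remainder. An equivalent one-liner, sketched standalone:

```python
def n_chunks(total: int, per_chunk: int) -> int:
    # Ceiling division: one extra chunk when total is not an exact multiple.
    return -(-total // per_chunk)


# 250 simulations at 100 per week -> 3 weeks; the last week holds only 50.
print(n_chunks(250, 100))  # 3
print(n_chunks(200, 100))  # 2
```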

utilities.PIC_scripts.PIC_manage_failed_simulation

Management of failed simulations at the PIC.

After the failed simulations have been launched again and finished successfully, we use this script to transfer the new output back to the original folders.

To display the help message:

python PIC_manage_failed_simulation.py --help

This lists all the relevant arguments that can be used.

Authors:

Celsa Pardo Araujo (pardo@csic.es)

manage_failed_simulations(args)

Copying the successfully relaunched output back to the original folders.

Parameters:

  • args (argparse.Namespace), required: An argparse.Namespace object containing the following attributes:
      • simulation_dir (pathlib.Path): Output directory where the simulation outputs are.
      • failed_simulation_dir (pathlib.Path): Output directory where the failed simulation outputs are.
Source code in utilities/PIC_scripts/PIC_manage_failed_simulation.py
def manage_failed_simulations(args: argparse.Namespace) -> None:
    """
    Copying the successfully relaunched output back to the original folders.

    Args:
        args (argparse.Namespace): An argparse.Namespace object containing the following attributes:

            - simulation_dir (pathlib.Path): Output directory where the simulation outputs are.
            - failed_simulation_dir (pathlib.Path): Output directory where the failed simulation outputs are.
    """

    output_simulations_path = args.simulation_dir
    failed_simulation_path = args.failed_simulation_dir

    output_failed_simulations = pathlib.Path().joinpath(
        failed_simulation_path, "output_simulations"
    )

    list_directories_ = os.listdir(output_failed_simulations)

    for folder in list_directories_:
        folder_failed_path = pathlib.Path().joinpath(
            output_failed_simulations, folder
        )
        folder_original_path = pathlib.Path().joinpath(
            output_simulations_path, folder
        )

        # We avoid moving the override.json file to prevent permission issues.
        # Copying it is not needed, as it is identical to the original.
        for files in os.listdir(folder_failed_path):
            if files != "override.json":
                files_path = pathlib.Path().joinpath(folder_failed_path, files)
                files_original_path = pathlib.Path().joinpath(
                    folder_original_path, files
                )
                shutil.copy(files_path, files_original_path)
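The copy-back step above is a filtered file copy: everything in each failed folder goes back to the original folder except override.json. A self-contained sketch of the same pattern (file names below are illustrative, not the simulator's actual outputs):

```python
import pathlib
import shutil
import tempfile


def copy_back(src: pathlib.Path, dst: pathlib.Path, skip: str = "override.json") -> list[str]:
    """Copy every regular file from src to dst except `skip`; return the copied names."""
    copied = []
    for item in sorted(src.iterdir()):
        if item.is_file() and item.name != skip:
            shutil.copy(item, dst / item.name)
            copied.append(item.name)
    return copied


# Demo on a throwaway tree.
with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp) / "failed"
    dst = pathlib.Path(tmp) / "original"
    src.mkdir()
    dst.mkdir()
    for name in ("result.h5", "log.txt", "override.json"):
        (src / name).touch()
    print(copy_back(src, dst))  # ['log.txt', 'result.h5']
```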