
Loaders

mlpoppyns.learning.loaders.loader_base

Base Loader.

Base abstract class for any custom data loader.

Authors:

Alberto Garcia Garcia (garciagarcia@ice.csic.es)

LoaderBase

Bases: DataLoader

Base loader abstract class.

This class serves as a blueprint for creating various dataset loaders. It defines the essential methods that all loaders must implement, ensuring consistency.

Source code in mlpoppyns/learning/loaders/loader_base.py
class LoaderBase(torch.utils.data.DataLoader):
    """
    Base loader abstract class.

    This class serves as a blueprint for creating various dataset loaders.
    It defines the essential methods that all loaders must implement, ensuring consistency.
    """

    def __init__(
        self,
        dataset: torch.utils.data.Dataset,
        batch_size: int,
        num_workers: int,
        shuffle: bool = False,
        collate_fn=torch.utils.data.dataloader.default_collate,
    ) -> None:
        """
        Initialization of base loader.

        Args:
            dataset (torch.utils.data.Dataset): Dataset of images and labels to load.
            batch_size (int): Batch size for the samplers.
            num_workers (int): Number of workers (threads) to read data.
            shuffle (bool): Whether to randomly shuffle the samples.
            collate_fn (Callable): Function to collate the list of samples
                into a batch.
        """

        self.n_samples = len(dataset)

        train_idx = np.arange(self.n_samples)

        if shuffle:
            self.sampler = torch.utils.data.sampler.SubsetRandomSampler(
                train_idx
            )
        else:
            self.sampler = torch.utils.data.sampler.SequentialSampler(
                train_idx
            )

        # Initialize base loader with the provided arguments.
        self.init_kwargs = {
            "dataset": dataset,
            "batch_size": batch_size,
            "collate_fn": collate_fn,
            "num_workers": num_workers,
        }

        super().__init__(sampler=self.sampler, **self.init_kwargs)

__init__(dataset, batch_size, num_workers, shuffle=False, collate_fn=torch.utils.data.dataloader.default_collate)

Initialization of base loader.

Parameters:

dataset (Dataset): Dataset of images and labels to load. Required.
batch_size (int): Batch size for the samplers. Required.
num_workers (int): Number of workers (threads) to read data. Required.
shuffle (bool): Whether to randomly shuffle the samples. Default: False.
collate_fn (Callable): Function to collate the list of samples into a batch. Default: default_collate.
Source code in mlpoppyns/learning/loaders/loader_base.py
def __init__(
    self,
    dataset: torch.utils.data.Dataset,
    batch_size: int,
    num_workers: int,
    shuffle: bool = False,
    collate_fn=torch.utils.data.dataloader.default_collate,
) -> None:
    """
    Initialization of base loader.

    Args:
        dataset (torch.utils.data.Dataset): Dataset of images and labels to load.
        batch_size (int): Batch size for the samplers.
        num_workers (int): Number of workers (threads) to read data.
        shuffle (bool): Whether to randomly shuffle the samples.
        collate_fn (Callable): Function to collate the list of samples
            into a batch.
    """

    self.n_samples = len(dataset)

    train_idx = np.arange(self.n_samples)

    if shuffle:
        self.sampler = torch.utils.data.sampler.SubsetRandomSampler(
            train_idx
        )
    else:
        self.sampler = torch.utils.data.sampler.SequentialSampler(
            train_idx
        )

    # Initialize base loader with the provided arguments.
    self.init_kwargs = {
        "dataset": dataset,
        "batch_size": batch_size,
        "collate_fn": collate_fn,
        "num_workers": num_workers,
    }

    super().__init__(sampler=self.sampler, **self.init_kwargs)
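
As a minimal usage sketch (not from the package; the toy TensorDataset below stands in for a real dataset of images and labels):

import torch

from mlpoppyns.learning.loaders.loader_base import LoaderBase

# Any map-style dataset works with the base loader.
inputs = torch.randn(8, 3, 16, 16)
labels = torch.randn(8, 2)
dataset = torch.utils.data.TensorDataset(inputs, labels)

# shuffle=True selects a SubsetRandomSampler internally.
loader = LoaderBase(dataset, batch_size=4, num_workers=0, shuffle=True)

for batch_inputs, batch_labels in loader:
    # batch_inputs: (4, 3, 16, 16), batch_labels: (4, 2)
    pass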

mlpoppyns.learning.loaders.loader_multichannel_array

Loader for multichannel 2D array datasets.

This loader imports the statistics used to normalize or standardize the targets from an existing statistics.json file.

Authors:

Michele Ronchi (ronchi@ice.csic.es)
Alberto Garcia Garcia (garciagarcia@ice.csic.es)
Celsa Pardo Araujo (pardo@ice.csic.es)
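
A minimal sketch of the expected on-disk layout (file names, column names and values below are illustrative, not taken from the package):

import json

import numpy as np
import pandas as pd

# "input:"-prefixed columns hold paths to per-channel .npy files; the
# remaining columns are the targets.
np.save("sample_000_density.npy", np.random.rand(64, 64).astype(np.float32))
np.save("sample_000_velocity.npy", np.random.rand(64, 64).astype(np.float32))

pd.DataFrame(
    {
        "input:density": ["sample_000_density.npy"],
        "input:velocity": ["sample_000_velocity.npy"],
        "mass": [1.23],
        "age": [4.56],
    }
).to_csv("dataset.csv", index=False)

# statistics.json carries the training-set statistics for each target.
stats = {
    "mass": {"mean": 1.0, "std": 0.5, "min": 0.1, "max": 2.0},
    "age": {"mean": 5.0, "std": 1.0, "min": 3.0, "max": 9.0},
}
with open("statistics.json", "w") as f:
    json.dump(stats, f)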

DatasetMultichannelArray

Dataset for a multichannel array input.

This class represents a dataset of populations in which each input is a numpy array of numerical values stored in NPY format. All those inputs are treated as individual channels to build the input tensor for the loader. Labels are generated as a vector.

Source code in mlpoppyns/learning/loaders/loader_multichannel_array.py
class DatasetMultichannelArray:
    """
    Dataset for a multichannel array input.

    This class represents a dataset of populations in which each input is a
    numpy array of numerical values stored in NPY format. All those inputs
    are treated as individual channels to build the input tensor for the
    loader. Labels are generated as a vector.
    """

    def __import_statistics(self, statistic_path: str) -> None:
        """
        Import dataset statistics for normalization and standardization.

        This routine imports the training dataset statistics that may be needed
        for input/target normalization and standardization, such as the mean,
        standard deviation, minimum and maximum.

        Args:
            statistic_path (str): Path to the statistics.json file containing the statistics
                of the training dataset.
        """

        # Load statistics from JSON file.
        with open(statistic_path) as read_file:
            self.statistics = json.load(read_file)

        label_name = []

        # Loop over every column of the dataset to collect all the targets.
        for col in self.dataset.columns:
            # All input channel headers are annotated with a prefix "input:" in
            # the dataset CSV file. Find and skip them to get the target names.
            if "input:" not in col:
                label_name.append(col)

        # Save the statistics for the filtered labels.
        self.target_mean = np.array(
            [self.statistics[key]["mean"] for key in label_name],
            dtype=np.float32,
        )
        self.target_std = np.array(
            [self.statistics[key]["std"] for key in label_name],
            dtype=np.float32,
        )
        self.target_max = np.array(
            [self.statistics[key]["max"] for key in label_name],
            dtype=np.float32,
        )
        self.target_min = np.array(
            [self.statistics[key]["min"] for key in label_name],
            dtype=np.float32,
        )

    def __fetch_target_names(self) -> None:
        """
        Fetch the names of the targets/labels from the dataset file.
        """

        self.target_names = []
        # Loop over every column of the dataset to collect all the targets.
        for col in self.dataset.columns:
            # All input channel headers are annotated with a prefix "input:" in
            # the dataset CSV file. Find them and skip them to find the targets.
            if "input:" not in col:
                self.target_names.append(col)

    def __init__(
        self,
        dataset_path: str,
        statistic_path: str,
        filter_channels: list = [],
        filter_labels: list = [],
        normalize: bool = False,
        standardize: bool = False,
        transform: Optional[Callable] = None,
    ) -> None:
        """
        Initialization or constructor routine for the dataset.

        Args:
            dataset_path (str): Path to the dataset.csv file containing all the
                information on the dataset.
            statistic_path (str): Path to the statistics.json file containing the statistics
                of the training dataset.
            filter_channels (list): Indices of the input columns of the dataset that
                will be considered by the loader.
            filter_labels (list): Indices of the target/labels columns in the
                dataset that will be considered by the loader.
            normalize (bool): Whether to normalize inputs and targets or not on
                the fly while loading samples.
            standardize (bool): Whether or not to standardize inputs and targets
                on the fly while loading samples.
            transform (Optional[Callable]): Transformations to apply to the arrays.
        """

        self.normalize = normalize
        self.standardize = standardize
        self.transform = transform

        # Load dataset from CSV file.
        self.dataset = pd.read_csv(dataset_path)

        # Remove the input columns and labels that are to be ignored.
        self.dataset = self.dataset.iloc[:, filter_channels + filter_labels]

        # Import dataset statistics needed for standardization or normalization
        # like mean, standard deviation, minimum, maximum...
        self.__import_statistics(statistic_path)
        self.__fetch_target_names()

    def __len__(self) -> int:
        """
        Length of the dataset (number of samples).

        Returns:
            (int): Length of the dataset
        """
        return len(self.dataset)

    def __getitem__(self, index: int) -> Tuple[np.ndarray, np.ndarray]:
        """
        Read the dataset and extract the arrays and the corresponding labels.

        Args:
            index (int): Index running along the rows of the dataset CSV file.

        Returns:
            (Tuple[np.ndarray, np.ndarray]): Tuple consisting of a multi-channel 2D array
                with shape N x N x channels (where N is the number of entries
                along a row or column of the array in the .npy file) composed by stacking
                all input arrays specified in the dataset for the requested sample
                and the corresponding labels for the requested sample.
        """

        channels = []
        i = 0

        # Loop over the input columns of the dataset to gather all input
        # channels in a list so we can stack them later. We assume the columns
        # are ordered so that "input:" columns come first, followed by labels.
        for col in self.dataset.columns:
            # All input channel headers are annotated with a prefix "input:" in
            # the dataset CSV file. Find them and add them to the list.
            if "input:" in col:
                channel_filename = self.dataset.iloc[index, i]
                channel = np.array(np.load(channel_filename), dtype=np.float32)
                channels.append(channel)
            # The first column without the input prefix marks the start of the
            # labels (ground truth), so stop here; i ends up holding the index
            # of the first label column.
            else:
                break

            i += 1

        # Stack all input channels.
        matrix = np.dstack(channels)
        # Fetch all the labels from the columns after the last input channel.
        targets = np.array(self.dataset.iloc[index, i:], dtype=np.float32)

        # On-the-fly normalization of inputs and labels. Inputs are normalized
        # on a per-sample basis whilst targets are normalized using dataset-wide
        # statistics.
        if self.normalize:
            per_channel_min = np.min(matrix, axis=(0, 1), keepdims=True)
            per_channel_max = np.max(matrix, axis=(0, 1), keepdims=True)

            # Identify channels where per_channel_max equals per_channel_min,
            # i.e. every pixel in the channel has the same value. This implies
            # that no stars were detected in these simulations.
            zero_norm_mask = (per_channel_max == per_channel_min).squeeze()

            # Guard the division so the remaining channels are still
            # normalized, then zero out the degenerate channels.
            denominator = per_channel_max - per_channel_min
            denominator[denominator == 0] = 1.0
            matrix = (matrix - per_channel_min) / denominator
            matrix[:, :, zero_norm_mask] = 0

            targets = (targets - self.target_min) / (
                self.target_max - self.target_min
            )

        # On-the-fly standardization of inputs/labels. Inputs are standardized
        # on a per-sample basis whilst targets are standardized using
        # dataset-wide statistics.
        elif self.standardize:
            per_channel_std = np.std(matrix, axis=(0, 1), keepdims=True)
            per_channel_mean = np.mean(matrix, axis=(0, 1), keepdims=True)

            # A zero per_channel_std indicates that all pixels in the channel
            # have the same value. This implies that no stars were detected in
            # these simulations.
            zero_std_mask = (per_channel_std == 0).squeeze()

            # Guard the division so the remaining channels are still
            # standardized, then set the degenerate channels to -1.
            per_channel_std[per_channel_std == 0] = 1.0
            matrix = (matrix - per_channel_mean) / per_channel_std
            matrix[:, :, zero_std_mask] = -1

            targets = (targets - self.target_mean) / self.target_std

        # Apply all requested transformations to input.
        if self.transform is not None:
            matrix = self.transform(matrix)

        return matrix, targets
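
A small sketch of the per-sample, per-channel min-max normalization performed above (shapes assumed H x W x C; the constant channel mimics a simulation with no detected stars):

import numpy as np

matrix = np.random.rand(4, 4, 2).astype(np.float32)
matrix[:, :, 1] = 7.0  # a constant channel

cmin = matrix.min(axis=(0, 1), keepdims=True)
cmax = matrix.max(axis=(0, 1), keepdims=True)
mask = (cmax == cmin).squeeze()

# Guard the division for degenerate channels, then zero them out.
denom = cmax - cmin
denom[denom == 0] = 1.0
matrix = (matrix - cmin) / denom
matrix[:, :, mask] = 0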

__fetch_target_names()

Fetch the names of the targets/labels from the dataset file.

Source code in mlpoppyns/learning/loaders/loader_multichannel_array.py
def __fetch_target_names(self) -> None:
    """
    Fetch the names of the targets/labels from the dataset file.
    """

    self.target_names = []
    # Loop over every column of the dataset to collect all the targets.
    for col in self.dataset.columns:
        # All input channel headers are annotated with a prefix "input:" in
        # the dataset CSV file. Find them and skip them to find the targets.
        if "input:" not in col:
            self.target_names.append(col)

__getitem__(index)

Read the dataset and extract the arrays and the corresponding labels.

Parameters:

index (int): Index running along the rows of the dataset CSV file. Required.

Returns:

Tuple[ndarray, ndarray]: Tuple consisting of a multi-channel 2D array with shape N x N x channels (where N is the number of entries along a row or column of the array in the .npy file), composed by stacking all input arrays specified in the dataset for the requested sample, and the corresponding labels for the requested sample.

Source code in mlpoppyns/learning/loaders/loader_multichannel_array.py
def __getitem__(self, index: int) -> Tuple[np.ndarray, np.ndarray]:
    """
    Read the dataset and extract the arrays and the corresponding labels.

    Args:
        index (int): Index running along the rows of the dataset CSV file.

    Returns:
        (Tuple[np.ndarray, np.ndarray]): Tuple consisting of a multi-channel 2D array
            with shape N x N x channels (where N is the number of entries
            along a row or column of the array in the .npy file) composed by stacking
            all input arrays specified in the dataset for the requested sample
            and the corresponding labels for the requested sample.
    """

    channels = []
    i = 0

    # Loop over the input columns of the dataset to gather all input
    # channels in a list so we can stack them later. We assume the columns
    # are ordered so that "input:" columns come first, followed by labels.
    for col in self.dataset.columns:
        # All input channel headers are annotated with a prefix "input:" in
        # the dataset CSV file. Find them and add them to the list.
        if "input:" in col:
            channel_filename = self.dataset.iloc[index, i]
            channel = np.array(np.load(channel_filename), dtype=np.float32)
            channels.append(channel)
        # The first column without the input prefix marks the start of the
        # labels (ground truth), so stop here; i ends up holding the index
        # of the first label column.
        else:
            break

        i += 1

    # Stack all input channels.
    matrix = np.dstack(channels)
    # Fetch all the labels from the columns after the last input channel.
    targets = np.array(self.dataset.iloc[index, i:], dtype=np.float32)

    # On-the-fly normalization of inputs and labels. Inputs are normalized
    # on a per-sample basis whilst targets are normalized using dataset-wide
    # statistics.
    if self.normalize:
        per_channel_min = np.min(matrix, axis=(0, 1), keepdims=True)
        per_channel_max = np.max(matrix, axis=(0, 1), keepdims=True)

        # Identify channels where per_channel_max equals per_channel_min,
        # i.e. every pixel in the channel has the same value. This implies
        # that no stars were detected in these simulations.
        zero_norm_mask = (per_channel_max == per_channel_min).squeeze()

        # Guard the division so the remaining channels are still
        # normalized, then zero out the degenerate channels.
        denominator = per_channel_max - per_channel_min
        denominator[denominator == 0] = 1.0
        matrix = (matrix - per_channel_min) / denominator
        matrix[:, :, zero_norm_mask] = 0

        targets = (targets - self.target_min) / (
            self.target_max - self.target_min
        )

    # On-the-fly standardization of inputs/labels. Inputs are standardized
    # on a per-sample basis whilst targets are standardized using
    # dataset-wide statistics.
    elif self.standardize:
        per_channel_std = np.std(matrix, axis=(0, 1), keepdims=True)
        per_channel_mean = np.mean(matrix, axis=(0, 1), keepdims=True)

        # A zero per_channel_std indicates that all pixels in the channel
        # have the same value. This implies that no stars were detected in
        # these simulations.
        zero_std_mask = (per_channel_std == 0).squeeze()

        # Guard the division so the remaining channels are still
        # standardized, then set the degenerate channels to -1.
        per_channel_std[per_channel_std == 0] = 1.0
        matrix = (matrix - per_channel_mean) / per_channel_std
        matrix[:, :, zero_std_mask] = -1

        targets = (targets - self.target_mean) / self.target_std

    # Apply all requested transformations to input.
    if self.transform is not None:
        matrix = self.transform(matrix)

    return matrix, targets

__import_statistics(statistic_path)

Import dataset statistics for normalization and standardization.

This routine imports the training dataset statistics that may be needed for input/target normalization and standardization, such as the mean, standard deviation, minimum and maximum.

Parameters:

statistic_path (str): Path to the statistics.json file containing the statistics of the training dataset. Required.
Source code in mlpoppyns/learning/loaders/loader_multichannel_array.py
def __import_statistics(self, statistic_path: str) -> None:
    """
    Import dataset statistics for normalization and standardization.

    This routine imports the training dataset statistics that may be needed
    for input/target normalization and standardization, such as the mean,
    standard deviation, minimum and maximum.

    Args:
        statistic_path (str): Path to the statistics.json file containing the statistics
            of the training dataset.
    """

    # Load statistics from JSON file.
    with open(statistic_path) as read_file:
        self.statistics = json.load(read_file)

    label_name = []

    # Loop over every column of the dataset to collect all the targets.
    for col in self.dataset.columns:
        # All input channel headers are annotated with a prefix "input:" in
        # the dataset CSV file. Find and skip them to get the target names.
        if "input:" not in col:
            label_name.append(col)

    # Save the statistics for the filtered labels.
    self.target_mean = np.array(
        [self.statistics[key]["mean"] for key in label_name],
        dtype=np.float32,
    )
    self.target_std = np.array(
        [self.statistics[key]["std"] for key in label_name],
        dtype=np.float32,
    )
    self.target_max = np.array(
        [self.statistics[key]["max"] for key in label_name],
        dtype=np.float32,
    )
    self.target_min = np.array(
        [self.statistics[key]["min"] for key in label_name],
        dtype=np.float32,
    )

__init__(dataset_path, statistic_path, filter_channels=[], filter_labels=[], normalize=False, standardize=False, transform=None)

Initialization or constructor routine for the dataset.

Parameters:

dataset_path (str): Path to the dataset.csv file containing all the information on the dataset. Required.
statistic_path (str): Path to the statistics.json file containing the statistics of the training dataset. Required.
filter_channels (list): Indices of the input columns of the dataset that will be considered by the loader. Default: [].
filter_labels (list): Indices of the target/labels columns in the dataset that will be considered by the loader. Default: [].
normalize (bool): Whether to normalize inputs and targets on the fly while loading samples. Default: False.
standardize (bool): Whether to standardize inputs and targets on the fly while loading samples. Default: False.
transform (Optional[Callable]): Transformations to apply to the arrays. Default: None.
Source code in mlpoppyns/learning/loaders/loader_multichannel_array.py
def __init__(
    self,
    dataset_path: str,
    statistic_path: str,
    filter_channels: list = [],
    filter_labels: list = [],
    normalize: bool = False,
    standardize: bool = False,
    transform: Optional[Callable] = None,
) -> None:
    """
    Initialization or constructor routine for the dataset.

    Args:
        dataset_path (str): Path to the dataset.csv file containing all the
            information on the dataset.
        statistic_path (str): Path to the statistics.json file containing the statistics
            of the training dataset.
        filter_channels (list): Indices of the input columns of the dataset that
            will be considered by the loader.
        filter_labels (list): Indices of the target/labels columns in the
            dataset that will be considered by the loader.
        normalize (bool): Whether to normalize inputs and targets or not on
            the fly while loading samples.
        standardize (bool): Whether or not to standardize inputs and targets
            on the fly while loading samples.
        transform (Optional[Callable]): Transformations to apply to the arrays.
    """

    self.normalize = normalize
    self.standardize = standardize
    self.transform = transform

    # Load dataset from CSV file.
    self.dataset = pd.read_csv(dataset_path)

    # Remove the input columns and labels that are to be ignored.
    self.dataset = self.dataset.iloc[:, filter_channels + filter_labels]

    # Import dataset statistics needed for standardization or normalization
    # like mean, standard deviation, minimum, maximum...
    self.__import_statistics(statistic_path)
    self.__fetch_target_names()

__len__()

Length of the dataset (number of samples).

Returns:

int: Length of the dataset.

Source code in mlpoppyns/learning/loaders/loader_multichannel_array.py
def __len__(self) -> int:
    """
    Length of the dataset (number of samples).

    Returns:
        (int): Length of the dataset
    """
    return len(self.dataset)

LoaderMultichannelArray

Bases: LoaderBase

Source code in mlpoppyns/learning/loaders/loader_multichannel_array.py
class LoaderMultichannelArray(LoaderBase):
    def __init__(
        self,
        dataset_path: str,
        statistic_path: str,
        batch_size: int,
        filter_inputs: list,
        filter_labels: list,
        num_workers: int = 1,
        shuffle: bool = False,
        normalize: bool = False,
        standardize: bool = False,
    ) -> None:
        """
        Data loader for a multi-channel array-based dataset. The dataset is
        expected to be packed in a dataset.csv file and contain paths to .npy
        files to be loaded.

        Args:
            dataset_path (str): Path to the dataset.csv file.
            statistic_path (str): Path to the statistics.json file with the
                training dataset statistics.
            batch_size (int): Number of samples per batch.
            filter_inputs (list): Indices of columns in the dataset to consider.
            filter_labels (list): Indices of columns with labels to consider.
            num_workers (int): Workers to load the data.
            shuffle (bool): Shuffle the samples or not.
            normalize (bool): Whether to normalize inputs and targets or not.
            standardize (bool): Whether or not to standardize inputs and targets.
        """

        transformation = torchvision.transforms.ToTensor()

        self.dataset_path = dataset_path
        self.statistic_path = statistic_path
        self.filter_inputs = filter_inputs
        self.filter_labels = filter_labels
        self.normalize = normalize
        self.standardize = standardize

        self.dataset = DatasetMultichannelArray(
            self.dataset_path,
            self.statistic_path,
            self.filter_inputs,
            self.filter_labels,
            self.normalize,
            self.standardize,
            transform=transformation,
        )

        self.target_mean = self.dataset.target_mean
        self.target_std = self.dataset.target_std
        self.target_max = self.dataset.target_max
        self.target_min = self.dataset.target_min
        self.target_names = self.dataset.target_names

        super().__init__(self.dataset, batch_size, num_workers, shuffle)

__init__(dataset_path, statistic_path, batch_size, filter_inputs, filter_labels, num_workers=1, shuffle=False, normalize=False, standardize=False)

Data loader for a multi-channel array-based dataset. The dataset is expected to be packed in a dataset.csv file and contain paths to .npy files to be loaded.

Parameters:

dataset_path (str): Path to the dataset. Required.
statistic_path (str): Path to the statistics.json file with the training dataset statistics. Required.
batch_size (int): Number of samples per batch. Required.
filter_inputs (list): Indices of columns in the dataset to consider. Required.
filter_labels (list): Indices of columns with labels to consider. Required.
num_workers (int): Workers to load the data. Default: 1.
shuffle (bool): Whether to shuffle the samples. Default: False.
normalize (bool): Whether to normalize inputs and targets. Default: False.
standardize (bool): Whether to standardize inputs and targets. Default: False.
Source code in mlpoppyns/learning/loaders/loader_multichannel_array.py
def __init__(
    self,
    dataset_path: str,
    statistic_path: str,
    batch_size: int,
    filter_inputs: list,
    filter_labels: list,
    num_workers: int = 1,
    shuffle: bool = False,
    normalize: bool = False,
    standardize: bool = False,
) -> None:
    """
    Data loader for a multi-channel array-based dataset. The dataset is
    expected to be packed in a dataset.csv file and contain paths to .npy
    files to be loaded.

    Args:
        dataset_path (str): Path to the dataset.csv file.
        statistic_path (str): Path to the statistics.json file with the
            training dataset statistics.
        batch_size (int): Number of samples per batch.
        filter_inputs (list): Indices of columns in the dataset to consider.
        filter_labels (list): Indices of columns with labels to consider.
        num_workers (int): Workers to load the data.
        shuffle (bool): Shuffle the samples or not.
        normalize (bool): Whether to normalize inputs and targets or not.
        standardize (bool): Whether or not to standardize inputs and targets.
    """

    transformation = torchvision.transforms.ToTensor()

    self.dataset_path = dataset_path
    self.statistic_path = statistic_path
    self.filter_inputs = filter_inputs
    self.filter_labels = filter_labels
    self.normalize = normalize
    self.standardize = standardize

    self.dataset = DatasetMultichannelArray(
        self.dataset_path,
        self.statistic_path,
        self.filter_inputs,
        self.filter_labels,
        self.normalize,
        self.standardize,
        transform=transformation,
    )

    self.target_mean = self.dataset.target_mean
    self.target_std = self.dataset.target_std
    self.target_max = self.dataset.target_max
    self.target_min = self.dataset.target_min
    self.target_names = self.dataset.target_names

    super().__init__(self.dataset, batch_size, num_workers, shuffle)
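
A hypothetical instantiation (paths and column indices below are illustrative):

from mlpoppyns.learning.loaders.loader_multichannel_array import (
    LoaderMultichannelArray,
)

loader = LoaderMultichannelArray(
    dataset_path="dataset.csv",
    statistic_path="statistics.json",
    batch_size=4,
    filter_inputs=[0, 1],  # keep the first two "input:" columns
    filter_labels=[2, 3],  # keep two target columns
    shuffle=True,
    standardize=True,
)

for matrices, targets in loader:
    # matrices: (4, C, N, N) tensors after ToTensor; targets: (4, 2)
    pass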

mlpoppyns.learning.loaders.loader_multichannel_image

Loader for multichannel 2D maps.

This loader creates a multichannel 2D image for each sample in the dataset by stacking together different 2D density maps.

These images can be fed to the neural network as input together with the corresponding label values (ground truth).

Authors:

Michele Ronchi (ronchi@ice.csic.es)
Alberto Garcia Garcia (garciagarcia@ice.csic.es)

DatasetMultichannelImage

Dataset for a multichannel image input.

Source code in mlpoppyns/learning/loaders/loader_multichannel_image.py
class DatasetMultichannelImage:
    """
    Dataset for a multichannel image input.
    """

    def __import_statistics(self, statistic_path: str) -> None:
        """
        Import dataset statistics for normalization and standardization.

        This routine imports the training dataset statistics that may be needed
        for target normalization and standardization, such as the mean,
        standard deviation, minimum and maximum.

        Args:
            statistic_path (str): Path to the statistics.json file containing the statistics
                of the training dataset.
        """

        # Load statistics from JSON file.
        with open(statistic_path) as read_file:
            self.statistics = json.load(read_file)

        label_name = []

        # Loop over every column of the dataset to collect all the targets.
        for col in self.dataset.columns:
            # All input channel headers are annotated with a prefix "input:" in
            # the dataset CSV file. Find and skip them to get the target names.
            if "input:" not in col:
                label_name.append(col)

        # Save the statistics for the filtered labels.
        self.target_mean = np.array(
            [self.statistics[key]["mean"] for key in label_name],
            dtype=np.float32,
        )
        self.target_std = np.array(
            [self.statistics[key]["std"] for key in label_name],
            dtype=np.float32,
        )
        self.target_max = np.array(
            [self.statistics[key]["max"] for key in label_name],
            dtype=np.float32,
        )
        self.target_min = np.array(
            [self.statistics[key]["min"] for key in label_name],
            dtype=np.float32,
        )

    def __fetch_target_names(self) -> None:
        """
        Fetch the names of the targets/labels from the dataset file.
        """

        self.target_names = []
        # Loop over every column of the dataset to collect all the targets.
        for col in self.dataset.columns:
            # All input channel headers are annotated with a prefix "input:" in
            # the dataset CSV file. Find them and skip them to find the targets.
            if "input:" not in col:
                self.target_names.append(col)

    def __init__(
        self,
        dataset_path: str,
        statistic_path: str,
        filter_channels: list = [],
        filter_labels: list = [],
        normalize: bool = False,
        standardize: bool = False,
        transform: Optional[Callable] = None,
    ) -> None:
        """
        Initialization or constructor function for the dataset.

        Args:
            dataset_path (str): Path to the dataset.csv file containing all the
                information on the dataset.
            statistic_path (str): Path to the statistics.json file containing the statistics
                of the training dataset.
            filter_channels (list): Indices of the input columns of the dataset that
                will be considered by the loader.
            filter_labels (list): Indices of the target/labels columns in the
                dataset that will be considered by the loader.
            normalize (bool): Whether to normalize targets or not on
                the fly while loading samples.
            standardize (bool): Whether or not to standardize targets
                on the fly while loading samples.
            transform (Optional[Callable]): Transformations to apply to the images.
        """

        self.normalize = normalize
        self.standardize = standardize
        self.transform = transform

        # Load dataset from CSV file.
        self.dataset = pd.read_csv(dataset_path)

        # Remove the input columns and labels that are to be ignored.
        self.dataset = self.dataset.iloc[:, filter_channels + filter_labels]

        # Import dataset statistics needed for standardization or normalization
        # like mean, standard deviation, minimum, maximum...
        self.__import_statistics(statistic_path)
        self.__fetch_target_names()

    def __len__(self) -> int:
        """
        Length of the dataset (number of samples).

        Returns:
            (int): Length of the dataset
        """
        return len(self.dataset)

    def __getitem__(self, index: int) -> Tuple[np.ndarray, np.ndarray]:
        """
        Read the dataset and extract the images and the corresponding labels.

        Args:
            index (int): Index running along the rows of the dataset.csv file.

        Returns:
            (Tuple[np.ndarray, np.ndarray]): Tuple consisting of a multi-channel 2D image
                with shape N x N x channels (where N is the number of pixels
                along a row or column of the .png file) composed by stacking
                all input images specified in the dataset for the requested sample
                and the corresponding labels for the requested sample.
        """

        channels = []
        i = 0

        # Loop over every input column of the dataset to collect all input
        # channels in a list so we can stack them later. We assume the columns
        # are ordered so that "input:" columns come first, followed by labels.
        for col in self.dataset.columns:
            # All input channel headers are annotated with a prefix "input:" in the
            # dataset CSV file. Find them and add them to the list.
            if "input:" in col:
                channel_filename = self.dataset.iloc[index, i]
                channel = np.array(Image.open(channel_filename))[:, :, 0]
                channels.append(channel)
            # The first column without the input prefix marks the start of the
            # labels (ground truth), so stop here; i ends up holding the index
            # of the first label column.
            else:
                break

            i += 1

        # Stack all input channels.
        image = np.dstack(channels)
        # Fetch all the labels from the columns after the last input channel.
        labels = np.array(self.dataset.iloc[index, i:], dtype=np.float32)

        # On-the-fly normalization of the labels using dataset-wide
        # statistics. The input images are left untouched here.
        if self.normalize:
            labels = (labels - self.target_min) / (
                self.target_max - self.target_min
            )

        # On-the-fly standardization of the labels using dataset-wide
        # statistics.
        elif self.standardize:
            labels = (labels - self.target_mean) / self.target_std

        if self.transform is not None:
            image = self.transform(image)

        return image, labels

__fetch_target_names()

Fetch the names of the targets/labels from the dataset file.

Source code in mlpoppyns/learning/loaders/loader_multichannel_image.py
def __fetch_target_names(self) -> None:
    """
    Fetch the names of the targets/labels from the dataset file.
    """

    self.target_names = []
    # Loop over every column of the dataset to collect all the targets.
    for col in self.dataset.columns:
        # All input channel headers are annotated with a prefix "input:" in
        # the dataset CSV file. Find them and skip them to find the targets.
        if "input:" not in col:
            self.target_names.append(col)

__getitem__(index)

Read the dataset and extract the images and the corresponding labels.

Parameters:

index (int): Index running along the rows of the dataset.csv file. Required.

Returns:

Tuple[ndarray, ndarray]: Tuple consisting of a multi-channel 2D image with shape N x N x channels (where N is the number of pixels along a row or column of the .png file), composed by stacking all input images specified in the dataset for the requested sample, and the corresponding labels for the requested sample.

Source code in mlpoppyns/learning/loaders/loader_multichannel_image.py
def __getitem__(self, index: int) -> Tuple[np.ndarray, np.ndarray]:
    """
    Read the dataset and extract the images and the corresponding labels.

    Args:
        index (int): Index running along the rows of the dataset.csv file.

    Returns:
        (Tuple[np.ndarray, np.ndarray]): Tuple consisting of a multi-channel 2D image
            with shape N x N x channels (where N is the number of pixels
            along a row or column of the .png file) composed by stacking
            all input images specified in the dataset for the requested sample
            and the corresponding labels for the requested sample.
    """

    channels = []
    i = 0

    # Loop over every input column of the dataset to collect all input
    # channels in a list so we can stack them later. We assume the columns
    # are ordered so that "input:" columns come first, followed by labels.
    for col in self.dataset.columns:
        # All input channel headers are annotated with a prefix "input:" in the
        # dataset CSV file. Find them and add them to the list.
        if "input:" in col:
            channel_filename = self.dataset.iloc[index, i]
            channel = np.array(Image.open(channel_filename))[:, :, 0]
            channels.append(channel)
        # The first column without the input prefix marks the start of the
        # labels (ground truth), so stop here; i ends up holding the index
        # of the first label column.
        else:
            break

        i += 1

    # Stack all input channels.
    image = np.dstack(channels)
    # Fetch all the labels from the columns after the last input channel.
    labels = np.array(self.dataset.iloc[index, i:], dtype=np.float32)

    # On-the-fly normalization of the labels using dataset-wide
    # statistics. The input images are left untouched here.
    if self.normalize:
        labels = (labels - self.target_min) / (
            self.target_max - self.target_min
        )

    # On-the-fly standardization of the labels using dataset-wide
    # statistics.
    elif self.standardize:
        labels = (labels - self.target_mean) / self.target_std

    if self.transform is not None:
        image = self.transform(image)

    return image, labels

__import_statistics(statistic_path)

Import dataset statistics for normalization and standardization.

This routine imports the training dataset statistics that may be needed for target normalization and standardization, such as the mean, standard deviation, minimum and maximum.

Parameters:

statistic_path (str): Path to the statistics.json file containing the statistics of the training dataset. Required.
Source code in mlpoppyns/learning/loaders/loader_multichannel_image.py
def __import_statistics(self, statistic_path: str) -> None:
    """
    Import dataset statistics for normalization and standardization.

    This routine imports the training dataset statistics that may be needed
    for target normalization and standardization, such as the mean,
    standard deviation, minimum and maximum.

    Args:
        statistic_path (str): Path to the statistics.json file containing the statistics
            of the training dataset.
    """

    # Load statistics from JSON file.
    with open(statistic_path) as read_file:
        self.statistics = json.load(read_file)

    label_name = []

    # Loop over every column of the dataset to collect all the targets.
    for col in self.dataset.columns:
        # All input channel headers are annotated with a prefix "input:" in
        # the dataset CSV file. Find and skip them to get the target names.
        if "input:" not in col:
            label_name.append(col)

    # Save the statistics for the filtered labels.
    self.target_mean = np.array(
        [self.statistics[key]["mean"] for key in label_name],
        dtype=np.float32,
    )
    self.target_std = np.array(
        [self.statistics[key]["std"] for key in label_name],
        dtype=np.float32,
    )
    self.target_max = np.array(
        [self.statistics[key]["max"] for key in label_name],
        dtype=np.float32,
    )
    self.target_min = np.array(
        [self.statistics[key]["min"] for key in label_name],
        dtype=np.float32,
    )

__init__(dataset_path, statistic_path, filter_channels=[], filter_labels=[], normalize=False, standardize=False, transform=None)

Initialization or constructor function for the dataset.

Parameters:

dataset_path (str): Path to the dataset.csv file containing all the information on the dataset. Required.
statistic_path (str): Path to the statistics.json file containing the statistics of the training dataset. Required.
filter_channels (list): Indices of the input columns of the dataset that will be considered by the loader. Default: [].
filter_labels (list): Indices of the target/labels columns in the dataset that will be considered by the loader. Default: [].
normalize (bool): Whether to normalize targets on the fly while loading samples. Default: False.
standardize (bool): Whether to standardize targets on the fly while loading samples. Default: False.
transform (Optional[Callable]): Transformations to apply to the images. Default: None.
Source code in mlpoppyns/learning/loaders/loader_multichannel_image.py
def __init__(
    self,
    dataset_path: str,
    statistic_path: str,
    filter_channels: list = [],
    filter_labels: list = [],
    normalize: bool = False,
    standardize: bool = False,
    transform: Optional[Callable] = None,
) -> None:
    """
    Initialization or constructor function for the dataset.

    Args:
        dataset_path (str): Path to the dataset.csv file containing all the
            information on the dataset.
        statistic_path (str): Path to the statistics.json file containing the statistics
            of the training dataset.
        filter_channels (list): Indices of the input columns of the dataset that
            will be considered by the loader.
        filter_labels (list): Indices of the target/labels columns in the
            dataset that will be considered by the loader.
        normalize (bool): Whether to normalize targets or not on
            the fly while loading samples.
        standardize (bool): Whether or not to standardize targets
            on the fly while loading samples.
        transform (Optional[Callable]): Transformations to apply to the images.
    """

    self.normalize = normalize
    self.standardize = standardize
    self.transform = transform

    # Load dataset from CSV file.
    self.dataset = pd.read_csv(dataset_path)

    # Remove the input columns and labels that are to be ignored.
    self.dataset = self.dataset.iloc[:, filter_channels + filter_labels]

    # Import dataset statistics needed for standardization or normalization
    # like mean, standard deviation, minimum, maximum...
    self.__import_statistics(statistic_path)
    self.__fetch_target_names()

__len__()

Length of the dataset (number of samples).

Returns:

int: Length of the dataset.

Source code in mlpoppyns/learning/loaders/loader_multichannel_image.py
def __len__(self) -> int:
    """
    Length of the dataset (number of samples).

    Returns:
        (int): Length of the dataset
    """
    return len(self.dataset)

LoaderMultichannelImage

Bases: LoaderBase

Source code in mlpoppyns/learning/loaders/loader_multichannel_image.py
class LoaderMultichannelImage(LoaderBase):
    def __init__(
        self,
        dataset_path: str,
        statistic_path: str,
        batch_size: int,
        filter_inputs: list,
        filter_labels: list,
        num_workers: int = 1,
        shuffle: bool = False,
        normalize: bool = False,
        standardize: bool = False,
    ) -> None:
        """
        Data loader for the density maps dataset. The dataset is expected to
        be packed in a dataset.csv file.

        Args:
            dataset_path (str): Path to the dataset.csv file.
            statistic_path (str): Path to the statistics.json file with the
                training dataset statistics.
            batch_size (int): Number of samples per batch.
            filter_inputs (list): Indices of the input columns of the dataset that
                will be considered by the loader.
            filter_labels (list): Indices of the target/labels columns in the
                dataset that will be considered by the loader.
            num_workers (int): Workers to load the data.
            shuffle (bool): Shuffle the samples or not.
            normalize (bool): Whether to normalize targets or not.
            standardize (bool): Whether or not to standardize targets.
        """

        transformation = torchvision.transforms.ToTensor()

        self.dataset_path = dataset_path
        self.statistic_path = statistic_path
        self.filter_inputs = filter_inputs
        self.filter_labels = filter_labels
        self.normalize = normalize
        self.standardize = standardize

        self.dataset = DatasetMultichannelImage(
            self.dataset_path,
            self.statistic_path,
            self.filter_inputs,
            self.filter_labels,
            self.normalize,
            self.standardize,
            transform=transformation,
        )

        self.target_mean = self.dataset.target_mean
        self.target_std = self.dataset.target_std
        self.target_max = self.dataset.target_max
        self.target_min = self.dataset.target_min
        self.target_names = self.dataset.target_names

        super().__init__(self.dataset, batch_size, num_workers, shuffle)
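
Since the loader exposes the target statistics, predictions made on normalized or standardized labels can be mapped back to physical units. A sketch of the inverse transforms (only the attribute names come from the loader; everything else is illustrative):

import numpy as np

def denormalize(pred: np.ndarray, t_min: np.ndarray, t_max: np.ndarray) -> np.ndarray:
    # Inverse of (labels - min) / (max - min).
    return pred * (t_max - t_min) + t_min

def destandardize(pred: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    # Inverse of (labels - mean) / std.
    return pred * std + mean

# e.g. destandardize(model_output, loader.target_mean, loader.target_std)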

__init__(dataset_path, statistic_path, batch_size, filter_inputs, filter_labels, num_workers=1, shuffle=False, normalize=False, standardize=False)

Data loader for the density maps dataset. The dataset is expected to be packed in a dataset.csv file.

Parameters:

dataset_path (str): Path to the dataset. Required.
statistic_path (str): Path to the statistics.json file with the training dataset statistics. Required.
batch_size (int): Number of samples per batch. Required.
filter_inputs (list): Indices of the input columns of the dataset that will be considered by the loader. Required.
filter_labels (list): Indices of the target/labels columns in the dataset that will be considered by the loader. Required.
num_workers (int): Workers to load the data. Default: 1.
shuffle (bool): Whether to shuffle the samples. Default: False.
normalize (bool): Whether to normalize targets. Default: False.
standardize (bool): Whether to standardize targets. Default: False.
Source code in mlpoppyns/learning/loaders/loader_multichannel_image.py
def __init__(
    self,
    dataset_path: str,
    statistic_path: str,
    batch_size: int,
    filter_inputs: list,
    filter_labels: list,
    num_workers: int = 1,
    shuffle: bool = False,
    normalize: bool = False,
    standardize: bool = False,
) -> None:
    """
    Data loader for the density maps dataset. The dataset is expected to
    be packed in a dataset.csv file.

    Args:
        dataset_path (str): Path to the dataset.csv file.
        statistic_path (str): Path to the statistics.json file with the
            training dataset statistics.
        batch_size (int): Number of samples per batch.
        filter_inputs (list): Indices of the input columns of the dataset that
            will be considered by the loader.
        filter_labels (list): Indices of the target/labels columns in the
            dataset that will be considered by the loader.
        num_workers (int): Workers to load the data.
        shuffle (bool): Shuffle the samples or not.
        normalize (bool): Whether to normalize targets or not.
        standardize (bool): Whether or not to standardize targets.
    """

    transformation = torchvision.transforms.ToTensor()

    self.dataset_path = dataset_path
    self.statistic_path = statistic_path
    self.filter_inputs = filter_inputs
    self.filter_labels = filter_labels
    self.normalize = normalize
    self.standardize = standardize

    self.dataset = DatasetMultichannelImage(
        self.dataset_path,
        self.statistic_path,
        self.filter_inputs,
        self.filter_labels,
        self.normalize,
        self.standardize,
        transform=transformation,
    )

    self.target_mean = self.dataset.target_mean
    self.target_std = self.dataset.target_std
    self.target_max = self.dataset.target_max
    self.target_min = self.dataset.target_min
    self.target_names = self.dataset.target_names

    super().__init__(self.dataset, batch_size, num_workers, shuffle)
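
A hypothetical instantiation (paths and indices below are illustrative); the "input:" columns of dataset.csv are expected to point to PNG density maps, of which only the first channel is read:

from mlpoppyns.learning.loaders.loader_multichannel_image import (
    LoaderMultichannelImage,
)

loader = LoaderMultichannelImage(
    dataset_path="dataset.csv",
    statistic_path="statistics.json",
    batch_size=8,
    filter_inputs=[0],
    filter_labels=[1, 2],
    normalize=True,
)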

mlpoppyns.learning.loaders.loader_rgb_image

Loader for RGB density map images.

Authors:

Michele Ronchi (ronchi@ice.csic.es)
Alberto Garcia Garcia (garciagarcia@ice.csic.es)

DatasetRGBImage

Load the image dataset and its labels.

Source code in mlpoppyns/learning/loaders/loader_rgb_image.py
class DatasetRGBImage:
    """
    Load the image dataset and its labels.
    """

    def __init__(
        self, file_path: str, transform: Optional[Callable] = None
    ) -> None:
        """
        Load the images and labels dataset.

        Args:
            file_path (str): Path to the dataset.csv file containing all the
                information on the dataset.
            transform (Optional[Callable]): Transformation to apply to the images.
        """
        self.dataset = pd.read_csv(file_path)
        self.transform = transform

    def __len__(self) -> int:
        """
        Length of the dataset (number of samples).

        Returns:
            (int): Length of the dataset
        """
        return len(self.dataset)

    def __getitem__(self, index: int) -> Tuple[np.ndarray, np.ndarray]:
        """
        Read the dataset and extract the images and the corresponding labels.

        Args:
            index (int): Index running along the rows of the dataset.csv file.

        Returns:
            (Tuple[np.ndarray, np.ndarray]): Tuple composed of a multidimensional matrix for the image, of
                shape N x N x 3 (where N is the number of pixels along a row or column of
                the .png file), and an array of labels for the image.
        """
        image_name = self.dataset.iloc[index, 0]

        image = np.array(Image.open(image_name))[:, :, 0:3]

        labels = np.array(self.dataset.iloc[index, 1:], dtype=np.float32)

        if self.transform is not None:
            image = self.transform(image)

        return image, labels

__getitem__(index)

Read the dataset and extract the images and the corresponding labels.

Parameters:

index (int): Index running along the rows of the dataset.csv file. Required.

Returns:

Tuple[ndarray, ndarray]: Tuple composed of a multidimensional matrix for the image, of shape N x N x 3 (where N is the number of pixels along a row or column of the .png file), and an array of labels for the image.

Source code in mlpoppyns/learning/loaders/loader_rgb_image.py
def __getitem__(self, index: int) -> Tuple[np.ndarray, np.ndarray]:
    """
    Read the dataset and extract the images and the corresponding labels.

    Args:
        index (int): Index running along the rows of the dataset.csv file.

    Returns:
        (Tuple[np.ndarray, np.ndarray]): Tuple composed of a multidimensional matrix for the image, of
            shape N x N x 3 (where N is the number of pixels along a row or column of
            the .png file), and an array of labels for the image.
    """
    image_name = self.dataset.iloc[index, 0]

    image = np.array(Image.open(image_name))[:, :, 0:3]

    labels = np.array(self.dataset.iloc[index, 1:], dtype=np.float32)

    if self.transform is not None:
        image = self.transform(image)

    return image, labels

__init__(file_path, transform=None)

Load the images and labels dataset.

Parameters:

file_path (str): Path to the dataset.csv file containing all the information on the dataset. Required.
transform (Optional[Callable]): Transformation to apply to the images. Default: None.
Source code in mlpoppyns/learning/loaders/loader_rgb_image.py
def __init__(
    self, file_path: str, transform: Optional[Callable] = None
) -> None:
    """
    Load the images and labels dataset.

    Args:
        file_path (str): Path to the dataset.csv file containing all the
            information on the dataset.
        transform (Optional[Callable]): Transformation to apply to the images.
    """
    self.dataset = pd.read_csv(file_path)
    self.transform = transform

__len__()

Length of the dataset (number of samples).

Returns:

int: Length of the dataset.

Source code in mlpoppyns/learning/loaders/loader_rgb_image.py
def __len__(self) -> int:
    """
    Length of the dataset (number of samples).

    Returns:
        (int): Length of the dataset
    """
    return len(self.dataset)

LoaderRGBImage

Bases: LoaderBase

Source code in mlpoppyns/learning/loaders/loader_rgb_image.py
class LoaderRGBImage(LoaderBase):
    def __init__(
        self,
        data_path: str,
        batch_size: int,
        ignored_inputs: list = [],
        num_workers: int = 1,
        shuffle: bool = False,
    ) -> None:
        """
        Data loader for the RGB density maps dataset. The dataset is expected
        to be packed in a dataset.csv file.

        Args:
            data_path (str): Path to the dataset.
            batch_size (int): Number of samples per batch.
            ignored_inputs (list): Indices of columns in the dataset to ignore.
            num_workers (int): Workers to load the data.
            shuffle (bool): Shuffle the samples or not.
        """

        transformation = torchvision.transforms.ToTensor()

        self.data_path = data_path
        # No possibility to ignore inputs is given for this dataset. The
        # parameter is kept only for interface compatibility.
        self.ignored_inputs = ignored_inputs
        self.dataset = DatasetRGBImage(
            self.data_path, transform=transformation
        )

        super().__init__(self.dataset, batch_size, num_workers, shuffle)

__init__(data_path, batch_size, ignored_inputs=[], num_workers=1, shuffle=False)

Data loader for the RGB density maps dataset. The dataset is expected to be packed in a dataset.csv file.

Parameters:

data_path (str): Path to the dataset. Required.
batch_size (int): Number of samples per batch. Required.
ignored_inputs (list): Indices of columns in the dataset to ignore. Default: [].
num_workers (int): Workers to load the data. Default: 1.
shuffle (bool): Whether to shuffle the samples. Default: False.
Source code in mlpoppyns/learning/loaders/loader_rgb_image.py
def __init__(
    self,
    data_path: str,
    batch_size: int,
    ignored_inputs: list = [],
    num_workers: int = 1,
    shuffle: bool = False,
) -> None:
    """
    Data loader for the RGB density maps dataset. The dataset is expected
    to be packed in a dataset.csv file.

    Args:
        data_path (str): Path to the dataset.
        batch_size (int): Number of samples per batch.
        ignored_inputs (list): Indices of columns in the dataset to ignore.
        num_workers (int): Workers to load the data.
        shuffle (bool): Shuffle the samples or not.
    """

    transformation = torchvision.transforms.ToTensor()

    self.data_path = data_path
    # No possibility to ignore inputs is given for this dataset. The
    # parameter is kept only for interface compatibility.
    self.ignored_inputs = ignored_inputs
    self.dataset = DatasetRGBImage(
        self.data_path, transform=transformation
    )

    super().__init__(self.dataset, batch_size, num_workers, shuffle)
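
A hypothetical usage sketch (the path is illustrative): column 0 of dataset.csv holds PNG paths and the remaining columns hold the labels.

from mlpoppyns.learning.loaders.loader_rgb_image import LoaderRGBImage

loader = LoaderRGBImage("dataset.csv", batch_size=16, shuffle=True)

images, labels = next(iter(loader))
# images: (16, 3, N, N) float tensors after ToTensor; labels: (16, n_labels)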

mlpoppyns.learning.loaders.loaders

Loaders.

This is just an empty module that gathers all the available loaders.

Authors:

Alberto Garcia Garcia (garciagarcia@ice.csic.es)