pharmpy.data.iterators module
data.iterators
Iterators that generate new datasets from a dataset. The dataset can either stand alone or be connected to a model. If a model is used, the same model is updated with a different dataset on each iteration.
Currently contains:
Omit - Can be used for case deletion diagnostics (cdd)
Resample - Can be used for bootstrap
- class pharmpy.data.iterators.DatasetIterator(iterations, name_pattern='dataset_{}')
Bases: object
Base class for iterator classes that generate new datasets from an input dataset.
The __next__ method can return either a DataFrame or a tuple whose first element is the main DataFrame.
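To make the iterator contract concrete, here is a minimal sketch of how such a base class could work. It is not pharmpy's implementation: the class names `SketchDatasetIterator` and `Copies`, the helper `_next_name`, and the use of `DataFrame.attrs` to carry the dataset name are all assumptions made for illustration.

```python
import pandas as pd


class SketchDatasetIterator:
    """Hypothetical stand-in for a dataset iterator base class."""

    def __init__(self, iterations, name_pattern="dataset_{}"):
        self._iterations = iterations
        self._name_pattern = name_pattern
        self._counter = 1  # generated datasets are numbered from 1

    def _next_name(self):
        """Return the next dataset name, or stop when all iterations are done."""
        if self._counter > self._iterations:
            raise StopIteration
        name = self._name_pattern.format(self._counter)
        self._counter += 1
        return name

    def __iter__(self):
        return self

    def __next__(self):
        raise NotImplementedError  # subclasses build the actual dataset


class Copies(SketchDatasetIterator):
    """Toy subclass: yields a named copy of the input on every iteration."""

    def __init__(self, df, iterations):
        super().__init__(iterations)
        self._df = df

    def __next__(self):
        name = self._next_name()
        new_df = self._df.copy()
        new_df.attrs["name"] = name  # assumption: name stored in attrs
        return new_df


df = pd.DataFrame({"ID": [1, 1, 2], "DV": [0.1, 0.2, 0.3]})
names = [d.attrs["name"] for d in Copies(df, iterations=3)]
```

A subclass only has to implement `__next__`; the numbering and the stop condition live in the base class.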
- class pharmpy.data.iterators.Omit(dataset_or_model, group, name_pattern='omitted_{}')
Bases: pharmpy.data.iterators.DatasetIterator
Iterate over omissions of a certain group in a dataset. One group is omitted at a time.
- Parameters
dataset_or_model – DataFrame to iterate over or a model from which to use the dataset
group (colname) – Name of the column to use for grouping
name_pattern – Name to use for generated datasets. A number starting from 1 will be put in the placeholder.
- Returns
Tuple of DataFrame and the omitted group
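The omit-one-group-at-a-time idea can be sketched with plain pandas. This is an illustration, not the library's code: the generator `omit_groups` is a hypothetical helper, the groups are assumed to be visited in order of first appearance, and the dataset name is kept in `DataFrame.attrs` as an assumption.

```python
import pandas as pd


def omit_groups(df, group, name_pattern="omitted_{}"):
    """Yield (DataFrame without one group, omitted group value) per group."""
    for i, value in enumerate(df[group].unique(), start=1):
        # Keep every row that does not belong to the omitted group.
        new_df = df[df[group] != value].copy()
        new_df.attrs["name"] = name_pattern.format(i)
        yield new_df, value


df = pd.DataFrame({"ID": [1, 1, 2, 2, 3], "DV": [0.1, 0.2, 0.3, 0.4, 0.5]})
results = list(omit_groups(df, "ID"))
```

With three distinct `ID` values, the sketch produces three datasets, each missing exactly one individual, which is the shape of input a case deletion diagnostic needs.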
- class pharmpy.data.iterators.Resample(dataset_or_model, group, resamples=1, stratify=None, sample_size=None, replace=False, name_pattern='resample_{}', name=None)
Bases: pharmpy.data.iterators.DatasetIterator
Iterate over resamples of a dataset.
The dataset is grouped on the group column, and groups are then selected randomly, with or without replacement, to form a new dataset. The groups are renumbered from 1 and upwards to keep them separated in the new dataset.
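Stratification will make sure that groups are drawn separately from each stratum, by default in proportion to the strata in the dataset.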
- Parameters
dataset_or_model – DataFrame to iterate over or a model from which to use the dataset
group (colname) – Name of column to group by
resamples (int) – Number of resamples (iterations) to make
stratify (colname) – Name of column to use for stratification. The values in the stratification column must be equal within a group so that the group can be uniquely determined. A ValueError exception will be raised otherwise.
sample_size (int) – The number of groups to sample. The default is the number of groups. With stratification, the default is to sample in proportion to the strata in the dataset. A dictionary of specific sample sizes for each stratum can also be supplied.
replace (bool) – A boolean controlling whether sampling should be done with or without replacement
name_pattern – Name to use for generated datasets. A number starting from 1 will be put in the placeholder.
- Returns
A tuple of a resampled DataFrame and a list of resampled groups in order
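The resampling steps described above (group, sample, renumber, optionally stratify) can be sketched as follows. This is a simplified illustration and not pharmpy's implementation: `resample_once` is a hypothetical helper, it always samples as many groups as the input contains (so each stratum keeps its original size), and the `rng` argument is an assumption added to make runs reproducible.

```python
import numpy as np
import pandas as pd


def resample_once(df, group, stratify=None, replace=False, rng=None):
    """Return (resampled DataFrame, original group values in sampled order)."""
    rng = np.random.default_rng() if rng is None else rng

    if stratify is None:
        strata = [df[group].unique()]
    else:
        # Each group must map to exactly one stratum value.
        per_group = df.groupby(group)[stratify].nunique()
        if (per_group != 1).any():
            raise ValueError("stratify column varies within a group")
        first_rows = df.drop_duplicates(subset=group)
        strata = [s[group].to_numpy() for _, s in first_rows.groupby(stratify)]

    # Sample group values stratum by stratum, keeping each stratum's size.
    chosen = []
    for groups in strata:
        picked = rng.choice(groups, size=len(groups), replace=replace)
        chosen.extend(picked.tolist())

    # Renumber from 1 so a group drawn twice becomes two distinct groups.
    parts = []
    for new_id, old_id in enumerate(chosen, start=1):
        part = df[df[group] == old_id].copy()
        part[group] = new_id
        parts.append(part)
    return pd.concat(parts, ignore_index=True), chosen


df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 3, 3, 4, 4],
    "SEX": [0, 0, 0, 0, 1, 1, 1, 1],
    "DV": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
})
new_df, chosen = resample_once(
    df, "ID", stratify="SEX", replace=True, rng=np.random.default_rng(0)
)
```

Renumbering matters mostly when sampling with replacement: without it, two copies of the same individual would merge into one group with duplicated rows.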