pharmpy.data.iterators module

data.iterators

Iterators that generate new datasets from an existing dataset. The dataset can either be standalone or attached to a model. If a model is used, the same model is updated with a different dataset on each iteration.

Currently contains:

  1. Omit - Can be used for cdd

  2. Resample - Can be used by bootstrap

class pharmpy.data.iterators.DatasetIterator(iterations, name_pattern='dataset_{}')

Bases: object

Base class for iterator classes that generate new datasets from an input dataset.

The __next__ method may return either a DataFrame or a tuple whose first element is the main DataFrame.
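As a rough illustration of the protocol described above, the following sketch shows a base iterator that counts iterations and fills a name pattern with a number starting from 1. It is not Pharmpy's implementation: the class names `SketchDatasetIterator` and `Repeat`, and all of their internals, are assumptions made for this example, and a plain list of row dicts stands in for a DataFrame.

```python
class SketchDatasetIterator:
    """Hypothetical base class; not the Pharmpy implementation."""

    def __init__(self, iterations, name_pattern='dataset_{}'):
        self._iterations = iterations
        self._next = 1  # generated names are numbered from 1
        self._name_pattern = name_pattern

    def _check_exhausted(self):
        # Stop once the requested number of datasets has been produced
        if self._next > self._iterations:
            raise StopIteration

    def _next_name(self):
        name = self._name_pattern.format(self._next)
        self._next += 1
        return name

    def __iter__(self):
        return self

    def __next__(self):
        raise NotImplementedError("subclasses generate the datasets")


class Repeat(SketchDatasetIterator):
    """Toy subclass: returns a copy of the same rows on every iteration."""

    def __init__(self, rows, iterations):
        super().__init__(iterations)
        self._rows = rows

    def __next__(self):
        self._check_exhausted()
        # Tuple whose first element is the main dataset
        return list(self._rows), self._next_name()


names = [name for _, name in Repeat([{'ID': 1, 'DV': 0.5}], iterations=3)]
```

Because the class defines both `__iter__` and `__next__`, it can be used directly in a `for` loop or a comprehension, as in the last line.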

class pharmpy.data.iterators.Omit(dataset_or_model, group, name_pattern='omitted_{}')

Bases: pharmpy.data.iterators.DatasetIterator

Iterate over omissions of a certain group in a dataset. One group is omitted at a time.

Parameters
  • dataset_or_model – DataFrame to iterate over or a model from which to use the dataset

  • group (colname) – Name of the column to use for grouping

  • name_pattern – Name to use for generated datasets. A number starting from 1 will be put in the placeholder.

Returns

Tuple of DataFrame and the omitted group
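The idea of omitting one group at a time can be sketched without Pharmpy as follows. This is only an illustration under stated assumptions: `omit_each_group` is a hypothetical helper, a list of row dicts stands in for the DataFrame, and the order in which groups are omitted (order of first appearance) is a choice made for this sketch, not something the documentation specifies.

```python
def omit_each_group(rows, group):
    """Yield (remaining_rows, omitted_group), leaving out one group per step.

    Hypothetical helper for illustration; not part of Pharmpy.
    """
    # Unique group values in order of first appearance
    groups = list(dict.fromkeys(row[group] for row in rows))
    for omitted in groups:
        remaining = [row for row in rows if row[group] != omitted]
        yield remaining, omitted


rows = [
    {'ID': 1, 'DV': 0.1},
    {'ID': 1, 'DV': 0.2},
    {'ID': 2, 'DV': 0.3},
    {'ID': 3, 'DV': 0.4},
]
results = list(omit_each_group(rows, 'ID'))
```

Each element of `results` pairs a reduced dataset with the group that was left out, mirroring the tuple described under Returns.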

class pharmpy.data.iterators.Resample(dataset_or_model, group, resamples=1, stratify=None, sample_size=None, replace=False, name_pattern='resample_{}', name=None)

Bases: pharmpy.data.iterators.DatasetIterator

Iterate over resamples of a dataset.

The dataset is grouped on the group column, then groups are selected randomly, with or without replacement, to form a new dataset. The groups are renumbered from 1 upwards to keep them separate in the new dataset.
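The grouping, sampling, and renumbering steps described above might look roughly like this. The sketch is not Pharmpy's code: `resample_groups` and its `rng` argument are hypothetical, a list of row dicts stands in for the DataFrame, and the standard `random` module is used for sampling.

```python
import random


def resample_groups(rows, group, sample_size=None, replace=False, rng=None):
    """Return (new_rows, chosen_groups) for one resample.

    Hypothetical helper for illustration; not part of Pharmpy.
    """
    if rng is None:
        rng = random.Random()
    # Unique group values in order of first appearance
    groups = list(dict.fromkeys(row[group] for row in rows))
    if sample_size is None:
        sample_size = len(groups)
    if replace:
        chosen = rng.choices(groups, k=sample_size)
    else:
        chosen = rng.sample(groups, sample_size)
    new_rows = []
    # Renumber from 1 so a group drawn twice becomes two distinct groups
    for new_id, old_id in enumerate(chosen, start=1):
        for row in rows:
            if row[group] == old_id:
                new_rows.append({**row, group: new_id})
    return new_rows, chosen


rows = [
    {'ID': 1, 'DV': 0.1},
    {'ID': 1, 'DV': 0.2},
    {'ID': 2, 'DV': 0.3},
    {'ID': 3, 'DV': 0.4},
]
new_rows, chosen = resample_groups(rows, 'ID', replace=True, rng=random.Random(0))
```

Renumbering matters mainly when sampling with replacement: without it, two copies of the same original group would merge into one group in the new dataset.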

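With stratification, groups are sampled separately within each stratum; by default the sample size per stratum follows the proportion of the strata in the dataset (see sample_size below).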

Parameters
  • dataset_or_model – DataFrame to iterate over or a model from which to use the dataset

  • group (colname) – Name of column to group by

  • resamples (Int) – Number of resamples (iterations) to make

  • stratify (colname) – Name of column to use for stratification. The values in the stratification column must be equal within a group so that the group can be uniquely determined. A ValueError exception will be raised otherwise.

  • sample_size (Int) – The number of groups that should be sampled. The default is the number of groups. If using stratification, the default is to sample in proportion to the strata in the dataset. A dictionary of specific sample sizes for each stratum can also be supplied.

  • replace (bool) – A boolean controlling whether sampling should be done with or without replacement

  • name_pattern – Name to use for generated datasets. A number starting from 1 will be put in the placeholder.

Returns

A tuple of a resampled DataFrame and a list of resampled groups in order
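The stratification rules in the parameter list (one stratum value per group, proportional sampling by default, optional per-stratum sizes) can be sketched as below. This is an assumption-laden illustration rather than Pharmpy's implementation: the helper names are hypothetical, a list of row dicts stands in for the DataFrame, and the proportional default is realised in the simplest way, by drawing from each stratum as many groups as it contains.

```python
import random


def stratum_of_each_group(rows, group, stratify):
    """Map group -> stratum; raise ValueError if a group spans several strata."""
    strata = {}
    for row in rows:
        g, s = row[group], row[stratify]
        if strata.setdefault(g, s) != s:
            raise ValueError(f"group {g!r} has more than one value in {stratify!r}")
    return strata


def stratified_sample(rows, group, stratify, sample_size=None,
                      replace=False, rng=None):
    """Return the list of sampled groups, drawn stratum by stratum.

    Hypothetical helper; sample_size is None or a dict {stratum: size}.
    """
    if rng is None:
        rng = random.Random()
    by_stratum = {}
    for g, s in stratum_of_each_group(rows, group, stratify).items():
        by_stratum.setdefault(s, []).append(g)
    chosen = []
    for s, groups in by_stratum.items():
        # Default keeps each stratum's share of the original groups
        k = len(groups) if sample_size is None else sample_size[s]
        if replace:
            chosen += rng.choices(groups, k=k)
        else:
            chosen += rng.sample(groups, k)
    return chosen


rows = [
    {'ID': 1, 'SEX': 'F'},
    {'ID': 2, 'SEX': 'F'},
    {'ID': 3, 'SEX': 'M'},
]
chosen = stratified_sample(rows, 'ID', 'SEX', replace=True, rng=random.Random(1))
```

In this toy dataset two of the three groups are in stratum `'F'`, so the default draw always contains two groups from `'F'` and one from `'M'`, whichever groups happen to be picked.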

Inheritance Diagram

Inheritance diagram of pharmpy.data.iterators