deidentify_data

pharmpy.modeling.deidentify_data(df, id_column='ID', date_columns=None)

Deidentify a dataset

Two operations are performed on the dataset:

  1. All ID numbers are randomized from the range 1 to n

  2. All columns containing dates will have the year changed
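As an illustration of the first operation, the following is a minimal sketch in plain Python and is not the pharmpy implementation. It assumes that n is the number of distinct IDs and that each subject receives its own new ID; the helper name `randomize_ids` and its `seed` argument are hypothetical.

```python
import random


def randomize_ids(ids, seed=None):
    """Map each distinct ID to a distinct random integer in 1..n.

    Hypothetical helper for illustration only; n is assumed to be
    the number of distinct IDs in the input.
    """
    rng = random.Random(seed)
    unique_ids = sorted(set(ids))
    new_ids = list(range(1, len(unique_ids) + 1))
    rng.shuffle(new_ids)  # random assignment of 1..n to the subjects
    mapping = dict(zip(unique_ids, new_ids))
    # Every row of a subject gets that subject's new ID
    return [mapping[i] for i in ids]


original = [101, 101, 205, 205, 205, 317]
deidentified = randomize_ids(original, seed=0)
```

Rows that belonged to the same subject still share an ID afterwards, but the original ID values can no longer be read from the data.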

The year change uses the earliest year in the dataset as a reference and maintains leap years. The reference year becomes 1901, 1902, 1903, or 1904, depending on its distance to the closest preceding leap year.
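The year shift described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the pharmpy implementation: the helper names `reference_year` and `shift_years` are hypothetical, a leap year is assumed to map to 1904 and a distance of 1, 2, or 3 years to 1901, 1902, or 1903, and all dates are assumed to lie between 1901 and 2099 so that every fourth year is a leap year.

```python
from datetime import date


def reference_year(earliest_year):
    """Pick 1901..1904 from the distance to the preceding leap year.

    Assumed mapping: a leap year maps to 1904, otherwise
    distance 1 -> 1901, 2 -> 1902, 3 -> 1903.
    """
    offset = earliest_year % 4  # years since the last leap year (1901-2099)
    return 1904 if offset == 0 else 1900 + offset


def shift_years(dates):
    """Shift all dates by the same number of years (a multiple of 4)."""
    earliest_year = min(d.year for d in dates)
    delta = earliest_year - reference_year(earliest_year)
    # delta is a multiple of 4, so 29 February stays a valid date
    return [d.replace(year=d.year - delta) for d in dates]


dates = [date(2021, 3, 15), date(2024, 2, 29), date(2022, 12, 1)]
shifted = shift_years(dates)
# shifted == [date(1901, 3, 15), date(1904, 2, 29), date(1902, 12, 1)]
```

Because every date is moved by the same multiple of four years, intervals between dates and leap days are preserved while the real calendar years are hidden.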

Parameters:
  • df (pd.DataFrame) – A dataset

  • id_column (str) – Name of the id column

  • date_columns (list) – Names of all date columns

Returns:

pd.DataFrame – Deidentified dataset