Dataset Compression

class ptp.compression.Codec(ds={}, filename='', compressed=False)

Dataset compression coder/decoder (codec)

Parameters
  • ds – (dictionary) dataset

  • filename – (string) dataset file name

  • compressed – (bool) whether the supplied dataset is compressed

compress()

Reorganize the dataset into a more compact form for storage in files

The data['data'] member of the dataset holds a list of dictionaries, each containing several metrics. This layout is very inefficient for storage, since the keys (strings) are repeated in every dictionary.
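To make the overhead concrete, the following sketch serializes a toy dataset to JSON in both layouts and compares the byte counts. The metric names (metric_a, metric_b, metric_c) are hypothetical placeholders rather than the metrics of an actual ptp dataset, and only the dense case (every metric present in every entry) is shown.

```python
import json

# Hypothetical toy dataset: 1000 entries, each carrying the same three metrics.
# metric_a/metric_b/metric_c are placeholder names, not real ptp metrics.
rows = [{"metric_a": i, "metric_b": 2 * i, "metric_c": i % 7} for i in range(1000)]

# Row-oriented layout (list of dictionaries): keys are serialized once per entry
row_bytes = len(json.dumps(rows))

# Column-oriented layout (dictionary of lists): each key is serialized only once
columns = {key: [row[key] for row in rows] for key in rows[0]}
col_bytes = len(json.dumps(columns))

print(f"list of dicts: {row_bytes} bytes")
print(f"dict of lists: {col_bytes} bytes")
```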

Some metrics are present in every dictionary, so they can be stored directly as plain lists. Other metrics are missing from some dictionaries; these are stored as a pair of lists, one holding the time series of values and the other holding the indexes of the dataset entries in which the metric is present.
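A minimal sketch of this reorganization on plain Python data, not the ptp API itself. It takes only the list of entries (the data['data'] member) and assumes a hypothetical output layout, {"n_entries": N, "columns": {...}}, in which dense metrics map to a plain list and sparse metrics map to {"idx": [...], "val": [...]}; the key names actually used by compress() may differ.

```python
from typing import Any


def compress_entries(entries: list[dict[str, Any]]) -> dict[str, Any]:
    """Reorganize a list of metric dictionaries into per-metric columns.

    Hypothetical output layout (illustrative only):
      {"n_entries": N, "columns": {name: column}}
    where a column is a plain list for metrics present in every entry and
    {"idx": [...], "val": [...]} for metrics present in only some entries.
    """
    n = len(entries)
    # Every metric name that appears anywhere, in first-seen order
    keys = list(dict.fromkeys(key for entry in entries for key in entry))

    columns: dict[str, Any] = {}
    for key in keys:
        idx = [i for i, entry in enumerate(entries) if key in entry]
        val = [entries[i][key] for i in idx]
        if len(idx) == n:
            columns[key] = val  # dense metric: the index list is implicit
        else:
            columns[key] = {"idx": idx, "val": val}  # sparse metric
    return {"n_entries": n, "columns": columns}


# Example: metric_b is missing from the second entry
entries = [
    {"metric_a": 1.0, "metric_b": 5},
    {"metric_a": 2.0},
    {"metric_a": 3.0, "metric_b": 7},
]
result = compress_entries(entries)
print(result)
# {'n_entries': 3, 'columns': {'metric_a': [1.0, 2.0, 3.0], 'metric_b': {'idx': [0, 2], 'val': [5, 7]}}}
```

Dense metrics skip the index list entirely, so only sparse metrics pay for the extra bookkeeping.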

Parameters

data – Dataset dictionary formatted as {'metadata': x, 'data': y}, i.e., a dictionary containing the 'metadata' and 'data' keys.

Returns

(dict) The compressed dataset

decompress()

Revert the compression

Returns

(dict) The decompressed dataset
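The inverse transform can be sketched the same way. It takes as input a hypothetical layout ({"n_entries": N, "columns": {...}}, with dense columns stored as plain lists and sparse columns as {"idx": [...], "val": [...]}), which is an illustration of the idea rather than the format produced by ptp.

```python
from typing import Any


def decompress_columns(packed: dict[str, Any]) -> list[dict[str, Any]]:
    """Rebuild the list of metric dictionaries from per-metric columns.

    Assumes the hypothetical layout {"n_entries": N, "columns": {...}} where
    each column is a plain list (dense metric) or {"idx": [...], "val": [...]}
    (sparse metric).
    """
    entries: list[dict[str, Any]] = [{} for _ in range(packed["n_entries"])]
    for key, column in packed["columns"].items():
        if isinstance(column, dict):
            # Sparse metric: place each value at its recorded entry index
            for i, value in zip(column["idx"], column["val"]):
                entries[i][key] = value
        else:
            # Dense metric: one value per entry, in order
            for entry, value in zip(entries, column):
                entry[key] = value
    return entries


packed = {
    "n_entries": 3,
    "columns": {
        "metric_a": [1.0, 2.0, 3.0],
        "metric_b": {"idx": [0, 2], "val": [5, 7]},
    },
}
restored = decompress_columns(packed)
print(restored)
# [{'metric_a': 1.0, 'metric_b': 5}, {'metric_a': 2.0}, {'metric_a': 3.0, 'metric_b': 7}]
```

The number of entries is stored explicitly in this sketch because it cannot be recovered from sparse columns alone when no metric is dense.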

dump(ext='xz')

Dump dataset to file

Parameters

ext – Output file extension, which also determines the file format and the binary compression scheme (if any). Choose from "json", "pickle", "gz", "pbz2" or "xz".
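One way to map each extension onto Python's standard library is sketched below. This mapping is an assumption, not the ptp implementation: "json" and "pickle" are written uncompressed, while "gz", "pbz2" and "xz" are assumed to hold a pickle compressed with gzip, bz2 and LZMA, respectively. dump_dataset, its basename argument and OPENERS are hypothetical names.

```python
import bz2
import gzip
import json
import lzma
import os
import pickle
import tempfile
from typing import Any

# Assumed mapping from extension to file opener for pickled output
OPENERS = {"pickle": open, "gz": gzip.open, "pbz2": bz2.open, "xz": lzma.open}


def dump_dataset(ds: dict[str, Any], basename: str, ext: str = "xz") -> str:
    """Write ds to f"{basename}.{ext}" and return the file name (hypothetical helper)."""
    filename = f"{basename}.{ext}"
    if ext == "json":
        # Plain-text JSON, no binary compression
        with open(filename, "w") as fd:
            json.dump(ds, fd)
    elif ext in OPENERS:
        # Pickle, optionally wrapped in a gzip, bz2 or LZMA (xz) stream
        with OPENERS[ext](filename, "wb") as fd:
            pickle.dump(ds, fd)
    else:
        raise ValueError(f"unsupported extension: {ext!r}")
    return filename


# Round trip through every format in a throwaway directory
ds = {"metadata": {"source": "example"}, "data": [{"metric_a": 1}, {"metric_a": 2}]}
restored = {}
with tempfile.TemporaryDirectory() as tmp:
    for ext in ("json", "pickle", "gz", "pbz2", "xz"):
        path = dump_dataset(ds, os.path.join(tmp, "dataset"), ext)
        if ext == "json":
            with open(path) as fd:
                restored[ext] = json.load(fd)
        else:
            with OPENERS[ext](path, "rb") as fd:
                restored[ext] = pickle.load(fd)
```

Keeping the openers in a table means all compressed variants share one code path, since gzip.open, bz2.open and lzma.open all accept a file name and a mode like the built-in open.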