Dataset Compression

class ptp.compression.Codec(ds={}, filename='', compressed=False)

Dataset compression coder/decoder (codec)

Parameters

ds – (dictionary) dataset
filename – (string) dataset file name
compressed – (bool) whether the supplied dataset is compressed

Reorganize the dataset so that it can be stored in files more efficiently

The data['data'] member of the dataset holds a list of dictionaries, each containing several metrics. This is very inefficient for storage, since the keys (strings) are repeated in every dictionary.

Some of the metrics in the dataset are present in all dictionaries. Hence, each of them can be stored directly as a single list. Other metrics are present in only some of the dictionaries, in which case each should be stored as a pair of lists: one containing the actual time series, the other containing the indexes of the dictionaries in which the metric is present.

Parameters: data – Dataset dictionary formatted as {'metadata': x, 'data': y}, i.e., as a dictionary containing the metadata and data keys.
Returns: (dict) The compressed dataset
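
The reorganization described above can be sketched as follows. This is only an illustration of the idea, not the library's implementation: the function name compress_records and the output layout (the "n", "dense" and "sparse" keys) are assumptions made for this example.

```python
def compress_records(records):
    """Turn a list of dicts into a column-oriented dict (illustrative only)."""
    n = len(records)

    # Collect every metric name, keeping first-seen order.
    keys = []
    for rec in records:
        for key in rec:
            if key not in keys:
                keys.append(key)

    dense, sparse = {}, {}
    for key in keys:
        # Indexes of the records in which this metric is present.
        idx = [i for i, rec in enumerate(records) if key in rec]
        values = [records[i][key] for i in idx]
        if len(idx) == n:
            # Present everywhere: a single list is enough.
            dense[key] = values
        else:
            # Present only sometimes: store the values plus their indexes.
            sparse[key] = {"idx": idx, "values": values}

    return {"n": n, "dense": dense, "sparse": sparse}


records = [
    {"t": 0, "delay": 10.0},
    {"t": 1, "delay": 11.5, "temp": 40.2},
    {"t": 2, "delay": 9.8},
]
compressed = compress_records(records)
```

In this sketch each key string is stored once instead of once per dictionary, which is where the storage saving comes from.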

decompress()

Revert the compression

Returns: (dict) The decompressed dataset
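
A minimal sketch of what reverting such a compression could look like is given below. It assumes a hypothetical column-oriented layout in which "n" is the number of records, "dense" maps each always-present metric to a full list, and "sparse" maps each occasionally present metric to a pair of lists ("idx" and "values"). Neither this layout nor the name decompress_records is taken from the library.

```python
def decompress_records(columns):
    """Rebuild a list of dicts from a column-oriented dict (illustrative only)."""
    records = [{} for _ in range(columns["n"])]

    # Dense metrics have exactly one value per record.
    for key, values in columns["dense"].items():
        for rec, value in zip(records, values):
            rec[key] = value

    # Sparse metrics carry the indexes of the records they belong to.
    for key, col in columns["sparse"].items():
        for i, value in zip(col["idx"], col["values"]):
            records[i][key] = value

    return records


columns = {
    "n": 3,
    "dense": {"t": [0, 1, 2]},
    "sparse": {"temp": {"idx": [1], "values": [40.2]}},
}
restored = decompress_records(columns)
```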

dump(ext='xz')

Dump dataset to file

Parameters: ext – Output file extension, which also determines the binary compression scheme used. Choose from “json”, “pickle”, “gz”, “pbz2” or “xz”.
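
One way an extension could select the on-disk format is sketched below using only the Python standard library. The mapping is an assumption made for illustration: here "gz", "pbz2" and "xz" are taken to mean a pickled payload wrapped in gzip, bzip2 and LZMA respectively, and the helper name dump_dataset is hypothetical. The actual behavior of Codec.dump may differ.

```python
import bz2
import gzip
import json
import lzma
import os
import pickle
import tempfile

# Assumed mapping from extension to a compressed-file opener.
OPENERS = {"gz": gzip.open, "pbz2": bz2.open, "xz": lzma.open}


def dump_dataset(ds, basename, ext="xz"):
    """Write ds to '<basename>.<ext>' and return the path (illustrative only)."""
    path = f"{basename}.{ext}"
    if ext == "json":
        # Plain-text JSON, no binary compression.
        with open(path, "w") as fd:
            json.dump(ds, fd)
    elif ext == "pickle":
        # Uncompressed pickle.
        with open(path, "wb") as fd:
            pickle.dump(ds, fd)
    elif ext in OPENERS:
        # Pickle written through a compressing file object.
        with OPENERS[ext](path, "wb") as fd:
            pickle.dump(ds, fd)
    else:
        raise ValueError(f"unsupported extension: {ext}")
    return path


# Usage: write to a temporary directory and read the file back.
with tempfile.TemporaryDirectory() as tmp:
    out = dump_dataset({"metadata": {}, "data": []}, os.path.join(tmp, "ds"))
    with lzma.open(out, "rb") as fd:
        loaded = pickle.load(fd)
```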