pjml.data.flow.file.File

class pjml.data.flow.file.File(name: str, path: str = './', description: str = 'No description.', hashes: str = None, **kwargs)

Source of a Data object read from a CSV or ARFF file.

TODO: Is this always a classification task? The generated Data will carry a single transformation (history).
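To make "source of a Data object" concrete, here is a minimal sketch of turning CSV text into features, a target, and a one-entry history. It assumes the last column is the class label (the open classification question in the TODO above). `SimpleData` and `load_csv` are hypothetical names, not pjml's Data class or loader, and ARFF parsing is not covered.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SimpleData:
    """Hypothetical stand-in for pjml's Data: features X, target y, history."""
    X: List[List[float]]
    y: List[str]
    history: Tuple[str, ...] = ()


def load_csv(text: str, source_name: str) -> SimpleData:
    """Parse CSV text; assumes a header row and the label in the last column."""
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the header row (column names)
    X, y = [], []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        X.append([float(v) for v in row[:-1]])
        y.append(row[-1])
    # Record a single transformation, mirroring the note in the TODO above.
    return SimpleData(X=X, y=y, history=(f"File({source_name})",))


data = load_csv("a,b,class\n1,2,x\n3,4,y\n", "toy.csv")
print(data.X, data.y, data.history)
```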

A short hash is appended to the name to ensure unique names. The first collision is expected only after about 12M different datasets with the same name: 2**(log(107**7, 2)/2) = sqrt(107**7) ≈ 12.67M, the birthday bound. Since names such as ‘iris’ are already expected to be unique, and any transformed dataset is expected to enter the system through a component, 12M should be a safe margin. Ideally, only a single ‘iris’ will be stored; in practice, no more than a dozen are expected.
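The 12M figure is the birthday-bound rule of thumb: among N equally likely values, the first collision is expected after roughly sqrt(N) draws, with N = 107**7 here. The sketch below computes that bound for any alphabet and length, and shows one hypothetical way to suffix a name with a short hash. `unique_name` uses SHA-256 hex digits, an assumption rather than pjml's actual encoding, so its alphabet has 16 symbols, not 107.

```python
import hashlib
import math


def expected_first_collision(alphabet_size: int, length: int) -> int:
    """Birthday-bound estimate: about sqrt(N) draws before the first
    collision, where N = alphabet_size**length possible hashes."""
    return math.isqrt(alphabet_size ** length)


def unique_name(name: str, content: bytes, length: int = 7) -> str:
    """Hypothetical scheme: append a short content hash to a dataset name."""
    digest = hashlib.sha256(content).hexdigest()
    return f"{name}_{digest[:length]}"


print(expected_first_collision(107, 7))  # 12671943, the 12M figure above
print(expected_first_collision(16, 7))   # 16384 for 7 hex characters
print(unique_name("iris", b"5.1,3.5,1.4,0.2,setosa"))
```

The alphabet size matters: the same 7 characters give about 12.7M names with 107 symbols, but only 16384 with hex digits.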

__init__(name: str, path: str = './', description: str = 'No description.', hashes: str = None, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__(name, path, description, hashes, …) Initialize self.
default_config() Create a copy of the component default configuration.
disable_pretty_printing() Disable the pretty-printing.
dual_transform(train, …)
enable_pretty_printing() Enable the pretty-printing.
updated(**kwargs) Clone this component, optionally replacing given params.

Attributes

cfserialized
cfuuid UUID excluding ‘model’ and ‘enhance’ flags.
cs1 Convert component into a config space with a single component inside it.
enhancer
id Short UUID: the first 8 characters of uuid, usually for printing purposes.
jsonable
longname
model
name
path
pretty_printing
serialized
sid Short UUID: the first 6 characters of uuid, usually for printing purposes.
transformations
unwrap Subpipeline inside the first Wrap().
uuid Lazily calculated unique identifier for this dataset.
wrapped Same as unwrap(), but with the external container Wrap.

cfuuid

UUID excluding ‘model’ and ‘enhance’ flags. Identifies the transformer.
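One way to get an identifier that ignores the ‘model’ and ‘enhance’ flags is to drop those keys before hashing a canonical serialization of the configuration. This is a hypothetical sketch: `config_uuid`, `cfuuid`, the JSON serialization, and the use of `uuid.uuid5` are assumptions, not pjml's implementation.

```python
import json
import uuid
from typing import Any, Dict, Iterable

IGNORED_FLAGS = ("model", "enhance")


def config_uuid(config: Dict[str, Any], exclude: Iterable[str] = ()) -> uuid.UUID:
    """Deterministic UUID of a JSON-serializable config, minus excluded keys."""
    excluded = set(exclude)
    filtered = {k: v for k, v in config.items() if k not in excluded}
    # sort_keys makes the serialization independent of dict insertion order.
    payload = json.dumps(filtered, sort_keys=True)
    return uuid.uuid5(uuid.NAMESPACE_OID, payload)


def cfuuid(config: Dict[str, Any]) -> uuid.UUID:
    """Identifies the transformer itself, regardless of the model/enhance flags."""
    return config_uuid(config, exclude=IGNORED_FLAGS)


a = {"name": "iris", "model": True, "enhance": True}
b = {"name": "iris", "model": False, "enhance": True}
print(cfuuid(a) == cfuuid(b))            # same transformer
print(config_uuid(a) == config_uuid(b))  # different full configurations
```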

cs1

Convert component into a config space with a single component inside it.

classmethod default_config() → Dict[str, Any]

Create a copy of the component default configuration.

Returns:
dict

Copy of the component default configuration.
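Returning a copy matters because callers often tweak the returned dictionary. The sketch below keeps defaults in a class attribute and hands out a deep copy, so neither top-level nor nested edits leak back into the shared defaults. The `_defaults` attribute and its contents are hypothetical, and whether pjml makes a shallow or deep copy is not stated here; deep copy is a design choice of this sketch.

```python
import copy
from typing import Any, Dict


class Component:
    # Hypothetical class-level defaults; pjml's actual storage may differ.
    _defaults: Dict[str, Any] = {
        "path": "./",
        "description": "No description.",
        "hashes": None,
        "options": [],
    }

    @classmethod
    def default_config(cls) -> Dict[str, Any]:
        # A deep copy also protects nested containers such as "options".
        return copy.deepcopy(cls._defaults)


cfg = Component.default_config()
cfg["path"] = "/data/"
cfg["options"].append("verbose")
print(Component.default_config())  # unchanged defaults
```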

disable_pretty_printing()

Disable the pretty-printing.

enable_pretty_printing()

Enable the pretty-printing.
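A plausible reading of these two methods is a flag that switches a component's text representation between an indented multi-line form and a compact one-liner. The sketch below implements that as a per-instance flag read by `__repr__`. The class `Printable`, the output layout, and the per-instance scope of the flag are all assumptions.

```python
class Printable:
    """Hypothetical component whose repr depends on a pretty_printing flag."""

    pretty_printing = True  # default; each instance may override it

    def __init__(self, **params):
        self.params = params

    def enable_pretty_printing(self):
        self.pretty_printing = True

    def disable_pretty_printing(self):
        self.pretty_printing = False

    def __repr__(self):
        name = type(self).__name__
        items = [f"{k}={v!r}" for k, v in self.params.items()]
        if self.pretty_printing:
            # One parameter per line, indented by four spaces.
            body = ",\n".join("    " + item for item in items)
            return f"{name}(\n{body}\n)"
        return f"{name}({', '.join(items)})"


p = Printable(name="iris.arff", path="./")
p.disable_pretty_printing()
print(repr(p))  # Printable(name='iris.arff', path='./')
p.enable_pretty_printing()
print(repr(p))
```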

id

Short UUID: the first 8 characters of uuid, usually for printing purposes. The first collision is expected after about 12671943 combinations.

sid

Short UUID: the first 6 characters of uuid, usually for printing purposes.
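Both short forms are simply prefixes of the full identifier, exposed as read-only properties. A minimal sketch, assuming uuid is stored as a string (the class `Identified` is hypothetical):

```python
class Identified:
    """Hypothetical holder of a string uuid with short printable prefixes."""

    def __init__(self, uid: str):
        self.uuid = uid

    @property
    def id(self) -> str:
        return self.uuid[:8]  # first 8 characters

    @property
    def sid(self) -> str:
        return self.uuid[:6]  # first 6 characters


x = Identified("0123456789abcdef")
print(x.id, x.sid)  # 01234567 012345
```

Shorter prefixes trade uniqueness for readability: by the birthday bound, each character removed divides the expected number of items before a collision by the square root of the alphabet size.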

unwrap

Subpipeline inside the first Wrap().

Ideally, the pipeline contains only one Wrap. This method performs a depth-first search and returns the contents of the first Wrap it finds.

Examples

>>> pipe = Pipeline(
...     File(name='iris.arff'),
...     Wrap(Std(), SVMC()),
...     Metric(function='accuracy')
... )
>>> pipe.unwrap  # -> Chain(Std(), SVMC())
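The depth-first search behind unwrap can be sketched on a generic tree of named nodes. `Node`, `find_first_wrap`, and `unwrap` below are hypothetical stand-ins, not pjml's Pipeline, Wrap, or Chain classes, and returning None when no Wrap exists is an assumption. `find_first_wrap` returns the Wrap node itself, which corresponds to what wrapped describes.

```python
from typing import List, Optional


class Node:
    """Hypothetical minimal component tree (not pjml's classes)."""

    def __init__(self, name: str, *children: "Node"):
        self.name = name
        self.children: List["Node"] = list(children)

    def __repr__(self) -> str:
        return f"{self.name}({', '.join(map(repr, self.children))})"


def find_first_wrap(node: Node) -> Optional[Node]:
    """Pre-order depth-first search for the first node named 'Wrap'."""
    if node.name == "Wrap":
        return node
    for child in node.children:
        found = find_first_wrap(child)
        if found is not None:
            return found
    return None


def unwrap(node: Node) -> Optional[Node]:
    """Subpipeline inside the first Wrap, repackaged as a Chain."""
    wrap = find_first_wrap(node)
    return None if wrap is None else Node("Chain", *wrap.children)


pipe = Node(
    "Pipeline",
    Node("File"),
    Node("Wrap", Node("Std"), Node("SVMC")),
    Node("Metric"),
)
print(unwrap(pipe))  # Chain(Std(), SVMC())
```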

updated(**kwargs)

Clone this component, optionally replacing given params.

Returns:
A ready-to-use component.
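The clone-with-overrides pattern maps naturally onto an immutable dataclass and `dataclasses.replace`, which builds a new instance and leaves the original untouched. `FileConfig` is a hypothetical stand-in for File's constructor parameters; pjml's own cloning mechanism may differ.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FileConfig:
    """Hypothetical immutable stand-in for File's constructor parameters."""

    name: str
    path: str = "./"
    description: str = "No description."

    def updated(self, **kwargs) -> "FileConfig":
        # replace() returns a new instance; unknown field names raise TypeError.
        return replace(self, **kwargs)


original = FileConfig("iris.arff")
clone = original.updated(path="/data/")
print(original)
print(clone)
```

Freezing the dataclass means the only way to "change" a component is to derive a new one, so shared references never see surprise mutations.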

uuid

Lazily calculated unique identifier for this dataset.

Should be accessed directly as an attribute (‘uuid’), not called as a method.

Returns:
A unique identifier UUID object.
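"Lazily calculated" and "accessed as an attribute" fit Python's `functools.cached_property`: the value is computed on first access and then stored on the instance. In this sketch, `LazyDataset`, the counter, and deriving the UUID with `uuid5` from the name and content are assumptions for illustration only.

```python
from functools import cached_property
from uuid import NAMESPACE_OID, UUID, uuid5


class LazyDataset:
    """Hypothetical dataset whose identifier is computed on first access."""

    def __init__(self, name: str, content: bytes):
        self.name = name
        self.content = content
        self.computations = 0  # how many times the uuid was actually computed

    @cached_property
    def uuid(self) -> UUID:
        self.computations += 1
        return uuid5(NAMESPACE_OID, f"{self.name}:{self.content.hex()}")


d = LazyDataset("iris", b"5.1,3.5,1.4,0.2,setosa")
print(d.computations)  # 0: nothing computed yet
first, second = d.uuid, d.uuid  # attribute access, no parentheses
print(first == second, d.computations)  # True 1
```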

wrapped

Same as unwrap(), but with the external container Wrap.