Welcome to pjml's documentation!¶
Install¶
pjml is available on PyPI. You can install it via pip as follows:
pip install -U pjml
It is also possible to install the development version from GitHub:
pip install -U git+https://github.com/end-to-end-data-science/pjml.git
If you prefer, you can clone the repository and install it from source. Use the following commands to get a copy from GitHub and install all dependencies:
git clone git@github.com:end-to-end-data-science/pjml.git
cd pjml
pip install .
Test and coverage¶
If you want to run the tests and measure code coverage before installing:
$ make install-dev
$ make test-cov
Or:
$ make install-dev
$ pytest --cov=pjml/ tests/
API Documentation¶
This is the full API documentation of the pjml package.
pjml.abs: Abstract Classes and Mixin¶
Abstract Classes¶
- component.Component (config, enhance, model, …)
- container.Container (config, seed, …): A container modifies ‘component(s)’.
- container1.Container1 (config, seed, …): Configurable container for a single component.
- containern.ContainerN (config, seed, components): Container for more than one component.
- invisible.Invisible: Parent class of all atomic components that don’t increase the history of transformations.
- macro.Macro
- minimalcontainer.MinimalContainer1 (*args[, …]): Container with minimum configuration (seed) for a single component.
- minimalcontainer.MinimalContainerN (*args[, …]): Container with minimum configuration (seed) for more than one component.
Mixin¶
- defaultenhancerimpl.withDefaultEnhancerImpl
- defaultmodelimpl.withDefaultModelImpl
- exceptionhandling.WithExceptionHandling: Handle component exceptions and enable/disable numpy warnings.
- functioninspection.withFunctionInspection
- nodatahandling.withNoDataHandling: All components that accept NoData should derive from this class after deriving Transformer or descendants.
- noinfoimpl.withNoInfoImpl
- timing.withTiming: Management of time.
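These mixins add orthogonal behaviour to components via multiple inheritance. As an illustration only, a timing mixin in the spirit of withTiming might look like the sketch below; the method name `timed` and the DoublerComponent class are hypothetical, not part of pjml's real API.

```python
import time

class withTiming:
    """Illustrative stand-in for a timing mixin (not pjml's implementation)."""

    def timed(self, function, *args, **kwargs):
        # Run `function`, returning its result together with the elapsed time.
        start = time.perf_counter()
        result = function(*args, **kwargs)
        return result, time.perf_counter() - start

class DoublerComponent(withTiming):
    def transform(self, data):
        return [x * 2 for x in data]

comp = DoublerComponent()
result, elapsed = comp.timed(comp.transform, [1, 2, 3])
print(result)        # [2, 4, 6]
print(elapsed >= 0)  # True
```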
pjml.data: Data Tools¶
Data Communication Tools¶
- cache.Cache (*args[, storage_alias, seed, …])
- report.Report (text, **kwargs): Report printer.
Data Evaluation Tools¶
- metric.Metric ([functions, target, prediction]): Metric to evaluate a given Data field.
- split.Split (split_type, partitions, …): Split a given Data field into a training/apply set and a testing/use set.
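To make the evaluation tools concrete: an accuracy metric (the default of metric.Metric) compares a target field ("Y") against a prediction field ("Z"). A minimal sketch, using a plain dict as a hypothetical stand-in for pjml's Data object:

```python
def accuracy(data, target="Y", prediction="Z"):
    # Fraction of positions where the target and prediction fields agree.
    pairs = list(zip(data[target], data[prediction]))
    return sum(1 for y, z in pairs if y == z) / len(pairs)

# Dict-based stand-in for a Data object with true labels Y and predictions Z.
data = {"Y": [0, 1, 1, 0], "Z": [0, 1, 0, 0]}
print(accuracy(data))  # 0.75
```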
Data Flow Tools¶
- file.File (name, path, description, hashes, …): Source of a Data object from a CSV or ARFF file.
- unfreeze.Unfreeze (**kwargs): Resurrect a workflow by unfreezing a Data object.
Data Manipulation Tools¶
pjml.operator: Component Operators¶
- pipeline.Pipeline (*args[, seed, components, …])
- chain.Chain (*args[, seed, components, …]): Chain the execution of the given components.
- select.Select: Select one of the given components.
- shuffle.Shuffle: Shuffle the order of the given components; a permutation is sampled.
pjml.stream: Stream Manipulation Tools¶
Expand¶
- partition.Partition (split_type, partitions, …): Class to perform data partitioning, e.g. cross-validation.
Reduce¶
- accumulator.Accumulator (iterator, start, …): Cumulative iterator that returns a final/result value.
- accumulator.Result (value)
- reduce.Reduce (config, **kwargs)
- summ.Summ (field, function, **kwargs): Given a field, summarizes a Collection object to a Data object.
Transform¶
- map.Map (*args[, seed, components, enhance, …]): Execute the same component for the entire stream.
- multi.Multi (*args[, seed, components, …]): Process each Data object from a stream with its respective component.
pjml.util: Utils¶
- distributions.choice (items)
- distributions.uniform ([low, high, size])
- macro.evaluator (*components[, function])
- macro.tsplit (split_type, partitions, …): Make a sequence of Data splitters.
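The distribution helpers can be pictured as samplers for hyperparameter values. The sketch below uses the standard library as a hypothetical stand-in for distributions.choice and distributions.uniform; the real helpers may be implemented differently.

```python
import random

def choice(items, rnd=random):
    # Sample a single item from a finite list of options.
    return items[rnd.randrange(len(items))]

def uniform(low=0.0, high=1.0, size=None, rnd=random):
    # Sample from a continuous uniform distribution on [low, high].
    if size is None:
        return rnd.uniform(low, high)
    return [rnd.uniform(low, high) for _ in range(size)]

# E.g. sampling SVM hyperparameters:
kernel = choice(["linear", "rbf", "poly"])
c_value = uniform(low=0.1, high=10.0)
print(kernel in ["linear", "rbf", "poly"])  # True
print(0.1 <= c_value <= 10.0)               # True
```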
The pjml example gallery¶
The pjml package aims to provide easy tools to create machine learning pipelines from scratch. It adds new and elegant ways to handle and operate on machine learning components, i.e., algorithms, partitioning, metrics, etc.
Below we present a gallery with examples of use:
Introductory Examples¶
Introductory examples of the pjml package.
Operating machine learning pipelines (basic)¶
You can create pipelines using the following operators:
- Chain –> It creates a sequential chain of components
- Shuffle –> It shuffles the components’ order
- Select –> It selects one of the given components
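As the examples below show, pjml also maps these operators onto Python's arithmetic operators (* for Chain, + for Select, @ for Shuffle). That mapping can be sketched with operator overloading; the Component class here is a hypothetical stand-in, not pjml's implementation.

```python
# Hypothetical stand-in illustrating how pipeline operators can be bound to
# Python operators: * -> Chain, + -> Select, @ -> Shuffle.
class Component:
    def __init__(self, name, *children):
        self.name = name
        self.children = list(children)

    def __mul__(self, other):    # a * b  builds Chain(a, b)
        return Component("Chain", self, other)

    def __add__(self, other):    # a + b  builds Select(a, b)
        return Component("Select", self, other)

    def __matmul__(self, other): # a @ b  builds Shuffle(a, b)
        return Component("Shuffle", self, other)

    def __repr__(self):
        if not self.children:
            return self.name
        return f"{self.name}({', '.join(map(repr, self.children))})"

pca, minmax, svmc, dt = (Component(n) for n in ["PCA", "MinMax", "SVMC", "DT"])
# `@` and `*` share precedence and apply left to right, so this groups as
# Chain(Shuffle(PCA, MinMax), Select(SVMC, DT)).
exp = pca @ minmax * (svmc + dt)
print(exp)  # Chain(Shuffle(PCA, MinMax), Select(SVMC, DT))
```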
Importing the required packages
import numpy as np
from pjml.operator.chain import Chain
from pjml.operator.select import Select
from pjml.operator.shuffle import Shuffle
from pjpy.modeling.supervised.classifier.dt import DT
from pjpy.modeling.supervised.classifier.svmc import SVMC
from pjpy.processing.feature.reductor.pca import PCA
from pjpy.processing.feature.scaler.minmax import MinMax
np.random.seed(0)
Using Chain¶
The Chain is an operator that concatenates other components in a sequence.
exp = Chain(SVMC(), DT())
print(exp)
# You can also use the python operator ``*``
exp = SVMC() * DT()
print(exp)
Out:
{
"info": {
"_id": "SVMC@pjpy.modeling.supervised.classifier.svmc",
"config": {
"C": 1.0,
"kernel": "rbf",
"degree": 3,
"gamma": "scale",
"coef0": 0.0,
"shrinking": true,
"probability": false,
"tol": 0.001,
"cache_size": 200,
"class_weight": null,
"verbose": false,
"max_iter": -1,
"decision_function_shape": "ovr",
"break_ties": false,
"random_state": null,
"seed": 0
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "DT@pjpy.modeling.supervised.classifier.dt",
"config": {
"seed": 0
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "SVMC@pjpy.modeling.supervised.classifier.svmc",
"config": {
"C": 1.0,
"kernel": "rbf",
"degree": 3,
"gamma": "scale",
"coef0": 0.0,
"shrinking": true,
"probability": false,
"tol": 0.001,
"cache_size": 200,
"class_weight": null,
"verbose": false,
"max_iter": -1,
"decision_function_shape": "ovr",
"break_ties": false,
"random_state": null,
"seed": 0
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "DT@pjpy.modeling.supervised.classifier.dt",
"config": {
"seed": 0
}
},
"enhance": true,
"model": true
}
Using Shuffle¶
The Shuffle is an operator that concatenates components in a sequence, but the order is not maintained.
exp = Shuffle(PCA(), MinMax())
print(exp)
Out:
{
"info": {
"_id": "MinMax@pjpy.processing.feature.scaler.minmax",
"config": {
"feature_range": [
0,
1
]
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "PCA@pjpy.processing.feature.reductor.pca",
"config": {
"n": 2
}
},
"enhance": true,
"model": true
}
You can also use the Python operator @
exp = PCA() @ MinMax()
print(exp)
Out:
{
"info": {
"_id": "PCA@pjpy.processing.feature.reductor.pca",
"config": {
"n": 2
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "MinMax@pjpy.processing.feature.scaler.minmax",
"config": {
"feature_range": [
0,
1
]
}
},
"enhance": true,
"model": true
}
Using Select¶
The Select is an operator that works like a bifurcation, where only one of the components will be selected.
exp = Select(SVMC(), DT())
print(exp)
Out:
{
"info": {
"_id": "DT@pjpy.modeling.supervised.classifier.dt",
"config": {
"seed": 0
}
},
"enhance": true,
"model": true
}
You can also use the Python operator +
exp = SVMC() + DT()
print(exp)
Out:
{
"info": {
"_id": "SVMC@pjpy.modeling.supervised.classifier.svmc",
"config": {
"C": 1.0,
"kernel": "rbf",
"degree": 3,
"gamma": "scale",
"coef0": 0.0,
"shrinking": true,
"probability": false,
"tol": 0.001,
"cache_size": 200,
"class_weight": null,
"verbose": false,
"max_iter": -1,
"decision_function_shape": "ovr",
"break_ties": false,
"random_state": null,
"seed": 0
}
},
"enhance": true,
"model": true
}
Using them all:¶
Using these simple operations, you can create diverse kinds of pipelines to represent end-to-end machine learning experiments.
exp = Chain(Shuffle(PCA(), MinMax()), Select(SVMC() + DT()))
print(exp)
Out:
{
"info": {
"_id": "PCA@pjpy.processing.feature.reductor.pca",
"config": {
"n": 2
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "MinMax@pjpy.processing.feature.scaler.minmax",
"config": {
"feature_range": [
0,
1
]
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "DT@pjpy.modeling.supervised.classifier.dt",
"config": {
"seed": 0
}
},
"enhance": true,
"model": true
}
You can also use Python operators
exp = PCA() @ MinMax() * (SVMC() + DT())
print(exp)
Out:
{
"info": {
"_id": "PCA@pjpy.processing.feature.reductor.pca",
"config": {
"n": 2
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "MinMax@pjpy.processing.feature.scaler.minmax",
"config": {
"feature_range": [
0,
1
]
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "DT@pjpy.modeling.supervised.classifier.dt",
"config": {
"seed": 0
}
},
"enhance": true,
"model": true
}
Total running time of the script: ( 0 minutes 0.062 seconds)
Running machine learning pipelines¶
Let’s run a machine learning pipeline on the Iris dataset.
Importing the required packages
import numpy as np
from pjml.data.communication.report import Report
from pjml.data.evaluation.metric import Metric
from pjml.data.flow.file import File
from pjml.operator.pipeline import Pipeline
from pjml.stream.expand.partition import Partition
from pjml.stream.reduce.reduce import Reduce
from pjml.stream.reduce.summ import Summ
from pjml.stream.transform.map import Map
from pjpy.modeling.supervised.classifier.svmc import SVMC
from pjpy.processing.feature.reductor.pca import PCA
from pjpy.processing.feature.scaler.minmax import MinMax
np.random.seed(0)
First, we must create a pipeline.
pipe = Pipeline(
File("../data/iris.arff"),
Partition(),
Map(MinMax(), PCA(), SVMC(), Metric()),
Summ(),
Reduce(),
Report("Mean S: $S"),
)
Now we will train and test our pipeline.
res_train, res_test = pipe.dual_transform()
print("Train result: ", res_train)
print("test result: ", res_test)
Out:
[model] Mean S: array([[0.96]])
Train result: <pjdata.content.data.Data object at 0x7f37688e6c70>
test result: <pjdata.content.data.Data object at 0x7f376877b850>
Total running time of the script: ( 0 minutes 0.146 seconds)
Creating an end-to-end pipeline¶
Let’s create an end-to-end machine learning pipeline.
Importing the required packages
import numpy as np
from pjml.data.communication.report import Report
from pjml.data.evaluation.metric import Metric
from pjml.data.flow.file import File
from pjml.operator.pipeline import Pipeline
from pjml.stream.expand.partition import Partition
from pjml.stream.reduce.reduce import Reduce
from pjml.stream.reduce.summ import Summ
from pjml.stream.transform.map import Map
from pjpy.modeling.supervised.classifier.svmc import SVMC
from pjpy.processing.feature.reductor.pca import PCA
from pjpy.processing.feature.scaler.minmax import MinMax
np.random.seed(0)
First, we create a machine learning expression.
exp = Pipeline(MinMax(), PCA(), SVMC())
Let’s look at the sequence of operations and the hyperparameter values.
print(exp)
Out:
{
"info": {
"_id": "MinMax@pjpy.processing.feature.scaler.minmax",
"config": {
"feature_range": [
0,
1
]
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "PCA@pjpy.processing.feature.reductor.pca",
"config": {
"n": 2
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "SVMC@pjpy.modeling.supervised.classifier.svmc",
"config": {
"C": 1.0,
"kernel": "rbf",
"degree": 3,
"gamma": "scale",
"coef0": 0.0,
"shrinking": true,
"probability": false,
"tol": 0.001,
"cache_size": 200,
"class_weight": null,
"verbose": false,
"max_iter": -1,
"decision_function_shape": "ovr",
"break_ties": false,
"random_state": null,
"seed": 0
}
},
"enhance": true,
"model": true
}
Having defined our machine learning expression, we can create an end-to-end pipeline.
pipeline = Pipeline(
File("../data/iris.arff"),
Partition(),
Map(exp, Metric()),
Summ(),
Reduce(),
Report(),
)
Or, using only Python operators:
pipeline = (
File("../data/iris.arff")
* Partition()
* Map(exp * Metric())
* Summ(function="mean")
* Reduce()
* Report("Mean S: $S")
)
This pipeline represents an end-to-end machine learning experiment.
print(pipeline)
Out:
{
"info": {
"_id": "File@pjml.data.flow.file",
"config": {
"name": "../data/iris.arff",
"path": "./",
"description": "No description.",
"hashes": {
"X": "0ǏǍɽĊũÊүȏŵҖSîҕ",
"Y": "0ЄϒɐĵǏȂϗƽўýÎʃȆ",
"Xd": "5ɫңɖŇǓήʼnÝʑΏƀЀǔ",
"Yd": "5mϛǖͶƅĞOȁЎžʛѲƨ",
"Xt": "5ȥΔĨӑËҭȨƬδſΧȰɩ",
"Yt": "5έēPaӹЄźգǩȱɟǟǹ"
}
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "Partition@pjml.stream.expand.partition",
"config": {
"split_type": "cv",
"partitions": 10,
"seed": 0,
"fields": "X,Y"
}
},
"enhance": true,
"model": true
}
Map>>
{"info": {"_id": "MinMax@pjpy.processing.feature.scaler.minmax","config": {"feature_range": [0,1],"model": true,"enhance": true}},"enhance": true,"model": true}
{"info": {"_id": "PCA@pjpy.processing.feature.reductor.pca","config": {"n": 2,"model": true,"enhance": true}},"enhance": true,"model": true}
{"info": {"_id": "SVMC@pjpy.modeling.supervised.classifier.svmc","config": {"C": 1.0,"kernel": "rbf","degree": 3,"gamma": "scale","coef0": 0.0,"shrinking": true,"probability": false,"tol": 0.001,"cache_size": 200,"class_weight": null,"verbose": false,"max_iter": -1,"decision_function_shape": "ovr","break_ties": false,"random_state": null,"seed": 0,"model": true,"enhance": true}},"enhance": true,"model": true}
{"info": {"_id": "Metric@pjml.data.evaluation.metric","config": {"functions": ["accuracy"],"target": "Y","prediction": "Z","model": true,"enhance": true}},"enhance": true,"model": true}
<<Map
{
"info": {
"_id": "Summ@pjml.stream.reduce.summ",
"config": {
"field": "R",
"function": "mean"
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "Reduce@pjml.stream.reduce.reduce",
"config": {}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "Report@pjml.data.communication.report",
"config": {
"text": "Mean S: $S"
}
},
"enhance": true,
"model": true
}
Total running time of the script: ( 0 minutes 0.086 seconds)
Getting started¶
Information to install, test, and contribute to the package.
API Documentation¶
In this section, we document expected types, functions, classes, and parameters available for AutoML building. We also describe our own AutoML systems.
Examples¶
A set of examples illustrating the use of the pjml package. In this section you will learn how pjml works, along with patterns, tips, and more.
What’s new?¶
Log of the pjml history.