Sampling procedures for testing models (testing)

class Orange.evaluation.testing.Results(data=None, nmethods=0, *, learners=None, train_data=None, nrows=None, nclasses=None, store_data=False, store_models=False, domain=None, actual=None, row_indices=None, predicted=None, probabilities=None, preprocessor=None, callback=None, n_jobs=1)

Class for storing predictions in model testing.

data

Data used for testing. When data is stored, this is typically not a copy but a reference.

Type

Optional[Table]

models

A list of induced models.

Type

Optional[List[Model]]

row_indices

Indices of rows in data that were used in testing, stored as a numpy vector of length nrows. Values of actual[i], predicted[i] and probabilities[i] refer to the target value of instance data[row_indices[i]].

Type

np.ndarray
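A minimal pure-Python sketch of the indexing relation described above. Plain lists stand in for the `Table` and the numpy vectors, and the data values and the helper `target_of` are hypothetical, used only for illustration:

```python
# Plain lists stand in for an Orange Table and numpy vectors (illustration only).
data_targets = ["a", "b", "c", "d"]   # target value of each row of `data`
row_indices = [2, 0, 3, 1]            # order in which rows were tested
actual = [data_targets[r] for r in row_indices]

def target_of(i):
    """Hypothetical helper: target of the data row that result position i refers to."""
    return data_targets[row_indices[i]]

# actual[i] always describes the instance data[row_indices[i]]
assert all(actual[i] == target_of(i) for i in range(len(row_indices)))
print(actual)  # ['c', 'a', 'd', 'b']
```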

nrows

The number of test instances (including duplicates).

Type

int

actual

Actual values of target variable; a numpy vector of length nrows and of the same type as data (or np.float32 if the type of data cannot be determined).

Type

np.ndarray

predicted

Predicted values of target variable; a numpy array of shape (number-of-methods, nrows) and of the same type as data (or np.float32 if the type of data cannot be determined).

Type

np.ndarray

probabilities

Predicted probabilities (for discrete target variables); a numpy array of shape (number-of-methods, nrows, number-of-classes) of type np.float32.

Type

Optional[np.ndarray]
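The shapes above can be illustrated with nested lists standing in for numpy arrays. This sketch assumes two methods, four test rows and two classes, with made-up values; `accuracy` is a hypothetical helper that reads one row of `predicted` per method:

```python
# Nested lists stand in for numpy arrays; all values are made up.
actual = [0, 1, 1, 0]                  # shape (nrows,)
predicted = [[0, 1, 0, 0],             # method 0; overall shape (nmethods, nrows)
             [0, 1, 1, 0]]             # method 1
probabilities = [                      # shape (nmethods, nrows, nclasses)
    [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.7, 0.3]],
    [[0.8, 0.2], [0.3, 0.7], [0.1, 0.9], [0.6, 0.4]],
]

def accuracy(method):
    """Fraction of test rows on which `method` predicted the actual value."""
    hits = sum(p == a for p, a in zip(predicted[method], actual))
    return hits / len(actual)

print([accuracy(m) for m in range(len(predicted))])  # [0.75, 1.0]
```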

folds

A list of indices (or slice objects) corresponding to rows of each fold.

Type

List[Slice or List[int]]
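Because a fold may be stored either as a slice or as a list of indices, code that reads folds has to accept both. A small sketch, assuming each fold selects positions in the result arrays; `fold_positions` is a hypothetical helper, not part of Orange:

```python
def fold_positions(fold, nrows):
    """Expand one fold, stored as a slice or a list of ints, into a list of positions."""
    if isinstance(fold, slice):
        return list(range(nrows))[fold]
    return list(fold)

nrows = 6
contiguous = [slice(0, 3), slice(3, 6)]   # folds stored as slice objects
interleaved = [[0, 2, 4], [1, 3, 5]]      # folds stored as index lists
print([fold_positions(f, nrows) for f in contiguous])   # [[0, 1, 2], [3, 4, 5]]
print([fold_positions(f, nrows) for f in interleaved])  # [[0, 2, 4], [1, 3, 5]]
```

With actual numpy arrays no such helper is needed: an expression such as `predicted[:, fold]` accepts both a slice and a list of integer indices.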

get_augmented_data(model_names, include_attrs=True, include_predictions=True, include_probabilities=True)

Return the data, augmented with predictions, probabilities (if the task is classification) and folds info. Predictions, probabilities and folds are inserted as meta attributes.

Parameters
  • model_names (list) – A list of strings containing learners’ names.

  • include_attrs (bool) – Flag that tells whether to include original attributes.

  • include_predictions (bool) – Flag that tells whether to include predictions.

  • include_probabilities (bool) – Flag that tells whether to include probabilities.

Returns

Data augmented with predictions, probabilities (for classification) and fold information.

Return type

Orange.data.Table
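The idea of augmenting data with per-model columns can be sketched with rows as plain dictionaries. This is not Orange's implementation: the helper `augment`, its parameters and the column names (`"tree (0)"`, `"Fold"`) are hypothetical, chosen only to mirror the parameters listed above:

```python
def augment(rows, model_names, predicted, probabilities=None, folds=None,
            include_attrs=True):
    """Hypothetical helper: return new rows with one prediction column per model,
    optional per-class probability columns and an optional fold column."""
    out = []
    for i, row in enumerate(rows):
        new = dict(row) if include_attrs else {}   # keep original attributes if asked
        for m, name in enumerate(model_names):
            new[name] = predicted[m][i]            # prediction of model m for row i
            if probabilities is not None:
                for c, p in enumerate(probabilities[m][i]):
                    new[f"{name} ({c})"] = p       # one column per class
        if folds is not None:
            new["Fold"] = folds[i]
        out.append(new)
    return out

rows = [{"x": 1.0}, {"x": 2.0}]
augmented = augment(rows, ["tree"], predicted=[[0, 1]],
                    probabilities=[[[0.8, 0.2], [0.3, 0.7]]], folds=[0, 1])
print(augmented[1])  # {'x': 2.0, 'tree': 1, 'tree (0)': 0.3, 'tree (1)': 0.7, 'Fold': 1}
```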

fit(train_data, test_data=None)

Fits self.learners using folds sampled from the provided data.

Parameters
  • train_data (Table) – table to sample train folds

  • test_data (Optional[Table]) – table to sample test folds; if None, train_data is used
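The general shape of fold-based evaluation can be sketched in pure Python, assuming train and test data are the same table and each fold is a list of test positions. `fit_folds` and `majority_learner` are hypothetical stand-ins for Orange's learners and internals:

```python
def fit_folds(xs, ys, learners, folds):
    """Hypothetical sketch: for each fold, train every learner on the rows outside
    the fold and predict the rows inside it."""
    n = len(xs)
    row_indices, actual = [], []
    predicted = [[] for _ in learners]          # one list per method
    for test_idx in folds:
        test_set = set(test_idx)
        train_idx = [i for i in range(n) if i not in test_set]
        for m, learner in enumerate(learners):
            model = learner([xs[i] for i in train_idx], [ys[i] for i in train_idx])
            predicted[m].extend(model(xs[i]) for i in test_idx)
        row_indices.extend(test_idx)            # which data rows were tested
        actual.extend(ys[i] for i in test_idx)
    return row_indices, actual, predicted

def majority_learner(train_x, train_y):
    """Toy learner: the model always predicts the most frequent training target."""
    most = max(set(train_y), key=train_y.count)
    return lambda x: most

xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 0]
row_indices, actual, predicted = fit_folds(xs, ys, [majority_learner],
                                           [[0, 1, 2], [3, 4, 5]])
print(predicted)  # [[1, 1, 1, 0, 0, 0]]
```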

prepare_arrays(test_data)

Initializes the arrays used by the fit method.

setup_indices(train_data, test_data)

Initializes self.indices with iterable objects with slices (or indices) for each fold.

Parameters
  • train_data (Table) – train table

  • test_data (Table) – test table

split_by_model()

Splits evaluation results by model.
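Splitting by model amounts to taking one row of `predicted` (and of `probabilities`) per method while sharing `actual`. A sketch with a hypothetical, reduced stand-in class `MiniResults`; Orange's own method returns `Results` objects and copies more fields than shown here:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class MiniResults:
    """Hypothetical stand-in for Results, keeping only a few fields."""
    actual: List[int]
    predicted: List[List[int]]                          # (nmethods, nrows)
    probabilities: Optional[List[List[List[float]]]] = None

def split_by_model(res):
    """Return one single-method MiniResults per method; `actual` is shared."""
    parts = []
    for m in range(len(res.predicted)):
        probs = None if res.probabilities is None else [res.probabilities[m]]
        parts.append(MiniResults(res.actual, [res.predicted[m]], probs))
    return parts

res = MiniResults(actual=[0, 1], predicted=[[0, 1], [1, 1]])
parts = split_by_model(res)
print([p.predicted for p in parts])  # [[[0, 1]], [[1, 1]]]
```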

class Orange.evaluation.testing.CrossValidation(data, learners, k=10, stratified=True, random_state=0, store_data=False, store_models=False, preprocessor=None, callback=None, warnings=None, n_jobs=1)

K-fold cross validation.

If the constructor is given the data and a list of learning algorithms, it runs cross validation and returns an instance of Results containing the predicted values and probabilities.

k

The number of folds.

random_state

Seed for the pseudo-random number generator used to split the data into folds.

setup_indices(train_data, test_data)

Initializes self.indices with iterable objects with slices (or indices) for each fold.

Parameters
  • train_data (Table) – train table

  • test_data (Table) – test table
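K-fold splitting with optional stratification can be sketched in pure Python. This is an illustration of the technique, not Orange's code (which may delegate to scikit-learn's splitters); the helper `kfold_indices` is hypothetical, and here `random_state` simply seeds Python's `random.Random`:

```python
import random
from collections import defaultdict

def kfold_indices(y, k=10, stratified=True, random_state=0):
    """Sketch: split positions 0..len(y)-1 into k disjoint test folds.
    With stratified=True, rows of each class are dealt round-robin across
    folds, so class proportions are roughly equal in every fold."""
    rng = random.Random(random_state)
    folds = [[] for _ in range(k)]
    if stratified:
        by_class = defaultdict(list)
        for i, label in enumerate(y):
            by_class[label].append(i)
        groups = [by_class[c] for c in sorted(by_class)]
    else:
        groups = [list(range(len(y)))]
    j = 0                                   # keeps fold sizes balanced across classes
    for group in groups:
        rng.shuffle(group)
        for i in group:
            folds[j % k].append(i)
            j += 1
    return [sorted(f) for f in folds]

y = [0] * 6 + [1] * 4
folds = kfold_indices(y, k=2)
print([sum(y[i] for i in f) for f in folds])  # [2, 2]: two class-1 rows per fold
```

Each fold then serves once as the test set, while the remaining k - 1 folds form the training set.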

class Orange.evaluation.testing.CrossValidationFeature(data, learners, feature, store_data=False, store_models=False, preprocessor=None, callback=None, n_jobs=1)

Cross validation with folds according to values of a feature.

feature

The feature defining the folds.

setup_indices(train_data, test_data)

Initializes self.indices with iterable objects with slices (or indices) for each fold.

Parameters
  • train_data (Table) – train table

  • test_data (Table) – test table
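Folds defined by a feature can be sketched as one test fold per distinct value of that feature. The helper `folds_by_feature` is hypothetical, and skipping rows whose value is missing (`None`) is an assumption of this sketch, not something stated for Orange:

```python
from collections import defaultdict

def folds_by_feature(values):
    """Sketch: one test fold per distinct feature value.
    Rows with a missing value (None) are skipped (an assumption of this sketch)."""
    groups = defaultdict(list)
    for i, v in enumerate(values):
        if v is not None:
            groups[v].append(i)
    return [groups[v] for v in sorted(groups)]

region = ["north", "south", "north", None, "east", "south"]
print(folds_by_feature(region))  # [[4], [0, 2], [1, 5]]
```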

class Orange.evaluation.testing.LeaveOneOut(data, learners, store_data=False, store_models=False, preprocessor=None, callback=None, n_jobs=1)

Leave-one-out testing.

setup_indices(train_data, test_data)

Initializes self.indices with iterable objects with slices (or indices) for each fold.

Parameters
  • train_data (Table) – train table

  • test_data (Table) – test table

prepare_arrays(test_data)

Initializes the arrays used by the fit method.
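Leave-one-out is k-fold cross validation with k equal to the number of instances: every fold tests a single row and trains on all the others. A minimal sketch; `leave_one_out` is a hypothetical helper returning (train, test) index pairs:

```python
def leave_one_out(n):
    """Sketch: n splits, each testing exactly one row and training on the other n - 1."""
    return [([j for j in range(n) if j != i], [i]) for i in range(n)]

for train, test in leave_one_out(3):
    print(train, test)
# [1, 2] [0]
# [0, 2] [1]
# [0, 1] [2]
```

Since each instance needs its own model, the cost grows linearly with the number of instances.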

class Orange.evaluation.testing.ShuffleSplit(data, learners, n_resamples=10, train_size=None, test_size=0.1, stratified=True, random_state=0, store_data=False, store_models=False, preprocessor=None, callback=None, n_jobs=1)

Repeated random sampling: splits the data into train and test subsets n_resamples times.

setup_indices(train_data, test_data)

Initializes self.indices with iterable objects with slices (or indices) for each fold.

Parameters
  • train_data (Table) – train table

  • test_data (Table) – test table
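Repeated random sub-sampling can be sketched as drawing a fresh random permutation for every resample. In this hypothetical helper `shuffle_split`, float sizes are treated as fractions and int sizes as counts, the rounding rule is an assumption, and stratification is omitted for brevity:

```python
import random

def shuffle_split(n, n_resamples=10, test_size=0.1, train_size=None, random_state=0):
    """Sketch of repeated random train/test splits of n rows.
    Float sizes are fractions of n, int sizes are row counts (rounding is an
    assumption); if train_size is None, all non-test rows are used for training."""
    rng = random.Random(random_state)
    n_test = round(test_size * n) if isinstance(test_size, float) else test_size
    if train_size is None:
        n_train = n - n_test
    else:
        n_train = round(train_size * n) if isinstance(train_size, float) else train_size
    if n_train + n_test > n:
        raise ValueError("train_size + test_size exceeds the number of rows")
    splits = []
    for _ in range(n_resamples):
        perm = list(range(n))
        rng.shuffle(perm)                    # new random order for each resample
        test = sorted(perm[:n_test])
        train = sorted(perm[n_test:n_test + n_train])
        splits.append((train, test))
    return splits

splits = shuffle_split(20, n_resamples=3, test_size=0.25)
print([(len(tr), len(te)) for tr, te in splits])  # [(15, 5), (15, 5), (15, 5)]
```

Unlike k-fold cross validation, the test sets of different resamples may overlap.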

class Orange.evaluation.testing.TestOnTestData(train_data, test_data, learners, store_data=False, store_models=False, preprocessor=None, callback=None, n_jobs=1)

Test on a separate test dataset.

setup_indices(train_data, test_data)

Initializes self.indices with iterable objects with slices (or indices) for each fold.

Parameters
  • train_data (Table) – train table

  • test_data (Table) – test table

prepare_arrays(test_data)

Initializes the arrays used by the fit method.

class Orange.evaluation.testing.TestOnTrainingData(data, learners, store_data=False, store_models=False, preprocessor=None, callback=None, n_jobs=1)

Trains and tests on the same data.

Orange.evaluation.testing.sample(table, n=0.7, stratified=False, replace=False, random_state=None)

Samples data instances from a data table. Returns the sample and a dataset containing the instances of the input table that are not in the sample. Internally uses several sampling functions from scikit-learn.

Parameters
  • table (Table) – A data table from which to sample.

  • n (float or int, default = 0.7) – If float, it should be between 0.0 and 1.0 and represents the proportion of data instances in the resulting sample. If int, n is the number of data instances in the resulting sample.

  • stratified (bool, optional, default = False) – If true, sampling considers class values and tries to match the distribution of class values in the train and test subsets.

  • replace (bool, optional, default = False) – If true, sample with replacement.

  • random_state (int or RandomState, optional, default = None) – Pseudo-random number generator state used for random sampling.
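The effect of n, stratified and replace can be sketched with the standard library. This is not Orange's implementation (which the text says relies on scikit-learn); `sample_rows` is a hypothetical helper that works on row indices and a list of class values, and in this sketch stratification is ignored when sampling with replacement:

```python
import random
from collections import defaultdict

def sample_rows(y, n=0.7, stratified=False, replace=False, random_state=None):
    """Sketch: return (sample, remainder) as lists of row indices.
    y holds the class value of each row; float n is a proportion, int n a count."""
    rng = random.Random(random_state)
    size = len(y)
    k = round(n * size) if isinstance(n, float) else n
    if replace:
        # With replacement rows may repeat; stratification is ignored here.
        chosen = [rng.randrange(size) for _ in range(k)]
    elif stratified:
        by_class = defaultdict(list)
        for i, label in enumerate(y):
            by_class[label].append(i)
        chosen = []
        for label in sorted(by_class):
            idx = by_class[label]
            # Per-class rounding: the total may differ slightly from k.
            chosen += rng.sample(idx, round(k * len(idx) / size))
    else:
        chosen = rng.sample(range(size), k)
    picked = set(chosen)
    remainder = [i for i in range(size) if i not in picked]
    return sorted(chosen), remainder

y = [0] * 10 + [1] * 10
sample, rest = sample_rows(y, n=0.5, stratified=True, random_state=1)
print(len(sample), sum(y[i] for i in sample))  # 10 5
```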