Sampling procedures for testing models (testing)¶

class
Orange.evaluation.testing.
Results
(data=None, *, nmethods=None, nrows=None, nclasses=None, domain=None, row_indices=None, folds=None, score_by_folds=True, learners=None, models=None, failed=None, actual=None, predicted=None, probabilities=None, store_data=None, store_models=None, train_time=None, test_time=None)[source]¶ Class for storing predictions in model testing.

models
¶ A list of induced models.
 Type
Optional[List[Model]]

row_indices
¶ Indices of rows in data that were used in testing, stored as a numpy vector of length nrows. Entries with index i in actual, predicted and probabilities all refer to the i-th test instance: that instance is data[row_indices[i]], its actual class is actual[i], and the prediction by the m-th method is predicted[m, i].
 Type
np.ndarray

nrows
¶ The number of test instances (including duplicates); nrows equals the length of row_indices and actual, and the second dimension of predicted and probabilities.
 Type
int

actual
¶ True values of the target variable in a vector of length nrows.
 Type
np.ndarray

predicted
¶ Predicted values of the target variable in an array of shape (number-of-methods, nrows).
 Type
np.ndarray

probabilities
¶ Predicted probabilities (for discrete target variables) in an array of shape (number-of-methods, nrows, number-of-classes).
 Type
Optional[np.ndarray]

folds
¶ a list of indices (or slice objects) corresponding to testing data subsets, that is, row_indices[folds[i]] contains row indices used in fold i, so data[row_indices[folds[i]]] is the corresponding testing data
 Type
List[Slice or List[int]]
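The index relationships among row_indices, folds, actual and predicted described above can be illustrated with plain numpy arrays. This is a minimal sketch with invented values, not output produced by Orange; the arrays only mimic the attribute layout of a Results object.

```python
import numpy as np

# Hypothetical stand-ins for the attributes of a Results object
# (6 data rows, 2 methods, 2 folds); all values are invented.
data = np.array([[0.1], [0.2], [0.3], [0.4], [0.5], [0.6]])
row_indices = np.array([4, 0, 2, 5, 1, 3])      # rows of data used in testing
folds = [slice(0, 3), slice(3, 6)]              # positions within row_indices
actual = np.array([1, 0, 0, 1, 0, 1])           # true class of each test instance
predicted = np.array([[1, 0, 1, 1, 0, 0],       # predictions of method 0
                      [1, 1, 0, 1, 0, 1]])      # predictions of method 1

nrows = len(row_indices)
assert predicted.shape == (2, nrows)

# The i-th test instance, its true class and the prediction of method m.
i, m = 2, 1
instance = data[row_indices[i]]
true_class, guess = actual[i], predicted[m, i]

# Test data and per-method accuracy for each fold.
fold_data = [data[row_indices[fold]] for fold in folds]
fold_accuracy = np.array([
    [(predicted[method, fold] == actual[fold]).mean() for fold in folds]
    for method in range(predicted.shape[0])
])
```

Note that each entry of folds indexes into row_indices (and therefore into actual and the second dimension of predicted), not directly into data.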

train_time
¶ training times of batches
 Type
np.ndarray

test_time
¶ testing times of batches
 Type
np.ndarray

get_augmented_data
(model_names, include_attrs=True, include_predictions=True, include_probabilities=True)[source]¶ Return the test data table augmented with meta attributes containing predictions, probabilities (if the task is classification) and fold indices.
 Parameters
 Returns
data augmented with predictions, probabilities and fold indices
 Return type
augmented_data (Orange.data.Table)


class
Orange.evaluation.testing.
CrossValidation
(k=10, stratified=True, random_state=0, store_data=False, store_models=False, warnings=None)[source]¶ K-fold cross-validation

random_state
¶ seed for random number generator (default: 0). If set to None, a different seed is used each time
 Type

stratified
¶ Flag deciding whether to perform stratified cross-validation. If True but the class sizes do not allow it, non-stratified validation is used instead and a list warning containing the warning message(s) is added to the Results object.
 Type
bool

get_indices
(data)[source]¶ Return a list of arrays of indices of test data instances.
For example, in k-fold CV, the result is a list with k elements, each containing approximately len(data) / k non-overlapping indices into data.
This method is abstract and must be implemented in derived classes unless they provide their own implementation of the __call__ method.
 Parameters
data (Orange.data.Table) – test data
 Returns
a list of arrays of indices into data
 Return type
indices (list of np.ndarray)
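The shape of the value returned by get_indices for k-fold CV can be sketched in plain Python. The helper kfold_test_indices is hypothetical and non-stratified; it is not Orange's implementation and only shows what "k lists of approximately len(data) / k non-overlapping indices" looks like.

```python
import random

def kfold_test_indices(n_rows, k=10, random_state=0):
    """Split range(n_rows) into k non-overlapping lists of test indices."""
    rng = random.Random(random_state)   # fixed seed gives reproducible folds
    order = list(range(n_rows))
    rng.shuffle(order)
    # Striding by k yields fold sizes that differ by at most one.
    return [sorted(order[fold::k]) for fold in range(k)]

indices = kfold_test_indices(23, k=5)
fold_sizes = [len(fold) for fold in indices]
```

With 23 rows and k=5, three folds hold five indices and two hold four, and every row appears in exactly one test fold.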


class
Orange.evaluation.testing.
CrossValidationFeature
(feature=None, store_data=False, store_models=False, warnings=None)[source]¶ Cross validation with folds according to values of a feature.

feature
¶ the feature defining the folds
 Type

get_indices
(data)[source]¶ Return a list of arrays of indices of test data instances.
For example, in k-fold CV, the result is a list with k elements, each containing approximately len(data) / k non-overlapping indices into data.
This method is abstract and must be implemented in derived classes unless they provide their own implementation of the __call__ method.
 Parameters
data (Orange.data.Table) – test data
 Returns
a list of arrays of indices into data
 Return type
indices (list of np.ndarray)
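Folds defined by the values of a feature can be sketched as a simple grouping of row indices. The function feature_fold_indices below is a hypothetical illustration that works on a plain list of feature values; how Orange treats missing values or orders the folds is not covered here.

```python
from collections import defaultdict

def feature_fold_indices(feature_values):
    """Group row indices by the value of the fold-defining feature."""
    groups = defaultdict(list)
    for row, value in enumerate(feature_values):
        groups[value].append(row)
    # One fold per distinct value, in order of first appearance.
    return list(groups.values())

column = ["a", "b", "a", "c", "b", "a"]
indices = feature_fold_indices(column)
```

Unlike k-fold CV, the number of folds and their sizes are dictated entirely by the data: here there are three folds of sizes 3, 2 and 1.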


class
Orange.evaluation.testing.
LeaveOneOut
(*, store_data=False, store_models=False)[source]¶ Leave-one-out testing

get_indices
(data)[source]¶ Return a list of arrays of indices of test data instances.
For example, in k-fold CV, the result is a list with k elements, each containing approximately len(data) / k non-overlapping indices into data.
This method is abstract and must be implemented in derived classes unless they provide their own implementation of the __call__ method.
 Parameters
data (Orange.data.Table) – test data
 Returns
a list of arrays of indices into data
 Return type
indices (list of np.ndarray)
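For leave-one-out testing, the list of test indices degenerates to one single-element fold per row. A minimal sketch, using hypothetical helper names and plain lists instead of numpy arrays:

```python
def leave_one_out_indices(n_rows):
    """Return one single-element test fold per data row."""
    return [[row] for row in range(n_rows)]

def train_indices(n_rows, test_fold):
    """Rows not in the test fold form the training set for that fold."""
    held_out = set(test_fold)
    return [row for row in range(n_rows) if row not in held_out]

indices = leave_one_out_indices(4)
first_train = train_indices(4, indices[0])
```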

static
prepare_arrays
(data, indices)[source]¶ Prepare folds, row_indices and actual.
The method is used by __call__. While functional, it may be overridden in subclasses for speedups.
 Parameters
data (Orange.data.Table) – data used for testing
indices (list of vectors) – indices of data instances in each test sample
 Returns
folds (np.ndarray) – see class documentation
row_indices (np.ndarray) – see class documentation
actual (np.ndarray) – see class documentation
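One way to picture what prepare_arrays produces is the following sketch, which builds folds, row_indices and actual from per-fold test indices. It is an assumption-laden illustration: prepare_arrays_sketch is a hypothetical name, it takes a plain list of target values rather than an Orange.data.Table, and it returns lists and slice objects rather than numpy arrays.

```python
def prepare_arrays_sketch(target_column, indices):
    """Build (folds, row_indices, actual) from per-fold test indices."""
    folds, row_indices = [], []
    for fold in indices:
        start = len(row_indices)
        row_indices.extend(fold)
        # Each fold is a slice into row_indices, not into the data.
        folds.append(slice(start, len(row_indices)))
    # True target value of every test instance, in testing order.
    actual = [target_column[row] for row in row_indices]
    return folds, row_indices, actual

target = ["yes", "no", "no", "yes", "no"]
folds, row_indices, actual = prepare_arrays_sketch(target, [[3, 0], [1, 4, 2]])
```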


class
Orange.evaluation.testing.
ShuffleSplit
(n_resamples=10, train_size=None, test_size=0.1, stratified=True, random_state=0, store_data=False, store_models=False)[source]¶ Test by repeated random sampling

test_size
¶ If float, should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. By default, the value is set to 0.1. The default will change in scikit-learn version 0.21: it will remain 0.1 only if train_size is unspecified; otherwise it will complement the specified train_size. (From the documentation of sklearn.model_selection.StratifiedShuffleSplit.)

train_size
¶ float, int, or None (default: None). If float, should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size. (From the documentation of sklearn.model_selection.StratifiedShuffleSplit.)

random_state
¶ seed for random number generator (default: 0). If set to None, a different seed is used each time
 Type

get_indices
(data)[source]¶ Return a list of arrays of indices of test data instances.
For example, in k-fold CV, the result is a list with k elements, each containing approximately len(data) / k non-overlapping indices into data.
This method is abstract and must be implemented in derived classes unless they provide their own implementation of the __call__ method.
 Parameters
data (Orange.data.Table) – test data
 Returns
a list of arrays of indices into data
 Return type
indices (list of np.ndarray)
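The float-versus-int interpretation of test_size and the idea of repeated random sampling can be sketched without any library. Both helpers below are hypothetical, ignore stratification and train_size, and round a proportional size to the nearest integer, which is an assumption and may differ from what scikit-learn does.

```python
import random

def resolve_size(size, n_rows):
    """Interpret a float as a proportion and an int as an absolute count."""
    if isinstance(size, float):
        return int(round(size * n_rows))
    return size

def shuffle_split_test_indices(n_rows, n_resamples=10, test_size=0.1,
                               random_state=0):
    """Draw n_resamples independent random test sets."""
    rng = random.Random(random_state)
    n_test = resolve_size(test_size, n_rows)
    # Each resample is drawn without replacement from all rows.
    return [sorted(rng.sample(range(n_rows), n_test))
            for _ in range(n_resamples)]

indices = shuffle_split_test_indices(50, n_resamples=3, test_size=0.2)
```

In contrast to k-fold CV, the test sets of different resamples are drawn independently, so the same row may appear in several of them.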


class
Orange.evaluation.testing.
TestOnTestData
(*, store_data=False, store_models=False)[source]¶ Test on separately provided test data
Note that the class has a different signature for __call__.

class
Orange.evaluation.testing.
TestOnTrainingData
(*, store_data=False, store_models=False)[source]¶ Test on training data

Orange.evaluation.testing.
sample
(table, n=0.7, stratified=False, replace=False, random_state=None)[source]¶ Samples data instances from a data table. Returns the sample and a dataset of the instances from the input data table that are not in the sample. Internally uses several sampling functions from scikit-learn.
 Parameters
table (data table) – A data table from which to sample.
n (float or int, default = 0.7) – If float, should be between 0.0 and 1.0 and represents the proportion of data instances in the resulting sample. If int, n is the number of data instances in the resulting sample.
stratified (bool, optional, default = False) – If True, sampling will try to consider class values and match the distribution of class values in the train and test subsets.
replace (bool, optional, default = False) – Sample with replacement.
random_state (int or RandomState) – Pseudo-random number generator state used for random sampling.
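The contract of sample, returning the sample together with the instances left out of it, can be sketched on a plain list. The function sample_sketch is hypothetical: it does not use Orange or scikit-learn, omits the stratified option, and rounds a proportional n to the nearest integer as an assumption.

```python
import random

def sample_sketch(rows, n=0.7, replace=False, random_state=None):
    """Return (sample, remainder) drawn from a list of rows."""
    rng = random.Random(random_state)
    count = int(round(n * len(rows))) if isinstance(n, float) else n
    positions = range(len(rows))
    if replace:
        # With replacement, the same row may be picked more than once.
        chosen = [rng.choice(positions) for _ in range(count)]
    else:
        chosen = rng.sample(positions, count)
    picked = set(chosen)
    sample = [rows[p] for p in chosen]
    remainder = [rows[p] for p in positions if p not in picked]
    return sample, remainder

rows = list("abcdefghij")
sample, remainder = sample_sketch(rows, n=0.7, random_state=42)
sample_r, remainder_r = sample_sketch(rows, n=5, replace=True, random_state=1)
```

Without replacement, the sample and the remainder partition the input; with replacement, the remainder holds every row that was never drawn.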