This is documentation for Orange 2.7. For the latest documentation, see Orange 3.

Evaluation (evaluation)

Evaluation of prediction models is split into two parts. Module Orange.evaluation.testing contains procedures that sample data, train learning algorithms, and test models. All procedures return their results as an instance of ExperimentResults, described below. Module Orange.evaluation.scoring uses such data to compute various performance scores, such as classification accuracy and AUC.

There is a third module, Orange.evaluation.reliability, available as an add-on. It is unrelated to this scheme and assesses the reliability of individual predictions.

Classes for storing the experimental results

The following two classes store the results of experiments run by Orange.evaluation.testing and scored by Orange.evaluation.scoring. Instances of these classes seldom need to be constructed or used outside of these two modules.

class Orange.evaluation.testing.ExperimentResults(iterations, classifier_names, class_values=None, weights=None, base_class=-1)

ExperimentResults stores results of one or more repetitions of some test (cross validation, repeated sampling...) under the same circumstances. Instances of this class are constructed by sampling and testing functions from module Orange.evaluation.testing and used by methods in module Orange.evaluation.scoring.
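The nested layout can be sketched without Orange itself. The following minimal Python mimic (the class names, fields shown, and the accuracy computation are simplified stand-ins, not the actual Orange 2.7 implementation) illustrates how a testing procedure fills such a structure and how a scoring function reads it:

```python
class SimpleTestedExample:
    """Mimics TestedExample: predictions for a single data instance."""
    def __init__(self, iteration_number, actual_class, weight=1.0):
        self.iteration_number = iteration_number
        self.actual_class = actual_class
        self.weight = weight
        self.classes = []        # one predicted class per learner
        self.probabilities = []  # one probability list per learner

class SimpleExperimentResults:
    """Mimics ExperimentResults: a container over all tested examples."""
    def __init__(self, number_of_iterations, classifier_names, class_values):
        self.number_of_iterations = number_of_iterations
        self.classifier_names = classifier_names
        self.number_of_learners = len(classifier_names)
        self.class_values = class_values
        self.results = []        # list of SimpleTestedExample

# Fill the structure as a sampling-and-testing procedure would ...
res = SimpleExperimentResults(2, ["majority", "perfect"], ["no", "yes"])
for fold, actual in [(0, "yes"), (0, "no"), (1, "yes")]:
    te = SimpleTestedExample(fold, actual)
    te.classes = ["yes", actual]            # learner 0 always predicts "yes"
    te.probabilities = [[0.1, 0.9], [0.5, 0.5]]
    res.results.append(te)

# ... then read it as a scoring function would: classification accuracy.
ca = [sum(te.classes[i] == te.actual_class for te in res.results)
      / len(res.results)
      for i in range(res.number_of_learners)]
print(ca)
```

The second learner predicts the actual class every time, so its accuracy is 1.0, while the constant "yes" learner is right on two of the three examples.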


results

A list of instances of TestedExample, one for each example in the dataset.


number_of_iterations

Number of iterations. This can be the number of folds (in cross validation) or the number of repetitions of some test. The iteration_number attribute of each TestedExample should be in the range [0, number_of_iterations-1].


number_of_learners

Number of learners. The lengths of the lists classes and probabilities in each TestedExample should equal number_of_learners.


classifier_names

Stores the names of the classifiers.


classifiers

A list of classifiers, one element for each iteration of sampling and learning (e.g. fold). Each element is a list of classifiers, one for each learner. For instance, classifiers[2][4] refers to the 3rd repetition and the 5th learning algorithm (indices are zero-based).

Note that functions from testing store classifiers only if this is enabled by setting storeClassifiers to 1.
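For illustration, the indexing of the classifiers attribute looks like this (plain nested lists with made-up labels standing in for trained classifier objects):

```python
# classifiers[iteration][learner]: outer index is the fold/repetition,
# inner index is the learning algorithm.
iterations, learners = 3, 5
classifiers = [["clf_%d_%d" % (it, ln) for ln in range(learners)]
               for it in range(iterations)]

# 3rd repetition (index 2), 5th learning algorithm (index 4)
print(classifiers[2][4])  # clf_2_4
```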


base_class

The reference class for measures like AUC.


class_values

The list of class values.


weights

A flag indicating whether the results are weighted. If False, weights are still present in each TestedExample, but they are all 1.0. Clear this flag if your experimental procedure ran on weighted testing examples but you would like to ignore the weights in statistics.

add(results, index, replace=-1)

Add evaluation results (for one learner).


remove(index)

Remove one learner from the evaluation results.
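The effect of adding and removing a learner can be sketched with plain dictionaries (hypothetical helper functions, not Orange's implementation): each operation splices one learner's column of predictions in or out of every tested example.

```python
def add_learner(results, other, index):
    """Copy the predictions of learner `index` from `other` into `results`."""
    for te, src in zip(results, other):
        te["classes"].append(src["classes"][index])
        te["probabilities"].append(src["probabilities"][index])

def remove_learner(results, index):
    """Drop learner `index` from every tested example."""
    for te in results:
        del te["classes"][index]
        del te["probabilities"][index]

# One tested example, one learner in each result set.
results = [{"classes": ["a"], "probabilities": [[0.8, 0.2]]}]
other = [{"classes": ["b"], "probabilities": [[0.3, 0.7]]}]

add_learner(results, other, 0)     # now two learners per example
remove_learner(results, 0)         # drop the original first learner
print(results[0]["classes"])       # ['b']
```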

class Orange.evaluation.testing.TestedExample(iteration_number=None, actual_class=None, n=0, weight=1)

TestedExample stores predictions of different classifiers for a single testing data instance.


classes

A list of predictions of type Value, one for each classifier.


probabilities

A list of probabilities of classes, one for each classifier.


iteration_number

Iteration number (e.g. fold) in which the TestedExample was created/tested.

__init__(iteration_number=None, actual_class=None, n=0, weight=1)
  • iteration_number – The iteration number of TestedExample.
  • actual_class – The actual class of TestedExample.
  • n – The number of learners.
  • weight – The weight of the TestedExample.
add_result(aclass, aprob)

Append a new result (class and probability prediction by a single classifier) to the classes and probabilities fields.

set_result(i, aclass, aprob)

Set the result of the i-th classifier to the given values.
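A minimal sketch (a hypothetical stand-in class, not Orange's implementation) of how these two methods maintain the parallel classes and probabilities lists:

```python
class MiniTestedExample:
    """Stand-in for TestedExample: per-classifier prediction bookkeeping."""
    def __init__(self, iteration_number=None, actual_class=None, n=0, weight=1):
        self.iteration_number = iteration_number
        self.actual_class = actual_class
        self.weight = weight
        self.classes = [None] * n        # one slot per learner
        self.probabilities = [None] * n

    def add_result(self, aclass, aprob):
        # append a new classifier's prediction to both lists
        self.classes.append(aclass)
        self.probabilities.append(aprob)

    def set_result(self, i, aclass, aprob):
        # overwrite the i-th classifier's prediction in both lists
        self.classes[i] = aclass
        self.probabilities[i] = aprob

te = MiniTestedExample(iteration_number=0, actual_class="yes", n=1)
te.set_result(0, "no", [0.7, 0.3])   # fill the pre-allocated slot
te.add_result("yes", [0.2, 0.8])     # append a second classifier's result
print(te.classes)  # ['no', 'yes']
```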