Scoring methods (scoring)
CA

Orange.evaluation.CA(results=None, **kwargs)
A wrapper for sklearn.metrics.classification.accuracy_score. The following is its documentation:
Accuracy classification score.
In multilabel classification, this function computes subset accuracy: the set of labels predicted for a sample must exactly match the corresponding set of labels in y_true.
Read more in the User Guide.
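To illustrate what the score measures, here is a minimal pure-Python sketch. The helper name `accuracy` is hypothetical; this is not how Orange or scikit-learn implement the metric. Labels are compared with `==`, so for multilabel data each sample's labels are assumed to be given as a `frozenset`. That gives the exact-match subset accuracy described above.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose prediction exactly equals the true value.

    Hypothetical helper for illustration. For multilabel data, pass each
    sample's labels as a frozenset: a sample then counts as correct only
    if the whole label set matches (subset accuracy).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Single-label: 3 of 4 predictions are right.
print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.75

# Multilabel: only the first sample's label set matches exactly.
y_true = [frozenset({"a", "b"}), frozenset({"b"})]
y_pred = [frozenset({"b", "a"}), frozenset({"a", "b"})]
print(accuracy(y_true, y_pred))  # 0.5
```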
Precision

Orange.evaluation.Precision(results=None, **kwargs)
A wrapper for sklearn.metrics.classification.precision_score. The following is its documentation:
Compute the precision.
The precision is the ratio
tp / (tp + fp)
where tp is the number of true positives and fp the number of false positives. The precision is intuitively the ability of the classifier not to label as positive a sample that is negative. The best value is 1 and the worst value is 0.
Read more in the User Guide.
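The ratio above can be computed directly from the tp and fp counts. The sketch below is a hypothetical binary-only helper, not Orange's code. When nothing is predicted positive it simply returns 0.0, which is an assumption made here to avoid dividing by zero.

```python
def precision(y_true, y_pred, pos_label=1):
    """tp / (tp + fp) for the class `pos_label` (hypothetical helper)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == pos_label and t == pos_label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == pos_label and t != pos_label)
    # No positive predictions: precision is undefined; return 0.0 here.
    return tp / (tp + fp) if tp + fp else 0.0


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]
# tp = 2 (positions 0 and 2), fp = 1 (position 1), so precision = 2/3.
print(precision(y_true, y_pred))
```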
Recall

Orange.evaluation.Recall(results=None, **kwargs)
A wrapper for sklearn.metrics.classification.recall_score. The following is its documentation:
Compute the recall.
The recall is the ratio
tp / (tp + fn)
where tp is the number of true positives and fn the number of false negatives. The recall is intuitively the ability of the classifier to find all the positive samples. The best value is 1 and the worst value is 0.
Read more in the User Guide.
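Recall only swaps fp for fn in the denominator. Here is the same kind of hypothetical binary-only sketch. As with precision, returning 0.0 when there are no actual positives is an assumption of this sketch.

```python
def recall(y_true, y_pred, pos_label=1):
    """tp / (tp + fn) for the class `pos_label` (hypothetical helper)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == pos_label and p == pos_label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == pos_label and p != pos_label)
    # No actual positives: recall is undefined; return 0.0 here.
    return tp / (tp + fn) if tp + fn else 0.0


y_true = [1, 0, 1, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0]
# tp = 2 (positions 0 and 2), fn = 2 (positions 3 and 4), so recall = 0.5.
print(recall(y_true, y_pred))  # 0.5
```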
F1

Orange.evaluation.F1(results=None, **kwargs)
A wrapper for sklearn.metrics.classification.f1_score. The following is its documentation:
Compute the F1 score, also known as the balanced F-score or F-measure.
The F1 score can be interpreted as the harmonic mean of precision and recall; it reaches its best value at 1 and its worst at 0. Precision and recall contribute equally to the F1 score. The formula for the F1 score is:
F1 = 2 * (precision * recall) / (precision + recall)
In the multiclass and multilabel case, this is the average of the F1 scores of the individual classes, weighted according to the average parameter. Read more in the User Guide.
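The sketch below applies the formula above, first to one precision/recall pair and then per class with an unweighted mean. The unweighted mean corresponds to what scikit-learn calls `average='macro'`. Both helper names are hypothetical. The sketch assumes the class set is the union of labels seen in `y_true` and `y_pred`, and it scores an undefined precision or recall as 0.0.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(f1_from(prec, rec))
    return sum(scores) / len(scores)


print(f1_from(2 / 3, 0.5))  # 4/7, about 0.571
# Class 0 gets F1 = 0.8, classes 1 and 2 get 0.0, so the mean is 0.8 / 3.
print(macro_f1([0, 1, 2, 0, 1, 2], [0, 2, 1, 0, 0, 1]))
```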
PrecisionRecallFSupport

Orange.evaluation.PrecisionRecallFSupport(results=None, **kwargs)
A wrapper for sklearn.metrics.classification.precision_recall_fscore_support. The following is its documentation:
Compute precision, recall, F-measure and support for each class.
The precision is the ratio
tp / (tp + fp)
where tp is the number of true positives and fp the number of false positives. The precision is intuitively the ability of the classifier not to label as positive a sample that is negative.
The recall is the ratio
tp / (tp + fn)
where tp is the number of true positives and fn the number of false negatives. The recall is intuitively the ability of the classifier to find all the positive samples.
The F-beta score can be interpreted as a weighted harmonic mean of the precision and recall, where an F-beta score reaches its best value at 1 and its worst at 0.
The F-beta score weights recall more than precision by a factor of beta; beta == 1.0 means recall and precision are equally important.
The support is the number of occurrences of each class in y_true.
If pos_label is None and the task is binary classification, this function returns the average precision, recall and F-measure if average is one of 'micro', 'macro', 'weighted' or 'samples'.
Read more in the User Guide.
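The per-class quantities above can be sketched with the standard F-beta formula, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). The helper `prf_support` is hypothetical and returns a dict from class label to `(precision, recall, f_beta, support)`. It is not the structure Orange or scikit-learn returns. Undefined ratios are scored as 0.0 by assumption.

```python
def prf_support(y_true, y_pred, beta=1.0):
    """Per-class (precision, recall, F-beta, support); hypothetical helper."""
    labels = sorted(set(y_true) | set(y_pred))
    b2 = beta ** 2
    result = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        support = sum(1 for t in y_true if t == c)  # occurrences in y_true
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        denom = b2 * prec + rec
        f_beta = (1 + b2) * prec * rec / denom if denom else 0.0
        result[c] = (prec, rec, f_beta, support)
    return result


y_true = [1, 0, 1, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0]
# Class 1: precision 2/3, recall 1/2, support 4.
# With beta = 2 the score (10/19, about 0.526) sits closer to recall than F1 (4/7) does.
for label, row in prf_support(y_true, y_pred, beta=2.0).items():
    print(label, row)
```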
AUC
Log Loss

Orange.evaluation.LogLoss(results=None, **kwargs)
Compute the log loss (logistic loss, cross-entropy loss) of the predicted class probabilities.
Parameters:
results : Orange.evaluation.Results
    Stored predictions and actual data in model testing.
eps : float
    Log loss is undefined for p=0 or p=1, so probabilities are clipped to max(eps, min(1 - eps, p)).
normalize : bool, optional (default=True)
    If true, return the mean loss per sample. Otherwise, return the sum of the per-sample losses.
sample_weight : array-like of shape = [n_samples], optional
    Sample weights.
Examples
>>> Orange.evaluation.LogLoss(results)
array([ 0.3...])
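The role of `eps` and `normalize` is easiest to see in the binary case. Below is a minimal sketch under that simplification. The helper `log_loss_binary` is hypothetical, `p_pred` is assumed to hold the predicted probability of class 1, and the default `eps=1e-15` is an assumed value chosen for illustration.

```python
import math


def log_loss_binary(y_true, p_pred, eps=1e-15, normalize=True):
    """Binary log loss; y_true in {0, 1}, p_pred = predicted P(y = 1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip to [eps, 1 - eps] so that log(0) is never evaluated.
        p = max(eps, min(1 - eps, p))
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    # Mean loss per sample, or the sum of per-sample losses.
    return total / len(y_true) if normalize else total


print(log_loss_binary([1, 0], [0.8, 0.2]))                   # -log(0.8), about 0.223
print(log_loss_binary([1, 0], [0.8, 0.2], normalize=False))  # twice that
print(log_loss_binary([1], [0.0]))  # large but finite thanks to clipping
```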
MSE
MAE
R2

Orange.evaluation.R2(results=None, **kwargs)
A wrapper for sklearn.metrics.regression.r2_score. The following is its documentation:
R^2 (coefficient of determination) regression score function.
The best possible score is 1.0, and the score can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0.
Read more in the User Guide.
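R^2 compares the model's squared error with that of the constant mean predictor: R^2 = 1 - SS_res / SS_tot. A minimal sketch follows, using a hypothetical helper `r2`. It assumes `y_true` is not constant; otherwise SS_tot is 0 and the sketch raises ZeroDivisionError.

```python
def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (hypothetical helper)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # model error
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # error of predicting the mean
    return 1 - ss_res / ss_tot


y_true = [3, -0.5, 2, 7]
print(r2(y_true, [2.5, 0.0, 2, 8]))  # about 0.949
print(r2(y_true, [2.875] * 4))       # constant mean prediction: 0.0
```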
CD diagram

Orange.evaluation.compute_CD(avranks, n, alpha='0.05', test='nemenyi')
Returns the critical difference for the Nemenyi or Bonferroni-Dunn test according to the given alpha (either alpha="0.05" or alpha="0.1") for the average ranks and the number of tested datasets n. The test can be either "nemenyi" for the two-tailed Nemenyi test or "bonferroni-dunn" for the Bonferroni-Dunn test.
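In Demsar (2006), the Nemenyi critical difference is CD = q_alpha * sqrt(k(k + 1) / (6N)). Here k is the number of compared methods and N is the number of datasets, so only the length of `avranks` matters, not the rank values. The sketch below assumes compute_CD follows this formula. It hard-codes only the alpha = 0.05 two-tailed Nemenyi critical values for k = 2..5, taken from that paper. `nemenyi_cd` and `Q_NEMENYI_005` are hypothetical names, not Orange API.

```python
import math

# Two-tailed Nemenyi critical values q_alpha for alpha = 0.05,
# keyed by the number of compared methods k (Demsar, 2006); k = 2..5 only.
Q_NEMENYI_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728}


def nemenyi_cd(avranks, n_datasets):
    """Critical difference for the Nemenyi test at alpha = 0.05 (sketch)."""
    k = len(avranks)  # only the number of methods enters the formula
    q = Q_NEMENYI_005[k]
    return q * math.sqrt(k * (k + 1) / (6.0 * n_datasets))


# Same inputs as the example below: 4 methods tested on 30 datasets.
cd = nemenyi_cd([1.9, 3.2, 2.8, 3.3], 30)
print(round(cd, 3))  # 0.856
```

Two methods whose average ranks differ by more than `cd` are then considered significantly different at this alpha.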

Orange.evaluation.graph_ranks(avranks, names, cd=None, cdmethod=None, lowv=None, highv=None, width=6, textspace=1, reverse=False, filename=None, **kwargs)
Draws a CD graph, which is used to display the differences in methods’ performance. See Janez Demsar, Statistical Comparisons of Classifiers over Multiple Data Sets, 7(Jan):1–30, 2006.
Needs matplotlib to work.
The image is plotted on plt, imported using import matplotlib.pyplot as plt.
Args:
    avranks (list of float): average ranks of methods.
    names (list of str): names of methods.
    cd (float): critical difference used for statistical significance of the difference between methods.
    cdmethod (int, optional): the method that is compared with the other methods. If omitted, show pairwise comparison of methods.
    lowv (int, optional): the lowest shown rank.
    highv (int, optional): the highest shown rank.
    width (int, optional): default width in inches (default: 6).
    textspace (int, optional): space on figure sides (in inches) for the method names (default: 1).
    reverse (bool, optional): if set to True, the lowest rank is on the right (default: False).
    filename (str, optional): output file name (with extension). If not given, the function does not write a file.
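Both graph_ranks and compute_CD expect average ranks, not raw scores. Demsar's paper ranks the methods separately on each dataset, with the best getting rank 1 and tied methods sharing the average of their ranks, and then averages these ranks over datasets. A sketch of that procedure follows. `average_ranks` is a hypothetical helper, not part of Orange, and it assumes higher scores are better.

```python
def average_ranks(scores):
    """scores[d][m] is the score of method m on dataset d (higher is better).

    Returns the average rank of each method over all datasets; tied
    methods on a dataset share the mean of the ranks they occupy.
    """
    n_methods = len(scores[0])
    totals = [0.0] * n_methods
    for row in scores:
        # Method indices sorted from best to worst score on this dataset.
        order = sorted(range(n_methods), key=lambda m: -row[m])
        i = 0
        while i < n_methods:
            # Extend j over the group of methods tied with order[i].
            j = i
            while j + 1 < n_methods and row[order[j + 1]] == row[order[i]]:
                j += 1
            shared = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
            for pos in range(i, j + 1):
                totals[order[pos]] += shared
            i = j + 1
    return [t / len(scores) for t in totals]


# Two datasets, three methods; methods 0 and 1 tie on the second dataset.
print(average_ranks([[0.9, 0.8, 0.7], [0.85, 0.85, 0.6]]))  # [1.25, 1.75, 3.0]
```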
Example
>>> import Orange
>>> import matplotlib.pyplot as plt
>>> names = ["first", "third", "second", "fourth" ]
>>> avranks = [1.9, 3.2, 2.8, 3.3 ]
>>> cd = Orange.evaluation.compute_CD(avranks, 30) #tested on 30 datasets
>>> Orange.evaluation.graph_ranks(avranks, names, cd=cd, width=6, textspace=1.5)
>>> plt.show()
The code produces the following graph: