This is documentation for Orange 2.7. For the latest documentation, see Orange 3.

Basic Statistics for Continuous Features (basic)

There are two simple classes for computing basic statistics for continuous features, such as their minimal and maximal value or average: Orange.statistics.basic.Variable holds the statistics for a single variable, and Orange.statistics.basic.Domain behaves like a list of instances of that class, one for each variable in the domain.

class Orange.statistics.basic.Variable

Computes and stores the minimum, maximum, average and standard deviation of a variable. It does not include the median or any other statistic that cannot be computed on the fly, without remembering the data; such statistics can be obtained from classes in the module Orange.statistics.distribution.

Instances of this class are seldom constructed manually; they are more often returned by Domain, described below.

variable

The variable to which the data applies.

min

Minimal value encountered

max

Maximal value encountered

avg

Average value

dev

Standard deviation

n

Number of instances for which the value was defined. If instances were weighted, n holds the sum of weights.

sum

Weighted sum of values

sum2

Weighted sum of squared values

add(value[, weight=1])

Add a value to the statistics: adjust min and max if necessary, increase n and recompute sum, sum2, avg and dev.

Parameters:
  • value (float) – Value to be added to the statistics
  • weight (float) – Weight assigned to the value
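
The bookkeeping described for add can be sketched in plain Python, without Orange. This is an illustration only, not Orange's implementation: the class name RunningStats is hypothetical, and the sketch assumes that dev is the weighted population standard deviation derived from n, sum and sum2, which the documentation above does not specify.

```python
import math


class RunningStats(object):
    """Hypothetical stand-in for the bookkeeping described above; not Orange's code."""

    def __init__(self):
        self.min = None   # smallest value seen so far
        self.max = None   # largest value seen so far
        self.n = 0.0      # sum of weights (a plain count if all weights are 1)
        self.sum = 0.0    # weighted sum of values
        self.sum2 = 0.0   # weighted sum of squared values
        self.avg = 0.0
        self.dev = 0.0

    def add(self, value, weight=1.0):
        # Adjust the extremes if necessary.
        if self.min is None or value < self.min:
            self.min = value
        if self.max is None or value > self.max:
            self.max = value
        # Update the accumulators; no individual value is stored.
        self.n += weight
        self.sum += weight * value
        self.sum2 += weight * value * value
        if self.n > 0:
            self.avg = self.sum / self.n
            # Assumption: population variance, clamped at zero against rounding error.
            variance = max(self.sum2 / self.n - self.avg ** 2, 0.0)
            self.dev = math.sqrt(variance)


stats = RunningStats()
for value, weight in [(1.0, 1.0), (2.0, 1.0), (3.0, 2.0)]:
    stats.add(value, weight)
```

Because only the accumulators are kept, every statistic can be updated in constant time per added value; a statistic such as the median cannot be maintained this way, since it depends on the individual values.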

class Orange.statistics.basic.Domain

Orange.statistics.basic.Domain behaves like an ordinary list, except that its elements can also be indexed by variable names or descriptors.

__init__(data[, weight=None])

Compute the statistics for all continuous variables in the data, and store None in the positions corresponding to variables of other types.

Parameters:
  • data (Orange.data.Table) – A table of instances
  • weight (int or None) – The id of the meta-attribute with weights

purge()

Remove the None entries corresponding to non-continuous features; this shortens the list, so its indices no longer correspond to the indices of variables in the domain.
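
The list-like behaviour and the effect of purge can be illustrated with a small plain-Python sketch. It does not use Orange: the class NamedStatsList, the variable names and the averages are all hypothetical, entries are plain dictionaries rather than Variable instances, and only lookup by name (not by descriptor) is modelled.

```python
class NamedStatsList(list):
    """Hypothetical list whose entries can also be looked up by variable name."""

    def __getitem__(self, key):
        if isinstance(key, str):
            # Look up by name, skipping the None placeholders.
            for item in self:
                if item is not None and item["name"] == key:
                    return item
            raise KeyError(key)
        # Fall back to ordinary positional indexing.
        return list.__getitem__(self, key)

    def purge(self):
        # Drop the placeholders for non-continuous variables, in place.
        self[:] = [item for item in self if item is not None]


# Made-up domain: continuous "age", discrete "gender", continuous "height".
stats = NamedStatsList([
    {"name": "age", "avg": 40.0},
    None,                            # placeholder for the discrete variable
    {"name": "height", "avg": 170.0},
])

index_before = stats.index(stats["height"])   # same as its position in the domain
stats.purge()
index_after = stats.index(stats["height"])    # shifted, because None was removed
```

In this sketch the entry for "height" moves from position 2 to position 1 after purge, which is why positional indices can no longer be matched against the domain, while lookup by name keeps working.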

part of distributions-basic-stat.py

import Orange

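iris = Orange.data.Table("iris.tab")
# Compute basic statistics for every variable in the iris domain.
bas = Orange.statistics.basic.Domain(iris)

print "%20s %5s %5s %5s" % ("feature", "min", "max", "avg")
for a in bas:
    # Entries for non-continuous variables are None; skip them.
    if a:
        print "%20s %5.3f %5.3f %5.3f" % (a.variable.name, a.min, a.max, a.avg)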

Output:

     feature   min   max   avg
sepal length 4.300 7.900 5.843
 sepal width 2.000 4.400 3.054
petal length 1.000 6.900 3.759
 petal width 0.100 2.500 1.199

part of distributions-basic-stat.py

print bas["sepal length"].avg

Output:

5.84333467484