This is documentation for Orange 2.7. For the latest documentation, see Orange 3.
Basic Statistics for Continuous Features (basic)
There are two simple classes for computing basic statistics for continuous features, such as their minimal and maximal value or average: Orange.statistics.basic.Variable holds the statistics for a single variable, and Orange.statistics.basic.Domain behaves like a list of instances of that class, one for each variable in the domain.
- class Orange.statistics.basic.Variable
Computes and stores the minimum, maximum, average and standard deviation of a variable. It does not include the median or any other statistic that cannot be computed on the fly, without remembering the data; such statistics can be obtained from classes in the module Orange.statistics.distribution.
Instances of this class are seldom constructed manually; they are more often obtained from Domain, described below.
- variable
The variable to which the data applies.
- min
Minimal value encountered.
- max
Maximal value encountered.
- avg
Average value.
- dev
Standard deviation.
- n
Number of instances for which the value was defined. If instances were weighted, n holds the sum of weights.
- sum
Weighted sum of values.
- sum2
Weighted sum of squared values.
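The fields n, sum and sum2 are enough to derive the average and the standard deviation in a single pass, without keeping the data. The following is a minimal sketch of that idea in plain Python, not Orange's implementation: the class name RunningStats and its add method are hypothetical, and the population form of the standard deviation is an assumption, since the text above does not say which form Orange uses.

```python
import math


class RunningStats(object):
    """Hypothetical stand-in for the documented fields; not Orange's code."""

    def __init__(self):
        self.min = None
        self.max = None
        self.n = 0.0      # sum of weights of the defined values
        self.sum = 0.0    # weighted sum of values
        self.sum2 = 0.0   # weighted sum of squared values

    def add(self, value, weight=1.0):
        # Undefined values (None here) are skipped and do not count towards n.
        if value is None:
            return
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)
        self.n += weight
        self.sum += weight * value
        self.sum2 += weight * value * value

    @property
    def avg(self):
        return self.sum / self.n

    @property
    def dev(self):
        # Assumption: population form, sqrt(E[x^2] - E[x]^2).
        variance = self.sum2 / self.n - self.avg ** 2
        # Guard against tiny negative values caused by rounding.
        return math.sqrt(max(variance, 0.0))


stats = RunningStats()
for value, weight in [(1.0, 1.0), (2.0, 1.0), (None, 1.0), (3.0, 2.0)]:
    stats.add(value, weight)
```

With these inputs the undefined value is ignored, so n is the sum of the three remaining weights. The median has no such running form, which is why it is left to Orange.statistics.distribution.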
- class Orange.statistics.basic.Domain
Orange.statistics.basic.Domain behaves like an ordinary list, except that its elements can also be indexed by variable names or descriptors.
- __init__(data[, weight=None])
Computes the statistics for all continuous variables in the data, and puts None in the positions corresponding to variables of other types.
Parameters:
- data (Orange.data.Table) – A table of instances
- weight (int or None) – The id of the meta-attribute with weights
- purge()
Removes the None entries corresponding to non-continuous features; this truncates the list, so its indices no longer correspond to the indices of variables in the domain.
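The list-like behaviour described above, including the effect of purge() on positions, can be illustrated without Orange. This is a rough sketch under stated assumptions: NamedStatsList and Stat are hypothetical names, the feature names and averages are made up, and only lookup by name is shown (lookup by descriptor is omitted).

```python
from collections import namedtuple

# Hypothetical record standing in for a per-variable statistics object.
Stat = namedtuple("Stat", ["name", "avg"])


class NamedStatsList(list):
    """Hypothetical sketch: a list whose items can also be looked up by name."""

    def __getitem__(self, key):
        if isinstance(key, str):
            for item in self:
                if item is not None and item.name == key:
                    return item
            raise KeyError(key)
        return list.__getitem__(self, key)

    def purge(self):
        # Drop the None placeholders in place; later items shift left.
        self[:] = [item for item in self if item is not None]


# The middle entry plays the role of a non-continuous variable.
stats = NamedStatsList([Stat("age", 41.5), None, Stat("height", 172.0)])

index_before = stats.index(stats["height"])   # position matches the domain
stats.purge()
index_after = stats.index(stats["height"])    # position has shifted
```

Before purging, "height" sits at the same position as in the hypothetical domain; afterwards it moves up by one, so lookups by name remain reliable while lookups by domain index do not.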
part of distributions-basic-stat.py
import Orange

iris = Orange.data.Table("iris.tab")
bas = Orange.statistics.basic.Domain(iris)

print "%20s %5s %5s %5s" % ("feature", "min", "max", "avg")
for a in bas:
    # Entries for non-continuous variables are None; skip them.
    if a:
        print "%20s %5.3f %5.3f %5.3f" % (a.variable.name, a.min, a.max, a.avg)
Output:
             feature   min   max   avg
        sepal length 4.300 7.900 5.843
         sepal width 2.000 4.400 3.054
        petal length 1.000 6.900 3.759
         petal width 0.100 2.500 1.199
part of distributions-basic-stat.py
print bas["sepal length"].avg
Output:
5.84333467484