This is documentation for Orange 2.7. For the latest documentation, see Orange 3.

Contingency table (contingency)

A contingency table contains conditional distributions. Unless explicitly ‘normalized’, it contains absolute frequencies, that is, the number of instances with a particular combination of two variables’ values. If the table is normalized by dividing each cell by the row sum, the cells represent conditional probabilities of the column variable (here denoted as innerVariable) conditioned on the row variable (outerVariable).

Contingency tables are usually constructed for discrete variables. Tables for continuous variables have certain limitations described in a separate section.

The example below loads the monks-1 data set and prints out the conditional class distribution given the value of e.

import Orange

monks = Orange.data.Table("monks-1.tab")
cont = Orange.statistics.contingency.VarClass("e", monks)
for val, dist in cont.items():
    print val, dist

This code prints out:

1 <0.000, 108.000>
2 <72.000, 36.000>
3 <72.000, 36.000>
4 <72.000, 36.000>
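
The same counts can be turned into conditional probabilities by dividing each row by its sum. The sketch below does this in plain Python, without Orange; the dictionary layout and the helper name normalize_rows are assumptions made for illustration, and the numbers are copied from the output above.

```python
# Absolute frequencies copied from the monks-1 output above:
# value of e -> [count of class 0, count of class 1]
counts = {
    "1": [0.0, 108.0],
    "2": [72.0, 36.0],
    "3": [72.0, 36.0],
    "4": [72.0, 36.0],
}


def normalize_rows(table):
    """Divide each cell by its row sum; rows that sum to 0 are copied unchanged."""
    result = {}
    for outer_value, row in table.items():
        total = sum(row)
        result[outer_value] = [c / total for c in row] if total else list(row)
    return result


# Hypothetical helper, not part of Orange: each row now reads as p(class | e).
probs = normalize_rows(counts)
```

Each row of probs reads as p(class | e); probs["2"], for instance, is roughly [0.667, 0.333].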

Contingencies behave like lists of distributions (in this case, class distributions) indexed by values (of e, in this example). Distributions are, in turn, indexed by values (class values, here). The variable e from the above example is called the outer variable, and the class is the inner. This can also be reversed. It is also possible to use features for both the outer and the inner variable, so the table shows distributions of one variable’s values given the value of another. There is a corresponding hierarchy of classes: Table is a base class for VarVar (both variables are attributes) and Class (one variable is the class). The latter is the base class for VarClass and ClassVar.

The most commonly used of the above classes is VarClass, which can compute and store conditional probabilities of classes given the feature value.

Contingency tables

class Orange.statistics.contingency.Table

Provides a base class for storing and manipulating contingency tables. Although it is not abstract, it is seldom used directly but rather through more convenient derived classes described below.

outerVariable

Outer variable (Orange.feature.Descriptor) whose values are used as the first, outer index.

innerVariable

Inner variable (Orange.feature.Descriptor) whose values are used as the second, inner index.

outerDistribution

The marginal distribution (Distribution) of the outer variable.

innerDistribution

The marginal distribution (Distribution) of the inner variable.

innerDistributionUnknown

The distribution (distribution.Distribution) of the inner variable for instances for which the outer variable was undefined. This is the difference between the innerDistribution and the unconditional distribution of the inner variable.
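
The relation between these distributions can be illustrated without Orange. The sketch below assumes, as the wording above suggests, that innerDistribution counts only instances whose outer value is known; the data and all names are hypothetical.

```python
from collections import Counter

# Hypothetical (outer value, inner value) pairs; None marks an undefined outer value.
pairs = [("a", "x"), ("a", "y"), (None, "x"), ("b", "y"), (None, "y"), ("b", "y")]

# Assumption: the inner marginal is tallied only where the outer value is known.
inner_known = Counter(inner for outer, inner in pairs if outer is not None)
inner_unknown = Counter(inner for outer, inner in pairs if outer is None)
unconditional = Counter(inner for _, inner in pairs)

# Subtracting the known part from the unconditional counts leaves the unknown part.
difference = Counter(unconditional)
difference.subtract(inner_known)
```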

varType

The type of the outer variable (Orange.feature.Type, usually Orange.feature.Discrete or Orange.feature.Continuous); equals outerVariable.varType and outerDistribution.varType.

__init__(outer_variable, inner_variable)

Construct an instance of contingency table for the given pair of variables.

Parameters:
  • outer_variable (Orange.feature.Descriptor) – Descriptor of the outer variable
  • inner_variable (Orange.feature.Descriptor) – Descriptor of the inner variable
add(outer_value, inner_value[, weight=1])

Add an element to the contingency table by adding weight to the corresponding cell.

Parameters:
  • outer_value (int, float, string or Orange.data.Value) – The value for the outer variable
  • inner_value (int, float, string or Orange.data.Value) – The value for the inner variable
  • weight (float) – Instance weight
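
What adding weight to a cell amounts to can be sketched with nested dictionaries. This is an illustration in plain Python, not Orange's implementation; the function add and the dictionary cells are hypothetical names.

```python
def add(table, outer_value, inner_value, weight=1.0):
    """Add weight to the cell (outer_value, inner_value), creating it if needed."""
    row = table.setdefault(outer_value, {})
    row[inner_value] = row.get(inner_value, 0.0) + weight


cells = {}
add(cells, "SHORT", "WOOD")               # default weight of 1
add(cells, "SHORT", "WOOD", 2.5)          # a weighted instance lands in the same cell
add(cells, "SHORT", "IRON", weight=0.5)
```

After these calls the WOOD cell of the SHORT row holds 3.5 and the IRON cell holds 0.5.
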
normalize()

Normalize all distributions (rows) in the table to sum to 1:

>>> cont.normalize()
>>> for val, dist in cont.items():
...     print val, dist

Output:

1 <0.000, 1.000>
2 <0.667, 0.333>
3 <0.667, 0.333>
4 <0.667, 0.333>

Note

This method does not change the innerDistribution or outerDistribution.

With respect to indexing, a contingency table is a cross between a dictionary and a list. It supports the standard dictionary methods keys, values and items.

>>> print cont.keys()
['1', '2', '3', '4']
>>> print cont.values()
[<0.000, 108.000>, <72.000, 36.000>, <72.000, 36.000>, <72.000, 36.000>]
>>> print cont.items()
[('1', <0.000, 108.000>), ('2', <72.000, 36.000>),
('3', <72.000, 36.000>), ('4', <72.000, 36.000>)]

Although the keys returned by the above functions are strings, a contingency table can be indexed by anything that can be converted into values of the outer variable: strings, numbers or instances of Orange.data.Value.

>>> print cont[0]
<0.000, 108.000>
>>> print cont["1"]
<0.000, 108.000>
>>> print cont[Orange.data.Value(monks.domain["e"], "1")]
<0.000, 108.000>

The length of the table equals the number of values of the outer variable. However, iterating through contingency does not return keys, as with dictionaries, but distributions.

>>> for i in cont:
...     print i
<0.000, 108.000>
<72.000, 36.000>
<72.000, 36.000>
<72.000, 36.000>
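
The mix of dictionary and list behaviour can be mimicked in a few lines of plain Python. The class below is a hypothetical stand-in written for illustration, not Orange's Table; it treats integers as positions and everything else as a value name, which is an assumption about how the conversion works.

```python
class HybridTable(object):
    """Hypothetical stand-in for a contingency table; not Orange's class."""

    def __init__(self, outer_values, rows):
        self._outer_values = [str(v) for v in outer_values]
        self._rows = [list(row) for row in rows]

    def _position(self, key):
        # Integers are treated as positions, anything else as a value name.
        if isinstance(key, int):
            return key
        return self._outer_values.index(str(key))

    def __getitem__(self, key):
        return self._rows[self._position(key)]

    def __len__(self):
        return len(self._rows)

    def __iter__(self):
        # Unlike a dict, iteration yields the rows (distributions), not the keys.
        return iter(self._rows)

    def keys(self):
        return list(self._outer_values)

    def values(self):
        return list(self._rows)

    def items(self):
        return list(zip(self._outer_values, self._rows))


table = HybridTable(["1", "2", "3", "4"],
                    [[0.0, 108.0], [72.0, 36.0], [72.0, 36.0], [72.0, 36.0]])
by_position = table[0]
by_name = table["1"]
iterated = list(table)
```
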
class Orange.statistics.contingency.Class

An abstract base class for contingency tables that contain the class, either as the inner or the outer variable.

classVar (read only)

The class attribute descriptor; always equal to either Table.innerVariable or Table.outerVariable.

variable

The variable (feature) descriptor; always equal to either Table.innerVariable or Table.outerVariable.

add_var_class(variable_value, class_value[, weight=1])

Add an element to contingency by increasing the corresponding count. The difference between this and Table.add is that the variable value is always the first argument and class value the second, regardless of which one is inner and which one is outer.

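Parameters:
  • variable_value (int, float, string or Orange.data.Value) – The value of the variable
  • class_value (int, float, string or Orange.data.Value) – The value of the class
  • weight (float) – Instance weight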
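
The fixed argument order of add_var_class can be pictured as a thin wrapper that swaps its arguments when the class is the outer variable. The classes below are hypothetical sketches in plain Python and do not reproduce Orange's implementation.

```python
from collections import defaultdict


class SketchTable(object):
    """Hypothetical base: cells[outer][inner] holds the accumulated weight."""

    def __init__(self):
        self.cells = defaultdict(lambda: defaultdict(float))

    def add(self, outer_value, inner_value, weight=1.0):
        self.cells[outer_value][inner_value] += weight


class SketchVarClass(SketchTable):
    # Feature is outer, class is inner: the arguments pass through unchanged.
    def add_var_class(self, variable_value, class_value, weight=1.0):
        self.add(variable_value, class_value, weight)


class SketchClassVar(SketchTable):
    # Class is outer, feature is inner: the arguments are swapped for add().
    def add_var_class(self, variable_value, class_value, weight=1.0):
        self.add(class_value, variable_value, weight)


var_class = SketchVarClass()
class_var = SketchClassVar()
for sketch in (var_class, class_var):
    sketch.add_var_class("2", "1")  # e = "2", class = "1" in both calls
```

The caller writes the same call for both tables; only the cell that receives the weight differs.
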
class Orange.statistics.contingency.VarClass

A class derived from Class in which the variable is used as Table.outerVariable and the class as Table.innerVariable. This form is suitable for computing conditional class probabilities given the variable value.

Calling VarClass.add_var_class(v, c) is equivalent to Table.add(v, c). Like Table, VarClass can compute the contingency from instances.

__init__(feature, class_variable)

Construct an instance of VarClass for the given pair of variables. Inherited from Table.

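Parameters:
  • feature (Orange.feature.Descriptor) – Descriptor of the feature, used as the outer variable
  • class_variable (Orange.feature.Descriptor) – Descriptor of the class variable, used as the inner variable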
__init__(feature, data[, weightId])

Compute the contingency table from data.

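Parameters:
  • feature (int, string or Orange.feature.Descriptor) – The feature, used as the outer variable
  • data (Orange.data.Table) – A set of instances
  • weightId (int) – meta attribute with weights of instances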
p_class(value)

Return the probability distribution of classes given the value of the variable.

Parameters: value (int, float, string or Orange.data.Value) – The value of the variable
Return type: Orange.statistics.distribution.Distribution
p_class(value, class_value)

Returns the conditional probability of the class_value given the feature value, p(class_value|value) (note the order of arguments!)

Parameters:
  • value (int, float, string or Orange.data.Value) – The value of the variable
  • class_value – The class value
Return type: float
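
The two call forms and their argument order can be sketched in plain Python. The function below is a hypothetical illustration, not Orange's p_class; it uses two rows of the monks-1 counts printed at the top of this page.

```python
# Counts for two values of e, copied from the monks-1 output at the top of the page.
counts = {
    "1": {"0": 0.0, "1": 108.0},
    "2": {"0": 72.0, "1": 36.0},
}


def p_class(value, class_value=None):
    """Return p(. | value), or p(class_value | value) if class_value is given."""
    row = counts[value]
    total = sum(row.values())
    distribution = dict((c, n / total) for c, n in row.items())
    if class_value is None:
        return distribution
    return distribution[class_value]


whole = p_class("1")        # the whole class distribution given e = 1
single = p_class("2", "1")  # p(class = 1 | e = 2): feature value first, class second
```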

import Orange.statistics.contingency

monks = Orange.data.Table("monks-1.tab")
cont = Orange.statistics.contingency.VarClass("e", monks)

print "Inner variable: ", cont.inner_variable.name
print "Outer variable: ", cont.outer_variable.name
print
print "Class variable: ", cont.class_var.name
print "Feature:      ", cont.variable.name
print

print "Distributions:"
for val in cont.variable:
    print "  p(.|%s) = %s" % (val.native(), cont.p_class(val))
print

first_class = Orange.data.Value(cont.class_var, 1)
first_native = first_class.native()
print "Probabilities of class '%s'" % first_native
for val in cont.variable:
    print "  p(%s|%s) = %5.3f" % (first_native, val.native(), 
                                  cont.p_class(val, first_class))

The inner and the outer variable and their relations to the class are as follows:

Inner variable:  y
Outer variable:  e

Class variable:  y
Feature:         e

Distributions are normalized, and probabilities are elements of the normalized distributions. Knowing that the target concept is y := (e=1) or (a=b), the distributions are as expected: when e equals 1, class 1 has 100% probability, while for the other values of e the probability is about one third, which agrees with the probability that two independent three-valued features have the same value.

Distributions:
  p(.|1) = <0.000, 1.000>
  p(.|2) = <0.662, 0.338>
  p(.|3) = <0.659, 0.341>
  p(.|4) = <0.669, 0.331>

Probabilities of class '1'
  p(1|1) = 1.000
  p(1|2) = 0.338
  p(1|3) = 0.341
  p(1|4) = 0.331

Distributions from a matrix computed manually:
  p(.|1) = <0.000, 1.000>
  p(.|2) = <0.662, 0.338>
  p(.|3) = <0.659, 0.341>
  p(.|4) = <0.669, 0.331>
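
The one-third figure mentioned above can be checked by enumeration. The sketch assumes that a and b are independent and uniform over three values, as the text states, and takes the 108 instances per value of e from the counts printed at the top of this page.

```python
from fractions import Fraction
from itertools import product

values = (1, 2, 3)                       # a and b are both three-valued
pairs = list(product(values, repeat=2))  # all 9 combinations of (a, b)
equal = [pair for pair in pairs if pair[0] == pair[1]]

# Probability that a == b when both are independent and uniform.
p_equal = Fraction(len(equal), len(pairs))

# With 108 instances per value of e, this many are expected in class 1 when e != 1.
expected_class1 = p_equal * 108
```

The result, 36 out of 108, matches the absolute frequencies shown for e = 2, 3 and 4.
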
class Orange.statistics.contingency.ClassVar

ClassVar is similar to VarClass except that the class is outside and the variable is inside. This form of contingency table is suitable for computing conditional probabilities of variable given the class. All methods get the two arguments in the same order as VarClass.

__init__(feature, class_variable)

Construct an instance of ClassVar for the given pair of variables. Inherited from Table, except for the reversed order of arguments.

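Parameters:
  • feature (Orange.feature.Descriptor) – Descriptor of the feature, used as the inner variable
  • class_variable (Orange.feature.Descriptor) – Descriptor of the class variable, used as the outer variable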
__init__(feature, data[, weightId])

Compute contingency table from the data.

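Parameters:
  • feature (int, string or Orange.feature.Descriptor) – The feature, used as the inner variable
  • data (Orange.data.Table) – A set of instances
  • weightId (int) – meta attribute with weights of instances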
p_attr(class_value)

Return the probability distribution of variable given the class.

Parameters: class_value (int, float, string or Orange.data.Value) – The value of the class
Return type: Orange.statistics.distribution.Distribution
p_attr(value, class_value)

Returns the conditional probability of the value given the class, p(value|class_value).

Parameters:
  • value (int, float, string or Orange.data.Value) – Value of the variable
  • class_value – Class value
Return type: float

import Orange.statistics.contingency

monks = Orange.data.Table("monks-1.tab")
cont = Orange.statistics.contingency.ClassVar("e", monks)

print "Inner variable: ", cont.inner_variable.name
print "Outer variable: ", cont.outer_variable.name
print
print "Class variable: ", cont.class_var.name
print "Attribute:      ", cont.variable.name
print

print "Distributions:"
for val in cont.class_var:
    print "  p(.|%s) = %s" % (val.native(), cont.p_attr(val))
print

first_value = Orange.data.Value(cont.variable, 0)
first_native = first_value.native()
print "Probabilities for e='%s'" % first_native
for val in cont.class_var:
    print "  p(%s|%s) = %5.3f" % (first_native, val.native(), cont.p_attr(first_value, val))
print

cont = Orange.statistics.contingency.ClassVar(monks.domain["e"], monks.domain.class_var)
for ins in monks:
    cont.add_var_class(ins["e"], ins.get_class())

The roles of the feature and the class are reversed compared to VarClass:

Inner variable:  e
Outer variable:  y

Class variable:  y
Feature:         e

Distributions given the class can be printed out by calling p_attr().

for val in cont.class_var:
    print "  p(.|%s) = %s" % (val.native(), cont.p_attr(val))

will print:

  p(.|0) = <0.000, 0.333, 0.333, 0.333>
  p(.|1) = <0.500, 0.167, 0.167, 0.167>

If the class value is ‘0’, the attribute e cannot be 1 (the first value), while distribution across other values is uniform. If the class value is 1, e is 1 for exactly half of instances, and distribution of other values is again uniform.
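
These numbers can be reproduced from the absolute frequencies printed at the top of this page by reading the VarClass table column by column and normalizing each column. The sketch below is plain Python with hypothetical variable names; it does not use Orange.

```python
# VarClass counts from the first example: value of e -> [class 0, class 1]
var_class = {
    "1": [0.0, 108.0],
    "2": [72.0, 36.0],
    "3": [72.0, 36.0],
    "4": [72.0, 36.0],
}
e_values = ["1", "2", "3", "4"]
class_values = ["0", "1"]

# Transpose: one row per class value, then normalize that row to sum to 1.
class_var = {}
for j, class_value in enumerate(class_values):
    column = [var_class[e][j] for e in e_values]
    total = sum(column)
    class_var[class_value] = [n / total for n in column]

rounded = dict((c, [round(p, 3) for p in row]) for c, row in class_var.items())
```

Rounded to three decimals, the two rows agree with the p_attr output shown above.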

class Orange.statistics.contingency.VarVar

Contingency table in which none of the variables is the class. The class is derived from Table, and adds an additional constructor and method for getting conditional probabilities.

VarVar(outer_variable, inner_variable)

Inherited from Table.

__init__(outer_variable, inner_variable, data[, weightId])

Compute the contingency from the given instances.

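Parameters:
  • outer_variable (int, string or Orange.feature.Descriptor) – The outer variable
  • inner_variable (int, string or Orange.feature.Descriptor) – The inner variable
  • data (Orange.data.Table) – A set of instances
  • weightId (int) – meta attribute with weights of instances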
p_attr(outer_value)

Return the probability distribution of the inner variable given the outer variable value.

Parameters: outer_value (int, float, string or Orange.data.Value) – The value of the outer variable
Return type: Orange.statistics.distribution.Distribution
p_attr(outer_value, inner_value)

Return the conditional probability of the inner_value given the outer_value.

Parameters:
  • outer_value (int, float, string or Orange.data.Value) – The value of the outer variable
  • inner_value (int, float, string or Orange.data.Value) – The value of the inner variable
Return type: float

The following example investigates which material is used for bridges of different lengths.

import Orange

bridges = Orange.data.Table("bridges.tab")
cont = Orange.statistics.contingency.VarVar("SPAN", "MATERIAL", bridges)

print "Distributions:"
for val in cont.outer_variable:
    print "  p(.|%s) = %s" % (val.native(), cont.p_attr(val))
print

cont.normalize()
for val in cont.outer_variable:
    print "%s:" % val.native()
    for inval, p in cont[val].items():
        if p:
            print "   %s (%i%%)" % (inval, int(100*p+0.5))
    print

Short bridges are mostly wooden or iron, while long bridges (and most of the medium-sized ones) are made of steel:

SHORT:
   WOOD (56%)
   IRON (44%)

MEDIUM:
   WOOD (9%)
   IRON (11%)
   STEEL (79%)

LONG:
   STEEL (100%)

Like all other contingency tables, this one can also be computed “manually”.

cont = Orange.statistics.contingency.VarVar(bridges.domain["SPAN"], bridges.domain["MATERIAL"])
for ins in bridges:
    cont.add(ins["SPAN"], ins["MATERIAL"])

print "Distributions from a matrix computed manually:"
for val in cont.outer_variable:
    print "  p(.|%s) = %s" % (val.native(), cont.p_attr(val))
print

Contingencies for entire domain

A list of contingency tables, either VarClass or ClassVar.

class Orange.statistics.contingency.Domain
__init__(data[, weight_id=0, class_is_outer=0|1])

Compute a list of contingency tables.

Parameters:
  • data (Orange.data.Table) – A set of instances
  • weight_id (int) – meta attribute with weights of instances
  • class_is_outer (bool) – True, if class is the outer variable

Note

class_is_outer needs to be given as keyword argument.

class_is_outer(read only)

Tells whether the class is the outer or the inner variable.

classes

Contains the distribution of class values on the entire dataset.

normalize()

Call normalize for all contingencies.
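
The idea of one contingency table per feature, with the class either inside or outside, can be sketched as follows. The function, the dictionary-of-tables layout and the three-row dataset are all hypothetical; Orange's Domain is a list-like object, which this plain-Python illustration does not reproduce.

```python
from collections import defaultdict


def domain_contingency(rows, features, class_name, class_is_outer=False):
    """Build one table per feature; tables[f][outer][inner] holds a count."""
    tables = {}
    for feature in features:
        table = defaultdict(lambda: defaultdict(float))
        for row in rows:
            if class_is_outer:
                outer, inner = row[class_name], row[feature]
            else:
                outer, inner = row[feature], row[class_name]
            table[outer][inner] += 1.0
        tables[feature] = table
    return tables


# Hypothetical three-instance dataset with features a, b and class y.
rows = [
    {"a": "1", "b": "1", "y": "1"},
    {"a": "1", "b": "2", "y": "0"},
    {"a": "2", "b": "2", "y": "1"},
]
class_inner = domain_contingency(rows, ["a", "b"], "y")
class_outer = domain_contingency(rows, ["a", "b"], "y", class_is_outer=True)
```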

The following script prints the contingency tables for features “a”, “b” and “e” for the dataset Monk 1.

print "e: ", dc["e"]

Contingency tables of type VarClass give the conditional distributions of classes, given the value of the variable.

print "Distributions of feature values given the class value"
dc = Orange.statistics.contingency.Domain(monks, classIsOuter = 1)
print "a: ", dc["a"]
print "b: ", dc["b"]
print "e: ", dc["e"]
print

Contingency tables for continuous variables

If the outer variable is continuous, the index must be one of the values that do exist in the contingency table; other values raise an exception:

import Orange

iris = Orange.data.Table("iris.tab")
cont = Orange.statistics.contingency.VarClass(0, iris)
midkey = (cont.keys()[0] + cont.keys()[1])/2.0
print "cont[%5.3f] =" % midkey, cont[midkey]

Since even rounding can be a problem, the only safe way to get a key is to take it from the contingency’s keys.
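
The rounding problem is easy to reproduce with ordinary floats. In the sketch below a plain dictionary stands in for a contingency table with a continuous outer variable; the keys and rows are hypothetical, and the lookup raises KeyError here, whereas the exception type used by Orange is not stated in this text.

```python
# A plain dict standing in for a table keyed by continuous outer values.
table = {0.3: [1.0, 0.0], 4.9: [0.0, 2.0]}

# A key obtained by arithmetic may differ from the stored key in the last bits.
computed = 0.1 + 0.2
try:
    table[computed]
    found = True
except KeyError:
    found = False

# Taking the key from the table itself always works.
safe_key = sorted(table.keys())[0]
row = table[safe_key]
```

Here computed differs from the stored key 0.3 in the last bits, so the first lookup fails while the second succeeds.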

Contingency tables with a discrete outer variable and a continuous inner variable are more useful, since the methods VarClass.p_class and ClassVar.p_attr use the primitive density estimation provided by Orange.statistics.distribution.Distribution.

For example, ClassVar on the iris dataset can return the probability of the sepal length 5.5 for different classes:

import Orange

iris = Orange.data.Table("iris")
cont = Orange.statistics.contingency.ClassVar("sepal length", iris)

print "Inner variable: ", cont.inner_variable.name
print "Outer variable: ", cont.outer_variable.name
print
print "Class variable: ", cont.class_var.name
print "Attribute:      ", cont.variable.name
print

print "Distributions:"
for val in cont.class_var:
    print "  p(.|%s) = %s" % (val.native(), cont.p_attr(val))
print

print "Estimated frequencies for e=5.5"
for val in cont.class_var:
    print "  f(%s|%s) = %5.3f" % (5.5, val.native(), cont.p_attr(5.5, val))
print

cont = Orange.statistics.contingency.ClassVar(iris.domain["sepal length"], 
                                              iris.domain.class_var)
for ins in iris:
    cont.add_var_class(ins["sepal length"], ins.get_class())

print "Distributions from a matrix computed manually:"
for val in cont.class_var:
    print "  p(.|%s) = %s" % (val.native(), cont.p_attr(val))
print

The script outputs:

Estimated frequencies for e=5.5
  f(5.5|Iris-setosa) = 2.000
  f(5.5|Iris-versicolor) = 5.000
  f(5.5|Iris-virginica) = 1.000
