This is documentation for Orange 2.7. For the latest documentation, see Orange 3.

# Contingency table (contingency)

A contingency table contains conditional distributions. Unless explicitly ‘normalized’, they contain absolute frequencies, that is, the number of instances with a particular combination of two variables’ values. If they are normalized by dividing each cell by its row sum, they represent conditional probabilities of the column variable (here denoted as innerVariable) conditioned on the row variable (outerVariable).

Contingency tables are usually constructed for discrete variables. Tables for continuous variables have certain limitations described in a separate section.

The example below loads the monks-1 data set and prints out the conditional class distribution given the value of e.

```
import Orange

monks = Orange.data.Table("monks-1.tab")
cont = Orange.statistics.contingency.VarClass("e", monks)
for val, dist in cont.items():
    print val, dist
```

This code prints out:

```
1 <0.000, 108.000>
2 <72.000, 36.000>
3 <72.000, 36.000>
4 <72.000, 36.000>
```

Contingencies behave like lists of distributions (in this case, class distributions) indexed by values (of e, in this example). Distributions are, in turn, indexed by values (here, class values). The variable e from the above example is called the outer variable, and the class is the inner variable. The roles can also be reversed. It is also possible to use features as both the outer and the inner variable, so that the table shows the distributions of one variable’s values given the value of another. There is a corresponding hierarchy of classes: Table is the base class for VarVar (both variables are features) and Class (one variable is the class). The latter is the base class for VarClass and ClassVar.

The most commonly used of the above classes is VarClass, which can compute and store conditional probabilities of classes given the feature value.
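The nesting described above can be pictured as a minimal plain-Python sketch. Each outer value maps to a distribution over inner values, and adding an instance increases one cell by the instance's weight. The class name `SimpleContingency` and the toy rows are hypothetical illustrations, not part of Orange's API.

```python
from collections import defaultdict


class SimpleContingency(object):
    """Hypothetical stand-in for a contingency table:
    outer value -> {inner value -> accumulated weight}."""

    def __init__(self):
        self.cells = defaultdict(lambda: defaultdict(float))

    def add(self, outer_value, inner_value, weight=1.0):
        # Increase the cell (outer_value, inner_value) by the instance weight.
        self.cells[outer_value][inner_value] += weight

    def __getitem__(self, outer_value):
        # Indexing by an outer value gives the distribution of the inner variable.
        return dict(self.cells.get(outer_value, {}))

    def keys(self):
        return sorted(self.cells)


# Made-up (e, class) pairs, used only to exercise the sketch.
rows = [("1", "1"), ("1", "1"), ("2", "0"), ("2", "1"), ("2", "0")]
cont = SimpleContingency()
for e, y in rows:
    cont.add(e, y)
```

Swapping the two arguments of `add` gives the reverse orientation, with the class outer and the feature inner.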

## Contingency tables

class Orange.statistics.contingency.Table

Provides a base class for storing and manipulating contingency tables. Although it is not abstract, it is seldom used directly but rather through more convenient derived classes described below.

outerVariable

Outer variable (Orange.feature.Descriptor) whose values are used as the first, outer index.

innerVariable

Inner variable (Orange.feature.Descriptor) whose values are used as the second, inner index.

outerDistribution

The marginal distribution (Distribution) of the outer variable.

innerDistribution

The marginal distribution (Distribution) of the inner variable.

innerDistributionUnknown

The distribution (distribution.Distribution) of the inner variable for instances for which the outer variable was undefined. This is the difference between innerDistribution and the (unconditional) distribution of the inner variable.

varType

The type of the outer variable (Orange.feature.Type, usually Orange.feature.Discrete or Orange.feature.Continuous); equals outerVariable.varType and outerDistribution.varType.

__init__(outer_variable, inner_variable)

Construct an instance of a contingency table for the given pair of variables.

Parameters:
- outer_variable (Orange.feature.Descriptor) – Descriptor of the outer variable
- inner_variable (Orange.feature.Descriptor) – Descriptor of the inner variable

add(outer_value, inner_value[, weight])

Add an element to the contingency table by adding weight to the corresponding cell.

Parameters:
- outer_value (int, float, string or Orange.data.Value) – The value for the outer variable
- inner_value (int, float, string or Orange.data.Value) – The value for the inner variable
- weight (float) – Instance weight

normalize()

Normalize all distributions (rows) in the table to sum to 1:

```
>>> cont.normalize()
>>> for val, dist in cont.items():
...     print val, dist
...
```

Output:

```
1 <0.000, 1.000>
2 <0.667, 0.333>
3 <0.667, 0.333>
4 <0.667, 0.333>
```

Note

This method does not change the innerDistribution or outerDistribution.
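Row normalization can be pictured in plain Python as follows. This is a sketch over a dictionary of count lists copied from the first example's output, not Orange's implementation. Unlike `Table.normalize()`, which works in place, the hypothetical `normalize_rows` returns a new dictionary. The inner marginal is computed from the unnormalized counts, which corresponds to the note that the marginal distributions keep their original values.

```python
def normalize_rows(counts):
    """Return a copy of counts in which every row is divided by its sum.
    Rows that sum to zero are copied unchanged."""
    normalized = {}
    for outer, row in counts.items():
        total = float(sum(row))
        normalized[outer] = [x / total for x in row] if total else list(row)
    return normalized


# Class counts given e, as printed by the first example.
counts = {"1": [0.0, 108.0], "2": [72.0, 36.0],
          "3": [72.0, 36.0], "4": [72.0, 36.0]}
rows = normalize_rows(counts)

# The inner (class) marginal: column sums of the unnormalized counts.
inner_marginal = [sum(column) for column in zip(*counts.values())]
```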

With respect to indexing, a contingency table is a cross between a dictionary and a list. It supports the standard dictionary methods keys, values and items.

```
>>> print cont.keys()
['1', '2', '3', '4']
>>> print cont.values()
[<0.000, 108.000>, <72.000, 36.000>, <72.000, 36.000>, <72.000, 36.000>]
>>> print cont.items()
[('1', <0.000, 108.000>), ('2', <72.000, 36.000>),
('3', <72.000, 36.000>), ('4', <72.000, 36.000>)]
```

Although keys returned by the above functions are strings, contingency can be indexed by anything that can be converted into values of the outer variable: strings, numbers or instances of Orange.data.Value.

```
>>> print cont[0]
<0.000, 108.000>
>>> print cont["1"]
<0.000, 108.000>
>>> print cont[Orange.data.Value(monks.domain["e"], "1")]
<0.000, 108.000>
```

The length of the table equals the number of values of the outer variable. However, iterating through a contingency table yields distributions, not keys as with dictionaries.

```
>>> for i in cont:
...     print i
...
<0.000, 108.000>
<72.000, 36.000>
<72.000, 36.000>
<72.000, 36.000>
```
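This mix of dictionary and list behaviour can be mimicked with a small class that stores one distribution per value of a discrete outer variable. `ValueIndexed` is a hypothetical sketch, not an Orange class. It assumes that an integer index means the position of a value and a string means its name. Like a contingency, it yields distributions rather than keys when iterated.

```python
class ValueIndexed(object):
    """Hypothetical sketch: one distribution per value of a discrete
    outer variable, addressable by position or by value name."""

    def __init__(self, values, distributions):
        self.values = list(values)
        self.distributions = list(distributions)

    def _position(self, key):
        # Integers are treated as positions, strings as value names.
        if isinstance(key, int):
            return key
        return self.values.index(key)

    def __getitem__(self, key):
        return self.distributions[self._position(key)]

    def __len__(self):
        return len(self.values)

    def __iter__(self):
        # Like a list (and unlike a dict), iteration yields the distributions.
        return iter(self.distributions)

    def keys(self):
        return list(self.values)

    def items(self):
        return list(zip(self.values, self.distributions))


cont = ValueIndexed(["1", "2", "3", "4"],
                    [(0.0, 108.0), (72.0, 36.0), (72.0, 36.0), (72.0, 36.0)])
```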
class Orange.statistics.contingency.Class

An abstract base class for contingency tables that contain the class, either as the inner or the outer variable.

class_var

The class attribute descriptor; always equal to either Table.innerVariable or Table.outerVariable.

variable

The feature descriptor; always equal to either innerVariable or outerVariable, whichever is not the class.

add_var_class(variable_value, class_value[, weight])

Add an element to the contingency by increasing the corresponding count. The difference between this and Table.add is that the variable value is always the first argument and the class value the second, regardless of which one is inner and which one is outer.

Parameters:
- variable_value (int, float, string or Orange.data.Value) – Variable value
- class_value (int, float, string or Orange.data.Value) – Class value
- weight (float) – Instance weight
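The point of add_var_class is that callers need not know the orientation of the table. The sketch below uses hypothetical classes, not Orange's own. It shows how the same call can fill either orientation by forwarding to a generic `add(outer, inner, weight)` with the arguments passed through or swapped.

```python
from collections import defaultdict


class CellTable(object):
    """Hypothetical sketch of a generic table: cells[outer][inner] += weight."""

    def __init__(self):
        self.cells = defaultdict(lambda: defaultdict(float))

    def add(self, outer_value, inner_value, weight=1.0):
        self.cells[outer_value][inner_value] += weight


class VarClassSketch(CellTable):
    # The feature is outer and the class inner: arguments pass straight through.
    def add_var_class(self, variable_value, class_value, weight=1.0):
        self.add(variable_value, class_value, weight)


class ClassVarSketch(CellTable):
    # The class is outer and the feature inner: the arguments are swapped.
    def add_var_class(self, variable_value, class_value, weight=1.0):
        self.add(class_value, variable_value, weight)


vc, cv = VarClassSketch(), ClassVarSketch()
for table in (vc, cv):
    table.add_var_class("e=1", "y=1")  # the same call for both orientations
```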
class Orange.statistics.contingency.VarClass

A class derived from Class in which the variable is used as Table.outerVariable and the class as Table.innerVariable. This form is suitable for computing conditional class probabilities given the variable value.

Calling VarClass.add_var_class(v, c) is equivalent to Table.add(v, c). Similarly to Table, VarClass can compute the contingency from instances.

__init__(feature, class_variable)

Construct an instance of VarClass for the given pair of variables. Inherited from Table.

Parameters:
- feature (Orange.feature.Descriptor) – Outer variable
- class_variable (Orange.feature.Descriptor) – Class variable; used as innerVariable

__init__(feature, data[, weightId])

Compute the contingency table from data.

Parameters:
- feature (Orange.feature.Descriptor) – Outer variable
- data (Orange.data.Table) – A set of instances
- weightId (int) – meta attribute with weights of instances

p_class(value)

Return the probability distribution of classes given the value of the variable.

Parameters:
- value (int, float, string or Orange.data.Value) – The value of the variable

Return type: Orange.statistics.distribution.Distribution

p_class(value, class_value)

Returns the conditional probability of the class_value given the feature value, p(class_value|value) (note the order of arguments!)

Parameters:
- value (int, float, string or Orange.data.Value) – The value of the variable
- class_value – The class value

Return type: float

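```
import Orange.statistics.contingency

monks = Orange.data.Table("monks-1.tab")
cont = Orange.statistics.contingency.VarClass("e", monks)

print "Inner variable: ", cont.inner_variable.name
print "Outer variable: ", cont.outer_variable.name
print
print "Class variable: ", cont.class_var.name
print "Feature:      ", cont.variable.name
print

print "Distributions:"
for val in cont.variable:
    print "  p(.|%s) = %s" % (val.native(), cont.p_class(val))
print

first_class = Orange.data.Value(cont.class_var, 1)
first_native = first_class.native()
print "Probabilities of class '%s'" % first_native
for val in cont.variable:
    print "  p(%s|%s) = %5.3f" % (first_native, val.native(),
                                  cont.p_class(val, first_class))
print

# Build the same table manually; the feature value always comes first
cont = Orange.statistics.contingency.VarClass(monks.domain["e"],
                                              monks.domain.class_var)
for ins in monks:
    cont.add_var_class(ins["e"], ins.get_class())

print "Distributions from a matrix computed manually:"
for val in cont.variable:
    print "  p(.|%s) = %s" % (val.native(), cont.p_class(val))
```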

The inner and the outer variable and their relations to the class are as follows:

```
Inner variable:  y
Outer variable:  e

Class variable:  y
Feature:         e
```

Distributions are normalized, and probabilities are elements of the normalized distributions. Since the target concept is y := (e=1) or (a=b), the distributions are as expected. When e equals 1, class 1 has a 100% probability. For the other values of e, the probability is about one third, which agrees with the probability that two independent three-valued features have the same value.

```
Distributions:
  p(.|1) = <0.000, 1.000>
  p(.|2) = <0.662, 0.338>
  p(.|3) = <0.659, 0.341>
  p(.|4) = <0.669, 0.331>

Probabilities of class '1'
  p(1|1) = 1.000
  p(1|2) = 0.338
  p(1|3) = 0.341
  p(1|4) = 0.331

Distributions from a matrix computed manually:
  p(.|1) = <0.000, 1.000>
  p(.|2) = <0.662, 0.338>
  p(.|3) = <0.659, 0.341>
  p(.|4) = <0.669, 0.331>
```
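The one-third figure can be checked by enumeration. The sketch below assumes the standard MONK's attribute domains: a, b and d with three values, c and f with two, and e with four, giving 432 combinations in total. It assigns every combination its class under the monks-1 concept and groups the counts by e. Under this assumption the counts equal those printed by the first example on this page. For every e other than 1, the proportion of class 1 is exactly one third, the value the estimates above approximate.

```python
from itertools import product

# Assumed value sets of the MONK's attributes a, b, c, d, e, f.
DOMAINS = [range(1, 4), range(1, 4), range(1, 3),
           range(1, 4), range(1, 5), range(1, 3)]


def monks1_class(a, b, e):
    """Target concept of monks-1: y := (e = 1) or (a = b)."""
    return 1 if e == 1 or a == b else 0


# counts[e] = [number of combinations with class 0, with class 1]
counts = dict((e, [0, 0]) for e in range(1, 5))
for a, b, c, d, e, f in product(*DOMAINS):
    counts[e][monks1_class(a, b, e)] += 1

# p(y=1 | e): the second element of each normalized row
p_class1 = dict((e, counts[e][1] / float(sum(counts[e]))) for e in counts)
```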
class Orange.statistics.contingency.ClassVar

ClassVar is similar to VarClass except that the class is outside and the variable is inside. This form of contingency table is suitable for computing conditional probabilities of the variable given the class. All methods take the two arguments in the same order as in VarClass.

__init__(feature, class_variable)

Construct an instance of ClassVar for the given pair of variables. Inherited from Table, except for the reversed order of arguments.

Parameters:
- feature (Orange.feature.Descriptor) – Feature; used as innerVariable
- class_variable (Orange.feature.Descriptor) – Class variable; used as outerVariable

__init__(feature, data[, weightId])

Compute contingency table from the data.

Parameters:
- feature (Orange.feature.Descriptor) – Descriptor of the inner variable
- data (Orange.data.Table) – A set of instances
- weightId (int) – meta attribute with weights of instances

p_attr(class_value)

Return the probability distribution of variable given the class.

Parameters:
- class_value (int, float, string or Orange.data.Value) – The class value

Return type: Orange.statistics.distribution.Distribution

p_attr(value, class_value)

Returns the conditional probability of the value given the class, p(value|class_value).

Parameters:
- value (int, float, string or Orange.data.Value) – Value of the variable
- class_value – Class value

Return type: float

```
import Orange.statistics.contingency

monks = Orange.data.Table("monks-1.tab")
cont = Orange.statistics.contingency.ClassVar("e", monks)

print "Inner variable: ", cont.inner_variable.name
print "Outer variable: ", cont.outer_variable.name
print
print "Class variable: ", cont.class_var.name
print "Feature:      ", cont.variable.name
print

print "Distributions:"
for val in cont.class_var:
    print "  p(.|%s) = %s" % (val.native(), cont.p_attr(val))
print

first_value = Orange.data.Value(cont.variable, 0)
first_native = first_value.native()
print "Probabilities for e='%s'" % first_native
for val in cont.class_var:
    print "  p(%s|%s) = %5.3f" % (first_native, val.native(),
                                  cont.p_attr(first_value, val))
print

# Build the same table manually; the feature value always comes first
cont = Orange.statistics.contingency.ClassVar(monks.domain["e"],
                                              monks.domain.class_var)
for ins in monks:
    cont.add_var_class(ins["e"], ins.get_class())
```

The roles of the feature and the class are reversed compared to VarClass:

```
Inner variable:  e
Outer variable:  y

Class variable:  y
Feature:         e
```

Distributions given the class can be printed out by calling p_attr().

```
for val in cont.class_var:
    print "  p(.|%s) = %s" % (val.native(), cont.p_attr(val))
```

will print:

```
  p(.|0) = <0.000, 0.333, 0.333, 0.333>
  p(.|1) = <0.500, 0.167, 0.167, 0.167>
```

If the class value is ‘0’, the attribute e cannot be 1 (the first value), while the distribution across the other values is uniform. If the class value is ‘1’, e is 1 for exactly half of the instances, and the distribution of the other values is again uniform.
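These numbers can be derived by hand from the VarClass counts printed by the first example on this page. The ClassVar distribution for a class value is the corresponding column of the VarClass table, divided by its total. Below is a plain-Python sketch using exact fractions; the function name is illustrative only.

```python
from fractions import Fraction

# Class counts given e, copied from the first example: e -> (class 0, class 1).
var_class = {"1": (0, 108), "2": (72, 36), "3": (72, 36), "4": (72, 36)}
values_of_e = ["1", "2", "3", "4"]


def p_attr_from_counts(counts, values, class_index):
    """p(e | class) = n(e, class) / n(class): take one column, then normalize it."""
    column = [counts[v][class_index] for v in values]
    total = sum(column)
    return [Fraction(n, total) for n in column]


given_0 = p_attr_from_counts(var_class, values_of_e, 0)
given_1 = p_attr_from_counts(var_class, values_of_e, 1)
```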

class Orange.statistics.contingency.VarVar

Contingency table in which neither of the variables is the class. The class is derived from Table and adds an additional constructor and a method for getting conditional probabilities.

__init__(outer_variable, inner_variable)

Inherited from Table.

__init__(outer_variable, inner_variable, data[, weightId])

Compute the contingency from the given instances.

Parameters:
- outer_variable (Orange.feature.Descriptor) – Outer variable
- inner_variable (Orange.feature.Descriptor) – Inner variable
- data (Orange.data.Table) – A set of instances
- weightId (int) – meta attribute with weights of instances

p_attr(outer_value)

Return the probability distribution of the inner variable given the outer variable value.

Parameters:
- outer_value (int, float, string or Orange.data.Value) – The value of the outer variable

Return type: Orange.statistics.distribution.Distribution

p_attr(outer_value, inner_value)

Return the conditional probability of the inner_value given the outer_value.

Parameters:
- outer_value (int, float, string or Orange.data.Value) – The value of the outer variable
- inner_value (int, float, string or Orange.data.Value) – The value of the inner variable

Return type: float

The following example investigates which material is used for bridges of different lengths.

```
import Orange

bridges = Orange.data.Table("bridges.tab")
cont = Orange.statistics.contingency.VarVar("SPAN", "MATERIAL", bridges)

print "Distributions:"
for val in cont.outer_variable:
    print "  p(.|%s) = %s" % (val.native(), cont.p_attr(val))
print

cont.normalize()
for val in cont.outer_variable:
    print "%s:" % val.native()
    for inval, p in cont[val].items():
        if p:
            print "   %s (%i%%)" % (inval, int(100*p+0.5))
    print
```

Short bridges are mostly wooden or iron, while the long ones (and most of the medium-sized ones) are made of steel:

```
SHORT:
   WOOD (56%)
   IRON (44%)

MEDIUM:
   WOOD (9%)
   IRON (11%)
   STEEL (79%)

LONG:
   STEEL (100%)
```

Like all other contingency tables, this one can also be computed “manually”.

```
cont = Orange.statistics.contingency.VarVar(bridges.domain["SPAN"],
                                            bridges.domain["MATERIAL"])
for ins in bridges:
    cont.add(ins["SPAN"], ins["MATERIAL"])

print "Distributions from a matrix computed manually:"
for val in cont.outer_variable:
    print "  p(.|%s) = %s" % (val.native(), cont.p_attr(val))
print
```

## Contingencies for entire domain

A list of contingency tables, either VarClass or ClassVar.

class Orange.statistics.contingency.Domain
__init__(data[, weight_id=0, class_is_outer=0|1])

Compute a list of contingency tables.

Parameters:
- data (Orange.data.Table) – A set of instances
- weight_id (int) – meta attribute with weights of instances
- class_is_outer (bool) – True, if class is the outer variable

Note

class_is_outer needs to be given as a keyword argument.

class_is_outer

Tells whether the class is the outer or the inner variable.

classes

Contains the distribution of class values on the entire dataset.

normalize()

Call normalize() for all contingency tables in the list.
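Conceptually, Domain builds one table per feature, in one of the two orientations, together with the class distribution. The following hypothetical sketch does this for rows given as tuples whose last element is the class; `domain_contingency` is not an Orange function. The toy rows are made up.

```python
from collections import defaultdict


def domain_contingency(rows, feature_names, class_is_outer=False):
    """Hypothetical sketch: one contingency per feature plus class counts.

    Each row is a tuple of feature values followed by the class value.
    With class_is_outer=False a table maps feature value -> class counts
    (like VarClass); otherwise class value -> feature value counts
    (like ClassVar)."""
    tables = {}
    for i, name in enumerate(feature_names):
        table = defaultdict(lambda: defaultdict(int))
        for row in rows:
            value, cls = row[i], row[-1]
            if class_is_outer:
                table[cls][value] += 1
            else:
                table[value][cls] += 1
        tables[name] = table
    # Distribution of class values on the entire data set.
    classes = defaultdict(int)
    for row in rows:
        classes[row[-1]] += 1
    return tables, classes


# Made-up rows (a, b, class).
rows = [(1, 1, 1), (1, 2, 0), (2, 2, 1)]
var_class, classes = domain_contingency(rows, ["a", "b"])
class_var, _ = domain_contingency(rows, ["a", "b"], class_is_outer=True)
```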

The following script prints the contingency tables for features “a”, “b” and “e” for the Monk 1 data set.

```
dc = Orange.statistics.contingency.Domain(monks)
print "a: ", dc["a"]
print "b: ", dc["b"]
print "e: ", dc["e"]
```

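Contingency tables of type VarClass give the conditional distributions of classes given the value of the variable. With the class as the outer variable, the tables are of type ClassVar and give the distributions of feature values given the class value: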

```
print "Distributions of feature values given the class value"
dc = Orange.statistics.contingency.Domain(monks, classIsOuter=1)
print "a: ", dc["a"]
print "b: ", dc["b"]
print "e: ", dc["e"]
print
```

## Contingency tables for continuous variables

If the outer variable is continuous, the index must be one of the values that do exist in the contingency table; other values raise an exception:

```
import Orange

iris = Orange.data.Table("iris.tab")
cont = Orange.statistics.contingency.VarClass(0, iris)
midkey = (cont.keys()[0] + cont.keys()[1]) / 2.0
# midkey lies between two existing keys, so this raises an exception
print "cont[%5.3f] =" % midkey, cont[midkey]
```

Since even rounding can be a problem, the only safe way to get a key is to take it from the contingency’s keys.
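The rounding issue is the usual floating-point one. It can be seen with a plain Python dictionary, which, like a contingency over a continuous variable, finds only exactly equal keys. This is an analogy, not Orange code, and the table contents are made up.

```python
# Hypothetical keys of a table over a continuous outer variable; the first
# key is the result of arithmetic rather than a typed-in literal.
table = {0.1 + 0.2: "distribution A", 0.5: "distribution B"}

print(0.3 in table)          # prints False: the stored key is 0.30000000000000004
safe_key = sorted(table)[0]  # take the key from the table itself
print(table[safe_key])       # prints distribution A
```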

Contingency tables with a discrete outer variable and a continuous inner variable are more useful, since the methods ClassVar.p_attr and VarClass.p_class use the primitive density estimation provided by Orange.statistics.distribution.Distribution.

For example, ClassVar on the iris data set can estimate the frequency of sepal length 5.5 within each class:

```
import Orange

iris = Orange.data.Table("iris")
cont = Orange.statistics.contingency.ClassVar("sepal length", iris)

print "Inner variable: ", cont.inner_variable.name
print "Outer variable: ", cont.outer_variable.name
print
print "Class variable: ", cont.class_var.name
print "Attribute:      ", cont.variable.name
print

print "Distributions:"
for val in cont.class_var:
    print "  p(.|%s) = %s" % (val.native(), cont.p_attr(val))
print

print "Estimated frequencies for sepal length=5.5"
for val in cont.class_var:
    print "  f(%s|%s) = %5.3f" % (5.5, val.native(), cont.p_attr(5.5, val))
print

# Build the same table manually; the feature value always comes first
cont = Orange.statistics.contingency.ClassVar(iris.domain["sepal length"],
                                              iris.domain.class_var)
for ins in iris:
    cont.add_var_class(ins["sepal length"], ins.get_class())

print "Distributions from a matrix computed manually:"
for val in cont.class_var:
    print "  p(.|%s) = %s" % (val.native(), cont.p_attr(val))
print
```

The script outputs:

```
Estimated frequencies for sepal length=5.5
  f(5.5|Iris-setosa) = 2.000
  f(5.5|Iris-versicolor) = 5.000
  f(5.5|Iris-virginica) = 1.000
```
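This page describes the density estimation only as primitive. One simple estimator that returns the observed frequency at a value occurring in the data, as seen above, is linear interpolation between neighbouring observed values. The sketch below implements that estimator on a made-up frequency table. It is an assumption for illustration, not a description of how Orange.statistics.distribution.Distribution computes its estimates.

```python
import bisect


def interpolated_frequency(freqs, x):
    """Hypothetical estimator: the observed frequency at x if x was seen,
    linear interpolation between the two nearest observed values otherwise,
    and 0 outside the observed range."""
    points = sorted(freqs)
    i = bisect.bisect_left(points, x)
    if i < len(points) and points[i] == x:
        return float(freqs[x])
    if i == 0 or i == len(points):
        return 0.0
    lo, hi = points[i - 1], points[i]
    t = (x - lo) / float(hi - lo)
    return (1 - t) * freqs[lo] + t * freqs[hi]


# Made-up frequencies of a continuous feature within one class.
freqs = {5.0: 4, 5.5: 2, 6.0: 2}
```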
