This is documentation for Orange 2.7. For the latest documentation, see Orange 3.

Rule induction (rules)

Module rules implements supervised rule induction algorithms and rule-based classification methods. Rule induction is based on a comprehensive framework of components that can be modified or replaced. For ease of use, the module already provides multiple variations of CN2 induction algorithm.

CN2 algorithm

The use of rule learning algorithms is consistent with a typical learner usage in Orange:

rules-cn2.py

import Orange
# Read some data
titanic = Orange.data.Table("titanic")

# construct the learning algorithm and use it to induce a classifier
cn2_learner = Orange.classification.rules.CN2Learner()
cn2_clasifier = cn2_learner(titanic)

# ... or, in a single step.
cn2_classifier = Orange.classification.rules.CN2Learner(titanic)

# All rule-base classifiers can have their rules printed out like this:
for r in cn2_classifier.rules:
    print Orange.classification.rules.rule_to_string(r)
IF sex=['female'] AND status=['first'] AND age=['child'] THEN survived=yes<0.000, 1.000>
IF sex=['female'] AND status=['second'] AND age=['child'] THEN survived=yes<0.000, 13.000>
IF sex=['male'] AND status=['second'] AND age=['child'] THEN survived=yes<0.000, 11.000>
IF sex=['female'] AND status=['first'] THEN survived=yes<4.000, 140.000>
IF status=['first'] AND age=['child'] THEN survived=yes<0.000, 5.000>
IF sex=['male'] AND status=['second'] THEN survived=no<154.000, 14.000>
IF status=['crew'] AND sex=['female'] THEN survived=yes<3.000, 20.000>
IF status=['second'] THEN survived=yes<13.000, 80.000>
IF status=['third'] AND sex=['male'] AND age=['adult'] THEN survived=no<387.000, 75.000>
IF status=['crew'] THEN survived=no<670.000, 192.000>
IF age=['child'] AND sex=['male'] THEN survived=no<35.000, 13.000>
IF sex=['male'] THEN survived=no<118.000, 57.000>
IF age=['child'] THEN survived=no<17.000, 14.000>
IF TRUE THEN survived=no<89.000, 76.000>
class Orange.classification.rules.CN2Learner(evaluator=Evaluator_Entropy, beam_width=5, alpha=1)

Bases: Orange.classification.rules.RuleLearner

Classical CN2 inducer (Clark and Niblett; 1988) that constructs a set of ordered rules. Constructor returns either an instance of CN2Learner or, if training data is provided, a CN2Classifier.

Parameters:
  • evaluator (Evaluator) – an object that evaluates a rule from instances. By default, entropy is used as a measure.
  • beam_width (int) – width of the search beam.
  • alpha (float) – significance level of the likelihood ratio statistics to determine whether rule is better than the default rule.
class Orange.classification.rules.CN2Classifier(rules=None, instances=None, weight_id=0, **argkw)

Bases: Orange.classification.rules.RuleClassifier

Classical CN2 classifier (Clark and Niblett; 1988) that predicts a class from an ordered list of rules. The classifier is usually constructed by CN2Learner.

Parameters:
  • rules (List) – induced rules
  • instances (Orange.data.Table) – stored training data instances
  • weight_id (int) – ID of the weight meta-attribute.
__call__(instance, result_type=0)
Parameters:
  • instance (Orange.data.Instance) – instance to be classified.
  • result_typeOrange.classification.Classifier.GetValue or Orange.classification.Classifier.GetProbabilities or Orange.classification.Classifier.GetBoth
Return type:

Orange.data.Value, Orange.statistics.distribution.Distribution or a tuple with both

class Orange.classification.rules.CN2UnorderedLearner(evaluator=Evaluator_Laplace(), beam_width=5, alpha=1.0)

Bases: Orange.classification.rules.RuleLearner

Unordered CN2 (Clark and Boswell; 1991) induces a set of unordered rules. Learning rules is quite similar to learning in classical CN2, where the process of learning of rules is separated to learning rules for each class.

Constructor returns either an instance of CN2UnorderedLearner or, if training data is provided, a CN2UnorderedClassifier.

Parameters:
  • evaluator (Evaluator) – an object that evaluates a rule from covered instances. By default, Laplace’s rule of succession is used as a measure.
  • beam_width (int) – width of the search beam.
  • alpha (float) – significance level of the likelihood ratio statistics to determine whether rule is better than the default rule.
class Orange.classification.rules.CN2UnorderedClassifier(rules=None, instances=None, weight_id=0, **argkw)

Bases: Orange.classification.rules.RuleClassifier

Unordered CN2 classifier (Clark and Boswell; 1991) classifies an instance using a set of unordered rules. The classifier is typically constructed with CN2UnorderedLearner.

Parameters:
  • rules (RuleList) – induced rules
  • instances (Orange.data.Table) – stored training data instances
  • weight_id (int) – ID of the weight meta-attribute.
__call__(instance, result_type=0, ret_rules=False)

The call has another optional argument that is used to tell the classifier to also return the rules that cover the given data instance.

Parameters:
  • instance (Orange.data.Instance) – instance to be classified.
  • result_typeOrange.classification.Classifier.GetValue or Orange.classification.Classifier.GetProbabilities or Orange.classification.Classifier.GetBoth
Return type:

Orange.data.Value, Orange.statistics.distribution.Distribution or a tuple with both, and a list of rules if ret_rules is True

class Orange.classification.rules.CN2SDUnorderedLearner(evaluator=WRACCEvaluator(), beam_width=5, alpha=0.05, mult=0.7)

Bases: Orange.classification.rules.CN2UnorderedLearner

CN2-SD (Lavrac et al.; 2004) induces a set of unordered rules used by CN2UnorderedClassifier. CN2-SD differs from unordered CN2 by the default function and covering function: WRACCEvaluator computes weighted relative accuracy and CovererAndRemover_MultWeights decreases the weight of covered data instances instead of removing them.

Constructor returns either an instance of CN2SDUnorderedLearner or, if training data is provided, a CN2UnorderedClassifier.

Parameters:
  • evaluator (Evaluator) – an object that evaluates a rule from covered instances. By default, weighted relative accuracy is used.
  • beam_width (int) – width of the search beam.
  • alpha (float) – significance level of the likelihood ratio statistics to determine whether rule is better than the default rule.
  • mult (float) – multiplicator for weights of covered instances.
class Orange.classification.rules.CN2EVCUnorderedLearner(width=5, nsampling=100, rule_sig=1, att_sig=1, min_coverage=1, max_rule_complexity=5)

Bases: Orange.classification.rules.ABCN2

A learner similar to CN2-SD (CN2SDUnorderedLearner) except that it uses EVC for rule evaluation.

Parameters:
  • evaluator (Evaluator) – an object that evaluates a rule from covered instances. By default, weighted relative accuracy is used.
  • beam_width (int) – width of the search beam.
  • alpha (float) – significance level of the likelihood ratio statistics to determine whether rule is better than the default rule.
  • mult (float) – multiplicator for weights of covered instances.

Rule induction framework

The classes described above are based on a more general framework that can be fine-tuned to specific needs by replacing individual components. Here is an example:

part of rules-customized.py

import Orange

learner = Orange.classification.rules.RuleLearner()
learner.rule_finder = Orange.classification.rules.BeamFinder()
learner.rule_finder.evaluator = Orange.classification.rules.MEstimateEvaluator(m=50)

titanic =  Orange.data.Table("titanic")
classifier = learner(titanic)

for r in classifier.rules:
    print Orange.classification.rules.rule_to_string(r)
IF sex=['male'] AND status=['second'] AND age=['adult'] THEN survived=no<154.000, 14.000>
IF sex=['male'] AND status=['third'] AND age=['adult'] THEN survived=no<387.000, 75.000>
IF sex=['female'] AND status=['first'] THEN survived=yes<4.000, 141.000>
IF status=['crew'] AND sex=['male'] THEN survived=no<670.000, 192.000>
IF status=['second'] THEN survived=yes<13.000, 104.000>
IF status=['third'] AND sex=['male'] THEN survived=no<35.000, 13.000>
IF status=['first'] AND age=['adult'] THEN survived=no<118.000, 57.000>
IF status=['crew'] THEN survived=yes<3.000, 20.000>
IF sex=['female'] THEN survived=no<106.000, 90.000>
IF TRUE THEN survived=yes<0.000, 5.000>

In the example, we wanted to use a rule evaluator based on the m-estimate and set m to 50. The evaluator is a subcomponent of the rule_finder component. Thus, to be able to set the evaluator, we first set the rule_finder component, then we added the desired subcomponent and set its options. All other components, which are left unspecified, are provided by the learner at the training time and removed afterwards.

Continuing with the example, assume that we wish to set a different validation function and a different beam width.

part of rules-customized.py

learner.rule_finder.rule_stopping_validator = \
    Orange.classification.rules.Validator_LRS(alpha=0.01,
                             min_coverage=10, max_rule_complexity = 2)
learner.rule_finder.rule_filter = \
    Orange.classification.rules.BeamFilter_Width(width = 50)
class Orange.classification.rules.Rule(filter, classifier, lr, dist, ce, w = 0, qu = -1)

Represents a single rule. Constructor arguments correspond to the first seven of the attributes (from filter to quality) below.

filter

Rule condition; an instance of Orange.data.filter.Filter, typically an instance of a class derived from Orange.data.filter.Values

classifier

A rule predicts the class by calling an embedded classifier that must be an instance of Classifier, typically ConstantClassifier. This classifier is called by the rule classifier, such as RuleClassifier.

learner

Learner that is used for constructing a classifier. It must be an instance of Learner, typically MajorityLearner.

class_distribution

Distribution of class in data instances covered by this rule (Distribution).

instances

Data instances covered by this rule (Table).

weight_id

ID of the weight meta-attribute for the stored data instances (int).

quality

Quality of the rule. Rules with higher quality are better (float).

complexity

Complexity of the rule (float), typically the number of selectors (conditions) in the rule. Complexity is used for choosing between rules with the same quality; rules with lower complexity are preferred.

filter_and_store(instances, weight_id=0, target_class=-1)

Filter passed data instances and store them in instances. Also, compute class_distribution, set the weight of stored examples and create a new classifier using learner.

Parameters:
  • weight_id (int) – ID of the weight meta-attribute.
  • target_class (int) – index of target class; -1 for all classes.
__call__(instance)

Return True if the given instance matches the rule condition.

Parameters:instance (Orange.data.Instance) – data instance
__call__(instances, ref=True, negate=False)

Return a table of instances that match the rule condition.

Parameters:
  • instances (Orange.data.Table) – a table of data instances
  • ref (bool) – if True (default), the constructed table contains references to the original data instances; if False, the data is copied
  • negate (bool) – inverts the selection
class Orange.classification.rules.RuleLearner(store_instances=True, target_class=-1, base_rules=Orange.classification.rules.RuleList())

Bases: Orange.classification.Learner

A base rule induction learner. The algorithm follows separate-and-conquer strategy, which has its origins in the AQ family of algorithms (Fuernkranz J.; Separate-and-Conquer Rule Learning, Artificial Intelligence Review 13, 3-54, 1999). Such algorithms search for the optimal rule for the current training set, remove the covered training instances (separate) and repeat the process (conquer) on the remaining data.

Parameters:
  • store_instances (bool) – if True (default), the induced rules contain a table with references to the stored data instances.
  • target_class (int) – index of a specific class to learn; if -1 there is no target class
  • base_rules (RuleList) – An optional list of initial rules for constraining the rule_finder.

The class’ functionality is best explained by its __call__ function.

def __call__(self, instances, weight_id=0):
    rule_list = Orange.classification.rules.RuleList()
    all_instances = Orange.data.Table(instances)
    while not self.data_stopping(instances, weight_id, self.target_class):
        new_rule = self.rule_finder(instances, weight_id, self.target_class, self.base_rules)
        if self.rule_stopping(rule_list, new_rule, instances, weight_id):
            break
        instances, weight_id = self.cover_and_remove(new_rule, instances, weight_id, self.target_class)
        rule_list.append(new_rule)
    return Orange.classification.rules.RuleClassifier_FirstRule(
        rules=rule_list, instances=all_instances)

The customizable components are data_stopping, rule_finder, cover_and_remove and rule_stopping objects.

data_stopping

An instance of DataStoppingCriteria that determines whether to continue the induction. The default component, DataStoppingCriteria_NoPositives returns True if there are no more instances of the target class.

rule_finder

An instance of Finder that learns a single rule. Default is BeamFinder.

rule_stopping

An instance of StoppingCriteria that decides whether to use the induced rule or to discard it and stop the induction. If None (default) all rules are accepted.

cover_and_remove

An instance of RuleCovererAndRemover that removes instances covered by the rule and returns remaining instances. The default implementation (RuleCovererAndRemover_Default) removes the instances that belong to given target class; if the target is not specified (target_class == -1), it removes all covered instances.

Rule finders

class Orange.classification.rules.Finder

Base class for rule finders, which learn a single rule from instances.

__call__(table, weight_id, target_class, base_rules)

Induce a new rule from the given data.

Parameters:
  • table (Orange.data.Table) – training data instances
  • weight_id (int) – ID of the weight meta-attribute
  • target_class (int) – index of a specific class being learned; -1 for all.
  • base_rules (RuleList) – A list of initial rules for constraining the search space
class Orange.classification.rules.BeamFinder

Bases: Finder

Beam search for the best rule. This is the default finder for RuleLearner. Pseudo code of the algorithm is as follows.

def __call__(self, table, weight_id, target_class, base_rules):
    prior = Orange.statistics.distribution.Distribution(table.domain.class_var, table, weight_id)
    rules_star, best_rule = self.initializer(table, weight_id, target_class, base_rules, self.evaluator, prior)
    # compute quality of rules in rules_star and best_rule
    ...
    while len(rules_star) > 0:
        candidates, rules_star = self.candidate_selector(rules_star, table, weight_id)
        for cand in candidates:
            new_rules = self.refiner(cand, table, weight_id, target_class)
            for new_rule in new_rules:
                if self.rule_stopping_validator(new_rule, table, weight_id, target_class, cand.class_distribution):
                    new_rule.quality = self.evaluator(new_rule, table, weight_id, target_class, prior)
                    rules_star.append(new_rule)
                    if self.validator(new_rule, table, weight_id, target_class, prior) and
                        new_rule.quality > best_rule.quality:
                        best_rule = new_rule
        rules_star = self.rule_filter(rules_star, table, weight_id)
    return best_rule

Modifiable components are shown in bold. These are:

initializer

An instance of BeamInitializer that is used to construct the initial list of rules. The default, BeamInitializer_Default, returns base_rules, or a rule with no conditions if base_rules is not set.

candidate_selector

An instance of BeamCandidateSelector used to separate a subset of rules from the current rules_star that will be further specialized. The default component, an instance of BeamCandidateSelector_TakeAll, selects all rules.

refiner

An instance of BeamRefiner that is used for refining the rules. Refined rule should cover a strict subset of instances covered by the given rule. Default component (BeamRefiner_Selector) adds a conjunctive selector to selectors present in the rule.

rule_filter

An instance of BeamFilter that is used for filtering rules to trim the search beam. The default component, BeamFilter_Width(m=5), keeps the five best rules.

__call__(data, weight_id, target_class, base_rules)

Determines the optimal rule to cover the given data instances.

Parameters:
  • data (Orange.data.Table) – data instances.
  • weight_id (int) – index of the weight meta-attribute.
  • target_class (int) – index of the target class.
  • base_rules (RuleList) – existing rules

Rule evaluators

class Orange.classification.rules.Evaluator

Base class for rule evaluators that evaluate the quality of the rule based on the data instances they cover.

__call__(rule, instances, weight_id, target_class, prior)

Calculate a (non-negative) rule quality.

Parameters:
class Orange.classification.rules.LaplaceEvaluator

Bases: Orange.classification.rules.Evaluator

Laplace’s rule of succession.

class Orange.classification.rules.WRACCEvaluator

Bases: Orange.classification.rules.Evaluator

Weighted relative accuracy.

class Orange.classification.rules.Evaluator_Entropy

Bases: Evaluator

class Orange.classification.rules.Evaluator_LRS

Bases: Evaluator

class Orange.classification.rules.Evaluator_Laplace

Bases: Evaluator

class Orange.classification.rules.Evaluator_mEVC

Bases: Evaluator

Instance covering and removal

class Orange.classification.rules.RuleCovererAndRemover

Base class for rule coverers and removers that, when invoked, remove instances covered by the rule and return remaining instances.

class Orange.classification.rules.CovererAndRemover_MultWeights(mult=0.7)

Covering and removing of instances using weight multiplication.

Parameters:mult (float) – weighting multiplication factor
class Orange.classification.rules.CovererAndRemover_AddWeights

Covering and removing of instances using weight addition.

Miscellaneous functions

static rules.rule_to_string(rule, show_distribution=True)

Write a string presentation of rule in human readable format.

Parameters:
  • rule (Rule) – rule to pretty-print.
  • show_distribution (bool) – determines whether presentation should also contain the distribution of covered instances

References