This is documentation for Orange 2.7. For the latest documentation, see Orange 3.
Rule induction (rules)¶
Module rules implements supervised rule induction algorithms and rule-based classification methods. Rule induction is based on a comprehensive framework of components that can be modified or replaced. For ease of use, the module already provides multiple variations of CN2 induction algorithm.
CN2 algorithm¶
The use of rule learning algorithms is consistent with a typical learner usage in Orange:
import Orange
# Read some data
titanic = Orange.data.Table("titanic")
# construct the learning algorithm and use it to induce a classifier
cn2_learner = Orange.classification.rules.CN2Learner()
cn2_clasifier = cn2_learner(titanic)
# ... or, in a single step.
cn2_classifier = Orange.classification.rules.CN2Learner(titanic)
# All rule-base classifiers can have their rules printed out like this:
for r in cn2_classifier.rules:
print Orange.classification.rules.rule_to_string(r)
IF sex=['female'] AND status=['first'] AND age=['child'] THEN survived=yes<0.000, 1.000>
IF sex=['female'] AND status=['second'] AND age=['child'] THEN survived=yes<0.000, 13.000>
IF sex=['male'] AND status=['second'] AND age=['child'] THEN survived=yes<0.000, 11.000>
IF sex=['female'] AND status=['first'] THEN survived=yes<4.000, 140.000>
IF status=['first'] AND age=['child'] THEN survived=yes<0.000, 5.000>
IF sex=['male'] AND status=['second'] THEN survived=no<154.000, 14.000>
IF status=['crew'] AND sex=['female'] THEN survived=yes<3.000, 20.000>
IF status=['second'] THEN survived=yes<13.000, 80.000>
IF status=['third'] AND sex=['male'] AND age=['adult'] THEN survived=no<387.000, 75.000>
IF status=['crew'] THEN survived=no<670.000, 192.000>
IF age=['child'] AND sex=['male'] THEN survived=no<35.000, 13.000>
IF sex=['male'] THEN survived=no<118.000, 57.000>
IF age=['child'] THEN survived=no<17.000, 14.000>
IF TRUE THEN survived=no<89.000, 76.000>
- class Orange.classification.rules.CN2Learner(evaluator=Evaluator_Entropy, beam_width=5, alpha=1)¶
Bases: Orange.classification.rules.RuleLearner
Classical CN2 inducer (Clark and Niblett; 1988) that constructs a set of ordered rules. Constructor returns either an instance of CN2Learner or, if training data is provided, a CN2Classifier.
Parameters:
- class Orange.classification.rules.CN2Classifier(rules=None, instances=None, weight_id=0, **argkw)¶
Bases: Orange.classification.rules.RuleClassifier
Classical CN2 classifier (Clark and Niblett; 1988) that predicts a class from an ordered list of rules. The classifier is usually constructed by CN2Learner.
Parameters: - rules (List) – induced rules
- instances (Orange.data.Table) – stored training data instances
- weight_id (int) – ID of the weight meta-attribute.
- __call__(instance, result_type=0)¶
Parameters: - instance (Orange.data.Instance) – instance to be classified.
- result_type – Orange.classification.Classifier.GetValue or Orange.classification.Classifier.GetProbabilities or Orange.classification.Classifier.GetBoth
Return type: Orange.data.Value, Orange.statistics.distribution.Distribution or a tuple with both
- class Orange.classification.rules.CN2UnorderedLearner(evaluator=Evaluator_Laplace(), beam_width=5, alpha=1.0)¶
Bases: Orange.classification.rules.RuleLearner
Unordered CN2 (Clark and Boswell; 1991) induces a set of unordered rules. Learning rules is quite similar to learning in classical CN2, where the process of learning of rules is separated to learning rules for each class.
Constructor returns either an instance of CN2UnorderedLearner or, if training data is provided, a CN2UnorderedClassifier.
Parameters: - evaluator (Evaluator) – an object that evaluates a rule from covered instances. By default, Laplace’s rule of succession is used as a measure.
- beam_width (int) – width of the search beam.
- alpha (float) – significance level of the likelihood ratio statistics to determine whether rule is better than the default rule.
- class Orange.classification.rules.CN2UnorderedClassifier(rules=None, instances=None, weight_id=0, **argkw)¶
Bases: Orange.classification.rules.RuleClassifier
Unordered CN2 classifier (Clark and Boswell; 1991) classifies an instance using a set of unordered rules. The classifier is typically constructed with CN2UnorderedLearner.
Parameters: - rules (RuleList) – induced rules
- instances (Orange.data.Table) – stored training data instances
- weight_id (int) – ID of the weight meta-attribute.
- __call__(instance, result_type=0, ret_rules=False)¶
The call has another optional argument that is used to tell the classifier to also return the rules that cover the given data instance.
Parameters: - instance (Orange.data.Instance) – instance to be classified.
- result_type – Orange.classification.Classifier.GetValue or Orange.classification.Classifier.GetProbabilities or Orange.classification.Classifier.GetBoth
Return type: Orange.data.Value, Orange.statistics.distribution.Distribution or a tuple with both, and a list of rules if ret_rules is True
- class Orange.classification.rules.CN2SDUnorderedLearner(evaluator=WRACCEvaluator(), beam_width=5, alpha=0.05, mult=0.7)¶
Bases: Orange.classification.rules.CN2UnorderedLearner
CN2-SD (Lavrac et al.; 2004) induces a set of unordered rules used by CN2UnorderedClassifier. CN2-SD differs from unordered CN2 by the default function and covering function: WRACCEvaluator computes weighted relative accuracy and CovererAndRemover_MultWeights decreases the weight of covered data instances instead of removing them.
Constructor returns either an instance of CN2SDUnorderedLearner or, if training data is provided, a CN2UnorderedClassifier.
Parameters: - evaluator (Evaluator) – an object that evaluates a rule from covered instances. By default, weighted relative accuracy is used.
- beam_width (int) – width of the search beam.
- alpha (float) – significance level of the likelihood ratio statistics to determine whether rule is better than the default rule.
- mult (float) – multiplicator for weights of covered instances.
- class Orange.classification.rules.CN2EVCUnorderedLearner(width=5, nsampling=100, rule_sig=1, att_sig=1, min_coverage=1, max_rule_complexity=5)¶
Bases: Orange.classification.rules.ABCN2
A learner similar to CN2-SD (CN2SDUnorderedLearner) except that it uses EVC for rule evaluation.
Parameters: - evaluator (Evaluator) – an object that evaluates a rule from covered instances. By default, weighted relative accuracy is used.
- beam_width (int) – width of the search beam.
- alpha (float) – significance level of the likelihood ratio statistics to determine whether rule is better than the default rule.
- mult (float) – multiplicator for weights of covered instances.
Rule induction framework¶
The classes described above are based on a more general framework that can be fine-tuned to specific needs by replacing individual components. Here is an example:
part of rules-customized.py
import Orange
learner = Orange.classification.rules.RuleLearner()
learner.rule_finder = Orange.classification.rules.BeamFinder()
learner.rule_finder.evaluator = Orange.classification.rules.MEstimateEvaluator(m=50)
titanic = Orange.data.Table("titanic")
classifier = learner(titanic)
for r in classifier.rules:
print Orange.classification.rules.rule_to_string(r)
IF sex=['male'] AND status=['second'] AND age=['adult'] THEN survived=no<154.000, 14.000>
IF sex=['male'] AND status=['third'] AND age=['adult'] THEN survived=no<387.000, 75.000>
IF sex=['female'] AND status=['first'] THEN survived=yes<4.000, 141.000>
IF status=['crew'] AND sex=['male'] THEN survived=no<670.000, 192.000>
IF status=['second'] THEN survived=yes<13.000, 104.000>
IF status=['third'] AND sex=['male'] THEN survived=no<35.000, 13.000>
IF status=['first'] AND age=['adult'] THEN survived=no<118.000, 57.000>
IF status=['crew'] THEN survived=yes<3.000, 20.000>
IF sex=['female'] THEN survived=no<106.000, 90.000>
IF TRUE THEN survived=yes<0.000, 5.000>
In the example, we wanted to use a rule evaluator based on the m-estimate and set m to 50. The evaluator is a subcomponent of the rule_finder component. Thus, to be able to set the evaluator, we first set the rule_finder component, then we added the desired subcomponent and set its options. All other components, which are left unspecified, are provided by the learner at the training time and removed afterwards.
Continuing with the example, assume that we wish to set a different validation function and a different beam width.
part of rules-customized.py
learner.rule_finder.rule_stopping_validator = \
Orange.classification.rules.Validator_LRS(alpha=0.01,
min_coverage=10, max_rule_complexity = 2)
learner.rule_finder.rule_filter = \
Orange.classification.rules.BeamFilter_Width(width = 50)
- class Orange.classification.rules.Rule(filter, classifier, lr, dist, ce, w = 0, qu = -1)¶
Represents a single rule. Constructor arguments correspond to the first seven of the attributes (from filter to quality) below.
- filter¶
Rule condition; an instance of Orange.data.filter.Filter, typically an instance of a class derived from Orange.data.filter.Values
- classifier¶
A rule predicts the class by calling an embedded classifier that must be an instance of Classifier, typically ConstantClassifier. This classifier is called by the rule classifier, such as RuleClassifier.
- learner¶
Learner that is used for constructing a classifier. It must be an instance of Learner, typically MajorityLearner.
- class_distribution¶
Distribution of class in data instances covered by this rule (Distribution).
- weight_id¶
ID of the weight meta-attribute for the stored data instances (int).
- quality¶
Quality of the rule. Rules with higher quality are better (float).
- complexity¶
Complexity of the rule (float), typically the number of selectors (conditions) in the rule. Complexity is used for choosing between rules with the same quality; rules with lower complexity are preferred.
- filter_and_store(instances, weight_id=0, target_class=-1)¶
Filter passed data instances and store them in instances. Also, compute class_distribution, set the weight of stored examples and create a new classifier using learner.
Parameters:
- __call__(instance)¶
Return True if the given instance matches the rule condition.
Parameters: instance (Orange.data.Instance) – data instance
- __call__(instances, ref=True, negate=False)
Return a table of instances that match the rule condition.
Parameters: - instances (Orange.data.Table) – a table of data instances
- ref (bool) – if True (default), the constructed table contains references to the original data instances; if False, the data is copied
- negate (bool) – inverts the selection
- class Orange.classification.rules.RuleLearner(store_instances=True, target_class=-1, base_rules=Orange.classification.rules.RuleList())¶
Bases: Orange.classification.Learner
A base rule induction learner. The algorithm follows separate-and-conquer strategy, which has its origins in the AQ family of algorithms (Fuernkranz J.; Separate-and-Conquer Rule Learning, Artificial Intelligence Review 13, 3-54, 1999). Such algorithms search for the optimal rule for the current training set, remove the covered training instances (separate) and repeat the process (conquer) on the remaining data.
Parameters: - store_instances (bool) – if True (default), the induced rules contain a table with references to the stored data instances.
- target_class (int) – index of a specific class to learn; if -1 there is no target class
- base_rules (RuleList) – An optional list of initial rules for constraining the rule_finder.
The class’ functionality is best explained by its __call__ function.
def __call__(self, instances, weight_id=0): rule_list = Orange.classification.rules.RuleList() all_instances = Orange.data.Table(instances) while not self.data_stopping(instances, weight_id, self.target_class): new_rule = self.rule_finder(instances, weight_id, self.target_class, self.base_rules) if self.rule_stopping(rule_list, new_rule, instances, weight_id): break instances, weight_id = self.cover_and_remove(new_rule, instances, weight_id, self.target_class) rule_list.append(new_rule) return Orange.classification.rules.RuleClassifier_FirstRule( rules=rule_list, instances=all_instances)
The customizable components are data_stopping, rule_finder, cover_and_remove and rule_stopping objects.
- data_stopping¶
An instance of DataStoppingCriteria that determines whether to continue the induction. The default component, DataStoppingCriteria_NoPositives returns True if there are no more instances of the target class.
- rule_finder¶
An instance of Finder that learns a single rule. Default is BeamFinder.
- rule_stopping¶
An instance of StoppingCriteria that decides whether to use the induced rule or to discard it and stop the induction. If None (default) all rules are accepted.
- cover_and_remove¶
An instance of RuleCovererAndRemover that removes instances covered by the rule and returns remaining instances. The default implementation (RuleCovererAndRemover_Default) removes the instances that belong to given target class; if the target is not specified (target_class == -1), it removes all covered instances.
Rule finders¶
- class Orange.classification.rules.Finder¶
Base class for rule finders, which learn a single rule from instances.
- __call__(table, weight_id, target_class, base_rules)¶
Induce a new rule from the given data.
Parameters: - table (Orange.data.Table) – training data instances
- weight_id (int) – ID of the weight meta-attribute
- target_class (int) – index of a specific class being learned; -1 for all.
- base_rules (RuleList) – A list of initial rules for constraining the search space
- class Orange.classification.rules.BeamFinder¶
Bases: Finder
Beam search for the best rule. This is the default finder for RuleLearner. Pseudo code of the algorithm is as follows.
def __call__(self, table, weight_id, target_class, base_rules): prior = Orange.statistics.distribution.Distribution(table.domain.class_var, table, weight_id) rules_star, best_rule = self.initializer(table, weight_id, target_class, base_rules, self.evaluator, prior) # compute quality of rules in rules_star and best_rule ... while len(rules_star) > 0: candidates, rules_star = self.candidate_selector(rules_star, table, weight_id) for cand in candidates: new_rules = self.refiner(cand, table, weight_id, target_class) for new_rule in new_rules: if self.rule_stopping_validator(new_rule, table, weight_id, target_class, cand.class_distribution): new_rule.quality = self.evaluator(new_rule, table, weight_id, target_class, prior) rules_star.append(new_rule) if self.validator(new_rule, table, weight_id, target_class, prior) and new_rule.quality > best_rule.quality: best_rule = new_rule rules_star = self.rule_filter(rules_star, table, weight_id) return best_rule
Modifiable components are shown in bold. These are:
- initializer¶
An instance of BeamInitializer that is used to construct the initial list of rules. The default, BeamInitializer_Default, returns base_rules, or a rule with no conditions if base_rules is not set.
- candidate_selector¶
An instance of BeamCandidateSelector used to separate a subset of rules from the current rules_star that will be further specialized. The default component, an instance of BeamCandidateSelector_TakeAll, selects all rules.
- refiner¶
An instance of BeamRefiner that is used for refining the rules. Refined rule should cover a strict subset of instances covered by the given rule. Default component (BeamRefiner_Selector) adds a conjunctive selector to selectors present in the rule.
- rule_filter¶
An instance of BeamFilter that is used for filtering rules to trim the search beam. The default component, BeamFilter_Width(m=5), keeps the five best rules.
- __call__(data, weight_id, target_class, base_rules)¶
Determines the optimal rule to cover the given data instances.
Parameters: - data (Orange.data.Table) – data instances.
- weight_id (int) – index of the weight meta-attribute.
- target_class (int) – index of the target class.
- base_rules (RuleList) – existing rules
Rule evaluators¶
- class Orange.classification.rules.Evaluator¶
Base class for rule evaluators that evaluate the quality of the rule based on the data instances they cover.
- __call__(rule, instances, weight_id, target_class, prior)¶
Calculate a (non-negative) rule quality.
Parameters: - rule (Rule) – rule to evaluate
- instances (Orange.data.Table) – data instances covered by the rule
- weight_id (int) – index of the weight meta-attribute
- target_class (int) – index of target class of this rule
- prior (Orange.statistics.distribution.Distribution) – prior class distribution
- class Orange.classification.rules.LaplaceEvaluator¶
Bases: Orange.classification.rules.Evaluator
Laplace’s rule of succession.
- class Orange.classification.rules.WRACCEvaluator¶
Bases: Orange.classification.rules.Evaluator
Weighted relative accuracy.
Instance covering and removal¶
- class Orange.classification.rules.RuleCovererAndRemover¶
Base class for rule coverers and removers that, when invoked, remove instances covered by the rule and return remaining instances.
- class Orange.classification.rules.CovererAndRemover_MultWeights(mult=0.7)¶
Covering and removing of instances using weight multiplication.
Parameters: mult (float) – weighting multiplication factor
- class Orange.classification.rules.CovererAndRemover_AddWeights¶
Covering and removing of instances using weight addition.
Miscellaneous functions¶
References¶
- Clark, Niblett. The CN2 Induction Algorithm. Machine Learning 3(4):261–284, 1989.
- Clark, Boswell. Rule Induction with CN2: Some Recent Improvements. In Machine Learning - EWSL-91. Proceedings of the European Working Session on Learning, pp 151–163, Porto, Portugal, March 1991.
- Lavrac, Kavsek, Flach, Todorovski: Subgroup Discovery with CN2-SD. Journal of Machine Learning Research 5: 153-188, 2004.