Linear regression (`linear`)¶

Linear regression is a statistical regression method that predicts the value of a continuous response (class) variable from the values of several predictors. The model assumes that the response variable is a linear combination of the predictors, so the task of linear regression is to fit the unknown coefficients.
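To make the fitting task concrete, the following is a minimal, dependency-free sketch of ordinary least squares solved through the normal equations. It is only an illustration of the idea and not Orange's implementation; the helper names solve and fit_ols and the toy data are assumptions introduced here.

```python
def solve(a, b):
    """Solve the square system a * x = b by Gauss-Jordan elimination."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs stay untouched.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: swap in the row with the largest pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(n):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [x - factor * y for x, y in zip(m[row], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Return [beta0, beta1, ...] minimizing the squared prediction error."""
    # Prepend a constant 1 to every row so that beta0 acts as the intercept.
    x = [[1.0] + list(row) for row in rows]
    k = len(x[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


# Toy data generated exactly from y = 1 + 2*x1 + 3*x2 (illustrative only).
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1 + 2 * a + 3 * b for a, b in rows]
beta = fit_ols(rows, y)  # intercept first, then one coefficient per predictor
```

Because the toy response is an exact linear combination of the predictors, the fitted coefficients recover the generating values up to floating-point error.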

To fit the regression parameters on the housing data set, use the following code:

import Orange
housing = Orange.data.Table("housing")
learner = Orange.regression.linear.LinearRegressionLearner()
classifier = learner(housing)

class Orange.regression.linear.LinearRegressionLearner(name='linear regression', intercept=True, compute_stats=True, ridge_lambda=None, imputer=None, continuizer=None, use_vars=None, stepwise=False, add_sig=0.05, remove_sig=0.2, **kwds)¶

Fits the linear regression model, i.e. learns the regression parameters. The class is derived from Orange.regression.base.BaseRegressionLearner, which is used for preprocessing the data (continuization and imputation) before fitting the regression parameters.

__call__(table, weight=None, verbose=0)¶

Parameters:

table (`Orange.data.Table`) – data instances
weight (None or list of Orange.feature.Continuous which stores weights for instances) – the weights for instances. Default: None, i.e. all data instances are equally important in fitting the regression parameters
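The effect of instance weights can be illustrated with textbook weighted least squares for a single predictor. This is a hedged sketch only: the function fit_weighted_line and the toy data are assumptions made for illustration, and the way Orange applies weights internally may differ.

```python
def fit_weighted_line(x, y, weights=None):
    """Fit y = b0 + b1*x by weighted least squares; returns (b0, b1)."""
    if weights is None:
        # No weights given: every instance counts equally.
        weights = [1.0] * len(x)
    total = float(sum(weights))
    # Weighted means of the predictor and the response.
    x_mean = sum(w * xi for w, xi in zip(weights, x)) / total
    y_mean = sum(w * yi for w, yi in zip(weights, y)) / total
    # Weighted covariance and variance (up to a common factor).
    sxy = sum(w * (xi - x_mean) * (yi - y_mean)
              for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - x_mean) ** 2 for w, xi in zip(weights, x))
    b1 = sxy / sxx
    return y_mean - b1 * x_mean, b1


x = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 1.0, 2.0, 9.0]  # the last point deviates from the line y = x
equal = fit_weighted_line(x, y)                              # all weights equal
down = fit_weighted_line(x, y, weights=[1.0, 1.0, 1.0, 0.0]) # ignore last point
```

With equal weights the deviating point pulls the slope upwards; giving it zero weight recovers the line through the remaining three points.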

__init__(name='linear regression', intercept=True, compute_stats=True, ridge_lambda=None, imputer=None, continuizer=None, use_vars=None, stepwise=False, add_sig=0.05, remove_sig=0.2, **kwds)¶

Parameters:

name (string) – name of the linear model, default ‘linear regression’
intercept (bool) – if True, the intercept beta0 is included in the model
compute_stats (bool) – if True, statistical properties of the estimators (standard error, t-scores, significances) and statistical properties of the model (sum of squares, R2, adjusted R2) are computed
ridge_lambda (int or None) – if not None, ridge regression is performed with the given lambda parameter controlling the regularization
use_vars (list of Orange.feature.Descriptor or None) – the list of independent variables included in the regression model. If None (default), all variables are used
stepwise (bool) – if True, stepwise regression based on F-test is performed. The significance parameters are add_sig and remove_sig
add_sig (float) – lower bound of significance for which the variable is included in the regression model. Default: 0.05
remove_sig (float) – upper bound of significance for which the variable is excluded from the regression model. Default: 0.2
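As an illustration of what the ridge_lambda parameter controls, the sketch below fits a single slope without an intercept under an L2 penalty. It shows the general ridge idea only; the function fit_ridge_slope is hypothetical, and whether Orange scales the predictors or penalizes the intercept is not stated in this documentation.

```python
def fit_ridge_slope(x, y, ridge_lambda=0.0):
    """Slope b minimizing sum((y - b*x)**2) + ridge_lambda * b**2."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    # Setting the derivative to zero gives b = sxy / (sxx + lambda).
    return sxy / (sxx + ridge_lambda)


x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]  # exactly y = 2*x (illustrative data)
plain = fit_ridge_slope(x, y)          # lambda = 0 is ordinary least squares
shrunk = fit_ridge_slope(x, y, 14.0)   # a positive lambda shrinks the slope
```

A lambda of zero reproduces the unpenalized slope, while larger values pull the coefficient towards zero.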

class Orange.regression.linear.LinearRegression(class_var=None, domain=None, coefficients=None, F=None, std_error=None, t_scores=None, p_vals=None, dict_model=None, fitted=None, residuals=None, m=None, n=None, mu_y=None, r2=None, r2adj=None, sst=None, sse=None, ssr=None, std_coefficients=None, intercept=None)¶

Linear regression predicts the value of the response variable based on the values of the independent variables.

F¶: F-statistics of the model.

coefficients¶: Regression coefficients stored in list. If the intercept is included the first item corresponds to the estimated intercept.

std_error¶: Standard errors of the coefficient estimator, stored in list.

t_scores¶: List of t-scores for the estimated regression coefficients.

p_vals¶: List of p-values for the null hypothesis that the regression coefficients equal 0 based on t-scores and two sided alternative hypothesis.

dict_model¶: Statistical properties of the model in a dictionary. Keys are the names of the independent variables (or “Intercept”); values are tuples (coefficient, standard error, t-value, p-value).

fitted¶: Estimated values of the dependent variable for all instances from the training table.

residuals¶: Differences between estimated and actual values of the dependent variable for all instances from the training table.

m¶: Number of independent (predictor) variables.

n¶: Number of instances.

mu_y¶: Sample mean of the dependent variable.

r2¶: Coefficient of determination.

r2adj¶: Adjusted coefficient of determination.

sst, sse, ssr: Total sum of squares, explained sum of squares and residual sum of squares, respectively.

std_coefficients¶: Standardized regression coefficients.
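The model-level statistics listed above follow from the actual and fitted values. The sketch below uses the textbook definitions for a model with an intercept and follows the naming given here (sse explained, ssr residual); the function model_summary and its inputs are assumptions for illustration and are not taken from Orange's source.

```python
def model_summary(actual, fitted, m):
    """Sums of squares, R2 and adjusted R2 for m predictors plus intercept."""
    n = len(actual)
    mu_y = sum(actual) / float(n)
    sst = sum((a - mu_y) ** 2 for a in actual)              # total
    sse = sum((f - mu_y) ** 2 for f in fitted)              # explained
    ssr = sum((a - f) ** 2 for a, f in zip(actual, fitted)) # residual
    r2 = 1.0 - ssr / sst
    # The adjustment penalizes R2 for the number of predictors.
    r2adj = 1.0 - (1.0 - r2) * (n - 1) / float(n - m - 1)
    return {"n": n, "mu_y": mu_y, "sst": sst, "sse": sse, "ssr": ssr,
            "r2": r2, "r2adj": r2adj}


# Illustrative values: fitted comes from the least-squares line
# y = -1.2 + 2.8*x evaluated at x = 0, 1, 2, 3.
actual = [0.0, 1.0, 2.0, 9.0]
fitted = [-1.2, 1.6, 4.4, 7.2]
stats = model_summary(actual, fitted, m=1)
```

For a least-squares fit with an intercept, the total sum of squares equals the explained plus the residual sum of squares, which the example data satisfy.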

__call__(instance, result_type=0)¶

Parameters:	instance (`Instance`) – data instance for which the value of the response variable will be predicted
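Conceptually, a prediction is the intercept plus the dot product of the remaining coefficients with the instance's predictor values. The sketch below assumes the coefficient layout described for coefficients (intercept first when it is included); the function predict is hypothetical and skips the preprocessing (continuization and imputation) that the learner applies.

```python
def predict(coefficients, values, intercept=True):
    """Linear prediction from a coefficient list and predictor values."""
    if intercept:
        # The first coefficient is the intercept, the rest pair with values.
        return coefficients[0] + sum(
            c * v for c, v in zip(coefficients[1:], values))
    return sum(c * v for c, v in zip(coefficients, values))


with_intercept = predict([1.0, 2.0, 3.0], [4.0, 5.0])               # 1 + 8 + 15
without_intercept = predict([2.0, 3.0], [4.0, 5.0], intercept=False) # 8 + 15
```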

__init__(class_var=None, domain=None, coefficients=None, F=None, std_error=None, t_scores=None, p_vals=None, dict_model=None, fitted=None, residuals=None, m=None, n=None, mu_y=None, r2=None, r2adj=None, sst=None, sse=None, ssr=None, std_coefficients=None, intercept=None)¶

Parameters:	model (`LinearRegressionLearner`) – fitted linear regression model

to_string()¶: Pretty-prints linear regression model, i.e. estimated regression coefficients with standard errors, t-scores and significances.

Utility functions¶

Orange.regression.linear.stepwise(table, weight, add_sig=0.05, remove_sig=0.2)¶

Performs stepwise linear regression on table and returns the list of remaining independent variables which fit a significant linear regression model.

Parameters:

table (Orange.data.Table) – data instances.
weight (None or list of Orange.feature.Continuous which stores the weights) – the weights for instances. Default: None, i.e. all data instances are equally important in fitting the regression parameters
add_sig (float) – lower bound of significance for which the variable is included in the regression model. Default: 0.05
remove_sig (float) – upper bound of significance for which the variable is excluded from the regression model. Default: 0.2

Examples¶

Prediction¶

Predict the values of the first five data instances:

# prediction for five data instances and comparison to actual values
for ins in housing[:5]:
    print "Actual: %3.2f, predicted: %3.2f " % (ins.get_class(), classifier(ins))

The output of this code is:

Actual: 24.00, predicted: 30.00
Actual: 21.60, predicted: 25.03
Actual: 34.70, predicted: 30.57
Actual: 33.40, predicted: 28.61
Actual: 36.20, predicted: 27.94

Properties of the fitted model¶

Print the regression coefficients with standard errors, t-scores, p-values and significances:

print classifier

The output of this code is:

 Variable  Coeff Est  Std Error    t-value          p      
Intercept     36.459      5.103      7.144      0.000   ***
     CRIM     -0.108      0.033     -3.287      0.001    **
       ZN      0.046      0.014      3.382      0.001   ***
    INDUS      0.021      0.061      0.334      0.738      
     CHAS      2.687      0.862      3.118      0.002    **
      NOX    -17.767      3.820     -4.651      0.000   ***
       RM      3.810      0.418      9.116      0.000   ***
      AGE      0.001      0.013      0.052      0.958      
      DIS     -1.476      0.199     -7.398      0.000   ***
      RAD      0.306      0.066      4.613      0.000   ***
      TAX     -0.012      0.004     -3.280      0.001    **
  PTRATIO     -0.953      0.131     -7.283      0.000   ***
        B      0.009      0.003      3.467      0.001   ***
    LSTAT     -0.525      0.051    -10.347      0.000   ***

Stepwise regression¶

To use stepwise regression, initialize the learner with stepwise=True. The lower and upper bounds for significance are controlled with add_sig and remove_sig, respectively.

learner2 = Orange.regression.linear.LinearRegressionLearner(stepwise=True,
                                                           add_sig=0.05,
                                                           remove_sig=0.2)
classifier = learner2(housing)
print classifier

As you can see from the output, the non-significant coefficients have been removed from the model.