This is documentation for Orange 2.7. For the latest documentation, see Orange 3.

# Lasso regression (lasso)

The lasso (least absolute shrinkage and selection operator) is a regularized version of least squares regression. It minimizes the sum of squared errors while also penalizing the L_1 norm (sum of absolute values) of the coefficients.

Concretely, the function that is minimized in Orange is:

\frac{1}{n}\|Xw - y\|_2^2 + \frac{\lambda}{m} \|w\|_1

where X is an n \times m data matrix, y is the vector of class values, and w is the vector of regression coefficients to be estimated.
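For concreteness, the objective can be evaluated directly with NumPy (an illustrative helper, not part of Orange; the function name is mine):

```python
import numpy as np

def lasso_objective(X, y, w, lam):
    """Value of (1/n)*||Xw - y||_2^2 + (lam/m)*||w||_1."""
    n, m = X.shape
    residual = X.dot(w) - y
    return residual.dot(residual) / n + lam * np.abs(w).sum() / m
```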

class Orange.regression.lasso.LassoRegressionLearner(lasso_lambda=0.1, max_iter=20000, eps=1e-06, n_boot=0, n_perm=0, imputer=None, continuizer=None, name='Lasso')

Fits the lasso regression model using FISTA (Fast Iterative Shrinkage-Thresholding Algorithm).

__call__(data, weight=None)

Parameters:
- data (Orange.data.Table) – Training data.
- weight – Weights for instances. Not implemented yet.
__init__(lasso_lambda=0.1, max_iter=20000, eps=1e-06, n_boot=0, n_perm=0, imputer=None, continuizer=None, name='Lasso')

Parameters:
- lasso_lambda (float) – Regularization parameter.
- max_iter (int) – Maximum number of iterations of the optimization method.
- eps (float) – Stop the optimization when the improvement falls below eps.
- n_boot (int) – Number of bootstrap samples used for non-parametric estimation of standard errors.
- n_perm (int) – Number of permutations used for non-parametric estimation of p-values.
- name (str) – Learner name.
fista(X, y, l, lipschitz, w_init=None)

Fast Iterative Shrinkage-Thresholding Algorithm (FISTA).

get_lipschitz(X)

Return the Lipschitz constant of \nabla f, where f(w) = \frac{1}{2}||Xw-y||^2.
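Orange's own implementation is not reproduced here, but the algorithm can be sketched in a few lines of NumPy. The sketch below minimizes the unscaled objective \frac{1}{2}\|Xw-y\|_2^2 + \lambda\|w\|_1 (the 1/n and 1/m factors in Orange's objective only rescale \lambda), using the soft-thresholding proximal step and the same Lipschitz constant described above; function names are mine:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1: shrink each entry toward 0 by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fista_sketch(X, y, lam, max_iter=500, eps=1e-8):
    """Minimize (1/2)*||Xw - y||^2 + lam*||w||_1 with FISTA (illustrative)."""
    n, m = X.shape
    # Lipschitz constant of grad f, f(w) = (1/2)||Xw - y||^2:
    # the largest eigenvalue of X^T X.
    L = np.linalg.eigvalsh(X.T.dot(X)).max()
    w = np.zeros(m)
    z = w.copy()  # extrapolated point
    t = 1.0
    for _ in range(max_iter):
        grad = X.T.dot(X.dot(z) - y)
        w_new = soft_threshold(z - grad / L, lam / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = w_new + ((t - 1.0) / t_new) * (w_new - w)
        if np.linalg.norm(w_new - w) < eps:
            return w_new
        w, t = w_new, t_new
    return w
```

For an orthonormal X the minimizer is simply the soft-thresholded least-squares solution, which makes a convenient sanity check.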

class Orange.regression.lasso.LassoRegression(domain=None, class_var=None, coef0=None, coefficients=None, std_errors=None, p_vals=None, model=None, mu_x=None)

Lasso regression predicts the value of the response variable based on the values of independent variables.

coef0

Intercept (sample mean of the response variable).

coefficients

Regression coefficients.

std_errors

Standard errors of coefficient estimates for a fixed regularization parameter. The standard errors are estimated using the bootstrapping method.

p_vals

List of p-values for the null hypotheses that the regression coefficients equal 0 based on a non-parametric permutation test.

model

Dictionary with the statistical properties of the model. Keys are names of the independent variables; values are tuples (coefficient, standard error, p-value).

mu_x

Sample mean of independent variables.

__call__(instance, result_type=0)

Parameters:
- instance (Orange.data.Instance) – Data instance for which the value of the response variable will be predicted.
to_string(skip_zero=True)

Pretty-prints the lasso regression model, i.e. the estimated regression coefficients with their standard errors and significance levels. Standard errors are obtained by bootstrapping and significance levels by a permutation test.

Parameters: skip_zero (bool) – If True, variables with estimated coefficient equal to 0 are omitted.

## Utility functions

Orange.regression.lasso.get_bootstrap_sample(data)

Generate a bootstrap sample of a given data set.

Parameters: data (Orange.data.Table) – the original data sample
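Orange's version operates on an Orange.data.Table; as an illustration of the underlying idea, a bootstrap sample of an n-instance data set is just n row indices drawn with replacement (the helper name is mine):

```python
import numpy as np

def bootstrap_indices(n, rng=None):
    """Indices of a bootstrap sample: n draws with replacement from 0..n-1."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.integers(0, n, size=n)
```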
Orange.regression.lasso.permute_responses(data)

Permute the values of the class (response) variable. This makes the response independent of the independent variables while keeping its marginal distribution unchanged.

Parameters: data (Orange.data.Table) – Original data.
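The same idea in plain NumPy terms (a sketch, not the Orange implementation, which returns a new Orange.data.Table): shuffle the response vector and leave the feature matrix untouched.

```python
import numpy as np

def permute_responses_sketch(X, y, rng=None):
    """Return (X, shuffled copy of y): features unchanged, responses permuted."""
    rng = np.random.default_rng() if rng is None else rng
    return X, rng.permutation(y)
```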

### Examples

To fit the regression parameters on the housing data set, use the following code:

housing = Orange.data.Table("housing")
learner = Orange.regression.lasso.LassoRegressionLearner(
    lasso_lambda=1, n_boot=100, n_perm=100)
classifier = learner(housing)


To predict values of the response for the first five instances:

for ins in housing[:5]:
    print "Actual: %3.2f, predicted: %3.2f" % (
        ins.get_class(), classifier(ins))

Output:

Actual: 24.00, predicted: 30.45
Actual: 21.60, predicted: 25.60
Actual: 34.70, predicted: 31.48
Actual: 33.40, predicted: 30.18
Actual: 36.20, predicted: 29.59

To see the fitted regression coefficients, print the model:

print classifier


Output:

  Variable  Coeff Est  Std Error          p
 Intercept     22.533
      CRIM     -0.023      0.024      0.050     .
      CHAS      1.970      1.331      0.040     *
       NOX     -4.226      2.944      0.010     *
        RM      4.270      0.934      0.000   ***
       DIS     -0.373      0.170      0.010     *
   PTRATIO     -0.798      0.117      0.000   ***
         B      0.007      0.003      0.020     *
     LSTAT     -0.519      0.102      0.000   ***
Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1 empty 1

Note that the estimated regression coefficients of five variables equal 0, so those variables are omitted from the printout: ZN, INDUS, AGE, RAD, TAX.