This is documentation for Orange 2.7. For the latest documentation, see Orange 3.
Lasso regression (lasso)¶
The lasso (least absolute shrinkage and selection operator) is a regularized version of least squares regression. It minimizes the sum of squared errors while also penalizing the L_1 norm (sum of absolute values) of the coefficients.
Concretely, the function that is minimized in Orange is:
\frac{1}{n}\|Xw - y\|_2^2 + \frac{\lambda}{m} \|w\|_1
where X is an n \times m data matrix, y the vector of response (class) values, and w the vector of regression coefficients to be estimated.
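As a concrete reading of the formula, the minimized quantity can be evaluated in plain Python. This is an illustrative sketch, not part of the Orange API; the function name lasso_objective is ours:

```python
def lasso_objective(X, y, w, lam):
    """Evaluate (1/n) * ||Xw - y||_2^2 + (lam/m) * ||w||_1 (illustrative sketch)."""
    n, m = len(X), len(X[0])
    # Squared-error term, averaged over the n instances
    sq_err = sum((sum(X[i][j] * w[j] for j in range(m)) - y[i]) ** 2
                 for i in range(n)) / n
    # L1 penalty on the coefficients, averaged over the m variables
    l1 = lam * sum(abs(wj) for wj in w) / m
    return sq_err + l1

X = [[1.0, 2.0], [3.0, 4.0]]
y = [1.0, 2.0]
print(lasso_objective(X, y, [0.0, 0.0], 0.1))  # with w = 0 only the error term remains: 2.5
```

Increasing lasso_lambda raises the weight of the L_1 term, which pushes more coefficients to exactly 0.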
- class Orange.regression.lasso.LassoRegressionLearner(lasso_lambda=0.1, max_iter=20000, eps=1e-06, n_boot=0, n_perm=0, imputer=None, continuizer=None, name='Lasso')¶
Bases: Orange.regression.base.BaseRegressionLearner
Fits the lasso regression model using FISTA (Fast Iterative Shrinkage-Thresholding Algorithm).
- __call__(data, weight=None)¶
Parameters: - data (Orange.data.Table) – Training data.
- weight – Weights for instances. Not implemented yet.
- __init__(lasso_lambda=0.1, max_iter=20000, eps=1e-06, n_boot=0, n_perm=0, imputer=None, continuizer=None, name='Lasso')¶
Parameters: - lasso_lambda (float) – Regularization parameter.
- max_iter (int) – Maximum number of iterations for the optimization method.
- eps (float) – Stop the optimization when the improvement between consecutive iterations falls below eps.
- n_boot (int) – Number of bootstrap samples used for non-parametric estimation of standard errors.
- n_perm (int) – Number of permutations used for non-parametric estimation of p-values.
- name (str) – Learner name.
- fista(X, y, l, lipschitz, w_init=None)¶
Fast Iterative Shrinkage-Thresholding Algorithm (FISTA).
- get_lipschitz(X)¶
Return the Lipschitz constant of \nabla f, where f(w) = \frac{1}{2}\|Xw - y\|_2^2.
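As background for fista and get_lipschitz, here is a minimal, self-contained sketch of FISTA in plain Python. It is not Orange's implementation: it minimizes the unscaled objective \frac{1}{2}\|Xw - y\|_2^2 + \lambda\|w\|_1 (the 1/n and 1/m factors in Orange's objective only rescale \lambda), and it bounds the Lipschitz constant by the trace of X^T X rather than its largest eigenvalue. All names are illustrative:

```python
def soft_threshold(v, t):
    # Proximal operator of the L1 norm: shrink v toward zero by t
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def fista_sketch(X, y, lam, n_iter=500):
    """Minimal FISTA for min_w 0.5 * ||Xw - y||^2 + lam * ||w||_1 (illustrative)."""
    n, m = len(X), len(X[0])
    # Lipschitz constant of the gradient is the largest eigenvalue of X^T X;
    # the trace (sum of squared entries) is a cheap upper bound, enough here
    L = sum(X[i][j] ** 2 for i in range(n) for j in range(m))
    w = [0.0] * m   # current estimate
    z = list(w)     # extrapolated (momentum) point
    t = 1.0         # momentum parameter
    for _ in range(n_iter):
        # Gradient of the smooth part at z: X^T (Xz - y)
        resid = [sum(X[i][j] * z[j] for j in range(m)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(m)]
        # Gradient step followed by soft thresholding (the "shrinkage" step)
        w_new = [soft_threshold(z[j] - grad[j] / L, lam / L) for j in range(m)]
        # Nesterov momentum update makes this the *fast* variant of ISTA
        t_new = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0
        z = [w_new[j] + (t - 1.0) / t_new * (w_new[j] - w[j]) for j in range(m)]
        w, t = w_new, t_new
    return w
```

For an orthonormal X the solution is the soft-thresholded least-squares fit, which makes the sparsity of the lasso easy to see: coefficients whose least-squares estimate is smaller than the threshold come out exactly 0.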
- class Orange.regression.lasso.LassoRegression(domain=None, class_var=None, coef0=None, coefficients=None, std_errors=None, p_vals=None, model=None, mu_x=None)¶
Bases: Orange.classification.Classifier
Lasso regression predicts the value of the response variable based on the values of independent variables.
- coef0¶
Intercept (sample mean of the response variable).
- coefficients¶
Regression coefficients.
- std_errors¶
Standard errors of coefficient estimates for a fixed regularization parameter. The standard errors are estimated using the bootstrapping method.
- p_vals¶
List of p-values for the null hypotheses that the regression coefficients equal 0 based on a non-parametric permutation test.
- model¶
Dictionary with the statistical properties of the model. Keys are the names of the independent variables; values are tuples (coefficient, standard error, p-value).
- mu_x¶
Sample mean of independent variables.
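Taken together, coef0 (the response mean), coefficients and mu_x suggest that prediction has the centered linear form coef0 + w \cdot (x - mu_x). The sketch below shows that reading; note this form is an inference from the attributes above, not a formula stated in the documentation:

```python
def predict_centered(instance, coef0, coefficients, mu_x):
    """Centered linear prediction: intercept plus coefficients applied to
    mean-centered inputs (assumed form, inferred from coef0 and mu_x)."""
    return coef0 + sum(w * (x - m)
                       for w, x, m in zip(coefficients, instance, mu_x))

# Hypothetical numbers: intercept 10, two centered predictors
print(predict_centered([2.0, 3.0], 10.0, [1.0, -1.0], [1.0, 1.0]))  # 9.0
```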
- __call__(instance, result_type=0)¶
Parameters: instance (Orange.data.Instance) – Data instance for which the value of the response variable will be predicted.
- to_string(skip_zero=True)¶
Pretty-prints the lasso regression model, i.e. the estimated regression coefficients with their standard errors and significances. Standard errors are obtained by bootstrapping and significances by a permutation test.
Parameters: skip_zero (bool) – If True, variables with estimated coefficient equal to 0 are omitted.
Utility functions¶
- Orange.regression.lasso.get_bootstrap_sample(data)¶
Generate a bootstrap sample of a given data set.
Parameters: data (Orange.data.Table) – the original data sample
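The idea behind bootstrap standard errors (the std_errors attribute above) can be shown standalone. This sketch works on plain Python lists rather than Orange.data.Table, and the function names are illustrative:

```python
import random

def bootstrap_sample(data, rng=random):
    # Draw len(data) instances uniformly with replacement
    return [rng.choice(data) for _ in range(len(data))]

def bootstrap_se(data, statistic, n_boot=200, rng=random):
    """Standard error of `statistic`, estimated as the standard deviation
    of the statistic across n_boot bootstrap samples."""
    stats = [statistic(bootstrap_sample(data, rng)) for _ in range(n_boot)]
    mean = sum(stats) / len(stats)
    return (sum((s - mean) ** 2 for s in stats) / (len(stats) - 1)) ** 0.5
```

In Orange the statistic would be a single lasso coefficient refitted on each bootstrap sample (with the regularization parameter held fixed); here any callable on the data works, e.g. the sample mean.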
- Orange.regression.lasso.permute_responses(data)¶
Permute the values of the class (response) variable. This breaks the dependence between the independent variables and the response while keeping the distribution of the response variable unchanged.
Parameters: data (Orange.data.Table) – Original data.
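A standalone sketch of the same idea, and of how n_perm permutations yield the p-values in p_vals: refit (or recompute a statistic) on permuted responses and count how often its magnitude reaches the observed one. The data here is a plain list of rows with the response in the last column; this layout and all names are assumptions for the example, not Orange's API:

```python
import random

def permute_responses(rows, rng=random):
    """Return a copy of rows with the response (last column) shuffled,
    leaving the independent variables aligned with their original rows."""
    responses = [r[-1] for r in rows]
    rng.shuffle(responses)
    return [r[:-1] + [y] for r, y in zip(rows, responses)]

def permutation_p_value(rows, statistic, n_perm=200, rng=random):
    """Fraction of permuted data sets whose |statistic| reaches the observed one."""
    observed = abs(statistic(rows))
    hits = sum(1 for _ in range(n_perm)
               if abs(statistic(permute_responses(rows, rng))) >= observed)
    return hits / float(n_perm)
```

With a strong predictor-response relationship almost no permutation reproduces the observed statistic, so the estimated p-value is near 0; with no relationship it is spread toward 1.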
Examples¶
To fit the regression parameters on the housing data set, use the following code:
housing = Orange.data.Table("housing")
learner = Orange.regression.lasso.LassoRegressionLearner(
    lasso_lambda=1, n_boot=100, n_perm=100)
classifier = learner(housing)
To predict values of the response for the first five instances:
for ins in housing[:5]:
    print "Actual: %3.2f, predicted: %3.2f" % (
        ins.get_class(), classifier(ins))
Output:
Actual: 24.00, predicted: 30.45
Actual: 21.60, predicted: 25.60
Actual: 34.70, predicted: 31.48
Actual: 33.40, predicted: 30.18
Actual: 36.20, predicted: 29.59
To see the fitted regression coefficients, print the model:
print classifier
Output:
Variable Coeff Est Std Error p
Intercept 22.533
CRIM -0.023 0.024 0.050 .
CHAS 1.970 1.331 0.040 *
NOX -4.226 2.944 0.010 *
RM 4.270 0.934 0.000 ***
DIS -0.373 0.170 0.010 *
PTRATIO -0.798 0.117 0.000 ***
B 0.007 0.003 0.020 *
LSTAT -0.519 0.102 0.000 ***
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 empty 1
For 5 variables the regression coefficient equals 0:
ZN, INDUS, AGE, RAD, TAX
Note that the lasso sets some of the regression coefficients to exactly 0, effectively performing variable selection.