This is documentation for Orange 2.7. For the latest documentation, see Orange 3.
Lasso regression (lasso)¶
The lasso (least absolute shrinkage and selection operator) is a regularized version of least squares regression. It minimizes the sum of squared errors while also penalizing the L_1 norm (sum of absolute values) of the coefficients.
Concretely, the function that is minimized in Orange is:
\frac{1}{n}\|Xw - y\|_2^2 + \frac{\lambda}{m} \|w\|_1
where X is an n \times m data matrix, y the vector of response (class) values, and w the vector of regression coefficients to be estimated.
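As a concrete reading of the formula, the minimized quantity can be evaluated in plain Python. This is an illustrative sketch, not part of the Orange API; the function name lasso_objective is ours:

```python
def lasso_objective(X, y, w, lam):
    """Evaluate (1/n) * ||Xw - y||_2^2 + (lam/m) * ||w||_1 (illustrative sketch)."""
    n, m = len(X), len(X[0])
    # Squared-error term, averaged over the n instances
    sq_err = sum((sum(X[i][j] * w[j] for j in range(m)) - y[i]) ** 2
                 for i in range(n)) / n
    # L1 penalty on the coefficients, averaged over the m variables
    l1 = lam * sum(abs(wj) for wj in w) / m
    return sq_err + l1

X = [[1.0, 2.0], [3.0, 4.0]]
y = [1.0, 2.0]
print(lasso_objective(X, y, [0.0, 0.0], 0.1))  # with w = 0 only the error term remains: 2.5
```

Increasing lasso_lambda raises the weight of the L_1 term, which pushes more coefficients to exactly 0.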
- class Orange.regression.lasso.LassoRegressionLearner(lasso_lambda=0.1, max_iter=20000, eps=1e-06, n_boot=0, n_perm=0, imputer=None, continuizer=None, name='Lasso')¶
Bases: Orange.regression.base.BaseRegressionLearner
Fits the lasso regression model using FISTA (Fast Iterative Shrinkage-Thresholding Algorithm).
- __call__(data, weight=None)¶
Parameters: - data (Orange.data.Table) – Training data.
- weight – Weights for instances. Not implemented yet.
- __init__(lasso_lambda=0.1, max_iter=20000, eps=1e-06, n_boot=0, n_perm=0, imputer=None, continuizer=None, name='Lasso')¶
Parameters: - lasso_lambda (float) – Regularization parameter.
- max_iter (int) – Maximum number of iterations for the optimization method.
- eps (float) – Stop the optimization when the improvement between consecutive iterations falls below eps.
- n_boot (int) – Number of bootstrap samples used for non-parametric estimation of standard errors.
- n_perm (int) – Number of permutations used for non-parametric estimation of p-values.
- name (str) – Learner name.
- fista(X, y, l, lipschitz, w_init=None)¶
Fast Iterative Shrinkage-Thresholding Algorithm (FISTA).
- get_lipschitz(X)¶
Return the Lipschitz constant of \nabla f, where f(w) = \frac{1}{2}\|Xw - y\|_2^2.
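As background for fista and get_lipschitz, here is a minimal, self-contained sketch of FISTA in plain Python. It is not Orange's implementation: it minimizes the unscaled objective \frac{1}{2}\|Xw - y\|_2^2 + \lambda\|w\|_1 (the 1/n and 1/m factors in Orange's objective only rescale \lambda), and it bounds the Lipschitz constant by the trace of X^T X rather than its largest eigenvalue. All names are illustrative:

```python
def soft_threshold(v, t):
    # Proximal operator of the L1 norm: shrink v toward zero by t
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def fista_sketch(X, y, lam, n_iter=500):
    """Minimal FISTA for min_w 0.5 * ||Xw - y||^2 + lam * ||w||_1 (illustrative)."""
    n, m = len(X), len(X[0])
    # Lipschitz constant of the gradient is the largest eigenvalue of X^T X;
    # the trace (sum of squared entries) is a cheap upper bound, enough here
    L = sum(X[i][j] ** 2 for i in range(n) for j in range(m))
    w = [0.0] * m   # current estimate
    z = list(w)     # extrapolated (momentum) point
    t = 1.0         # momentum parameter
    for _ in range(n_iter):
        # Gradient of the smooth part at z: X^T (Xz - y)
        resid = [sum(X[i][j] * z[j] for j in range(m)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(m)]
        # Gradient step followed by soft thresholding (the "shrinkage" step)
        w_new = [soft_threshold(z[j] - grad[j] / L, lam / L) for j in range(m)]
        # Nesterov momentum update makes this the *fast* variant of ISTA
        t_new = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0
        z = [w_new[j] + (t - 1.0) / t_new * (w_new[j] - w[j]) for j in range(m)]
        w, t = w_new, t_new
    return w
```

For an orthonormal X the solution is the soft-thresholded least-squares fit, which makes the sparsity of the lasso easy to see: coefficients whose least-squares estimate is smaller than the threshold come out exactly 0.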
- class Orange.regression.lasso.LassoRegression(domain=None, class_var=None, coef0=None, coefficients=None, std_errors=None, p_vals=None, model=None, mu_x=None)¶
Bases: Orange.classification.Classifier
Lasso regression predicts the value of the response variable based on the values of independent variables.
- coef0¶
Intercept (sample mean of the response variable).
- coefficients¶
Regression coefficients.
- std_errors¶
Standard errors of coefficient estimates for a fixed regularization parameter. The standard errors are estimated using the bootstrapping method.
- p_vals¶
List of p-values for the null hypotheses that the regression coefficients equal 0 based on a non-parametric permutation test.
- model¶
Dictionary with the statistical properties of the model. Keys are the names of the independent variables; values are tuples (coefficient, standard error, p-value).
- mu_x¶
Sample mean of independent variables.
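Taken together, coef0 (the response mean), coefficients and mu_x suggest that prediction has the centered linear form coef0 + w \cdot (x - mu_x). The sketch below shows that reading; note this form is an inference from the attributes above, not a formula stated in the documentation:

```python
def predict_centered(instance, coef0, coefficients, mu_x):
    """Centered linear prediction: intercept plus coefficients applied to
    mean-centered inputs (assumed form, inferred from coef0 and mu_x)."""
    return coef0 + sum(w * (x - m)
                       for w, x, m in zip(coefficients, instance, mu_x))

# Hypothetical numbers: intercept 10, two centered predictors
print(predict_centered([2.0, 3.0], 10.0, [1.0, -1.0], [1.0, 1.0]))  # 9.0
```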
- __call__(instance, result_type=0)¶
Parameters: instance (Orange.data.Instance) – Data instance for which the value of the response variable will be predicted.
- to_string(skip_zero=True)¶
Pretty-prints the lasso regression model, i.e. the estimated regression coefficients with their standard errors and significances. Standard errors are obtained by bootstrapping and significances by a permutation test.
Parameters: skip_zero (bool) – If True, variables with estimated coefficient equal to 0 are omitted.
Utility functions¶
- Orange.regression.lasso.get_bootstrap_sample(data)¶
Generate a bootstrap sample of a given data set.
Parameters: data (Orange.data.Table) – the original data sample
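The idea behind bootstrap standard errors (the std_errors attribute above) can be shown standalone. This sketch works on plain Python lists rather than Orange.data.Table, and the function names are illustrative:

```python
import random

def bootstrap_sample(data, rng=random):
    # Draw len(data) instances uniformly with replacement
    return [rng.choice(data) for _ in range(len(data))]

def bootstrap_se(data, statistic, n_boot=200, rng=random):
    """Standard error of `statistic`, estimated as the standard deviation
    of the statistic across n_boot bootstrap samples."""
    stats = [statistic(bootstrap_sample(data, rng)) for _ in range(n_boot)]
    mean = sum(stats) / len(stats)
    return (sum((s - mean) ** 2 for s in stats) / (len(stats) - 1)) ** 0.5
```

In Orange the statistic would be a single lasso coefficient refitted on each bootstrap sample (with the regularization parameter held fixed); here any callable on the data works, e.g. the sample mean.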
- Orange.regression.lasso.permute_responses(data)¶
Permute the values of the class (response) variable. This breaks the dependence between the independent variables and the response while keeping the distribution of the response variable unchanged.
Parameters: data (Orange.data.Table) – Original data.
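A standalone sketch of the same idea, and of how n_perm permutations yield the p-values in p_vals: refit (or recompute a statistic) on permuted responses and count how often its magnitude reaches the observed one. The data here is a plain list of rows with the response in the last column; this layout and all names are assumptions for the example, not Orange's API:

```python
import random

def permute_responses(rows, rng=random):
    """Return a copy of rows with the response (last column) shuffled,
    leaving the independent variables aligned with their original rows."""
    responses = [r[-1] for r in rows]
    rng.shuffle(responses)
    return [r[:-1] + [y] for r, y in zip(rows, responses)]

def permutation_p_value(rows, statistic, n_perm=200, rng=random):
    """Fraction of permuted data sets whose |statistic| reaches the observed one."""
    observed = abs(statistic(rows))
    hits = sum(1 for _ in range(n_perm)
               if abs(statistic(permute_responses(rows, rng))) >= observed)
    return hits / float(n_perm)
```

With a strong predictor-response relationship almost no permutation reproduces the observed statistic, so the estimated p-value is near 0; with no relationship it is spread toward 1.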
Examples¶
To fit the regression parameters on the housing data set, use the following code:
housing = Orange.data.Table("housing")
learner = Orange.regression.lasso.LassoRegressionLearner(
    lasso_lambda=1, n_boot=100, n_perm=100)
classifier = learner(housing)
To predict values of the response for the first five instances:
for ins in housing[:5]:
    print "Actual: %3.2f, predicted: %3.2f" % (
        ins.get_class(), classifier(ins))
Output:
Actual: 24.00, predicted: 30.45
Actual: 21.60, predicted: 25.60
Actual: 34.70, predicted: 31.48
Actual: 33.40, predicted: 30.18
Actual: 36.20, predicted: 29.59
To see the fitted regression coefficients, print the model:
print classifier
Output:
Variable Coeff Est Std Error p
Intercept 22.533
CRIM -0.023 0.024 0.050 .
CHAS 1.970 1.331 0.040 *
NOX -4.226 2.944 0.010 *
RM 4.270 0.934 0.000 ***
DIS -0.373 0.170 0.010 *
PTRATIO -0.798 0.117 0.000 ***
B 0.007 0.003 0.020 *
LSTAT -0.519 0.102 0.000 ***
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 empty 1
For 5 variables the regression coefficient equals 0:
ZN, INDUS, AGE, RAD, TAX
Note that the lasso sets some of the regression coefficients to exactly 0, effectively performing variable selection.