This is documentation for Orange 2.7. For the latest documentation, see Orange 3.
Partial least squares regression (PLS)¶
Partial least squares regression is a statistical method for simultaneous prediction of multiple response variables. Orange’s implementation is based on the scikit-learn Python implementation.
The following code shows how to fit a PLS regression model on a multi-target data set.
import Orange
data = Orange.data.Table("multitarget-synthetic.tab")
learner = Orange.regression.pls.PLSRegressionLearner()
classifier = learner(data)
- class Orange.regression.pls.PLSRegressionLearner(n_comp=2, deflation_mode="regression", mode="PLS", algorithm="nipals", max_iter=500, imputer=None, continuizer=None, **kwds)¶
Fit the partial least squares regression model, i.e. learn the regression parameters. The implementation is based on the scikit-learn Python implementation.
The class is derived from Orange.regression.base.BaseRegressionLearner, which is used for preprocessing the data (continuization and imputation) before fitting the regression parameters.
- __call__(table, weight_id=None, x_vars=None, y_vars=None)¶
Parameters: - table (Orange.data.Table) – data instances.
- x_vars, y_vars – lists of input and response variables (Orange.feature.Continuous or Orange.feature.Discrete). If None (default), it is assumed that the data domain provides information on which variables are responses and which are not. If data has class_var defined in its domain, a single-target regression learner is constructed. Otherwise, a multi-target learner predicting the response variables defined by class_vars is constructed.
- __init__(n_comp=2, deflation_mode="regression", mode="PLS", algorithm="nipals", max_iter=500, imputer=None, continuizer=None, **kwds)¶
- n_comp¶
number of components to keep (default: 2)
- deflation_mode¶
“canonical” or “regression” (default)
- mode¶
“CCA” or “PLS” (default)
- algorithm¶
The algorithm for estimating the weights: “nipals” (default) or “svd”
- fit(X, Y)¶
Fit all unknown parameters, i.e. weights, scores, loadings (for x and y) and regression coefficients. Return a dict with all of the parameters.
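The structure of such a fit can be sketched with plain NumPy. The sketch below is illustrative only — it is not Orange’s implementation and it omits the centering/scaling preprocessing — but it shows a NIPALS loop with “regression” deflation that collects the weights, scores, loadings and regression coefficients corresponding to the attributes T, U, W, C, P, Q and coefs documented below:

```python
import numpy as np

def pls_fit(X, Y, n_comp=2, max_iter=500, tol=1e-06):
    """Illustrative NIPALS-based PLS fit with "regression" deflation.

    Returns a dict of weights, scores, loadings and regression
    coefficients.  A sketch only, not Orange's implementation.
    """
    X, Y = X.copy(), Y.copy()          # deflated in place below
    n, p = X.shape
    q = Y.shape[1]
    T = np.zeros((n, n_comp)); U = np.zeros((n, n_comp))
    W = np.zeros((p, n_comp)); C = np.zeros((q, n_comp))
    P = np.zeros((p, n_comp)); Q = np.zeros((q, n_comp))
    for k in range(n_comp):
        # Inner NIPALS loop: power iteration toward the first
        # singular vectors of X'Y for the current residuals.
        y_score = Y[:, 0]
        w_old = None
        for _ in range(max_iter):
            w = X.T.dot(y_score)
            w /= np.linalg.norm(w)     # x-weights
            t = X.dot(w)               # x-scores
            c = Y.T.dot(t)
            c /= np.linalg.norm(c)     # y-weights
            y_score = Y.dot(c)         # y-scores
            if w_old is not None and np.linalg.norm(w - w_old) < tol:
                break
            w_old = w
        p_load = X.T.dot(t) / t.dot(t)     # x-loadings
        q_load = Y.T.dot(t) / t.dot(t)     # y-loadings
        X -= np.outer(t, p_load)           # "regression" deflation
        Y -= np.outer(t, q_load)
        T[:, k], U[:, k] = t, y_score
        W[:, k], C[:, k] = w, c
        P[:, k], Q[:, k] = p_load, q_load
    # Regression coefficients: B = W (P'W)^-1 Q', so that Y ~ X B.
    B = W.dot(np.linalg.solve(P.T.dot(W), Q.T))
    return dict(T=T, U=U, W=W, C=C, P=P, Q=Q, coefs=B)

rng = np.random.RandomState(0)
X = rng.randn(30, 3)
Y = X.dot(np.array([[1.0, 0.0], [0.5, -1.0], [0.0, 2.0]]))
model = pls_fit(X, Y, n_comp=3)
```

With n_comp equal to the number of predictors and an exactly linear response, the recovered coefficients reproduce Y from X.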
- class Orange.regression.pls.PLSRegression(domain=None, multitarget=False, coefs=None, sigma_x=None, sigma_y=None, mu_x=None, mu_y=None, x_vars=None, y_vars=None, **kwargs)¶
Predict values of the response variables based on the values of independent variables.
Basic notation:
- n – number of data instances
- p – number of independent variables
- q – number of response variables
- T¶
A n x n_comp numpy array of x-scores
- U¶
A n x n_comp numpy array of y-scores
- W¶
A p x n_comp numpy array of x-weights
- C¶
A q x n_comp numpy array of y-weights
- P¶
A p x n_comp numpy array of x-loadings
- Q¶
A q x n_comp numpy array of y-loadings
- coefs¶
A p x q numpy array of coefficients of the linear model: Y = X coefs + E
- x_vars¶
Predictor variables
- y_vars¶
Response variables
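The role of the coefficient matrix coefs can be illustrated with plain NumPy. The numbers below are hypothetical, and the actual model additionally applies the stored centering and scaling parameters (mu_x, sigma_x, mu_y, sigma_y) before and after this step:

```python
import numpy as np

# Hypothetical coefficients for p=3 predictors and q=2 responses.
coefs = np.array([[ 0.7,  2.1],
                  [-0.2, -2.5],
                  [ 0.2, -0.3]])
X = np.array([[1.0, 0.0, 2.0]])   # a single data instance (n=1)
Y_hat = X.dot(coefs)              # predicted responses: Y = X coefs (+ E)
```

Because coefs has one column per response variable, a single matrix product predicts all outputs at once.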
- __call__(instance, result_type=0)¶
Parameters: instance (Orange.data.Instance) – data instance for which the value of the response variable will be predicted
- to_string()¶
Pretty-prints the coefficients of the PLS regression model.
Utility functions¶
- Orange.regression.pls.normalize_matrix(X)¶
Normalize a matrix column-wise: subtract the means and divide by the standard deviations. Returns the standardized matrix, the sample means and the standard deviations.
Parameters: X (numpy.array) – data matrix
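Column-wise standardization can be sketched with NumPy alone (an illustration of what this utility computes, not the library routine itself; the zero-variance guard is an assumption):

```python
import numpy as np

def normalize_matrix(X):
    """Column-wise standardization sketch: subtract the per-column
    means and divide by the per-column standard deviations."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma = np.where(sigma == 0.0, 1.0, sigma)  # guard constant columns
    return (X - mu) / sigma, mu, sigma

X = np.array([[1.0, 2.0],
              [3.0, 2.0],
              [5.0, 2.0]])
Xs, mu, sigma = normalize_matrix(X)
```

After the transformation every non-constant column has zero mean and unit standard deviation.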
- Orange.regression.pls.nipals_xy(X, Y, mode="PLS", max_iter=500, tol=1e-06)¶
NIPALS algorithm; returns the first left and right singular vectors of X’Y.
Parameters: X, Y (numpy.array) – data matrices
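A rough NumPy illustration of the “PLS” mode of this routine (a power-iteration sketch under the assumption of generic input; it omits the “CCA” mode and is not the library code):

```python
import numpy as np

def nipals_xy(X, Y, max_iter=500, tol=1e-06):
    """Power-iteration sketch: u and c converge (up to sign) to the
    first left and right singular vectors of X'Y."""
    y_score = Y[:, 0]
    u_old = None
    for _ in range(max_iter):
        u = X.T.dot(y_score)
        u /= np.linalg.norm(u)      # left vector (x-weights)
        t = X.dot(u)                # x-scores
        c = Y.T.dot(t)
        c /= np.linalg.norm(c)      # right vector (y-weights)
        y_score = Y.dot(c)
        if u_old is not None and np.linalg.norm(u - u_old) < tol:
            break
        u_old = u
    return u, c

rng = np.random.RandomState(0)
X, Y = rng.randn(20, 5), rng.randn(20, 3)
u, c = nipals_xy(X, Y)
```

The result can be checked against a direct SVD of X’Y: both vectors agree with the leading singular vectors up to a common sign.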
- Orange.regression.pls.svd_xy(X, Y)¶
Return the first left and right singular vectors of X’Y.
Parameters: X, Y (numpy.array) – data matrices
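The SVD variant is direct; a NumPy sketch of what this utility computes (equivalent in spirit, not the library code):

```python
import numpy as np

def svd_xy(X, Y):
    """First left and right singular vectors of X'Y via a full SVD."""
    U, s, Vt = np.linalg.svd(X.T.dot(Y), full_matrices=False)
    return U[:, 0], Vt[0, :]

rng = np.random.RandomState(1)
X, Y = rng.randn(10, 4), rng.randn(10, 2)
u, v = svd_xy(X, Y)
```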
Examples¶
The following code predicts the values of output variables for the first two instances in data.
print "Prediction for the first 2 data instances: \n"
for d in data[:2]:
    print "Actual    ", d.get_classes()
    print "Predicted ", classifier(d)
    print
Actual [<orange.Value 'Y1'='0.490'>, <orange.Value 'Y2'='1.237'>, <orange.Value 'Y3'='1.808'>, <orange.Value 'Y4'='0.422'>]
Predicted [<orange.Value 'Y1'='0.613'>, <orange.Value 'Y2'='0.826'>, <orange.Value 'Y3'='1.084'>, <orange.Value 'Y4'='0.534'>]
Actual [<orange.Value 'Y1'='0.167'>, <orange.Value 'Y2'='-0.664'>, <orange.Value 'Y3'='-1.378'>, <orange.Value 'Y4'='0.589'>]
Predicted [<orange.Value 'Y1'='0.058'>, <orange.Value 'Y2'='-0.706'>, <orange.Value 'Y3'='-1.420'>, <orange.Value 'Y4'='0.599'>]
To see the coefficients of the model, print the model:
print 'Regression coefficients:\n', classifier
Regression coefficients:
Y1 Y2 Y3 Y4
X1 0.714 2.153 3.590 -0.078
X2 -0.238 -2.500 -4.797 -0.036
X3 0.230 -0.314 -0.880 -0.060
Note that coefficients are stored in a matrix since the model predicts values of multiple outputs.