Projection (projection)

PCA

Principal component analysis is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.

Example

>>> from Orange.projection import PCA
>>> from Orange.data import Table
>>> iris = Table('iris')
>>> pca = PCA()
>>> model = pca(iris)
>>> model.components_    # PCA components
array([[ 0.36158968, -0.08226889,  0.85657211,  0.35884393],
       [ 0.65653988,  0.72971237, -0.1757674 , -0.07470647],
       [-0.58099728,  0.59641809,  0.07252408,  0.54906091],
       [ 0.31725455, -0.32409435, -0.47971899,  0.75112056]])
>>> transformed_data = model(iris)    # transformed data
>>> transformed_data
[[-2.684, 0.327, -0.022, 0.001 | Iris-setosa],
[-2.715, -0.170, -0.204, 0.100 | Iris-setosa],
[-2.890, -0.137, 0.025, 0.019 | Iris-setosa],
[-2.746, -0.311, 0.038, -0.076 | Iris-setosa],
[-2.729, 0.334, 0.096, -0.063 | Iris-setosa],
...
]
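
The number of retained components can be limited through the constructor. The sketch below assumes the fitted model also proxies the wrapped scikit-learn attribute explained_variance_ratio_, which is not shown in the example above:

>>> pca = PCA(n_components=2)            # keep only the first two components
>>> model = pca(iris)
>>> model.explained_variance_ratio_      # fraction of variance captured by each component
>>> model(iris)                          # data projected onto two components
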
class Orange.projection.pca.PCA(n_components=None, copy=True, whiten=False, svd_solver='auto', tol=0.0, iterated_power='auto', random_state=None, preprocessors=None)[source]

A wrapper for Orange.projection.pca.ImprovedPCA. The following is its documentation:

Patch sklearn PCA learner to include randomized PCA for sparse matrices.

Scikit-learn's PCA does not currently support sparse matrices at all, even though efficient methods for computing PCA on sparse data exist. This class patches the default scikit-learn implementation to properly handle sparse matrices.

Notes

  • This should be removed once scikit-learn releases a version which implements this functionality.
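
A minimal sketch of the sparse case, assuming Table.from_numpy accepts a SciPy sparse matrix and builds a default domain for it (the random matrix below is purely illustrative):

>>> import scipy.sparse as sp
>>> from Orange.data import Table
>>> from Orange.projection import PCA
>>> X = sp.random(100, 20, density=0.1, format='csr', random_state=0)
>>> sparse_data = Table.from_numpy(None, X)       # sparse matrix wrapped in a Table
>>> model = PCA(n_components=3)(sparse_data)      # patched learner handles sparse input
>>> transformed = model(sparse_data)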

class Orange.projection.pca.SparsePCA(n_components=None, alpha=1, ridge_alpha=0.01, max_iter=1000, tol=1e-08, method='lars', n_jobs=1, U_init=None, V_init=None, verbose=False, random_state=None, preprocessors=None)[source]

A wrapper for sklearn.decomposition._sparse_pca.SparsePCA. The following is its documentation:

Sparse Principal Components Analysis (SparsePCA).

Finds the set of sparse components that can optimally reconstruct the data. The amount of sparseness is controllable by the coefficient of the L1 penalty, given by the parameter alpha.

Read more in the User Guide.
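
A sketch of varying the sparsity penalty; the alpha value here is arbitrary, and the fitted model is assumed to expose the usual components_ attribute:

>>> from Orange.projection.pca import SparsePCA
>>> from Orange.data import Table
>>> iris = Table('iris')
>>> sparse_pca = SparsePCA(n_components=2, alpha=1, random_state=0)
>>> model = sparse_pca(iris)
>>> model.components_    # a stronger alpha drives more loadings to exactly zero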

class Orange.projection.pca.IncrementalPCA(n_components=None, whiten=False, copy=True, batch_size=None, preprocessors=None)[source]

A wrapper for sklearn.decomposition._incremental_pca.IncrementalPCA. The following is its documentation:

Incremental principal components analysis (IPCA).

Linear dimensionality reduction using Singular Value Decomposition of the data, keeping only the most significant singular vectors to project the data to a lower dimensional space. The input data is centered but not scaled for each feature before applying the SVD.

Depending on the size of the input data, this algorithm can be much more memory efficient than a PCA, and allows sparse input.

This algorithm has constant memory complexity, on the order of batch_size * n_features, enabling use of np.memmap files without loading the entire file into memory. For sparse matrices, the input is converted to dense in batches (in order to be able to subtract the mean) which avoids storing the entire dense matrix at any one time.

The computational overhead of each SVD is O(batch_size * n_features ** 2), but only 2 * batch_size samples remain in memory at a time. There will be n_samples / batch_size SVD computations to get the principal components, versus 1 large SVD of complexity O(n_samples * n_features ** 2) for PCA.

Read more in the User Guide.

New in version 0.16.
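
A sketch of fitting in fixed-size batches, so that only about batch_size rows need to be processed at a time (the batch size here is arbitrary):

>>> from Orange.projection.pca import IncrementalPCA
>>> from Orange.data import Table
>>> iris = Table('iris')
>>> ipca = IncrementalPCA(n_components=2, batch_size=50)
>>> model = ipca(iris)     # the data is consumed in batches of 50 rows
>>> model(iris)            # projection onto the first two components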

FreeViz

FreeViz uses a paradigm borrowed from particle physics: points of the same class attract each other, points from different classes repel each other, and the resulting forces are exerted on the anchors of the attributes, that is, on the unit vectors of each dimensional axis. The points themselves cannot move (their positions are determined by the projection), but the attribute anchors can, so the optimization is a hill-climbing procedure that ends with the anchors placed such that the forces are in equilibrium.

Example

>>> from Orange.projection import FreeViz
>>> from Orange.data import Table
>>> iris = Table('iris')
>>> freeviz = FreeViz()
>>> model = freeviz(iris)
>>> model.components_    # FreeViz components
array([[  3.83487853e-01,   1.38777878e-17],
       [ -6.95058218e-01,   7.18953457e-01],
       [  2.16525357e-01,  -2.65741729e-01],
       [  9.50450079e-02,  -4.53211728e-01]])
>>> transformed_data = model(iris)    # transformed data
>>> transformed_data
[[-0.157, 2.053 | Iris-setosa],
[0.114, 1.694 | Iris-setosa],
[-0.123, 1.864 | Iris-setosa],
[-0.048, 1.740 | Iris-setosa],
[-0.265, 2.125 | Iris-setosa],
...
]
class Orange.projection.freeviz.FreeViz(weights=None, center=True, scale=True, dim=2, p=1, initial=None, maxiter=500, alpha=0.1, gravity=None, atol=1e-05, preprocessors=None)[source]
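
The optimization itself can be tuned through the constructor arguments above, for example by allowing more hill-climbing iterations (a sketch; the value is arbitrary):

>>> freeviz = FreeViz(maxiter=1000)    # run the hill-climbing optimization longer
>>> model = freeviz(iris)
>>> model.components_                  # anchor positions at the final equilibrium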

LDA

Linear discriminant analysis is another way of finding a linear transformation of data that reduces the number of dimensions required to represent it. It is often used for dimensionality reduction prior to classification, but can also be used as a classification technique itself ([1]).

Example

>>> from Orange.projection import LDA
>>> from Orange.data import Table
>>> iris = Table('iris')
>>> lda = LDA()
>>> model = lda(iris)
>>> model.components_    # LDA components
array([[ 0.20490976,  0.38714331, -0.54648218, -0.71378517],
       [ 0.00898234,  0.58899857, -0.25428655,  0.76703217],
       [-0.71507172,  0.43568045,  0.45568731, -0.30200008],
       [ 0.06449913, -0.35780501, -0.42514529,  0.828895  ]])
>>> transformed_data = model(iris)    # transformed data
>>> transformed_data
[[1.492, 1.905 | Iris-setosa],
[1.258, 1.608 | Iris-setosa],
[1.349, 1.750 | Iris-setosa],
[1.180, 1.639 | Iris-setosa],
[1.510, 1.963 | Iris-setosa],
...
]
class Orange.projection.lda.LDA(solver='svd', shrinkage=None, priors=None, n_components=None, store_covariance=False, tol=0.0001, preprocessors=None)[source]

A wrapper for sklearn.discriminant_analysis.LinearDiscriminantAnalysis. The following is its documentation:

Linear Discriminant Analysis.

A classifier with a linear decision boundary, generated by fitting class conditional densities to the data and using Bayes' rule.

The model fits a Gaussian density to each class, assuming that all classes share the same covariance matrix.

The fitted model can also be used to reduce the dimensionality of the input by projecting it to the most discriminative directions, using the transform method.

New in version 0.17: LinearDiscriminantAnalysis.

Read more in the User Guide.
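
A sketch of restricting the projection to a chosen number of discriminative directions via n_components (for iris at most two are available, since there are three classes):

>>> lda = LDA(n_components=2)
>>> model = lda(iris)
>>> model(iris)    # data projected onto the two most discriminative directions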

References