
UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

Thibault ETIENNE
Feb 29, 2024


Uniform Manifold Approximation and Projection (UMAP) is a dimension reduction technique that can be used for visualisation similarly to t-SNE, but also for general non-linear dimension reduction. The algorithm is founded on three assumptions about the data:


  1. The data is uniformly distributed on a Riemannian manifold;
  2. The Riemannian metric is locally constant (or can be approximated as such);
  3. The manifold is locally connected.

From these assumptions it is possible to model the manifold with a fuzzy topological structure. The embedding is found by searching for a low dimensional projection of the data that has the closest possible equivalent fuzzy topological structure.
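One way to picture the fuzzy topological structure is as a weighted nearest-neighbour graph in which each edge carries a membership strength between 0 and 1. The sketch below follows the construction described in the UMAP paper in simplified form, using only the Python standard library. The function names, the brute-force neighbour search and the bisection bracket are assumptions made for illustration; this is not the umap-learn implementation.

```python
import math


def nearest_neighbors(points, k):
    """For each point, return its k nearest neighbours as (distance, index) pairs."""
    return [
        sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)[:k]
        for i, p in enumerate(points)
    ]


def fit_sigma(dists, rho, k, n_iter=64):
    """Bisect for the sigma at which the membership strengths sum to log2(k)."""
    target = math.log2(k)
    lo, hi = 0.0, 1e6  # search bracket chosen arbitrarily for this sketch
    for _ in range(n_iter):
        mid = (lo + hi) / 2.0
        total = sum(math.exp(-max(0.0, d - rho) / mid) for d in dists)
        if total > target:
            hi = mid  # strengths too large: shrink sigma
        else:
            lo = mid
    return (lo + hi) / 2.0


def fuzzy_graph(points, k=3):
    """Return symmetric membership strengths {(i, j): weight} with i < j."""
    directed = {}
    for i, neighbours in enumerate(nearest_neighbors(points, k)):
        dists = [d for d, _ in neighbours]
        rho = dists[0]  # distance to the nearest neighbour (local connectivity)
        sigma = fit_sigma(dists, rho, k)
        for d, j in neighbours:
            directed[(i, j)] = math.exp(-max(0.0, d - rho) / sigma)

    symmetric = {}
    for (i, j), a in directed.items():
        b = directed.get((j, i), 0.0)
        # Probabilistic t-conorm: the edge exists in at least one direction.
        symmetric[(min(i, j), max(i, j))] = a + b - a * b
    return symmetric
```

Each point's nearest neighbour always receives strength 1, which reflects the local connectivity assumption; more distant neighbours receive exponentially smaller strengths. The sketch needs k of at least 3 and distinct neighbour distances for the bisection to be meaningful.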



The details of the underlying mathematics can be found in the UMAP paper on arXiv:


McInnes, L., Healy, J., UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction, arXiv e-prints 1802.03426, 2018



UMAP has several hyperparameters that can have a significant impact on the resulting embedding. In this notebook we will be covering the four major ones:


• n_neighbors
  • This parameter controls how UMAP balances local versus global structure in the data. It does this by constraining the size of the local neighborhood UMAP will look at when attempting to learn the manifold structure of the data. Low values of n_neighbors force UMAP to concentrate on very local structure (potentially to the detriment of the big picture), while large values push UMAP to look at larger neighborhoods of each point when estimating the manifold structure, losing fine detail for the sake of capturing the broader structure of the data.
• min_dist
  • The min_dist parameter controls how tightly UMAP is allowed to pack points together. It, quite literally, provides the minimum distance apart that points are allowed to be in the low dimensional representation. Low values of min_dist result in clumpier embeddings, which can be useful if you are interested in clustering or in finer topological structure. Larger values of min_dist prevent UMAP from packing points together and focus instead on preserving the broad topological structure.
• n_components
  • As is standard for many scikit-learn dimension reduction algorithms, UMAP provides an n_components parameter that lets the user set the dimensionality of the reduced space the data is embedded into. Unlike some other visualisation algorithms such as t-SNE, UMAP scales well in the embedding dimension, so you can use it for more than just visualisation in 2 or 3 dimensions.
• metric
  • This parameter controls how distance is computed in the ambient space of the input data. The supported metrics are grouped below by type.

Minkowski style metrics

• euclidean
• manhattan
• chebyshev
• minkowski

Miscellaneous spatial metrics

• canberra
• braycurtis
• haversine

Normalized spatial metrics

• mahalanobis
• wminkowski
• seuclidean

Angular and correlation metrics

• cosine
• correlation

Metrics for binary data

• hamming
• jaccard
• dice
• russellrao
• kulsinski
• rogerstanimoto
• sokalmichener
• sokalsneath
• yule
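To make the metric families above more concrete, here is a minimal sketch of one or two representatives from three of them, written with the Python standard library only. These are the textbook definitions, given as an illustration; the function names are chosen here and the code is not UMAP's internal implementation.

```python
import math


def minkowski(x, y, p=2.0):
    """Minkowski distance; p=1 gives manhattan, p=2 gives euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def chebyshev(x, y):
    """Chebyshev distance: the largest coordinate-wise difference."""
    return max(abs(a - b) for a, b in zip(x, y))


def cosine(x, y):
    """Cosine distance: 1 minus the cosine of the angle between x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / norms


def jaccard(x, y):
    """Jaccard distance between two binary vectors."""
    either = sum(1 for a, b in zip(x, y) if a or b)
    both = sum(1 for a, b in zip(x, y) if a and b)
    return 0.0 if either == 0 else 1.0 - both / either
```

The choice matters because the metric decides which points count as neighbours: cosine ignores vector length, while the binary metrics only compare which features are present.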

For more information: https://umap-learn.readthedocs.io/en/latest/parameters.html
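As a minimal sketch of how the four hyperparameters are passed to UMAP, the snippet below wraps the umap-learn estimator in a small helper. It assumes the umap-learn package is installed; the helper name embed, the UMAP_PARAMS dictionary and the fixed random_state are illustrative choices, not part of this task's code. The values shown are the umap-learn defaults.

```python
# The four hyperparameters discussed above, set to the umap-learn defaults.
UMAP_PARAMS = {
    "n_neighbors": 15,      # size of the local neighbourhood
    "min_dist": 0.1,        # minimum spacing of points in the embedding
    "n_components": 2,      # dimensionality of the embedding
    "metric": "euclidean",  # distance used in the input space
}


def embed(data, random_state=42, **overrides):
    """Embed data (n_samples x n_features) with UMAP.

    Keyword overrides replace the defaults in UMAP_PARAMS.
    """
    # Imported here so umap-learn is only required when an embedding is computed.
    import umap

    params = {**UMAP_PARAMS, **overrides}
    reducer = umap.UMAP(random_state=random_state, **params)
    return reducer.fit_transform(data)
```

For example, embed(X, n_neighbors=5, min_dist=0.0) would favour tight local clumps suited to clustering, while embed(X, n_neighbors=100, min_dist=0.5) would favour the broad structure. Here X stands for any numeric array with more samples than n_neighbors.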





