Uniform Manifold Approximation and Projection (UMAP) is a dimensionality reduction method that transforms data from a high-dimensional space into a low-dimensional space while retaining the original data's essential properties.
Usage
dr_umap(
  processed_se,
  n_neighbors = 15,
  scaling = c("none", FALSE, NULL, "Z", "scale", TRUE, "maxabs", "range", "colrange"),
  umap_metric = c("euclidean", "cosine", "manhattan", "hamming", "categorical"),
  clustering = c("kmeans", "kmedoids", "hclustering", "dbscan", "group_info"),
  cluster_num = 2,
  kmedoids_metric = NULL,
  distfun = NULL,
  hclustfun = NULL,
  eps = NULL,
  minPts = NULL
)
Arguments
- processed_se
A SummarizedExperiment object constructed by as_summarized_experiment and processed by data_process. (NOTE: A SummarizedExperiment object generated by deSp_twoGroup, deChar_twoGroup, deSp_multiGroup, or deChar_multiGroup is also allowed.)
- n_neighbors
Numeric. The size of the local neighborhood (the number of neighboring sample points) used for manifold approximation.
- scaling
Logical/Character. Scaling to apply to the input data:
"none", FALSE, or NULL: No scaling.
"Z", "scale", or TRUE: Scale each column to zero mean and variance 1.
"maxabs": Center each column to mean 0, then divide each element by the maximum absolute value over the entire matrix.
"range": Range scale the entire matrix, so the smallest element is 0 and the largest is 1.
"colrange": Scale each column to the range (0, 1).
Default is TRUE.
- umap_metric
Character. The distance metric used to find nearest neighbors. One of "euclidean", "cosine", "manhattan", "hamming", or "categorical". Default is "euclidean".
- clustering
Character. The clustering method to use. Allowed methods are "kmeans", "kmedoids", "hclustering", "dbscan", and "group_info". Default is "kmeans".
- cluster_num
Numeric. A positive integer specifying the number of clusters. The number must be between 1 and 10. Default is 2.
- kmedoids_metric
Character. The metric used to calculate dissimilarities between observations when "kmedoids" is chosen as the clustering method. Must be either "euclidean" or "manhattan". If "kmedoids" is not selected as the clustering method, set the value to NULL.
- distfun
Character. The distance measure used when "hclustering" is chosen as the clustering method. Allowed measures are "pearson", "kendall", "spearman", "euclidean", "manhattan", "maximum", "canberra", "binary", and "minkowski". If "hclustering" is not selected as the clustering method, set the value to NULL.
- hclustfun
Character. The agglomeration method used when "hclustering" is chosen as the clustering method. This should be (an unambiguous abbreviation of) one of "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC), or "centroid" (= UPGMC). If "hclustering" is not selected as the clustering method, set the value to NULL.
- eps
Numeric. The size of the epsilon neighborhood when "dbscan" is chosen as the clustering method. If "dbscan" is not selected as the clustering method, set the value to NULL.
- minPts
Numeric. The minimum number of points in the eps region (for core points) when "dbscan" is chosen as the clustering method. If "dbscan" is not selected as the clustering method, set the value to NULL.
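The scaling options described above can be sketched in base R on a toy matrix. This is only an illustration of the written descriptions, not the code dr_umap runs internally: the helper names (scale_z, scale_maxabs, scale_range, scale_colrange) and the toy values are hypothetical, and taking the "maxabs" maximum after centering is an assumption about how that description should be read.

```r
# Toy matrix: rows are samples, columns are features (values are made up).
X <- matrix(c(1, 2, 3, 4,
              10, 20, 30, 40),
            ncol = 2)

# "Z" / "scale" / TRUE: each column to zero mean and variance 1.
scale_z <- function(m) scale(m, center = TRUE, scale = TRUE)

# "maxabs": center each column, then divide every element by the largest
# absolute value in the centered matrix (assumed reading of the description).
scale_maxabs <- function(m) {
  centered <- scale(m, center = TRUE, scale = FALSE)
  centered / max(abs(centered))
}

# "range": rescale the whole matrix so its smallest element is 0
# and its largest element is 1.
scale_range <- function(m) (m - min(m)) / (max(m) - min(m))

# "colrange": rescale each column separately so it spans 0 to 1.
scale_colrange <- function(m) {
  apply(m, 2, function(col) (col - min(col)) / (max(col) - min(col)))
}

z  <- scale_z(X)
ma <- scale_maxabs(X)
r  <- scale_range(X)
cr <- scale_colrange(X)
```

Note the difference between "range" and "colrange": the first uses one minimum and maximum for the whole matrix, so columns on different scales stay on different scales, while the second brings every column to the same span.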
Value
Returns a list containing one data frame, one interactive plot, and one static plot.
umap_result: a data frame of UMAP data.
interactive_umap and static_umap: the interactive and static UMAP plots.
Examples
data("profiling_data")
processed_se <- data_process(profiling_data, exclude_missing=TRUE, exclude_missing_pct=70,
    replace_na_method='min', replace_na_method_ref=0.5, normalization='Percentage', transform='log10')
result_umap <- dr_umap(processed_se, n_neighbors=15, scaling=TRUE, umap_metric='euclidean',
    clustering='kmeans', cluster_num=2, kmedoids_metric=NULL,
    distfun=NULL, hclustfun=NULL, eps=NULL, minPts=NULL)
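The example above uses k-means only. As a hedged sketch of how the method-specific arguments pair with the other clustering choices, the calls below switch to "kmedoids" and "hclustering", setting only the arguments that belong to the selected method and leaving the rest NULL, as the Arguments section requires. The library(LipidSigR) line is an assumption about which package provides these functions, and the particular metric and linkage choices ('euclidean', 'pearson', 'complete') are arbitrary picks from the documented options, not recommendations.

```r
library(LipidSigR)  # assumption: the package that provides dr_umap() and data_process()

# Same preprocessing as in the example above.
data("profiling_data")
processed_se <- data_process(profiling_data, exclude_missing=TRUE, exclude_missing_pct=70,
    replace_na_method='min', replace_na_method_ref=0.5, normalization='Percentage', transform='log10')

# k-medoids: only kmedoids_metric is set; the hclustering and dbscan
# arguments stay NULL.
result_kmedoids <- dr_umap(processed_se, n_neighbors=15, scaling=TRUE, umap_metric='euclidean',
    clustering='kmedoids', cluster_num=2, kmedoids_metric='euclidean',
    distfun=NULL, hclustfun=NULL, eps=NULL, minPts=NULL)

# Hierarchical clustering: distfun and hclustfun are set; kmedoids_metric,
# eps, and minPts stay NULL.
result_hclust <- dr_umap(processed_se, n_neighbors=15, scaling=TRUE, umap_metric='euclidean',
    clustering='hclustering', cluster_num=2, kmedoids_metric=NULL,
    distfun='pearson', hclustfun='complete', eps=NULL, minPts=NULL)

# The returned list holds the three elements named in the Value section.
head(result_kmedoids$umap_result)   # data frame of UMAP data
result_hclust$static_umap           # static UMAP plot
```

The interactive plot is available in the same way through the interactive_umap element of each result.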