t-distributed stochastic neighbor embedding (t-SNE) is a nonlinear dimensionality reduction method that maps data from a high-dimensional space into a low-dimensional space while preserving local neighborhood structure, so that samples that are similar in the original space stay close together in the embedding.

dr_tsne(
  processed_se,
  pca = TRUE,
  perplexity = 5,
  max_iter = 500,
  clustering = c("kmeans", "kmedoids", "hclustering", "dbscan", "group_info"),
  cluster_num = 2,
  kmedoids_metric = NULL,
  distfun = NULL,
  hclustfun = NULL,
  eps = NULL,
  minPts = NULL
)

Arguments

processed_se

A SummarizedExperiment object constructed by as_summarized_experiment and processed by data_process.

pca

Logical. If pca=TRUE, an initial PCA step will be performed. Default is TRUE.

perplexity

Numeric. Perplexity parameter. It must satisfy 3 * perplexity < nrow(X) - 1, where nrow(X) is the number of rows of the input matrix. Default is 5.

max_iter

Integer. Number of iterations. Default is 500.

clustering

Character. The method to be used for clustering. Allowed methods include "kmeans", "kmedoids", "hclustering", "dbscan", and "group_info". Default is "kmeans".

cluster_num

Numeric. A positive integer specifying the number of clusters. The number must be between 1 and 10. Default is 2.

kmedoids_metric

Character. The metric to be used for calculating dissimilarities between observations when choosing "kmedoids" as clustering method. Must be one of "euclidean" or "manhattan". If "kmedoids" is not selected as the clustering method, set the value to NULL.

distfun

Character. The distance measure to be used when choosing "hclustering" as clustering method. Allowed methods include "pearson", "kendall", "spearman", "euclidean", "manhattan", "maximum", "canberra", "binary", and "minkowski". If "hclustering" is not selected as the clustering method, set the value to NULL.

hclustfun

Character. The agglomeration method to be used when choosing "hclustering" as clustering method. This should be (an unambiguous abbreviation of) one of "ward.D", "ward.D2", "single", "complete", "average" (=UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC), or "centroid" (= UPGMC). If "hclustering" is not selected as the clustering method, set the value to NULL.

eps

Numeric. The size of the epsilon neighborhood when choosing "dbscan" as clustering method. If "dbscan" is not selected as the clustering method, set the value to NULL.

minPts

Numeric. The minimum number of points in the eps region (for core points) when choosing "dbscan" as clustering method. If "dbscan" is not selected as the clustering method, set the value to NULL.
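
Note on perplexity: the constraint 3 * perplexity < nrow(X) - 1 ties the usable perplexity to the number of samples. The sketch below shows one way to compute the largest integer perplexity that satisfies it. The helper max_perplexity is hypothetical and not part of the package, and it assumes nrow(X) equals the number of samples being embedded.

```r
# Hypothetical helper (not part of the package): largest integer perplexity
# p such that 3 * p < n - 1, where n is the assumed number of samples.
max_perplexity <- function(n) {
  stopifnot(is.numeric(n), length(n) == 1, n > 4)
  ceiling((n - 1) / 3) - 1
}

# Check whether a chosen perplexity satisfies the constraint for n samples.
perplexity_ok <- function(perplexity, n) {
  3 * perplexity < n - 1
}

n_samples <- 23                     # sample count taken from the example log
max_perplexity(n_samples)           # upper bound for an integer perplexity
perplexity_ok(5, n_samples)         # the perplexity used in the example
```

If the check fails for a small data set, lowering perplexity is the usual remedy.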

Value

Returns a list containing one data frame, one interactive plot, and one static plot.

  1. tsne_result: a data frame of t-SNE data.

  2. interactive_tsne & static_tsne: the interactive and static t-SNE plots.

Examples

data("profiling_data")
processed_se <- data_process(profiling_data, exclude_missing=TRUE, exclude_missing_pct=70,
    replace_na_method='min', replace_na_method_ref=0.5, normalization='Percentage')
result_tsne <- dr_tsne(processed_se, pca=TRUE, perplexity=5, max_iter=500, clustering='kmeans',
    cluster_num=2, kmedoids_metric=NULL, distfun=NULL, hclustfun=NULL, eps=NULL, minPts=NULL)
#> Performing PCA
#> Read the 23 x 23 data matrix successfully!
#> OpenMP is working. 1 threads.
#> Using no_dims = 2, perplexity = 5.000000, and theta = 0.000000
#> Computing input similarities...
#> Symmetrizing...
#> Done in 0.03 seconds!
#> Learning embedding...
#> Iteration 50: error is 51.511642 (50 iterations in 0.00 seconds)
#> Iteration 100: error is 56.169578 (50 iterations in 0.00 seconds)
#> Iteration 150: error is 55.342230 (50 iterations in 0.00 seconds)
#> Iteration 200: error is 61.025258 (50 iterations in 0.00 seconds)
#> Iteration 250: error is 53.069303 (50 iterations in 0.00 seconds)
#> Iteration 300: error is 1.599018 (50 iterations in 0.00 seconds)
#> Iteration 350: error is 1.001601 (50 iterations in 0.00 seconds)
#> Iteration 400: error is 0.651798 (50 iterations in 0.00 seconds)
#> Iteration 450: error is 0.279978 (50 iterations in 0.00 seconds)
#> Iteration 500: error is 0.238895 (50 iterations in 0.00 seconds)
#> Fitting performed in 0.00 seconds.
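
The example above uses k-means. As a sketch of how the method-specific arguments pair with the other clustering options, the calls below switch to hierarchical clustering and DBSCAN. They assume processed_se from the example above already exists. The distfun and hclustfun values are picked from the allowed lists in Arguments, the eps and minPts values are illustrative rather than tuned, and keeping cluster_num = 2 for DBSCAN is an assumption.

```r
# Continues from the example above: `processed_se` is assumed to exist.

# Hierarchical clustering: only distfun and hclustfun are set;
# the k-medoids and DBSCAN arguments stay NULL.
result_hclust <- dr_tsne(processed_se, pca=TRUE, perplexity=5, max_iter=500,
    clustering='hclustering', cluster_num=2, kmedoids_metric=NULL,
    distfun='pearson', hclustfun='complete', eps=NULL, minPts=NULL)

# DBSCAN: only eps and minPts are set (illustrative values, not tuned).
result_dbscan <- dr_tsne(processed_se, pca=TRUE, perplexity=5, max_iter=500,
    clustering='dbscan', cluster_num=2, kmedoids_metric=NULL,
    distfun=NULL, hclustfun=NULL, eps=0.5, minPts=1)

# The returned list elements are named as in the Value section.
head(result_hclust$tsne_result)   # data frame of t-SNE data
result_hclust$static_tsne         # static plot
result_hclust$interactive_tsne    # interactive plot
```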