dr_pca • LipidSigR

Principal Component Analysis (PCA) is a dimensionality reduction method that transforms data from a high-dimensional space into a low-dimensional space while retaining the original data's essential properties. This function calculates PCA using the classical prcomp function and visualizes the results.
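As a rough illustration of what a prcomp-based PCA produces, the sketch below runs prcomp on a small random matrix. The toy data, the samples-in-rows orientation, and all variable names are assumptions made for this example; it does not reproduce the internals of dr_pca.

```r
# Toy data (assumption): 8 samples in rows, 6 lipid features in columns.
set.seed(123)
abund <- matrix(
  rnorm(8 * 6), nrow = 8,
  dimnames = list(paste0("sample_", 1:8), paste0("lipid_", 1:6))
)

# Classical PCA with centering and unit-variance scaling.
pca <- prcomp(abund, center = TRUE, scale. = TRUE)

# Sample coordinates in the rotated (principal component) space.
scores <- pca$x

# Proportion of total variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
```

The scores matrix is the kind of data a PCA plot is drawn from, and var_explained is what a scree plot typically displays.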

Usage

dr_pca(
  processed_se,
  scaling = TRUE,
  centering = TRUE,
  clustering = c("kmeans", "kmedoids", "hclustering", "dbscan", "group_info"),
  cluster_num = 2,
  kmedoids_metric = NULL,
  distfun = NULL,
  hclustfun = NULL,
  eps = NULL,
  minPts = NULL,
  feature_contrib_pc = c(1, 2),
  plot_topN = 10
)

Arguments

processed_se

A SummarizedExperiment object constructed by as_summarized_experiment and processed by data_process. (NOTE: A SummarizedExperiment object generated by deSp_twoGroup, deChar_twoGroup, deSp_multiGroup, or deChar_multiGroup is also allowed.)

scaling

Logical. If scaling=TRUE, each variable is standardized to zero mean and unit variance. Default is TRUE.

centering

Logical. If centering=TRUE, the variables are shifted to be zero-centered. Alternatively, a vector of length equal to the number of columns of the data matrix can be supplied. The value is passed to scale. Default is TRUE.
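To make the effect of centering and scaling concrete, the following sketch checks that prcomp with center = TRUE and scale. = TRUE gives the same result as standardizing the columns by hand and running prcomp without either option. The toy matrix and variable names are assumptions for illustration only.

```r
# Toy data (assumption): 10 observations of 4 variables on different scales.
set.seed(7)
x <- matrix(rnorm(40), nrow = 10) %*% diag(c(1, 10, 100, 1000))

# PCA with built-in centering and scaling.
p_builtin <- prcomp(x, center = TRUE, scale. = TRUE)

# Manual standardization: subtract column means, divide by column SDs.
x_centered <- sweep(x, 2, colMeans(x), "-")
x_std <- sweep(x_centered, 2, apply(x, 2, sd), "/")

# PCA on the pre-standardized data, with both options switched off.
p_manual <- prcomp(x_std, center = FALSE, scale. = FALSE)

# Both approaches yield the same component standard deviations.
same_sdev <- isTRUE(all.equal(p_builtin$sdev, p_manual$sdev))
```

Without scaling, the variable with the largest numeric range would dominate the first component, which is why scaling = TRUE is a sensible default for features measured on different scales.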

clustering

Character. The method to be used for clustering. Allowed methods are "kmeans", "kmedoids", "hclustering", "dbscan", and "group_info". Default is "kmeans".

cluster_num

Numeric. The interpretation of cluster_num depends on the value of clustering:

"group_info": A positive integer equal to the number of groups.
"kmeans" or "kmedoids": A positive integer between 1 and (number of samples - 1).
"hclustering": A positive integer between 1 and the number of samples.
"dbscan": Should be NULL.

Default is 2.
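The rules above can be summarized as a small validation routine. The helper below is hypothetical and not part of LipidSigR; its name and its n_samples and n_groups arguments are assumptions introduced only to restate the documented constraints in code.

```r
# Hypothetical helper (not a LipidSigR function): returns TRUE when
# cluster_num satisfies the documented rule for the chosen method.
check_cluster_num <- function(clustering, cluster_num, n_samples, n_groups) {
  # "dbscan" expects cluster_num to be NULL.
  if (clustering == "dbscan") {
    return(is.null(cluster_num))
  }
  # Every other method needs a single positive whole number.
  is_pos_int <- is.numeric(cluster_num) && length(cluster_num) == 1 &&
    !is.na(cluster_num) && cluster_num >= 1 &&
    cluster_num == round(cluster_num)
  if (!is_pos_int) {
    return(FALSE)
  }
  switch(clustering,
    group_info  = cluster_num == n_groups,       # must equal the number of groups
    kmeans      = ,                              # falls through to kmedoids
    kmedoids    = cluster_num <= n_samples - 1,  # 1 .. (number of samples - 1)
    hclustering = cluster_num <= n_samples,      # 1 .. number of samples
    stop("unknown clustering method: ", clustering)
  )
}

# Example: 10 samples split into 2 groups.
ok_kmeans <- check_cluster_num("kmeans", 2, n_samples = 10, n_groups = 2)
ok_dbscan <- check_cluster_num("dbscan", NULL, n_samples = 10, n_groups = 2)
```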

kmedoids_metric

Character. The metric to be used for calculating dissimilarities between observations when choosing "kmedoids" as the clustering method. Must be either "euclidean" or "manhattan". If "kmedoids" is not selected as the clustering method, set the value to NULL.

distfun

Character. The distance measure to be used when choosing "hclustering" as the clustering method. Allowed methods include "pearson", "kendall", "spearman", "euclidean", "manhattan", "maximum", "canberra", "binary", and "minkowski". If "hclustering" is not selected as the clustering method, set the value to NULL.

hclustfun

Character. The agglomeration method to be used when choosing "hclustering" as the clustering method. This should be (an unambiguous abbreviation of) one of "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC), or "centroid" (= UPGMC). If "hclustering" is not selected as the clustering method, set the value to NULL.
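To show how a distance measure and an agglomeration method combine in hierarchical clustering, here is a minimal base R sketch. The make_dist helper, the toy matrix, and the choice of 1 minus correlation as the distance for "pearson", "kendall", and "spearman" are assumptions for illustration; dr_pca may compute these differently.

```r
# Toy data (assumption): 10 samples described by 4 numeric variables.
set.seed(1)
m <- matrix(
  rnorm(40), nrow = 10,
  dimnames = list(paste0("S", 1:10), paste0("V", 1:4))
)

# Hypothetical helper: build a distance object for a given distfun value.
make_dist <- function(m, distfun) {
  if (distfun %in% c("pearson", "kendall", "spearman")) {
    # Correlation between samples (rows), turned into a dissimilarity.
    as.dist(1 - cor(t(m), method = distfun))
  } else {
    # Measures handled directly by stats::dist.
    dist(m, method = distfun)
  }
}

# Build the tree with the chosen agglomeration method, then cut it
# into a fixed number of clusters (the role played by cluster_num).
tree <- hclust(make_dist(m, "euclidean"), method = "complete")
clusters <- cutree(tree, k = 2)

# The same workflow with a correlation-based distance.
tree_cor <- hclust(make_dist(m, "pearson"), method = "average")
clusters_cor <- cutree(tree_cor, k = 3)
```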

eps

Numeric. The size of the epsilon neighborhood when choosing "dbscan" as the clustering method. If "dbscan" is not selected as the clustering method, set the value to NULL.

minPts

Numeric. The minimum number of points in the eps region (for core points) when choosing "dbscan" as the clustering method. If "dbscan" is not selected as the clustering method, set the value to NULL.

feature_contrib_pc

Numeric. The principal component(s) of interest for the feature contribution plot. Default is c(1, 2).

plot_topN

Numeric. The number of top features to be shown. Default is 10.

Value

Returns a list with 2 tables, 4 interactive plots, and 4 static plots.

pca_rotated_data: a data frame of PCA rotated data.
table_pca_contribution: a data frame containing the PCA contribution table.
interactive_pca & static_pca: PCA plot.
interactive_screePlot & static_screePlot: scree plot of the top n principal components.
interactive_feature_contribution & static_feature_contribution: plot of the contribution of the top N features to the user-defined principal components.
interactive_variablePlot & static_variablePlot: correlation circle plot (factor map) of PCA variables.
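For readers unfamiliar with feature contributions, the sketch below computes them under one common convention: the contribution of a feature to a component is its squared loading expressed as a percentage, so the contributions to each component sum to 100. Whether dr_pca uses exactly this definition is an assumption, and the toy data and variable names are illustrative only.

```r
# Toy data (assumption): 12 samples, 5 lipid features.
set.seed(42)
x <- matrix(
  rnorm(60), nrow = 12,
  dimnames = list(NULL, paste0("lipid_", 1:5))
)
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Loadings have unit length per component, so squared loadings times 100
# give each feature's percentage contribution to that component.
contrib <- 100 * pca$rotation^2

# Top features for the first component (analogous in spirit to
# feature_contrib_pc = 1 and plot_topN = 3).
top_n <- 3
top_pc1 <- head(sort(contrib[, "PC1"], decreasing = TRUE), top_n)
```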

Examples

data("profiling_data")
processed_se <- data_process(
  profiling_data, exclude_missing=TRUE, exclude_missing_pct=70,
  replace_na_method='min', replace_na_method_ref=0.5,
  normalization='Percentage', transform='log10')
result_pca <- dr_pca(
  processed_se, scaling=TRUE, centering=TRUE,
  clustering='kmeans', cluster_num=2, kmedoids_metric=NULL, distfun=NULL,
  hclustfun=NULL, eps=NULL, minPts=NULL, feature_contrib_pc=c(1,2), plot_topN=10)