This function constructs a machine learning model. The returned object can be used as input for plotting and further analyses.

ml_model(
  processed_se,
  char = "none",
  transform = c("none", "log10", "square", "cube"),
  ranking_method = c("p_value", "pvalue_FC", "ROC", "Random_forest", "SVM", "Lasso",
    "Ridge", "ElasticNet"),
  ml_method = c("Random_forest", "SVM", "Lasso", "Ridge", "ElasticNet", "xgboost"),
  split_prop = 0.3,
  nfold = 10,
  alpha = NULL
)

Arguments

processed_se

A SummarizedExperiment object constructed by as_summarized_experiment and processed by data_process.

char

Character vector. Lipid characteristics selected from the ml_char list returned by list_lipid_char. Select 'none' to exclude all lipid characteristics.

transform

Character. Method for data transformation. Allowed methods include "none", "log10", "square", "cube". Select 'none' to skip data transformation. Default is 'log10'.

ranking_method

Character. The feature ranking method to use. Allowed methods include 'p_value', 'pvalue_FC', 'ROC', 'Random_forest', 'SVM', 'Lasso', 'Ridge', 'ElasticNet'. Default is 'Random_forest'.

ml_method

Character. The machine learning method to use. Allowed methods include 'Random_forest', 'SVM', 'Lasso', 'Ridge', 'ElasticNet', 'xgboost'. Default is 'Random_forest'.

split_prop

Numeric. The proportion of data to be retained for modeling/analysis. Must be between 0.1 and 0.5. Default is 0.3.

nfold

Numeric. The number of folds, i.e., the number of equal-sized subsamples into which the original dataset is randomly partitioned for cross-validation. Must be a positive integer. Default is 10.
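To illustrate what nfold controls, here is a minimal base-R sketch of random k-fold partitioning. assign_folds() is a hypothetical helper, not part of this package, and ml_model() may partition the data differently internally (for example, stratified by group).

```r
# Hypothetical helper (not part of the package): randomly assign each of
# n samples to one of nfold folds of (nearly) equal size.
assign_folds <- function(n, nfold = 10, seed = NULL) {
  stopifnot(nfold >= 1, nfold == round(nfold), nfold <= n)
  if (!is.null(seed)) set.seed(seed)
  # Repeat the fold labels 1..nfold up to length n, then shuffle them
  sample(rep_len(seq_len(nfold), n))
}

folds <- assign_folds(n = 23, nfold = 5, seed = 1)
table(folds)  # each fold holds 4 or 5 samples

# In round k, the samples in fold k are held out and the rest are used to fit
for (k in seq_len(5)) {
  held_out <- which(folds == k)  # validation samples for round k
  training <- which(folds != k)  # samples used to fit the model
  cat(sprintf("fold %d: %d held out, %d for training\n",
              k, length(held_out), length(training)))
}
```

Every sample is held out exactly once across the nfold rounds, which is why larger nfold values mean more model fits but larger training sets in each round.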

alpha

Numeric. The elastic-net mixing parameter, a value between 0 and 1, used when ml_method is "ElasticNet". A value of 0 corresponds to Ridge and 1 to Lasso. If ml_method is not "ElasticNet", set alpha to NULL.
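For reference, in the standard elastic-net parameterization (the one used by the glmnet package), alpha mixes the L2 and L1 penalties on the coefficients as shown below. Whether ml_model() passes alpha to glmnet in exactly this form is an assumption; the formula only illustrates why 0 and 1 recover Ridge and Lasso.

```latex
P_{\alpha}(\beta) = \lambda \left[ \frac{1-\alpha}{2} \lVert \beta \rVert_2^2 + \alpha \lVert \beta \rVert_1 \right], \qquad 0 \le \alpha \le 1
```

Setting alpha = 0 leaves only the squared L2 term (Ridge), and alpha = 1 leaves only the L1 term (Lasso); intermediate values blend the two.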

Value

Returns a SummarizedExperiment object containing the analysis results.

Examples

data("ml_sub")
processed_se <- data_process(
    ml_sub, exclude_missing=TRUE, exclude_missing_pct=70,
    replace_na_method='min', replace_na_method_ref=0.5,
    normalization='Percentage')
# list_lipid_char() returns the ml_char list of characteristics valid for `char`
char_list <- list_lipid_char(processed_se)
ml_se <- ml_model(
    processed_se, char=c("class","Total.DB"), transform='log10',
    ranking_method='Random_forest', ml_method='Random_forest', split_prop=0.3,
    nfold=5, alpha=NULL)
#> Processing CV fold 1
#> Processing CV fold 2
#> Processing CV fold 3
#> Processing CV fold 4
#> Processing CV fold 5