FAQ 1: What is LipidSig? What differs between LipidSig v1 and v2?
FAQ 2: What is the workflow of LipidSig?
FAQ 3: How do I begin using LipidSig?
FAQ 4: Does LipidSig provide a demo dataset?
FAQ 5: How do I prepare my dataset for analysis?
FAQ 6: How can I obtain the required input for network and enrichment analysis?
FAQ 7: How do I upload my dataset?
FAQ 8: What kinds of lipidome-specific analyses does LipidSig provide?
FAQ 9: When should I use these analyses? What are the differences between them?
Results from "Differential Expression" are further analyzed using the "GATOM network", which isolates significant subnetworks within a constructed metabolite-level network. The "Pathway activity network" computes flux changes in the lipid reaction network, facilitating the identification of active or suppressed pathways. Lastly, the "Lipid reaction network" graphically represents significant lipid classes/species within lipid biosynthesis pathways.
FAQ 10: Can I download these figures? How do I manipulate interactive figures in LipidSig?
FAQ 11: How is ID conversion performed? How should I interpret the lipid characteristics derived from it?
Aspect | Characteristic | Description | Source | Differential expression analysis | Enrichment analysis |
---|---|---|---|---|---|
Lipid classification | Class | Lipid class abbreviation. | Goslin | v | v |
Fatty acid properties | Total.FA | Specification of total fatty acid chains. | Goslin | v | v |
Fatty acid properties | Total.C | Total number of carbons across all fatty acid chains. | Goslin | v | v |
Fatty acid properties | Total.DB | Total number of double bonds. | Goslin | v | v |
Fatty acid properties | Total.OH | Total number of hydroxyl groups. | Goslin | v | v |
Fatty acid properties | FA | Specification of fatty acid chains. | Goslin | v | v |
Fatty acid properties | FA.C | The number of carbons in each specific chain. | Goslin | v | v |
Fatty acid properties | FA.DB | The count of double bonds in each specific chain. | Goslin | v | v |
Fatty acid properties | FA.OH | The count of hydroxyl groups in each specific chain. | Goslin | v | v |
Lipid classification | Category | Category in the LIPIDMAPS classification system. | LION | v | v |
Lipid classification | Main.Class | Main class in the LIPIDMAPS classification system. | LION | v | v |
Lipid classification | Sub.Class | Sub class in the LIPIDMAPS classification system. | LION | v | v |
Cellular component | Cellular.Component | The predominant organellar localization of lipids. | LION | v | v |
Function | Function | The biological functions of lipids. | LION | v | v |
Physical or chemical properties | Bond.type | The types of chemical bonds in lipids. | LION | v | v |
Physical or chemical properties | Headgroup.Charge | The net electric charge of the lipid head group. | LION | v | v |
Physical or chemical properties | Lateral.Diffusion | The lateral movement of lipids. | LION | v | v |
Physical or chemical properties | Bilayer.Thickness | The thickness of a lipid bilayer. | LION | v | v |
Physical or chemical properties | Intrinsic.Curvature | The intrinsic or spontaneous curvature of lipids. | LION | v | v |
Physical or chemical properties | Transition.Temperature | The temperature at which a lipid undergoes a phase transition from a more ordered state (gel phase) to a more disordered state (liquid-crystalline phase), or vice versa. | LION | v | v |
Fatty acid properties | FA.Unsaturation.Category1 | Fatty acids with <2 double bonds or >=2 double bonds. | LION | v | v |
Fatty acid properties | FA.Unsaturation.Category2 | Fatty acids with 0, 1, 2, >2 or >5 double bonds. | LION | v | v |
Fatty acid properties | FA.Chain.Length.Category1 | Fatty acids with <=18 carbons or >18 carbons. | LION | v | v |
Fatty acid properties | FA.Chain.Length.Category2 | Fatty acids with <13, 13~15, 16~18, 19~21, 22~24 or >24 carbons. | LION | v | v |
Fatty acid properties | FA.Chain.Length.Category3 | Fatty acids with 2~6 (short-chain), 7~12 (medium-chain), 13~21 (long-chain) or >21 (very long-chain) carbons. | J Biochem. 2012 Nov;152(5):387-95., J Lipid Res. 2016 Jun;57(6):943-54. | v | v |
Specific ratios | Chains Ether/Ester linked ratio | Assessing the frequency of ether-linked chains within different lipid classes. | LipidOne | v | |
Specific ratios | Chains odd/even ratio | Evaluating the odd/even chain ratio across various lipid classes. | LipidOne | v | |
Specific ratios | Ratio of Lysophospholipids to Phospholipids | The ratio of lysophospholipids to phospholipids, encompassing both total phospholipids and each individual phospholipid. | J Clin Invest. 2021 Apr 15;131(8):e135963. | v | |
Specific ratios | Ratio of specific lipid class A to lipid class B | The ratio of PC/PE and TG/DG. | Dev Cell. 2018 May 21;45(4):481-495.e8., Hepatology. 2007 Oct;46(4):1081-90. | v | |
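
To make the characteristics in the table above more concrete, the following is a minimal sketch of how a few of them could be derived from a lipid name. It is not LipidSig's or Goslin's implementation: the parser is a simplified, hypothetical one that assumes names of the form `CLASS chain_chain` (for example `PE 16:0_18:1`), and the function names `parse_lipid`, `characteristics`, `chain_length_category3` and `class_ratio` are illustrative only. The ratio helper assumes a class-level ratio is computed from summed abundances, which the table does not specify.

```python
import re

# One chain token such as '18:1' or '18:1;O2' (carbons:double bonds, optional hydroxyls).
CHAIN = re.compile(r"^(\d+):(\d+)(?:;O(\d*))?$")


def parse_lipid(name):
    """Split a name like 'PE 16:0_18:1' into its class and per-chain counts."""
    lipid_class, _, chain_part = name.strip().partition(" ")
    chains = []
    for token in re.split(r"[_/]", chain_part):
        m = CHAIN.match(token)
        if m is None:
            raise ValueError(f"unsupported chain notation: {token!r}")
        carbons, double_bonds = int(m.group(1)), int(m.group(2))
        # No suffix means no hydroxyl group, ';O' means one, ';O2' means two.
        if m.group(3) is None:
            hydroxyls = 0
        else:
            hydroxyls = int(m.group(3)) if m.group(3) else 1
        chains.append((carbons, double_bonds, hydroxyls))
    return lipid_class, chains


def chain_length_category3(carbons):
    """Bin a chain by the FA.Chain.Length.Category3 ranges from the table."""
    if carbons < 2:
        raise ValueError("chain length below the ranges listed in the table")
    if carbons <= 6:
        return "short-chain"
    if carbons <= 12:
        return "medium-chain"
    if carbons <= 21:
        return "long-chain"
    return "very long-chain"


def characteristics(name):
    """Collect a subset of the table's characteristics for one lipid."""
    lipid_class, chains = parse_lipid(name)
    return {
        "Class": lipid_class,
        "Total.C": sum(c for c, _, _ in chains),
        "Total.DB": sum(d for _, d, _ in chains),
        "Total.OH": sum(o for _, _, o in chains),
        "FA.C": [c for c, _, _ in chains],
        "FA.DB": [d for _, d, _ in chains],
        "FA.Unsaturation.Category1": ["<2" if d < 2 else ">=2" for _, d, _ in chains],
        "FA.Chain.Length.Category3": [chain_length_category3(c) for c, _, _ in chains],
    }


def class_ratio(abundance, numerator, denominator):
    """Ratio of summed abundances of two lipid classes (e.g. PC/PE) in one sample."""
    totals = {}
    for name, value in abundance.items():
        lipid_class = parse_lipid(name)[0]
        totals[lipid_class] = totals.get(lipid_class, 0.0) + value
    return totals[numerator] / totals[denominator]


if __name__ == "__main__":
    print(characteristics("PE 16:0_18:1"))
    sample = {"PC 16:0_18:1": 6.0, "PC 18:0_20:4": 2.0, "PE 16:0_18:1": 4.0}
    print(class_ratio(sample, "PC", "PE"))
```

Names written at the sum-composition level (for example `PC 34:1`) carry only one token, so in this sketch only the `Total.*` values are meaningful for them; the per-chain entries would simply repeat the totals.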
FAQ 12: What formats should lipid names (features) adhere to for analysis?
FAQ 13: How do I choose the appropriate data processing method? What does each method do?
Aspect | Methods | Usage | Ref. |
---|---|---|---|
Missing values | Remove features with > 70% missing values (default threshold) | When the integrity of the dataset is paramount, removing features with a high proportion of missing values ensures that the analysis is based on more complete and reliable data. This step reduces the potential noise and bias that could be introduced by imputing a large amount of missing data for those features. For statistical analyses that require complete or near-complete data, such as multivariate analysis, correlation studies, or regression modeling, removing features with excessive missingness helps maintain the analysis's robustness. It prevents the undue influence of imputed values, which might not accurately represent the actual biological variation. | |
Missing values | Replace with half of the minimum positive values in the original data (default threshold) | For lipid species that are known to be present in the sample but are at concentrations too low for accurate quantification, this imputation method allows for their inclusion in the analysis without artificially inflating their estimated concentrations. This can be particularly important in studies focusing on trace lipids or in comparative analyses where the presence/absence of certain lipids may be significant. | (1) |
Missing values | Replace by mean | When the lipid concentrations across samples are approximately normally distributed without significant skewness or outliers, replacing missing values with the mean provides a reasonable estimate that maintains the overall distributional characteristics of the data. | (1) |
Missing values | Replace by median | In lipidomics datasets where the distribution of lipid concentrations is skewed or non-normal, replacing missing values with the median is preferred over the mean. The median is less sensitive to outliers and extreme values, providing a more representative central tendency measure for skewed data. | (1) |
Missing values | Quantile Regression Imputation of Left-Censored data (QRILC) | QRILC is a statistical method designed to handle left-censored data, where observations fall below the instrument's detection limit or sensitivity threshold. This approach helps to provide a more accurate statistical analysis of the entire dataset by preserving its distributional characteristics and dealing with extreme values and asymmetric distributions. | (2) |
Missing values | Singular Value Decomposition | When the objective is to uncover and utilize latent patterns within the lipidomics data for imputation, SVD provides a mathematical framework. SVD can capture these relationships in its decomposition, enabling a more nuanced approach to imputation that considers the multivariate structure of the data. | (1) |
Missing values | K-Nearest Neighbours | When the dataset has a moderate to low level of missing data, KNN imputation can be effectively applied. The method requires enough complete or nearly complete cases to find neighbors with similar profiles for accurate imputation. KNN is particularly effective when missing values are randomly distributed across the dataset. The method relies on the premise that the pattern of missingness is not systematic but random. | (1) |
Missing values | IRMI: The Iterative Robust Model-based Imputation | IRMI is designed to be an efficient and robust approach for imputing missing values, especially when dealing with datasets with outliers or irregular distributions. It is particularly useful in complex datasets where missing data are not random and might depend on other variables in the dataset. | (3) |
Missing values | Probabilistic principal component analysis | PPCA can effectively handle dimensionality, providing a principled way to estimate missing values by capturing the major trends and patterns in the data. It models the underlying structure of the data, using a probabilistic framework to estimate missing values, making it well-suited for datasets where the relationships among variables are not straightforward. | (1) |
Missing values | Principal Component Analysis | PCA-based imputation is most suitable for datasets where the level of missingness is not overwhelmingly high. This approach assumes that the observed data can represent the dataset's underlying structure well enough to predict missing values accurately. PCA for missing value imputation is most effective when missing values occur randomly. | (1) |
Sample normalization | Percentage | In experiments where variations in sample volume, concentration, or extraction efficiency are expected, percentage normalization compensates for these variations. By converting lipid concentrations to percentages, the method ensures that the analysis reflects differences in lipid composition, independent of the total amount of lipids extracted. | |
Sample normalization | Perform Probabilistic Quotient Normalization (PQN) | PQN is designed to correct for multiplicative (i.e., proportional) variations across samples, such as those due to differences in dilution, concentration, or volume. It is not intended to correct additive errors or biases. | (4) |
Sample normalization | Normalization by sum | When the objective is to compare lipid profiles across various samples, normalization by sum ensures that differences in lipid concentrations are not merely due to variations in the total lipid content of the samples. By normalizing to the total sum, each sample is adjusted to have the same total lipid content, allowing for meaningful comparisons of the relative abundances of individual lipid species. In studies where lipidomics data are collected in multiple batches or from different platforms, resulting in systematic variations, normalization by sum can help mitigate these batch effects by standardizing the total lipid content across all batches. This is crucial for integrating data cohesively and ensuring that observed differences are due to biological variability rather than technical discrepancies. | (1) |
Sample normalization | Normalization by median | When the distribution of lipid concentrations across samples is skewed, normalization by median can help mitigate the influence of extreme values or outliers. The median provides a more robust central tendency measure, making it suitable for datasets with skewed distributions. | (1) |
Sample normalization | Quantile normalization (suggested only for > 1000 features) | Quantile normalization is utilized when systematic biases across samples that may arise from differences in sample preparation, handling, or measurement technologies need to be corrected. This normalization ensures that the overall distribution of lipid concentrations is consistent across samples, making them directly comparable. In studies comparing lipid profiles across different conditions, treatments, or time points, quantile normalization helps minimize the impact of technical variability. | (1) |
Sample normalization | Mean centering | Mean centering is used when the objective is to emphasize the variability of each lipid species relative to the average behavior within the dataset. | (1) |
Sample normalization | Auto scaling (mean-centered and divided by the standard deviation of each variable) | Auto-scaling is applied when variables within the dataset span different units of measurement or magnitudes, making them incomparable. By standardizing, all variables are brought to a common scale with a mean of 0 and a standard deviation of 1, facilitating direct comparisons across variables. | (1) |
Sample normalization | Pareto scaling (mean-centered and divided by the square root of the standard deviation of each variable) | Pareto scaling is a preprocessing technique used in lipidomics data analysis under specific conditions that require a balance between retaining the data's original structure and reducing the influence of high-magnitude variables. It is chosen when an intermediate scaling level is needed between no scaling and auto-scaling. | (1) |
Sample normalization | Range scaling (mean-centered and divided by the range of each variable) | When the analysis involves comparing measurements across different lipid species that may have vastly different ranges of concentrations, range scaling ensures that all variables contribute equally to the analysis. When the goal is to highlight relative changes across samples rather than absolute concentrations, range scaling allows for identifying patterns that are not biased by the absolute magnitude of the measurements. | (1) |
Sample normalization | Variable stability (VAST) scaling | The technique is most applicable when the data exhibit heteroscedasticity, meaning the variance differs across the range of data. VAST scaling is ideal for studies that involve comparisons across multiple groups or conditions, as it ensures that the variance within features does not unduly influence the results. | (5) |
Sample normalization | Level scaling | It aims to adjust the measurements to be on a similar scale or magnitude. In datasets where some features have much higher magnitudes or values than others, scaling ensures that all features have an equal opportunity to influence the analysis, thereby preventing the analysis from being skewed towards high-magnitude features. | (5) |
Data transformation | Log10 transformation | This transformation is particularly beneficial for lipidomics data, which often exhibit non-normal (skewed) distributions due to the wide range of concentrations across different lipid species. It helps normalize the data, making it more suitable for statistical analyses that assume normality. | (1) |
Data transformation | Square root transformation | The transformation is less drastic than a log transformation, making it suitable for datasets where the dynamic range is not extremely wide but still presents heteroscedasticity. | (1) |
Data transformation | Cube root transformation | For lipidomics data exhibiting slight to moderate skewness, the cube root transformation can effectively normalize distributions, making it a preferable choice when data do not require the more substantial impact of log or square root transformations. | (1) |
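
As a rough illustration of several methods in the table above, the sketch below implements the missing-value filter, half-minimum imputation, percentage normalization, log10 transformation, and auto, Pareto and range scaling using only the Python standard library. It is not LipidSig's code (LipidSig runs on the R packages listed in FAQ 14). The data layout (a dict mapping each feature to a list of per-sample values, with `None` for missing), the function names, and the choice of the global rather than per-feature minimum for imputation are all assumptions made for this example.

```python
import math
import statistics


def drop_sparse_features(data, max_missing=0.7):
    """Keep features whose fraction of missing values does not exceed max_missing."""
    return {
        feature: values
        for feature, values in data.items()
        if sum(v is None for v in values) / len(values) <= max_missing
    }


def impute_half_min(data):
    """Replace missing values with half of the smallest positive value in the data."""
    positives = [v for values in data.values() for v in values if v is not None and v > 0]
    fill = min(positives) / 2
    return {f: [fill if v is None else v for v in values] for f, values in data.items()}


def normalize_percentage(data):
    """Express each value as a percentage of its sample (column) total."""
    n_samples = len(next(iter(data.values())))
    totals = [sum(values[j] for values in data.values()) for j in range(n_samples)]
    return {f: [100 * v / totals[j] for j, v in enumerate(values)] for f, values in data.items()}


def log10_transform(values):
    """Log10-transform strictly positive values."""
    return [math.log10(v) for v in values]


def scale(values, method="auto"):
    """Mean-center one feature, then divide by the chosen scaling factor."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    factor = {
        "auto": sd,                           # divide by the standard deviation
        "pareto": math.sqrt(sd),              # divide by its square root
        "range": max(values) - min(values),   # divide by the value range
    }[method]
    return [(v - mean) / factor for v in values]


if __name__ == "__main__":
    raw = {
        "PC 16:0_18:1": [10.0, 20.0, 30.0, 40.0],
        "PE 16:0_18:1": [4.0, None, 8.0, 2.0],
        "TG 16:0_18:1_18:2": [None, None, None, 6.0],  # 75% missing, so it is dropped
    }
    filtered = drop_sparse_features(raw)
    imputed = impute_half_min(filtered)
    percent = normalize_percentage(imputed)
    for feature, values in percent.items():
        print(feature, scale(log10_transform(values), method="pareto"))
```

The order used here (filter, impute, normalize, transform, scale) is one common arrangement and is an assumption of the example. Imputation comes before the log transformation because a log cannot be taken of a missing or zero value.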
FAQ 14: Which versions of the R packages are used in LipidSig 2.0?
Package | Version |
---|---|
base | 4.2.3 |
dplyr | 1.1.3 |
tibble | 3.2.1 |
purrr | 1.0.2 |
shiny | 1.7.5 |
gatom | 0.99.3 |
scales | 1.2.1 |
visNetwork | 2.1.2 |
igraph | 1.5.1 |
data.table | 1.14.8 |
tidyr | 1.3.0 |
ggplot2 | 3.4.3 |
ggthemes | 4.2.4 |
ggpubr | 0.6.0 |
hwordcloud | 0.1.0 |
wordcloud | 2.6 |
broom | 1.0.5 |
stats | 4.2.3 |
stringr | 1.5.0 |
plotly | 4.10.2 |
rstatix | 0.7.2 |
S4Vectors | 0.36.2 |
grDevices | 4.2.3 |
iheatmapr | 0.7.0 |
SummarizedExperiment | 1.28.0 |
readxl | 1.4.3 |
dbscan | 1.1.11 |
factoextra | 1.0.7 |
Rtsne | 0.16 |
uwot | 0.1.16 |
mixOmics | 6.22.0 |
tidyselect | 1.2.0 |
Hmisc | 5.1.1 |
pcaMethods | 1.90.0 |
preprocessCore | 1.60.2 |
imputeLCMD | 2.1 |
fgsea | 1.24.0 |
reshape2 | 1.4.4 |