
4. Outputs

MikiSchikora edited this page Dec 21, 2023 · 16 revisions

Fitness measurements

Q-PHAST calculates different fitness and susceptibility measurements. First, our pipeline uses the time-vs-growth curve to infer fitness for each spot in each drug concentration, using the QFA package. Note that for one strain we may have multiple spots (technical replicates). There are two types of fitness estimates:

  • Model-based fitness estimates: estimated by fitting a generalised logistic model to the time-vs-growth curve. The model parameters yield different fitness estimates. These estimates can be useful if some spots did not reach stationary phase (e.g. to predict maximum growth) or if we have mixed samples with different growth times. They perform poorly for slow-growing spots or non-logistic curves (which can arise, for example, from cell death after reaching stationary phase). K, r, g, v, MDR, MDP, DT, AUC, MDRMDP and rsquare (see below) are related to this model fitting.

  • Non-parametric (or numerical) fitness estimates: these are calculated directly from the data, without assuming any underlying growth model. We generally use these (nAUC and DT_h) when all experiments have the same growth times. nAUC, nr, nr_t, maxslp, maxslp_t, DT_h, nSTP and DT_h_goodR2 (see below) are non-parametric measurements.

These are the relevant fitness estimates (check the qfa manual for more information):

  • K, r, g and v are the parameters of a generalised logistic model that is fit to the data. K (maximum predicted growth) and r (predicted growth rate) are fitness estimates that may be used.

  • MDR (Maximum Doubling Rate), MDP (Maximum Doubling Potential), DT (Doubling Time estimated from the model fit at t=0), AUC (Area Under the Curve of the fitted time-vs-growth model) and MDRMDP (Addinall et al. style fitness) are several fitness estimates calculated from the model fit.

  • rsquare is the coefficient of determination between the model fit and the data. You can use it to identify curves with a good model fit (e.g. rsquare > 0.95).

  • nr is a numerical estimate of the intrinsic growth rate. It is estimated by fitting a smoothing function to the log-transformed data, calculating numerical slope estimates across the range of the data and selecting the maximum estimate (which should occur during the exponential phase). nr_t is the time at which nr occurs, so it can be used as an estimator of the lag phase time.

  • maxslp is a numerical estimate of the maximum slope of the growth curve, and maxslp_t is the time at which this maximum slope of the observations occurs. maxslp_t can also be used to estimate the lag phase.

  • nAUC is the numerical Area Under the Curve. This is a model-free fitness estimate, calculated directly from the data as the AUC of the time-vs-growth curve between t=0 and t=hours_experiment (24h by default). This is our preferred fitness estimate.

  • nSTP is the numerical Single-Timepoint growth estimate, calculated at t=hours_experiment (24h by default).

  • DT_h is a numerical estimate of the maximum doubling time, in hours. DT_h_goodR2 is the same value, but only for spots with a good model fit (rsquare > 0.95). For poorly fit curves, DT_h_goodR2 is set to 25.0 (very high). DT_h_goodR2 can therefore be used to treat samples with abnormal curves as non-growing.
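The two estimates that are read directly off the data (nAUC and nSTP), and the DT_h_goodR2 rule, can be sketched in a few lines of Python. This is a minimal sketch, not the Q-PHAST or QFA implementation: it assumes growth values sorted by time (in hours) with the first timepoint at t=0, approximates the AUC with the trapezoidal rule, interpolates linearly between timepoints and holds the curve flat beyond the last observation. All function names are hypothetical.

```python
from bisect import bisect_right


def interpolate_growth(times, values, t):
    """Linearly interpolate growth at time t (times sorted, in hours).

    Outside the observed range the first/last value is used (an assumption).
    """
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_right(times, t)  # times[i - 1] <= t < times[i]
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def numerical_auc(times, values, hours_experiment=24.0):
    """Trapezoidal area under the time-vs-growth curve up to hours_experiment."""
    points = [(t, v) for t, v in zip(times, values) if t < hours_experiment]
    points.append((hours_experiment, interpolate_growth(times, values, hours_experiment)))
    return sum(
        (t1 - t0) * (v0 + v1) / 2
        for (t0, v0), (t1, v1) in zip(points, points[1:])
    )


def single_timepoint_growth(times, values, hours_experiment=24.0):
    """Growth at t=hours_experiment (an nSTP-like estimate)."""
    return interpolate_growth(times, values, hours_experiment)


def dt_h_good_r2(dt_h, rsquare, min_rsquare=0.95, poor_fit_value=25.0):
    """Keep DT_h only for well-fit curves; otherwise return a very high doubling time."""
    return dt_h if rsquare > min_rsquare else poor_fit_value


times = [0.0, 12.0, 24.0]
growth = [0.0, 0.5, 1.0]
print(numerical_auc(times, growth))            # 12.0
print(single_timepoint_growth(times, growth))  # 1.0
print(dt_h_good_r2(2.1, rsquare=0.80))         # 25.0
```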

Once all fitness estimates are measured, Q-PHAST calculates the relative fitness (e.g. nAUC_rel, r_rel or nSTP_rel) for each spot by dividing the raw fitness by the fitness at concentration==0. This is only performed if a plate with concentration==0 is provided. These relative fitness measurements are required for the susceptibility analysis.
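As a minimal sketch (hypothetical function name, one spot and drug at a time), the relative fitness calculation is a division by the concentration==0 value. Returning NaN when the concentration==0 fitness is 0 is an assumption of this sketch; such spots would not be 'valid' in Q-PHAST anyway, because the concentration==0 spot must be growing.

```python
import math


def relative_fitness(fitness_by_concentration):
    """Divide each raw fitness (e.g. nAUC) by the raw fitness at concentration 0.

    fitness_by_concentration maps concentration -> raw fitness for one spot and drug.
    """
    if 0 not in fitness_by_concentration:
        raise ValueError("relative fitness requires a concentration==0 measurement")
    fitness_conc0 = fitness_by_concentration[0]
    if fitness_conc0 == 0:
        return {c: math.nan for c in fitness_by_concentration}
    return {c: f / fitness_conc0 for c, f in fitness_by_concentration.items()}


print(relative_fitness({0: 10.0, 0.5: 8.0, 1.0: 2.0}))  # {0: 1.0, 0.5: 0.8, 1.0: 0.2}
```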

Susceptibility measurements

For drugs with at least two non-0 concentrations, Q-PHAST calculates susceptibility for each spot in each drug. Note that several filters are applied to ensure high-quality susceptibility measurements. We only consider a spot (at a given concentration) 'valid for relative fitness and susceptibility calculations' if i) it is not a 'bad spot' (defined as explained here), ii) the spot was flagged as a 'bad spot' in at most one non-0 concentration and iii) the corresponding concentration==0 spot was growing (according to the nAUC threshold explained here) and was not flagged as a 'bad spot'. Our pipeline uses these 'valid spots' to infer the following susceptibility estimates (considering either K_rel, r_rel, nr_rel, maxslp_rel, MDP_rel, MDR_rel, MDRMDP_rel, AUC_rel, nAUC_rel or nSTP_rel as relative fitness estimates):

  • MIC: Minimum Inhibitory Concentration, a typical estimator of drug susceptibility. It is the minimum concentration at which relative fitness is below 0.25 (MIC_25), 0.5 (MIC_50), 0.75 (MIC_75) or 0.9 (MIC_90). 0.25, 0.5, 0.75 and 0.9 are hereafter referred to as <mic fraction>. If all concentrations have a relative fitness above the <mic fraction>, MIC is set to twice the maximum assayed concentration. In addition, because MIC cannot be accurately measured for some spots, it is set to NaN if i) all 'valid spots' have a relative fitness above the <mic fraction> but the maximum concentration is a 'bad spot', ii) MIC is apparently the second assayed concentration but the first concentration is a 'bad spot', or iii) the concentration before the MIC is a 'bad spot' and there is a large distance (>= 0.001) between the MIC and the previous 'valid spot' concentration.

  • SMG: Supra-MIC Growth, an estimator of drug tolerance (see Berman et al. 2020). It is the average (raw) fitness at concentrations above the MIC, normalized by the fitness at concentration==0. There is one SMG estimate for each <mic fraction>. Note that SMG is only calculated for spots in which i) MIC is not NaN and ii) there are at least two concentrations above the MIC.

  • rAUC: Resistance AUC, an estimator of drug susceptibility proposed in Ksiezopolska, Schikora-Tamarit et al. 2021. This is the Area Under the concentration-vs-relative fitness Curve, normalized by a 'maximum AUC' in which relative fitness is 1.0 across all assayed concentrations. Higher rAUCs indicate lower drug susceptibility. Because concentrations are often set in logarithmic ranges (e.g. 0, 0.1, 0.2, 0.4, 0.8 ...), rAUC is calculated using either log2-transformed concentrations (rAUC_log2_concentration) or real concentrations (rAUC_concentration). Because rAUC cannot be accurately measured for some spots, it is set to NaN if i) there are fewer than 3 concentrations (including 0) or ii) the maximum concentration is a 'bad spot' and the highest concentration with a 'valid spot' is growing (according to the nAUC threshold explained here).
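The validity filter and the core definitions of MIC, SMG and rAUC_concentration can be sketched for a single spot as follows. This is a simplified, hypothetical sketch (valid_concentrations, mic, smg, rauc_concentration and min_nauc_growing are illustrative names): 'growing' is taken to mean an nAUC strictly above the threshold, the AUC uses the trapezoidal rule, and the MIC NaN rules described above are omitted. rAUC_log2_concentration is not shown, because how concentration 0 is handled on the log2 scale is not described here.

```python
import math


def valid_concentrations(bad_spot_by_concentration, nauc_conc0, min_nauc_growing):
    """Concentrations at which one spot is 'valid for relative fitness and
    susceptibility calculations' (bad_spot_by_concentration: conc -> is 'bad spot')."""
    n_bad_non0 = sum(1 for c, bad in bad_spot_by_concentration.items() if c != 0 and bad)
    conc0_ok = (0 in bad_spot_by_concentration
                and not bad_spot_by_concentration[0]  # iii) conc0 is not a 'bad spot'
                and nauc_conc0 > min_nauc_growing)    # iii) conc0 is growing
    if n_bad_non0 > 1 or not conc0_ok:                # ii) at most one bad non-0 conc
        return set()
    return {c for c, bad in bad_spot_by_concentration.items() if not bad}  # i)


def mic(concentrations, rel_fitness, mic_fraction=0.5):
    """Lowest concentration with relative fitness below mic_fraction,
    or twice the maximum assayed concentration if none is below it."""
    for c, f in sorted(zip(concentrations, rel_fitness)):
        if f < mic_fraction:
            return c
    return 2 * max(concentrations)


def smg(concentrations, raw_fitness, mic_value, fitness_conc0):
    """Mean raw fitness above the MIC divided by the fitness at concentration 0;
    NaN if MIC is NaN or fewer than two concentrations lie above it."""
    if math.isnan(mic_value):
        return math.nan
    above = [f for c, f in zip(concentrations, raw_fitness) if c > mic_value]
    if len(above) < 2:
        return math.nan
    return (sum(above) / len(above)) / fitness_conc0


def rauc_concentration(concentrations, rel_fitness):
    """Trapezoidal AUC of concentration vs relative fitness, divided by the AUC
    obtained if relative fitness were 1.0 at all assayed concentrations."""
    points = sorted(zip(concentrations, rel_fitness))
    if len(points) < 3:
        return math.nan
    area = sum((c1 - c0) * (f0 + f1) / 2 for (c0, f0), (c1, f1) in zip(points, points[1:]))
    max_area = points[-1][0] - points[0][0]  # rectangle with height 1.0
    return area / max_area


layout = {0: False, 1: False, 2: False, 4: False, 8: False}
raw_by_conc = {0: 10.0, 1: 8.0, 2: 4.0, 4: 2.0, 8: 1.0}
valid = sorted(valid_concentrations(layout, raw_by_conc[0], min_nauc_growing=1.0))
raw = [raw_by_conc[c] for c in valid]
rel = [f / raw_by_conc[0] for f in raw]
mic50 = mic(valid, rel, 0.5)
print(valid)                                     # [0, 1, 2, 4, 8]
print(mic50)                                     # 2
print(smg(valid, raw, mic50, raw_by_conc[0]))    # 0.15
print(round(rauc_concentration(valid, rel), 4))  # 0.3375
```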

Main Q-PHAST outputs

This pipeline generates the following files and folders under the output directory. Those related to relative fitness calculations are only generated if there is a plate with concentration==0. In addition, outputs related to susceptibility measurements are only generated for drugs with at least two non-0 concentrations. The generated files are described below.

Integrated fitness measurements

The files fitness_measurements_simple.xlsx and relative_fitness_measurements_simple.xlsx include the averaged per-strain fitness and relative fitness measurements, respectively. These files use nAUC as the fitness estimate. These are the columns:

  • drug, concentration, strain and experiment_name are the sample identifiers as specified in the input plate layout.

  • # replicates indicates the number of technical replicate spots used to do the averaged fitness calculations. For raw fitness, our pipeline considers all spots that are not 'bad spots'. For relative fitness, Q-PHAST only considers spots that are 'valid for relative fitness and susceptibility calculations' (defined above).

  • median_nAUC, mode_nAUC indicate the median and mode nAUC across technical replicates. In the relative fitness table there are the equivalent median_nAUC_rel and mode_nAUC_rel.

  • mad_nAUC and range_nAUC show the dispersion across replicates. mad_nAUC is the median absolute deviation, and range_nAUC indicates the minimum and maximum values. In the relative fitness table there are the equivalent mad_nAUC_rel and range_nAUC_rel.
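The per-strain summary columns can be approximated with Python's standard library. This is a hypothetical sketch, not the Q-PHAST code: it uses an unscaled median absolute deviation (some tools, such as R's mad(), multiply it by 1.4826 by default), and when several values are equally common it reports the first one encountered as the mode, since how Q-PHAST computes the mode of continuous values is not specified here. For raw fitness, spots flagged as bad_spot are excluded, as described in the # replicates bullet; the drug and strain labels in the example are made up.

```python
import statistics
from collections import defaultdict


def summarize(values):
    """Replicate count, median, mode, MAD and (min, max) range of one group."""
    median = statistics.median(values)
    return {
        "replicates": len(values),
        "median": median,
        "mode": statistics.multimode(values)[0],  # first of the most common values
        "mad": statistics.median([abs(v - median) for v in values]),  # unscaled MAD
        "range": (min(values), max(values)),
    }


def summarize_by_strain(rows, value_key="nAUC"):
    """Group rows by (drug, concentration, strain) and summarize spots that are not 'bad spots'."""
    groups = defaultdict(list)
    for row in rows:
        if not row["bad_spot"]:
            groups[(row["drug"], row["concentration"], row["strain"])].append(row[value_key])
    return {key: summarize(vals) for key, vals in groups.items()}


rows = [
    {"drug": "drugA", "concentration": 0, "strain": "strainA", "nAUC": v, "bad_spot": False}
    for v in (10.0, 12.0, 12.0, 20.0)
]
print(summarize_by_strain(rows)[("drugA", 0, "strainA")])
# {'replicates': 4, 'median': 12.0, 'mode': 12.0, 'mad': 1.0, 'range': (10.0, 20.0)}
```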

Fitness heatmap

When no concentration==0 is provided, Q-PHAST generates a file called 'raw_nAUC_across_drugs_heatmap.pdf'. This is a heatmap showing, for each strain in each drug, the median and MAD (median absolute deviation) nAUC across replicates. This plot provides an overview of the experiment. Note that there is also a version of this plot with a .no_clustering. tag, in which strains are sorted alphabetically.

Integrated susceptibility measurements

The file susceptibility_measurements_simple.xlsx includes the averaged per-strain susceptibility measurements, using nAUC as the fitness estimate. For simplicity, this table only includes information about MIC_50, SMG_MIC_50, rAUC_concentration (based on real, not log2-transformed, concentrations) and rAUC_log2_concentration (based on log2-transformed concentrations). These are the columns:

  • drug, concentration, strain and experiment_name are the sample identifiers as specified in the input plate layout.

  • max_concentration indicates the maximum assayed concentration. This is relevant for comparisons with other datasets.

  • replicates_MIC50, replicates_SMG-MIC50, replicates_rAUC and replicates_rAUC_log2 indicate the number of technical replicate spots used to calculate each susceptibility measurement in each strain. Note that all these values might be NaN for certain spots (explained above), and that only spots considered 'valid for relative fitness and susceptibility calculations' were used (see above).

  • median_MIC50, mode_MIC50, median_SMG-MIC50, mode_SMG-MIC50, median_rAUC, mode_rAUC, median_rAUC_log2 and mode_rAUC_log2 indicate the median and mode across technical replicates.

  • mad_MIC50, range_MIC50, mad_SMG-MIC50, range_SMG-MIC50, mad_rAUC, range_rAUC, mad_rAUC_log2 and range_rAUC_log2 indicate the dispersion across technical replicates. mad stands for median absolute deviation.

Summary plots

The folder summary_plots contains several plots for each drug that provide an overview of the experiment. These are the files:

  • [<drug>]_vs_nAUC_lines_all.pdf shows for each spot the concentration vs fitness (nAUC) curve. Spots that are not 'valid for relative fitness and susceptibility calculations' (defined above) are outlined with squares. This plot is very useful to see the consistency of the measurements across replicates.

  • [<drug>]_vs_nAUC_lines_only_correct.pdf is equivalent to [<drug>]_vs_nAUC_lines_all.pdf, but only shows spots that are 'valid for relative fitness and susceptibility calculations'.

  • [<drug>]_vs_nAUC_rel_lines_only_correct.pdf is equivalent to [<drug>]_vs_nAUC_lines_only_correct.pdf, but shows nAUC_rel values (relative fitness).

  • [<drug>]_vs_nAUC_heatmap.pdf and [<drug>]_vs_nAUC_rel_heatmap.pdf are heatmaps showing, for each strain in each concentration, the median and MAD (median absolute deviation) nAUC and nAUC_rel across replicates. Only spots that are 'valid for relative fitness and susceptibility calculations' were used. Note that there are also versions of these plots with a .no_clustering. tag, in which strains are sorted alphabetically.

  • <drug>_susceptibility_heatmap_by_nAUC.pdf is a heatmap showing, for each strain, the median and MAD MIC_50, SMG_MIC_50 and rAUC_concentration across replicates. Only spots that are 'valid for relative fitness and susceptibility calculations' were used. Note that there are also versions of these plots with a .no_clustering. tag, in which strains are sorted alphabetically.

Extended Q-PHAST outputs

Beyond these outputs, Q-PHAST generates many other files that may be useful for some users, under the directory extended_outputs:

Tab-separated versions of the integrated fitness and susceptibility tables

The files fitness_measurements_simple.csv, relative_fitness_measurements_simple.csv and susceptibility_measurements_simple.csv are the .csv (tab-separated) versions of fitness_measurements_simple.xlsx, relative_fitness_measurements_simple.xlsx and susceptibility_measurements_simple.xlsx, described above. We recommend using these .csv files for any further calculations, as parsing Excel files can be error-prone.
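For example, these tab-separated tables can be loaded with Python's built-in csv module by passing delimiter="\t" (with pandas, the equivalent is pandas.read_csv(path, sep="\t")). This is a minimal sketch: the function names are hypothetical, the commented path is only illustrative, and the csv module returns every value as a string, so numeric columns must be converted (e.g. with float()) before doing calculations.

```python
import csv
import io


def parse_tab_separated(text):
    """Parse tab-separated text (header line first) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def read_tab_separated(path):
    """Read one of the tab-separated .csv tables from disk."""
    with open(path, newline="") as handle:
        return parse_tab_separated(handle.read())


# Usage with a real output (illustrative path):
# rows = read_tab_separated("output_dir/extended_outputs/fitness_measurements_simple.csv")
example = "strain\tmedian_nAUC\nS1\t12.5\nS2\t3.0\n"
rows = parse_tab_separated(example)
print([(r["strain"], float(r["median_nAUC"])) for r in rows])  # [('S1', 12.5), ('S2', 3.0)]
```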

Growth measurements

The file growth_measurements_all_timepoints.csv is a table with all growth measurements in all spots and timepoints. This is the raw data used for fitness measurements. These are the columns:

  • plate_batch, plate, row, column, strain, drug, concentration are the spot information provided in the plate layout.

  • Growth has the inferred cell density. It is calculated as Trimmed / (Tile.Dimensions.X * Tile.Dimensions.Y * 255), as suggested in the qfa manual.

  • Inoc.Time is the inoculation time in YYYY-MM-DD_HH-MM-SS format.

  • Date.Time is the timepoint in YYYY-MM-DD_HH-MM-SS format, and Expt.Time is the timepoint in days.

  • Timeseries.order is the categorical timepoint.

  • X.Offset and Y.Offset are the coordinates of the spot within the plate.

  • The remaining columns (e.g. Area or redMean) are related to the growth inference. Check the qfa manual for more information.
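The Growth column follows directly from the formula above. As a hedged sketch with hypothetical function names: if Trimmed is the summed, background-trimmed pixel intensity of the spot's tile (8-bit intensities, 0-255), dividing by the tile area and by 255 gives a mean intensity between 0 and 1. The Expt.Time conversion is included only because time windows such as hours_experiment are expressed in hours, while Expt.Time is in days.

```python
def growth_from_trimmed(trimmed, tile_dimensions_x, tile_dimensions_y):
    """Growth = Trimmed / (Tile.Dimensions.X * Tile.Dimensions.Y * 255)."""
    return trimmed / (tile_dimensions_x * tile_dimensions_y * 255)


def expt_time_to_hours(expt_time_days):
    """Convert Expt.Time (days) to hours."""
    return expt_time_days * 24


print(growth_from_trimmed(12750, 10, 10))  # 0.5
print(expt_time_to_hours(0.5))             # 12.0
```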

Fitness measurements

The file fitness_measurements.csv is a table with all the raw and relative fitness estimates per spot and drug concentration. These are the columns:

  • plate_batch, plate, row, column, spotID, strain, drug, concentration and experiment_name are the spot information provided in the plate layout.

  • replicateID indicates the spot as r<row>c<column> (i.e. A1). The sampleID field is derived from it as <strain>_<replicateID>.

  • Many columns are the fitness estimates (e.g. nAUC, K...) or relative fitness estimates (e.g. nAUC_rel, K_rel...) defined above.

  • Inoc.Time, XOffset, YOffset are equivalent to those in growth_measurements_all_timepoints.csv.

  • bad_spot indicates whether the spot is flagged as a 'bad spot', either from the input plate layout or by the automatic definition of bad spots.

  • is_growing indicates whether the spot is growing (it has an nAUC above the set 'min nAUC growing', see this).

  • conc0_is_growing and conc0_is_bad_spot indicate whether the corresponding concentration==0 for a given spot is growing or is a bad spot (as defined above).

  • idx_correct_rel_estimates is a True/False boolean indicating whether a given spot is 'valid for relative fitness and susceptibility calculations' (defined above).

Susceptibility measurements

The file susceptibility_measurements.csv is a table with all susceptibility estimates (by different fitness estimates). The columns are:

  • strain, row, column, replicateID, drug and experiment_name are the spot identifiers.

  • MIC_25, MIC_50, MIC_75 and MIC_90 are the MIC values at different values of <mic fraction>. As mentioned above, these can sometimes be NaN because of our quality control filtering.

  • SMG_MIC_25, SMG_MIC_50, SMG_MIC_75 and SMG_MIC_90 are the SMG values at different values of <mic fraction>. As mentioned above, these can sometimes be NaN because of our quality control filtering.

  • rAUC_concentration and rAUC_log2_concentration are the different rAUC values.

  • fitness_estimate indicates the fitness estimate used to calculate the rAUC, MIC and SMG values. These are always relative fitness estimates (e.g. nAUC_rel, K_rel).

  • raw_fitness_conc0 is the raw fitness at concentration==0.

  • max_concentration is the maximum assayed concentration, relevant because it can affect susceptibility measurements.

Extended fitness and susceptibility plots

In the folder summary_plots within the main output directory there are many plots that provide an overview of the experiment (described above), based solely on nAUC as the fitness estimate. Under extended_outputs there are several equivalent plots for all the other fitness estimates:

  • drug_vs_fitness_lines_all_spots contains figures equivalent to summary_plots/<drug>/[<drug>]_vs_nAUC_lines_all.pdf, but for all fitness estimates.

  • drug_vs_fitness_lines contains figures equivalent to summary_plots/<drug>/[<drug>]_vs_nAUC_lines_only_correct.pdf and summary_plots/<drug>/[<drug>]_vs_nAUC_rel_lines_only_correct.pdf, but for all fitness estimates.

  • drug_vs_fitness_heatmaps contains figures equivalent to summary_plots/<drug>/[<drug>]_vs_nAUC_heatmap.pdf and summary_plots/<drug>/[<drug>]_vs_nAUC_rel_heatmap.pdf, but for all fitness estimates.

  • susceptibility_heatmaps contains figures equivalent to summary_plots/<drug>/<drug>_susceptibility_heatmap_by_nAUC.pdf, but for all fitness estimates.

  • susceptibility_heatmaps_log_scale contains figures similar to susceptibility_heatmaps, but with log10(MIC_50) and rAUC_log2 values.

Extended fitness heatmaps

When no concentration==0 is provided, Q-PHAST generates various plots under the folder drug_vs_raw_fitness_heatmaps. These show, for each strain in each drug, the median and MAD (median absolute deviation) fitness (for different fitness estimates) across replicates. These plots provide an overview of the experiment.

Quality control plots

Q-PHAST generates various plots that are useful to assess the quality of the analysis. These are the folders under extended_outputs that contain such plots:

  • growth_curves contains the time - vs - Cell density curves for all spots in the experiment. These are generated by QFA.

  • growth_curves_and_images includes one plot for each strain and drug, useful for quality control. Each plot is a grid in which the columns correspond to different concentrations. The first row shows the time-vs-cell density curves for all replicates of the strain, with the line style indicating the spot type ('used' or 'discarded'). If concentration==0 was provided, 'discarded' means that the spot was not 'valid for relative fitness and susceptibility calculations' (defined above); otherwise, 'discarded' means that it is a 'bad spot'. The legend of each plot also shows the nAUC of each spot. Rows 2-4 show images of the plates throughout the experiment. This representation is useful to understand why certain spots have a given growth curve.

Quality control files

Beyond the plots, there are several files that are useful for quality control of the analysis, as well as for reproducing the run:

  • plate_layout.xlsx is a copy of the provided plate layout.

  • bad_spots.xlsx includes the spots defined as bad spots (either because you i) set them as such in the input plate layout or ii) accepted them as bad spots during the manual curation of flagged outliers). The column bad_spot_reason indicates the type of bad spot. For automatically inferred bad spots, bad_spot_reason shows the nAUC of the spot and the (Q1 - 2.5·IQR, Q3 + 2.5·IQR) range that defined it as a possible outlier. To recall how bad spots are inferred, see this page.

  • reduced_input_dir.zip includes the input plate layout, the run command and a subset of the input images. This file is useful for debugging: if you encounter errors, you can send it to us so that we can reproduce and fix them.
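The (Q1 - 2.5·IQR, Q3 + 2.5·IQR) range reported in bad_spot_reason can be reproduced with a sketch like the one below. Assumptions: quartiles are computed with statistics.quantiles(method="inclusive"), which uses the same linear interpolation as NumPy's default percentile, and the group of spots whose nAUC values are compared is the one defined on the bad-spots page (not reproduced here). The function names are hypothetical.

```python
import statistics


def outlier_bounds(nauc_values, k=2.5):
    """Return the (Q1 - k*IQR, Q3 + k*IQR) interval for a group of nAUC values."""
    q1, _, q3 = statistics.quantiles(nauc_values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(nauc_values, k=2.5):
    """Return the values outside the interval (candidate bad spots for manual curation)."""
    low, high = outlier_bounds(nauc_values, k)
    return [v for v in nauc_values if v < low or v > high]


values = [10.0, 11.0, 12.0, 13.0, 50.0]
print(outlier_bounds(values))  # (6.0, 18.0)
print(flag_outliers(values))   # [50.0]
```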