Alternatives to Log‐Scale Data Display

Joseph Trotter¹

¹BD Biosciences, San Diego, California

Unit Number: UNIT 10.16
DOI: 10.1002/0471142956.cy1016s42
Online Posting Date: October 2007
GO TO THE FULL TEXT:
PDF or HTML at Wiley Interscience

Abstract

Traditional log-scale display of multiparameter immunofluorescence cytometry data has several perplexing intrinsic problems. These are largely mitigated by recent alternative transformations that are log-like at the high end of the scale, nearly linear at the low end of the scale, and symmetrical about zero. These alternative log-like display transformations provide a means for better interpretation and analysis of compensated data. Curr. Protoc. Cytom. 42:10.16.1-10.16.11. © 2007 by John Wiley & Sons, Inc.

Keywords: Flow cytometry; data display; log transformation; Logicle transformation; biexponential; Hyperlog transformation; generalized log transformation; glog; compensation artifacts

     

Table of Contents

  • Introduction
  • Background Information
  • Appendix: Transformation Functions
  • Literature Cited
  • Figures
  • Tables
     

Figures

  • Figure 10.16.1
    Multicolor cell data transformed with 5-decade log, 4-decade log, and Logicle. All displays are of the same data file: human peripheral blood lymphocytes stained with CD4-APC and CD8-APC-Cy7 and collected on a BD FACSAria with FACSDiVa software. The data represent integrated pulse area measurements with a maximal value of 262,143 (18-bit equivalent). However, within the system the data are measured with a slight voltage offset at the analog-to-digital converter (ADC) input (~90 mV) and dynamically corrected with a digital baseline correction algorithm (22-bit resolution for area) that subtracts sampled levels when no cells are present. Because of this virtual zero approach, some uncompensated negative cells may have values less than zero after correction owing to chance alone. Since the data are sampled with a 14-bit ADC and log10(2^14) ≈ 4.2, the dynamic range for log display is most appropriate at four decades; extending the display range another decade lower devotes too much screen real estate to noise, even though the precision of the area measurement is greater than 14 bits. Panels A, B, and C show the properly compensated data set visualized with 5-decade log, with 4-decade log, and with the Logicle transformation. For reference, Panels D, E, and F are uncompensated presentations of the same data. In this example, the cells negative for both CD4 and CD8 have very low medians, the region where the log transformation contributes often-confusing visualization artifacts. Using five decades for log display essentially adds a decade of screen real estate devoted to electronic noise and creates undesirable visualization artifacts. The Logicle display provides a more desirable visualization of the data than either 4- or 5-decade log display because all cells are rendered on scale, and the population boundaries are easy to see.
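
The four-decade figure quoted in the caption above follows from simple arithmetic on the ADC bit depth. A minimal Python sketch, assuming the usable log range is taken as log10 of the number of digitization levels (the helper name `decades_for_bits` is hypothetical and not part of the protocol):

```python
import math


def decades_for_bits(bits: int) -> float:
    """Return log10 of the number of levels available at `bits` bits."""
    return math.log10(2 ** bits)


# 14-bit ADC sampling mentioned in the caption
print(round(decades_for_bits(14), 1))
# 18-bit-equivalent pulse-area range (maximal value 262,143)
print(round(decades_for_bits(18), 1))
```

The second call only illustrates that the numeric range of the area parameter spans more decades than the ADC sampling supports, which is why a fifth display decade is mostly occupied by noise.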

  • Figure 10.16.2
    Compensation visualization issues with log transformation. BD CompBeads (antibody capture beads) were stained with differing intensities of an APC-labeled antibody (CD4) and analyzed on a BD LSR II flow cytometer. The same data are shown scaled with 5-decade log, 4-decade log, and the Logicle transformation, both uncompensated and compensated. Panel A shows correctly compensated data where the undesirable effects of both log-scale visualization and post-compensation data spread are evident. From visual inspection of Panels A and B alone, it is unclear whether the compensation used is actually correct, for two reasons: (1) the realigned medians are below the low-end horizon line of the log scale, and (2) log scaling artifacts create false visual pseudo peaks for the APC-Cy7-negative populations that appear near 100 (gray arrows in Panels A and B) when the true population medians are all less than zero (see Table 10.16.1 for the actual values). The negative population viewed on a log scale is grossly distorted and graphically misleads the viewer by nearly two orders of magnitude as to the true central tendency of the population. The 4-decade log transformation (B) is only a little better than the 5-decade log transformation (A), and both visualizations might mislead the viewer. Panel C shows how the Logicle transformation aligns the populations visually such that both their medians and relative variances are immediately clear. Panels D, E, and F show the uncompensated data for reference. Antibody capture beads were used in this example because they have a low intrinsic coefficient of variation (CV) relative to cells, allowing photon statistics to dominate their variance.

  • Figure 10.16.3
    Logicle, glog, log, and Hyperlog displays of the same cell data. In this figure the CD4-APC versus CD8-APC-Cy7 cell data shown in Figure 10.16.1 are displayed by four different transforms. Logicle (B) and glog (C) produce nearly identical scales, and both differ only slightly from the scale produced by Hyperlog (E) when its b coefficient is varied empirically. However, the log display (D) leaves the viewer unable to tell whether the data are properly compensated or where the negative medians actually reside. The three alternative scales use the same extent below zero and comparable appropriate scaling factors to make them as similar to each other as possible. Panel A shows the transformation used for the CD4-APC scale in Panel C contrasted with log. The simplified glog function used for Panels A and C is that given in Equation (8), where z is the raw linear input data value and the glog parameter is set to (5th percentile below zero)^2 = 120,000.
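
Equation (8) is not reproduced on this page, so the following Python sketch uses one commonly cited form of the generalized log, ln(z + sqrt(z^2 + λ)), purely as an illustration; it should not be read as the exact function used for the figure. The symbol name λ (`lam`) and the function name `glog` are assumptions; only the value 120,000 comes from the caption.

```python
import math

LAM = 120_000.0  # parameter value quoted in the caption; the name is assumed


def glog(z: float, lam: float = LAM) -> float:
    """Illustrative generalized log: ln(z + sqrt(z^2 + lam)).

    Defined for negative, zero, and positive z when lam > 0.
    """
    return math.log(z + math.sqrt(z * z + lam))


for z in (-1000.0, 0.0, 1000.0, 100_000.0):
    # A plain log is undefined at or below zero; glog is not.
    plain = math.log(z) if z > 0 else float("nan")
    print(f"z={z:>10.1f}  glog={glog(z):7.3f}  ln(z)={plain:7.3f}")
```

Two properties of this form match the behavior described in the abstract: for large z it approaches ln(2z), so it is log-like at the high end, and glog(z) + glog(-z) = ln(λ), so the curve is symmetrical about its value at zero.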

  • Figure 10.16.4
    Effects of population variance and Logicle scaling. The same BD CompBeads data shown in Figure 10.16.2 are plotted a little differently. Since digital compensation aligns the populations and largely preserves the variances, the brighter the stain, the greater the “spread” after compensation. The feature emphasized here is that the particles unstained in the APC-Cy7 dimension (no APC-Cy7 added) may be a complex mixture of populations stained with one or more levels of one or more other dyes, as shown in the paired overlaid histogram in both panels. In this example, different levels of APC signal that raise background in the APC-Cy7 detector result in multiple APC-Cy7-negative populations with the same median and different variances after compensation. The Logicle scaling effects are shown using the 5th percentile below zero for each of two indicated populations as the data-dependent input variable. Panel A is scaled using only the unstained and the dimmest positive population, and Panel B is scaled on only the unstained and the brightest population (i.e., the largest variance). This shows the scaling effects that may occur when the data-dependent input is not representative of what we want to measure, as happens occasionally when the sampling is not correct. The best approach is to remove artifacts and then scale on all the desired populations together as shown in Figure 10.16.2C, even though some structure in the cumulative variance may remain visible in a histogram.

  • Figure 10.16.5
    Dealing with artifacts that bias Logicle scaling estimates. In this example an eight-color listmode file of 160,000 events has a few spurious events in the PE and FITC dimensions that are artifactual (presumably due to the cells being permeabilized for intracellular cytokine labeling and binding some fluorescent debris) and bias the estimate of the 5th percentile below zero in each color. For each dimension, the spurious events (extending as low as –4900 in PE and –7900 in FITC) cause the algorithm to significantly overestimate the size of the negative reference value in each color dimension and provide a stronger than appropriate linear compression of the lower part of the scale. This undesirable effect may usually be mitigated either by removing the spurious events by careful gating and rescaling of the data, or by manually defining the scale with an estimate of the negative reference value. Panel A shows the raw data and the automatically estimated scaling without any gating. Panel B shows the events gated on scatter, but the scaling estimate remains based upon the ungated data. Panel C shows the Logicle scaling after removing the outliers from the estimates for the 5th percentile below zero.
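
The bias described in the caption above can be sketched in a few lines of Python. This is a minimal illustration, assuming that "5th percentile below zero" means the 5th percentile of the events whose values are negative and using a simple nearest-rank percentile; the function name `negative_reference`, the `floor` cutoff, and the synthetic data are all hypothetical and stand in for gating out spurious events.

```python
import math


def negative_reference(values, pct=5.0, floor=None):
    """Return the pct-th percentile (nearest rank) of the events below zero.

    floor: optional cutoff; events below it are treated as spurious and
    excluded, mimicking removal of outliers before scaling.
    """
    negatives = sorted(
        v for v in values if v < 0 and (floor is None or v >= floor)
    )
    if not negatives:
        return None
    rank = max(1, math.ceil(pct * len(negatives) / 100.0))
    return negatives[rank - 1]


clean = list(range(-100, 101))        # synthetic, well-behaved events
spurious = clean + [-7900] * 10       # plus a few artifactual events

print(negative_reference(clean))                    # unbiased estimate
print(negative_reference(spurious))                 # pulled toward the artifacts
print(negative_reference(spurious, floor=-1000))    # artifacts excluded
```

With the artifacts included, the estimate jumps to the value of the spurious events, which would widen the linear region of the scale far more than the real negative population warrants.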

  • Figure 10.16.6
    Elucidating population boundaries. This figure demonstrates how the Logicle transformation, as an alternative to log, helps elucidate population boundaries for correct analysis, and it does a better job than log transformation of visually indicating where those boundaries actually are. Panel A shows human PBMCs stained with CD45RA-APC and CD27-APC-Cy7 correctly compensated on a conventional 4-decade log display. In Panel A it is unclear if the data are correctly compensated or where the region boundaries should be drawn for proper analysis, owing to the artifactual bifurcation of the single positive population in each color dimension (note log-scale artifacts), and the distortion at the low end makes the CD27 boundary for CD45RA cells ambiguous. One common error in log scale analysis is to exclude the lowest on-axis events, as shown by the questionable range for CD27-APC-Cy7-negative cells in Panel A. Panel B is the Logicle display of the same data file, where all four populations are clearly rendered and appropriate boundaries for CD45RA+ and CD45RA− cells that are either CD27+ or CD27− may be seen. Note that rectilinear quadrant analysis is not appropriate for these data because the CD45RA+ cells have much higher variance than the CD45RA− cells in the CD27 dimension, and any unilateral rectilinear delineation for CD27 positivity will be incorrect for either the CD45RA+ or the CD45RA− subset, or for both, because there are at least two classes of CD27 “negatives.”

Literature Cited

    Bagwell, B.C. 2005. HyperLog—a flexible log-like transformation for negative, zero, and positive valued data. Cytometry A 64A:34-42.
    Durbin, B. and Rocke, D.M. 2003. Estimation of transformation parameters for microarray data. Bioinformatics 19:1360-1367.
    Herzenberg, L.A., Tung, J., Moore, W.A., Herzenberg, L.A., and Parks, D.R. 2006. Interpreting flow cytometry data: A guide for the perplexed. Nat. Immunol. 7:681-685.
    Johnson, N.L. 1949. Systems of frequency curves generated by methods of translation. Biometrika 36:149-176.
    Parks, D.R., Roederer, M., and Moore, W.A. 2006. A new “Logicle” display method avoids deceptive effects of logarithmic scaling for low signals and compensated data. Cytometry A 69A:541-551.
    Rocke, D.M. and Durbin, B. 2003. Approximate variance-stabilizing transformations for gene-expression microarray data. Bioinformatics 19:966-972.
    Roederer, M. 2001. Spectral compensation for flow cytometry: Visualization artifacts, limitations, and caveats. Cytometry 45:194-205.
    Tung, J.W., Parks, D.R., Moore, W.A., Herzenberg, L.A., and Herzenberg, L.A. 2003. New approaches to fluorescence compensation and visualization of FACS data. Clin. Immunol. 110:277-283.
    Zhou, L. and Rocke, D.M. 2005. An expression index for Affymetrix GeneChips based on the generalized logarithm. Bioinformatics 21:3983-3989.
Internet Resource
    http://stat-www.berkeley.edu/users/terry/zarray/Affy/GL_Workshop/genelogic2001.html
    Workshop Web site presentation: Munson, P.J. 2001. “Consistency” test for determining the significance of gene expression changes on replicate samples and two convenient variance-stabilizing transformations. In Genelogic Workshop on Low Level Analysis of Affymetrix Genechip Data.

     