## A non-parametric analysis to determine the ED50, SD50 or similar metrics

The first time I came across the term *Spearman-Kärber* was in the analysis of RT-QuIC data. These assays are typically used to detect small amounts of prions, i.e. proteins that can induce conformational changes in other proteins and lead to diseases like Creutzfeldt-Jakob disease. A common short-hand notation for the prion protein is PrP. In an RT-QuIC assay, a liquid body sample (e.g. cerebrospinal fluid) suspected to contain infectious PrP^{I} is diluted with native-conformation PrP^{C}. The dilution series is typically performed on a logarithmic scale while keeping the concentration of the diluent (the PrP^{C} solution) constant. Each dilution series is replicated several times, and each sample of the series is subjected to vigorous shaking, typically in a microplate reader capable of detecting Thioflavin T (ThT) fluorescence. Thioflavin T is a dye that becomes fluorescent when bound to aggregated PrP^{I}; thus, the time course of aggregation can be monitored with ThT. If PrP^{I} were present in the original sample at high concentration, one would observe an increase in fluorescence over time. At high concentrations of PrP^{I} all replicates will start aggregating at some point, while for more diluted samples only a portion of the replicates will aggregate (within a given time period). At the highest dilution there is likely to be no aggregation at all. Plotting the portion of aggregated replicates as a function of dilution often gives a sigmoidal curve going from 0 (no replicate was positive/aggregated) to 1 (all replicates were positive/aggregated). There are various ways to analyze such data, but one of the most common is the Spearman-Kärber analysis. It is a general non-parametric way to estimate the mean effective seeding dose (SD50), mean lethal dose, mean effective dose or some other mean EX50 quantity together with its corresponding error.

If the aforementioned portions $p_i$ were plotted as a function of the seeding dilution, the graph would be the empirical estimate of the cumulative distribution function $F(x)$ of the underlying (continuous) distribution. This is often referred to as the tolerance distribution, whose probability density function shall be denoted $f(x)$. In the continuous case, $f(x)$ can be obtained by differentiating $F(x)$ with respect to $x$ (since $F(x)$ is an integral of $f(x)$). Although the tolerance distribution is continuous, we can approximate it by the discrete empirical cumulative distribution function. In the discrete case, $f(x)$ (I should rather write $f(x_i)$) can be obtained by differencing $F(x)$, i.e. $f(x_i) = F(x_{i+1}) - F(x_i) = p_{i+1} - p_i$. Herein $x$ denotes the log(dilution) or log(dose). In formal terms: to estimate the mean of the log(dose), we use the general formula for the mean of a discrete random variable:

$$\hat{\mu} = \sum_{i=1}^{k-1} \frac{x_i + x_{i+1}}{2}\,(p_{i+1} - p_i)$$

The midpoint term $(x_i + x_{i+1})/2$ takes into account that we approximate the continuous tolerance distribution by an empirical one with only discrete base points $x_i$. For more details refer to Spearman’s publication. The summation in the equation runs over all dose intervals, assuming that there is no response for any of the replicates at the lowest dose $x_1$ (i.e. $p_1 = 0$) and that all replicates are positive at the highest dose $x_k$, i.e. $p_k = 1$. If this is not given by the experimental data, one can introduce fake data points. Say $p_1 > 0$; then one can introduce a fake dose $x_0$ and set $p_0 = 0$. But how to set $x_0$? If the $x_i$’s are evenly spaced (which is often the case), one can simply subtract the constant spacing $x_2 - x_1$ from $x_1$ to end up with $x_0$, i.e.

$$x_0 = x_1 - (x_2 - x_1) = 2x_1 - x_2$$

On the other hand, if $p_k < 1$, a fake dose $x_{k+1}$ with $p_{k+1} = 1$ can be calculated by:

$$x_{k+1} = x_k + (x_k - x_{k-1}) = 2x_k - x_{k-1}$$

One can try using these types of fake doses even if the experimental doses are not evenly spaced. Please note that once fake doses are taken into account, the lower and upper limits of the summation in the equation for $\hat{\mu}$ change accordingly: with $x_0$ included the sum starts at $i = 0$, and with $x_{k+1}$ included it runs up to $i = k$.
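
The fake-dose padding and the midpoint sum above can be sketched in a few lines of Python. This is an illustrative implementation under the stated assumptions (ascending log-doses, response proportions increasing with dose), not the layout of the spreadsheet mentioned below; the function name `sk_mean` is my own.

```python
def sk_mean(x, p):
    """Spearman-Kärber estimate of the mean log-dose.

    x : list of log(dose) values, ascending
    p : list of response proportions (positive replicates / replicates)
    """
    x, p = list(x), list(p)
    # Pad with a "fake" dose so the curve starts at p = 0 ...
    if p[0] > 0:
        x.insert(0, 2 * x[0] - x[1])   # x_0 = x_1 - (x_2 - x_1)
        p.insert(0, 0.0)
    # ... and one so that it ends at p = 1.
    if p[-1] < 1:
        x.append(2 * x[-1] - x[-2])    # x_{k+1} = x_k + (x_k - x_{k-1})
        p.append(1.0)
    # Mean of the discrete tolerance distribution: interval midpoints
    # weighted by the increments of the empirical CDF.
    return sum((x[i] + x[i + 1]) / 2 * (p[i + 1] - p[i])
               for i in range(len(x) - 1))

# Example: log10-dilutions -5 .. -1, proportions of positive replicates
print(sk_mean([-5, -4, -3, -2, -1], [0.0, 0.25, 0.75, 1.0, 1.0]))  # → -3.5
```

Because the example curve already starts at 0, only the upper fake dose would be needed here; dropping the last dose entirely (`sk_mean([-4, -3, -2], [0.25, 0.75, 1.0])`) triggers the lower padding and yields the same estimate.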

The standard error of the mean can be calculated by a simple equation, where $n_i$ denotes the number of replicates at dose $x_i$:

$$\mathrm{SE}(\hat{\mu}) = \sqrt{\sum_i \left(\frac{x_{i+1} - x_{i-1}}{2}\right)^2 \frac{p_i\,(1 - p_i)}{n_i - 1}}$$

The standard error can then be used to calculate a $(1-\alpha)$-percent confidence interval:

$$\hat{\mu} \pm t_{1-\alpha/2}\cdot\mathrm{SE}(\hat{\mu})$$

Where $t_{1-\alpha/2}$ denotes the $(1-\alpha/2)$-quantile of the Student t-distribution.
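
The error calculation can be sketched as follows. This is again an illustrative implementation: it assumes the same number of replicates applies per dose via the list `n`, and the t-quantile is passed in from outside (e.g. from `scipy.stats.t.ppf`) so the sketch needs only the standard library.

```python
import math

def sk_se(x, p, n):
    """Standard error of the Spearman-Kärber mean estimate.

    x : log(dose) values (ascending), p : response proportions,
    n : number of replicates at each dose (each n_i > 1).
    Assumes p[0] == 0 and p[-1] == 1 (pad with fake doses first).
    """
    total = 0.0
    for i in range(1, len(x) - 1):              # interior doses only
        half_span = (x[i + 1] - x[i - 1]) / 2   # handles uneven spacing
        total += half_span**2 * p[i] * (1 - p[i]) / (n[i] - 1)
    return math.sqrt(total)

def sk_ci(mean, se, t_crit):
    """(1 - alpha) confidence interval; t_crit is the t-quantile,
    e.g. scipy.stats.t.ppf(1 - alpha / 2, dof)."""
    return mean - t_crit * se, mean + t_crit * se
```

Note that doses with $p_i = 0$ or $p_i = 1$ (including any fake doses) contribute nothing to the sum, so the uncertainty comes entirely from the transition region of the curve.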

While the “classical” Spearman-Kärber (SK) analysis works nicely in many practical cases, the state of the art is the so-called trimmed Spearman-Kärber analysis as developed by Hamilton et al. It adds a trimming and scaling procedure before calculating the estimate for the median effective dose. Trimming is done in much the same way as for the trimmed mean. However, I will not go too much into the details here. In the trimmed SK analysis, you need to set a trim level $\gamma$ in the range from 0 to 0.5, and subsequently only $(1 - 2\gamma)\cdot 100$ percent of the data are used for calculating the mean effective seeding dose. In essence, the upper and lower $\gamma \cdot 100$ percent of the dose-response curve are cut off. With this, one can get more robust estimates of the mean effective seeding dose, as the heads and tails of the dose-response curve tend to be more prone to variation.

Thus, it becomes obvious that the “classical” SK analysis is a special case of the trimmed SK analysis with the trim level $\gamma$ set to zero.
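
To make the trimming and scaling step concrete, here is a minimal Python sketch of the trimmed estimate. It assumes the response proportions are already non-decreasing and padded so the curve runs from 0 to 1 (Hamilton et al. additionally smooth non-monotone curves before trimming, which is omitted here); `trimmed_sk_mean` and its argument names are illustrative, not taken from the original paper.

```python
def trimmed_sk_mean(x, p, trim):
    """Trimmed Spearman-Kärber estimate (sketch after Hamilton et al.).

    x : log(dose) values (ascending), p : non-decreasing proportions
    with p[0] == 0 and p[-1] == 1, trim : trim level in [0, 0.5).
    """
    lo, hi = trim, 1.0 - trim

    def x_at(target):
        # Linearly interpolate the dose at which the curve reaches `target`.
        for i in range(len(p) - 1):
            if p[i] <= target <= p[i + 1] and p[i] < p[i + 1]:
                frac = (target - p[i]) / (p[i + 1] - p[i])
                return x[i] + frac * (x[i + 1] - x[i])
        raise ValueError("target proportion not reached by the curve")

    # Cut the curve at p = trim and p = 1 - trim ...
    xs = [x_at(lo)] + [xi for xi, pi in zip(x, p) if lo < pi < hi] + [x_at(hi)]
    ps = [lo] + [pi for pi in p if lo < pi < hi] + [hi]
    # ... rescale the retained portion back to [0, 1] ...
    ps = [(pi - lo) / (hi - lo) for pi in ps]
    # ... and apply the classical Spearman-Kärber midpoint sum.
    return sum((xs[i] + xs[i + 1]) / 2 * (ps[i + 1] - ps[i])
               for i in range(len(xs) - 1))
```

With `trim = 0` the cut points coincide with the ends of the curve and the rescaling is the identity, recovering the classical SK estimate, which illustrates the special-case relationship stated above.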

I created a simple Excel sheet to demonstrate the calculations for the Spearman-Kärber analysis. Feel free to download it, paste your data into the appropriate table and run the analysis (no macros required).

I also created an Excel sheet for the trimmed Spearman-Kärber analysis, which is available upon request from the author.

## References

D. J. Finney, *Statistical Method in Biological Assay*, Hafner, 1952.

C. Spearman, “The Method of ‘Right and Wrong Cases’ (Constant Stimuli) without Gauss’s Formula,” *British Journal of Psychology*, pp. 227–242, 1908.

G. Kärber, “Beitrag zur kollektiven Behandlung pharmakologischer Reihenversuche,” *Archiv für experimentelle Pathologie und Pharmakologie*, pp. 480–483, 1931.

M. A. Hamilton, R. C. Russo and R. V. Thurston, “Trimmed Spearman-Karber method for estimating median lethal concentrations in toxicity bioassays,” *Environmental Science & Technology*, pp. 714–719, 1977. DOI: https://doi.org/10.1021/es60130a004.