Department of Mathematics, Virginia Tech, 460 McBryde Hall, Blacksburg, VA 24060, USA

Department of Pediatrics, Duke University Medical Center, Durham, NC 27710, USA

Department of Immunology, Duke University Medical Center, Durham, NC 27710, USA

Department of Microbiology, Boston University School of Medicine, Boston MA 02118, USA

Abstract

Background

T-cell receptor diversity correlates with immune competency and is of particular interest in patients undergoing immune reconstitution. Spectratyping generates data about T-cell receptor CDR3 length distribution for each BV gene but is technically complex. Flow cytometry can also be used to generate data about T-cell receptor BV gene usage, but its utility has not been compared to or tested in combination with spectratyping.

Results

Using flow cytometry and spectratype data, we have defined a divergence metric that quantifies the deviation from normal of T-cell receptor repertoire. We have shown that the sample size is a sensitive parameter in the predicted flow divergence values, but not in the spectratype divergence values. We have derived two ways to correct for the measurement bias using mathematical and statistical approaches and have predicted a lower bound in the number of lymphocytes needed when using the divergence as a substitute for diversity.

Conclusions

Using both flow cytometry and spectratyping of T-cells, we have defined the divergence measure as an indirect measure of T-cell receptor diversity. We have shown that the divergence measure depends on the sample size, and that this dependence must be accounted for before the measure can be used to make predictions regarding the diversity of the T-cell receptor repertoire.

Background

The immune system’s ability to fight a large array of foreign particles is facilitated by the diversity of the T-cell receptor (TCR) repertoire

Different T-cell clones use different V gene families in the rearrangement of their

Spectratyping uses messenger RNA (mRNA) from T-cells to amplify, by PCR, the complementary DNA (cDNA) across the CDR3 region. This generates information about the heterogeneity of the relative frequencies of different CDR3 length products within a functional TCR BV family. Because different T-cell clones have different sequences or lengths of CDR3, analysis of the CDR3 length distributions can be used to determine the overall TCR repertoire diversity

TCR diversity can also be assessed by nucleotide sequencing of DNA CDR3 regions, but this is labor-intensive and generates an even lower level of resolution of the whole T-cell repertoire compared to spectratyping

This paper focuses on the role of flow cytometry in measuring T-cell population diversity and compares it to T-cell population diversity as given by spectratyping. Traditionally, spectratyping data is quantified using a wide range of methods from visual

Estimator bias is a concern when using this method of divergence scoring. In particular, it is desirable to determine how much deviation in the computation of the divergence occurs when the initial number of lymphocytes used in generating the data is varied. We have addressed this question in the context of divergence measures generated individually by flow cytometry and spectratyping. The results are especially useful when using the techniques for limited numbers of cells.

Results

We used the Kullback-Leibler divergence to quantify similarities between different frequency distributions in the T-cell repertoire diversity when measured by either flow cytometry or spectratyping. We started with two assumptions: 1) the reference distribution corresponds to a polyclonal TCR repertoire and 2) in individual subjects, a positive divergence quantifies the deviation from the normal TCR repertoire. The flow divergence, D_{f}, is the distance between the individual and the perfectly sampled reference control distributions of all TCR BV family usage measured by flow cytometry. The spectratype divergence, D_{s}, is the distance between the individual and the perfectly sampled reference control distributions of the CDR3 lengths of each TCR BV family, averaged over all TCR BV families, as measured by spectratyping (see section 'Kullback-Leibler divergence').

We specifically wanted to assess the performance of the divergences D_{f} and D_{s} in predicting the diversity of the T-cell receptor repertoire under stressful, i.e. cell-limited, circumstances. While D_{f} and D_{s} account for deviations from normal of the distributions of TCR BV family usage and of CDR3 lengths within each TCR BV family, additional variability arises from their dependence on the sample size. For a given number of measured events, n_{i}, we derived the corrected divergence value, D_{i,corr} (see section 'Sampling bias - theoretical derivation'), to be given by

$$D_{i,\mathrm{corr}} = D_{i} - \frac{N_{i} - 1}{2 n_{i}}, \qquad i \in \{f, s\}, \tag{1}$$

**Measured and corrected divergence measures as function of inverted sample number.** **(a)** Measured flow divergence, D_{f}, (red solid diamonds) and corrected flow divergence, D_{f,corr}, (blue circles) as functions of the inverted sample number 1/n. **(b)** Measured spectratype divergence, D_{s}, (red empty diamonds) and corrected spectratype divergence, D_{s,corr}, (blue circles) as functions of the inverted sample number 1/n_{0} in one DiGeorge patient.

**Flow divergence, D_{f}, as a function of sample size n (∙), presented on a log-log scale.**

where N_{f} is the number of BV families used in the flow cytometry assay (in our case 18) and N_{s} is the number of CDR3 lengths used in the spectratype assay (in our case 14).

Therefore, only the number of measured events, n_{i}, is needed to correct the divergence measures. We used this formula to assess the performance of the D_{f} and D_{s} measures in an athymic DiGeorge subject (see the figure on measured and corrected divergence measures as function of inverted sample number).

Flow cytometry results

Flow divergence measurements, D_{f}, were determined at seven time points following thymus transplantation in DiGeorge subject 5 (see the table below). The corrected divergence, D_{f,corr}, is found by subtracting (N_{f} - 1)/(2n), where N_{f} = 18, from the measured divergence D_{f} at each time point. The correction matters most for D_{f} estimates from samples with low event numbers, compared to D_{f} estimates from samples with high event numbers, for which the correction is not significant. Formula (1) helped address the effect of event number on the D_{f} prediction.

| Days after transplant | Average CD4 nr in gate (n) | Measured flow D_{f} value | Corrected flow D_{f,corr} value |
|---|---|---|---|
| 70 | 341 | 0.47 | 0.44 |
| 88 | 103 | 1.02 | 0.94 |
| 117 | 174 | 0.39 | 0.34 |
| 145 | 581 | 0.129 | 0.11 |
| 181 | 737 | 0.103 | 0.091 |
| 398 | 1569 | 0.063 | 0.057 |
| 868 | 4514 | 0.06 | 0.058 |

Values are measured over time following thymic transplantation.
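As a quick check of formula (1), the following minimal sketch recomputes the corrected flow divergence from the measured values in the table above. The function name `corrected_divergence` and its argument names are ours, chosen for illustration; N_f = 18 is taken from the text.

```python
def corrected_divergence(measured, n_events, n_categories):
    """Subtract the first-order sampling bias (N - 1) / (2 n) from a measured divergence."""
    if n_events <= 0:
        raise ValueError("n_events must be positive")
    return measured - (n_categories - 1) / (2.0 * n_events)


# (days after transplant, events in the CD4 gate, measured D_f), copied from the table
TABLE_ROWS = [
    (70, 341, 0.47), (88, 103, 1.02), (117, 174, 0.39), (145, 581, 0.129),
    (181, 737, 0.103), (398, 1569, 0.063), (868, 4514, 0.06),
]
N_F = 18  # number of BV families in the flow cytometry assay

for day, n, d_f in TABLE_ROWS:
    print(day, n, d_f, round(corrected_divergence(d_f, n, N_F), 3))
```

The recomputed values agree with the tabulated D_{f,corr} column to within rounding of the last printed digit, and the size of the correction shrinks from about 0.08 at n = 103 to about 0.002 at n = 4514.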

To further test the dependence of D_{f} on the sample size we assumed that D_{f} is a function of the decreasing event numbers in the CD4 T-cell gate used for TCR BV analysis. For this analysis we used a single blood sample collection from each of four complete DiGeorge subjects after thymus transplantation and from each of four healthy controls. Each blood sample was serially diluted, followed by flow cytometry. The results are presented in the table below, and the log-log figure shows D_{f} as a function of the sample size n.

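| Subject | Average CD4 T-cell nr in gate n | Measured flow divergence D_{f} |
|---|---|---|
| Control 1 | 66 | 0.252 |
| Control 1 | 340 | 0.135 |
| Control 1 | 675 | 0.132 |
| Control 1 | 10051 | 0.098 |
| Control 2 | 58 | 0.260 |
| Control 2 | 290 | 0.135 |
| Control 2 | 603 | 0.079 |
| Control 2 | 4438 | 0.070 |
| Control 2 | 29438 | 0.053 |
| Control 3 | 60 | 0.214 |
| Control 3 | 290 | 0.084 |
| Control 3 | 585 | 0.366 |
| Control 3 | 5965 | 0.021 |
| Control 3 | 11889 | 0.022 |
| Control 4 | 136 | 0.112 |
| Control 4 | 282 | 0.083 |
| Control 4 | 425 | 0.045 |
| Control 4 | 4354 | 0.018 |
| Subject 1 | 89 | 0.679 |
| Subject 1 | 445 | 0.379 |
| Subject 1 | 756 | 0.445 |
| Subject 1 | 887 | 0.466 |
| Subject 2 | 59 | 0.678 |
| Subject 2 | 194 | 0.403 |
| Subject 2 | 299 | 0.399 |
| Subject 2 | 605 | 0.355 |
| Subject 3 | 19 | 0.479 |
| Subject 3 | 95 | 0.366 |
| Subject 3 | 207 | 0.191 |
| Subject 3 | 2013 | 0.182 |
| Subject 3 | 3946 | 0.183 |
| Subject 4 | 103 | 0.158 |
| Subject 4 | 213 | 0.229 |
| Subject 4 | 329 | 0.115 |
| Subject 4 | 3367 | 0.087 |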

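For each of these eight cases, we wanted to predict the corrected divergence value, D_{f,corr}, using the measured D_{f} values and to determine their dependence on the sample size n. We fitted the linear model

$$D_{f} = D_{f,\mathrm{corr}} + \frac{C}{n} + \varepsilon, \tag{2}$$

where D_{f} and D_{f,corr} are the measured and corrected flow divergences, and the slope C is to be compared with the theoretical (N_{f} - 1)/2 value, which for an assay that uses 18 BV families reduces to 8.5. The errors, ε, are assumed to be independent and normally distributed.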

We derived estimates and 95% confidence intervals for the parameters D_{f,corr} and C from the measured D_{f} values; the results are given in the table below.

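| Subject | Parameter | Value | CI |
|---|---|---|---|
| Control 1 | D_{f,corr} | 0.107 | [0.079, 0.135] |
| Control 1 | C | 9.7 | [6.1, 13.4] |
| Control 2 | D_{f,corr} | 0.07 | [0.02, 0.129] |
| Control 2 | C | 10.9 | [4.7, 17.2] |
| Control 3 | D_{f,corr} | 0.111 | [-0.17, 0.373] |
| Control 3 | C | 6.9 | [-29, 43] |
| Control 4 | D_{f,corr} | 0.02 | [-0.027, 0.067] |
| Control 4 | C | 13 | [2, 24] |
| Subject 1 | D_{f,corr} | 0.39 | [0.214, 0.574] |
| Subject 1 | C | 25 | [-6.3, 56] |
| Subject 2 | D_{f,corr} | 0.32 | [0.253, 0.377] |
| Subject 2 | C | 21.3 | [14.5, 28.1] |
| Subject 3 | D_{f,corr} | 0.205 | [0.087, 0.322] |
| Subject 3 | C | 5.5 | [0.7, 10.4] |
| Subject 4 | D_{f,corr} | 0.113 | [-0.116, 0.342] |
| Subject 4 | C | 7.9 | [-33, 49] |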
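The per-subject estimates above can be approximated with an ordinary least squares line in 1/n. The sketch below is our own illustration, not the authors' fitting code (the paper reports a quasi-Newton fit); `fit_inverse_n` is a hypothetical helper name, and the data are the Control 1 rows of the serial dilution table.

```python
def fit_inverse_n(n_values, d_values):
    """Ordinary least squares fit of d = d_corr + c / n; returns (d_corr, c)."""
    if len(n_values) != len(d_values) or len(n_values) < 2:
        raise ValueError("need at least two (n, d) pairs of equal length")
    x = [1.0 / n for n in n_values]          # regress on the inverted sample number
    m = len(x)
    x_bar = sum(x) / m
    y_bar = sum(d_values) / m
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, d_values))
    slope = sxy / sxx                        # estimate of C
    intercept = y_bar - slope * x_bar        # estimate of D_f,corr
    return intercept, slope


# Control 1: events in the CD4 gate and measured D_f
control_1_n = [66, 340, 675, 10051]
control_1_d = [0.252, 0.135, 0.132, 0.098]

d_corr, c = fit_inverse_n(control_1_n, control_1_d)
print(round(d_corr, 3), round(c, 2))
```

For Control 1 this gives an intercept near 0.107 and a slope between 9.6 and 9.7, close to the tabulated estimates. Other subjects need not match as closely, since the published fit may have used a different error model or optimizer settings.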

**Flow divergence D_{f} as a function of the inverted sample number 1/n in eight subjects.** The solid line represents the fit of the three parameter linear model (2) to the data (∙). Results are presented on a log-log scale. The same model was fitted to a data set that excluded point (0.0017, 0.366) for control 3 (dashed line). The best parameter estimates and their 90% confidence intervals are presented in the preceding table.

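Moreover, if we consider the slope C to be the same for all subjects, the model becomes

$$D_{f} = \alpha_{i} + \frac{C}{n} + \varepsilon_{i}, \tag{3}$$

where α_{i} are the corrected divergence values for patient i, and the errors, ε_{i}, are independent and normally distributed.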

The fitting procedure was done using a quasi-Newton method for finding the minimum of a multivariate function

| Subject | Parameter | Value | CI |
|---|---|---|---|
| Control 1 | α | 0.117 | [0.033, 0.202] |
| Control 2 | α | 0.085 | [0.009, 0.161] |
| Control 3 | α | 0.107 | [0.032, 0.184] |
| Control 4 | α | 0.039 | [-0.045, 0.123] |
| Subject 1 | α | 0.46 | [0.38, 0.55] |
| Subject 2 | α | 0.41 | [0.32, 0.49] |
| Subject 3 | α | 0.175 | [0.089, 0.261] |
| Subject 4 | α | 0.113 | [0.029, 0.2] |
| All | C | 7.705 | [4.55, 10.85] |
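A shared-slope model of this kind, with one intercept per subject and a single slope in 1/n, has a closed-form least squares solution obtained by centering each subject's data on its own means. The sketch below illustrates that estimator under our own assumptions; it is not the quasi-Newton procedure used in the paper, `fit_common_slope` is a hypothetical name, and only two subjects from the dilution table are included to keep the example short.

```python
def fit_common_slope(groups):
    """Least squares fit of d = alpha_i + c / n with one slope c shared by all groups.

    groups maps a subject label to a list of (n, d) pairs.
    Returns (alphas, c), where alphas maps each label to its intercept.
    """
    centered_xy = 0.0
    centered_xx = 0.0
    means = {}
    for label, pairs in groups.items():
        x = [1.0 / n for n, _ in pairs]
        y = [d for _, d in pairs]
        x_bar = sum(x) / len(x)
        y_bar = sum(y) / len(y)
        means[label] = (x_bar, y_bar)
        # Within-subject centering removes the subject-specific intercept.
        centered_xy += sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        centered_xx += sum((xi - x_bar) ** 2 for xi in x)
    c = centered_xy / centered_xx
    alphas = {label: y_bar - c * x_bar for label, (x_bar, y_bar) in means.items()}
    return alphas, c


# Two subjects from the serial dilution table: (events in gate, measured D_f)
example = {
    "Control 1": [(66, 0.252), (340, 0.135), (675, 0.132), (10051, 0.098)],
    "Control 4": [(136, 0.112), (282, 0.083), (425, 0.045), (4354, 0.018)],
}
alphas, c = fit_common_slope(example)
print({k: round(v, 3) for k, v in alphas.items()}, round(c, 2))
```

Because this example uses a subset of the subjects and plain least squares, its output should not be expected to reproduce the tabulated C = 7.705 exactly.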

**Flow divergence D_{f} as a function of the inverted sample number 1/n for the same slope C.** The solid and dashed lines show the fit of the three parameter linear model (3) to the data (∙). The results are presented on a log-log scale. The best parameter estimates and their 90% confidence intervals are presented in the preceding table.

From the flow cytometry analysis we can estimate the minimum number of CD4 T-cells needed in a sample for an accurate D_{f,corr} estimate. If we want our estimates to be 90% accurate,

This translates into the following condition

From our estimates, more than 364 events are needed for an accurate D_{f,corr} estimate. In our case, we gated the flow cytometry on CD4 T-cells, so more than 364 CD4 T-cells, or events, must be captured in the flow analysis.

Spectratype results

Spectratype divergence measurements, D_{s}, were determined in five patients for three to seven time points following thymic transplantation. For each sample, the initial number of CD3 T-cells, n_{0}, is known (see the table below).

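| Subject | Days after transplant | CD3 T-cells n_{0} | Measured D_{s} value | Corrected D_{s,corr} value |
|---|---|---|---|---|
| Subject 1 | 9 | 420,000 | 0.91 | 0.9096 |
| Subject 1 | 34 | 12,220,000 | 0.61 | 0.61 |
| Subject 1 | 70 | 550,000 | 0.97 | 0.9697 |
| Subject 4 | 540 | 670,000 | 0.039 | 0.0388 |
| Subject 4 | 1540 | 1,260,000 | 0.073 | 0.0729 |
| Subject 4 | 2017 | 1,140,000 | 0.076 | 0.0759 |
| Subject 5 | 70 | 700,000 | 1.15 | 1.1498 |
| Subject 5 | 88 | 400,000 | 0.83 | 0.8296 |
| Subject 5 | 117 | 700,000 | 0.41 | 0.4098 |
| Subject 5 | 145 | 1,000,000 | 0.46 | 0.4599 |
| Subject 5 | 181 | 1,080,000 | 0.106 | 0.1059 |
| Subject 5 | 398 | 2,000,000 | 0.116 | 0.1159 |
| Subject 6 | 175 | 1,440,000 | 0.107 | 0.1069 |
| Subject 6 | 209 | 800,000 | 0.168 | 0.1678 |
| Subject 6 | 286 | 1,480,000 | 0.086 | 0.0859 |
| Subject 6 | 730 | 1,200,000 | 0.12 | 0.1199 |
| Subject 7 | 102 | 380,000 | 0.43 | 0.4296 |
| Subject 7 | 130 | 460,000 | 0.23 | 0.2297 |
| Subject 7 | 166 | 500,000 | 0.08 | 0.0797 |
| Subject 7 | 372 | 1,250,000 | 0.14 | 0.1399 |

Values are measured over time following thymic transplantation.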
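The corrected column above can be recomputed with a short sketch. One detail here is our own reading rather than something the surviving text states: the tabulated values are reproduced when the n_{0} CD3 T-cells are assumed to be divided evenly over the 23 BV-specific primer pairs, so that the effective sample size per BV family is n_{0}/23. The constant `N_BV_PRIMERS` and the function name are therefore assumptions for illustration.

```python
N_S = 14           # number of CDR3 lengths per BV family (from the text)
N_BV_PRIMERS = 23  # ASSUMPTION: cells split evenly over 23 BV primer pairs


def corrected_spectratype_divergence(d_s, n0, n_lengths=N_S, n_families=N_BV_PRIMERS):
    """Subtract (N_s - 1) / (2 * cells per family) from a measured spectratype divergence."""
    if n0 <= 0:
        raise ValueError("n0 must be positive")
    cells_per_family = n0 / n_families
    return d_s - (n_lengths - 1) / (2.0 * cells_per_family)


# (subject, day, n0, measured D_s) for a few rows of the table
ROWS = [
    ("Subject 1", 9, 420_000, 0.91),
    ("Subject 5", 88, 400_000, 0.83),
    ("Subject 7", 166, 500_000, 0.08),
]
for subject, day, n0, d_s in ROWS:
    print(subject, day, round(corrected_spectratype_divergence(d_s, n0), 4))
```

Even under this assumption the correction is of order 10^{-4}, which is why it is barely visible in the table.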

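The corrected D_{s,corr} is found by subtracting a term proportional to (N_{s} - 1)/(2n_{0}), where N_{s} = 14, from the measured D_{s} (see the table above). The measured and corrected values as functions of 1/n_{0} are plotted in panel (b) of the figure on measured and corrected divergence measures. The correction is negligible for D_{s}, since the number n_{0} of CD3 T-cells that we are starting with is always high.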

Total divergence

By combining the individual contributions of flow and spectratype divergence, we defined the total divergence,

Discussion

The data used in our study came from flow cytometry and spectratype assays in both DiGeorge subjects after thymus transplantation and healthy adult volunteers. This study presents significant information regarding the utility of flow cytometry, as well as spectratyping, to assess the diversity of the antigen receptor repertoire. Importantly, these data identify a bias in measurement errors which must be corrected. The paper presents the relationships between the number of gated events in the flow cytometry assay, as well as the number of CD3 T-cells in the spectratype assay, and the information-theory measures, D_{f} and D_{s}, used as surrogates of TCR diversity.

We addressed a critical issue of estimator bias. Starting with the assumption that such a bias exists, we have derived ways to account for the error in the measured divergences. We show that D_{f} and D_{s} can be corrected by subtracting a number inversely proportional to the sample size.

For the flow cytometry data, the constant of proportionality can either be deduced theoretically as a function of the total number of BV TCR families used in the flow cytometry assay, or derived from a statistical model applied to individual data. Both methods predict similar results, with the constant equal to 8.5 in the theoretical approach and 7.7 in the statistical approach. It is important to note that we found a direct correlation between the measured D_{f} and the inverse of the sample size in five out of eight subjects (see the table below).

| Subject | Correlation coefficient | p-value |
|---|---|---|
| Control 1 | 0.99 | 0.0076 |
| Control 2 | 0.98 | 0.0031 |
| Control 3 | 0.32 | 0.58 |
| Control 4 | 0.96 | 0.035 |
| Subject 1 | 0.92 | 0.075 |
| Subject 2 | 0.99 | 0.005 |
| Subject 3 | 0.9 | 0.036 |
| Subject 4 | 0.5 | 0.49 |
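The correlation coefficients above appear to be Pearson correlations between the measured D_{f} and the inverted sample number 1/n; that pairing is our inference from the numbers, not an explicit statement in the text. A minimal sketch for one subject, with a hand-written `pearson_r` helper so that no external library is assumed:

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Control 1 from the serial dilution table
n_events = [66, 340, 675, 10051]
d_f = [0.252, 0.135, 0.132, 0.098]

r = pearson_r([1.0 / n for n in n_events], d_f)
print(round(r, 2))
```

For Control 1 this gives a coefficient of about 0.99, in line with the first row of the table.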

Our study allows us to predict a lower bound for the number of CD4 T-cells needed in the flow cytometry gated events. We have shown that at least 364 CD4 T-cells have to be counted as gated events for a 90% confidence in the D_{f} measures. With fewer gated events, the D_{f} measurement cannot be used as a substitute for diversity. This is particularly important to keep in mind when assessing patients with limited numbers of T-cells, such as those undergoing immune reconstitution following thymus, stem cell or bone marrow transplantation. Each of these is a clinical situation in which the development of the T-cell repertoire correlates with immune competency. Thus, these data provide a quantitative basis by which T-cell repertoire diversity can be assessed by flow cytometry.

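For the spectratype data, the results are quite different. Although, using the same theoretical approach, we derive a constant of proportionality, we found no significant correlation between the measured D_{s} and the inverse of the sample size in four out of five patients (see the table below).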

| Subject | Correlation coefficient | p-value |
|---|---|---|
| Subject 1 | 0.92 | 0.25 |
| Subject 4 | -0.98 | 0.11 |
| Subject 5 | 0.66 | 0.15 |
| Subject 6 | 0.97 | 0.03 |
| Subject 7 | 0.64 | 0.35 |

The total divergence actively incorporates the flow divergence. Correction in the flow divergence, D_{f}, guarantees independence of the total divergence,

Conclusions

In conclusion, sample size is a sensitive parameter in the predicted flow divergence values, but not in the spectratype divergence values. Although using flow cytometry to assess T-cell repertoire diversity is a valuable tool, one must have sufficient cells, or events, in the flow cytometry gate before using either the flow or the total divergence as a prediction for the TCR repertoire diversity.

Methods

Human subjects

Blood samples used in our study come from healthy adult controls and from infants with complete DiGeorge anomaly after thymus transplantation

| Antibody names | Clone | Family name^{∗} |
|---|---|---|
| V | BL37.2 | TRBV9 |
| V | MPB2D5 | TRBV20 |
| V | CH92 | TRBV28 |
| V | WJF24 | TRBV29 |
| V | IMMU157 | TRBV5 |
| V | 3D11 | TRBV5 |
| V | 36213 | TRBV5 |
| V | ZOE | TRBV4 |
| V | Zizou4 | TRBV4 |
| V | 56C5 | TRBV12 |
| V | FIN9 | TRBV3 |
| V | C21 | TRBV25 |
| V | VER2.32.1 | TRBV10 |
| V | H132 | TRBV6 |
| V | JU-74 | TRBV6 |
| V | CAS1.1.3 | TRBV27 |
| V | TAMAYA 1.2 | TRBV14 |
| V | E17.5F3 | TRBV19 |
| V | BA62 | TRBV18 |
| V | ELL 1.4 | TRBV30 |
| V | IMMU 546 | TRBV2 |
| V | AF23 | TRBV13 |

The antibodies were purchased from Immunotech (Beckman Coulter) and used for the analysis. A kit IOTest Beta Mark became available during the study and was used in place of individually purchased antibodies. ^{∗}Nomenclature of the IMGT, the international ImMunoGeneTics information system

| Antibody names | Clone | Family name^{∗} |
|---|---|---|
| V | IMMU 222 | TRBV6-5 & 6-6 & 6-9 |
| V | IG125 | TRBV11-2 |

These antibodies are included in the kit but were not included in the analysis. ^{∗}Nomenclature of the IMGT, the international ImMunoGeneTics information system

Human subjects

Subjects were enrolled in protocols that were approved by the Duke University Health System Institutional Review Board and were reviewed by the Food and Drug Administration under an Investigational New Drug application. All subjects were children. The parent(s) of each subject provided written informed consent.

Flow cytometry

Reference distributions of TCR BV family usage determined by flow cytometry were generated from peripheral blood samples of fifty healthy individuals (see Table

| Antibody names | Mean % of CD4 T-cells |
|---|---|
| V | 3.21 |
| V | 9.79 |
| V | 4.80 |
| V | 2.58 |
| V | 6.78 |
| V | 0.97 |
| V | 0.70 |
| V | 1.89 |
| V | 1.12 |
| V | 4.71 |
| V | 3.48 |
| V | 0.73 |
| V | 1.85 |
| V | 2.66 |
| V | 1.84 |
| V | 3.03 |
| V | 0.91 |
| V | 5.79 |
| V | 1.96 |
| V | 2.35 |
| V | 4.12 |
| V | 0.45 |

Note that the antibodies used in the flow cytometry assay cover approximately 70% of CD4 T-cells. The values are averaged across 50 normal volunteers.

Spectratyping

CD3 T-cells from the peripheral blood of patients were isolated. RNA was prepared and used for cDNA synthesis. The cDNA was used as a template for 23 TCR BV specific primer pairs to amplify the complete CDR3 region by PCR

**CD4 T-cell spectratype data.** Spectratype histograms show the number of CD4 T-cells bearing receptors versus CDR3 length for each TCR BV family tested.

Kullback-Leibler divergence

Let P = {p_{i}, i = 1, ..., N_{F}} be the relative frequencies corresponding to the ideal, perfectly sampled reference distribution of BV family usage, where N_{F} is the number of BV families (in our case 18). Let Q = {q_{i}, i = 1, ..., N_{F}} be the relative frequencies of cells that use BV family i in an individual sample. The flow divergence is then

$$D_{f} = \sum_{i=1}^{N_{F}} q_{i} \ln \frac{q_{i}}{p_{i}}.$$

The flow Kullback-Leibler divergence is a measure of the distance between the two frequency distributions or, equivalently, it is the inefficiency of assuming that the distribution of BV family usage is p_{i}, i = 1, ..., N_{F}, when the true frequency usage is q_{i}, i = 1, ..., N_{F}.
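A minimal sketch of how such a flow divergence could be computed from gated event counts follows. It assumes the conventional form of the Kullback-Leibler divergence, with the individual's frequencies in the first argument, the reference frequencies in the second, and natural logarithms; the helper names `normalize` and `kl_divergence` and the toy numbers are ours and are not taken from the paper.

```python
import math


def normalize(counts):
    """Turn raw event counts into relative frequencies that sum to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return [c / total for c in counts]


def kl_divergence(q, p):
    """D(q || p) = sum_i q_i * ln(q_i / p_i), in nats.

    Terms with q_i == 0 contribute 0; a positive q_i where p_i == 0 gives infinity.
    """
    if len(q) != len(p):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for qi, pi in zip(q, p):
        if qi == 0.0:
            continue
        if pi == 0.0:
            return math.inf
        total += qi * math.log(qi / pi)
    return total


# Toy example with three hypothetical BV families (not data from the study)
reference = normalize([50, 30, 20])
individual = normalize([80, 15, 5])
print(round(kl_divergence(individual, reference), 4))
```

The divergence is zero when the individual's usage matches the reference exactly and grows as the repertoire becomes more skewed toward a few families.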

Similarly, let _{ij} = _{i}_{j/i},_{F} and _{C}}, and _{ij} = _{i}_{j/i},_{F} and _{C}}, respectively, be the relative numbers of T-cells of CDR3 lengths _{C} is the number of CDR3 lengths (in our case 14), (_{i} are the relative frequencies of cells which use the BV family _{j/i} the relative frequencies of of cells that have CDR3 length

and the total spectratype divergence, which is the average of spectratype divergences of TCR BV families _{F}} is given by

We can combine these two measures to obtain a total divergence measure from normal repertoire, derived as follows

Sampling bias - theoretical derivation

The distribution of BV family usage (CDR3 length within a BV family) of a perfectly sampled reference control can be described by a _{f} (_{s})-dimensional multinomial distribution with the parameter vector P, where _{i} is the relative numbers of T-cells that use the BV family (CDR3 length) _{i} are the relative numbers of T-cells that use the BV family (CDR3 length) ^{-1}, with a large _{i} are the relative numbers of T-cells that use the BV family (CDR3 length) _{f} (_{s}) is the dimension of the measured space,

For a large sampling number,

where _{i} large enough, can be approximated using Stirling’s formula (see

where

is the Kullback-Leibler divergence between

As shown in Kepler et al.

and

Moreover, as shown in Kepler et al. ^{-1} of _{i} around _{i}, leads to the following expression for (14)

where

and

From this, one can derive the expected values, _{D} up to order

From here we can derive the corrected individual divergence,

$$D_{i,\mathrm{corr}} = D_{i} - \frac{N_{i} - 1}{2 n_{i}},$$

which relaxes the concern of variability due to sampling error.
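The size of this correction can be checked numerically in the simplest setting. The sketch below is our own illustration under stated assumptions: a reference with only two categories, a sample of n cells drawn from that same reference (so the true divergence is zero), and the conventional Kullback-Leibler form with natural logarithms. It computes the exact expected value of the measured divergence by summing over all binomial outcomes and compares it with (N - 1)/(2n) for N = 2.

```python
import math


def kl_divergence(q, p):
    """D(q || p) in nats; terms with q_i == 0 contribute 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)


def expected_measured_divergence(p, n):
    """Exact E[D(Q_hat || P)] when n cells are drawn from a two-category reference (p, 1 - p)."""
    reference = [p, 1.0 - p]
    expectation = 0.0
    for k in range(n + 1):
        # Binomial probability of observing k cells in the first category.
        prob = math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
        q_hat = [k / n, (n - k) / n]
        expectation += prob * kl_divergence(q_hat, reference)
    return expectation


n = 200
bias = expected_measured_divergence(0.3, n)
first_order = (2 - 1) / (2.0 * n)  # (N - 1) / (2 n) with N = 2
print(bias, first_order)
```

In this toy case the expected measured divergence is close to the first-order term 1/(2n), so subtracting that term brings the estimate close to the true value of zero; the small remainder is of higher order in 1/n.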

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

Conceived the study: BHD and TBK. Developed mathematical components: SMC and TBK. Developed empirical components: BHD and MLM. Interpreted results and wrote the manuscript: SMC, BHD, MLM and TBK. All authors read and approved the final manuscript.

Acknowledgements

This work was supported by National Institutes of Health grants R01 AI 54843, R01 AI 47040, M03 RR60 (Duke General Clinical Research Center, National Center for Research Resources, National Institutes of Health), and Office of Orphan Products Development, Food and Drug Administration, grant FD-R-002606. MLM and TBK are members of the Duke Comprehensive Cancer Center. We acknowledge the technical assistance of Marilyn Alexieff, Jie Li, Chia-San Hsieh, Jennifer Lonon and Julie E. Smith. The clinical research assistance of Stephanie Gupton and Alice Jackson and the regulatory affairs assistance of Elizabeth McCarthy and Michele Cox are appreciated, as is the clinical care by the faculty and fellows of the Duke Pediatric Allergy and Immunology Division. We acknowledge the collaboration of surgeons James Jaggers, Andrew Lodge, Henry Rice, Micheal Skinner, and Jeffrey Hoehner. We appreciate the assistance of Drs. Michael Cook and Scott Langdon in the Duke Comprehensive Cancer Center flow cytometry and sequencing facilities.