INTEROBSERVER AGREEMENT OF GLEASON SCORE AND MODIFIED GLEASON SCORE IN NEEDLE BIOPSY
AND IN SURGICAL SPECIMEN OF PROSTATE CANCER
SERGIO G. VELOSO,
MARIO F. LIMA, PAULO G. SALLES, CYNTHIA K. BERENSTEIN, JOAO D. SCALON,
EDUARDO A. BAMBIRRA
Section of
Urology, Mario Penna Hospital, and Department of Pathology, School of
Medicine, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais,
Brazil
ABSTRACT
Introduction:
The Gleason score, which is used to grade prostate cancer, has high
interobserver variability. The most recent consensus conference emphasized
the tertiary Gleason pattern and recommended including it in the final
score of needle biopsies (modified Gleason score). This pattern is
considered to be of high prognostic value in surgical specimens. This study
evaluated the agreement of the modified score between needle biopsies and
surgical specimens, as well as the interobserver variability of this score.
Materials and Methods: Three pathologists
evaluated the slides of needle biopsies and surgical specimens of 110
patients, reporting primary, secondary and tertiary Gleason patterns, from
which traditional and modified Gleason scores were calculated. The kappa
statistic (K) was used to assess the interobserver agreement and the
agreement between the traditional and modified scores of the biopsy and
of the surgical specimen.
Results: Interobserver agreement in the
biopsy was K = 0.36 and K = 0.35, and in the surgical specimen it was
K = 0.46 and K = 0.36, for the traditional and modified scores, respectively.
A tertiary Gleason pattern was found in 8%, 0% and 2% of the biopsies
and in 8%, 0% and 13% of the surgical specimens by observers
1, 2 and 3, respectively. Kappa values were similar whether the traditional
or the modified needle biopsy score was compared with either score of the
surgical specimen.
Conclusion: Contrary to expectations, in this
study the modified Gleason score was not superior to the traditional score,
either in the agreement between biopsy and specimen scores or in
interobserver reproducibility.
Key words: prostatic neoplasms; biopsy, needle; surgery; pathology
Int Braz J Urol. 2007; 33: 639-51
INTRODUCTION
Prostate
cancer tends to be morphologically heterogeneous (1), showing several
patterns of differentiation that are classified by the Gleason system (2).
Prostate needle biopsy provides random samples, which might not represent
the neoplasm in all its heterogeneity and generally underestimate the
tumor grade (downgrading) (3-5). Clinical follow-up showed a worse
prognosis in patients whose tumors contained small proportions of Gleason
patterns 4 and 5, which are not reflected in the Gleason score (6-10).
This observation gave rise to the concept of the modified Gleason score,
which incorporates these small, more aggressive patterns into the patient's
score and is used in some prognostic nomograms (11,12) (Figure-1).
Several studies have addressed the interobserver
agreement of the Gleason score, with widely varying results (13).
Others have addressed the agreement on the modified Gleason score
in slides (14). Recently, Helpap reported a better association between
needle biopsy and surgical specimen when the modified Gleason score was
used (11). Gleason histopathological classification is highly subjective.
Despite its undeniable clinical importance, the reliability of the Gleason
score as a diagnostic method, and of the modified score in particular,
needs to be evaluated. We therefore evaluated the interobserver agreement,
and the association between needle biopsy and surgical specimen, using
both the traditional and the modified Gleason scores.
MATERIALS AND METHODS
One
hundred and ten untreated patients with prostate cancer who were scheduled
for radical prostatectomy agreed to participate in the study. They signed
an informed consent form and sent their sextant needle biopsies, which came
from different laboratories, for reevaluation. These biopsies had about
two cores per sextant, with a mean total of 12 cores (range 6 to 24 cores).
The surgical specimens were processed in a single laboratory by partial
sampling, producing about nine slides per specimen (range 7 to 20) that
covered the apex, distal third, mid third, proximal third, bladder neck,
and right and left seminal vesicles. Thus, the surgical specimen was not
processed as a whole. All material was stained with hematoxylin-eosin. The
observers evaluated all available slides of the needle biopsies and of the
surgical specimens, with or without cancer.
Three pathologists from different pathology
departments examined the slides of the needle and surgical samples of these
patients. They were blinded to the clinical data and to the pairing
between needle biopsy and surgical specimen. They filled in a protocol in
which they recorded the primary, secondary and most aggressive Gleason
patterns of each examined area of both specimens. The Gleason score (the
sum of the primary and secondary patterns) was then calculated for each
sextant separately (12). The score of the specimen was the highest score
found among the evaluated sextants; a global score was therefore not
calculated (4,5,7,15). Whenever possible, the tertiary pattern was
determined from the most aggressive Gleason pattern, and the modified
Gleason score (the sum of the primary and tertiary patterns) was calculated
(12). Similarly, the highest modified score of the examined slides was
adopted as the modified Gleason score of the specimen. The primary Gleason
pattern was defined as the most frequent Gleason pattern of the sample.
The secondary Gleason pattern was the second most frequent pattern, which
had to occupy more than 5% of the tumor area (12). When the second pattern
occupied less than 5%, the primary pattern was repeated as the secondary
pattern. The tertiary Gleason pattern was the third Gleason pattern, which
had to be more aggressive than the secondary pattern (12). To avoid
terminological confusion, the Gleason score is referred to as the
traditional Gleason score, clearly distinguishing it from the modified
Gleason score.
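The scoring rules above can be illustrated with a short Python sketch. It is not the authors' procedure (the observers recorded the patterns directly on a protocol); it assumes, hypothetically, that each examined area is described by the fraction of tumor area occupied by each Gleason pattern, and the names `grade_area` and `specimen_scores` are illustrative only. Because the tertiary pattern must be more aggressive than the secondary pattern, the modified score can never be lower than the traditional score, and the two coincide whenever no tertiary pattern is reported.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical input: for one examined area (biopsy sextant or specimen region),
# map each Gleason pattern (1-5) to the fraction of tumor area it occupies.
PatternShares = Dict[int, float]


def grade_area(shares: PatternShares,
               min_secondary: float = 0.05) -> Tuple[int, int, Optional[int]]:
    """Return (primary, secondary, tertiary) patterns for one examined area."""
    present = {p: s for p, s in shares.items() if s > 0}
    if not present:
        raise ValueError("area contains no tumor")
    # Rank the patterns by the share of tumor area they occupy, largest first.
    ranked = sorted(present, key=present.get, reverse=True)
    primary = ranked[0]
    # The second most frequent pattern counts only if it exceeds 5% of the
    # tumor area; otherwise the primary pattern is repeated.
    if len(ranked) > 1 and present[ranked[1]] > min_secondary:
        secondary = ranked[1]
    else:
        secondary = primary
    # Tertiary: the most aggressive pattern present, provided it is a third
    # pattern (neither primary nor secondary) more aggressive than the secondary.
    worst = max(present)
    if worst not in (primary, secondary) and worst > secondary:
        tertiary: Optional[int] = worst
    else:
        tertiary = None
    return primary, secondary, tertiary


def specimen_scores(areas: Iterable[PatternShares]) -> Tuple[int, int]:
    """Return (traditional, modified) scores as the highest over all areas."""
    graded = [grade_area(a) for a in areas]
    # Traditional score: primary + secondary.
    traditional = max(p + s for p, s, _ in graded)
    # Modified score: primary + tertiary, falling back to primary + secondary.
    modified = max(p + (t if t is not None else s) for p, s, t in graded)
    return traditional, modified


# Example: one sextant with a minor (<5%) pattern 4 focus, one pure pattern 3.
print(specimen_scores([{3: 0.96, 4: 0.04}, {3: 1.0}]))  # (6, 7)
```

In this example the first sextant scores 3 + 3 = 6 under the traditional rule but 3 + 4 = 7 under the modified rule, so the specimen's modified score exceeds its traditional score by one unit.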
Data were entered into a database and analyzed with Stata version 9.1
(StataCorp, 4905 Lakeway Dr, College Station, TX, USA). Kappa (K) and
weighted kappa statistics were used to evaluate the interobserver
agreement and the agreement between the Gleason scores of the two
specimens. Kappa values were interpreted according to the following
intervals: K < 0, poor; K = 0-0.2, slight; K = 0.2-0.4, fair;
K = 0.4-0.6, moderate; K = 0.6-0.8, substantial; and K = 0.8-1.0,
almost perfect (13). For statistical inference, a significance level of
5% (and, consequently, a confidence level of 95%) was generally adopted.
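As a minimal sketch of the agreement statistics described above (written in Python for illustration; the authors used Stata), the function below computes Cohen's kappa for two observers and a weighted variant. The paper does not state which weighting scheme was used, so the linear and quadratic weights offered here are assumptions, as is the convention of assigning boundary values (e.g. K = 0.2) to the lower interval when mapping kappa onto the descriptive categories. With three observers, kappa would presumably be computed for each pair of observers.

```python
from typing import List, Sequence


def kappa(a: Sequence[int], b: Sequence[int], categories: Sequence[int],
          weights: str = "none") -> float:
    """Cohen's kappa for two raters; weights: 'none', 'linear' or 'quadratic'."""
    k, n = len(categories), len(a)
    if k < 2 or n == 0 or n != len(b):
        raise ValueError("need >= 2 categories and paired, non-empty ratings")
    index = {c: i for i, c in enumerate(categories)}
    # Observed joint proportions of (rater A category, rater B category).
    observed: List[List[float]] = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[index[x]][index[y]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]                       # rater A marginals
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]  # rater B marginals

    def w(i: int, j: int) -> float:
        # Agreement weight: 1 on the diagonal, partial credit for near misses.
        if weights == "none":
            return 1.0 if i == j else 0.0
        if weights == "linear":
            return 1 - abs(i - j) / (k - 1)
        if weights == "quadratic":
            return 1 - ((i - j) / (k - 1)) ** 2
        raise ValueError(f"unknown weights: {weights}")

    p_obs = sum(w(i, j) * observed[i][j] for i in range(k) for j in range(k))
    p_exp = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    # Kappa is undefined when chance agreement is already perfect.
    if abs(1 - p_exp) < 1e-12:
        raise ValueError("kappa undefined: expected agreement equals 1")
    return (p_obs - p_exp) / (1 - p_exp)


def interpret(k: float) -> str:
    """Map kappa onto the descriptive intervals used in the paper."""
    if k < 0:
        return "poor"
    for upper, label in [(0.2, "slight"), (0.4, "fair"),
                         (0.6, "moderate"), (0.8, "substantial")]:
        if k <= upper:
            return label
    return "almost perfect"


obs1 = [6, 6, 6, 7, 7, 8]  # hypothetical scores, observer 1
obs2 = [6, 6, 7, 7, 8, 8]  # hypothetical scores, observer 2
print(round(kappa(obs1, obs2, [6, 7, 8]), 3))            # 0.5
print(round(kappa(obs1, obs2, [6, 7, 8], "linear"), 3))  # 0.625
print(interpret(0.36), interpret(0.46))                  # fair moderate
```

The weighted version gives partial credit when two observers differ by a single score unit, which is why weighted kappa tends to be somewhat higher than unweighted kappa for ordinal scores such as the Gleason score.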
RESULTS
The
mean patient age was 63.5 +/- 7.7 years (range 44 to 79 years). The mean
preoperative PSA was 10.2 +/- 8.2 ng/mL (range 1.2 to 53.4 ng/mL). Clinical
stage (digital rectal examination) was T1 in 46.7%, T2 in 47.5% and T3 in
5.8% of patients. In the initial pathological examination, extracapsular
tumor extension was found in 17% (pT3a) and seminal vesicle invasion in
11% (pT3b).
The three pathologists trained at the same university, although they now
work in different hospitals and laboratories. Based on their experience in
surgical pathology and the weekly number of prostate specimens examined
(the criteria adopted by De la Taille (13)), observers 1 and 2 were
classified as experienced and observer 3 as less experienced.
Some slides considered unsatisfactory were excluded from the study.
In the biopsies, Gleason pattern 3 predominated as the primary pattern
(66% to 86%, depending on the observer) and as the secondary pattern
(63% to 71%). Similarly, in the surgical specimens, Gleason pattern 3 was
the most frequent primary pattern (75% to 81%) and secondary pattern
(60% to 69% of the observations). Pattern 3 thus clearly predominated as
both primary and secondary pattern in both specimens. Gleason score 6
predominated in the needle biopsies, whereas in the surgical specimens
Gleason scores 6 and 7 occurred in similar proportions. Table-1 shows the
distribution of traditional and modified Gleason scores.
Interobserver agreement on the primary Gleason pattern in the needle
biopsy was fair to moderate according to kappa, and in the surgical
specimen it was moderate to substantial. For the secondary Gleason pattern
the observers diverged: agreement was generally low and only occasionally
fair. For the most aggressive Gleason pattern, agreement was fair to
moderate (Table-2).
Interobserver agreement on the traditional Gleason score in the needle
samples was fair, with exact agreement in 60% to 68% of cases and agreement
within +/- 1 Gleason score unit in 91% to 98%. In the surgical specimens,
agreement was fair to moderate, with exact agreement in 66% to 71% and
agreement within one unit in 96% to 99%. The modified Gleason score showed
similar, fair to moderate, agreement in both specimens: exact agreement in
the biopsy ranged from 58% to 69% and agreement within +/- 1 unit from 86%
to 97%, while in the surgical specimen exact agreement ranged from 60% to
64% and agreement within one unit from 94% to 95%. Weighted kappa values
were similar to unweighted kappa values (Table-3).
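Exact agreement and agreement within +/- 1 unit are simple proportions of paired ratings, computed without any correction for chance, which partly explains how they can be high while kappa is only fair. A minimal Python sketch, using hypothetical scores and an illustrative function name:

```python
from typing import Sequence, Tuple


def agreement_rates(a: Sequence[int], b: Sequence[int]) -> Tuple[float, float]:
    """Return (exact agreement, agreement within +/- 1 score unit) for two raters."""
    if len(a) != len(b) or not a:
        raise ValueError("need paired, non-empty score lists")
    pairs = list(zip(a, b))
    exact = sum(x == y for x, y in pairs) / len(pairs)
    within_one = sum(abs(x - y) <= 1 for x, y in pairs) / len(pairs)
    return exact, within_one


# Hypothetical Gleason scores given to the same five cases by two observers.
print(agreement_rates([6, 6, 7, 7, 8], [6, 7, 7, 9, 8]))  # (0.6, 0.8)
```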
A tertiary Gleason pattern was diagnosed in
8%, 0% and 2% of the biopsies and in 8%, 0% and 13% of the surgical specimens
by observers 1, 2 and 3, respectively. Accordingly, the traditional and
modified Gleason scores of observer 1 were identical in 92% of both the
biopsies and the surgical specimens. Observer 2 did not classify any pattern
as tertiary, so the two scores coincided in 100% of cases. For observer 3,
the two scores were identical in 98% of the needle biopsies and 87% of the
surgical specimens.
For each observer, the traditional and modified Gleason scores were used
to evaluate the association between the scores of the two specimens. For
observer 1, K = 0.24 when the traditional score was used in both the needle
biopsy and the surgical specimen, and K = 0.21 when the modified score in
the biopsy was compared with the traditional score in the specimen; the
same value was obtained when the modified score was used in both. For
observer 2, who reported no tertiary patterns, the association between
specimens did not differ between the scores (K = 0.26). For observer 3,
K = 0.18 with the traditional score in both the needle biopsy and the
surgical specimen, and K = 0.17 with the modified score in the biopsy and
the traditional score in the specimen. With the traditional Gleason score
in both specimens, less downgrading in the needle biopsy was found than
with the modified score in both (Table-4).
COMMENTS
The
sample reflects a group of patients referred for radical prostatectomy,
that is, relatively young patients with localized disease and generally
low Gleason scores. The three observers, also young, had similar academic
and professional backgrounds and learned the Gleason system during their
residency at the same institution. Good agreement among them would
therefore have been expected.
Agreement on the primary Gleason pattern was higher in the surgical
specimen than in the needle biopsy. One might expect that observing smaller
areas would focus attention on a specific area and thus yield higher
agreement. On the other hand, because the specimen provides a greater
extent of tissue, suspicious areas with borderline patterns may have been
better examined, resulting in higher agreement. The low agreement on the
secondary pattern reflects the difficulty of diagnosing it: besides
identifying Gleason patterns, it requires estimating tumor volume. As a
rule, the secondary Gleason pattern occupies more than 5% of the tumor area
and less than the primary pattern. Determining tumor extent is not necessary
for diagnosing the most aggressive Gleason pattern; recognizing the worst
pattern is sufficient. Glaessgen found weak agreement on the diagnosis of
the most aggressive patterns and considered that recognizing them was more
difficult than determining their volume (14). Experience did not influence
agreement much, since agreement was not higher between the more
experienced observers, which contradicts some authors (15,16).
Interobserver agreement on the traditional Gleason
score was slightly higher in the surgical specimen than in the needle
biopsy. With the modified Gleason score, agreement was similar in needle
biopsy and surgical specimen. In general, weighted kappa values were a
little higher, without altering these relations. Notably, the modified
Gleason score showed no superiority over the traditional score, in line
with the findings of Glaessgen (14). Regarding the individual patterns,
agreement was higher for the primary pattern and for the most aggressive
one (which is closely related to the tertiary pattern) and very low for
the secondary pattern. The modified score would be expected to achieve
higher agreement, but this did not happen. This might be due to the small
number of tertiary patterns diagnosed, which made the two scores similar.
However, this number is similar to that found in Griffiths' study, in
which a tertiary Gleason pattern was diagnosed in 6% of cases by general
pathologists and 9% by uropathologists, with weak agreement on the
tertiary pattern (17). This pattern generally corresponds to pattern 4
or 5, which can present borderline structures that make diagnosis more
difficult (3). Studies of the tertiary pattern generally use it for
prognostic evaluation in surgical specimens (8-10). Mosse, evaluating the
prognosis of patients with tertiary pattern 5, found a worse prognosis in
those with Gleason score 6 or 7 in the surgical specimen (8). These are
the most frequent scores.
Considering that prostate cancer is heterogeneous
and multicentric (1), the biopsy, which samples only a small portion of
the tumor, might not represent it adequately (3-5). Traditionally, the
Gleason score in needle biopsy is believed to underestimate the score of
the surgical specimen, because a less differentiated pattern may not have
been sampled in the biopsy (4,12). In this study, this was observed for
observers 1 and 2. Building on the downgrading concept, some authors
suggest the use of the modified Gleason score, which would better reflect
the real tumor characteristics because it takes small, more aggressive
patterns into account (6). The International Society of Urological
Pathology (ISUP) consensus on Gleason grading recommends including the
tertiary pattern in the score of needle biopsies (modified Gleason score)
(12). In surgical specimens, however, it is still recommended to report
the tertiary pattern whenever present, without including it in the
Gleason score (12). Since the needle biopsy downgrades the score, the
modified Gleason score of the biopsy was expected to agree better with the
traditional Gleason score of the surgical specimen. However, this could
not be demonstrated. Compared with either the traditional or the modified
specimen score, the modified biopsy score yielded kappa values equal to,
or even slightly lower than, those obtained with the traditional biopsy
score. Thus, the modified Gleason score did not prove to represent the
specimen better in the biopsy. As previously mentioned, this might be due
to the low frequency of tertiary pattern diagnoses. Helpap, in contrast,
evaluating slides of 368 patients, found that using the modified instead
of the traditional Gleason score improved the exact agreement between the
two specimens (58% to 78%) (8). However, he did not use the kappa
statistic to evaluate chance-corrected agreement, nor did he report the
proportion of cases with a tertiary pattern.
CONCLUSIONS
In
this study, the modified Gleason score was not more reproducible than the
traditional Gleason score, in either the needle biopsy or the surgical
specimen. Contrary to expectations, the modified score in the biopsy did
not agree better than the traditional score with the Gleason scores of the
specimen. These results might be due to the methodology used as well as
to the observers involved. Morphological analysis alone relies on criteria
of low reproducibility. The association between the two Gleason scores
should be reevaluated in different samples with a higher proportion of
tertiary patterns.
ACKNOWLEDGMENT
Dr.
Sergio G. Veloso has a CNPq Grant, Ministry of Technology, Brazil.
CONFLICT OF INTEREST
None declared.
REFERENCES
- Aihara M, Wheeler TM, Ohori M, Scardino PT: Heterogeneity of prostate
cancer in radical prostatectomy specimens. Urology. 1994; 43: 60-6;
discussion 66-7.
- Billis A, Pompeo AC: Adenocarcinoma da próstata. Int Braz
J Urol. 2003; 29 (suppl 1): 27-34.
- Montironi R, Mazzucchelli R, Scarpelli M, Lopez-Beltran A, Fellegara
G, Algaba F: Gleason grading of prostate cancer in needle biopsies or
radical prostatectomy specimens: contemporary approach, current clinical
significance and sources of pathology discrepancies. BJU Int. 2005;
95: 1146-52.
- Lopez-Beltran A, Mikuz G, Luque RJ, Mazzucchelli R, Montironi R:
Current practice of Gleason grading of prostate carcinoma. Virchows
Arch. 2006; 448: 111-8.
- Egevad L, Allsbrook WC Jr, Epstein JI: Current practice of Gleason
grading among genitourinary pathologists. Hum Pathol. 2005; 36: 5-9.
- Pan CC, Potter SR, Partin AW, Epstein JI: The prognostic significance
of tertiary Gleason patterns of higher grade in radical prostatectomy
specimens: a proposal to modify the Gleason grading system. Am J Surg
Pathol. 2000; 24: 563-9.
- Rioux-Leclercq NC, Chan DY, Epstein JI: Prediction of outcome after
radical prostatectomy in men with organ-confined Gleason score 8 to
10 adenocarcinoma. Urology. 2002; 60: 666-9.
- Helpap B, Egevad L: The significance of modified Gleason grading
of prostatic carcinoma in biopsy and radical prostatectomy specimens.
Virchows Arch. 2006; 449: 622-7.
- Epstein JI, Allsbrook WC Jr, Amin MB, Egevad LL; ISUP Grading Committee:
The 2005 International Society of Urological Pathology (ISUP) Consensus
Conference on Gleason Grading of Prostatic Carcinoma. Am J Surg Pathol.
2005; 29: 1228-42.
- De la Taille A, Viellefond A, Berger N, Boucher E, De Fromont M,
Fondimare A, et al.: Evaluation of the interobserver reproducibility
of Gleason grading of prostatic adenocarcinoma using tissue microarrays.
Hum Pathol. 2003; 34: 444-9.
- Glaessgen A, Hamberg H, Pihl CG, Sundelin B, Nilsson B, Egevad L:
Interobserver reproducibility of modified Gleason score in radical prostatectomy
specimens. Virchows Arch. 2004; 445: 17-21.
- Hollenbeck BK, Bassily N, Wei JT, Montie JE, Hayasaka S, Taylor JM,
et al.: Whole mounted radical prostatectomy specimens do not increase
detection of adverse pathological features. J Urol. 2000; 164: 1583-6.
- Kunz GM Jr, Epstein JI: Should each core with prostate cancer be assigned
a separate Gleason score? Hum Pathol. 2003; 34: 911-4.
- Glaessgen A, Hamberg H, Pihl CG, Sundelin B, Nilsson B, Egevad L:
Interobserver reproducibility of percent Gleason grade 4/5 in total
prostatectomy specimens. J Urol. 2002; 168: 2006-10.
- Allsbrook WC Jr, Mangold KA, Johnson MH, Lane RB, Lane CG, Epstein
JI: Interobserver reproducibility of Gleason grading of prostatic carcinoma:
general pathologist. Hum Pathol. 2001; 32: 81-8. Erratum in: Hum Pathol
2001; 32: 1417.
- Allsbrook WC Jr, Mangold KA, Johnson MH, Lane RB, Lane CG, Amin MB,
et al.: Interobserver reproducibility of Gleason grading of prostatic
carcinoma: urologic pathologists. Hum Pathol. 2001; 32: 74-80.
- Griffiths DF, Melia J, McWilliam LJ, Ball RY, Grigor K, Harnden P,
et al.: A study of Gleason score interpretation in different groups
of UK pathologists; techniques for improving reproducibility. Histopathology.
2006; 48: 655-62.
____________________
Accepted after revision: August 8, 2007
_______________________
Correspondence address:
Dr. Sergio Geraldo Veloso
Rua Henrique Benfenatti, 237
São João del-Rei, MG, 36307-042, Brazil
Fax: + 55 32 3371-8003
E-mail: velososg@ig.com.br
EDITORIAL COMMENT
At
a consensus conference organized in 2005 by the International Society
of Urological Pathology (ISUP), the Gleason grading system underwent its
first systematic revision (1). The purpose of the meeting was to standardize
both the perception of histological patterns and how the grade information
is compiled and reported. One of the decisions of the ISUP working group
was that high-grade tumor of any quantity on needle biopsy should be included
in the Gleason score. The ISUP recommendations contribute to a general
upward shift of Gleason scores, and it may be necessary to repeat
some previous studies on grading of prostate cancer. Helpap et al. recently
compared conventional and modified Gleason grading in radical prostatectomy
specimens and preoperative biopsies and reported on the distribution of
modified Gleason score and its correlation with other prognostic factors
such as age, stage and serum PSA (2-4). Few studies have been performed
on interobserver reproducibility of this new variant of Gleason grading.
In
a study by Glaessgen et al., the reproducibility of modified Gleason grading
among four genitourinary pathologists was analyzed using a set of 69 consecutive
radical prostatectomy specimens (5). Mean weighted kappa for conventional
and modified Gleason score were 0.56 (range 0.52-0.66) and 0.58 (range
0.49-0.74), respectively. This study was carried out before the ISUP consensus
meeting was held and only addressed the effect of inclusion of tertiary
patterns of higher grade in the Gleason score. Hence, recent changes in
pattern recognition were not taken into account. Furthermore, the ISUP
recommendations to include higher-grade tertiary patterns in the score pertained
to needle biopsies, while the study by Glaessgen et al. was done on radical
prostatectomy specimens only (5).
Veloso
et al., in this paper, present a similar study on the reproducibility
of a modified Gleason grading, now done on both needle biopsies and radical
prostatectomy specimens. Again, only the effect of including higher-grade
tertiary patterns was studied. In needle biopsies, a weighted kappa of 0.36
was reached with both conventional and modified Gleason grading. In radical
prostatectomy specimens, the weighted kappa was 0.46 for conventional and
0.36 for modified grading.
This interobserver agreement was slightly lower than that of previous
studies. For example, in a biopsy study on conventional Gleason score
by Glaessgen et al., a weighted kappa of 0.48 to 0.55 (mean 0.51) was
reached among 4 genitourinary pathologists using a consecutive series
of needle biopsies from 69 men (279 glass slides) (6). Allsbrook et al.
circulated 46 needle biopsies containing prostatic carcinoma among 10
genitourinary pathologists (6). The weighted kappa for Gleason score ranged
from 0.56 to 0.70. However, the biopsies of this series were selected
rather than consecutive, which may lead to better reproducibility.
From
studies performed so far, it seems that the interobserver reproducibility
of the Gleason grading remains essentially the same with modified Gleason
grading and results are probably more influenced by the study design.
Revision
of a grading system may be necessary when we gain new knowledge of the
biology of cancer. However, it must also be remembered that a revision
has consequences in terms of modified prognostic impact of a certain grade
and also warrants new studies to verify the value of the novel grading
system (7). Whether modified Gleason grading of needle biopsies is a superior
predictor of prognosis remains to be seen.
REFERENCES
- Epstein JI, Allsbrook WC Jr., Amin MB, Egevad L: The 2005 International
Society of Urological Pathology (ISUP) Consensus Conference on Gleason
grading of prostatic carcinoma. Am J Surg Pathol. 2005; 29: 1228-42.
- Helpap B, Egevad L: Correlation of modified Gleason grading with
pT stage of prostatic carcinoma after radical prostatectomy. Anal Quant
Cytol Histol (in press).
- Helpap B, Egevad L: Correlation of modified Gleason grading of prostate
carcinoma with serum PSA, age and tumor extent in needle biopsy specimens.
Anal Quant Cytol Histol (in press).
- Helpap B, Egevad L: The significance of modified Gleason grading
of prostatic carcinoma in biopsy and radical prostatectomy specimens.
Virchows Arch. 2006; 449: 622-7.
- Glaessgen A, Hamberg H, Pihl CG, Sundelin B, Nilsson B, Egevad L:
Interobserver reproducibility of modified Gleason score in radical prostatectomy
specimens. Virchows Arch. 2004; 445: 17-21.
- Glaessgen A, Hamberg H, Pihl CG, Sundelin B, Nilsson B, Egevad L:
Interobserver reproducibility of percent Gleason grade 4/5 in prostate
biopsies. J Urol. 2004; 171: 664-7.
- Albertsen PC, Hanley JA, Barrows GH, Penson DF, Kowalczyk PD, Sanders
MM, et al.: Prostate cancer and the Will Rogers phenomenon. J Natl Cancer
Inst. 2005; 97: 1248-53.
Dr. Lars Egevad
Department of Pathology & Cytology
Karolinska Hospital
Stockholm, Sweden
E-mail: lars.egevad@ki.se
EDITORIAL COMMENT
This
paper by Dr Veloso et al. deals with interobserver agreement of Gleason score
and modified Gleason score (1) in needle biopsy and in surgical specimen
of prostate cancer. This group of authors found that the modified Gleason
score was not superior in the agreement between the biopsy score and the
specimen, or in interobserver reproducibility.
The
Gleason grading system is a powerful tool to prognosticate and aid in the
treatment of men with prostate cancer. The needle biopsy Gleason score correlates
with virtually all other pathologic parameters, including tumor volume and
margin status in radical prostatectomy specimens, serum PSA levels and many
molecular markers. The Gleason score assigned to the tumor at radical prostatectomy
is the most powerful predictor of progression following radical prostatectomy.
However, there are significant deficiencies in the practice of this grading
system. Not only are there problems among practicing pathologists, but
there is also a relative lack of interobserver reproducibility among experts.
Correlation
There
have been several studies addressing the correlation between Gleason scores
in needle biopsies and corresponding radical prostatectomy specimens.
Although earlier studies used thicker (14-gauge) needle biopsies (2,3),
more recent series are based on thin-core (18-gauge) needles used with
biopsy guns under transrectal ultrasound guidance. Sextant or other
modes of systematic sampling are typically performed in the more current
series. In a recent compilation of data on 3,789 patients from 18 studies,
exact correlation of Gleason scores was found in 43% of cases and correlation
within plus or minus one Gleason score unit in 77% of cases (4). Under-grading
of carcinoma in needle biopsy is the most common problem, occurring in
42% of all reviewed cases. Importantly, over-grading of carcinoma in needle
biopsies may also occur, but this was only found in 15% of cases. In general,
adverse findings on needle biopsy accurately predict adverse findings
in the radical prostatectomy specimen, whereas favorable findings on the
needle biopsy do not necessarily predict favorable findings in the radical
prostatectomy specimens in large part due to sampling error.
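The categories used in such biopsy-prostatectomy comparisons can be made explicit with a short Python sketch (hypothetical data; the function name is illustrative). Every case is either exactly correlated, under-graded (biopsy score lower than the prostatectomy score) or over-graded, so the three proportions sum to one, as in the compilation cited: 43% + 42% + 15% = 100%.

```python
from typing import Dict, Sequence


def compare_scores(biopsy: Sequence[int],
                   prostatectomy: Sequence[int]) -> Dict[str, float]:
    """Classify paired Gleason scores as exact, under-graded or over-graded."""
    if len(biopsy) != len(prostatectomy) or not biopsy:
        raise ValueError("need paired, non-empty score lists")
    n = len(biopsy)
    pairs = list(zip(biopsy, prostatectomy))
    return {
        "exact": sum(b == p for b, p in pairs) / n,
        # Under-grading: the biopsy score is lower than the prostatectomy score.
        "under": sum(b < p for b, p in pairs) / n,
        "over": sum(b > p for b, p in pairs) / n,
        "within_one": sum(abs(b - p) <= 1 for b, p in pairs) / n,
    }


# Hypothetical paired scores for ten patients.
biopsy = [6, 6, 6, 6, 7, 7, 7, 8, 6, 9]
prostatectomy = [6, 7, 7, 8, 7, 6, 7, 8, 6, 8]
print(compare_scores(biopsy, prostatectomy))
# {'exact': 0.5, 'under': 0.3, 'over': 0.2, 'within_one': 0.9}
```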
Sources of Discrepancies
Sampling error
Perhaps
the most important factor is sampling error, which relates to the small
amount of tissue removed by thin-core needle biopsies. The average 20-mm,
18-gauge core samples approximately 0.04% of the average gland volume
(40 cc). The most common type of sampling error occurs when there is a
higher grade component present within the radical prostatectomy specimen,
which is not sampled on needle biopsy (5). This typically occurs when
a needle biopsy tumor is graded as Gleason score 3 + 3 = 6, while the radical
prostatectomy contains a Gleason pattern 4 that was not sampled on the
biopsy, resulting in a prostatectomy Gleason score of 3 + 4 = 7.
In some instances, under-grading results from an attempt to grade very
tiny areas of carcinoma, so-called minimal or limited adenocarcinoma (6).
Scores of minimal adenocarcinoma in needle biopsies show a reasonably
strong correlation with radical prostatectomy scores, but the Gleason
scores do not have the same power to predict extra-prostatic extension
and positive margin status as they do in non-minimal carcinomas (6).
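The 0.04% figure quoted above can be reproduced with simple geometry. The Python sketch below idealizes a core as a solid cylinder; the core diameter of about 1 mm is an assumption (it is not stated in the text), chosen as a plausible tissue diameter for an 18-gauge core, and the gland volume is taken as 40 cc (1 cc = 1,000 mm³).

```python
import math


def core_fraction(core_length_mm: float = 20.0,
                  core_diameter_mm: float = 1.0,  # assumed, not given in the text
                  gland_volume_cc: float = 40.0) -> float:
    """Fraction of gland volume sampled by one cylindrical needle core."""
    core_volume_mm3 = math.pi * (core_diameter_mm / 2) ** 2 * core_length_mm
    gland_volume_mm3 = gland_volume_cc * 1000.0  # 1 cc = 1000 mm^3
    return core_volume_mm3 / gland_volume_mm3


one_core = core_fraction()
print(f"one core: {one_core:.3%}")       # one core: 0.039%
print(f"12 cores: {12 * one_core:.2%}")  # 12 cores: 0.47%
```

Under these assumptions, even twelve cores (the mean number in the biopsies reviewed by Veloso et al.) would sample less than 0.5% of a 40-cc gland.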
Over-grading
can result from sampling error when a high-grade pattern that represents only
a very minor element of the radical prostatectomy specimen is selectively
sampled on needle biopsy. Even the same cancer focus may have different
grades depending on the area sampled.
Borderline cases
The
other source of discrepancy between biopsy and radical prostatectomy is
borderline cases. Within the Gleason grading system, some cases lie right
at the interface between two different patterns, where there will be
interobserver variability and possibly even intra-observer
variability (7).
Pathology error
Pathology
error is most frequently seen when pathologists assign a Gleason score
of ≤ 4 to a needle biopsy that in fact has a Gleason score of 5-6.
Many pathologists under-grade needle biopsies by confusing quantitative
changes with qualitative changes. When there is a limited focus of small
glands of cancer on needle biopsy, by definition this is a Gleason pattern
3. Gleason pattern 3 consists of small glands with an infiltrative pattern.
A biopsy of truly low-grade adenocarcinoma of the prostate would not show
just a few neoplastic glands but rather a more extensive tumor, as
low-grade adenocarcinoma grows as nodules of closely packed glands rather
than infiltrating in and amongst normal glands.
Under-grading may result from difficulty
in recognizing an infiltrative growth pattern or failing to recognize
the presence of small areas of gland fusion (7).
Pathologists’ education and experience
The pathologists’ experience in grading
thin-core needle biopsies can also influence overall correlation
with radical prostatectomy results. With experience, pathologists recognize
grading pitfalls, in particular the fact that Gleason scores of 4 and
lower are almost nonexistent in needle biopsies. Furthermore,
small areas of fusion in the presence of a predominantly grade 3 background
are recognized and will yield a Gleason score of 7, which often correlates
well with radical prostatectomy results (8).
Intra-observer and interobserver variability
Reproducibility
studies can be categorized as intra-observer and interobserver. In investigations
of intra-observer agreement of Gleason grades, exact agreement was reported
in 43% to 78% of cases (8,9), and agreement within plus or minus one Gleason
score unit in 72% to 87% of cases. Gleason wrote that he duplicated his
previous histologic scores exactly approximately 50% of the time. Highly
variable levels of interobserver agreement on Gleason scores have also been
reported, ranging from 36% to 81% for exact agreement and from 69% to 86%
for agreement within plus or minus one Gleason score unit.
Improvements in Gleason grading reproducibility can be achieved by recognizing
problematic areas and educating physicians via meetings, courses, website
tutorials, and publications that specifically focus on the Gleason grading
system (10).
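The two reproducibility measures quoted above, exact agreement and agreement within plus or minus one Gleason score unit, are simple proportions of paired readings. The following is a minimal sketch, assuming each case receives one Gleason score per reading; the function name `agreement_rates` and the five example scores are hypothetical and are not data from any of the cited studies.

```python
from typing import Sequence, Tuple


def agreement_rates(scores_a: Sequence[int], scores_b: Sequence[int]) -> Tuple[float, float]:
    """Return (exact, within_one) agreement as fractions of cases.

    scores_a and scores_b hold the Gleason scores assigned to the same cases,
    either by one pathologist at two readings (intra-observer) or by two
    pathologists (interobserver). Hypothetical helper for illustration.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    n = len(scores_a)
    # Count pairs with identical scores.
    exact = sum(a == b for a, b in zip(scores_a, scores_b))
    # Count pairs differing by at most one Gleason score unit.
    within_one = sum(abs(a - b) <= 1 for a, b in zip(scores_a, scores_b))
    return exact / n, within_one / n


# Hypothetical readings of five biopsies (illustrative only, not study data).
first = [6, 7, 7, 8, 9]
second = [6, 6, 7, 9, 7]
exact, within_one = agreement_rates(first, second)
print(f"exact {exact:.0%}, within +/-1 {within_one:.0%}")  # exact 40%, within +/-1 80%
```

Note that neither proportion corrects for chance agreement, which is why studies such as the one discussed here also report the kappa statistic.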
REFERENCES
- Epstein JI, Allsbrook WC Jr., Amin MB, Egevad LL, ISUP Grading Committee:
The 2005 International Society of Urological Pathology (ISUP) Consensus
Conference on Gleason Grading of Prostatic Carcinoma. Am J Surg Pathol.
2005; 29: 1228-42.
- Garnett JE, Oyasu R, Grayhack JT: The accuracy of diagnostic biopsy
specimens in predicting tumor grades by Gleason’s classification
of radical prostatectomy specimens. J Urol. 1984; 131: 690-3.
- Mills SE, Fowler JE Jr: Gleason histologic grading of prostatic carcinoma.
Correlation between biopsy and prostatectomy specimens. Cancer. 1986;
57: 346-9.
- Humphrey PA: Prostate Pathology. Chicago, ASCP Press. 2003; p. 138.
- Algaba F, Chivite A, Santaularia JM, Oliver A: Evidence of the radical
prostatectomy Gleason score in the biopsy Gleason score. Actas Urol
Esp. 2004; 28: 21-6.
- Rubin MA, Dunn R, Kambham PA, Misick CP, O’Toole KM: Should
a Gleason score be assigned to a minute focus of carcinoma on prostate
biopsy? Am J Surg Pathol. 2000; 24: 1634-40.
- Steinberg DM, Sauvageot J, Piantadosi S, Epstein JI: Correlation
of prostate needle biopsy and radical prostatectomy Gleason grade in
academic and community setting. Am J Surg Pathol. 1997; 21: 566-76.
- Cintra ML, Billis A: Histologic grading of prostatic adenocarcinoma.
Intra-observer reproducibility of the Mostofi, Gleason, and Böcking
grading systems. Int Urol Nephrol. 1991; 23: 449-54.
- Özdamar SO, Sarikaya S, Yildiz L, Atilla MK, Kandemir B, Yildiz
S: Intra-observer and interobserver reproducibility of WHO and Gleason
histologic grading systems in prostatic adenocarcinomas. Int Urol Nephrol.
1996; 28: 73-7.
- Egevad L, Allsbrook WC, Epstein JI: Current practice of Gleason grading
among genitourinary pathologists. Hum Pathol. 2005; 36: 5-9.
Dr. Rodolfo Montironi
Institute of Pathological Anatomy
Polytechnic University of the Marche Region
Ancona, Italy
E-mail: r.montironi@univpm.it
Dr. Liang Cheng
Dept. of Pathology and Laboratory Medicine
Indiana University School of Medicine
Indianapolis, IN, USA
E-mail: liang_cheng@yahoo.com
EDITORIAL COMMENT
In
the original Gleason system, the most common and second most common grade
patterns are added to arrive at the Gleason score with tertiary patterns
not factored in. For example, in a needle biopsy with Gleason score 3 +
4 = 7, a smaller tertiary component of very high grade pattern 5 tumor would
not be factored in. In the Consensus Conference on Updating the Gleason
grading system, it was recommended that a tertiary component of higher grade
tumor on biopsy be included within the Gleason score by adding the most
common and highest grade patterns. In the above example, this would result
in a Gleason score of 3 + 5 = 8. This study by Veloso et al. found that
the interobserver reproducibility for the modified biopsy Gleason score
was not superior to the routine Gleason score and was also not more accurate
in predicting radical prostatectomy Gleason score. The major limitation
of their study, as they acknowledge, is the limited number of cases with
a tertiary pattern on biopsy: 0%, 2%, and 8% of the 110 patients, depending
on the observer. With such small numbers, it would
be impossible to show any differences between the routine and modified Gleason
score. In a recent paper on 2,370 men with prostate cancer, Patel et al.
also found that Gleason score 7 with tertiary pattern 5 was uncommon, occurring
in 1.5% of cases (1). However, they documented that Gleason score 7 tumor
on biopsy with tertiary pattern 5 has the same prognosis as Gleason score
8 tumor when treated by radiotherapy or radical prostatectomy. These findings
are consistent with several studies that have documented the same adverse
prognostic significance of tertiary pattern 5 in radical prostatectomy specimens.
The growing body of evidence suggests that Gleason score 3 + 4 with tertiary
pattern 5, whether on biopsy or radical prostatectomy, should be considered
as Gleason score 8.
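The difference between the original and the modified biopsy score described above can be expressed as a short calculation. This is a minimal sketch, not an official implementation of the ISUP recommendation; the function names are hypothetical, and it assumes the tertiary pattern changes the score only when it is of higher grade than both the primary and secondary patterns, as in the 3 + 4 with tertiary pattern 5 example. Other configurations are outside the scope of this sketch.

```python
from typing import Optional


def traditional_score(primary: int, secondary: int) -> int:
    """Original Gleason score: most common plus second most common pattern."""
    return primary + secondary


def modified_biopsy_score(primary: int, secondary: int, tertiary: Optional[int] = None) -> int:
    """Biopsy score adding the most common and the highest grade pattern.

    Assumption of this sketch: a tertiary pattern replaces the secondary
    pattern only when it is of higher grade than both the primary and the
    secondary pattern; otherwise the traditional score is returned.
    """
    if tertiary is not None and tertiary > max(primary, secondary):
        return primary + tertiary
    return traditional_score(primary, secondary)


# The editorial's example: 3 + 4 = 7 with a tertiary pattern 5.
print(traditional_score(3, 4), modified_biopsy_score(3, 4, tertiary=5))  # 7 8
```

Keeping the traditional score as the fallback makes explicit that the two systems disagree only in cases with a higher-grade tertiary pattern, which is why a series with few such cases, as discussed above, can hardly show a difference between them.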
REFERENCE
1. Patel AA, Chen M, Renshaw AA, D’Amico AV: PSA failure following definitive
treatment of prostate cancer having biopsy Gleason score 7 with tertiary
grade 5. JAMA. 2007; 298: 1533-8.
Dr. Jonathan I. Epstein
Dept. of Pathology, Urology and Oncology
The Johns Hopkins Hospital
Baltimore, MD, USA
E-mail: jepstein@jhmi.edu
|