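Diagnostic Study Meta-Analysis: A Reporting Example

Statistics Study Group for Improving the Quality of Reporting in Clinical Epidemiology Research
2014/10/25
Meta-analysis of Diagnostic Studies
Reporting Example
Takashi Fujiwara
Department of Otolaryngology, Ehime University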

Paper used for this reporting example
Accuracy of rapid influenza diagnostic test:
a meta-analysis
Chartrand C, et al. Ann Intern Med. 2012;156(7):500-11
(Freely available)

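Background and objectives ①
✓ Influenza
Many people worldwide contract influenza every year
　Severe cases: 3 to 5 million per year
　Related deaths: 250,000 to 300,000 per year
Highly contagious and spreads rapidly
Early diagnosis is important for infection control
✓ Diagnosing influenza
The gold standard for a definitive diagnosis is viral culture
or RT-PCR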

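Background and objectives ②
✓ Diagnosing influenza
Viral culture and RT-PCR are time-consuming and costly
➡ Impractical for routine clinical use
Rapid influenza diagnostic tests (RIDTs), which give a result
in 15 to 30 minutes, are therefore used instead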

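Background and objectives ③
✓ Limitations of previous research on RIDTs
Previous studies agree that specificity is 90% or higher,
but reported sensitivity varies widely, from 10% to 80%.
Unfortunately, RIDTs may have inconsistent accuracy, with reported
sensitivity ranging from 10% to 80%
Previous reviews were limited to children, and
those covering adults evaluated only a single test kit.
Previous systematic reviews have been limited to pediatric studies
or have addressed only 1 commercial RIDT.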

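① Collect the studies
　(comprehensive search and study selection)
② Extract data and assess study quality
③ Synthesize the studies and assess heterogeneity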

Methods ① Databases searched
✓ Database searches
PubMed, EMBASE, BIOSIS, Web of Science
We searched 4 electronic databases: PubMed (January 1950 to
December 2011), EMBASE (January 1980 to December 2011), BIOSIS
(January 1969 to March 2010), and Web of Science (January 1980 to
March 2010). The databases were searched in March 2010.
✓ Searches beyond the databases
Reference lists of the included studies,
narrative reviews, and guidelines were hand-searched,
and diagnostic manufacturers were contacted.
Bibliographies of included studies, recent narrative reviews on RIDTs,
and guidelines on influenza were hand-searched for additional relevant
studies. Diagnostic manufacturers were also contacted to get additional
or unpublished studies.

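Methods ② Database search strategy
✓ How the databases were searched
Searched using the disease name (influenza), the virus names, and
the test names (generic and brand names)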
The search strategy contained search terms for the influenza disease or
virus combined with search terms for rapid diagnostic immunoassays,
including brand names for the most common commercial RIDTs

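Searching in diagnostic meta-analyses in general
✓ Search using the test of interest (index test) and the disease (target condition)
as keywords.
Index test and target condition will generally be the focus of the
search, although... (rest omitted).
[Diagram: comprehensive search on the disease / comprehensive search on the test]
Handbook for DTA Reviews, chapter 7 (Searching for studies)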

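Searching in diagnostic meta-analyses in general
✓ Combine thesaurus (controlled vocabulary) searching with keyword searching.
✓ Unlike in RCT searches, "study design filters" are not well established
for diagnostic studies, so do not use them in the search.
Routine use of methodology search filters to identify diagnostic
test accuracy studies should be avoided as... (rest omitted)
Handbook for DTA Reviews, chapter 7 (Searching for studies)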

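RCT search vs. DTA search
[Diagram]
RCT: comprehensive search on study design + comprehensive search on the disease + comprehensive search on the intervention
DTA: comprehensive search on the disease + comprehensive search on the test
Handbook for DTA Reviews, chapter 7 (Searching for studies)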

Search terms used in this study
Search terms for influenza included:
“Influenza, Human” [MeSh] OR
“Influenza A virus” [MeSh] OR
“Influenza B virus” [MeSh] OR
“influenza” OR
“flu” OR
“grippe.”
Search terms for the tests included:
“rapid test*” OR
“rapid diagnos*” OR
“rapid diagnostic test*” OR
“point-of-care test*” OR
“antigen detection test*” OR
“antigen detection” OR
“rapid antigen test*” OR
“immunoassay*” OR
“immunochromatographic test*” OR
“Binax NOW” OR
“Directigen Flu” OR
“Flu OIA” OR
“QuickVue Influenza” OR
“Rapid Detection Flu” OR
“SAS Influenza” OR
“TRU FLU” OR
“XPECT flu” OR
“Zstat flu.”
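
The two term lists above can be assembled into a single Boolean query string. The following is a minimal Python sketch, not the authors' actual search script: the helper names (`or_block`, `build_query`) are hypothetical, and joining the two blocks with AND is an assumption based on the paper's wording that the influenza terms were "combined with" the test terms.

```python
# Terms copied from the search strategy quoted above.
# Helper names are illustrative only; AND between the blocks is an assumption.
INFLUENZA_TERMS = [
    '"Influenza, Human"[MeSH]',
    '"Influenza A virus"[MeSH]',
    '"Influenza B virus"[MeSH]',
    '"influenza"',
    '"flu"',
    '"grippe"',
]

TEST_TERMS = [
    '"rapid test*"', '"rapid diagnos*"', '"rapid diagnostic test*"',
    '"point-of-care test*"', '"antigen detection test*"',
    '"antigen detection"', '"rapid antigen test*"', '"immunoassay*"',
    '"immunochromatographic test*"', '"Binax NOW"', '"Directigen Flu"',
    '"Flu OIA"', '"QuickVue Influenza"', '"Rapid Detection Flu"',
    '"SAS Influenza"', '"TRU FLU"', '"XPECT flu"', '"Zstat flu"',
]


def or_block(terms):
    """Join terms with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(disease_terms, test_terms):
    """Combine the disease block and the test block with AND."""
    return or_block(disease_terms) + " AND " + or_block(test_terms)


query = build_query(INFLUENZA_TERMS, TEST_TERMS)
print(query)
```

Keeping the terms in lists makes it easy to rerun the search with one block changed, for example to see how many records a single brand name adds.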

ü 試しにやってみた。（PubMed, 2014/10/18）
#1 “Influenza, Human” [MeSh] Results = 35,835
#2 “Influenza A virus” [MeSh] Results = 31,992
#3 “Influenza B virus” [MeSh] Results = 3,002
#4 “influenza” Results = 79,893
#5 “flu” Results = 9,118
#6 “grippe.” Results = 278
#7 or/1-7 Results = 84,235
検査の検索には“ * ”マークを使用。ただ病名（influenza）で
例えば“Influenza*”で検索するとHibが検索にひっかかって検索が膨大になり、
検索精度が落ちるため行っていないよう。
シソーラス検索で病名（#1）および原因ウイルス（#2, #3）でカバー。
キーワード検索（#4）も併用。
シソーラス検索
キーワード検索

ü 試しにやってみた。（PubMed, 2014/10/19）
#1 antigen detection test* Results = 418
#2 antigen detection Results = 65,265
#3 rapid antigen test* Results = 243
#4 immunoassay Results = 1,244,040
#5 immunochromatographic test* Results = 498
#6 or/1-6 Results = 1,275,544
#7 “Immunochromatography”[mesh] Results = 369
#8 “immunoassay”[mesh] Results = 428,462
#9 7 or 8 Results = 428,462
#10 6 or 9 Results = 1,275,544
検査の検索はキーワード検索のみで検索している。
シソーラス検索の候補として“Immunoassay” [mesh] などもあるが
キーワード検索の“Immunoassay” が膨大にキーワードをカバーしているので、
上記の検索候補をいれても検索結果は増えない。
論文中の
検索
試してみた
検索

Methods ③ Eligibility criteria
✓ Criteria based on the definition of the final diagnosis
Included studies in which the diagnosis was made by viral culture or RT-PCR
Studies were included if they assessed the accuracy of an RIDT
against accepted reference standards.
Acceptable reference standards included viral culture or RT-PCR.
If both were available, data on RT-PCR were chosen because of
the test’s superior sensitivity and specificity.
Excluded studies in which the diagnosis was made by any other method
Studies were excluded if they compared RIDTs with immunofluorescence
or enzyme-linked immunosorbent assay.

ü 研究テーマ故の設定
商業ベースで販売されてるもの、
気道内の分泌物を検査するものを対象
RIDTs were defined as any commercially available assay identifying
influenza viral antigens in respiratory specimens.
商業販売されてないもの、自宅検査用のものは除外
In-house tests and pre-commercial versions were excluded.

ü 研究デザインからの設定
症例対照研究は検査精度を過剰評価するため除外。
学会抄録も除外
We also excluded conference abstracts and case‒ control studies
(testing with the RIDT of known positive or negative samples),
which, by creating spectrum bias, can overestimate the accuracy of a test.

ü 研究デザインからの設定
症例対照研究は検査精度を過剰評価するため除外。
学会抄録も除外
We also excluded conference abstracts and case‒ control studies
(testing with the RIDT of known positive or negative samples),
which, by creating spectrum bias, can overestimate the accuracy of a test.
... So how much do case-control studies actually
overestimate test accuracy!?

Note) Types of study in diagnostic research
Study types in randomized controlled trials and in diagnostic research
Types of RCT
　Simple randomized (including stratified, block, etc.)
　Cluster-randomized
　Quasi-randomized
　Cross-over
Types of DTA study
　Single-gate: the so-called cohort type, cross-sectional study.
　Two-gate: the so-called case-control type.

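Note) Two study designs in diagnostic research
✓ Single-gate (≒ cohort type)
Uses a single set of inclusion criteria
e.g.) Diagnosing appendicitis with contrast-enhanced CT,
　enrolling "people who presented to the emergency department with abdominal pain"
✓ Two-gate (≒ case-control type)
Uses two sets of inclusion criteria, one for people with the disease and one for healthy people
e.g.) Diagnosing appendicitis with contrast-enhanced CT,
　enrolling "people diagnosed with appendicitis" and "healthy people"
Handbook for DTA Reviews, chapter 4 (Guide to the contents of DTA protocol)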

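Note) Study type in diagnostic research
✓ Two-gate designs tend to overestimate diagnostic accuracy, so take care.
✓ How much does diagnostic accuracy differ with the characteristics of the studies collected?
　Study characteristic: ratio of DORs (95% CI)
* DOR = LR+/LR-; the larger the number, the better the diagnostic accuracy.
Bias in studies of diagnostic tests. JAMA 1999;282:1061-1066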

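Reference (DOR and sensitivity/specificity)
Diagnostic odds ratio (DOR)
$$\mathrm{DOR} = \frac{\mathrm{LR}^{+}}{\mathrm{LR}^{-}} = \frac{\text{sensitivity}}{1 - \text{specificity}} \div \frac{1 - \text{sensitivity}}{\text{specificity}}$$
LR+ is the positive likelihood ratio and LR- is the negative likelihood ratio.
[Figure: curves of constant DOR (DOR = 1, 2, 5, 16, 81, 361) in ROC space; x-axis: 1 − Specificity, y-axis: Sensitivity]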
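
The DOR is the positive likelihood ratio divided by the negative likelihood ratio, so it can be computed directly from a sensitivity/specificity pair. A minimal Python sketch follows; the function names are illustrative, and the values 62.3% and 98.2% are the pooled estimates reported by Chartrand et al., used here only as a worked example.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a sensitivity/specificity pair given as fractions."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def diagnostic_odds_ratio(sensitivity, specificity):
    """DOR = LR+ / LR-."""
    lr_pos, lr_neg = likelihood_ratios(sensitivity, specificity)
    return lr_pos / lr_neg


# A test no better than chance has DOR = 1.
dor_chance = diagnostic_odds_ratio(0.5, 0.5)

# Sensitivity = specificity = 0.9 lies on the DOR = 81 curve.
dor_90_90 = diagnostic_odds_ratio(0.9, 0.9)

# Pooled RIDT estimates from the paper (62.3% and 98.2%).
dor_pooled = diagnostic_odds_ratio(0.623, 0.982)

print(f"chance: {dor_chance:.1f}, 90/90: {dor_90_90:.1f}, pooled RIDT: {dor_pooled:.1f}")
```

Because the DOR collapses sensitivity and specificity into one number, two tests with very different sensitivity/specificity trade-offs can share the same DOR; it is useful for comparing subgroups, less so for clinical interpretation.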

Methods ④ Data extraction
✓ Two reviewers extracted the 2×2 tables
One reviewer extracted the 2×2 tables from all of the studies.
A data extraction form was piloted on a subset of included articles by
2 reviewers before being finalized.
One reviewer extracted data from all of the articles.
A second reviewer extracted the 2×2 tables from 20% of the studies.
A second reviewer extracted data from a randomly chosen sample of
22 articles (approximately 20% of all included articles).
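
Each extracted 2×2 table (index test result against reference standard) yields one sensitivity and one specificity per study. The sketch below shows that calculation in Python; the class name and the counts are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from one study: index test result vs. reference standard."""
    tp: int  # test positive, reference positive
    fp: int  # test positive, reference negative
    fn: int  # test negative, reference positive
    tn: int  # test negative, reference negative

    def sensitivity(self) -> float:
        # Proportion of reference-positive patients detected by the test.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of reference-negative patients correctly test-negative.
        return self.tn / (self.tn + self.fp)


# Hypothetical counts, for illustration only.
example = TwoByTwo(tp=62, fp=2, fn=38, tn=98)
print(f"sensitivity = {example.sensitivity():.3f}")
print(f"specificity = {example.specificity():.3f}")
```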

Methods ⑤ Quality assessment of the included studies
✓ Study quality was assessed with QUADAS
Methodological quality of the included studies was assessed by using
Quality Assessment of Diagnostic Accuracy Studies Criteria.

Table 9.1 Recommended quality items derived from QUADAS tool (Whiting 2003)
Handbook for DTA Reviews, chapter 9 “Assessing methodological quality”.
1. Was the spectrum of patients representative of the patients who will receive the test in practice?
(representative spectrum)
2. Is the reference standard likely to classify the target condition correctly? (acceptable reference
standard)
3. Is the time period between reference standard and index test short enough to be reasonably sure that
the target condition did not change between the two tests? (acceptable delay between tests)
4. Did the whole sample or a random selection of the sample, receive verification using the intended
reference standard? (partial verification avoided)
5. Did patients receive the same reference standard irrespective of the index test result? (differential
verification avoided)
6. Was the reference standard independent of the index test (i.e. the index test did not form part of the
reference standard)? (incorporation avoided)
7. Were the reference standard results interpreted without knowledge of the results of the index test?
(index test results blinded)
8. Were the index test results interpreted without knowledge of the results of the reference standard?
(reference standard results blinded)
9. Were the same clinical data available when test results were interpreted as would be available when
the test is used in practice? (relevant clinical information)
10. Were uninterpretable/ intermediate test results reported? (uninterpretable results reported)
11. Were withdrawals from the study explained? (withdrawals explained)

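Quality assessment
Table 9.1 Recommended quality items derived from QUADAS tool (Whiting 2003)
* Apologies if anything is mistranslated.
Patients
　Were the subjects people who should receive the test?
　Were cases selected randomly or consecutively?
　How were dropouts handled?
Index test
　Was the test interpreted without knowledge of the final diagnosis?
Reference standard
　Was the diagnosis made with an appropriate method (gold standard)?
　Was the final diagnosis made without knowledge of the index test result?
　Were the index test and the final diagnosis performed independently of each other?
　Were the index test and the final diagnosis performed at the same time?
　How were cases with hard-to-interpret test results handled?
External validity
　Can the study results be extrapolated to other patients?
Handbook for DTA Reviews, chapter 9 “Assessing methodological quality”.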

Methods ⑥ Data synthesis
✓ Estimating sensitivity and specificity
Data were pooled using a bivariate random-effects regression model.
The sensitivity and specificity estimates were pooled by using bivariate
random-effects regression models. The bivariate model takes into
consideration the potential tradeoff between sensitivity and specificity
by explicitly incorporating this negative correlation in the analysis.
✓ Drawing the ROC curve
The HSROC curve was drawn using the same model.
The model was also used to draw hierarchical summary
receiver-operating characteristic (HSROC) curves
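Methods ⑦ Subgroup analyses and heterogeneity assessment
✓ Assessing heterogeneity
Assessed using random-effects models.
We expected substantial heterogeneity in test accuracy and used
random-effects models that also allow for the addition of covariates
to account for that heterogeneity.
Heterogeneity was assessed for the following variables.
The following variables were selected a priori as potential sources of
heterogeneity
　・Population (children vs. adults)
　・Virus type (type A vs. type B)
　・Brand
　・Specimen collection site
　・Duration of symptoms before testing
　・Reference standard (culture vs. RT-PCR)
　・Study quality (blinding, definition of the study population, etc.)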

Results ① Search results
✓ Articles identified by the search
119 articles were included.
After the titles and abstracts were screened, 286 articles were eligible
for full-text review. Of these, 119 were included. Because some articles
evaluated more than 1 RIDT, the final analysis included 159 studies.
✓ Characteristics of the included studies
Adults only 14%, children only 34%, both 52%
Most studies (52%) included both adults and children, although 34% and
14% included only children and adults, respectively. Only 33% of the
studies defined the basis on which patients or specimens were recruited,
and even fewer (13%) gave any information on duration of patients’
clinical symptoms before testing.
The included studies evaluated 26 commercial RIDTs.

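Results ② Quality of the included studies
✓ Items with low / high risk of bias among the included studies
Because of the inclusion criteria, the risk of verification bias was low
Because of our inclusion criteria, most studies were free of partial
verification, differential verification, and incorporation bias and used an
appropriate reference standard.
Risk of bias was high for the definition of the study population
Risk of bias was high for the independence of the index test and the diagnosis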
However, only 33% of the included studies gave a clear rationale for
patient or specimen inclusion (selection criteria), and only 41%
reported blinding of the evaluation of the result of the RIDTs (mostly
because they were evaluated at the point of care).

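Results ③ Overall results
✓ Pooled sensitivity and specificity
Across the included studies, sensitivity ranged from 4.4% to 100%
and specificity from 50.5% to 100%
Pooled estimates: sensitivity 62.3%, specificity 98.2%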
The pooled sensitivity from bivariate random-effects regression was
62.3% (95% CI, 57.9% to 66.6%) and the pooled specificity was
98.2% (CI, 97.5% to 98.7%).

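Results ③ Subgroup analyses
✓ Differences by population and by test kit
Sensitivity was higher in children (66.6%) than in adults (53.9%)
Rapid influenza diagnostic tests showed a higher pooled sensitivity in
children than in adults, whereas specificities in the 2 groups were
similar.
The difference between children and adults persisted
after adjustment for the other subgroup variables.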
The difference in pooled sensitivity between children and adults
remained statistically significant when adjusted for brand of RIDT,
specimen type, or reference standard (results not shown).

Sensitivity was higher for influenza A
than for influenza B.
Virus type also had an effect on the accuracy of RIDTs.
Rapid influenza diagnostic tests had increased sensitivity for detecting
influenza A (64.6% [CI, 59.0% to 70.1%]) compared with influenza B
(52.2% [CI, 45.0% to 59.3%]; P 0.050).
Even during the 2009 influenza A(H1N1) pandemic,
test sensitivity did not change substantially.
They did not perform markedly worse in studies during the recent
outbreak of pandemic influenza A(H1N1) 2009:

Accuracy did not differ by specimen collection site (nasal cavity, throat, etc.)
Neither the type of specimen collected from patients had a noticeable
effect on their accuracy.

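Results ③ Subgroup analyses and heterogeneity assessment
✓ Study quality and differences in diagnostic accuracy
Study quality made almost no difference to diagnostic accuracy.
the quality criteria investigated (patient selection, blinding, and
handling of uninterpretable results) did not have a statistically
significant effect on pooled accuracy estimates.
Sensitivity was higher outside the influenza season.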
Higher sensitivity for the few studies for which the timing (during or
outside the influenza season).

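Results ③ Subgroup analyses and heterogeneity assessment
✓ Study quality and differences in diagnostic accuracy
Industry-sponsored studies reported higher sensitivity.
However, because there were few such studies, their effect on the overall estimates was small.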
Industry-sponsored studies showed a higher sensitivity (73.3% [CI,
65.3% to 81.3%]) than studies not sponsored by industry (59.4% [CI,
54.6% to 64.2%]).
Although this difference was statistically significant, sensitivity analysis
revealed that the overall estimates did not change when sponsored
studies were removed from the analyses, which was probably due to
the small number of sponsored studies (n = 23).

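Subgroup results. No item showed a difference.
However, because some differences appear to exist, the DOR ratios are shown on the slide after next.
* DOR = LR+/LR-
* The DOR ratio is the ratio of two DORs.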

Addendum) Subgroup analyses and heterogeneity assessment
✓ Study quality and its effect on the DOR
Item (comparison) | Reference | DOR ratio
Spectrum of disease (Influenza season vs. non-season) | Season | 1.523
Patient selection (ILI defined vs. not defined) | ILI defined | 1.513
Blinding (Any blinding reported vs. not reported) | Any blinding reported | 1.554
Handling of indeterminate results (Reported vs. not reported) | Reported | 0.879
Industry sponsoring (Sponsored vs. not sponsored) | Not sponsored | 1.142
* ILI = influenza-like illness
* DOR (diagnostic odds ratio) = LR+/LR-

Discussion
Summary of main results
Strengths and weaknesses of the review
Applicability of findings to review question

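Summary of overall results
✓ After pooling, specificity is high.
False positives are unlikely, so the test is useful for confirming (ruling in) the diagnosis.
Overall, RIDTs have high specificity, with modest and highly variable
sensitivity. For the clinician, this means that a positive test result is
unlikely to be false positive.
✓ Sensitivity, on the other hand, is low, so the test is unsuitable for ruling out the diagnosis.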
However, a negative RIDT result has a reasonable likelihood of being
false negative and should be confirmed by other laboratory diagnostic
tests if the result is likely to affect patient management.
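
The rule-in / rule-out asymmetry can be made concrete by converting the pooled estimates into post-test probabilities. The Python sketch below does this with likelihood ratios; the pretest probability of 30% is a purely hypothetical value chosen for illustration (it is not from the paper), and the function name is illustrative.

```python
def post_test_probability(pretest_prob, likelihood_ratio):
    """Convert a pretest probability to a post-test probability via odds."""
    pretest_odds = pretest_prob / (1.0 - pretest_prob)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Pooled estimates reported in the paper.
SENSITIVITY = 0.623
SPECIFICITY = 0.982

lr_pos = SENSITIVITY / (1.0 - SPECIFICITY)   # positive likelihood ratio
lr_neg = (1.0 - SENSITIVITY) / SPECIFICITY   # negative likelihood ratio

# Hypothetical pretest probability during an influenza season (assumption).
PRETEST = 0.30

p_after_positive = post_test_probability(PRETEST, lr_pos)
p_after_negative = post_test_probability(PRETEST, lr_neg)

print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
print(f"P(influenza | positive RIDT) = {p_after_positive:.3f}")
print(f"P(influenza | negative RIDT) = {p_after_negative:.3f}")
```

Under this assumed pretest probability, a positive result moves the probability of influenza above 90%, while a negative result still leaves it well above 10%, which is why a negative RIDT cannot be taken as excluding influenza.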

Q. Which site should the specimen be taken from?
(Guidelines often recommend nasal specimens.)
A. Sensitivity and specificity do not differ notably by site.
No difference in accuracy was found among the respiratory
specimens, although these analyses were limited by the absence of
stratification by specimen type in most studies and the inconsistent
reporting of many other factors known to affect specimen quality,
such as the type of swab and the operator.
Specimen | Sensitivity (%, 95% CI) | Specificity (%, 95% CI)
Nasopharyngeal aspirate (N = 15) | 66.6 (56.2–77.0) | 97.8 (95.6–100)
Nasopharyngeal swab (N = 19) | 61.6 (52.0–71.3) | 99.1 (98.4–99.9)
Nasopharyngeal wash (N = 3) | 50.7 (25.1–76.3) | 98.1 (94.0–100)
Nasal swab (N = 10) | 65.9 (53.3–78.5) | 99.2 (98.2–100)
Throat swab (N = 4) | 54.9 (32.7–77.1) | 90.0 (74.7–100)

Reference: diagnostic accuracy by specimen site
[Figure: ROC-space plot; x-axis: 1 − Specificity, y-axis: Sensitivity; series: Nasopharyngeal aspirate, Nasal swab, Nasopharyngeal swab, Throat swab, Nasopharyngeal wash]

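Q. Is any particular brand/product notably more accurate?
A. Most studies made no head-to-head comparison, but in the pooled results
there is no major difference between test kits.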
Overall, no single commercial brand of RIDT seemed to perform
markedly better or worse than others, but this finding should be
interpreted cautiously because head-to-head comparisons were not
done in most studies.
Test kit | Sensitivity (%, 95% CI) | Specificity (%, 95% CI)
BinaxNOW (N = 17) | 57.0 (45.9–67.5) | 98.6 (96.9–99.3)
Directigen Flu A (N = 10) | 76.7 (63.8–86.0) | 97.2 (92.6–99.0)
Directigen Flu A + B (N = 30) | 57.2 (48.8–65.2) | 99.3 (98.8–99.6)
QuickVue Influenza (N = 16) | 69.0 (58.1–78.2) | 95.8 (91.3–98.0)
QuickVue Influenza A + B (N = 21) | 48.8 (39.0–58.8) | 98.4 (96.8–99.2)

Reference: diagnostic accuracy by test kit
[Figure: ROC-space plot; x-axis: 1 − Specificity, y-axis: Sensitivity; series: Directigen Flu A (10 studies), QuickVue Influenza (16 studies), Directigen Flu A + B (30 studies), BinaxNOW (17 studies), QuickVue Influenza A + B (21 studies)]

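Q. Does test accuracy depend on the timing of the test
(time since symptom onset)?
A. Few studies report results by timing, but time since onset
does affect accuracy. Sensitivity is lower early after onset.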
Gordon et al (PLoS One 2009;4:e7907)
Timing | Sensitivity (%, 95% CI) | Specificity (%, 95% CI)
Day 1 | 51.9 (40.3–63.3) | 98.4 (95.3–99.7)
Day 2 | 75.1 (68.3–81.1) | 97.9 (96.0–99.1)
Day 3 | 74.2 (62.0–84.2) | 97.9 (94.1–99.6)
Day 4 | 57.9 (33.5–79.7) | 98.6 (94.2–100)

Keitel et al (Eur J Pediatr 2011;170:511-7)
Timing | Sensitivity (%, 95% CI) | Specificity (%, 95% CI)
≦12hr | 35.0 (19.0–55.0) | 100 (88.0–100)
12–24hr | 66.0 (54.0–76.0) | 97.0 (86.0–100)
24–48hr | 92.0 (80.0–97.0) | 96.0 (82.0–99.0)
＞48hr | 59.0 (36.0–78.9) | 100 (90.0–100)

Reference: change in accuracy with test timing
[Figure: ROC-space plot; x-axis: 1 − Specificity, y-axis: Sensitivity; points for Gordon et al 2009 (Day 1, Day 2, Day 3, Day 4) and Keitel et al 2011 (≦12h, 12–24h, 24–48h, ＞48h)]

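Limitation
✓ Two reference standards, RT-PCR and viral culture,
were used. (Unavoidable, since only viral culture was available
in the past, but it is a source of bias.)
✓ Some factors that affect diagnostic accuracy could not be
assessed.
For example, few articles reported the timing of testing,
so the pooled estimates do not account for its effect.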
