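6. Background and objectives ③
✓ Limitations of previous studies of rapid influenza diagnostic tests (RIDTs)
Previous studies agree that specificity is 90% or higher,
but reported sensitivity varies widely, from 10% to 80%.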
Unfortunately, RIDTs may have inconsistent accuracy, with reported
sensitivity ranging from 10% to 80%
Previous reviews were limited to children, and
reports in adults evaluated only a single test kit.
Previous systematic reviews have been limited to pediatric studies
or have addressed only 1 commercial RIDT.
9. Methods ① Selection of databases
✓ Databases searched
PubMed, EMBASE, BIOSIS, Web of Science
We searched 4 electronic databases: PubMed (January 1950 to
December 2011), EMBASE (January 1980 to December 2011), BIOSIS
(January 1969 to March 2010), and Web of Science (January 1980 to
March 2010). The databases were searched in March 2010.
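✓ Searches beyond the databases
Reference lists of the studies included in the meta-analysis,
review articles, and guidelines were hand-searched,
and relevant researchers were contacted.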
Bibliographies of included studies, recent narrative reviews on RIDTs,
and guidelines on influenza were hand-searched for additional relevant
studies. Diagnostic manufacturers were also contacted to get additional
or unpublished studies.
10. Methods ② Database search strategy
✓ Database search strategy
Searches combined the disease name (influenza) and virus names
with test names (generic and brand names).
The search strategy contained search terms for the influenza disease or
virus combined with search terms for rapid diagnostic immunoassays,
including brand names for the most common commercial RIDTs
11. Searching in diagnostic meta-analyses in general
✓ Search using the test of interest (index test) and the disease (target condition)
as keywords.
Index test and target condition will generally be the focus of the
search, although... (remainder omitted).
[Diagram: comprehensive search on the disease; comprehensive search on the test]
Handbook for DTA Reviews, chapter 7 (Searching for studies)
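12. Searching in diagnostic meta-analyses in general
✓ Combine thesaurus (controlled-vocabulary) searching with keyword searching.
✓ Unlike searches for RCTs, "study design filters" are not yet well established,
so they should not be used in the search.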
Routine use of methodology search filters to identify diagnostic
test accuracy studies should be avoided as... (remainder omitted)
Handbook for DTA Reviews, chapter 7 (Searching for studies)
14. Search strategy used in this study
Search terms for influenza included:
“Influenza, Human” [MeSH] OR
“Influenza A virus” [MeSH] OR
“Influenza B virus” [MeSH] OR
“influenza” OR
“flu” OR
“grippe.”
Search terms for the tests included:
“rapid test*” OR
“rapid diagnos*” OR
“rapid diagnostic test*” OR
“point-of-care test*” OR
“antigen detection test*” OR
“antigen detection” OR
“rapid antigen test*” OR
“immunoassay*” OR
“immunochromatographic test*” OR
“Binax NOW” OR
“Directigen Flu” OR
“Flu OIA” OR
“QuickVue Influenza” OR
“Rapid Detection Flu” OR
“SAS Influenza” OR
“TRU FLU” OR
“XPECT flu” OR
“Zstat flu.”
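The two term lists above can be assembled into a single Boolean query string. The following is a minimal sketch, not the authors' actual search script: the helper names `or_group` and `build_query` are hypothetical, and joining the two groups with AND is an assumption based on the statement that the influenza terms were "combined with" the test terms. The resulting string has not been checked against any database's syntax rules.

```python
# Term lists copied from the slide. How they are joined is an assumption.
INFLUENZA_TERMS = [
    '"Influenza, Human"[MeSH]',
    '"Influenza A virus"[MeSH]',
    '"Influenza B virus"[MeSH]',
    '"influenza"',
    '"flu"',
    '"grippe"',
]

TEST_TERMS = [
    '"rapid test*"',
    '"rapid diagnos*"',
    '"rapid diagnostic test*"',
    '"point-of-care test*"',
    '"antigen detection test*"',
    '"antigen detection"',
    '"rapid antigen test*"',
    '"immunoassay*"',
    '"immunochromatographic test*"',
    '"Binax NOW"',
    '"Directigen Flu"',
    '"Flu OIA"',
    '"QuickVue Influenza"',
    '"Rapid Detection Flu"',
    '"SAS Influenza"',
    '"TRU FLU"',
    '"XPECT flu"',
    '"Zstat flu"',
]


def or_group(terms):
    """Join terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(disease_terms, test_terms):
    """Combine the disease group and the test group with AND (assumed)."""
    return or_group(disease_terms) + " AND " + or_group(test_terms)


query = build_query(INFLUENZA_TERMS, TEST_TERMS)
print(query)
```

Keeping the disease and test terms in separate lists mirrors the slide's two blocks, so either list can be extended (for example with a new brand name) without touching the other.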
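17. Methods ③ Study eligibility criteria
✓ Criteria based on the definition of the final diagnosis
Studies in which the diagnosis was made by viral culture or RT-PCR were included.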
Studies were included if they assessed the accuracy of an RIDT
against accepted reference standards.
Acceptable reference standards included viral culture or RT-PCR.
If both were available, data on RT-PCR were chosen because of
the test’s superior sensitivity and specificity.
Studies in which the diagnosis was made by any other method were excluded.
Studies were excluded if they compared RIDTs with immunofluorescence
or enzyme-linked immunosorbent assay.
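18. Methods ③ Study eligibility criteria
✓ Criteria arising from the research question
Commercially available tests that
analyze respiratory secretions were included.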
RIDTs were defined as any commercially available assay identifying
influenza viral antigens in respiratory specimens.
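Tests that are not commercially available (in-house, i.e. laboratory-developed, assays and pre-commercial versions) were excluded.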
In-house tests and pre-commercial versions were excluded.
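19. Methods ③ Study eligibility criteria
✓ Criteria based on study design
Case-control studies were excluded because they overestimate test accuracy.
Conference abstracts were also excluded.
We also excluded conference abstracts and case-control studies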
(testing with the RIDT of known positive or negative samples),
which, by creating spectrum bias, can overestimate the accuracy of a test.
20. Methods ③ Study eligibility criteria
✓ Criteria based on study design
... So how much do case-control studies actually
overestimate test accuracy?
26. Methods ④ Data extraction
✓ Two reviewers extracted the 2×2 tables
One reviewer extracted the 2×2 tables from all studies.
A data extraction form was piloted on a subset of included articles by
2 reviewers before being finalized.
One reviewer extracted data from all of the articles.
A second reviewer extracted the 2×2 tables from 20% of the studies.
A second reviewer extracted data from a randomly chosen sample of
22 articles (approximately 20% of all included articles).
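Each extracted 2×2 table yields one sensitivity and one specificity estimate. The following is a minimal sketch of that step, assuming a Wilson score interval for the per-study 95% CI (the slides do not say which interval the authors used). The class name `TwoByTwo` and the counts are hypothetical, chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts of index-test results against the reference standard."""
    tp: int  # test positive, reference positive
    fp: int  # test positive, reference negative
    fn: int  # test negative, reference positive
    tn: int  # test negative, reference negative

    def sensitivity(self):
        return self.tp / (self.tp + self.fn)

    def specificity(self):
        return self.tn / (self.tn + self.fp)


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (assumed choice of CI)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


# Hypothetical study: 100 reference-positive and 100 reference-negative patients.
table = TwoByTwo(tp=62, fp=2, fn=38, tn=98)
sens_lo, sens_hi = wilson_interval(table.tp, table.tp + table.fn)
spec_lo, spec_hi = wilson_interval(table.tn, table.tn + table.fp)
print(f"sensitivity {table.sensitivity():.3f} ({sens_lo:.3f} to {sens_hi:.3f})")
print(f"specificity {table.specificity():.3f} ({spec_lo:.3f} to {spec_hi:.3f})")
```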
27. Methods ⑤ Quality assessment of included studies
✓ Study quality was assessed with QUADAS
Methodological quality of the included studies was assessed by using
Quality Assessment of Diagnostic Accuracy Studies Criteria.
28. Table 9.1 Recommended quality items derived from QUADAS tool (Whiting 2003)
Handbook for DTA Reviews, chapter 9 “Assessing methodological quality”.
1. Was the spectrum of patients representative of the patients who will receive the test in practice?
(representative spectrum)
2. Is the reference standard likely to classify the target condition correctly? (acceptable reference
standard)
3. Is the time period between reference standard and index test short enough to be reasonably sure that
the target condition did not change between the two tests? (acceptable delay between tests)
4. Did the whole sample or a random selection of the sample, receive verification using the intended
reference standard? (partial verification avoided)
5. Did patients receive the same reference standard irrespective of the index test result? (differential
verification avoided)
6. Was the reference standard independent of the index test (i.e. the index test did not form part of the
reference standard)? (incorporation avoided)
7. Were the reference standard results interpreted without knowledge of the results of the index test?
(index test results blinded)
8. Were the index test results interpreted without knowledge of the results of the reference standard?
(reference standard results blinded)
9. Were the same clinical data available when test results were interpreted as would be available when
the test is used in practice? (relevant clinical information)
10. Were uninterpretable/ intermediate test results reported? (uninterpretable results reported)
11. Were withdrawals from the study explained? (withdrawals explained)
31. Methods ⑥ Data synthesis
✓ Calculating sensitivity and specificity
Data were pooled using a
bivariate random-effects regression model.
The sensitivity and specificity estimates were pooled by using bivariate
random-effects regression models. The bivariate model takes into
consideration the potential tradeoff between sensitivity and specificity
by explicitly incorporating this negative correlation in the analysis.
✓ Constructing the ROC curve
The HSROC curve was drawn using the model above.
The model was also used to draw hierarchical summary
receiver-operating characteristic (HSROC) curves
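The bivariate model described on this slide needs a mixed-model fit that does not fit in a short example. As a simplified stand-in, the sketch below pools sensitivities alone on the logit scale with a univariate DerSimonian-Laird random-effects estimator. This is not the authors' bivariate model: it ignores the correlation between sensitivity and specificity, and the study counts are invented for illustration. The function name `pool_logit_random_effects` is hypothetical.

```python
import math


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def pool_logit_random_effects(events, totals):
    """Univariate DerSimonian-Laird pooling of proportions on the logit scale.

    Returns (pooled, ci_low, ci_high, tau2). A 0.5 continuity correction is
    added to every cell so that studies with 0% or 100% can be included.
    """
    ys, vs = [], []
    for e, n in zip(events, totals):
        pos, neg = e + 0.5, (n - e) + 0.5
        ys.append(math.log(pos / neg))      # logit of the corrected proportion
        vs.append(1.0 / pos + 1.0 / neg)    # approximate within-study variance

    w = [1.0 / v for v in vs]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))
    c = sw - sum(wi**2 for wi in w) / sw
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (len(ys) - 1)) / c) if c > 0 else 0.0

    w_re = [1.0 / (v + tau2) for v in vs]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return inv_logit(y_re), inv_logit(y_re - 1.96 * se), inv_logit(y_re + 1.96 * se), tau2


# Invented data: true positives and reference-positive totals in four studies.
true_positives = [45, 30, 80, 12]
reference_positives = [60, 70, 100, 40]
pooled, low, high, tau2 = pool_logit_random_effects(true_positives, reference_positives)
print(f"pooled sensitivity {pooled:.3f} ({low:.3f} to {high:.3f}), tau^2 = {tau2:.3f}")
```

Pooling on the logit scale keeps the back-transformed estimate and its interval inside 0 to 1, which is also the scale on which the bivariate model works.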
32. Methods ⑥ Subgroup analyses and assessment of heterogeneity
✓ Assessment of heterogeneity
Heterogeneity was assessed using random-effects models.
We expected substantial heterogeneity in test accuracy and used
random-effects models that also allow for the addition of covariates
to account for that heterogeneity.
Heterogeneity was assessed for the following items.
The following variables were selected a priori as potential sources of
heterogeneity
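・ Population (children vs. adults)
・ Virus type (A vs. B)
・ Brand (product name)
・ Specimen collection site
・ Duration of symptoms before testing
・ Method of final diagnosis (culture vs. RT-PCR)
・ Study quality (blinding, definition of the study population, etc.)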
35. Results ① Search results
✓ Articles identified by the search
119 articles met the criteria.
After the titles and abstracts were screened, 286 articles were eligible
for full-text review. Of these, 119 were included. Because some articles
evaluated more than 1 RIDT, the final analysis included 159 studies.
✓ Characteristics of the included studies
Adults only 14%, children only 34%, both 52%
Most studies (52%) included both adults and children, although 34% and
14% included only children and adults, respectively. Only 33% of the
studies defined the basis on which patients or specimens were recruited,
and even fewer (13%) gave any information on duration of patients’
clinical symptoms before testing.
The included studies evaluated 26 commercial RIDTs.
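40. Results ② Quality assessment of included studies
✓ Items at high or low risk of bias among the included studies
Because of the inclusion criteria, the risk of verification bias was low.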
Because of our inclusion criteria, most studies were free of partial
verification, differential verification, and incorporation bias and used an
appropriate reference standard.
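The risk of bias was high for the definition of the study population.
The risk of bias was high for the independence of test and diagnosis (blinding).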
However, only 33% of the included studies gave a clear rationale for
patient or specimen inclusion (selection criteria), and only 41%
reported blinding of the evaluation of the result of the RIDTs (mostly
because they were evaluated at the point of care).
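43. Results ③ Overall results
✓ Pooled sensitivity and specificity
Across the included studies, sensitivity ranged from 4.4% to 100%
and specificity ranged from 50.5% to 100%.
Pooled sensitivity was 62.3% and pooled specificity was 98.2%.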
The pooled sensitivity from bivariate random-effects regression was
62.3% (95% CI, 57.9% to 66.6%) and the pooled specificity was
98.2% (CI, 97.5% to 98.7%).
45. Results ③ Subgroup analyses
✓ Differences by population and by test kit
Sensitivity was higher in children (66.6%) than in adults (53.9%).
Rapid influenza diagnostic tests showed a higher pooled sensitivity in
children than in adults, whereas specificities in the 2 groups were
similar.
The difference between children and adults persisted after
adjustment for the other subgroup variables.
The difference in pooled sensitivity between children and adults
remained statistically significant when adjusted for brand of RIDT,
specimen type, or reference standard (results not shown).
46. Results ③ Subgroup analyses
✓ Differences by population and by test kit
Sensitivity was higher for influenza A
than for influenza B.
Virus type also had an effect on the accuracy of RIDTs.
Rapid influenza diagnostic tests had increased sensitivity for detecting
influenza A (64.6% [CI, 59.0% to 70.1%]) compared with influenza B
(52.2% [CI, 45.0% to 59.3%]; P 0.050).
Even during the 2009 influenza A(H1N1) pandemic,
test sensitivity did not change substantially.
They did not perform markedly worse in studies during the recent
outbreak of pandemic influenza A(H1N1) 2009:
47. Results ③ Subgroup analyses
✓ Differences by population and by test kit
Accuracy did not differ by specimen collection site (nasal, throat, etc.).
The type of specimen collected from patients did not have a noticeable
effect on their accuracy.
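49. Results ③ Subgroup analyses and assessment of heterogeneity
✓ Study quality and differences in diagnostic accuracy
Diagnostic accuracy hardly differed by study quality.
The quality criteria investigated (patient selection, blinding, and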
handling of uninterpretable results) did not have a statistically
significant effect on pooled accuracy estimates.
Sensitivity was higher outside the influenza season.
Higher sensitivity for the few studies for which the timing (during or
outside the influenza season).
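50. Results ③ Subgroup analyses and assessment of heterogeneity
✓ Study quality and differences in diagnostic accuracy
Sponsored studies reported higher sensitivity.
However, because there were few such studies, the effect on the overall result was small.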
Industry-sponsored studies showed a higher sensitivity (73.3% [CI,
65.3% to 81.3%]) than studies not sponsored by industry (59.4% [CI,
54.6% to 64.2%]).
Although this difference was statistically significant, sensitivity analysis
revealed that the overall estimates did not change when sponsored
studies were removed from the analyses, which was probably due to
the small number of sponsored studies (n = 23).
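52. Reference (DOR, sensitivity, and specificity)
Diagnostic odds ratio (DOR) = positive likelihood ratio ÷ negative likelihood ratio
$$\mathrm{DOR} = \frac{\mathrm{LR}^{+}}{\mathrm{LR}^{-}} = \frac{\text{sensitivity} / (1 - \text{specificity})}{(1 - \text{sensitivity}) / \text{specificity}}$$
[Figure: curves of constant DOR (DOR = 1, 2, 5, 16, 81, 361); x-axis: 1 − Specificity, y-axis: Sensitivity, both from 0.0 to 1.0]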
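The DOR definition on this slide (positive likelihood ratio divided by negative likelihood ratio) can be checked numerically. The following is a minimal sketch with hypothetical function names. It plugs in the pooled sensitivity (62.3%) and specificity (98.2%) reported earlier in the deck. Note that a DOR computed from pooled summary values is only an illustration and is not necessarily the pooled DOR a meta-analysis would report.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with the given accuracy."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def diagnostic_odds_ratio(sensitivity, specificity):
    """DOR = LR+ / LR-."""
    lr_pos, lr_neg = likelihood_ratios(sensitivity, specificity)
    return lr_pos / lr_neg


# Pooled estimates quoted in the slides.
lr_pos, lr_neg = likelihood_ratios(0.623, 0.982)
dor = diagnostic_odds_ratio(0.623, 0.982)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}, DOR = {dor:.1f}")
```

With sensitivity and specificity both at 0.9 the same function gives a DOR of 81, and with both at 0.5 it gives 1, which matches two of the curve labels in the figure.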
53. Addendum: Subgroup analyses and assessment of heterogeneity
✓ Study quality and its effect on the DOR
Item (comparison) | Reference | DOR ratio
Spectrum of disease (Influenza season vs. non-season) | Season | 1.523
Patient selection (ILI defined vs. not defined) | ILI defined | 1.513
Blinding (Any blinding reported vs. not reported) | Any blinding reported | 1.554
Handling of indeterminate results (Reported vs. not reported) | Reported | 0.879
Industry sponsoring (Sponsored vs. not sponsored) | Not sponsored | 1.142
* ILI = Influenza-like illness
* DOR (Diagnostic odds ratio) = LR+/LR−
54. Discussion
Summary of main results
Strengths and weaknesses of the review
Applicability of findings to review question
55. Summary of overall results
✓ The pooled specificity is high.
False positives are unlikely, so the test is useful for confirming the diagnosis.
Overall, RIDTs have high specificity, with modest and highly variable
sensitivity. For the clinician, this means that a positive test result is
unlikely to be false positive.
✓ Sensitivity, on the other hand, is low, so the test is unsuitable for ruling out the diagnosis.
However, a negative RIDT result has a reasonable likelihood of being
false negative and should be confirmed by other laboratory diagnostic
tests if the result is likely to affect patient management.
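The rule-in versus rule-out asymmetry can be made concrete with predictive values. The following is a minimal sketch using Bayes' theorem with the pooled sensitivity (62.3%) and specificity (98.2%) from the slides. The prevalence values are assumptions chosen for illustration and do not come from the paper, and the function name `predictive_values` is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a given pre-test probability (prevalence)."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


SENSITIVITY = 0.623  # pooled estimate from the slides
SPECIFICITY = 0.982  # pooled estimate from the slides

# Assumed pre-test probabilities, for illustration only.
for prevalence in (0.05, 0.30, 0.60):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

At an assumed prevalence of 30%, this gives a PPV of about 94% and an NPV of about 86%, so roughly one in seven negative results would be a false negative under that assumption.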
56. Q. Which site should be sampled?
(Guidelines often recommend nasal specimens.)
A. Sensitivity and specificity do not differ notably by site.
No difference in accuracy was found among the respiratory
specimens, although these analyses were limited by the absence of
stratification by specimen type in most studies and the inconsistent
reporting of many other factors known to affect specimen quality,
such as the type of swab and the operator.
Specimen | Sensitivity (%, 95% CI) | Specificity (%, 95% CI)
Nasopharyngeal aspirate (N = 15) | 66.6 (56.2–77.0) | 97.8 (95.6–100)
Nasopharyngeal swab (N = 19) | 61.6 (52.0–71.3) | 99.1 (98.4–99.9)
Nasopharyngeal wash (N = 3) | 50.7 (25.1–76.3) | 98.1 (94.0–100)
Nasal swab (N = 10) | 65.9 (53.3–78.5) | 99.2 (98.2–100)
Throat swab (N = 4) | 54.9 (32.7–77.1) | 90.0 (74.7–100)
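58. Q. Is there a brand or product with notably better accuracy?
A. Head-to-head comparisons were lacking in most studies, but in the pooled results
the test kits did not differ much from one another.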
Overall, no single commercial brand of RIDT seemed to perform
markedly better or worse than others, but this finding should be
interpreted cautiously because head-to-head comparisons were not
done in most studies.
Test kit | Sensitivity (%, 95% CI) | Specificity (%, 95% CI)
BinaxNOW (N = 17) | 57.0 (45.9–67.5) | 98.6 (96.9–99.3)
Directigen Flu A (N = 10) | 76.7 (63.8–86.0) | 97.2 (92.6–99.0)
Directigen Flu A + B (N = 30) | 57.2 (48.8–65.2) | 99.3 (98.8–99.6)
QuickVue Influenza (N = 16) | 69.0 (58.1–78.2) | 95.8 (91.3–98.0)
QuickVue Influenza A + B (N = 21) | 48.8 (39.0–58.8) | 98.4 (96.8–99.2)