
Program Evaluation Seminar I


  1. Program Evaluation Seminar I. The Urban Institute. Directed by Raymond J. Struyk. Presented by Makiko Ueno.
  2. Program Evaluation: An Overview
  3. Topics
     - Introduction: purpose, history, and asking the right questions
     - Process (implementation) evaluation
     - Impact evaluation
     - Benefit-cost analysis
     - Conclusion
  4. 1. Introduction: Purpose, History, and the Right Questions >> Section 1 <<
  5. Definition of program evaluation: the systematic collection and synthesis of information on a policy program's activities and content and on the results it achieves. This information is used to make judgments about the program, to improve its effectiveness, and/or to inform decisions about future program development.
  6. What are the evaluation questions? What should an evaluation ask?
     - Is the program (the policy intervention) reaching its intended target population?
     - Is the program well implemented? Are the intended services being delivered?
     - Is the program effective in achieving its intended goals (benefits)?
     - How much does the program cost?
     - Are the program's costs reasonable relative to its effectiveness and benefits?
  7. Why is program evaluation needed?
     - To provide feedback and improve the program
     - To strengthen accountability
     - To inform decisions on increasing or decreasing program funding
     - To build a sense of ownership
     - Evaluation results can have an impact on policy
  8. A brief history of evaluation
     - In the United States, the first policy evaluations concerned education and public health and were conducted before World War I
     - After World War II, evaluations of housing, job training, international assistance, and other new programs flourished
     - The rise of social experiments and pilot programs
  9. A brief history of evaluation (continued)
     - In the early 1970s, evaluation became an independent academic field
     - From the mid-1960s, offices of policy development, research, and evaluation were established within federal departments
       - Think tanks on the supply side
       - Congressional staff and the departments as clients
  10. Asking the right questions
     - The wrong way to start: the evaluator presents the client with the purpose of the evaluation and the analytic methods
     - The right way to proceed: the stakeholders in the program generate the questions they want answered about it, through a process that allows for steps and time
  11. Who are the "stakeholders" in an evaluation?
     - The primary intended users of an evaluation are the people who can change the program
       - Are the funders the evaluation's users?
       - Program administrators, both senior managers and front-line staff
       - Clients and beneficiaries, especially when program participation is low
  13. 2. Process or Implementation Evaluation >> Section 2 <<
  14. Implementation/process evaluation: the systematic documentation of a program's performance, indicating whether the program is functioning as intended or meets an appropriate standard. Such evaluations generally address (1) the service-delivery aspects of the program and (2) the organizational and operational aspects of its implementation.
  15. Evaluation is judgmental
     - Possible bases for standards and judgments:
       - The program's own rules, e.g., the error rate in eligibility determination
       - The experience of similar programs, e.g., participation rates, loan repayment rates
  16. 1) Evaluation questions about service delivery
     - How many people are receiving services (number of beneficiaries)?
     - Are services reaching the intended target population, or is there bias?
     - Do members of the target population know about the program?
  17. Russia's housing allowance program: beneficiary coverage and the target population (an example of evaluating service delivery)
  18. Evaluation questions
     - How many households actually participated in the program?
     - Are services concentrated on poor households?
     - Are participants satisfied with the agency's administration and responsiveness?
  19. Background
     - 80% of urban housing belonged to the state (ownership was later transferred to local governments)
     - 1993: strict rent control, enormous subsidies, and strained local government budgets
     - 1994: rent increases permitted, but only where a housing allowance program was in place
  20. Program design
     S = allowance amount; MSR = maximum social rent; Y = household income; t = share of income the household must spend on housing before receiving the subsidy
     - Benefit: S = MSR - tY
     - Eligibility: (a) Y < Y*, where Y* = MSR / t; (b) municipal or cooperative housing (other tenure types in some localities)
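The gap formula on this slide can be sketched in code. This is a minimal illustration; the MSR, contribution rate, and incomes below are invented placeholders, not figures from the evaluation:

```python
def housing_allowance(msr: float, t: float, income: float) -> float:
    """Subsidy under the gap formula S = MSR - t*Y.

    msr    -- maximum social rent for a standard unit
    t      -- share of income the household must pay toward housing
    income -- household income (Y)
    A household with income at or above Y* = MSR / t gets no subsidy.
    """
    return max(msr - t * income, 0.0)

# Invented numbers: MSR = 120, contribution rate t = 0.25, so Y* = 480
assert housing_allowance(120, 0.25, 400) == 20.0  # 120 - 0.25*400
assert housing_allowance(120, 0.25, 480) == 0.0   # at the cutoff Y*
assert housing_allowance(120, 0.25, 600) == 0.0   # above the cutoff
```

Raising MSR or lowering t raises the cutoff Y* and widens eligibility, which is the trade-off the later participation figures turn on.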
  21. Data collection
     - Two cities where the program started early: Vladimir and Gorodetz
     - Population samples of 300 and 500 households
     - An additional sample of program participants
     - The evaluation was repeated in 1995 because of the 1994 findings
  22. Housing allowance evaluation results: participation rates

                                            Vladimir         Gorodetz
      Year                                  1994    1995     1994      1995
      "t"                                   .10     .15      .025-.10  .15
      Eligible households (%)               46      45       55        19
      Participation among eligible (%)      2.1     20.4     3.6       37.8

  23. Distribution of participants by income quintile

      Income quintile        Vladimir    Gorodetz
      1 (lowest income)      63          58
      2                      23          37
      3                      11          4
      4                      2           -
      5 (highest income)     1           1

  24. Client satisfaction: rating of the staff member contacted most often

      Rating                    Vladimir    Gorodetz
      Fully satisfied           62          62
      Satisfied                 23          27
      More or less satisfied    7           11
      Not satisfied             4           1
      Not satisfied at all      2           -
      No answer                 2           -
  25. 2) Questions about organization and operations
     - Are the necessary program functions being carried out properly?
     - Is staffing sufficient, in numbers and skills, to perform those functions?
     - Is the program coordinating efficiently with the other agencies involved?
     - Are the program's funds and resources used effectively and efficiently?
  26. (An example of evaluating program operations) Changes in participation in the U.S. Food Stamp program after the implementation of welfare reform
  27. The new welfare system: part 1
     - The old welfare program (AFDC) provided cash assistance to poor single-mother households and was administered together with Food Stamps
     - The new program (TANF) provides time-limited assistance while emphasizing moving welfare recipients into work
  28. The new welfare system: part 2
     - After implementation:
       - The number of Food Stamp users fell
       - Questions arose about access to benefits and ease of receiving services
     - Purpose of the evaluation:
       - To determine whether changes in program administration led to the decline in Food Stamp use
  29. Evaluation: fieldwork
     - Detailed analysis of the operational aspects of the two programs
     - 24 offices: 8 states, 3 welfare offices in each
     - Teams examined client service:
       - Ease of access to the program
       - Quality of service
       - Clients' access to employment-related support services
  30. Evaluation: fieldwork (continued)
     - Interviews with state agency staff
     - Interviews with local government staff:
       - Directors and caseworker supervisors
       - Caseworkers and program managers
     - Focus groups (group discussions) with service providers, businesses, and client advocates (NPOs, NGOs)
  31. Evaluation findings: the new administrative arrangements discourage Food Stamp applications
     - Multiple interviews with employment staff are required before a Food Stamp application
     - Visiting the employment office is a precondition for applying for Food Stamps
     - Stronger sanctions against Food Stamp users who break the rules of the new welfare system
  33. 3. Impact Evaluation (1) >> Section 3-1 <<
  34. 1) Impact evaluation
     Net outcome of an intervention program = gross outcome - effects of other factors (confounding factors)
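In practice the subtraction on this slide is estimated by comparing a treatment group with a control group, whose outcomes stand in for what would have happened anyway. A toy sketch with invented outcome data:

```python
def net_impact(treatment_outcomes, control_outcomes):
    """Estimate net program impact as the difference in mean outcomes
    between the treatment group and the control group."""
    t_mean = sum(treatment_outcomes) / len(treatment_outcomes)
    c_mean = sum(control_outcomes) / len(control_outcomes)
    return t_mean - c_mean

# Invented annual earnings for participants and a comparable control group
treated = [9500, 10200, 11050, 9800]
control = [8700, 9100, 9900, 8500]
print(net_impact(treated, control))  # 1087.5
```

This simple difference is only credible when the control group really is comparable, which is exactly what the confounding factors on the next slide can undermine.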
  35. Confounding factors
     - Selection bias
       - Differences between the control group and the treatment group
     - Secular change
       - Long-term trends, e.g., birth rates, homeownership rates, household incomes
     - Interfering events
       - E.g., a natural disaster degrading project quality
     - The Hawthorne effect
  36. Evaluation designs
     - Random assignment with controls
       - Subjects are divided into two groups: an experimental group that receives the program intervention and a control group that does not
     - Quasi-experimental methods
       - 1. Matched controls: the experimental and control groups are constructed to be equivalent
  37. Evaluation designs (continued)
     - Quasi-experimental methods (continued)
       - 2. Statistically equated controls: participants and non-participants are compared using statistical techniques
       - 3. Reflexive (retrospective) controls: the beneficiary group is compared with itself before and after receiving the program
  38. Random assignment, a case study: evaluation of the Job Corps youth job-training program
     - Youth job training: center-based
     - Question:
       - Does the program increase participants' employment and earnings?
  39. Random assignment case study: Job Corps evaluation (continued)
     - Design: random samples of eligible applicants nationwide in 1994 and 1995 formed an experimental group and a control group
     - Both groups were followed for more than four years, with multiple rounds of interviews
     - On-site observation and interviews were conducted at the training centers
  40. Random assignment case study: Job Corps results
     - Program services were delivered comprehensively and consistently at centers nationwide
     - The program increased participants' education and vocational training by 1,000 hours (equivalent to a school year)
     - In the fourth year after participation, participants earned more than $1,150 (12%) more per year than the control group
  41. Random assignment case study: Job Corps results (continued)
     - The program clearly reduced involvement in crime (by 16%) and in welfare receipt
     - No effects were seen on drug use or birth rates
     - Although the program cost about $14,000 per participant, it was cost-effective
  43. 3. Impact Evaluation (2) >> Section 3-2 <<
  44. 2) Comparison groups equated by statistical procedures: key points
     - The basic approach for quasi-experimental designs
     - Using regression results: controlling for the other factors, the coefficient on the independent (treatment) variable measures its impact on the dependent variable (the outcome)
  45. Statistical controls, a case study: the Big Brother/Big Sister mentoring program
     - An intensive one-on-one mentoring program designed to keep troubled youth in school and out of delinquency
     - Mentors and youth are screened for suitability for the program
     - Mentors and youth meet 2-4 times a month, for 3-4 hours each time
     - Activities are not tied to school; mentoring happens through varied activities such as attending sports events and sharing meals
  46. Big Brother/Big Sister mentoring program: evaluation design
     - Programs in 8 cities
     - Program applicants were randomly assigned to experimental and control groups
     - 1,138 youth were enrolled in the study, and their behavior was monitored for more than 18 months
  47. Big Brother/Big Sister mentoring program: statistical model
     - Regression analysis was used to control for differences in circumstances at program entry, differences across agencies, and length of program participation
     - Example control variables: age, sex, race, grades repeated, history of abuse, family characteristics (income, welfare receipt)
     - The program's effect was measured by the coefficient on a dummy (0/1) variable
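The dummy-variable idea on this slide can be illustrated with a small pooled regression. This is only a sketch with synthetic, noise-free data (all values invented), using a hand-rolled least-squares solver so it is self-contained:

```python
def ols(X, y):
    """Ordinary least squares: solve (X'X) b = X'y by Gaussian
    elimination with partial pivoting. X is a list of rows."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(xtx[r][col]))
        xtx[col], xtx[piv] = xtx[piv], xtx[col]
        xty[col], xty[piv] = xty[piv], xty[col]
        for r in range(col + 1, k):
            f = xtx[r][col] / xtx[col][col]
            for c in range(col, k):
                xtx[r][c] -= f * xtx[col][c]
            xty[r] -= f * xty[col]
    b = [0.0] * k
    for r in range(k - 1, -1, -1):
        b[r] = (xty[r] - sum(xtx[r][c] * b[c] for c in range(r + 1, k))) / xtx[r][r]
    return b

# Synthetic data: outcome = 2 + 0.5*age + 3*treated, so the coefficient
# on the 0/1 treatment dummy should recover the effect of 3
ages = [14, 15, 16, 14, 15, 16]
treated = [1, 1, 1, 0, 0, 0]
y = [2 + 0.5 * a + 3 * d for a, d in zip(ages, treated)]
X = [[1.0, a, d] for a, d in zip(ages, treated)]  # intercept, control, dummy
intercept, b_age, b_treat = ols(X, y)
print(round(b_treat, 6))  # 3.0
```

The pooled regression combines treatment and control observations in one model; the dummy coefficient is the estimated program effect after the controls (here, age) have done their work.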
  48. Big Brother/Big Sister mentoring program: evaluation results
     - The treatment group was 45% less likely to start using drugs
     - The treatment group was 32% less likely to engage in violence
     - The treatment group had stronger bonds with their parents
     - No clear differences appeared in the frequency of participation in social or cultural activities
  49. 3) Reflexive controls, a case study: contracting housing maintenance services to private for-profit firms
     - The effect of private housing maintenance contracts on the quality of living conditions in Moscow; the expectation was that private firms would do a better job than public municipal agencies
  50. Private housing maintenance program: background
     - 90% of Moscow's housing was maintained by the city (the share owned was falling through privatization)
     - In 1993 all of this housing was maintained by municipal maintenance companies, with no competition
     - Common spaces inside buildings were very dirty, and residents demanded improvement
       - Slow responses, poor-quality work, poor value for payments
  51. Pilot project
     - Competitive bidding open only to private firms
     - Firms to provide maintenance services (not management)
     - March 1993:
       - 3 groups of buildings (2,000 typical units); 3 firms won contracts
     - September 1993:
       - 4 groups of buildings (5,000 better-quality units); 2 firms won, each taking 2 groups
  52. A reflexive-control design using tenants
     - March project:
       - Surveys of 300 households in February, May, and November 1993
     - September project:
       - Surveys of 379 households in September 1993 (before) and January 1994
     - The same people (households) were interviewed in every round
  53. Results: the March group
     - Repairs
       - March-May: clear improvement
       - May-November: the degree of improvement fell somewhat, but conditions were still better
     - Maintenance
       - March-May: broad improvement, markedly better in 9 of 11 areas
       - May-November: mixed results
  54. Results: the September group
     - Repairs
       - Mixed results, for example:
         - Completion of repairs on the first visit improved with more thorough repair work
         - But few respondents were fully satisfied with repairs
     - Maintenance
       - Performance at least equal to the municipal companies'
       - Improvement in 8 of 11 areas
       - But deterioration in 2 important areas: the condition of entryways and keeping lights working
  55. Conclusions of the evaluation
     - Private firms were somewhat better than municipal companies at building maintenance
     - Private firms cost less
     - The results
       - Confirmed for Moscow and other cities the practice of selecting maintenance companies through competitive bidding
       - Encouraged the federal government to adopt competitive procurement as national policy
  57. 4. Benefit-Cost Analysis >> Section 4 <<
  58. Definition
     Benefit-cost analysis is a set of practical, systematic procedures evaluators can use to express a program's total costs and benefits in monetary terms. An efficient program is one whose total benefits exceed its total costs:
         B / C > 1   (B = benefits, C = costs)
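The decision rule on this slide is just a ratio check. A minimal sketch with invented totals:

```python
def benefit_cost_ratio(total_benefits: float, total_costs: float) -> float:
    """B/C ratio: the program is efficient in these terms when B/C > 1."""
    return total_benefits / total_costs

# Invented totals, both already expressed in money terms
bc = benefit_cost_ratio(total_benefits=18_000, total_costs=14_000)
print(round(bc, 3), bc > 1)  # 1.286 True
```

The hard work, as the following slides note, is not the division but deciding what counts as a benefit and expressing it in money.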
  59. Cost-effectiveness analysis
     - Used when the evaluator cannot convert program benefits into monetary terms, but the program's objective is single and clear, or the relationships among its objectives are well understood
     - Example:
       - The cost of installing car seat belts per life saved
  60. Main problems of benefit-cost analysis
     - Programs with multiple benefits are hard to handle
     - The B/C ratio alone does not settle the question
  61. Problems and limits of benefit-cost analysis: summary
     - Measurement problems: assumptions are required
     - Intangibles are left out of the key benefit-cost ratio
     - Distributional questions are ignored: who received the benefits, and who paid?
     - Causality: outcomes versus impacts
     - Inconsistency in how the analysis is applied
     - Results can be manipulated
  62. 5. Conclusion >> Section 5 <<
  63. Program evaluation: concluding points
     - Engage actively with evaluation
     - Ask the right questions
     - Maintain the quality of evaluations
     - Promote external evaluation
     - Good evaluation is not cheap

Editor's notes

  • Note to instructor: emphasize and discuss the underscored items. The key point is that evaluation requires making judgments about how well a program is working and/or achieving its intended results. As we will see, making judgments requires criteria against which to measure program performance.
  • P = process or implementation evaluation; I = impact evaluation; B-C = benefit-cost analysis. In B-C analysis, ideally the benefit measure is the program impact. For example, in a training program the principal impact is increased earnings per year. The B-C comparison might set the program cost against the discounted present value of the increased earnings over a 10-year period.
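The discounting step in this note can be made concrete. In the sketch below, the 10-year horizon comes from the note, the 5% discount rate is an invented placeholder, and the $1,150 annual gain is borrowed from the Job Corps slide purely for illustration:

```python
def present_value(annual_amount: float, rate: float, years: int) -> float:
    """Discounted present value of a constant annual amount received at
    the end of each of `years` years."""
    return sum(annual_amount / (1 + rate) ** t for t in range(1, years + 1))

# $1,150 per year for 10 years at a 5% discount rate
pv = present_value(1150, 0.05, 10)
print(round(pv, 2))  # about 8880, well below the undiscounted 10 * 1150 = 11500
```

Discounting matters because costs are paid up front while earnings gains arrive over a decade; comparing undiscounted sums would overstate the benefits.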
  • Accountability: by knowing in detail how a program is working and what it is accomplishing, it is possible to hold program administrators responsible for the program's operation. Funding: evaluation results very often have a strong impact on government and parliamentary decisions to expand, contract, or even eliminate funding for a program. Ownership: staff identifying with the program and taking responsibility for it; program leaders and staff often become more committed to seeing the program well run by participating in the process of improving it. Policy impact: results of well-executed evaluations are very difficult for senior administrators to ignore, especially if they agreed in advance to the questions being addressed. Evaluation findings almost guarantee the evaluators a seat at the table when program improvements are discussed.
  • Early evaluations in the U.S.: Pt. 1: By the 1930s, social scientists in various disciplines were advocating the use of rigorous research methods to assess social programs, and evaluations became somewhat more frequent. A study of introducing boiling water as a public health practice in Middle Eastern villages was a landmark study. Pt. 2: Types of programs subjected to some form of evaluation by the 1950s: delinquency prevention programs, psychotherapeutic treatments, public housing programs, educational activities, and community organization initiatives.
  • Academic development First textbooks at the end of the 1970s Evaluation Review founded; others followed rapidly From the mid-1960s the national government in the U.S. routinely commissioned program evaluations. This spurred the development of the evaluation industry. On the supply-side were think tanks. Clients included the various Congressional offices as well as the ministries. From the early 1970s social experiments and piloting new programs were popular with Government agencies. These both required detailed and sophisticated evaluations.
  • The right way: Evaluation method should be driven by the questions being asked--not the other way around. The evaluator must build an audience among decision makers if the results are to be used. This step is very often not done. The evaluator assumes he knows which questions are of the greatest importance and interest and molds the evaluation around them. The result is that there is often little interest in the findings among the people actually responsible for the program.
  • Primary intended users: the people who can use the evaluation results to change the program. Those funding the evaluation are often not the primary audience. For example, the World Bank may engage a firm to evaluate a social assistance program in India, but only the Indian officials have the power to change the program, so it is essential to engage them in defining the issues to be addressed. Program administrators must be convinced to adopt changes; this means not just the highest officials, so try to engage representatives of the program's field offices, for example. The opinions of clients can be critical in defining an evaluation, particularly where participation rates are low: what do they find objectionable about the program? Once you have identified the right people, how do you engage them? For a single stakeholder, start by asking: What do you most want to know about the program? Name three pieces of information that would be at the top of your list.
  • Emphasize the following points in this definition: program performance; the program functioning as intended; according to some appropriate standard; and the two different kinds of issues: service utilization (issues of program participation) and program organization and operations (the actual operations of the program). This type of evaluation lets program managers understand how the program actually operates. Knowing this is often essential for interpreting the results of an impact evaluation and determining how to change the program to make it more effective.
  • Similar program experience -- across many programs targeted on the poor in the U.S. participation rates are 50-60% of those believed eligible to receive benefits. So if you are studying such a similar program, your views about reasonable participation rates should take this fact into account. To determine standards for administration practices look at the practices of programs widely regarded to be well operating; look at their practices in such areas as program documentation (e.g., guidelines for in-take workers), training, record keeping, quality control, and reporting.
  • Bias includes errors of exclusion (leaving out those you want to serve) and errors of inclusion (serving those the program is not intended to serve). Target population awareness is especially important where participation rates are low.
  • Q1 concerns program coverage. Q2 addresses the targeting of benefits. Q3 addresses client satisfaction. This was especially important in 1994 because housing allowances were the first means-tested program introduced in Russia. The reference for this case study is: R. Struyk, L. Lee and A. Puzanov, "Monitoring Russia's Experience with Housing Allowances," Urban Studies, vol. 34, no. 11, November 1997, pp. 1789-1818.
  • Pt. 1: In 1991 the State housing stock was transferred to the ownership of local governments, who were then completely responsible for its maintenance and operations. Pt. 2: The typical local government was spending half of its total budget to operate the housing stock, including associated subsidies for utilities. Pt. 3: In December 1992 a framework housing reform law was passed. It called for raising rents and utility fees on municipal housing to full cost-recovery levels over a multi-year period. The timing of the increases was left to local governments because they owned the stock. The law specified that the process was to begin in January 1994. Rents could only be raised, however, if a housing allowance program was established in the city to protect poor renters.
  • S = the amount of the subsidy. MSR = the "maximum social rent," the estimated cost of a unit of a size suitable for the family applying for the subsidy; it is based on unit size (sq. m.) and standard use rates for utilities. A participant household can occupy a bigger unit or use more utility services, but the subsidy is fixed for the standard unit and consumption. Y = household income. t = the share of its own income the household must spend on housing before receiving the subsidy. This is a gap formula because the subsidy fills the gap between the cost of a standard unit (MSR) and what the family can reasonably afford to pay (tY). Y* is the income cutoff, i.e., the lowest income a family can have and not receive a subsidy. Take some time with the students to consider what happens to the number of households eligible for housing allowances as the values of MSR and t are increased and decreased.
  • The responsible ministry and the evaluation team wanted an early reading on how the program was working, so two cities that implemented rent increases and housing allowances in the spring of 1994 were selected for the evaluation. A survey of the population was required to estimate the share of eligible households who were participating. The additional sample of program participants was needed to ensure that the sample for analyzing participant satisfaction with the program would be large enough. (If participation rates were low, only a few participants would be included in the general survey of the population.) The evaluation was conducted in both 1994 and 1995 because of the findings in 1994.
  • 1. The share of households eligible for payment changed a lot in Gorodetz between the two years because the city raised the household contribution rate (t) substantially to control the cost of the allowance program. (In 1994 the household contribution rate was on a sliding scale, with the poorest households paying only 2.5% of their incomes in rent.) 2. Participation in 1994 among eligible households was very low, a surprising finding. In part this is explained by the fact that rents remained low in absolute terms, so the benefits of participating in the program were small. Nevertheless, the finding created great concern in the Ministry, which feared that poor households would refuse to pay their rents or, worse, begin demonstrating against the rent increase. Based on these findings the minister sent a letter to every state governor urging him to press cities to conduct a large information campaign about the availability of the subsidy before the next rent increase. 3. Participation rates rose a lot by 1995, especially in Gorodetz, where the information campaign was more effective. Participation was also encouraged by a second round of rent increases, which increased the value of the subsidies (MSR rose).
  • The figures show that benefits are very strongly concentrated on poor families. So the program is successful in this respect.
  • The satisfaction levels are indeed strikingly positive. Hence, income-testing was clearly not being rejected by the participants. Conclusions The program got off to a good start overall. Participation rates were and remain a problem. In 2001 about 30 percent of eligible households participated across the country. It has served as a model for other means-tested programs in the country.
  • Pt. 1, example: for program outreach, what is the expected volume of people applying for services? Pt. 2: How long do clients have to wait for services? What are the error rates in processing them? How long do potholes sit in the streets before they are repaired? Is the quality of garbage pick-up consistent with the city's standards?
  • The study that is the source for this example is, V. Gabor and C. Bsotko, Changes in Client Service in the Food Stamp Program After Welfare Reform . Washington, DC: Health Systems Research, Inc., report to the USDA Food and Nutrition Service, 2001.
  • Broad legislation enacted in 1996 fundamentally changed the structure of basic assistance for low-income families. The "old program" was called Aid to Families with Dependent Children (AFDC). A female-headed family could receive benefits indefinitely under that program, and about 25% of those receiving benefits at any time were long-term (10+ year) recipients. The original program had very weak job-search requirements, though these were gradually strengthened over time. The "new program" is called Temporary Assistance for Needy Families (TANF). The Food Stamp and TANF programs work closely together; generally, but not always, the two are administered from the same office. Under the Food Stamp program, low-income households receive vouchers (which look like play money) that they can use to purchase food and certain other necessities at regular stores. Beneficiaries pay for the stamps; the amount they pay as a percentage of the stamps' value depends on their incomes: poorer families pay less. Very poor families pay about 20% of the value of the stamps; for example, they pay $40 and receive stamps worth $200.
  • Examples of service providers include firms conducting job training programs for clients, and telephone call centers that contact clients for the agency, for example to remind them that it is time to recertify their eligibility for benefits.
  • Bullet #2: employment offices are generally located separately from the agency administering TANF and Food Stamps, so being sent to the employment office is typically a significant effort for applicants, many of whom do not own a car. Bullet #3: agencies "punish" clients for not fulfilling the obligations in the contract they sign by reducing or eliminating Food Stamp benefits for 3 or 6 months, for example. Results were used by both the responsible national ministry and the state agencies that administer the program to improve the flow of clients so that they have access to the Food Stamp benefit earlier in the process.
  • Outcome: the status of the social or physical condition the program is accountable for improving. Impact: the net effect of a program. The difference is that an outcome is a status, which may or may not be due to the program's intervention, while an impact is clearly linked to the program. The usual practice is to identify the impact by contrasting the outcomes for an experimental group that receives a treatment and a control group that does not. Example: the objective of the housing allowance program is to protect low-income households from having to pay the full cost of housing; in this way participants have more money to spend on non-housing consumption. Can the impact of the program be measured by knowing the size of the housing allowance a family receives? No. Consider the following case: an elderly widow receives income from her son to help with expenses. When she begins receiving the housing allowance subsidy, she tells her son, who in turn reduces his monthly support by about half of the subsidy's value. So the impact on this woman's non-housing consumption is only half the value of the subsidy payment.
  • Selection bias: errors made in assigning subjects in a random process, or "similar" controls that were not matched with the treatment group on a critical factor. Secular drift: a general increase in the homeownership rate makes it hard to isolate the effect of a mortgage lending initiative. FNMA, the secondary mortgage market firm, announced a major change in its underwriting standards in the late 1980s to make it possible for families with lower incomes to purchase a home. There has been a general increase in homeownership over the period, mostly because households prefer owning, yet FNMA suggests that its role was fundamental. Interfering events: a hurricane batters an area that is the target of a slum-upgrading project, and households and the municipality must devote resources to recovery. Hawthorne effect: those in the treatment group respond to the treatment just because they receive it. This is especially problematic where satisfaction is being measured; responses may only reflect that someone is paying attention to their problem.
  • Examples of each type of strategy. Random assignment: the Housing Demand Experiment, a component of the U.S. Housing Allowance Program. Its objective was to determine what share of a housing allowance subsidy would be spent on housing, i.e., improving the participants' housing, and how much would be spent on other things. (The program required participants to live in a unit meeting certain minimum physical standards; they could move to a unit meeting the standards.) Pittsburgh and Phoenix were selected for their different market conditions: Pittsburgh was a loose market and Phoenix a tight one. Random assignment was made to various sub-experiments: housing allowance vs. pure control (the control was given an equivalent cash payment not related in any way to housing); housing allowance vs. unconstrained cash payment; and housing allowance programs with different parameters, both percent-of-rent and gap formulas. Matched constructed controls: schools defined to be similar, in terms of pupils' test scores, neighborhood family incomes, and the percentage of students from single-parent households, are assigned to treatment and control groups; students in the treatment group get intensive assistance with reading.
  • Statistically equated controls: in a project testing the impact of lower (subsidized) interest rates on the price paid for housing, samples are constructed of those receiving the subsidized loans and those paying market interest rates with similar incomes. The analysis is of the price per sq. m. paid for housing after controlling for other borrower characteristics. Reflexive controls: measuring the impact of a training program by measuring participants' work and pay experience before and after the training. (As we will see, this is probably not a reliable way to measure the impact of a training program.) Reflexive controls are especially at risk from the problem of "secular drift."
  • The Job Corps is a long-established training program for young men and women. It is distinctive because participants are taken to rural locations, often retired military bases, where they receive intensive job training and counseling on appropriate life behavior and skills. The usual stay at the training facility is six months, so the program is expensive. The source for this case is: Mathematica Policy Research, "Evaluation of the Economic Impact of the Jobs Corps Program: Third Follow-up Report." Washington, DC: Office of Policy and Evaluation and Research, Employment and Training Administration, U.S. Department of Labor, 1982.
  • Pts. 1 & 2: these are really part of a process evaluation of how the program operated. This is a good example of the complementarity between implementation and impact evaluation.
  • The positive cost-effectiveness is based on all benefits, i.e., higher earnings and savings to society from lower crime rates (prosecution, court and prison costs).
  • Pt.2: Think of a regression model in which the observations include members of both the treatment group and the control group. This is a so-called “pooled regression.”
  • This is a program under which a young adult consistently spends time with young people who have been in trouble or are at risk of being in trouble. The young adults are called mentors. This evaluation is described in
  • The basic question under analysis here is whether private firms selected through a competitive process would do a better job maintaining municipal housing than the city companies that had had a monopoly on providing these services for many years. The reference for this case study is: R. Struyk, K. Angelici and M. Tikhomirova, "Private Maintenance for Moscow's Municipal Housing Stock: Does It Work?" Journal of Housing Economics, vol. 4, 1995, pp. 50-70.
  • Two competitions were held--one in March 1993 and the second in September 1993. In March, the 2,000 units for which services were being procured were divided into 3 groups (packets) of about 700 units each. Three different firms won. In the September competition, 4 packets of units (about 1,250 each) were offered and 2 firms each won 2 packets.
  • For both the March and the September buildings, the first survey was conducted a few weeks before the private firm took over responsibility for maintaining the buildings. For the March buildings, two later surveys were conducted--one in May and one in November. For the September buildings, only one later survey was done, in January 1994. Importantly , interviewers returned to the same households in each survey round and almost always the same person in a household was interviewed. Thus, differences in perceptions and opinions among respondents should be minimized.
  • General points about the analysis: The questions asked respondents to categorize a specific condition for a specified period of time. For example, one question was, “How often in the last month were the lights in your hallway not working?” They did not ask about satisfaction. A chi-square (X²) test was used throughout to test the significance of the distributions against a null hypothesis of all answers being equally given. Maintenance, May-November: the mixed results were as follows: 6 of 11 categories got somewhat worse, no change in 4 categories, and improvement in 1. But some areas with deterioration in the scores are those where the firm itself was not fully responsible, e.g., security systems. Important note : During this period the city was very slow in paying the company, and this may have led to a reduction of effort. Recall that annual inflation in Russia at this time was about 800%, so delays were very costly.
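The chi-square goodness-of-fit test described in the notes can be sketched with invented survey counts. The observed answer counts below are hypothetical; the null hypothesis is that all answer categories are equally likely, as in the Moscow analysis.

```python
import numpy as np

# Hypothetical goodness-of-fit test: observed answer counts for one survey
# question versus a null of all five categories being equally likely.
observed = np.array([40, 25, 15, 12, 8])      # invented counts, n = 100
expected = np.full(5, observed.sum() / 5)     # uniform under the null

chi2_stat = ((observed - expected) ** 2 / expected).sum()

# Critical value for df = 4 at the 5% significance level is 9.488
print(f"chi2 = {chi2_stat:.2f}, reject uniform null: {chi2_stat > 9.488}")
```

A statistic above the critical value indicates the answer distribution differs significantly from uniform.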
  • Private firms cost less . The terms of the competition were that the private firms had to bid a price less than the cost of services from the city firms. So by definition private firms cost less. Actually, initially private firms were about 10 percent cheaper. Over time in some cities, city firms cut their prices to be competitive. But in a number of cities the total savings are 30 percent or more from the former cost of city firms, adjusting for inflation.
  • Procedure is challenging because the analyst must assign dollar values to all costs and benefits. The simplicity of the B:C ratio is very attractive to decision makers because it provides a simple standard and the ability to compare projects explicitly.
  • Take a program with multiple benefits. For example, building a dam both protects a downstream city against floods, and in this respect is credited with saving some lives, and it provides water for irrigation of farm land. So there are three benefits: urban property protected, lives saved, and additional food production. To get a unified estimate of the benefits, the evaluator has to assign weights to the various benefits so that they can be added together--which is a highly subjective exercise. What weight do I assign to saving a life versus five additional tons of corn production?
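The dam example above can be made concrete with a small calculation. All dollar values and the implied weights below are invented for illustration: monetizing each benefit stream (which is the subjective step the note warns about) is what allows them to be summed into a single B:C ratio.

```python
# Hypothetical dam project: three benefit streams must be monetized before
# they can be combined. Every figure here is an assumption for illustration.
benefits = {
    "urban property protected": 12_000_000,   # assumed flood damage avoided, $
    "lives saved": 4 * 3_000_000,             # 4 lives x assumed value per life
    "additional food production": 5_000_000,  # assumed value of irrigated crops
}
cost = 20_000_000                             # assumed construction cost, $

total_benefits = sum(benefits.values())
bc_ratio = total_benefits / cost
print(f"B:C ratio = {bc_ratio:.2f}")          # 29M / 20M = 1.45
```

Changing the assumed value per life or per ton of crops changes the ratio directly, which is exactly why the weighting exercise is so contestable.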
  • Lack of consistency: Review of BCAs within the same agency found very wide differences in assumptions and measurement practices. Even greater among agencies. Results subject to manipulation: This is a problem for projects for which B:C analysis is being done to assess whether the investment should be undertaken. One analysis of eight subway projects built recently in eight different U.S. cities found very systematic overstatements in the volume of passengers that would be carried by the new systems and large understatements of the costs of building and maintaining the systems. In the median case, capital and operating costs were underestimated by one-third and ridership was overestimated by 300%. So the average cost per rail passenger turned out to exceed the forecast in every case by at least 188% and in three cases by more than 700%. The structure of the Federal program that funds these projects contains strong incentives for such behavior: the agency ranks projects by their B:C ratios; often this is the most important criterion determining which projects are funded. Also, the Congress tends to allocate funds to specific ongoing construction projects to cover cost overruns.
  • Because of the uncertainty about the discount rate, governments often specify the rate to be used in assessing projects (which is based roughly on the kind of analysis described in brackets above). A reasonable value is the Government’s cost of long-term money--10+ year bonds. NPV = (B-C)_1 + (B-C)_2/(1+r) + (B-C)_3/(1+r)^2 + ... [If one uses the nominal interest rate, then one does not need to deflate benefits and costs for inflation; the opposite holds if one uses the real discount rate.] Note the sensitivity of the NPV in the table to the discount rate. This will always be the case when there is a large difference in the time profile for benefits and costs.
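The NPV formula and its sensitivity to the discount rate can be sketched directly. The benefit/cost stream below is invented: a large up-front cost followed by constant later benefits, which is precisely the time profile for which the choice of discount rate matters most.

```python
def npv(net_benefits, r):
    """Net present value of a stream of (B - C) values, with the first value
    in year 1 undiscounted, matching the formula in the notes:
    NPV = (B-C)_1 + (B-C)_2/(1+r) + (B-C)_3/(1+r)^2 + ...
    """
    return sum(nb / (1 + r) ** t for t, nb in enumerate(net_benefits))

# Hypothetical project: cost of 100 up front, then benefits of 20 per year
# for 10 years. All figures are assumptions for illustration.
stream = [-100.0] + [20.0] * 10

for rate in (0.05, 0.10, 0.15):
    print(f"r = {rate:.0%}: NPV = {npv(stream, rate):.1f}")
```

At 5% the project looks clearly worthwhile, while at 15% the NPV falls to roughly zero: the same stream of benefits and costs, ranked differently depending only on the chosen rate.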
