SlideShare une entreprise Scribd logo
1  sur  160
Managing Confidential
Information in Research
                               Micah Altman
                   Senior Research Scientist
    Institute for Quantitative Social Science
                          Harvard University
Personally identifiable private
information is surprisingly common
       Includes information from a variety of
        sources, such as…
           Research data, even if you aren‟t the
            original collector
           Student “records” such as e-mail, grades
           Logs from web-servers, other systems
       Lots of things are potentially identifying:
           Under some federal laws: IP
            addresses, dates, zipcodes, …
           Birth date + zipcode + gender uniquely
            identify ~87% of people in the U.S.
            [Sweeney 2002]
           With date and place of birth, can guess
            first five digits of social security number
            (SSN) > 60% of the time. (Can guess the
            whole thing in under 10 tries, for a
            significant minority of people.) [Aquisti&
            Gross 2009]
           Analysis of writing style or eclectic tastes
            has been used to identify individuals          Brownstein, et al., 2006 , NEJM 355(16),

       Tables, graphs and maps can also
        reveal identifiable information

    4                                                                     [Micah Altman, 3/10/2011]
IQSS (and affiliates) offer you support across all stages of your
  quantitative research:

       Research design, including:
        design of surveys, selection of statistical methods.
       Primary and secondary data collection, including:
        the collection of geospatial and survey data.
       Data management, including:
        storage, cataloging, permanent archiving, and distribution.
       Data analysis, including :
        statistical consulting, GIS consulting, high performance research computing.


                        http://iq.harvard.edu/
    6                                                           [Micah Altman, 3/10/2011]
The IQSS grants administration team helps with every aspect of
  the grant process. Contact us when you are planning your
  proposal.

           Assisting in identifying research funding opportunities
           Consulting on writing proposals
           Assisting IQSS affiliates with:
            preparation, review and submission of all grant applications
             (“pre-award support”)
            management of their sponsored research portfolio
             (“post-award support”)
            Interpret sponsor policies
            Coordinate with FAS Research Administration and the Central Office for
             Sponsored Programs



… And, of course, support seminars like this!


    7                                                            [Micah Altman, 3/10/2011]
Goals for course
       Overview of key areas
       Identify key concepts & issues
       Summarize Harvard
        policies, procedures, resources
       Establish framework for action
       Provide connection to resources, literature




    8                                          [Micah Altman, 3/10/2011]
Outline

   [Preliminaries]

   Law, policy, ethics
   Research
    methods, design, manage
    ment
   Information Security
    (Storage, Transmission, U
    se)
   Disclosure Limitation

     [Additional Resources &
      Summary of
    9 Recommendations]          [Micah Altman, 3/10/2011]
Steps to Manage Confidential Research Data
    Identify potentially sensitive information in planning
        Identify legal requirements, institutional requirements, data use agreements
        Consider obtaining a certificate of confidentiality
        Plan for IRB review
    Reduce sensitivity of collected data in design
    Separate sensitive information in collection
    Encrypt sensitive information in transit
    Desensitize information in processing
        Removing names and other direct identifiers
        Suppressing, aggregating, or perturbing indirect identifiers
    Protect sensitive information in systems
        Use systems that are controlled, securely configured, and audited
        Ensure people are authenticated, authorized, licensed
    Review sensitive information before dissemination
        Review disclosure risk
        Apply non-statistical disclosure limitation
        Apply statistical disclosure limitation
        Review past releases and publically available data
        Check for changes in the law
        Require a use agreement



    10                                                                         [Micah Altman, 3/10/2011]
Law, policy, ethics

     Law, Policy & Ethics                   Research design …
                                            Information security
                                            Disclosure
                                            limitation
        Ethical Obligations
        Laws
        Fun and games 
        Harvard Policies
        [Summary]




12                             [Micah Altman, 3/10/2011]
Law, policy, ethics

Confidentiality & Research Ethics                                         Research design …
                                                                          Information security
                                                                          Disclosure
    Belmont Principles                                                   limitation

        Respect for Persons
            individuals should be treated as autonomous agents
            persons with diminished autonomy are entitled to protection
            implies “informed consent”
            implies respect for confidentiality and privacy
        Beneficence
            research must have individual and/or societal benefit to justify
             risks
            implies minimizing risk/benefit ratio




    13                                                       [Micah Altman, 3/10/2011]
Scientific & Societal Benefits                                          Law, policy, ethics

of Data Sharing                                                         Research design …
                                                                        Information security
    Increases replicability of research                                Disclosure
                                                                        limitation
        Journal publication policies may apply
    Increases scientific impact of research
        Follow up studies
        Extensions
        Citations
    Public interest in data produced by public funder
        Funder policies may apply
    Public interest in data that supports public policy
        FOIA and state FOI laws may apply
    Open data facilitates…
        Transparent government
        Scientific collaboration
        Scientific verification
        New forms of science
        Participation in science
        Hands-on education
        Continuity of research

Sources: Fienberg et. al 1985; ICSU 2004; Nature 2009
    14                                                     [Micah Altman, 3/10/2011]
Sources of Confidentiality Restrictions for                Law, policy, ethics

University Research Data                                   Research design …
                                                           Information security
                                                           Disclosure
    Overlapping laws                                      limitation

    Different laws apply
     to different cases
    All affiliates subject to
     university policy

(Not included: EU
  directive, foreign
  laws, classified
  data, …)



    15                                        [Micah Altman, 3/10/2011]
Law, policy, ethics

45 CFR 46 [Overview]                                               Research design …
                                                                   Information security

“The Common Rule”                                                  Disclosure
                                                                   limitation

 Governs human subject research
        With federal funds/ at federal institution
    Establishes rules for conduct of research
    Establishes confidentiality and consent requirement
     for for identified private data
    However, some information may be required to be
     disclosed under state and federal laws (e.g. in cases
     of child abuse)
    Delegates procedural decisions to Institutional
     Review Boards (IRB‟s)

    16                                                [Micah Altman, 3/10/2011]
Law, policy, ethics

HIPAA [Overview]                                                 Research design …
                                                                 Information security

Health Insurance Portability and Accountability Act Disclosure
                                                     limitation

 Protects personal health care information for „covered
  entities‟
 Detailed technical protection requirements
 Provides clearest legal standards for dissemination
 Provides a „safe harbor‟
 Has become an accepted practice for dissemination in
  other areas where laws are less clear
 HITECH Act of 2009 extends HIPAA
     Extends coverage to associated entities of covered entities
     Additional technical safeguards
     Adds breach reporting requirement

  HIPAA provides three dissemination options …
 17                                                 [Micah Altman, 3/10/2011]
Dissemination under HIPAA
                                                                                      Law, policy, ethics
                                                                                      Research design …

[option 1]                                                                            Information security
                                                                                      Disclosure
                                                                                      limitation
    “safe harbor” -- remove 18 identifiers
        [Personal identifiers]
            Names
            Social Security #‟s; Personal Account #‟s; Certificate/License #‟s; full face
             photos (and comparable images); biometric id‟s; medical
            Any other unique identifying number, characteristic, or code
        [Asset identifiers]
            fax #‟s; phone #‟s; vehicle #‟s;
            personal URL‟s; IP addresses; e-mail addresses
            Device ID‟s and serial numbers
        [Quasi identifiers]
            dates smaller than a year (and ages > 89 collapsed into one category)
            geographic subdivisions smaller than a state (except for 3 digits of zipcode, if
             unit > 20,000 people)
     And
        Entity does not have actual knowledge [direct and clear awareness] that it
         would be possible to use the remaining information alone or in
         combination with other information to identify the subject


    18                                                                   [Micah Altman, 3/10/2011]
Dissemination under HIPAA                                        Law, policy, ethics
                                                                 Research design …
[Option 2]                                                       Information security

    “limited dataset” – leave some quasi-id‟s                   Disclosure
                                                                 limitation

        Remove personal and asset identifiers
        Permitted dates: dates of birth, death, service, years
        Permitted geographic subdivisions: town, city, state, zip
         code
And
        Require access control and data use agreement.




    19                                              [Micah Altman, 3/10/2011]
Dissemination under HIPAA                                             Law, policy, ethics
                                                                      Research design …
[Option 3]                                                            Information security

    “qualified statistician” – statistical determination Disclosure
                                                           limitation
     Have qualified statistician determine, using generally
     accepted statistical and scientific principles and
     methods, that the risk is very small that the information
     could be used, alone or in combination with other
     reasonably available information, by the anticipated
     recipient to identify the subject of the information.
    Important caveats
        Methods and results of the analysis must be documented
        No bright line for “qualified”, text of rule is:
         “a person with appropriate knowledge of and experience with
         generally accepted statistical and scientific principles and
         methods for rendering information not individually identifiable.”
         [Section 164.514(b)(1)]
        No clear definitions for “generally accepted” or “very small” or
         “reasonably available information”…
         however, there are references in the federal register to
         statistical publications to be used as “starting points”
    20                                                   [Micah Altman, 3/10/2011]
Law, policy, ethics

FERPA                                                                                              Research design …
                                                                                                   Information security

     Family Educational Rights and Privacy Act                                                     Disclosure
                                                                                                   limitation
    Applies schools that receive federal (D.O.E.) funding
    Restricts use of student (not employee) information
    Establishes
        Right to privacy of educational records
        Right to inspect and correct records (with appeal to Federal government)
        Definition of public “directory” information
        Right to block access to public “directory” information, and to other records
    Educational records include:
        Identified information about student
        Maintained by institution
        Not …
            Employee records
            Some medical and law-enforcement records
            Records solely in the possession and for use by the creator (e.g. unpublished instructor notes)
    Personally identifiable information includes:
        Direct identifiers
        Indirect (quasi) identifiers
        Indirectly linkable identifiers
        “Information requested by a person who the educational agency or institution reasonably
         believes knows the identity of the student to whom the education record relates.”


    21                                                                              [Micah Altman, 3/10/2011]
Law, policy, ethics

MA 201 CMR 17                                                    Research design …
                                                                 Information security
                                                Disclosure
Standards for the Protection of Personal Information
                                                limitation

 Strongest U.S. general privacy protection law
 Has been delayed/modified repeatedly
 Requires reporting of breaches
        If data is not encrypted
        Or encryption key is released in conjunction with data
    Requires specific technical protections:
        Firewalls
        Encryption of data transmitted in public
        Anti-virus software
        Software updates


    22                                              [Micah Altman, 3/10/2011]
Inconsistencies in Requirements                                                    Law, policy, ethics

 and Definitions                                                                    Research design …
                                                                                    Information security
   Inconsistent definitions of “personally identifiable”                           Disclosure
   Inconsistent definitions of sensitive information                               limitation
   Requirements for to de-identify jibes with statistical realities

                 FERPA               HIPAA                  Common               MA 201 CMR
                                                            Rule                 17

Coverage         Students in         Medical Information    Living persons in Mass. Residents
                 Educational         in “Covered            research by
                 Institutions        Entities”              funded
                                                            institutions
Identification   -Direct             -Direct                -Direct              -Direct
Criteria         -Indirect           -Indirect              -Indirect
                 -Linked             -Linked                -Linked
                 -Bad intent (!)
Sensitivity      Any non-directory   Any medical            Private              Financial, State,
Criteria         information         information            information –        Federal
                                                            based on harm        Identifiers
Management - Directory opt-out       - Consent              - Consent             - Specific
Requirement - [Implied] good         - Specific technical   - [Implied] risk      technical
s 23        practice                 safeguards             minimization          safeguards
                                                                        [Micah Altman, 3/10/2011]
                                     -Breach                                      - Breach
Law, policy, ethics

Third Party Requirements                                            Research design …
                                                                    Information security

    Licensing requirements                                         Disclosure
                                                                    limitation
    Intellectual property requirements
    Federal/state law and/or policy requirements
        State protection of personal information laws
        Freedom of information laws (FOIA & State FOI)
        State mandatory abuse/neglect notification laws
    And … think ahead to publisher requirements
        Replication requirements
        IP requirements

    Examples
        NSF requires data from funded research be shared
        NIH requires a data sharing plan for large projects
        Wellcome Trust requires a data sharing plan
        Many leading journals require data sharing

    24                                                 [Micah Altman, 3/10/2011]
Law, policy, ethics

(Some) More Laws & Standards                                                                              Research design …
                                                                                                           Information security
    California Laws                                                Detailed technical controls over information
                                                                                                           Disclosure
        Lots of rules                                               systems
                                                                                                           limitation
        Applies any data about California residents            Sarbanes-Oxley (aka, SOX, aka SARBOX)
        Privacy policy                                             Corporate and Auditing Accountability and
                                                                     Responsibility Act of 2002
        Disclosure
                                                                    Applies to U.S. public company
        Reporting policy                                            boards, management and public accounting firms
    EU Directive 95/46/EC                                          Rarely applies to research in universities
        Data protection directive                                  Section 404 requires annual assessment of
        Provides for notice, limits on purpose of                   organizational internal controls – but does not
         use, consent, security, disclosure, access, account         specify details of controls
         ability                                                Classified Data
        Forbids transfer of data to entities in countries          Separate and complex rules and requirements
         compliant with directive
                                                                    The University does not accept classified data
        U.S. is not compliant but …
                                                                    But, may have “Controlled But Unclassified”
           Organizations can certify compliance with FTC
                                                                       Vaguely defined area
           No auditing/enforcement !
                                                                       Mostly government produced area
           Substantial criticism of this arrangement
                                                                       Penalties unclear
    Payment Card Industry (PCI) Security Standards
                                                                    And… export controlled information, under ITAR
        Governs treatment of credit card numbers                    and EAR
        Requires reports, audits, fines                               Export control include
        Detailed technical measures                                      technologies, software, documentation/design
        Not a law, but helps define good practice                        documents may be included
                                                                       Large penalties
        Nevada law mandates PCI standards
                                                                … and over 1100 International Human Subjects
    FISMA                                                       laws…
       Federal Information Security Management Act
        (FISMA), Public Law (P.L.) 107-347.
       Is starting to be applied to NIH sponsored
    25 research                                                                           [Micah Altman, 3/10/2011]
Law, policy, ethics

Predicted Legal Changes for 2011…                                Research design …
                                                                 Information security
                                                                 Disclosure
    Caselaw                                                     limitation

        “personal privacy” does not apply to information about
         corporations (a corporation is not a “person” for this
         purpose)
         FCC vs. ATT 2011
    Scheduled
        EU “cookie privacy” directive 2009/136/EC goes into
         effect
        Proposed updates to EU information privacy directives
    Very Likely
        New information privacy laws in selected states in 2011
    Likely
        Increased federal regulation of internet privacy
    26                                              [Micah Altman, 3/10/2011]
Law, policy, ethics

What’s wrong with this picture?                                                       Research design …
                                                                                      Information security
                                                                                      Disclosure
                                                                                      limitation
Name         SSN     Birthdate   Zipcode   Gender   Favorite    # of crimes
                                                    Ice Cream   committed
A.Jones      12341   01011961    02145     M        Raspberr    0
                                                    y
B. Jones     12342   02021961    02138     M        Pistachio   0
C. Jones     12343   11111972    94043     M        Chocolat    0
                                                    e
D. Jones     12344   12121972    94043     M        Hazelnut    0
E. Jones     12345   03251972    94041     F        Lemon       0
F. Jones     12346   03251972    02127     F        Lemon       1
G. Jones     12347   08081989    02138     F        Peach       1
H. Smith     12348   01011973    63200     F        Lime        2
I. Smith     12349   02021973    63300     M        Mango       4
J. Smith     12350   02021973    63400     M        Coconut     16
K. Smith     12351   03031974    64500     M        Frog        32
L. Smith     12352   04041974    64600     M        Vanilla     64
M. Smith     12353   04041974    64700     F        Pumpkin     128
N.           12354   04041974    64800     F        Allergic    256
  27                                                                     [Micah Altman, 3/10/2011]
       Smi
Law, policy, ethics

     What’s wrong with this picture?                                                                  Research design …
                                                                                                      Information security
Identifier   Sensitive                 Identifier                            Sensitive
                           Private                                                                    Disclosure
              Private     Identifier                                                                  limitation
             Identifier
Name         SSN      Birthdate        Zipcode      Gender   Favorite    # of crimes
                                                             Ice Cream   committed
A.Jones      12341    01011961         02145        M        Raspberr    0                                Mass resident
                                                             y
B. Jones     12342    02021961         02138        M        Pistachio   0
                                                                                                              Californian
C. Jones     12343    11111972         94043        M        Chocolat    0
                                                             e
D. Jones     12344    12121972         94043        M        Hazelnut    0                     Twins, separated at birth?

E. Jones     12345    03251972         94041        F        Lemon       0
                                                                                                             FERPA too?
F. Jones     12346    03251972         02127        F        Lemon       1
G. Jones     12347    08081989         02138        F        Peach       1
H. Smith     12348    01011973         63200        F        Lime        2
I. Smith     12349    02021973         63300        M        Mango       4
J. Smith     12350    02021973         63400        M        Coconut     16
K. Smith     12351    03031974         64500        M        Frog        32
L. Smith     12352    04041974         64600        M        Vanilla     64                      Unexpected Response?
M. Smith 12353        04041974         64700        F        Pumpkin     128
      28                                                                                 [Micah Altman, 3/10/2011]
N. Smith 12354        04041974         64800        F        Allergic    256
Harvard:                                                                                                            Law, policy, ethics

Enterprise Security Policy (HEISP)                                                                                  Research design …
                                                                                                                    Information security
    Storing High Risk Confidential Information (HRCI)                                                          Disclosure
         Must not be stored on individual user computer or                                                     limitation
         portable storage device
                                                                       Confidential information on Harvard computing devices
        Must be stored on "target computers" or secure locked
         containers                                                        confidential information must be protected
    Human subject information                                             confidential information on portable devices must be
                                                                            encrypted
        All research on human subjects must be approved by
         the IRB                                                           laptops must have encryption (some schools require
                                                                            whole-disk encryption)
        All proposals must include a data management plan
                                                                           systems must be scanned annually
    Personally identifiable medical information (PIMI)
                                                                       Cannot save confidential information on computer
        "Covered entities" at Harvard are subject to HIPAA             directly accessible from the internet, open Harvard
         requirements                                                   networks
         PIMI is to be treated as HRCI throughout the university
                                                                       Employees who have access must annually agree to
    Obtaining confidential information requires approval               confidentiality agreements
    All confidential information must be encrypted when               Access to lists and database of Harvard University ID
     transported across any network                                     numbers is restricted
    Public directories must adhere to privacy preferences             Each school must provide training
     establishes by the individuals                                    Registrars have developed common definition of
    Identifying Users with Access to Confidential                      FERPA directory information
     Information                                                       Must adhere to student requests to block their directory
        System owners must be able to identify users that have         information, per FERPA
         access to confidential information
                                                                       Accepting Payment Cards - Restricted to procedures
        Strong passwords                                               outlined in HU Credit Card Merchant Handbook
        No account/password sharing
    Inhibit password guessing with logging and lockouts
    Limit application availability time with timeouts
    Limit user access to confidential information based on
     business need
                                                                    [ More on next page…]
    31                                                                                             [Micah Altman, 3/10/2011]
Law, policy, ethics

HEISP – Part 2                                                                       Research design …
                                                                                     Information security
    Physical Environment                           Network take down               Disclosure
                                                                                     limitation
        All digital/non-digital media must be          Network managers run vulnerability
         properly protected                              scans
    Computers must be physically                       May take computers off the network
     secure                                         Service Resumption
    Automatic logging must be                          Must have a service resumption plan if
     consistent with written policies                    loss of confidential data is a
                                                         substantial business risk
    Vendor contracts
        require approval by security officer       Incident Response Policy
        Include OGC contract rider                 Disposition and destruction of
                                                     records
    Computer operator
        computer must be regularly updated         Acquisition/use by unauthorized
                                                     persons must be reported to OGC
        operated securely
        Only necessary application installed       Interacting with legal authorities --
                                                     always refer to OGC unless
        annually certify compliance with            imminent health/safety risk requires
         university policies
                                                     otherwise
    Computer setup - must filter
     malicious traffic                              Web based surveys must have
                                                     protections in place
    “Target” systems and controllers
      Private address space; locally
       firewalled
      Annual vulnerability scanning
    32                                                                  [Micah Altman, 3/10/2011]
Law, policy, ethics

    Harvard:                                                                               Research design …
                                                                                           Information security
    Research Data Security Policy (HRDSP)                                                  Disclosure
                                                                                           limitation
   Sensitivity of research data based on potential harm if disclosed:
       Level 5 = “extremely sensitive”
       Level 4 = “very sensitive” ~= HRCI
       Level 3 = “sensitive” ~= HCI
       Level 2 = “benign” ~= Good computer hygiene
       Level 1 = anonymous and not business confidential
   Required protections based on sensitivity
       Level 5:   Entirely disconnected from network (“bubble security”)
       Level 4:   Protections as per HRCI
       Level 3:   Protections as per HCI
       Level 2:   Good computer hygiene
   Designates procedures for treatment of external data use agreements
    [ next section ]
       Legally binding
       Can be both very detailed and not supported by Harvard security procedures
       Investigator should not sign these – forward to OSP
   Designates responsibilities for IRB, Investigator, OSP, IT, Security Officers.



        security.harvard.edu/research-data-security-policy
        33                                                                    [Micah Altman, 3/10/2011]
Harvard:                                                                               Law, policy, ethics
                                                                                       Research design …
Researcher Responsibilities                                                            Information security
    … for knowing the rules                                                       Disclosure
                                                                                   limitation
    … for identifying potentially confidential information in all forms
     (digital/analogue; on-line/off-line)
    … for notifying recipients of their responsibility to protect confidentiality
    … for obtaining IRB approval for any human subjects research
    … for following an IRB approved plan
    … for obtaining OSP approval of restricted data use agreements with providers, even
     if no money involved




… and for proper
        Storage
        Access
        Transmission
        Disposal

                                 Confidentiality is not an “IT problem”


    34                                                                    [Micah Altman, 3/10/2011]
Harvard:                                                          Law, policy, ethics
                                                                  Research design …
Staff – Personnel Manual                                          Information security
                                                                  Disclosure
    Protect Harvard information and systems                      limitation
    Keep your own information in Peoplesoft up to date
    Comply with copyrights and DMCA
    Comply with Harvard systems policies and procedures
    All information produced at work is Harvard property
    Attach only approved devices to the Harvard network

harvie.harvard.edu/docroot/standalone/Policies_Contracts/St
  aff_Personnel_Manual/Section2/Privacy.shtml




    35                                               [Micah Altman, 3/10/2011]
Law, policy, ethics

    Key Concepts & Issues Review                                                                   Research design …
                                                                                                   Information security
    Privacy                                                                                       Disclosure
        Control over extent and circumstances of sharing                                          limitation
    Confidentiality
        Treatment of private, sensitive information
    Sensitive information
        Information that would cause harm if disclosed and linked to an individual
    Personally/individually identifiable information
        Private information
        Directly or indirectly linked to an identifiable individual
    Human subjects
     A living person …
         who is interacted with to obtain research data
         who‟s private identifiable information is included in research data
    Research
        Systematic investigation
        Designed to develop or contribute to generalizable knowledge
    “Common Rule”
        Law governing funded human subjects research
    HIPAA
        Law governing use of personal health information in covered and associated entities
    MA 201 CMR 17
        Law governing use of certain personal identifiers for Massachusetts residents


    36                                                                                [Micah Altman, 3/10/2011]
Law, policy, ethics

        Checklist: Identify Requirements                            Research design …
                                                                    Information security

Check if research includes …                       Disclosure
                                                   limitation
 Interaction with humans  Common Rule &HEISP/HRDSP
  applies

Check if data used includes identified …
 Student records  FERPA &HEISP/HRDSP applies
 State, federal, financial id‟s  state law &HEISP/HRDSP applies
 Medical/health information  HIPAA (likely) &HEISP/HRDSP
  applies
 Human subjects & private info
   Common Rule &HEISP/HRDSP applies

Check for other requirements/restriction on data dissemination:
 Data provider restrictions and University approvals thereof
 Open data requirements and norms
 University information policy
   37                                               [Micah Altman, 3/10/2011]
Law, policy, ethics

         Resources                                                                                     Research design …

        E.A. Bankert& R.J. Andur, 2006, Institutional Review Board: Management and Function,
                                                                                                       Information security
         Jones and Bartlett Publishers
                                                                                                       Disclosure
        P. Ohm, “Broken Promises of Privacy”, SSRN Working Paper                                      limitation
         [ssrn.com/abstract=1450006]
        D. J. Mazur, 2007. Evaluating the Science and ethics of Research on Humans, Johns Hopkins University
         Press
        IRB: Ethics & Human Research [Journal], Hastings Press
         www.thehastingscenter.org/Publications/IRB/
        Journal of Empirical Research on Human Research Ethics, University of California Press
         ucpressjournals.com/journal.asp?j=jer
        201 CMR 17 text
         www.mass.gov/Eoca/docs/idtheft/201CMR17amended.pdf
        FERPA Website
         www.ed.gov/policy/gen/guid/fpco/ferpa/index.html
        HIPAA Website
         www.hhs.gov/ocr/privacy/
        Common Rule Website
         www.hhs.gov/ohrp/humansubjects/guidance/45cfr46.htm
        State laws
         www.ncsl.org/Default.aspx?TabId=13489
        Harvard Enterprise Information Security Policy/ Research Data Security Policy
         www.security.harvard.edu
        Harvard Institutional Review Board
     www.fas.harvard.edu/~research/hum_sub/
        Harvard FAS Policies and Procedures
         www.fas-it.fas.harvard.edu/services/catalog/browse/39
        IQSS Policies and Procedures
         support.hmdc.harvard.edu/kb-930/hmdc_policies


38                                                                                       [Micah Altman, 3/10/2011]
Research design, methods,                 Law, policy, ethics


                   management
                                                       Research design
                                                       …
                                                       Information security
                                                       Disclosure
        Reducing risk                                 limitation


            Sensitivity of information
            Partitioning
        Decreasing identification
        Managing confidentiality and
         dissemination
        [Summary]




39                                        [Micah Altman, 3/10/2011]
Law, policy, ethics

Trade-offs                                                    Research design
                                                              …
                                                              Information security
    Anonymity vs. research utility                  Disclosure
                                                     limitation
    Sensitivity vs. research utility
    (Anonymity * Sensitivity) vs. research costs/efforts




    40                                           [Micah Altman, 3/10/2011]
Law, policy, ethics

Types of Sensitive Information                                    Research design
                                                                  …
                                                                  Information security
    Information is sensitive, if, once disclosed there
                                                      Disclosure
                                                      limitation
     is a “significant” likelihood of harm
    IRB literature suggests possible categories of
     harm:
        loss of insurability
        loss of employability
        criminal liability
        psychological harm
        social harm to a vulnerable group
        loss of reputational harm
        emotional harm
        dignitary harm
        physical harm: risk of disease, injury, or death
    41                                               [Micah Altman, 3/10/2011]
Law, policy, ethics

    Levels of sensitivity                                                                                                  Research design
                                                                                                                           …

   No widely accepted scale                                                                                               Information security
   Publicly available data not sensitive under “common rule”                                                              Disclosure
   Common rule anchors scale at “minimal risk”:                                                                           limitation
    “if disclosed, the probability and magnitude of harm or discomfort anticipated are not greater in and of
         themselves than those ordinarily encountered in daily life or during the performance of routine
         physical or psychological examinations or tests”
   Harvard Research Data Security Policy
       Level 5- Extremely sensitive information about individually identifiable people.
        Information that if exposed poses significant risk of serious harm.
        Includes information posing serious risk of criminal liability, serious psychological harm or other
        significant injury, loss of insurability or employability, or significant social harm to an individual or
        group.

       Level 4 - Very sensitive information about individually identifiable people
        Information that if exposed poses a non-minimal risk of moderate harm.
        Includes civil liability, moderate psychological harm, or material social harm to individuals or
        groups, medical records not classified as Level 5, sensitive-but-unclassified national security
        information, and financial identifiers (as per HRCI standards).

       Level 3- Sensitive information about individually identifiable people
        Information that if disclosed poses a significant risk of minor harm.
        Includes information that would reasonably be expected to damage reputation or cause
        embarrassment; and FERPA records.

       Level 2 – Benign information about individually identifiable people
        Information that would not be considered harmful, but that as to which a subject has been
        promised confidentiality.

       Level 1 – De-identified information about people, and information not about people

        42                                                                                                    [Micah Altman, 3/10/2011]
Law, policy, ethics

IRB Review Scope                                                        Research design
                                                                        …
    IRB approval needed for all:                                       Information security
        federally-funded research;                               Disclosure
                                                                  limitation
        or any research at (almost all) institutions receiving federal
         funding that involve “human subjects”.
         ( Any organization operating under a general “federal-wide
         assurance”)
        All human subjects research at Harvard
    Human subject: individual about whom an investigator
     (whether professional or student) conducting research
     obtains
          (1) Data through intervention or interaction with a living
         individual, or
          (2) Identifiable private information about living individuals

    See
                      www.hhs.gov/ohrp/

    43                                                     [Micah Altman, 3/10/2011]
Law, policy, ethics

Research not requiring IRB approval                                                                   Research design
                                                                                                      …

    Non-research:                                                                                    Information security
     not generalizable knowledge & systematic inquiry
                                                                                                      Disclosure
    Non-funded:                                                                                      limitation
     institution receives no federal funds for research
    Not human subject:
        No living people described
        Observation only AND no private identifiable information is obtained
    Human Subjects, but “exempt” under 45 CFR 46
        use of existing, publicly-available data
        use of existing non-public data, if data is individuals cannot be directly or
         indirectly identified
        research conducted in educational settings, involving normal educational
         practices
        taste & food quality evaluation
        federal program evaluation approved by agency head
        observational, survey, test & interview of public officials and candidates
         (in their formal capacity, or not identified)
    Caution not all “exempt” is exempt…
       Some research on prisoners, children, not exemptable
       Some universities require review of “exempt” research
    Harvard requires review of all human subject research
    See:
     www.hhs.gov/ohrp/humansubjects/guidance/
     decisioncharts.htm




    44                                                                                   [Micah Altman, 3/10/2011]
Law, policy, ethics

IRB’s and Confidential Information                            Research design
                                                              …
                                                              Information security
                                                              Disclosure
    IRB‟s review consent procedures and documentation        limitation


    IRB‟s may review data management plans
        May require procedures to minimize risk of disclosure
        May require procedures to minimize harm resulting from
         disclosure
    IRB‟s make determination of sensitivity of information
     -- potential harm resulting from disclosure
    IRB‟s make determination regarding whether data is
     de-identified for “public use”
     [see NHRPAC,
     “Recommendations on Public Use Data Files”]

    45                                           [Micah Altman, 3/10/2011]
Law, policy, ethics

Harvard IRB Approval                                                         Research design
                                                                             …
                                                                             Information security
                                                                             Disclosure
    The Harvard Institutional Review Board (IRB) must approve alllimitation
     human subjects research at Harvard prior to data collection or use
    Research involves human subjects if:
        There is any interaction or intervention with living humans; or
        If identifiable private data about living humans is used
    Some examples of human subject research in soc sci:
        Surveys
        Behavioral experiments
        Educational tests and evaluations
        Analysis of identified private data collected from people
         (your e-mail inbox, logs of web-browsing activity, facebook activity, ebay
         bids … )
    The IRB will:
        Assess research protocol
        Identify whether research is exempt from further review and management
        Identify sensitivity level of data



    46                                                          [Micah Altman, 3/10/2011]
Law, policy, ethics

Harvard Responsibilities                                                     Research design
                                                                             …
                                                                             Information security
                                                                             Disclosure
    The Harvard Institutional Review Board (IRB) must approve alllimitation
     human subjects research at Harvard prior to data collection or use
    Research involves human subjects if:
        There is any interaction or intervention with living humans; or
        If identifiable private data about living humans is used
    Some examples of human subject research in soc sci:
        Surveys
        Behavioral experiments
        Educational tests and evaluations
        Analysis of identified private data collected from people
         (your e-mail inbox, logs of web-browsing activity, facebook activity, ebay
         bids … )
    The IRB will:
        Assess research protocol
        Identify whether research is exempt from further review and management
        Identify sensitivity level of data



    47                                                          [Micah Altman, 3/10/2011]
Law, policy, ethics

    HRDSP Responsibilities                                          Research design
                                                                    …
                                                                    Information security
                                                                    Disclosure
   Responsibilities                                                limitation


       Researchers are responsible for disclosing to IRB, and follow
        IRB approved plan
       IRB is responsible for ensuring adequacy of Investigators
        plans; granting (lawful) variances from security requirements
        justified by research needs
       IT is responsible for assisting with the identification of security
        level, and assisting in the implementation of security
        protections
       Security Officer/CIO may review IT facilities and approve (give
        written designation) that they meet protections for a given level



        48                                             [Micah Altman, 3/10/2011]
Valuation of private                                                                                 Law, policy, ethics

information is uncertain                                                                             Research design
                                                                                                     …
                                                                                                     Information security
    Privacy valuations often inconsistent
                                                                                Disclosure
       Framing effects: ordering, endowment effect, possibly others            limitation
      Non-normal/uniform distribution of valuations
      One study: < 10% of subjects would give up $2 of a $12 gift card to buy anonymity
        of purchases
     [Aquesti and Lowenstein 2009]
    Cost benefit of information security may not be optimal for users [Herley
     2009]
        E.g. Loss from all phishing attacks is 100x less than time spent in avoiding them
        Note, however weaknesses in this analysis:
            Only loss of time modeled – no valuation of privacy made
            Institutional costs not included – only personal costs
            Very simplified model – not calibrated through surveys etc.
    Repeated surveys of students show they tend to disclose a lot, e.g.:
            >80% of students sampled in several studies had public facebook pages with birthdays,
             home town and other private information
               This information can easily be used to link to other databases!
              Disclosure of extensive information on sexual orientation, private cell #‟s, drinking habits, etc.
                etc. not uncommon
             [See Kolek& Saunders 2008]
    Emerging markets for privacy?
      Micropayments for disclosures
      http://www.personal.com/
    49 http://www.i-allow.com/
     
                                                                                      [Micah Altman, 3/10/2011]
Law, policy, ethics

Reducing Risk in Data Collection                                            Research design
                                                                            …
                                                                            Information security
                                                                            Disclosure
    Avoid collecting sensitive information, unless it is limitation
     required by research design, method, or hypothesis
        Unnecessary sensitive information  not minimal risk
        Reducing sensitivity  higher participation, greater honesty
    Collect sensitive information in private settings
        Reduces risk of disclosure
        Increases participation
    Reduce sensitivity through indirect measures
        Less sensitive proxies
            E.g. Implicit association test [Greenwald, et al. 1998]
        Unfolding brackets
        Group response collection
        Random response technique [Warner 1965]
        Item count/unmatched count/list experiment technique


    50                                                         [Micah Altman, 3/10/2011]
Managing Sensitive Data                                            Law, policy, ethics

Collection                                                         Research design
                                                                   …
                                                                   Information security
                                                                   Disclosure
    Separate:                                                     limitation

     sensitive measures, (quasi)-identifiers, other
     measures
    If possible avoid storing identifiers with measures:
        Collect identifying information beforehand
        Assign opaque subject identifiers
    For sensitive data:
        Collect on-line directly (with appropriate protections); or
        Encrypt collection devices/media (laptops, usb keys, etc)
    For very/extremely sensitive data:
        Collect with oversight directly; then
        Store on encrypted device and;
        Transfer to secure server as soon as feasible
    51                                                [Micah Altman, 3/10/2011]
Randomized Response                                    Law, policy, ethics

Technique                                              Research design
                                                       …
                                                       Information security
                                                       Disclosure
                                                       limitation
                     Sensitiv
                       e
                     questio
                       n

          Subjec                Record
           t rolls
          a die >               Answer
              2

                                         Variations:
                                         - Ask two different
                                         questions
                                         - Item counts with
                                         sensitive and non-
                      Say
                                         sensitive items – eliminate
                     “YES”               subject-randomization
                                         - Regression analysis
                                         methods



52                                       [Micah Altman, 3/10/2011]
Law, policy, ethics

   Our Table – Less (?) Sensitive                                 Less (?)
                                                                                Research design
                                                                                …
                                                                  Sensitive     Information security
Name       SSN     Birthdate   Zipcode   Gender   Favorite    Treat?    #       Disclosure
                                                  Ice Cream             acts    limitation
                                                                        *
A.Jones    12341   01011961    02145     M        Raspberry   0         0
B. Jones   12342   02021961    02138     M        Pistachio   1         20
C. Jones   12343   11111972    94043     M        Chocolate   0         0
D. Jones   12344   12121972    94043     M        Hazelnut    1         12
E. Jones   12345   03251972    94041     F        Lemon       0         0
F. Jones   12346   03251972    02127     F        Lemon       1         7
G. Jones   12347   08081989    02138     F        Peach       0         1
H. Smith   12348   01011973    63200     F        Lime        1         17
I. Smith   12349   02021973    63300     M        Mango       0         4
J. Smith   12350   02021973    63400     M        Coconut     1         18
K. Smith   12351   03031974    64500     M        Frog        0         32
L. Smith   12352   04041974    64600     M        Vanilla     1         65
M. Smith   12353   04041974    64700     F        Pumpkin     0         128
N. Smith   12354   04041974    64800     F        Allergic    1         256
* Acts = crimes if treatment = 0; crimes + acts of generosity if treatment =1
     53                                                            [Micah Altman, 3/10/2011]
Randomized Response –                                                                      Law, policy, ethics

Pros and Cons                                                                              Research design
                                                                                           …
                                                                                           Information security
    Pros
                                                                                           Disclosure
        Can substantially reduce risks of disclosure                                      limitation
        Can increase response rate
        Can decrease mis-reporting
    Warning!
        None of the randomized models uses a formal measure of disclosure limitation
        Some would clearly violate measures (such as differential privacy) we‟ll see in
         section 4
        Do not use as a replacement for disclosure limitation
    Other Issues
        Loss of statistical efficiency
         (if compliance would otherwise be the same)
        Complicates data analysis, especially model-based analysis
        Leaving randomization up to subject can be unreliable
        May provide less confidentiality protection if:
            Randomization is incomplete
            Records of randomization assignment are kept
            Lists of responses overlap across questions
            Sensitive question response is large enough to dominate overall response
            Non-sensitive question responses are extremely predictable, or publicly observable


    54                                                                        [Micah Altman, 3/10/2011]
Law, policy, ethics

Partitioning Information                                     Research design
                                                             …
                                                             Information security
    Reduces risk in information management                  Disclosure
                                                             limitation
    Partition data information based on sensitivity
        Identifying information
        Descriptive information
        Sensitive information
        Other information
    Segregate
        Storage of information
        Access regimes
        Data collections channels
        Data transmission channels
    Plan to segregate as early as feasible in data collection
     and processing
    Link segregated information with artificial keys …
    55                                          [Micah Altman, 3/10/2011]
Partitioned table
                                                                Not Identified
Name       SSN     Birthdate   Zipcode   Gender   LINK   LINK          Favorite     Treat   #
                                                                       Ice Cream            act
                                                                                            s
A.Jones    12341   0101196     02145     M        1401   1401          Raspberry    0       0
                   1
                                                         283           Pistachio    1       20
B.         12342   0202196     02138     M        283
                                                         8979          Chocolate    0       0
Jones              1
                                                         7023          Hazelnut     1       12
C.         12343   11111972    94043     M        8979
Jones                                                    1498          Lemon        0       0
D.         12344   1212197     94043     M        7023   1036          Lemon        1       7
Jones              2
                                                         3864          Peach        0       1
E.         12345   0325197     94041     F        1498
                                                         2124          Lime         1       17
Jones              2
                                                         4339          Mango        0       4
F.         12346   0325197     02127     F        1036
     Jon           2                                     6629          Coconut      1       18
     es                                                  9091          Frog         0       32
G.         12347   0808198     02138     F        3864
                                                         9918          Vanilla      1       65
     Jon           9
     es                                                  4749          Pumpkin      0       12
                                                                                            8
H. Smith   12348   0101197     63200     F        2124
                   3                                     8197          Allergic     1       25
                                                                                            6
I. Smith   12349   0202197     63300     M        4339
      56           3                                               [Micah Altman, 3/10/2011]
Law, policy, ethics

Choosing Linking Keys                                                                                                     Research design
                                                                                                                          …

    Entirely randomized                                                                                                  Information security
        Most resistant to relink                                                                                         Disclosure
        Mapping from original id to random keys is highly sensitive                                                      limitation
        Must keep and be able to access mapping to add new identified data
        Most computer-generated random numbers are not sufficient by themselves
            Most are PSEUDO random – predictable sequences
            Use a cryptographic secure PRNG: Blum Blum Shub, AES (or other block cypher) in counter mode OR
            Use real random numbers (e.g. from physical sources – see http://maltman.hmdc.harvard.edu/numal/) OR
            Use a PRNG with a real random seed to random the order of the table; then another to generate the ID‟s for this randomly
             ordered table
    Encryption
        More troublesome to compute
        Same id‟s + same key + same “salt” produces same values  facilitates merging
        ID‟s can be recovered if key is exposed, cracked, or algorithm weak
    Cryptographic Hash
     e.g. SHA-256
        Security is well understood
        Tools available to compute
        Same id‟s produce same hashes  easier to merge new identified data
        ID‟s cannot be recovered from hash because hash loses information
        ID‟s can be confirmed if identifying information is known or guessable
    Cryptographic Hash + secret key
        Security is well understood
        Tools available to compute
        Same id‟s produce same hashes  easier to merge new identified data
        ID‟s cannot be recovered from hash because hash loses information
        ID‟s cannot be confirmed unless key is also known
    Do not choose arbitrary mathematical functions of other identifiers!
    57                                                                                                  [Micah Altman, 3/10/2011]
Anonymous Data Collection:                                        Law, policy, ethics


Pros & Cons
                                                                  Research design
                                                                  …
                                                                  Information security
    Pros                                                         Disclosure
                                                                  limitation
        Presumption that data is not identifiable
        May increase participation
        May increase honesty
    Cons
        Barrier to follow-up, longitudinal studies
        Can conflict with quality control, validation
        Data still may be indirectly identifiable if respondent
         descriptive information is collected
        Linking data to other sources of information may have
         large research benefits



    59                                               [Micah Altman, 3/10/2011]
Law, policy, ethics

Anonymous Data Collection Methods                          Research design
                                                           …
                                                           Information security
    Trusted third party intermediates            Disclosure
                                                  limitation
    Respondent initiates re-contacts
    No identifying information recorded
    Use id‟s randomized to subjects, destroy mapping




    60                                        [Micah Altman, 3/10/2011]
Law, policy, ethics

Remote Data Collection Challenges                                      Research design
                                                                       …
                                                             Information security
    Where network connection is readily available,          Disclosure
     easy to transfer as collected, or enter on remote systemlimitation
        Encrypted network file transfer (e.g. SFTP, part of ssh )
        Encrypted/tunneled network file system (e.g. Expandrive)
    Where network connection is less reliable, high bandwidth
        Whole-disk-encrypted laptop
        Plus, Encrypted cloud backup solutions: CrashPlan, BackBlaze,
         SpiderOak
    Small data, short term
        Encrypted USB keys (e.g. w/IronKey, TrueCrypt, PGP)
    Foreign Travel
        Be aware of U.S. EAR export restrictions, use commercial or
         widely-available open encryption software only. Do not use
         bespoke software.
        Be aware of country import restrictions (as of 2008): Burma,
         Belarus, China, Hungary, Iran, Israel, Morocco, Russia, Saudi
         Arabia, Tunisia, Ukraine
        Encrypt data if possible, but don‟t break foreign laws. Check with
         department of state.
    61                                                    [Micah Altman, 3/10/2011]
Online/Electronic data collection                                                              Law, policy, ethics


challenges
                                                                                               Research design
                                                                                               …
                                                                                               Information security
    IP addresses are identifiers
        IP addresses can be logged automatically by host, even if not intended by researcher  Disclosure
                                                                                               limitation
        IP addresses can trivially be observed as data is collected
        Partial ID numbers can be used for probabilistic geographical identification at sub-zipcode
         levels
    Cookies may be identifiers
        Cookies provide a way to link data more easily
        May or may not explicitly identify subject
    Jurisdiction
        Data collected from subjects from other states / countries could subject you to laws in that
         jurisdiction
        Jurisdiction may depend on residency of subject, availability of data collection instrument in
         jurisdiction, or explicit data collections efforts within jurisdiction
    Vendor
        Vendor could retain IP addresses, identifying cookies, etc., even if not intended by researcher
    Recommendation
        Use only vendors that certify compliance with your confidentiality policy
        Do not retain IP numbers if data is being collected anonymously
        Use SSL/TLS encryption unless data is non-sensitive and anonymous
 Some tools for anonymizing IP addresses and system/network logs
www.caida.org/tools/taxonomy/anonymization.xml
 Harvard policy
        Recommendations as above
     
    62   Plus: do not use or display Level 4+ for web surveys                    [Micah Altman, 3/10/2011]
Law, policy, ethics

Certificates of Confidentiality                               Research design
                                                              …
                                                              Information security
    Issued by DHHS agencies such as NIH, CDC, Disclosure
                                                      FDA
                                                     limitation
    Protects against many types of forced legal
     disclosure of confidential information
    May not protect against all state disclosure law
    Does not protect against voluntary disclosures by
     researcher/research institutions




    64                                           [Micah Altman, 3/10/2011]
Law, policy, ethics

Confidentiality & Consent                                               Research design
                                                                        …
                                                                        Information security
Best practice is to describe in consent form...            Disclosure
                                                           limitation
 Practices in place to protect confidentiality
 Plans for making the data available, to whom, and under
  what circumstances, rationale.
 Limitations on confidentiality (e.g. limits to a certificate of
  confidentiality under state law, planned voluntary
  disclosure)
 Consent form should be consistent with your:
        Data management plan
        Data sharing plans and requirements
    Not generally best practice to promise
        Unlimited confidentiality
        Destruction of all data
        Restriction of all data to original researchers

    65                                                     [Micah Altman, 3/10/2011]
Law, policy, ethics

Data Management Plan                                                                     Research design
                                                                                         …
                                                                                         Information security
   When is it required?
       Any NIH request over $500K                                                       Disclosure
                                                                                         limitation
       All NSF proposals after 12/31/2010
       NIJ
       Wellcome Trust
       Any proposal where collected data will be a resource beyond the project
   Safeguarding data during collection
       Documentation
       Backup and recovery
       Review
   Treatment of confidential information
       Overview: http://www.icpsr.org/DATAPASS/pdf/confidentiality.pdf
       Separation of identifying and sensitive information
       Obtain certificate of confidentiality, other legal safeguards
       De-identification and public use files
   Dissemination
       Archiving commitment (include letter of support)
       Archiving timeline
       Access procedures
       Documentation
       User vetting, tracking, and support

                              One size does not fit all projects.
        66                                                                  [Micah Altman, 3/10/2011]
Data Management Plan                                                                                                                                   Law, policy, ethics

Outline                                                                                                                                                Research design
                                                                                                                                                       …
                                                                                                                                                       Information security

   Data description                                               Planned documentation and supporting                 Budget                    Disclosure
                                                                    materials                                                                       limitation
        nature of data {generated, observed,                                                                                 Cost of preparing data and documentation
         experimental information; amples; publications;           Quality assurance procedures for metadata
         physical collections; software; models}                    and documentation                                         Cost of permanent archiving
        scale of data                                         Data Organization [if complex]                           Intellectual Property Rights
   Access and Sharing                                             File organization                                         Entities who hold property rights
        Plans for depositing in an existing public                Naming conventions                                        Types of IP rights in data
         database                                                                                                             Protections provided
                                                               Quality Assurance [if not described in main
        Access procedures                                      proposal]                                                     Dispute resolution process
        Embargo periods                                           Procedures for ensuring data quality in              Legal Requirements
        Access charges                                             collections, and expected measurement error
                                                                                                                              Provider requirements and plans to meet them
        Timeframe for access                                      Cleaning and editing procedures
                                                                                                                              Institutional requirements and plans to meet
        Technical access methods                                  Validation methods                                         them
        Restrictions on access                                Storage, backup, replication, and                        Archiving and Preservation
   Audience
                                                                versioning                                                    Requirements for data destruction, if applicable
                                                                   Facilities                                                Procedures for long term preservation
        Potential secondary users
                                                                   Methods                                                   Institution responsible for long-term costs of
        Potential scope or scale of use
                                                                   Procedures                                                 data preservation
        Reasons not to share or reuse
                                                                   Frequency                                                 Succession plans for data should archiving
   Existing Data [ If applicable ]                                                                                            entity go out of existence
                                                                   Replication
        description of existing data relevant to the                                                                    Ethics and privacy
         project                                                   Version management
                                                                                                                              Informed consent
        plans for integration with data collection                Recovery guarantees
                                                                                                                              Protection of privacy
        added value of collection, need to collect/create     Security
         new data                                                                                                             Other ethical issues
                                                                   Procedural controls
   Formats                                                        Technical Controls
                                                                                                                         Adherence
        Generation and dissemination formats and                                                                             When will adherence to data management plan
                                                                   Confidentiality concerns                                   be checked or demonstrated
         procedural justification
                                                                   Access control rules                                      Who is responsible for managing data in the
        Storage format and archival justification
                                                                   Restrictions on use                                        project
   Metadata and documentation
                                                               Responsibility                                                Who is responsible for checking adherence to
        Metadata to be provided                                                                                               data management plan
                                                                   Individual or project team role responsible for
        Metadata standards used                                    data management
        Treatment of field notes, and collection records
          67                                                                                                                   [Micah Altman, 3/10/2011]
IQSS                                                               Law, policy, ethics


Data Management Services
                                                                   Research design
                                                                   …
                                                                   Information security
   The Henry A. Murray Research Archive                           Disclosure
       Harvard’s endowed permanent data archive                   limitation

       Assists in developing data management plans
       Can provide cataloging assistance for public release of
        data
       Dissemination of data through IQSS Dataverse Network
   The IQSS Dataverse Network
       Standard data management plan for public, small data
       Provides easy virtual archiving and dissemination
       Data is catalogued and controlled by you
       You theme and brand your virtual archive
       Universally searchable, citable
       Automatically provides data formatting and statistical
        analysis on-line
                         http://dvn.iq.harvard.edu
        68                                            [Micah Altman, 3/10/2011]
Data Management Plans Examples                                                                                 Law, policy, ethics


(Summaries)
                                                                                                               Research design
                                                                                                               …

    Example 1                                                                                                  Information security
    The proposed research will involve a small sample (less than 20 subjects) recruited from clinical facilities in the
                                                                                                                Disclosure
     New York City area with Williams syndrome. This rare craniofacial disorder is associated with distinguishing facial
                                                                                                                limitation
     features, as well as mental retardation. Even with the removal of all identifiers, we believe that it would be difficult if
     not impossible to protect the identities of subjects given the physical characteristics of subjects, the type of clinical
     data (including imaging) that we will be collecting, and the relatively restricted area from which we are recruiting
     subjects. Therefore, we are not planning to share the data.
    Example 2
    The proposed research will include data from approximately 500 subjects being screened for three bacterial
     sexually transmitted diseases (STDs) at an inner city STD clinic. The final dataset will include self-reported
     demographic and behavioral data from interviews with the subjects and laboratory data from urine specimens
     provided. Because the STDs being studied are reportable diseases, we will be collecting identifying information.
     Even though the final dataset will be stripped of identifiers prior to release for sharing, we believe that there remains
     the possibility of deductive disclosure of subjects with unusual characteristics. Thus, we will make the data and
     associated documentation available to users only under a data-sharing agreement that provides for: (1) a
     commitment to using the data only for research purposes and not to identify any individual participant; (2) a
     commitment to securing the data using appropriate computer technology; and (3) a commitment to destroying or
     returning the data after analyses are completed.
    Example 3
    This application requests support to collect public-use data from a survey of more than 22,000 Americans over the
     age of 50 every 2 years. Data products from this study will be made available without cost to researchers and
     analysts. https://ssl.isr.umich.edu/hrs/
    User registration is required in order to access or download files. As part of the registration process, users must
     agree to the conditions of use governing access to the public release data, including restrictions against attempting
     to identify study participants, destruction of the data after analyses are completed, reporting responsibilities,
     restrictions on redistribution of the data to third parties, and proper acknowledgement of the data resource.
     Registered users will receive user support, as well as information related to errors in the data, future releases,
     workshops, and publication lists. The information provided to users will not be used for commercial purposes, and
     will not be redistributed to third parties.

    FROM NIH, [grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm#ex]


    69                                                                                        [Micah Altman, 3/10/2011]
External Data Usage
Agreements                                   Controls on Confidential Information

                                             Harvard University has developed extensive technical and administrative
    Between provider and                    procedures to be used with all identified personal information and other
                                             confidential information. The University classifies this form of information internally
     individual                              as "Harvard Confidential Information" -- or HCI.


        Careful – you‟re liable             Any use of HCI at Harvard includes the following safeguards:

                                             - Systems security. Any system used to store HCI is subject to a checklist of
        Harvard will not help if you sign   technical and procedural security measures including: operating system and
                                             applications must be patched to current security levels, a host based firewall is
        University review and OSP           enabled, anti virus software is enabled and the definitions file is current
                                             - Server security. Any server used to distribute HCI to other systems (e.g. through
         signature strongly                  providing remote file system), or otherwise offering login access, must employ

         recommended                         additional security measures including: connection through a private network only;
                                             limitation on length of idle sessions; limitations on incorrect password attempts;
                                             and additional logging and monitoring.
    Between provider and                    - Access restriction: an individual is allowed to access HCI only if there is a
     University                              specific need for access. All access to HCI is over physically controlled and/or
                                             encrypted channels.
        University Liable                   - Disposal processes: including secure file erasure and document destruction.
                                             - Encryption: HCI must be strongly encrypted whenever it is transmitted across a

        Requires University Approved        public network, stored on a laptop, or stored on a portable device such as a flash
                                             drive or on portable media.
         Signer : OSP                        This is only a brief summary. The full University security policy can be found here:
                                                 http://security.harvard.edu/heisp
    Avoid nonstandard protections
                                             And a more detailed checklist used to verify systems compliance is found here:
     whenever possible                         http://security.harvard.edu/files/resources/forms/

        DUA‟s can impose very specific      These safeguards are applied consistently throughout the university, we believe
         and detailed requirements           that these requirements offer stringent protection for the requested data. And
                                             these requirements will be applied in addition to any others required by specific
        Compatible in spirit does not       data use agreement.

         apply compatibility in legal
         practice
     
    70   Use University                                                        [Micah Altman, 3/10/2011]
         policies/procedures as a
Law, policy, ethics

IQSS Data Management Services                                    Research design
                                                                 …
                                                                 Information security
    The Henry A. Murray Research Archive                        Disclosure
        Harvard’s endowed permanent data archive                limitation

        Assists in developing data management plans
        Can provide cataloging assistance for public release of
         data
        Dissemination of data through IQSS Dataverse Network
        Provides letters of commitment to permanent archiving
                         www.murray.harvard.edu

    The IQSS Dataverse Network
        Provides easy virtual archiving and dissemination
        Data is catalogued and controlled by you
        You theme and brand your virtual archive
        Universally searchable, citable
        Automatically provides data formatting and statistical
         analysis on-line
                            dvn.iq.harvard.edu
    71                                              [Micah Altman, 3/10/2011]
Law, policy, ethics

    Key Concepts & Issues Review                   Research design
                                                   …
                                                   Information security
    Levels of sensitivity                         Disclosure
                                                   limitation
    Anonymity criteria
    Sensitivity reduction
    Certificate of Confidentiality
    Data sharing plan
    Data management plan
    Information Partitioning
    Linking Keys




    72                                [Micah Altman, 3/10/2011]
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research
Managing Confidential Information in Research

Contenu connexe

Tendances

HL7 & Health Information Exchange in Thailand
HL7 & Health Information Exchange in ThailandHL7 & Health Information Exchange in Thailand
HL7 & Health Information Exchange in ThailandNawanan Theera-Ampornpunt
 
Patients Medical Records - Paper Based vs Electronic Medical Records (EMR)
Patients Medical Records - Paper Based vs Electronic Medical Records (EMR)Patients Medical Records - Paper Based vs Electronic Medical Records (EMR)
Patients Medical Records - Paper Based vs Electronic Medical Records (EMR)SoftClinic Software
 
Electronic Records Management An Overview
Electronic Records Management An OverviewElectronic Records Management An Overview
Electronic Records Management An OverviewKen Matthews
 
Healthcare Data Management: Three Principles of Using Data to Its Full Potential
Healthcare Data Management: Three Principles of Using Data to Its Full PotentialHealthcare Data Management: Three Principles of Using Data to Its Full Potential
Healthcare Data Management: Three Principles of Using Data to Its Full PotentialHealth Catalyst
 
Principles of records management Mushi
Principles of records management MushiPrinciples of records management Mushi
Principles of records management Mushisylvanus mushi
 
Staffing and personal management
Staffing and personal managementStaffing and personal management
Staffing and personal managementDhani Ahmad
 
Confidentiality and Special Education Training by Madison County Schools
Confidentiality and Special Education Training by Madison County SchoolsConfidentiality and Special Education Training by Madison County Schools
Confidentiality and Special Education Training by Madison County SchoolsAtlantic Training, LLC.
 
The Importance of Records Management
The Importance of Records ManagementThe Importance of Records Management
The Importance of Records ManagementDanique Arthurs
 
Upholding confidentiality
Upholding confidentialityUpholding confidentiality
Upholding confidentialityTheresa Tapley
 
Confidentiality and Data Protection in Health Care
Confidentiality and Data Protection in Health CareConfidentiality and Data Protection in Health Care
Confidentiality and Data Protection in Health CareVaileth Mdete
 
Records Retention Scheduling
Records Retention SchedulingRecords Retention Scheduling
Records Retention SchedulingFe Angela Verzosa
 
ETHICAL CONSIDERATIONS IN RESEARCH.pptx
ETHICAL CONSIDERATIONS IN RESEARCH.pptxETHICAL CONSIDERATIONS IN RESEARCH.pptx
ETHICAL CONSIDERATIONS IN RESEARCH.pptxEjhayeBlas
 
Action research ch 23
Action research ch 23Action research ch 23
Action research ch 23kholodOlemat
 
Qualitative & Mixed Methods Research
Qualitative & Mixed Methods ResearchQualitative & Mixed Methods Research
Qualitative & Mixed Methods ResearchSohail Bajammal
 
DOCTORAL STUDY ORAL DEFENSE - MEDICAL IDENTITY THEFT AND PALM VEIN AUTHENTICA...
DOCTORAL STUDY ORAL DEFENSE - MEDICAL IDENTITY THEFT AND PALM VEIN AUTHENTICA...DOCTORAL STUDY ORAL DEFENSE - MEDICAL IDENTITY THEFT AND PALM VEIN AUTHENTICA...
DOCTORAL STUDY ORAL DEFENSE - MEDICAL IDENTITY THEFT AND PALM VEIN AUTHENTICA...CRUZ CERDA
 

Tendances (20)

HL7 & Health Information Exchange in Thailand
HL7 & Health Information Exchange in ThailandHL7 & Health Information Exchange in Thailand
HL7 & Health Information Exchange in Thailand
 
Patients Medical Records - Paper Based vs Electronic Medical Records (EMR)
Patients Medical Records - Paper Based vs Electronic Medical Records (EMR)Patients Medical Records - Paper Based vs Electronic Medical Records (EMR)
Patients Medical Records - Paper Based vs Electronic Medical Records (EMR)
 
HIPAA & PHI Training
HIPAA & PHI TrainingHIPAA & PHI Training
HIPAA & PHI Training
 
Lecture 4 confidentiality, disclosure and the law.1
Lecture 4  confidentiality, disclosure and the law.1Lecture 4  confidentiality, disclosure and the law.1
Lecture 4 confidentiality, disclosure and the law.1
 
Overview on data privacy
Overview on data privacy Overview on data privacy
Overview on data privacy
 
Electronic Records Management An Overview
Electronic Records Management An OverviewElectronic Records Management An Overview
Electronic Records Management An Overview
 
Computer based record
Computer based recordComputer based record
Computer based record
 
Healthcare Data Management: Three Principles of Using Data to Its Full Potential
Healthcare Data Management: Three Principles of Using Data to Its Full PotentialHealthcare Data Management: Three Principles of Using Data to Its Full Potential
Healthcare Data Management: Three Principles of Using Data to Its Full Potential
 
Case study method
Case study methodCase study method
Case study method
 
Principles of records management Mushi
Principles of records management MushiPrinciples of records management Mushi
Principles of records management Mushi
 
Staffing and personal management
Staffing and personal managementStaffing and personal management
Staffing and personal management
 
Confidentiality and Special Education Training by Madison County Schools
Confidentiality and Special Education Training by Madison County SchoolsConfidentiality and Special Education Training by Madison County Schools
Confidentiality and Special Education Training by Madison County Schools
 
The Importance of Records Management
The Importance of Records ManagementThe Importance of Records Management
The Importance of Records Management
 
Upholding confidentiality
Upholding confidentialityUpholding confidentiality
Upholding confidentiality
 
Confidentiality and Data Protection in Health Care
Confidentiality and Data Protection in Health CareConfidentiality and Data Protection in Health Care
Confidentiality and Data Protection in Health Care
 
Records Retention Scheduling
Records Retention SchedulingRecords Retention Scheduling
Records Retention Scheduling
 
ETHICAL CONSIDERATIONS IN RESEARCH.pptx
ETHICAL CONSIDERATIONS IN RESEARCH.pptxETHICAL CONSIDERATIONS IN RESEARCH.pptx
ETHICAL CONSIDERATIONS IN RESEARCH.pptx
 
Action research ch 23
Action research ch 23Action research ch 23
Action research ch 23
 
Qualitative & Mixed Methods Research
Qualitative & Mixed Methods ResearchQualitative & Mixed Methods Research
Qualitative & Mixed Methods Research
 
DOCTORAL STUDY ORAL DEFENSE - MEDICAL IDENTITY THEFT AND PALM VEIN AUTHENTICA...
DOCTORAL STUDY ORAL DEFENSE - MEDICAL IDENTITY THEFT AND PALM VEIN AUTHENTICA...DOCTORAL STUDY ORAL DEFENSE - MEDICAL IDENTITY THEFT AND PALM VEIN AUTHENTICA...
DOCTORAL STUDY ORAL DEFENSE - MEDICAL IDENTITY THEFT AND PALM VEIN AUTHENTICA...
 

En vedette

Protecting the Agri-Business: Managing Contracts, Trademarks and Non-Disclos...
Protecting the Agri-Business:  Managing Contracts, Trademarks and Non-Disclos...Protecting the Agri-Business:  Managing Contracts, Trademarks and Non-Disclos...
Protecting the Agri-Business: Managing Contracts, Trademarks and Non-Disclos...Cari Rincker
 
Spelling mistakes spelling stress – start to love spelling
Spelling mistakes spelling stress – start to love spellingSpelling mistakes spelling stress – start to love spelling
Spelling mistakes spelling stress – start to love spellingJoanne Rudling
 
19 Things to consider when choosing Question types for User Research
19 Things to consider when choosing Question types for User Research19 Things to consider when choosing Question types for User Research
19 Things to consider when choosing Question types for User ResearchBayUX
 
Información trastorno déficit atención hiperactividad tdah.
Información trastorno déficit atención hiperactividad tdah.Información trastorno déficit atención hiperactividad tdah.
Información trastorno déficit atención hiperactividad tdah.Fundación CADAH TDAH
 
Songs,rhymes,chants,poetry
Songs,rhymes,chants,poetrySongs,rhymes,chants,poetry
Songs,rhymes,chants,poetryfvendrel
 
Teaching spelling
Teaching spellingTeaching spelling
Teaching spellingitsdanimoe
 
Sample Research Writing (WR121) Workshop
Sample Research Writing (WR121) WorkshopSample Research Writing (WR121) Workshop
Sample Research Writing (WR121) Workshopemersonreference
 
Quantitative And Qualitative Research
Quantitative And Qualitative ResearchQuantitative And Qualitative Research
Quantitative And Qualitative Researchdoha07
 

En vedette (12)

Protecting the Agri-Business: Managing Contracts, Trademarks and Non-Disclos...
Protecting the Agri-Business:  Managing Contracts, Trademarks and Non-Disclos...Protecting the Agri-Business:  Managing Contracts, Trademarks and Non-Disclos...
Protecting the Agri-Business: Managing Contracts, Trademarks and Non-Disclos...
 
Review mistakes
Review mistakesReview mistakes
Review mistakes
 
Spelling mistakes spelling stress – start to love spelling
Spelling mistakes spelling stress – start to love spellingSpelling mistakes spelling stress – start to love spelling
Spelling mistakes spelling stress – start to love spelling
 
Spelling mistake
Spelling mistake Spelling mistake
Spelling mistake
 
Alphabet and spelling
Alphabet and spellingAlphabet and spelling
Alphabet and spelling
 
The Research Problem
The Research ProblemThe Research Problem
The Research Problem
 
19 Things to consider when choosing Question types for User Research
19 Things to consider when choosing Question types for User Research19 Things to consider when choosing Question types for User Research
19 Things to consider when choosing Question types for User Research
 
Información trastorno déficit atención hiperactividad tdah.
Información trastorno déficit atención hiperactividad tdah.Información trastorno déficit atención hiperactividad tdah.
Información trastorno déficit atención hiperactividad tdah.
 
Songs,rhymes,chants,poetry
Songs,rhymes,chants,poetrySongs,rhymes,chants,poetry
Songs,rhymes,chants,poetry
 
Teaching spelling
Teaching spellingTeaching spelling
Teaching spelling
 
Sample Research Writing (WR121) Workshop
Sample Research Writing (WR121) WorkshopSample Research Writing (WR121) Workshop
Sample Research Writing (WR121) Workshop
 
Quantitative And Qualitative Research
Quantitative And Qualitative ResearchQuantitative And Qualitative Research
Quantitative And Qualitative Research
 

Similaire à Managing Confidential Information in Research

Overcoming obstacles to sharing data about human subjects
Overcoming obstacles to sharing data about human subjectsOvercoming obstacles to sharing data about human subjects
Overcoming obstacles to sharing data about human subjectsRobin Rice
 
Ethics in Library Research Data Services: Conceptual Gaps & Policy Vacuums
Ethics in Library Research Data Services: Conceptual Gaps & Policy VacuumsEthics in Library Research Data Services: Conceptual Gaps & Policy Vacuums
Ethics in Library Research Data Services: Conceptual Gaps & Policy VacuumsMichael Zimmer
 
Confidential data management_key_concepts
Confidential data management_key_conceptsConfidential data management_key_concepts
Confidential data management_key_conceptsMicah Altman
 
June2014 brownbag privacy
June2014 brownbag privacyJune2014 brownbag privacy
June2014 brownbag privacyMicah Altman
 
Christopher Millard Legally Compliant Use Of Personal Data In E Social Science
Christopher Millard   Legally Compliant Use Of Personal Data In E Social ScienceChristopher Millard   Legally Compliant Use Of Personal Data In E Social Science
Christopher Millard Legally Compliant Use Of Personal Data In E Social ScienceChristopher Millard
 
Brisbane Health-y Data: What are health and sensitive data and why are they t...
Brisbane Health-y Data: What are health and sensitive data and why are they t...Brisbane Health-y Data: What are health and sensitive data and why are they t...
Brisbane Health-y Data: What are health and sensitive data and why are they t...ARDC
 
Privacy Audits in Law Libraries
Privacy Audits in Law LibrariesPrivacy Audits in Law Libraries
Privacy Audits in Law LibrariesRachel Gordon
 
big-data-and-data-sharing_ethical-issues.pdf
big-data-and-data-sharing_ethical-issues.pdfbig-data-and-data-sharing_ethical-issues.pdf
big-data-and-data-sharing_ethical-issues.pdfAsefaAdimasu2
 
What are sensitive data and why might they be trickier to publish?
What are sensitive data and why might they be trickier to publish?What are sensitive data and why might they be trickier to publish?
What are sensitive data and why might they be trickier to publish?ARDC
 
Data Quality: Missing Data (PPT slides)
Data Quality: Missing Data (PPT slides)Data Quality: Missing Data (PPT slides)
Data Quality: Missing Data (PPT slides)Saide OER Africa
 
Publishing and sharing sensitive data 28 June
Publishing and sharing sensitive data 28 JunePublishing and sharing sensitive data 28 June
Publishing and sharing sensitive data 28 JuneARDC
 
Ethics and consent for data sharing
Ethics and consent for data sharingEthics and consent for data sharing
Ethics and consent for data sharingARDC
 
July IAP: Confidential Information - Storage, Sharing, & Publication - with M...
July IAP: Confidential Information - Storage, Sharing, & Publication - with M...July IAP: Confidential Information - Storage, Sharing, & Publication - with M...
July IAP: Confidential Information - Storage, Sharing, & Publication - with M...Micah Altman
 
ETHICS OF DATA MANAGEMENT.pptx
ETHICS OF DATA MANAGEMENT.pptxETHICS OF DATA MANAGEMENT.pptx
ETHICS OF DATA MANAGEMENT.pptxSwetaShivdasani1
 
Privacy in Research Data Managemnt - Use Cases
Privacy in Research Data Managemnt - Use CasesPrivacy in Research Data Managemnt - Use Cases
Privacy in Research Data Managemnt - Use CasesMicah Altman
 
DATA PRIVACY IN AN AGE OF INCREASINGLY SPECIFIC AND PUBLICLY AVAILABLE DATA: ...
DATA PRIVACY IN AN AGE OF INCREASINGLY SPECIFIC AND PUBLICLY AVAILABLE DATA: ...DATA PRIVACY IN AN AGE OF INCREASINGLY SPECIFIC AND PUBLICLY AVAILABLE DATA: ...
DATA PRIVACY IN AN AGE OF INCREASINGLY SPECIFIC AND PUBLICLY AVAILABLE DATA: ...Ted Myerson
 
Managing Confidential Information – Trends and Approaches
Managing Confidential Information – Trends and ApproachesManaging Confidential Information – Trends and Approaches
Managing Confidential Information – Trends and ApproachesMicah Altman
 

Similaire à Managing Confidential Information in Research (20)

Overcoming obstacles to sharing data about human subjects
Overcoming obstacles to sharing data about human subjectsOvercoming obstacles to sharing data about human subjects
Overcoming obstacles to sharing data about human subjects
 
Niso library law
Niso library lawNiso library law
Niso library law
 
Ethics in Library Research Data Services: Conceptual Gaps & Policy Vacuums
Ethics in Library Research Data Services: Conceptual Gaps & Policy VacuumsEthics in Library Research Data Services: Conceptual Gaps & Policy Vacuums
Ethics in Library Research Data Services: Conceptual Gaps & Policy Vacuums
 
Confidential data management_key_concepts
Confidential data management_key_conceptsConfidential data management_key_concepts
Confidential data management_key_concepts
 
June2014 brownbag privacy
June2014 brownbag privacyJune2014 brownbag privacy
June2014 brownbag privacy
 
Christopher Millard Legally Compliant Use Of Personal Data In E Social Science
Christopher Millard   Legally Compliant Use Of Personal Data In E Social ScienceChristopher Millard   Legally Compliant Use Of Personal Data In E Social Science
Christopher Millard Legally Compliant Use Of Personal Data In E Social Science
 
Levine - Data Curation; Ethics and Legal Considerations
Levine - Data Curation; Ethics and Legal ConsiderationsLevine - Data Curation; Ethics and Legal Considerations
Levine - Data Curation; Ethics and Legal Considerations
 
Brisbane Health-y Data: What are health and sensitive data and why are they t...
Brisbane Health-y Data: What are health and sensitive data and why are they t...Brisbane Health-y Data: What are health and sensitive data and why are they t...
Brisbane Health-y Data: What are health and sensitive data and why are they t...
 
Privacy Audits in Law Libraries
Privacy Audits in Law LibrariesPrivacy Audits in Law Libraries
Privacy Audits in Law Libraries
 
Preparing Research Data for Sharing
Preparing Research Data for SharingPreparing Research Data for Sharing
Preparing Research Data for Sharing
 
big-data-and-data-sharing_ethical-issues.pdf
big-data-and-data-sharing_ethical-issues.pdfbig-data-and-data-sharing_ethical-issues.pdf
big-data-and-data-sharing_ethical-issues.pdf
 
What are sensitive data and why might they be trickier to publish?
What are sensitive data and why might they be trickier to publish?What are sensitive data and why might they be trickier to publish?
What are sensitive data and why might they be trickier to publish?
 
Data Quality: Missing Data (PPT slides)
Data Quality: Missing Data (PPT slides)Data Quality: Missing Data (PPT slides)
Data Quality: Missing Data (PPT slides)
 
Publishing and sharing sensitive data 28 June
Publishing and sharing sensitive data 28 JunePublishing and sharing sensitive data 28 June
Publishing and sharing sensitive data 28 June
 
Ethics and consent for data sharing
Ethics and consent for data sharingEthics and consent for data sharing
Ethics and consent for data sharing
 
July IAP: Confidential Information - Storage, Sharing, & Publication - with M...
July IAP: Confidential Information - Storage, Sharing, & Publication - with M...July IAP: Confidential Information - Storage, Sharing, & Publication - with M...
July IAP: Confidential Information - Storage, Sharing, & Publication - with M...
 
ETHICS OF DATA MANAGEMENT.pptx
ETHICS OF DATA MANAGEMENT.pptxETHICS OF DATA MANAGEMENT.pptx
ETHICS OF DATA MANAGEMENT.pptx
 
Privacy in Research Data Managemnt - Use Cases
Privacy in Research Data Managemnt - Use CasesPrivacy in Research Data Managemnt - Use Cases
Privacy in Research Data Managemnt - Use Cases
 
DATA PRIVACY IN AN AGE OF INCREASINGLY SPECIFIC AND PUBLICLY AVAILABLE DATA: ...
DATA PRIVACY IN AN AGE OF INCREASINGLY SPECIFIC AND PUBLICLY AVAILABLE DATA: ...DATA PRIVACY IN AN AGE OF INCREASINGLY SPECIFIC AND PUBLICLY AVAILABLE DATA: ...
DATA PRIVACY IN AN AGE OF INCREASINGLY SPECIFIC AND PUBLICLY AVAILABLE DATA: ...
 
Managing Confidential Information – Trends and Approaches
Managing Confidential Information – Trends and ApproachesManaging Confidential Information – Trends and Approaches
Managing Confidential Information – Trends and Approaches
 

Plus de Micah Altman

Selecting efficient and reliable preservation strategies
Selecting efficient and reliable preservation strategiesSelecting efficient and reliable preservation strategies
Selecting efficient and reliable preservation strategiesMicah Altman
 
Well-Being - A Sunset Conversation
Well-Being - A Sunset ConversationWell-Being - A Sunset Conversation
Well-Being - A Sunset ConversationMicah Altman
 
Matching Uses and Protections for Government Data Releases: Presentation at t...
Matching Uses and Protections for Government Data Releases: Presentation at t...Matching Uses and Protections for Government Data Releases: Presentation at t...
Matching Uses and Protections for Government Data Releases: Presentation at t...Micah Altman
 
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019Micah Altman
 
Well-being A Sunset Conversation
Well-being A Sunset ConversationWell-being A Sunset Conversation
Well-being A Sunset ConversationMicah Altman
 
Can We Fix Peer Review
Can We Fix Peer ReviewCan We Fix Peer Review
Can We Fix Peer ReviewMicah Altman
 
Academy Owned Peer Review
Academy Owned Peer ReviewAcademy Owned Peer Review
Academy Owned Peer ReviewMicah Altman
 
Redistricting in the US -- An Overview
Redistricting in the US -- An OverviewRedistricting in the US -- An Overview
Redistricting in the US -- An OverviewMicah Altman
 
A Future for Electoral Districting
A Future for Electoral DistrictingA Future for Electoral Districting
A Future for Electoral DistrictingMicah Altman
 
A History of the Internet :Scott Bradner’s Program on Information Science Talk
A History of the Internet :Scott Bradner’s Program on Information Science Talk  A History of the Internet :Scott Bradner’s Program on Information Science Talk
A History of the Internet :Scott Bradner’s Program on Information Science Talk Micah Altman
 
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...Micah Altman
 
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...Micah Altman
 
Utilizing VR and AR in the Library Space:
Utilizing VR and AR in the Library Space:Utilizing VR and AR in the Library Space:
Utilizing VR and AR in the Library Space:Micah Altman
 
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-NotsCreative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-NotsMicah Altman
 
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...Micah Altman
 
Ndsa 2016 opening plenary
Ndsa 2016 opening plenaryNdsa 2016 opening plenary
Ndsa 2016 opening plenaryMicah Altman
 
Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...Micah Altman
 
Software Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental ScanSoftware Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental ScanMicah Altman
 
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...Micah Altman
 
Gary Price, MIT Program on Information Science
Gary Price, MIT Program on Information ScienceGary Price, MIT Program on Information Science
Gary Price, MIT Program on Information ScienceMicah Altman
 

Plus de Micah Altman (20)

Selecting efficient and reliable preservation strategies
Selecting efficient and reliable preservation strategiesSelecting efficient and reliable preservation strategies
Selecting efficient and reliable preservation strategies
 
Well-Being - A Sunset Conversation
Well-Being - A Sunset ConversationWell-Being - A Sunset Conversation
Well-Being - A Sunset Conversation
 
Matching Uses and Protections for Government Data Releases: Presentation at t...
Matching Uses and Protections for Government Data Releases: Presentation at t...Matching Uses and Protections for Government Data Releases: Presentation at t...
Matching Uses and Protections for Government Data Releases: Presentation at t...
 
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019
 
Well-being A Sunset Conversation
Well-being A Sunset ConversationWell-being A Sunset Conversation
Well-being A Sunset Conversation
 
Can We Fix Peer Review
Can We Fix Peer ReviewCan We Fix Peer Review
Can We Fix Peer Review
 
Academy Owned Peer Review
Academy Owned Peer ReviewAcademy Owned Peer Review
Academy Owned Peer Review
 
Redistricting in the US -- An Overview
Redistricting in the US -- An OverviewRedistricting in the US -- An Overview
Redistricting in the US -- An Overview
 
A Future for Electoral Districting
A Future for Electoral DistrictingA Future for Electoral Districting
A Future for Electoral Districting
 
A History of the Internet :Scott Bradner’s Program on Information Science Talk
A History of the Internet :Scott Bradner’s Program on Information Science Talk  A History of the Internet :Scott Bradner’s Program on Information Science Talk
A History of the Internet :Scott Bradner’s Program on Information Science Talk
 
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
 
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...
 
Utilizing VR and AR in the Library Space:
Utilizing VR and AR in the Library Space:Utilizing VR and AR in the Library Space:
Utilizing VR and AR in the Library Space:
 
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-NotsCreative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots
 
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...
 
Ndsa 2016 opening plenary
Ndsa 2016 opening plenaryNdsa 2016 opening plenary
Ndsa 2016 opening plenary
 
Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...
 
Software Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental ScanSoftware Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental Scan
 
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...
 
Gary Price, MIT Program on Information Science
Gary Price, MIT Program on Information ScienceGary Price, MIT Program on Information Science
Gary Price, MIT Program on Information Science
 

Dernier

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 

Dernier (20)

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

Managing Confidential Information in Research

  • 1. Managing Confidential Information in Research Micah Altman Senior Research Scientist Institute for Quantitative Social Science Harvard University
  • 2.
  • 3.
  • 4. Personally identifiable private information is surprisingly common  Includes information from a variety of sources, such as…  Research data, even if you aren‟t the original collector  Student “records” such as e-mail, grades  Logs from web-servers, other systems  Lots of things are potentially identifying:  Under some federal laws: IP addresses, dates, zipcodes, …  Birth date + zipcode + gender uniquely identify ~87% of people in the U.S. [Sweeney 2002]  With date and place of birth, can guess first five digits of social security number (SSN) > 60% of the time. (Can guess the whole thing in under 10 tries, for a significant minority of people.) [Aquisti& Gross 2009]  Analysis of writing style or eclectic tastes has been used to identify individuals Brownstein, et al., 2006 , NEJM 355(16),  Tables, graphs and maps can also reveal identifiable information 4 [Micah Altman, 3/10/2011]
  • 5.
  • 6. IQSS (and affiliates) offer you support across all stages of your quantitative research:  Research design, including: design of surveys, selection of statistical methods.  Primary and secondary data collection, including: the collection of geospatial and survey data.  Data management, including: storage, cataloging, permanent archiving, and distribution.  Data analysis, including : statistical consulting, GIS consulting, high performance research computing. http://iq.harvard.edu/ 6 [Micah Altman, 3/10/2011]
  • 7. The IQSS grants administration team helps with every aspect of the grant process. Contact us when you are planning your proposal.  Assisting in identifying research funding opportunities  Consulting on writing proposals  Assisting IQSS affiliates with:  preparation, review and submission of all grant applications (“pre-award support”)  management of their sponsored research portfolio (“post-award support”)  Interpret sponsor policies  Coordinate with FAS Research Administration and the Central Office for Sponsored Programs … And, of course, support seminars like this! 7 [Micah Altman, 3/10/2011]
  • 8. Goals for course  Overview of key areas  Identify key concepts & issues  Summarize Harvard policies, procedures, resources  Establish framework for action  Provide connection to resources, literature 8 [Micah Altman, 3/10/2011]
  • 9. Outline  [Preliminaries]  Law, policy, ethics  Research methods, design, manage ment  Information Security (Storage, Transmission, U se)  Disclosure Limitation  [Additional Resources & Summary of 9 Recommendations] [Micah Altman, 3/10/2011]
  • 10. Steps to Manage Confidential Research Data  Identify potentially sensitive information in planning  Identify legal requirements, institutional requirements, data use agreements  Consider obtaining a certificate of confidentiality  Plan for IRB review  Reduce sensitivity of collected data in design  Separate sensitive information in collection  Encrypt sensitive information in transit  Desensitize information in processing  Removing names and other direct identifiers  Suppressing, aggregating, or perturbing indirect identifiers  Protect sensitive information in systems  Use systems that are controlled, securely configured, and audited  Ensure people are authenticated, authorized, licensed  Review sensitive information before dissemination  Review disclosure risk  Apply non-statistical disclosure limitation  Apply statistical disclosure limitation  Review past releases and publically available data  Check for changes in the law  Require a use agreement 10 [Micah Altman, 3/10/2011]
  • 11.
  • 12. Law, policy, ethics Law, Policy & Ethics Research design … Information security Disclosure limitation  Ethical Obligations  Laws  Fun and games   Harvard Policies  [Summary] 12 [Micah Altman, 3/10/2011]
  • 13. Law, policy, ethics Confidentiality & Research Ethics Research design … Information security Disclosure  Belmont Principles limitation  Respect for Persons  individuals should be treated as autonomous agents  persons with diminished autonomy are entitled to protection  implies “informed consent”  implies respect for confidentiality and privacy  Beneficence  research must have individual and/or societal benefit to justify risks  implies minimizing risk/benefit ratio 13 [Micah Altman, 3/10/2011]
  • 14. Scientific & Societal Benefits Law, policy, ethics of Data Sharing Research design … Information security  Increases replicability of research Disclosure limitation  Journal publication policies may apply  Increases scientific impact of research  Follow up studies  Extensions  Citations  Public interest in data produced by public funder  Funder policies may apply  Public interest in data that supports public policy  FOIA and state FOI laws may apply  Open data facilitates…  Transparent government  Scientific collaboration  Scientific verification  New forms of science  Participation in science  Hands-on education  Continuity of research Sources: Fienberg et. al 1985; ICSU 2004; Nature 2009 14 [Micah Altman, 3/10/2011]
  • 15. Sources of Confidentiality Restrictions for Law, policy, ethics University Research Data Research design … Information security Disclosure  Overlapping laws limitation  Different laws apply to different cases  All affiliates subject to university policy (Not included: EU directive, foreign laws, classified data, …) 15 [Micah Altman, 3/10/2011]
  • 16. Law, policy, ethics 45 CFR 46 [Overview] Research design … Information security “The Common Rule” Disclosure limitation  Governs human subject research  With federal funds/ at federal institution  Establishes rules for conduct of research  Establishes confidentiality and consent requirement for for identified private data  However, some information may be required to be disclosed under state and federal laws (e.g. in cases of child abuse)  Delegates procedural decisions to Institutional Review Boards (IRB‟s) 16 [Micah Altman, 3/10/2011]
  • 17. Law, policy, ethics HIPAA [Overview] Research design … Information security Health Insurance Portability and Accountability Act Disclosure limitation  Protects personal health care information for „covered entities‟  Detailed technical protection requirements  Provides clearest legal standards for dissemination  Provides a „safe harbor‟  Has become an accepted practice for dissemination in other areas where laws are less clear  HITECH Act of 2009 extends HIPAA  Extends coverage to associated entities of covered entities  Additional technical safeguards  Adds breach reporting requirement HIPAA provides three dissemination options … 17 [Micah Altman, 3/10/2011]
  • 18. Dissemination under HIPAA Law, policy, ethics Research design … [option 1] Information security Disclosure limitation  “safe harbor” -- remove 18 identifiers  [Personal identifiers]  Names  Social Security #‟s; Personal Account #‟s; Certificate/License #‟s; full face photos (and comparable images); biometric id‟s; medical  Any other unique identifying number, characteristic, or code  [Asset identifiers]  fax #‟s; phone #‟s; vehicle #‟s;  personal URL‟s; IP addresses; e-mail addresses  Device ID‟s and serial numbers  [Quasi identifiers]  dates smaller than a year (and ages > 89 collapsed into one category)  geographic subdivisions smaller than a state (except for 3 digits of zipcode, if unit > 20,000 people) And  Entity does not have actual knowledge [direct and clear awareness] that it would be possible to use the remaining information alone or in combination with other information to identify the subject 18 [Micah Altman, 3/10/2011]
  • 19. Dissemination under HIPAA Law, policy, ethics Research design … [Option 2] Information security  “limited dataset” – leave some quasi-id‟s Disclosure limitation  Remove personal and asset identifiers  Permitted dates: dates of birth, death, service, years  Permitted geographic subdivisions: town, city, state, zip code And  Require access control and data use agreement. 19 [Micah Altman, 3/10/2011]
  • 20. Dissemination under HIPAA Law, policy, ethics Research design … [Option 3] Information security  “qualified statistician” – statistical determination Disclosure limitation Have qualified statistician determine, using generally accepted statistical and scientific principles and methods, that the risk is very small that the information could be used, alone or in combination with other reasonably available information, by the anticipated recipient to identify the subject of the information.  Important caveats  Methods and results of the analysis must be documented  No bright line for “qualified”, text of rule is: “a person with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods for rendering information not individually identifiable.” [Section 164.514(b)(1)]  No clear definitions for “generally accepted” or “very small” or “reasonably available information”… however, there are references in the federal register to statistical publications to be used as “starting points” 20 [Micah Altman, 3/10/2011]
  • 21. Law, policy, ethics FERPA Research design … Information security Family Educational Rights and Privacy Act Disclosure limitation  Applies schools that receive federal (D.O.E.) funding  Restricts use of student (not employee) information  Establishes  Right to privacy of educational records  Right to inspect and correct records (with appeal to Federal government)  Definition of public “directory” information  Right to block access to public “directory” information, and to other records  Educational records include:  Identified information about student  Maintained by institution  Not …  Employee records  Some medical and law-enforcement records  Records solely in the possession and for use by the creator (e.g. unpublished instructor notes)  Personally identifiable information includes:  Direct identifiers  Indirect (quasi) identifiers  Indirectly linkable identifiers  “Information requested by a person who the educational agency or institution reasonably believes knows the identity of the student to whom the education record relates.” 21 [Micah Altman, 3/10/2011]
  • 22. Law, policy, ethics MA 201 CMR 17 Research design … Information security Disclosure Standards for the Protection of Personal Information limitation  Strongest U.S. general privacy protection law  Has been delayed/modified repeatedly  Requires reporting of breaches  If data is not encrypted  Or encryption key is released in conjunction with data  Requires specific technical protections:  Firewalls  Encryption of data transmitted in public  Anti-virus software  Software updates 22 [Micah Altman, 3/10/2011]
  • 23. Inconsistencies in Requirements Law, policy, ethics and Definitions Research design … Information security Inconsistent definitions of “personally identifiable” Disclosure Inconsistent definitions of sensitive information limitation Requirements for to de-identify jibes with statistical realities FERPA HIPAA Common MA 201 CMR Rule 17 Coverage Students in Medical Information Living persons in Mass. Residents Educational in “Covered research by Institutions Entities” funded institutions Identification -Direct -Direct -Direct -Direct Criteria -Indirect -Indirect -Indirect -Linked -Linked -Linked -Bad intent (!) Sensitivity Any non-directory Any medical Private Financial, State, Criteria information information information – Federal based on harm Identifiers Management - Directory opt-out - Consent - Consent - Specific Requirement - [Implied] good - Specific technical - [Implied] risk technical s 23 practice safeguards minimization safeguards [Micah Altman, 3/10/2011] -Breach - Breach
  • 24. Law, policy, ethics Third Party Requirements Research design … Information security  Licensing requirements Disclosure limitation  Intellectual property requirements  Federal/state law and/or policy requirements  State protection of personal information laws  Freedom of information laws (FOIA & State FOI)  State mandatory abuse/neglect notification laws  And … think ahead to publisher requirements  Replication requirements  IP requirements  Examples  NSF requires data from funded research be shared  NIH requires a data sharing plan for large projects  Wellcome Trust requires a data sharing plan  Many leading journals require data sharing 24 [Micah Altman, 3/10/2011]
  • 25. Law, policy, ethics (Some) More Laws & Standards Research design … Information security  California Laws  Detailed technical controls over information Disclosure  Lots of rules systems limitation  Applies any data about California residents  Sarbanes-Oxley (aka, SOX, aka SARBOX)  Privacy policy  Corporate and Auditing Accountability and Responsibility Act of 2002  Disclosure  Applies to U.S. public company  Reporting policy boards, management and public accounting firms  EU Directive 95/46/EC  Rarely applies to research in universities  Data protection directive  Section 404 requires annual assessment of  Provides for notice, limits on purpose of organizational internal controls – but does not use, consent, security, disclosure, access, account specify details of controls ability  Classified Data  Forbids transfer of data to entities in countries  Separate and complex rules and requirements compliant with directive  The University does not accept classified data  U.S. is not compliant but …  But, may have “Controlled But Unclassified”  Organizations can certify compliance with FTC  Vaguely defined area  No auditing/enforcement !  Mostly government produced area  Substantial criticism of this arrangement  Penalties unclear  Payment Card Industry (PCI) Security Standards  And… export controlled information, under ITAR  Governs treatment of credit card numbers and EAR  Requires reports, audits, fines  Export control include  Detailed technical measures technologies, software, documentation/design  Not a law, but helps define good practice documents may be included  Large penalties  Nevada law mandates PCI standards  … and over 1100 International Human Subjects  FISMA laws…  Federal Information Security Management Act (FISMA), Public Law (P.L.) 107-347.  Is starting to be applied to NIH sponsored 25 research [Micah Altman, 3/10/2011]
  • 26. Law, policy, ethics Predicted Legal Changes for 2011… Research design … Information security Disclosure  Caselaw limitation  “personal privacy” does not apply to information about corporations (a corporation is not a “person” for this purpose) FCC vs. ATT 2011  Scheduled  EU “cookie privacy” directive 2009/136/EC goes into effect  Proposed updates to EU information privacy directives  Very Likely  New information privacy laws in selected states in 2011  Likely  Increased federal regulation of internet privacy 26 [Micah Altman, 3/10/2011]
  • 27. Law, policy, ethics What’s wrong with this picture? Research design … Information security Disclosure limitation Name SSN Birthdate Zipcode Gender Favorite # of crimes Ice Cream committed A.Jones 12341 01011961 02145 M Raspberr 0 y B. Jones 12342 02021961 02138 M Pistachio 0 C. Jones 12343 11111972 94043 M Chocolat 0 e D. Jones 12344 12121972 94043 M Hazelnut 0 E. Jones 12345 03251972 94041 F Lemon 0 F. Jones 12346 03251972 02127 F Lemon 1 G. Jones 12347 08081989 02138 F Peach 1 H. Smith 12348 01011973 63200 F Lime 2 I. Smith 12349 02021973 63300 M Mango 4 J. Smith 12350 02021973 63400 M Coconut 16 K. Smith 12351 03031974 64500 M Frog 32 L. Smith 12352 04041974 64600 M Vanilla 64 M. Smith 12353 04041974 64700 F Pumpkin 128 N. 12354 04041974 64800 F Allergic 256 27 [Micah Altman, 3/10/2011] Smi
  • 28. Law, policy, ethics What’s wrong with this picture? Research design … Information security Identifier Sensitive Identifier Sensitive Private Disclosure Private Identifier limitation Identifier Name SSN Birthdate Zipcode Gender Favorite # of crimes Ice Cream committed A.Jones 12341 01011961 02145 M Raspberr 0 Mass resident y B. Jones 12342 02021961 02138 M Pistachio 0 Californian C. Jones 12343 11111972 94043 M Chocolat 0 e D. Jones 12344 12121972 94043 M Hazelnut 0 Twins, separated at birth? E. Jones 12345 03251972 94041 F Lemon 0 FERPA too? F. Jones 12346 03251972 02127 F Lemon 1 G. Jones 12347 08081989 02138 F Peach 1 H. Smith 12348 01011973 63200 F Lime 2 I. Smith 12349 02021973 63300 M Mango 4 J. Smith 12350 02021973 63400 M Coconut 16 K. Smith 12351 03031974 64500 M Frog 32 L. Smith 12352 04041974 64600 M Vanilla 64 Unexpected Response? M. Smith 12353 04041974 64700 F Pumpkin 128 28 [Micah Altman, 3/10/2011] N. Smith 12354 04041974 64800 F Allergic 256
  • 29.
  • 30.
  • 31. Harvard: Law, policy, ethics Enterprise Security Policy (HEISP) Research design … Information security  Storing High Risk Confidential Information (HRCI) Disclosure  Must not be stored on individual user computer or limitation portable storage device  Confidential information on Harvard computing devices  Must be stored on "target computers" or secure locked containers  confidential information must be protected  Human subject information  confidential information on portable devices must be encrypted  All research on human subjects must be approved by the IRB  laptops must have encryption (some schools require whole-disk encryption)  All proposals must include a data management plan  systems must be scanned annually  Personally identifiable medical information (PIMI)  Cannot save confidential information on computer  "Covered entities" at Harvard are subject to HIPAA directly accessible from the internet, open Harvard requirements networks  PIMI is to be treated as HRCI throughout the university  Employees who have access must annually agree to  Obtaining confidential information requires approval confidentiality agreements  All confidential information must be encrypted when  Access to lists and database of Harvard University ID transported across any network numbers is restricted  Public directories must adhere to privacy preferences  Each school must provide training establishes by the individuals  Registrars have developed common definition of  Identifying Users with Access to Confidential FERPA directory information Information  Must adhere to student requests to block their directory  System owners must be able to identify users that have information, per FERPA access to confidential information  Accepting Payment Cards - Restricted to procedures  Strong passwords outlined in HU Credit Card Merchant Handbook  No account/password sharing  Inhibit password guessing with logging and lockouts  Limit application availability time with timeouts  Limit user access to confidential information based on business need [ More on next page…] 31 [Micah Altman, 3/10/2011]
  • 32. Law, policy, ethics HEISP – Part 2 Research design … Information security  Physical Environment  Network take down Disclosure limitation  All digital/non-digital media must be  Network managers run vulnerability properly protected scans  Computers must be physically  May take computers off the network secure  Service Resumption  Automatic logging must be  Must have a service resumption plan if consistent with written policies loss of confidential data is a substantial business risk  Vendor contracts  require approval by security officer  Incident Response Policy  Include OGC contract rider  Disposition and destruction of records  Computer operator  computer must be regularly updated  Acquisition/use by unauthorized persons must be reported to OGC  operated securely  Only necessary application installed  Interacting with legal authorities -- always refer to OGC unless  annually certify compliance with imminent health/safety risk requires university policies otherwise  Computer setup - must filter malicious traffic  Web based surveys must have protections in place  “Target” systems and controllers  Private address space; locally firewalled  Annual vulnerability scanning 32 [Micah Altman, 3/10/2011]
  • 33. Law, policy, ethics Harvard: Research design … Information security Research Data Security Policy (HRDSP) Disclosure limitation  Sensitivity of research data based on potential harm if disclosed:  Level 5 = “extremely sensitive”  Level 4 = “very sensitive” ~= HRCI  Level 3 = “sensitive” ~= HCI  Level 2 = “benign” ~= Good computer hygiene  Level 1 = anonymous and not business confidential  Required protections based on sensitivity  Level 5: Entirely disconnected from network (“bubble security”)  Level 4: Protections as per HRCI  Level 3: Protections as per HCI  Level 2: Good computer hygiene  Designates procedures for treatment of external data use agreements [ next section ]  Legally binding  Can be both very detailed and not supported by Harvard security procedures  Investigator should not sign these – forward to OSP  Designates responsibilities for IRB, Investigator, OSP, IT, Security Officers. security.harvard.edu/research-data-security-policy 33 [Micah Altman, 3/10/2011]
  • 34. Harvard: Law, policy, ethics Research design … Researcher Responsibilities Information security  … for knowing the rules Disclosure limitation  … for identifying potentially confidential information in all forms (digital/analogue; on-line/off-line)  … for notifying recipients of their responsibility to protect confidentiality  … for obtaining IRB approval for any human subjects research  … for following an IRB approved plan  … for obtaining OSP approval of restricted data use agreements with providers, even if no money involved … and for proper  Storage  Access  Transmission  Disposal Confidentiality is not an “IT problem” 34 [Micah Altman, 3/10/2011]
  • 35. Harvard: Law, policy, ethics Research design … Staff – Personnel Manual Information security Disclosure  Protect Harvard information and systems limitation  Keep your own information in Peoplesoft up to date  Comply with copyrights and DMCA  Comply with Harvard systems policies and procedures  All information produced at work is Harvard property  Attach only approved devices to the Harvard network harvie.harvard.edu/docroot/standalone/Policies_Contracts/St aff_Personnel_Manual/Section2/Privacy.shtml 35 [Micah Altman, 3/10/2011]
  • 36. Law, policy, ethics Key Concepts & Issues Review Research design … Information security  Privacy Disclosure  Control over extent and circumstances of sharing limitation  Confidentiality  Treatment of private, sensitive information  Sensitive information  Information that would cause harm if disclosed and linked to an individual  Personally/individually identifiable information  Private information  Directly or indirectly linked to an identifiable individual  Human subjects A living person …  who is interacted with to obtain research data  who‟s private identifiable information is included in research data  Research  Systematic investigation  Designed to develop or contribute to generalizable knowledge  “Common Rule”  Law governing funded human subjects research  HIPAA  Law governing use of personal health information in covered and associated entities  MA 201 CMR 17  Law governing use of certain personal identifiers for Massachusetts residents 36 [Micah Altman, 3/10/2011]
  • 37. Law, policy, ethics Checklist: Identify Requirements Research design … Information security Check if research includes … Disclosure limitation  Interaction with humans  Common Rule &HEISP/HRDSP applies Check if data used includes identified …  Student records  FERPA &HEISP/HRDSP applies  State, federal, financial id‟s  state law &HEISP/HRDSP applies  Medical/health information  HIPAA (likely) &HEISP/HRDSP applies  Human subjects & private info  Common Rule &HEISP/HRDSP applies Check for other requirements/restriction on data dissemination:  Data provider restrictions and University approvals thereof  Open data requirements and norms  University information policy 37 [Micah Altman, 3/10/2011]
  • 38. Law, policy, ethics Resources Research design …  E.A. Bankert& R.J. Andur, 2006, Institutional Review Board: Management and Function, Information security Jones and Bartlett Publishers Disclosure  P. Ohm, “Broken Promises of Privacy”, SSRN Working Paper limitation [ssrn.com/abstract=1450006]  D. J. Mazur, 2007. Evaluating the Science and ethics of Research on Humans, Johns Hopkins University Press  IRB: Ethics & Human Research [Journal], Hastings Press www.thehastingscenter.org/Publications/IRB/  Journal of Empirical Research on Human Research Ethics, University of California Press ucpressjournals.com/journal.asp?j=jer  201 CMR 17 text www.mass.gov/Eoca/docs/idtheft/201CMR17amended.pdf  FERPA Website www.ed.gov/policy/gen/guid/fpco/ferpa/index.html  HIPAA Website www.hhs.gov/ocr/privacy/  Common Rule Website www.hhs.gov/ohrp/humansubjects/guidance/45cfr46.htm  State laws www.ncsl.org/Default.aspx?TabId=13489  Harvard Enterprise Information Security Policy/ Research Data Security Policy www.security.harvard.edu  Harvard Institutional Review Board www.fas.harvard.edu/~research/hum_sub/  Harvard FAS Policies and Procedures www.fas-it.fas.harvard.edu/services/catalog/browse/39  IQSS Policies and Procedures support.hmdc.harvard.edu/kb-930/hmdc_policies 38 [Micah Altman, 3/10/2011]
  • 39. Research design, methods, Law, policy, ethics management Research design … Information security Disclosure  Reducing risk limitation  Sensitivity of information  Partitioning  Decreasing identification  Managing confidentiality and dissemination  [Summary] 39 [Micah Altman, 3/10/2011]
  • 40. Law, policy, ethics Trade-offs Research design … Information security  Anonymity vs. research utility Disclosure limitation  Sensitivity vs. research utility  (Anonymity * Sensitivity) vs. research costs/efforts 40 [Micah Altman, 3/10/2011]
  • 41. Law, policy, ethics Types of Sensitive Information Research design … Information security  Information is sensitive, if, once disclosed there Disclosure limitation is a “significant” likelihood of harm  IRB literature suggests possible categories of harm:  loss of insurability  loss of employability  criminal liability  psychological harm  social harm to a vulnerable group  loss of reputational harm  emotional harm  dignitary harm  physical harm: risk of disease, injury, or death 41 [Micah Altman, 3/10/2011]
  • 42. Law, policy, ethics Levels of sensitivity Research design …  No widely accepted scale Information security  Publicly available data not sensitive under “common rule” Disclosure  Common rule anchors scale at “minimal risk”: limitation “if disclosed, the probability and magnitude of harm or discomfort anticipated are not greater in and of themselves than those ordinarily encountered in daily life or during the performance of routine physical or psychological examinations or tests”  Harvard Research Data Security Policy  Level 5- Extremely sensitive information about individually identifiable people. Information that if exposed poses significant risk of serious harm. Includes information posing serious risk of criminal liability, serious psychological harm or other significant injury, loss of insurability or employability, or significant social harm to an individual or group.  Level 4 - Very sensitive information about individually identifiable people Information that if exposed poses a non-minimal risk of moderate harm. Includes civil liability, moderate psychological harm, or material social harm to individuals or groups, medical records not classified as Level 5, sensitive-but-unclassified national security information, and financial identifiers (as per HRCI standards).  Level 3- Sensitive information about individually identifiable people Information that if disclosed poses a significant risk of minor harm. Includes information that would reasonably be expected to damage reputation or cause embarrassment; and FERPA records.  Level 2 – Benign information about individually identifiable people Information that would not be considered harmful, but that as to which a subject has been promised confidentiality.  Level 1 – De-identified information about people, and information not about people 42 [Micah Altman, 3/10/2011]
  • 43. Law, policy, ethics IRB Review Scope Research design …  IRB approval needed for all: Information security  federally-funded research; Disclosure limitation  or any research at (almost all) institutions receiving federal funding that involve “human subjects”. ( Any organization operating under a general “federal-wide assurance”)  All human subjects research at Harvard  Human subject: individual about whom an investigator (whether professional or student) conducting research obtains  (1) Data through intervention or interaction with a living individual, or  (2) Identifiable private information about living individuals  See www.hhs.gov/ohrp/ 43 [Micah Altman, 3/10/2011]
  • 44. Law, policy, ethics Research not requiring IRB approval Research design …  Non-research: Information security not generalizable knowledge & systematic inquiry Disclosure  Non-funded: limitation institution receives no federal funds for research  Not human subject:  No living people described  Observation only AND no private identifiable information is obtained  Human Subjects, but “exempt” under 45 CFR 46  use of existing, publicly-available data  use of existing non-public data, if data is individuals cannot be directly or indirectly identified  research conducted in educational settings, involving normal educational practices  taste & food quality evaluation  federal program evaluation approved by agency head  observational, survey, test & interview of public officials and candidates (in their formal capacity, or not identified)  Caution not all “exempt” is exempt…  Some research on prisoners, children, not exemptable  Some universities require review of “exempt” research  Harvard requires review of all human subject research  See: www.hhs.gov/ohrp/humansubjects/guidance/ decisioncharts.htm 44 [Micah Altman, 3/10/2011]
  • 45. Law, policy, ethics IRB’s and Confidential Information Research design … Information security Disclosure  IRB‟s review consent procedures and documentation limitation  IRB‟s may review data management plans  May require procedures to minimize risk of disclosure  May require procedures to minimize harm resulting from disclosure  IRB‟s make determination of sensitivity of information -- potential harm resulting from disclosure  IRB‟s make determination regarding whether data is de-identified for “public use” [see NHRPAC, “Recommendations on Public Use Data Files”] 45 [Micah Altman, 3/10/2011]
  • 46. Law, policy, ethics Harvard IRB Approval Research design … Information security Disclosure  The Harvard Institutional Review Board (IRB) must approve alllimitation human subjects research at Harvard prior to data collection or use  Research involves human subjects if:  There is any interaction or intervention with living humans; or  If identifiable private data about living humans is used  Some examples of human subject research in soc sci:  Surveys  Behavioral experiments  Educational tests and evaluations  Analysis of identified private data collected from people (your e-mail inbox, logs of web-browsing activity, facebook activity, ebay bids … )  The IRB will:  Assess research protocol  Identify whether research is exempt from further review and management  Identify sensitivity level of data 46 [Micah Altman, 3/10/2011]
  • 47. Law, policy, ethics Harvard Responsibilities Research design … Information security Disclosure  The Harvard Institutional Review Board (IRB) must approve alllimitation human subjects research at Harvard prior to data collection or use  Research involves human subjects if:  There is any interaction or intervention with living humans; or  If identifiable private data about living humans is used  Some examples of human subject research in soc sci:  Surveys  Behavioral experiments  Educational tests and evaluations  Analysis of identified private data collected from people (your e-mail inbox, logs of web-browsing activity, facebook activity, ebay bids … )  The IRB will:  Assess research protocol  Identify whether research is exempt from further review and management  Identify sensitivity level of data 47 [Micah Altman, 3/10/2011]
  • 48. Law, policy, ethics HRDSP Responsibilities Research design … Information security Disclosure  Responsibilities limitation  Researchers are responsible for disclosing to IRB, and follow IRB approved plan  IRB is responsible for ensuring adequacy of Investigators plans; granting (lawful) variances from security requirements justified by research needs  IT is responsible for assisting with the identification of security level, and assisting in the implementation of security protections  Security Officer/CIO may review IT facilities and approve (give written designation) that they meet protections for a given level 48 [Micah Altman, 3/10/2011]
  • 49. Valuation of private Law, policy, ethics information is uncertain Research design … Information security  Privacy valuations often inconsistent Disclosure  Framing effects: ordering, endowment effect, possibly others limitation  Non-normal/uniform distribution of valuations  One study: < 10% of subjects would give up $2 of a $12 gift card to buy anonymity of purchases [Aquesti and Lowenstein 2009]  Cost benefit of information security may not be optimal for users [Herley 2009]  E.g. Loss from all phishing attacks is 100x less than time spent in avoiding them  Note, however weaknesses in this analysis:  Only loss of time modeled – no valuation of privacy made  Institutional costs not included – only personal costs  Very simplified model – not calibrated through surveys etc.  Repeated surveys of students show they tend to disclose a lot, e.g.:  >80% of students sampled in several studies had public facebook pages with birthdays, home town and other private information  This information can easily be used to link to other databases!  Disclosure of extensive information on sexual orientation, private cell #‟s, drinking habits, etc. etc. not uncommon [See Kolek& Saunders 2008]  Emerging markets for privacy?  Micropayments for disclosures  http://www.personal.com/ 49 http://www.i-allow.com/  [Micah Altman, 3/10/2011]
  • 50. Law, policy, ethics Reducing Risk in Data Collection Research design … Information security Disclosure  Avoid collecting sensitive information, unless it is limitation required by research design, method, or hypothesis  Unnecessary sensitive information  not minimal risk  Reducing sensitivity  higher participation, greater honesty  Collect sensitive information in private settings  Reduces risk of disclosure  Increases participation  Reduce sensitivity through indirect measures  Less sensitive proxies  E.g. Implicit association test [Greenwald, et al. 1998]  Unfolding brackets  Group response collection  Random response technique [Warner 1965]  Item count/unmatched count/list experiment technique 50 [Micah Altman, 3/10/2011]
  • 51. Managing Sensitive Data Law, policy, ethics Collection Research design … Information security Disclosure  Separate: limitation sensitive measures, (quasi)-identifiers, other measures  If possible avoid storing identifiers with measures:  Collect identifying information beforehand  Assign opaque subject identifiers  For sensitive data:  Collect on-line directly (with appropriate protections); or  Encrypt collection devices/media (laptops, usb keys, etc)  For very/extremely sensitive data:  Collect with oversight directly; then  Store on encrypted device and;  Transfer to secure server as soon as feasible 51 [Micah Altman, 3/10/2011]
  • 52. Randomized Response Law, policy, ethics Technique Research design … Information security Disclosure limitation Sensitiv e questio n Subjec Record t rolls a die > Answer 2 Variations: - Ask two different questions - Item counts with sensitive and non- Say sensitive items – eliminate “YES” subject-randomization - Regression analysis methods 52 [Micah Altman, 3/10/2011]
  • 53. Law, policy, ethics Our Table – Less (?) Sensitive Less (?) Research design … Sensitive Information security Name SSN Birthdate Zipcode Gender Favorite Treat? # Disclosure Ice Cream acts limitation * A.Jones 12341 01011961 02145 M Raspberry 0 0 B. Jones 12342 02021961 02138 M Pistachio 1 20 C. Jones 12343 11111972 94043 M Chocolate 0 0 D. Jones 12344 12121972 94043 M Hazelnut 1 12 E. Jones 12345 03251972 94041 F Lemon 0 0 F. Jones 12346 03251972 02127 F Lemon 1 7 G. Jones 12347 08081989 02138 F Peach 0 1 H. Smith 12348 01011973 63200 F Lime 1 17 I. Smith 12349 02021973 63300 M Mango 0 4 J. Smith 12350 02021973 63400 M Coconut 1 18 K. Smith 12351 03031974 64500 M Frog 0 32 L. Smith 12352 04041974 64600 M Vanilla 1 65 M. Smith 12353 04041974 64700 F Pumpkin 0 128 N. Smith 12354 04041974 64800 F Allergic 1 256 * Acts = crimes if treatment = 0; crimes + acts of generosity if treatment =1 53 [Micah Altman, 3/10/2011]
  • 54. Randomized Response – Law, policy, ethics Pros and Cons Research design … Information security  Pros Disclosure  Can substantially reduce risks of disclosure limitation  Can increase response rate  Can decrease mis-reporting  Warning!  None of the randomized models uses a formal measure of disclosure limitation  Some would clearly violate measures (such as differential privacy) we‟ll see in section 4  Do not use as a replacement for disclosure limitation  Other Issues  Loss of statistical efficiency (if compliance would otherwise be the same)  Complicates data analysis, especially model-based analysis  Leaving randomization up to subject can be unreliable  May provide less confidentiality protection if:  Randomization is incomplete  Records of randomization assignment are kept  Lists of responses overlap across questions  Sensitive question response is large enough to dominate overall response  Non-sensitive question responses are extremely predictable, or publicly observable 54 [Micah Altman, 3/10/2011]
  • 55. Law, policy, ethics Partitioning Information Research design … Information security  Reduces risk in information management Disclosure limitation  Partition data information based on sensitivity  Identifying information  Descriptive information  Sensitive information  Other information  Segregate  Storage of information  Access regimes  Data collections channels  Data transmission channels  Plan to segregate as early as feasible in data collection and processing  Link segregated information with artificial keys … 55 [Micah Altman, 3/10/2011]
  • 56. Partitioned table Not Identified Name SSN Birthdate Zipcode Gender LINK LINK Favorite Treat # Ice Cream act s A.Jones 12341 0101196 02145 M 1401 1401 Raspberry 0 0 1 283 Pistachio 1 20 B. 12342 0202196 02138 M 283 8979 Chocolate 0 0 Jones 1 7023 Hazelnut 1 12 C. 12343 11111972 94043 M 8979 Jones 1498 Lemon 0 0 D. 12344 1212197 94043 M 7023 1036 Lemon 1 7 Jones 2 3864 Peach 0 1 E. 12345 0325197 94041 F 1498 2124 Lime 1 17 Jones 2 4339 Mango 0 4 F. 12346 0325197 02127 F 1036 Jon 2 6629 Coconut 1 18 es 9091 Frog 0 32 G. 12347 0808198 02138 F 3864 9918 Vanilla 1 65 Jon 9 es 4749 Pumpkin 0 12 8 H. Smith 12348 0101197 63200 F 2124 3 8197 Allergic 1 25 6 I. Smith 12349 0202197 63300 M 4339 56 3 [Micah Altman, 3/10/2011]
  • 57. Law, policy, ethics Choosing Linking Keys Research design …  Entirely randomized Information security  Most resistant to relink Disclosure  Mapping from original id to random keys is highly sensitive limitation  Must keep and be able to access mapping to add new identified data  Most computer-generated random numbers are not sufficient by themselves  Most are PSEUDO random – predictable sequences  Use a cryptographic secure PRNG: Blum Blum Shub, AES (or other block cypher) in counter mode OR  Use real random numbers (e.g. from physical sources – see http://maltman.hmdc.harvard.edu/numal/) OR  Use a PRNG with a real random seed to random the order of the table; then another to generate the ID‟s for this randomly ordered table  Encryption  More troublesome to compute  Same id‟s + same key + same “salt” produces same values  facilitates merging  ID‟s can be recovered if key is exposed, cracked, or algorithm weak  Cryptographic Hash e.g. SHA-256  Security is well understood  Tools available to compute  Same id‟s produce same hashes  easier to merge new identified data  ID‟s cannot be recovered from hash because hash loses information  ID‟s can be confirmed if identifying information is known or guessable  Cryptographic Hash + secret key  Security is well understood  Tools available to compute  Same id‟s produce same hashes  easier to merge new identified data  ID‟s cannot be recovered from hash because hash loses information  ID‟s cannot be confirmed unless key is also known  Do not choose arbitrary mathematical functions of other identifiers! 57 [Micah Altman, 3/10/2011]
  • 58.
  • 59. Anonymous Data Collection: Law, policy, ethics Pros & Cons Research design … Information security  Pros Disclosure limitation  Presumption that data is not identifiable  May increase participation  May increase honesty  Cons  Barrier to follow-up, longitudinal studies  Can conflict with quality control, validation  Data still may be indirectly identifiable if respondent descriptive information is collected  Linking data to other sources of information may have large research benefits 59 [Micah Altman, 3/10/2011]
  • 60. Law, policy, ethics Anonymous Data Collection Methods Research design … Information security  Trusted third party intermediates Disclosure limitation  Respondent initiates re-contacts  No identifying information recorded  Use id‟s randomized to subjects, destroy mapping 60 [Micah Altman, 3/10/2011]
  • 61. Law, policy, ethics Remote Data Collection Challenges Research design … Information security  Where network connection is readily available, Disclosure easy to transfer as collected, or enter on remote systemlimitation  Encrypted network file transfer (e.g. SFTP, part of ssh )  Encrypted/tunneled network file system (e.g. Expandrive)  Where network connection is less reliable, high bandwidth  Whole-disk-encrypted laptop  Plus, Encrypted cloud backup solutions: CrashPlan, BackBlaze, SpiderOak  Small data, short term  Encrypted USB keys (e.g. w/IronKey, TrueCrypt, PGP)  Foreign Travel  Be aware of U.S. EAR export restrictions, use commercial or widely-available open encryption software only. Do not use bespoke software.  Be aware of country import restrictions (as of 2008): Burma, Belarus, China, Hungary, Iran, Israel, Morocco, Russia, Saudi Arabia, Tunisia, Ukraine  Encrypt data if possible, but don‟t break foreign laws. Check with department of state. 61 [Micah Altman, 3/10/2011]
  • 62. Online/Electronic data collection Law, policy, ethics challenges Research design … Information security  IP addresses are identifiers  IP addresses can be logged automatically by host, even if not intended by researcher Disclosure limitation  IP addresses can trivially be observed as data is collected  Partial ID numbers can be used for probabilistic geographical identification at sub-zipcode levels  Cookies may be identifiers  Cookies provide a way to link data more easily  May or may not explicitly identify subject  Jurisdiction  Data collected from subjects from other states / countries could subject you to laws in that jurisdiction  Jurisdiction may depend on residency of subject, availability of data collection instrument in jurisdiction, or explicit data collections efforts within jurisdiction  Vendor  Vendor could retain IP addresses, identifying cookies, etc., even if not intended by researcher  Recommendation  Use only vendors that certify compliance with your confidentiality policy  Do not retain IP numbers if data is being collected anonymously  Use SSL/TLS encryption unless data is non-sensitive and anonymous  Some tools for anonymizing IP addresses and system/network logs www.caida.org/tools/taxonomy/anonymization.xml  Harvard policy  Recommendations as above  62 Plus: do not use or display Level 4+ for web surveys [Micah Altman, 3/10/2011]
  • 63.
  • 64. Law, policy, ethics Certificates of Confidentiality Research design … Information security  Issued by DHHS agencies such as NIH, CDC, Disclosure FDA limitation  Protects against many types of forced legal disclosure of confidential information  May not protect against all state disclosure law  Does not protect against voluntary disclosures by researcher/research institutions 64 [Micah Altman, 3/10/2011]
  • 65. Law, policy, ethics Confidentiality & Consent Research design … Information security Best practice is to describe in consent form... Disclosure limitation  Practices in place to protect confidentiality  Plans for making the data available, to whom, and under what circumstances, rationale.  Limitations on confidentiality (e.g. limits to a certificate of confidentiality under state law, planned voluntary disclosure)  Consent form should be consistent with your:  Data management plan  Data sharing plans and requirements  Not generally best practice to promise  Unlimited confidentiality  Destruction of all data  Restriction of all data to original researchers 65 [Micah Altman, 3/10/2011]
  • 66. Law, policy, ethics Data Management Plan Research design … Information security  When is it required?  Any NIH request over $500K Disclosure limitation  All NSF proposals after 12/31/2010  NIJ  Wellcome Trust  Any proposal where collected data will be a resource beyond the project  Safeguarding data during collection  Documentation  Backup and recovery  Review  Treatment of confidential information  Overview: http://www.icpsr.org/DATAPASS/pdf/confidentiality.pdf  Separation of identifying and sensitive information  Obtain certificate of confidentiality, other legal safeguards  De-identification and public use files  Dissemination  Archiving commitment (include letter of support)  Archiving timeline  Access procedures  Documentation  User vetting, tracking, and support One size does not fit all projects. 66 [Micah Altman, 3/10/2011]
  • 67. Data Management Plan Law, policy, ethics Outline Research design … Information security  Data description  Planned documentation and supporting  Budget Disclosure materials limitation  nature of data {generated, observed,  Cost of preparing data and documentation experimental information; amples; publications;  Quality assurance procedures for metadata physical collections; software; models} and documentation  Cost of permanent archiving  scale of data  Data Organization [if complex]  Intellectual Property Rights  Access and Sharing  File organization  Entities who hold property rights  Plans for depositing in an existing public  Naming conventions  Types of IP rights in data database  Protections provided  Quality Assurance [if not described in main  Access procedures proposal]  Dispute resolution process  Embargo periods  Procedures for ensuring data quality in  Legal Requirements  Access charges collections, and expected measurement error  Provider requirements and plans to meet them  Timeframe for access  Cleaning and editing procedures  Institutional requirements and plans to meet  Technical access methods  Validation methods them  Restrictions on access  Storage, backup, replication, and  Archiving and Preservation  Audience versioning  Requirements for data destruction, if applicable  Facilities  Procedures for long term preservation  Potential secondary users  Methods  Institution responsible for long-term costs of  Potential scope or scale of use  Procedures data preservation  Reasons not to share or reuse  Frequency  Succession plans for data should archiving  Existing Data [ If applicable ] entity go out of existence  Replication  description of existing data relevant to the  Ethics and privacy project  Version management  Informed consent  plans for integration with data collection  Recovery guarantees  Protection of privacy  added value of collection, need to collect/create Security new data  Other ethical issues  Procedural controls  Formats  Technical Controls  Adherence  Generation and dissemination formats and  When will adherence to data management plan  Confidentiality concerns be checked or demonstrated procedural justification  Access control rules  Who is responsible for managing data in the  Storage format and archival justification  Restrictions on use project  Metadata and documentation  Responsibility  Who is responsible for checking adherence to  Metadata to be provided data management plan  Individual or project team role responsible for  Metadata standards used data management  Treatment of field notes, and collection records 67 [Micah Altman, 3/10/2011]
  • 68. IQSS Law, policy, ethics Data Management Services Research design … Information security  The Henry A. Murray Research Archive Disclosure  Harvard’s endowed permanent data archive limitation  Assists in developing data management plans  Can provide cataloging assistance for public release of data  Dissemination of data through IQSS Dataverse Network  The IQSS Dataverse Network  Standard data management plan for public, small data  Provides easy virtual archiving and dissemination  Data is catalogued and controlled by you  You theme and brand your virtual archive  Universally searchable, citable  Automatically provides data formatting and statistical analysis on-line http://dvn.iq.harvard.edu 68 [Micah Altman, 3/10/2011]
  • 69. Data Management Plans Examples Law, policy, ethics (Summaries) Research design …  Example 1 Information security  The proposed research will involve a small sample (less than 20 subjects) recruited from clinical facilities in the Disclosure New York City area with Williams syndrome. This rare craniofacial disorder is associated with distinguishing facial limitation features, as well as mental retardation. Even with the removal of all identifiers, we believe that it would be difficult if not impossible to protect the identities of subjects given the physical characteristics of subjects, the type of clinical data (including imaging) that we will be collecting, and the relatively restricted area from which we are recruiting subjects. Therefore, we are not planning to share the data.  Example 2  The proposed research will include data from approximately 500 subjects being screened for three bacterial sexually transmitted diseases (STDs) at an inner city STD clinic. The final dataset will include self-reported demographic and behavioral data from interviews with the subjects and laboratory data from urine specimens provided. Because the STDs being studied are reportable diseases, we will be collecting identifying information. Even though the final dataset will be stripped of identifiers prior to release for sharing, we believe that there remains the possibility of deductive disclosure of subjects with unusual characteristics. Thus, we will make the data and associated documentation available to users only under a data-sharing agreement that provides for: (1) a commitment to using the data only for research purposes and not to identify any individual participant; (2) a commitment to securing the data using appropriate computer technology; and (3) a commitment to destroying or returning the data after analyses are completed.  Example 3  This application requests support to collect public-use data from a survey of more than 22,000 Americans over the age of 50 every 2 years. Data products from this study will be made available without cost to researchers and analysts. https://ssl.isr.umich.edu/hrs/  User registration is required in order to access or download files. As part of the registration process, users must agree to the conditions of use governing access to the public release data, including restrictions against attempting to identify study participants, destruction of the data after analyses are completed, reporting responsibilities, restrictions on redistribution of the data to third parties, and proper acknowledgement of the data resource. Registered users will receive user support, as well as information related to errors in the data, future releases, workshops, and publication lists. The information provided to users will not be used for commercial purposes, and will not be redistributed to third parties.  FROM NIH, [grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm#ex] 69 [Micah Altman, 3/10/2011]
  • 70. External Data Usage Agreements Controls on Confidential Information Harvard University has developed extensive technical and administrative  Between provider and procedures to be used with all identified personal information and other confidential information. The University classifies this form of information internally individual as "Harvard Confidential Information" -- or HCI.  Careful – you‟re liable Any use of HCI at Harvard includes the following safeguards: - Systems security. Any system used to store HCI is subject to a checklist of  Harvard will not help if you sign technical and procedural security measures including: operating system and applications must be patched to current security levels, a host based firewall is  University review and OSP enabled, anti virus software is enabled and the definitions file is current - Server security. Any server used to distribute HCI to other systems (e.g. through signature strongly providing remote file system), or otherwise offering login access, must employ recommended additional security measures including: connection through a private network only; limitation on length of idle sessions; limitations on incorrect password attempts; and additional logging and monitoring.  Between provider and - Access restriction: an individual is allowed to access HCI only if there is a University specific need for access. All access to HCI is over physically controlled and/or encrypted channels.  University Liable - Disposal processes: including secure file erasure and document destruction. - Encryption: HCI must be strongly encrypted whenever it is transmitted across a  Requires University Approved public network, stored on a laptop, or stored on a portable device such as a flash drive or on portable media. Signer : OSP This is only a brief summary. The full University security policy can be found here: http://security.harvard.edu/heisp  Avoid nonstandard protections And a more detailed checklist used to verify systems compliance is found here: whenever possible http://security.harvard.edu/files/resources/forms/  DUA‟s can impose very specific These safeguards are applied consistently throughout the university, we believe and detailed requirements that these requirements offer stringent protection for the requested data. And these requirements will be applied in addition to any others required by specific  Compatible in spirit does not data use agreement. apply compatibility in legal practice  70 Use University [Micah Altman, 3/10/2011] policies/procedures as a
  • 71. Law, policy, ethics IQSS Data Management Services Research design … Information security  The Henry A. Murray Research Archive Disclosure  Harvard’s endowed permanent data archive limitation  Assists in developing data management plans  Can provide cataloging assistance for public release of data  Dissemination of data through IQSS Dataverse Network  Provides letters of commitment to permanent archiving www.murray.harvard.edu  The IQSS Dataverse Network  Provides easy virtual archiving and dissemination  Data is catalogued and controlled by you  You theme and brand your virtual archive  Universally searchable, citable  Automatically provides data formatting and statistical analysis on-line dvn.iq.harvard.edu 71 [Micah Altman, 3/10/2011]
  • 72. Law, policy, ethics Key Concepts & Issues Review Research design … Information security  Levels of sensitivity Disclosure limitation  Anonymity criteria  Sensitivity reduction  Certificate of Confidentiality  Data sharing plan  Data management plan  Information Partitioning  Linking Keys 72 [Micah Altman, 3/10/2011]

Notes de l'éditeur

  1. This work. Managing Confidential information in research, by Micah Altman (http://redistricting.info) is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
  2. - Institutions may give “limited assurance” of common rule compliance – just for funded research. Most give “general assurance”