Data Management for Scientists

Reduce your workload
Reuse your ideas
Recycle your data

www.oddee.com

Carly Strasser, PhD
California Digital Library, UC Office of the President
carly.strasser@ucop.edu
www.carlystrasser.net

Roadmap

1.  Background
2.  Mistakes we make
3.  How to improve
4.  Toolbox

NSF funded DataNet Project
Office of Cyberinfrastructure

Cyberinfrastructure
Community Engagement & Outreach

From Flickr by wetwebwork
Courtesy of DataONE

What role can libraries play in data education?
Why don’t people share data?
What barriers to sharing can we eliminate?
Is data management being taught?
Do attitudes about sharing differ among disciplines?
How can we promote storing data in repositories?

Roadmap

1.  Background
2.  Mistakes we make
3.  How to improve
4.  Toolbox

Digital data

From Flickr by DW0825
From Flickr by Flickmor
From Flickr by deltaMike
www.woodrow.org
C. Strasser
Courtesy of WHOI
From Flickr by US Army Environmental Command

Digital data + Complex analyses

Data    Models
Maximum Likelihood estimation
Matrix Models
Images    Tables    Paper

UGLY TRUTH

Many Earth | Environmental | Ecological scientists…
    are not taught data management
    don’t know what metadata are
    can’t name data centers or repositories
    don’t share data publicly or store it in an archive
    aren’t convinced they should share data

5shortessays.blogspot.com

2 tables
Random notes

C:\Documents and Settings\hampton\My Documents\NCEAS Distributed Graduate Seminars\[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1
                   Stable Isotope Data Sheet
              Sampling Site / Identifier: Wash Cresc Lake                                                                                                               Peter's lab     Don't use - old data
                         Sample Type: Algal                                                                                                                             Washed Rocks
                                  Date: Dec. 16
                Tray ID and Sequence: Tray 004

                     Reference statistics: SD for delta 13C = 0.07                              SD for delta 15N = 0.15


          Position        SampleID         Weight (mg)           %C       delta 13C   delta 13C_ca         %N               delta 15N   delta 15N_ca   Spec. No.
         A1                            ref    0.98              38.27      -25.05         -24.59           1.96                4.12          3.47       25354
         A2                            ref    0.98              39.78      -25.00         -24.54           2.03                4.01          3.36       25356
         A3                            ref    0.98              40.37      -24.99         -24.53           2.04                4.09          3.44       25358
         A4                            ref    1.01              42.23      -25.06         -24.60           2.17                4.20          3.55       25360           Shore           Avg Con
         A5          ALG01                    3.05              1.88       -24.34         -23.88           0.17               -1.65         -2.30       25362      c        -1.26          -27.22
         A6          Lk Outlet Alg            3.06              31.55      -30.17         -29.71           0.92                0.87          0.22       25364                1.26            0.32
         A7          ALG03                    2.91              6.85       -21.11         -20.65           0.48               -0.97         -1.62       25366      c
         A8          ALG05                    2.91              35.56      -28.05         -27.59           2.30                0.59         -0.06       25368
         A9          ALG07                    3.04              33.49      -29.56         -29.10           1.68                0.79          0.14       25370
         A10         ALG06                    2.95              41.17      -27.32         -26.86           1.97                2.71          2.06       25372
         B1          ALG04                    3.01              43.74      -27.50         -27.04           1.36                0.99          0.34       25374      c
         B2          ALG02                      3               4.51       -22.68         -22.22           0.34                4.31          3.66       25376
         B3          ALG01                    2.99              1.59       -24.58         -24.12           0.15               -1.69         -2.34       25378      c
         B4          ALG03                    2.92              4.37       -21.06         -20.60           0.34               -1.52         -2.17       25380      c
         B5          ALG07                     2.9              33.58      -29.44         -28.98           1.74                0.62         -0.03       25382
         B6                            ref    1.01              44.94      -25.00         -24.54           2.59                3.96          3.31       25384
         B7                            ref    0.99              42.28      -24.87         -24.41           2.37                4.33          3.68       25386
         B8          Lk Outlet Alg            3.04              31.43      -29.69         -29.23           1.07                0.95          0.30       25388
         B9          ALG06                    3.09              35.57      -27.26         -26.80           1.96                2.79          2.14       25390
         B10         ALG02                    3.05              5.52       -22.31         -21.85           0.45                4.72          4.07       25392
         C1          ALG04                    2.98              37.90      -27.42         -26.96           1.36                1.21          0.56       25394      c
         C2          ALG05                    3.04              31.74      -27.93         -27.47           2.40                0.73          0.08       25396
         C3                            ref    0.99              38.46      -25.09         -24.63           2.40                4.37          3.72       25398
                                                                23.78                                      1.17




From Stephanie Hampton (2010), ESA Workshop on Best Practices
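
The sheet's header reports "SD for delta 13C = 0.07" and "SD for delta 15N = 0.15" without saying how they were obtained. As a minimal sketch, assuming those figures are the sample standard deviation (n - 1) of the seven rows marked "ref", the same numbers can be recomputed in a few lines of Python. The variable names are illustrative and not part of the original workbook.

```python
from statistics import stdev

# delta 13C and delta 15N for the seven "ref" rows (A1-A4, B6, B7, C3),
# copied from the data sheet above.
ref_delta_13c = [-25.05, -25.00, -24.99, -25.06, -25.00, -24.87, -25.09]
ref_delta_15n = [4.12, 4.01, 4.09, 4.20, 3.96, 4.33, 4.37]

# Assumption: the sheet reports the sample standard deviation (n - 1),
# rounded to two decimal places.
sd_13c = round(stdev(ref_delta_13c), 2)
sd_15n = round(stdev(ref_delta_15n), 2)

print(f"SD for delta 13C = {sd_13c:.2f}")
print(f"SD for delta 15N = {sd_15n:.2f}")
```

Under that assumption both values agree with the header. Keeping a derived statistic in a script like this, rather than typed into a cell, records how it was calculated and leaves the raw values untouched.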
  
Wash Cres Lake Dec 15 Dont_Use.xls

C:\Documents and Settings\hampton\My Documents\NCEAS Distributed Graduate Seminars\[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1

[Same Stable Isotope Data Sheet as above, shown here with the rest of the worksheet: a transposed copy of rows A7 to B5, an Excel regression output, and an embedded chart.]

SampleID         ALG03    ALG05    ALG07    ALG06    ALG04    ALG02    ALG01    ALG03    ALG07
Weight (mg)       2.91     2.91     3.04     2.95     3.01        3     2.99     2.92      2.9
%C                6.85    35.56    33.49    41.17    43.74     4.51     1.59     4.37    33.58
delta 13C       -21.11   -28.05   -29.56   -27.32   -27.50   -22.68   -24.58   -21.06   -29.44
delta 13C_ca    -20.65   -27.59   -29.10   -26.86   -27.04   -22.22   -24.12   -20.60   -28.98
%N                0.48     2.30     1.68     1.97     1.36     0.34     0.15     0.34     1.74
delta 15N        -0.97     0.59     0.79     2.71     0.99     4.31    -1.69    -1.52     0.62
delta 15N_ca     -1.62    -0.06     0.14     2.06     0.34     3.66    -2.34    -2.17    -0.03

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.283158
R Square             0.080178
Adjusted R Square   -0.022024
Standard Error       1.906378
Observations               11

ANOVA
              df         SS         MS          F   Significance F
Regression     1   2.851116   2.851116   0.784507         0.398813
Residual       9    32.7085   3.634278
Total         10   35.55962

               Coefficients   Standard Error      t Stat    P-value   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept         -4.297428         4.671099   -0.920003   0.381568    -14.8642    6.269341      -14.8642      6.269341
X Variable 1      -0.158022          0.17841   -0.885724   0.398813   -0.561612    0.245569     -0.561612      0.245569

[Embedded chart: series "Series1"; horizontal axis -35.00 to 0.00, vertical axis -3.00 to 4.00]

Random stats output


C:\Documents and Settings\hampton\My Documents\NCEAS Distributed Graduate Seminars\[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1
                   Stable Isotope Data Sheet
              Sampling Site / Identifier: Wash Cresc Lake                                                                                               Peter's lab              Don't use - old data
                         Sample Type: Algal                                                                                                             Washed Rocks
                                  Date: Dec. 16
                Tray ID and Sequence: Tray 004

                     Reference statistics: SD for delta 13C = 0.07                              SD for delta 15N = 0.15


          Position        SampleID        Weight (mg)      %C      delta 13C   delta 13C_ca        %N          delta 15N   delta 15N_ca Spec. No.
         A1                           ref    0.98         38.27     -25.05         -24.59          1.96           4.12          3.47     25354
         A2                           ref    0.98         39.78     -25.00         -24.54          2.03           4.01          3.36     25356
         A3                           ref    0.98         40.37     -24.99         -24.53          2.04           4.09          3.44     25358
         A4                           ref    1.01         42.23     -25.06         -24.60          2.17           4.20          3.55     25360          Shore                    Avg Con
         A5          ALG01                   3.05         1.88      -24.34         -23.88          0.17          -1.65         -2.30     25362      c       -1.26                   -27.22
         A6          Lk Outlet Alg           3.06         31.55     -30.17         -29.71          0.92           0.87          0.22     25364               1.26                     0.32
         A7          ALG03                   2.91         6.85      -21.11         -20.65          0.48          -0.97         -1.62     25366      c
         A8          ALG05                   2.91         35.56     -28.05         -27.59          2.30           0.59         -0.06     25368
         A9          ALG07                   3.04         33.49     -29.56         -29.10          1.68           0.79          0.14     25370
         A10         ALG06                   2.95         41.17     -27.32         -26.86          1.97           2.71          2.06     25372
         B1          ALG04                   3.01         43.74     -27.50         -27.04          1.36           0.99          0.34     25374      c               SUMMARY OUTPUT
         B2          ALG02                     3          4.51      -22.68         -22.22          0.34           4.31          3.66     25376
         B3          ALG01                   2.99         1.59      -24.58         -24.12          0.15          -1.69         -2.34     25378      c                Regression Statistics
         B4          ALG03                   2.92         4.37      -21.06         -20.60          0.34          -1.52         -2.17     25380      c               Multiple R 0.283158
         B5          ALG07                    2.9         33.58     -29.44         -28.98          1.74           0.62         -0.03     25382                      R Square 0.080178
         B6                           ref    1.01         44.94     -25.00         -24.54          2.59           3.96          3.31     25384                      Adjusted R Square
                                                                                                                                                                                -0.022024
         B7                           ref    0.99         42.28     -24.87         -24.41          2.37           4.33          3.68     25386                      Standard Error
                                                                                                                                                                                 1.906378
         B8          Lk Outlet Alg           3.04         31.43     -29.69         -29.23          1.07           0.95          0.30     25388                      Observations         11
         B9          ALG06                   3.09         35.57     -27.26         -26.80          1.96           2.79          2.14     25390
         B10         ALG02                   3.05         5.52      -22.31         -21.85          0.45           4.72          4.07     25392                      ANOVA
         C1          ALG04                   2.98         37.90     -27.42         -26.96          1.36           1.21          0.56     25394      c                                df         SS      MS        F Significance F
         C2          ALG05                   3.04         31.74     -27.93         -27.47          2.40           0.73          0.08     25396                      Regression             1 2.851116 2.851116 0.784507 0.398813
         C3                           ref    0.99         38.46     -25.09         -24.63          2.40           4.37          3.72     25398                      Residual               9 32.7085 3.634278
                                                          23.78                                    1.17                                                             Total                 10 35.55962

                                                                                                                                                                              Coefficients
                                                                                                                                                                                        Standard Error t Stat P-value Lower 95%Upper 95%Lower 95.0%
                                                                                                                                                                                                                                                  Upper 95.0%
                                                                                                                                                                    Intercept -4.297428 4.671099 -0.920003 0.381568 -14.8642 6.269341 -14.8642 6.269341
                                                                                                                                                                    X Variable 1-0.158022 0.17841 -0.885724 0.398813 -0.561612 0.245569 -0.561612 0.245569
DATA HANGOVER



What happened?
From Flickr by SteveMcN
Where data end up
From Flickr by diylibrarian
www
blog.order2disorder.com
From Flickr by csessums
Data
Metadata
Recreated from Klump et al. 2006
Who cares?
From Flickr by Redden-McAllister
From Flickr by AJC1
www.rba.gov.au
Where data end up
From Flickr by diylibrarian
www
Data
Metadata
From Flickr by torkildr
Recreated from Klump et al. 2006
Data Reuse
Data Sharing
Data Management
Trends in Data Archiving
Journal publishers
    Joint Data Archiving Agreement
Data Papers etc.
    Ecological Archives, Beyond the PDF
Funders
    Data management requirements
Roadmap
1.  Background
2.  Mistakes we make
3.  How to improve
4.  Toolbox
Best Practices for Data Management
    1.  Planning
    2.  Data collection & organization
    3.  Quality control & assurance
    4.  Metadata
    5.  Workflows
    6.  Data stewardship & reuse
2. Data collection & organization
Create unique identifiers
     •  Decide on naming scheme early
     •  Create a key
     •  Different for each sample
From Flickr by zebbie
From Flickr by sjbresnahan
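
The three points above can be sketched in a few lines of Python. This is only an illustration under assumptions: the ID pattern (site code, year, running number) and the site codes in `SITE_KEY` are hypothetical, not a scheme taken from the slides.

```python
from collections import Counter


def make_sample_id(site: str, year: int, sequence: int) -> str:
    """Build an ID such as 'WCL-2010-001' from a site code, a year, and a running number."""
    return f"{site.upper()}-{year}-{sequence:03d}"


def find_duplicates(sample_ids):
    """Return the IDs that occur more than once, in sorted order."""
    counts = Counter(sample_ids)
    return sorted(sid for sid, n in counts.items() if n > 1)


# The key records what each part of the ID means (hypothetical site codes).
SITE_KEY = {"WCL": "Wash Cresc Lake", "LKO": "Lake outlet"}

# One ID per sample, decided before any data are collected.
ids = [make_sample_id("wcl", 2010, n) for n in range(1, 4)]
```

Running `find_duplicates` over the sample column of an existing sheet is a quick way to spot labels that were reused for different samples.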
  
2. Data collection & organization
Standardize
     •  Consistent within columns
          – only numbers, dates, or text
     •  Consistent names, codes, formats
Modified from K. Vanderbilt
From Pink Floyd, The Wall
themurkyfringe.com
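
A column that mixes numbers and text can be found automatically. The sketch below assumes the sheet has already been read into a header list and a list of string rows; the function names are hypothetical, and dates are simply treated as text here.

```python
def classify(value: str) -> str:
    """Label a raw cell as 'empty', 'number', or 'text'."""
    text = value.strip()
    if not text:
        return "empty"
    try:
        float(text)
        return "number"
    except ValueError:
        return "text"


def mixed_columns(rows, header):
    """Return {column name: kinds found} for columns that mix numbers and text."""
    problems = {}
    for i, name in enumerate(header):
        kinds = {classify(row[i]) for row in rows} - {"empty"}
        if len(kinds) > 1:
            problems[name] = kinds
    return problems
```

A weight column holding `3.05`, `3`, and `n/a` would be reported, because `n/a` is text in an otherwise numeric column.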
  
2. Data collection & organization
Standardize
     •  Reduce possibility of manual error by constraining entry choices
Excel lists
Data validation
Google Docs Forms
Modified from K. Vanderbilt
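
Outside a spreadsheet, the same idea of constraining entry choices can be sketched as a check against a fixed vocabulary. The allowed values below are hypothetical; a real project would define its own list.

```python
ALLOWED_SAMPLE_TYPES = {"algal", "sediment", "water"}  # hypothetical vocabulary


def validate_entry(value: str, allowed=ALLOWED_SAMPLE_TYPES) -> str:
    """Return the normalized entry, or raise if it is not an allowed choice."""
    cleaned = value.strip().lower()
    if cleaned not in allowed:
        raise ValueError(f"{value!r} is not one of {sorted(allowed)}")
    return cleaned
```

Rejecting the entry at the moment it is typed is the point: a misspelling such as `Algae` fails immediately instead of becoming a second category in the data.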
  	
  
2. Data collection & organization
Create parameter table
Create a site table
From doi:10.3334/ORNLDAAC/777
From R Cook, ESA Best Practices Workshop 2010
2. Data collection & organization
Use descriptive file names*
     •  Unique
     •  Reflect contents
Bad:    Mydata.xls
        2001_data.csv
        best version.txt
Better: Eaffinis_nanaimo_2010_counts.xls
        (study organism _ site name _ year _ what was measured)
*Not for everyone
From R Cook, ESA Best Practices Workshop 2010
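
The "Better" pattern above (study organism, site name, year, what was measured) can be generated rather than typed by hand. A minimal sketch, assuming underscore-separated parts; the function name `descriptive_name` is hypothetical.

```python
import re


def descriptive_name(organism, site, year, measured, ext="csv"):
    """Join the four components with underscores after removing spaces and punctuation."""
    parts = [organism, site, str(year), measured]
    cleaned = [re.sub(r"[^A-Za-z0-9]+", "", part) for part in parts]
    if not all(cleaned):
        raise ValueError("every name component must contain letters or digits")
    return "_".join(cleaned) + "." + ext
```

Calling `descriptive_name("Eaffinis", "nanaimo", 2010, "counts", "xls")` reproduces the file name shown on the slide.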
  
2. Data collection & organization
Organize files logically
Biodiversity
    Lake
        Experiments
            Biodiv_H20_heatExp_2005to2008.csv
            Biodiv_H20_predatorExp_2001to2003.csv
            …
        Field work
            Biodiv_H20_PlanktonCount_2001toActive.csv
            Biodiv_H20_ChlAprofiles_2003.csv
            …
    Grassland
From S. Hampton
2. Data collection & organization
Preserve information
 •  Keep raw data raw
 •  Use scripts to process data & save them with data
Raw data as .csv
R script for processing & analysis
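
The slide pairs a raw .csv with an R script; the sketch below shows the same pattern in Python as a stand-in. It assumes a column called `value` (a hypothetical name) and drops rows where it is empty. The raw file is only ever read, and the result goes to a separate file.

```python
import csv
from pathlib import Path


def process(raw_path, out_path):
    """Read the raw CSV, drop rows with an empty 'value', and write a new file."""
    raw_path, out_path = Path(raw_path), Path(out_path)
    if raw_path.resolve() == out_path.resolve():
        raise ValueError("refusing to overwrite the raw data file")
    with raw_path.open(newline="") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames
        rows = [row for row in reader if (row.get("value") or "").strip()]
    with out_path.open("w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)  # number of rows kept
```

Because every change lives in the script, the processing can be rerun from the untouched raw file at any time, and the script itself documents what was done.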
  
2. Data collection & organization
All of the things that make Excel great for data organization are bad for archiving!
What to do?
1.    Create archive-ready raw data
2.    Put it somewhere special
3.    Have your fun with fancy Excel techniques
4.    Keep archiving in mind
3. Quality control and quality assurance
 Define & enforce standards
 Double data entry
 Document changes
 Minimize manual data entry
 No missing, impossible, or anomalous values
        •  Perform statistical summaries
        •  Use illegal data filter
        •  Look for outliers
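
The last two checks can be sketched with the standard library. Two assumptions are mine, not the slide's: an "illegal" value is one that is missing or outside a range you supply, and outliers are flagged with the common 1.5 × interquartile-range rule.

```python
import statistics


def illegal_values(values, low, high):
    """Return (index, value) pairs that are missing or outside [low, high]."""
    return [(i, v) for i, v in enumerate(values)
            if v is None or not (low <= v <= high)]


def outliers(values, k=1.5):
    """Flag values more than k interquartile ranges beyond the quartiles."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    spread = q3 - q1
    return [v for v in present if v < q1 - k * spread or v > q3 + k * spread]
```

A flagged value is a prompt to look at the field notes, not an instruction to delete it; any correction should be recorded under "Document changes".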
  
4. Metadata basics
What is metadata?
Data reporting
      WHO created the data?
      WHAT is the content of the data set?
      WHEN was it created?
      WHERE was it collected?
      HOW was it developed?
      WHY was it developed?
4. Metadata basics
•    Digital context
      •     Name of the data set
      •     The name(s) of the data file(s) in the data set
      •     Date the data set was last modified
      •     Example data file records for each data type file
      •     Pertinent companion files
      •     List of related or ancillary data sets
      •     Software (including version number) used to prepare/read the data set
      •     Data processing that was performed
•    Personnel & stakeholders
      •     Who collected
      •     Who to contact with questions
      •     Funders
•    Scientific context
      •     Scientific reason why the data were collected
      •     What data were collected
      •     What instruments (including model & serial number) were used
      •     Environmental conditions during collection
      •     Where collected & spatial resolution
      •     When collected & temporal resolution
      •     Standards or calibrations used
•    Information about parameters
      •     How each was measured or produced
      •     Units of measure
      •     Format used in the data set
      •     Precision & accuracy if known
•    Information about data
      •     Definitions of codes used
      •     Quality assurance & control measures
      •     Known problems that limit data use (e.g. uncertainty, sampling problems)
•    How to cite the data set
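
One lightweight way to make sure none of these questions is skipped is to keep a small machine-readable record next to each data file. The sketch below is not EML or any other metadata standard; the six field names are an assumption borrowed from the who/what/when/where/how/why questions, and the values are placeholders.

```python
import json

REQUIRED = ("who", "what", "when", "where", "how", "why")


def missing_fields(record):
    """List the required questions that the record leaves unanswered."""
    return [key for key in REQUIRED if not str(record.get(key) or "").strip()]


def to_sidecar(record):
    """Serialize a complete record as JSON text to store next to the data file."""
    gaps = missing_fields(record)
    if gaps:
        raise ValueError(f"metadata incomplete, missing: {gaps}")
    return json.dumps(record, indent=2, sort_keys=True)


# Placeholder values only; replace them with the real description of the data set.
example = {
    "who": "<collector and contact person>",
    "what": "<content of the data set, units, codes>",
    "when": "<collection dates and temporal resolution>",
    "where": "<site and spatial resolution>",
    "how": "<instruments, methods, processing>",
    "why": "<scientific reason for collecting>",
}
```

A record like this is a starting point; a repository will usually ask for the same information in its own standard.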
  
4. Metadata basics
What is a metadata standard?
•  Provides structure to describe data
        Common terms | definitions | language | structure
•  Lots of different standards
        EML, FGDC, ISO19115, DarwinCore, …
•  Tools for creating metadata files
        Morpho (EML), Metavist (FGDC), NOAA MERMaid (CSDGM)
4. Metadata basics
What does a metadata record look like?
5. Workflows
Simplest workflows: commented scripts, flow charts
Temperature data, Salinity data
    → Data import into R → Data in R format
    → Quality control & data cleaning → “Clean” T & S data
    → Analysis: mean, SD → Summary statistics
    → Graph production
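
The flow chart above maps directly onto a commented script. The slide uses R; this Python sketch follows the same steps under assumptions of my own: the column names, the plausible ranges in `LIMITS`, and the four sample rows are invented for illustration, and the graph step is left out so the script needs only the standard library.

```python
import csv
import io
import statistics


def import_data(text):
    """Step 1: parse CSV text into rows of floats (None where a cell is empty)."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({k: float(v) if v.strip() else None for k, v in row.items()})
    return rows


def clean(rows, limits):
    """Step 2: keep rows where every variable is present and inside its limits."""
    kept = []
    for row in rows:
        ok = all(row[name] is not None and lo <= row[name] <= hi
                 for name, (lo, hi) in limits.items())
        if ok:
            kept.append(row)
    return kept


def summarize(rows):
    """Step 3: mean and sample standard deviation for each variable."""
    summary = {}
    for name in rows[0]:
        values = [r[name] for r in rows]
        summary[name] = (statistics.mean(values), statistics.stdev(values))
    return summary


# Invented example input: one impossible temperature and one missing salinity.
RAW = """temperature,salinity
12.0,33.0
14.0,35.0
99.0,34.0
13.0,
"""
LIMITS = {"temperature": (-2.0, 40.0), "salinity": (0.0, 42.0)}  # assumed ranges

summary = summarize(clean(import_data(RAW), LIMITS))
```

Each function corresponds to one box in the chart, so the script doubles as documentation of the workflow.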
  
5. Workflows
Fancy Schmancy: Kepler
Resulting output
https://kepler-project.org
5. Workflows
Workflows enable
        Reproducibility
               can someone independently validate findings?
        Transparency
               others can understand how you arrived at your results
        Executability
               others can re-run or re-use your analysis
From Flickr by merlinprincesse
6. Data stewardship & reuse
The 20-Year Rule
The metadata accompanying a data set should be written for a user 20 years into the future
(National Research Council 1991)
From Flickr by greensambaman
6. Data stewardship & reuse
Use stable formats
        csv, txt, tiff
Create back-up copies
        original, near, far
Periodically test ability to restore information
Modified from R. Cook
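
One way to test that a back-up can actually be restored is to compare checksums of the original and each copy. A minimal sketch, assuming the copies are reachable as ordinary file paths; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copies(original, copies):
    """Return {copy path: True if its contents match the original}."""
    expected = sha256_of(original)
    return {str(copy): sha256_of(copy) == expected for copy in copies}
```

Running this on a schedule against the "near" and "far" copies catches silent corruption before the original is lost.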
  
6. Data stewardship & reuse
Where do I put it?
        Institutional archive
        Discipline/specialty archive
        DataCite list of repositories:
            www.datacite.org/repolist
From Flickr by torkildr
6. Data stewardship & reuse
Data Citation: Why everyone should do it
        Allow readers to find data products
        Get credit for data and publications
        Promote reproducibility
        Better measure of research impact
Example:
Sidlauskas, B. 2007. Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
Learn more at www.datacite.org
Modified from R. Cook
Best Practices for Data Management
    1.  Planning
    2.  Data collection & organization
    3.  Quality control & assurance
    4.  Metadata
    5.  Workflows
    6.  Data stewardship & reuse
    7.  Planning
1. Planning
What is a data management plan?
A document that describes what you will do with your data during and after you complete your research
DATA HANGOVER
1. Planning
Why should I prepare a DMP?
        Saves time
        Increases efficiency
        Easier to use data
        Others can understand & use data
        Credit for data products
        Funders require it
NSF DMP Requirements
From Grant Proposal Guidelines:
DMP supplement may include:
  1.  the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project
  2.  the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies)
  3.  policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements
  4.  policies and provisions for re-use, re-distribution, and the production of derivatives
  5.  plans for archiving data, samples, and other research products, and for preservation of access to them
1.  Types of data & other information
•  Types of data produced
•  Relationship to existing data
•  How/when/where will the data be captured or created?
•  How will the data be processed?
•  Quality assurance & quality control measures
•  Security: version control, backing up
•  Who will be responsible for data management during/after project?
C. Strasser
biology.kenyon.edu
From Flickr by Lazurite
2.  Data & metadata standards
•  What metadata are needed to make the data meaningful?
•  How will you create or capture these metadata?
•  Why have you chosen particular standards and approaches for metadata?
Wired.com
3.  Policies for access & sharing
4.  Policies for re-use & re-distribution
•  Are you under any obligation to share data?
•  How, when, & where will you make the data available?
•  What is the process for gaining access to the data?
•  Who owns the copyright and/or intellectual property?
•  Will you retain rights before opening data to wider use? How long?
•  Are permission restrictions necessary?
•  Embargo periods for political/commercial/patent reasons?
•  Ethical and privacy issues?
•  Who are the foreseeable data users?
•  How should your data be cited?
5.  Plans for archiving & preservation
•  What data will be preserved for the long term? For how long?
•  Where will data be preserved?
•  What data transformations need to occur before preservation?
•  What metadata will be submitted alongside the datasets?
•  Who will be responsible for preparing data for preservation? Who will be the main contact person for the archived data?
From Flickr by theManWhoSurfedTooMuch
Don’t forget: Budget
•  Costs of data preparation & documentation
           Hardware, software
           Personnel
           Archive fees
•  How costs will be paid
           Request funding!
dorrvs.com
NSF’s Vision*
    DMPs and their evaluation will grow & change over time (similar to broader impacts)
    Peer review will determine next steps
    Community-driven guidelines
           –  Different disciplines have different definitions of acceptable data sharing
           –  Flexibility at the directorate and division levels
           –  Tailor implementation of DMP requirement
    Evaluation will vary with directorate, division, & program officer
*Unofficially
Help from Jennifer Schopf, NSF
NSF’s Vision*
 DMPs are a good first step towards improving data stewardship
        –  starting discussion
        –  scientists learning about data management
 Additional expertise on panels to effectively evaluate DMPs (?)
 Working group will assess outcomes
*Unofficially
Roadmap
1.  Background
2.  Mistakes we make
3.  How to improve
4.  Toolbox
DMPTool: dmp.cdlib.org
Step-by-step wizard for generating DMP
Create | edit | re-use | share | save | generate
Open to community
Links to institutional resources
Directorate information & updates
E-notebooks
•    NoteBook
•    ORNL eNote
•    Evernote
•    Google Docs
•    Blogs
•    Wikis
•    TheLabNotebook.com
•    iPad ELN (the flexible electronic laboratory notebook)
•    NoteBookMaker
CDL Services for UC Community
Where should I put my data?
Data Repository
Deposit | Manage | Share | Preserve
www.cdlib.org/services/uc3
CDL Services for UC Community
Create & manage persistent identifiers
   •     Precise identification of a dataset
   •     Credit to data producers and data publishers
   •     A link from the traditional literature to the data
   •     Research metrics for datasets
Example:
Sidlauskas, B. 2007. Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
www.cdlib.org/services/uc3
Why are you promoting Excel?
•    Open source add-in
•    Facilitate data management, sharing, archiving for scientists
•    Part of DataONE investigator toolkit
•    Collecting requirements for add-in from scientists, data centers, libraries
dcxl.cdlib.org
Funders: Gordon and Betty Moore Foundation, Microsoft Research
Why are you promoting Excel?
•    Everyone uses it
•    Features that make it good for data organization make it bad for archiving
•    Stopgap measure
www.dataone.org
•    Data Education Tutorials
•    Database of best practices & software tools
•    Links to DMPTool
•    Primer on data management
From Flickr by Robert Hruzek
Data Management 101
dcxl.cdlib.org
•    Data Education Tutorials
•    Primer on data management
•    Other resources
Toolbox:
DCXL blog: dcxl.cdlib.org
Lisa Federer
dcxl.cdlib.org
@dcxlCDL
www.facebook.com/DCXLatCDL
www.carlystrasser.net
carlystrasser@gmail.com
@carlystrasser

UCLA: Data Management for Scientists

  • 1. Data  Management  for  Scientists     Reduce  your  workload   Reuse  your  ideas   Recycle  your  data     www.oddee.com   Carly  Strasser,  PhD   California  Digital  Library,  UC  Office  of  the  President   carly.strasser@ucop.edu   www.carlystrasser.net  
  • 2. Roadmap   4.  Toolbox     3.  How  to  improve   2.  Mistakes  we  make   1.  Background    
  • 3. NSF funded DataNet Project, Office of Cyberinfrastructure. Cyberinfrastructure | Community Engagement & Outreach. From Flickr by wetwebwork; Courtesy of DataONE
  • 4. What role can libraries play in data education? Why don’t people share data? What barriers to sharing can we eliminate? Is data management being taught? Do attitudes about sharing differ among disciplines? How can we promote storing data in repositories?
  • 5. Roadmap   4.  Toolbox     3.  How  to  improve   2.  Mistakes  we  make   1.  Background    
  • 6. From Flickr by DW0825; From Flickr by Flickmor; From Flickr by deltaMike; Digital data; www.woodrow.org; C. Strasser; Courtesy of WHOI; From Flickr by US Army Environmental Command
  • 7. Digital  data   +     Complex  analyses  
  • 8. Data   Models   Maximum   Likelihood   estimation   Matrix   Models   Images   Tables   Paper  
  • 9. UGLY TRUTH: Many Earth | Environmental | Ecological scientists… are not taught data management; don’t know what metadata are; can’t name data centers or repositories; don’t share data publicly or store it in an archive; aren’t convinced they should share data. (5shortessays.blogspot.com)
  • 10. 2  tables   Random  notes   C:Documents and SettingshamptonMy DocumentsNCEAS Distributed Graduate Seminars[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1 Stable Isotope Data Sheet Sampling Site / Identifier: Wash Cresc Lake Peter's lab Don't use - old data Sample Type: Algal Washed Rocks Date: Dec. 16 Tray ID and Sequence: Tray 004 13 15 Reference statistics: SD for delta C = 0.07 SD for delta N = 0.15 Position SampleID Weight (mg) %C delta 13C delta 13C_ca %N delta 15N delta 15N_ca Spec. No. A1 ref 0.98 38.27 -25.05 -24.59 1.96 4.12 3.47 25354 A2 ref 0.98 39.78 -25.00 -24.54 2.03 4.01 3.36 25356 A3 ref 0.98 40.37 -24.99 -24.53 2.04 4.09 3.44 25358 A4 ref 1.01 42.23 -25.06 -24.60 2.17 4.20 3.55 25360 Shore Avg Con A5 ALG01 3.05 1.88 -24.34 -23.88 0.17 -1.65 -2.30 25362 c -1.26 -27.22 A6 Lk Outlet Alg 3.06 31.55 -30.17 -29.71 0.92 0.87 0.22 25364 1.26 0.32 A7 ALG03 2.91 6.85 -21.11 -20.65 0.48 -0.97 -1.62 25366 c A8 ALG05 2.91 35.56 -28.05 -27.59 2.30 0.59 -0.06 25368 A9 ALG07 3.04 33.49 -29.56 -29.10 1.68 0.79 0.14 25370 A10 ALG06 2.95 41.17 -27.32 -26.86 1.97 2.71 2.06 25372 B1 ALG04 3.01 43.74 -27.50 -27.04 1.36 0.99 0.34 25374 c B2 ALG02 3 4.51 -22.68 -22.22 0.34 4.31 3.66 25376 B3 ALG01 2.99 1.59 -24.58 -24.12 0.15 -1.69 -2.34 25378 c B4 ALG03 2.92 4.37 -21.06 -20.60 0.34 -1.52 -2.17 25380 c B5 ALG07 2.9 33.58 -29.44 -28.98 1.74 0.62 -0.03 25382 B6 ref 1.01 44.94 -25.00 -24.54 2.59 3.96 3.31 25384 B7 ref 0.99 42.28 -24.87 -24.41 2.37 4.33 3.68 25386 B8 Lk Outlet Alg 3.04 31.43 -29.69 -29.23 1.07 0.95 0.30 25388 B9 ALG06 3.09 35.57 -27.26 -26.80 1.96 2.79 2.14 25390 B10 ALG02 3.05 5.52 -22.31 -21.85 0.45 4.72 4.07 25392 C1 ALG04 2.98 37.90 -27.42 -26.96 1.36 1.21 0.56 25394 c C2 ALG05 3.04 31.74 -27.93 -27.47 2.40 0.73 0.08 25396 C3 ref 0.99 38.46 -25.09 -24.63 2.40 4.37 3.72 25398 23.78 1.17 From  Stephanie  Hampton  (2010)       ESA  Workshop  on  Best  Practices  
  • 11. Wash Cres Lake Dec 15 Dont_Use.xls [spreadsheet screenshot repeated from slide 10] From Stephanie Hampton (2010), ESA Workshop on Best Practices
  • 12. [Spreadsheet screenshot repeated from slide 10, with a transposed copy of the sample data, regression SUMMARY OUTPUT (Regression Statistics, ANOVA, coefficients), and an embedded chart overlaid on the sheet]
  • 13. Random stats output   C:\Documents and Settings\hampton\My Documents\NCEAS Distributed Graduate Seminars\[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1
    Stable Isotope Data Sheet
    Sampling Site / Identifier: Wash Cresc Lake   Peter's lab   Don't use - old data
    Sample Type: Algal Washed Rocks   Date: Dec. 16   Tray ID and Sequence: Tray 004
    Reference statistics: SD for delta 13C = 0.07   SD for delta 15N = 0.15

    Position  SampleID       Weight (mg)  %C     delta 13C  delta 13C_ca  %N    delta 15N  delta 15N_ca  Spec. No.
    A1        ref            0.98         38.27  -25.05     -24.59        1.96   4.12       3.47         25354
    A2        ref            0.98         39.78  -25.00     -24.54        2.03   4.01       3.36         25356
    A3        ref            0.98         40.37  -24.99     -24.53        2.04   4.09       3.44         25358
    A4        ref            1.01         42.23  -25.06     -24.60        2.17   4.20       3.55         25360
    A5        ALG01          3.05          1.88  -24.34     -23.88        0.17  -1.65      -2.30         25362  c
    A6        Lk Outlet Alg  3.06         31.55  -30.17     -29.71        0.92   0.87       0.22         25364
    A7        ALG03          2.91          6.85  -21.11     -20.65        0.48  -0.97      -1.62         25366  c
    A8        ALG05          2.91         35.56  -28.05     -27.59        2.30   0.59      -0.06         25368
    A9        ALG07          3.04         33.49  -29.56     -29.10        1.68   0.79       0.14         25370
    A10       ALG06          2.95         41.17  -27.32     -26.86        1.97   2.71       2.06         25372
    B1        ALG04          3.01         43.74  -27.50     -27.04        1.36   0.99       0.34         25374  c
    B2        ALG02          3             4.51  -22.68     -22.22        0.34   4.31       3.66         25376
    B3        ALG01          2.99          1.59  -24.58     -24.12        0.15  -1.69      -2.34         25378  c
    B4        ALG03          2.92          4.37  -21.06     -20.60        0.34  -1.52      -2.17         25380  c
    B5        ALG07          2.9          33.58  -29.44     -28.98        1.74   0.62      -0.03         25382
    B6        ref            1.01         44.94  -25.00     -24.54        2.59   3.96       3.31         25384
    B7        ref            0.99         42.28  -24.87     -24.41        2.37   4.33       3.68         25386
    B8        Lk Outlet Alg  3.04         31.43  -29.69     -29.23        1.07   0.95       0.30         25388
    B9        ALG06          3.09         35.57  -27.26     -26.80        1.96   2.79       2.14         25390
    B10       ALG02          3.05          5.52  -22.31     -21.85        0.45   4.72       4.07         25392
    C1        ALG04          2.98         37.90  -27.42     -26.96        1.36   1.21       0.56         25394  c
    C2        ALG05          3.04         31.74  -27.93     -27.47        2.40   0.73       0.08         25396
    C3        ref            0.99         38.46  -25.09     -24.63        2.40   4.37       3.72         25398

    Unlabeled cells beside the table: "Shore Avg Con" above rows A5 and A6, with -1.26 and -27.22 beside A5 and 1.26 and 0.32 beside A6; 23.78 and 1.17 beside the Residual row. The trailing "c" in some rows has no column header.

    SUMMARY OUTPUT (pasted into the same sheet, alongside rows B1 to C3)
    Regression Statistics
    Multiple R          0.283158
    R Square            0.080178
    Adjusted R Square  -0.022024
    Standard Error      1.906378
    Observations        11

    ANOVA
                df  SS        MS        F         Significance F
    Regression  1   2.851116  2.851116  0.784507  0.398813
    Residual    9   32.7085   3.634278
    Total       10  35.55962

                  Coefficients  Standard Error  t Stat     P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
    Intercept     -4.297428     4.671099        -0.920003  0.381568  -14.8642   6.269341   -14.8642     6.269341
    X Variable 1  -0.158022     0.17841         -0.885724  0.398813  -0.561612  0.245569   -0.561612    0.245569
  • 14. DATA HANGOVER What  happened?   From  Flickr  by  SteveMcN  
  • 15. Where  data  end  up   From  Flickr  by  diylibrarian   www blog.order2disorder.com   From  Flickr  by  csessums   Data   Metadata   From  Flickr  by  csessums   Recreated  from  Klump  et  al.  2006  
  • 16. Who cares?   From Flickr by Redden-McAllister   From Flickr by AJC1   www.rba.gov.au
  • 17. Where  data  end  up   From  Flickr  by  diylibrarian   www Data   www Metadata   From  Flickr  by  torkildr   Recreated  from  Klump  et  al.  2006  
  • 18. Data   Reuse   Data   Sharing   Data   Management  
  • 19. Trends  in  Data  Archiving   Journal  publishers   Joint  Data  Archiving  Agreement  
  • 20. Trends  in  Data  Archiving   Journal  publishers   Joint  Data  Archiving  Agreement     Data  Papers   Ecological  Archives,  Beyond  the  PDF  
  • 21. Trends  in  Data  Archiving   Journal  publishers   Joint  Data  Archiving  Agreement     Data  Papers  etc.   Ecological  Archives,  Beyond  the  PDF     Funders   Data  management  requirements    
  • 22. Roadmap   4.  Toolbox     3.  How  to  improve   2.  Mistakes  we  make   1.  Background    
  • 23. Best  Practices  for  Data  Management   1.  Planning   2.  Data  collection  &  organization   3.  Quality  control  &  assurance   4.  Metadata   5.  Workflows   6.  Data  stewardship  &  reuse  
  • 24. 2.  Data  collection  &  organization   Create  unique  identifiers   •  Decide  on  naming  scheme  early   •  Create  a  key   •  Different  for  each  sample   From  Flickr  by  zebbie   From  Flickr  by  sjbresnahan  
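One possible naming scheme, sketched in Python: a site code, the collection date, and a running number are joined into an identifier, and a helper confirms that no identifier is used twice. The scheme itself and the names `make_sample_id` and `assert_unique` are illustrative assumptions, not something the slides prescribe.

```python
from datetime import date

def make_sample_id(site: str, collected: date, seq: int) -> str:
    """Build an ID such as 'NAN-20100615-003' (hypothetical scheme)."""
    return f"{site.upper()}-{collected:%Y%m%d}-{seq:03d}"

def assert_unique(ids: list[str]) -> None:
    """Raise ValueError if any ID is assigned to more than one sample."""
    seen = set()
    for sample_id in ids:
        if sample_id in seen:
            raise ValueError(f"duplicate sample ID: {sample_id}")
        seen.add(sample_id)

# Three samples taken at the same (hypothetical) site on the same day
ids = [make_sample_id("nan", date(2010, 6, 15), n) for n in range(1, 4)]
assert_unique(ids)
print(ids)
```

Writing the scheme down as a function doubles as the "key": anyone reading the ID can look up how it was built.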
  • 25. 2.  Data  collection  &  organization   Standardize   •  Consistent  within  columns   – only  numbers,  dates,  or  text   •  Consistent  names,  codes,  formats   Modified  from  K.  Vanderbilt     From  Pink  Floyd,  The  Wall      themurkyfringe.com  
  • 26. 2. Data collection & organization   Standardize   • Reduce possibility of manual error by constraining entry choices   Excel lists   Data validation   Google Docs Forms   Modified from K. Vanderbilt
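Outside of Excel lists or form widgets, the same idea of constraining entry choices can be approximated after the fact by checking a column against a controlled vocabulary. The sketch below is a minimal illustration; the allowed values and the function name `invalid_entries` are assumptions made for the example.

```python
# Hypothetical controlled vocabulary for a "habitat" column
ALLOWED_HABITATS = {"Lake", "Grassland"}

def invalid_entries(values, allowed):
    """Return (row_number, value) pairs whose value is not in the allowed set."""
    return [(i, v) for i, v in enumerate(values, start=1) if v not in allowed]

# Typical manual-entry slips: wrong capitalization and a misspelling
entered = ["Lake", "lake", "Grassland", "Grasland"]
print(invalid_entries(entered, ALLOWED_HABITATS))
```

Reporting the row number along with the bad value makes it easy to correct the entry at its source rather than silently recoding it.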
  • 27. 2.  Data  collection  &  organization       Create  parameter  table   Create  a  site  table   From  doi:10.3334/ORNLDAAC/777   From  doi:10.3334/ORNLDAAC/777   From  R  Cook,  ESA  Best  Practices  Workshop  2010  
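A parameter table and a site table let each observation carry only short codes, with the descriptions stored once. The following Python sketch shows the lookup; every code, name, and value in it is a hypothetical placeholder rather than content from the cited data set.

```python
# Site table: one entry per site, keyed by a short site code (placeholder values)
SITES = {"S1": {"name": "Example lake site", "lat": 0.0, "lon": 0.0}}

# Parameter table: one entry per measured variable, with description and units
PARAMETERS = {"temp": {"description": "Water temperature", "units": "degC"}}

# Observations refer to sites and parameters by code only
observations = [{"site": "S1", "parameter": "temp", "value": 12.1}]

def describe(obs):
    """Expand the codes in one observation using the two lookup tables."""
    site = SITES[obs["site"]]
    param = PARAMETERS[obs["parameter"]]
    return f'{site["name"]}: {param["description"]} = {obs["value"]} {param["units"]}'

for obs in observations:
    print(describe(obs))
```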
  • 28. 2. Data collection & organization   Use descriptive file names *   • Unique   • Reflect contents   Bad: Mydata.xls, 2001_data.csv, best version.txt   Better: Eaffinis_nanaimo_2010_counts.xls (study organism, site name, year, what was measured)   *Not for everyone   From R Cook, ESA Best Practices Workshop 2010
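A file name in the "Better" pattern can be assembled from its parts so that every file in a project follows the same order. This is a small sketch; the function name `descriptive_name` and the choice to strip non-alphanumeric characters are assumptions for illustration.

```python
import re

def descriptive_name(organism, site, year, measured, ext="csv"):
    """Join organism, site, year, and measurement into one file name."""
    parts = [organism, site, str(year), measured]
    # Drop spaces and punctuation inside each part so "_" is the only separator
    cleaned = [re.sub(r"[^A-Za-z0-9]+", "", p) for p in parts]
    return "_".join(cleaned) + "." + ext

print(descriptive_name("Eaffinis", "nanaimo", 2010, "counts", ext="xls"))
```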
  • 29. 2.  Data  collection  &  organization   Organize  files    logically   Biodiversity   Lake   Experiments   Biodiv_H20_heatExp_2005to2008.csv   Biodiv_H20_predatorExp_2001to2003.csv   …   Field  work   Biodiv_H20_PlanktonCount_2001toActive.csv   Biodiv_H20_ChlAprofiles_2003.csv   …     Grassland   From  S.  Hampton  
  • 30. 2.  Data  collection  &  organization    Preserve  information   R  script  for  processing  &   analysis   •  Keep  raw  data  raw   •  Use  scripts  to  process  data      &  save  them  with  data   Raw  data  as  .csv  
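The slide refers to an R script; the same pattern is sketched here in Python. The raw file is only ever read, and the processed result is written to a separate file. File names, the `count` column, and the rule of dropping rows with an empty count are all assumptions made for the example.

```python
import csv
import stat
import tempfile
from pathlib import Path

def process(raw_path: Path, out_path: Path) -> int:
    """Copy rows with a non-empty 'count' to a new file; return rows kept."""
    kept = 0
    with raw_path.open(newline="") as src, out_path.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row["count"].strip():
                writer.writerow(row)
                kept += 1
    return kept

# Set up a small hypothetical raw file in a temporary directory
workdir = Path(tempfile.mkdtemp())
raw = workdir / "plankton_raw.csv"
raw.write_text("site,count\nA,12\nB,\nC,7\n")
raw.chmod(stat.S_IREAD)  # mark the raw file read-only as a reminder

kept = process(raw, workdir / "plankton_clean.csv")
print(kept, "rows kept; raw file left untouched")
```

Saving a script like this next to the data documents exactly how the cleaned file was derived from the raw one.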
  • 31. 2. Data collection & organization   All of the things that make Excel great for data organization are bad for archiving!   What to do?   1. Create archive-ready raw data   2. Put it somewhere special   3. Have your fun with fancy Excel techniques   4. Keep archiving in mind
  • 32. 3. Quality control and quality assurance   Define & enforce standards   Double data entry   Document changes   Minimize manual data entry   No missing, impossible, or anomalous values   • Perform statistical summaries   • Use illegal data filter   • Look for outliers   [Plot: x-axis 0 to 35, y-axis 0 to 60]
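The three checks listed on this slide can be sketched with Python's standard library. The plausible range, the 1.5 × IQR outlier rule, and the sample values below are assumptions chosen for illustration; appropriate limits depend on the variable being measured.

```python
import statistics

def illegal_values(values, low, high):
    """Illegal data filter: values outside the physically plausible range."""
    return [v for v in values if not (low <= v <= high)]

def outliers(values, k=1.5):
    """Values more than k * IQR below the first or above the third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]

# Hypothetical water temperatures; -999.0 stands in for a missing-value code
temps = [12.1, 11.8, 12.4, 12.0, 55.0, 11.9, -999.0, 12.3]

bad = illegal_values(temps, low=-5.0, high=40.0)
valid = [v for v in temps if v not in bad]

print("illegal:", bad)
print("mean of valid values:", round(statistics.mean(valid), 2))
print("outliers among valid values:", outliers(valid))
```

Flagged values are reported rather than deleted, so any change to the data can be documented.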
  • 33. 4.  Metadata  basics   What  is  metadata?  
  • 34. 4.  Metadata  basics   What  is  metadata?      Data  reporting     WHO  created  the  data?   WHAT  is  the  content  of  the  data  set?   WHEN  was  it  created?   WHERE  was  it  collected?   HOW  was  it  developed?   WHY  was  it  developed?  
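The six questions can serve as a checklist for a minimal metadata record. The Python sketch below stores the answers in a dictionary and reports which are still blank; it is not an implementation of EML or any other metadata standard, and all field values are placeholders.

```python
import json

REQUIRED = ("who", "what", "when", "where", "how", "why")

def missing_fields(record):
    """Return the required questions that are absent or left blank."""
    return [k for k in REQUIRED if not str(record.get(k, "")).strip()]

# Placeholder record with two gaps: 'where' is blank and 'why' is absent
record = {
    "who": "A. Researcher (placeholder)",
    "what": "Plankton counts per sample (placeholder)",
    "when": "2010",
    "where": "",
    "how": "Net tows, counted under a microscope (placeholder)",
}

print("still missing:", missing_fields(record))
print(json.dumps(record, indent=2))
```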
  • 35. 4. Metadata basics   • Scientific context   – Scientific reason why the data were collected   – What data were collected   – What instruments (including model & serial number) were used   – Environmental conditions during collection   – Where collected & spatial resolution   – When collected & temporal resolution   – Standards or calibrations used   • Information about parameters   – How each was measured or produced   – Units of measure   – Format used in the data set   – Precision & accuracy if known   • Information about data   – Definitions of codes used   – Quality assurance & control measures   – Known problems that limit data use (e.g. uncertainty, sampling problems)   • How to cite the data set   • Digital context   – Name of the data set   – The name(s) of the data file(s) in the data set   – Date the data set was last modified   – Example data file records for each data type file   – Pertinent companion files   – List of related or ancillary data sets   – Software (including version number) used to prepare/read the data set   – Data processing that was performed   • Personnel & stakeholders   – Who collected   – Who to contact with questions   – Funders
  • 36. 4. Metadata basics   What is a metadata standard?   • Provides structure to describe data   Common terms | definitions | language | structure   • Lots of different standards   EML, FGDC, ISO19115, DarwinCore,…   • Tools for creating metadata files   Morpho (EML), Metavist (FGDC), NOAA MERMaid (CSDGM)
  • 37. 4.  Metadata  basics   What  does  a  metadata  record  look  like?  
  • 38. 5. Workflows   Simplest workflows: commented scripts, flow charts   Temperature data + Salinity data → Data import into R → Data in R format → Quality control & data cleaning → "Clean" T & S data → Analysis: mean, SD → Summary statistics   Graph production
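A commented script is the simplest form of the flow chart on this slide. The slide's workflow uses R; the sketch below follows the same steps in Python (import, quality control, summary statistics) and leaves out graph production. The input values and the missing-value codes (empty cell, -999) are assumptions for the example.

```python
import csv
import io
import statistics

# Hypothetical raw input; empty cells and -999 are assumed missing-value codes
RAW = """temperature,salinity
12.1,30.2
12.4,30.5
-999,30.1
11.9,
"""
COLUMNS = ("temperature", "salinity")
MISSING = {"", "-999"}

def import_data(text):
    """Step 1: data import (from a string here, from a .csv file in practice)."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Step 2: quality control; drop rows with any missing value."""
    kept = []
    for row in rows:
        if any(row[c].strip() in MISSING for c in COLUMNS):
            continue
        kept.append({c: float(row[c]) for c in COLUMNS})
    return kept

def summarize(rows):
    """Step 3: analysis; mean and sample SD for each column."""
    summary = {}
    for c in COLUMNS:
        values = [r[c] for r in rows]
        summary[c] = (statistics.mean(values), statistics.stdev(values))
    return summary

clean_rows = clean(import_data(RAW))
summary = summarize(clean_rows)
for name, (mean, sd) in summary.items():
    print(f"{name}: mean={mean:.2f} SD={sd:.2f}")
```

Because each step is a named function, the script reads in the same order as the flow chart and can be re-run end to end on new data.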
  • 39. 5. Workflows   Fancy Schmancy: Kepler   Resulting output   https://kepler-project.org
  • 40. 5. Workflows   Workflows enable   Reproducibility: can someone independently validate findings?   Transparency: others can understand how you arrived at your results   Executability: others can re-run or re-use your analysis   From Flickr by merlinprincesse
  • 41. 6. Data stewardship & reuse   The 20-Year Rule   The metadata accompanying a data set should be written for a user 20 years into the future (National Research Council 1991)   From Flickr by greensambaman
  • 42. 6. Data stewardship & reuse   Use stable formats: csv, txt, tiff   Create back-up copies: original, near, far   Periodically test ability to restore information   Modified from R. Cook
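One way to test a back-up, sketched in Python: copy the file, then compare SHA-256 checksums of the original and the copy. The file names and directory layout are hypothetical, and a real "far" copy would live on different hardware or at another site rather than in a neighboring folder.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256(path: Path) -> str:
    """Checksum of a file, read in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def backup_and_verify(original: Path, backup_dir: Path) -> bool:
    """Copy the file into backup_dir and confirm the copy is identical."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    copy = backup_dir / original.name
    shutil.copy2(original, copy)
    return sha256(original) == sha256(copy)

# Hypothetical data file and back-up location inside a temporary directory
root = Path(tempfile.mkdtemp())
data_file = root / "counts_2010.csv"
data_file.write_text("site,count\nA,12\n")

ok = backup_and_verify(data_file, root / "backup_near")
print("backup verified:", ok)
```

Re-running the checksum comparison on a schedule is a simple way to test periodically that the copy can still be read.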
  • 43. 6. Data stewardship & reuse   Where do I put it?   Institutional archive   Discipline/specialty archive   DataCite list of repositories: www.datacite.org/repolist   From Flickr by torkildr
  • 44. 6.  Data  stewardship  &  reuse   Data  Citation:  Why  everyone  should  do  it   Allow  readers  to  find  data  products   Get  credit  for  data  and  publications   Promote  reproducibility   Better  measure  of  research  impact   Example:   Sidlauskas,  B.  2007.  Data  from:  Testing  for  unequal  rates  of  morphological   diversification  in  the  absence  of  a  detailed  phylogeny:  a  case  study  from   characiform  fishes.  Dryad  Digital  Repository.  doi:10.5061/dryad.20     Learn  more  at  www.datacite.org   Modified from R. Cook  
  • 45. Best  Practices  for  Data  Management   1.  Planning   2.  Data  collection  &  organization   3.  Quality  control  &  assurance   4.  Metadata   5.  Workflows   6.  Data  stewardship  &  reuse   7.  Planning  
  • 46. 1.  Planning   What  is  a  data  management  plan?   A  document  that  describes  what  you  will  do  with  your  data   during  and  after  you  complete  your  research   DATA HANGOVER
  • 47. 1.  Planning   Why  should  I  prepare  a  DMP?       Saves  time   Increases  efficiency   Easier  to  use  data       Others  can  understand  &  use  data   Credit  for  data  products   Funders  require  it    
  • 48. NSF DMP Requirements   From Grant Proposal Guidelines: DMP supplement may include:   1. the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project   2. the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies)   3. policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements   4. policies and provisions for re-use, re-distribution, and the production of derivatives   5. plans for archiving data, samples, and other research products, and for preservation of access to them
  • 49. 1. Types of data & other information   • Types of data produced   • Relationship to existing data   • How/when/where will the data be captured or created?   • How will the data be processed?   • Quality assurance & quality control measures   • Security: version control, backing up   • Who will be responsible for data management during/after project?   C. Strasser   biology.kenyon.edu   From Flickr by Lazurite
  • 50. 2. Data & metadata standards   • What metadata are needed to make the data meaningful?   • How will you create or capture these metadata?   • Why have you chosen particular standards and approaches for metadata?   Wired.com
  • 51. 3. Policies for access & sharing   4. Policies for re-use & re-distribution   • Are you under any obligation to share data?   • How, when, & where will you make the data available?   • What is the process for gaining access to the data?   • Who owns the copyright and/or intellectual property?   • Will you retain rights before opening data to wider use? How long?   • Are permission restrictions necessary?   • Embargo periods for political/commercial/patent reasons?   • Ethical and privacy issues?   • Who are the foreseeable data users?   • How should your data be cited?
  • 52. 5.  Plans  for  archiving  &  preservation   •  What  data  will  be  preserved  for  the  long  term?  For  how  long?       •  Where  will  data  be  preserved?   •  What  data  transformations  need  to  occur  before   preservation?   •  What  metadata  will  be  submitted   alongside  the  datasets?   •  Who  will  be  responsible  for  preparing   data  for  preservation?  Who  will  be  the   main  contact  person  for  the  archived   data?   From  Flickr  by  theManWhoSurfedTooMuch  
  • 53. Don’t  forget:  Budget   •  Costs  of  data  preparation  &  documentation   Hardware,  software   Personnel   Archive  fees   •  How  costs  will  be  paid     Request  funding!   dorrvs.com  
  • 54. NSF’s Vision*   DMPs and their evaluation will grow & change over time (similar to broader impacts)   Peer review will determine next steps   Community-driven guidelines   – Different disciplines have different definitions of acceptable data sharing   – Flexibility at the directorate and division levels   – Tailor implementation of DMP requirement   Evaluation will vary with directorate, division, & program officer   *Unofficially   Help from Jennifer Schopf, NSF
  • 55. NSF’s  Vision*   DMPs  are  a  good  first  step  towards  improving  data   stewardship   –  starting  discussion   –  scientists  learning  about  data  management   Additional  expertise  on  panels  to  effectively  evaluate   DMPs  (?)   Working  group  will  assess  outcomes     *Unofficially      
  • 56. Roadmap   4.  Toolbox     3.  How  to  improve   2.  Mistakes  we  make   1.  Background    
  • 57. DMPTool: dmp.cdlib.org   Step-by-step wizard for generating DMP   Create | edit | re-use | share | save | generate   Open to community   Links to institutional resources   Directorate information & updates
  • 58. E-notebooks   • NoteBook   • ORNL eNote   • Evernote   • Google Docs   • Blogs   • wikis   • TheLabNotebook.com   • iPad ELN   • NoteBookMaker   iPad ELN, the flexible electronic laboratory notebook
  • 59. CDL  Services  for  UC  Community   Where   should  I  put   Data  Repository   my  data?   Deposit    |    Manage    |    Share    |    Preserve   www.cdlib.org/services/uc3  
  • 60. CDL  Services  for  UC  Community   Create  &  manage  persistent  identifiers   •  Precise  identification  of  a  dataset   •  Credit  to  data  producers  and  data  publishers   •  A  link  from  the  traditional  literature  to  the  data   •  Research  metrics  for  datasets   Example:   Sidlauskas,  B.  2007.  Data  from:  Testing  for  unequal  rates  of  morphological   diversification  in  the  absence  of  a  detailed  phylogeny:  a  case  study  from   characiform  fishes.  Dryad  Digital  Repository.  doi:10.5061/dryad.20     www.cdlib.org/services/uc3  
  • 61. Why are you promoting Excel?   • Open source add-in   • Facilitate data management, sharing, archiving for scientists   • Part of DataONE investigator toolkit   • Collecting requirements for add-in from scientists, data centers, libraries   dcxl.cdlib.org   Funders: Gordon and Betty Moore Foundation, Microsoft Research
  • 62. Why  are  you   promoting   Excel?   •  Everyone  uses  it   •  Features  that  make  it  good  for  data  organization  make  it   bad  for  archiving   •  Stopgap  measure  
  • 63. B   A   C  
  • 64. www.dataone.org   •  Data  Education  Tutorials   •  Database  of  best  practices     &  software  tools   •  Links  to  DMPTool   •  Primer  on  data  management   From  Flickr  by  Robert  Hruzek  
  • 65. Data Management 101   dcxl.cdlib.org   • Data Education Tutorials   • Primer on data management   • Other resources
  • 66. Toolbox:    DCXL  blog:  dcxl.cdlib.org  
  • 67. Lisa  Federer     dcxl.cdlib.org   @dcxlCDL   www.facebook.com/DCXLatCDL   www.carlystrasser.net   carlystrasser@gmail.com   @carlystrasser