SlideShare une entreprise Scribd logo
1  sur  55
From  Calisphere  via  California  State  University  Libraries,  	
                        ark:/13030/c818356g	




                                                       Data

                                    A  Scientist’s  




  @carlystrasser	
  Carly  Strasser	
                                    Perspective	
                                                       Management
Roadmap


                           5.  Landscape	
                    4.  Solutions	
                           	
              3.  Barriers	
                    	
        2.  How  Bad  Is  It?	
1.  A  Brief  History	
                                             C.  Strasser
A Brief
From  Calisphere  via  Santa  Clara  University,    




                                                                                 History of
                                                                                 Data
ark:/13030/kt696nc7j2	




                                                                                 Collection

                                                       Or…  how  scientists  came  to  be  so  
                                                          bad  at  data  management
The  lab/field  notebook	
                                               Curie	
                                 Newton	




                                            Darwin	
        Da  Vinci	




classicalschool.blogspot.com
From  Flickr  by    DW0825	
                                                                                   From  Flickr  by  Flickmor	




                                           From  Flickr  by    deltaMike	
                                                                                                                                 The  lab/field  notebook…?	




                                www.woodrow.org	
                                                                  C.  Strasser	




                                                                                                                  Courtesey  of  WHOI	
 From  Flickr  by  US  Army  Environmental  Command
Digital  data	
     +  	
 Complex  
workflows
Data	
                        Models	

               Maximum  
               Likelihood  
               estimation	



                Matrix  
                Models	



    Images	
     Tables	
     Paper
From  Flickr  by  stevecadman	
                                  The Wide World of Data
Data  Types
Dimensions of Data

        Datum	

       Data  file	
       Metadata	

       Dataset	

    Data  collection	

   Data  repository
Data Diversity
Temporal                   File  size	
                Units	
                  File  
structure	
                                     organization	
                      File  type	
Documentatio         Spatial	
  n  extent	
       structure	
      Metadata	
  Collection   Codes	
           Project  intent	
  practices	
                      Analysis
Big Data


OSTP  March  2012	
Big  Data  Effort  Launched
The  
                                    LiDle  
                                    Guys?	




From  Flickr  by  jason  tinder
The Long Tail


 Size  of  
dataset	
   or	
grant  ($)	



                     #  datasets	
                 or  #  researchers	
                   or  #  grants
From  Flickr  by  Old  Shoe  Woman	




                              The
                              Fallout
UGLY TRUTH
                                         Many  (most?)  
                                         researchers…  	
5shortessays.blogspot.com	
                                         	
                                                     	
                        are  not  taught  data  management	
                        don’t  know  what  metadata  are	
                        can’t  name  data  centers  or  repositories	
                        don’t  share  data  publicly  or  store  it  in  an  archive	
                        aren’t  convinced  they  should  share  data
Information  Entropy  	
 Fig.  1  of  Michener  et  al.  1997
A Real Life
 Example
2  tables	
                               Random  notes	

C:Documents and SettingshamptonMy DocumentsNCEAS Distributed Graduate Seminars[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1
                   Stable Isotope Data Sheet
              Sampling Site / Identifier: Wash Cresc Lake                                                                                                               Peter's lab    Don't use - old data
                         Sample Type: Algal                                                                                                                             Washed Rocks
                                  Date: Dec. 16
                Tray ID and Sequence: Tray 004

                                                          13                                                        15
                     Reference statistics: SD for delta        C = 0.07                              SD for delta        N = 0.15


          Position        SampleID         Weight (mg)           %C       delta 13C   delta 13C_ca         %N               delta 15N   delta 15N_ca   Spec. No.
         A1                            ref    0.98              38.27      -25.05         -24.59           1.96                4.12          3.47       25354
         A2                            ref    0.98              39.78      -25.00         -24.54           2.03                4.01          3.36       25356
         A3                            ref    0.98              40.37      -24.99         -24.53           2.04                4.09          3.44       25358
         A4                            ref    1.01              42.23      -25.06         -24.60           2.17                4.20          3.55       25360           Shore          Avg Con
         A5          ALG01                    3.05              1.88       -24.34         -23.88           0.17               -1.65         -2.30       25362      c        -1.26         -27.22
         A6          Lk Outlet Alg            3.06              31.55      -30.17         -29.71           0.92                0.87          0.22       25364                1.26           0.32
         A7          ALG03                    2.91              6.85       -21.11         -20.65           0.48               -0.97         -1.62       25366      c
         A8          ALG05                    2.91              35.56      -28.05         -27.59           2.30                0.59         -0.06       25368
         A9          ALG07                    3.04              33.49      -29.56         -29.10           1.68                0.79          0.14       25370
         A10         ALG06                    2.95              41.17      -27.32         -26.86           1.97                2.71          2.06       25372
         B1          ALG04                    3.01              43.74      -27.50         -27.04           1.36                0.99          0.34       25374      c
         B2          ALG02                      3               4.51       -22.68         -22.22           0.34                4.31          3.66       25376
         B3          ALG01                    2.99              1.59       -24.58         -24.12           0.15               -1.69         -2.34       25378      c
         B4          ALG03                    2.92              4.37       -21.06         -20.60           0.34               -1.52         -2.17       25380      c
         B5          ALG07                     2.9              33.58      -29.44         -28.98           1.74                0.62         -0.03       25382
         B6                            ref    1.01              44.94      -25.00         -24.54           2.59                3.96          3.31       25384
         B7                            ref    0.99              42.28      -24.87         -24.41           2.37                4.33          3.68       25386
         B8          Lk Outlet Alg            3.04              31.43      -29.69         -29.23           1.07                0.95          0.30       25388
         B9          ALG06                    3.09              35.57      -27.26         -26.80           1.96                2.79          2.14       25390
         B10         ALG02                    3.05              5.52       -22.31         -21.85           0.45                4.72          4.07       25392
         C1          ALG04                    2.98              37.90      -27.42         -26.96           1.36                1.21          0.56       25394      c
         C2          ALG05                    3.04              31.74      -27.93         -27.47           2.40                0.73          0.08       25396
         C3                            ref    0.99              38.46      -25.09         -24.63           2.40                4.37          3.72       25398
                                                                23.78                                      1.17




                                                                                                                                                                       From  Stephanie  Hampton
Wash  Cres  Lake  Dec  15  Dont_Use.xls	
C:Documents and SettingshamptonMy DocumentsNCEAS Distributed Graduate Seminars[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1
                   Stable Isotope Data Sheet
              Sampling Site / Identifier: Wash Cresc Lake                                                                                                               Peter's lab    Don't use - old data
                         Sample Type: Algal                                                                                                                             Washed Rocks
                                  Date: Dec. 16
                Tray ID and Sequence: Tray 004

                                                          13                                                        15
                     Reference statistics: SD for delta        C = 0.07                              SD for delta        N = 0.15


          Position        SampleID         Weight (mg)           %C       delta 13C   delta 13C_ca         %N               delta 15N   delta 15N_ca   Spec. No.
         A1                            ref    0.98              38.27      -25.05         -24.59           1.96                4.12          3.47       25354
         A2                            ref    0.98              39.78      -25.00         -24.54           2.03                4.01          3.36       25356
         A3                            ref    0.98              40.37      -24.99         -24.53           2.04                4.09          3.44       25358
         A4                            ref    1.01              42.23      -25.06         -24.60           2.17                4.20          3.55       25360           Shore          Avg Con
         A5          ALG01                    3.05              1.88       -24.34         -23.88           0.17               -1.65         -2.30       25362      c        -1.26         -27.22
         A6          Lk Outlet Alg            3.06              31.55      -30.17         -29.71           0.92                0.87          0.22       25364                1.26           0.32
         A7          ALG03                    2.91              6.85       -21.11         -20.65           0.48               -0.97         -1.62       25366      c
         A8          ALG05                    2.91              35.56      -28.05         -27.59           2.30                0.59         -0.06       25368
         A9          ALG07                    3.04              33.49      -29.56         -29.10           1.68                0.79          0.14       25370
         A10         ALG06                    2.95              41.17      -27.32         -26.86           1.97                2.71          2.06       25372
         B1          ALG04                    3.01              43.74      -27.50         -27.04           1.36                0.99          0.34       25374      c
         B2          ALG02                      3               4.51       -22.68         -22.22           0.34                4.31          3.66       25376
         B3          ALG01                    2.99              1.59       -24.58         -24.12           0.15               -1.69         -2.34       25378      c
         B4          ALG03                    2.92              4.37       -21.06         -20.60           0.34               -1.52         -2.17       25380      c
         B5          ALG07                     2.9              33.58      -29.44         -28.98           1.74                0.62         -0.03       25382
         B6                            ref    1.01              44.94      -25.00         -24.54           2.59                3.96          3.31       25384
         B7                            ref    0.99              42.28      -24.87         -24.41           2.37                4.33          3.68       25386
         B8          Lk Outlet Alg            3.04              31.43      -29.69         -29.23           1.07                0.95          0.30       25388
         B9          ALG06                    3.09              35.57      -27.26         -26.80           1.96                2.79          2.14       25390
         B10         ALG02                    3.05              5.52       -22.31         -21.85           0.45                4.72          4.07       25392
         C1          ALG04                    2.98              37.90      -27.42         -26.96           1.36                1.21          0.56       25394      c
         C2          ALG05                    3.04              31.74      -27.93         -27.47           2.40                0.73          0.08       25396
         C3                            ref    0.99              38.46      -25.09         -24.63           2.40                4.37          3.72       25398
                                                                23.78                                      1.17




                                                                                                                                                                       From  Stephanie  Hampton
Random  stats  output	


C:Documents and SettingshamptonMy DocumentsNCEAS Distributed Graduate Seminars[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1
                   Stable Isotope Data Sheet
              Sampling Site / Identifier: Wash Cresc Lake                                                                                               Peter's lab              Don't use - old data
                         Sample Type: Algal                                                                                                             Washed Rocks
                                  Date: Dec. 16
                Tray ID and Sequence: Tray 004

                                                     13                                                   15
                     Reference statistics: SD for delta C = 0.07                              SD for delta N = 0.15


          Position        SampleID        Weight (mg)      %C      delta 13C   delta 13C_ca        %N          delta 15N   delta 15N_ca Spec. No.
         A1                           ref    0.98         38.27     -25.05         -24.59          1.96           4.12          3.47     25354
         A2                           ref    0.98         39.78     -25.00         -24.54          2.03           4.01          3.36     25356
         A3                           ref    0.98         40.37     -24.99         -24.53          2.04           4.09          3.44     25358
         A4                           ref    1.01         42.23     -25.06         -24.60          2.17           4.20          3.55     25360          Shore                    Avg Con
         A5          ALG01                   3.05         1.88      -24.34         -23.88          0.17          -1.65         -2.30     25362      c       -1.26                   -27.22
         A6          Lk Outlet Alg           3.06         31.55     -30.17         -29.71          0.92           0.87          0.22     25364               1.26                     0.32
         A7          ALG03                   2.91         6.85      -21.11         -20.65          0.48          -0.97         -1.62     25366      c
         A8          ALG05                   2.91         35.56     -28.05         -27.59          2.30           0.59         -0.06     25368
         A9          ALG07                   3.04         33.49     -29.56         -29.10          1.68           0.79          0.14     25370
         A10         ALG06                   2.95         41.17     -27.32         -26.86          1.97           2.71          2.06     25372
         B1          ALG04                   3.01         43.74     -27.50         -27.04          1.36           0.99          0.34     25374      c               SUMMARY OUTPUT
         B2          ALG02                     3          4.51      -22.68         -22.22          0.34           4.31          3.66     25376
         B3          ALG01                   2.99         1.59      -24.58         -24.12          0.15          -1.69         -2.34     25378      c                Regression Statistics
         B4          ALG03                   2.92         4.37      -21.06         -20.60          0.34          -1.52         -2.17     25380      c               Multiple R 0.283158
         B5          ALG07                    2.9         33.58     -29.44         -28.98          1.74           0.62         -0.03     25382                      R Square 0.080178
         B6                           ref    1.01         44.94     -25.00         -24.54          2.59           3.96          3.31     25384                      Adjusted R Square
                                                                                                                                                                                -0.022024
         B7                           ref    0.99         42.28     -24.87         -24.41          2.37           4.33          3.68     25386                      Standard Error
                                                                                                                                                                                 1.906378
         B8          Lk Outlet Alg           3.04         31.43     -29.69         -29.23          1.07           0.95          0.30     25388                      Observations         11
         B9          ALG06                   3.09         35.57     -27.26         -26.80          1.96           2.79          2.14     25390
         B10         ALG02                   3.05         5.52      -22.31         -21.85          0.45           4.72          4.07     25392                      ANOVA
         C1          ALG04                   2.98         37.90     -27.42         -26.96          1.36           1.21          0.56     25394      c                                df         SS      MS        F Significance F
         C2          ALG05                   3.04         31.74     -27.93         -27.47          2.40           0.73          0.08     25396                      Regression             1 2.851116 2.851116 0.784507 0.398813
         C3                           ref    0.99         38.46     -25.09         -24.63          2.40           4.37          3.72     25398                      Residual               9 32.7085 3.634278
                                                          23.78                                    1.17                                                             Total                 10 35.55962

                                                                                                                                                                              Coefficients
                                                                                                                                                                                        Standard Error t Stat P-value Lower 95%Upper 95%Lower 95.0%
                                                                                                                                                                                                                                                  Upper 95.0%
                                                                                                                                                                    Intercept -4.297428 4.671099 -0.920003 0.381568 -14.8642 6.269341 -14.8642 6.269341
                                                                                                                                                                    X Variable 1-0.158022 0.17841 -0.885724 0.398813 -0.561612 0.245569 -0.561612 0.245569




                                                                                                                                                                                                        From  Stephanie  Hampton
C:Documents and SettingshamptonMy DocumentsNCEAS Distributed Graduate Seminars[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1
                   Stable Isotope Data Sheet
              Sampling Site / Identifier: Wash Cresc Lake                                                                                                          Peter's lab          Don't use - old data
                         Sample Type: Algal                                                                                                                        Washed Rocks
                                  Date: Dec. 16
                Tray ID and Sequence: Tray 004

                                                          13                                                      15
                     Reference statistics: SD for delta        C = 0.07                            SD for delta        N = 0.15


          Position        SampleID         Weight (mg)           %C       delta 13C delta 13C_ca        %N                delta 15N delta 15N_ca   Spec. No.
         A1                            ref    0.98              38.27      -25.05       -24.59         1.96                  4.12        3.47       25354
         A2                            ref    0.98              39.78      -25.00       -24.54         2.03                  4.01        3.36       25356
         A3                            ref    0.98              40.37      -24.99       -24.53         2.04                  4.09        3.44       25358
         A4                            ref    1.01              42.23      -25.06       -24.60         2.17                  4.20        3.55       25360          Shore                Avg Con
         A5          ALG01                    3.05              1.88       -24.34       -23.88         0.17                 -1.65       -2.30       25362 c            -1.26               -27.22
         A6          Lk Outlet Alg            3.06              31.55      -30.17       -29.71         0.92                  0.87        0.22       25364               1.26                 0.32
         A7          ALG03                    2.91              6.85       -21.11       -20.65         0.48                 -0.97       -1.62       25366 c
         A8          ALG05                    2.91              35.56      -28.05       -27.59         2.30                  0.59       -0.06       25368
         A9          ALG07                    3.04              33.49      -29.56       -29.10         1.68                  0.79        0.14       25370
         A10         ALG06                    2.95              41.17      -27.32       -26.86         1.97                  2.71        2.06       25372
         B1          ALG04                    3.01              43.74      -27.50       -27.04         1.36                  0.99        0.34       25374 c                    SUMMARY OUTPUT
         B2          ALG02                      3               4.51            SampleID
                                                                           -22.68       -22.22        ALG03
                                                                                                       0.34               ALG05
                                                                                                                             4.31        3.66         ALG07
                                                                                                                                                    25376           ALG06            ALG04            ALG02                ALG01                  ALG03           ALG07
         B3          ALG01                    2.99              1.59       -24.58       -24.12         0.15                 -1.69       -2.34       25378 c                 Regression Statistics
         B4          ALG03                    2.92              4.37       -21.06       -20.60         0.34                 -1.52       -2.17       25380 c                Multiple R 0.283158
         B5          ALG07                     2.9              33.58         Weight (mg)
                                                                           -29.44       -28.98          2.91
                                                                                                       1.74                  0.62    2.91
                                                                                                                                        -0.03       25382 3.04          2.95 Square 0.080178
                                                                                                                                                                           R            3.01                     3                  2.99               2.92                  2.9
         B6                            ref    1.01              44.94      -25.00       -24.54         2.59                  3.96        3.31       25384                  Adjusted R Square
                                                                                                                                                                                       -0.022024
         B7                            ref    0.99              42.28      -24.87       -24.41         2.37                  4.33        3.68       25386                  Standard Error
                                                                                                                                                                                        1.906378
         B8          Lk Outlet Alg            3.04              31.43      -29.69 %C-29.23              6.85
                                                                                                       1.07                  0.95   35.560.30       25388 33.49        41.17
                                                                                                                                                                           Observations43.74    11              4.51                1.59              4.37               33.58
         B9          ALG06                    3.09              35.57      -27.26       -26.80         1.96                  2.79        2.14       25390
         B10         ALG02                    3.05              5.52       -22.31
                                                                                 delta 13C
                                                                                        -21.85
                                                                                                       -21.11
                                                                                                       0.45                  4.72
                                                                                                                                   -28.054.07       25392
                                                                                                                                                          -29.56       -27.32
                                                                                                                                                                           ANOVA
                                                                                                                                                                                 -27.50                        -22.68             -24.58             -21.06             -29.44
         C1          ALG04                    2.98              37.90         delta 13C_ca
                                                                           -27.42       -26.96         -20.65
                                                                                                       1.36                  1.21  -27.590.56       25394 -29.10
                                                                                                                                                             c         -26.86    -27.04
                                                                                                                                                                                    df              SS         -22.22
                                                                                                                                                                                                                  MS  F           -24.12
                                                                                                                                                                                                                               Significance F        -20.60             -28.98
         C2          ALG05                    3.04              31.74      -27.93       -27.47         2.40                  0.73        0.08       25396                  Regression          1 2.851116 2.851116 0.784507 0.398813
         C3                            ref    0.99              38.46      -25.09       -24.63         2.40                  4.37        3.72       25398                  Residual            9 32.7085 3.634278
                                                                23.78             %N                    0.48
                                                                                                       1.17                          2.30                 1.68          1.97
                                                                                                                                                                           Total          1.3610 35.55962 0.34                0.15                     0.34                  1.74
                                                                              delta 15N                  -0.97                       0.59                 0.79          2.71              0.99                 4.31                -1.69              -1.52                  0.62
                                                                                                                                                                                         Coefficients
                                                                                                                                                                                                   Standard Error t Stat  P-value Lower 95%Upper 95%Lower 95.0%
                                                                                                                                                                                                                                                              Upper 95.0%
                                                                             delta 15N_ca                -1.62                      -0.06                 0.14          2.06
                                                                                                                                                                           Intercept       -4.297428 4.671099 3.66
                                                                                                                                                                                            0.34                                    -2.34              -2.17
                                                                                                                                                                                                                -0.920003 0.381568 -14.8642 6.269341 -14.8642 6.269341      -0.03
                                                                                                                                                                               X Variable 1-0.158022 0.17841 -0.885724 0.398813 -0.561612 0.245569 -0.561612 0.245569




                                                                                                                                                                                                                                                   4.00



                                                                                                                                                                                                                                                   3.00



                                                                                                                                                                                                                                                   2.00



                                                                                                                                                                                                                                                   1.00

                                                                                                                                                                                                                                                                      Series1

                                                                                                                                                                                                                                                   0.00
                                                                              -35.00                  -30.00                       -25.00                -20.00                 -15.00                  -10.00                  -5.00                  0.00

                                                                                                                                                                                                                                                  -1.00



                                                                                                                                                                                                                                                  -2.00



                                                                                                                                                                                                                                                  -3.00
The  Fallout	

                   Data  
                   Reuse	


                    Data  
                   Sharing	


                    Data  
                 Management
From  Flickr  by  iowa_spirit_walker	


                                         Stewardship
                                         Why? Hurdles to Data
From  Flickr  by  indigoprime	
                                Barriers: Cost




                                  From  Flickr  by  kobiz7	
    C.  Strasser
Barriers: Sociocultural
                          From  Flickr  by  freefotouk	




Not  the  norm
Barriers: Sociocultural

  Not  the  norm	
                  	
   Lack  of  /  too  
many  standards
Barriers: Sociocultural

  Not  the  norm	
                  	
   Lack  of  /  too                                           From  Flickr  by  toucanradio	



many  standards	
                  	
 Disparate  data	
      From  Flickr  by  Chris  Campbell
Barriers: Sociocultural
                            From  Flickr  by  uniinnsbruck	


  Not  the  norm	
                  	
   Lack  of  /  too  
many  standards	
                  	
 Disparate  data	
                  	
Lack  of  training
From  Flickr  by  Christina  Ann  
                      VanMeter	




    Missed  
    opportunities	
                                                                                          Loss  of  rights  or  benefits	




From  Flickr  by  pnh	
                                                                                                              Barriers: Sociocultural

                                                                           Conflict	




                                                           From  Flickr  by  tymesynk	
                          Misuse
Barriers: Sociocultural
                              Lack  of  incentives	

                                                       Time  
                                                       consuming  &  
                                                       expensive	
                                                       	
                                                       No  
From  Flickr  by  bthomso	




                                                       requirements	
                                                       	
                                                       Reward  
                                                       structure
From  Flickr  by    MarqueYe  University	




generation?
But what about the next
Are Undergrads Learning About
Data Management?
  Survey  of  Undergraduate  Ecology  courses	

                 38  Research  focused	
   48  schools	
                 10  Education  focused	

  Schools  selected…	
  •  One  of  top  grad  schools  for  Ecology	
  •  Most  recipients  of  NSF  GRFP
Are Undergrads Learning About
Data Management?
Strasser  and  Hampton,  In  press  at  Ecosphere	
                     Metadata  generation	
                      Software  choice	
                        File  naming	
                           QAQC	
                         Backing  up  	
                         Workflows	
                        Data  sharing	
                         Data  re-­‐‑use	
                       Meta-­‐‑analysis	
                      Reproducibility	
                     Notebook  protocols	
                         Databases  
Are Undergrads Learning About
Data Management?
       Quality control &                                  Yes
             assurance                                    No

  Naming computer files

        Types of files &
        software to use
   Metadata generation

             Workflows

        Protecting data

      Databases & data
             archiving

            Data re-use

         Meta-analysis

           Data sharing

         Reproducibility

    Notebook protocols

                           0   20   40          60   80         100
                                         Percent
Are Undergrads Learning About
Data Management?
                                     4.5

                                     4.0
     Importance for Undergraduates



                                     3.5

                                     3.0

                                     2.5

                                     2.0

                                                                       r = 0.306
                                     1.5
                                                                        p < 0.05

                                     1.0
                                        1.0   1.5   2.0   2.5   3.0   3.5   4.0    4.5
                                                    Value as a Researcher
Are Undergrads Learning About
                         Data Management?
                          70
                                     Time	
                          60
                                                 Not  appropriate  
                                                 at  this  level	
                          50
                                                                    Covered  in  lab	
Percent Citing Barrier




                          40
                                                                                  Students  not                  Class  too  
                                                                                  prepared	
                     large	
 Instructor  
                          30
                                                                                                 Funding/                       not  prepared	
                                                                                                 resources	
                          20
                                                                                                                                                 Covered  in  
                          10
                                                                                                                                                 other  courses	


                            0
                                       e                  el            ab             ed              es                e                ed               es
                                   Tim                lev          in l            par              urc             larg              par              urs
                                                  his           ed              re              eso            too                 re              r co
                                             at t            ver           ot p            g/ r             ss                ot p              the
                                          te              co             sn             in              Cla              or n                 o
                                     pria             be             ent           und                                uct                d in
                                 pro           oul
                                                    d
                                                               Stu
                                                                   d             F
                                                                                                                Ins
                                                                                                                   tr                 re
                            t ap            Sh                                                                                 C ove
                         No
Solutions
Preach the Benefits
Short  term	
     •  Spend  less  time  on  DM  &  more  on  research	
     •  Easier  to  use  data	
     •  Collaborators  can  understand  &  use  data	

Long  term	
     •  Scientists  outside  project  can  find,  understand  &  
        use  data	
     •  Credit  for  data  products  &  their  use	
     •  Funders  protect  their  investment	
  
Educate
DataONE  Best  Practices  Lessons	
What  is  the  Data  Life  Cycle?  	
Data  management  planning  	
Data  collection,  entry  &  manipulation  	
Data  quality  control  &  quality  assurance  	
Data  protection:  Data  backups  	
Data  preservation,  curation  &  archiving  	
Data  documentation:   	
What  is  metadata?	
                                Value  of  metadata  	
    	
    	
    	
    	
    	
How  to  write  good  metadata  	
Data  management:  Data  sharing  	
Data  citation  practices  	
Collaboration/communication  technologies  	
Data  analysis:  Workflows  and  other  tools  
Primer
Use New Technology
E-­‐‑notebooks	
•  NoteBook	
•  ORNL  eNote  	
•  Evernote	
•  Google  Docs	
•  Blogs	
•  wikis	
•  TheLabNotebook.com	
•  NoteBookMaker	
	
Workflows	
         TheLabNotebook.com!
•  Kepler	
•  Taverna
Use New Technology
C.  Strasser	




The Current Landscape
Data are being recognized as
first class products of research




                             From  Flickr  by  Richard  Moross
Publishing Data




From  Flickr  by  Richard  Moross
From  Flickr  by  Richard  Moross	
                  Citing Data



Example:	
Sidlauskas,  B.  2007.  Data  from:  Testing  for  unequal  rates  of  
morphological  diversification  in  the  absence  of  a  detailed  phylogeny:  a  
case  study  from  characiform  fishes.  Dryad  Digital  Repository.  
doi:10.5061/dryad.20
From  Flickr  by  Richard  Moross	
Data Management
     Requirements

Journal  publishers
From  Flickr  by  Richard  Moross	
Data Management
     Requirements

 Journal  publishers	
 	
 	
 Funders
Plan	

      Analyze	
                                       Collect	




Integrate	
                                                 Assure	


              From	
  Flickr	
  by	
  darkuncle	
  




     Discover	
                                       Describe	

                            Preserve
What  is  a  
                      Will  I  get                                                    data  
                    credit  for  my  
                       work?	
                                                              Plan	
               management  
                                                                                      plan?	


                              Analyze	
                                         Collect	
What  tools  do  
                                                                                                       Are  there  
  I  use?	
                                                                                                      standards?	



                Integrate	
                                                             Assure	
                                                                                                     What  is  
                                                       How  much                                    metadata?	
  Who  can  help                                       will  it  cost?	
                                        From	
  Flickr	
  by	
  darkuncle	
  
     me?	

                              Discover	
                                        Describe	

                Where  do  I                          Preserve	
                      How  do  I  
                preserve  my                                                         preserve  my  
                   data?	
                                                              data?
B	




A	
         C
NSF  funded  DataNet  Project	
Office  of  Cyberinfrastructure	

                                    Community  
       Cyberinfrastructure	
       Engagement  &  
                                     Outreach	




                                  Courtesy  of  DataONE
My  website	
   carlystrasser.net	
 Email  me	
    carlystrasser@gmail.com	
 Tweet  me	
    @carlystrasser  	
 My  slides	
   slideshare.net/carlystrasser	
 CDL  Blog	
    datapub.cdlib.org

Contenu connexe

Similaire à Data Management: Scientist Perspective - UC3 Data Curation Workshop

UCLA: Data Management for Scientists
UCLA: Data Management for ScientistsUCLA: Data Management for Scientists
UCLA: Data Management for ScientistsCarly Strasser
 
Data Management from a Scientist's Perspective
Data Management from a Scientist's PerspectiveData Management from a Scientist's Perspective
Data Management from a Scientist's PerspectiveCarly Strasser
 
UC Santa Cruz: Data Management for Scientists
UC Santa Cruz: Data Management for ScientistsUC Santa Cruz: Data Management for Scientists
UC Santa Cruz: Data Management for ScientistsCarly Strasser
 
Cal Poly - Data Management for Researchers
Cal Poly - Data Management for ResearchersCal Poly - Data Management for Researchers
Cal Poly - Data Management for ResearchersCarly Strasser
 
UCLA: Data Management for Librarians
UCLA: Data Management for LibrariansUCLA: Data Management for Librarians
UCLA: Data Management for LibrariansCarly Strasser
 
Data Management for Scientists: Workshop at Ocean Sciences 2012
Data Management for Scientists: Workshop at Ocean Sciences 2012Data Management for Scientists: Workshop at Ocean Sciences 2012
Data Management for Scientists: Workshop at Ocean Sciences 2012Carly Strasser
 
Data Management Planning and the DMPTool
Data Management Planning and the DMPToolData Management Planning and the DMPTool
Data Management Planning and the DMPToolCarly Strasser
 
Data Management: The Current Landscape
Data Management: The Current LandscapeData Management: The Current Landscape
Data Management: The Current LandscapeCarly Strasser
 
UC Santa Cruz: Data Management for Scientists
UC Santa Cruz: Data Management for ScientistsUC Santa Cruz: Data Management for Scientists
UC Santa Cruz: Data Management for ScientistsCarly Strasser
 
Data Herding for Scientists - IGERT Symposium at UF
Data Herding for Scientists - IGERT Symposium at UFData Herding for Scientists - IGERT Symposium at UF
Data Herding for Scientists - IGERT Symposium at UFCarly Strasser
 
UC Merced: Data Management for Scientists
UC Merced: Data Management for ScientistsUC Merced: Data Management for Scientists
UC Merced: Data Management for ScientistsCarly Strasser
 
Keeping Up with Data
Keeping Up with Data Keeping Up with Data
Keeping Up with Data AbigailGoben
 
CLIR Fellows - Science Data - 14_0730
CLIR Fellows - Science Data - 14_0730CLIR Fellows - Science Data - 14_0730
CLIR Fellows - Science Data - 14_0730jeffreylancaster
 
Landscape of Data Curation - Microsoft eScience 2012
Landscape of Data Curation - Microsoft eScience 2012Landscape of Data Curation - Microsoft eScience 2012
Landscape of Data Curation - Microsoft eScience 2012Carly Strasser
 
RDAP 15: You’re in good company: Unifying campus research data services
RDAP 15: You’re in good company: Unifying campus research data servicesRDAP 15: You’re in good company: Unifying campus research data services
RDAP 15: You’re in good company: Unifying campus research data servicesASIS&T
 
CDL Tools for DataCite 2014
CDL Tools for DataCite 2014CDL Tools for DataCite 2014
CDL Tools for DataCite 2014Carly Strasser
 
Data Publication for UC Davis Publish or Perish
Data Publication for UC Davis Publish or PerishData Publication for UC Davis Publish or Perish
Data Publication for UC Davis Publish or PerishCarly Strasser
 
DataUp: Data Curation for Excel
DataUp: Data Curation for Excel DataUp: Data Curation for Excel
DataUp: Data Curation for Excel Carly Strasser
 
DMPTool Overview for UC Merced Research Week
DMPTool Overview for UC Merced Research WeekDMPTool Overview for UC Merced Research Week
DMPTool Overview for UC Merced Research WeekCarly Strasser
 

Similaire à Data Management: Scientist Perspective - UC3 Data Curation Workshop (20)

UCLA: Data Management for Scientists
UCLA: Data Management for ScientistsUCLA: Data Management for Scientists
UCLA: Data Management for Scientists
 
Data Management from a Scientist's Perspective
Data Management from a Scientist's PerspectiveData Management from a Scientist's Perspective
Data Management from a Scientist's Perspective
 
UC Santa Cruz: Data Management for Scientists
UC Santa Cruz: Data Management for ScientistsUC Santa Cruz: Data Management for Scientists
UC Santa Cruz: Data Management for Scientists
 
Cal Poly - Data Management for Researchers
Cal Poly - Data Management for ResearchersCal Poly - Data Management for Researchers
Cal Poly - Data Management for Researchers
 
UCLA: Data Management for Librarians
UCLA: Data Management for LibrariansUCLA: Data Management for Librarians
UCLA: Data Management for Librarians
 
Data Management for Scientists: Workshop at Ocean Sciences 2012
Data Management for Scientists: Workshop at Ocean Sciences 2012Data Management for Scientists: Workshop at Ocean Sciences 2012
Data Management for Scientists: Workshop at Ocean Sciences 2012
 
Data Management Planning and the DMPTool
Data Management Planning and the DMPToolData Management Planning and the DMPTool
Data Management Planning and the DMPTool
 
Data Management: The Current Landscape
Data Management: The Current LandscapeData Management: The Current Landscape
Data Management: The Current Landscape
 
UC Santa Cruz: Data Management for Scientists
UC Santa Cruz: Data Management for ScientistsUC Santa Cruz: Data Management for Scientists
UC Santa Cruz: Data Management for Scientists
 
Data Herding for Scientists - IGERT Symposium at UF
Data Herding for Scientists - IGERT Symposium at UFData Herding for Scientists - IGERT Symposium at UF
Data Herding for Scientists - IGERT Symposium at UF
 
UC Merced: Data Management for Scientists
UC Merced: Data Management for ScientistsUC Merced: Data Management for Scientists
UC Merced: Data Management for Scientists
 
Keeping Up with Data
Keeping Up with Data Keeping Up with Data
Keeping Up with Data
 
Digital Curation for Excel (DCXL)
Digital Curation for Excel (DCXL)Digital Curation for Excel (DCXL)
Digital Curation for Excel (DCXL)
 
CLIR Fellows - Science Data - 14_0730
CLIR Fellows - Science Data - 14_0730CLIR Fellows - Science Data - 14_0730
CLIR Fellows - Science Data - 14_0730
 
Landscape of Data Curation - Microsoft eScience 2012
Landscape of Data Curation - Microsoft eScience 2012Landscape of Data Curation - Microsoft eScience 2012
Landscape of Data Curation - Microsoft eScience 2012
 
RDAP 15: You’re in good company: Unifying campus research data services
RDAP 15: You’re in good company: Unifying campus research data servicesRDAP 15: You’re in good company: Unifying campus research data services
RDAP 15: You’re in good company: Unifying campus research data services
 
CDL Tools for DataCite 2014
CDL Tools for DataCite 2014CDL Tools for DataCite 2014
CDL Tools for DataCite 2014
 
Data Publication for UC Davis Publish or Perish
Data Publication for UC Davis Publish or PerishData Publication for UC Davis Publish or Perish
Data Publication for UC Davis Publish or Perish
 
DataUp: Data Curation for Excel
DataUp: Data Curation for Excel DataUp: Data Curation for Excel
DataUp: Data Curation for Excel
 
DMPTool Overview for UC Merced Research Week
DMPTool Overview for UC Merced Research WeekDMPTool Overview for UC Merced Research Week
DMPTool Overview for UC Merced Research Week
 

Plus de Carly Strasser

Funders and Publishers: Agents of Change
Funders and Publishers: Agents of ChangeFunders and Publishers: Agents of Change
Funders and Publishers: Agents of ChangeCarly Strasser
 
AIBS Bioinformatics Workforce Needs Workshop, Dec 2015
AIBS Bioinformatics Workforce Needs Workshop, Dec 2015AIBS Bioinformatics Workforce Needs Workshop, Dec 2015
AIBS Bioinformatics Workforce Needs Workshop, Dec 2015Carly Strasser
 
Data Matters for AGU Early Career Conference
Data Matters for AGU Early Career ConferenceData Matters for AGU Early Career Conference
Data Matters for AGU Early Career ConferenceCarly Strasser
 
Lightning Talk on open data for #oaw14sky
Lightning Talk on open data for #oaw14skyLightning Talk on open data for #oaw14sky
Lightning Talk on open data for #oaw14skyCarly Strasser
 
ESA Ignite talk on quality control for data
ESA Ignite talk on quality control for dataESA Ignite talk on quality control for data
ESA Ignite talk on quality control for dataCarly Strasser
 
ESA Ignite talk on UC3 Dash platform for data sharing
ESA Ignite talk on UC3 Dash platform for data sharingESA Ignite talk on UC3 Dash platform for data sharing
ESA Ignite talk on UC3 Dash platform for data sharingCarly Strasser
 
Data publication and Citation for CLIR postdoc seminar
Data publication and Citation for CLIR postdoc seminarData publication and Citation for CLIR postdoc seminar
Data publication and Citation for CLIR postdoc seminarCarly Strasser
 
Data Management for Mountain Observatories Workshop
Data Management for Mountain Observatories WorkshopData Management for Mountain Observatories Workshop
Data Management for Mountain Observatories WorkshopCarly Strasser
 
Libraries & Research Data Management for CO Alliance of Resrch Libraries
Libraries & Research Data Management for CO Alliance of Resrch LibrariesLibraries & Research Data Management for CO Alliance of Resrch Libraries
Libraries & Research Data Management for CO Alliance of Resrch LibrariesCarly Strasser
 
Open Science for Australian Institute of Marine Science Workshop
Open Science for Australian Institute of Marine Science WorkshopOpen Science for Australian Institute of Marine Science Workshop
Open Science for Australian Institute of Marine Science WorkshopCarly Strasser
 
Research Life Cycle for GeoData 2014
Research Life Cycle for GeoData 2014Research Life Cycle for GeoData 2014
Research Life Cycle for GeoData 2014Carly Strasser
 
Data Stewardship for SPATIAL/IsoCamp 2014
Data Stewardship for SPATIAL/IsoCamp 2014Data Stewardship for SPATIAL/IsoCamp 2014
Data Stewardship for SPATIAL/IsoCamp 2014Carly Strasser
 
Data management overview and UC3 tools for IASSIST 2014
Data management overview and UC3 tools for IASSIST 2014Data management overview and UC3 tools for IASSIST 2014
Data management overview and UC3 tools for IASSIST 2014Carly Strasser
 
Coping with Data for WHOI JP Students
Coping with Data for WHOI JP StudentsCoping with Data for WHOI JP Students
Coping with Data for WHOI JP StudentsCarly Strasser
 
DMPTool for UMass eScience Symposium
DMPTool for UMass eScience SymposiumDMPTool for UMass eScience Symposium
DMPTool for UMass eScience SymposiumCarly Strasser
 
DMPTool 2.0 for #IDCC14
DMPTool 2.0 for #IDCC14DMPTool 2.0 for #IDCC14
DMPTool 2.0 for #IDCC14Carly Strasser
 
Data Publication at CDL for IDCC14
Data Publication at CDL for IDCC14Data Publication at CDL for IDCC14
Data Publication at CDL for IDCC14Carly Strasser
 
DMPTool for IMLS #WebWise14
DMPTool for IMLS #WebWise14DMPTool for IMLS #WebWise14
DMPTool for IMLS #WebWise14Carly Strasser
 
Bren - UCSB - Spooky spreadsheets
Bren - UCSB - Spooky spreadsheetsBren - UCSB - Spooky spreadsheets
Bren - UCSB - Spooky spreadsheetsCarly Strasser
 

Plus de Carly Strasser (20)

Funders and Publishers: Agents of Change
Funders and Publishers: Agents of ChangeFunders and Publishers: Agents of Change
Funders and Publishers: Agents of Change
 
AIBS Bioinformatics Workforce Needs Workshop, Dec 2015
AIBS Bioinformatics Workforce Needs Workshop, Dec 2015AIBS Bioinformatics Workforce Needs Workshop, Dec 2015
AIBS Bioinformatics Workforce Needs Workshop, Dec 2015
 
Data Matters for AGU Early Career Conference
Data Matters for AGU Early Career ConferenceData Matters for AGU Early Career Conference
Data Matters for AGU Early Career Conference
 
Lightning Talk on open data for #oaw14sky
Lightning Talk on open data for #oaw14skyLightning Talk on open data for #oaw14sky
Lightning Talk on open data for #oaw14sky
 
ESA Ignite talk on quality control for data
ESA Ignite talk on quality control for dataESA Ignite talk on quality control for data
ESA Ignite talk on quality control for data
 
ESA Ignite talk on UC3 Dash platform for data sharing
ESA Ignite talk on UC3 Dash platform for data sharingESA Ignite talk on UC3 Dash platform for data sharing
ESA Ignite talk on UC3 Dash platform for data sharing
 
Data publication and Citation for CLIR postdoc seminar
Data publication and Citation for CLIR postdoc seminarData publication and Citation for CLIR postdoc seminar
Data publication and Citation for CLIR postdoc seminar
 
Data Management for Mountain Observatories Workshop
Data Management for Mountain Observatories WorkshopData Management for Mountain Observatories Workshop
Data Management for Mountain Observatories Workshop
 
Libraries & Research Data Management for CO Alliance of Resrch Libraries
Libraries & Research Data Management for CO Alliance of Resrch LibrariesLibraries & Research Data Management for CO Alliance of Resrch Libraries
Libraries & Research Data Management for CO Alliance of Resrch Libraries
 
Open Science for Australian Institute of Marine Science Workshop
Open Science for Australian Institute of Marine Science WorkshopOpen Science for Australian Institute of Marine Science Workshop
Open Science for Australian Institute of Marine Science Workshop
 
Research Life Cycle for GeoData 2014
Research Life Cycle for GeoData 2014Research Life Cycle for GeoData 2014
Research Life Cycle for GeoData 2014
 
Data Stewardship for SPATIAL/IsoCamp 2014
Data Stewardship for SPATIAL/IsoCamp 2014Data Stewardship for SPATIAL/IsoCamp 2014
Data Stewardship for SPATIAL/IsoCamp 2014
 
Data management overview and UC3 tools for IASSIST 2014
Data management overview and UC3 tools for IASSIST 2014Data management overview and UC3 tools for IASSIST 2014
Data management overview and UC3 tools for IASSIST 2014
 
Dash for IASSIST 2014
Dash for IASSIST 2014Dash for IASSIST 2014
Dash for IASSIST 2014
 
Coping with Data for WHOI JP Students
Coping with Data for WHOI JP StudentsCoping with Data for WHOI JP Students
Coping with Data for WHOI JP Students
 
DMPTool for UMass eScience Symposium
DMPTool for UMass eScience SymposiumDMPTool for UMass eScience Symposium
DMPTool for UMass eScience Symposium
 
DMPTool 2.0 for #IDCC14
DMPTool 2.0 for #IDCC14DMPTool 2.0 for #IDCC14
DMPTool 2.0 for #IDCC14
 
Data Publication at CDL for IDCC14
Data Publication at CDL for IDCC14Data Publication at CDL for IDCC14
Data Publication at CDL for IDCC14
 
DMPTool for IMLS #WebWise14
DMPTool for IMLS #WebWise14DMPTool for IMLS #WebWise14
DMPTool for IMLS #WebWise14
 
Bren - UCSB - Spooky spreadsheets
Bren - UCSB - Spooky spreadsheetsBren - UCSB - Spooky spreadsheets
Bren - UCSB - Spooky spreadsheets
 

Data Management: Scientist Perspective - UC3 Data Curation Workshop

  • 1. From  Calisphere  via  California  State  University  Libraries,    ark:/13030/c818356g Data A  Scientist’s    @carlystrasser Carly  Strasser Perspective Management
  • 2. Roadmap 5.  Landscape 4.  Solutions 3.  Barriers 2.  How  Bad  Is  It? 1.  A  Brief  History C.  Strasser
  • 3. A Brief From  Calisphere  via  Santa  Clara  University,     History of Data ark:/13030/kt696nc7j2 Collection Or…  how  scientists  came  to  be  so   bad  at  data  management
  • 4. The  lab/field  notebook Curie Newton Darwin Da  Vinci classicalschool.blogspot.com
  • 5. From  Flickr  by    DW0825 From  Flickr  by  Flickmor From  Flickr  by    deltaMike The  lab/field  notebook…? www.woodrow.org C.  Strasser Courtesey  of  WHOI From  Flickr  by  US  Army  Environmental  Command
  • 6. Digital  data +   Complex   workflows
  • 7. Data Models Maximum   Likelihood   estimation Matrix   Models Images Tables Paper
  • 8. From  Flickr  by  stevecadman The Wide World of Data
  • 10. Dimensions of Data Datum Data  file Metadata Dataset Data  collection Data  repository
  • 11. Data Diversity Temporal   File  size Units File   structure organization File  type Documentatio Spatial n  extent structure Metadata Collection   Codes Project  intent practices Analysis
  • 12. Big Data OSTP  March  2012 Big  Data  Effort  Launched
  • 13. The   LiDle   Guys? From  Flickr  by  jason  tinder
  • 14. The Long Tail Size  of   dataset or grant  ($) #  datasets or  #  researchers or  #  grants
  • 15. From  Flickr  by  Old  Shoe  Woman The Fallout
  • 16. UGLY TRUTH Many  (most?)   researchers…   5shortessays.blogspot.com are  not  taught  data  management don’t  know  what  metadata  are can’t  name  data  centers  or  repositories don’t  share  data  publicly  or  store  it  in  an  archive aren’t  convinced  they  should  share  data
  • 17. Information  Entropy   Fig.  1  of  Michener  et  al.  1997
  • 18. A Real Life Example
  • 19. 2  tables Random  notes C:Documents and SettingshamptonMy DocumentsNCEAS Distributed Graduate Seminars[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1 Stable Isotope Data Sheet Sampling Site / Identifier: Wash Cresc Lake Peter's lab Don't use - old data Sample Type: Algal Washed Rocks Date: Dec. 16 Tray ID and Sequence: Tray 004 13 15 Reference statistics: SD for delta C = 0.07 SD for delta N = 0.15 Position SampleID Weight (mg) %C delta 13C delta 13C_ca %N delta 15N delta 15N_ca Spec. No. A1 ref 0.98 38.27 -25.05 -24.59 1.96 4.12 3.47 25354 A2 ref 0.98 39.78 -25.00 -24.54 2.03 4.01 3.36 25356 A3 ref 0.98 40.37 -24.99 -24.53 2.04 4.09 3.44 25358 A4 ref 1.01 42.23 -25.06 -24.60 2.17 4.20 3.55 25360 Shore Avg Con A5 ALG01 3.05 1.88 -24.34 -23.88 0.17 -1.65 -2.30 25362 c -1.26 -27.22 A6 Lk Outlet Alg 3.06 31.55 -30.17 -29.71 0.92 0.87 0.22 25364 1.26 0.32 A7 ALG03 2.91 6.85 -21.11 -20.65 0.48 -0.97 -1.62 25366 c A8 ALG05 2.91 35.56 -28.05 -27.59 2.30 0.59 -0.06 25368 A9 ALG07 3.04 33.49 -29.56 -29.10 1.68 0.79 0.14 25370 A10 ALG06 2.95 41.17 -27.32 -26.86 1.97 2.71 2.06 25372 B1 ALG04 3.01 43.74 -27.50 -27.04 1.36 0.99 0.34 25374 c B2 ALG02 3 4.51 -22.68 -22.22 0.34 4.31 3.66 25376 B3 ALG01 2.99 1.59 -24.58 -24.12 0.15 -1.69 -2.34 25378 c B4 ALG03 2.92 4.37 -21.06 -20.60 0.34 -1.52 -2.17 25380 c B5 ALG07 2.9 33.58 -29.44 -28.98 1.74 0.62 -0.03 25382 B6 ref 1.01 44.94 -25.00 -24.54 2.59 3.96 3.31 25384 B7 ref 0.99 42.28 -24.87 -24.41 2.37 4.33 3.68 25386 B8 Lk Outlet Alg 3.04 31.43 -29.69 -29.23 1.07 0.95 0.30 25388 B9 ALG06 3.09 35.57 -27.26 -26.80 1.96 2.79 2.14 25390 B10 ALG02 3.05 5.52 -22.31 -21.85 0.45 4.72 4.07 25392 C1 ALG04 2.98 37.90 -27.42 -26.96 1.36 1.21 0.56 25394 c C2 ALG05 3.04 31.74 -27.93 -27.47 2.40 0.73 0.08 25396 C3 ref 0.99 38.46 -25.09 -24.63 2.40 4.37 3.72 25398 23.78 1.17 From  Stephanie  Hampton
  • 20. Wash  Cres  Lake  Dec  15  Dont_Use.xls C:Documents and SettingshamptonMy DocumentsNCEAS Distributed Graduate Seminars[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1 Stable Isotope Data Sheet Sampling Site / Identifier: Wash Cresc Lake Peter's lab Don't use - old data Sample Type: Algal Washed Rocks Date: Dec. 16 Tray ID and Sequence: Tray 004 13 15 Reference statistics: SD for delta C = 0.07 SD for delta N = 0.15 Position SampleID Weight (mg) %C delta 13C delta 13C_ca %N delta 15N delta 15N_ca Spec. No. A1 ref 0.98 38.27 -25.05 -24.59 1.96 4.12 3.47 25354 A2 ref 0.98 39.78 -25.00 -24.54 2.03 4.01 3.36 25356 A3 ref 0.98 40.37 -24.99 -24.53 2.04 4.09 3.44 25358 A4 ref 1.01 42.23 -25.06 -24.60 2.17 4.20 3.55 25360 Shore Avg Con A5 ALG01 3.05 1.88 -24.34 -23.88 0.17 -1.65 -2.30 25362 c -1.26 -27.22 A6 Lk Outlet Alg 3.06 31.55 -30.17 -29.71 0.92 0.87 0.22 25364 1.26 0.32 A7 ALG03 2.91 6.85 -21.11 -20.65 0.48 -0.97 -1.62 25366 c A8 ALG05 2.91 35.56 -28.05 -27.59 2.30 0.59 -0.06 25368 A9 ALG07 3.04 33.49 -29.56 -29.10 1.68 0.79 0.14 25370 A10 ALG06 2.95 41.17 -27.32 -26.86 1.97 2.71 2.06 25372 B1 ALG04 3.01 43.74 -27.50 -27.04 1.36 0.99 0.34 25374 c B2 ALG02 3 4.51 -22.68 -22.22 0.34 4.31 3.66 25376 B3 ALG01 2.99 1.59 -24.58 -24.12 0.15 -1.69 -2.34 25378 c B4 ALG03 2.92 4.37 -21.06 -20.60 0.34 -1.52 -2.17 25380 c B5 ALG07 2.9 33.58 -29.44 -28.98 1.74 0.62 -0.03 25382 B6 ref 1.01 44.94 -25.00 -24.54 2.59 3.96 3.31 25384 B7 ref 0.99 42.28 -24.87 -24.41 2.37 4.33 3.68 25386 B8 Lk Outlet Alg 3.04 31.43 -29.69 -29.23 1.07 0.95 0.30 25388 B9 ALG06 3.09 35.57 -27.26 -26.80 1.96 2.79 2.14 25390 B10 ALG02 3.05 5.52 -22.31 -21.85 0.45 4.72 4.07 25392 C1 ALG04 2.98 37.90 -27.42 -26.96 1.36 1.21 0.56 25394 c C2 ALG05 3.04 31.74 -27.93 -27.47 2.40 0.73 0.08 25396 C3 ref 0.99 38.46 -25.09 -24.63 2.40 4.37 3.72 25398 23.78 1.17 From  Stephanie  Hampton
  • 21. Random  stats  output C:Documents and SettingshamptonMy DocumentsNCEAS Distributed Graduate Seminars[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1 Stable Isotope Data Sheet Sampling Site / Identifier: Wash Cresc Lake Peter's lab Don't use - old data Sample Type: Algal Washed Rocks Date: Dec. 16 Tray ID and Sequence: Tray 004 13 15 Reference statistics: SD for delta C = 0.07 SD for delta N = 0.15 Position SampleID Weight (mg) %C delta 13C delta 13C_ca %N delta 15N delta 15N_ca Spec. No. A1 ref 0.98 38.27 -25.05 -24.59 1.96 4.12 3.47 25354 A2 ref 0.98 39.78 -25.00 -24.54 2.03 4.01 3.36 25356 A3 ref 0.98 40.37 -24.99 -24.53 2.04 4.09 3.44 25358 A4 ref 1.01 42.23 -25.06 -24.60 2.17 4.20 3.55 25360 Shore Avg Con A5 ALG01 3.05 1.88 -24.34 -23.88 0.17 -1.65 -2.30 25362 c -1.26 -27.22 A6 Lk Outlet Alg 3.06 31.55 -30.17 -29.71 0.92 0.87 0.22 25364 1.26 0.32 A7 ALG03 2.91 6.85 -21.11 -20.65 0.48 -0.97 -1.62 25366 c A8 ALG05 2.91 35.56 -28.05 -27.59 2.30 0.59 -0.06 25368 A9 ALG07 3.04 33.49 -29.56 -29.10 1.68 0.79 0.14 25370 A10 ALG06 2.95 41.17 -27.32 -26.86 1.97 2.71 2.06 25372 B1 ALG04 3.01 43.74 -27.50 -27.04 1.36 0.99 0.34 25374 c SUMMARY OUTPUT B2 ALG02 3 4.51 -22.68 -22.22 0.34 4.31 3.66 25376 B3 ALG01 2.99 1.59 -24.58 -24.12 0.15 -1.69 -2.34 25378 c Regression Statistics B4 ALG03 2.92 4.37 -21.06 -20.60 0.34 -1.52 -2.17 25380 c Multiple R 0.283158 B5 ALG07 2.9 33.58 -29.44 -28.98 1.74 0.62 -0.03 25382 R Square 0.080178 B6 ref 1.01 44.94 -25.00 -24.54 2.59 3.96 3.31 25384 Adjusted R Square -0.022024 B7 ref 0.99 42.28 -24.87 -24.41 2.37 4.33 3.68 25386 Standard Error 1.906378 B8 Lk Outlet Alg 3.04 31.43 -29.69 -29.23 1.07 0.95 0.30 25388 Observations 11 B9 ALG06 3.09 35.57 -27.26 -26.80 1.96 2.79 2.14 25390 B10 ALG02 3.05 5.52 -22.31 -21.85 0.45 4.72 4.07 25392 ANOVA C1 ALG04 2.98 37.90 -27.42 -26.96 1.36 1.21 0.56 25394 c df SS MS F Significance F C2 ALG05 3.04 31.74 -27.93 -27.47 2.40 0.73 0.08 25396 Regression 1 2.851116 2.851116 0.784507 0.398813 C3 ref 0.99 38.46 -25.09 -24.63 2.40 4.37 3.72 25398 Residual 9 32.7085 3.634278 23.78 1.17 Total 10 35.55962 Coefficients Standard Error t Stat P-value Lower 95%Upper 95%Lower 95.0% Upper 95.0% Intercept -4.297428 4.671099 -0.920003 0.381568 -14.8642 6.269341 -14.8642 6.269341 X Variable 1-0.158022 0.17841 -0.885724 0.398813 -0.561612 0.245569 -0.561612 0.245569 From  Stephanie  Hampton
  • 22. C:Documents and SettingshamptonMy DocumentsNCEAS Distributed Graduate Seminars[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1 Stable Isotope Data Sheet Sampling Site / Identifier: Wash Cresc Lake Peter's lab Don't use - old data Sample Type: Algal Washed Rocks Date: Dec. 16 Tray ID and Sequence: Tray 004 13 15 Reference statistics: SD for delta C = 0.07 SD for delta N = 0.15 Position SampleID Weight (mg) %C delta 13C delta 13C_ca %N delta 15N delta 15N_ca Spec. No. A1 ref 0.98 38.27 -25.05 -24.59 1.96 4.12 3.47 25354 A2 ref 0.98 39.78 -25.00 -24.54 2.03 4.01 3.36 25356 A3 ref 0.98 40.37 -24.99 -24.53 2.04 4.09 3.44 25358 A4 ref 1.01 42.23 -25.06 -24.60 2.17 4.20 3.55 25360 Shore Avg Con A5 ALG01 3.05 1.88 -24.34 -23.88 0.17 -1.65 -2.30 25362 c -1.26 -27.22 A6 Lk Outlet Alg 3.06 31.55 -30.17 -29.71 0.92 0.87 0.22 25364 1.26 0.32 A7 ALG03 2.91 6.85 -21.11 -20.65 0.48 -0.97 -1.62 25366 c A8 ALG05 2.91 35.56 -28.05 -27.59 2.30 0.59 -0.06 25368 A9 ALG07 3.04 33.49 -29.56 -29.10 1.68 0.79 0.14 25370 A10 ALG06 2.95 41.17 -27.32 -26.86 1.97 2.71 2.06 25372 B1 ALG04 3.01 43.74 -27.50 -27.04 1.36 0.99 0.34 25374 c SUMMARY OUTPUT B2 ALG02 3 4.51 SampleID -22.68 -22.22 ALG03 0.34 ALG05 4.31 3.66 ALG07 25376 ALG06 ALG04 ALG02 ALG01 ALG03 ALG07 B3 ALG01 2.99 1.59 -24.58 -24.12 0.15 -1.69 -2.34 25378 c Regression Statistics B4 ALG03 2.92 4.37 -21.06 -20.60 0.34 -1.52 -2.17 25380 c Multiple R 0.283158 B5 ALG07 2.9 33.58 Weight (mg) -29.44 -28.98 2.91 1.74 0.62 2.91 -0.03 25382 3.04 2.95 Square 0.080178 R 3.01 3 2.99 2.92 2.9 B6 ref 1.01 44.94 -25.00 -24.54 2.59 3.96 3.31 25384 Adjusted R Square -0.022024 B7 ref 0.99 42.28 -24.87 -24.41 2.37 4.33 3.68 25386 Standard Error 1.906378 B8 Lk Outlet Alg 3.04 31.43 -29.69 %C-29.23 6.85 1.07 0.95 35.560.30 25388 33.49 41.17 Observations43.74 11 4.51 1.59 4.37 33.58 B9 ALG06 3.09 35.57 -27.26 -26.80 1.96 2.79 2.14 25390 B10 ALG02 3.05 5.52 -22.31 delta 13C -21.85 -21.11 0.45 4.72 -28.054.07 25392 -29.56 -27.32 ANOVA -27.50 -22.68 -24.58 -21.06 -29.44 C1 ALG04 2.98 37.90 delta 13C_ca -27.42 -26.96 -20.65 1.36 1.21 -27.590.56 25394 -29.10 c -26.86 -27.04 df SS -22.22 MS F -24.12 Significance F -20.60 -28.98 C2 ALG05 3.04 31.74 -27.93 -27.47 2.40 0.73 0.08 25396 Regression 1 2.851116 2.851116 0.784507 0.398813 C3 ref 0.99 38.46 -25.09 -24.63 2.40 4.37 3.72 25398 Residual 9 32.7085 3.634278 23.78 %N 0.48 1.17 2.30 1.68 1.97 Total 1.3610 35.55962 0.34 0.15 0.34 1.74 delta 15N -0.97 0.59 0.79 2.71 0.99 4.31 -1.69 -1.52 0.62 Coefficients Standard Error t Stat P-value Lower 95%Upper 95%Lower 95.0% Upper 95.0% delta 15N_ca -1.62 -0.06 0.14 2.06 Intercept -4.297428 4.671099 3.66 0.34 -2.34 -2.17 -0.920003 0.381568 -14.8642 6.269341 -14.8642 6.269341 -0.03 X Variable 1-0.158022 0.17841 -0.885724 0.398813 -0.561612 0.245569 -0.561612 0.245569 4.00 3.00 2.00 1.00 Series1 0.00 -35.00 -30.00 -25.00 -20.00 -15.00 -10.00 -5.00 0.00 -1.00 -2.00 -3.00
  • 23. The  Fallout Data   Reuse Data   Sharing Data   Management
  • 24. From  Flickr  by  iowa_spirit_walker Stewardship Why? Hurdles to Data
  • 25. From  Flickr  by  indigoprime Barriers: Cost From  Flickr  by  kobiz7 C.  Strasser
  • 26. Barriers: Sociocultural From  Flickr  by  freefotouk Not  the  norm
  • 27. Barriers: Sociocultural Not  the  norm Lack  of  /  too   many  standards
  • 28. Barriers: Sociocultural Not  the  norm Lack  of  /  too   From  Flickr  by  toucanradio many  standards Disparate  data From  Flickr  by  Chris  Campbell
  • 29. Barriers: Sociocultural From  Flickr  by  uniinnsbruck Not  the  norm Lack  of  /  too   many  standards Disparate  data Lack  of  training
  • 30. From  Flickr  by  Christina  Ann   VanMeter Missed   opportunities Loss  of  rights  or  benefits From  Flickr  by  pnh Barriers: Sociocultural Conflict From  Flickr  by  tymesynk Misuse
  • 31. Barriers: Sociocultural Lack  of  incentives Time   consuming  &   expensive No   From  Flickr  by  bthomso requirements Reward   structure
  • 32. From  Flickr  by    MarqueYe  University generation? But what about the next
  • 33. Are Undergrads Learning About Data Management? Survey  of  Undergraduate  Ecology  courses 38  Research  focused 48  schools 10  Education  focused Schools  selected… •  One  of  top  grad  schools  for  Ecology •  Most  recipients  of  NSF  GRFP
  • 34. Are Undergrads Learning About Data Management? Strasser  and  Hampton,  In  press  at  Ecosphere Metadata  generation Software  choice File  naming QAQC Backing  up   Workflows Data  sharing Data  re-­‐‑use Meta-­‐‑analysis Reproducibility Notebook  protocols Databases  
  • 35. Are Undergrads Learning About Data Management? Quality control & Yes assurance No Naming computer files Types of files & software to use Metadata generation Workflows Protecting data Databases & data archiving Data re-use Meta-analysis Data sharing Reproducibility Notebook protocols 0 20 40 60 80 100 Percent
  • 36. Are Undergrads Learning About Data Management? 4.5 4.0 Importance for Undergraduates 3.5 3.0 2.5 2.0 r = 0.306 1.5 p < 0.05 1.0 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 Value as a Researcher
  • 37. Are Undergrads Learning About Data Management? 70 Time 60 Not  appropriate   at  this  level 50 Covered  in  lab Percent Citing Barrier 40 Students  not   Class  too   prepared large Instructor   30 Funding/ not  prepared resources 20 Covered  in   10 other  courses 0 e el ab ed es e ed es Tim lev in l par urc larg par urs his ed re eso too re r co at t ver ot p g/ r ss ot p the te co sn in Cla or n o pria be ent und uct d in pro oul d Stu d F Ins tr re t ap Sh C ove No
  • 39. Preach the Benefits Short  term •  Spend  less  time  on  DM  &  more  on  research •  Easier  to  use  data •  Collaborators  can  understand  &  use  data Long  term •  Scientists  outside  project  can  find,  understand  &   use  data •  Credit  for  data  products  &  their  use •  Funders  protect  their  investment  
  • 40. Educate DataONE  Best  Practices  Lessons What  is  the  Data  Life  Cycle?   Data  management  planning   Data  collection,  entry  &  manipulation   Data  quality  control  &  quality  assurance   Data  protection:  Data  backups   Data  preservation,  curation  &  archiving   Data  documentation:   What  is  metadata? Value  of  metadata   How  to  write  good  metadata   Data  management:  Data  sharing   Data  citation  practices   Collaboration/communication  technologies   Data  analysis:  Workflows  and  other  tools  
  • 42. Use New Technology E-­‐‑notebooks •  NoteBook •  ORNL  eNote   •  Evernote •  Google  Docs •  Blogs •  wikis •  TheLabNotebook.com •  NoteBookMaker Workflows TheLabNotebook.com! •  Kepler •  Taverna
  • 45. Data are being recognized as first class products of research From  Flickr  by  Richard  Moross
  • 46. Publishing Data From  Flickr  by  Richard  Moross
  • 47. From  Flickr  by  Richard  Moross Citing Data Example: Sidlauskas,  B.  2007.  Data  from:  Testing  for  unequal  rates  of   morphological  diversification  in  the  absence  of  a  detailed  phylogeny:  a   case  study  from  characiform  fishes.  Dryad  Digital  Repository.   doi:10.5061/dryad.20
  • 48. From  Flickr  by  Richard  Moross Data Management Requirements Journal  publishers
  • 49. From  Flickr  by  Richard  Moross Data Management Requirements Journal  publishers Funders
  • 50. Plan Analyze Collect Integrate Assure From  Flickr  by  darkuncle   Discover Describe Preserve
  • 51. What  is  a   Will  I  get   data   credit  for  my   work? Plan management   plan? Analyze Collect What  tools  do   Are  there   I  use? standards? Integrate Assure What  is   How  much   metadata? Who  can  help   will  it  cost? From  Flickr  by  darkuncle   me? Discover Describe Where  do  I   Preserve How  do  I   preserve  my   preserve  my   data? data?
  • 52. B A C
  • 53. NSF  funded  DataNet  Project Office  of  Cyberinfrastructure Community   Cyberinfrastructure Engagement  &   Outreach Courtesy  of  DataONE
  • 54.
  • 55. My  website carlystrasser.net Email  me carlystrasser@gmail.com Tweet  me @carlystrasser   My  slides slideshare.net/carlystrasser CDL  Blog datapub.cdlib.org