Reconstructing Provenance
Sara Magliacane - VU University Amsterdam
Advisors: Paul Groth and Frank van Harmelen



Problem Statement

The provenance of a data item is the metadata describing how, when and by whom the data item was produced.

Provenance is crucial in many settings, but often it is not tracked, resulting in collections of files with only basic filesystem metadata, e.g. timestamps.

In this case, is it possible to reconstruct provenance post hoc?

[Figure: motivating example. Three log entries of the form "... Bob edited the file poster.ppt" prompt the question "How can we get more information?". The enriched entries state that Bob edited the file poster.ppt adding an image from the file paper.pdf, adding paragraphs from the file techreport.pdf, and modifying a paragraph similar to paper.pdf. A timeline from June to October connects paper.pdf, techreport.pdf and the poster.ppt versions through edges labelled copy, image, paragraph and paraphrased paragraph.]

An initial prototype implementation

As a first step we focus on dependencies between files instead of sequences of operations.

We implemented a prototype of the pipeline using open-source components, like Apache Lucene, Apache Tika and Dropbox API. As signal detectors we used well-known similarity measures.

[Figure: prototype pipeline. Docs -> Preprocessing (extract metadata, index content, infer semantic types) -> Hypotheses Generation (text similarity, image similarity, domain-specific similarity, metadata similarity) -> Hypotheses Pruning (filter temporal incoherence, similarity thresholds, domain-specific filtering) -> Aggregation and ranking (weighted sum).]
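To make the idea of signal detectors, pruning and weighted-sum aggregation concrete, here is a minimal sketch in Python. It is not the prototype itself, which was built on Apache Lucene and Apache Tika. The function names, the record layout (`text`, `modified`), the two similarity measures chosen, the weights and the threshold are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-frequency vectors of two texts."""
    freq_a, freq_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(text_a, text_b):
    """Share of distinct tokens that the two texts have in common."""
    set_a, set_b = set(tokenize(text_a)), set(tokenize(text_b))
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def weighted_sum(scores, weights):
    """Aggregate per-signal scores into one confidence value in [0, 1]."""
    total = sum(weights[name] for name in scores)
    return sum(weights[name] * score for name, score in scores.items()) / total


def dependency_confidence(source, target, weights, threshold=0.2):
    """Confidence that `target` depends on `source`, or None if pruned.

    `source` and `target` are hypothetical records with the keys
    "text" (extracted content) and "modified" (a comparable timestamp).
    """
    # Temporal incoherence filter: a file cannot depend on a later file.
    if source["modified"] > target["modified"]:
        return None
    scores = {
        "cosine": cosine_similarity(source["text"], target["text"]),
        "jaccard": jaccard_similarity(source["text"], target["text"]),
    }
    # Threshold filter: signals below the threshold count as no evidence.
    scores = {name: (s if s >= threshold else 0.0) for name, s in scores.items()}
    if not any(scores.values()):
        return None
    return weighted_sum(scores, weights)


paper = {"text": "provenance reconstruction from file contents", "modified": 1}
poster = {"text": "provenance reconstruction from file contents and metadata",
          "modified": 2}
weights = {"cosine": 0.5, "jaccard": 0.5}
confidence = dependency_confidence(paper, poster, weights)
```

In this sketch the hypothesis "poster depends on paper" receives a confidence value, while the reverse hypothesis is discarded by the temporal filter because the paper is older than the poster.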
Initial (encouraging) results

We performed an experiment with a small set of biomedical publications, annotated manually by two domain experts.

[Figure: two dependency graphs, labelled Original and Predicted, each over nodes numbered 0 to 24 and grouped into Cluster 1: Blood Cultures (EvidenceQ||), Cluster 2: Markers (EvidenceQX) and Cluster 3: General (Guideline).]

F1-score of 0.49 for only text similarity
F1-score of 0.70 for the aggregation of various similarities

Research Question

How can one automatically, accurately and efficiently reconstruct a plausible provenance of files in a shared folder, intended as the sequences of operations connecting the files?

Approach & Methodology

We propose a multi-signal pipeline approach that reconstructs plausible provenance traces using the contents of the files and metadata as evidence of the relationships between files.

The pipeline consists of four stages, each containing several components that can be executed in parallel.
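The four-stage structure described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation. The function names (`run_parallel`, `reconstruct` and the example components), the record layout, the 0.3 threshold and the use of a thread pool are all hypothetical choices.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import permutations


def run_parallel(components, *args):
    """Run every component of one stage on the same input, in parallel."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(component, *args) for component in components]
        return [future.result() for future in futures]


def reconstruct(files, preprocessors, detectors, filters, aggregate):
    """Return candidate dependencies (source, target) ranked by confidence."""
    # Stage 1, preprocessing: merge the features extracted by each component.
    records = {name: {"raw": raw} for name, raw in files.items()}
    for features in run_parallel(preprocessors, files):
        for name, extracted in features.items():
            records[name].update(extracted)

    # Stage 2, hypotheses generation: each detector scores each ordered pair.
    pairs = list(permutations(records, 2))
    hypotheses = {pair: {} for pair in pairs}
    results = run_parallel(detectors, records, pairs)
    for detector, scores in zip(detectors, results):
        for pair, score in scores.items():
            hypotheses[pair][detector.__name__] = score

    # Stage 3, hypotheses pruning: keep a pair only if all filters accept it.
    verdicts = run_parallel(filters, records, hypotheses)
    kept = {pair: signals for pair, signals in hypotheses.items()
            if all(verdict[pair] for verdict in verdicts)}

    # Stage 4, aggregation and ranking: one confidence value per hypothesis.
    ranked = sorted(((aggregate(signals), pair) for pair, signals in kept.items()),
                    reverse=True)
    return [(pair, confidence) for confidence, pair in ranked]


# Hypothetical example components, one or two per stage.
def extract_metadata(files):
    return {name: {"length": len(raw)} for name, raw in files.items()}


def index_content(files):
    return {name: {"tokens": set(raw.lower().split())} for name, raw in files.items()}


def token_overlap(records, pairs):
    """Signal detector: Jaccard overlap of the token sets of two files."""
    scores = {}
    for a, b in pairs:
        tokens_a, tokens_b = records[a]["tokens"], records[b]["tokens"]
        union = tokens_a | tokens_b
        scores[(a, b)] = len(tokens_a & tokens_b) / len(union) if union else 0.0
    return scores


def min_similarity(records, hypotheses):
    """Signal filter: reject pairs whose strongest signal is below 0.3."""
    return {pair: max(signals.values(), default=0.0) >= 0.3
            for pair, signals in hypotheses.items()}


def mean(signals):
    return sum(signals.values()) / len(signals)


files = {
    "paper.pdf": "reconstructing provenance of files",
    "poster.ppt": "reconstructing provenance of files in a shared folder",
    "notes.txt": "grocery list",
}
ranked = reconstruct(files, [extract_metadata, index_content],
                     [token_overlap], [min_similarity], mean)
```

Because the only detector in this example is symmetric, both directions between paper.pdf and poster.ppt survive with the same confidence. A directional signal or filter, such as a timestamp check, would be needed to choose between them.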
[Figure: the four pipeline stages. Docs -> Preprocessing (Extract metadata, Index content, ...) -> Hypotheses Generation (Signal Detector 1, Signal Detector 2, ...) -> Hypotheses Pruning (Signal Filter 1, Signal Filter 2, ...) -> Aggregation and ranking (Aggregator 1, Aggregator 2, ...). The output is a graph over the files whose edges carry labels such as copy and image, each with a confidence value.]

The research methodology is an iterative process that will incrementally integrate existing approaches in literature and evaluate the performance on benchmark corpora.

Future work

Following the planned methodology, we will explore additional components for each of the pipeline phases and also consider computational efficiency.

Bibliography

(1) Sara Magliacane: Reconstructing Provenance, ISWC Doctoral Consortium 2012

(2) Paul Groth, Yolanda Gil, Sara Magliacane: Automatic Metadata Annotation through Reconstructing Provenance, Third International Workshop on the role of Semantic Web in Provenance Management, ESWC 2012
Advisors: Paul Groth and Frank van Harmelen



                            Problem Statement                                                                                              An initial prototype im
The provenance of a data item is the metadata describing how,                                                                        As a first step we focus on dependen
when and by whom the data item was produced.                                                                                         sequences of operations.

Provenance is crucial in many settings, but often it is not tracked,                                                                 We implemented a prototype of the p
resulting in collections of files with only basic filesystem                                                                         components, like Apache Lucene, Ap
metadata, e.g. timestamps.                                                                                                           As signal detectors we used well-kno

In this case, is it possible to reconstruct provenance post hoc?                                                                <2,4%   C*.7*2,.4491;%                   D672)A.4.4%E.1.*+521%                          D672)A.4.4%C*F


                                                                                                                                                                                                                             @9:).*%).-72*
                                                                                                                                        '()*+,)%-.)+/+)+%%                      8.()%49-9:+*9)6%
                                                                                                                                                                                                                              91,2A.*.1,
       !"#!$%&#'&(#)*+#,-./,-#/0,#12,#3*4/,5633/#                                                                                !
                                                                               @*A#<7"#A,#8,/#                                                                                                                                    B9-9:+*9)6%
                                                                                                                                 &       01/.(%,21).1)%                         0-+;.%49-9:+*9)6%
       !"#!$%&#'&(#)*+#,-./,-#/0,#12,#3*4/,5633/6#
                                                                               9*5,#.":*597B*"C#                                                                                                                                 )A*.4A2:/4
       !"#!$%&#'&(#)*+#,-./,-#/0,#12,#3*4/,5633/#                                                                                "
                                                                                                                                         013.*%4.-+15,%                         <2-+91=47.,9>,%                              <2-+91=47.,
                                                                                                                                              )67.4%                              49-9:+*9)6%                                   >:).*91;%

        !"#!$%&#'&(#)*+#,-./,-#/0,#12,#3*4/,5633/#7--."8#7"#.978,#:5*9#/0,#12,#373,563-:6#################                                                                           ?.)+/+)+%
                                                                                                                                                                                     49-9:+*9)6%
        !"#!$%&#'&(#)*+#,-./,-#/0,#12,#3*4/,5633/#7--."8#;#3757857304#:5*9#/0,#12,#/,<05,3*5/63-:6#

        !"#!$%&#'&(#)*+#,-./,-#/0,#12,#3*4/,5633/#9*-.1,-#7#375785730#4.9.275#/*#373,563-:6#




                                                  4,!5(
                                                 67"8#(
                                                                            4,!5(
                                                                          !"$"8$"!+(
                                                                                                   9"$"!+$"-#:
                                                                                                   !"$"8$"!+(
                                                                                                                                                        Initial (encouragin
                          )#*+$#!,$)%!&'(
      !"!#$%!&'(
        =+",# #       #       #       #      #       #       #        #        #       #       #      #      #></*?,5#
                                                                                                                                     We performed an experiment with a
                                                                 !,-)#$%!!)(               !,-)#$%!!)(            !,-)#$%!!)(        publications, annotated manually by
                                                                 ./01(                     ./31(                  ./21(


                                                                                                                                                             Cluster 1: Blood Cultures             Cluster 2: Markers              Cluster 3: G
                                                                                                                                                             EvidenceQ||                           EvidenceQX                      Guideline




                                                                                                                                                !"#$#%&'(
                                                                                                                                                                                                                   22




                                                                                                                                                                           23                                 17




                                                                                                                                                                                                    15




                              Research Question                                                                                                                13
                                                                                                                                                                    14          20




Research Question

How can one automatically, accurately and efficiently
reconstruct a plausible provenance of files in a shared folder,
intended as the sequences of operations connecting the files?

[Figure: a file history entry reading "you edited the file poster.ppt", annotated "How can we get more information?", contrasted with richer entries such as "you edited the file poster.ppt adding an image from the file paper.pdf", "... adding ... paragraphs from the file techreport.pdf" and "... modified a paragraph similar to paper.pdf".]

[Figure (prototype pipeline): Docs, then Preprocessing (Extract metadata, Index content, Infer semantic types), Hypotheses Generation (Text similarity, Image similarity, Domain-specific similarity, Metadata similarity), Hypotheses Pruning (Filter temporal incoherence, Similarity thresholds, Domain-specific filtering), and Aggregation and ranking.]
Approach & Methodology

We propose a multi-signal pipeline approach that reconstructs
plausible provenance traces using the contents of the files and
metadata as evidence of the relationships between files.

The pipeline consists of four stages, each containing several
components that can be executed in parallel:

[Figure: Docs -> Preprocessing (Extract metadata) -> Hypotheses Generation (Signal Detectors) -> Hypotheses Pruning (Signal Filters) -> Aggregation and ranking (Aggregators)]

Initial (encouraging) …

We performed an experiment with a sm… publications, annotated manually by two …

F1-score of 0.49 for only text similarity
F1-score of 0.70 for the aggregation of va…

[Figure: graph of numbered file nodes grouped into Cluster 1: Blood Cultures, Cluster 2: Markers and Cluster 3: General Guideline]

Future work

Following the planned methodology, we … components for each of the pipeline ph… computational efficiency.
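
As an illustration of the four-stage pipeline described under Approach & Methodology, the following is a minimal sketch, not the prototype itself. It assumes plain-text documents with a single modification timestamp, uses Jaccard word overlap as a stand-in for the "well-known similarity measures", and does not use Apache Lucene, Apache Tika or the Dropbox API. All names (`Doc`, `text_similarity`, `name_similarity`, `temporally_coherent`, `reconstruct`, `threshold`) and the unweighted-mean aggregation are hypothetical choices made for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Doc:
    """Preprocessing output (assumed): extracted text plus basic metadata."""
    name: str
    text: str
    modified: float  # filesystem timestamp, e.g. seconds since the epoch


def text_similarity(a: Doc, b: Doc) -> float:
    """Signal detector: Jaccard overlap of the two word sets."""
    wa, wb = set(a.text.lower().split()), set(b.text.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def name_similarity(a: Doc, b: Doc) -> float:
    """Signal detector (toy metadata signal): 1.0 if the base names match."""
    return float(a.name.rsplit(".", 1)[0] == b.name.rsplit(".", 1)[0])


def temporally_coherent(src: Doc, dst: Doc) -> bool:
    """Signal filter: a file cannot derive from one modified after it."""
    return src.modified <= dst.modified


def reconstruct(docs, detectors, threshold=0.2):
    """Return (source, derived, score) hypotheses, best first.

    `detectors` must be a non-empty list of functions (Doc, Doc) -> float.
    """
    # Hypotheses generation: every ordered pair is a candidate dependency.
    pairs = list(permutations(docs, 2))
    # Detectors are independent of each other, so they are mapped over a pool.
    with ThreadPoolExecutor() as pool:
        per_detector = list(
            pool.map(lambda d: [d(s, t) for s, t in pairs], detectors)
        )
    ranked = []
    for i, (src, dst) in enumerate(pairs):
        signals = [scores[i] for scores in per_detector]
        # Hypotheses pruning: drop temporally incoherent pairs and pairs
        # for which no signal reaches the threshold.
        if not temporally_coherent(src, dst) or max(signals) < threshold:
            continue
        # Aggregation: unweighted mean of all signals (an assumption).
        ranked.append((src.name, dst.name, sum(signals) / len(signals)))
    # Ranking: most strongly supported dependency first.
    return sorted(ranked, key=lambda edge: edge[2], reverse=True)


if __name__ == "__main__":
    docs = [
        Doc("paper.pdf", "provenance reconstruction of shared files", 100.0),
        Doc("poster.ppt", "provenance reconstruction poster", 200.0),
        Doc("notes.txt", "grocery list", 150.0),
    ]
    for edge in reconstruct(docs, [text_similarity, name_similarity]):
        print(edge)
```

With the three toy files above, only the hypothesis that `poster.ppt` depends on `paper.pdf` survives pruning. Threads are used only to show that the detectors are independent. A real implementation would probably prefer processes or a task queue for CPU-bound similarity measures.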




                                                                                                                                                                                   EvidenceQ||
                                                                                                                                                                                        EvidenceQ||
                                                                                                                                                                                                                         Cluster 2: Markers
                                                                                                                                                                                                                                                            Future work   EvidenceQX
                                                                                                                                                                                                                                                                               EvidenceQX
                                                                                                                                                                                                                                                                                                                                           Cluster 3: General
                                                                                                                                                                                                                                                                                                                                                 Cluster 3: General
                                                                                                                                                                                                                                                                                                                                           Guideline
                                                                                                                                                                                                                                                                                                                                                Guideline




                                                                                                                                                                     !"#$#%&'(
                                                                                                                                                                     !"#$#%&'(
                                                             #$4:2-4#-';'<=>'                                                                                                                                                                                                                            22             22




                                                                                                                                            #$%&'         Following the planned methodology, we w                              23            23                                                     17             17




8$#A'      @1-%1$#-AA)4,'            B&%$0C-A-A'D-4-1+E$4'         B&%$0C-A-A'@1F4)4,'          G,,1-,+E$4'+42'1+4H)4,'
                                                                                                                                                          components for each of the pipeline phas                                                                         15             15                                                                                                 2             2        6




                                    Research Question
                                     Research Question
                                                                                                                                       !            "                                                14                    14       20            20                       18             18             19             19                                                                             4




           ./01+#0'*-0+2+0+''           6),4+7'8-0-#0$1!'              6),4+7'9)70-1!'               G,,1-,+0$1!'
                                                                                                                                                          computational efficiency.
                                                                                                                                       (
                                                                                                                                                                                        13                    13                    16            16                 21              21                                       0        0                                                                        3


                                                                                                                                                 )*+,-'
 !                                                                                                                                                                                                                                                                                                                                1        1




 (
     How can automatically, accurately and efficiently #$4:2-4#-';'<=?'
   How342-/'#$40-40' one automatically, 6),4+7'9)70-1('
         can one 6),4+7'8-0-#0$1('
                                                                                                                                                                                                                                                       24             24

                                                        G,,1-,+0$1('
                                         accurately and efficiently
 "
                                                                                                                                                                                                                                                       Bibliography
                                                                                                                                                                                                                   5                     5




                                                                     #$%&'
   reconstruct a a plausible provenance of files ===' a shared folder,
     reconstruct plausible provenance of files in in shared folder,
            5'               5'              5'               a                                                                                                                                                                                                                                                                                                    23                   23




                                                                                                                                                                     )"*+#,-*+(
                                                                                                                                                                     )"*+#,-*+(
                                                                                                                                                                                                                                                                20              20                                                                            17                   17




   intended as the sequences ofof operations connecting the!files?
     intended as the sequences operations connecting the files?
                                                                                                                                                                                                                                                                                                                             19       19




                                                                           "                                                                                                                         4                     4                                                                                                                   15                   15




                                                                                                                                                            (1) Sara Magliacane: Reconstructing Prove
                                                                                                                                                                               3             3                                                                                                                                                                                                   14            14




                                                                                                                                             (
                                                                                                                                                                                    2                     2                                                                                                                                          18                   18




                                                                                                                                                            Consortium 2012
                                                                                                                                                                                                 6                     6                                                                            22             22




                                                                                                                                                                                                                                                                                                                                                                                        21            21



                                                                                                                                                                                                                                                                                               16             16




                                                                                                                                                                                                                                                            0              0                                                                             13                   13




        The research methodology is an iterative process, that will                                                                                         (2) Paul Groth, Yolanda Gil, Sara Magliacan
                                                                                                                                                                                                                                                                                1              1




                            Approach &&Methodology
                             Approach Methodology
        incrementally integrate existing approaches in literature and                                                                                       Annotation through Reconstructing Provena
                                                                                                                                                                                        Cluster 1: BloodBlood Cultures
                                                                                                                                                                                             Cluster 1: Cultures

                                                                                                                                                            Workshop on the role of Semantic Web in P
                                                                                                                                                                                        EvidenceQ||
                                                                                                                                                                                             EvidenceQ||
                                                                                                                                                                                                                                                                                                         Cluster 2: Markers
                                                                                                                                                                                                                                                                                                              Cluster 2: Markers
                                                                                                                                                                                                                                                                                                         EvidenceQX
                                                                                                                                                                                                                                                                                                              EvidenceQX
                                                                                                                                                                                                                                                                                                                                                    24                   24




                                                                                                                                                                                                                                                                                                                                                                                                      Cluste
                                                                                                                                                                                                                                                                                                                                                                                                      Guide
                                                                                                                                                                                                                                                                                                                                                                                                           C
                                                                                                                                                                                                                                                                                                                                                                                                           G

        evaluate the performance on benchmark corpora.
                                                                                                                                                            ESWC 2012
     We propose a a multi-signal pipeline approach that reconstructs
       We propose multi-signal pipeline approach that reconstructs                                                                                        F1-score ofof 0.49 for only text similarity
                                                                                                                                                           F1-score 0.49 for only text similarity
     plausible provenance traces using the contents ofof the files and
       plausible provenance traces using the contents the files and                                                                                        F1-score ofof 0.70 for the aggregation of v
                                                                                                                                                           F1-score 0.70 for the aggregation of va
     metadata as evidence ofof the relationships between files.
       metadata as evidence the relationships between files.

     The pipeline consists ofof four stages, each containing several
       The pipeline consists four stages, each containing several
     components that can be executed in in parallel:
       components that can be executed parallel:
                                                                                                                                                                                                                                                            Future work
                                                                                                                                                                                                                                                             Future work
                                                              #$4:2-4#-';'<=>'
                                                                 #$4:2-4#-';'<=>'

                                                                                                                                            #$%&'
                                                                                                                                                #$%&'     Following the planned methodology, we w
                                                                                                                                                            Following the planned methodology, we
8$#A'
   8$#A'   @1-%1$#-AA)4,'
               @1-%1$#-AA)4,'           B&%$0C-A-A'D-4-1+E$4' B&%$0C-A-A'@1F4)4,'
                                     B&%$0C-A-A'D-4-1+E$4'       B&%$0C-A-A'@1F4)4,'            G,,1-,+E$4'+42'1+4H)4,'
                                                                                                    G,,1-,+E$4'+42'1+4H)4,'
                                                                                                                                       ! ! " "            components for each ofof the pipeline ph
                                                                                                                                                            components for each the pipeline phas
           ./01+#0'*-0+2+0+''
               ./01+#0'*-0+2+0+''       6),4+7'8-0-#0$1!'
                                            6),4+7'8-0-#0$1!'          6),4+7'9)70-1!'
                                                                           6),4+7'9)70-1!'           G,,1-,+0$1!'
                                                                                                         G,,1-,+0$1!'
                                                                                                                                                          computational efficiency.
                                                                                                                                                            computational efficiency.
                                                                                                                                       ( (
isors: Paul Groth and Frank van Harmelen



nt                                                      An initial prototype implementation
adata describing how,                             As a first step we focus on dependencies between files instead of
duced.                                            sequences of operations.

t often it is not tracked,                        We implemented a prototype of the pipeline using open-source
sic filesystem                                    components, like Apache Lucene, Apache Tika and Dropbox API.
                                                  As signal detectors we used well-known similarity measures.

ovenance post hoc?                           <2,4%   C*.7*2,.4491;%                   D672)A.4.4%E.1.*+521%                          D672)A.4.4%C*F191;%                             G;;*.;+521%+1/%*+1H91;%
                                                                                                                                                                                                                   !#$%


                                                                                                                                          @9:).*%).-72*+:%                                                     !          "
                                                     '()*+,)%-.)+/+)+%%                      8.()%49-9:+*9)6%                                                                             I.9;A)./%BF-%
                                                                                                                                           91,2A.*.1,.%
                                              !
<7"#A,#8,/#                                                                                                                                    B9-9:+*9)6%
                                                                                                                                                                                                               &      $#"%
                                              &       01/.(%,21).1)%                         0-+;.%49-9:+*9)6%
#.":*597B*"C#                                                                                                                                 )A*.4A2:/4%
                                              "
                                                      013.*%4.-+15,%                         <2-+91=47.,9>,%                              <2-+91=47.,9>,%
                                                           )67.4%                              49-9:+*9)6%                                   >:).*91;%

563-:6#################                                                                           ?.)+/+)+%
                                                                                                  49-9:+*9)6%
,<05,3*5/63-:6#

3,563-:6#




          9"$"!+$"-#:
          !"$"8$"!+(
                                                                     Initial (encouraging) results
    #          #          #></*?,5#
                                                  We performed an experiment with a small set of biomedical
,-)#$%!!)(                     !,-)#$%!!)(        publications, annotated manually by two domain experts.
 31(                           ./21(


                                                                          Cluster 1: Blood Cultures             Cluster 2: Markers              Cluster 3: General
                                                                          EvidenceQ||                           EvidenceQX                      Guideline
                                                             !"#$#%&'(




                                                                                                                                22




                                                                                        23                                 17




                                                                                                                 15                                           2              6                   7




on                                                                          13
                                                                                 14          20




                                                                                             16            21
                                                                                                                 18             19




                                                                                                                                      0




                                                                                                                                          1
                                                                                                                                                                     4




                                                                                                                                                                         3       5
                                                                                                                                                                                             8




                                                                                                                                                                                                     9    10




                                                                                                                                                                                                         11




                                                                                                      24                                                                             12
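
The four-stage pipeline described on the poster (preprocessing, hypotheses generation, hypotheses pruning, aggregation and ranking) and the F1 evaluation can be illustrated with a minimal Python sketch. This is not the prototype's code: the poster's prototype uses Apache Lucene, Apache Tika and the Dropbox API, whereas the sketch below uses only the standard library. The names `Doc`, `jaccard`, `name_similarity`, `reconstruct` and `f1_score`, the two toy signals, the weights and the threshold are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Doc:
    """Hypothetical preprocessed file: name, extracted text, modification time."""
    name: str
    text: str
    mtime: float


def jaccard(a: Doc, b: Doc) -> float:
    """Token-set Jaccard similarity, a stand-in for a text-similarity signal."""
    ta, tb = set(a.text.lower().split()), set(b.text.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def name_similarity(a: Doc, b: Doc) -> float:
    """Toy metadata signal: 1.0 if both file names share the prefix before '_'."""
    return 1.0 if a.name.split("_")[0] == b.name.split("_")[0] else 0.0


Signal = Callable[[Doc, Doc], float]
Edge = Tuple[str, str, float]


def reconstruct(docs: List[Doc],
                signals: Dict[str, Tuple[Signal, float]],
                threshold: float) -> List[Edge]:
    """Return (source, target, confidence) hypotheses, highest confidence first."""
    total_weight = sum(weight for _, weight in signals.values())
    edges: List[Edge] = []
    for src in docs:
        for dst in docs:
            # Pruning: skip pairs where the source is not strictly older than
            # the target (a simple temporal-incoherence filter).
            if src is dst or src.mtime >= dst.mtime:
                continue
            # Aggregation: weighted sum of all signal scores, normalised to [0, 1].
            score = sum(w * f(src, dst) for f, w in signals.values()) / total_weight
            # Pruning: drop hypotheses below the similarity threshold.
            if score >= threshold:
                edges.append((src.name, dst.name, score))
    return sorted(edges, key=lambda e: e[2], reverse=True)


def f1_score(predicted: Iterable[Tuple[str, str]],
             gold: Iterable[Tuple[str, str]]) -> float:
    """Harmonic mean of precision and recall over predicted vs. annotated links."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Usage with made-up files and weights.
docs = [
    Doc("draft_v1.txt", "blood cultures guideline draft", 1.0),
    Doc("draft_v2.txt", "blood cultures guideline draft revised", 2.0),
    Doc("notes.txt", "meeting agenda", 3.0),
]
signals = {"text": (jaccard, 0.7), "name": (name_similarity, 0.3)}
edges = reconstruct(docs, signals, threshold=0.5)
for source, target, confidence in edges:
    print(f"{source} -> {target} (confidence {confidence:.2f})")
```

Keeping each signal as an independent function mirrors the poster's point that components within a stage can run in parallel; comparing `f1_score` for a single signal against the weighted combination is the kind of comparison the reported results describe.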
ISWC DC poster "Reconstructing Provenance"

Contenu connexe

Similaire à ISWC DC poster "Reconstructing Provenance"

Recomendação de Conteúdo para Redes Sociais Educativas
Recomendação de Conteúdo para Redes Sociais EducativasRecomendação de Conteúdo para Redes Sociais Educativas
Recomendação de Conteúdo para Redes Sociais EducativasMarcel Caraciolo
 
Blueprint+: Developing a Tool for Service Design

ISWC DC poster "Reconstructing Provenance"

  • 1. Reconstructing Provenance
Sara Magliacane - VU University Amsterdam
Advisors: Paul Groth and Frank van Harmelen

Problem Statement
The provenance of a data item is the metadata describing how, when and by whom the data item was produced.
Provenance is crucial in many settings, but often it is not tracked, resulting in collections of files with only basic filesystem metadata, e.g. timestamps.
In this case, is it possible to reconstruct provenance post hoc?
[Figure: timeline of edits to the file poster.ppt. Filesystem metadata records only that the file was edited; reconstructed provenance would add, for example, that an image came from paper.pdf and that paragraphs came from techreport.pdf.]

Research Question
How can one automatically, accurately and efficiently reconstruct a plausible provenance of files in a shared folder, intended as the sequences of operations connecting the files?

Approach & Methodology
We propose a multi-signal pipeline approach that reconstructs plausible provenance traces using the contents of the files and metadata as evidence of the relationships between files.
The pipeline consists of four stages, each containing several components that can be executed in parallel:
[Figure: Docs -> Preprocessing (Extract metadata, Index content, ...) -> Hypotheses Generation (Signal Detector 1, Signal Detector 2, ...) -> Hypotheses Pruning (Signal Filter 1, Signal Filter 2, ...) -> Aggregation and ranking (Aggregator 1, Aggregator 2, ...). The output is a graph of documents whose edges are labeled "copy" with a confidence value.]
The research methodology is an iterative process that will incrementally integrate existing approaches in the literature and evaluate the performance on benchmark corpora.

An initial prototype implementation
As a first step we focus on dependencies between files instead of sequences of operations.
We implemented a prototype of the pipeline using open-source components, like Apache Lucene, Apache Tika and Dropbox API. As signal detectors we used well-known similarity measures.
[Figure: prototype pipeline. Preprocessing: Extract metadata, Index content, Infer semantic types. Hypotheses Generation: Text similarity, Image similarity, Domain-specific similarity, Metadata similarity. Hypotheses Pruning: Filter temporal incoherence, Similarity thresholds, Domain-specific filtering. Aggregation and ranking: Weighted Sum.]

Initial (encouraging) results
We performed an experiment with a small set of biomedical publications, annotated manually by two domain experts.
[Figure: two graphs over documents 0 to 24, grouped into Cluster 1: Blood Cultures, Cluster 2: Markers and Cluster 3: General, with legend EvidenceQ||, EvidenceQX and Guideline. The panel labels are not recoverable.]
F1-score of 0.49 for only text similarity
F1-score of 0.70 for the aggregation of various similarities

Future work
Following the planned methodology, we will explore additional components for each of the pipeline phases and consider also computational efficiency.

Bibliography
(1) Sara Magliacane: Reconstructing Provenance, ISWC Doctoral Consortium 2012
(2) Paul Groth, Yolanda Gil, Sara Magliacane: Automatic Metadata Annotation through Reconstructing Provenance, Third International Workshop on the role of Semantic Web in Provenance Management, ESWC 2012
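
The four-stage pipeline on the poster (preprocessing, hypotheses generation, hypotheses pruning, aggregation and ranking) can be illustrated with a small sketch. This is not the authors' prototype, which used Apache Lucene, Apache Tika and the Dropbox API. Everything named here is an assumption made for illustration: the `Doc` record, the token-set Jaccard measure standing in for text similarity, the toy `same_stem` metadata signal, the weights and the threshold. Preprocessing is assumed to have already produced the text and timestamp of each file. The `f1_score` helper shows one plausible way to compare predicted dependency edges against a manual annotation, which is how an F1-score such as those reported could be computed.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Doc:
    """Hypothetical preprocessed file: extracted text plus a filesystem timestamp."""
    name: str
    text: str
    modified: float  # last-modified time, seconds since epoch


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity (an assumed stand-in for a text-similarity signal)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def same_stem(a: Doc, b: Doc) -> float:
    """Toy metadata signal: 1.0 when the file names match up to the extension."""
    return 1.0 if a.name.rsplit(".", 1)[0] == b.name.rsplit(".", 1)[0] else 0.0


def reconstruct(docs, weights=(0.8, 0.2), threshold=0.3):
    """Return (source, derived, confidence) hypotheses, ranked by confidence."""
    ranked = []
    for src, dst in permutations(docs, 2):
        # Hypotheses generation: one score per signal detector.
        signals = (jaccard(src.text, dst.text), same_stem(src, dst))
        # Hypotheses pruning: drop temporally incoherent pairs (a source
        # modified after the derived file) and pairs with only weak evidence.
        if src.modified > dst.modified or max(signals) < threshold:
            continue
        # Aggregation: weighted sum of the surviving signals.
        confidence = sum(w * s for w, s in zip(weights, signals))
        ranked.append((src.name, dst.name, confidence))
    # Ranking: most confident hypotheses first.
    return sorted(ranked, key=lambda h: h[2], reverse=True)


def f1_score(predicted, gold) -> float:
    """F1 of predicted dependency edges against manually annotated edges."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Calling `reconstruct` on a list of `Doc` objects yields ranked dependency hypotheses. Taking the `(source, derived)` pairs from that output and passing them to `f1_score` together with the annotated pairs gives a single quality figure. Replacing a signal function or changing the weights corresponds to trying other components in a pipeline stage.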