Reconstructing Provenance
Sara Magliacane - VU University Amsterdam
Advisors: Paul Groth and Frank van Harmelen



Problem Statement

The provenance of a data item is the metadata describing how, when and by whom the data item was produced.

Provenance is crucial in many settings, but it is often not tracked, resulting in collections of files with only basic filesystem metadata, e.g. timestamps.

In this case, is it possible to reconstruct provenance post hoc?

[Figure: an activity log with repeated entries of the form "You edited the file poster.ppt", annotated "How can we get more information?". The enriched entries read "... adding an image from the file paper.pdf", "... adding paragraphs from the file techreport.pdf" and "... modified a paragraph similar to paper.pdf". A timeline below links the files with edges labeled copy, image and paragraph.]

An initial prototype implementation

As a first step we focus on dependencies between files instead of sequences of operations.

We implemented a prototype of the pipeline using open-source components, such as Apache Lucene, Apache Tika and the Dropbox API. As signal detectors we used well-known similarity measures.

[Figure: the prototype pipeline. Docs -> Preprocessing (extract metadata, index content, infer semantic types) -> Hypotheses Generation (text similarity, image similarity, domain-specific similarity, metadata similarity) -> Hypotheses Pruning (filter temporal incoherence, similarity thresholds, domain-specific filtering) -> Aggregation and ranking (weighted sum).]
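The prototype is described only at the level of its components, so the following Python sketch is an illustration rather than the authors' implementation. It assumes a token-set Jaccard measure as the text similarity signal, a hypothetical length-ratio measure as a second signal, a temporal filter (a file can only depend on an older file), a similarity threshold, and a weighted sum as the aggregator. The weights, the threshold, the `Doc` type and the example contents are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    name: str
    text: str
    modified: float  # filesystem timestamp (assumed: seconds since epoch)


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity in [0, 1]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def length_ratio(a: str, b: str) -> float:
    """Hypothetical second signal: shorter text length over longer one."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return min(len(a), len(b)) / longest


def dependency_scores(docs, weights=(0.7, 0.3), threshold=0.2):
    """Score candidate edges (older file -> newer file) and rank them."""
    edges = []
    for src in docs:
        for dst in docs:
            # Temporal coherence: skip pairs where src is not strictly older.
            if src is dst or src.modified >= dst.modified:
                continue
            # Weighted sum of the individual similarity signals.
            score = (weights[0] * jaccard(src.text, dst.text)
                     + weights[1] * length_ratio(src.text, dst.text))
            # Similarity threshold: drop weak hypotheses.
            if score >= threshold:
                edges.append((src.name, dst.name, round(score, 3)))
    return sorted(edges, key=lambda edge: edge[2], reverse=True)


# Illustrative input; contents and timestamps are made up.
example_docs = [
    Doc("paper.pdf", "provenance reconstruction from file contents", 1.0),
    Doc("poster.ppt",
        "provenance reconstruction from file contents and metadata", 2.0),
    Doc("notes.txt", "shopping list", 3.0),
]
edges = dependency_scores(example_docs)
print(edges)
```

With this input only the pair paper.pdf -> poster.ppt survives the threshold. A real deployment would take the text from a content extractor and the timestamps from the filesystem.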
Initial (encouraging) results

We performed an experiment with a small set of biomedical publications, annotated manually by two domain experts.

[Figure: two dependency graphs over documents 0 to 24, labeled "Original" and "Predicted". In both, the documents are grouped into Cluster 1: Blood Cultures (EvidenceQ||), Cluster 2: Markers (EvidenceQX) and Cluster 3: General (Guideline).]

F1-score of 0.49 using text similarity only
F1-score of 0.70 using the aggregation of various similarities

Research Question

How can one automatically, accurately and efficiently reconstruct a plausible provenance of files in a shared folder, intended as the sequences of operations connecting the files?

Approach & Methodology

We propose a multi-signal pipeline approach that reconstructs plausible provenance traces using the contents of the files and metadata as evidence of the relationships between files.

The pipeline consists of four stages, each containing several components that can be executed in parallel.
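A minimal Python sketch of such a staged pipeline follows. The stage names are taken from the poster's pipeline diagram (preprocessing, hypotheses generation, hypotheses pruning, aggregation and ranking). Everything else is an assumption made for illustration: the component functions are hypothetical stand-ins, and a thread pool is only one possible way to run the components of a stage in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def run_stage(components, data):
    """Run every component of one stage on the same input, in parallel."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda component: component(data), components))


def index_tokens(docs):
    """Preprocessing component: map each file name to its set of tokens."""
    return {name: set(text.lower().split()) for name, text in docs.items()}


def detect_token_overlap(tokens):
    """Signal detector: Jaccard overlap for every ordered pair of files."""
    hypotheses = []
    for a in sorted(tokens):
        for b in sorted(tokens):
            if a != b:
                union = tokens[a] | tokens[b]
                conf = len(tokens[a] & tokens[b]) / len(union) if union else 0.0
                hypotheses.append((a, b, conf))
    return hypotheses


def detect_same_stem(tokens):
    """Signal detector: files sharing a name stem, e.g. x.ppt and x.pdf."""
    names = sorted(tokens)
    return [(a, b, 1.0) for a in names for b in names
            if a != b and a.rsplit(".", 1)[0] == b.rsplit(".", 1)[0]]


def filter_low_confidence(hypotheses, minimum=0.3):
    """Signal filter: keep hypotheses at or above a confidence threshold."""
    return [h for h in hypotheses if h[2] >= minimum]


def aggregate_max(hypotheses):
    """Aggregator: keep the best confidence per edge, ranked descending."""
    best = {}
    for a, b, conf in hypotheses:
        best[(a, b)] = max(conf, best.get((a, b), 0.0))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


def run_pipeline(docs):
    (tokens,) = run_stage([index_tokens], docs)               # preprocessing
    generated = run_stage([detect_token_overlap, detect_same_stem], tokens)
    hypotheses = [h for result in generated for h in result]  # generation
    survivors = run_stage([filter_low_confidence], hypotheses)
    kept = [h for h in hypotheses if all(h in s for s in survivors)]  # pruning
    (ranked,) = run_stage([aggregate_max], kept)              # aggregation
    return ranked


# Illustrative input; file contents are made up.
ranked = run_pipeline({
    "paper.pdf": "reconstructing provenance of files",
    "poster.ppt": "reconstructing provenance of shared files",
    "todo.txt": "buy milk",
})
print(ranked)
```

In this sketch the detectors emit both directions of every pair, so the ranking says which files are related but not which one was derived from the other. Deciding direction would need an additional signal such as timestamps.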
[Figure: the four pipeline stages. Docs -> Preprocessing (extract metadata, index content, ...) -> Hypotheses Generation (Signal Detector 1, Signal Detector 2, ...) -> Hypotheses Pruning (Signal Filter 1, Signal Filter 2, ...) -> Aggregation and ranking (Aggregator 1, Aggregator 2, ...). The output is a graph over the documents whose edges, e.g. "copy" and "image", each carry a confidence value.]

The research methodology is an iterative process that will incrementally integrate existing approaches from the literature and evaluate their performance on benchmark corpora.

Future work

Following the planned methodology, we will explore additional components for each of the pipeline phases and also consider computational efficiency.

Bibliography

(1) Sara Magliacane: Reconstructing Provenance, ISWC Doctoral Consortium 2012

(2) Paul Groth, Yolanda Gil, Sara Magliacane: Automatic Metadata Annotation through Reconstructing Provenance, Third International Workshop on the role of Semantic Web in Provenance Management, ESWC 2012
Advisors: Paul Groth and Frank van Harmelen



                            Problem Statement                                                                                              An initial prototype im
The provenance of a data item is the metadata describing how,                                                                        As a first step we focus on dependen
when and by whom the data item was produced.                                                                                         sequences of operations.

Provenance is crucial in many settings, but often it is not tracked,                                                                 We implemented a prototype of the p
resulting in collections of files with only basic filesystem                                                                         components, like Apache Lucene, Ap
metadata, e.g. timestamps.                                                                                                           As signal detectors we used well-kno

In this case, is it possible to reconstruct provenance post hoc?                                                                <2,4%   C*.7*2,.4491;%                   D672)A.4.4%E.1.*+521%                          D672)A.4.4%C*F


                                                                                                                                                                                                                             @9:).*%).-72*
                                                                                                                                        '()*+,)%-.)+/+)+%%                      8.()%49-9:+*9)6%
                                                                                                                                                                                                                              91,2A.*.1,
       !"#!$%&#'&(#)*+#,-./,-#/0,#12,#3*4/,5633/#                                                                                !
                                                                               @*A#<7"#A,#8,/#                                                                                                                                    B9-9:+*9)6%
                                                                                                                                 &       01/.(%,21).1)%                         0-+;.%49-9:+*9)6%
       !"#!$%&#'&(#)*+#,-./,-#/0,#12,#3*4/,5633/6#
                                                                               9*5,#.":*597B*"C#                                                                                                                                 )A*.4A2:/4
       !"#!$%&#'&(#)*+#,-./,-#/0,#12,#3*4/,5633/#                                                                                "
                                                                                                                                         013.*%4.-+15,%                         <2-+91=47.,9>,%                              <2-+91=47.,
                                                                                                                                              )67.4%                              49-9:+*9)6%                                   >:).*91;%

        !"#!$%&#'&(#)*+#,-./,-#/0,#12,#3*4/,5633/#7--."8#7"#.978,#:5*9#/0,#12,#373,563-:6#################                                                                           ?.)+/+)+%
                                                                                                                                                                                     49-9:+*9)6%
        !"#!$%&#'&(#)*+#,-./,-#/0,#12,#3*4/,5633/#7--."8#;#3757857304#:5*9#/0,#12,#/,<05,3*5/63-:6#

        !"#!$%&#'&(#)*+#,-./,-#/0,#12,#3*4/,5633/#9*-.1,-#7#375785730#4.9.275#/*#373,563-:6#




                                                  4,!5(
                                                 67"8#(
                                                                            4,!5(
                                                                          !"$"8$"!+(
                                                                                                   9"$"!+$"-#:
                                                                                                   !"$"8$"!+(
                                                                                                                                                        Initial (encouragin
                          )#*+$#!,$)%!&'(
      !"!#$%!&'(
        =+",# #       #       #       #      #       #       #        #        #       #       #      #      #></*?,5#
                                                                                                                                     We performed an experiment with a
                                                                 !,-)#$%!!)(               !,-)#$%!!)(            !,-)#$%!!)(        publications, annotated manually by
                                                                 ./01(                     ./31(                  ./21(


                                                                                                                                                             Cluster 1: Blood Cultures             Cluster 2: Markers              Cluster 3: G
                                                                                                                                                             EvidenceQ||                           EvidenceQX                      Guideline




                                                                                                                                                !"#$#%&'(
                                                                                                                                                                                                                   22




                                                                                                                                                                           23                                 17




                                                                                                                                                                                                    15




                              Research Question                                                                                                                13
                                                                                                                                                                    14          20




Problem Statement

The provenance of a data item is the metadata describing how,
when and by whom the data item was produced.

Provenance is crucial in many settings, but often it is not tracked,
resulting in collections of files with only basic filesystem
metadata, e.g. timestamps.

In this case, is it possible to reconstruct provenance post hoc?

An initial prototype implementation

As a first step we focus on dependencies between files instead of
sequences of operations.

We implemented a prototype of the pipeline using open-source
components, like Apache Lucene, Apache Tika and Dropbox API.
As signal detectors we used well-known similarity measures.

[Figure: prototype pipeline, from Docs through four stages.
Preprocessing: extract metadata, index content, infer semantic types.
Hypotheses Generation: text similarity, image similarity,
domain-specific similarity, metadata similarity.
Hypotheses Pruning: filter temporal incoherence, similarity
thresholds, domain-specific filtering.
Aggregation and ranking.]
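A minimal sketch of how a similarity-based signal detector could
propose file dependencies. It is not the prototype's code: the poster
does not say which similarity measures were used, so token-set Jaccard
overlap stands in for them. The FileRecord type, the 0.3 threshold and
the file names in the example are assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical record: file name, last-modified time, extracted text."""
    name: str
    modified: float
    text: str


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity (assumed stand-in for a text signal)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def candidate_dependencies(files, threshold=0.3):
    """Return (source, derived, score) hypotheses, highest score first.

    A pair is kept only if the source is older than the derived file
    and the similarity reaches the (assumed) threshold.
    """
    hypotheses = []
    for src, dst in permutations(files, 2):
        if src.modified >= dst.modified:
            continue  # a file cannot depend on a file that is not older
        score = jaccard(src.text, dst.text)
        if score >= threshold:
            hypotheses.append((src.name, dst.name, score))
    return sorted(hypotheses, key=lambda h: h[2], reverse=True)


# Example input with made-up contents and timestamps.
example_files = [
    FileRecord("paper.pdf", 1.0, "provenance reconstruction from shared folders"),
    FileRecord("poster.ppt", 2.0, "provenance reconstruction poster"),
    FileRecord("notes.txt", 3.0, "grocery list"),
]
example_hypotheses = candidate_dependencies(example_files)
```

Generating hypotheses from a signal and then pruning them by timestamp
and threshold keeps each step separate, so further signals could be
added without changing the pruning step.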




Research Question

How can one automatically, accurately and efficiently
reconstruct a plausible provenance of files in a shared folder,
intended as the sequences of operations connecting the files?

[Figure: activity-feed entries of the form "You edited the file
poster.ppt", with the question "How can we get more information?"
and enriched versions of the same entry: "... adding an image from
the file paper.pdf", "... adding [...] paragraphs from the file
techreport.pdf", "... modified a paragraph similar to paper.pdf".]




Approach & Methodology

We propose a multi-signal pipeline approach that reconstructs
plausible provenance traces using the contents of the files and
metadata as evidence of the relationships between files.

The pipeline consists of four stages, each containing several
components that can be executed in parallel:

  1. Preprocessing: extract metadata
  2. Hypotheses Generation: signal detectors
  3. Hypotheses Pruning: signal filters
  4. Aggregation and ranking: aggregators

Initial (encouraging) results

We performed an experiment with a sm[...]
publications, annotated manually by two [...]

F1-score of 0.49 for only text similarity [...]
F1-score of 0.70 for the aggregation of va[...]

[Figure: graph of numbered files grouped into Cluster 1: Blood
Cultures (EvidenceQ||), Cluster 2: Markers (EvidenceQX) and
Cluster 3: General Guideline.]

Future work

Following the planned methodology, we [...]
components for each of the pipeline ph[...]
computational efficiency.
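For reference, a sketch of how an F1-score over reconstructed
dependencies can be computed. The poster does not state its evaluation
procedure, so this assumes that predictions and manual annotations are
both sets of (source, derived) file pairs and that a prediction counts
as correct only on an exact match. The function name and the example
pairs are hypothetical.

```python
def f1_score(predicted, gold):
    """F1 over sets of (source, derived) dependency edges."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0  # also covers empty predictions or empty annotations
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Made-up example: two of three predicted edges match the annotations.
predicted_edges = {("a", "b"), ("a", "c"), ("b", "c")}
annotated_edges = {("a", "b"), ("b", "c"), ("c", "d")}
example_f1 = f1_score(predicted_edges, annotated_edges)
```

F1 is the harmonic mean of precision and recall, so it rewards a
reconstruction only when it both avoids spurious dependencies and
recovers most of the annotated ones.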
                                                                                                                                                                                   EvidenceQ||
                                                                                                                                                                                        EvidenceQ||
                                                                                                                                                                                                                         Cluster 2: Markers
                                                                                                                                                                                                                                                            Future work   EvidenceQX
                                                                                                                                                                                                                                                                               EvidenceQX
                                                                                                                                                                                                                                                                                                                                           Cluster 3: General
                                                                                                                                                                                                                                                                                                                                                 Cluster 3: General
                                                                                                                                                                                                                                                                                                                                           Guideline
                                                                                                                                                                                                                                                                                                                                                Guideline




                                                                                                                                                                     !"#$#%&'(
                                                                                                                                                                     !"#$#%&'(
                                                             #$4:2-4#-';'<=>'                                                                                                                                                                                                                            22             22




                                                                                                                                            #$%&'         Following the planned methodology, we w                              23            23                                                     17             17




8$#A'      @1-%1$#-AA)4,'            B&%$0C-A-A'D-4-1+E$4'         B&%$0C-A-A'@1F4)4,'          G,,1-,+E$4'+42'1+4H)4,'
                                                                                                                                                          components for each of the pipeline phas                                                                         15             15                                                                                                 2             2        6




                                    Research Question
                                     Research Question
                                                                                                                                       !            "                                                14                    14       20            20                       18             18             19             19                                                                             4




           ./01+#0'*-0+2+0+''           6),4+7'8-0-#0$1!'              6),4+7'9)70-1!'               G,,1-,+0$1!'
                                                                                                                                                          computational efficiency.
                                                                                                                                       (
                                                                                                                                                                                        13                    13                    16            16                 21              21                                       0        0                                                                        3


                                                                                                                                                 )*+,-'
 !                                                                                                                                                                                                                                                                                                                                1        1




 (
     How can automatically, accurately and efficiently #$4:2-4#-';'<=?'
   How342-/'#$40-40' one automatically, 6),4+7'9)70-1('
         can one 6),4+7'8-0-#0$1('
                                                                                                                                                                                                                                                       24             24

                                                        G,,1-,+0$1('
                                         accurately and efficiently
 "
                                                                                                                                                                                                                                                       Bibliography
                                                                                                                                                                                                                   5                     5




                                                                     #$%&'
   reconstruct a a plausible provenance of files ===' a shared folder,
     reconstruct plausible provenance of files in in shared folder,
            5'               5'              5'               a                                                                                                                                                                                                                                                                                                    23                   23




                                                                                                                                                                     )"*+#,-*+(
                                                                                                                                                                     )"*+#,-*+(
                                                                                                                                                                                                                                                                20              20                                                                            17                   17




   intended as the sequences ofof operations connecting the!files?
     intended as the sequences operations connecting the files?
                                                                                                                                                                                                                                                                                                                             19       19




                                                                           "                                                                                                                         4                     4                                                                                                                   15                   15




                                                                                                                                                            (1) Sara Magliacane: Reconstructing Prove
                                                                                                                                                                               3             3                                                                                                                                                                                                   14            14




                                                                                                                                             (
                                                                                                                                                                                    2                     2                                                                                                                                          18                   18




                                                                                                                                                            Consortium 2012
                                                                                                                                                                                                 6                     6                                                                            22             22




                                                                                                                                                                                                                                                                                                                                                                                        21            21



                                                                                                                                                                                                                                                                                               16             16




                                                                                                                                                                                                                                                            0              0                                                                             13                   13




        The research methodology is an iterative process, that will                                                                                         (2) Paul Groth, Yolanda Gil, Sara Magliacan
                                                                                                                                                                                                                                                                                1              1




                            Approach &&Methodology
                             Approach Methodology
        incrementally integrate existing approaches in literature and                                                                                       Annotation through Reconstructing Provena
                                                                                                                                                                                        Cluster 1: BloodBlood Cultures
                                                                                                                                                                                             Cluster 1: Cultures

                                                                                                                                                            Workshop on the role of Semantic Web in P
                                                                                                                                                                                        EvidenceQ||
                                                                                                                                                                                             EvidenceQ||
                                                                                                                                                                                                                                                                                                         Cluster 2: Markers
                                                                                                                                                                                                                                                                                                              Cluster 2: Markers
                                                                                                                                                                                                                                                                                                         EvidenceQX
                                                                                                                                                                                                                                                                                                              EvidenceQX
                                                                                                                                                                                                                                                                                                                                                    24                   24




                                                                                                                                                                                                                                                                                                                                                                                                      Cluste
                                                                                                                                                                                                                                                                                                                                                                                                      Guide
                                                                                                                                                                                                                                                                                                                                                                                                           C
                                                                                                                                                                                                                                                                                                                                                                                                           G

        evaluate the performance on benchmark corpora.
                                                                                                                                                            ESWC 2012
     We propose a a multi-signal pipeline approach that reconstructs
       We propose multi-signal pipeline approach that reconstructs                                                                                        F1-score ofof 0.49 for only text similarity
                                                                                                                                                           F1-score 0.49 for only text similarity
     plausible provenance traces using the contents ofof the files and
       plausible provenance traces using the contents the files and                                                                                        F1-score ofof 0.70 for the aggregation of v
                                                                                                                                                           F1-score 0.70 for the aggregation of va
     metadata as evidence ofof the relationships between files.
       metadata as evidence the relationships between files.

     The pipeline consists ofof four stages, each containing several
       The pipeline consists four stages, each containing several
     components that can be executed in in parallel:
       components that can be executed parallel:
                                                                                                                                                                                                                                                            Future work
                                                                                                                                                                                                                                                             Future work
                                                              #$4:2-4#-';'<=>'
                                                                 #$4:2-4#-';'<=>'

                                                                                                                                            #$%&'
                                                                                                                                                #$%&'     Following the planned methodology, we w
                                                                                                                                                            Following the planned methodology, we
8$#A'
   8$#A'   @1-%1$#-AA)4,'
               @1-%1$#-AA)4,'           B&%$0C-A-A'D-4-1+E$4' B&%$0C-A-A'@1F4)4,'
                                     B&%$0C-A-A'D-4-1+E$4'       B&%$0C-A-A'@1F4)4,'            G,,1-,+E$4'+42'1+4H)4,'
                                                                                                    G,,1-,+E$4'+42'1+4H)4,'
                                                                                                                                       ! ! " "            components for each ofof the pipeline ph
                                                                                                                                                            components for each the pipeline phas
           ./01+#0'*-0+2+0+''
               ./01+#0'*-0+2+0+''       6),4+7'8-0-#0$1!'
                                            6),4+7'8-0-#0$1!'          6),4+7'9)70-1!'
                                                                           6),4+7'9)70-1!'           G,,1-,+0$1!'
                                                                                                         G,,1-,+0$1!'
                                                                                                                                                          computational efficiency.
                                                                                                                                                            computational efficiency.
                                                                                                                                       ( (
isors: Paul Groth and Frank van Harmelen



nt                                                      An initial prototype implementation
adata describing how,                             As a first step we focus on dependencies between files instead of
duced.                                            sequences of operations.

t often it is not tracked,                        We implemented a prototype of the pipeline using open-source
sic filesystem                                    components, like Apache Lucene, Apache Tika and Dropbox API.
                                                  As signal detectors we used well-known similarity measures.

ovenance post hoc?                           <2,4%   C*.7*2,.4491;%                   D672)A.4.4%E.1.*+521%                          D672)A.4.4%C*F191;%                             G;;*.;+521%+1/%*+1H91;%
                                                                                                                                                                                                                   !#$%


                                                                                                                                          @9:).*%).-72*+:%                                                     !          "
                                                     '()*+,)%-.)+/+)+%%                      8.()%49-9:+*9)6%                                                                             I.9;A)./%BF-%
                                                                                                                                           91,2A.*.1,.%
                                              !
<7"#A,#8,/#                                                                                                                                    B9-9:+*9)6%
                                                                                                                                                                                                               &      $#"%
                                              &       01/.(%,21).1)%                         0-+;.%49-9:+*9)6%
#.":*597B*"C#                                                                                                                                 )A*.4A2:/4%
                                              "
                                                      013.*%4.-+15,%                         <2-+91=47.,9>,%                              <2-+91=47.,9>,%
                                                           )67.4%                              49-9:+*9)6%                                   >:).*91;%

563-:6#################                                                                           ?.)+/+)+%
                                                                                                  49-9:+*9)6%
,<05,3*5/63-:6#

3,563-:6#




          9"$"!+$"-#:
          !"$"8$"!+(
                                                                     Initial (encouraging) results
    #          #          #></*?,5#
                                                  We performed an experiment with a small set of biomedical
,-)#$%!!)(                     !,-)#$%!!)(        publications, annotated manually by two domain experts.
 31(                           ./21(


                                                                          Cluster 1: Blood Cultures             Cluster 2: Markers              Cluster 3: General
                                                                          EvidenceQ||                           EvidenceQX                      Guideline
                                                             !"#$#%&'(




                                                                                                                                22




                                                                                        23                                 17




                                                                                                                 15                                           2              6                   7




on                                                                          13
                                                                                 14          20




                                                                                             16            21
                                                                                                                 18             19




                                                                                                                                      0




                                                                                                                                          1
                                                                                                                                                                     4




                                                                                                                                                                         3       5
                                                                                                                                                                                             8




                                                                                                                                                                                                     9    10




                                                                                                                                                                                                         11




                                                                                                      24                                                                             12
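As a rough illustration of the multi-signal pipeline and of the F1-based evaluation reported above, the following minimal Python sketch generates candidate dependencies between files from two similarity signals, prunes temporally incoherent candidates, aggregates the remaining signals with a weighted sum, and scores the result against manually annotated dependencies. Everything in it is an assumption made for illustration: the `Doc` fields, the function names, the word-overlap and filename signals, the weights and the threshold are hypothetical and are not the detectors or settings used in the prototype.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import permutations
from typing import Callable, Iterable, Sequence


@dataclass(frozen=True)
class Doc:
    name: str        # file name (hypothetical metadata field)
    text: str        # extracted text content
    modified: float  # last-modified timestamp in seconds


Signal = Callable[[Doc, Doc], float]


def text_similarity(a: Doc, b: Doc) -> float:
    """Jaccard overlap of lowercase word sets (stand-in text signal)."""
    wa, wb = set(a.text.lower().split()), set(b.text.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def name_similarity(a: Doc, b: Doc) -> float:
    """Character-level similarity of file names (stand-in metadata signal)."""
    return SequenceMatcher(None, a.name, b.name).ratio()


def reconstruct(docs: Sequence[Doc],
                weights: Sequence[tuple[Signal, float]],
                threshold: float) -> list[tuple[float, str, str]]:
    """Return ranked (score, source, derived) dependency hypotheses."""
    ranked = []
    for src, dst in permutations(docs, 2):      # hypotheses generation
        if src.modified > dst.modified:         # pruning: temporal incoherence
            continue
        score = sum(w * signal(src, dst) for signal, w in weights)  # weighted sum
        if score >= threshold:                  # pruning: similarity threshold
            ranked.append((score, src.name, dst.name))
    ranked.sort(reverse=True)                   # ranking, best hypothesis first
    return ranked


def f1_score(predicted: Iterable[tuple[str, str]],
             annotated: Iterable[tuple[str, str]]) -> float:
    """Harmonic mean of precision and recall over dependency edges."""
    predicted, annotated = set(predicted), set(annotated)
    true_positives = len(predicted & annotated)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(annotated)
    return 2 * precision * recall / (precision + recall)


docs = [
    Doc("draft.txt", "blood culture results for sepsis", 1.0),
    Doc("final.txt", "blood culture results for sepsis patients", 2.0),
    Doc("notes.txt", "meeting agenda and budget", 3.0),
]
weights = [(text_similarity, 0.8), (name_similarity, 0.2)]
ranked = reconstruct(docs, weights, threshold=0.5)
edges = [(src, dst) for _, src, dst in ranked]
annotated = [("draft.txt", "final.txt"), ("final.txt", "notes.txt")]
print(edges, round(f1_score(edges, annotated), 2))
```

In this toy run only the draft-to-final dependency survives pruning, so precision is perfect but one annotated dependency is missed. Each signal is a plain function of two documents, so further detectors could be added to `weights` without changing the rest of the sketch.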
ISWC DC poster "Reconstructing Provenance"
ISWC DC poster "Reconstructing Provenance"
ISWC DC poster "Reconstructing Provenance"

Contenu connexe

Similaire à ISWC DC poster "Reconstructing Provenance"

Recomendação de Conteúdo para Redes Sociais Educativas
Recomendação de Conteúdo para Redes Sociais EducativasRecomendação de Conteúdo para Redes Sociais Educativas
Recomendação de Conteúdo para Redes Sociais EducativasMarcel Caraciolo
 
Blueprint+: Developing a Tool for Service Design
ISWC DC poster "Reconstructing Provenance"

  • 1. Reconstructing Provenance
    Sara Magliacane - VU University Amsterdam
    Advisors: Paul Groth and Frank van Harmelen

    Problem Statement
    The provenance of a data item is the metadata describing how, when and by whom the data item was produced.
    Provenance is crucial in many settings, but often it is not tracked, resulting in collections of files with only basic filesystem metadata, e.g. timestamps.
    In this case, is it possible to reconstruct provenance post hoc?
    [Figure: timeline from June to October of file-sharing events such as "You edited the file poster.ppt", "adding an image from the file paper.pdf", "adding paragraphs from the file techreport.pdf" and "modified a paragraph similar to paper.pdf", captioned "How can we get more information?". Event timestamps are illegible in the transcript.]

    Research Question
    How can one automatically, accurately and efficiently reconstruct a plausible provenance of files in a shared folder, intended as the sequences of operations connecting the files?

    Approach & Methodology
    We propose a multi-signal pipeline approach that reconstructs plausible provenance traces using the contents of the files and metadata as evidence of the relationships between files.
    The pipeline consists of four stages, each containing several components that can be executed in parallel:
    [Figure: Docs -> Preprocessing (Extract metadata, Index content, ...) -> Hypotheses Generation (Signal Detector 1, Signal Detector 2, ...) -> Hypotheses Pruning (Signal Filter 1, Signal Filter 2, ...) -> Aggregation and ranking (Aggregator 1, Aggregator 2, ...). The output is a set of hypotheses, e.g. "copy", each with a confidence value.]
    The research methodology is an iterative process that will incrementally integrate existing approaches in the literature and evaluate the performance on benchmark corpora.

    An initial prototype implementation
    As a first step we focus on dependencies between files instead of sequences of operations.
    We implemented a prototype of the pipeline using open-source components, like Apache Lucene, Apache Tika and Dropbox API. As signal detectors we used well-known similarity measures.
    [Figure: Docs -> Preprocessing (Extract metadata, Index content, Infer semantic types) -> Hypotheses Generation (Text similarity, Image similarity, Domain-specific similarity, Metadata similarity) -> Hypotheses Pruning (Filter temporal incoherence, Similarity thresholds, Domain-specific filtering) -> Aggregation and ranking (Weighted Sum).]

    Initial (encouraging) results
    We performed an experiment with a small set of biomedical publications, annotated manually by two domain experts.
    [Figure: "Original" and "Predicted" dependency graphs over documents 0 to 24, grouped as Cluster 1: Blood Cultures, Cluster 2: Markers and Cluster 3: General, with legend entries EvidenceQ||, EvidenceQX and Guideline.]
    F1-score of 0.49 for only text similarity
    F1-score of 0.70 for the aggregation of various similarities

    Future work
    Following the planned methodology, we will explore additional components for each of the pipeline phases and consider also computational efficiency.

    Bibliography
    (1) Sara Magliacane: Reconstructing Provenance, ISWC Doctoral Consortium 2012
    (2) Paul Groth, Yolanda Gil, Sara Magliacane: Automatic Metadata Annotation through Reconstructing Provenance, Third International Workshop on the role of Semantic Web in Provenance Management, ESWC 2012
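
The four-stage pipeline described under "Approach & Methodology" can be illustrated with a small sketch. This is not the authors' prototype (which the poster says was built on Apache Lucene, Apache Tika and the Dropbox API); it is a minimal stand-in that assumes two hypothetical signal detectors (token-set Jaccard on text, character-bigram Jaccard on file names), a temporal-incoherence filter, and a weighted-sum aggregator. All names, weights and the threshold are assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Doc:
    """Hypothetical preprocessed file: name, extracted text, modification time."""
    name: str
    text: str
    modified: float


def _jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def text_similarity(src: Doc, dst: Doc) -> float:
    """Signal detector: token-set Jaccard, a stand-in for a text-similarity measure."""
    return _jaccard(set(src.text.lower().split()), set(dst.text.lower().split()))


def name_similarity(src: Doc, dst: Doc) -> float:
    """Signal detector: character-bigram Jaccard on file names (a metadata signal)."""
    def bigrams(s: str) -> set:
        return {s[i:i + 2] for i in range(len(s) - 1)}
    return _jaccard(bigrams(src.name), bigrams(dst.name))


# Assumed detectors and weights; the poster does not give concrete values.
DETECTORS = {"text": text_similarity, "name": name_similarity}
WEIGHTS = {"text": 0.7, "name": 0.3}


def reconstruct(docs, threshold=0.2):
    """Return ranked (source, derived, confidence) dependency hypotheses."""
    ranked = []
    # Hypotheses generation: every ordered pair is a candidate dependency.
    for src, dst in permutations(docs, 2):
        # Hypotheses pruning: drop temporally incoherent pairs, where the
        # supposed source was modified after the file derived from it.
        if src.modified > dst.modified:
            continue
        # Aggregation: weighted sum of the individual signal scores.
        confidence = sum(WEIGHTS[k] * f(src, dst) for k, f in DETECTORS.items())
        if confidence >= threshold:
            ranked.append((src.name, dst.name, confidence))
    # Ranking: most confident hypotheses first.
    return sorted(ranked, key=lambda h: h[2], reverse=True)


if __name__ == "__main__":
    docs = [
        Doc("paper.pdf", "provenance reconstruction of shared files", 1.0),
        Doc("poster.ppt", "provenance reconstruction poster of shared files", 2.0),
        Doc("recipe.txt", "vegan seitan soy cooking", 3.0),
    ]
    for src, dst, conf in reconstruct(docs):
        print(f"{dst} depends on {src} (confidence {conf:.2f})")
```

Because each detector is an independent function of a document pair, detectors within a stage could be run in parallel, which matches the poster's description of the stages; the sketch runs them sequentially for brevity.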