SlideShare une entreprise Scribd logo
1  sur  30
Reengineering PDF-Based Documents
Targeting Complex Software
Specifications
Moutasm tamimi, Ahid yaseen
Software Engineering
Nojoumian, M., & Lethbridge, T. C. (2011). Reengineering PDF-based documents targeting complex
software specifications. International Journal of Knowledge and Web Intelligence, 2(4), 292-319.
Outline
o Review
o Abstract
o Contribution and Motivation
o Related Work
o Document Transformation
o Evaluation
o Logical Structure Extraction
o multilayer hypertext versions elements
o Checking Well-formedness and Validity
◦ Producing Multiple Outputs
◦ Examples
◦ Concept extraction
◦ Cross referencing
◦ Evaluation, Usability, And Architecture
◦ Architecture of the proposed framework
◦ Conclusion
◦ Future Work
Review
1. Extensible Mark-up Language (XML) is a mark-up language that
defines a set of rules for encoding documents in a format that is
both human-readable and machine-readable.
2. XPath function: You can use XML Path Language (Xpath) functions
to refine XPath queries and enhance the programming power and
flexibility of XPath.
Abstract
• This paper investigated the process of reengineering the complex
PDF documents by focusing on the Object Management Group
(OMG) standards and roles to produce the multilayer hypertext
interfaces, which can be more applicable of electronic documents.
Contribution and Motivation
Key contributions:
1. An efficient technique for capturing document structure
2. Various techniques for text extraction
3. A general approach for document engineering
4. Significant values and usability in the final result.
Related Work
1. Document Structure Analysis
2. PDF Document Analysis
3. Leveraging Tables of Contents
Document Transformation
Criteria extract the document’s logical structure and convert it to
XML:
Generality
Low
volume
Easy
processing
Tagging
structure
Containing
clues
Evaluation
The techniques of examining the given transformation criteria
DOC and RTF formats are
generally messy
PDF complexity
Logical Structure Extraction
1. First Refinement Approach (it failed in different chapters)
• In this method start of search and correspond the main tags like
<Part>, <Sect> and <Div>, which indicated at start and end of chapter
or sections in Adobe Acrobat.
• In practice authors applied the methods in sample of large document
and uneven chapters and found that this method unlikely failed, with
reason of forget tagging rightly the method close for<Sect> tag
incorrectly in wrong places
1. Logical Structure Extraction
• 2- Second Implementation Approach (LinkTarget,
LinkTargetQueue)
• In this method start of search and correspond the main tags like
<Part>, <Sect> and <Div>, which indicated at start and end of
chapter or sections in Adobe Acrobat.
• In practice authors applied the methods in sample of large
document and uneven chapters and found that this method
unlikely failed, with reason of forget tagging rightly the method
close for<Sect> tag incorrectly in wrong places
2. Text Extraction
• In 1990, Nielsen demonstrated the Hypertext and
hypermedia which considered the related information in
other data sources, the importance of these issues has
illustrated in the computer applications associated with
structured information like on-line documentation or
computer-aided learning, in order to construct a general
structure for our hypertext interfaces.
Multilayer Hypertext Versions Elements
A page for the table of contents
A separate page for each heading types
Hyperlinks for accessing to the table of
contents
Some pages for extracted concepts
Various cross references throughout the document
i.e. : a single page of a document
i.e. : part, chapter, section, and subsection
i.e. : Associations
i.e. : package and class hierarchy of the
UML
i.e. : content linked with figures
2.1 Checking Well-formedness and Validity
• A well-formed content based on the XML document with
opening and closing tags, and nested logical rules to be able
to check and validate it by Stylus Studio® XML tool. i.e.,
document must have it conducted schema, the uses tags
must be within the schema content.
2.2 Producing Multiple Outputs
• Five motivations to generate small hypertext pages:
1. A better sense of location: Best practice to the cross-references
in the content,
i.e syntax <a name=“xyz”> and <a href=“#xyz”> to navigate and move
between sections.
2. Less chance of getting lost: The end-users can scroll between
pages and have the movements between the parts. The problem
of a jump when the end-users move from part to another.
3. A less-overwhelming sensation: The end-user can operate the
large amounts of data and comprehend the content from the
small document.
4. Faster loading: The end-user ignoring the download of the big
document.
5. Statistical analysis: looking at the importance of information to
deal with the enhancement of the specification itself.
A better sense of location
Less chance of getting lost
A less-overwhelming sensation
Faster loading
Statistical analysis
The produced function based on 3 issues
• Folder named “folder-name”: contains the hypertext files
• @Number = attribute <Part>, <Chapter>, <Section>, <Subsection>
• Outputs: I.html, 7.html, 7.1.html, 7.2.html, 7.3.html, 7.3.1.html,
7.3.2.html.
Examples
2.3 Connecting Hypertext Pages Sequentially
• A Hypertext can be presented based on
XSLT code in a file by Previous and
Next at the above of the pages.
• By extracting elements attribute
sequentially (1, 2, …, 7, 7.1, 7.2, 7.3,
7.3.1, etc) stored in the Num.txt file to
carry out the Procedure Linker ()
algorithm to deal with the process of
building the hypertext pages.
2.4 Forming Major Document Elements
• 2.4.1 Figure
• 2.4.2 Table
• 2.4.3 List
2.4.1 Figures
• This section carried out in
transformation phase by the following
procedures for Figures XPath
expressions and XSLT codes;
• Convert the document to initial XML file
by the Adobe Acrobat Professional,
create a folder called “images” to the
same file. Store overall the figures in
that folder “folder-name_img_1.jpg”,
the XML file contains two elements
“src” means <ImageData>, and figure
<Caption>.
Cells Level string
<TD> When: position () =
1 <TD>
Level 1
<TD> When: position ()
=2 <TD>
Level 2
2.4.2 Tables
• In this section authors generated the relevant caption, and then
selected the TableRow element. Therefore, they constructed all table
cells. After that authors returned the index position of the node that
is currently being processed by XPath function: position(). Finally they
applied many expressions on each column.
2.4.3 Lists
• This section supported the XPath expressions based on a
style sheet design to recover the process of extracting
and transforming the Lists data in a document. According
to the XPath expressions given the table below:
Style sheet design XPath expressions
element <L></L>
lists <LI_Label> ……….. </LI_Label>
<LI_Title> ……….. </ LI_Title>
<xsl:for-each select="LI_Label">……….
<xsl:for-each select="LI_Title">
3. Concept extraction
1. Modeling Class Hierarchy Extraction
2. Modeling Package Hierarchy Extraction
4. Cross referencing
• To facilitate document browsing for end users, we created hyperlinks
for major document keywords (for example, class names as well as
package names) throughout the generated user interfaces. As we
mentioned previously, since these keywords were among document
headings, each of them had an independent hypertext page or anchor
link in the final user interfaces.
Evaluation, Usability, And Architecture
1. Reengineering of Various OMG Specifications
2. Usability of Multilayer Hypertext Interfaces: following benefits
through our usability studies, which did not exist in the original
PDF formats, or Adobe-Generated HTML formats:.
• Navigating
• Scrolling
• Processing
• Learning
• Monitoring
• Downloading
• Referencing
• Coloring
• Keeping track
Architecture of the Proposed Framework
Conclusion
• An approach for taking raw PDF versions of complex documents (e.g.,
specifications) and converting them into multilayer hypertext
interfaces. For each document, we first generated a clean XML
document with meaningful tags, and then constructed from this a
series of hypertext pages constituting the final system.
Future Work
1. Extract the initial XML document from other formats such as DOC,
RTF, HTML, etc. This can extend our framework for other kinds of
formats and documents.
2. Automate the concept extractions or at least create some features
for the detection of the logical relationships among headings
3. Improve the current solution and discover new users’ demands.
Only by such an investigation we can have a deep understanding of
users’ difficulties.
Example
• https://www.iro.umontreal.ca/~pift1025/bigjava/Ch26/ch26.html
Thank you
Speaker Information
 Moutasm tamimi
 Masters of Software Engineering
 Independent Consultant , IT Researcher.
 CEO at ITG7.com , IT-CRG.com
 Email: tamimi@itg7.com,
Click Here
Click HereI T G 7
Click Here
Click HereIT-CRG

Contenu connexe

Tendances

SE2018_Lec 15_ Software Design
SE2018_Lec 15_ Software DesignSE2018_Lec 15_ Software Design
SE2018_Lec 15_ Software DesignAmr E. Mohamed
 
Chapter17 system implementation
Chapter17 system implementationChapter17 system implementation
Chapter17 system implementationDhani Ahmad
 
PS02CINT22 SE Software Maintenance
PS02CINT22 SE Software MaintenancePS02CINT22 SE Software Maintenance
PS02CINT22 SE Software MaintenanceConestoga Collage
 
Ch21-Software Engineering 9
Ch21-Software Engineering 9Ch21-Software Engineering 9
Ch21-Software Engineering 9Ian Sommerville
 
Object Oriented Design
Object Oriented DesignObject Oriented Design
Object Oriented DesignAMITJain879
 
SE18_Lec 02_Software Life Cycle Model
SE18_Lec 02_Software Life Cycle ModelSE18_Lec 02_Software Life Cycle Model
SE18_Lec 02_Software Life Cycle ModelAmr E. Mohamed
 
System Development Life Cycle & Implementation of MIS
System Development Life Cycle & Implementation of MISSystem Development Life Cycle & Implementation of MIS
System Development Life Cycle & Implementation of MISGeorge V James
 
Work of art practices in software development.
Work of art practices in software development. Work of art practices in software development.
Work of art practices in software development. Communication Progress
 
Ch5- Software Engineering 9
Ch5- Software Engineering 9Ch5- Software Engineering 9
Ch5- Software Engineering 9Ian Sommerville
 
Software re engineering
Software re engineeringSoftware re engineering
Software re engineeringdeshpandeamrut
 
Contributors to Reduce Maintainability Cost at the Software Implementation Phase
Contributors to Reduce Maintainability Cost at the Software Implementation PhaseContributors to Reduce Maintainability Cost at the Software Implementation Phase
Contributors to Reduce Maintainability Cost at the Software Implementation PhaseWaqas Tariq
 
System Analysis And Design 2011
System Analysis And Design  2011System Analysis And Design  2011
System Analysis And Design 2011tgushi12
 
Quality Attribute: Testability
Quality Attribute: TestabilityQuality Attribute: Testability
Quality Attribute: TestabilityPranay Singh
 
Configuration Management in Software Engineering - SE29
Configuration Management in Software Engineering - SE29Configuration Management in Software Engineering - SE29
Configuration Management in Software Engineering - SE29koolkampus
 

Tendances (20)

SE2018_Lec 15_ Software Design
SE2018_Lec 15_ Software DesignSE2018_Lec 15_ Software Design
SE2018_Lec 15_ Software Design
 
Chapter17 system implementation
Chapter17 system implementationChapter17 system implementation
Chapter17 system implementation
 
PS02CINT22 SE Software Maintenance
PS02CINT22 SE Software MaintenancePS02CINT22 SE Software Maintenance
PS02CINT22 SE Software Maintenance
 
Software Evolution
Software EvolutionSoftware Evolution
Software Evolution
 
Ch21-Software Engineering 9
Ch21-Software Engineering 9Ch21-Software Engineering 9
Ch21-Software Engineering 9
 
Object Oriented Design
Object Oriented DesignObject Oriented Design
Object Oriented Design
 
SE18_Lec 02_Software Life Cycle Model
SE18_Lec 02_Software Life Cycle ModelSE18_Lec 02_Software Life Cycle Model
SE18_Lec 02_Software Life Cycle Model
 
System Development Life Cycle & Implementation of MIS
System Development Life Cycle & Implementation of MISSystem Development Life Cycle & Implementation of MIS
System Development Life Cycle & Implementation of MIS
 
Work of art practices in software development.
Work of art practices in software development. Work of art practices in software development.
Work of art practices in software development.
 
Software Reliability
Software ReliabilitySoftware Reliability
Software Reliability
 
Ch5- Software Engineering 9
Ch5- Software Engineering 9Ch5- Software Engineering 9
Ch5- Software Engineering 9
 
Software re engineering
Software re engineeringSoftware re engineering
Software re engineering
 
Contributors to Reduce Maintainability Cost at the Software Implementation Phase
Contributors to Reduce Maintainability Cost at the Software Implementation PhaseContributors to Reduce Maintainability Cost at the Software Implementation Phase
Contributors to Reduce Maintainability Cost at the Software Implementation Phase
 
System Analysis And Design 2011
System Analysis And Design  2011System Analysis And Design  2011
System Analysis And Design 2011
 
Quality Attribute: Testability
Quality Attribute: TestabilityQuality Attribute: Testability
Quality Attribute: Testability
 
M azhar
M azharM azhar
M azhar
 
Customizing iso 9126 quality model for evaluation of b2 b applications
Customizing iso 9126 quality model for evaluation of b2 b applicationsCustomizing iso 9126 quality model for evaluation of b2 b applications
Customizing iso 9126 quality model for evaluation of b2 b applications
 
Test software use case
Test software use caseTest software use case
Test software use case
 
Configuration Management in Software Engineering - SE29
Configuration Management in Software Engineering - SE29Configuration Management in Software Engineering - SE29
Configuration Management in Software Engineering - SE29
 
5 chap - MAINTENANCE
5 chap - MAINTENANCE5 chap - MAINTENANCE
5 chap - MAINTENANCE
 

Similaire à Reengineering PDF-Based Documents Targeting Complex Software Specifications

IRJET- Resume Information Extraction Framework
IRJET- Resume Information Extraction FrameworkIRJET- Resume Information Extraction Framework
IRJET- Resume Information Extraction FrameworkIRJET Journal
 
LangChain + Docugami Webinar
LangChain + Docugami WebinarLangChain + Docugami Webinar
LangChain + Docugami WebinarTaqi Jaffri
 
DOC-20210303-WA0017..pptx,coding stuff in c
DOC-20210303-WA0017..pptx,coding stuff in cDOC-20210303-WA0017..pptx,coding stuff in c
DOC-20210303-WA0017..pptx,coding stuff in cfloraaluoch3
 
The Missing Link: Metadata Conversion Workflows for Everyone
The Missing Link: Metadata Conversion Workflows for EveryoneThe Missing Link: Metadata Conversion Workflows for Everyone
The Missing Link: Metadata Conversion Workflows for EveryoneAndrea Payant
 
Multimedia system(OPEN DOCUMENT ARCHITECTURE AND INTERCHANGING FORMAT)
Multimedia system(OPEN DOCUMENT ARCHITECTURE AND INTERCHANGING FORMAT)Multimedia system(OPEN DOCUMENT ARCHITECTURE AND INTERCHANGING FORMAT)
Multimedia system(OPEN DOCUMENT ARCHITECTURE AND INTERCHANGING FORMAT)pavishkumarsingh
 
accessible_pdf_webinar.ppt
accessible_pdf_webinar.pptaccessible_pdf_webinar.ppt
accessible_pdf_webinar.pptChelo603470
 
Technical writing tools
Technical writing toolsTechnical writing tools
Technical writing toolsAnil Menon
 
Pem Overview20090130
Pem Overview20090130Pem Overview20090130
Pem Overview20090130brianlbrinker
 
Chapter 1 Getting Started with HTML5
Chapter 1 Getting Started with HTML5Chapter 1 Getting Started with HTML5
Chapter 1 Getting Started with HTML5Dr. Ahmed Al Zaidy
 
Data mining model for the data retrieval from central server configuration
Data mining model for the data retrieval from central server configurationData mining model for the data retrieval from central server configuration
Data mining model for the data retrieval from central server configurationijcsit
 
FME World Tour 2015 - FME & Data Migration Simon McCabe
FME World Tour 2015 -  FME & Data Migration Simon McCabeFME World Tour 2015 -  FME & Data Migration Simon McCabe
FME World Tour 2015 - FME & Data Migration Simon McCabeIMGS
 
CSCI6505 Project:Construct search engine using ML approach
CSCI6505 Project:Construct search engine using ML approachCSCI6505 Project:Construct search engine using ML approach
CSCI6505 Project:Construct search engine using ML approachbutest
 
Unit-III(Design).pptx
Unit-III(Design).pptxUnit-III(Design).pptx
Unit-III(Design).pptxFajar Baskoro
 
Adopting AnswerModules ModuleSuite
Adopting AnswerModules ModuleSuiteAdopting AnswerModules ModuleSuite
Adopting AnswerModules ModuleSuiteAnswerModules
 

Similaire à Reengineering PDF-Based Documents Targeting Complex Software Specifications (20)

IRJET- Resume Information Extraction Framework
IRJET- Resume Information Extraction FrameworkIRJET- Resume Information Extraction Framework
IRJET- Resume Information Extraction Framework
 
LangChain + Docugami Webinar
LangChain + Docugami WebinarLangChain + Docugami Webinar
LangChain + Docugami Webinar
 
Remus_3_0
Remus_3_0Remus_3_0
Remus_3_0
 
DOC-20210303-WA0017..pptx,coding stuff in c
DOC-20210303-WA0017..pptx,coding stuff in cDOC-20210303-WA0017..pptx,coding stuff in c
DOC-20210303-WA0017..pptx,coding stuff in c
 
The Missing Link: Metadata Conversion Workflows for Everyone
The Missing Link: Metadata Conversion Workflows for EveryoneThe Missing Link: Metadata Conversion Workflows for Everyone
The Missing Link: Metadata Conversion Workflows for Everyone
 
Multimedia system(OPEN DOCUMENT ARCHITECTURE AND INTERCHANGING FORMAT)
Multimedia system(OPEN DOCUMENT ARCHITECTURE AND INTERCHANGING FORMAT)Multimedia system(OPEN DOCUMENT ARCHITECTURE AND INTERCHANGING FORMAT)
Multimedia system(OPEN DOCUMENT ARCHITECTURE AND INTERCHANGING FORMAT)
 
Multimedia system
Multimedia systemMultimedia system
Multimedia system
 
accessible_pdf_webinar.ppt
accessible_pdf_webinar.pptaccessible_pdf_webinar.ppt
accessible_pdf_webinar.ppt
 
Technical writing tools
Technical writing toolsTechnical writing tools
Technical writing tools
 
Pem Overview20090130
Pem Overview20090130Pem Overview20090130
Pem Overview20090130
 
Section 508 Compliance and Remediation Procdure_MMEdits (2)
Section 508 Compliance and Remediation Procdure_MMEdits (2)Section 508 Compliance and Remediation Procdure_MMEdits (2)
Section 508 Compliance and Remediation Procdure_MMEdits (2)
 
Chapter 1 Getting Started with HTML5
Chapter 1 Getting Started with HTML5Chapter 1 Getting Started with HTML5
Chapter 1 Getting Started with HTML5
 
Web Design
Web DesignWeb Design
Web Design
 
Data mining model for the data retrieval from central server configuration
Data mining model for the data retrieval from central server configurationData mining model for the data retrieval from central server configuration
Data mining model for the data retrieval from central server configuration
 
FME World Tour 2015 - FME & Data Migration Simon McCabe
FME World Tour 2015 -  FME & Data Migration Simon McCabeFME World Tour 2015 -  FME & Data Migration Simon McCabe
FME World Tour 2015 - FME & Data Migration Simon McCabe
 
The Trip to DITA
The Trip to DITAThe Trip to DITA
The Trip to DITA
 
5010
50105010
5010
 
CSCI6505 Project:Construct search engine using ML approach
CSCI6505 Project:Construct search engine using ML approachCSCI6505 Project:Construct search engine using ML approach
CSCI6505 Project:Construct search engine using ML approach
 
Unit-III(Design).pptx
Unit-III(Design).pptxUnit-III(Design).pptx
Unit-III(Design).pptx
 
Adopting AnswerModules ModuleSuite
Adopting AnswerModules ModuleSuiteAdopting AnswerModules ModuleSuite
Adopting AnswerModules ModuleSuite
 

Plus de Moutasm Tamimi

Software Quality Assessment Practices
Software Quality Assessment PracticesSoftware Quality Assessment Practices
Software Quality Assessment PracticesMoutasm Tamimi
 
An integrated security testing framework and tool
An integrated security testing framework  and toolAn integrated security testing framework  and tool
An integrated security testing framework and toolMoutasm Tamimi
 
Critical Success Factors along ERP life-cycle in Small medium enterprises
Critical Success Factors along ERP life-cycle in Small medium enterprises Critical Success Factors along ERP life-cycle in Small medium enterprises
Critical Success Factors along ERP life-cycle in Small medium enterprises Moutasm Tamimi
 
Critical Success Factors (CSFs) In International ERP Implementations with que...
Critical Success Factors (CSFs) In International ERP Implementations with que...Critical Success Factors (CSFs) In International ERP Implementations with que...
Critical Success Factors (CSFs) In International ERP Implementations with que...Moutasm Tamimi
 
Best Practices For Business Analyst - Part 3
Best Practices For Business Analyst - Part 3Best Practices For Business Analyst - Part 3
Best Practices For Business Analyst - Part 3Moutasm Tamimi
 
Concepts Of business analyst Practices - Part 1
Concepts Of business analyst Practices - Part 1Concepts Of business analyst Practices - Part 1
Concepts Of business analyst Practices - Part 1Moutasm Tamimi
 
Recovery in Multi database Systems
Recovery in Multi database SystemsRecovery in Multi database Systems
Recovery in Multi database SystemsMoutasm Tamimi
 
Software Quality Models: A Comparative Study paper
Software Quality Models: A Comparative Study  paperSoftware Quality Models: A Comparative Study  paper
Software Quality Models: A Comparative Study paperMoutasm Tamimi
 
ISO 29110 Software Quality Model For Software SMEs
ISO 29110 Software Quality Model For Software SMEsISO 29110 Software Quality Model For Software SMEs
ISO 29110 Software Quality Model For Software SMEsMoutasm Tamimi
 
Windows form application - C# Training
Windows form application - C# Training Windows form application - C# Training
Windows form application - C# Training Moutasm Tamimi
 
Asp.net Programming Training (Web design, Web development)
Asp.net Programming Training (Web design, Web  development)Asp.net Programming Training (Web design, Web  development)
Asp.net Programming Training (Web design, Web development)Moutasm Tamimi
 
Database Management System - SQL Advanced Training
Database Management System - SQL Advanced TrainingDatabase Management System - SQL Advanced Training
Database Management System - SQL Advanced TrainingMoutasm Tamimi
 
Database Management System - SQL beginner Training
Database Management System - SQL beginner Training Database Management System - SQL beginner Training
Database Management System - SQL beginner Training Moutasm Tamimi
 
Measurement and Quality in Object-Oriented Design
Measurement and Quality in Object-Oriented DesignMeasurement and Quality in Object-Oriented Design
Measurement and Quality in Object-Oriented DesignMoutasm Tamimi
 
SQL Injection and Clickjacking Attack in Web security
SQL Injection and Clickjacking Attack in Web securitySQL Injection and Clickjacking Attack in Web security
SQL Injection and Clickjacking Attack in Web securityMoutasm Tamimi
 

Plus de Moutasm Tamimi (15)

Software Quality Assessment Practices
Software Quality Assessment PracticesSoftware Quality Assessment Practices
Software Quality Assessment Practices
 
An integrated security testing framework and tool
An integrated security testing framework  and toolAn integrated security testing framework  and tool
An integrated security testing framework and tool
 
Critical Success Factors along ERP life-cycle in Small medium enterprises
Critical Success Factors along ERP life-cycle in Small medium enterprises Critical Success Factors along ERP life-cycle in Small medium enterprises
Critical Success Factors along ERP life-cycle in Small medium enterprises
 
Critical Success Factors (CSFs) In International ERP Implementations with que...
Critical Success Factors (CSFs) In International ERP Implementations with que...Critical Success Factors (CSFs) In International ERP Implementations with que...
Critical Success Factors (CSFs) In International ERP Implementations with que...
 
Best Practices For Business Analyst - Part 3
Best Practices For Business Analyst - Part 3Best Practices For Business Analyst - Part 3
Best Practices For Business Analyst - Part 3
 
Concepts Of business analyst Practices - Part 1
Concepts Of business analyst Practices - Part 1Concepts Of business analyst Practices - Part 1
Concepts Of business analyst Practices - Part 1
 
Recovery in Multi database Systems
Recovery in Multi database SystemsRecovery in Multi database Systems
Recovery in Multi database Systems
 
Software Quality Models: A Comparative Study paper
Software Quality Models: A Comparative Study  paperSoftware Quality Models: A Comparative Study  paper
Software Quality Models: A Comparative Study paper
 
ISO 29110 Software Quality Model For Software SMEs
ISO 29110 Software Quality Model For Software SMEsISO 29110 Software Quality Model For Software SMEs
ISO 29110 Software Quality Model For Software SMEs
 
Windows form application - C# Training
Windows form application - C# Training Windows form application - C# Training
Windows form application - C# Training
 
Asp.net Programming Training (Web design, Web development)
Asp.net Programming Training (Web design, Web  development)Asp.net Programming Training (Web design, Web  development)
Asp.net Programming Training (Web design, Web development)
 
Database Management System - SQL Advanced Training
Database Management System - SQL Advanced TrainingDatabase Management System - SQL Advanced Training
Database Management System - SQL Advanced Training
 
Database Management System - SQL beginner Training
Database Management System - SQL beginner Training Database Management System - SQL beginner Training
Database Management System - SQL beginner Training
 
Measurement and Quality in Object-Oriented Design
Measurement and Quality in Object-Oriented DesignMeasurement and Quality in Object-Oriented Design
Measurement and Quality in Object-Oriented Design
 
SQL Injection and Clickjacking Attack in Web security
SQL Injection and Clickjacking Attack in Web securitySQL Injection and Clickjacking Attack in Web security
SQL Injection and Clickjacking Attack in Web security
 

Dernier

Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...OnePlan Solutions
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfLivetecs LLC
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)jennyeacort
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfStefano Stabellini
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 

Dernier (20)

Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdf
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 

Reengineering PDF-Based Documents Targeting Complex Software Specifications

  • 1. Reengineering PDF-Based Documents Targeting Complex Software Specifications Moutasm tamimi, Ahid yaseen Software Engineering Nojoumian, M., & Lethbridge, T. C. (2011). Reengineering PDF-based documents targeting complex software specifications. International Journal of Knowledge and Web Intelligence, 2(4), 292-319.
  • 2. Outline o Review o Abstract o Contribution and Motivation o Related Work o Document Transformation o Evaluation o Logical Structure Extraction o multilayer hypertext versions elements o Checking Well-formedness and Validity ◦ Producing Multiple Outputs ◦ Examples ◦ Concept extraction ◦ Cross referencing ◦ Evaluation, Usability, And Architecture ◦ Architecture of the proposed framework ◦ Conclusion ◦ Future Work
  • 3. Review 1. Extensible Mark-up Language (XML) is a mark-up language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. 2. XPath function: You can use XML Path Language (Xpath) functions to refine XPath queries and enhance the programming power and flexibility of XPath.
  • 4. Abstract • This paper investigated the process of reengineering the complex PDF documents by focusing on the Object Management Group (OMG) standards and roles to produce the multilayer hypertext interfaces, which can be more applicable of electronic documents.
  • 5. Contribution and Motivation Key contributions: 1. An efficient technique for capturing document structure 2. Various techniques for text extraction 3. A general approach for document engineering 4. Significant values and usability in the final result.
  • 6. Related Work 1. Document Structure Analysis 2. PDF Document Analysis 3. Leveraging Tables of Contents
  • 7. Document Transformation Criteria extract the document’s logical structure and convert it to XML: Generality Low volume Easy processing Tagging structure Containing clues
  • 8. Evaluation The techniques of examining the given transformation criteria DOC and RTF formats are generally messy PDF complexity
  • 9. Logical Structure Extraction 1. First Refinement Approach (it failed in different chapters) • In this method start of search and correspond the main tags like <Part>, <Sect> and <Div>, which indicated at start and end of chapter or sections in Adobe Acrobat. • In practice authors applied the methods in sample of large document and uneven chapters and found that this method unlikely failed, with reason of forget tagging rightly the method close for<Sect> tag incorrectly in wrong places
  • 10. 1. Logical Structure Extraction • 2- Second Implementation Approach (LinkTarget, LinkTargetQueue) • In this method start of search and correspond the main tags like <Part>, <Sect> and <Div>, which indicated at start and end of chapter or sections in Adobe Acrobat. • In practice authors applied the methods in sample of large document and uneven chapters and found that this method unlikely failed, with reason of forget tagging rightly the method close for<Sect> tag incorrectly in wrong places
  • 11. 2. Text Extraction • In 1990, Nielsen demonstrated the Hypertext and hypermedia which considered the related information in other data sources, the importance of these issues has illustrated in the computer applications associated with structured information like on-line documentation or computer-aided learning, in order to construct a general structure for our hypertext interfaces.
  • 12. Multilayer Hypertext Versions Elements A page for the table of contents A separate page for each heading types Hyperlinks for accessing to the table of contents Some pages for extracted concepts Various cross references throughout the document i.e. : a single page of a document i.e. : part, chapter, section, and subsection i.e. : Associations i.e. : package and class hierarchy of the UML i.e. : content linked with figures
  • 13. 2.1 Checking Well-formedness and Validity • A well-formed content based on the XML document with opening and closing tags, and nested logical rules to be able to check and validate it by Stylus Studio® XML tool. i.e., document must have it conducted schema, the uses tags must be within the schema content.
  • 14. 2.2 Producing Multiple Outputs • Five motivations to generate small hypertext pages: 1. A better sense of location: Best practice to the cross-references in the content, i.e syntax <a name=“xyz”> and <a href=“#xyz”> to navigate and move between sections. 2. Less chance of getting lost: The end-users can scroll between pages and have the movements between the parts. The problem of a jump when the end-users move from part to another. 3. A less-overwhelming sensation: The end-user can operate the large amounts of data and comprehend the content from the small document. 4. Faster loading: The end-user ignoring the download of the big document. 5. Statistical analysis: looking at the importance of information to deal with the enhancement of the specification itself. A better sense of location Less chance of getting lost A less-overwhelming sensation Faster loading Statistical analysis
  • 15. The produced function based on 3 issues • Folder named “folder-name”: contains the hypertext files • @Number = attribute <Part>, <Chapter>, <Section>, <Subsection> • Outputs: I.html, 7.html, 7.1.html, 7.2.html, 7.3.html, 7.3.1.html, 7.3.2.html.
  • 17. 2.3 Connecting Hypertext Pages Sequentially • A Hypertext can be presented based on XSLT code in a file by Previous and Next at the above of the pages. • By extracting elements attribute sequentially (1, 2, …, 7, 7.1, 7.2, 7.3, 7.3.1, etc) stored in the Num.txt file to carry out the Procedure Linker () algorithm to deal with the process of building the hypertext pages.
  • 18. 2.4 Forming Major Document Elements • 2.4.1 Figure • 2.4.2 Table • 2.4.3 List
  • 19. 2.4.1 Figures • This section carried out in transformation phase by the following procedures for Figures XPath expressions and XSLT codes; • Convert the document to initial XML file by the Adobe Acrobat Professional, create a folder called “images” to the same file. Store overall the figures in that folder “folder-name_img_1.jpg”, the XML file contains two elements “src” means <ImageData>, and figure <Caption>. Cells Level string <TD> When: position () = 1 <TD> Level 1 <TD> When: position () =2 <TD> Level 2
  • 20. 2.4.2 Tables • In this section authors generated the relevant caption, and then selected the TableRow element. Therefore, they constructed all table cells. After that authors returned the index position of the node that is currently being processed by XPath function: position(). Finally they applied many expressions on each column.
  • 21. 2.4.3 Lists • This section supported the XPath expressions based on a style sheet design to recover the process of extracting and transforming the Lists data in a document. According to the XPath expressions given the table below: Style sheet design XPath expressions element <L></L> lists <LI_Label> ……….. </LI_Label> <LI_Title> ……….. </ LI_Title> <xsl:for-each select="LI_Label">………. <xsl:for-each select="LI_Title">
  • 22. 3. Concept extraction 1. Modeling Class Hierarchy Extraction 2. Modeling Package Hierarchy Extraction
  • 23. 4. Cross referencing • To facilitate document browsing for end users, we created hyperlinks for major document keywords (for example, class names as well as package names) throughout the generated user interfaces. As we mentioned previously, since these keywords were among document headings, each of them had an independent hypertext page or anchor link in the final user interfaces.
  • 24. Evaluation, Usability, And Architecture 1. Reengineering of Various OMG Specifications 2. Usability of Multilayer Hypertext Interfaces: following benefits through our usability studies, which did not exist in the original PDF formats, or Adobe-Generated HTML formats:. • Navigating • Scrolling • Processing • Learning • Monitoring • Downloading • Referencing • Coloring • Keeping track
  • 25. Architecture of the Proposed Framework
  • 26. Conclusion • An approach for taking raw PDF versions of complex documents (e.g., specifications) and converting them into multilayer hypertext interfaces. For each document, we first generated a clean XML document with meaningful tags, and then constructed from this a series of hypertext pages constituting the final system.
  • 27. Future Work 1. Extract the initial XML document from other formats such as DOC, RTF, HTML, etc. This can extend our framework for other kinds of formats and documents. 2. Automate the concept extractions or at least create some features for the detection of the logical relationships among headings 3. Improve the current solution and discover new users’ demands. Only by such an investigation we can have a deep understanding of users’ difficulties.
  • 30. Speaker Information  Moutasm tamimi  Masters of Software Engineering  Independent Consultant , IT Researcher.  CEO at ITG7.com , IT-CRG.com  Email: tamimi@itg7.com, Click Here Click HereI T G 7 Click Here Click HereIT-CRG