PDF Association Technical Conference, June 18-19, 2013

PDF and Microsoft Sharepoint
Hurdles to Overcome

Neil Pitman
Aquaforest Limited

Version 1.120613
Objective

PDF as a Sharepoint “First Class Citizen”
Agenda

▪ Objectives
▪ Sharepoint Overview
▪ PDF Capture
▪ PDF Search
  ✓ iFilters
  ✓ Handling Image and Mixed Mode PDFs
▪ PDF Metadata
  ✓ Dictionary, XMP and Entity Extraction
▪ Configuration
  ✓ Sharepoint 2010, 2013
▪ Summary

Sharepoint Overview

Microsoft Sharepoint Server - 125 million licenses sold
Sharepoint to be a natural target for PDF storage

▪ What is Sharepoint?
  ✓ On-Premise and Cloud-based Collaboration & Document Management Platform
▪ Origin - 2001
▪ Usage
  ✓ Focus on MS Office Documents
  ✓ Typically distributed capture

Sharepoint Overview

▪ Sharepoint Editions (2010, 2013)
  ✓ Foundation
  ✓ Standard
  ✓ Enterprise
▪ Office 365 / Sharepoint Online
▪ Ecosystem
  ✓ Partner Products
  ✓ Office / Sharepoint Marketplace

Sharepoint Architecture Overview

▪ MS Web-based (IIS)
▪ MS Office Integration
▪ SQL Server Storage
▪ List or library data in a site collection is stored in a SQL Server database table, which uses queries, indexes and locks to maintain overall performance, sharing, and accuracy.
▪ Filtered views with column indexes (and other operations) create database queries that identify a subset of columns and rows and return this subset to your computer.
▪ Thresholds and limits help throttle operations and balance resources for many simultaneous users.
▪ Privileged developers can use object model overrides to temporarily increase thresholds and limits for custom applications.
▪ Administrators can specify dedicated time windows for all users to do unlimited operations during off-peak hours.
▪ Information workers can use appropriate views, styles, and page limits to speed up the display of data on the page.

Microsoft Technology Stack

▪ Windows Server 2008/2012
▪ Internet Information Services (IIS)
▪ .NET Framework
▪ SQL Server
▪ MS Office

PDF Capture for Sharepoint

▪ Options
  ✓ Sharepoint UI
  ✓ Acrobat XI
  ✓ Load Tools
  ✓ Custom Code
  ✓ Workflow & Event Receivers

// Upload a PDF to a Sharepoint document library with an HTTP PUT.
// Requires: using System.IO; using System.Net;
// destUrl is the full target URL of the file; fileBytes holds its content.
WebRequest request = WebRequest.Create(destUrl);
request.Credentials = CredentialCache.DefaultCredentials;
request.Method = "PUT";
using (Stream stream = request.GetRequestStream())
{
    stream.Write(fileBytes, 0, fileBytes.Length);
}
// GetResponse throws a WebException if the server rejects the upload.
using (WebResponse response = request.GetResponse())
{
    Logging.Log("Upload successful");
}

Acrobat XI Sharepoint Integration

http://www.adobe.com/uk/products/acrobat/pdf-version-control-sharepoint-integration.html

PDF Search in Sharepoint Overview

▪ Item 1
▪ Item 2

iFilter Architecture

iFilters scan documents for text and attributes, primarily in support of Microsoft Search technologies.

iFilter Configuration

▪ Architecture
▪ Code Sample
▪ Suppliers
▪ Issues

PDF Search in Sharepoint: iFilters

▪ iFilter Explorer

Using iFilters directly in Code

https://gist.github.com/jimschubert/1473904

// Calling code: extract the text of a PDF through the iFilter registered for it.
StringBuilder Buffer = new StringBuilder();
string PDFFile = @"C:\dev\PDF\Conferences.pdf";
FilterCode f = new FilterCode();
f.GetTextFromDocument(PDFFile, ref Buffer);
Console.WriteLine(Buffer);

// P/Invoke declaration (requires using System.Runtime.InteropServices).
[DllImport("query.dll", SetLastError = true, CharSet = CharSet.Unicode)]
static extern int LoadIFilter(
    string pwcsPath,
    [MarshalAs(UnmanagedType.IUnknown)] object pUnkOuter,
    ref IFilter ppIUnk);

public void GetTextFromDocument(string Path, ref StringBuilder Buffer)
{
    IFilter filter = null;
    int hresult;
    IFilterReturnCodes rtn;
    // Initialize the return buffer to 64K.
    Buffer = new StringBuilder(64 * 1024);
    // Try to load the filter for the path given.
    hresult = LoadIFilter(Path, new IntPtr(0), ref filter);
    if (hresult == 0)
    {
        IFILTER_FLAGS uflags;
        // Init the filter provider.
        rtn = filter.Init(
            IFILTER_INIT.IFILTER_INIT_CANON_PARAGRAPHS |
            IFILTER_INIT.IFILTER_INIT_CANON_HYPHENS |
            IFILTER_INIT.IFILTER_INIT_CANON_SPACES |
            IFILTER_INIT.IFILTER_INIT_APPLY_INDEX_ATTRIBUTES |
            IFILTER_INIT.IFILTER_INIT_INDEXING_ONLY,
            0, new IntPtr(0), out uflags);
        if (rtn == IFilterReturnCodes.S_OK)
        {
            STAT_CHUNK statChunk;
            // (the listing is cut off at this point on the slide)
        }
    }
}

iFilter Test

Test PDF content: Text, Image/OCR Text, Bookmark, Annotation, Dictionary Metadata, XMP Metadata, PDF Attachment

iFilter Test Results

Filters compared: Adobe iFilter, FoxIt iFilter, Microsoft Format Handler, PDFLib iFilter
Content types tested: Body Text, Bookmarks, Dictionary Metadata, Annotations, XMP Metadata, PDF Attachment (*)
[Results matrix: per-filter pass/fail marks were images and are not recoverable]

Dealing with Image and Mixed-Mode PDFs

Classify:

▪ Image-Only
▪ Born-Digital
▪ Part Image-Only, Part Born-Digital
▪ Previously OCRed
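
The four classes above can be told apart from a few per-page facts. The following is a minimal sketch, not the presenter's method: `PageInfo`, its three flags, and `PdfClassifier` are hypothetical names, and a PDF library would have to supply the flags. It also assumes that a previously OCRed page is a full-page image with a hidden text layer.

```csharp
using System.Collections.Generic;
using System.Linq;

public enum PdfKind { ImageOnly, BornDigital, Mixed, PreviouslyOcred }

// Hypothetical per-page facts; a real PDF library would have to supply them.
public sealed class PageInfo
{
    public bool HasVisibleText;   // text drawn normally (born-digital content)
    public bool HasInvisibleText; // hidden text layer, as typically left by OCR
    public bool HasFullPageImage; // page is covered by a scanned image
}

public static class PdfClassifier
{
    public static PdfKind Classify(IReadOnlyList<PageInfo> pages)
    {
        // Pages that are a bare scan: an image with no text of any kind.
        bool anyUnsearchable = pages.Any(p =>
            p.HasFullPageImage && !p.HasVisibleText && !p.HasInvisibleText);
        // Pages that already carry some searchable text.
        bool anySearchable = pages.Any(p => p.HasVisibleText || p.HasInvisibleText);
        // Pages that look like scans with an OCR text layer behind them.
        bool anyOcrLayer = pages.Any(p => p.HasFullPageImage && p.HasInvisibleText);

        if (anyUnsearchable)
            return anySearchable ? PdfKind.Mixed : PdfKind.ImageOnly;
        return anyOcrLayer ? PdfKind.PreviouslyOcred : PdfKind.BornDigital;
    }

    // Only documents with unsearchable pages need OCR; the others are
    // left alone so that existing text is never rasterized.
    public static bool NeedsOcr(PdfKind kind)
    {
        return kind == PdfKind.ImageOnly || kind == PdfKind.Mixed;
    }
}
```

Under these assumptions, a document that mixes bare scans with already searchable pages is reported as `Mixed`, so only its image-only pages would be candidates for OCR.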

Dealing with Image and Mixed-Mode PDFs

▪ Objectives:
  ✓ Ensure Full Searchability
  ✓ Avoid Text to Image Processing
▪ Process:
  ✓ Capture Time?
  ✓ Scheduled In-Place?

PDF Metadata In Sharepoint

▪ Text Search vs Metadata Search
▪ Crawled vs Managed Properties
▪ Review Requirements
  ✓ Dictionary Metadata
  ✓ XMP Metadata
  ✓ Entity Extraction
▪ Consider Automation

PDF Metadata In Sharepoint

Crawled vs Managed Properties

PDF Metadata In Sharepoint: Using Event Receivers

▪ Event Receivers can enable Metadata assignment
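
One way an event receiver could assign metadata is to copy values from the PDF's Info dictionary into library columns when a file is added. The sketch below covers only the mapping step and is an illustration under assumptions: the Info keys (`Title`, `Author`, `Subject`, `Keywords`) are standard PDF entries, but the column names and the `PdfMetadataMapper` class are hypothetical, and reading the PDF itself is left to whatever PDF library is in use.

```csharp
using System;
using System.Collections.Generic;

public static class PdfMetadataMapper
{
    // Standard PDF Info dictionary keys mapped to hypothetical library column names.
    private static readonly Dictionary<string, string> ColumnForKey =
        new Dictionary<string, string>(StringComparer.Ordinal)
        {
            { "Title", "Title" },
            { "Author", "DocumentAuthor" },
            { "Subject", "DocumentSubject" },
            { "Keywords", "DocumentKeywords" },
        };

    // Returns column name -> value for every mapped key that has a non-empty value.
    // Unmapped keys (for example Producer) are ignored.
    public static Dictionary<string, string> ToColumns(IDictionary<string, string> pdfInfo)
    {
        var columns = new Dictionary<string, string>();
        foreach (var pair in pdfInfo)
        {
            string column;
            if (ColumnForKey.TryGetValue(pair.Key, out column) &&
                !string.IsNullOrWhiteSpace(pair.Value))
            {
                columns[column] = pair.Value.Trim();
            }
        }
        return columns;
    }
}
```

In a receiver derived from `SPItemEventReceiver`, the `ItemAdded` override could call `ToColumns` and write each pair onto `properties.ListItem` before updating the item. That wiring is not shown here because it needs the Sharepoint server assemblies.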

PDF Metadata In Sharepoint

Entity Extraction
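
Entity extraction pulls structured values out of a document's body text so that they can populate metadata columns. As a rough illustration only, the sketch below finds e-mail addresses and ISO-style dates with regular expressions. The `EntityExtractor` class and both patterns are assumptions chosen for the example; production entity extraction usually needs far more robust rules or a dedicated product.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class EntityExtractor
{
    // Illustrative patterns only: a simple e-mail shape and yyyy-MM-dd dates.
    private static readonly Regex Email =
        new Regex(@"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b");
    private static readonly Regex IsoDate =
        new Regex(@"\b\d{4}-\d{2}-\d{2}\b");

    public static List<string> Emails(string text)
    {
        return DistinctMatches(Email, text);
    }

    public static List<string> Dates(string text)
    {
        return DistinctMatches(IsoDate, text);
    }

    // Collects each distinct match once.
    private static List<string> DistinctMatches(Regex pattern, string text)
    {
        return pattern.Matches(text)
                      .Cast<Match>()
                      .Select(m => m.Value)
                      .Distinct()
                      .ToList();
    }
}
```

The input would typically be the text returned by an iFilter or an OCR pass, and the extracted values could then be assigned to columns in the same way as dictionary or XMP metadata.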

Configuration

▪ Sharepoint 2010
▪ Sharepoint 2013

Sharepoint 2010 PDF Configuration

▪ Missing icon and iFilter

http://www.adobe.com/devnet-docs/acrobatetk/tools/AdminGuide/Acrobat_Reader_IFilter_configuration.pdf

Sharepoint 2010 PDF Configuration

Sharepoint PDF Configuration

▪ Default for PDF: 'X-Download-Options: noopen' added to HTTP Response Header

Sharepoint 2013 and PDF Configuration

▪ PDF Format Handler Support
▪ Currently no iFilter Support for PDF !?!?!!

Sharepoint 2013 and PDF Configuration

Inline Viewing PDF in Sharepoint 2013

http://stevemannspath.blogspot.co.uk/2012/10/sharepoint-2013-pdf-preview-in-search.html
http://stevemannspath.blogspot.co.uk/2013/04/sharepoint-2013-pdf-support-and.html

Summary

▪ Microsoft Sharepoint Server - 125 million licenses sold
▪ Sharepoint to be a natural target for PDF storage
▪ PDF as a Sharepoint “First Class Citizen”

Contact: neil.pitman@aquaforest.com
