Lexbe eDiscovery Webinar- Best Practices: NearDup

Best Practices: NearDup
Gene Albert
Principal, Lexbe LC
Using Near Duplicate ID to Detect Key Docs, Protect Privilege
& Speed Reviews
July 17, 2014

eDiscovery Webinar Series
○ Takes Place Monthly
○ Covers a Variety of Relevant eDiscovery Topics
○ Presentations Available for Download by Registrants.
Best Practices: ‘NearDup’ Identification | eDiscovery Webinar Series | July 17, 2014
Info

Lexbe is an Austin, TX based eDiscovery software and services provider.
○ Lexbe eDiscovery Platform
Lexbe eDiscovery Platform is a hosted eDiscovery processing and review
tool. Users can load a variety of file types, process for review, OCR for
search, and conduct document reviews, productions, prepare for depos
& analyze transcripts, conduct case analytics, prepare for dispositive
motions, and provide litigation support during trial.
○ Lexbe eDiscovery Services
Lexbe performs large volume document culling, processing from native
to PDF or TIFF, load file creation, high-volume OCR of image files, Rule
26 and project management consulting, and related eDiscovery Services.
About Lexbe
Lexbe Sales
sales@lexbe.com
(800) 401-7809 x22

If you have any questions or technical issues, please e-mail them to:
webinars@lexbe.com
Questions will be forwarded to Gene and answered during the webinar or via
e-mail if we run out of time.
Questions & Technical Issues

○ Principal of Lexbe LC, a provider of cloud-based litigation review and document
management software & eDiscovery services.
○ Prior business experience in software, medical services and internet-based
businesses. Prior legal experience as in-house counsel and in private practice.
○ Frequent speaker and author on eDiscovery and legal technology issues.
○ Education
MBA, University of Texas (2005)
JD, Southern Methodist University (1983)
BA, University of Texas (1979)
○ Contact
Gene Albert
512-686-3460
gene@lexbe.com
Gene Albert Bio

Near Duplicate Detection
○ What is Near Duplicate Identification?
○ When is ‘NearDup’ Needed?
○ Inadvertent Privilege Release Example
○ Using ‘NearDup’ to:
■ Group Similar Documents
■ Find More Key Documents
■ Enable Email Threading
■ Prevent the Inadvertent Release of Privileged Information
○ NearDup Groupings+ service options from Lexbe
Agenda

What Is It?
○ NearDup technology automatically recognizes similar
documents within an e-discovery document collection
○ Algorithm analyzes, evaluates and compares the
actual text content of the documents to each other
[Figure: unstructured documents sorted into NearDup groupings]
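To make the idea of comparing text content concrete, here is a minimal sketch of one common way to score how alike two documents are: break each text into overlapping three-word "shingles" and measure how many shingles the two texts share (Jaccard similarity). This is an illustration only. The slides do not say which algorithm NearDup uses, and the function names `shingles` and `similarity` and the shingle size `k=3` are assumptions made for this example.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, punctuation ignored)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def similarity(a, b, k=3):
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)
```

For example, "the quick brown fox jumps" and "the quick brown fox leaps" share two of four distinct three-word shingles, so `similarity` returns 0.5. Because the comparison works on extracted text, the same approach applies to OCRed scans as well as native files.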

What Does It Do?
NearDup technology will group similar documents, even though not
exactly the same. Examples include:
○ Separately scanned documents.
○ Multiple versions of a Word document that are slightly different
due to minor edits, reformatting, etc.
○ An original document and one with handwritten notes on it.
○ Emails and responses that continue a conversational ‘chain’ or
‘thread’.

Data Types and Volume Keep Growing
[Chart: Digital information created, captured, and replicated worldwide, in zettabytes*, 2005 to 2015]
* 1 zettabyte = 1 trillion gigabytes
Source: IDC Digital Universe Study (2012)
2.8 zettabytes of information were created and replicated during 2012, a 56% increase from 2011 (IDC).
Growing data sources: VoIP, email, iPhones, peer-to-peer, online storage, digital cameras, Facebook, LinkedIn, DropBox, backup devices, elastic storage, SaaS, Google Streets, personal blogs, Skype, world satellite images, personal scanners, customer service recordings, public webcams, Google Goggles, netbooks, cloud instance servers, PaaS
Need for Near Duplicate Detection

Main Applications of NearDup
There are 4 main applications of NearDup analysis:
1) Grouping similar documents:
○ Bunch highly similar documents together for more efficient coding
and review
2) Finding hidden ‘key’ or ‘hot’ docs:
○ Retrieve and mark unseen documents that have content highly
related to existing ‘hot’ or ‘key’ documents
3) Preventing the inadvertent release of privileged information:
○ Be automatically alerted to files containing similar content to
documents that have already been coded as privileged
4) Enabling email threading:
○ Maintain relationships between messages in email conversations
Do I Need Near Duplicate Detection?

Applying Near Duplicate Detection
Large Groupings Accelerate Review
Feature Description
Report identifies Near Dup Groups in
a case based on extracted or OCRed
text
Benefits
⃝ Accelerate document review by
batch coding (using multidoc edit)
larger groups
⃝ Increase coding consistency of
batched documents
⃝ Reduce privilege errors
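As a rough sketch of how near-dup groups for batch coding could be formed, the example below links any two documents whose word sets overlap by at least a chosen threshold and then collects the linked documents into groups (union-find). It is not Lexbe's implementation. The names `near_dup_groups` and `jaccard`, the 0.8 threshold, and the simple word-set comparison are all assumptions for illustration.

```python
from itertools import combinations

def word_set(text):
    """Distinct lowercased words in the extracted or OCRed text."""
    return set(text.lower().split())

def jaccard(a, b):
    """Shared words divided by all distinct words across both sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def near_dup_groups(docs, threshold=0.8):
    """docs maps doc_id -> text. Returns sorted lists of ids, one per group of 2+ docs."""
    parent = {doc_id: doc_id for doc_id in docs}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    sets = {doc_id: word_set(text) for doc_id, text in docs.items()}
    # Compare every pair; fine for a sketch, too slow for very large collections.
    for a, b in combinations(docs, 2):
        if jaccard(sets[a], sets[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for doc_id in docs:
        groups.setdefault(find(doc_id), []).append(doc_id)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```

Each returned group is a candidate for a single multidoc edit: a reviewer checks the group once and applies the same coding to every member.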

Find Similar Versions of Key Documents
Example
Similar versions of a Key
Document are shown in
the Document Viewer
Benefits
⃝ Follow the trail from one key document to others.
⃝ Find key documents that would otherwise be missed

Prevent Inadvertent Privilege Release
[Workflow diagram: Setup & Planning, Collection, Culling & Analysis, Processing, Depos & Motions, Review & Production]
Beware of Inadvertent Privilege Release
○ Larger cases have put a strain on accurate privilege review.
○ Finding 9 versions of a privileged document doesn’t help if you
release version 10.
○ Nothing is more costly than compromising or losing a case
because of privilege disclosure.
○ Claw-back agreements are a good idea, but no panacea.
“You can’t unring a bell.”
Applying Near Duplicate Analysis

Example Case:
Thorncreek Apartments III, LLC v. Village of Park Forest (N.D. Ill. 2011)
○ At issue were six documents that Defendants produced to
Plaintiffs and later claimed were protected by attorney-client privilege.
○ Court determined that the Defendants were negligent in failing
to check the production database created by a third-party
e-discovery vendor before it became available to opposing counsel.
○ Court found waiver, relying in part on the long delay after
production before Defendants attempted to claw back the documents
and on their failure to timely prepare a privilege log.
○ Even if the court allowed clawback, the sensitive information
would have already been disseminated.

[Workflow diagram: Setup & Planning, Collection, Culling & Analysis, Processing, Depos & Motions, Review & Production]
Minimizing Risk of Privilege Release
○ Understand the Privilege Review process undertaken in detail.
○ Build dictionary of privileged sources and issues early in doc review.
○ Check for: untrained or sloppy review; unsearchable documents;
incomplete search indices; poor redaction procedures; search not done
in metadata and full-text; privilege text retained in natives, text files,
load files, text-based PDFs.
○ Use specialized computerized privilege checks for container
(email family) consistency, exact-dup and near-dup
identification.

Example
⃝ Privileged documents
found 9 out of 10 times,
but one missed
Benefit
⃝ Find privileged documents with text similarity
that can be easily missed otherwise

Catch Privilege Inconsistencies
Feature Description
Report identifies near-dup groups with inconsistent
privilege and work-product coding
Benefits
⃝ Reduce privilege errors
⃝ Avoid sole reliance on human
coding consistency
⃝ Establish safeguards to help
maintain privilege
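A privilege consistency report of this kind can be sketched in a few lines: given near-dup groups and the reviewers' privilege coding, flag every group in which some documents are coded privileged and others are not. This is a hypothetical illustration, not the platform's report. The function name `inconsistent_privilege_groups` and the input shapes (a dict of group id to document ids, and a dict of document id to a privileged flag) are assumptions.

```python
def inconsistent_privilege_groups(groups, coding):
    """Flag near-dup groups whose members are not all coded the same way.

    groups: dict of group_id -> list of doc_ids (assumed to come from near-dup analysis)
    coding: dict of doc_id -> True if coded privileged; missing ids count as not privileged
    Returns dict of group_id -> doc_ids NOT coded privileged, for mixed groups only.
    """
    flagged = {}
    for group_id, doc_ids in groups.items():
        privileged = [d for d in doc_ids if coding.get(d, False)]
        unmarked = [d for d in doc_ids if not coding.get(d, False)]
        # A group is inconsistent only when it contains both kinds of coding.
        if privileged and unmarked:
            flagged[group_id] = unmarked
    return flagged
```

In the "9 out of 10" scenario above, the one missed version would appear in the flagged list for its group, so it can be checked by a reviewer before production.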

Email Threading
Feature Description
Group email messages that have
similar text representing a
conversation thread
Benefits
⃝ View email chains with similar text
in date & time order
⃝ Avoid confusion with emails that are only
tangentially related (<50% text
overlap)
⃝ Consistently code email chains for
responsiveness, privilege,
attorneys’-eyes-only, etc.
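The threading idea can be sketched as follows: process emails in date order and add each one to an existing thread when at least half of the shorter message's words also appear in a message already in that thread. The 50% cutoff mirrors the figure on the slide, but how NearDup actually measures overlap is not stated, so the `overlap` measure, the `thread_emails` name, and the input format are assumptions for illustration.

```python
from datetime import datetime

def overlap(a, b):
    """Fraction of the shorter message's distinct words that also appear in the other."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    smaller = min(len(wa), len(wb))
    return len(wa & wb) / smaller if smaller else 0.0

def thread_emails(emails, threshold=0.5):
    """emails: list of dicts with 'id', 'sent' (datetime), and 'text'.

    Returns a list of threads, each a list of email ids in sent order.
    Greedy single pass: an email joins the first thread it matches,
    and existing threads are never merged.
    """
    threads = []
    for email in sorted(emails, key=lambda e: e["sent"]):
        for thread in threads:
            if any(overlap(email["text"], other["text"]) >= threshold for other in thread):
                thread.append(email)
                break
        else:
            # No existing thread is similar enough, so start a new one.
            threads.append([email])
    return [[e["id"] for e in thread] for thread in threads]
```

Measuring overlap against the shorter message suits replies that quote the earlier text in full: the original is almost entirely contained in the reply even though the reply adds new words.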

Included with Lexbe eDiscovery Platform
○ Near Duplicate Identification is included at no
additional cost in Lexbe eDiscovery Platform.
○ You can automatically apply ‘NearDup’ to documents you self-
upload into the platform to group similar documents and
review for privilege coding consistency.

Applying ‘NearDup’ in The Cloud
Lexbe eDiscovery Platform
● Self-administration
● Native (Office, etc.) processing
● Automatic OCR
● Early case analysis
● Dual-index search
● Exact & near-dup ID
● Doc Review & issue tagging
● Blended productions
● Transcript management
● Timelining, depo prep
● Dispositive motions
● Trial document management
Cloud-based litigation document
management software
FEATURES

Included in Processing Services
We apply NearDup Groupings+ to the following processing services
at no additional charge:
○ Native Processing+ (TIFF)
Convert Outlook, Microsoft Office, and other native file types for
review in in-house TIFF-based systems
○ Native Processing+ (PDF)
Convert Outlook, Microsoft Office, and other native file types
into searchable PDFs for review
○ Native Extraction+
Prepare case data for native or near native review

Security & Data Ownership
What to look for in litigation cloud service offerings:
○ Encryption
Data encrypted (256-bit or above) at rest and in transit.
○ Data Center Certifications
Data centers should be certified and follow industry best practices.
○ Clear Ownership Rights
Service agreements should clearly acknowledge client data ownership.
○ Redundant Back-Ups; Recovery
Service provider should have robust and redundant backup & recovery protocols.
Applying ‘NearDup’ in The Cloud

Summary
Use ‘NearDup’ to Improve Doc Reviews
○ Faster Review
Group Incoming Documents by Similarity for faster, more
efficient coding.
○ Find Hot Docs
Find hidden ‘hot’ documents with similar content to files you’ve
already marked as being particularly important to a case.
○ Prevent Privilege Release
Identify documents containing privileged information that
haven’t been consistently tagged before producing them to
opposing counsel
○ Better Email Review
Review email conversation threads easily and coherently
across different custodian sources.

Thank You
Contact Info
Gene Albert:
Lexbe Principal
gene@lexbe.com
(512) 686-3382
Stu Van Dusen:
Marketing Manager
svandusen@lexbe.com
(512) 843-7672
Lexbe Sales: sales@lexbe.com
(800) 401-7809 x22
Webinar Questions: webinars@lexbe.com
