This document discusses the importance of addressing key technical specifications upfront when managing electronically stored information (ESI) in electronic discovery, to avoid costly mistakes later in the process. It outlines several areas that should be agreed upon during an initial project meeting, including time zone handling, de-duplication approach, file formatting, and metadata extraction. Addressing these details early allows attorneys to properly scope the ESI collection and review.
Improve Electronic Discovery Results by Focusing on the Beginning
By Larry Lieb
Larry Lieb is a National Director at Esquire Litigation
Solutions. He has extensive experience related to
electronic discovery and computer forensics. He is the
former Executive Director of the Litigation Support
Vendors Association, an organization dedicated to
establishing and maintaining quality standards and
professional certifications. Mr. Lieb is a graduate of the
University of Illinois at Urbana-Champaign.
Prior to the advent of electronic discovery, parties to major litigations had to decide early in the discovery process either to scan and code the documents for review or to manage the discovery collection using paper documents. If the creation of an image-enabled database was approved by a client, the vendor decision and all key technical specifications were pushed down from the billing partner to the partner in charge of discovery, and from there to an associate and finally to a paralegal, if no official litigation support staff existed. This was acceptable, if not ideal, in the day of paper and scanned images.

The shift in discovery to electronically stored information (ESI) has exacerbated the problems inherent in a hands-off approach by attorneys. There are serious implications if the technology involved in the discovery process is not understood. Certain details must be understood and agreed upon in order for a database to provide the reviewers what they need to do their job.

The recent Federal Rules of Civil Procedure (FRCP) changes relating to ESI have added urgency to the situation. The FRCP in effect has made attorneys personally responsible for the disposition of their client’s electronic evidence (see Qualcomm, Inc. v. Broadcom Corp., 2008 WL 66932 (Jan. 7, 2008), vacated in part, 2008 WL 638108 (S.D. Cal. March 5, 2008); Phoenix Four, Inc. v. Strategic Resources Corp., 2006 WL 1409413 (S.D.N.Y. May 23, 2006)). On top of personal responsibility, costs related to the identification, harvesting, processing, hosting, review and production of electronic discovery have increased exponentially, putting attorneys at risk if vendor and review costs are dramatically underestimated.

So how should attorneys and their litigation support staff address critical technical details of the discovery process as it relates to the production and exchange of ESI? This article will cover the major technical areas that must be decided upon up front in order to avoid time-consuming and expensive rework, and possible mistakes, once ESI has been produced to and received from opposing counsel.
Project Meeting
The specifications described below should be covered in a project meeting prior to the commencement of processing ESI, whether it is done in house or by outside vendors. These details should be documented and approved in writing by counsel or their trusted surrogates following the project meeting. Most good vendors have a robust project setup document wherein all of the items below can be recorded for approval by attorneys.
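One way to picture such a project setup document is as a checklist that blocks processing until every item has been decided and signed off. The sketch below is illustrative only: the `ProjectSetup` class, its field names, and the two helper functions are hypothetical and do not reflect any vendor's actual form.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class ProjectSetup:
    """Hypothetical project setup sheet; every field name is an assumption."""
    source_time_zone: Optional[str] = None   # where the ESI originated
    dedup_scope: Optional[str] = None        # "global" or "custodian"
    image_format: Optional[str] = None       # e.g. "single-page TIFF"
    review_platform: Optional[str] = None    # tool name and version
    metadata_fields: List[str] = field(default_factory=list)
    approved_by: Optional[str] = None        # attorney who signed off


def undecided_items(setup):
    """List the items that are still empty, in the order they appear on the sheet."""
    return [f.name for f in fields(setup) if not getattr(setup, f.name)]


def ready_to_process(setup):
    """Processing should start only when nothing is left undecided."""
    return not undecided_items(setup)


draft = ProjectSetup(source_time_zone="Pacific", dedup_scope="custodian")
print(undecided_items(draft))
# ['image_format', 'review_platform', 'metadata_fields', 'approved_by']
print(ready_to_process(draft))  # False
```

Treating the attorney sign-off as just another required field mirrors the idea that the document is not complete until counsel has approved it in writing.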
General Processing Specifications
Time Zone. Determine the time zone in which the ESI originates and relay that information to the production team prior to processing to ensure that dates and time stamps come out correctly. For instance, if ESI originates from the Pacific Time Zone, but the clock on the computer or software processing the ESI is set to the Eastern Time Zone, the three-hour difference could make dates appear to be one day later than they really are. If one is exchanging metadata, including dates, as part of a document production, an incorrect time zone setting may result in producing documents with incorrect dates. Some electronic discovery software is able to compensate for time zone differences without resetting the internal clock of the computer running the processing software. However, many applications simply process dates and times based upon the setting of the production computer.

De-duplication. ESI collections often contain many copies of the same correspondence sent to different people. As a result, if these duplicate e-mails are not removed during processing, attorneys will be tasked with reviewing the same document over and over again, leading to increased review time and higher client bills. Therefore, “de-duplication” has become a commonly requested service. Unfortunately, the actual technical process behind de-duplication is usually poorly understood.

There are two main methods used to de-duplicate e-mails and loose files in a collection: a comparison of their metadata fields, such as To, From, CC and Subject/Re:, or a comparison of their hash values. E-mails are typically de-duplicated via their metadata fields, whereas loose files such as PDF, Excel, Word and PowerPoint files are compared using their hash value. A hash value is an effectively unique value generated by running the zeros and ones that comprise a file through a mathematical algorithm. Changing just one character of a Word file, for example, will alter the hash value that is calculated for that file.

The attorneys typically choose either to de-duplicate the entire collection globally or just within individual custodians. Global de-duplication is usually more effective in reducing the total collection, but adds to the amount of processing time as each new e-mail needs to be compared to every e-mail that preceded it in the processing queue.

If a loose Excel file is processed first, then all following instances of that Excel file could be removed from the collection under de-duplication rules. However, most attorneys would prefer that attachments to e-mails not be removed from the collection, even if they are duplicate files. If duplicate attachments to e-mails are removed during processing, this could become a problem for reviewing attorneys later on.

De-duplication has become an accepted practice to reduce attorney review time and ultimately clients’ bills. However, it is recommended that opposing parties agree up front that de-duplication will take place and address the question of global versus custodian-level de-duplication.

Creating Image Files. When e-mails and their attachments are processed, typically an image file or “TIFF” file is created. This TIFF image can be electronically redacted and Bates labeled, thus creating a production set. Most times, TIFF images are created in black and white if color is not germane to the underlying matter. However, if a case revolves around intellectual property such as the color of a product’s packaging, then black and white TIFF images may not be sufficient. It is important to discuss the nature of the case during the project meeting from this standpoint.

In most cases, single-page TIFF images are acceptable to both sides. However, some production agreements call for multipage image file formats, such as Adobe Acrobat PDFs. If opposing counsel is expecting PDF files with embedded, searchable OCR, then this detail should be discussed up front during the project meeting.

Exception Files; Files that Will Not Process. The bane of any electronic discovery collection is semi-corrupt, corrupt, or password-protected files. In general, approximately 95 percent of most ESI collections will process without issue. By definition, a file that is able to be processed can be successfully converted to a TIFF image and have its full text and metadata extracted. However, the remaining five percent of files will not cooperate with processing software and get moved to an “exception list.”

Many hosted ASP solutions also offer “TIFF on the fly” capabilities that can cause problems. TIFF’ing ESI is part science and part art and should be approached with caution. The default TIFF’ing practice for most production software is to use a file’s native application, such as Microsoft® Excel, to open and TIFF Excel files. Sometimes, a generic file viewer, such as Stellent’s (now owned by Oracle) Quick View Plus, is used to print ESI when the native application is having trouble printing a given file.

Semi-corrupt files are files that will open, but will not print. One workaround for semi-corrupt files is to generate an image by performing a “print screen” of the opened file. This is a manual process that costs time and money, but it may be important in the event that opposing counsel has a copy of the semi-corrupt file and goes the extra mile towards processing it.

Corrupt files are files that will not open at all. It is common for attachments to e-mails to become corrupt during transit and thus require a resend. Most ESI processing software will place a TIFF image placeholder in the final deliverable that reads “corrupt file,” thus alerting counsel and possibly opposing counsel that a file existed in the collection but could not be opened and processed. In general, most batches of ESI that are processed for review will generate a list of exception files. It is important to acknowledge and address these files if they become important later in the case.

Most vendors will have access to password cracking software that can get through most common file types. Generally speaking, vendors will first provide a list of password-protected files to their clients so that the charge for cracking passwords can be avoided. Lotus Notes files, which typically come in the form of “.NSF” files, are so secure that cracking their passwords is typically impossible. If a discovery collection contains Lotus Notes e-mails, it is important to make sure they have been created without a password during the collection process. All password-protected files should be included on the exception file list that is sent to counsel for further direction.

Review Platform. Relay to the outside vendor or opposing counsel the type and version of the review tool that will be used in order to avoid costly rework once data arrives for loading. Unlike the old days when the mere fact that a review tool was being employed was kept a secret, it is assumed today that some form of automated database tool will be employed to manage large discovery collections. It is possible to create a detailed specification sheet that can be provided to vendors and opposing counsel time after time, as there are specifications that will not change from case to case.

If an online hosted solution is being used, it is still possible to create a delivery specification sheet. Many leading hosted solutions will accept a Concordance “DAT” file as a load file format. A “DAT” file is typically used to store the extracted metadata and perhaps the full text, or body, of e-mails and files. Just be aware of issues involving text delimiters and field headers.

Native Review versus Image Review. For small collections, typically three gigabytes or less, many attorneys choose to have all ESI processed to TIFF images prior to review. One gigabyte alone of e-mails and their attachments can process from 50,000 up to 300,000 TIFF images. Generally speaking, 20,000 TIFF images can be one gigabyte in size, and therefore one gigabyte of native ESI could result in storage requirements of up to fifteen gigabytes (300,000 images at 20,000 images per gigabyte), and a multi-gigabyte collection correspondingly more. TIFF images are large files that occupy a lot of space on the server, so it is important to alert the appropriate IT professional that a lot of storage space will be needed. Also allow ample time to copy and load fifteen gigabytes or more of data and database files to a network.

One alternative is to process ESI to extracted metadata, full text and links to native attachments. After e-mails and their attachments have been reviewed and a subset of documents has been flagged as responsive, a list of those responsive documents can be provided back to the vendor or internal processing group for TIFF’ing. As TIFF’ing documents takes a long time and creates a huge collection of files that need to be loaded for review, native review prior to TIFF’ing is an attractive alternative.

Assigning Image Key/DocID. In either approach above, it is important to decide upon an “image key” or “DocID” for either the extracted native files or TIFF images. This image key concept is similar to a Bates number in that a unique value must be assigned to files during their processing. A sample DocID would be: JSMITH0000001. Keep in mind that, as one gigabyte of ESI can process out to 300,000 images, it is probably prudent to include a seven- or eight-place numeric component in the DocID. After a production subset of images is created, an actual Bates number can be burned on to the TIFF images using a consecutive numbering scheme.

Database Fields. Whenever ESI is processed for use in a database, metadata fields are extracted. Although Microsoft files contain more than 300 different types of metadata fields, only a select portion are relevant to attorneys. Examples of relevant metadata would include To, From, CC, SentDate, and Subject/RE. One area of confusion revolves around metadata extracted from e-mails versus their attachments. Oftentimes, the only useful pieces of metadata that can be extracted from attachments are the file name and type of file. The extracted metadata field “Author” for Microsoft Word files may be irrelevant and misleading, as this piece of metadata refers to the original creator of a file, not the actual author of the contents of a file.

Create a standard list of metadata fields that attorneys would like to see extracted during processing or provided by opposing counsel as part of a production. This will make the task of loading and creating a review database much simpler. These standard fields, which could comprise the above described Concordance DAT file, should be part of a firm’s technical standard.

Filtering Options. There are several common techniques that can be applied to an ESI collection to reduce its size prior to review. These filtering techniques should be agreed upon up front with opposing counsel so that no questions arise later as to why only certain documents have been produced. The most basic technique, de-duplication, has been addressed earlier in this article. The three remaining filtering techniques are key word, date range, and file type filtering.

A list of key words can be agreed upon with opposing counsel prior to collection and processing of ESI. Occasionally litigants themselves will use key words to search e-mail servers for responsive files. However, most IT tools are not capable of running key word searches against archived or compressed files such as ZIP files. If a Word file contains a key word, but is inside of a ZIP file attached to an e-mail, the key word search will typically not return that file.

Filtering by date range is a common technique that can reduce a harvested collection to just those files that fall within the period specified by a discovery request. If e-mails are harvested at the custodian level, by mailbox for example, it is likely that all of the custodian’s e-mails will be collected, including those that extend beyond the scope of the discovery request. Date range filtering can be employed during processing to remove e-mails that are outside of the desired time period from the final deliverable.

Filtering by file type is a technique that can be employed to remove system files from an ESI collection, or to target only those file types that are necessary for review. Text files with the extension “.TXT” are often included in processing, but turn out to be system files containing useless information. It is dangerous to exclude all text files, though, as some may be actual correspondence that has been saved as simple text. One area of concern involves non-standard file types that may derive from home-grown or custom software packages. If the underlying matter involves intellectual property, or specifically programming files, then those special file extensions need to be identified prior to processing so that they are not summarily excluded.

Printing Options. ESI processing software typically uses a given file’s native application to open and print that file during processing. The printing process involves the creation of a TIFF image, versus sending that file to a printer. Most ESI processing software is configured so it can override the print settings of a given file that have been put in place by the original author of that file. Generally speaking, the best image fidelity can be achieved by using the native application to print a file. However, some semi-corrupt files can only be printed by a generic file viewer. The drawback of using a generic file viewer is that some formatting can be lost from the original settings. For example, Excel spreadsheets can be forced to always print out with gridlines and all hidden columns and rows revealed. Excel spreadsheets are the most challenging file type to process, as they are rarely set up with printing in mind.

One common discovery mistake is failing to discuss the format in which ESI will be exchanged and moving forward with paper printouts. Opposing counsel may make a good argument that printed Excel spreadsheets are not acceptable, as they need to have access to formulas. Therefore, it is important to agree up front on the format in which ESI will be exchanged prior to processing, or counsel risks an expensive process of going back to native files.

Conclusion

The prevalence of electronic discovery, in conjunction with the new FRCP amendments dictating that issues related to ESI be addressed up front, indicates that the time has come for attorneys to address technical issues, or be versed in the issues to a greater extent.

Given that most law firms use the same review platforms over and over again, it is possible to create a firm-wide technical standard governing the processing of ESI. All of the sections addressed above can be memorialized into one document that can be used from case to case, as well as given to opposing counsel prior to the exchange of discovery. Each of the areas that have been addressed involves important technical decisions that, if left to the vendor or internal processing team, inevitably will not follow the correct or desired path. To avoid this situation, most experienced vendors have resorted to refusing to move forward with a project unless attorneys sign off on a project setup specification sheet.

Many discovery agreements do not go into enough technical depth to cover such detail as text delimiters, filtering options, field naming conventions and more. As a result, litigation support professionals within law firms oftentimes have to perform a rework of produced data or data received from a vendor. A large majority of this rework can be avoided with a robust, up-front project meeting that results in a document memorializing all possible technical details.

EDRM, Fall 2008
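The two de-duplication methods and the global versus custodian-level choice described in the De-duplication item can be sketched as follows. The article does not name a hash algorithm or specify how metadata fields are compared, so SHA-256, the lower-casing of fields, and all function and custodian names here are assumptions made for illustration.

```python
import hashlib


def file_hash(data):
    """Hash the raw bytes of a loose file (SHA-256 chosen as an assumption)."""
    return hashlib.sha256(data).hexdigest()


def email_key(msg):
    """Build a comparison key from an e-mail's metadata fields."""
    return tuple(msg.get(f, "").strip().lower() for f in ("from", "to", "cc", "subject"))


def deduplicate(items, key_fn, scope="global"):
    """Keep the first copy of each item; items are (custodian, item) pairs."""
    seen = set()
    kept = []
    for custodian, item in items:
        key = key_fn(item)
        if scope == "custodian":
            key = (custodian, key)  # duplicates only count within one custodian
        if key not in seen:
            seen.add(key)
            kept.append((custodian, item))
    return kept


files = [
    ("jsmith", b"Q3 forecast v1"),
    ("jsmith", b"Q3 forecast v1"),
    ("mjones", b"Q3 forecast v1"),
    ("mjones", b"Q3 forecast v2"),  # one changed character gives a new hash
]
print(len(deduplicate(files, file_hash, "global")))     # 2
print(len(deduplicate(files, file_hash, "custodian")))  # 3
```

Global scope keeps one copy across the whole collection, while custodian scope keeps one copy per custodian, so the same four files yield two and three documents respectively. Which scope applies is exactly the point the parties are advised to agree on up front.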
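The storage and numbering figures quoted under Native Review versus Image Review and Assigning Image Key/DocID can be checked with simple arithmetic. The sketch below uses only the article's own rules of thumb (50,000 to 300,000 images per native gigabyte, roughly 20,000 TIFF images per gigabyte); the function names and the default seven-digit width are hypothetical choices.

```python
def tiff_storage_gb(native_gb, images_per_native_gb, images_per_tiff_gb=20_000):
    """Estimate TIFF storage from native volume using the article's rules of thumb."""
    return native_gb * images_per_native_gb / images_per_tiff_gb


def doc_id(prefix, sequence, width=7):
    """Format a DocID such as JSMITH0000001 with a zero-padded numeric part."""
    if sequence >= 10 ** width:
        raise ValueError("sequence does not fit in the numeric component")
    return f"{prefix}{sequence:0{width}d}"


print(tiff_storage_gb(1, 50_000))   # 2.5
print(tiff_storage_gb(1, 300_000))  # 15.0
print(doc_id("JSMITH", 1))          # JSMITH0000001
```

At the upper estimate, a three-gigabyte collection would produce 900,000 images, which still fits a seven-digit component; a five-digit component would overflow after 99,999 images.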