The Bell Curve & Document Indexing/Imaging
This white paper applies the “bell curve” from statistics class to the
document collections produced during discovery. Out of every thousand
cases, how many involve 1,000,000+ documents? 100,000+? 100+? And what
does this distribution suggest regarding document indexing and the
possibility of using Adobe Acrobat PDF files for document images?
Remember the "bell curve" from statistics class? The bell curve, so
named because of its shape, illustrates the frequency distribution of
many phenomena, for example, height. Measure a thousand people. For
every person over 7', you'll have a mob between 5'6" and 5'10".
Let's apply the bell curve to the document collections produced during
discovery. Out of every thousand cases, how many involve 1,000,000+
documents? 100,000+? 100+? And what does this distribution suggest
regarding document indexing and image handling strategies?
Giant Cases – Special Tools Required
We’re all familiar with cases in which millions of documents are
produced during discovery. But we've also seen individuals over 7' tall.
Both are outliers occurring infrequently. Out of every thousand cases,
maybe two or three have 1,000,000 or more documents.
Cases with document collections of over 100,000 are also relatively
rare. Do even a hundred cases out of every thousand involve this many
docs? Widespread use of email has definitely increased the volume of
documents produced in many cases, but it hasn't turned every case into a
document monster.
Dealing with 1,000,000+ documents or even 100,000+ justifies a
substantial investment in scanning and coding. This type of case also
demands sophisticated software products such as Concordance, iCONECT,
IPRO, Litigator's Notebook, or Summation to assist with document
indexing, image handling, and more.
Our CaseMap case analysis tool includes document indexing functionality
analogous to that found in the applications mentioned above. However,
while CaseMap is the place to organize facts and issues for every size
case, it isn’t intended to be the document indexing solution for mammoth
cases. CaseMap isn't philosophically or physically designed to house an
index of 100,000+ or 1,000,000+ documents. (Please note that most
products used in mass document cases, e.g., all of the products listed
above, feature tight integration with CaseMap. This integration makes it
easy to marry your CaseMap fact analysis to the source documents stored
in another application.)
So that's the story for the giant cases lurking out in one tail of the
bell curve. But what about the cases that fall under the rest of the
curve? How many documents do these cases involve? What's an appropriate
image handling and document indexing solution for them?
Normal Cases – Perfect for Adobe PDF Files
Cases with very small document collections fall at the other end of the
spectrum. For every 1,000,000-document case, there's a case that
involves, literally, a handful of documents. For every case with 100,000
documents, there's a case with 100. But these matters with only a single
Redweld of docs are as atypical as those with massive quantities of
documents.
Which brings us to the approximately 70% of all cases that fall under
the center area of the bell curve. Our experience suggests these cases
have between 1,000 and 25,000 documents — a small number relative to a
gargantuan million-doc case, but still a heap of paper. These matters
have far more documents than any trial team can memorize or organize
effectively without indexing and imaging.
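The rough distribution described so far can be tallied in a few lines of Python. This is only a back-of-the-envelope sketch: the figures are the estimates given above, and the variable names are purely illustrative.

```python
# Rough estimates from the text, expressed per 1,000 cases.
CASES = 1000
giant_cases = 3          # "maybe two or three" cases with 1,000,000+ documents
large_cases_max = 100    # upper bound on cases with 100,000+ documents (giants included)
typical_share = 0.70     # approximate share of cases with 1,000 to 25,000 documents

typical_cases = round(CASES * typical_share)

# Whatever is left falls outside both bands: very small cases and
# cases with between 25,000 and 100,000 documents.
other_cases_min = CASES - typical_cases - large_cases_max

print(f"Typical (1,000 to 25,000 docs): about {typical_cases}")
print(f"Large (100,000+ docs):          at most {large_cases_max}")
print(f"Giant (1,000,000+ docs):        about {giant_cases}")
print(f"Everything else:                at least {other_cases_min}")
```

Because the 100,000+ figure is an upper bound, the "everything else" count is a lower bound.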
The good news is that the ubiquitous Adobe PDF format provides an
excellent method for handling document images for this size case. It’s
likely you’re already adept at working with PDF files, perhaps in
connection with court-filing requirements. It’s also a good bet that
expert witnesses and even clients are familiar with using PDF files and
have either a full Acrobat license or the free Adobe Reader, making it a
cinch to share case documents.
The PDF format has become the de facto standard for electronic versions
of paper documents. Why? Because a single PDF file can contain the
images of all pages of the paper document as well as the associated
document text, which can be captured by optical character recognition
(OCR) software.
Prior to the introduction of the PDF format, the standard technique for
creating electronic versions of paper documents was to generate a series
of single page TIFF images and a separate OCR text file. Thus, scanning
a 15-page document would yield a total of 16 separate electronic files —
15 TIFFs and a text file. When scanning first became available, the Many
Electronic Files = 1 Paper Document approach was as good as it got and
undoubtedly beat having nothing at all. However, with the advent of the
PDF, which meant that 1 Electronic File = 1 Paper Document, it wasn't
long before this new format ruled the roost.
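The file-count arithmetic behind the two approaches is easy to check. The short Python sketch below uses hypothetical function names and a made-up three-document collection, and assumes exactly one OCR text file per document in the TIFF workflow, as described above.

```python
def files_for_tiff_workflow(page_count: int) -> int:
    """One TIFF per page plus one OCR text file for the whole document."""
    return page_count + 1


def files_for_pdf_workflow(page_count: int) -> int:
    """A single PDF holds every page image and the OCR text."""
    return 1


# The 15-page example from the text.
print(files_for_tiff_workflow(15), files_for_pdf_workflow(15))

# A hypothetical three-document collection (page counts are made up).
page_counts = [15, 3, 40]
tiff_total = sum(files_for_tiff_workflow(p) for p in page_counts)
pdf_total = sum(files_for_pdf_workflow(p) for p in page_counts)
print(tiff_total, pdf_total)
```

Under the TIFF approach the number of files grows with the page count, while under the PDF approach it always equals the number of documents.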
The argument in favor of using PDFs to handle document imaging on small
to mid-sized cases gained strength following Adobe's release of Acrobat
6. It has become stronger still with the introduction of Acrobat 7.
These new versions offer numerous enhancements that are particularly
useful to trial teams. For example, Acrobat 6 offers improved document
mark-up functionality. It also provides cross-PDF text searching. This
feature permits you to search a folder containing any number of PDFs for
those documents that include a particular word or phrase.
Normal Cases – Perfect for Document Indexing in CaseMap
On any case where you use Acrobat PDF files to handle document images,
please consider employing our CaseMap case analysis software to manage
document indexing and to organize the PDF collection.
Every CaseMap file contains a fact spreadsheet, a cast of characters
spreadsheet, and an issue spreadsheet that are designed for use in even
the largest of cases. In addition, each CaseMap file includes a document
spreadsheet that provides an easy-to-implement, yet powerful document
indexing solution appropriate for matters with small to mid-sized
document collections.
The document spreadsheet in each CaseMap file contains a series of
predefined columns for building a detailed doc index. These columns
include Date, Bates-Begin, Bates-End, Description, Type, Authors,
Recipients, Copied To, Mentioned In, and Linked Issues. There’s also a
Linked File column, which can be used to connect the document index
spreadsheet to PDF files. Something missing? Select from other
predefined columns or easily create any number of new columns from
scratch.
The display of the document spreadsheet can be customized as you desire.
Choose which columns are included in the spreadsheet and the order in
which they appear. Determine whether spreadsheet rows are listed by
document date, by Bates number, or by some other criterion.
The spreadsheet can also be easily filtered based on the values in any
column. For example, it takes two mouse clicks to filter the
spreadsheet down from all documents to just those written by a
particular witness, or related to a particular case issue, or flagged as
being critical. Compound filters can be run as well. For example, filter
the document spreadsheet down to just those documents drafted by a
specific author and sent to a specific recipient.
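For readers who think in code, the idea of a compound filter can be sketched in Python. This is an illustration of the concept only, not a description of how CaseMap works internally: the rows, the names in them, and the filter_rows function are all hypothetical, and only the column names echo those listed above.

```python
from datetime import date

# Hypothetical index rows; the keys echo the column names mentioned above.
document_index = [
    {"Bates-Begin": "ABC0001", "Date": date(2004, 3, 1),
     "Authors": ["J. Smith"], "Recipients": ["R. Jones"]},
    {"Bates-Begin": "ABC0007", "Date": date(2004, 3, 9),
     "Authors": ["J. Smith"], "Recipients": ["P. Lee"]},
    {"Bates-Begin": "ABC0012", "Date": date(2004, 4, 2),
     "Authors": ["R. Jones"], "Recipients": ["J. Smith"]},
]


def filter_rows(rows, author=None, recipient=None):
    """Return the rows that match every criterion that was supplied."""
    matches = []
    for row in rows:
        if author is not None and author not in row["Authors"]:
            continue
        if recipient is not None and recipient not in row["Recipients"]:
            continue
        matches.append(row)
    return matches


# Simple filter: everything written by one author.
by_author = filter_rows(document_index, author="J. Smith")

# Compound filter: written by one author AND sent to one recipient.
compound = filter_rows(document_index, author="J. Smith", recipient="R. Jones")

print([row["Bates-Begin"] for row in by_author])
print([row["Bates-Begin"] for row in compound])
```

Each extra criterion narrows the result further, which is all a compound filter amounts to.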
Dozens of reports can be generated from the document spreadsheet.
CaseMap’s printing is What You See is What You Get. When you add a
column to the spreadsheet, hide a column, change the sort order, run a
filter, etc., a report based on this revised spreadsheet view is just
one mouse click away. Each CaseMap report automatically includes a
polished and customizable cover page.
In addition to being printed on plain old paper, reports can be output
as PDF files using CaseMap's built-in PDF creation capability, which
doesn't require a full Acrobat license. In two mouse clicks, you’ll have
a PDF of any document index report. CaseMap also offers Send to Word and
Send to WordPerfect functionality that creates an editable table of
spreadsheet information directly in a word processing document. It takes
just three mouse clicks to accomplish this task. Finally, CaseMap's
unique ReportBook feature makes it easy to include one or more
document-based reports in a compilation of reports from any of CaseMap’s five
primary case analysis spreadsheets.
Another benefit of using CaseMap to create document indexes for cases
with small to mid-sized document populations is the availability of many
ease-of-use features that aren't present in the industrial-strength
software products required for the 1,000,000+ document case.
For example, CaseMap offers live spell checking and autocorrect just
like Word and WordPerfect. We’re all hooked on live spell checking’s red
squiggly line and right-click list of spelling suggestions. This and
many other ease-of-use features in CaseMap save energy and result in a
higher quality document-index work product.
The Tight Integration Between Acrobat & CaseMap
Acrobat can certainly be used to manage document images without
employing CaseMap to organize knowledge about these documents. And vice
versa. However, the tight integration between these applications
provides a strong incentive for using them in tandem.
The document descriptions in a CaseMap spreadsheet are easily linked to
PDF files containing document images. Once PDFs are linked to a CaseMap
file, they can be opened in one click anytime you review the document
index.
While links between CaseMap and PDF files can be made manually, CaseMap
offers a feature that automates the process — the Adobe PDF Bulk
Importer. Point this utility at any folder containing PDFs of case
documents. CaseMap analyzes the contents of the folder, creates a new
row in the document spreadsheet describing each PDF, and links each row
back to the PDF file so it can later be displayed with one click. Say
you have a folder containing 500 PDFs. It would take less than two
minutes for CaseMap to process these files and add 500 rows to the
document index, each linked to the appropriate PDF.
The PDF Bulk Importer is designed to handle the growth of your
collection of document images over time. When additional discovery
documents are scanned as PDFs, the Bulk Importer can be run again. It’s
smart enough to skip PDFs that have already been processed and to only
add spreadsheet rows and links for new documents.
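The "skip what has already been processed" behavior can be illustrated with a short Python sketch. It is a minimal model of incremental importing under stated assumptions, not CaseMap's actual implementation: the import_new_pdfs function and the list-of-dictionaries index are hypothetical, and only the "Description" and "Linked File" names come from the columns described earlier.

```python
from pathlib import Path


def import_new_pdfs(folder, index):
    """Add one row per PDF in `folder` that is not already linked in `index`.

    `index` is a list of dicts whose "Linked File" value is the path to a PDF.
    Returns the rows added on this run.
    """
    already_linked = {row["Linked File"] for row in index}
    added = []
    # Note: the "*.pdf" pattern is case-sensitive on some operating systems.
    for pdf in sorted(Path(folder).glob("*.pdf")):
        linked_file = str(pdf.resolve())
        if linked_file in already_linked:
            continue  # processed on an earlier run, so skip it
        row = {"Description": pdf.stem, "Linked File": linked_file}
        index.append(row)
        added.append(row)
    return added
```

Calling import_new_pdfs on the same folder twice adds nothing the second time; after more PDFs are scanned into the folder, only those new files produce new rows.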
Another important example of Acrobat/CaseMap integration is the “Send to
CaseMap” Plug-in that’s available for Acrobat. The “Send to CaseMap”
Plug-in makes it possible to cull critical knowledge from PDFs and
instantly organize it in CaseMap.
Here’s how the “Send to CaseMap” Plug-in works following installation:
Open a PDF of a case document and start reading. When you spot an
important passage, select it and click the Send to CaseMap button the
Plug-in adds to Acrobat’s toolbar. A new fact containing the text
selection is created in CaseMap and linked back to the specific section
of the PDF.
As you later review your fact chronology spreadsheet in CaseMap, a click
on any fact sent from a PDF file reopens the PDF document and reselects
the original excerpt in context. The “Send to CaseMap” Plug-in works
with the full version of Acrobat and also with the free Adobe Reader.
When you use the Acrobat/CaseMap combination to handle document imaging
and indexing, also consider our TextMap transcript summary utility for
dealing with electronic transcripts. TextMap indexes all words in case
transcripts so they are easily searched for any word or phrase. While
reviewing a transcript, key passages can be selected and sent to CaseMap
using the same process you’ve mastered by employing the “Send to
CaseMap” Plug-in for Acrobat.
Here's a final tip for any reader who's yet to experiment with document
imaging: Using CaseMap and Acrobat together is a great way to get
comfortable using electronic documents without jumping into the deep end
of the pool. Don't scan every case document until you're sure it's worth
the effort. Instead, identify the 100 or so most critical documents,
have these scanned as PDFs and linked to your CaseMap file. You'll be
able to evaluate the benefits of using electronic versions of case
documents with a minimal investment of time and expense.
Not All PDFs are Created Equal
A word of warning before you fire up the office scanner or ship boxes of
documents to a scanning service. Take care or you could end up with PDFs
that contain images, but not text.
If there’s no text in your PDFs, you won’t be able to use Acrobat to
search the collection of PDFs for those containing specific words or
phrases. If there’s no text in your PDFs, there’s nothing to select when
using the “Send to CaseMap” Plug-in for Acrobat.
To produce PDFs of discovery documents that contain text in addition to
images, the documents must be processed using Optical Character
Recognition (OCR) software when scanned. Don’t assume this work will be
done automatically. Make sure the individuals performing document
scanning have a clear picture of your expectations.
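One way to spot-check a batch of scans is to look for files that yielded no text at all. The Python sketch below assumes the text of each page has already been extracted by whatever PDF tool you have available; the extraction step itself is not shown, and the function name and sample data are hypothetical.

```python
def image_only_documents(page_text_by_file):
    """Return the file names whose pages contain no searchable text at all."""
    return sorted(
        name
        for name, pages in page_text_by_file.items()
        if not any(text.strip() for text in pages)
    )


# Hypothetical extraction results: one list of page texts per PDF.
extracted = {
    "contract.pdf": ["This Agreement is made by the parties.", "Signed and dated."],
    "memo.pdf": ["", "   "],  # pages present, but no text was captured
}

print(image_only_documents(extracted))
```

Any file the check flags is a candidate for a second pass through OCR before it is searched or linked.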
Conclusion
CaseMap's traditional long suit is fact and issue analysis. For the vast
majority of cases, it can also be an ace document indexing solution.
Similarly, while Acrobat is a poor image-handling/text searching
solution for the matter with a gazillion docs, it's a great one for the
average matter. And, when used together on appropriate cases, the unique
integration between Acrobat and CaseMap enhances the value of both
tools.
If you have yet to try the Acrobat/CaseMap document management strategy,
please review your current cases and find one or two where you can put
it to the test. We would be glad to show you the Acrobat/CaseMap
integration at work. Just write us at phonetour@casesoft.com for a quick phone tour.
Don’t have Acrobat? Download the free Adobe Reader at www.adobe.com. Not
a current CaseMap client? Obtain a full-featured trial version of
CaseMap and the “Send to CaseMap” Plug-in for Acrobat at
www.casesoft.com.
About the Author
Greg Krehel is a co-founder of CaseSoft. If you have questions or
suggestions about this article, please contact him at gkrehel@casesoft.com
or 904.273.5000 x233.
Greg has written a series of other white papers you may find of
interest, e.g., “Chronology Best Practices” and “Creating and Using
Issue Analysis Memos.” You can obtain PDF versions of these articles at
no charge by visiting www.casesoft.com/articles.asp.
About CaseSoft
At CaseSoft, we develop five software tools:
• CaseMap – our case analysis tool
• TimeMap – our timeline graphing tool
• TextMap – our transcript summary tool
• NoteMap – our outlining tool
• DepPrep – our witness preparation tool
Full-featured trial versions of all five products are available for free
download at www.casesoft.com.
Our tools are in use at tens of thousands of small and large law firms,
government investigative and prosecutorial agencies, and private
investigation and forensic accounting firms. Two examples of our many
clients: The United States Attorney's Office has 15,000
CaseMap/TimeMap/TextMap license sets and the Securities and Exchange
Commission has a CaseSoft Suite Enterprise License for 1,100 users.
We’re totally devoted to providing every client with excellent support
and training.