
Fast Tip Friday | PDF Files

Fast Tip Friday – Identify Unsearchable PDFs Within Folders

By Amy Bowser-Rollins, 05/13/2016 (updated 08/02/2021)

This fast tip demonstrates how to use a free tool called Count Anything to identify which PDF files within a folder are unsearchable, that is, image-only files with no text layer.

Sean O'Shea shared this tip in his article "Where Are My Unsearchable PDFs".

Source: Sean O'Shea
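Count Anything's own settings aren't described here, so the Python sketch below does not use it. Instead it swaps in a cruder heuristic, stated as an assumption: a PDF that never mentions `/Font`, either in its raw bytes or inside any Flate-compressed stream, almost certainly has no text layer, which is typical of scanned, image-only files. The function names are hypothetical, and the check can misfire on encrypted PDFs or unusual stream filters, so treat the results as a shortlist to spot-check rather than a verdict.

```python
import csv
import re
import zlib
from pathlib import Path

# Matches the body of each "stream ... endstream" section. The body may keep a
# trailing newline; zlib leaves anything after the compressed data unused.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def _inflated_streams(data: bytes):
    """Yield the decompressed contents of every Flate-compressed stream."""
    for match in STREAM_RE.finditer(data):
        try:
            yield zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            # Not Flate data (e.g. JPEG or CCITT scan images): nothing to inspect.
            continue


def looks_searchable(path) -> bool:
    """Heuristic (assumption): a PDF that references no font has no text layer."""
    data = Path(path).read_bytes()
    if b"/Font" in data:
        return True
    return any(b"/Font" in chunk for chunk in _inflated_streams(data))


def _pdfs_under(folder):
    """All .pdf files under `folder`, including subfolders, in sorted order."""
    return sorted(p for p in Path(folder).rglob("*")
                  if p.is_file() and p.suffix.lower() == ".pdf")


def find_unsearchable(folder):
    """Return the PDFs under `folder` that look image-only."""
    return [p for p in _pdfs_under(folder) if not looks_searchable(p)]


def write_report(folder, csv_path):
    """Write a two-column CSV (file, searchable) that can be opened in Excel."""
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["file", "searchable"])
        for pdf in _pdfs_under(folder):
            writer.writerow([str(pdf), "yes" if looks_searchable(pdf) else "no"])
```

Because the whole file is read into memory, very large productions may be better handled one subfolder at a time.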

Fast Tip Friday | MS Excel

Fast Tip Friday – Create Excel Pivot Table to Count Values in a Column

By Amy Bowser-Rollins, 05/05/2016 (updated 08/02/2021)

This fast tip demonstrates how to create a simple pivot table that will count the unique values in a column. Download Sample Files
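Outside Excel, the same kind of tally (each distinct value alongside the number of times it occurs) can be sketched in Python with `collections.Counter`. This is a swapped-in alternative, not the pivot-table steps from the post; the function name is hypothetical, and blank cells are simply skipped here.

```python
from collections import Counter


def count_values(column):
    """Tally each distinct non-blank value, sorted by value like pivot-table row labels."""
    counts = Counter(value for value in column if value not in (None, ""))
    return sorted(counts.items(), key=lambda item: str(item[0]))
```

For example, `count_values(["Smith", "Jones", "Smith"])` returns `[("Jones", 1), ("Smith", 2)]`.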

Fast Tip Friday | Text Editors

Fast Tip Friday – Combine Multiple Text Files Using TextPad

By Amy Bowser-Rollins, 01/22/2016 (updated 08/17/2021)

This fast tip demonstrates how to easily combine multiple text files using a feature in the TextPad editor.
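The post covers TextPad's built-in feature. As a swapped-in alternative outside the editor, here is a minimal Python sketch that concatenates every `.txt` file in one folder, in filename order, into a single output file. The function name and the UTF-8 encoding are assumptions; a newline is added between files that don't already end with one.

```python
from pathlib import Path


def combine_text_files(folder, output, pattern="*.txt", encoding="utf-8"):
    """Concatenate matching files in `folder` (sorted by name) into `output`."""
    folder, output = Path(folder), Path(output)
    # Exclude the output file itself in case it lives in the same folder.
    sources = sorted(p for p in folder.glob(pattern)
                     if p.is_file() and p.resolve() != output.resolve())
    with output.open("w", encoding=encoding) as out:
        for src in sources:
            text = src.read_text(encoding=encoding)
            out.write(text)
            if text and not text.endswith("\n"):
                out.write("\n")  # keep files from running together
    return sources
```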

Directory Listings | Fast Tip Friday

Fast Tip Friday – Batch Rename Adding a Prefix and Padding

By Amy Bowser-Rollins, 05/29/2015 (updated 08/17/2021)

This fast tip will demonstrate how to rename generically numbered TIFF files to filenames that include a prefix and zero padding. The tool is a free software program called Bulk Rename Utility. Download Sample Files
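Bulk Rename Utility does this through its interface, whose options aren't reproduced here. As a swapped-in alternative, a minimal Python sketch under stated assumptions: the source files are named with digits only (e.g. `1.tif`, `2.tif`), and the prefix and padding width are hypothetical parameters you choose. It builds the whole rename plan first and refuses to overwrite existing files.

```python
from pathlib import Path

TIFF_SUFFIXES = {".tif", ".tiff"}


def plan_renames(names, prefix, width):
    """Map purely numeric TIFF names (e.g. '7.tif') to prefixed, zero-padded ones."""
    plan = {}
    for name in names:
        p = Path(name)
        if p.suffix.lower() in TIFF_SUFFIXES and p.stem.isdecimal():
            plan[name] = f"{prefix}{int(p.stem):0{width}d}{p.suffix}"
    if len(set(plan.values())) != len(plan):
        raise ValueError("two source files would get the same new name")
    return plan


def apply_renames(folder, prefix, width):
    """Rename files in `folder` according to plan_renames; never overwrite."""
    folder = Path(folder)
    plan = plan_renames([p.name for p in folder.iterdir() if p.is_file()], prefix, width)
    for old, new in plan.items():
        target = folder / new
        if target.exists():
            raise FileExistsError(target)
        (folder / old).rename(target)
    return plan
```

For example, with prefix `ABC` and width 6, `12.TIF` becomes `ABC000012.TIF`, while `cover.tif` is left alone.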

Fast Tip Friday

Fast Tip Friday – Using an iPhone to Subscribe to a Podcast

By Amy Bowser-Rollins, 12/17/2015 (updated 08/06/2018)

This fast tip demonstrates how to subscribe to an iTunes podcast using the free Podcasts app on the iPhone. There are more and more new podcasts starting up in the legal industry. Use iTunes to search for the ones you might be interested in and then subscribe to them from your mobile phone using one…

Fast Tip Friday | Website Snapshot

Fast Tip Friday – Save a Web Page Offline Using Chrome and Firefox

By Amy Bowser-Rollins, 09/15/2016 (updated 08/17/2021)

This fast tip demonstrates how to save a web page to your hard drive.

Fast Tip Friday | Website Snapshot

Fast Tip Friday – Monitor Web Pages For Changes

By Amy Bowser-Rollins, 06/09/2016 (updated 08/17/2021)

This fast tip demonstrates how to use a free service that will monitor a web page for changes (additions/deletions) and send an email notification. It might be helpful to monitor docket listings or a web page for a specific client or case.
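The service itself isn't named here, so nothing below reflects how it works. As a rough sketch of the underlying idea, assuming the page can be fetched with a plain GET request (the snapshot file and function names are hypothetical), each check saves the page text and reports the lines added or removed since the previous snapshot. Sending the email notification is left out; Python's `smtplib` could handle that step. Comparing raw HTML will also flag incidental changes such as timestamps, so some filtering would likely be needed in practice.

```python
import difflib
import urllib.request
from pathlib import Path


def fetch_text(url, timeout=30):
    """Download a page and decode it using the charset the server reports."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def changes(old_text, new_text):
    """Return (added_lines, removed_lines) between two snapshots."""
    added, removed = [], []
    for line in difflib.ndiff(old_text.splitlines(), new_text.splitlines()):
        if line.startswith("+ "):
            added.append(line[2:])
        elif line.startswith("- "):
            removed.append(line[2:])
    return added, removed


def check_page(url, snapshot):
    """Fetch `url`, compare it with the saved snapshot, then update the snapshot."""
    snapshot = Path(snapshot)
    new_text = fetch_text(url)
    old_text = snapshot.read_text(encoding="utf-8") if snapshot.exists() else None
    snapshot.write_text(new_text, encoding="utf-8")
    if old_text is None:
        return [], []  # first run: nothing to compare against yet
    return changes(old_text, new_text)
```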

7 Comments

James Bell says:

05/13/2016 at 9:10 am

That is a pretty neat tool! Thank you for sharing.

1. Amy Bowser-Rollins says:
  
  05/13/2016 at 1:41 pm
  
  I agree. Sean finds good stuff.
  
mgolab says:

05/15/2016 at 5:43 pm

Good one. It has a command-line facility as well, which means you could set up a kind of automated process (identifying files and then OCRing them) if you wanted to.

1. Amy Bowser-Rollins says:
  
  05/15/2016 at 6:42 pm
  
  Hey Matthew – I noticed that too. I wouldn’t be surprised if Sean tried something like that. He’s a smart cookie.
  
2. Eliot says:
  
  05/20/2016 at 10:08 am
  
  This command-line idea has instant utility! I would be interested in seeing it work. Would it depend on a command prompt script? First, a script would call the Count Anything app to identify documents in a selection, then save a delimited text version in a specified location. Next, we set up the Excel spreadsheet and identify the ones we want to OCR.
  
  Then another command prompt script would pick up the selected filenames and OCR those documents in the background. In theory, I could also use VBA to do this once the filenames are in Excel. This would solve a major issue I’m having with OCRing PDFs: having to OCR them one by one.
  
  I’d like to find a way to OCR all the PDFs in the background. Right now, my database identifies the documents that need OCR, but doing so ties up my machine’s memory and processing.
  
  Thanks for sharing Sean’s find.
  
  1. mgolab says:
    
    05/21/2016 at 3:18 pm
    
    What I was thinking was:
    1) you have a tranche of PDFs, and you make a file/folder listing
    2) manipulate the file/folder listing so that you call up Count Anything for each file – say, for example, you have 2GB or 1,000 files; then what I would do is split them into, say, 4 × 500MB chunks (your file listing would need to include the file size, and you’d want an Excel formula (or something) to work out a cumulative size)
    3) launch the 4 command-line scripts (i.e. batch files) simultaneously, each writing its output to a unique filename
    4) parse the output – I’m not a VBA whiz, so this would be a manual step for me; however, I think there is a way to search a text file for a specific string to find which files don’t contain text, i.e. where the count is zero or whatever the output is
    5) copy those [naughty] files that require OCRing to a ‘hot folder’ or somewhere
    6) if you have something like ABBYY or an equivalent which monitors a hot folder then it would automatically OCR the contents of the hot folder
    7) alternatively, use your OCR weapon of choice – I’d also look at load balancing by cumulative file size, as otherwise one machine finishes its share well before the others
    8) go get a coffee and reflect on the bad old days of how miserable you were with OCRing manually
    
    We have a fleet of virtual machines that are our workers to do things like this, not yet optimised for load balancing and farming out jobs, but still pretty good as we get to save significant time by doing lots of things in parallel.
    Good luck.
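Steps 2, 3, and 5 above can be sketched in Python. Assumptions: you already have (path, size) pairs from your file listing; the greedy "largest file first" split is a swapped-in technique rather than the Excel cumulative-size formula described in the comment; the listing and folder names are hypothetical; and the actual Count Anything command line is not shown, because its syntax isn't given here.

```python
import heapq
import shutil
from pathlib import Path


def split_by_size(files, n_chunks):
    """Greedy largest-first split of (path, size) pairs into n_chunks lists.

    Each file goes to whichever chunk currently has the smallest running total.
    """
    heap = [(0, i) for i in range(n_chunks)]  # (running total, chunk index)
    chunks = [[] for _ in range(n_chunks)]
    totals = [0] * n_chunks
    for path, size in sorted(files, key=lambda item: item[1], reverse=True):
        total, i = heapq.heappop(heap)
        chunks[i].append(path)
        totals[i] = total + size
        heapq.heappush(heap, (totals[i], i))
    return chunks, totals


def write_listings(chunks, out_dir):
    """Write one listing per chunk (chunk_1.txt, ...) for a batch file to read."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for n, chunk in enumerate(chunks, start=1):
        listing = out_dir / f"chunk_{n}.txt"
        listing.write_text("".join(f"{p}\n" for p in chunk), encoding="utf-8")
        written.append(listing)
    return written


def copy_to_hot_folder(paths, hot_folder):
    """Copy files that need OCR into a watched hot folder.

    Files with the same name from different source folders overwrite each other.
    """
    hot = Path(hot_folder)
    hot.mkdir(parents=True, exist_ok=True)
    for p in paths:
        shutil.copy2(p, hot / Path(p).name)
```

Sorting largest-first keeps the chunk totals close without needing a cumulative-size column, which is what the load balancing in step 7 is after.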
    
Pingback: Finding the Right Resources: Terminology, Tips, and Tricks of the Trade - The Chronicle of eDiscovery
