Convert unsearchable PDF's into searchable files via Online OCR website

posted on Jul, 13 2019 @ 07:05 AM
I tried about 8 different applications on my computer before I looked for an online app. The first 3-4 online ones failed miserably and did the same as the 8 software "solutions" I downloaded: they basically copied the PDF, but there was no way to select or search the text. So I did one more search for "online OCR pdf to docx file searchable" and the first result worked perfectly! Granted, it was the 4th result for the search "online OCR pdf to text searchable", which I had tried before that.

So the site is:

ocr.space...


It is more complicated than the others, but it has more options. I just selected the file (uploaded from my hard drive), clicked the "Start OCR!" button, then clicked the "Download" button, and it saved the result as a .txt file.

This is really great if you need to get a list of things that has been saved in a PDF and you don't want to retype everything. You will have to proofread, because I just saw a P mistaken for an F.

Well, it's not a huge discovery but if people are doing research here and you have a PDF that isn't searchable (text wise), then this might be your best or only answer.




posted on Jul, 13 2019 @ 07:52 AM
a reply to: DigginFoTroof

If you run your own Linux server, you can install Nextcloud and Tesseract for OCR and document management. No OCR that I am aware of is fast on the fly. It uses Redis, so a lot of memory is a good idea. Also, it is your scanner's responsibility to fix the page orientation; it will not OCR unless the text is LTR. I usually handle information too sensitive to pass through an online host, which is why I throw this out there. If you don't need online OCR, you can run tesseract from the CLI to output a text document based on the PDF, but I'm not sure how you would index this.
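For the CLI route, here is a rough sketch of how it could go, not a tested setup. It assumes both `tesseract` and Poppler's `pdftoppm` are installed and on the PATH; tesseract reads images rather than PDFs, so the pages get rendered to PNG first. The function names (`pdftoppm_cmd`, `tesseract_cmd`, `ocr_pdf_to_text`, `find_pages`) are made up for illustration, and the "index" is nothing more than a per-page substring search.

```python
import subprocess
import tempfile
from pathlib import Path


def pdftoppm_cmd(pdf_path, prefix, dpi=300):
    """Command that renders each PDF page to a PNG named <prefix>-<n>.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(prefix)]


def tesseract_cmd(image_path, out_base, lang="eng"):
    """Command that OCRs one image; tesseract appends .txt to out_base."""
    return ["tesseract", str(image_path), str(out_base), "-l", lang]


def ocr_pdf_to_text(pdf_path, out_txt):
    """Render pages, OCR each one, and write the joined text to out_txt.

    Returns the list of per-page strings so they can be searched later.
    Assumes pdftoppm and tesseract are installed (not checked here).
    """
    pages = []
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        subprocess.run(pdftoppm_cmd(pdf_path, tmp / "page"), check=True)
        # Sort by the numeric page suffix rather than by raw file name.
        images = sorted(
            tmp.glob("page-*.png"),
            key=lambda p: int(p.stem.rsplit("-", 1)[1]),
        )
        for image in images:
            out_base = image.with_suffix("")
            subprocess.run(tesseract_cmd(image, out_base), check=True)
            text_file = out_base.with_suffix(".txt")
            pages.append(text_file.read_text(encoding="utf-8"))
    Path(out_txt).write_text("\n".join(pages), encoding="utf-8")
    return pages


def find_pages(pages, term):
    """Return 1-based page numbers whose OCR text contains term."""
    needle = term.lower()
    return [n for n, text in enumerate(pages, start=1)
            if needle in text.lower()]
```

Something like `pages = ocr_pdf_to_text("scan.pdf", "scan.txt")` followed by `find_pages(pages, "invoice")` would then list the pages mentioning a word (both file names are placeholders). Tesseract can also write a PDF with a text layer if you add `pdf` after the output base name, which is closer to what the online tools produce.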

Try your online OCR engine with rotated text as a JPG, or as a TIF embedded in a PDF. :-p



posted on Jul, 13 2019 @ 11:05 PM
a reply to: DigginFoTroof

Just checked with my tech officer (we have all the Adobe suites) and was told that Acrobat Pro will convert a scanned PDF to a searchable one. Money well spent, IMHO.



 