
Converting Open Minds Forum html (an archived UFO website/forum) to PDF - help needed

posted on Jan, 8 2018 @ 09:42 AM
Some of you may be familiar with the Open Minds Forum that shut down a few years ago.

I've had the blessing of its founder, Brendan Burton, to convert an archive of that website/forum to a searchable PDF.

(I find it helpful to have lots of different types of UFO material converted to searchable PDFs...).

I've obtained an HTML archive of that forum from another ATS member, UFOradio, and have had offers of further files from various old members of the Open Minds Forum.

I've done this with some other websites/blogs without any bother, using Adobe Acrobat's "Create from Webpage" function and then setting the options to convert the entire website.

However, when I've tried the same function with this particular archive, it crashes after converting about 30 or 40 html pages to PDF (which is a rather small fraction of the whole). I've tried 2 or 3 times and Adobe Acrobat crashes at (I think) about the same point each time.

This is the only method I've previously used for converting html archives of websites (or live websites) to PDF. I'm not sure how else to do it - particularly given my limited technical skills.

In case it helps, the html archive can be downloaded from this temporary link (with thanks to UFOradio):
mega.nz...#!5JhwARjY!r1eYqFvI-DS7NYTrpoANxz1hG15A8wc9RokXtU0X8HM

Since the above link is being mangled a bit when it is displayed here, I've created a shortened Google link below to the same rar file containing the HTML archive of the Open Minds Forum:
goo.gl...



edit on 8-1-2018 by IsaacKoi because: (no reason given)




posted on Jan, 8 2018 @ 10:20 AM
a reply to: IsaacKoi

Interesting. Can you find or enable logging during the conversion and post it here?
Hopefully that will narrow down the task.



posted on Jan, 8 2018 @ 10:23 AM

originally posted by: Quadlink
Interesting. Can you find or enable logging during the conversion and post it here?
Hopefully that will narrow down the task.


Mmm. Good idea, but I've no idea how to generate such a log from inside (or outside) Adobe Acrobat.

I can see a list of webpages as they are being converted by Adobe Acrobat (in a side panel of Adobe Acrobat) so will try the conversion again and see if I can tell visually from that list what page the problem occurs at - unless some method of outputting a log exists (and can be implemented by someone with my limited technical skills...).
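Acrobat aside, one way to get a log of exactly which page breaks things is to convert the pages one at a time with a script and record each result. A minimal sketch in Python, assuming the archive has been extracted to a local folder; `convert_page` is a hypothetical placeholder for whatever single-page converter is used, not a function from any particular tool.

```python
import logging
from pathlib import Path
from typing import Callable


def convert_all(html_dir: Path, convert_page: Callable[[Path], None],
                log_file: Path) -> list[Path]:
    """Convert each HTML page on its own, logging every success and failure.

    `convert_page` is a hypothetical callable supplied by the caller. Any
    exception it raises is logged and the loop moves on, so one bad page
    does not stop the whole run. Returns the pages that failed.
    """
    logger = logging.getLogger("omf_convert")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_file, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    failed: list[Path] = []
    try:
        # "*.htm*" matches both .htm and .html files, in all subfolders
        for page in sorted(html_dir.rglob("*.htm*")):
            try:
                convert_page(page)
                logger.info("OK %s", page)
            except Exception as exc:
                logger.error("FAILED %s: %s", page, exc)
                failed.append(page)
    finally:
        # Detach and close the file handler so the log is flushed to disk
        logger.removeHandler(handler)
        handler.close()
    return failed
```

In practice, `convert_page` might run a command-line converter through `subprocess.run(..., check=True)`, so that a crash of the converter raises an exception for that one page instead of taking down the whole batch, and the log then names the page it died on.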



posted on Jan, 8 2018 @ 10:34 AM
a reply to: IsaacKoi

Why don't you just host it somewhere?

Or actually, if you don't mind, I can host it.

As for the PDF conversion, you're going to need a program other than Adobe Acrobat to convert something that big.

From what I can see, there are close to 50K pages in that zip file.

WinRAR says it will take 2 hours to unzip it all.
edit on 8-1-2018 by grey580 because: (no reason given)



posted on Jan, 8 2018 @ 10:37 AM
a reply to: IsaacKoi

Sorry, I'm not savvy enough. But I'm bumping the thread so hopefully one of our tech ninjas will see it.



posted on Jan, 8 2018 @ 10:40 AM
On second thought, putting that much data into one PDF is not practical.



posted on Jan, 8 2018 @ 11:22 AM

originally posted by: grey580
Why don't you just host it somewhere?

Or actually, if you don't mind, I can host it.



Cheers. I might get back to you about the hosting of it (which would need the permission of the person/people behind OMF).

I'm still keen to convert the material to a PDF (or set of PDFs) since I've found this format very useful for building up an archive I can search offline all at the same time (which currently includes PDF scans of a few thousand UFO books, PDF scans of a large number of official UFO documents, PDFs of dissertations about UFOs, PDFs of the archives of various UFO email discussion lists, etc.).

Having to search different resources in different formats by different means would be more time consuming, which would mean getting less done.



From what I can see, there are close to 50K pages in that zip file.


Yep, it's pretty big, although I think there is a lot of duplication due to most pages having both an HTML and a plain-text copy (so the number of pages may be halved).
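If the figure of close to 50K pages counts both copies of each page, the real page count is roughly half. A quick count over the extracted archive can confirm this. A minimal sketch, assuming the HTML and plain-text copies of a page share the same name and folder and differ only in extension (`.html`/`.htm` versus `.txt`); the actual archive may name them differently.

```python
from collections import Counter
from pathlib import Path


def summarize_archive(root: Path) -> dict:
    """Count HTML and text pages, and how many page names have both copies."""
    html_keys, txt_keys = set(), set()
    ext_counts = Counter()
    for f in root.rglob("*"):
        if not f.is_file():
            continue
        ext = f.suffix.lower()
        ext_counts[ext] += 1
        # Key on the path minus its extension, so sub/page.html pairs with sub/page.txt
        key = f.with_suffix("").relative_to(root)
        if ext in (".html", ".htm"):
            html_keys.add(key)
        elif ext == ".txt":
            txt_keys.add(key)
    return {
        "html": len(html_keys),
        "txt": len(txt_keys),
        "both": len(html_keys & txt_keys),
        "by_extension": dict(ext_counts),
    }
```

Running `summarize_archive(Path("omf_archive"))` (the folder name is a placeholder) would show whether `"both"` is close to `"html"`, which is what the halving estimate assumes.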



posted on Jan, 8 2018 @ 11:33 AM
a reply to: IsaacKoi

At that point, someone will need to write a program that queues the creation of all those PDF files.

And then combine them later on.

It's a big job.
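The queueing half of that suggestion can be sketched as splitting the page list into fixed-size batches, each converted to its own PDF so that no single conversion has to hold the whole forum in memory. This is a minimal sketch: the batch size and the `omf_part_NNNN.pdf` naming are arbitrary assumptions, and the conversion itself is left out. The numbered parts could later be merged with a PDF library such as pypdf, or simply kept as a set.

```python
from pathlib import Path


def plan_jobs(pages: list[Path], batch_size: int,
              prefix: str = "omf_part") -> list[tuple[str, list[Path]]]:
    """Split pages into consecutive batches, one output PDF name per batch.

    Each job is (output file name, pages in that batch).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ordered = sorted(pages)  # repeatable order, so reruns produce the same parts
    return [
        (f"{prefix}_{n:04d}.pdf", ordered[i:i + batch_size])
        for n, i in enumerate(range(0, len(ordered), batch_size), start=1)
    ]
```

For example, with roughly 25,000 pages (half of the 50K estimate), `plan_jobs(pages, 500)` would produce 50 jobs, and a failed job can be rerun on its own.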



posted on Jan, 8 2018 @ 12:20 PM
You say OMF, I say OMG... because that's definitely a rabbit hole worth digging through, IMO. That's one database I'd personally like to search.



posted on Jan, 8 2018 @ 12:47 PM

originally posted by: grey580
At that point, someone will need to write a program that queues the creation of all those PDF files.

And then combine them later on.

It's a big job.


Eek, when technically qualified people start talking about a "big job", I start getting worried...

But I wonder if there isn't some quick and easy way of getting most of the material converted to PDF, e.g. by doing a sub-forum at a time (i.e. about 20 or 30 PDFs).

So if the problem is the total number of pages, or one page or set of pages within the mass of pages, we may be able to avoid the problem...
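The sub-forum-at-a-time idea can also be prepared outside Acrobat by sorting the pages into one group per sub-forum before converting each group. A minimal sketch, assuming (hypothetically) that each sub-forum's pages sit in their own top-level folder of the extracted archive; if the archive instead encodes the sub-forum in file names, a different `key` function would be needed.

```python
from collections import defaultdict
from pathlib import Path
from typing import Callable


def top_folder(root: Path, page: Path) -> str:
    """Hypothetical rule: the first folder under the archive root is the sub-forum."""
    parts = page.relative_to(root).parts
    return parts[0] if len(parts) > 1 else "(root)"


def group_by_subforum(root: Path,
                      key: Callable[[Path, Path], str] = top_folder) -> dict[str, list[Path]]:
    """Bucket every HTML page under root by sub-forum, one bucket per planned PDF."""
    groups = defaultdict(list)
    for page in sorted(root.rglob("*.htm*")):
        groups[key(root, page)].append(page)
    return dict(groups)
```

Each bucket is then a self-contained job of a few hundred or thousand pages, roughly matching the 20 or 30 PDFs suggested above.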

I've done a few quick experiments in the last hour.

By using the file name for one of the forums and setting the "Create from Webpage" options within Adobe Acrobat to convert 2 levels of links and ticking the "same path" option, I've quickly created a PDF of one of the smallest sub-forums (just in relation to OMF's posting rules). I've uploaded that small test PDF here:
we.tl...



Unfortunately, when I tried the same thing in the last hour with a bigger sub-forum (the New Members sub-forum), 2 levels didn't seem to be enough to capture everything in it. Setting the "Create from Webpage" options within Adobe Acrobat to 3 or more levels (even with the "same path" option ticked) seems to pick up random pages from other sub-forums, i.e. basically everything. That eventually leads to the same crashing problem that made me start this thread (although, by coincidence or otherwise, this method crashed only after creating a larger PDF, over 600 pages long).
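A rough model of link "levels" may explain why going from 2 to 3 levels suddenly pulled in other sub-forums: with about b links per page, d levels can reach on the order of b^d pages. The sketch below is not Acrobat's implementation (its exact rules are not described in this thread); it is a simple breadth-first link walk over the extracted files, assuming the archive's links are relative links between local .html/.htm files.

```python
from collections import deque
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urldefrag

HTML_SUFFIXES = (".html", ".htm")


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def local_links(page: Path) -> list[Path]:
    """Return the local HTML files that `page` links to."""
    parser = LinkParser()
    parser.feed(page.read_text(encoding="utf-8", errors="replace"))
    targets = []
    for href in parser.hrefs:
        href = urldefrag(href).url  # drop any "#anchor" part
        if not href or "://" in href or href.startswith(("mailto:", "javascript:")):
            continue  # only follow links to files inside the archive
        target = (page.parent / unquote(href)).resolve()
        if target.suffix.lower() in HTML_SUFFIXES and target.is_file():
            targets.append(target)
    return targets


def crawl(start: Path, max_depth: int, same_path: bool = True) -> set[Path]:
    """Breadth-first walk of links from `start`, at most `max_depth` levels deep.

    With same_path=True, only pages in start's own folder (or below it) are
    followed; this filters nothing if every page lives in one folder.
    """
    start = start.resolve()
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # don't expand links beyond the level limit
        for target in local_links(page):
            if same_path and not target.is_relative_to(start.parent):
                continue
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen
```

One thing this model suggests: if all of the archive's pages sit in a single folder, a same-folder rule like `same_path` here excludes nothing, so only the depth limit keeps the crawl inside one sub-forum. Whether Acrobat's "same path" option behaves the same way is an open question.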



posted on Jan, 8 2018 @ 02:20 PM
a reply to: IsaacKoi

600 pages. Wow.

Yeah, you need a lot of processing power and memory to make all those PDF pages.

However, here's what you could do.

You could create a link inside one PDF to another PDF, which keeps each PDF smaller.



posted on Jan, 8 2018 @ 02:49 PM
a reply to: IsaacKoi
HTML was designed to handle the cross-linking of data very efficiently.

A hyperlinked document contains many cross-links, and the resources required to resolve them rise exponentially with the number of links. This means that even moderately sized HTML documents can require vast amounts of memory to parse all the links.

Try installing a 32-bit version of Acrobat and increasing your Windows page-file size. This will take much longer to process but hopefully won't drop the ball during conversion.

The memory allocation of 32-bit programs is limited, and therefore they generally accommodate that by using memory more sparingly.

The larger page file means Windows can page out physical memory before it runs out. You know it is working if your hard drive is 'thrashing' to accommodate the extra paging.

Of course, once you have completed the conversion, you can revert to settings that won't slow down general usage.

Once you have created a PDF version of the website, the issue with memory allocation for links remains, and the resulting file may crash some systems for the same reasons it crashed during compilation.

I wonder if zipping the HTML files and their folders down to an archive might be a better way of producing a 'single file' version of the content.
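The zip idea is easy to try with Python's standard `zipfile` module. A minimal sketch (folder and file names are placeholders): it keeps the folder structure inside the archive so that relative links between pages still work once the archive is extracted. It gives a compact single file for storage and sharing, but it doesn't by itself provide the offline full-text searching that the PDFs are wanted for.

```python
import zipfile
from pathlib import Path


def zip_site(site_dir: Path, zip_path: Path) -> int:
    """Pack every file under site_dir into one compressed .zip.

    Paths inside the archive are relative to site_dir, so the folder layout
    (and therefore relative links between pages) is preserved.
    Returns the number of files written.
    """
    if zip_path.resolve().is_relative_to(site_dir.resolve()):
        raise ValueError("write the zip outside site_dir so it can't include itself")
    count = 0
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in sorted(site_dir.rglob("*")):
            if f.is_file():
                zf.write(f, arcname=f.relative_to(site_dir).as_posix())
                count += 1
    return count
```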

You could also look at HTML Help compiler software (such as Microsoft's HTML Help Workshop, which builds .chm files) for producing a single, efficiently cross-linked file readable by most systems.



posted on Jan, 8 2018 @ 05:59 PM
a reply to: IsaacKoi

A very worthy cause!

Adobe Acrobat is for small, single pages (it is like a browser plug-in). I think you would need Acrobat (full version) or maybe even Acrobat Pro to convert an entire website.

There is probably some local geeky/nerdy group like a LUG (Linux User Group) that could help for free! First, just to see if Linux can do it (and if they have the know-how to do so!), and then add on the UFO stuff... nerd heaven!

Good luck!




posted on Jan, 23 2018 @ 07:13 PM
a reply to: IsaacKoi

Sorry for the late reply; things have been a little rough. Hope you sorted the problem.



posted on May, 14 2018 @ 11:21 AM
Bumping for updates and to add my suggestion.

I would suggest trying to make it a kind of open-source project, more so than just with us here on ATS UFO Radio. Maybe post a request for help on Reddit? I would love to help, but I am not too computer-science inclined. I barely know a little bit of Java and HTML.

Anyways, good luck, IK.



