The 154 GB NARA Blue Book Archive


posted on Sep, 17 2014 @ 02:00 AM
Nice one!

I'm doing the BitTorrent Sync now. It will take a while, but I have an uncapped connection, so I can leave it downloading and then seeding.

I'm happy to add myself and my machine to any OCR / manual transcription effort; let me know what you want me to do and which portion of the files to work on.

Top job man.



posted on Sep, 18 2014 @ 12:45 AM
Quick update:

After running Dupscout there are now 129,545 pages (694 fewer than the original 130,239). Fold3 claims there are 129,658 pages total, so there are still potentially at least 113 missing pages.

Here is a new directory list and an update of the SHA1 of all of the files (2014.09.17_NARA-Fold3 Archive Listing.7z).
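For anyone who wants to check their synced copy against a listing like this, here is a minimal Python sketch that walks a folder, records the SHA-1 of every file, and compares two such manifests. It assumes the published listing can be parsed into the same {relative path: digest} form; build_manifest and diff_manifests are hypothetical helper names, not part of footnotereap or Dupscout.

```python
import hashlib
import os


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root, '/'-separated) to its SHA-1."""
    manifest = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit sub-folders in a deterministic order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            manifest[rel] = sha1_of_file(full)
    return manifest


def diff_manifests(old, new):
    """Return (missing, changed, added) path lists between two manifests."""
    missing = sorted(set(old) - set(new))
    added = sorted(set(new) - set(old))
    changed = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return missing, changed, added
```

Running build_manifest on your footnote.com folder and diffing the result against the published list would show which files are missing, changed, or extra on your end.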



posted on Sep, 18 2014 @ 02:32 AM
Time for phase 2:

Right now it would be a huge help if people start looking for errors and missing files.

Once you find something that needs to be fixed (it could be as simple as a misnamed file), leave a comment in the thread describing the problem, send Isaac or me a private message with your BT Sync device name, and click the following link:

link.getsync.com...

Once we approve your device, this will give you direct access to the read-write folder called "Community Updates and Fixes." In this folder, create a copy of the directory where you found the error.

For example:

1957.05 - 6789913 - Berlin, Germany

Then, inside that folder, paste a description of the problem and, if you have found a solution, drag and drop the fix into the directory.

To give an example of what you might do: there is a weird quirk with case 1957.05 - 6789913 - Berlin, Germany, where Page 1 won't load. To make it easy to check in the future, in case NARA ever adds the file, I included a link to the Fold3 website showing where the file should be:

Page 1 seems to be missing.url
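A .url file like this is just a Windows Internet Shortcut, which is a small INI-style text file with an [InternetShortcut] section and a URL= line. Below is a minimal Python sketch for generating one, assuming you already have the Fold3 page address. write_missing_page_note is a hypothetical helper, and any URL passed to it here is a placeholder.

```python
import os


def write_missing_page_note(case_dir, page, url):
    """Drop a Windows Internet Shortcut (.url) into case_dir flagging a missing page."""
    path = os.path.join(case_dir, f"Page {page} seems to be missing.url")
    # newline="\r\n" makes every "\n" written below come out as a Windows line ending
    with open(path, "w", newline="\r\n") as handle:
        handle.write("[InternetShortcut]\n")
        handle.write(f"URL={url}\n")
    return path
```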

Nothing too complicated.


If people start getting crazy adding things that aren't project-related I may decide to remove the "Community Updates and Fixes" folder altogether. For the time being though we'll just monitor things to make sure everything is on the up and up.

For those of you who want to provide transcriptions: that would be hugely useful as well, so by all means, jump in! Please just say so in your post so people don't duplicate each other's work.



posted on Sep, 18 2014 @ 02:52 AM
a reply to: Xtraeme

Have you thought of asking them for a list of the documents?



posted on Sep, 18 2014 @ 02:55 AM
a reply to: ArMaP

Hey Armap,

I tried a long time ago, but I doubt they realize there are files that are actually missing on their end.

Example: www.fold3.com...

Also, contacting them won't help us find what footnotereap skipped in the current collection because of its error. Even if we did get cooperation, the diff process would be pretty ugly. =)



posted on Sep, 18 2014 @ 07:55 AM
a reply to: Xtraeme

Hi Xtraeme,

I'm fairly sure few people realise how much can be done with the material you've shared (particularly if the work is distributed a bit).

Perhaps a little demonstration and a few homework exercises are in order (whether here or in a new thread, so that the homework exercises are at the start of a thread and hence will be clearer to more people) ???

Mmm. I can think of quite a list of useful projects...

...

In fact, using the material you've shared, I've just completed the first few tasks that occurred to me which may serve as a demonstration of the possibilities.



Examples of possibilities include the quick index I've made at the link below indicating the number of pages in each Project Blue Book file (something I don't think anyone has been able to list before today...).

docs.google.com...






posted on Sep, 18 2014 @ 09:49 AM

originally posted by: Xtraeme
After running Dupscout there are now 129,545 pages (694 fewer than the original 130,239). Fold3 claims there are 129,658 pages total, so there are still potentially at least 113 missing pages.


Thanks Xtraeme.


Is it possible for you (or, ideally, someone else, since I'd like to spread the work around a bit) to:

(1) generate a list from the Fold3 website of the relevant cases and the number of document images for each of those cases?

(2) automatically compare that number of document images with the number in the column in the spreadsheet below listing the number of pages already downloaded for each case, so that we can identify the cases with missing pages?
Google Docs spreadsheet generated using Xtraeme's work


If not, is there some way to extract from the .txt file at your link below
(1) the highest page number per case (since each case has a list of files ending "page1.jpg", "page2.jpg" etc.) and
(2) automatically compare THAT number with the number in the column in the spreadsheet above listing the number of pages already downloaded for each case, so that we can identify the cases with missing pages?
(2014.09.17_NARA-Fold3 Archive Listing.7z)
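One way to do the second option, sketched in Python under the assumption that each line of the listing .txt holds a path ending in "<case folder>\Page N.jpg" (the real listing format may differ, so the regular expression would need adjusting). Rather than comparing against the spreadsheet column, it counts the pages held directly from the listing and flags any case whose highest page number is larger than the number of pages held. pages_by_case and suspicious_cases are hypothetical names. Note that this only catches gaps inside a case, not pages missing after the last one downloaded.

```python
import re
from collections import defaultdict

# Assumed line format: "...\<case folder>\Page <n>.jpg" (either slash works)
PAGE_RE = re.compile(r"^(?:.*[\\/])?([^\\/]+)[\\/]page\s*(\d+)\.jpg\s*$", re.IGNORECASE)


def pages_by_case(lines):
    """Collect the set of page numbers seen for each case folder."""
    cases = defaultdict(set)
    for line in lines:
        match = PAGE_RE.match(line.strip())
        if match:
            cases[match.group(1)].add(int(match.group(2)))
    return cases


def suspicious_cases(lines):
    """Yield (case, highest_page, pages_held, missing_numbers) for cases with gaps."""
    for case, pages in sorted(pages_by_case(lines).items()):
        highest = max(pages)
        if highest != len(pages):
            missing = sorted(set(range(1, highest + 1)) - pages)
            yield case, highest, len(pages), missing
```

Feeding it the lines of the listing file (for example `open(path, encoding="utf-8")`) gives a short list of cases worth checking by hand.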



Or is there some other way to work out what pages are missing?


We should then be able to:
(1) Fill in the gaps in the collection;
(2) Improve the above index;
(3) Start some statistical work, e.g. the number of cases per year, the number of cases per month, the total number of pages of PBB records per month/year, average number of pages of PBB per month/year etc etc.
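For point (3), the folder names already carry the year and month ("1957.05 - 6789913 - Berlin, Germany"), so the basic tallies fall out of a small Python sketch like the one below. The input mapping of folder name to pages held is assumed to come from the spreadsheet or listing; tally is a hypothetical helper.

```python
import re
from collections import Counter

# Folder names follow "YYYY.MM - <Fold3 id> - <location>"
CASE_RE = re.compile(r"^(\d{4})\.(\d{2}) - (\d+) - (.+)$")


def tally(case_pages):
    """case_pages maps folder name -> pages held; returns three Counters."""
    cases_per_year, pages_per_year, cases_per_month = Counter(), Counter(), Counter()
    for name, pages in case_pages.items():
        match = CASE_RE.match(name)
        if not match:
            continue  # skip helper folders such as "_Reports and Changes"
        year, month = int(match.group(1)), int(match.group(2))
        cases_per_year[year] += 1
        pages_per_year[year] += pages
        cases_per_month[(year, month)] += 1
    return cases_per_year, pages_per_year, cases_per_month
```

The average number of pages per case for a year y is then pages_per_year[y] / cases_per_year[y].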





posted on Sep, 18 2014 @ 11:28 AM

originally posted by: IsaacKoi

originally posted by: Xtraeme
After running Dupscout there are now 129,545 pages (694 fewer than the original 130,239). Fold3 claims there are 129,658 pages total, so there are still potentially at least 113 missing pages.


Thanks Xtraeme.


Is it possible for you (or, ideally, someone else, since I'd like to spread the work around a bit) to:

(1) generate a list from the Fold3 website of the relevant cases and the number of document images for each of those cases?


It is already done.
All you have to do is reconnect the BT Sync group, wait for everything to synchronize, go into "_Reports and Changes" \ "2014.09.17 - Removing all duplicate files using Dupscout", and open duplicates.csv.



posted on Sep, 18 2014 @ 11:56 AM
Personally, I don't think it's going to be that conclusive about UFOs in general. Sure, it might help prove the existence of the UFO phenomenon to the general public, but I think it would just lead to many more questions being left in the air.

Like who, what, where, when, and a big Y with a capital W. And let's not forget how, either.

Honestly, I think it's great that a whole bunch of files is being made accessible. I mean, 154 GB of paper is a lot of information.



posted on Sep, 18 2014 @ 03:04 PM

originally posted by: Specimen
Personally, I don't think it's going to be that conclusive about UFOs in general.

No, it will never merely be a matter of collecting massive amounts of data, and then suddenly TAH-DAH, we have the "answers." We know from previous data analyses that there are patterns to UFO encounters. But they have never been usable for accurately predicting when or where or under what conditions a UFO is bound to show up. But, hey, I guess there's always a chance. Maybe somebody will figure out that a flying saucer appears every 16.42 years over Billings, Montana, and be waiting there when it comes back to shoot it down and find out where it's from. Good luck figuring anything like that out, though.



posted on Sep, 19 2014 @ 03:27 AM

originally posted by: Xtraeme

originally posted by: IsaacKoi
Is it possible for you (or, ideally, someone else, since I'd like to spread the work around a bit) to:

(1) generate a list from the Fold3 website of the relevant cases and the number of document images for each of those cases?


It is already done.
All you have to do is reconnect the BT Sync group, wait for everything to synchronize, go into "_Reports and Changes" "2014.09.17 - Removing all duplicate files using Dupscout", and open duplicates.csv.


That would be great.

I'm still Syncing, so could you possibly upload just the "duplicates.csv" file via WeTransfer (or email it to me) so that I can do some work on these issues while the Sync continues?

If that file contains a list, from the Fold3 website, of the relevant cases and the number of document images for each of those cases, then I might be able to automatically compare it to the Excel spreadsheet I posted above listing the downloaded cases/pages, so that the missing pages can be identified and added to the download.









posted on Sep, 19 2014 @ 05:36 PM
2014/09/19 Update:

Wrote a quick script to clean up any misnamed files and locate missing content.

code.google.com...

At this point the only files that should be missing are files that were either (a) overwritten by mistake and not duplicated elsewhere, (b) added to Fold3 since the initial capture, or (c) skipped at the end of the case list (e.g. a directory with 3 pages might have a 4th missing page).

Approximately 190 files were added or modified and 230 were removed or renamed, with some overlap between the two. This brings our grand total to:

129,491 pages total (54 fewer than the previous 129,545). Fold3 claims there are 129,658 in total, so there are potentially 167 missing pages (www.fold3.com...).

The case count is still the same.

Here is a new directory list and an update of the SHA1 of all of the files (2014.09.19_NARA-Fold3 Archive Listing.7z | Proof of Timestamp | TX Hash).

This will be the final update for a while.

Signing off,
-Xt



posted on Sep, 20 2014 @ 12:05 AM

originally posted by: Xtraeme

Several years ago in 2011 Isaac issued a challenge to the ATS community to see if anyone could come up with a way to download the entire National Archive Blue Book library.

I realized at the time there was a simple solution. So I threw in my hat and decided to give it a try.

The bulk of the coding was done in my spare time in a month after Isaac posted the initial thread.

code.google.com...

Thanks to Isaac's willingness to volunteer to test the program, I was able to track down most of the bugs pretty quickly.


Some fantastic work there guys - hats off to you both.


(Liking the fact that it's free as well)

Hopefully, with more access to the facts surrounding specific cases, Dr Jim McDonald's comments below can be shown to be correct.


"Much more disturbing are the indications from my limited review of BB cases that there may be as many as possibly 4,000 Unexplained UFO cases miscategorized as IFO's in the BB files. McDonald similarly stated in 1968 at his CASI lecture that from his review of BB cases he estimated that 30-40% of 12,000 cases were Unexplained, or about 3,600 to 4,800. These are mostly military cases and many involve radar".

Link



Also thought this audio report, which presents official Project Blue Book interviews and UFO eyewitness testimony taken from the original USAF source tapes (1951-1969), deserved a place on your thread.





Part 2 / 3


Cheers!



posted on Sep, 20 2014 @ 12:20 AM
a reply to: IsaacKoi

Isaac, in the UFO Update Archive Brad Sparks once mentioned there was 'approximately 225,000 pages of USAF UFO documents languishing in the National Archives that had never been yet fully studied by UFO researchers since their public release in 1976' - have you ever heard any more info about these docos mate?

Have to admit I've been very lazy and not contacted the National Archives or Brad Sparks and just thought I'd ask.



After all the FOIA (Freedom of Information Act) ordeals and even lawsuits to pry loose government UFO documents, and uncover the facts about USAF, CIA and British MoD investigations of UFOs, how can one seriously contend there is no "securitization" of the UFO subject by governments?

Many documents are now being disclosed for the first time only after a half century of unwarranted secrecy. There are approximately 225,000 pages of USAF UFO documents languishing in the National Archives and never yet fully studied by researchers since their public release in 1976.

I have reviewed 100,000's of pages of AF and CIA documents on UFOs and other subjects, and interviewed about 100 CIA Directors, Deputy Directors, and officials in AFIN, NSA, DIA, and other agencies, down to intelligence analyst level. Both the AF and the CIA have "actually looked for UFOs", contrary to Wendt and Duvall, and they "found" them.


link


Cheers.



posted on Sep, 20 2014 @ 01:23 PM
a reply to: Blue Shift

True, but like I said, it will only help serve to prove the phenomenon's existence to actual or would-be researchers.

And if it did prove its existence, you're going to see a lot more arguments about its very nature, and a long list of guesses about the origins of UFOs. Let alone how they work, how they perform feats that would astound modern science, and who or what made them.

From my experience: spring time, and when you least expect it. Also, it helps if you look up a lot.




posted on Sep, 20 2014 @ 02:02 PM

originally posted by: karl 12
Isaac, in the UFO Update Archive Brad Sparks once mentioned there was 'approximately 225,000 pages of USAF UFO documents languishing in the National Archives that had never been yet fully studied by UFO researchers since their public release in 1976' - have you ever heard any more info about these docos mate?

Have to admit I've been very lazy and not contacted the National Archives or Brad Sparks and just thought I'd ask.


I have some contact with Brad Sparks so I'll ask him for a breakdown of his estimate when I next email him.



posted on Sep, 20 2014 @ 08:50 PM
Okay I lied. More updates.


Here is a handy little tool I just wrote to make reading the case files a little more pleasant:

code.google.com...
(note: it is also in the root of the sync folder)

The problem, basically, is that all of the Fold3 documents adhere to the following naming convention:

Page 1, Page 2, ... Page 10, ..., Page 20, etc.

This royally blows, because the filenames don't have padded zeros. When you navigate through the files using a normal image viewer like IrfanView, it shows Page 1 first (as it should), Page 10 second (which it shouldn't), Page 11 third, then, after seven more pages, Page 19, Page 2 (about bloody time), and then Page 20.
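The ordering problem is plain string sorting: compared character by character, "Page 1.jpg" < "Page 10.jpg" < ... < "Page 19.jpg" < "Page 2.jpg", because "." sorts before "0". A short Python illustration of that, and of a "natural sort" key that compares the digit runs as numbers. This is only a demonstration of the effect, not how IrfanView itself is implemented.

```python
import re


def natural_key(filename):
    """Split 'Page 10.jpg' into ['page ', 10, '.jpg'] so numbers compare as numbers."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", filename)]


pages = [f"Page {n}.jpg" for n in range(1, 21)]
print(sorted(pages)[:4])                   # ['Page 1.jpg', 'Page 10.jpg', 'Page 11.jpg', 'Page 12.jpg']
print(sorted(pages, key=natural_key)[:4])  # ['Page 1.jpg', 'Page 2.jpg', 'Page 3.jpg', 'Page 4.jpg']
```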

Ideally we would like to read the pages in their intended order (i.e. Page 1 followed, as we would expect, by Page 2).

A naive solution is to just rename the files (i.e. Page 01 and Page 02). However, this breaks footnotereap and would ultimately require everyone to redownload and resync all of the files in BitTorrent Sync.

Not exactly a great solution.

After a little tinkering the best hack that I could conjure up was to create a symbolic link structure with the padded filenames all placed in an alternate, but identical directory tree parallel to the actual documents (by default the program uses the current directory name plus "- browse").

So, in practice, if the base directory is "footnote.com" the sym-linked folder will be named "footnote.com - browse."

This worked smashingly well and consumes practically zero disk space. So I consider it a win. =)
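For the curious, here is a rough Python sketch of the same idea. It is not the actual create_sym_links_with_padded_zeros.py, just an illustration assuming page files are named "Page N.<ext>": it mirrors the tree into "<base> - browse" and links each page under a name padded to the width of the highest page number in its folder. build_browse_tree and padded_name are hypothetical names. On Windows, os.symlink typically needs administrator rights or Developer Mode.

```python
import os
import re

PAGE_RE = re.compile(r"^(Page )(\d+)(\.\w+)$", re.IGNORECASE)


def padded_name(name, width):
    """'Page 7.jpg' -> 'Page 007.jpg' for width 3; other names pass through unchanged."""
    match = PAGE_RE.match(name)
    if not match:
        return name
    prefix, number, ext = match.groups()
    return f"{prefix}{int(number):0{width}d}{ext}"


def build_browse_tree(root):
    """Mirror root into '<root> - browse' using symlinks with zero-padded page names."""
    root = os.path.abspath(root)
    browse_root = root + " - browse"
    for dirpath, _dirnames, filenames in os.walk(root):
        target_dir = os.path.normpath(
            os.path.join(browse_root, os.path.relpath(dirpath, root)))
        os.makedirs(target_dir, exist_ok=True)
        numbers = [int(m.group(2)) for m in map(PAGE_RE.match, filenames) if m]
        width = len(str(max(numbers))) if numbers else 1
        for name in filenames:
            link = os.path.join(target_dir, padded_name(name, width))
            if not os.path.lexists(link):  # re-running only adds links that are new
                os.symlink(os.path.join(dirpath, name), link)
    return browse_root
```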

To initialize the program, provide the path to your Fold3 working directory or to the specific case that you want to see symlinked and padded. For example,

create_sym_links_with_padded_zeros.py [fold3 base directory]

The script only has to be run once to generate the symbolic link structure. After that it is smooth sailing.

Platform Notes:
CSLPZ should work on Windows, Mac, and Linux. However, I have only tested the script on Windows, so if something isn't working, check line 160 and debug the "ln -s" section.

TL;DR: If you want to read the pages in the correct order, open the footnote.com folder and double-click create_sym_links_with_padded_zeros.py.



posted on Sep, 21 2014 @ 04:31 AM

originally posted by: Xtraeme
At this point the only files that should be missing are files that were either (a) overwritten by mistake and not duplicated elsewhere, (b) added to Fold3 since the initial capture, or (c) skipped at the end of the case list (e.g. a directory with 3 pages might have a 4th missing page).


Before turning to Phase 2 regarding the next steps (conversion to PDFs, OCR, image enhancement, file size reduction, cross-referencing with various other sources/collections/databases, statistical analysis and other fun and games...), I'm keen to do any further practicable check for missing pages.

I presume Point (a) above isn't likely to be much of a problem and frankly I doubt (b) is likely to be an issue either (since I doubt Fold3 viewed uploading these files as an ongoing project; it was probably completed, as far as they were concerned, ages ago). So, I'd guess (and frankly accept it's a guess) that (c) is the issue to focus on, i.e. files that were skipped at the end of the case list (e.g. a directory with 3 pages might have a 4th missing page).

So, does anyone have any bright ideas for generating a list of the number of pages per case now held on the Fold3 website to compare against the list of downloaded page numbers per file already in Xtraeme's Sync file (for which I've already generated a list of relevant page numbers)?

(I'm a little concerned that Page 4 of this thread isn't really the place to be raising tasks like this, and quite a few others that occur to me, if the workload is to be spread around rather than dumped on the rather busy Xtraeme, but hey, I'll give it a try before resorting to a "Disclosure challenge for ATS : The next phase" thread...)

Anyway, the Fold3 website presumably has, or can be used to generate, a list of the number of pages per case held on that website. The number of pages per case appears in at least two places when browsing each file (highlighted in the screenshot below), so my question is really whether there is a way (ideally a quick and easy one...) of obtaining that number for each case from the website, other than VISUALLY browsing through over 10,000 folders.

www.fold3.com...|haouYr9guvaVpcnEa4vDUieED1z4SleB4



I'm hoping one of the more technical chaps (or ladies) here can write a script or some other technical magic to generate a relevant list (ideally in a format that can be imported into Excel to contrast against the spreadsheet I posted above listing the case name, starting Fold3 image number and the number of pages per case included in Xtraeme's Sync file)...
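Whatever method ends up being used to collect the Fold3 counts, the comparison step itself is simple. A minimal Python sketch, assuming both sides have already been reduced to {case folder name: page count} dictionaries (how the Fold3 numbers are obtained is not shown here); compare_counts and to_csv are hypothetical names. The output is CSV text that Excel can open directly.

```python
import csv
import io


def compare_counts(fold3_counts, local_counts):
    """Return (case, fold3, local, difference) rows for every case whose counts disagree."""
    rows = []
    for case in sorted(set(fold3_counts) | set(local_counts)):
        expected = fold3_counts.get(case, 0)
        held = local_counts.get(case, 0)
        if expected != held:
            rows.append((case, expected, held, expected - held))
    return rows


def to_csv(rows):
    """Render the rows as CSV text; names containing commas are quoted automatically."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["case", "fold3_pages", "local_pages", "difference"])
    writer.writerows(rows)
    return buffer.getvalue()
```

A positive difference means Fold3 lists more pages than the Sync folder holds for that case; a negative one points to extra or misplaced files locally.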







posted on Sep, 21 2014 @ 05:12 AM

originally posted by: Xtraeme
Okay I lied. More updates.



I think most of us will forgive you.


Seriously, I realise that you are busy and probably now want to get on with other projects, but if you could stay active on this project a little while longer that would be fantastic. You now probably know more about the mechanics of the Fold3 website than any non-employee!

I've been working on image enhancement and PDF creation in relation to the Fold3 collection so once the missing pages are addressed (so far as practicable...) then I can focus on those issues - and bother other people in relation to the Phase 2 issues.


Personally, I think Phase 1 was a pretty damn significant contribution by you to ufology (particularly since the Fold3 website could fold, no pun intended, at any point, and until your work was done the entire collection of high-resolution images could have been lost), but Phase 2 may result in more bite-sized output that can be appreciated by a wider audience, such as searchable PDFs of the case files on various classic cases plus key reports (not to mention statistical analysis and data mining).




posted on Sep, 21 2014 @ 06:00 AM
Howdy Isaac,


originally posted by: IsaacKoi
Before turning to Phase 2 regarding the next steps (conversion to PDFs, OCR, image enhancement, file size reduction, cross-referencing with various other sources/collections/databases, statistical analysis and other fun and games...), I'm keen to do any further practicable check for missing pages.

I know people are going to copy the files all over the place. The last thing that I want to see is incorrect information scattered all over the internet. So we are definitely on the same page here.

The last three updates I released should address the great bulk of the errors. Hopefully people understand that they need to stay attached to BitTorrent Sync for the time being because the data isn't 100% perfect just yet.

We are at about 99.9% (actually a little better, 99.98%, if you only count the missing files).

That said, here is a list of some of the problem cases:

1949.06 - 6313679 - Los Alamos, New Mexico (#382) / info.txt
1952.04 - 6314058 - Topeka, Kansas / this folder had largely been overwritten by case 6312827 and 6313455.txt
1952.08 - 6994754 - LAKE CHARLES, LA / Page 15 is missing.url
1952.08 - 6994754 - LAKE CHARLES, LA: PAGE 15 IS MISSING!
1954.11 - 6958118 - BURLISON, TEXAS / It looks like 6960345 wrote into 6958118 -- overwriting Page 4 and adding Page 6 to Page 15 (which werent necessary).txt
1954.11 - 6960345 - CANTERBURY, NEW ZEALAND / It looks like 6960345 wrote into 6958118 -- overwriting Page 4 and adding Page 6 to Page 15 (which werent necessary).txt
1957.05 - 6789913 - Berlin, Germany / Page 1 seems to be missing.url
1957.05 - 6789913 - Berlin, Germany: PAGE 1 IS MISSING!
1957.11 - 6779635 - Great Neck, New York: PAGE 2 IS MISSING!
1957.11 - 7230586 - Great Neck, New York / _Extra case material in 6779635.txt
1960.01 - 6968016 - Pacific (30-30N 139-05W) / Extras - Not Duplicates (odd): PAGE 1 IS MISSING!
1960.01 - 6968016 - Pacific (30-30N 139-05W) / Extras - Not Duplicates (odd): PAGE 9 IS MISSING!
1960.02 - 6969663 - Rockledge, Florida / errors - all three overwritten by data from 6970327.txt
1966.03 - 8675242 - Sheboygan, Wisconsin / Page 16 is missing.url
1966.03 - 8675242 - Sheboygan, Wisconsin: PAGE 16 IS MISSING!



I presume Point (a) above isn't likely to be much of a problem and frankly I doubt (b) is likely to be an issue either (since I doubt Fold3 viewed uploading these files as an ongoing project; it was probably completed, as far as they were concerned, ages ago). So, I'd guess (and frankly accept it's a guess) that (c) is the issue to focus on, i.e. files that were skipped at the end of the case list (e.g. a directory with 3 pages might have a 4th missing page).

  1. "(a) overwritten by mistake and not duplicated elsewhere,"

    The only files we should have to worry about are the ones from the first month or two when I first started to run the program back in 2011.

    Footnotereap became much more reliable after that. At most there are probably just one or two files affected, if that (it could be nothing). I am probably being overly cautious. Still, it is something I would like to check more thoroughly (I did a partial check on the second pass back in 2012/2013).

  2. "(b) files added to Fold3 since the initial capture"

    There were a couple added (or removed) I believe. Take a peek at:

    1960.01 - 6968016 - Pacific (30-30N 139-05W) / Extras - Not Duplicates (odd)

    Also there are at least three missing pages that NARA-Fold3 never properly uploaded:

    1952.08 - 6994754 - LAKE CHARLES, LA / Page 15 is missing.url
    1957.05 - 6789913 - Berlin, Germany / Page 1 seems to be missing.url
    1966.03 - 8675242 - Sheboygan, Wisconsin / Page 16 is missing.url

  3. "(c) files that were skipped at the end of the case list"

    This is the one I am working on right now. I can use Scrapy to get the information, but that will probably happen sometime over the next week or two (I'm really busy with another, more pressing project at the moment).


(I'm a little concerned that Page 4 of this thread isn't really the place to be raising tasks like this, and quite a few others that occur to me, if the workload is to be spread around rather than dumped on the rather busy Xtraeme, but hey, I'll give it a try before resorting to a "Disclosure challenge for ATS : The next phase" thread...)


I think you are probably right. =) It might be worthwhile to outline some of the challenges we have going forward in a new discussion.



