Excel help: Removing duplicate rows??, page 1

posted on Aug, 8 2011 @ 04:13 PM

link

Some of the more frequent users of the Aliens & UFOs forum may have seen my recent thread, "Canadian disclosure: “UFO Found” and other documents/photos". An issue (hopefully small) has arisen in relation to Excel spreadsheet.

One of our members, idealord, is making good progress in generating an index to the documents using an automated process. One problem is that this method results in quite a few duplicate rows in the spreadsheet.

The current version of the spreadsheet is at:
www.abovetopsecret.com...

Originally posted by idealord
Well here's what I got:

parnasse.com...
...
Jeff

Obviously, it would be good to eliminate duplicate rows (i.e. rows which have the name content in Column G).

If we sort by the content of Column G, you can see all the duplicates.

Does anyone have an easy way of eliminating any row after the first one which has the same value in Column G?

The current spreadsheet covers just the first part of the Canadian files, but the larger spreadsheet will be generated in the same way and thus have the same duplication problem.

Any suggestions would be most welcome.

All the best,

Isaac

H1ght3chHippie

posted on Aug, 8 2011 @ 04:22 PM

link

In MySQL a simple "SELECT DISTINCT(VALUE) FROM TABLE" would suffice.

But since Excel is a pain in the butt, you might want to have a look HERE

Edit: After further reading about your endeavour, some sort of custom web-interface pops to my mind .. I wonder whether the documents themselves are machine readable .. full text index on all documents, browsable by time and tags .. and of course with a MySQL DB backend ^^

edit on 8-8-2011 by H1ght3chHippie because: (no reason given)

davidchin

posted on Aug, 8 2011 @ 04:25 PM

link

I usually add a new column next to the one that I'm using as the "index" column.

In the new column, I create an equation in which I subtract the value in the index column of the same row from the value in the index column in the row just below. So, suppose in cell A1, I write in the equation: +B2-B1 and click on the checkmark to create the equation. I then put the mouse over the lower right corner and, holding down the left button, pull the arrow downwards, effectively copying the equation into all the cells below the first one.

This allows me to scroll down a little faster and look for zeros. To make it even faster, I create yet another column that multiplies the value in the newly created column with the cell just below it, so everything below the zero will also become zero in the new column, making it easier to spot. When the row is deleted, you just have to fix the equation in the row above so the equations continue to help spot the duplicates.

Of course this works best when the identifying index values are numerical.

IsaacKoi

posted on Aug, 8 2011 @ 04:43 PM

link

Thanks for your comments guys.

MySQL is outside my comfort zone (I'm a lawyer, used to Microsoft Word but not much beyond it).

I think I can adapt davidchin's suggestion quite quickly and easily.

Originally posted by davidchin
In the new column, I create an equation in which I subtract the value in the index column of the same row from the value in the index column in the row just below. So, suppose in cell A1, I write in the equation: +B2-B1 and click on the checkmark to create the equation. I then put the mouse over the lower right corner and, holding down the left button, pull the arrow downwards, effectively copying the equation into all the cells below the first one.

This allows me to scroll down a little faster and look for zeros.

I see.

I can easily remove the non-numerical part of the index column (Column G), sort that column by the index number, add the column you are suggesting and add a relevant the formula.

Once I've got a lot of rows with zeros in the new column, I can sort by number and delete all the rows with zeros in it.

Possibly not the most technically elegant solution, but I think this process would only take a couple of minutes so I'm very happy with this option.

Thanks to you both for your suggestions.

All the best,

Isaac

harlot7

posted on Aug, 9 2011 @ 12:40 AM

link

Issac - I am an Excel Expert and I am also half asleep I will touch base with you tomorrow but for now try to run an Advanced Filter which is located in the Data tab up top. You will need to highlight all of the date and then you should see an option to remove duplicates. Pretty simple. These methods, of course, depend on which version of Office you are running 7, 10? I sooo wanted to help on the Alien Disclosure files too! I have an IT friend at work who does coding and lets just say other stuff. I already asked him to help me out with it when he had time.

Gotta rest - let me know if that helps and I'll PM you tomorrow.

IsaacKoi

posted on Aug, 10 2011 @ 10:12 AM

link

Just to report back, although I understood and could apply davidchin's helpful suggestion above in a couple of minutes, I thought I'd have a quick look at the "Data" tab mentioned above (which I don't think I'd ever actually used before in Excel...) and was pleased to find a "Remove Duplicates" option just sitting there in Excel 2007.

After that, it took just a few seconds to select the column being used as the index column (Column G) and produce the desired shortened list.

Thanks once again for your help. Your comments were very helpful.

All the best,

Isaac