Prediction is intuitively related to understanding. If you understand a sequence, then you can predict it. If you understand a language, then you could predict what word might appear next in a paragraph in that language. If you understand an image, then you could predict what might lie under portions of the image that are covered up. Conversely, random data has no meaning and is not predictable. This intuition suggests that prediction or compression could be used to test or measure understanding.
Data Compression Explained by Matt Mahoney
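The prediction-to-compression link can be made concrete: an ideal arithmetic coder spends about -log2(p) bits on a symbol that its model assigns probability p, so a better predictor directly yields a smaller file. A minimal sketch in Python (my own illustration, not code from the article) that scores text under a simple adaptive previous-character model:

# Measure how many bits a text would cost under an adaptive order-1 model.
# An arithmetic coder could turn these probabilities into an actual bitstream,
# so lower bits-per-character means better prediction and better compression.
import math
from collections import defaultdict

def bits_under_bigram_model(text):
    counts = defaultdict(lambda: defaultdict(int))  # counts[prev_char][next_char]
    total_bits = 0.0
    prev = ""
    for ch in text:
        ctx = counts[prev]
        seen = sum(ctx.values())
        p = (ctx[ch] + 1) / (seen + 256)   # Laplace smoothing over 256 byte values
        total_bits += -math.log2(p)        # an ideal coder spends -log2(p) bits
        ctx[ch] += 1                       # learn as we go (adaptive model)
        prev = ch
    return total_bits

sample = "the cat sat on the mat. " * 40
print(bits_under_bigram_model(sample) / len(sample), "bits per character")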
Being able to compress well is closely related to intelligence as explained below. While intelligence is a slippery concept, file sizes are hard numbers. Wikipedia is an extensive snapshot of Human Knowledge. If you can compress the first 100MB of Wikipedia better than your predecessors, your (de)compressor likely has to be smart(er). The intention of this prize is to encourage development of intelligent compressors/programs as a path to AGI.
This compression contest is motivated by the fact that being able to compress well is closely related to acting intelligently, thus reducing the slippery concept of intelligence to hard file size numbers. In order to compress data, one has to find regularities in them, which is intrinsically difficult (many researchers live from analyzing data and finding compact models). So compressors beating the current "dumb" compressors need to be smart(er). Since the prize wants to stimulate developing "universally" smart compressors, we need a "universal" corpus of data. Arguably the online encyclopedia Wikipedia is a good snapshot of the Human World Knowledge. So the ultimate compressor of it should "understand" all human knowledge, i.e. be really smart. enwik8 is a hopefully representative 100MB extract from Wikipedia.
50'000€ Prize for Compressing Human Knowledge
The Large Text Compression Benchmark and the Hutter Prize are designed to encourage research in natural language processing (NLP). I argue that compressing, or equivalently, modeling natural language text is "AI-hard". Solving the compression problem is equivalent to solving hard NLP problems such as speech recognition, optical character recognition (OCR), and language translation. I argue that ideal text compression, if it were possible, would be equivalent to passing the Turing test for artificial intelligence (AI), proposed in 1950 [1]. Currently, no machine can pass this test [2]. Also in 1950, Claude Shannon estimated the entropy (compression limit) of written English to be about 1 bit per character [3]. To date, no compression program has achieved this level.
Rationale for a Large Text Compression Benchmark
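Shannon's estimate is easy to test against off-the-shelf compressors. A rough sketch (my own, using Python's standard-library codecs; the input file name is a placeholder) that reports bits per character:

# Compare general-purpose compressors against Shannon's ~1 bit/char estimate.
import bz2, lzma, zlib

def bits_per_char(data, compress):
    return 8 * len(compress(data)) / len(data)

text = open("sample_english.txt", "rb").read()   # placeholder: any plain English text
for name, fn in [("zlib", zlib.compress), ("bz2", bz2.compress), ("lzma", lzma.compress)]:
    print(name, round(bits_per_char(text, fn), 2), "bits per character")
# Standard compressors typically land well above 1 bit/char on plain English text,
# which is the gap the benchmark is meant to close.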
There are many different kinds of "intelligence."
If they are talking about zero-error compression, then they are even more distant from intelligence than otherwise. Humans make errors all the time.
Why Not Lossy Compression?
Although humans cannot compress losslessly, they are very good at lossy compression: remembering that which is most important and discarding the rest. Lossy compression algorithms like JPEG and MP3 mimic the lossy behavior of the human perceptual system by discarding the same information that we do. For example, JPEG codes the color signal of an image at a lower resolution than brightness because the eye is less sensitive to high spatial frequencies in color. But we clearly have a long way to go. We can now compress speech to about 8000 bits per second with reasonably good quality. In theory, we should be able to compress speech to about 20 bits per second by transcribing it to text and using standard text compression programs like zip.
Humans do poorly at reading text and recalling it verbatim, but do very well at recalling the important ideas and conveying them in different words. It would be a powerful demonstration of AI if a lossy text compressor could do the same thing. But there are two problems with this approach. First, just like JPEG and MP3, it would require human judges to subjectively evaluate the quality of the restored data. Second, there is much less noise in text than in images and sound, so the savings would be much smaller. If there are 1000 different ways to write a sentence expressing the same idea, then lossy compression would only save log2 1000 = about 10 bits. Even if the effect were large, requiring compressors to code the explicit representation of ideas would still be fair to all competitors.
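A quick back-of-the-envelope check of the figures quoted above; the speaking rate and word length are my own assumptions, while the ~1 bit/char estimate and the "1000 phrasings" example come from the text:

import math

words_per_second = 150 / 60   # assumption: roughly 150 spoken words per minute
chars_per_word   = 6          # assumption: about five letters plus a space
bits_per_char    = 1.0        # Shannon's estimate for written English
print(round(words_per_second * chars_per_word * bits_per_char), "bits/second")  # ~15-20, depending on the assumed rates

print(round(math.log2(1000), 1), "bits to pick one of 1000 equivalent phrasings")  # ~10 bits, as stated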
originally posted by: ChaoticOrder
Yes, that is a good point. In some sense Google is quite intelligent because it can quickly find what I'm looking for even when I give it a vague search term. It can also predict what I'm going to search for before I have fully typed it. It takes some level of intelligence to perform those actions, but that doesn't mean the Google engine is self-aware (yet).
That's one of the biggest problems with intelligence: there are many things that may look like intelligence but are not. Statistics can be used that way, and although they appear to be intelligent, they are just doing something like "97% of the people that searched for word 'X' wrote 'Y' as their second word". The same happens with machine translation, but because it is close to real-life situations, when we look at Google's (or anyone else's) translation services we see that they fail terribly most of the time.
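That "97% of people who searched for X typed Y next" behaviour is easy to reproduce with nothing but co-occurrence counts, which is exactly why it can look intelligent without being so. A toy sketch (the query log below is made up purely for illustration):

from collections import Counter, defaultdict

query_log = ["data compression", "data science", "data compression", "machine learning"]

next_word = defaultdict(Counter)
for query in query_log:
    words = query.split()
    for first, second in zip(words, words[1:]):
        next_word[first][second] += 1   # just count what people typed next

def suggest(prefix_word, top=3):
    counts = next_word[prefix_word]
    total = sum(counts.values())
    return [(word, count / total) for word, count in counts.most_common(top)]

print(suggest("data"))   # compression ~0.67, science ~0.33 -- pure statistics, no understanding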
A simple test - which compression algorithm do you consider to be the most intelligent, if at all?
The intelligence of google is written by human coders inside a program which instructs computers to look up indexes, return matches etc.
originally posted by: ChaoticOrder
a reply to: chr0naut
A simple test - which compression algorithm do you consider to be the most intelligent, if at all?
Well, just going by raw numbers, the most intelligent algorithm would be the one which can compress data better than any other algorithm. If we ignore the time and memory required, then the best compression algorithm is cmix:
cmix is a lossless data compression program aimed at optimizing compression ratio at the cost of high CPU/memory usage. cmix is free software distributed under the GNU General Public License.
cmix is currently ranked first place on the Large Text Compression Benchmark and the Silesia Open Source Compression Benchmark. It also has state of the art results on the Calgary Corpus and Canterbury Corpus. cmix has surpassed the winning entry of the Hutter Prize (but exceeds the memory limits of the contest).
CMIX
However, if we are concerned about the time and memory required, we can use an equation to calculate the practical efficiency of any given compression algorithm (a sketch of one such calculation follows the quote below). It turns out the most practically efficient compression software is called FreeArc:
FreeArc is a modern general-purpose archiver. The main advantage of FreeArc is fast but efficient compression and a rich set of features.
Typically, FreeArc works 2-5 times faster than the best programs in each compression class (ccm, 7-zip, rar, uharc -mz, pkzip) while retaining the same compression ratio; on technical grounds, it is superior to any existing practical compressor.
FreeARC
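The post does not spell out the efficiency equation, so the scoring below is only my assumption of the kind of trade-off meant: the total time to compress a file and then send the compressed result over a link, so that a very slow but very strong compressor can still lose to a fast, moderate one. The numbers are made up for illustration:

def delivery_time(original_bytes, ratio, compress_mb_per_s, link_mb_per_s):
    # seconds to compress the file plus seconds to transmit the compressed output
    mb = original_bytes / 1e6
    return mb / compress_mb_per_s + (mb / ratio) / link_mb_per_s

print("cmix-like (strong, very slow):", delivery_time(100e6, ratio=4.5, compress_mb_per_s=0.01, link_mb_per_s=10))
print("freearc-like (moderate, fast):", delivery_time(100e6, ratio=3.0, compress_mb_per_s=20.0, link_mb_per_s=10))
# The slow compressor saves bytes but loses badly once compression time is counted.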
... and remember, compression occurs naturally in nature. Holography, for instance reduces 3d data to a 2d representation and is therefore a compression.
Converting 3D information into a 2D format does not compress it; the same amount of information is required, it's just formatted differently.
The original represented object would normally be considered to occupy 3D space (a volume). The holographic process renders the data (in a lossy but distributed manner) onto 2D space (a plane). The medium of space is the same in both instances, but the holographic representation occupies much less of that medium; hence the compression.
We are programmed to be self-learning biological machines.
originally posted by: glend
Yes, that's the difference. The processors in computers need instructions to operate, whereas humans don't, because the neural networks in our brain are able to self-balance toward a desired outcome.
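As a loose illustration of that "self-balancing" idea (my sketch, not from the post): a single artificial neuron can nudge its own weights toward a desired outcome using only the error it observes, with no case-by-case instructions from a programmer.

# Perceptron learning the AND function by repeatedly correcting its own weights.
inputs  = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
desired = [0.0, 0.0, 0.0, 1.0]           # the desired outcome: logical AND
w1, w2, bias, rate = 0.0, 0.0, 0.0, 0.1

for _ in range(200):
    for (x1, x2), target in zip(inputs, desired):
        output = 1.0 if w1 * x1 + w2 * x2 + bias > 0 else 0.0
        error = target - output
        # the self-balancing step: shift each weight in proportion to the error
        w1 += rate * error * x1
        w2 += rate * error * x2
        bias += rate * error

print([1 if w1 * x1 + w2 * x2 + bias > 0 else 0 for x1, x2 in inputs])   # [0, 0, 0, 1]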
You... That's the trick, though, isn't it?
ATS ELIZA... So why do you think that's the trick?
Although they are trying to emulate neurons in software, it's not the real thing.