Information is knowledge or data. It may form part of a message, it may be constituted by knowledge held by an individual or it may even exist as data held in storage. More technically, information can be described as a message transmitted by a message sender and received and understood by a message receiver.
Information can be stored and conveyed by various means, including speech, printed matter (books, journals, newspapers, magazines, etc.), sign language, physical shapes (such as Braille), arrangements of magnetic particles (such as cassette tapes and hard disks), and electromagnetic radiation (such as radio signals).
Information has no weight (i.e. it has no mass). It is not a property of the matter or energy used to store or convey it, but is distinct from both. Rather, information is stored and conveyed in the arrangement of matter or energy. It therefore has been described as the third fundamental quantity of the universe, alongside matter and energy.
Language and symbols
Information is conveyed by means of a language, a system of symbols (e.g. alphabetic letters) for which both the sender and receiver of the message have the same understanding. Symbols can be composed of other symbols. Words can themselves be considered symbols, composed of letters.
For example, the symbols (letters) g, i, f, and t, when combined into a word, are understood by English speakers to refer to a present. However, the same group of symbols is understood by German speakers to refer to a poison. The difference is neither in the symbols, nor in the matter (such as ink) making up the symbols, but in the meaning attached to the symbols.
As well as the same symbols having different meanings, the same meanings can be carried by different symbols. The text of the Bible, for example, can be carried by the arrangement of ink on paper, the arrangement of magnetic particles on a computer disk, or even the arrangement of knots in a piece of rope where a single knot represents a dot and a pair of knots represents a dash in Morse code.
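The knotted-rope idea can be sketched in a few lines. This is only an illustration: the Morse table is deliberately partial (four letters), and writing a single knot as "o" and a pair of knots as "oo" is a notation assumed here, not something taken from the sources.

```python
# Partial Morse table, just enough for the example word.
MORSE = {"G": "--.", "I": "..", "F": "..-.", "T": "-"}

def to_morse(word):
    """Render each letter of the word as dots and dashes."""
    return [MORSE[letter] for letter in word.upper()]

def to_knots(word):
    """Render each letter as knots: 'o' (single knot) for a dot,
    'oo' (pair of knots) for a dash. Hypothetical notation."""
    return [
        " ".join("o" if mark == "." else "oo" for mark in code)
        for code in to_morse(word)
    ]

def from_knots(knot_groups):
    """Recover the text from the knot rendering."""
    reverse = {code: letter for letter, code in MORSE.items()}
    letters = []
    for group in knot_groups:
        code = "".join("." if knot == "o" else "-" for knot in group.split(" "))
        letters.append(reverse[code])
    return "".join(letters)

rope = to_knots("gift")
recovered = from_knots(rope)
```

The dots and dashes and the knots are different physical symbols, yet both decode to the same letters.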
Symbols and meaning
As mentioned above, the symbols have meaning to both the sender and receiver of the information. In the case of machines, the "understanding" of the message depends on how the machine was programmed.
For example, a machine might be fed information in the form of a punched tape, and three holes might mean "attach piece A to B". But there is nothing inherent in that particular arrangement of holes causing that particular action. The machine attaches piece A to B simply because it has been designed or programmed to respond with that action when encountering three holes. The machine could be redesigned or reprogrammed to respond to three holes with a different action, or to attach piece A to B for a different arrangement of holes. It is solely because of a convention that three holes means "attach piece A to B". Further, both the "sender" (the person who created the punched tape) and the "receiver" (the machine) need to have the same convention. If the sender believes that three holes means "attach piece A to B", but the machine has been programmed so that three holes means "attach piece C to D", the message will be misunderstood.
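The role of a shared convention can be sketched as a pair of lookup tables. The hole counts, the second action, and all names below are hypothetical, chosen only to mirror the punched-tape example.

```python
# Hypothetical conventions mapping a number of holes to an action.
SENDER_CONVENTION = {3: "attach piece A to B", 5: "attach piece C to D"}
RECEIVER_MATCHING = {3: "attach piece A to B", 5: "attach piece C to D"}
RECEIVER_MISMATCHED = {3: "attach piece C to D", 5: "attach piece A to B"}

def encode(action, convention):
    """Sender: find the hole count that stands for the intended action."""
    for holes, meaning in convention.items():
        if meaning == action:
            return holes
    raise ValueError(f"no symbol for action: {action!r}")

def decode(holes, convention):
    """Receiver: look up what this hole count means under its own convention."""
    return convention[holes]

intended = "attach piece A to B"
tape = encode(intended, SENDER_CONVENTION)         # the physical symbol
understood = decode(tape, RECEIVER_MATCHING)       # same convention
misunderstood = decode(tape, RECEIVER_MISMATCHED)  # different convention
```

The tape is identical in both cases; only the receiver's table differs, and with it the action performed.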
Information vs nonsense
People have an intuitive concept of information and the difference between information and non-information. For example, Richard Dawkins mentioned the difference without feeling the need to explain it in further detail. In talking about the information content of the genome and how to measure it in a statistical sense ('Shannon' information; see below), he drew a distinction between "nonsense" and "meaningful information":
Remember, too, that even the total capacity of genome that is actually used is still not the same thing as the true information content in Shannon’s sense. The true information content is what’s left when the redundancy has been compressed out of the message, by the theoretical equivalent of Stuffit. There are even some viruses which seem to use a kind of Stuffit-like compression. They make use of the fact that the RNA (not DNA in these viruses, as it happens, but the principle is the same) code is read in triplets. There is a “frame” which moves along the RNA sequence, reading off three letters at a time. Obviously, under normal conditions, if the frame starts reading in the wrong place (as in a so-called frame-shift mutation), it makes total nonsense: the “triplets” that it reads are out of step with the meaningful ones.
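The reading frame described in the quotation can be sketched as follows. The RNA sequence is made up for illustration, and the function name is an assumption of this example.

```python
def read_triplets(sequence, offset=0):
    """Split a sequence into consecutive three-letter groups,
    starting at the given offset. A trailing incomplete group is dropped."""
    usable = sequence[offset:]
    return [usable[i:i + 3] for i in range(0, len(usable) - 2, 3)]

rna = "AUGGCCUUUUAA"  # hypothetical sequence

in_frame = read_triplets(rna)            # frame starts at the first letter
shifted = read_triplets(rna, offset=1)   # frame starts one letter late
```

Starting one letter late yields a different set of triplets from the same letters, which is why a frame shift scrambles the message.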
Determining the existence of information
Even without understanding the information, i.e. knowing its meaning, it is possible to determine the probable existence of information. A signal that repeats in small blocks like "abcabcabcabcabcabcabc" contains little or no information. A signal that does not repeat at all may contain a great deal of information, or it may be generated by a random process and contain no information at all. This is the basis of the SETI (Search for Extraterrestrial Intelligence) program. The SETI program scans radio signals from space looking for signs of intelligent life. To understand the meaning of a radio signal, the program would have to decode and translate the message, a process that is time-consuming and difficult. But SETI does not aim to do that. Instead, it looks for a signal that is neither repetitive nor random.
In 1799, people looked at the Rosetta Stone, as they had at many earlier texts written in hieroglyphic and Demotic script, and quickly determined that the texts probably contained information, even though no one on earth at the time could understand what any of them meant.
There are various ways of measuring information content, including statistical (counting letters, for example), although most such ways ignore the meaning of the information.
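Claude Shannon published a landmark paper in 1948 in which he set out the basics of what has become known as information theory. It is concerned with quantifying information and with the limits on compressing data and transmitting it reliably, including over noisy channels. The average uncertainty of a message source, which sets the limit on lossless compression, he referred to as entropy.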
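As a rough illustration of entropy in Shannon's sense, the sketch below estimates it from symbol frequencies alone, using H = -sum(p * log2(p)) over the observed symbols. The function name is an assumption of this example, and treating each symbol as independent is a simplification. The result says nothing about meaning.

```python
from collections import Counter
from math import log2

def entropy_bits_per_symbol(message):
    """Estimate entropy in bits per symbol from observed symbol frequencies.
    Assumes symbols are independent, which real text is not."""
    counts = Counter(message)
    total = len(message)
    return -sum((n / total) * log2(n / total) for n in counts.values())

single = entropy_bits_per_symbol("aaaa")   # only one possible symbol
two = entropy_bits_per_symbol("abab")      # two equally likely symbols
four = entropy_bits_per_symbol("abcd")     # four equally likely symbols
```

A message that always uses the same symbol has zero entropy, and each doubling of the number of equally likely symbols adds one bit per symbol.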
Kolmogorov was concerned with measuring information in terms of how far data can be compressed: the Kolmogorov complexity of a string is the length of the shortest description (program) that reproduces it. For example, the string of characters "123123123123123123123123123123123123123123" can be represented as "14 x '123'", whereas the same-length string "The quick brown fox jumps over a lazy dog." cannot be compressed if the algorithm must reproduce the letters one at a time. At one extreme, very repetitive strings allow for great compression, and at the other extreme, random strings allow for little or no compression.
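The "14 x '123'" representation can be reproduced with a very simple scheme that looks only for a whole-string repeating block. This is a toy sketch under that assumption, not a general compressor, and the function names are made up for the example.

```python
def smallest_repeating_unit(text):
    """Return the shortest block that, repeated, reproduces text exactly."""
    n = len(text)
    for size in range(1, n + 1):
        if n % size == 0 and text[:size] * (n // size) == text:
            return text[:size]
    return text  # only reached for the empty string

def describe(text):
    """Return a 'count x block' description if the text is a repeated block,
    otherwise return the text unchanged."""
    unit = smallest_repeating_unit(text)
    repeats = len(text) // len(unit) if unit else 0
    if repeats > 1:
        return f"{repeats} x '{unit}'"
    return text

repetitive = "123" * 14
sentence = "The quick brown fox jumps over a lazy dog."

short_form = describe(repetitive)
unchanged = describe(sentence)
```

Both inputs are 42 characters long, but only the repetitive one gets a shorter description under this scheme.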
However, although Kolmogorov's research has proved to be very useful in many areas of data storage and communication, the measure depends on the algorithm and the individual processes used to compress and decompress the information, not on its meaning. A completely random sequence of letters cannot be compressed, and takes more space to store if the algorithm is to reproduce the letters individually. A proper sentence that means something may be more compressible, but only if the linguistic algorithm used to compress the string takes into account the meaning of words.
The relevance of Kolmogorov complexity to genetics depends entirely on the genetic algorithms used.
An important point to note about Kolmogorov complexity is that it is uncomputable: in general, we can't know the Kolmogorov complexity of an arbitrary string, and we can prove mathematically that we can't know that.
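Because the true value cannot be computed, in practice one can only obtain an upper bound, for instance the size of the output of a real compressor (plus a fixed allowance for the decompressor). The sketch below uses Python's standard zlib module for this; the choice of compressor, the input sizes, and the fixed random seed are all assumptions of the example.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed data: an upper-bound style
    proxy for complexity, not the Kolmogorov complexity itself."""
    return len(zlib.compress(data, 9))

repetitive = b"123" * 1000

# Pseudo-random bytes from a fixed seed, so the example is repeatable.
rng = random.Random(0)
noise = bytes(rng.randrange(256) for _ in range(3000))

repetitive_size = compressed_size(repetitive)
noise_size = compressed_size(noise)
```

The repetitive input shrinks to a small fraction of its 3000 bytes, while the noise barely shrinks at all. Note that the noise here comes from a short seeded program, so its true Kolmogorov complexity is small even though zlib cannot find that description, which is one way of seeing why a compressor gives only a bound.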
Differentiating between Kolmogorov complexity and Shannon information
Kolmogorov complexity and Shannon information are not the same concept. A process can increase the information entropy (colloquially, "decrease the Shannon information", by increasing the number of possible symbols at a position) and at the same time increase the Kolmogorov complexity (make the string less compressible, depending on the algorithm).
There is no universal way of measuring the meaning of information, but it is possible to compare two similar pieces of information and see which has more information. For example, in comparing the following two sentences, we can determine that the second has more information than the first:
- She has a yellow vehicle.
- She has a yellow car.
The second not only tells you that she has a yellow vehicle, but supplies the extra information that the vehicle is a car. Note also that the second sentence is shorter, so would normally be considered to have less information if measured purely in a statistical manner.
The same principle can be applied to genetics. If one sequence of genetic code has instructions for making brown hair (i.e. containing melanin), and another has the instructions for making hair without specifying that it has melanin (and is therefore fair in colour), then the second sequence has less information than the first. Nevertheless, there is no widely accepted quantitative measure of information in genetics, nor even a consensus on the relative amount of information in various systems.
Jack W. Szostak wrote in 2003:
The information content of biopolymers is usually thought of in terms of the amount of information required to specify a unique sequence or structure. This viewpoint derives from classical information theory, which does not consider the meaning of a message, defining the information content of a string of symbols as simply that required to specify, store or transmit the string. ... A new measure of information — functional information — is required…
Szostak and other scientists proposed a way of measuring meaningful information in another paper in 2007.
For more information, see Genetic information.
The genetic code (DNA) found in all living things meets all of the criteria of information. It is, like text in a book, knowledge carried by means of chemical letters (the symbols). The information can be carried with different symbols (we use the letters "A", "C", "G", and "T" to represent the actual chemicals, adenine, cytosine, guanine, and thymine), and the same symbols can mean different things under different conventions.
The DNA symbols have meaning, such as the group of three "letters" "ATG" meaning methionine, one of the amino acids. This particular symbol refers to methionine because the biochemical machinery of the living cell is programmed to produce methionine when it reads that symbol.
But under a different convention, ATG means "start", an instruction to the chemical machinery indicating the beginning of a gene.
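The two roles of ATG can be sketched with a small lookup table. The table below is deliberately partial (four of the 64 codons of the standard genetic code), the DNA sequence is made up, and the function is a simplification assumed for this example rather than a model of real translation.

```python
# Partial codon table: only the codons used in the example.
CODON_TABLE = {"ATG": "Met", "GGC": "Gly", "AAA": "Lys", "TAA": "Stop"}

def translate(dna):
    """Read triplets from the first ATG (treated as 'start') until a stop
    codon, returning the amino acids named by the partial table."""
    start = dna.find("ATG")
    if start == -1:
        return []
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino = CODON_TABLE.get(dna[i:i + 3], "?")  # '?' = not in partial table
        if amino == "Stop":
            break
        protein.append(amino)
    return protein

dna = "CCATGGGCAAATAAGG"  # hypothetical sequence
protein = translate(dna)
```

Here the same three letters serve both as the marker for where reading begins and as the symbol for methionine, the first amino acid in the result.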
Generation of information
Humans have generated huge amounts of information, much of which they record in books, computers, recordings, and other media.
But nature itself also contains massive amounts of information, particularly in the genetic code of living things. Unlike information generated by humans, this information in nature cannot have been generated by human beings. This leaves two possibilities: either it was generated by one or more non-human intelligent beings, or it occurred naturally.
If those other intelligent beings themselves contain genetic information (which presumably they must if they are physical beings), then this merely raises the question of where their genetic information came from, solving nothing. Alternatively, the genetic information in living things could have been generated by an eternal being, e.g. God. This leaves the alternatives of genetic information being generated by God, or by natural processes.
Physicist Paul Davies, however, has said that natural processes appear unable to account for its origin.
How did stupid atoms spontaneously write their own software … ? Nobody knows … there is no known law of physics able to create information from nothing.
Even the biologist Richard Dawkins, when asked to "give an example of a genetic mutation or an evolutionary process which can be seen to increase the information in the genome" (in the documentary From a Frog to a Prince), was unable to provide any such example.
- Gitt, Werner, In the Beginning was Information, 2nd edition, CLV, 2000.
- Lamb, Andrew, How do we define information in biology?, 17 February 2007.
- Sarfati, Jonathan, Is the design explanation legitimate?, chapter 9 of Refuting Evolution, Master Books, 1999.
- ↑ Merriam-Webster dictionary definition 1: "the communication or reception of knowledge or intelligence".
- ↑ The Penguin Macquarie Dictionary definition 1: "knowledge communicated or received concerning some fact or circumstance; news."
- ↑ Gitt, 2000, p.70: "Any piece of information has been transmitted by somebody and is meant for somebody. A sender and a recipient are always involved whenever and wherever information is concerned."
- ↑ Gitt, 2000, chapter 3.
- ↑ Dawkins, Richard, The Information Challenge, A Devil's Chaplain 91-103, 2003, Houghton Mifflin Co., Boston ISBN 9780618485390. (Content warning: This link contains pro-evolutionary material.)
- ↑ Sarfati, 1999, p.118
- ↑ Mark Chu-Carroll, How to Measure Information. TalkOrigins Post of the Month: February 2001.
- ↑ Example taken from Lamb, 2007
- ↑ Jack W. Szostak, Molecular messages, Nature, Vol 423, 12 June 2003.
- ↑ Robert M. Hazen, Patrick L. Griffin, James M. Carothers, and Jack W. Szostak, Functional information and the emergence of biocomplexity, Proceedings of the National Academy of Sciences, vol. 104, 8574–8581, 15 May 2007.
- ↑ Davies, P., Life force, New Scientist 163 (2204):26–30, 1999, quoted by Batten, Don, Would Darwin be a Darwinist Today?, Creation 31(4):48–51, September 2009.