Information theory is based on certain key assumptions, or postulates,
that are inherently plausible and reasonable. However, the ultimate
justification is that logical conclusions drawn from these postulates
have led to useful and effective solutions to real-life problems.
One assumption of information theory is that a message is not
significant by itself; it is significant in the context of all
the other possible messages that could have been sent. When a
message tells you something that you already know, it's reasonable
to say that the message conveys no information; there was no other
possible message. For example, if you have a 10-year-old son,
and someone tells you that you have a son, no information has
been conveyed. On the other hand, under different circumstances
(when more than one message is possible), the same message could
convey some information. For example, if you are in the hospital
delivery room, and someone tells you that you have a son, some
information has been conveyed.
"The significant aspect is that the actual message is one
selected from a set of possible messages" (Shannon and Weaver).
The greater the number of possible messages, the greater the amount
of information conveyed. In other words, how much information
a message contains depends on the extent to which it resolves
uncertainty.
You could also say that the more probable a message is, the less
information it conveys. For instance, a message selected from
a set of only one possible message has a probability of 100 percent,
or 1, and conveys no information. A message selected from
a set of two equally probable messages, each with a probability
of 1/2, conveys some information, while a message from
a set of three (probability of 1/3) conveys even more,
and so on.
The amount of information increases as the probability of the
message decreases; they are inversely related, but in exactly
what proportions? You could say that the information content of
a message with a probability of p1 is 1/p1, but this
doesn't give zero information content for a message with a probability
of 1.
Shannon suggested a more definite form for relating information content and message probability. He argued that you can measure information so that the total amount conveyed by two messages is equal to the sum of the information conveyed by each of them; in other words, the information conveyed by a series of messages is additive.
If you have two messages, one with a probability of p1 and the other with a probability of p2, you could say that the quantity of information these messages convey is related to 1/p1 and 1/p2, respectively. However, if you think of the two as a compound message, the probability becomes p1 x p2. For example, if p1 is 1/3 and p2 is 1/5, there is a one-in-three chance of the first message being selected. If it is chosen, there is only a one-in-five chance that the second message will also be chosen. Thus,
the chances of the compound message being sent are 1/3
x 1/5, or 1/15. The information content
of this compound message should therefore be related to 1/(p1 x p2).
The concept of additivity requires that the information content
associated with a 1/(p1 x p2) probability be the sum of the information
content associated with 1/p1 and that associated with 1/p2. Therefore,
I(1/(p1 x p2)) = I(1/p1) + I(1/p2)
where I denotes quantity of information. According to Shannon,
the only mathematical relationship that satisfies this requirement
is: The quantity of information associated with a probability
of p1 is
I(1/p1) = log(1/p1)
This, then, is Shannon's fundamental equation for measuring quantity
of information.
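The additivity requirement can be checked numerically. The following is a minimal sketch, not Shannon's own formulation: the function name `information` and its `base` parameter are illustrative assumptions, and base 2 is used only as a convenient default, since additivity holds for any base.

```python
import math

def information(p, base=2):
    """Information content log(1/p) of a message with probability p.

    The name and the default base are assumptions made for illustration.
    """
    if not 0 < p <= 1:
        raise ValueError("probability must be greater than 0 and at most 1")
    return math.log(1 / p, base)

# The compound-message example from the text: p1 = 1/3, p2 = 1/5.
p1, p2 = 1 / 3, 1 / 5
separate = information(p1) + information(p2)  # I(1/p1) + I(1/p2)
compound = information(p1 * p2)               # I(1/(p1 x p2))

print(separate, compound)  # the two values agree up to rounding error
print(information(1))      # a certain message carries no information
```

Unlike the plain ratio 1/p1, the logarithm gives zero for a message with a probability of 1.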
Briefly, the logarithm of any number to a particular base is defined
as the power to which you must raise the base to get that number.
For example, the log of 1000 to the base 10 is 3, since 10 x 10
x 10, or 10^3, is 1000. So what base should Shannon's
equation use? Base 2 seems a natural choice because, in the simplest
case where one of two equally probable messages is selected, each
with a probability of 1/2, the quantity of information
is log{base2}(1/(1/2)), or log{base2}2. The log of 2 to the base
2 is 1. Thus, the information contained in each of these two messages
equals one unit. The average amount of information also equals
one unit.
Shannon chose the name bit for this unit of information. Let's call it an infobit, since it isn't quite the same as a bit in computer storage, which represents information (let's call that a repbit). Thus, if a message with a probability of 1/4 is chosen out of four equally likely messages, the amount of information would be log{base2}(1/(1/4)), or log{base2}4, or 2 infobits.
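For equally likely messages the calculation reduces to the base-2 logarithm of the number of alternatives. A small sketch, assuming a hypothetical helper name `infobits_equally_likely`:

```python
import math

def infobits_equally_likely(n):
    """Infobits conveyed by one message chosen from n equally likely messages.

    Each message has probability 1/n, so the content is log2(1/(1/n)) = log2(n).
    """
    if n < 1:
        raise ValueError("there must be at least one possible message")
    return math.log2(n)

for n in (1, 2, 4):
    print(n, "possible messages:", infobits_equally_likely(n), "infobits")
```

One possible message gives 0 infobits, two give 1, and four give 2, matching the figures in the text.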
Figure 1a
The process that occurs at the transmission end of communicating a message.
[Diagram: Information source -> Transmitter (codes message) -> Signal (channel), with Noise entering the channel]
Figure 1b
The corresponding process that occurs at the receiving end.
[Diagram: Signal -> Receiver (decodes)]
To see the difference, as well as the connection between repbits
and infobits, suppose you are expecting one of two messages, yes
or no, in regard to some decision, and the two are equally probable.
The message could be sent as yes or no, using 8 repbits for each
character: 24 for yes or 16 for no, with an average of 20 repbits.
However, in terms of information theory, for two equally probable
messages, each with a probability of 1/2, each has
an information content of log{base2}(1/(1/2)), or log{base2}2,
or 1 infobit; and the average is also 1 infobit.
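The yes/no comparison can be reproduced in a few lines. This is a sketch under the text's own assumptions (8 repbits per character, two equally probable messages); the variable names are illustrative.

```python
import math

BITS_PER_CHAR = 8  # repbits per character, as assumed in the text

# Each possible message with its probability.
messages = {"yes": 0.5, "no": 0.5}

# Average storage cost when the words themselves are sent.
avg_repbits = sum(p * len(word) * BITS_PER_CHAR for word, p in messages.items())

# Average information content, weighting log2(1/p) by each probability.
avg_infobits = sum(p * math.log2(1 / p) for p in messages.values())

print(avg_repbits)   # average repbits per message
print(avg_infobits)  # average infobits per message
```

The average comes to 20 repbits against 1 infobit, so spelling out the words uses twenty times the minimum.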
Thus, the number of repbits is not necessarily equal to the number of infobits, but there is a connection. You could say that the number of infobits is the smallest number of repbits required. If there are only two possible messages and you use a code of 0 for no and 1 for yes, then a message of 1 repbit is enough. Similarly, if there are four possible messages, each with a probability of 1/4, the number of infobits needed is 2; the minimum number of repbits required is also 2.
What if you have three messages, each with a probability of 1/3?
According to Shannon's equation, the number of infobits is log{base2}3, or
about 1.58 infobits. But repbits can only be whole numbers, so how does this
work?
You need at least 2 repbits to distinguish between the three alternatives.
But with 2 repbits, you could actually handle four alternatives,
so you're wasting some of the capacity of the 2 repbits for sending
messages. You could reduce this waste if you code blocks of such
messages, rather than sending each one individually.
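A fixed-length code makes the waste visible. The sketch below assigns each message a binary codeword of the smallest whole-number width that can tell the alternatives apart; the function name `fixed_length_codebook` and the message labels are hypothetical.

```python
def fixed_length_codebook(messages):
    """Map each message to a binary codeword of the minimum fixed width."""
    n = len(messages)
    if n < 2:
        raise ValueError("need at least two possible messages")
    # Smallest width such that 2**width >= n, computed with integers only.
    width = (n - 1).bit_length()
    return {m: format(i, f"0{width}b") for i, m in enumerate(messages)}

for messages in (["no", "yes"], ["a", "b", "c"], ["a", "b", "c", "d"]):
    book = fixed_length_codebook(messages)
    width = len(next(iter(book.values())))
    unused = 2 ** width - len(messages)
    print(book, "repbits per message:", width, "unused codewords:", unused)
```

With three messages the width is 2 repbits and one of the four codewords is never used, which is the wasted capacity described above.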
If you code blocks of 10 such messages, the whole block could
contain 3^10, or 59,049, alternative forms. If you use
a string of 16 binary signals, you can have 2^16, or
65,536, alternative forms. Since a string of 16 binary signals
is more than enough to handle 10 of these three-alternative messages,
on the average you need only 16/10, or 1.6, repbits to represent
the average three-alternative message.
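The block-coding arithmetic can be sketched for any block size. The helper name `repbits_per_message` is an assumption, and the block sizes other than 1 and 10 are chosen only for illustration.

```python
def repbits_per_message(alternatives, block_size):
    """Average repbits per message when block_size messages share one codeword."""
    forms = alternatives ** block_size  # number of distinct blocks
    bits = (forms - 1).bit_length()     # smallest whole number of bits with 2**bits >= forms
    return bits / block_size

print(3 ** 10, 2 ** 16)  # 59049 alternative blocks fit within 65536 codewords

for block_size in (1, 10, 100, 1000):
    print(block_size, repbits_per_message(3, block_size))
```

A block of one message costs 2 repbits, and a block of 10 costs 16 repbits in total, or 1.6 per message.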
An alternative name that Shannon gave to the average amount of
information is entropy, a term from thermodynamics. One interpretation
of the amount of entropy in a physical system concerns the degree
of uncertainty about which of many possible states of the system
is actually realised at different stages. Shannon chose this name
because of the analogy between realising one of many possible
states and choosing one of many possible messages, and also because
the mathematical equations for calculating thermodynamic entropy
and average quantity of information were similar.
Thus, a fractional entropy represents the average number of
repbits required if you code the messages in sufficiently long
blocks instead of one at a time. The longer the block of messages,
the closer the average number of repbits per message moves from 1.6
toward the 1.58 that the entropy calculation gives.
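Entropy as an average can be written as each message's information content weighted by its probability. The sketch below uses that standard form; the function name `entropy` is an assumption.

```python
import math

def entropy(probabilities):
    """Average infobits per message: the sum of p * log2(1/p) over all messages."""
    if not math.isclose(sum(probabilities), 1.0):
        raise ValueError("probabilities must sum to 1")
    # Messages with zero probability are never sent and contribute nothing.
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)

print(entropy([1 / 2, 1 / 2]))         # two equally probable messages
print(entropy([1 / 3, 1 / 3, 1 / 3]))  # three equally probable messages
```

For three equally probable messages the result is about 1.58 infobits, the limit that block coding approaches.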
In reality, messages aren't usually a series of signals indicating
which of different messages is being sent; they are a series of
characters selected from a character set or alphabet. If you consider
each choice of a character as a "minimessage", a selection
from the set of all possible characters, the method still applies.
You can think of an overall message as a long series of such minimessages.
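The same calculation can be applied per character. The sketch below rests on two simplifying assumptions: the relative frequency of each character in a sample stands in for its probability, and each character is chosen independently of its neighbours. The function name and sample strings are hypothetical.

```python
import math
from collections import Counter

def average_infobits_per_character(text):
    """Average infobits per character, estimated from frequencies in text."""
    if not text:
        raise ValueError("text must not be empty")
    counts = Counter(text)
    total = len(text)
    # Each character is a "minimessage" with estimated probability count/total.
    return sum((c / total) * math.log2(total / c) for c in counts.values())

for sample in ("aaaa", "abab", "abcd"):
    print(sample, average_infobits_per_character(sample))
```

A string that repeats a single character conveys nothing per character, while four equally frequent characters give 2 infobits each.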