Resource summary
Ch26 Coding Systems
- Character Set
- A table which maps each character
to a unique binary number.
- ASCII
- American Standard Code for
Information Interchange
- Uses 7 bits, so it can represent
2^7 = 128 characters.
- Extended ASCII later used 8 bits,
giving 2^8 = 256 characters.
- Only contains basic English characters
and punctuation.
- This is not enough: the advent of the
World Wide Web created the need for a
universal system.
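
As a rough illustration of a character set acting as a lookup table, the sketch below uses Python's built-in `ord` and `chr` to move between characters and their numeric codes. The helper name `to_ascii_bits` is made up for this example.

```python
def to_ascii_bits(text):
    """Return each character's 7-bit ASCII code as a string of 0s and 1s."""
    bits = []
    for ch in text:
        code = ord(ch)              # character -> number
        if code > 127:              # outside the 7-bit ASCII range
            raise ValueError(f"{ch!r} is not an ASCII character")
        bits.append(format(code, "07b"))
    return bits


print(to_ascii_bits("Hi"))          # ['1001000', '1101001']
print(chr(65))                      # number -> character: A
```

A character such as "é" raises the error here, which is the limitation the notes describe.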
- Unicode
- Was brought in to replace ASCII and
include most characters from across
the world.
- Note: this includes Emojis!
- Includes the characters used in
far more than 20 countries.
- Originally 16 bits, so it could code for 2^16 = 65,536 (around 65 thousand) characters; the standard has since been extended beyond 16 bits.
- Constantly updated and maintained
by the Unicode Consortium.
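
To make the size difference concrete, here is a small sketch that looks up the Unicode code point of a few characters and checks whether each would fit in 7 bits (ASCII) or in 16 bits. The function name `code_point_info` is an assumption for this example, not part of any standard.

```python
def code_point_info(ch):
    """Return (code point, fits in 7 bits?, fits in 16 bits?) for one character."""
    code_point = ord(ch)
    return code_point, code_point < 2 ** 7, code_point < 2 ** 16


print(code_point_info("A"))             # (65, True, True)
print(code_point_info("\u00e9"))        # e-acute: (233, False, True)
print(code_point_info("\U0001F600"))    # grinning face emoji: (128512, False, False)
```

The emoji lies above the 16-bit range, which is why 16 bits alone are no longer enough.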
- Error Checking
- Parity Bit
- An additional bit is added to the end of a binary number so that the
total number of 1 digits is even (even parity) or odd (odd parity).
- Unreliable: if an even number of bits flip (the parity bit counts as one of them),
the error goes undetected, and it cannot tell which bit is wrong.
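
The weakness described above can be seen in a minimal sketch of even parity, assuming bits are held as strings of "0" and "1". The function names are illustrative only.

```python
def add_even_parity(bits):
    """Append a parity bit so the total number of 1s is even."""
    parity = bits.count("1") % 2
    return bits + str(parity)


def passes_even_parity(word):
    """True if the received word (data + parity bit) has an even number of 1s."""
    return word.count("1") % 2 == 0


def flip(word, index):
    """Return a copy of word with the bit at index inverted."""
    flipped = "1" if word[index] == "0" else "0"
    return word[:index] + flipped + word[index + 1:]


sent = add_even_parity("1001000")       # '10010000'
one_error = flip(sent, 0)               # '00010000'
two_errors = flip(one_error, 1)         # '01010000'

print(passes_even_parity(sent))         # True
print(passes_even_parity(one_error))    # False: error detected
print(passes_even_parity(two_errors))   # True: error missed
```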
- Majority Voting
- Each bit is sent multiple (typically 3) times and, if there is a
discrepancy, the receiver goes with the majority.
- Very reliable and can correct data
without retransmitting; however, it
uses a lot of bandwidth.
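
A minimal sketch of majority voting, assuming the receiver holds an odd number of equal-length copies as bit strings. `majority_vote` is a made-up name for this example.

```python
def majority_vote(copies):
    """Return the bit-by-bit majority of an odd number of equal-length bit strings."""
    result = ""
    for column in zip(*copies):             # the same bit position in every copy
        ones = column.count("1")
        result += "1" if ones > len(column) / 2 else "0"
    return result


received = ["1011", "1001", "1011"]         # the second copy has one corrupted bit
print(majority_vote(received))              # 1011
```

Sending three copies triples the amount of data transmitted, which is the bandwidth cost the notes mention.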
- Check Digit
- An additional digit, derived from
the other digits, is added to
the end of a number.
- A simple method is to add up all
the digits, then add up the digits of the
sum, and repeat until you are left with a
single digit.
- This is unreliable: if the order of the
digits changes, the check digit remains the
same.
- A more advanced method
multiplies each digit by a weight that depends
on its position, so that order matters. An
example of this is the modulo-11
method.
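
Both approaches can be compared in a short sketch. The modulo-11 version below assumes the ISBN-10 style of weighting (weights counting down to 2, with a result of 10 written as "X"); other modulo-11 schemes exist, and the function names are illustrative.

```python
def digit_sum_check(digits):
    """Repeatedly sum the digits until a single digit remains."""
    total = sum(int(d) for d in digits)
    while total > 9:
        total = sum(int(d) for d in str(total))
    return str(total)


def mod11_check(digits):
    """Weighted modulo-11 check digit (ISBN-10 style weights: n+1 down to 2)."""
    weights = range(len(digits) + 1, 1, -1)
    total = sum(int(d) * w for d, w in zip(digits, weights))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


# Swapping two digits goes unnoticed by the simple digit sum...
print(digit_sum_check("1234"), digit_sum_check("1243"))         # 1 1
# ...but changes the weighted modulo-11 check digit.
print(mod11_check("030640615"), mod11_check("030640651"))       # 2 9
```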
- Graphics
- Bit Maps
- A bit map graphic is a 2D array of pixels.
- Each pixel holds a colour value, stored as a
binary number. Many thousands of these
pixels typically make up an image.
- A problem with bit maps is that
they look blocky (pixelated) when you
zoom in. Additionally, they take
up a large amount of space.
- Each pixel has a colour depth, which is the
number of bits allocated to represent the colour of a
pixel. The higher the colour depth, the more colours
can be expressed: n bits give 2^n colours.
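
The relationship between resolution, colour depth and storage can be sketched as below. This ignores any file header or compression, and the function names are assumptions for the example.

```python
def colours_available(colour_depth):
    """Number of distinct colours that colour_depth bits can represent."""
    return 2 ** colour_depth


def bitmap_size_bytes(width, height, colour_depth):
    """Raw pixel data size in bytes (no header, no compression)."""
    total_bits = width * height * colour_depth
    return (total_bits + 7) // 8            # round up to whole bytes


# A tiny 2 x 3 bitmap as a 2D array of 1-bit pixels (0 = white, 1 = black)
tiny = [[0, 1, 0],
        [1, 1, 1]]

print(colours_available(8))                 # 256
print(colours_available(24))                # 16777216
print(bitmap_size_bytes(1920, 1080, 24))    # 6220800
print(bitmap_size_bytes(len(tiny[0]), len(tiny), 1))    # 1
```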
- Vector Graphics
- A vector graphic is an image generated from a set of instructions
and mathematical formulae. This generates the image consisting
of geometric shapes, relative to a point of origin.
- These are not suitable for photos as
they simplify things significantly.
However, they are good for diagrams
and CAD/CAM images.
- A vector graphic typically takes up little
space and can be easily scaled up
without losing quality.
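
One way to picture why vector graphics scale cleanly is to store a shape as a few numbers relative to the origin and multiply them by a scale factor. The `Circle` class and `scale` function below are purely illustrative and do not correspond to any real file format.

```python
from dataclasses import dataclass


@dataclass
class Circle:
    cx: float       # centre x, relative to the origin
    cy: float       # centre y, relative to the origin
    radius: float


def scale(circle, factor):
    """Return a new circle enlarged by factor about the origin."""
    return Circle(circle.cx * factor, circle.cy * factor, circle.radius * factor)


small = Circle(10, 20, 5)
large = scale(small, 3)
print(large)        # Circle(cx=30, cy=60, radius=15)
```

The enlarged shape is still described by three numbers, so no detail is lost and the storage needed does not grow.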
- Audio
- Audio is converted to digital data by
sampling the sound wave many times per
second.
- The higher the number of samples per second (measured in
Hz), the higher the quality of the sound and the truer it is to
the original.
- On playback, the computer reconstructs a wave by interpolating between the samples. This is usually
indistinguishable from the original for a human ear.
- Nyquist's Theorem states that to faithfully
recreate a sound, you must sample at a rate of at
least twice the highest frequency in that sound.
- Humans hear roughly 20Hz to 20kHz, so a sample rate of at least 40kHz is needed to cover that range.
- The resolution of sound is the number
of bits allocated to each sample. The more
bits allocated, the more distinct
pressure levels each sample can
represent (n bits give 2^n levels).
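
The ideas of sample rate, resolution and Nyquist's Theorem can be tied together in a small sketch that samples a sine wave. It uses only the standard `math` module; the function names and the choice of a sine wave are assumptions for illustration.

```python
import math


def sample_wave(frequency_hz, sample_rate_hz, duration_s, bits_per_sample):
    """Sample a sine wave and quantise each sample to 2**bits_per_sample levels."""
    levels = 2 ** bits_per_sample
    samples = []
    for n in range(int(sample_rate_hz * duration_s)):
        t = n / sample_rate_hz                                  # time of this sample
        amplitude = math.sin(2 * math.pi * frequency_hz * t)    # between -1.0 and 1.0
        samples.append(round((amplitude + 1) / 2 * (levels - 1)))
    return samples


def min_sample_rate(highest_frequency_hz):
    """Nyquist: sample at no less than twice the highest frequency."""
    return 2 * highest_frequency_hz


def audio_size_bits(sample_rate_hz, bits_per_sample, duration_s):
    """Uncompressed size of one channel of audio, in bits."""
    return sample_rate_hz * bits_per_sample * duration_s


print(min_sample_rate(20_000))              # 40000
print(audio_size_bits(44_100, 16, 60))      # 42336000
print(len(sample_wave(1, 8, 1, 4)))         # 8
```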
- Compression
- Lossy compression
- Some data is discarded to reduce file size.
However, the new file is usually indistinguishable
from the original.
- An example would be replacing the hundreds of different
shades of blue in an image of the sky, with just a few shades.
A human would probably not notice the difference.
- JPEG files use this.
- It can be useful when transmitting images across
the internet, as it reduces the bandwidth
significantly. This is especially important for
people on slow connections.
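
The "fewer shades of blue" idea can be sketched as a simple quantisation of 8-bit colour values. This is only an illustration of discarding detail; it is not how JPEG works internally, and `reduce_shades` is a made-up name.

```python
def reduce_shades(values, shades):
    """Map 8-bit values (0-255) onto `shades` evenly spaced bands. Lossy."""
    step = 256 // shades
    return [min(v // step * step + step // 2, 255) for v in values]


sky_blues = [200, 201, 203, 207, 210, 215]      # six slightly different shades
reduced = reduce_shades(sky_blues, 4)

print(reduced)                  # [224, 224, 224, 224, 224, 224]
print(len(set(sky_blues)))      # 6
print(len(set(reduced)))        # 1
```

The original six values cannot be recovered from the reduced list, which is what makes the method lossy.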
- Lossless compression
- Compression of data in such a way that the original
data can be recovered in its entirety after
decompressing, i.e. no data is discarded.
- One example is run length encoding: if an image contains many adjacent pixels of the same colour, the computer stores the run as a count and a colour (x × "blue") rather than storing "blue pixel", "blue pixel"... This doesn't lose accuracy; it just encodes repeated data more compactly.
- Another example is dictionary-based encoding, where commonly occurring strings in a text file are replaced by shorter tokens. A simple way to picture this is replacing every "and" with "&", which takes about 67% less space (1 character instead of 3). Again, there is no loss of accuracy, as the original data can be recovered exactly.
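
Run length encoding can be sketched as below, assuming a row of pixels is held as a list of colour names. The function names are illustrative.

```python
def rle_encode(pixels):
    """Collapse runs of identical values into (count, value) pairs."""
    runs = []
    for pixel in pixels:
        if runs and runs[-1][1] == pixel:
            runs[-1][0] += 1                # extend the current run
        else:
            runs.append([1, pixel])         # start a new run
    return [(count, value) for count, value in runs]


def rle_decode(runs):
    """Expand (count, value) pairs back into the original list."""
    pixels = []
    for count, value in runs:
        pixels.extend([value] * count)
    return pixels


row = ["blue"] * 5 + ["white"] * 2 + ["blue"]
encoded = rle_encode(row)
print(encoded)                          # [(5, 'blue'), (2, 'white'), (1, 'blue')]
print(rle_decode(encoded) == row)       # True
```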
- Dictionary-based encoding can also be used on binary data, if it is treated as a string of 1s and 0s.
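
A very simplified sketch of dictionary-based encoding follows, using the "and" to "&" substitution as its only dictionary entry. It assumes the tokens never appear in the original text; real schemes build and store their dictionaries more carefully. The function names are made up for this example.

```python
def dict_encode(text, dictionary):
    """Replace each phrase in the dictionary with its shorter token."""
    for phrase, token in dictionary.items():
        text = text.replace(phrase, token)
    return text


def dict_decode(text, dictionary):
    """Reverse the substitution (assumes tokens were absent from the original)."""
    for phrase, token in dictionary.items():
        text = text.replace(token, phrase)
    return text


dictionary = {"and": "&"}
original = "salt and pepper and sand"
encoded = dict_encode(original, dictionary)

print(encoded)                                      # salt & pepper & s&
print(len(original), len(encoded))                  # 24 18
print(dict_decode(encoded, dictionary) == original) # True
```

The same functions work on a string of 1s and 0s, for example with a dictionary entry that maps a frequently repeated bit pattern to a shorter token.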