Mobile CSP: It's All Bits
Nov 27, 2018
Okay. In this talk, we're going to try to illustrate with some examples. What it means to say that all digital data are bits. Let's start with the example log from the video that you just saw of the spaceship represented using a compression technique known as run length encoding or RLE for short. The way this works is, in each row of this 16 by 16 matrix, we're going to first say how many white bits there are, and then whenever we change to black bits, we'll list the number of black bits, so in row one there's 16 light bits and row two, there's 12 white bits, then we change three black bits, and then back to one white bit, and row three, we start with ten white bits, 5 black bits, one white bit. And so forth. Run length encoding RLE is an example of lossless compression. It's good for images with lots of white space, like this one. It's good for, for example, facts images. It's used in bitmaps. It's an example of lossless compression, which means that if you use it, you can get the original image back perfectly. No data are lost. It's good, therefore, for medical and archival images. Important images. It's contrasted with lossy data compression. Where some data may be lost during the compression process. Well, that's okay for lots of images that we use, such as camera images, where the lost data can't even be perceived by the human eye. Will jpeg is an example of lossy image compression. So, okay, we're doing compression, let's ask how much data is required to do this. So on the left we have our compressed data, and on the right we have our original image. The original image is 256 pixels, because it's a 16 by 16 array. And on the left we have 6 two numbers. How much compression is going on here? Well, it depends. Among other things, it depends on how many bits are we using for each pixel in the image. The one way we could do this image, it's black and white, after all, is we can use one bit per pixel. This would be what's called a monochrome image. And this was an old technique used in 1960s when our computer screens could really only handle two colors. If we do things this way, then 256 pixels would require 256 bits. And this is what it would look like when we transfer the whites and blacks to ones and zeros, respectively. So we let the bit one be white and the bit zero be black. So how much data does that technique require? All right, well our original image has 256 pixels, each one takes one bit. That's 256 bits. Our compressed image has 62 numbers. But to represent numbers like 16, we need at least 5 bits and really we want to be able to have bigger images than this. We may have runs of a hundred or so white or black pixels. Let's just say we need 8 bits per number. That would require 496 bits for this image, which is more than the original image itself. So that's not a very good compression result. Our file actually got bigger instead of smaller. But most images these days are not black and white. So let's look at another old technique in 1980 technique where we would use 8 bits per color. And colors are represented by mixing red, blue, and green hues to make the final color. If we have 8 bits per pixel, then one way to do this would be, let the first three bits be used for the red hue, the next three for the green, and the next two, for the blue. That should be two. If you actually put one's in the red and nothing, zeros in the green and ones in the blue, you'd end up with a sort of magenta color. But in this case, zero is still going to represent black and one. It's all ones in each bit are going to represent white. And in decimal, that's two 55. So to represent a colored image, however, we can't use the simple scheme we had above, and the reason is there aren't just two colors. So we need to actually say what color were changing to when we make a transition from white to black or white to red or red to green, and so forth. So we need two numbers per color, one for the color, and one for the run, how many. For example, in the first row we have 16 white pixels. So we encode it as two 55 16. The second row we have 12 whites followed by three blacks, followed by one white. So it's two 55, 12, zero, which is black, three blacks, and then one white. And so forth. So basically, we need twice as many numbers. Two numbers for each run now, rather than one number for each run as we had in the original scheme. Since the numbers are still 8 bits, that's 992 bits now. So okay, how much data does our new scheme require? Well, the original image now has 256 pixels still, but each pixel now is represented by 8 bits because it could be a color. Let's 2048 bits. I've compressed the image, has a 124 numbers, 8 bits per number, which is only 992 bits. So, yay, we've got some good compression here, better than 50%. The file went from over 2000 bits to less than a thousand bits. So that's a pretty good result. If we move the to one step further up to a sort modern color representation, which are 24 bit, we can get an even better result. So we still use the red green and blue mixture to create our colors. But now we use 24 bits per pixel. 8 reds, 8 greens, and a blue bits. That gives us a total of 16 million colors, two to the 24th, which is more than the eye can see, which is why some compression techniques are able to reduce the size of the file without really it being noticeable that some data are lost. In any case, now when we use run length encoding, we're going to need three numbers for each pixel because we need to represent the RGB and then the number of pixels in the run. Zero zero zero will be black, two 55 55 to 55 will be white. And this color here might be represented by two 55 zero 200. So let's see how this works. So we have four times as many numbers now. So we have 248 numbers, and they look like this, and the first row, we have white, so that's 55 55 to 5, and there's 16 of them. The second row, we have 12 whites, so 55 55 to 5 and 12, followed by three blacks fell by one white, and so forth. How do we do this time when we compress? Well, our original image is still 256 pixels, but now it's 24 bits per pixel, which comes to over 6000 bits. The compressed image is 248 numbers, 8 bits per number still, which gives us 1984 bits. Much less than 6000. So now, in fact, we're getting better than 70% reduction, which is very good. Okay, but to sum up, what does the amount of RLE compression depend on? Well, it depends as we've seen on the number of bits per pixel. But even more importantly, this is something that the previous examples didn't show. It depends on the number of different colors in the image. Look at these two images, regardless of how many bits we use, neither one of these images is going to lead to much compression. In this case, we would have one black followed by one rifle by one black four by one white and so on. This one, the color changes on every pixel. So we're going to have no compression. In fact, the file is going to get a lot bigger as we saw in the first example. So we've had a couple of lessons now on binary numbers. We've seen binary used to represent numbers. Represent colors. And to represent machine language, let's finish up with one more example. What about character data, the kind of data that goes into your text messages? Well, here too, we use binary numbers. And what I'm showing you here is the ASCII code, which goes back to the 1960s, and has been superseded now by better codes. This is a 7 bit code, so you can only represent a 128 characters in 7 bits you can't represent Chinese and Japanese and Hindi. So now we use a 16 bit code. For our example, this will suffice. So look at some of the common letters. This is the space character. It's the number 20 in hex. The symbol zero is number 30 in hex, the letter a is 41 in hex, the small layer 61 in hex. If we put this all into a table for the common characters, the space, the zero, the capital a, the lower case a, then in hex they represent these numbers in decimal they represent these numbers, and here are their binary codes. So here again, characters are represented by binary. So what does this bit sequence represent? Is it the number 65? Could be, is it the letter a? Could be, is that the color magenta? Could be. It depends. It actually depends on how that number is used in the computer. Is it in an image file? Is it in the text message, a spreadsheet? Is it part of a machine language code? As these examples, I hope show you it's all bits. And how you interpret a sequence of bits depends on how that sequence is used.