How does the CPU interact with the monitor? - binary

So the question is: How does a computer go from binary code representing the letter "g" to the correct combination of pixel illuminations?
Here is what I have managed to figure out so far. I understand how the CPU takes the input generated by the keyboard, stores it in RAM, and then retrieves it to do operations on using an instruction set. I also understand how it does these operations in detail. Then the CPU transmits the output of an operation, which for this example is an instruction that retrieves the "g" from its memory address and sends it to the monitor output.
Now my question is: does the CPU convert the letter "g" to a bitmap directly, does it use a GPU that is either built-in or separate, or does the monitor itself handle the conversion?
Also, is it possible to write your own code that interprets the binary and formats it for display?

In most systems the CPU doesn't speak with the monitor directly; it sends commands to a graphics card which in turn generates an electric signal that the monitor translates into a picture on the screen. There are many steps in this process and the processing model is system dependent.
From the software perspective, communication with the graphics card is made through a graphics card driver that translates your program's and the operating system's requests into something that the hardware on the card can understand.
There are different kinds of drivers; the simplest to explain is a text mode driver. In text mode the screen is composed of a number of cells, each of which can hold exactly one of a set of predefined characters. The driver includes a predefined bitmap font that describes what each character looks like by specifying which pixels are on and which are off. When a program requests a character to be printed on the screen, the driver looks it up in the font and tells the card to change the electric signal it's sending to the monitor so that the pixels on the screen reflect what's in the font.
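Here is a rough sketch of that font lookup in Python; the 8x8 "g" glyph below is hand-made for illustration, not taken from any real text-mode font:

    # Minimal sketch of a bitmap-font lookup; the glyph is an approximation, not a real font.
    FONT = {
        "g": [
            0b00000000,
            0b00111100,
            0b01000010,
            0b01000010,
            0b00111110,
            0b00000010,
            0b01000010,
            0b00111100,
        ],
    }

    def render(ch):
        """Print which pixels the driver would turn on ('#') or off ('.') in this cell."""
        for row_bits in FONT[ch]:
            print("".join("#" if (row_bits >> (7 - col)) & 1 else "." for col in range(8)))

    render("g")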
The text mode has limited use though. You get only one choice of font, a limited choice of colors, and you can't draw graphics like lines or circles: you're limited to characters. For high-quality graphics output a different driver is used.
Graphics cards typically include a memory buffer that contains the contents of the screen in a well-defined format, like "n bits per pixel, m pixels per row, ...". To draw something on the screen you just have to write to this memory buffer. To make that possible, the driver maps the buffer into the computer's memory so that the operating system and programs can use the buffer as if it were part of RAM. Programs can then directly write the pixels they want to show, and to put the letter g on the screen it's up to the application programmer to output pixels in a way that resembles that letter. Of course there are many libraries to help programmers do this, otherwise the current state of the graphical user interface would be even sorrier than it is.
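As a rough illustration of that buffer layout (the 800x600 size and 3 bytes per pixel are assumptions for the example, and a Python bytearray stands in for the memory the driver would map):

    # Sketch of writing a pixel into a linear frame buffer:
    # offset = (y * pixels_per_row + x) * bytes_per_pixel.
    WIDTH, HEIGHT = 800, 600          # assumed video mode
    BYTES_PER_PIXEL = 3               # assumed: one byte each for R, G, B
    framebuffer = bytearray(WIDTH * HEIGHT * BYTES_PER_PIXEL)

    def put_pixel(x, y, r, g, b):
        offset = (y * WIDTH + x) * BYTES_PER_PIXEL
        framebuffer[offset:offset + 3] = bytes([r, g, b])

    # Putting a "g" on screen means calling this for every "on" pixel of its glyph,
    # e.g. the glyph from the text-mode sketch above.
    put_pixel(10, 10, 255, 255, 255)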
Of course, this is a simplification of what actually goes on in a computer, and there are systems that don't work exactly like this; for example, some CPUs have an integrated graphics card, and some output devices are not based on drawing pixels but on plotting lines. But I hope this clears up the confusion a little.

See here http://en.m.wikipedia.org/wiki/Code_page_437
It describes the character based mechanism used to display characters on a VGA monitor in character mode.


Is there an actual minimum input image size for popular computer vision models? (E.g., vgg, resnet, etc.)

According to the documentation on pre-trained computer vision models for transfer learning (e.g., here), input images should come in "mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224".
However, when running transfer learning experiments on 3-channel images with height and width smaller than expected (e.g., smaller than 224), the networks generally run smoothly and often get decent performances.
Hence, it seems to me that the "minimum height and width" is somehow a convention and not a critical parameter. Am I missing something here?
There is a limitation on your input size which corresponds to the receptive field of the last convolutional layer of your network. Intuitively, you can observe the spatial dimensionality decreasing as you progress through the network. At least this is the case for feature-extractor CNNs, which aim at extracting feature embeddings from the input image; that is, most pre-trained models, such as vanilla VGG and ResNet networks, do not retain spatial dimensionality. If the input of a convolutional layer is smaller than the kernel size (even when padded), then you simply won't be able to perform the operation.
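A quick PyTorch illustration of that failure mode (the layer and input sizes are chosen purely for the example):

    # A 7x7 convolution cannot be applied to an unpadded 5x5 input: the operation fails.
    import torch
    import torch.nn as nn

    conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=7)   # no padding
    tiny = torch.randn(1, 3, 5, 5)                                   # smaller than the kernel

    try:
        conv(tiny)
    except RuntimeError as err:
        print("Convolution failed:", err)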
TLDR: adaptive pooling layer
For example, the standard resnet50 architecture accepts input only in the range 193-225, and this is due to the architecture and its downscaling layers (see below).
The only reason the default PyTorch model works is that it uses an adaptive pooling layer, which doesn't restrict the input size. So it's gonna work, but you should be ready for performance decay and other fun things :) (a quick check is sketched after the links below).
Hope you will find it useful:
https://discuss.pytorch.org/t/how-can-torchvison-models-deal-with-image-whose-size-is-not-224-224/51077/3
What is Adaptive average pooling and How does it work?
https://pytorch.org/docs/stable/generated/torch.nn.AdaptiveAvgPool2d.html
https://github.com/pytorch/vision/blob/c187c2b12d86c3909e59a40dbe49555d85b98703/torchvision/models/resnet.py#L118
https://github.com/pytorch/vision/blob/c187c2b12d86c3909e59a40dbe49555d85b98703/torchvision/models/resnet.py#L151
https://developpaper.com/pytorch-implementation-examples-of-resnet50-resnet101-and-resnet152/
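As a rough check of the adaptive-pooling point above (the 160x160 input size is arbitrary, and exact behaviour may vary with your torchvision version):

    # torchvision's resnet50 ends in nn.AdaptiveAvgPool2d((1, 1)), so inputs smaller than
    # the documented 224x224 still produce a valid output, even if accuracy may suffer.
    import torch
    from torchvision import models

    model = models.resnet50()            # architecture only; pretrained weights not needed here
    model.eval()

    with torch.no_grad():
        out = model(torch.randn(1, 3, 160, 160))

    print(type(model.avgpool).__name__)  # AdaptiveAvgPool2d
    print(out.shape)                     # torch.Size([1, 1000])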

Arrangement of logic gates on an IC?

I'm just curious how data is physically transferred through logic gates. For example, does the pixel on my monitor that is 684 pixels down and 327 pixels to the right have a specific set or path of transistors in the GPU that only care about populating that pixel with the correct color? Or is it more random?
Here is a cell library (en.wikipedia.org/wiki/Standard_cell) that is used when building a chip for a particular foundry, kind of like an instruction set used when compiling. The machine code for ARM is different from x86, but the same code can be compiled for either (if you have a compiler for that language for each, of course). So there is a list of standard functions (AND, OR, etc., plus more complicated ones) that you compile your Verilog/VHDL for. A particular cell is hardwired. There is an intimate relationship between the cell library, the foundry, and the process used (28nm, 22nm, 14nm, etc.). Basically you need to construct the chip one thin layer at a time using a photographic-like process; the specific semiconductors and other factors for one piece of equipment may vary from another, so the 28nm technology may be different from 14nm and you may need to construct an AND gate differently, thus a different cell library. And that doesn't necessarily mean there is only one AND gate cell for a particular process at a particular foundry; it is possible that more than one has been developed.
As far as how pixels and video work, there is a memory somewhere, generally on the video card itself. Depending on the screen size, number of colors, etc., that memory can be organized differently. Also, there may be multiple frames used to avoid flicker and provide a higher frame rate. So you might have one screen's image at address 0x000000 in this memory; the video card will extract the pixel data starting at this address while software is generating the next frame at, say, 0x100000.
Then, when it is time to switch frames based on the frame rate, the logic may switch to displaying the image at 0x100000 while software modifies 0x000000. So for a particular video mode, the first three bytes in the memory at some known offset could be the pixel data for the (0,0) coordinate pixel, then the next three for (1,0), and so on. For a width like 684 they could start the second line at offset 684*3, but they might instead start the second line at 0x400.
Whatever the case, for a particular mode, the offset within a frame of video memory will be the same for a particular pixel so long as the mode settings don't change. The video card, due to the rules of the interface used (VGA, HDMI, or interfaces specific to a phone LCD for example), has logic that reads that memory and generates the right pulses or analog signal levels for each pixel.
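Here is a sketch of that addressing scheme in Python, using the illustrative numbers from the answer (two frame base addresses, a 0x400 row stride, 3 bytes per pixel) rather than any specific card's layout:

    # Byte offset of pixel (x, y) within one frame of video memory.
    FRAME_ADDRESSES = [0x000000, 0x100000]    # front buffer and back buffer
    BYTES_PER_PIXEL = 3
    ROW_STRIDE = 0x400                        # rows padded to 0x400 bytes instead of 684*3

    def pixel_offset(frame, x, y):
        return FRAME_ADDRESSES[frame] + y * ROW_STRIDE + x * BYTES_PER_PIXEL

    # While the card scans out frame 0, software draws into frame 1, then they swap.
    print(hex(pixel_offset(1, 327, 684)))     # where the pixel 684 down, 327 across would live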

Multiple matrix inversions parallel on GPU [duplicate]

I'm attempting to optimise an application in realtime 3D modelling. The compute part of the application runs almost entirely on the GPU in CUDA. The application requires the solution of a small (6x6) double-precision symmetric positive definite linear system Ax = b 500+ times per second. Currently this is being done with an efficient CPU-based linear algebra library using Cholesky, but this necessitates copying the data between the GPU and the CPU hundreds of times per second, plus the overhead of kernel launches each time, etc.
How can I calculate the solution to the linear system on the GPU solely without having to take the data onto the CPU at all? I've read a little about the MAGMA library but it seems to use hybrid algorithms rather than GPU only algorithms.
I'm prepared for the fact that the solution of an individual linear system on the GPU is going to be a lot slower than with the existing CPU-based library, but I want to see if that can be made up for by removing the data communication between the host and device and the overhead of kernel launches hundreds of times per second. If there is no GPU-only LAPACK-like alternative out there, how would I go about implementing something to solve this particular 6x6 case on the GPU only? Could it be done without a huge time investment with GPU BLAS libraries, for example?
NVIDIA posted code for a batched Ax=b solver to the registered developer website last fall. This code works for generic matrices, and should work well enough for your needs provided you can expand the symmetric matrices to full matrices (that should not be an issue for a 6x6?). As the code performs pivoting, which is unnecessary for positive definite matrices, it is not optimal for your case, but you may be able to modify it for your purposes as the code is under a BSD license.
NVIDIA's standard developer website is experiencing some issues at the moment. Here is how you can download the batched solver code at this time:
(1) Go to http://www.nvidia.com/content/cuda/cuda-toolkit.html
(2) If you have an existing NVdeveloper account (e.g. via partners.nvidia.com) click on the green "Login to nvdeveloper" link on the right half of the screen. Otherwise click on "Join nvdeveloper" to apply for a new account; requests for new accounts are typically approved within one business day.
(3) Log in at the prompt with your email address and password
(4) There is a section on the right hand side titled "Newest Downloads". The fifth item from the top is "Batched Solver". Click on that and it will bring you to the download page for the code.
(5) Click on the "download" link, then click "Accept" to accept the license terms. Your download should start.
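The batched solver above is CUDA C, but as a rough modern illustration of the same idea (keep everything on the GPU and batch the small systems), here is a sketch using PyTorch's batched Cholesky routines; the batch size and data are made up:

    # Solve many small SPD systems Ax = b on the GPU in one batched call, no host round-trips.
    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    batch, n = 512, 6

    # Random SPD matrices A = M M^T + n*I and right-hand sides b (illustrative data).
    M = torch.randn(batch, n, n, dtype=torch.float64, device=device)
    A = M @ M.transpose(-1, -2) + n * torch.eye(n, dtype=torch.float64, device=device)
    b = torch.randn(batch, n, 1, dtype=torch.float64, device=device)

    L = torch.linalg.cholesky(A)      # batched Cholesky factorization on the device
    x = torch.cholesky_solve(b, L)    # batched forward/backward substitution

    print(x.shape)                    # torch.Size([512, 6, 1])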

AS3: How to access pixel data efficiently?

I'm working a game.
The game requires entities to analyse an image and head towards pixels with specific properties (high red channel, etc.)
I've looked into Pixel Bender, but this only seems useful for writing new colors to the image. At the moment, even at a low resolution (200x200), just one entity scanning the image slows it to 1-2 frames per second.
I'm embedding the image and instancing it as a Bitmap as a child of the stage. The 1-2 FPS situation occurs when using BitmapData.getPixel() (on each pixel) with a distance calculation beforehand.
I'm wondering if there's any way I can do this more efficiently... My first thought was some sort of spatial partitioning coupled with splitting the image up into many smaller pieces.
I also feel like Pixel Bender should be able to help somehow, however I've had little experience with it.
Cheers for any help.
Jonathan
Let us call the pixels which entities head towards "attractors" because they attract the entities.
You describe a low frame rate due to scanning for attractors. This indicates that you may possibly be scanning an image at every frame. You don't specify whether the image scanned is static or changes as frequently as, e.g., a video input. If the image is changing with every frame, so that you must re-calculate attractors somehow, then what you are attempting is real-time computer vision with the ABC Virtual Machine, please see below.
If you have an unchanging image, then the most important optimization you can make is to scan the image one time only, then save a summary (or "memoization") of the locations of the attractors. At each rendering frame, rather than scan the entire image, you can search the list or array of known attractors. When the user causes the image to change, you can recalculate from scratch, or update your calculations incrementally -- as you see fit.
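The question is about AS3, but the memoization idea itself is language-agnostic; here is a sketch in Python/NumPy, with the image data and the "high red channel" threshold made up for illustration:

    # Scan the image once, store the attractor coordinates, and have entities search
    # only that (much smaller) list on every frame.
    import numpy as np

    image = np.random.randint(0, 256, size=(200, 200, 3), dtype=np.uint8)  # stand-in for BitmapData
    RED_THRESHOLD = 200                                                    # assumed meaning of "high red"

    ys, xs = np.nonzero(image[:, :, 0] >= RED_THRESHOLD)
    attractors = list(zip(xs.tolist(), ys.tolist()))

    def nearest_attractor(x, y):
        return min(attractors, key=lambda p: (p[0] - x) ** 2 + (p[1] - y) ** 2)

    print(nearest_attractor(100, 100))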
If you are attempting to do real-time computer vision with ActionScript 3, I suggest you look at the new vector types of Flash 10.1 and also that you look into using either abcsx to write ABC assembly code, or use Adobe's Alchemy to compile C onto the Flash runtime. ABC is the byte code of Flash. In other words, reconsider the use of AS3 for real-time computer vision.
BitmapData has a getPixels method (notice it's plural). It returns a byte array of all the pixels, which can be iterated much faster than a for loop with a call to getPixel inside, nested inside another for loop. Unfortunately, byte arrays are, as their name implies, one-dimensional arrays of bytes, so iterating each pixel (4 bytes) requires using a for loop, not a foreach loop. You can access each pixel's color channel individually by default, which sounds like what you want (finding pixels with a "high red channel"), so you won't have to bitwise-and each pixel value to isolate a particular channel.
I read somewhere that getPixel is very slow, so that's where I figured you'd save the most. I could be wrong, so it'd be worth timing it.
I would say Heath Hunnicutt's answer is a good one. If the image doesn't change, just store all the color values in a Vector or ByteArray or whatever and use it as a lookup table so you don't need to call getPixel() every frame.

How does a stored image or video appear in binary on the hard drive?

In attempting to understand the concept of binary, my question is "How does a stored image or video look in binary on the hard drive?"
As for how it is physically stored, it depends on the technology of your storage device. For a hard disk drive you can read about it on Wikipedia.
The next layer is how the controller on the storage device sends the data to the motherboard.
Then how the motherboard sends the data to the operating system.
Then how the operating system stores the data on the disk (what file system it uses; NTFS is common in modern Windows installations.)
Finally, what you'll see when reading the data is groups of 8 bits (bytes), which are basically 8 on/off flags that together form 256 possible combinations. That is why most image formats store colors varying from 0-255 for each channel (red, green, blue). Most raw formats are stored linearly, so you can actually try reading them yourself. A raw image where the first pixel is red (assuming it stores the pixels left-to-right, top-to-bottom) would look like this in bits:
11111111 00000000 00000000
red green blue
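To see this concretely, here is a small Python sketch that writes and reads back a tiny headerless "raw" image; the 2x1 size and the file name are made up:

    # Two pixels, 3 bytes each: a pure red pixel followed by a pure blue pixel.
    pixels = bytes([
        0b11111111, 0b00000000, 0b00000000,   # pixel (0, 0): red
        0b00000000, 0b00000000, 0b11111111,   # pixel (1, 0): blue
    ])

    with open("tiny.raw", "wb") as f:
        f.write(pixels)

    with open("tiny.raw", "rb") as f:
        data = f.read()

    for i in range(0, len(data), 3):          # each group of 3 bytes is one pixel
        r, g, b = data[i], data[i + 1], data[i + 2]
        print(f"pixel {i // 3}: R={r:08b} G={g:08b} B={b:08b}")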
For more information, you'll have to be more specific.
Every file on disk is basically a number of bits in a row.
The difference between "binary" and "something else" (often called ASCII, or text, or...) is that non-binary is basically human readable when opened in a text editor. In other words: the bytes in the file map to human readable letter (and other) characters in some way a generic text editor knows how to handle.
So-called binary files can only be interpreted back into the data they actually contain when you know the format that was used to map the content (image, sound, movie, whatever) to a stream of zeros and ones. This mapping is called the file format and is usually indicated by part of the file name, in the form of an extension. You need a piece of software that knows the mapping and can interpret the row of bits back into the original content.
Mind you: this is usually only a hint. Renaming a JPEG image file to have a .mp3 extension doesn't change it into an audio file; it is still just an image file, containing the image (=dimensions of the image in pixels + the color values for each pixel, basically) encoded into a stream of zeros and ones in the way described in the JPEG file format encoding description.
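To illustrate that point: the real format lives in the bytes, so you can sniff it regardless of the extension. The signatures below are the well-known JPEG and PNG magic numbers; the file name is just an example:

    # Identify a file by its leading "magic" bytes instead of trusting its extension.
    SIGNATURES = {
        b"\xff\xd8\xff": "JPEG image",
        b"\x89PNG\r\n\x1a\n": "PNG image",
    }

    def sniff(path):
        with open(path, "rb") as f:
            head = f.read(16)
        for magic, kind in SIGNATURES.items():
            if head.startswith(magic):
                return kind
        return "unknown format"

    # A JPEG renamed to song.mp3 still begins with FF D8 FF, so it still sniffs as JPEG.
    print(sniff("song.mp3"))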
Check out the link: Binary File Format
The images are a sequential flow of colored dots... But it's not hardware dependent, i.e. your hard disk will store anything in whatever format your OS provides it... However, the OS maintains standards for saving file formats, otherwise a JPG image would not be a valid one across different platforms...
Similarly, videos are flows of images and audio data multiplexed into a sequential stream.
All data on commercial computer systems are stored in binary format (we'll ignore scientific studies into quantum and optical computing).
At the lowest level all files and processing by a computer are performed in binary. This is because our computing systems are powered by the flow of electrons. They either flow or don't. Electric current is on or off. 1 and 0.
The data stored on a hard disk is there due to pulsing of the hard disk write head coil which magnetises spots of hard disk material. These magnetised spots cause a current pulse in the read coil (in actual fact the read and write coils are the same) as the hard disk head passes over them. Hence the data is read as a stream of current pulses, 1s and 0s.
Now processors are built to accept and process a fixed number of binary "pulses" or data bits simultaneously (anything from 4 bits upwards). Hence a modern 64-bit PC can process 64 binary data bits, i.e. 64 1s and 0s, at any one time.
Now at a higher level, although all files are stored as binary and can be read in binary format, we help the processing of them by telling the processor what format to read them in. This is so that it processes the file data as small chunks, e.g. 8 bits or 1 byte for ASCII text.
The operating system provides the processor with a template for any given file. This is set up in an extension relation table. According to what the file extension is, the operating system will expect the data to be in a particular format and link it to code that can be used by the processor to interpret it. Hence changing a file name extension will confuse the system, as it won't interpret the data correctly. That's why changing the filename from *.jpg to *.exe won't show the image, as the processor has been told to expect executable code, which the data within the file clearly isn't.
So back to your original question the image within the jpeg file has been encoded as series of 1s and 0s in a specific order.
I'm not sure how exactly they are arranged, but as an example:
A picture was captured and stored as a bitmap at a resolution of 800 x 600 in 24-bit colour. The first pixel is stored as 3 bytes (8-bit binary values) representing a red, green and blue value. The value of each byte dictates the intensity of that colour, 0-255, with 0 being none at all and 255 being the highest value. Unsigned 255 in binary is 11111111; I won't confuse you with 2's complement for signed values. So the full picture will require a file of minimum 1,440,000 bytes, or about 1,406 kilobytes (a kilobyte being 1024 bytes).
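The arithmetic, as a quick check:

    # 800 x 600 pixels, 3 bytes per pixel (24-bit colour), no header or compression.
    width, height, bytes_per_pixel = 800, 600, 3
    raw_bytes = width * height * bytes_per_pixel
    print(raw_bytes)           # 1440000 bytes
    print(raw_bytes / 1024)    # 1406.25 kilobytes, taking 1 kilobyte = 1024 bytes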
Binary such as 000010101011010101101010101 would be stored on a hard drive as actual microscopic bumps and troughs, by changing the polarity of the metallic grains on the disk in specific regions. Binary is actually read from right to left, obviously the opposite way of how most people read text.
If your question is really "how does it look": See Figure 4 on this page; it shows high resolution measurements of a hard drive.
Although googletorp's answer does not look very helpful, it's not totally untrue. To store binary data, the only thing you need is the possibility to have two different states for each storage unit (be it an on/off switch, hole or no hole in a punchcard, or, as in the case of hard drives, the direction of ferromagnetic particles).
The Wikipedia page for the BMP File Format contains an example (including all hex values) of a 2x2 pixel bitmap image; it should be very good at explaining the basics of the binary representation of an image.
In general, if you're really curious how the binary of a file looks, you could always use a hex viewer and take a look yourself :) I normally use od on Linux to dump the binary information of a file. I'm sure you can google a good hex editor for Windows (or maybe someone can suggest one).
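If you'd rather do it programmatically, here is a minimal hex dump in Python, in the spirit of od; the default file name is just a placeholder:

    # Print the first 64 bytes of a file as hex, 16 bytes per line.
    import sys

    path = sys.argv[1] if len(sys.argv) > 1 else "picture.jpg"   # illustrative default
    with open(path, "rb") as f:
        data = f.read(64)

    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        print(f"{offset:08x}  " + " ".join(f"{b:02x}" for b in chunk))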
Headers? Every file created contains header information that is also stored as binary bits along with the data. The header bits of a file hold information such as the header length, file type, file location and length. Now, each application is designed to read certain file types. If the application tries to open a file on the hard disk whose header describes a file format that the application does not support, it fails to read the file. Thus a text file cannot be opened using a media player, because a media player expects a file whose header contains an audio file format's binary pattern. The same goes for picture files.