Cache Memory
Cache memory is a type of high-speed memory designed to speed up processing. Cache (pronounced "cash") is derived from the French word cacher, meaning to hide. A cache attempts to predict which information is about to be used, using an algorithm (logical formula) based on probabilities and proximity. Proximity means how close something is to something else; in this case, how close instructions or data bytes are to one another in memory.
Typically, a memory cache is a separate SRAM chip, running much faster than DRAM. Whichever instruction or data is most likely to be used next is stored in the cache. When the CPU looks for the next instruction, the chances are good that it will find it faster in small cache memory than in large main memory.
The Memory Hierarchy and Caches
A cache is like an expectation. If you expect to see a piece of information and it's right beside you, you can access that information much faster than if you had to go look for it. When you open a book and look at page 22, logic dictates that you'll look at the top of the page, then at the middle of the page, then at the bottom of the page, and then at the top of page 23. That's how computer caching operates.
Think of the example at the beginning of this chapter, the one where you're asked to repeat the words on the last page of the previous chapter. If you were expecting to be asked this question, you could cache the page by putting your finger in the book at that location. This would create a pointer to the page you were going to be asked to read, and you would be waiting for the probable next request: the instruction to read the page.
Because memory size is always increasing, more time is needed to decode increasingly wider addresses and to find stored information. Larger numbers mean we can have more addresses, but each storage cell in a register holds only one binary digit (bit) of an address. The larger the numbers, the wider the registers must be, and the wider the corresponding bus used to move a complete address.
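The link between memory size and address width can be made concrete with a short calculation. The sketch below assumes that every location gets its own address, numbered from zero; the function name address_bits is hypothetical and chosen only for illustration.

```python
def address_bits(num_addresses: int) -> int:
    """Return how many binary digits are needed to give each of
    num_addresses locations its own unique address."""
    if num_addresses < 1:
        raise ValueError("memory must hold at least one location")
    # Addresses run from 0 to num_addresses - 1, so the width is the
    # bit length of the largest address (with a minimum of 1 bit).
    return max(1, (num_addresses - 1).bit_length())


for label, size in [("64KB", 64 * 1024), ("1MB", 1024 ** 2), ("4GB", 4 * 1024 ** 3)]:
    print(f"{label}: {address_bits(size)} address bits")
```

Under these assumptions, 64KB needs 16 address bits, 1MB needs 20, and 4GB needs 32, which is why larger memories call for wider registers and buses.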
One solution is a memory hierarchy. "Hierarchy" is a fancy way of saying "the order of things; from top to bottom, fast to slow, or most important to least important." Memory hierarchy works because of the way that memory is stored in addresses. Going from fastest to slowest, the memory hierarchy is made up of registers, caches, main memory, and disks. When a memory reference is made, the processor looks in the memory at the top of the hierarchy (the fastest). If the data is there, the lookup succeeds (a hit). Otherwise, a so-called miss occurs, at which time the requested information must be brought up from a lower level of the hierarchy.
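The top-down search described above can be sketched in a few lines. This is only an illustration: the level names follow the text, but the access costs are invented relative units rather than measured timings, and the lookup function and contents dictionary are hypothetical.

```python
# Hypothetical levels, fastest first. The costs are made-up relative
# units for illustration, not real latencies.
LEVELS = [
    ("registers", 1),
    ("cache", 5),
    ("main memory", 50),
    ("disk", 5000),
]


def lookup(address, contents):
    """Search each level from fastest to slowest.

    contents maps a level name to the set of addresses it holds.
    Returns (level_found, total_cost); each level that misses still
    costs the time spent checking it.
    """
    total_cost = 0
    for name, cost in LEVELS:
        total_cost += cost
        if address in contents.get(name, set()):
            return name, total_cost  # hit at this level
    raise KeyError(f"address {address} not found at any level")


contents = {
    "registers": {0},
    "cache": {0, 1, 2},
    "main memory": set(range(100)),
    "disk": set(range(10_000)),
}
print(lookup(1, contents))    # found in the cache
print(lookup(500, contents))  # misses at every level until the disk
```

The second lookup pays for a miss at each faster level before the disk finally supplies the data, which is the cost a good cache is meant to avoid.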
A miss in the cache (that is, the desired data isn't in the cache memory) is called a cache miss. A miss in the main memory is called a page fault. When a miss occurs, the whole block of memory containing the requested missing information is brought in from a lower, slower hierarchical level. Eventually, the information is looked for on the hard disk, the slowest storage medium. If the current memory hierarchy level is full when a miss occurs, some existing blocks or pages must be removed for a new one to be brought in.
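The following minimal sketch shows a whole block being brought in on a miss and an existing block being removed when the cache is full. The text does not say which block is removed; this example assumes a least-recently-used (LRU) policy, which is one common choice. The class name BlockCache and its parameters are hypothetical.

```python
from collections import OrderedDict


class BlockCache:
    """A tiny cache that fetches whole blocks on a miss.

    Assumption: least-recently-used (LRU) eviction. The text only says
    that some block must be removed; LRU is one common policy.
    """

    def __init__(self, capacity_blocks, block_size):
        self.capacity_blocks = capacity_blocks
        self.block_size = block_size
        self.blocks = OrderedDict()  # block number -> placeholder data
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a hit, False on a miss."""
        block = address // self.block_size
        if block in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block)  # mark as most recently used
            return True
        self.misses += 1
        if len(self.blocks) >= self.capacity_blocks:
            self.blocks.popitem(last=False)  # evict least recently used
        # Bring in the whole block that contains the address.
        self.blocks[block] = f"block {block}"
        return False


cache = BlockCache(capacity_blocks=2, block_size=4)
for addr in [0, 1, 2, 3, 4, 8, 0]:
    print(addr, "hit" if cache.access(addr) else "miss")
```

Address 0 misses and pulls in its whole block, so addresses 1 through 3 then hit. Once two blocks are resident, the access to address 8 forces the oldest block out, and the final access to address 0 misses again.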
A hierarchical memory structure contains many levels of memory, usually defined by access speed. A small amount of very fast SRAM is usually installed right next to the CPU, matching up with the speed and memory bus of the CPU. As the distance from the CPU increases, the speed required of the memory decreases and its capacity generally increases.
CAUTION
SMARTDRV.SYS and SMARTDRV.EXE are DOS program utilities that provide disk caching. The efficiency of a cache is reported as its hit ratio. To send an efficiency report to the screen, issue the command SMARTDRV /S from a DOS command prompt.
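A hit ratio is simply the fraction of requests that the cache was able to satisfy. The sketch below shows the arithmetic; the function name hit_ratio is hypothetical, and the sample counts are invented for illustration rather than taken from any real report.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Return the fraction of accesses served from the cache."""
    total = hits + misses
    if total == 0:
        return 0.0  # no accesses yet, so there is nothing to report
    return hits / total


# Invented sample counts: 900 hits and 100 misses.
print(f"Hit ratio: {hit_ratio(900, 100):.0%}")
```

With these sample counts the ratio is 90 percent, meaning nine out of ten requests never had to go to the slower disk.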
L-1 and L-2 Cache Memory
The Intel 486 had a small, built-in 8KB cache on the CPU (16KB on later 486 models and on early Pentium chips) called a Level 1 (L-1), or primary cache. Another cache is the Level 2 (L-2), or secondary cache. The L-2 cache was generally (though not always, nowadays) a separate memory chip, one step slower than the L-1 cache in the memory hierarchy. L-2 cache almost always uses a dedicated memory bus, also known as a backside bus.
CAUTION
For the purposes of the exam, you should remember that the primary (L-1) cache is internal to the processor chip itself, and the secondary (L-2) cache is almost always external. Modern systems may have the L-1 and L-2 cache combined in an integrated package, but the exam differentiates an L-2 cache as being external. Up until the 486 family of chips, the CPU had no internal cache, so any external cache was designated as the "primary" memory cache. The 80486 introduced an internal L-1 cache (8KB on the original models, 16KB on later ones). The Pentium family added a 256KB or 512KB external, secondary L-2 cache.
Technology tends to move toward consolidating components, for speed and cost efficiencies. The Super I/O chips combined many of the original XT adapters (for example, keyboard, COM and LPT ports) into a single package, and central processors soon moved in the same direction. Although caches were originally placed outside the chip die, new developments paved the way to move them inside the chip. A die, sometimes called the chip package, is essentially the foundation for the multitude of circuit traces making up a microprocessor. Today, we have internal caches (inside the CPU housing) and external caches (outside the die).
Internal and External Memory
When we speak of a chip's internal bus, we mean that the bus is cast right on the manufacturing die, along with the chip. These chip packages are sort of like an extremely small motherboard, in that they're the foundation for the many transistors, diodes, buses, caches, and a host of other electrical components we call a central processor. Don't confuse a chip package with a chipset: the entire set of chips used on a motherboard to support a CPU. The CPU is a chip package.
NOTE
A manufacturing mask is the photographic blueprint for the given chip. It is used to etch the complex circuitry into a piece (chip) of silicon.
An external bus is the place where information moves out of the chip die to another destination (for example, an L-2 cache). Because bus width is typically measured by the number of bits that the bus can process at one time, a bus can be 8, 16, 32, or 64 bits wide. The number of data lines, each carrying a stream of bits, indicates the width of the bus. The bus width generally depends upon where the processor is directing information.
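To see why bus width matters, consider how many separate transfers are needed to move the same amount of data over buses of different widths. This is a simplified sketch that ignores timing, protocol overhead, and burst modes; the function name transfers_needed is hypothetical.

```python
def transfers_needed(payload_bits: int, bus_width_bits: int) -> int:
    """Return how many bus transfers it takes to move payload_bits
    when the bus carries bus_width_bits at a time."""
    if bus_width_bits <= 0:
        raise ValueError("bus width must be positive")
    if payload_bits < 0:
        raise ValueError("payload size cannot be negative")
    # Round up: a partly filled final transfer still uses the bus once.
    return (payload_bits + bus_width_bits - 1) // bus_width_bits


# Moving a 64-bit value across each of the bus widths named in the text.
for width in (8, 16, 32, 64):
    print(f"{width}-bit bus: {transfers_needed(64, width)} transfer(s)")
```

A 64-bit value takes eight transfers on an 8-bit bus but only one on a 64-bit bus, so doubling the number of data lines halves the number of trips.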