- Conceptual Overview
- Read-Only Memory (ROM)
- Random Access Memory (RAM)
- Cycles and Frequencies
- Summary—Basic Memory
- Cache Memory
- Memory Pages
- Rambus Memory (RDRAM)
- Double Data Rate SDRAM (DDR SDRAM)
- Video RAM (VRAM)
- Supplemental Information
- Packaging Modules
- Memory Diagnostics—Parity
- Exam Prep Questions
- Need to Know More?
Cycles and Frequencies
Any business can make more money by choosing different growth paths. One path is to move the product along faster. Speeding things up means that in a given time period, we can ship out more stuff (technical term). More stuff means more money, and the business grows. System performance is no different in a computer, and some improvements have come about by simply making things go faster.
Taking half as long to move a byte means moving twice as many bytes in a given time. If it takes 10 ticks to move one byte, then using 5 ticks to move the same byte means faster throughput. In other words, we can keep the byte the same size and move it in less time. This is essentially the underlying principle of multipliers and half ticks, and gave rise to double data rate (DDR) memory.
The power supply converts alternating current (AC) to direct current (DC), but that doesn't mean we never see alternating current again. Consider the oscillator, vibrating back and forth very quickly. How would that be possible, unless the associated electrical charges were moving back and forth? In fact, some components in a computer re-convert the incoming direct current to very low amperage alternating current. This isn't ordinary AC power, but means that small amounts of electricity reverse direction (polarity) for the purposes of timing and signaling.
Timing cycles are represented as waves moving up and down through a midpoint. The height between the top (peak) and bottom (trough) of a single wave cycle is called the amplitude. The number of waves being generated in a single second is called the frequency. We mentioned frequency in Chapter 2, in our discussion of bandwidth and broadband, but let's take a closer look at the specific concept.
Signal information moves at some frequency number. In Chapter 11, "Cables and Connectors," we reference various types of wire specifications, but as an example, Category 4 low-grade cable specifies a transmission rate of 20MHz. This means signals are passing through the wire at a cycle rate of twenty million waves per second. To produce both the timing of the cycles, and the characteristic up-reverse-down pattern of a wave, the electrical current must be moving in an alternating, cyclical flow pattern. (Think of your hand moving back and forth at one end of a tub of water. Although your arm is moving horizontally, the pulses of water are measured vertically.) The reversing directions of alternating current produce pulses of electricity that we see as waves on an oscilloscope.
Clock Speed and Megahertz
Clock speed is a frequency measurement, referring to cycles per second. It's usually written in megahertz (MHz) or gigahertz (GHz), where "mega" means one million cycles per second and "giga" means one billion cycles per second. One cycle per second is 1Hz. The motherboard oscillator, a sort of electronic clock, is configured through jumpers to produce a specific frequency. Once again, the number of waves passing a given point in one second, from the start to finish of each wave, is the frequency of the cycle.
A single clock tick (wave cycle) is measured from the point where a wave begins to move upwards, all the way down and through a midline, to the point where the wave moves back up and touches the midline again. Figure 3.2 is a sine wave, with smooth up and down movements very similar to waves you see in water. Waves come in various shapes, but the two we'll be concerned with are the sine wave and the square or pulse wave. When you look at any signal wave on an oscilloscope, you'll see that the name refers to its actual shape.
Figure 3.2 A sine wave.
When you hear a sine wave generated on a synthesizer oscillator (not so different from a computer oscillator), it sounds very smooth, like a flute. The many steps taking place as the wave moves up and down make it an analog signal. We'll discuss the difference between analog and digital in Chapter 6, "Basic Electronics." A pulse wave, on the other hand, sounds very harsh, like a motorcycle engine. Pulse waves have three components we're interested in: the midline, the peak, and the trough. Figure 3.3 shows a pulse or square wave.
Figure 3.3 A square wave.
Note that in Figure 3.3, we've highlighted the top and bottom of the wave with a heavier, thicker line. The actual wave is the same signal strength, but we want you to see how a pulse wave is much like the on/off concept of any binary system. When we speak of the leading edge of a wave, we can also speak of the immediate-on, top of a pulse. Likewise, the trailing edge can be the immediate-on, bottom of the wave. The top is one polarity and can take on a +1 setting, whereas the bottom is the reversed polarity and can take on a -1 setting. When the wave is at the immediate-off centerline, it has a 0 setting.
A computer timing oscillator is a piece of crystal. When it's connected to an electrical current, the crystal begins to vibrate, sending out very fast pulses of current. Pulses from the oscillator enter a frequency synthesizer, where the main frequency can be changed and directed to different components. The various fractional speeds are set with jumpers. Generally, the motherboard uses one fraction of the crystal's vibration, which constitutes the motherboard speed. The CPU uses a different fraction, usually faster than the motherboard.
This is highly simplified for the purpose of creating an example only.
Suppose the crystal vibrates at 660MHz, and the motherboard speed is one twentieth of that: 33MHz (660/20). If the CPU uses one fifth of the crystal's frequency, it runs at 132MHz (660/5), a figure usually quoted as 133MHz. That means the CPU is also running four times faster than the motherboard (33x4), making it a 4X processor.
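The division in this simplified example can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name `derived_clock` and the divisor values are our own illustration, not the way a real frequency synthesizer is programmed. Note that 660/5 works out to exactly 132; the familiar 133MHz figure comes from a bus that really runs at about 33.3MHz.

```python
def derived_clock(crystal_mhz, divisor):
    """Frequency (MHz) obtained by dividing the crystal frequency.

    Hypothetical helper for the simplified example in the text.
    """
    return crystal_mhz / divisor


crystal = 660.0                           # example crystal frequency, in MHz
motherboard = derived_clock(crystal, 20)  # one twentieth of the crystal
cpu = derived_clock(crystal, 5)           # one fifth of the crystal
multiplier = cpu / motherboard            # how many CPU ticks per board tick

print(motherboard, cpu, multiplier)       # 33.0 132.0 4.0
```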
The original XT machines used the same timing frequency for all the components on the motherboard. The 80486 introduced the concept of multipliers and frequency synthesizers. Nowadays, we see various frequencies being assigned to such things as the processor, the front-side bus, the memory bus, memory caches, the expansion bus, and so forth. The frequency assigned to the CPU's internal processing can also be sent to a high-speed L-1 cache.
When you hear that a memory controller is synchronized to a processor bus, it means a certain timing frequency is being derived from the main oscillator and "sent" to both devices.
Have you ever watched a group of children playing with a jump rope? Part of the game is to move the arc of the rope around a cylinder of space at some speed. At the high end of the arc, the rope passes over the jumper's head. At the low end of the arc, the jumper has to jump up and create a gap for the rope to pass between his feet and the ground. Each jump is like a 1-bit data transfer. The speed of the rope is the timing frequency.
Suppose we have two groups of children, where the pair on the left is twirling their rope in one direction. Their friends on the right are twirling a second rope, twice as fast, in the opposite direction. Let's not worry about the jumping kids, but instead, watch each rope in slow motion. Figure 3.4 shows the centers of each rope as they come close together. (Note that the following physics and math are incorrect, but we're using an example.)
The rope to the left, in Figure 3.4, is producing one cycle for every two cycles on the right. The CPU typically attaches a bit of information (represented by the cylinder on the rope) to each of its own cycles (the high end of the arc). Notice that a transfer to the memory controller takes place in one cycle, but the "rope" in the CPU passes by twice. For every two ticks taking place inside the CPU, the components working with the motherboard clock "hear" only a single tick. When the CPU attaches a bit to each wave (each turn of its rope), it has to wait until the memory cycle is ready for that second bit.
Figure 3.4 Relative cycle speeds and one missed transfer.
We can improve performance in the CPU by adding a small buffer, or cache, to the motherboard, close to the CPU. When the processor and memory controller's timing cycles are synchronized, the processor can offload a bit directly to memory. When their cycles are out of sync, the CPU can still move its second bit into the buffer and get on with something else. Figure 3.5 shows how a small buffer (the little guy in the middle), synchronized to the processor, can temporarily store bits until the memory controller is ready for a transfer.
Figure 3.5 CPU transfers buffered to a "holding tank."
The small buffer we're talking about is the L-1 cache. In CPU-memory transfers, a buffer is the same as a cache. A critical difference is that memory caches do not work with probabilities. Each bit going into the cache is absolutely going to be sent to memory. When the L-1 cache fills up, the L-2 cache takes the overflow. If both the L-1 and L-2 buffers become filled, a Level 3 cache might be helpful. The goal is to ensure that bits are transferred for every single processor clock tick. Understand that the CPU can also recall bits from memory and use the caches. However, at twice the speed of memory, the CPU more often is ready, willing, and able to take bits while the memory controller is still searching.
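To see why a buffer helps, and why a second level becomes useful once the first one fills, here is a toy simulation. It is a sketch under loud assumptions: the CPU tries to hand off one bit on every one of its ticks, memory accepts one bit every `ratio` CPU ticks, and the function name `simulate` and its parameters are invented for this example. Real caches are far more involved.

```python
def simulate(cpu_ticks, ratio, buffer_size):
    """Return (bits_issued, stalled_ticks) for a toy CPU/memory model.

    Assumptions (illustrative only): the CPU offers one bit per tick,
    memory accepts one bit every `ratio` CPU ticks, and a buffer holding
    up to `buffer_size` bits sits between the two.
    """
    buffered = 0
    issued = 0
    stalls = 0
    for tick in range(cpu_ticks):
        memory_ready = (tick % ratio == 0)
        if memory_ready and buffered > 0:
            buffered -= 1          # memory takes the oldest buffered bit
            memory_ready = False   # so it cannot also take a direct transfer
        if memory_ready:
            issued += 1            # direct CPU-to-memory transfer
        elif buffered < buffer_size:
            buffered += 1          # CPU parks the bit in the buffer
            issued += 1
        else:
            stalls += 1            # buffer full: the CPU has to wait
    return issued, stalls


print(simulate(8, 2, 0))  # no buffer:      (4, 4)
print(simulate(8, 2, 1))  # 1-bit buffer:   (5, 3)
print(simulate(8, 2, 4))  # 4-bit buffer:   (8, 0)
```

With no buffer, the CPU wastes every other tick. A small buffer removes some of the waiting, and a larger one removes all of it for this short burst, which is the same reasoning behind adding further cache levels.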
To bring this together: Imagine installing a Pentium processor on a 66MHz motherboard, using a 4X clock multiplier. Internally, the Pentium moves data at 264MHz (call it 266MHz). The memory controller runs at 66MHz (the speed of the motherboard). When the Pentium "hangs" a byte onto a clock tick, it may have to wait for up to four of its own cycles before the memory controller is ready to handle the transfer. This assumes we're using SDRAM and the controller "hears" the same ticks as the processor. Remember that DRAM had no timing link between the memory and the CPU, and each component had to wait until the other wasn't busy before it could accomplish a transfer.
The PC100 Standard
Motherboard speeds eventually increased to 100MHz, and CPU speeds went beyond 500MHz. The industry decided that SDRAM modules should be synchronized at 100MHz. Someone had to set the standards for the way memory modules were clocked, so Intel developed the PC100 standard as part of the overall PCI standard. The initial standard made sure that a 100MHz module was really capable of running, and really did run, at 100MHz. Naturally, this created headaches for memory manufacturing companies, but the standard helped in determining system performance.
At 100MHz and higher, timing is absolutely critical, and everything from the length of the signal traces to the construction of the memory chips themselves is a factor. The shorter the distance the signal needs to travel, the faster it runs. Non-compliant modules (those that didn't meet the PC100 specification) could significantly reduce the performance and reliability of the system. The standard caught on, although unscrupulous vendors would sometimes label 100MHz memory chips as PC100 compliant. (This didn't necessarily do any harm, but it did leave people who built their own systems wondering why their computer didn't run as they expected.)
We evaluate memory speed partly on the basis of the actual memory chips in a module, and partly on the underlying printed circuit board and buses. Because of the physics of electricity, a module designed with individual parts running at 100MHz rarely reaches that overall speed. It takes time for the signals to move through the wire, and the wire itself can slow things down. This led to ratings problems similar to those involving processors, which are covered in Chapter 5.
PC66 Versus PC100
PC100 SDRAM modules required 8 ns DRAM chips, capable of operating at 125MHz. The extra twenty-five megahertz provides a margin of error, to make sure that the overall module will be able to run at 100MHz. The standard also called for a correctly programmed EEPROM, on a properly designed circuit board.
SDRAM modules prior to the PC100 standard used either 83MHz chips (12 ns) or 100MHz chips at 10 ns. They ran on systems using only a 66MHz bus. It happens that these slightly slower 100MHz chips could produce a module that would operate reliably at about 83MHz. These slower SDRAM modules are now called PC66, to differentiate them from the PC100 specification (with 8 ns chips).
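The chip ratings quoted above follow from a single division: a cycle time in nanoseconds corresponds to 1,000 divided by that number, in megahertz. The short sketch below reproduces the three figures from the text. The helper name `rated_mhz` is ours, chosen for illustration.

```python
def rated_mhz(cycle_ns):
    """Convert a chip's cycle time in nanoseconds to a frequency in MHz."""
    return 1000.0 / cycle_ns


for ns in (8, 10, 12):
    print(ns, "ns ->", round(rated_mhz(ns)), "MHz")

# Headroom of an 8 ns chip over the 100MHz PC100 module speed.
margin = rated_mhz(8) - 100
print("margin:", margin, "MHz")   # margin: 25.0 MHz
```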
As memory speeds increased, the PC100 standard was upgraded to keep pace with new modules. Intel released a PC133 specification, synchronized to a 133MHz chipset, and so it went. PC800 RDRAM was released to coincide with Intel's 800 series chipset, running at 800MHz. These days, we see a PC1066 specification, designed for even higher-speed memory. As bus speeds and module designs change, so too does the specification.
MHz to Nanosecond
SDRAM modules are rated in megahertz, so as to link the chip speed to the bus speed. To find the speed in nanoseconds, divide 1 second (1 billion nanoseconds) by the output frequency of the chip. For example, a 67MHz chip runs at 67-million cycles per second. If you divide one billion by 67 million, the result is 14.9, which rounds off to 15 ns.
You can use this same formula to make a loose comparison between processor speeds and memory modules. For example, we can take a 900MHz Pentium and divide one billion by 900 million. The result shows a CPU with a cycle time of about 1.1 nanoseconds. Compare a 12 ns SDRAM chip with this CPU and you can see how much faster the processor is running (roughly eleven times). Even when we take ultra-fast SRAM running at 2 ns, we can see a significant difference in speed. Understand that nanosecond timing numbers don't tell the whole story when it comes to performance.
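The conversion is easy to try for yourself. The sketch below applies the one-billion-divided-by-frequency rule to the 67MHz chip and the 900MHz processor, then makes the same loose comparison with a 12 ns SDRAM chip. The function name `cycle_time_ns` is our own label; working the division through gives about 1.1 ns for the 900MHz part.

```python
def cycle_time_ns(freq_mhz):
    """Length of one clock cycle in nanoseconds for a frequency in MHz.

    One second is 1,000,000,000 ns and 1MHz is 1,000,000 cycles per
    second, so the division reduces to 1000 / MHz.
    """
    return 1000.0 / freq_mhz


chip = cycle_time_ns(67)     # about 14.9 ns, rounds to 15 ns
cpu = cycle_time_ns(900)     # about 1.11 ns

print(round(chip, 1))        # 14.9
print(round(cpu, 2))         # 1.11
print(round(12 / cpu, 1))    # 10.8 CPU cycles fit in one 12 ns memory cycle
```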
NRZI and DDR
Instructions can be designed to begin from exact points in a wave cycle. This is another way of improving processor performance. When the cycle is going up, we refer to an "up tick." When the cycle is going down, we refer to a "down tick." Using an analog sine wave, we can use the midpoint for a 0 setting, and some amount of signal (other than zero) as a 1. This is the concept of Non-Return-to-Zero-Inverted (NRZI) encoding. NRZI encoding means that any variation in the voltage level produces a change in state. A steady voltage represents a 1, and any change at all in voltage represents a 0.
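A short sketch may make the encoding rule concrete. It follows the convention stated above, where a steady level stands for a 1 and a change in level stands for a 0 (the convention USB uses; some other NRZI variants reverse the mapping). The function names, the 0/1 representation of the two voltage levels, and the starting level are assumptions made for this example.

```python
def nrzi_encode(bits, initial_level=0):
    """Turn data bits into line levels: a 0 toggles the level, a 1 holds it."""
    level = initial_level
    levels = []
    for bit in bits:
        if bit == 0:
            level ^= 1          # change of state signals a 0
        levels.append(level)    # unchanged level signals a 1
    return levels


def nrzi_decode(levels, initial_level=0):
    """Recover data bits by comparing each level with the previous one."""
    previous = initial_level
    bits = []
    for level in levels:
        bits.append(1 if level == previous else 0)
        previous = level
    return bits


data = [1, 0, 0, 1, 1, 0]
line = nrzi_encode(data)
print(line)                 # [0, 1, 0, 0, 0, 1]
print(nrzi_decode(line))    # [1, 0, 0, 1, 1, 0]
```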
If we use a pulse wave, we can clearly differentiate between a zero, and two additional numbers: the +1 and the -1. Pipelining and double-data rate (DDR) memory both take advantage of the square design of a pulse wave to send two signals per clock tick.
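The payoff of sending two signals per clock tick is simple to count. In the sketch below, the hypothetical helper `schedule` groups a stream of bits into the bundle carried by each tick: one bit per tick for ordinary single data rate, two for double data rate (think of one on the top of the pulse and one on the bottom). The bit pattern is arbitrary.

```python
def schedule(bits, transfers_per_tick):
    """Group a bit stream into the bundle of bits moved on each clock tick."""
    return [bits[i:i + transfers_per_tick]
            for i in range(0, len(bits), transfers_per_tick)]


stream = [1, 0, 1, 1, 0, 0, 1, 0]

single = schedule(stream, 1)   # one transfer per tick
double = schedule(stream, 2)   # two transfers per tick

print(len(single))   # 8 ticks
print(len(double))   # 4 ticks
print(double)        # [[1, 0], [1, 1], [0, 0], [1, 0]]
```

The clock itself has not sped up; the same eight bits simply ride on half as many ticks.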
Suppose your friend asks you to go buy a soda. Right as you turn to the door, he then asks you to hand him a pencil. Both instructions are mixed together, and you'll have to make a processing decision as to which takes priority. That decision moment slows down your overall actions. Essentially, when an asynchronous DRAM chip receives an instruction to store something, it runs into the same problem. It takes the first instruction, then processes it until it finishes. At that point, it "looks up," so to speak, to get another instruction.
On the other hand, when we're aware of our surroundings, someone can ask us to do something and also ask us to do something else when we're done. Although we're busy with the first task, we store the second task in a cache, knowing that as soon as we're finished with the first, we can begin the second. In other words, we don't have to be told a second time. This type of buffering (a very small memory cache) saves time for the person issuing the instructions.
Instead of being constantly interrupted, the clock and a pipeline cache in a memory module allow instructions to be organized one after the other. The process is called a pipeline system. Pipelining is a little like the way a CPU uses IRQ lines to make some sense out of the data stream chaos flying around in electronic space.
Using regular pipelining, a memory controller can write a bit to memory at the same time as it's "hearing" the next call from the CPU. Likewise, when it reads a bit from memory, it can have the next bit ready to go before the CPU asks for it. We'll encounter the concept of pipelining again when we take a look at Pentium processors in the next chapter. Dual-pipeline architecture means a chip can listen to a new instruction while it's completing the first instruction. This is yet another way to speed up any system or sub-system.
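We can put rough numbers on the time pipelining saves. The sketch below is idealized: it assumes every operation passes through the same number of stages (for instance, "listen" and "complete"), that each stage takes exactly one tick, and that nothing ever stalls. The function names are ours.

```python
def unpipelined_ticks(n_ops, stages):
    """Each operation must finish every stage before the next one starts."""
    return n_ops * stages


def pipelined_ticks(n_ops, stages):
    """A new operation enters the pipeline on every tick once it is primed."""
    if n_ops == 0:
        return 0
    return stages + (n_ops - 1)


print(unpipelined_ticks(10, 2))   # 20
print(pipelined_ticks(10, 2))     # 11
```

Under these assumptions, ten two-stage operations drop from 20 ticks to 11, because the "listening" for each new instruction overlaps with the completion of the previous one.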
Pseudo Static RAM (PSRAM)
Synchronous DRAM takes into account interrupt timing and the motherboard clock, and works just like SRAM. Capacitors allow for higher density (and lower cost) of a DRAM chip, as opposed to the more expensive transistors on SRAM chips. Most SDRAM controllers are built into the North Bridge of the motherboard chipset.
Another type of memory is called Pseudo Static RAM (PSRAM). This is DRAM with built-in refresh and address-control circuitry to make it behave similarly to SRAM. It combines the high density of DRAM capacitors with the speed of SRAM, but instead of having to rely on the CPU for an accuracy check of the original "send," the built-in circuitry "remembers" the data correctly.
Larger Bytes and Wider Buses
The other way to improve performance is by increasing the size of the information packet (combining bytes) and moving everything at the original time. This is similar to increasing bus widths. For example, a piece of paper four inches wide and eight inches long can hold some number of words. Handing you a piece of paper still takes only one movement. But if the paper changes to eight inches wide, it can store a lot more words. In the same single movement, you receive much more information.
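The wider piece of paper translates directly into numbers. In this sketch, the bus widths (32 and 64 bits) and the 66MHz clock are values picked for illustration, and the helper name `bandwidth_mb_per_s` is hypothetical. It assumes one transfer per clock tick and counts a megabyte as one million bytes.

```python
def bandwidth_mb_per_s(bus_width_bits, clock_mhz, transfers_per_tick=1):
    """Peak bytes moved per second, in millions, for a given bus.

    Idealized: every tick carries a full-width transfer.
    """
    bytes_per_transfer = bus_width_bits // 8
    return bytes_per_transfer * clock_mhz * transfers_per_tick


print(bandwidth_mb_per_s(32, 66))   # 264
print(bandwidth_mb_per_s(64, 66))   # 528
```

Doubling the width doubles the amount moved in each "single movement" without touching the clock.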