If you haven't read my other article on computer audio here's a brief, but super awesome, bullet point recap:
Audio in the computer is real-time from the point it is copied from the hard drive into the first piece of software for processing. It therefore does not follow the rules of buffered (error corrected) file transfer.
Every piece of software, process, conversion, filter, or volume change creates a new version of the file. This new version is created by voltage from the power supply. This means that hundreds of new versions of the file are created between the time the file leaves the hard drive to the point where it leaves the computer.
The element that determines the quality of this 'new version' of the file is how quickly the CPU can process the mathematics and how clean the power supply is that's creating it.
The CPU does not talk directly to the RAM. It talks to the memory controller, which is embedded in modern CPUs. Having the memory controller on the CPU is faster because it eliminates the memory controller hub as a middle man.
RAM is connected to the memory controller through a memory bus, which contains data, address, and control lines. The data bus carries the actual data being read (transferred from memory to controller) or written (transferred from the CPU, through the controller, to memory). The address bus tells the memory where to store or find this data. Control tells the memory whether to read or write. The control bus also contains the memory clock signal.
The memory controller determines the maximum capacity per memory module and the maximum memory speed. If you insert 2133 MHz modules in a system that can only handle 1333 MHz, then the memory will run at 1333 MHz. The clock rate determines the bandwidth; more on that later.
There are several types and layers of memory.
The first type of memory is cache memory and there are, currently, three levels, called Level one, two and three. The lower the level, the faster the speed and the smaller the capacity.
Cache memory is extremely fast, in theory clocking as fast as the CPU itself (which is a big deal and I'll explain that later). Cache memory pre-fetches and buffers data that the CPU requests for processing.
The most critical element is that all data that is to be processed by the CPU must first be loaded into memory. Level one cache loads almost instantly; when it fills up, level two pre-fetches data further, and so on. The faster this can occur and the larger the capacity, the more effectively the CPU can process algorithms in the computer.
DRAM And Timing
DDR3 memory is labeled with two numbers, as in DDR3-xxx and PC3-yyyy. The first number, xxx, indicates the maximum speed that the memory chips support. DDR3-1333 can work at up to 1333 million transfers per second, usually quoted as 1333 MHz. However, this number does not represent the real clock speed of the memory. The real clock speed is actually half that number. So the real clock speed of DDR3-1333 is actually about 666 MHz.
The second number indicates the maximum transfer rate that the memory reaches. So if you have PC3-10600 for DDR3-1333 memory you have transfer rates of 10,664 MB/s per memory stick. That means quad channel memory will, in theory, be four times this fast. Bandwidth is the maximum theoretical transfer rate and these numbers assume the CPU is transferring data at each clock cycle. For instance, on DDR3-1333 memory 1.3 billion transfers per second will occur. This never happens of course. Using dual, triple, or quad channel memory increases the number of data wires available in the memory bus.
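The arithmetic behind these labels can be sketched in a few lines of Python. This is only a rough illustration, assuming a 64-bit (8-byte) data bus per channel, which is what the 10,664 MB/s figure above implies. The function names are made up for this example.

```python
BYTES_PER_TRANSFER = 8  # assumption: 64-bit data bus per memory channel


def real_clock_mhz(ddr_rating):
    """Real clock speed is half the number on the DDR label."""
    return ddr_rating / 2


def peak_bandwidth_mb_s(ddr_rating, channels=1):
    """Theoretical peak transfer rate in MB/s, assuming every transfer slot is used."""
    return ddr_rating * BYTES_PER_TRANSFER * channels


print(real_clock_mhz(1333))          # 666.5
print(peak_bandwidth_mb_s(1333))     # 10664
print(peak_bandwidth_mb_s(1333, 4))  # 42656
```

The last line is the quad channel case: four times the single-stick number, in theory. Real transfer rates will be lower because the CPU does not move data on every cycle.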
It gets trickier. If you purchase DDR3-2133 memory it won't automatically run at 2133 MHz. Generally this memory will be limited to the maximum standard speed on the motherboard, in most cases 1333 MHz.
So why the extra clock speed? Generally the extra clock speed means the memory can be overclocked to those higher rates assuming the motherboard can handle it.
It is important to notice that these increases in performance are only on the memory subsystems. Doubling memory bandwidth performance does not translate to a computer that's twice as fast. However, you are much better off having four 4 GB memory sticks than one 16 GB memory stick because of the speed at which four "workers" can transfer data compared to a single larger worker.
Let's look into memory timing and latency.
Why is timing important if I have a fast clock rate? Two memory modules with the same maximum transfer rate may have different delays internally. Memory timings (4-4-4-8 or 9-9-9-24) indicate the number of clock cycles that it takes the memory to perform a certain operation.
The numbers represent certain operations: CL-tRCD-tRP-tRAS-CMD. Memory is internally organized as a matrix where data is stored at the intersections of various rows and columns.
CL: CAS Latency. The time it takes between the processor requesting data and the memory returning it.
tRCD: RAS to CAS delay. The time it takes between the activation of the row (RAS) and the column (CAS) where the data is stored in the matrix.
tRP: RAS Precharge. The time it takes to stop accessing one row of data and begin accessing another.
tRAS: Active to Precharge Delay. How long the memory has to wait until the next access to the memory can begin.
CMD: Command Rate. The time it takes between the memory chip having been activated and when the first command may be sent to the memory.
Lower timings may run less stably. When overclocking the memory to achieve greater transfer rates often these timings must be increased to maintain stability.
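To make the label format concrete, here is a small sketch that splits a timing label into the named fields described above. The class and function names are hypothetical, and it assumes the optional command rate may be written with a trailing "T" (as in "2T").

```python
from typing import NamedTuple, Optional


class Timings(NamedTuple):
    cl: int                    # CAS Latency
    trcd: int                  # RAS to CAS delay
    trp: int                   # RAS Precharge
    tras: int                  # Active to Precharge Delay
    cmd: Optional[int] = None  # Command Rate, often left off the label


def parse_timings(label):
    """Split a label such as '9-9-9-24' into named fields (values are clock cycles)."""
    parts = [int(p.strip().rstrip("Tt")) for p in label.split("-")]
    if len(parts) not in (4, 5):
        raise ValueError(f"expected 4 or 5 numbers, got {label!r}")
    return Timings(*parts)


print(parse_timings("9-9-9-24"))
print(parse_timings("7-7-7-24-2T").cmd)
```

Every value is a count of clock cycles, not a time, which is why the same label can mean different real delays at different clock rates.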
As clock rate increases the period decreases. That means more clock cycles can happen more quickly. There is then a very significant balance between clock frequency and CAS latency: how many cycles the memory must wait and how fast those cycles are. In some cases a slower clock with lower latency is faster than a faster clock with longer latency. Sometimes it's the opposite.
For DDR3-1333 memory, a true 1333 MHz clock would have a period of 0.75ns per cycle. However, the real clock is actually running at 666.66 MHz, so each cycle takes 1.5ns. This is calculated by T=1/f. With a CAS Latency of 7 (timings of, say, 7-7-7-24) the memory has a delay of 10.5ns before it starts delivering data. If the CAS Latency was 9 this would take 13.5ns.
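The same calculation can be written as a short Python sketch, which also makes it easy to compare modules. The function names are made up for this example, and the DDR3-1600 CL9 and DDR3-2133 CL11 entries are illustrative combinations rather than figures from a specific product.

```python
def clock_period_ns(real_clock_mhz):
    """Period of one clock cycle in nanoseconds, from T = 1/f."""
    return 1000.0 / real_clock_mhz


def cas_delay_ns(ddr_rating, cas_latency):
    """Delay before the first data arrives, in nanoseconds."""
    real_clock_mhz = ddr_rating / 2  # the real clock is half the DDR label
    return cas_latency * clock_period_ns(real_clock_mhz)


# (DDR rating, CAS latency); the last two pairs are hypothetical examples
for rating, cl in [(1333, 7), (1333, 9), (1600, 9), (2133, 11)]:
    print(f"DDR3-{rating} CL{cl}: {cas_delay_ns(rating, cl):.2f} ns")
```

The first two lines of output match the 10.5ns and 13.5ns worked out above. In this sketch DDR3-1333 at CL7 beats DDR3-1600 at CL9, while DDR3-1600 at CL9 beats DDR3-1333 at CL9, so either the clock or the latency can win depending on the pairing.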
Here's the cool part: modern DDR3 memory implements what's called burst mode. This is where data stored in the next address can exit the memory after one clock cycle (not 7 or 9). So the first data would be delayed by CL clock cycles, but the next data would be delivered right after the first came out from memory without waiting. A related trick is the reason memory is labeled as having twice its actual clock rate: it transfers data on both the rising and falling edges of the clock, so it moves two pieces of data per clock cycle.
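A simplified model shows how much burst mode saves. It assumes the first item waits the full CAS latency and each following item arrives one clock cycle later, exactly as described above, and it ignores every other timing. The function names, the 1.5ns period default (DDR3-1333), and the burst of eight items are example choices.

```python
def burst_read_ns(cas_latency, items, period_ns=1.5):
    """Time to read `items` consecutive addresses in burst mode.

    Simplified model: the first item waits `cas_latency` cycles and each
    following item arrives one cycle after the previous one.
    """
    if items < 1:
        raise ValueError("items must be at least 1")
    cycles = cas_latency + (items - 1)
    return cycles * period_ns


def separate_reads_ns(cas_latency, items, period_ns=1.5):
    """Time for the same reads if every item waited the full CAS latency."""
    if items < 1:
        raise ValueError("items must be at least 1")
    return cas_latency * items * period_ns


print(burst_read_ns(7, 8))      # 21.0
print(separate_reads_ns(7, 8))  # 84.0
```

With CL7 memory, eight consecutive items take 21ns in this model instead of 84ns, which is why sequential data is so much cheaper to fetch than scattered data.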
Each time the CPU must wait for data to be loaded latency occurs. Latency is a delay in processing. When there is a delay the real-time data is just sitting in limbo waiting to be loaded, picking up noise and introducing distortion.
For every process, service, application, etc that occurs the data must be re-loaded into memory for CPU Processing.
A single CPU acting at its clock frequency must access cache memory, which pre-fetches data and communicates with the memory controller. It can only process data as quickly as the bandwidth and latency of the RAM allow.
Using quad channel RAM increases the bandwidth, and bandwidth is the easy problem to solve. The CPU is then limited by the wait time that occurs while it waits for memory to send the data.
Even the best CPUs and RAM are not fast enough to process audio in real time. So how can we improve upon this performance?
What if 12 cores aren't enough to process complex audio data? There are now general-purpose Graphics Processing Units (GPUs) that are quite spectacular in their ability to process floating point mathematical data. Many of these GPU cards can process upwards of 4 teraflops (trillions of floating point operations per second). Compared to a standard CPU, which manages only around 10 gigaflops, this is a tremendous increase in threads available for computing. What does this do? It's like having a 1000 core CPU for processing.
Limitations And Misconception
What To Expect
If upgrading from a 3.2GHz Core i5 to a 3.2GHz i7 yields a 20% improvement in performance, then upgrading to dual 3.4GHz LGA2011 i7 CPUs yields a 100% improvement. Adding the GPU cards yields 200%.
But what does that mean sonically?
Many people have been afraid to go to computer audio because it, until now, wasn't as high performance as a reference level CD Transport or analog system. But I'm here to tell you now, that with the right configuration of power supplies and computing power you can not only achieve better sonic performance than any transport on the market, but far surpass it in terms of resolution and versatility.
See our article "Building the ultimate music server" for more information on how we here at Core Audio Technology configured our $65,000 super server to create a new reference for audio reproduction.