Balance of the force. The latest attempts to crack the nut of quantum chromodynamics.
Some problems are just too hard to solve without plugging numbers in and seeing what happens. One such problem is quantum chromodynamics (QCD), the theory of the strong force of nature. If QCD were a nut, you would need an enormous sledgehammer to crack it.
Which explains why, for the last five years, an international team of physicists, computer scientists and IBM engineers has been creating three examples of just such a computing sledgehammer: a 12,000-processor machine known as QCD On Chip, or QCDOC. The UK's QCDOC is hosted by the Edinburgh Parallel Computing Centre and has been operational since January 2005. When every processor is dedicated to one task, it can plough through 10 Tflops, or ten million million floating point operations every second.
The physicists will use those flops to try to solve the equations that arise in QCD, and to find quantities, such as the masses of certain elementary particles, that will allow them to test the limits of the Standard Model, the current theory of these particles. One of the many questions the new computers could help to answer is why there appears to be more matter than antimatter in the universe, and whether existing theories can explain it.
That would be a big step along the road to a single theory that can explain all of the four fundamental forces, the so-called Theory of Everything. So the stakes are high.
The strong force binds together quarks inside protons and neutrons. It's mediated by a particle called the gluon, which is analogous to the photon that mediates the electromagnetic force. The important difference between quantum chromodynamics and electromagnetism is that, unlike photons, gluon particles interact strongly with each other. The equations that pop out are fiendishly non-linear. Solving them for a chunk of continuous space-time is impossible if you take an analytical approach. The problem needs to be made amenable to numerical analysis, or 'discretized', which is what US physicist Ken Wilson did in 1974 by developing a theory called lattice QCD.
During the project, when a 128-node QCDOC had just completed testing, I spoke to Professor Robert Mawhinney, a physicist at Columbia University in New York.
"What these lattice QCD calculations are limited by is that for a problem of fixed size, we don't have enough computing horsepower to get to the end," says Mawhinney. "So we need to put even more processing power on a problem of fixed size. As we do that, the amount of the problem that ends up on any particular processor goes down.
"This issue is true for all parallel computing. The problem becomes how much power do you have on your local chip to do floating point calculations, versus how quickly can you send information to your neighbour to get the information you need from them?"
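The trade-off Mawhinney describes can be put in rough numbers. The sketch below is an illustration only, assuming a hypercubic lattice split evenly across nodes; the 32^4 lattice size and the function names are invented for the example. A node holding a block of side L in d dimensions computes on L^d sites but must exchange data across 2d faces of L^(d-1) sites each, so the communication-to-computation ratio is 2d/L and grows as the block shrinks.

```python
def surface_to_volume(local_side: int, dims: int = 4) -> float:
    """Face sites exchanged with neighbours per site computed locally.

    A block of side L in d dimensions holds L**d sites and has 2*d faces
    of L**(d-1) sites each, so the ratio works out to 2*d / L.
    """
    volume = local_side ** dims
    surface = 2 * dims * local_side ** (dims - 1)
    return surface / volume


def local_side(global_side: int, nodes_per_dim: int) -> int:
    """Side of the sub-lattice one node holds when each dimension is split evenly."""
    if global_side % nodes_per_dim != 0:
        raise ValueError("lattice side must divide evenly across nodes")
    return global_side // nodes_per_dim


if __name__ == "__main__":
    GLOBAL_SIDE = 32  # hypothetical 32^4 lattice, chosen only for illustration
    for nodes_per_dim in (1, 2, 4, 8):
        side = local_side(GLOBAL_SIDE, nodes_per_dim)
        nodes = nodes_per_dim ** 4
        print(f"{nodes:5d} nodes: local side {side:2d}, "
              f"surface/volume = {surface_to_volume(side):.2f}")
```

Going from one node to 4,096 nodes on this made-up lattice shrinks each block from side 32 to side 4, and the ratio rises from 0.25 to 2.0: each node then spends relatively more of its time talking to its neighbours than doing arithmetic.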
Accelerated interface. To address that issue in QCDOC, great emphasis was put on creating an accelerated interface between the PowerPC processor and embedded DRAM on each chip. As a result, there are three bidirectional ports to the memory: one for the processor local bus, one to a direct memory access unit (which is used to move data between the embedded DRAM and external memory), and one to the PowerPC.
Each port has two read buffers and two write buffers, each of 1,024 bits. Each pair of read buffers can pre-fetch data, up to its full capacity, ahead of the actual read request. These features mean two separate data streams can be efficiently read simultaneously, which is a common requirement in lattice QCD.
Another aspect of the design that speeds things up is a direct connection between the embedded DRAM controller and the processor data bus. The controller operates at the full processor speed, which is three times faster than usual. Because the crucial variables can often be contained entirely in the on-chip DRAM, this connection, along with the other features, means the whole calculation is accelerated.
With 12,288 of these specialist chips available (see 'Logical progress'), the next problem is how to arrange them in a network that makes them effective for QCD calculations. "Lattice QCD is naturally a four-dimensional problem, because space and time are treated on an equal footing - that's just relativity," says Professor Richard Kenway, director of EPCC. "As it turns out, for a technical reason, it's good to formulate the problem in five dimensions. That's not necessary, it's just a rather modern way of approaching the simulations."
In fact, the topology chosen is a six-dimensional mesh. The spare dimension means the machine can be reconfigured in software for different tasks, without engineers having to run about pulling out and plugging in wires. The 6D mesh arrangement means that a special serial communications unit (SCU) is used to control data input and output to each of a given node's 12 nearest neighbors (3D equals six nearest neighbors, add two for each extra dimension).
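The neighbour count is easy to check in a few lines. The sketch below is not QCDOC's addressing scheme: it assumes each node is labelled by six integer coordinates and that the mesh wraps around at its edges, and the mesh shape used is invented (it merely multiplies out to 12,288).

```python
from typing import List, Tuple

Coord = Tuple[int, ...]


def nearest_neighbours(node: Coord, shape: Coord) -> List[Coord]:
    """Return the 2*d nearest neighbours of a node in a d-dimensional mesh.

    Assumes periodic wraparound, so a node on the edge of the mesh is
    linked to the node on the opposite edge.
    """
    neighbours = []
    for axis, extent in enumerate(shape):
        for step in (-1, +1):
            moved = list(node)
            moved[axis] = (node[axis] + step) % extent
            neighbours.append(tuple(moved))
    return neighbours


if __name__ == "__main__":
    # Hypothetical 6D shape: 8 * 4 * 4 * 4 * 4 * 6 = 12,288 nodes.
    shape = (8, 4, 4, 4, 4, 6)
    links = nearest_neighbours((0, 0, 0, 0, 0, 0), shape)
    print(len(links), "neighbours in 6D")
    print(len(nearest_neighbours((0, 0, 0), (4, 4, 4))), "neighbours in 3D")
```

Each dimension contributes one neighbour in each direction, which is where the count of two per extra dimension comes from.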
The conundrum continues. Now, a key part of a lattice QCD calculation is global summation. Sending the result of these sums to each of the 12,288 nodes in a complex 6D mesh is potentially slow, so the SCU includes a small amount of logic dedicated to the task. It connects the 12 send and 12 receive units, and allows a send-to-all message to be sent through the network in one operation.
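To see what a global sum has to deliver, here is a rough software illustration in which every node ends up holding the same total. It sums along one mesh axis at a time, a generic dimension-by-dimension reduction; it does not model the SCU's hardware logic, and the function name and tiny 2 x 3 x 2 mesh are made up for the example.

```python
from itertools import product
from typing import Dict, Tuple

Coord = Tuple[int, ...]


def global_sum(values: Dict[Coord, float], shape: Coord) -> Dict[Coord, float]:
    """Give every node the sum of all nodes' values, one mesh axis at a time.

    After the pass along axis 0, each node holds the total of its line
    along that axis; after the pass along the last axis, each node holds
    the total over the whole mesh.
    """
    current = dict(values)
    for axis, extent in enumerate(shape):
        reduced = {}
        for node in current:
            total = 0.0
            for k in range(extent):
                # Peer nodes differ from this node only in the current axis.
                peer = node[:axis] + (k,) + node[axis + 1:]
                total += current[peer]
            reduced[node] = total
        current = reduced
    return current


if __name__ == "__main__":
    shape = (2, 3, 2)  # toy mesh; a real machine would have far more nodes
    nodes = list(product(*(range(n) for n in shape)))
    values = {node: float(i) for i, node in enumerate(nodes)}
    totals = global_sum(values, shape)
    print(set(totals.values()))
```

The number of passes equals the number of mesh dimensions rather than the number of nodes, which hints at why the shape of the network matters for this step.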
"The problem is very intense computing with very little memory requirements," Professor Kenway continues. "The whole issue is how to feed the data through the CPU fast enough, really. It's a local problem, so you just need fast communications to your local memory and to the local memories of your neighbors.
"It's just the balance of the amount of arithmetic you have to do, per unit of data that you have to ship from memory and ship back to memory. So it's purely a feature of QCD."
It's those features of QCD that have led to the features of QCDOC. As it were, the nature of the nut has dictated the shape of the sledgehammer.
Easy as QCD. Lattice QCD imagines a regular grid structure through the four dimensions of space-time, and assumes the quantum fields of the various particles have values only at its vertices (see the above image). With that model in place, Monte Carlo techniques are used to simulate a series of values at these lattice points.
Averaging the results over several hundred random configurations reduces the statistical error to a low percentage. Apart from some considerations such as the size of the lattice and its fineness, that's the essence of the problem to be solved. So how do you choose a computer specification and architecture that can efficiently tackle it?
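The recipe can be sketched with a toy that is far simpler than QCD. The example below assumes a single free scalar field on a small periodic four-dimensional lattice, not quarks and gluons, and uses the Metropolis algorithm as one common Monte Carlo technique. All names and parameter values are illustrative, and the simple error estimate ignores correlations between successive configurations.

```python
import math
import random
from itertools import product


def build_neighbours(shape):
    """Map every site to its 2*d nearest neighbours, with periodic wraparound."""
    table = {}
    for site in product(*(range(n) for n in shape)):
        nbrs = []
        for axis, extent in enumerate(shape):
            for step in (-1, 1):
                moved = list(site)
                moved[axis] = (site[axis] + step) % extent
                nbrs.append(tuple(moved))
        table[site] = nbrs
    return table


def local_action(value, site, field, nbrs, mass_sq):
    """Part of the action that depends on the field value at one site."""
    kinetic = sum((field[n] - value) ** 2 for n in nbrs[site])
    return 0.5 * kinetic + 0.5 * mass_sq * value * value


def metropolis_sweep(field, nbrs, mass_sq, step_size, rng):
    """Propose a random change at every site; accept with min(1, exp(-dS))."""
    for site in field:
        old = field[site]
        new = old + rng.uniform(-step_size, step_size)
        delta = (local_action(new, site, field, nbrs, mass_sq)
                 - local_action(old, site, field, nbrs, mass_sq))
        if delta <= 0.0 or rng.random() < math.exp(-delta):
            field[site] = new


def sample_phi_squared(shape=(4, 4, 4, 4), mass_sq=1.0, n_configs=100,
                       sweeps_between=2, burn_in=20, step_size=1.0, seed=1):
    """Generate configurations and measure the lattice average of phi^2 on each."""
    rng = random.Random(seed)
    nbrs = build_neighbours(shape)
    field = {site: 0.0 for site in nbrs}
    for _ in range(burn_in):  # discard early sweeps while the field settles
        metropolis_sweep(field, nbrs, mass_sq, step_size, rng)
    measurements = []
    for _ in range(n_configs):
        for _ in range(sweeps_between):
            metropolis_sweep(field, nbrs, mass_sq, step_size, rng)
        measurements.append(sum(v * v for v in field.values()) / len(field))
    return measurements


def mean_and_error(samples):
    """Sample mean and naive standard error (treats samples as independent)."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return mean, math.sqrt(variance / n)


if __name__ == "__main__":
    data = sample_phi_squared()
    mean, err = mean_and_error(data)
    print(f"<phi^2> = {mean:.4f} +/- {err:.4f} from {len(data)} configurations")
```

For independent samples the standard error falls as one over the square root of the number of configurations, so averaging over several hundred of them buys roughly a tenfold improvement over a handful.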
Logical progress. Field programmable gate arrays are the future.
One of the most recent developments in high performance computing is the use of chips called field programmable gate arrays, more commonly known as FPGAs. These reprogrammable logic devices were until recently too expensive for use in mass market products, and were used primarily for prototyping chip designs that were then migrated onto an application specific IC (ASIC). However, FPGAs are now arguably the largest production chips in the world, and are used wherever their massively parallel nature and distributed SRAM can help crack complex computational problems.
"Over the last 18 months, FPGAs have gotten big enough to do very complicated things," says Mark Parsons, commercial director of the Edinburgh Parallel Computing Centre (EPCC). "For example, you can now have, say, 20 floating point units on them, along with a PowerPC. So they can handle complex numerical algorithms." Among an FPGA's attributes are huge I/O bandwidth, which equates to feeding data in along a motorway rather than a trunk road, and distributed memory, so calls to off-chip caches aren't necessary, which can dramatically speed up calculations.
In May, a £3.6 million initiative to help supercomputer users make use of FPGAs was launched in Scotland. The FPGA High Performance Computing Alliance involves EPCC, three Scottish companies that specialize in reconfigurable computing, and FPGA firm Xilinx. It will try to standardize the approach taken to using the devices. "We do have the aspirations to go [...] future things for QCD," says Richard Kenway, director of EPCC. "It may be that the FPGA [...] into that at some point."
Source: PC Plus magazine.