Argonne’s new supercomputer won’t be in full production until 2013, but it represents such a leap forward that just the first two prototype racks already rank among the top 100 fastest computers in the world.
The computer, named “Mira,” is an IBM Blue Gene/Q, the third generation in a line of supercomputers that has topped the performance charts. Argonne and its sister national lab Lawrence Livermore helped design the computer.
Mira will be a 10-petaflop machine, capable of carrying out 10 quadrillion calculations per second. If you recruited every single person on earth to solve one calculation per second, around the clock, it would take them more than two weeks to do the work that Mira will do in one second.
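The comparison above can be checked with a few lines of arithmetic. The sketch below assumes a world population of roughly 7 billion, a figure the article does not state; the function name is only illustrative.

```python
# Back-of-the-envelope check of the "every person on earth" comparison.
# WORLD_POPULATION is an assumption (roughly 7 billion); the article gives no figure.
MIRA_OPS_PER_SECOND = 10 * 10**15   # 10 petaflops = 10 quadrillion calculations per second
WORLD_POPULATION = 7 * 10**9        # assumed
SECONDS_PER_DAY = 24 * 60 * 60


def days_for_humanity(ops: int, people: int) -> float:
    """Days needed for `people`, each doing one calculation per second, to finish `ops`."""
    seconds = ops / people
    return seconds / SECONDS_PER_DAY


days = days_for_humanity(MIRA_OPS_PER_SECOND, WORLD_POPULATION)
print(f"{days:.1f} days of round-the-clock work to match one second of Mira")
```

Under that population assumption the answer comes out to a little over 16 days, which is consistent with "more than two weeks."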
Any researcher with a question can apply for time on the supercomputer, typically in chunks of millions of processor-hours, to run programs for their experiments. This adds up to billions of hours of computing time, awarded through several U.S. Department of Energy programs; see sidebar for details.
Billions of Hours? Time on the supercomputer is measured in processor-hours, or the work done by one processor in one hour. Since Mira has almost 800,000 processors, that adds up quickly.
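As a rough illustration of how quickly processor-hours add up, the sketch below multiplies Mira's core count (786,432, the figure given elsewhere in this article) by the hours in a year. It assumes every core is busy every hour of a 365-day year, which is an idealized upper bound and not a statement about actual allocations.

```python
# Idealized upper bound on processor-hours per year.
# Assumes 100% availability and a 365-day year; real-world totals will be lower.
MIRA_CORES = 786_432
HOURS_PER_YEAR = 24 * 365


def max_core_hours_per_year(cores: int, hours: int = HOURS_PER_YEAR) -> int:
    """One processor-hour is the work of one core for one hour."""
    return cores * hours


total = max_core_hours_per_year(MIRA_CORES)
print(f"{total / 1e9:.2f} billion processor-hours per year")
```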
Scientists will use Mira to study exploding stars, nuclear energy, climate change, and jet engines, to name just a few projects.
Beyond providing hours of computing time, Mira itself is a stepping stone toward the next great goal of supercomputing: exascale speed, where computers will calculate quintillions of floating point operations per second. That’s a thousand times faster than today’s top machines.
What does a supercomputer need to be the best?
√ It’s Fast
Mira will be 20 times faster than Intrepid, its predecessor at Argonne. It will provide billions more processor-hours per year to the scientists, engineers, and researchers who use it to run complex simulations of everything from nuclear reactors to blood vessels.
The first supercomputers consisted of a handful of processors and memory units that were much faster than mainstream computers. As demand for ever-faster computers grew, and microprocessors got cheaper, the industry began instead combining hundreds, and then thousands, of processors, or “cores,” into one system. Argonne’s current supercomputer, Intrepid, has 163,840 cores; Mira will have 786,432. (A typical laptop has two cores.)
“Each individual core in the Blue Gene architecture is actually slightly less powerful than the ones found in a typical home desktop computer,” said Michael Papka, who heads the Argonne Leadership Computing Facility. “But because faster processors generate more heat, at the petaflops level it’s more efficient to run lots of lower-power processors.”
Mira’s sister machine, a Blue Gene/Q destined for Lawrence Livermore National Laboratory, will run at 20 petaflops. It could be the fastest supercomputer in the world once it’s built—if China and Japan don’t leapfrog us first.
√ It’s Green
Mira is expected to be the greenest supercomputer in the world, topping supercomputing’s Green 500 list even before it’s installed. It’s five times as energy-efficient as its predecessor.
It has to be: “If you took the current Blue Gene supercomputer and simply added processors to get it to exascale speed, which is 2,000 times faster than today’s machine, you’d need a couple of new power plants just to supply electricity to it,” said Pete Beckman, who heads Argonne’s exascale initiative. Each new computer has to break new ground for efficiency just to keep up.
Mira’s predecessor Intrepid, itself a revolutionary energy-saver when it was built, uses chilled water to cool air that is circulated around the processors. In Mira, copper tubes will pipe cold water directly alongside the chips, which saves power by eliminating the extra cooling step.
Mira also fits more cores onto a single chip. This arrangement reduces the distance that data has to travel between the chips, which speeds communication between cores and saves the energy lost when transporting data across long distances.
√ It’s Fault-Tolerant
“Even supercomputers can crash, which forces scientists to recompute results. But Mira is designed to run reliably for long periods.” — Argonne computer scientist Pete Beckman
“Any complex system designed by humans must anticipate breaking down,” Beckman said. “And since these supercomputers have many, many cores, there are many, many spots for something to go wrong.”
One or two misbehaving cores can halt the entire task, Beckman said, because most codes are designed to be interdependent: each core gets a task and they feed the results to one another. If one goes down, the rest of the operation gets hung up while waiting for the missing answer. Because a supercomputer is so complex, designing one to crash less often, and to wreak less havoc when it does, is a delicate art.
There are several ways to make a computer more resilient; for example, reducing the number of parts. “Every time you’ve got a connector, you’ve got a potential for failure,” Papka explained. Mira’s memory chips and CPUs are soldered directly onto cards.
Another technique saves wear on data storage. As the computer works, it’s constantly saving, erasing, and rewriting data, which is stored in separate memory stacks. Over time, the constant rewriting wears out the chips. One way to fight this is “wear leveling”: writing software that moves the storage location around the card to distribute the wear equally. It’s a bit like rotating tires on a car.
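The idea of wear leveling can be sketched in a few lines. The toy allocator below is an illustration only, not Mira's actual mechanism: the class name, the block counts, and the "always pick the least-written block" policy are all assumptions chosen to show the principle.

```python
import heapq


class WearLeveler:
    """Toy allocator: every write goes to the block with the fewest writes so far."""

    def __init__(self, num_blocks: int) -> None:
        # Min-heap of (write_count, block_id); the least-worn block is always on top.
        self._heap = [(0, block) for block in range(num_blocks)]
        heapq.heapify(self._heap)

    def write(self) -> int:
        """Choose the least-worn block, record one more write on it, return its id."""
        count, block = heapq.heappop(self._heap)
        heapq.heappush(self._heap, (count + 1, block))
        return block

    def wear(self) -> dict:
        """Map each block id to the number of writes it has absorbed."""
        return {block: count for count, block in self._heap}


leveler = WearLeveler(num_blocks=4)   # hypothetical card with four storage blocks
for _ in range(10):
    leveler.write()
print(leveler.wear())
```

Because each write lands on the least-used block, no block ever gets more than one write ahead of any other, much as rotating tires keeps any single tire from wearing out first.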
Argonne’s old Blue Gene/P system was designed to go five to six days without error. In fact, the computer has consistently outperformed that hope: it often runs for weeks without crashing. Mira is predicted to be stable enough to run without crashing for up to 10 days, even though it has 623,000 more cores than Intrepid. Experts say that fault tolerance will become even more crucial as computers get closer and closer to exascale speeds.
√ It’s Easy to Use
Though some of Mira’s architecture is different, it grew from the same DNA as its predecessor at Argonne. For programming purposes, the two machines are very similar; Mira is simply much faster.
“Users will be able to run the same codes on Mira as they did on its predecessor,” Papka said. “They will be able to jump straight into running their scientific programs without having to rewrite their codes from scratch.”
For the problems that do pop up, the Argonne Leadership Computing Facility, which runs the supercomputer, has a crack team to help users adapt their codes and fix unexpected bugs.
Solving Problems with Supercomputers
Mira’s numbers are impressive, but why do we need supercomputers in the first place?
“Supercomputers help the United States economy stay ahead of our competitors,” Beckman said. These days, more and more companies rely on modeling and simulation.
Take planes. “Say that an aerospace company wants to build a new airplane,” Beckman said. “In the old days, they’d have to physically build models of all the wing types they were considering and test them all in wind tunnels. Today, they can run a very detailed simulation and virtually test the air flow in hundreds of models before building just a few physical versions, which saves a lot of money and time.”
The oil and gas industries are also investing in computation. Plugging geologic models into computers creates maps that lead companies to the right places to drill, reducing the environmental and financial costs of drilling unnecessary wells.
That virtual design space is useful for more than consumer products and drilling. It’s also key to studying climate change.
The world’s climate is an extraordinarily complicated affair. It sews together temperature, cloud cover, vegetation, rainfall, wind, geography, ocean currents, even volcanic eruptions, all over the nearly 200 million square miles of Earth’s surface area.
Even today’s most accurate climate models use just a single data point to represent thousands of square miles. An area the size of Lake Michigan is represented by perhaps two data points, which can’t possibly convey the rich interplay of lake, dunes, forests, swamps and farmland—not to mention the sprawling urban heat island of Chicago—in enough detail to make accurate predictions. More powerful computers allow scientists to incorporate more and more data to create higher-resolution models.
“If you have an iPhone, you’re carrying a computer in your pocket that is far more sophisticated and powerful than the first supercomputer we built at Argonne back in 1953.” – Argonne computer scientist Charlie Catlett
With Mira, scientists will also be able to run their models faster and more often. More runs allow the scientists to test how much the model changes depending on what data goes into it, which gauges uncertainty. When meteorologists predict the week’s weather for Chicago, they run the current weather data through different models dozens of times. Do all of the models agree on two inches of snow by Friday, or do some say two inches and some say two feet? The more times they run a model, the better they can estimate just how uncertain they are about the forecast. Similarly, climate scientists can both improve their models and better understand how accurate their models are.
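The logic of running a model many times to gauge uncertainty can be illustrated with a toy ensemble. Everything in the sketch below is hypothetical: the snowfall formula is a made-up stand-in for a real forecast model, and the noise level and run count are arbitrary. The point is only that the spread across runs measures how sensitive the forecast is to its inputs.

```python
import random
import statistics


def toy_snowfall_model(temperature_c: float) -> float:
    """Hypothetical stand-in for a forecast model: inches of snow from temperature."""
    return max(0.0, 2.0 - 1.5 * temperature_c)


def run_ensemble(base_temp_c: float, input_noise_c: float, runs: int, seed: int = 0) -> list:
    """Run the model `runs` times, each with slightly perturbed input data."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [
        toy_snowfall_model(base_temp_c + rng.gauss(0.0, input_noise_c))
        for _ in range(runs)
    ]


forecasts = run_ensemble(base_temp_c=0.0, input_noise_c=1.0, runs=50)
mean = statistics.mean(forecasts)
spread = statistics.stdev(forecasts)
print(f"mean forecast: {mean:.2f} in, spread across runs: {spread:.2f} in")
```

A small spread means the runs agree and the forecast can be trusted more; a large spread signals that small changes in the input data swing the answer, so the forecast is less certain.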
Finally, improvements in supercomputers tend to trickle down into the consumer’s hands. For example, all supercomputers depend on parallel processing: breaking a task into many smaller ones that can be performed simultaneously. Today, almost all laptops, PCs, and smartphones run on this principle, dividing tasks between two or more cores.
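Parallel processing as described above, splitting one task into smaller pieces and combining the partial results, can be sketched as follows. The helper names and the sum-of-squares task are illustrative assumptions. The example uses Python's thread pool to keep it short; for heavy number-crunching in standard Python, a process pool (`ProcessPoolExecutor`) is the usual substitute.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data: list, parts: int) -> list:
    """Break `data` into at most `parts` roughly equal chunks."""
    if not data:
        return []
    size = -(-len(data) // parts)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def sum_of_squares(chunk: list) -> int:
    """The small task each worker performs on its own chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data: list, workers: int = 2) -> int:
    """Hand one chunk to each worker, then combine the partial results."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)


numbers = list(range(1_000))
result = parallel_sum_of_squares(numbers, workers=2)
print(result)
```

The answer matches what a single worker would compute on the whole list; only the division of labor changes.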
The research and planning that go into the next generation of supercomputers like Mira help advance the field of computer science, making computers faster and more energy-efficient.
But for the hundreds of scientists who will use it to tackle some of the biggest scientific questions in the world, Mira’s contribution is just beginning.
This story was originally published in volume 6, issue 1 of Argonne Now, the laboratory’s biannual science magazine.