Evolution of Internet powers massive particle physics grid

Inside the network that will help scientists discover the origins of the Universe

If you're a fan of particle physics (and really, aren't we all?), by now you know scientists are on the verge of opening the Large Hadron Collider, which will use ultra-powerful magnets to race proton beams around a 17-mile circular underground tunnel and smash them into each other 40 million times a second.

Besides being awesome, these collisions will produce tiny particles not seen since just after the Big Bang and perhaps will enable scientists to find the elusive Higgs boson, which -- if theories are correct -- endows all objects with mass. The Large Hadron Collider may also help scientists figure out why all the matter in the universe wasn't destroyed by anti-matter, which would have been inconvenient for those of you who enjoy residing in a universe that isn't a great vacuum devoid of life.

Perhaps just as complicated as answering these questions of origin, however, is setting up a worldwide network capable of distributing the mountains of data produced by the seemingly endless stream of particle collisions. The Worldwide LHC Computing Grid was set up to perform this task. Data will be gathered at the European Organization for Nuclear Research (CERN), which hosts the collider on the French-Swiss border, and distributed to thousands of scientists throughout the world.

One writer described the grid as a "parallel Internet." Ruth Pordes, executive director of the Open Science Grid, which oversees the US infrastructure for the LHC network, describes it as an "evolution of the Internet." New fiber-optic cables with special protocols will be used to move data from CERN to 11 Tier-1 sites around the globe, which in turn use standard Internet technologies to transfer the data to more than 150 Tier-2 centers.
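
In rough code terms, the topology Pordes describes is a three-level fan-out: one Tier-0, 11 Tier-1 sites, and 150-plus Tier-2 centers. The Python sketch below is illustrative only; the counts come from the article, and placeholder names stand in for the sites it doesn't identify.

```python
# A minimal model of the tiered fan-out described above: CERN (Tier-0)
# feeds 11 Tier-1 sites, which feed 150+ Tier-2 centers. Placeholder
# names stand in for sites the article doesn't identify.

TIER_0 = "CERN"
TIER_1 = ["Brookhaven", "Fermilab"] + [f"tier1-{i}" for i in range(9)]
TIER_2 = [f"tier2-{i}" for i in range(150)]

def distribution_path(tier1: str, tier2: str) -> list[str]:
    """Return the hop sequence a dataset follows down the hierarchy."""
    assert tier1 in TIER_1 and tier2 in TIER_2
    return [TIER_0, tier1, tier2]

print(distribution_path("Brookhaven", "tier2-0"))
# ['CERN', 'Brookhaven', 'tier2-0']
```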

"It's using some advanced features and new technologies within the Internet to distribute the data," Pordes says. "It's advancing the technologies, it's advancing the [data transfer] rates, and it's advancing the usability and reliability of the infrastructure."

The data originates in the collisions themselves, which occur in caverns 100 meters underground. If all goes according to plan, the first proton beams will be injected into the LHC around mid-June, and will start smashing into each other about two months later.

When proton beams collide and produce new particles, data will be read from 150 million sensors and sent to a counting room where signals are filtered. The interesting data, or "raw data," is what remains, according to CERN.
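
In software terms, that filtering step amounts to running a keep-or-discard predicate over a stream of collision events. The toy sketch below illustrates the shape of the idea only; the "energy" field, the cut value, and the selection rate are invented, not CERN's actual trigger logic.

```python
import random

# Toy illustration of the counting-room filtering described above: almost
# every collision event is discarded, and what survives the selection is
# the "raw data". The event fields and cut value here are invented.

def read_event() -> dict:
    """Stand-in for one collision readout (real ones span ~150M sensors)."""
    return {"energy": random.random()}

def is_interesting(event: dict, cut: float = 0.999) -> bool:
    """Toy selection rule: keep only the rare high-'energy' events."""
    return event["energy"] > cut

events = (read_event() for _ in range(100_000))
raw_data = [e for e in events if is_interesting(e)]
print(f"kept {len(raw_data)} of 100,000 events as raw data")  # roughly 100
```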

Raw data is sent over dedicated 10Gbps optical fiber connections to the CERN Computer Centre, which is known as "Tier-0" in the LHC Computing Grid. Here, raw data is sent to tape storage and also to a CPU farm that processes the information and generates "event summary data." Subsets of both the raw data and the summaries are sent to the 11 Tier-1 sites, including Brookhaven National Laboratory and Fermilab.
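
As a data flow, the Tier-0 step splits one input into two outputs: an archival copy on tape and a derived summary from the CPU farm, with subsets of both fanned out to the Tier-1s. A hedged sketch of that flow, with placeholder bodies standing in for processing the article doesn't detail:

```python
# Sketch of the Tier-0 flow described above. The function bodies are
# placeholders; only the shape of the flow comes from the article.

def archive_to_tape(raw: bytes) -> None:
    """Placeholder for the tape-storage step."""

def cpu_farm_summarize(raw: bytes) -> bytes:
    """Placeholder for the pass that yields 'event summary data'."""
    return raw[: max(1, len(raw) // 10)]  # pretend summaries are ~10x smaller

def tier0_distribute(raw: bytes, tier1_sites: list[str]) -> dict[str, tuple[bytes, bytes]]:
    """Archive raw data, derive summaries, and fan subsets out to Tier-1s."""
    archive_to_tape(raw)
    summary = cpu_farm_summarize(raw)
    # In reality each site receives only its detector's subset; this
    # sketch hands every site the same (raw, summary) pair.
    return {site: (raw, summary) for site in tier1_sites}

shipments = tier0_distribute(b"x" * 1_000, ["Brookhaven", "Fermilab"])
print(list(shipments))  # ['Brookhaven', 'Fermilab']
```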

Each of the 11 Tier-1 centers is connected to CERN via a dedicated 10 gigabit per second link, and the Tier-1 centers are connected to each other by a general-purpose research network. Each Tier-1 center receives only certain subsets of the data. Brookhaven, for example, is dedicated to ATLAS, one of several large detectors housed at the LHC, while Fermilab handles data from the CMS (Compact Muon Solenoid) detector.
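
That subset routing can be pictured as a simple detector-to-site mapping. In the sketch below, only the two assignments named in the article are real; any other detectors would extend the table.

```python
# Routing table for the per-detector subsets described above. Only the
# ATLAS and CMS assignments are from the article; others are unknown here.

DETECTOR_TO_TIER1 = {
    "ATLAS": "Brookhaven National Laboratory",
    "CMS": "Fermilab",
}

def route_dataset(detector: str) -> str:
    """Pick the Tier-1 site responsible for a detector's data stream."""
    try:
        return DETECTOR_TO_TIER1[detector]
    except KeyError:
        raise ValueError(f"no Tier-1 assignment known for {detector!r}")

print(route_dataset("ATLAS"))  # Brookhaven National Laboratory
```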

The Tier-1 centers are responsible for reprocessing raw data, keeping the results on local disk and tape storage, and distributing them to Tier-2 centers spread around the world.

Tier-2 centers are connected to the Tier-1 sites and to each other by general-purpose research networks, such as the US Department of Energy's Energy Sciences Network. Tier-2s are located mainly at universities, where physicists will analyze LHC data. Ultimately, about 7,000 physicists will scrutinize Large Hadron Collider data for information about the origins and makeup of our Universe, according to CERN.

The LHC collisions will produce 10 to 15 petabytes of data a year, says Michael Ernst of Brookhaven National Laboratory, where he directs the program that will distribute data from the ATLAS detector. Brookhaven, as a Tier-1 site, will be responsible for filtering data so it can be easily read by scientists at the more numerous Tier-2 facilities, Ernst says.
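
Those figures are easy to sanity-check. Spread evenly across a year, 15 petabytes works out to a sustained rate of roughly 3.8 gigabits per second, comfortably within one dedicated 10Gbps link (assuming decimal petabytes and a steady flow, both simplifications):

```python
# Back-of-the-envelope check on the volumes quoted above: what does
# 15 PB/year look like as a sustained rate, and how does it compare
# to a dedicated 10 Gbps Tier-1 link? Decimal petabytes assumed.

PETABYTE = 10**15                 # bytes
SECONDS_PER_YEAR = 365 * 24 * 3600

annual_bits = 15 * PETABYTE * 8
sustained_bps = annual_bits / SECONDS_PER_YEAR

print(f"sustained rate: {sustained_bps / 1e9:.1f} Gbit/s")       # ~3.8
print(f"share of one 10 Gbps link: {sustained_bps / 10e9:.0%}")  # ~38%
```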

Brookhaven has about 1,200 multicore x86 servers dedicated to the LHC, along with disk and tape storage that holds seven petabytes of data. Ernst says Brookhaven will have to scale that storage up significantly by 2012, when he expects to be storing 13 petabytes of Large Hadron Collider data.

Worldwide, the LHC computing grid will comprise about 20,000 servers, primarily running the Linux operating system. Scientists at Tier-2 sites can access these servers remotely when running complex experiments based on LHC data, Pordes says. If scientists need a million CPU hours to run an experiment overnight, the distributed nature of the grid allows them to access that computing power from any part of the worldwide network, she says. With the help of Tier-1 sites such as Brookhaven, the goal is to make using the grid just as easy for universities as using their own internal networks, according to Pordes.
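
That million-CPU-hour figure is worth unpacking. Assuming a 12-hour overnight window and eight cores per server (both assumptions; the article says only "multicore x86"), the job would occupy roughly half the grid:

```python
# Rough arithmetic behind the "million CPU hours overnight" example above.
# The window length and cores-per-server figures are assumptions.

cpu_hours_needed = 1_000_000
wall_clock_hours = 12        # assumed "overnight" window
cores_per_server = 8         # assumed for 2008-era multicore x86
grid_servers = 20_000        # worldwide count from the article

cores_needed = cpu_hours_needed / wall_clock_hours   # concurrent cores
servers_needed = cores_needed / cores_per_server

print(f"concurrent cores: {cores_needed:,.0f}")                      # 83,333
print(f"servers needed: {servers_needed:,.0f} of {grid_servers:,}")  # 10,417 of 20,000
```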

Asked if the LHC project is the most complicated thing he's ever worked on, Ernst gave a quick laugh and said, "Yeah, I would say so."