Data at centre of Gaia mission to map the Milky Way

By Paula Gilbert, ITWeb telecoms editor.
Johannesburg, 25 Apr 2018
The Gaia mission plans to create a 3D map of a billion of the 100 billion stars in our galaxy.

Creating a 3D map of a billion stars in our galaxy is a massive project, and right at the centre of the mission is data, a lot of data.

The project, called Gaia, is run by scientists at the European Space Agency (ESA) and has the ambitious goal of charting a three-dimensional map of the Milky Way. The spacecraft was launched in December 2013, and by taking high-precision measurements of the positions and motions of stars in the Milky Way, it will help answer questions about the origin and evolution of our home galaxy.

The scientists, however, depend on reliable access to massive quantities of data to explore the universe and share their discoveries with the world. Ultimately, Gaia will catalogue one billion of the 100 billion stars in our galaxy. In the process, it will produce 10 000 times more data than previous missions, processing an average of 70 billion observations daily. The full Gaia catalogue is expected to hold more than two petabytes (PB) of data.

The biggest challenge the mission faces is processing the huge quantities of data it generates, and it has enlisted NetApp and its storage and data management solutions to help.

The storage and processing of data would have been much more difficult if it weren't for the major advancements made in recent years in big data analysis, cloud computing, machine learning and artificial intelligence (AI).

Every day, ESA receives massive volumes of raw telemetry data from its spacecraft and observatories. That data must be stored and processed before it can be archived or shared. Scientists across Europe depend on ESA's daily observations, so the reliability of that data is critical.

"We have a commitment to deliver data to different institutes in Europe on a daily basis. NetApp has given us the confidence that we will meet those requirements," explains Ruben Alvarez, IT manager at ESA.

"It's all about the data. We call our site the library of the universe because we keep the science archive of all of our scientific missions. This is how we allow people to really investigate the universe," adds Alvarez.

ESA expects to publish the full Gaia catalogue in 2020, making it available online to professional astronomers and the general public. Interactive, graphical interfaces will make it easy for anyone, anywhere to access the full catalogue and explore our galaxy in 3D.

AI advantage

To analyse this 'library of the universe', AI, and specifically machine learning, needs to play a pivotal role, according to Morne Bekker, country manager at NetApp South Africa.

"[Machine learning] is critically important for space exploration for a few reasons. Firstly, it is impossible for scientists around the world to articulate their knowledge under one umbrella as well as automate tasks, and secondly, machines are excellent learners. They can turn data into assets, allowing scientists to accelerate innovation and achieve superhuman performance, drive efficiencies, create insights and even aid new research developments," he says.

Bekker says the capabilities of AI and machine learning in processing massive amounts of data are far-reaching.

"Not only does it equate to extreme performance, but also to massive non-disruptive scalability where scientists can scale to 20PB and beyond in a single namespace, to support the largest of learning data sets. Importantly, it also allows scientists to expand their data where needed."

NetApp enables this by building a data fabric for hybrid cloud that connects resources and allows data management, movement and protection across internal and external cloud services.

"From a space exploration perspective, this has assisted with the challenge where data from every mission needs to be indefinitely accessible so that future scientists may continue their exploration of the universe using historical data."

NetApp used artificial intelligence and machine learning to constantly analyse and provide consistent insight across the data centre, so scientists could monitor and manage hybrid IT multivendor storage, compute and networking infrastructure.

Star shine

The mission's first data release in 2016 contained positions of more than a billion stars plus distances and motions of a subset of two million, and has already generated hundreds of scientific publications.

Today is D-day for the second data release, which will take the census of our galaxy to an entirely new level, including three-dimensional positions and two-dimensional motions of more than 1.3 billion stars, as well as their brightness and colours. The data will be revealed at a media briefing at the ILA Berlin Air and Space Show in Germany.

Based on 22 months of observations, the second release of Gaia's data contains the position in the sky and brightness of almost 1.7 billion stars, as well as measurements of the parallax and proper motion of 1.3 billion stars. It also includes a wide range of additional information, including the colours of 1.38 billion stars; the radial velocities of over seven million stars; and an estimate of the surface temperature for 161 million stars.

Closer to home, the new data set also contains the position of over 14 000 solar system objects, mostly asteroids, based on more than 1.5 million observations.

Expectations for second data release.
