A supercomputer is a computer with a much higher level of computing performance than a general-purpose computer. Its performance is measured in floating-point operations per second (FLOPS) rather than in million instructions per second (MIPS). Some supercomputers can perform quadrillions of floating-point operations per second. Current supercomputers run operating systems based on Linux. They play an important role in computer science and are used for computationally intensive tasks in fields such as weather forecasting, oil and gas exploration, quantum mechanics, climate research, molecular modelling, and physical simulation.

Supercomputers were introduced in the 1960s, and for several decades the fastest machines were built by Seymour Cray at Control Data Corporation, at Cray Research, and at subsequent companies bearing his name or monogram. The first machines were highly tuned conventional designs that ran faster than more general-purpose computers. Through the 1960s, designers began to add increasing amounts of parallelism, with one to four processors being typical.

From the 1970s, the vector computing concept, in which specialized math units operate on large arrays of data, came to dominate. An example is the Cray-1, built in 1976. Vector computers remained the dominant design into the 1990s. Since then, massively parallel supercomputers with tens of thousands of off-the-shelf processors have become the norm. Supercomputers are also essential in the field of cryptanalysis. They are ranked by how fast they execute the High Performance Linpack (HPL) benchmark. In this paper, we discuss MareNostrum 4 at the Barcelona Supercomputing Center, ranked 13th among the top supercomputers.


Thirteenth Ranked Supercomputer

The 13th-ranked supercomputer is MareNostrum 4, hosted by the Barcelona Supercomputing Center (BSC). The center is the leading supercomputing center in Spain. It specializes in high performance computing, and its mission is to offer supercomputing services and facilities to Spanish and European scientists and to create knowledge and technology to be transferred to society.

MareNostrum 4 has been described as the most interesting supercomputer in the world. Its total speed is planned to be 13.7 Petaflops. It has two separate parts: a general-purpose block and a block featuring emerging technologies. Its storage racks can hold 14 million Gigabytes of data, and a high-speed Omni-Path network connects all components of the supercomputer to each other.

The general-purpose block contains 48 racks with 3,456 nodes. Every node contains two Intel Xeon Platinum chips and every chip has 24 cores, which amounts to 165,888 cores in total, together with 390 Terabytes of main memory. The block has a peak performance of 11.1 Petaflops.
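The core count quoted for the general-purpose block follows from simple multiplication, and a theoretical peak can be estimated the same way. The sketch below is only an illustration: the clock frequency and the number of floating-point operations per cycle are assumptions that this paper does not state (they are typical figures for AVX-512 Xeon Platinum parts), so the result should be read as an estimate rather than an official figure.

```python
# Figures taken from the text above.
NODES = 3456
CHIPS_PER_NODE = 2
CORES_PER_CHIP = 24

# Assumptions, not stated in this paper: nominal clock and
# double-precision FLOPs per cycle per core (AVX-512, two FMA units).
ASSUMED_CLOCK_HZ = 2.1e9
ASSUMED_FLOPS_PER_CYCLE = 32


def total_cores(nodes: int, chips_per_node: int, cores_per_chip: int) -> int:
    """Return the number of cores in the general-purpose block."""
    return nodes * chips_per_node * cores_per_chip


def peak_petaflops(cores: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Estimate theoretical peak performance in Petaflops."""
    return cores * clock_hz * flops_per_cycle / 1e15


cores = total_cores(NODES, CHIPS_PER_NODE, CORES_PER_CHIP)
peak = peak_petaflops(cores, ASSUMED_CLOCK_HZ, ASSUMED_FLOPS_PER_CYCLE)
print(f"cores: {cores}")
print(f"estimated peak: {peak:.2f} Petaflops")
```

Under these assumed values the estimate lands close to the roughly 11 Petaflops quoted for this block; a different clock or vector width would shift it proportionally.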

The emerging-technologies block, on the other hand, is formed of clusters of three distinct technologies, which are to be incorporated and updated as they reach the market. These technologies are currently being developed in the United States and Japan to speed up the arrival of the new generation of pre-exascale supercomputers. They are as follows:

  • A cluster comprising IBM POWER9 processors and NVIDIA Volta GPUs, with a computational capacity of more than 1.5 Petaflops. IBM and NVIDIA are going to use these processors for the Summit and Sierra supercomputers that the United States Department of Energy has ordered for its Oak Ridge and Lawrence Livermore national laboratories.
  • A cluster formed of Intel Knights Hill processors, with a computational capacity of 0.5 Petaflops. These are the same processors as those in the Theta and Aurora supercomputers ordered by the United States Department of Energy.
  • A cluster made up of 64-bit ARMv8 processors in a prototype system, with a computational capacity of over 0.5 Petaflops. This cluster is going to use the cutting-edge technology of the Japanese Post-K supercomputer.

The main purpose of incorporating the emerging technologies into MareNostrum 4 is to enable the Barcelona Supercomputing Center to carry out experiments with what are expected to be the most advanced technological developments of the coming years and to evaluate their suitability for future iterations of MareNostrum.

MareNostrum 4 has a disk storage capacity of 14 Petabytes and is connected to the big data facilities of BSC, which have a total capacity of 24.6 Petabytes. It is used for research projects on gravitational waves, climate change, simulations relating to the production of fusion energy, and new radiation treatments to fight cancer.

MareNostrum's computing nodes communicate primarily through a high-bandwidth, low-latency InfiniBand FDR10 network. Mellanox 648-port FDR10 InfiniBand core switches and fiber-optic cables interconnect the different nodes. In addition, there is a more traditional local area network based on Gigabit Ethernet adapters.

The third cluster of MareNostrum 4 will have 64-bit ARMv8 processors provided by Fujitsu in a prototype system, designed to use processors similar to the ones Fujitsu is developing for a new Japanese system that will supplant the current K supercomputer. It is to offer a peak performance of more than 0.5 Petaflops.

Overview of MareNostrum 4 Supercomputer

Before this supercomputer was introduced, it was difficult to carry out the many research projects that require large capacity for data handling and calculation. Research was very expensive, and companies and organizations took a long time to complete a single project. This motivated the Barcelona Supercomputing Center, in collaboration with scientists from other nations, to build a more powerful supercomputer that could be used to create simulations and models and to organize the vast quantities of information generated by studies in all scientific areas.

Before this system, existing supercomputers had a peak performance of about 1 Petaflop and used more power per unit of performance than the current system. Projects also took longer to complete. Earlier supercomputers still had a peak performance of 42 Teraflops and used 640 kilowatts of power.

MareNostrum 4 is more than 10 times faster than its predecessor and uses only 30% more power. It provides scientific production with the capacity to perform 11.1 × 10^15 operations per second. The supercomputer is used for basic and applied research and can perform large calculations, run simulations and analyze large amounts of data. Most scientific disciplines, such as astrophysics, biomedicine and materials physics, make use of it, as do industry and engineering.
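The two ratios quoted above, ten times the speed for 30% more power, together imply a gain in performance per watt. The short sketch below works that out; it treats both ratios as exact, which is an assumption made purely for illustration, and the function name is hypothetical.

```python
def efficiency_gain(speedup: float, power_increase: float) -> float:
    """Return the ratio of new to old performance per watt.

    speedup        -- new performance divided by old performance
    power_increase -- fractional rise in power draw (0.30 means 30% more)
    """
    return speedup / (1.0 + power_increase)


# Ratios quoted in the text, treated here as exact for illustration.
gain = efficiency_gain(speedup=10.0, power_increase=0.30)
print(f"performance per watt improves by about {gain:.1f}x")
```

Since the text says "more than 10 times", the value computed here is best read as a lower bound on the improvement.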

Apart from the increase in power, MareNostrum 4 will have distinct technologies and emerging architectures that let people work in different ways depending on the requirements of each project. In this way, the center will be able to analyze components that have yet to reach the market and determine which are best suited to its requirements.

The supercomputer is used to power diverse science, from research on the human genome, biomechanics and bioinformatics to the composition of the atmosphere and weather forecasting. It serves both basic and applied research because it can carry out long calculations, run large-scale simulations and analyze massive amounts of data.

The new achievement of this system is that its developers obtained a general-purpose supercomputer that can execute all types of engineering and scientific tasks and that is also equipped with clusters built on emerging technologies. These clusters serve users' requirements and allow the center to test and analyze the performance of recent developments in supercomputing.

With the emerging technologies incorporated into this supercomputer, people can work with what are expected to be some of the most advanced developments of the coming years and test whether they are suitable for future MareNostrum versions. The arrival of MareNostrum 4 has also opened a new phase of the Partnership for Advanced Computing in Europe (PRACE), the European supercomputing initiative in which Spain will be able to maintain its status as a core member alongside other nations.

With this supercomputer, the center has been able to create models and simulations and to manage the vast quantities of information produced by studies across scientific areas. It can also readily analyze the performance of the most recent developments in supercomputing in order to arrive at more advanced supercomputers. The system therefore provides room for improving existing systems.
It heralds a new era in the accessibility of massive computing power to provide answers to the world’s most vexing questions. This accessibility of supercomputing power marks a watershed event.


Kernel CG in NAS PB Application

In this application, most of the computation time is spent on matrix-vector multiplication with a large sparse matrix. A two-dimensional block-block mapping is selected to divide the target matrix. Because the non-zero elements are distributed uniformly at random in the matrix, this mapping keeps the load balanced among the processing units. The communication overhead can be reduced by selecting an appropriate configuration for the number of processing units.
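To make the kernel more concrete, the sketch below shows a sparse matrix-vector product whose non-zeros are divided over a two-dimensional grid of processing units. It is a simplified, single-process illustration and not the NAS reference implementation: the matrix size, the number of non-zeros, the 2 x 2 grid and all function names are hypothetical, and the summation of partial results merely stands in for the communication step of a real parallel run.

```python
import random


def make_sparse(n, nnz, seed=0):
    """Build an n x n sparse matrix as {(row, col): value} with random non-zeros."""
    rng = random.Random(seed)
    entries = {}
    while len(entries) < nnz:
        entries[(rng.randrange(n), rng.randrange(n))] = rng.random()
    return entries


def block_owner(i, j, n, p_rows, p_cols):
    """Map element (i, j) to its processing unit in a p_rows x p_cols grid."""
    return (i * p_rows // n, j * p_cols // n)


def partition(entries, n, p_rows, p_cols):
    """Split the non-zeros into one sub-dictionary per processing unit."""
    blocks = {(r, c): {} for r in range(p_rows) for c in range(p_cols)}
    for (i, j), v in entries.items():
        blocks[block_owner(i, j, n, p_rows, p_cols)][(i, j)] = v
    return blocks


def spmv(entries, x, n):
    """Compute y = A x for a matrix stored as {(row, col): value}."""
    y = [0.0] * n
    for (i, j), v in entries.items():
        y[i] += v * x[j]
    return y


def spmv_blocked(blocks, x, n):
    """Compute y = A x by summing the partial product of every unit."""
    y = [0.0] * n
    for block in blocks.values():
        partial = spmv(block, x, n)      # local work of one processing unit
        for i, value in enumerate(partial):
            y[i] += value                # stands in for the reduction step
    return y


n, p_rows, p_cols = 64, 2, 2
A = make_sparse(n, nnz=800)
x = [1.0] * n
blocks = partition(A, n, p_rows, p_cols)
sizes = [len(b) for b in blocks.values()]
y_blocked = spmv_blocked(blocks, x, n)
y_serial = spmv(A, x, n)
print("non-zeros per unit:", sizes)
print("max difference:", max(abs(a - b) for a, b in zip(y_blocked, y_serial)))
```

Because the non-zeros are placed uniformly at random, each unit ends up with a similar share of them, which is the load-balancing property described above.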

However, the vector length also varies with the configuration, and the performance within each processing unit degrades when the average vector length is too small. Because the code generated by the compiler has been modified for the best CVL, the result is not a regular one under the policy of the NAS Parallel Benchmarks. Nevertheless, it indicates the potential performance of the MareNostrum supercomputer system, and it is believed that this performance can be achieved through some enhancement of the PVP compiler.

Processing Power in Petaflops

The MareNostrum system provides processing power of 11.1 Petaflops, which represents the capacity to perform 11.1 × 10^15 operations per second for scientific production. The figure is the capacity of the general-purpose cluster, the largest and most powerful part of the supercomputer, and it is to be raised further. It is ten times greater than the capacity of MareNostrum 3, which was installed between 2012 and 2013.

The purpose of the emerging-technologies block is to evaluate technologies currently under development, to identify the best options for future updates of the supercomputer and thereby to achieve the highest possible performance when the time comes.

MareNostrum as an SIMD

MareNostrum is a SIMD system. SIMD stands for single instruction, multiple data, a class of parallel computers. It describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously. Such a system exploits data-level parallelism but not concurrency: the computations are parallel, but there is only a single instruction at any given moment.

The SIMD design allows the two blocks of the supercomputer to interpret the data they hold, and a number of values can be loaded all at once. This lets the system perform its operations at the highest rate. Because all the processors of MareNostrum execute the same instruction at the same time, no synchronization between processors is required.
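The SIMD idea described in this section can be modelled in a few lines. The sketch below is only a model of the semantics: Python itself does not issue SIMD instructions, and the function name and the lane width of four are hypothetical choices for illustration. Each pass of the outer loop represents one instruction applied to a whole group of data elements.

```python
import operator


def simd_apply(op, a, b, width=4):
    """Apply one operation to two equal-length vectors, `width` lanes at a time.

    Each pass of the outer loop models a single instruction issued to all
    lanes at once; the generator expression models the lanes themselves.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    result = []
    instructions_issued = 0
    for start in range(0, len(a), width):
        lanes_a = a[start:start + width]
        lanes_b = b[start:start + width]
        result.extend(op(x, y) for x, y in zip(lanes_a, lanes_b))
        instructions_issued += 1
    return result, instructions_issued


values, issued = simd_apply(operator.add, [1, 2, 3, 4, 5, 6, 7, 8], [10] * 8)
print(values)  # [11, 12, 13, 14, 15, 16, 17, 18]
print(issued)  # 2
```

Eight additions are completed with only two modelled instructions, which is the data-level parallelism the section refers to.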


In this paper, we discussed the architecture and performance of MareNostrum 4 at the Barcelona Supercomputing Center, ranked 13th among the top supercomputers. It is made up of a main general-purpose system based on traditional Xeon processors, together with three new emerging-technology clusters based on IBM POWER9 with NVIDIA GPUs, on ARM processors, and on Xeon Phi. The system was developed at the Barcelona Supercomputing Center, which specializes in high performance computing and whose mission is to offer supercomputing services and facilities to Spanish and European scientists and to create knowledge and technology to be transferred to society.

The emerging-technologies block is formed of clusters of three distinct technologies, which are to be incorporated and updated as they reach the market. The block is used to evaluate technologies currently under development, to identify the best options for future updates of the supercomputer and thereby to achieve the highest possible performance when the time comes.


