{"id":9530,"date":"2024-03-13T18:35:41","date_gmt":"2024-03-13T16:35:41","guid":{"rendered":"https:\/\/www.retarus.com\/blog\/en\/everything-you-always-wanted-to-know-about-data-center-upgrades-but-were-afraid-to-ask"},"modified":"2024-05-07T19:20:43","modified_gmt":"2024-05-07T17:20:43","slug":"everything-you-always-wanted-to-know-about-data-center-upgrades","status":"publish","type":"post","link":"https:\/\/www.retarus.com\/blog\/en\/everything-you-always-wanted-to-know-about-data-center-upgrades\/","title":{"rendered":"Everything you always wanted to know about data center upgrades (but were afraid to ask)"},"content":{"rendered":"\n

In the week of March 4th to 8th, 2024, a task force was dispatched from Munich to refurbish Retarus\u2019 SEC1 data center in Secaucus, New Jersey, upgrading it to state-of-the-art EVPN\/VXLAN network technology based on the Arista 7050X3 and 7280R3 series as well as installing fiber-optic cabling. All of this during ongoing operations. And best of all: Our customers didn\u2019t even notice it happening.<\/p>\n\n\n\n

This kind of \u201copen-heart surgery\u201d of course calls for meticulous planning and precision. Uwe Geuss, our Director Operations and a key member of the task force, has written up the following report:<\/p>\n\n\n\n


\n\n\n\n

EVPN\/VXLAN \u2013 Exchanging switches during ongoing data center operations<\/h2>\n\n\n\n

Switches are the beating heart of every data center, as they direct data traffic and ensure that information flows smoothly between the various components. Upgrading this infrastructure is of critical importance when it comes to keeping pace with continually advancing technologies and maximizing data center performance.<\/p>\n\n\n\n

Why are we replacing our network infrastructure?<\/h2>\n\n\n\n

The rapid evolution of technologies, escalating network performance requirements and a demand for higher capacity are just a few of the reasons we decided to upgrade our switch infrastructure. The benefits of employing cutting-edge, higher-performance Arista switches<\/a> and EVPN\/VXLAN technology include higher bandwidth, lower latency and much greater flexibility.<\/p>\n\n\n\n

 7050X3 Series<\/strong><\/td><\/tr><\/thead>
Description<\/strong><\/td>Arista 7050X3 Series fixed configuration leaf and spine switches<\/td><\/tr>
Switching Throughput<\/strong><\/td>6.4 Terabits\/sec<\/td><\/tr>
Maximum Forwarding Rate<\/strong><\/td>2 Bpps<\/td><\/tr>
40\/100G Interfaces<\/strong><\/td>Up to 32<\/td><\/tr>
10\/25G Interfaces<\/strong><\/td>Up to 128<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n

Step 1: Planning is everything<\/h2>\n\n\n\n

Before we began physically exchanging the switches, it was essential to plan the procedure thoroughly. This involved analyzing the existing infrastructure in the data center, assessing the customer traffic and scheduling the steps required for exchanging the components. A detailed plan minimizes downtime and ensures a smooth transition, which is crucial for the satisfaction of our customers.<\/p>\n\n\n\n

The configuration of the network devices had been prepared in advance using Ansible, so that the transition itself would be uniform, quality-checked several times over and automated as far as possible.<\/p>\n\n\n\n

The actual work steps and tasks were preceded by a PoC phase lasting 1.5 years, in which several manufacturers had to meet Retarus\u2019 demanding requirements for the new network infrastructure.<\/p>\n\n\n\n

Step 2: Exchanging the switches<\/h2>\n\n\n\n

Replacing the switch infrastructure on site is a complex process requiring meticulous coordination. It comprises migrating the servers, renewing network cards in existing systems, installing the new switch hardware, configuring the system and physically removing the old switches. The transition to the totally different EVPN\/VXLAN network technology<\/a> involved wide-ranging adjustments and reorganization on the logical layer of the network. At this stage, a highly experienced team of experts from the fields of networks, infrastructure services, application management and data centers played a key role in ensuring everything ran smoothly.<\/p>\n\n\n\n

Step 3: Testing, testing and more testing<\/h2>\n\n\n\n

Once the infrastructure has been replaced completely, comprehensive testing is crucial. By simulating various scenarios, it is possible to ensure that the new switches fulfill all the requirements and function reliably in production. This step minimizes the risk of errors and outages in regular operations.<\/p>\n\n\n\n

Step 4: Documentation and training<\/h2>\n\n\n\n

Detailed documentation of the new setup is essential for simplifying future maintenance tasks. Furthermore, the IT Operations team members need to be trained accordingly, so they become familiar with the new infrastructure and are able to react quickly should the need arise.<\/p>\n\n\n\n

<\/div>\n\n\n\n

Regarding the first two steps, I\u2019d like to go into more detail and describe the activities a bit more closely.<\/p>\n\n\n\n

Planning<\/h3>\n\n\n\n

The planning was divided up into two distinct areas \u2013 server hardware and network.<\/p>\n\n\n\n

With regard to server hardware, it was necessary to determine which systems had to be relocated physically in the data center and which network cards would need to be replaced due to the new requirements.<\/p>\n\n\n\n

In addition, we specified the switch and the switch port to which each server would be connected in future. This was crucial, as it enabled us to prepare the switch configuration in advance.<\/p>\n\n\n\n
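
A server-to-port assignment of this kind can be checked mechanically before any configuration is generated from it. The following is a minimal sketch and not the tooling Retarus used: the host names, switch names, port names and the plain-text output format are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical cabling plan as (server, switch, port) entries.
# Every name below is an assumption made up for this example.
CABLING_PLAN = [
    ('app-01', 'leaf-1a', 'Ethernet1'),
    ('app-01', 'leaf-1b', 'Ethernet1'),
    ('db-01', 'leaf-1a', 'Ethernet2'),
]


def validate_plan(plan):
    '''Raise ValueError if any switch port is assigned more than once.'''
    seen = {}
    for server, switch, port in plan:
        key = (switch, port)
        if key in seen:
            raise ValueError(
                f'{switch} {port} assigned to both {seen[key]} and {server}'
            )
        seen[key] = server


def interface_descriptions(plan):
    '''Group the plan per switch and emit one summary line per port.

    The output is a plain-text summary, not vendor configuration syntax.
    '''
    validate_plan(plan)
    per_switch = defaultdict(list)
    for server, switch, port in plan:
        per_switch[switch].append(f'interface {port}: description {server}')
    return dict(per_switch)
```

Calling `interface_descriptions(CABLING_PLAN)` returns one list of summary lines per switch. In practice such a mapping would more likely feed a configuration template than be printed directly.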

At the same time, we defined the order in which the systems would be updated, because during regular operations we can only ever disconnect a small portion of the devices from the network at any one time.<\/p>\n\n\n\n
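
The constraint of only ever taking a small share of systems offline can be pictured as a simple batching problem. This is a sketch under stated assumptions: the function name `plan_batches` and the fraction used in the usage note are hypothetical, and the actual limit applied at SEC1 is not stated in this report.

```python
def plan_batches(systems, max_offline_fraction):
    '''Split systems into consecutive batches.

    Each batch holds at most max_offline_fraction of all systems,
    but always at least one system so that the plan can progress.
    '''
    if not 0 < max_offline_fraction <= 1:
        raise ValueError('max_offline_fraction must be in (0, 1]')
    batch_size = max(1, int(len(systems) * max_offline_fraction))
    return [
        systems[i:i + batch_size]
        for i in range(0, len(systems), batch_size)
    ]
```

For eight systems and a fraction of 0.25, this yields four batches of two systems each. A real schedule would also need to respect service redundancy, for example never placing both nodes of a cluster in the same batch, which this sketch deliberately leaves out.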

Last, but not least, we were able to use this information to plan the schedule for the expert task force and supporting staff. For each system, an application manager first had to suspend the services, after which an infrastructure engineer had to carry out changes at the operating system level. In the data center, the server could then be physically rebuilt. Subsequently, the system was booted up again, reconfigured, returned to service and checked.<\/p>\n\n\n\n
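
The per-system sequence described above is essentially an ordered checklist that must stop as soon as one step fails. The sketch below illustrates that idea only. The role labels for the later steps are not named in this report and are marked as unspecified, and the `execute` callback is a hypothetical stand-in for whatever a person or tool actually does at each step.

```python
# Ordered steps per system, following the description above.
# Roles marked 'unspecified' are not named in the report.
MIGRATION_STEPS = [
    ('application manager', 'suspend services'),
    ('infrastructure engineer', 'apply operating system changes'),
    ('unspecified', 'physically rebuild server'),
    ('unspecified', 'boot system'),
    ('unspecified', 'reconfigure system'),
    ('unspecified', 'return system to service'),
    ('unspecified', 'check system'),
]


def run_steps(system, steps, execute):
    '''Run steps in order and stop at the first one that fails.

    execute(system, role, action) must return True on success.
    Returns the list of actions that completed successfully.
    '''
    completed = []
    for role, action in steps:
        if not execute(system, role, action):
            break
        completed.append(action)
    return completed
```

Returning the completed actions makes it easy to see where a system stopped, so that the next batch is not started while an earlier system is still half-migrated.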

The network planning, on the other hand, focused on all activities which physically and logically needed to be undertaken on the network infrastructure before, during and after the physical rebuild.<\/p>\n\n\n\n

We proceeded as follows:<\/p>\n\n\n\n