Bank job: How Nedbank took 3 800 servers out of Sandton
I’ve always imagined that when I was finally invited to step through the heavily armoured door of a bank vault, I would step out onto a sea of money, wade my way over to jewel island and place my bum on a gold bullion chair, from which I would sip champagne from a silver tankard. Instead, the first time I walked through a comically large vault door it was into a more-than-half-empty computer room where hundreds of servers plough through some 6 000 transactions a second coming in from all over the country and, indeed, the world.
I console myself with the idea that since more than nine-tenths of the world’s wealth only exists as 1s and 0s on a hard drive somewhere, this room may well have more value than my imagined money cave. It’s certainly a lot cleverer, as I’m about to find out.
This big, spacious room holds approximately one third of Nedbank’s computing power. It contains the machines that record every single withdrawal and purchase made using a Nedbank account or cash machine, working with live data flowing to and from bank tellers, ATMs, investment consoles, debit card readers and so on. Directly below us is an almost identical room which is used for load balancing and back-up in case the servers on this floor fall over. The extra capacity is especially useful around tax return time and paydays, I’m told.
Somewhere north of here there’s a second mirror for off-site back-up and storage of the daily tapes that contain a record of all transactions.
I’m told by Kobus Rheeder, a consulting engineer who is overseeing the complete overhaul of Nedbank’s datacentres, that when a card is swiped the transaction is actually recorded on the off-site back-up milliseconds before multiple copies are written here. It seems counter-intuitive, but is bad news for anyone who’s ever dreamed of pulling off a stunt like the end of Fight Club where the computers get destroyed and all debt is wiped off. By saving to the emergency cache first, if anything did happen to the operational machines the failsafe servers are always the most up-to-date, so if they are called into action absolutely nothing is lost.
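For the curious, the ordering Rheeder describes can be sketched in a few lines of Python. This is only an illustration of the idea of writing to the off-site copy before the local ones; the `Store` class and `record_transaction` function are hypothetical names invented here, and nothing about Nedbank’s actual software is implied.

```python
class Store:
    """A toy in-memory record store (hypothetical, for illustration only)."""

    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail      # simulate a machine that has fallen over
        self.records = []

    def write(self, txn):
        if self.fail:
            raise OSError(f"{self.name} is unavailable")
        self.records.append(txn)


def record_transaction(txn, offsite, local_stores):
    """Write to the off-site store first, then to each local copy.

    If a local write fails part-way through, the off-site store
    already holds the transaction, so it is never behind the
    operational machines.
    """
    offsite.write(txn)
    for store in local_stores:
        store.write(txn)


offsite = Store("offsite")
local_stores = [Store("floor-1"), Store("floor-2", fail=True)]

try:
    record_transaction({"id": 1, "amount": 250}, offsite, local_stores)
except OSError:
    pass  # a local machine failed, but the off-site copy is intact
```

The point of the ordering is simply that the back-up can never be missing something the operational machines have.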
Why am I here, though? It’s because a couple of months ago I picked up a press release from Nedbank about its ambitious internal power consumption targets and I expressed slight cynicism – I know, out of character or what? – about a claim that it could reduce CO2 output by the equivalent of taking 1 800 cars off the road simply by installing a power management app on its employees’ PCs.
Needless to say, the folk at Nedbank wanted to prove me wrong. And they did. Not only did they run me through the basics of that energy saving desktop program, I was invited to tour its Sandton datacentres and control room to see an environment-saving work in progress. The only restriction is that I’m not allowed to take photographs while inside.
Not that there’s much the bank can fear by showing it off: the datacentre in the bank’s Wierda Valley compound was built in the early 80s, in full knowledge that it could be a target for anti-apartheid protests, and designed to stay standing in the event of pretty much anything that could be thrown at it. The office complex has grown up around the original datacentre – one floor of which is above ground and one below – and new buildings literally cocoon the old. But the server rooms themselves remain structurally independent of the later extensions and locked away behind double vault door security: a bomb- and rocket-proof concrete heart through which all the bank’s business flows in a truly sanguine manner.
Everything has a failover and failsafe too: there’s half a million rands’ worth of fire-extinguishing FM-200 gas waiting to flood the room in the event of a burning circuit, backed up by water sprinklers should it appear that the building is in danger. Every motherboard in here has two processors on board – one for use and a second for redundancy. There are two separate power rings which come in from opposite sides of the building, dozens of lead-acid batteries in case they should both go down and three enormous 1 850kVA diesel generators that are kept hot and ready to run in sound-proofed rooms next door.
If it gets to the stage that all three generators have run out of fuel and the power isn’t back on, jokes Rheeder, the chances are the operational status of a single bank will be the last of humanity’s worries. Something, he says, will have gone very wrong outside.
Unlike most datacentres I’ve been into before, the two floors of Nedbank’s operational centre are almost as warm as the air outside. They’re quiet too, and dark – the lights are triggered by motion sensors as you enter the room. The reason is fairly simple – they don’t have heavy duty aircon units blowing a hurricane all the time. A traditional datacentre is cooled to almost freezing temperatures by noisy air-conditioning units, in the understanding that when computer processors overheat they stop working, and the warmth generated by processors under load has to be got rid of fast. Rheeder explains that while that’s still true, it doesn’t mean that all the air in the room has to be cooled to quite the extremes we previously thought.
“Currently we’re keeping the atmospheric temperature somewhere between 16 and 22 degrees,” he says. “We can go a bit higher, but the margin for error then becomes very small.”
That may not be strictly true, however. In Belgium, Google reckons it can keep its datacentre running at temperatures of up to 35 degrees, suggesting modern servers are a lot more resilient than we imagine. It’s still an area where huge savings can be made, says Rheeder – if the air outside the server room is three degrees cooler than the target for inside, there’s no need to cool it further, a concept known as ‘free air cooling’ and in use in more than half the datacentres in the US already.
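The free air cooling rule of thumb Rheeder gives boils down to a single comparison. Here is a minimal sketch, assuming temperatures in degrees Celsius and the three-degree margin quoted above; the function name and parameters are made up for illustration, and a real building management system would weigh humidity and much else besides.

```python
def can_use_free_air(outside_c, target_c, margin_c=3.0):
    """Return True if outside air is cool enough to use without chilling.

    Follows the rule of thumb quoted in the article: outside air must be
    at least `margin_c` degrees cooler than the target room temperature.
    """
    return outside_c <= target_c - margin_c


# With a 20-degree target, 15-degree outside air needs no extra cooling,
# while 18-degree air is inside the margin and would still need chilling.
cool_morning = can_use_free_air(15.0, 20.0)
warm_afternoon = can_use_free_air(18.0, 20.0)
```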
Up until 18 months ago, Rheeder explains, there were considerably more machines filling these rooms. That was before a massive virtualisation project in which several physical servers are replaced by a single one running multiple virtual machines. Thanks to this, the bank was able to remove somewhere in the region of 3 800 servers, leaving about a fifth still in place.
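Those two figures are enough for a rough back-of-envelope estimate of the size of the fleet before and after. The sketch below treats “about a fifth” as exactly one fifth, which is an assumption, so the results are approximate at best; the function name is invented for this example.

```python
from fractions import Fraction


def estimate_fleet(removed, remaining_fraction):
    """Estimate (original, remaining) server counts.

    `removed` is the number of servers taken out; `remaining_fraction`
    is the share of the original fleet still in place.
    """
    original = Fraction(removed) / (1 - remaining_fraction)
    remaining = original - removed
    return int(original), int(remaining)


# 3 800 removed with a fifth left implies roughly 4 750 servers before
# the virtualisation project and roughly 950 afterwards.
original, remaining = estimate_fleet(3800, Fraction(1, 5))
```

That squares with the “hundreds of servers” still humming away in the room.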
Those that remain are mostly arranged in blocks of ‘hot aisle containment’ units. These are large walk-in cabinets with sealed doors on the front to trap hot air inside. Server towers are arranged so that the exhausts all point inwards, and large vents on the top of the cabinets draw the hot air outside without giving it the chance to warm up the rest of the room. Cold air is blown up from the floor, making use of natural convection to improve cooling efficiency.
The biggest problem, explains Rheeder, is the amount of excess space outside these units. With so many racks removed, controlling physical airflow around the room becomes tricky. Still, he says, it won’t be that way forever.
“We have to keep the space to expand into,” he says. “Typically in a datacentre like this you expand as you need more processing power, then reduce the numbers of servers as processors get more powerful. We’re back to the start of that curve now.”
The typical measure of a datacentre’s use of energy is Power Usage Effectiveness (PUE), the ratio of the total power drawn by the facility to the power used for computing alone; anything above 1 is what is lost to cooling, lighting and so on. Before the internal redesign, Rheeder says, Nedbank’s PUE was around 1.9, which meant that for every 1W of power used for computing, almost an entire watt was being used for other purposes. That figure is down to around 1.5 now, he says, and will fall closer to a PUE of 1 once a new cooling system is fitted next year.
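To make the arithmetic concrete, here is a short sketch of the standard PUE calculation (total facility power divided by IT power), applied to the figures Rheeder quotes. The function names are illustrative only, and the saving assumes the computing load stays the same before and after, which the article does not confirm.

```python
def pue(total_facility_watts, it_watts):
    """Power Usage Effectiveness: total facility power over IT power."""
    if it_watts <= 0:
        raise ValueError("IT power must be positive")
    return total_facility_watts / it_watts


def overhead_per_it_watt(pue_value):
    """Watts spent on cooling, lighting and so on per watt of computing."""
    return pue_value - 1.0


def total_power_saving(old_pue, new_pue):
    """Fractional drop in total power for the same computing load."""
    return (old_pue - new_pue) / old_pue


before = overhead_per_it_watt(1.9)      # about 0.9W of overhead per 1W
after = overhead_per_it_watt(1.5)       # about 0.5W of overhead per 1W
saving = total_power_saving(1.9, 1.5)   # roughly a fifth less power overall
```

In other words, moving from 1.9 to 1.5 would cut the total electricity bill for the same amount of computing by a little over 20 percent.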
At the moment, cooling is provided by a couple of large HVAC units in towers outside the main server room, while the ventilation system inside is brought down to a reasonable ambient temperature via water cooling. The new system for chilling external air as it’s brought into the building will replace traditional air-con with a cold water filter system of a type similar to the one introduced by Facebook at its Oregon server farm.
Make no mistake – while the organisation does have a policy governing environmental targets for cutting down on energy use, this is still a bank and there’s a strong financial incentive too. Eskom tariffs are unlikely to fall in the future, and all of these plans – including one to begin installing solar panels on its Sandton HQ next year – are designed to save money as well as comply with the CSR program.
Kevin Kassel, Nedbank’s general manager for IT and services, says that many of the investments the bank is making will be repaid within 12 months. That includes the time spent developing that desktop app, and the accompanying company-wide control panel to monitor the power state of employees’ computers. The desktop app turns out to be intriguing – not only is it more efficient at powering down PCs than Windows’ built-in controls, it’s essentially a way of dropping a machine into a hibernate state without losing control of it.
Reducing power consumption from PCs in the office isn’t as simple as turning machines off at night, explains Kassel, because banking regulations require access to certain services at all times. Also, since maintenance cannot be carried out during the day, PCs have to be left on at night so that security updates and software patches can be applied remotely. Employees don’t have to use the tool on their PC, but live monitoring shows them how much energy their PC is using and thus encourages them to be more aware.
And the claim about taking 1 800 cars off the road? Apparently it’s all true. The live dashboard takes current power usage across the employee cubicles – not including the datacentres – and compares it to a reading from three years ago. So what do I know?