http://www.irishtimes.com/newspaper/ireland/2012/0628/1224318887148.html
A complex computer system has crashed with spectacular consequences. I’m sure that IT failures of this type happen regularly; this one has received public attention for obvious reasons.
Reaction to it, however, has at times seemed crazy, especially the demands and deadlines for it to be fixed. There seems little appreciation of engineering reality. There is moreover the likelihood that senior managers share this blinkered perspective and have taken decisions based on “best practice” in financial and administrative terms.
It is completely daft to demand that it be fixed, to try to impose deadlines for its being fixed or to enquire into it in isolation. This is a breakdown, a failure. No one knows exactly what caused it and no one knows exactly when it will be fixed.
Here’s a reasonable news article setting out the difficulties: http://www.guardian.co.uk/technology/2012/jun/25/how-natwest-it-meltdown?fb=optOut
In detail it is complex, and we find ourselves a long way down the road in terms of dependence on these systems. The fundamental issues are, however, simple. Over the years basic mistakes have been made by senior managers who did not realise what they were doing and were under pressure from competitors. The public was uninvolved because the issue was thought to be too complex for public discussion. That’s how elites take control and that’s often ok when those taking control really do know what they are doing.
Take this from the Guardian article linked above: “This was not inevitable – you can always avoid problems like this if you test sufficiently,” said David Silverstone, delivery and solutions manager for NMQA, which provides automated testing software to a number of banks, though not RBS/NatWest. “But unless you keep an army of people who know exactly how the system works, there may be problems maintaining it.”
Here’s something worth bearing in mind. Despite stunning improvements in testing, anything beyond the most basic software CANNOT be fully tested before it is put into service because the number of variables is too great. That’s why David Silverstone said “sufficiently” and not “fully”. The user runs with it and hopes for the best. “Software maintenance” has always been a risible concept. What it means is that the customer runs the tests as day to day usage and pays the developer to patch whatever is discovered. It’s not a scam; there’s no other way.
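To put a rough number on "the number of variables is too great", here is a back-of-envelope sketch. The system is hypothetical: 64 independent on/off settings and a test rig that runs a million tests per second. Both figures are assumptions chosen only for illustration, not details of any real bank's software.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def exhaustive_test_count(options_per_variable):
    """Number of distinct input combinations for independent variables."""
    count = 1
    for options in options_per_variable:
        count *= options
    return count


def years_to_run(test_count, tests_per_second):
    """Years needed to run every test once at a constant rate."""
    return test_count / (tests_per_second * SECONDS_PER_YEAR)


# Hypothetical system: 64 independent on/off settings (assumed figure).
combinations = exhaustive_test_count([2] * 64)

# Hypothetical test rig: one million tests per second (assumed figure).
years = years_to_run(combinations, tests_per_second=1_000_000)

print(f"{combinations} combinations, roughly {years:,.0f} years")
```

Under those assumptions a single exhaustive pass would take hundreds of thousands of years, and real systems have far more than 64 yes/no choices. That is the sense in which testing can be sufficient but never complete.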
The problem now is compounded in that complex programs are being run in parallel with and on top of older applications. The last couple of decades saw a problematic coincidence. At a time when the overall systems became more complex and more ambitious, there was a management fashion to offload not IT operators but real software developers and to buy in “turn-key” applications which may have been modified to give the appearance of bespoke systems. It’s a recipe for profound crashes, and everyone in engineering generally, and anyone who has given serious thought to complex systems, has been watching it develop over the years.
In most large organisations there are problems such as this waiting to happen. There’s no easy fix at this stage; we are too far gone. Fundamentally wrong management decisions have been made and cannot be undone quickly or perhaps at all.