Five actions every organization should take

Two weeks ago, when a cybersecurity software update shut down millions of Microsoft Windows devices globally, the world woke up to a scenario that science fiction writers and policy experts have warned of for decades. In the hours and days that followed, companies and institutions large and small worked to resolve the issues in their IT estates; some didn’t know where to start. Looking ahead, many are asking: What can we learn? What should we change?

At Kyndryl, our teams mobilized immediately to help our customers recover their IT systems and limit disruption. Using our AI-powered IT infrastructure insights platform, Kyndryl Bridge, our experts quickly identified the affected systems that support our customers’ business operations. They turned that visibility into insights to prioritize and apply fixes, efficiently restoring our customers’ operations.

Cyber events can take many forms. Some are caused by bad actors deliberately breaking things; others can stem from a bad patch, human error, siloed ways of working, ineffective vendor management and more. This outage highlights the critical importance of cyber resilience: implementing systems and processes that help proactively protect against any cyber event, from a sophisticated cyberattack to problems introduced by a software update.

Based on interviews with top Kyndryl experts, here are five actions businesses can take now to reduce the risk of a cyber event, limit its impact and speed recovery if one strikes despite their preparations.

1. The Chief Information Security Officer (CISO) must evolve into the Cyber Resilience Officer (CRO)

Ideally, the CISO-turned-CRO will draw on their expertise in risk management and crisis response to manage and mitigate risks across the organization. The CRO should put in place a strategy that embeds resiliency principles into secure software development and third-party risk management; designs recovery capabilities to support essential business functions; establishes a robust operating environment with fundamental practices such as asset management and automated vulnerability and patch management; and integrates functions across silos, unifying security, business continuity and disaster recovery.

2. Assess your defense, resiliency and recovery

Cyber resiliency goes beyond traditional cybersecurity: it assumes that advanced adversaries can get past conventional defenses, and it also covers threats such as supply chain disruptions and defective software updates. Businesses should assess the entire lifecycle, including software development, security, third-party risk management, recovery, asset management, and automated vulnerability and patch management.

Make sure enterprise platforms are backed up so IT estates can be restored to a prior, known-good state if necessary, and ask whether the enterprise needs to update and modernize its approach to backup and recovery. Enterprises are best protected when they combine processes that address both security and cyber resiliency. Firewalls, patches and antivirus software are important, but it’s also essential to know what needs to be backed up and recovered first: the most vital systems, which power the organization’s critical business processes. Once those critical assets are identified and backed up, the final step is to test the ability to perform a recovery, validating that the controls in place work as required. Testing should be automated where possible to provide continued validation of recoverability.
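
To make “test the ability to perform a recovery” concrete, here is a minimal Python sketch of an automated restore test. It assumes simple file-level backups and uses hypothetical helper names (`back_up`, `verify_restore`); real backup products come with their own restore tooling, so treat this as an illustration of the idea rather than a recipe. Each backed-up file is restored into a scratch location and its SHA-256 checksum is compared with the one recorded at backup time. Files are listed most critical first, so the vital systems are checked first.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

# Hypothetical helpers: a sketch assuming simple file-level backups,
# not the API of any real backup product.


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(sources: list[Path], backup_dir: Path) -> dict[str, str]:
    """Copy each source file into backup_dir and record its checksum.

    List sources most critical first; the manifest keeps that order."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    manifest = {}
    for source in sources:
        shutil.copy2(source, backup_dir / source.name)
        manifest[source.name] = sha256_of(source)
    return manifest


def verify_restore(backup_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Restore every backed-up file into a scratch directory, in manifest
    order, and return the names that are missing or fail the checksum."""
    failures = []
    with tempfile.TemporaryDirectory() as scratch:
        for name, expected in manifest.items():
            backed_up = backup_dir / name
            if not backed_up.exists():
                failures.append(name)
                continue
            restored = Path(scratch) / name
            shutil.copy2(backed_up, restored)
            if sha256_of(restored) != expected:
                failures.append(name)
    return failures


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        work = Path(workdir)
        critical = work / "orders.db"
        critical.write_bytes(b"critical order data")
        manifest = back_up([critical], work / "backup")
        print(verify_restore(work / "backup", manifest))  # prints []
```

Running a check like this on a schedule, rather than once, is what turns a backup into continued validation of recoverability.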

3. Improve real-time visibility, analytics and insights integrated across the enterprise

Being able to observe overall system status and see servers or individual devices go offline in real time is one of the most powerful capabilities a company can have. In addition, observability platforms should be able to visualize multiple layers of service, including devices, total inventory, patch data, backup data, antivirus software and applications. Knowledge about individual servers is another core component of any company’s security and resiliency preparation. Understanding how applications are deployed across the infrastructure estate gives early awareness of which systems and devices are, or may be, affected and how the business is impacted. With this knowledge, companies can recover the right systems in the right order, and quickly.
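
Recovering the right systems in the right order is partly a dependency problem: an application should come back only after the services it relies on. The minimal Python sketch below assumes a hypothetical, hand-written dependency map (in practice this information would come from an inventory or observability platform) and uses the standard library’s `graphlib.TopologicalSorter` to group systems into recovery waves.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical estate: each system maps to the systems it depends on.
DEPENDENCIES = {
    "directory-service": set(),
    "database": {"directory-service"},
    "order-api": {"database", "directory-service"},
    "web-storefront": {"order-api"},
    "reporting": {"database"},
}


def recovery_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group systems into waves so every system's dependencies are
    recovered in an earlier wave. Raises CycleError on circular deps."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # checks the graph and raises CycleError if it has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(recovery_waves(DEPENDENCIES), start=1):
        print(f"wave {number}: {', '.join(wave)}")
    # wave 1: directory-service
    # wave 2: database
    # wave 3: order-api, reporting
    # wave 4: web-storefront
    try:
        recovery_waves({"app": {"db"}, "db": {"app"}})
    except CycleError:
        print("circular dependency detected; resolve it before planning recovery")
```

Systems within the same wave do not depend on each other, so they can be recovered in parallel.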

4. Form partnerships with developers to establish greater transparency in testing processes

Businesses need to better understand, and be able to trust, the automated testing that software updates go through before release. Ask: How does the vendor test each device type before updates go out? Trust matters most for updates that arrive live, sometimes several times a day. This is especially true for “rapid updates,” which historically may have gone through less rigorous testing to speed up deployment. Transparency into these testing processes can also ease recovery if an update has an unexpected operational impact. The overall approach to testing needs to evolve as systems become more complex; adopting DevOps and site reliability engineering (SRE) roles is key to this transformation.

5. Control what you can control, including software protections used on a network

Businesses should consider taking a phased approach to installing updates and configuration changes: if a faulty update reaches only a small group of devices first, the disruption can be contained. Consider whether you can stage different versions between production and non-production environments. Production devices might stay one or two versions behind the latest update, while non-production systems receive updates right away. Tagging devices (for example, by environment or criticality) makes it easier to apply different update policies to different groups, which can further reduce risk. Take special care with security technologies that push dynamic updates to the rules they use to detect cybersecurity threats. Delaying these updates to perform additional testing may improve operational stability, but at the cost of delaying protection against emerging threats.
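
A phased rollout can be sketched as a loop over tagged groups of devices, or rings. The minimal Python example below is an illustration under stated assumptions, not any vendor’s deployment mechanism: the ring names, the `Device` type, the `install` callback (which returns whether the device is healthy after the update) and the 5% failure threshold are all hypothetical. Non-production devices are updated first, and the rollout halts before the next ring if too many devices in the current ring fail.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Device:
    name: str
    tags: frozenset[str]


# Hypothetical ring order. Each device is assumed to carry exactly one ring
# tag; devices with no ring tag are skipped by this sketch.
RING_ORDER = ["non-production", "production", "critical-production"]


def phased_rollout(
    devices: list[Device],
    install: Callable[[Device], bool],
    max_failure_rate: float = 0.05,
) -> list[str]:
    """Update devices ring by ring and stop before the next ring if the
    share of failed installs in the current ring exceeds max_failure_rate.
    Returns the names of devices that received the update."""
    updated = []
    for ring in RING_ORDER:
        members = [d for d in devices if ring in d.tags]
        if not members:
            continue
        failures = 0
        for device in members:
            updated.append(device.name)
            if not install(device):  # hypothetical install + health check
                failures += 1
        if failures / len(members) > max_failure_rate:
            # Halt: later, more critical rings never receive this update.
            break
    return updated


if __name__ == "__main__":
    fleet = [
        Device("test-vm-1", frozenset({"non-production"})),
        Device("test-vm-2", frozenset({"non-production"})),
        Device("web-01", frozenset({"production"})),
        Device("db-01", frozenset({"critical-production"})),
    ]
    # Simulate a faulty update that breaks every device it touches.
    print(phased_rollout(fleet, install=lambda device: False))
    # prints ['test-vm-1', 'test-vm-2']
```

With a faulty update, only the two non-production machines are affected; the production and critical-production rings never receive it. Adding rings, or waiting longer between them, catches bad updates earlier but also delays protection for devices later in the sequence, the same trade-off noted above for detection-rule updates.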

With a cyber resilience strategy, companies can anticipate, protect against, detect and recover from cyber incidents of all kinds. Enterprises that focus on cyber resiliency and take a few proactive steps will have an advantage.

For more information about taking an orchestrated resilience approach, visit https://www.kyndryl.com/us/en/services/cyber-resilience/incident-recovery/cyber-recovery.