By Kohji Ootsu
IT operations leaders increasingly look at no-operations IT environments, or NoOps, as a way to maintain system stability while responding to the increasing complexity of management, labor shortages, and demands for cost reduction.
A NoOps approach positions machines to do what machines can do as much as possible, while shifting people’s time and energy to higher-value work.
Operations leaders who advocate for NoOps say it enables a system environment that’s stable—even during heavy load times—24 hours a day, 365 days a year. NoOps minimizes the repeated manual work that tends to occur in release procedures, patching, release monitoring, standby system maintenance, and other tasks. This approach can also drive reductions in operating and labor costs.
With an effective NoOps model in place, even if the scale or complexity of the system increases, higher operational loads and related expenses do not necessarily follow.
So, how do you move from a more traditional IT operating model to a NoOps approach? Start by taking inventory of your team’s day-to-day work.
1. Shift human resources to high-value-added work
Start by classifying and visualizing the operational and technology challenges for your department or company.
One classification method I have seen companies use is to map work across six categories.
- The organization as a whole
- Human resources
- Governance
- Processes
- Technology
- The nature of the work involved: management or creation
By confirming status across the categories, you get a clear visual of what may need attention—such as talent shortages, budget, or other constraints—in order to move to a NoOps model.
The visualization also helps align your team on the tasks that most likely can be shifted to machines—for example, through robotic process automation (RPA) or infrastructure as code (IaC). These tasks should be shifted as much as possible, allowing human resources to use time on higher-value tasks.
Proof of concept: We used this approach with a life insurance company and found an opportunity to automate some of the work requests submitted to the system operation department. The automation drove a 58% reduction in manually generated work requests.
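As a rough illustration of the classification step in this section, the Python sketch below tallies tasks by category and by nature of work, then flags candidates for RPA or IaC. Everything in it is an assumption made for the example: the `Task` fields, the `repetitive` and `rule_based` flags, the choice to treat the nature of work as a second axis, and the rule that repetitive, rule-based work is a candidate for automation. A real assessment would use its own criteria.

```python
from collections import Counter
from dataclasses import dataclass

# Category names taken from the list in this article; treating the
# nature of work as a second axis is an assumption of this sketch.
CATEGORIES = ("organization", "human resources", "governance",
              "processes", "technology")
NATURES = ("management", "creation")


@dataclass(frozen=True)
class Task:
    name: str
    category: str     # one of CATEGORIES
    nature: str       # one of NATURES
    repetitive: bool  # hypothetical flag: performed the same way each time
    rule_based: bool  # hypothetical flag: needs no human judgment


def is_automation_candidate(task: Task) -> bool:
    """Assumed heuristic: repetitive, rule-based work can shift to machines."""
    return task.repetitive and task.rule_based


def summarize(tasks):
    """Count tasks per (category, nature) cell and list automation candidates."""
    for task in tasks:
        if task.category not in CATEGORIES or task.nature not in NATURES:
            raise ValueError(f"unknown classification for task: {task.name}")
    grid = Counter((task.category, task.nature) for task in tasks)
    candidates = [task.name for task in tasks if is_automation_candidate(task)]
    return grid, candidates


if __name__ == "__main__":
    inventory = [
        Task("Reset user password", "processes", "management", True, True),
        Task("Apply monthly OS patches", "technology", "management", True, True),
        Task("Design disaster recovery strategy", "technology", "creation", False, False),
    ]
    grid, candidates = summarize(inventory)
    for cell, count in sorted(grid.items()):
        print(cell, count)
    print("Automation candidates:", candidates)
```

The grid shows where work is concentrated, and the candidate list is a starting point for discussion with the team, not a final decision.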
2. Leverage data to enhance IT operations
Whether or not your company has already automated aspects of IT operations, I suspect there's opportunity to expand how data and AI are used, perhaps for troubleshooting procedures or failure prediction.
For example, data such as incident tickets, change requests, and data obtained by monitoring systems can be stored in a data lake. The data can be analyzed for trends or insights, as well as used to train AI systems.
Consider a scenario where operational data is used to train AI about past trends in the system. This way, system alerts can be set up to trigger when the AI observes unusual trends—thereby preventing failures from occurring and warning of risks associated with changes to the system.
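To make the idea of alerting on unusual trends concrete, here is a minimal Python sketch. It is a simple statistical stand-in for a trained AI model, not the approach used in any particular product: it compares each new monitoring value with a trailing window of recent values and flags large deviations. The function name, window size, threshold, and sample values are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return the indexes of values that deviate from the trailing window.

    A value is flagged when it lies more than `threshold` standard
    deviations from the mean of the previous `window` values.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change as unusual.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    # Hypothetical CPU utilization samples (percent), one per minute.
    cpu_percent = [50, 52, 51, 49, 50, 51, 95, 50]
    for index in find_anomalies(cpu_percent):
        print(f"Alert: sample {index} looks unusual ({cpu_percent[index]}%)")
```

In practice the baseline would be learned from much more history, such as the incident tickets and monitoring data held in the data lake, and the threshold would be tuned to keep false alerts manageable.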
Or, when a first-of-its-kind failure occurs, the AI can still recommend response procedures if it has learned the patterns of similar failures. Simple failures can even be resolved through AI and automation, without relying on the skills and experience of operations personnel.
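One simple way to picture a recommendation based on similar past failures is a text similarity lookup over historical incident tickets. The sketch below uses Jaccard similarity on word sets, a deliberately basic technique standing in for a trained model. The function names, the `min_score` cutoff, and the sample tickets are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sets: size of overlap over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend_procedure(new_incident, past_incidents, min_score=0.2):
    """Return (procedure, score) for the most similar past incident.

    `past_incidents` is a list of (description, procedure) pairs.
    Returns (None, best_score) when nothing is similar enough.
    """
    query = tokenize(new_incident)
    best_score, best_procedure = 0.0, None
    for description, procedure in past_incidents:
        score = jaccard(query, tokenize(description))
        if score > best_score:
            best_score, best_procedure = score, procedure
    if best_score >= min_score:
        return best_procedure, best_score
    return None, best_score


if __name__ == "__main__":
    # Hypothetical ticket history: (description, response procedure).
    history = [
        ("disk usage above 90 percent on database server",
         "Rotate logs and extend the volume"),
        ("login service returns http 503 errors",
         "Restart the login service"),
    ]
    procedure, score = recommend_procedure(
        "disk usage above 95 percent on application server", history)
    print(procedure, round(score, 2))
```

When no past incident clears the cutoff, the function returns no recommendation, which is the point at which the ticket would be routed to an operator.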
By making advanced use of data and AI, an automotive manufacturer we work with cut simple system failures and threshold alerts by 73%. The manufacturer was also able to respond automatically to 65% of fault alerts that people previously handled manually.
3. Focus on employee engagement
Day or night, we must respond to and resolve the problem when an IT operations failure occurs. But with a NoOps approach, the need for continuity alone does not dictate staffing requirements.
With NoOps, IT operations are increasingly managed on dashboards. Work flows on a ticket basis instead of through person-to-person exchanges. Rather than making modifications manually, teams incorporate changes by editing and releasing IaC definitions, with automated tests run at release time.
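The release flow described here can be sketched in a few lines of Python. This is an illustrative toy, not a real IaC tool: the desired state is a plain dictionary, `validate` plays the role of the automated tests, and `plan` computes the changes that would be applied. The service names, fields, and validation rules are all assumptions made for the example.

```python
def validate(desired):
    """Automated pre-release checks. Returns a list of problems (empty if OK)."""
    problems = []
    for name, spec in desired.items():
        if not spec.get("image"):
            problems.append(f"{name}: missing image")
        if spec.get("replicas", 0) < 1:
            problems.append(f"{name}: replicas must be at least 1")
    return problems


def plan(current, desired):
    """Compare current and desired state and list the changes to apply."""
    changes = []
    for name in sorted(desired):
        if name not in current:
            changes.append(("create", name))
        elif current[name] != desired[name]:
            changes.append(("update", name))
    for name in sorted(current):
        if name not in desired:
            changes.append(("delete", name))
    return changes


def release(current, desired):
    """Run the checks first; only produce a plan when they all pass."""
    problems = validate(desired)
    if problems:
        raise ValueError("release blocked: " + "; ".join(problems))
    return plan(current, desired)


if __name__ == "__main__":
    # Hypothetical states; in practice the desired state lives in version control.
    current = {
        "web": {"image": "web:1.0", "replicas": 2},
        "batch": {"image": "batch:1.0", "replicas": 1},
    }
    desired = {
        "web": {"image": "web:1.1", "replicas": 2},
        "api": {"image": "api:1.0", "replicas": 1},
    }
    for action, name in release(current, desired):
        print(action, name)
```

Because the change is expressed as an edit to the desired state, it can be reviewed, tested, and rolled back like any other code change.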
The machines do what machines can do, while IT operators take on tasks that require higher-level skills.
Code development skills become essential in operations based on IaC. AI and agile methodology also present new ways of working. From what I have seen with customers, efforts to promote and support reskilling not only benefit the operations but can also lead to improved employee engagement.
Next-generation IT operations
Most large enterprises today have IT operations composed of multiple clouds and applications from many vendors. I believe a NoOps approach built on extensive system automation presents a reliable, efficient way to manage across these disparate systems.
Kohji Ootsu is the Director of Consult Partner in Japan