Managing the Modern NOC: 10 Tips to Improve Performance and Efficiency This Year

By Prasad Rao

President, INOC. An organizational efficiency and quality control expert, Prasad has 25 years of experience in operations, finance and business development in the IT and manufacturing industries. Prior to INOC, he was director of engineering at RC Machine, where he oversaw the engineering of automotive production machines. He was also co-founder and chief executive of Vidhata Plastics, a pioneering manufacturer of polyurethane shoe soles. A graduate of the Indian Institute of Technology-Madras, Prasad holds an M.S. in industrial engineering from the University of Illinois.

Companies experience many of the same issues when managing a network operations center (NOC). Fortunately, these common problems are very solvable. This guide offers 10 tips for managing the modern NOC that we’ve gathered over more than 20 years of doing just that for enterprises, service providers, and OEMs.

Want to talk NOC? Schedule a free NOC consultation with our Solutions Engineers for a focused conversation on aligning NOC support with your specific business needs.

1. Design an Operational Structure for the NOC

Operational design is the key to unlocking a NOC’s full capability and value. It provides a central framework with informed, documented guidance for each operational decision and action. A look under the hood of any top-performing NOC will reveal such a framework. Most often, it’s the factor that separates truly outstanding NOCs—those that have a measurable business impact—from those that only deliver basic, reactive support.

The first consideration for developing a NOC operational framework is determining how it will be developed. Rather than putting a team together to hash out a framework from scratch, have an experienced NOC design team start from a proven framework that can be shaped to fit the requirements of your business and its IT infrastructure. The resulting design will typically be far more robust and effective.

Whether you’re taking on NOC operational design yourself or putting it in the hands of capable NOC experts, each component should take into account the three Ps:

  • People (team with the right operational and technical skills appropriate for your environment)
  • Process (consistency through a standardized framework such as ITIL)
  • Platform (a single consolidated view for the NOC team and other stakeholders)

Since these three Ps encompass everything that could be impacted by the NOC, this approach ensures nothing is left out of each operation you develop.

A tiered ops support model delegates tasks based on skill levels or tiers, helping ensure that lower-cost, less experienced engineers handle simple break/fix type issues, while more complex problems are escalated to more experienced engineers instead of having the same individuals handle everything. It also takes into account SLAs, technology, and urgency, helping to organize and prioritize alarms in order to prevent an overwhelming “wall of red” and help the NOC perform at peak efficiency.

This model is beneficial for a number of reasons.

  • First, it makes more efficient use of senior engineers’ time, keeping their plates clean of routine issues.
  • Second, it helps prevent burnout among more experienced employees, who would otherwise be forced to work beneath their abilities.
  • Third, it helps the service team resolve issues more quickly, effectively resolving 65 to 75% of incidents at the Tier 1 level.
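
To make the tiered model concrete, here is a minimal sketch of tier assignment and queue prioritization. The `Incident` fields, the severity scale, and the escalation rule (documented issues go to Tier 1, undocumented critical issues go to Tier 3) are illustrative assumptions, not a prescribed policy; a real NOC would derive them from its own SLAs and runbooks.

```python
from dataclasses import dataclass

# Hypothetical severity scale; lower rank means more urgent.
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2}


@dataclass
class Incident:
    summary: str
    severity: str      # "critical", "major", or "minor"
    has_runbook: bool  # True if a documented procedure covers this issue
    sla_minutes: int   # response-time target from the customer's SLA


def assign_tier(incident: Incident) -> int:
    """Route documented issues to Tier 1 and escalate the rest."""
    if incident.has_runbook:
        return 1
    # No documented procedure: critical issues skip straight to Tier 3.
    return 3 if incident.severity == "critical" else 2


def prioritize(queue: list[Incident]) -> list[Incident]:
    """Order the queue by severity first, then by tightest SLA."""
    return sorted(queue, key=lambda i: (SEVERITY_RANK[i.severity], i.sla_minutes))


queue = [
    Incident("Link flap on access switch", "minor", True, 240),
    Incident("Core router unreachable", "critical", False, 15),
    Incident("High CPU on firewall", "major", True, 60),
]

for incident in prioritize(queue):
    print(f"Tier {assign_tier(incident)}: {incident.summary}")
```

Sorting by severity and SLA before assigning work is one simple way to keep the most urgent items at the top of the queue instead of presenting engineers with an unordered wall of alarms.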

When developing specific elements of the NOC operation, your framework should offer well-defined process flows and incorporate tools to support each type of input into the NOC, such as phone calls, emails, and events.

Phone and email tools should focus on helping the NOC achieve desired service levels for response time. Today, with an operational framework that clearly identifies issues and offers processes to work through them quickly and easily, most issues that arrive through phone and email should be handled and initially routed or resolved by Tier 1 NOC engineers—freeing high-tier engineers to focus their attention elsewhere.

Here are a few things to keep in mind when designing processes and tools for handling NOC inputs:

  • Phone calls: The framework needs the capability to identify specific details of a caller’s IT service, such as the services they are signed up for, past service records, and other details that empower the NOC engineer to take action quickly.
  • Events: Multiple alarm screens from different management platforms will hinder the NOC team’s efficiency in diagnosing events. Bringing alarms into a single view, whether in a consolidated Network Management System (NMS) platform or a Manager of Managers, will be of immediate value.
  • Event correlation: Event correlation—the process of analyzing relationships between multiple events to make sense of them—can be handled by a human alarm analyst, a rule-based system, machine learning, or a combination of these. Here, higher-tier engineers can play a short but meaningful role by performing advanced correlations that can then be passed to lower-tier engineers to handle with clear information in front of them. This approach frees higher-tier engineers to work on projects worth their time and expense. Once an event is ticketed, a quick and accurate diagnosis and action plan are needed so swift action can be taken.
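
As an illustration of the rule-based option mentioned above, the sketch below groups alarms that come from the same source within a fixed time window, so that one ticket can cover the whole group. The `Alarm` fields and the five-minute window are assumptions chosen for the example; production correlation rules are usually richer (topology, parent/child devices, alarm type).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alarm:
    source: str          # device or site identifier (hypothetical field)
    message: str
    raised_at: datetime


def correlate(alarms: list[Alarm], window: timedelta) -> list[list[Alarm]]:
    """Group alarms from one source raised within `window` of the group's first alarm."""
    groups: list[list[Alarm]] = []
    open_group: dict[str, list[Alarm]] = {}
    for alarm in sorted(alarms, key=lambda a: a.raised_at):
        group = open_group.get(alarm.source)
        if group is not None and alarm.raised_at - group[0].raised_at <= window:
            group.append(alarm)
        else:
            # Start a new group (and therefore a new ticket candidate).
            group = [alarm]
            open_group[alarm.source] = group
            groups.append(group)
    return groups


t0 = datetime(2024, 1, 1, 9, 0)
alarms = [
    Alarm("site-a", "Interface down", t0),
    Alarm("site-a", "BGP neighbor lost", t0 + timedelta(seconds=30)),
    Alarm("site-b", "High latency", t0 + timedelta(minutes=1)),
    Alarm("site-a", "Interface down", t0 + timedelta(minutes=20)),
]

for group in correlate(alarms, window=timedelta(minutes=5)):
    print(group[0].source, [a.message for a in group])
```

Here the two early `site-a` alarms collapse into one group, while the later `site-a` alarm falls outside the window and starts a new one.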

2. Track Meaningful Metrics to Measure Performance and Utilization

Many NOCs track metrics to meet their SLAs with clients, but not all choose KPIs that provide visibility into operations, reflect the operation’s size and scale, and clearly demonstrate performance against organization-wide objectives. Examples of such KPIs include first-call resolution, percentage of abandoned calls, mean time to restore, and the number of tickets and calls handled.

As a result, a service provider’s clients may be dissatisfied with the NOC’s performance even when it meets its SLAs.

Even for enterprise NOCs outside the OEM and support provider spaces, a lack of concrete KPIs can negatively affect staff morale. Engineers have difficulty benchmarking their performance and that of their peers, which leads to feelings of relentless busyness and of falling behind, without the reassurance of metrics that quantify achievements.

To remedy this, a NOC must choose the most relevant and meaningful metrics for its environment and evaluate these daily, weekly, and monthly.

Consider tracking and reporting on the following metrics to measure the NOC's utilization and efficiency:

  • Labor content for each edit of a ticket
  • Number of edits processed/performed per hour
  • A heatmap of edits by the time of day and day of week

And consider tracking the following commonly-ignored KPIs to measure outward NOC performance:

  • Time to impact assessment
  • Update frequency
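
The edit heatmap from the utilization list above could be built along these lines. This is a minimal sketch that assumes ticket edit timestamps can be exported from the ticketing system; the sample timestamps are made up for illustration.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Hypothetical export of ticket edit timestamps from a ticketing system.
edit_times = [
    datetime(2024, 1, 1, 9, 5),    # Monday
    datetime(2024, 1, 1, 9, 40),   # Monday
    datetime(2024, 1, 1, 14, 10),  # Monday
    datetime(2024, 1, 2, 9, 15),   # Tuesday
    datetime(2024, 1, 6, 23, 50),  # Saturday
]


def edit_heatmap(times):
    """Count edits per (day of week, hour of day) cell."""
    return Counter((DAYS[t.weekday()], t.hour) for t in times)


heatmap = edit_heatmap(edit_times)

# Print the busiest cells first.
for (day, hour), count in sorted(heatmap.items(), key=lambda kv: -kv[1]):
    print(f"{day} {hour:02d}:00  {count} edit(s)")
```

Dividing each cell by the number of weeks in the sample gives average edits per hour for that slot, which can then be compared against the number of engineers scheduled at that time.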

3. Create a Staffing Strategy Customized to Your Situation

Once you’ve established a tiered operational structure and meaningful performance metrics, you can create a data-driven staffing plan.

Assigning each engineer to a tier helps you keep track of the number of employees you have at each skill level, while metrics help you identify key areas and times when you are short on staff.

Together, this data can help you identify:

  • times of the day and week when you need to schedule more staff or can meet SLAs with fewer individuals;
  • the tiers at which you need more staff at those times; and
  • your historical rate of attrition, which lets you calculate how many additional staff you will need to hire per year.

It’s also important to consider days off, such as PTO and holidays, for scheduling purposes to ensure your NOC is always appropriately staffed. Likewise, it’s wise to create a training regime that ensures your engineers are kept up to date.
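
The arithmetic behind these staffing estimates can be sketched as follows. The function names, the 15% attrition rate, and the 25% shrinkage figure (time lost to PTO, holidays, and training) are assumptions for illustration only; substitute your own historical numbers.

```python
import math


def hires_per_year(headcount: int, attrition_rate: float, planned_growth: int = 0) -> int:
    """Replacement hires for expected attrition plus any planned new positions."""
    return math.ceil(headcount * attrition_rate) + planned_growth


def scheduled_headcount(required_on_shift: int, shrinkage: float) -> int:
    """Staff to roster so `required_on_shift` remain after PTO, holidays, and training."""
    return math.ceil(required_on_shift / (1 - shrinkage))


# Example: a 12-person team with 15% annual attrition and 2 planned new seats.
print(hires_per_year(headcount=12, attrition_rate=0.15, planned_growth=2))

# Example: 4 engineers needed on shift, with 25% of paid time unavailable.
print(scheduled_headcount(required_on_shift=4, shrinkage=0.25))
```

Rounding up in both functions is deliberate: under-hiring by a fraction of a person still leaves a shift short.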

📄 Read our other guide to learn more about staffing a 24x7 NOC team: Staffing a 24x7 NOC: Costs, Challenges, and Key Considerations

4. Incorporate Industry Standards Like ITIL

Different types of organizations require different outcomes from their NOC. Until recently, this meant incorporating different standards into the NOC framework depending on whether it was designed for an enterprise or service provider type of organization.

But consistency is key to peak performance, and the best way to get it is to implement a standardized process framework like ITIL, MOF, or FCAPS that provides a best practices “playbook” for operationalizing and documenting your NOC’s processes, functions, and roles.

Today, the ITIL service framework has gained momentum. ITIL provides significant guidance for developing, maintaining, and improving IT services, which makes it particularly useful for designing any type of NOC operation. ITIL has proven effective in a variety of applications and industries, thereby making the need for separate standards for enterprises and service providers largely obsolete.

ITIL is a widely used framework useful in achieving ISO 20000 certification. It provides best practices for delivering technology support services and allows you to include your organization’s custom procedures under its umbrella of life cycle stages.

To use the framework, get everyone in your organization trained and involved in the process. To ease into it, consider implementing the framework first in the areas of your operation that challenge you the most, then moving on to the others.

📄 Read our other guide to see some best practices for applying ITIL to your incident management process: 5 ITIL Incident Management Best Practices [+ Checklist] (2022)

5. Put Together a Business Continuity Plan

A business continuity plan (BCP) is a formal plan for the management team to continue operations in the event of an emergency that interrupts service. 

This could be anything from a short-term emergency, such as a regional power outage, to a fire that permanently destroys the NOC facility or a natural disaster that prevents access to the facility for a prolonged period of time.

A BCP should include the following: 

  • An analysis of all organizational threats
  • A list of action items required to maintain operations
  • Easily accessible contact information for key stakeholders
  • An explanation of where/how personnel should relocate if there is an interruption in operations
  • The steps required to make the backup site(s) operational
  • How all the areas within the organization need to collaborate in executing the plan

BCPs should be rehearsed at least quarterly, audited regularly for possible improvements, and include failover of all critical assets.

6. Take Measures to Ensure Quality of Service

Maintaining a high-standard quality of service is critical for NOC service providers, particularly in maintaining a positive reputation and retaining customers. To do so, we recommend implementing a quality assurance program.

The key ingredients of a quality assurance program are up-to-date runbooks that outline procedures for handling customer complaints and other routinely performed processes, along with accurate and effective reporting (as discussed earlier) and monitoring.

Metrics drawn from monitoring activities can be used to identify chronic issues and provide quantitative evidence when the customer complains. Proactive measures, such as staff mentoring, regular audits, and quarterly stakeholder reviews, help identify problems before they worsen.

ITIL Continual Service Improvement (CSI) provides IT organizations with best practices and structures for improving their service and service management processes.

With it, teams can constantly re-examine what’s working and what’s not and make ongoing, incremental improvements to their processes while keeping service aligned with the business’s changing needs. An effective CSI program constantly looks for ways to improve process efficiency and cost-effectiveness throughout the entire ITIL Lifecycle.

📄 Read our other guide to learn about bringing a CSI program to life in the NOC: ITIL CSI: A Guide and Checklist for IT Support and the NOC

7. Integrate Tools and Platforms

Wrangling disparate tools and platforms can quickly create a stressful mess that is not only challenging to use but also difficult to track and report on.

Engineers often find themselves tracking and managing multiple screens for event information, manually collecting information from multiple sources for documentation, notification, and escalation, and then attempting to manage workflow toward service restoration. 

The more convenient, efficient, and less stressful alternative is consolidating all of these tools and platforms into one view: “a single pane of glass.” This includes bringing voice, email, text, customer portals, knowledge bases, documentation, and workflow management tools (and potentially their respective platforms) all into one convenient dashboard.

This can help NOCs not only perform their duties more efficiently but also ensure more accurate reporting and prevent missed SLAs.

Here are a few of our own capabilities as an outsourced NOC support partner that have proven to be massive value-adds for organizations struggling to make their tools work for them, rather than the other way around: 

  • Alarming interface integrations: When monitoring tools are already in place, we integrate downstream of an NMS, EMS, and/or devices through an alarming interface—the mechanism by which your systems tell ours that an event has occurred.
  • Event correlation and ticketing integrations: Once we’ve received an alarm, we employ both human and automated ticket correlation processes to create appropriate incident tickets, problem tickets, and other records, which can be synchronized to the ticketing system for troubleshooting and resolution.
  • CMDB integrations: A seamless CMDB integration ensures our configurations are a perfect match. For each alarm we receive and each subsequent ticket we create, CMDB integration associates the appropriate meta information, arming the NOC engineer with the actionable information they need to make informed decisions. When necessary, we also draw on years of experience to enhance existing CMDB structures and capabilities, further enhancing efficiency and effectiveness.
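
To illustrate the general idea of CMDB enrichment, here is a minimal sketch in which an incoming alarm is merged with configuration metadata before it becomes a ticket. It is not a description of INOC's systems: the in-memory `CMDB` dictionary, the field names, and the `enrich_alarm` helper are all hypothetical, and a real CMDB would be queried through its own API.

```python
# Hypothetical CMDB records keyed by device name.
CMDB = {
    "edge-rtr-01": {
        "site": "Chicago",
        "owner": "Network Team",
        "service": "MPLS WAN",
        "priority": "P1",
    },
}


def enrich_alarm(alarm: dict, cmdb: dict) -> dict:
    """Return a ticket draft that merges the alarm with matching CMDB metadata."""
    ticket = dict(alarm)  # copy so the original alarm is left untouched
    record = cmdb.get(alarm["device"])
    if record is None:
        # Flag for manual lookup rather than guessing at the configuration.
        ticket["cmdb_match"] = False
    else:
        ticket.update(record)
        ticket["cmdb_match"] = True
    return ticket


alarm = {"device": "edge-rtr-01", "event": "Interface down"}
print(enrich_alarm(alarm, CMDB))
print(enrich_alarm({"device": "unknown-sw-99", "event": "Fan failure"}, CMDB))
```

Flagging unmatched devices explicitly also surfaces gaps in the CMDB itself, which can feed back into keeping configuration records current.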

📄 Read our other guide for a look at some of the common tools used in the NOC and the operational considerations key for each: NOC Tools and Software in 2022: An Operational Perspective

8. Ensure Runbooks Are Up to Date

Poor documentation is the source of many problems throughout ITOps. Without formal processes and procedures, even highly skilled professionals can struggle to achieve consistent desired results when outages occur.

While a common issue, out-of-date runbooks can negatively impact quality assurance, service improvement, and issue resolution while generally impeding a NOC’s performance. To address issues strategically, management must develop comprehensive runbooks and keep them updated as changes impact the NOC and the supported environment.

NOC teams should start by documenting the tools and procedures necessary to deliver quality NOC services for each service in their catalog with the aid of a competent technical writer. Runbooks should be the single source of truth for everyone inside and outside the NOC. 

Need help developing or improving runbooks? We deliver expert-driven runbook development as a professional service and as a core component for our NOC support clients. We work closely with you to understand and document your processes, creating a single source of truth for everyone inside and outside your NOC.

  • Our runbooks lay out the critical inputs that drive NOC service and provide step-by-step procedures for handling them, whether it’s a phone call, email, or event-based notification.
  • Our runbooks also describe the outcomes of these actions, both successful and unsuccessful, with clear escalation paths to other levels of support. These paths can direct action internally or to external third parties. In short, we ensure everything is fully documented and presented for clear, consistent action.

9. Ensure Your Operation Can Scale

Your NOC should be able to scale with your business. Scalability, meaning the ability to absorb an increase in work without compromising quality of service, is something to plan for before business growth affects performance (and results in unhappy customers).

Certain aspects of scalability will likely have been accounted for in your organization’s business plan, such as initial funding, sales and marketing, system build-out, operations support and the business guidance needed to meet the projected growth. However, predictable growth and process planning are often overlooked. 

When planning for growth, consider these factors: 

  • Keep staff utilization below 80% to leave room for growth and give yourself time to hire.
  • Ensure you have a distributed, redundant architecture so you can deploy additional server resources on demand to meet spikes in growth. Monitor systems and network capacity closely to ensure they can bear the weight of that growth.
  • Build additional capacity into tools to leave room for growth.
  • Adopt a flexible process framework that suits your organization’s needs.
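
The utilization guideline above can be checked with simple arithmetic. The sketch below assumes utilization is measured as workload hours divided by available staff hours; the function names and sample figures are illustrative.

```python
def utilization(workload_hours: float, staff: int, hours_per_person: float) -> float:
    """Share of available staff hours consumed by the current workload."""
    return workload_hours / (staff * hours_per_person)


def growth_headroom(current_utilization: float, ceiling: float = 0.80) -> float:
    """Fractional workload growth the team can absorb before reaching the ceiling.

    Assumes current_utilization is greater than zero.
    """
    return max(0.0, ceiling / current_utilization - 1)


# Example: 260 hours of weekly workload across 10 engineers at 40 hours each.
u = utilization(workload_hours=260, staff=10, hours_per_person=40)
print(f"utilization: {u:.0%}")
print(f"headroom before the 80% ceiling: {growth_headroom(u):.0%}")
```

A headroom of zero means the team is already at or above the ceiling, which is the signal to start hiring before quality of service suffers.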

Shared NOC Support and the Economy of Scale

Here at INOC, our shared support model allows for service to scale across a large team of shared resources to meet periods of expected or unexpected demand—a capability that simply wouldn’t be possible in a dedicated support arrangement. This group of shared resources is sized to ensure roughly 65% utilization in order to provide a safe buffer of capacity to handle unexpected spikes in activity. Using company-wide metrics, changes in utilization are reflected in staffing decisions to ensure this balance is maintained at all times.

In short, our shared NOC support model enables organizations to benefit from economies of scale. Rather than being based on the number of resources, the shared support model is based on the number of assets (such as devices) and workload (the expected volume of NOC activity in a given period of time). The shared NOC is a timely and reliable resource pool that is constantly triaging and working through queues containing tickets from many clients. This model is tailored to offer standardized and templatized support. While service pricing will naturally fluctuate with significant changes in workloads, the increments are typically far more subtle compared to adding even one additional dedicated resource.

📄 Read our other guide to learn more about shared vs. dedicated NOC support models: Shared vs. Dedicated NOC Support: A Quick-Guide

10. Manage Operational Costs

Here are some general tips when setting budgets for your NOC:

  • Remember that, in addition to front-end engineers, a robust enterprise-capable NOC will need to hire back-end support staff.
  • Set aside resources for training new employees whenever something changes, including when they are first hired, when onboarding new clients, and when new technologies are implemented.
  • Budget for a dedicated quality assurance program to satisfy and retain customers.
  • When putting together a new NOC, remember to consider the cost of resources for ongoing support and cloud storage or a physical environment for housing systems, network connectivity, and security controls.
  • Be prepared for the costs of software licensing, such as NMS and EMSs, trouble ticketing systems, knowledge bases, portals, and a CMDB.
  • Consider the necessary operating expenses of compliance components.

Staffing and platform costs are two of the biggest financial factors when considering building and maintaining an in-house NOC vs. outsourcing it.

Staffing

Given that most NOCs require, at minimum, a team of ten to provide reliable 24/7/365 support, comparing the total in-house human resource expenditures to a much smaller team of outsourced FTEs operating in a fully mature NOC environment can lead to a stark realization.

For many companies, staffing a NOC in-house is a needlessly high expenditure compared to outsourcing that support. A plan that doesn’t consider this opportunity might, for example, call for a staff of 12 full-time employees when the same or likely better support could be provided through an outsourced service solution that takes full advantage of an economy of scale to deliver it at a far lower cost.

Platform

Apart from staffing, the cost of acquiring, implementing, and integrating a full suite of NOC tools only further tips the scale in favor of outsourcing much of the time.

Monitoring, ticketing, knowledge centralization, and reporting are just a few essential NOC functions requiring tools. Together, these can constitute a massive expenditure even though, in most homegrown NOCs, their low utilization doesn’t justify their high price tags. More recent technologies like machine learning and automation (AIOps) only add to the cost, not to mention the difficulty of implementation.

It’s not uncommon for companies to learn that, given the payroll and overhead costs of building a NOC in-house, electing for outsourced support can cut their total cost of ownership in half.

Final Thoughts and Next Steps

There’s no getting around it—optimizing your NOC for peak performance is a lot of work, but implementing these best practices can pay dividends over time, or even in the short term, in the form of greater efficiency, higher employee and client satisfaction, and even cost reductions or reallocations toward more valuable investments.

Here at INOC, we help organizations with these critical needs through award-winning outsourced NOC support (sometimes referred to as NOC as a Service) and NOC operations consulting services.

  • NOC Support Services: Our NOCs monitor tens of thousands of infrastructure elements around the clock. High-level NOC management expertise and custom-built systems ensure you and your customers achieve the infrastructure performance and availability needed to grow and thrive no matter how your IT environment evolves or what new challenges arise. By following an operational methodology that utilizes a tiered support structure in full alignment with the ITIL framework, our NOC can rapidly respond to incidents and events and continue to implement changes as needed, all under a more cost-effective service model.
  • NOC Operations Consulting: We also deliver comprehensive best practices consulting for designing and building new NOCs and helping existing NOCs significantly improve the support provided to you and your customers. Our approach to high-quality support aligns and integrates each function of NOC support operations to enable more informed, consistent decision-making in line with the ITIL framework.

Want to learn how to put these NOC management practices to use in your NOC? Contact us or schedule a free NOC consultation with our Solutions Engineers to see how we can help you improve your IT service strategy and NOC support, and download our free white paper below.

Free white paper Top 11 Challenges to Running a Successful NOC — and How to Solve Them

Download our free white paper and learn how to overcome the top challenges in running a successful NOC.
