Staffing a 24x7 NOC: Costs, Challenges, and Key Considerations


By Brandon Atkins

Senior Director of NOC Operations, INOC

Brandon is INOC's Senior Director of NOC Operations. He specializes in driving Network Operations Center (NOC) growth by initiating cross-functional solutions to support customer-driven project and process efforts, and offers 15+ years of progressive experience.


Staffing a NOC to operate continuously—three shifts a day, 365 days a year—is difficult to plan and manage, not to mention prohibitively expensive for most companies who need this support.

From a planning perspective, there’s no handy playbook IT leaders can follow to tackle everything that goes into such an operation, such as:

  • developing an operational and organizational structure;
  • defining roles and responsibilities;
  • writing specialized job descriptions;
  • creating work schedules;
  • training staff;
  • securing tools and licenses;
  • developing processes;
  • managing quality;
  • …and all the other critical tasks that need to be carried out thoughtfully from the start.

Teams are left to feel their way in the dark as they put all these pieces together, while their business and customers bear the consequences when problems result in expensive outages and downtime. Most businesses simply can’t afford to take on the costs and risks.

There’s also no playbook for what comes after the NOC is stood up and staffed up: managing it continually—forever.

Looking after an always-on NOC is a delicate balancing act that requires specialized domain expertise, unique metrics, and thoughtful, continuous evaluation. 

  • Under-investment here can saddle too few staff with too much work, leading to burnout.
  • Over-investment (throwing people at problems) can be an enormous financial drain for the business, and still fail to address issues that are more fundamental to the way the NOC is operationalized.

It’s for these reasons (and many others) that companies partially or completely outsource their NOC to an expert third-party service provider whose core business is, well, NOC support.

Here are the main benefits of outsourcing with respect to staffing, at a glance:

  • The heavy and constant burden of recruiting, hiring, training, and retaining talent is lifted completely, so your time and attention can be spent elsewhere.
  • Through an economy of scale, the level of support you receive can flex with your business—up or down—predictably.
  • You can tap into a breadth of experience and expertise that simply wouldn’t be possible, much less financially feasible, to buy or develop internally.

In short, the person or people responsible for hiring and managing 24x7 NOC staff should clearly understand what they’re signing up for at the outset, and consider the value-adds of outsourcing closely.

What we cover below will help you do just that. We explore some of the top challenges of staffing a NOC 24x7, so you can contextualize them for your business and make the most informed decision about how you get the support you need.

If you want more insight into building a NOC team, read our companion post: How to Build and Manage an Effective NOC Team


FREE WHITE PAPER

Top 11 Challenges to Running a Successful NOC — and How to Solve Them

Download our free white paper and learn how to overcome the top challenges in running a successful NOC.


Developing a NOC Organizational Structure

Effectively staffing a 24x7 NOC starts with a well-organized structure. Before you can put a traditional team-based org structure together, you have to create an operational structure that dictates how that team will work. This should directly inform who you need in the NOC.

At the risk of getting too far into the weeds, the tiered NOC support structure and workflow queues we present below are good starting points for determining the skill levels required of your NOC staff.

Figure 1 illustrates a well-organized tiered NOC support structure, central to which is the Tier 1 team that uses monitoring tools and interacts with end-user help desks, Tier 2 and 3 engineers, and third parties. 

The arrows represent information flows between the various entities within this well-defined process framework.

Figure 1: A tiered NOC support structure

Employing this structure, we’ve seen NOCs (including our own!) effectively resolve 65 to 75% of incidents at Tier 1.

This structure enables the support group to handle events, service requests, and incidents at the appropriate tier, which leads to faster resolution.

Workflows come next in the plan, and they, too, help you create an informed org structure. Well-defined workflows also help teams avoid the “wall of red” (a monitoring screen flooded with unprioritized alarms) by planning what will happen and when. In most cases, issues should be prioritized and organized into a set of queues, each handled by the appropriate group. Variables like SLAs, technology, and technician skill levels play an important role in determining workflows.

Figure 2 shows how a set of issues can be broken up into queues and assigned to groups based on skillset.

Figure 2: Sample workflow queues
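To make the queue idea concrete, here is a minimal Python sketch of routing issues into skill-based queues and working them in priority order. Everything in it is a hypothetical illustration: the queue names, the 1-to-4 priority scale, and the groups that own each queue are placeholders, and a real NOC would derive its own rules from its SLAs, supported technologies, and technician skill levels.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    summary: str
    priority: int             # 1 = most urgent (hypothetical 1-4 scale)
    technology: str           # e.g. "network", "cloud", "server"
    is_request: bool = False  # service request rather than an incident


# Hypothetical queues and the skill group that works each one.
QUEUE_OWNERS = {
    "major-incidents": "Major incident team",
    "network": "Network team",
    "cloud": "Cloud team",
    "requests": "Request desk",
    "general": "Generalist pool",
}


def route(issue: Issue) -> tuple[str, str]:
    """Pick a queue for an issue and return (queue, owning group)."""
    if issue.is_request:
        queue = "requests"
    elif issue.priority == 1:
        queue = "major-incidents"  # urgency overrides technology
    elif issue.technology in ("network", "cloud"):
        queue = issue.technology   # skill-matched queue
    else:
        queue = "general"
    return queue, QUEUE_OWNERS[queue]


def work_order(issues):
    """Work higher-priority issues first; ties keep their arrival order."""
    return sorted(issues, key=lambda i: i.priority)


if __name__ == "__main__":
    backlog = [
        Issue("Password reset for VPN", 4, "server", is_request=True),
        Issue("Core router down", 1, "network"),
        Issue("High latency on VPC peering", 2, "cloud"),
    ]
    for issue in work_order(backlog):
        print(issue.summary, "->", route(issue))
```

Putting urgency ahead of technology is one design choice among many; a team could just as well route by technology first and let each group sort its own queue by priority.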

Now we can finally get to putting that team org structure together.

Figure 3 below shows an example of a skills-based NOC structure that can be adapted not only to support your 24x7 NOC requirements but also to provide a growth plan for employees, which is essential for retention. 

Keep in mind this is just one approach to structuring a NOC team. Yours may look different depending on the factors at hand. 

Figure 3: A sample skills-based NOC structure

“If I was to build my own NOC from scratch, I'd have to put together job descriptions, figure out an org structure, and career path to develop the employees. I’d also need to lay out the workflows, processes, and tools for them to be successful. All of that I’d need to continue to improve over time, which requires setting up and managing certain metrics and reporting on them so I know which changes I need to make as time goes on.”

 

— Brandon Atkins, Senior Director of NOC Operations, INOC

Since we’re focusing on staffing specifically here, we won’t dive into all the processes a company would then need to develop for its NOC to operate well and consistently. But if you’re planning to stand up an operation yourself, expect to plan for at least the following:

  • Escalation
  • QA/QC
  • Disaster recovery/continuity
  • Onboarding
  • Training
  • Continuous development
  • Workflow management
  • Incident management
  • Problem management
  • Knowledge management
  • Access/credential management

Hiring

Once there’s an operational and staffing blueprint for the NOC, and the position descriptions are created, your HR team should be armed with clear directives for hiring and retention. This can be a major challenge that itself motivates companies to outsource to a team that lives and breathes NOC support and can take this burden off their shoulders. It’s no small task, especially in today’s labor market.

Keep in mind that the absolute minimum personnel requirement of a 24x7 NOC is 10 to 12 people.

Most NOCs need more.

Carefully consider the benefits your company provides for employees. For example, if your company provides 10 holidays and four weeks of PTO per employee, these hours need to be accounted for in the total headcount to ensure that your NOC runs smoothly.
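One way to see why 10 to 12 people is a floor is to count the hours a single around-the-clock seat needs each year and divide by the hours one employee is actually available. The sketch below is illustrative only: it assumes a 40-hour week, 8-hour holidays, and two engineers on shift at all times, and the function names are hypothetical rather than part of any staffing tool.

```python
import math

HOURS_PER_YEAR = 24 * 365  # one seat staffed around the clock: 8,760 hours


def productive_hours_per_fte(weekly_hours=40, holiday_days=10, pto_weeks=4,
                             other_hours=0, hours_per_day=8):
    """Hours a full-time employee is actually available to staff the NOC.

    other_hours covers anything else you subtract (sick days, training, ...).
    """
    paid = weekly_hours * 52
    holidays = holiday_days * hours_per_day
    pto = pto_weeks * weekly_hours
    return paid - holidays - pto - other_hours


def required_headcount(concurrent_seats, **fte_kwargs):
    """Full-time employees needed to keep `concurrent_seats` filled 24x7."""
    available = productive_hours_per_fte(**fte_kwargs)
    return math.ceil(concurrent_seats * HOURS_PER_YEAR / available)


if __name__ == "__main__":
    print(productive_hours_per_fte())              # 1840
    print(required_headcount(2))                   # 10
    print(required_headcount(2, other_hours=200))  # 11
```

With 10 holidays and four weeks of PTO, each employee is available for about 1,840 hours a year, so two seats need ceil(2 × 8,760 / 1,840) = 10 people. Sick days, training time, and shift-handover overlap (not modeled here beyond `other_hours`) push the number higher.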

The NOC team’s key responsibilities will likely encompass a series of tasks presented in the ITIL* framework, such as:

  • Event Monitoring and Management
  • Incident Management
  • Problem Management
  • Capacity Management
  • Change Management

Ensuring the Right Skills in the NOC

It’s worth lingering on skillsets for a moment. The modern NOC engineer needs a variety of skills to keep your network, infrastructure, and applications up and running.

Diverse technical knowledge, including knowledge of various network technologies, cloud environments, server operating systems, virtualization, storage systems, and applications, is becoming more and more essential—and harder to find—each day.

This demand for skilled human resources in a 24x7 environment can pose a considerable and often insurmountable challenge for many organizations.

In addition to these core skills, tech innovations demand new ones. Machine learning and artificial intelligence in particular pose new challenges that don’t always lend themselves to time-tested best practices. Even many seasoned NOC engineers haven’t dealt with networks becoming more “aware” of the traffic that runs through them. Developing skills that complement machine learning is just one example of the evolving challenges that those who work in the NOC will need to overcome.

Prioritizing NOC design (along with the right training programs and tools) to maximize your team’s capabilities from the start saves an incredible amount of expensive, labor-intensive work as the NOC comes to life.

Understanding the required skillset early on can help you identify the correct staff to hire and drive the selection of tools to manage the infrastructure over time. In addition, a rigorous, ongoing knowledge management and training program is important to ensure the entire NOC team is up to date on all changes made to the supported infrastructure.

(By the way, this is an excerpt from our free white paper: A Practical Guide to Running an Effective NOC. If you’re reading this, you should probably grab this guide, too.)

Again, given the nature of a 24x7 support operation, your team should be prepared to spend time recruiting, interviewing, training, and developing new people as a regular and likely ongoing part of their job.

Also keep in mind that the high-stress environment of a NOC naturally lends itself to turnover, especially in entry-level positions. As a result, the hiring workload is almost always heavier than you'd expect for the average engineering role. Whether frontline support staff move outside your company or move up within it, hiring is a near-constant and particularly high-stakes management challenge.

Is your HR team up for it?

Training

A NOC training program should include initial onboarding as well as ongoing training. If you’re considering standing up or scaling an existing support operation yourself, do you have the time, energy, and expertise to develop and administer training?

If so, be prepared for significant lead time. Depending on the tasks and technologies that await them, it may take up to six months of classes and on-the-job training before an engineer is ready to take on NOC support responsibilities.

After work has begun, monthly or quarterly training sessions should be scheduled to keep engineers’ skills fresh and to update the support team on new types of services, new customer requirements, and new equipment.

“Training is probably one of the more unspoken challenges but is probably the highest cost you'll take on. It takes time to train individuals depending on what level you're hiring them at. For a NOC, most people typically think Tier 1-style support, potentially Tier 2. Are you hiring more experienced individuals? Are you hiring people that are brand new to the industry? Because that's going to change your cost model as well. More experience costs more money. But if you pay less, and you go with somebody that's newer, they have to learn all of the skills and the processes and things that are unique to your company and to your network; how to interact with the various departments, the ticketing systems; understanding the ins and outs of your company as a whole. The more experienced person may only have to learn the company's specific tools since the technical piece is already there.”

— Austin Kelly, Director of Dedicated NOC and ATS, INOC

Retention

A certain rate of attrition within the NOC should be taken into account based on your historical data and on industry standards. (Do you have historical data to work with here?)

Factors that affect retention rates include company culture as well as NOC organization (i.e., whether there’s a clear path for employee growth from one level to the next or to other departments within the organization).

By estimating attrition this way, you can better plan for hiring and training needs. For example, if a typical engineer stays in your NOC for five years, about one in five engineers leaves each year (an annual retention rate of 80%), so you’d need to hire replacements for roughly 20% of your staff every year. This is one of many realizations that drive companies to outsource their NOC support to a dedicated support provider, who, again, can shoulder this burden for them.
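The attrition arithmetic is simple enough to script. The minimal sketch below assumes a steady state (average tenure stays constant and departures are spread evenly through the year), and it pairs the attrition estimate with the up-to-six-month training lead time from the Training section to estimate how many people are still in training at any given time. The function names and the use of Little's law for the pipeline estimate are illustrative assumptions, not a prescribed method.

```python
import math


def annual_attrition_rate(avg_tenure_years):
    """Steady-state share of staff who leave each year (1 / average tenure)."""
    return 1 / avg_tenure_years


def hires_per_year(headcount, avg_tenure_years, growth_rate=0.0):
    """Replacement hires plus planned growth, rounded up to whole people."""
    needed = headcount * (annual_attrition_rate(avg_tenure_years) + growth_rate)
    # Round first so float noise (e.g. 3.0000000000000004) doesn't add a hire.
    return math.ceil(round(needed, 6))


def trainees_in_pipeline(hires, training_months=6):
    """Average number of new hires still in training (Little's law: rate x time)."""
    return hires * training_months / 12


if __name__ == "__main__":
    hires = hires_per_year(headcount=12, avg_tenure_years=5)
    print(hires)                        # 3 (12 x 20% = 2.4, rounded up)
    print(trainees_in_pipeline(hires))  # 1.5
```

In this example, a 12-person NOC with five-year average tenure would plan for three hires a year and, with six months of ramp-up each, carry about one and a half people in training at any moment.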

Scheduling

There’s also the never-ending task of juggling schedules. Paid time off, sick days, and other personnel interruptions all need to be planned for and covered so that the required skill sets are never missing from a shift.
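A simple way to catch coverage holes before they happen is to check the planned roster hour by hour against the skills that must always be present, after removing anyone on PTO or out sick. The sketch below assumes a weekly roster of fixed-length shifts indexed by hour of the week; the `Shift` structure and the skill labels are hypothetical.

```python
from dataclasses import dataclass

HOURS_PER_WEEK = 7 * 24


@dataclass(frozen=True)
class Shift:
    engineer: str
    skills: frozenset  # e.g. frozenset({"network", "cloud"})
    start_hour: int    # hour of the week, 0 = Monday 00:00
    length: int = 8    # hours


def coverage_gaps(shifts, required_skills, unavailable=frozenset()):
    """Map each hour of the week to the required skills nobody on shift has.

    `unavailable` lists engineers on PTO, out sick, or otherwise absent.
    Fully covered hours are left out of the result.
    """
    on_duty = [set() for _ in range(HOURS_PER_WEEK)]
    for shift in shifts:
        if shift.engineer in unavailable:
            continue
        for h in range(shift.start_hour, shift.start_hour + shift.length):
            on_duty[h % HOURS_PER_WEEK] |= shift.skills  # wrap Sunday night into Monday
    gaps = {}
    for hour, skills in enumerate(on_duty):
        missing = set(required_skills) - skills
        if missing:
            gaps[hour] = missing
    return gaps


if __name__ == "__main__":
    roster = [Shift(f"E{k}", frozenset({"network", "cloud"}), day * 24 + k * 8)
              for day in range(7) for k in range(3)]
    print(len(coverage_gaps(roster, {"network"})))                      # 0
    print(len(coverage_gaps(roster, {"network"}, unavailable={"E0"})))  # 56
```

Here E0 through E2 each take one 8-hour shift a day. Marking E0 unavailable for the week exposes 56 uncovered overnight hours (midnight to 8:00 every day) that need a substitute before the week starts.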

Some companies may try to mitigate the problem of weekend and third shift staff shortages by outsourcing only those shifts—a hybrid solution. However, this requires management of both the internal and external NOC staff and ensuring quality and consistency are maintained across the two different teams, rather than a single in-house or outsourced team handling everything. We make sure this is addressed thoughtfully in our service engagements.

Quality

One of the biggest staffing considerations that flies under the radar for many teams is quality control and quality assurance.

Many teams realize far too late how much the quality of updates, communication, and soft skills matters, even though quality review is critical for the NOC to catch its errors and turn them into opportunities to improve.

Here are some of the quality-related tasks top-performing teams make part of their workflow:

  • Reviewing tickets, incidents, and escalations
  • Reviewing chronic alarms and incidents
  • Coordinating maintenance calendars
  • Reporting on service and operational metrics
  • Assessing contractor performance
  • Forecasting planned activities
  • Reviewing the volume and causes of NOC support activities
  • Identifying root causes and preventive actions
  • Reviewing escalations and open items
  • Reviewing staffing and proficiency levels
  • Validating contract assumptions
  • Identifying and mitigating risks within infrastructure and operations
  • Reviewing previous action items and statuses

Tools and Licenses

The bigger your team, the more licenses and similar expenses you can expect to incur. If you’re planning on building a distributed team that supports you remotely, be sure to identify any additional costs that result, such as extra tools and shipping equipment to individual team members. Also, make sure you have a strategy for access and credential management.

If you’ll need a separate facility, be sure to factor in those costs as well, as they can be considerable.

Utilization Metrics and Reporting

Metrics are far too often conflated with KPIs and SLAs only—the measures and contractual obligations for how well the NOC is doing its job.

While both can certainly be a sore spot for many support teams, in the context of staffing a 24x7 NOC there’s an even bigger metrics-related problem, one that is often a complete blind spot for teams: utilization metrics.

These are the metrics that reveal why the NOC is or isn’t busy at any point in time, and how staffing levels should be set accordingly. In many ways, they are the key to efficiency.

  • What's your throughput?
  • What's your workload? 
  • How many tickets are you dealing with at certain points in time? 
  • How long do you spend working tickets?

All of these questions are critical for setting and fine-tuning staffing levels based on how and when the NOC is typically utilized.

Some essential utilization metrics for a NOC include: 

  • Labor content for each edit of a ticket — reveals how much work was devoted to the lifecycle of a ticket, which helps you determine optimal staffing levels, resulting in control over OPEX and better morale by avoiding overworking employees.
  • Number of edits processed/performed per hour — shows how many edits in the ticket lifecycle a person can perform per hour, which, in combination with the previous metric, can help you gauge how many tickets can be resolved per hour.
  • A heatmap of edits by time of day and day of week — tells you how many tickets you can expect at a given time on a given day of the week, which helps you determine optimal staffing levels.
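Assuming your ticketing system can export one record per ticket edit with a timestamp and the hands-on minutes spent (the `Edit` fields below are hypothetical), the three metrics above can be computed in a few lines. The final step converts the heatmap into engineers per hour slot using a target utilization of 70%; that target is an illustrative assumption, not a recommendation from this guide.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Edit:
    ticket_id: str
    at: datetime          # when the edit was made
    labor_minutes: float  # hands-on time spent on this edit (assumed to be logged)


def labor_per_ticket(edits):
    """Total labor content, in minutes, across each ticket's lifecycle."""
    totals = defaultdict(float)
    for e in edits:
        totals[e.ticket_id] += e.labor_minutes
    return dict(totals)


def edits_per_hour(edits):
    """Edits one person completes per hour of hands-on work."""
    total_hours = sum(e.labor_minutes for e in edits) / 60
    return len(edits) / total_hours


def tickets_per_hour(edits):
    """Combine the first two metrics: edits per hour / average edits per ticket."""
    avg_edits = len(edits) / len({e.ticket_id for e in edits})
    return edits_per_hour(edits) / avg_edits


def edit_heatmap(edits):
    """Edit counts keyed by (weekday, hour); weekday 0 is Monday."""
    return Counter((e.at.weekday(), e.at.hour) for e in edits)


def staff_per_slot(edits, weeks_observed, target_utilization=0.7):
    """Engineers to schedule per (weekday, hour) slot from the average labor load."""
    load = defaultdict(float)
    for e in edits:
        load[(e.at.weekday(), e.at.hour)] += e.labor_minutes
    return {slot: math.ceil(minutes / weeks_observed / 60 / target_utilization)
            for slot, minutes in load.items()}


if __name__ == "__main__":
    edits = [
        Edit("T1", datetime(2024, 1, 1, 9, 5), 10),    # Monday 09:05
        Edit("T1", datetime(2024, 1, 1, 9, 40), 5),
        Edit("T2", datetime(2024, 1, 1, 9, 50), 15),
        Edit("T2", datetime(2024, 1, 2, 22, 10), 30),  # Tuesday 22:10
    ]
    print(labor_per_ticket(edits))  # {'T1': 15.0, 'T2': 45.0}
    print(edits_per_hour(edits))    # 4.0
    print(tickets_per_hour(edits))  # 2.0
    print(edit_heatmap(edits))      # Counter({(0, 9): 3, (1, 22): 1})
    print(staff_per_slot(edits, weeks_observed=1))  # {(0, 9): 1, (1, 22): 1}
```

Dividing each slot's total by the number of weeks observed turns the heatmap into an average week, which is what a recurring schedule is built from.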

In our other guide, we dive into these utilization metrics further: NOC Performance Metrics: How to Measure and Optimize Your Operation

Staffing a 24x7 NOC vs. Outsourcing

For most organizations, staffing a 24x7 NOC internally is a needlessly high expenditure compared to strategically outsourcing support as a predictable operating expense. Given the payroll and overhead costs of building a NOC in-house, many companies that outsource with us cut their total cost of ownership in half.

A plan that doesn’t consider this opportunity might, for example, call for a staff of 12 full-time employees, when in fact the same or likely better support could be provided through an outsourced service that takes full advantage of economies of scale to provide far better service at a far lower cost.

In addition to staffing costs, the cost of acquiring, implementing, and integrating a full suite of NOC tools often tips the scale further in favor of outsourcing.

Perhaps the most apparent difference between homegrown and outsourced NOCs lies in the capabilities that come with an already-mature NOC operation and access to niche NOC expertise.

Between planning a NOC build, hiring a team, training that team, and aligning over the operational plan, in-house NOCs can expect 16 to 24 weeks minimum before all the parts are even in place. It can then take months or years to gain confident control over the system and bring it into a state of real operational maturity.

Turning up support with an outsourced NOC condenses all that time and effort into a number of weeks instead of years—often far more cost-effectively.

Read also: Shared vs. Dedicated NOC Support: A Quick-Guide

Final Thoughts and Next Steps

For most companies, the cost and complexity of building and managing a NOC diverts attention from where it’s needed most: innovating and growing the business.

Without an effective NOC, persistent support issues lead to expensive project delays, endless stress, and serious vulnerabilities that threaten your business. 

Through the NOC as a service model, an outsourced support provider helps you take control of your infrastructure through a suite of NOC solutions designed to meet the specific needs of your technology environment and operational workflow—all while enabling you to focus internal resources on the projects that move the business forward.

Want to learn more about our approach to outsourced NOC support? Contact us to see how we can help you improve your IT service strategy and NOC support, schedule a free NOC consultation with our Solutions Engineers, or download our free white paper below.

FREE WHITE PAPER

A Practical Guide to Running an Effective NOC

Download our free white paper and learn how to build, optimize, and manage your NOC to maximize performance and uptime.



*ITIL is a framework of best practices for delivering efficient and effective IT support services. It was originally developed by the UK government (first by the Central Computer and Telecommunications Agency, later by the Office of Government Commerce, whose functions have since moved to the Cabinet Office) and is currently managed and developed by AXELOS, now part of PeopleCert.
