Effectively diagnose network problems and reduce MTTF in 10 steps

How can you effectively identify the cause of network problems? Experience shows that diagnosing network problems is often time-consuming, because they frequently do not involve a “hard” fault, such as a crashed server or an unplugged cable.

Because today’s network is critical to business processes and continuity, problems must be eliminated as quickly as possible. Keeping the so-called MTTF (here: mean time to fix, not to be confused with mean time to failure) low is therefore a challenge for every organization.
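To make the MTTF concrete: it is the average time between detecting a problem and fixing it. Below is a minimal Python sketch, assuming you log a detection timestamp and a fix timestamp for each incident. The incident records and the function name `mean_time_to_fix` are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical incident log: (time the problem was detected, time it was fixed).
incidents = [
    (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 11, 30)),    # 2.5 hours
    (datetime(2024, 3, 7, 14, 0), datetime(2024, 3, 8, 10, 0)),    # 20 hours
    (datetime(2024, 3, 20, 8, 15), datetime(2024, 3, 20, 9, 0)),   # 45 minutes
]

def mean_time_to_fix(records):
    """Average fix duration over all incidents, returned as a timedelta."""
    durations = [(fixed - detected).total_seconds() for detected, fixed in records]
    return timedelta(seconds=mean(durations))

print(mean_time_to_fix(incidents))  # → 7:45:00
```

Note how the single problem that took 20 hours to fix dominates the average; that is typical of the hard-to-diagnose, non-“hard” faults described above.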

This blog post covers some basic methods for tackling recurring, intermittent problems. When users in an organization complain about application performance, several technical disciplines are involved, and often one of the parties is an outsourcing partner. To diagnose effectively nonetheless, you need measurements: they provide factual insight, so that causes and consequences can be pinpointed and attributed to the right party.

The baseline

The starting point is what I call the baseline: what is normal performance, and what are the trends? Without measurements, these values are a matter of gut feeling. People often assume that if something works, it must be performing well. With many of our clients, however, it turns out that there are hidden trends that will cause problems in the short and long term. This is due to the (wild) growth of applied technologies. Compare it to a major train accident: investigations often show that there was no single major cause, but rather a confluence of circumstances that together led to disaster. In a network, think of equipment operating out of spec, data being forwarded with corrupt content, growing task queues on servers, applications consuming ever more CPU and memory, and so on.
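A baseline can be as simple as the mean and spread of measurements taken under normal conditions, plus a trend over time. The Python sketch below assumes response-time samples in milliseconds and flags a value as out of spec when it lies more than three standard deviations from the baseline mean. That 3-sigma rule, the sample data, and all function names are illustrative assumptions, not a prescribed standard.

```python
from statistics import mean, stdev

def build_baseline(samples):
    """Mean and standard deviation of measurements taken during normal operation."""
    return mean(samples), stdev(samples)

def out_of_spec(value, baseline, k=3.0):
    """True if a measurement lies more than k standard deviations from the baseline mean."""
    mu, sigma = baseline
    return abs(value - mu) > k * sigma

def trend_per_period(values):
    """Least-squares slope of a series: the average change per period (e.g. per week)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den

# Hypothetical response times (ms) of one application during a normal week.
normal_ms = [120, 118, 125, 130, 122, 119, 127, 124]
baseline = build_baseline(normal_ms)

# Prints: "123 normal", "131 normal", "180 out of spec"
for sample in [123, 131, 180]:
    print(sample, "out of spec" if out_of_spec(sample, baseline) else "normal")

# Hypothetical weekly averages (ms): each week looks fine, but the trend is upward.
weekly_avg_ms = [120, 124, 129, 133]
print(f"trend: {trend_per_period(weekly_avg_ms):+.1f} ms per week")  # → trend: +4.4 ms per week
```

Each weekly average on its own would pass as normal, yet the series rises by about 4.4 ms per week: the kind of non-visible trend that only shows up once you measure.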

Steps for a quick diagnosis

Understanding the technical chain is therefore essential for quickly finding the cause of a performance problem. Buying a faster server may offer short-term relief, but it soon turns out not to be a structural solution: the performance problem rears its head again.
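One way to make the technical chain tangible is to measure latency per hop and compare each hop with its own baseline. The Python sketch below assumes that hop latencies simply add up end to end (a simplification), and the hop names and numbers are invented for illustration.

```python
# Hypothetical technical chain from user to application; hop names are illustrative.
CHAIN = ["client", "LAN switch", "WAN link", "firewall", "app server", "database"]

# Hypothetical per-hop latency (ms): baseline vs. measured while users complain.
baseline_ms = {"client": 2, "LAN switch": 1, "WAN link": 15,
               "firewall": 3, "app server": 40, "database": 25}
incident_ms = {"client": 2, "LAN switch": 1, "WAN link": 18,
               "firewall": 3, "app server": 41, "database": 140}

def end_to_end(chain, latency):
    """Total latency over the chain, assuming hops add up sequentially."""
    return sum(latency[hop] for hop in chain)

def largest_deviation(chain, baseline, current):
    """Hop whose latency grew the most relative to its own baseline."""
    return max(chain, key=lambda hop: current[hop] / baseline[hop])

print(end_to_end(CHAIN, baseline_ms), "->", end_to_end(CHAIN, incident_ms), "ms")  # → 86 -> 205 ms
print("largest deviation:", largest_deviation(CHAIN, baseline_ms, incident_ms))    # → largest deviation: database
```

In this made-up example a faster app server would barely help: its latency hardly changed, while the database hop grew more than fivefold.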

Follow these ten steps to make a quick and effective diagnosis:

1. Deploy measurement instruments and tools to understand deviations
2. Draw a block diagram to understand the technical chain
3. Gather facts about the components in the chain
4. Share the information, however technical
5. Ask the following questions:
   - Where and when do out-of-spec situations occur?
   - What is a normal situation?
   - How much data is involved?
   - What can we rule out?
6. Schedule review moments for inspections and findings
7. Identify any resource problems or tipping points
8. Implement the proposals for improvement
9. Monitor triggers and out-of-spec situations
10. Establish a new operational baseline
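The step of identifying resource problems or tipping points can be made concrete with load measurements. The Python sketch below assumes paired measurements of load (here, concurrent users) and response time, and defines a tipping point, as an assumption made for this example, as the first load level at which response time exceeds twice the value measured at the lowest load. The function name, the data, and the factor of 2 are hypothetical; in practice you would look at resource graphs (CPU, memory, queue lengths) alongside.

```python
def tipping_point(load_vs_latency, factor=2.0):
    """First load level at which latency exceeds `factor` times the latency
    measured at the lowest load; None if no such point was measured."""
    points = sorted(load_vs_latency)  # sort by load, ascending
    base_latency = points[0][1]
    for load, latency in points:
        if latency > factor * base_latency:
            return load
    return None

# Hypothetical measurements: (concurrent users, median response time in ms).
measurements = [(10, 110), (50, 115), (100, 125), (150, 140), (200, 260), (250, 900)]
print(tipping_point(measurements))  # → 200
```

Up to 150 users the response time creeps up slowly; between 150 and 200 users it more than doubles, which is where to start looking for the exhausted resource.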

By following these steps and using the right tools, finding the cause of your network problems becomes a lot easier, and you can radically reduce the MTTF. Free tools are available, but distilling the relevant data with them takes a long time. Paid tools often include an expert system that points out where the errors are, so the analysis costs you hardly any time.

I previously wrote a blog post about Wireshark and the Observer Protocol Analyzer.

We provide solutions in the form of services and products that contribute to a stable IT environment. If you want more information, feel free to call us!