The Art of Problem Solving

Last Updated on June 3, 2024 by David Both

Image by: Opensource.com

Although it would be nice to believe that cars, home theater systems, computers, and Linux never break, the reality is that they do.

Many people have no problems with Linux, but those who do want the best information and guidance possible. You can obtain professional help from a number of places. For example, if you purchased Linux from a major vendor such as Red Hat, you are entitled to some level of service from that vendor. In fact, what you are actually purchasing is the service rather than the code. Other help is available on the internet on various web sites and forums. Local user groups may also be available in your geographical area, and you may even have some friends who use Linux and are willing to offer a hand. Do not hesitate to use any and all resources available to you.

Solving problems of any kind is an art and a science. Solving technical problems, such as those that occur with computers, requires a good deal of specialized knowledge as well.

One of the best things that my mentors helped me with was the formulation of a defined process that I could always use for solving problems of nearly any type. This process is based on the scientific method. Most of the time, those of us who use Linux prefer, and even enjoy, doing our own troubleshooting.

Any approach to solving problems of any nature, including problems with computers and Linux, must include more than just a list of symptoms and the steps necessary to fix or circumvent the problems that caused the symptoms. This so-called “symptom-fix” approach looks good on paper to old-style managers (those who do not participate in The Open Organization) but sucks in practice.

I find the short article “How the Scientific Method Works” [1] to be very helpful. It describes the scientific method using a diagram very much like the one I have created for my Five Steps of Problem Solving. I pass this on as a mentor, and it is my contribution to all of you young SysAdmins. I hope that you find it as useful as I have.

Some would say that solving problems takes a bit of magic, too. The symptom-fix approach may appeal to the Pointy-Haired Bosses, the PHBs, but the best way to approach problem solving is with a large base of knowledge of the subject and a strong methodology.

The Five Steps of Problem Solving

The problem solving process involves five basic steps, as shown in Figure 1. This algorithm is very similar to that of the scientific method referred to in footnote 1 but is specifically intended for solving technical problems.

You probably already follow these steps when you troubleshoot a problem without even realizing it. These steps are universal and apply to solving almost any type of problem, not just problems with computers or Linux. I used these steps for years on various types of problems without realizing it. Having them codified for me made me much more effective at solving problems because, when I became stuck, I could review the steps I had taken, verify where I was in the process, and restart at any appropriate step.

Figure 1. The algorithm I use for troubleshooting.

You may have heard a couple of other terms applied to problem solving in the past. The first three steps of this process are also known as problem determination, that is, finding the root cause of the problem. The last two steps are problem resolution, which is actually fixing the problem. The next sections cover each of these five steps in more detail.
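As a rough illustration of how the five steps map onto the two phases, here is a minimal Python sketch. The names `Step` and `phase` are hypothetical and exist only for this example; they are not part of any tool or library.

```python
from enum import Enum


class Step(Enum):
    """The five steps, in the order the article presents them."""
    KNOWLEDGE = 1
    OBSERVATION = 2
    REASONING = 3
    ACTION = 4
    TEST = 5


def phase(step: Step) -> str:
    """Return the phase a step belongs to.

    The first three steps are problem determination (finding the root
    cause); the last two are problem resolution (fixing the problem).
    """
    if step.value <= Step.REASONING.value:
        return "problem determination"
    return "problem resolution"


for step in Step:
    print(f"{step.value}. {step.name.title():<12} {phase(step)}")
```

Running it prints each step alongside its phase, which can serve as a quick reminder of where you are in the process when you get stuck.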

Knowledge

Knowledge of the subject in which you are attempting to solve a problem is the first step. All of the articles I have seen about the scientific method seem to assume this as a prerequisite. However, the acquisition of knowledge is an ongoing process, driven by curiosity and augmented by what you learn when you use the scientific method to explore and extend your existing knowledge through experimentation. This is one of the reasons I use the term “experiment” in my courses rather than something like “lab project.”

At the very least, you must be knowledgeable about Linux. Beyond that, you must be knowledgeable about the other factors that can interact with and affect Linux, such as hardware, the network, and even environmental factors such as the temperature, humidity, and electrical environment in which the Linux system operates.

Remember, “Without knowledge, resistance is futile,” to paraphrase the Borg. Knowledge is power.

Observation

The second step in solving the problem is to observe the symptoms of the problem. It is important to take note of all of the problem symptoms. It is also important to observe what is working properly. This is not the time to try to fix the problem; it is the time merely to observe.

Another important part of observation is to ask yourself questions about what you see and what you do not see. Aside from the questions you need to ask that are specific to the problem, there are some general questions to ask.

  • Is this problem caused by hardware, Linux, application software, or perhaps by lack of user knowledge or training?
  • Is this problem similar to others I have seen?
  • Is there an error message?
  • Are there any log entries pertaining to the problem?
  • What was taking place on the computer just before the error occurred?
  • What did I expect to happen if the error had not occurred?
  • Has anything about the system hardware or software changed recently?

As you gather data, never assume that the information obtained from someone else is correct. Observe everything yourself. The best problem solvers are those who never take anything for granted. They never assume that the information they have is 100% accurate or complete. When the information you have seems to contradict itself or the symptoms, start over from the beginning as if you have no information at all.

In one very strange incident, I fixed a large computer by sitting on it. It is a long story, but in short, I observed a very brief symptom that was caused by my sitting on the workspace that was the top of a very large printer control unit. The complete story can be found in my book, The Linux Philosophy for SysAdmins [2], and here on Both.org in my article, SysAdmin careers: Curiosity is an asset.

Reasoning

Use reasoning skills to combine the information from your observations of the symptoms with your knowledge to determine a probable cause for the problem. The process of reasoning through your observations of the problem, your knowledge, and your past experience is where art and science combine to produce inspiration, intuition, or some other mystical mental process that provides some insight into the root cause of the problem.

It helps to remember that the symptom is not the problem. The problem causes the symptom. You want to fix the root cause of the problem, not just the symptom.

Action

Now is the time to perform the appropriate repair action. This is usually the simple part. The hard part is what came before: figuring out what to do. After you know the cause of the problem, it is usually easy to determine the correct repair action to take.

The specific action you take will depend upon the cause(s) of the problem. Remember, we want to fix the root cause, not just get rid of or cover up the symptom.

Make only one change at a time. If there are several actions that can be taken that might correct the cause of a problem, only make the one change or take the one action that is most likely to resolve the root cause. The selection of the corrective action with the highest probability of fixing the problem is what you are trying to do here. Whether it is your own experience telling you which action to take, or the experiences of others, move down the list from highest to lowest priority, one action at a time. Test the results after each action.

Test

After you take some overt repair action, test the repair. This usually means performing the task that failed in the first place, but it could also be a single, simple command that illustrates the problem.

Make a single change, one potential corrective action, and then test the results of that action. This is the only way in which we can be certain which corrective action fixed the problem. If we were to make several corrective actions and then test one time, there is no way to know which action was responsible for fixing the problem. This is especially important if we want to walk back those ineffective changes we made after finding the solution.
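The one-change-at-a-time loop described in the Action and Test sections can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: `CandidateFix`, `try_fixes`, and the `likelihood` estimate are hypothetical names invented for this example, and the sketch walks back each ineffective change immediately so that the next test starts from a known state.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CandidateFix:
    """One possible corrective action (all names here are hypothetical)."""
    name: str
    likelihood: float            # your estimate that this fixes the root cause
    apply: Callable[[], None]    # makes the single change
    revert: Callable[[], None]   # walks the change back


def try_fixes(candidates: List[CandidateFix],
              problem_is_fixed: Callable[[], bool]) -> Optional[str]:
    """Apply one fix at a time, most likely first, testing after each."""
    ordered = sorted(candidates, key=lambda c: c.likelihood, reverse=True)
    for fix in ordered:
        fix.apply()
        if problem_is_fixed():
            # Only this one change is in place, so we know what worked.
            return fix.name
        # Ineffective: undo it before trying the next candidate.
        fix.revert()
    # Nothing worked: return to observation and reasoning.
    return None
```

In practice, `apply` and `revert` might be functions that edit and restore a configuration file, and `problem_is_fixed` would rerun the task that failed in the first place.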

Be sure to check the original observed symptoms when testing. It is possible that they have changed due to the action you have taken and you need to be aware of this in order to make informed decisions during the next iteration of the process. Even if the problem has not been resolved the altered symptom could be very valuable in determining how to proceed.
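One simple way to keep track of how symptoms change between iterations is to compare what you observed before and after the action. The following Python sketch assumes symptoms are recorded as short text notes; the function name `compare_symptoms` and the sample symptoms are made up for illustration.

```python
from typing import Dict, Iterable, List


def compare_symptoms(before: Iterable[str],
                     after: Iterable[str]) -> Dict[str, List[str]]:
    """Group symptoms by how they changed after a corrective action."""
    before_set, after_set = set(before), set(after)
    return {
        "resolved": sorted(before_set - after_set),   # seen before, gone now
        "unchanged": sorted(before_set & after_set),  # still present
        "new": sorted(after_set - before_set),        # appeared after the action
    }


# Hypothetical observations recorded before and after one change.
before = ["web page times out", "ping to gateway fails"]
after = ["web page times out", "DNS lookup fails"]

for kind, symptoms in compare_symptoms(before, after).items():
    print(f"{kind}: {symptoms}")
```

Anything listed under "new" or "resolved" is an altered symptom worth feeding into the next round of observation and reasoning, even if the original problem remains.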

As you work through a problem it will be necessary to iterate through at least some of the steps. Figure 1 shows that you may need to iterate to any previous step in order to continue.

Be flexible. Don’t hesitate to step back and start over if nothing else produces some forward progress.


[1] Harris, William, “How the Scientific Method Works,” https://science.howstuffworks.com/innovation/scientific-experiments/scientific-method6.htm

[2] Both, David, The Linux Philosophy for SysAdmins, Apress, 2018, pp. 471-472
