NET 121b: Essentials of Networking

Chapter 8: Troubleshooting Hardware Components

Objectives:

This chapter discusses the most common activity for computer technicians: troubleshooting. The topics of this chapter are:

  1. Introduction to troubleshooting
  2. Introduction to common network tools
  3. The science of troubleshooting
Concepts:
Introduction to troubleshooting

The chapter begins with a caution that network problems can have many causes, making troubleshooting more difficult. That being said, it is still a good idea to go over common causes and their solutions, while being mindful that today's problems may not be identical to yesterday's.

Several environmental factors are listed that affect how networks (and other computer equipment) run:

  • Temperature and humidity - The text recommends humidity of about 60%. Higher humidity reduces the risk of static electricity. A recommended temperature is not given. I have just checked the temperature of the server room at the data center where I work, and it has been at about 68 degrees Fahrenheit for the last week. (Computers function better if they are a bit cooler than most humans like it.)
  • Grounding - All electrical equipment should be properly grounded. This includes Uninterruptible Power Supplies.
  • Magnetism - Avoid magnetic fields, for obvious reasons. A technician once told me of a coworker who brought a pair of car stereo speakers into the office. The speakers were placed near a hard drive, and the drive lost its data.
  • Static electricity - ESD, or Electrostatic Discharge, can be a serious cause of problems. Some numbers from a previous text may help you understand the situation:
    • A human can't feel a static discharge until it is 3,000 volts or more.
    • Normal motion, like moving a chair or a foot, can generate 1,000 volts.
    • Simply walking across a carpeted area can generate 1,500 to 35,000 volts.
    • Handling a plastic envelope can generate 600 to 7,000 volts.
    • Picking up a plastic bag can generate 1,200 to 20,000 volts.
    • Damage can be done to computer parts with 20 to 30 volts. The damage may not cause immediate failure.

    Rules of Static Prevention

    1. Ground yourself when working on computers. Use a wrist strap, EXCEPT when working on monitors or power supplies. Test your grounds. Unplug computers, as some modern models pass current through the system when plugged in, even if they are turned off. (This tech I know tried to change sound cards while the new Dell was plugged in...)
    2. Do not touch electrical leads.
    3. Do not touch ungrounded people while working on components.
    4. Use static-shielding bags (gray or silver), not antistatic bags (pink or blue).
    5. Keep nonconductors, like styrofoam, away from components. They generate static.
    6. Don't place components on metal surfaces.
    7. Increase humidity to minimize static.
    8. Put the computer on the desk, not the floor. Dry room, winter, feet scuffing on a carpet next to a computer: formula for disaster.
  • Avoid water damage - Read the list above again: get the computer off the floor! If your floors are mopped, electrical equipment needs to be raised out of range of the water.
  • Avoid fires and be able to put them out - Fire extinguishers are classed by the kind of fire they are able to put out. The links below will take you to sites with more information about fire classes and extinguishers. In surveying several sites, I found that there are currently at least four classes of fires, and that the symbols for them have been updated to use pictures instead of letters. Some sites list a Class K for cooking oils (Kitchen fires), but this does not seem to be universal.

    Extinguisher classes:
    • Class A: paper, cloth, wood
    • Class B: oil, gasoline, kerosene, propane
    • Class C: electrical
    • Class D: combustible metals, such as magnesium, potassium, sodium

    In most cases, a multiclass extinguisher is preferred. Since computers are electrical equipment and are often found near stacks of paper, extinguishers rated for Class A, B, and C fires are recommended.

  • Don't smoke around the computers - smoke can damage electronic components, and contribute to their failure. I used to know a fellow who smoked in the server room. I think he drives a truck now.
  • Power lines and fluorescent lights - Do not run data cables parallel to fluorescent lights or near power cables. If you must, shield the cables from electromagnetic interference.
  • Avoid excess heat - especially from space heaters and kitchen appliances. Keeping your computer system cool, so that a fire will not ignite, is your most effective form of firefighting: don't let it start.
  • Examine your cables - Once upon a time, I was called to help a user who could not access the network. When I reached his desk, I saw that the network data cable ran from his computer to the floor, under his chair, across the cubicle floor, to a data jack. He had run his wheeled desk chair over the cable and broken it. I ran a new, longer cable from the computer through the channels in the cubicle walls to the data jack. This should never happen: run your cables safely, and keep your users off them.
  • Examine the computer equipment - Most equipment needs to be properly ventilated, kept dry, and kept clean.
    • I have seen older desktop computers that were turned so that their only air intakes were blocked. They overheated and failed.
    • I have seen hanging plants placed above computer equipment, and watched users water the plants. The sparks were pretty, but the equipment failed.
    • I have examined failed keyboards that spilled coffee, paper clips, and ashes when they were turned over. Words fail me.
Troubleshooting Procedure

The text spends only a few words on a systematic troubleshooting procedure. Consider this instead:

  • Approach every problem systematically. Be careful, check what you think you know, and don't assume anything.
  • Interview the user, if there is one. Be polite, be helpful, and confirm your understanding of the trouble report before moving on. Emphasize that you are collecting the facts, not placing blame. Your duty is to find the truth. What did the user do? How was the system changed since it last worked? Did it ever work? Who else could have used or changed the system?
  • Simplify the problem into components. Check each component, one at a time. For instance, if the user is having a problem, ask what software was running at the time. Remove the programs, one at a time, to test for a conflict or an out-of-memory problem. Alternatively, remove all programs, and add them to the mix, one at a time.
  • Check everything, especially simple things like the computer being turned on and plugged in correctly.
  • Check the simple things first. If you have three things to check, and two will take much less time than the third, save the third for last, unless you really believe that the third item is the real problem.
  • Research for an answer. The Internet has made a lot of difference in what companies offer in terms of support and online documentation.
  • Keep notes about what you do and what you learn. This is for you and for those who follow you. You may need to undo what you try, or you may need to repeat the process.
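
The "check the simple things first" advice above can be sketched as a small sorting rule. This is only an illustration: the Check record, its fields, and the time estimates are hypothetical names invented for this example, not something from the text.

```python
from dataclasses import dataclass


@dataclass
class Check:
    name: str
    minutes: float                # rough guess at how long the check takes
    prime_suspect: bool = False   # True if you really believe this is the cause


def order_checks(checks):
    """Prime suspects first, then everything else from quickest to slowest."""
    return sorted(checks, key=lambda c: (not c.prime_suspect, c.minutes))


if __name__ == "__main__":
    todo = [
        Check("swap the network card", 30),
        Check("is it plugged in and turned on?", 1),
        Check("look at the link light", 2),
    ]
    for check in order_checks(todo):
        print(f"{check.minutes:>5} min  {check.name}")
```

Marking a slow check as a prime suspect moves it to the front, which matches the exception in the advice: save the long check for last unless you really believe it is the problem.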
Troubleshooting common hardware problems:
  • Network Cards
    • Is the NIC the correct one for the bus it is plugged into? (Probably, or it won't fit.)
    • Is the wrong protocol and/or frame type bound to the card? (Check properties in Windows.)
    • Are the LEDs on the card lighted or dark? A steady connection (link) light proves electrical connection to the hub or switch on an Ethernet card. A dark connection light means no connection, no power, or no driver for the card. A flashing connection light may mean that a circuit is reversed in a cable. A flashing traffic/activity light indicates the card is trying to send or receive data. A steady traffic light indicates too much data, which could be a bad card, very heavy traffic, or no successful transmissions. A dark traffic light means no network traffic.
  • Cables - As noted above, run cables in channels made for them, not across open floors. Use good quality cable and connectors, and replace damaged cables.
  • Hubs/Switches/Concentrators - Most people use switches now, instead of hubs. Switches require power, and typically have diagnostic lights that tell you about problems. A port that has been cabled to a workstation should have a link light that is lit. If it is not, the port may be malfunctioning: try another one.
  • Routers - The text states that routers connect dissimilar networks. This is incomplete: routers are also used to connect similar networks. When testing a network problem, it is often useful to determine the scope of it. If only one user is affected, start examining their workstation. If multiple users on one LAN are affected, look at their concentrator. If multiple users on multiple LANs are affected, examine a router they have in common. (It may only need to be restarted.)
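
The network card light meanings listed above can be kept as a simple lookup table. This is a minimal sketch: the names LED_MEANINGS and interpret_led are made up for this example, and the meanings are copied from the list above, so they apply to a typical Ethernet card and may differ on your hardware.

```python
# Meanings of NIC indicator lights, taken from the notes above.
LED_MEANINGS = {
    ("link", "steady"): "electrical connection to the hub or switch",
    ("link", "dark"): "no connection, no power, or no driver for the card",
    ("link", "flashing"): "a circuit may be reversed in a cable",
    ("activity", "flashing"): "card is trying to send or receive data",
    ("activity", "steady"): "too much data: bad card, very heavy traffic, "
                            "or no successful transmissions",
    ("activity", "dark"): "no network traffic",
}


def interpret_led(light: str, state: str) -> str:
    """Return the usual meaning of a light ('link' or 'activity') in a given state."""
    key = (light.strip().lower(), state.strip().lower())
    if key not in LED_MEANINGS:
        raise ValueError(f"unknown light/state combination: {light!r}, {state!r}")
    return LED_MEANINGS[key]


if __name__ == "__main__":
    print(interpret_led("link", "dark"))
```

Always check the card's own documentation: vendors do not all use the same light patterns.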
Introduction to common network tools

The text lists several tools for troubleshooting:

  • Crossover cable - a crossover cable is used on UTP networks to bypass a concentrator and connect two NICs directly to each other. You can't use a standard UTP cable for this: one end of the cable must have the transmit and receive circuits swapped. The tables below describe a standard cable, a crossover cable, and a rollover cable.

    Standard RJ-45 Pin Assignments
    Pin   Color          Signal      Circuit
    1     Orange/White   TX data +   Orange Circuit
    2     Orange         TX data -   Orange Circuit
    3     Green/White    RX data +   Green Circuit
    4     Blue           unused      Blue Circuit
    5     Blue/White     unused      Blue Circuit
    6     Green          RX data -   Green Circuit
    7     Brown/White    unused      Brown Circuit
    8     Brown          unused      Brown Circuit

    The insulation shown in the graphics above should NOT be stripped back on these wires.
    Standard cable
    If you are making a standard cable (to run from a workstation to a hub) connect both ends as listed above and shown on the right. Insert the wires into the RJ-45 connector, then crimp it with the crimping tool. (There will be no spaces between the wires when they are inserted into the RJ45 connector. Space is used here to make the color pattern more readable.)
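    End 1 (pins 1-8): Orange/White, Orange, Green/White, Blue, Blue/White, Green, Brown/White, Brown
    End 2 (pins 1-8): Orange/White, Orange, Green/White, Blue, Blue/White, Green, Brown/White, Brown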
    Crossover cable
    If making a crossover cable (to run directly from one NIC to another) swap the orange and green circuits on one end only: put orange/white on 3, orange on 6, green/white on 1, and green on 2. Insert the wires as shown on the right, then crimp. (This second configuration is actually EIA/TIA 568A.)
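    End 1 (pins 1-8): Orange/White, Orange, Green/White, Blue, Blue/White, Green, Brown/White, Brown
    End 2 (pins 1-8): Green/White, Green, Orange/White, Blue, Blue/White, Orange, Brown/White, Brown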
    Rollover cable
    Now, for something completely different: if you are making a rollover cable (to run from a workstation to a Cisco router), prepare the cable like a standard cable, both ends in the same configuration. Before crimping the second end, roll the cable (or the RJ45 connector) over, 180 degrees. This will make pin 1 on one end of the cable connect to pin 8 on the other end, pin 2 to pin 7, pin 3 to pin 6, and pin 4 to pin 5. Insert the wires as shown on the right, then crimp. This cable is used with an adapter to connect to a Cisco router's console port.
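    End 1 (pins 1-8): Orange/White, Orange, Green/White, Blue, Blue/White, Green, Brown/White, Brown
    End 2 (pins 1-8): Brown, Brown/White, Green, Blue/White, Blue, Green/White, Orange, Orange/White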

  • Tone generators and locators - May be useful in large networks that have problems with lots of wiring. When you have lots of wire coming into wiring closets, it is possible to lose track of which cable goes to which cubicle. These tools can help identify which circuit is the one you are looking for.
  • Network vendor tools - Your network vendor will offer paid support calls, searchable Internet knowledge bases, and information CDs you can get by subscription.
  • Multimeter - to check the power supply output. Can also be used to check circuit continuity.
  • Fox and hound - two devices that are used together: the fox sends a signal through a cable, the hound is used to find where the signal goes. Can check continuity and signal loss to other circuits.
  • Protocol analyzers - The most commonly used protocol analyzer is Sniffer, which is now available from Network General. Competing products exist, but Sniffer is widely used. Its purpose is to track packets on the network, to determine where there is more traffic than needed, where packets are being dropped, and what network problems exist that it can detect.
  • Time domain reflectometer (TDR) - a tool used to find breaks in cable. A signal is sent into the cable that will be reflected by a break. The distance to the break can be computed by the time it takes the reflection to return to the TDR.
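
Two items from the list above lend themselves to a short sketch: the pin mappings of the three cable types, and the distance computation a TDR performs. The function names are made up for this example. The velocity_factor parameter is an assumption that does not appear in the text: a signal travels through cable at some fraction of the speed of light, and that fraction depends on the cable, so take it from the cable's data sheet. The elapsed time is halved because the signal travels to the break and back.

```python
# Wire colors at pins 1-8 of a standard end, from the pin table in these notes.
STANDARD_END = [
    "orange/white", "orange", "green/white", "blue",
    "blue/white", "green", "brown/white", "brown",
]

SPEED_OF_LIGHT_M_PER_S = 299_792_458


def pin_map(cable_type: str) -> dict[int, int]:
    """Map each pin on end 1 to the pin it reaches on end 2."""
    if cable_type == "standard":
        return {pin: pin for pin in range(1, 9)}
    if cable_type == "crossover":
        # The orange and green circuits trade places: 1<->3 and 2<->6.
        swapped = {1: 3, 2: 6, 3: 1, 6: 2}
        return {pin: swapped.get(pin, pin) for pin in range(1, 9)}
    if cable_type == "rollover":
        # Pin 1 reaches pin 8, pin 2 reaches pin 7, and so on.
        return {pin: 9 - pin for pin in range(1, 9)}
    raise ValueError(f"unknown cable type: {cable_type!r}")


def end2_colors(cable_type: str) -> list[str]:
    """Wire colors at pins 1-8 of end 2 when end 1 is a standard end."""
    colors = [""] * 8
    for near_pin, far_pin in pin_map(cable_type).items():
        colors[far_pin - 1] = STANDARD_END[near_pin - 1]
    return colors


def tdr_distance_m(round_trip_seconds: float, velocity_factor: float) -> float:
    """Distance to a break in meters, from the reflection's round-trip time.

    velocity_factor is the assumed signal speed in the cable as a
    fraction of the speed of light (greater than 0, at most 1).
    """
    if round_trip_seconds < 0:
        raise ValueError("round-trip time cannot be negative")
    if not 0 < velocity_factor <= 1:
        raise ValueError("velocity factor must be in the range (0, 1]")
    return velocity_factor * SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2


if __name__ == "__main__":
    for cable in ("standard", "crossover", "rollover"):
        print(cable, end2_colors(cable))
```

Printing end2_colors for each cable type gives a quick check list to hold next to a connector before crimping.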
The Science of Troubleshooting

Troubleshooting a network problem starts with determining where the problem lies.

  • If only one workstation has a problem, the problem is probably there. If the problem affects everyone on a segment, or on a network, this gives you information about the equipment that has a problem. Find the common factor in each case, and you typically find the source of the problem.
  • Determine when the problem occurs, if it occurs multiple times. Sometimes problems occur at times of heavier traffic, or particular times of day.
  • In other cases, a problem happens each time a user does something specific, such as using software that they installed themselves. Problems that can be repeated are easier to deal with, in that they have specific causes that can be addressed. Intermittent problems are the most difficult to diagnose and repair, since you have no single event that triggers them.
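
"Find the common factor" can be sketched as a set intersection. This is a rough illustration under stated assumptions: the paths table (which equipment each workstation depends on) and the device names are hypothetical, and the sketch treats each device as either working or failed, so it will not catch a partial failure such as one bad port on an otherwise healthy switch.

```python
def common_factors(paths: dict[str, set[str]], affected: set[str]) -> set[str]:
    """Equipment shared by every affected workstation and by no healthy one.

    paths maps a workstation name to the set of equipment it depends on
    (itself, its switch, its router, and so on).
    """
    if not affected:
        return set()
    shared = set.intersection(*(paths[ws] for ws in affected))
    healthy = set(paths) - affected
    used_by_healthy = set().union(*(paths[ws] for ws in healthy))
    return shared - used_by_healthy


if __name__ == "__main__":
    # Hypothetical network: two switches behind one router.
    paths = {
        "ws1": {"ws1", "switch-a", "router-1"},
        "ws2": {"ws2", "switch-a", "router-1"},
        "ws3": {"ws3", "switch-b", "router-1"},
    }
    print(common_factors(paths, {"ws1", "ws2"}))
```

With one affected workstation the result is the workstation itself; with everyone on one switch affected, it is that switch; with everyone affected, it is the equipment they all share.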

The text offers a suggested diagnostic method using a model it calls DIReCtional troubleshooting:

  • D - Define the problem. Determine exactly what problem is being experienced.
  • I - Isolate the problem. Where is it happening, to whom is it happening, and what is it? Determine what is and is not broken.
  • R - Resolve the problem. This includes brainstorming the possible fixes you could apply to the problem, researching the problem and solutions, applying a solution and checking to make sure the problem is resolved.
  • C - Confirm the solution. This means to watch the network for further problems: either the same one or new ones resulting from the solution. It also means to check with the users to make sure their problems are solved. It also includes informing staff who need to know about the solution, and documenting it for the future. Record what you saw and did, for the next time it happens, and for the people you work with.

Some guidelines about troubleshooting are suggested:

  1. Eliminate user error. This can be broken down into three possibilities:
    1. The user did something wrong. (Don't laugh, it happens a lot, especially before the users are trained.)
    2. The user did nothing wrong, but does not understand the system, so he/she expects something other than what happened. (This happens after the users are trained, but before they are experienced.)
    3. There is actually a problem. (After your users are experienced, this happens more often.)
  2. Check the inventory.
    1. Are parts missing? ("What's a drop cable?")
    2. Are all parts correct? ("But that cable works for my phone...")
    3. Are all parts correctly connected and turned on? ("What's a network jack?")
  3. Make a backup of data before changing anything. (It would be nice if you had one already.)
  4. Classic: shut everything off and reboot. This actually works quite often.
  5. Simplify the problem. Remove programs and hardware that are not necessary. Replace programs and hardware items, one at a time, to test the point of failure. This is less practical than it seems. It works best if the user is using some unapproved software from home that can be uninstalled and sent back home. Otherwise, check the properties of hardware and software for unexpected settings.
  6. Run diagnostic software, if you have any. This is a good way to rule out virus problems.
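
Guideline 5 above (add items back one at a time to find the point of failure) can be sketched as a loop. This is a minimal sketch: works_with is a hypothetical stand-in for whatever test you actually perform by hand, such as rebooting and trying the failing task, and the program names are invented for the example.

```python
from typing import Callable, Iterable, Optional


def first_failing_addition(
    components: Iterable[str],
    works_with: Callable[[list[str]], bool],
) -> Optional[str]:
    """Add components back one at a time; return the first one that breaks things.

    Returns None if the system still works with every component installed.
    """
    installed: list[str] = []
    for component in components:
        installed.append(component)
        # Pass a copy so the test cannot alter our record of what is installed.
        if not works_with(list(installed)):
            return component
    return None


if __name__ == "__main__":
    # Pretend test: the system fails whenever the game from home is present.
    def test(installed: list[str]) -> bool:
        return "game from home" not in installed

    programs = ["mail client", "browser", "game from home", "editor"]
    print(first_failing_addition(programs, test))
```

This only finds a single culprit. If two items conflict with each other, the one reported is the second of the pair to be added, so note the full list that was installed at the time.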

As noted elsewhere, take notes about what you see and do. You will want to document the solution, and you will want to undo actions that are not effective. When documenting a problem, record all the details you can about the hardware and software involved. Make specific notes, especially about any irregular items. Use documentation that already exists, such as event logs and trouble tickets, to your benefit.

If you are a technician who actually works on hardware, you will need a set of tools. Some general tools are suggested, in addition to the equipment listed above:

The most needed equipment: (Some items are shown in the picture on the right.)

  • Phillips-head screwdriver - a couple of sizes here are a good idea
  • Extractor - a spring-loaded device that looks a bit like a hypodermic syringe. Press the plunger and wire prongs come out the other end, which can be used to pick up fallen objects. Sometimes you need this, sometimes you need the tweezers. Sometimes you need the much longer version that is available in automotive stores.
  • Tweezers - for picking pieces of paper out of printers or dropped screws from tight places. Surgical forceps are also good.
  • Flathead screwdriver - a wide blade and a narrow blade are useful.
  • Chip extractor - to remove chips. This is not done very often, but it is included in many technicians' toolkits.
  • Socket drivers - for hex nuts and hex screws
  • Torx screwdrivers - unfortunately, these come in several sizes, and none can be substituted for each other. A good set of them is desirable. (Not shown in this picture.)
  • Pliers - to assist in removing some bolts, clips, etc. Do NOT use pliers on electronic components.
  • Flashlight - to examine anything that is not well lit. (A small Maglite® is handy, because they are durable and you can focus them.)

If you do not have a tool set, and can only get one, the best thing to have is a Swiss Army Cybertool. I took apart a hard drive with one to make the point that it could be done.

Researching existing documentation about your network is essential in solving problems. Use all appropriate sources:

  • Vendor manuals
  • Vendor and user group web sites
  • Help desk documentation from your internal help desk
  • Other technicians who have worked with your network or with similar problems

When troubleshooting servers, check the event log on the server. You can do the same for workstations. Likewise, you can check Task Manager on either a Windows server or workstation to verify that services are actually running. You can also use Task Manager to determine the utilization level of each processor and of a machine's network connection.

The use of Diagnostic Software is discussed. Various products are available to diagnose servers, workstations and LANs. Some of the tasks you should be able to carry out with this kind of software are:

  • Determine facts about your hardware and OS
  • Inventory the hardware in a computer
  • Measure the performance of hardware
  • Check for conflicts, such as IRQs, I/O addresses, and memory addresses already in use.
  • View and edit CMOS settings
  • Determine what drivers are in use
  • Diagnose failing components
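
The first task in the list above, determining facts about your hardware and OS, can be tried with nothing more than Python's standard library. This is a small sketch, not a replacement for real diagnostic software: basic_inventory is a name made up for this example, and some fields may come back empty on some systems.

```python
import os
import platform


def basic_inventory() -> dict:
    """Collect a few facts about this machine using only the standard library."""
    return {
        "hostname": platform.node(),
        "os": platform.system(),
        "os_release": platform.release(),
        "os_version": platform.version(),
        "machine": platform.machine(),
        "processor": platform.processor(),   # may be an empty string
        "logical_cpus": os.cpu_count(),       # may be None if undetermined
    }


if __name__ == "__main__":
    for name, value in basic_inventory().items():
        print(f"{name}: {value}")
```

Saving this output with your notes gives you a record of the machine's details at the time of the problem.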