Failure Analysis & Troubleshooting

A FMECA (Failure Mode, Effects, and Criticality Analysis) lists every possible failure mode of every element in a system and determines the effect and criticality of each one. The analysis examines the failure of each element within a device or system to determine its effect on end performance. An element's failure behavior may be simple or complex. For example, a resistor has three failure modes: it can be open, shorted, or outside its specified tolerance. An integrated circuit has many more.

The goal is to eliminate any single-point failures that would cause the system to fail to meet its performance requirements. Another way to look at this is to validate that a “graceful” degradation of a device or system will take place over time.

As a further example, suppose that the input to a device or system is coupled to the outside world through a resistor. If the resistor were to open, there would be no output from the device or system. The effect is therefore non-performance, the criticality is very high, and the resistor constitutes a single-point failure. The goal is to minimize the likelihood of single-point failures and to provide the greatest amount of graceful degradation.

This analysis is performed using specialized computer software, along with SPICE and other mathematical models of the system.

The FMECA begins with a stress-based reliability (MTBF) analysis, which generates the failure rate of each component. Each component in the BOM is then assigned a set of failure modes. These are normally taken from MIL-HDBK-338B, which defines the common failure types for each type of component along with the probability (Mode Failure percentage) of each failure mode occurring. For example: Open 30%, Short 30%, Out of Tolerance 40%.
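The apportionment step can be sketched in a few lines of Python. This is only an illustration: the 10 FIT part failure rate and the function name are assumptions, and the mode percentages are the example values quoted above rather than figures for any specific part.

```python
def apportion_failure_rate(part_failure_rate, mode_ratios):
    """Split one part's failure rate across its failure modes.

    mode_ratios maps a failure mode name to its Mode Failure percentage
    expressed as a fraction; the fractions must sum to 1.0.
    """
    total = sum(mode_ratios.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("mode ratios must sum to 1.0")
    return {mode: part_failure_rate * ratio
            for mode, ratio in mode_ratios.items()}


# Example percentages from the text: Open 30%, Short 30%, Out of Tolerance 40%.
resistor_modes = {"open": 0.30, "short": 0.30, "out_of_tolerance": 0.40}

# Hypothetical part failure rate of 10 FIT (failures per 1e9 hours).
mode_rates = apportion_failure_rate(10.0, resistor_modes)
for mode, rate in mode_rates.items():
    print(f"{mode}: {rate:.1f} FIT")
```

Each mode failure rate produced this way is the quantity that later feeds the mode criticality calculation.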

Next, each failure mode is analyzed for the local, next-higher-assembly, and system-level effects that will occur when the failure mode happens, as shown below.

The failure mode is also assigned a severity classification.

[Table: severity classifications, with columns Severity, Type, and Description]

Mode criticality is a numerical value that can be assigned to each failure mode. Mode criticalities are based on the FMECA approach defined in MIL-STD-1629 and other guides. The mode criticality is computed as:

Mode Criticality = Failure Effect Probability * Mode Failure Rate * Operating Time of the System

This is the essence of the FMECA. The result is a ranking of the failure modes by criticality, which identifies the worst offenders: the faults that are most likely to occur and to affect the system. The results are listed in a table or graphed as shown below.
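The calculation and ranking can be sketched as follows. This is a minimal illustration of the formula above, not the tool flow used in an actual analysis; the part names, effect probabilities, failure rates, and 50,000-hour operating time are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    part: str
    mode: str
    effect_probability: float  # probability the end effect occurs, 0..1
    mode_failure_rate: float   # failures per hour for this mode


def mode_criticality(fm, operating_hours):
    """Failure Effect Probability * Mode Failure Rate * Operating Time."""
    return fm.effect_probability * fm.mode_failure_rate * operating_hours


def rank_by_criticality(modes, operating_hours):
    """Return the failure modes sorted from most to least critical."""
    return sorted(modes,
                  key=lambda fm: mode_criticality(fm, operating_hours),
                  reverse=True)


# Hypothetical failure modes for illustration only.
modes = [
    FailureMode("R1", "open", 1.0, 3.0e-9),
    FailureMode("R1", "short", 0.5, 3.0e-9),
    FailureMode("C4", "short", 1.0, 8.0e-9),
]

HOURS = 50_000  # assumed operating time of the system
ranked = rank_by_criticality(modes, HOURS)
for fm in ranked:
    print(f"{fm.part} {fm.mode}: {mode_criticality(fm, HOURS):.2e}")
```

With these made-up inputs the shorted capacitor ranks first, because it combines the highest mode failure rate with a certain end effect.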

Getting a Quotation

In order to provide a firm fixed quotation for a FMECA, we will need to see a schematic of the design, a parts list (BOM), and a specification, if one exists.

Please let us know if a Non-Disclosure Agreement (NDA) needs to be put into place. If so, please forward a Mutual NDA form in MS Word format to AEi Systems. AEi Systems can also send its mutual form to you if that is preferred.

FMEA Mode Criticality Matrix

The difference between the FMEA and the FMECA is the criticality portion of the analysis -- the assessment of the probability along with the severity. In fact, the criticality represents the culmination of the analysis and much of the usefulness in the analysis stems from this aspect.

• Assesses what happens to the system when components do fail

• Ranks parts by failure rate and how bad a failure would be

• MIL-STD-1629A/ECSS-Q-ST-30-02C Guidance

• BOMs Configured with Failure Modes for each part

• MIL-HDBK-338B Failure Mode Probabilities

• Open, Short, Change in Value

• Local, Next, and End Effects Compiled

• Severity Assigned

• FMECA Info Read into Relex

• Stress, Temperature, and Failure Rates applied

• Criticality and Risk Levels Computed

Stress & Derating Analysis

A stress and derating analysis is a detailed accounting of the applicable voltage, current, power, and thermal stress on each individual component within a device or system. Both nominal and worst-case end-of-life (EOL) stress calculations are made, with the worst-case results using EOL component tolerances, worst-case environment, and worst-case operating conditions (input and loading). Transient stress conditions such as startup, current limit, short circuit, and other fault conditions are also sometimes assessed.

The calculated quantities are compared against a set of derated component ratings. The nominal stress ratios (nominal value/rating) are computed for use in the Reliability (MTBF) analysis. The derated stress ratios are also computed.
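A stress ratio check can be sketched as below. The text defines the nominal stress ratio as nominal value divided by rating; the sketch assumes that the derated stress ratio is the applied stress divided by the derated rating (rating times a derating factor). The 0.25 W resistor, the 50% power derating factor, and the dissipation values are hypothetical, not taken from any of the derating guides.

```python
def stress_ratio(applied, rating):
    """Applied stress divided by the manufacturer's rating."""
    return applied / rating


def derated_stress_ratio(applied, rating, derating_factor):
    """Applied stress divided by the derated rating.

    Assumes derated rating = rating * derating_factor, with the factor
    taken from whichever derating guide applies to the program.
    """
    return applied / (rating * derating_factor)


# Hypothetical 0.25 W resistor with an assumed 50% power derating.
rating_w = 0.25
derating_factor = 0.50
nominal_w = 0.08      # nominal dissipation
worst_case_w = 0.15   # worst-case end-of-life dissipation

nominal_ratio = stress_ratio(nominal_w, rating_w)
wc_derated_ratio = derated_stress_ratio(worst_case_w, rating_w, derating_factor)
overstressed = wc_derated_ratio > 1.0

print(f"nominal stress ratio: {nominal_ratio:.2f}")
print(f"worst-case derated stress ratio: {wc_derated_ratio:.2f}")
print("exceeds derated rating" if overstressed else "within derated rating")
```

In this made-up case the part is well within its rating at nominal conditions but exceeds its derated rating at worst case, which is the kind of finding the analysis is meant to surface.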

Most failures in a device or system are a direct result of subjecting components to some sort of overstress condition. This analysis is typically performed by hand, and summarized in a spreadsheet format. SPICE models or other mathematical models may be used to determine the applicable stress.

A single over-stressed component can cost your company millions of dollars. A thorough Stress & Derating Analysis can prevent this from happening.

If you are going to perform one analysis to improve the reliability of your system, it should be a Stress & Derating analysis.

Stress Guides Supported:

● LMCO - N2.3.5-T3-ElecEng-3.8, 817AY007502


● SMC Standard SMC-S-010

● ECSS-Q-ST-30-11C

● EEE-INST-002


● MIL-Handbook-1547A/MIL-Std-1547B

● MIL-STD-975M

● MIL-P-11268

Here is a link to a sample Stress & Derating analysis report.

Example Stress Analysis

Reliability Prediction - MTBF Analysis

AEi Systems performs parts-count and stress-based MTBF analyses. The results are used to assess the potential component failure probabilities, and when the failures may occur during the product’s lifetime. When combined with a stress analysis and a FMECA analysis, these “part-based” analyses can be very effective in pinpointing the soft spots in your design, allowing you to dramatically improve system quality.

While there are many guides that discuss reliability analyses, few discuss some of the pitfalls with calculating and interpreting the results.

An "MTBF" analysis basically sums the failure rates of the parts in the BOM. The analysis produces a single number each for the system's MTBF, failure rate, and reliability. Often, an MTBF analysis is done only to compute this number, which is used mainly for marketing purposes. This is a poor goal for two reasons: first, the number is suspect, as discussed below; second, the most useful aspect of the analysis goes unused.
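The summation can be sketched as follows. The sketch assumes a series system with constant failure rates, so that MTBF is the reciprocal of the total failure rate and reliability follows an exponential model; the part names and FIT values are hypothetical.

```python
import math


def total_failure_rate_fit(part_fits):
    """Sum part failure rates, in FIT (failures per 1e9 hours)."""
    return sum(part_fits.values())


def mtbf_hours(total_fit):
    """MTBF in hours, assuming a constant failure rate."""
    return 1e9 / total_fit


def reliability(total_fit, hours):
    """Probability of surviving `hours`, assuming an exponential model."""
    return math.exp(-total_fit * 1e-9 * hours)


# Hypothetical BOM failure rates in FIT.
bom_fits = {"U1": 120.0, "C4": 40.0, "R1": 10.0, "Q2": 30.0}

total_fit = total_failure_rate_fit(bom_fits)
mtbf = mtbf_hours(total_fit)
r_5yr = reliability(total_fit, 5 * 8760)

# Ranking the parts by failure rate shows where the system total comes from.
ranked_parts = sorted(bom_fits.items(), key=lambda kv: kv[1], reverse=True)

print(f"total: {total_fit:.0f} FIT, MTBF: {mtbf:.0f} hours")
print(f"5-year reliability: {r_5yr:.4f}")
for part, fit in ranked_parts:
    print(f"{part}: {fit:.0f} FIT ({100 * fit / total_fit:.0f}% of total)")
```

The single MTBF figure hides the fact that one part dominates the total here; the ranked list is what points a designer at the part worth changing.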

The usefulness is marred by the variability of each part’s data. To get the failure rate correct requires attention to various characteristics that impact the calculation, any one of which can greatly change the result. A parts-count analysis is of limited usefulness.

Here are some of the issues that AEi Systems takes into account. If you don’t, the results for the SAME BOM can differ by several hundred percent.

• Calculation irregularities are commonly encountered

• Confidence Level (up to 2.5x difference between 60% & 90% derived FIT rates)

• Arrhenius translation between temperatures can be complicated

• MIL-Handbook-217 and Telcordia have a limited set of categories

• Is Stress Included in the MTBF computation?

• Are vendor FIT Rates used properly?

• MTBF doesn’t consider “out-of-spec” conditions

• Only considers “extreme” types of failures; results may not be useful for improving reliability
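Two of the items above, confidence level and Arrhenius translation, can be illustrated numerically. The sketch below makes several assumptions: it covers only the zero-failure life-test case, where the chi-square upper bound on failure rate reduces to -ln(1 - CL) divided by device-hours, and it uses a hypothetical activation energy of 0.7 eV, hypothetical temperatures, and a hypothetical 1e7 device-hours of test data.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def zero_failure_fit(device_hours, confidence):
    """Upper-bound FIT rate for a life test that recorded zero failures.

    With zero failures, the chi-square bound (2 degrees of freedom)
    reduces to -ln(1 - confidence) / device_hours.
    """
    return -math.log(1.0 - confidence) / device_hours * 1e9


def arrhenius_acceleration(ea_ev, t_use_c, t_stress_c):
    """Acceleration factor between a use and a stress temperature (deg C)."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    exponent = (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k)
    return math.exp(exponent)


# Same hypothetical test data, two confidence levels.
fit_60 = zero_failure_fit(1e7, 0.60)
fit_90 = zero_failure_fit(1e7, 0.90)
ratio = fit_90 / fit_60  # about 2.51

# Hypothetical: 0.7 eV activation energy, 125 C test translated to 55 C use.
af = arrhenius_acceleration(0.7, 55.0, 125.0)

print(f"60% confidence: {fit_60:.1f} FIT, 90% confidence: {fit_90:.1f} FIT")
print(f"ratio: {ratio:.2f}")
print(f"acceleration factor: {af:.1f}")
```

For the zero-failure case, the ratio between the 90% and 60% figures is about 2.5, consistent with the difference noted in the list above. The acceleration factor is very sensitive to the activation energy chosen, which is one reason two analyses of the same BOM can disagree.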

Getting a Quotation

To provide a fixed firm quotation for an MTBF analysis, please forward a Bill of Materials (“BOM”).

There are valuable reasons for performing this analysis such as identifying the key reliability drivers, making MTBF comparisons with competitive products, and selecting warranty periods. There are other results from the analyses (PI factors, ranking of failure rates by category, part type, environment, stress level, etc.), but none is more important than the ranking of the parts in the design by their failure rate.

This is where the electrical designer can gain some insight into the design quality and choose parts that will most improve reliability.

AEi Systems supports MIL-Handbook-217 and Telcordia Technical Reference TR-332. Both standards have similar drawbacks to those listed above, though not to the same degree or specificity. MIL-Handbook-217 tends to be for more high-reliability applications (military, space), while Telcordia tends to be used more for commercial products.

To perform the analysis, we will need the following information from you:

• Assembly & subassembly structure (if there are multiple subassemblies)

• Which calculation model to use: Telcordia or MIL-HDBK-217

• Temperature(s) at which you want the results computed.

• Operational Duty Cycle (is the unit on all the time or does it cycle?)

• Burn-in time (of parts or the system)

• Expected Mission life, if known

• Environment (Ground Benign, Controlled, Space, etc.)

• Part Quality Levels (AEi Systems can help you figure these out)

©AEi Systems, Inc. All Rights Reserved.