Safety Tactics Analysis in Architecture Patterns

Based on the paper “Building a Safety Architecture Pattern System,” this post analyzes how different safety tactics are used across various safety architecture patterns.

Fault Coverage Classification

Random Faults Only Patterns

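These patterns use Replication Redundancy. Because every channel is an identical copy, a design defect is present in all of them, so these patterns can handle random hardware faults but not systematic faults: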

Pattern	Primary Tactics	Fault Coverage	Key Characteristics
Homogeneous Duplex	Replication Redundancy, Override, Condition Monitoring	Random Only	Two identical hardware modules
Triple Modular Redundancy	Replication Redundancy, Voting	Random Only	Three identical channels, majority voting
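
To illustrate why replication with majority voting masks random faults but not systematic ones, here is a minimal sketch. The functions `majority_vote`, `double`, and `buggy_double` are hypothetical names chosen for this example, not part of the paper.

```python
from collections import Counter

def majority_vote(outputs):
    """Return the value produced by a strict majority of channels, else None."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None

def double(x):
    """Intended behaviour of every channel (hypothetical example function)."""
    return 2 * x

def buggy_double(x):
    """The same design defect in every copy: a systematic fault."""
    return 2 * x + 1

# Random fault: one of three identical channels is corrupted and gets outvoted.
random_fault_outputs = [double(21), double(21), -1]
masked = majority_vote(random_fault_outputs)

# Systematic fault: all three replicas share the defect, so the vote is
# unanimous and the wrong value passes through.
systematic_outputs = [buggy_double(21)] * 3
unmasked = majority_vote(systematic_outputs)
```

In the first case the voter returns the correct value despite the corrupted channel. In the second, the three replicas agree on the wrong value, and the voter has no way to notice.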

Systematic and Random Faults Patterns

These patterns use Diverse Redundancy. Independently designed channels are unlikely to share the same design defect, so these patterns can handle both systematic and random faults:

Pattern	Primary Tactics	Fault Coverage	Key Characteristics
Heterogeneous Duplex	Diverse Redundancy, Override, Condition Monitoring	Both	Two diverse hardware implementations
M-out-of-N	Replication/Diverse Redundancy, Voting	Both*	N channels (identical or diverse)
M-out-of-N-D	Replication/Diverse Redundancy, Voting, Override, Condition Monitoring, Sanity Check	Both*	N channels with diagnostic capability
N-Version Programming	Diverse Redundancy, Voting	Both	N diverse software versions
Acceptance Voting	Diverse Redundancy, Voting, Sanity Check	Both	N diverse versions with acceptance tests
Recovery Block	Diverse Redundancy, Override, Sanity Check	Both	Sequential execution of diverse versions
N-Self Checking Programming	Diverse Redundancy, Voting, Comparison	Both	Diverse components with self-checking

*M-out-of-N and M-out-of-N-D cover both fault types only when the channels are diverse; with identical channels they cover random faults only.
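
The M-out-of-N voting rule can be sketched as follows. This assumes each channel reports a boolean trip demand, which is one common way to apply the pattern to shutdown functions; the name `m_out_of_n` is hypothetical.

```python
def m_out_of_n(trip_demands, m):
    """Return True when at least m of the n channels demand a trip."""
    n = len(trip_demands)
    if not 1 <= m <= n:
        raise ValueError("m must satisfy 1 <= m <= n")
    return sum(bool(d) for d in trip_demands) >= m

# 2-out-of-3: a single spurious demand is outvoted, two demands trip the system.
spurious = m_out_of_n([True, False, False], m=2)
genuine = m_out_of_n([True, True, False], m=2)

# 1-out-of-2: any single demand trips, favouring safety over availability.
one_oo_two = m_out_of_n([False, True], m=1)
```

The voter itself is the same whether the channels are identical or diverse. Fault coverage depends only on how the channels feeding it were built.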

Fault Detection/Monitoring Only Patterns

These patterns focus on detection and safe shutdown rather than fault tolerance:

Pattern	Primary Tactics	Fault Coverage	Key Characteristics
Sanity Check	Override, Sanity Check	Detection Only	Range/validity checking with safe shutdown
Monitor-Actuator	Override, Condition Monitoring	Detection Only	Reference-based monitoring with shutdown
Watchdog	Override, Heartbeat, Sanity Check	Detection Only	Timing fault detection with shutdown
Safety Executive	Override, Degradation, Heartbeat, Sanity Check	Detection Only	Centralized safety coordination
Protected Single Channel	Override, Condition Monitoring, Sanity Check	Detection Only	Single channel with monitoring
3-Level Safety Monitoring	Override, Condition Monitoring, Sanity Check, Heartbeat	Detection Only	Multi-level monitoring hierarchy
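
As an illustration of the detection-plus-shutdown idea, the sketch below combines the Heartbeat and Override tactics in a simple watchdog. The class name, its parameters, and the explicit `now` arguments (used instead of a real clock so the example is deterministic) are assumptions made for this example.

```python
class Watchdog:
    """Heartbeat monitor that overrides the output with a safe state on timeout."""

    def __init__(self, timeout_s, start_time, enter_safe_state):
        self.timeout_s = timeout_s
        self.last_beat = start_time
        self.enter_safe_state = enter_safe_state  # Override action (hypothetical callback)
        self.tripped = False

    def heartbeat(self, now):
        """Called by the monitored channel to signal that it is still alive."""
        if not self.tripped:
            self.last_beat = now

    def check(self, now):
        """Called periodically by the monitor; trips once and stays tripped."""
        if not self.tripped and now - self.last_beat > self.timeout_s:
            self.tripped = True
            self.enter_safe_state()
        return self.tripped

events = []
wd = Watchdog(timeout_s=0.5, start_time=0.0,
              enter_safe_state=lambda: events.append("shutdown"))
wd.heartbeat(0.25)
healthy = wd.check(0.6)   # 0.35 s since the last heartbeat: within the timeout
timed_out = wd.check(1.0) # 0.75 s since the last heartbeat: safe state entered
```

Note that the watchdog does not try to continue operation. Once the heartbeat is missed, it latches into the safe state, which is why these patterns trade availability for simplicity.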

Detailed Fault Coverage Analysis

Random Fault Handling Mechanisms:

Replication Redundancy: Multiple identical components mask random hardware failures
Voting: Majority decision masks minority failures
Condition Monitoring: Detects random deviations from expected behavior

Systematic Fault Handling Mechanisms:

Diverse Redundancy: Different implementations avoid common systematic errors
N-Version Programming: Versions developed by independent teams are unlikely to contain the same defect
Acceptance Testing: Different validation approaches for each version

Pattern Categories by Fault Tolerance Strategy

1. Masking Patterns (Continue Operation)

Random Only: Homogeneous Duplex, Triple Modular Redundancy, and M-out-of-N or M-out-of-N-D when built from identical channels
Both Random & Systematic: Heterogeneous Duplex, M-out-of-N (with diversity), M-out-of-N-D (with diversity), N-Version Programming, Acceptance Voting, N-Self Checking Programming

2. Detection + Safe Shutdown Patterns

Sanity Check, Monitor-Actuator, Watchdog, Safety Executive, Protected Single Channel, 3-Level Safety Monitoring

3. Hybrid Patterns (Detection + Recovery)

Both Random & Systematic: Recovery Block (detects via acceptance test, recovers via diverse versions)
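
A minimal sketch of the Recovery Block control flow is shown below. The variants here are pure functions, so there is no state to restore between attempts; a real implementation would also roll back state before trying the next variant. All function names are hypothetical.

```python
import math

def recovery_block(variants, acceptance_test, x):
    """Run variants in order; return the first result that passes the acceptance test."""
    for variant in variants:
        try:
            result = variant(x)
        except Exception:
            continue  # a crash counts as a failed variant
        if acceptance_test(x, result):
            return result
    raise RuntimeError("all variants failed the acceptance test")

def primary_sqrt(x):
    return x / 2  # deliberately wrong for most inputs (simulated design fault)

def alternate_sqrt(x):
    return math.sqrt(x)  # independently written fallback

def acceptable(x, y):
    """Sanity check: the result must be non-negative and square back to x."""
    return y >= 0 and math.isclose(y * y, x, rel_tol=1e-9)

result = recovery_block([primary_sqrt, alternate_sqrt], acceptable, 9.0)
```

The acceptance test rejects the primary's output, so the alternate runs and its result is returned. If every variant fails, the sketch raises an error, which a surrounding system would treat as a trigger for its safe state.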

Safety Tactics Usage Table

Safety Pattern	Override	Replication Redundancy	Diverse Redundancy	Voting	Condition Monitoring	Sanity Check	Heartbeat	Degradation	Comparison	Barrier	Substitution	Simplicity	Repair
Homogeneous Duplex	✓	✓	✗	✗	✓	✗	✗	✗	✗	✗	✗	✗	✗
Heterogeneous Duplex	✓	✗	✓	✗	✓	✗	✗	✗	✗	✗	✗	✗	✗
Triple Modular Redundancy	✗	✓	✗	✓	✗	✗	✗	✗	✗	✗	✗	✗	✗
M-out-of-N	✗	✓	✓	✓	✗	✗	✗	✗	✗	✗	✗	✗	✗
M-out-of-N-D	✓	✓	✓	✓	✓	✓	✗	✗	✗	✗	✗	✗	✗
N-Version Programming	✗	✗	✓	✓	✗	✗	✗	✗	✗	✗	✗	✗	✗
Acceptance Voting	✗	✗	✓	✓	✗	✓	✗	✗	✗	✗	✗	✗	✗
Recovery Block	✓	✗	✓	✗	✗	✓	✗	✗	✗	✗	✗	✗	✗
N-Self Checking Programming	✗	✗	✓	✓	✗	✗	✗	✗	✓	✗	✗	✗	✗
Sanity Check	✓	✗	✗	✗	✗	✓	✗	✗	✗	✗	✗	✗	✗
Monitor-Actuator	✓	✗	✗	✗	✓	✗	✗	✗	✗	✗	✗	✗	✗
Watchdog	✓	✗	✗	✗	✗	✓	✓	✗	✗	✗	✗	✗	✗
Safety Executive	✓	✗	✗	✗	✗	✓	✓	✓	✗	✗	✗	✗	✗
Protected Single Channel	✓	✗	✗	✗	✓	✓	✗	✗	✗	✗	✗	✗	✗
3-Level Safety Monitoring	✓	✗	✗	✗	✓	✓	✓	✗	✗	✗	✗	✗	✗
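
The table above can also be encoded as data, which makes it easy to query. The sketch below transcribes the check marks into a dictionary of sets and derives which tactics are never used and how often each tactic appears. The names `TACTICS` and `USAGE` are hypothetical.

```python
from collections import Counter

TACTICS = [
    "Override", "Replication Redundancy", "Diverse Redundancy", "Voting",
    "Condition Monitoring", "Sanity Check", "Heartbeat", "Degradation",
    "Comparison", "Barrier", "Substitution", "Simplicity", "Repair",
]

# One entry per table row; each set holds the columns marked with a check.
USAGE = {
    "Homogeneous Duplex": {"Override", "Replication Redundancy", "Condition Monitoring"},
    "Heterogeneous Duplex": {"Override", "Diverse Redundancy", "Condition Monitoring"},
    "Triple Modular Redundancy": {"Replication Redundancy", "Voting"},
    "M-out-of-N": {"Replication Redundancy", "Diverse Redundancy", "Voting"},
    "M-out-of-N-D": {"Override", "Replication Redundancy", "Diverse Redundancy",
                     "Voting", "Condition Monitoring", "Sanity Check"},
    "N-Version Programming": {"Diverse Redundancy", "Voting"},
    "Acceptance Voting": {"Diverse Redundancy", "Voting", "Sanity Check"},
    "Recovery Block": {"Override", "Diverse Redundancy", "Sanity Check"},
    "N-Self Checking Programming": {"Diverse Redundancy", "Voting", "Comparison"},
    "Sanity Check": {"Override", "Sanity Check"},
    "Monitor-Actuator": {"Override", "Condition Monitoring"},
    "Watchdog": {"Override", "Sanity Check", "Heartbeat"},
    "Safety Executive": {"Override", "Sanity Check", "Heartbeat", "Degradation"},
    "Protected Single Channel": {"Override", "Condition Monitoring", "Sanity Check"},
    "3-Level Safety Monitoring": {"Override", "Condition Monitoring", "Sanity Check",
                                  "Heartbeat"},
}

# Tactics that appear in at least one pattern, and those that appear in none.
used = set().union(*USAGE.values())
unused = [t for t in TACTICS if t not in used]

# How many patterns use each tactic.
tactic_counts = Counter(t for tactics in USAGE.values() for t in tactics)
```

With this encoding, `unused` lists the four right-most columns of the table, and `tactic_counts` shows that Override is the most widely used tactic.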

Key Insights

Random vs. Systematic Fault Patterns: Of the 15 patterns, 2 (13%) handle random faults only, 7 (47%) can handle both random and systematic faults, and 6 (40%) focus on detection rather than fault tolerance.
Critical Tactical Differences:
- Replication Redundancy → Random faults only
- Diverse Redundancy → Both random and systematic faults
- Override + Monitoring → General fault detection and safe shutdown
Design Trade-offs:
- Random-only patterns: Lower cost, simpler implementation
- Both-fault patterns: Higher cost, more complex, but better fault coverage
- Detection patterns: Lowest cost, but reduced availability (safe shutdown vs. continued operation)
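
The classification rule and the percentages above can be reproduced with a short sketch. It assumes that a pattern's redundancy tactic alone determines its coverage class, as the tactical differences listed above suggest; the names `REDUNDANCY` and `coverage` are hypothetical.

```python
from collections import Counter

# Redundancy tactics per pattern; an empty set means no redundancy tactic.
REDUNDANCY = {
    "Homogeneous Duplex": {"Replication"},
    "Triple Modular Redundancy": {"Replication"},
    "Heterogeneous Duplex": {"Diverse"},
    "M-out-of-N": {"Replication", "Diverse"},
    "M-out-of-N-D": {"Replication", "Diverse"},
    "N-Version Programming": {"Diverse"},
    "Acceptance Voting": {"Diverse"},
    "Recovery Block": {"Diverse"},
    "N-Self Checking Programming": {"Diverse"},
    "Sanity Check": set(),
    "Monitor-Actuator": set(),
    "Watchdog": set(),
    "Safety Executive": set(),
    "Protected Single Channel": set(),
    "3-Level Safety Monitoring": set(),
}

def coverage(redundancy):
    """Map a pattern's redundancy tactics to its fault coverage class."""
    if "Diverse" in redundancy:
        return "both"            # diversity can address systematic faults too
    if "Replication" in redundancy:
        return "random only"     # identical copies share design defects
    return "detection only"      # no redundancy: detect and shut down

counts = Counter(coverage(r) for r in REDUNDANCY.values())
shares = {k: round(100 * v / len(REDUNDANCY)) for k, v in counts.items()}
```

M-out-of-N and M-out-of-N-D land in the "both" class here because they permit diverse channels; built from identical channels they would count as "random only".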

Unused Tactics in Safety Architecture Patterns

The following tactics from the paper’s taxonomy are NOT used in any of the 15 safety architecture patterns:

Barrier: Protects subsystems from unintentional influences. Not used because patterns focus on fault tolerance, not isolation.
Substitution: Replaces components with more reliable alternatives. Not used because patterns focus on runtime handling, not design-time selection.
Simplicity: Avoids failures by keeping systems simple. Not used as a structural pattern, but could be meta-guidance.
Repair: Restores failed systems to full functionality. Not used because patterns prioritize immediate fault response.

Note: “Rollback” is not an official safety tactic in the taxonomy, though it appears in the Recovery Block pattern as a recovery mechanism.

Analysis of Unused Tactics

Scope Mismatch: Some tactics are design-time decisions, not runtime structures.
Safety Focus: Patterns prioritize immediate fault response over long-term repair.
Pattern Granularity: Patterns focus on component interactions, not fine-grained isolation or maintenance.
Historical Context: Patterns represent established approaches and may not include newer concepts.

Potential Integration Opportunities:

Barrier: Could enhance mixed-criticality systems.
Repair: Valuable for autonomous/self-healing systems.
Substitution: Could be added as implementation guidance.
Simplicity: Useful as a meta-principle for pattern design.

Conclusion

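Mapping the 15 patterns onto safety tactics shows that the choice of redundancy tactic largely determines fault coverage: replication addresses random faults, while diversity also addresses systematic ones. It also shows that four tactics in the taxonomy (Barrier, Substitution, Simplicity, Repair) appear in none of the patterns. This tactical view supports systematic pattern design and helps architects understand both the utilized and unutilized building blocks of safety-critical systems.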

References

Christopher Preschern, Nermin Kajtazovic, and Christian Kreiner. “Building a Safety Architecture Pattern System.” Proceedings of the 18th European Conference on Pattern Languages of Program (EuroPLoP ’13). https://dl.acm.org/doi/abs/10.1145/2739011.2739028
