Safety Tactics Analysis in Architecture Patterns
Based on the paper “Building a Safety Architecture Pattern System,” this post analyzes which safety tactics each of the paper’s 15 safety architecture patterns uses and what fault coverage follows from those tactics.
Fault Coverage Classification
Random Faults Only Patterns
These patterns use Replication Redundancy and can only handle random hardware faults, not systematic faults:
| Pattern | Primary Tactics | Fault Coverage | Key Characteristics |
|---|---|---|---|
| Homogeneous Duplex | Replication Redundancy, Override, Condition Monitoring | Random Only | Two identical hardware modules |
| Triple Modular Redundancy | Replication Redundancy, Voting | Random Only | Three identical channels, majority voting |
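To make the "majority voting" entry concrete, here is a minimal sketch of a voter over three identical channels. The function name `majority_vote` and the sample values are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by a strict majority of channels, or None."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None


# Channel 2 suffers a random fault; the two healthy channels outvote it.
print(majority_vote([42, 17, 42]))  # 42

# A systematic fault affects all identical channels the same way,
# so the voter confidently agrees on the wrong value.
print(majority_vote([41, 41, 41]))  # 41

# No majority: the voter cannot mask the fault.
print(majority_vote([1, 2, 3]))  # None
```

The second call illustrates why replication alone gives "Random Only" coverage: identical channels share the same design errors, so voting cannot expose them.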
Systematic and Random Faults Patterns
These patterns use, or can be configured to use, Diverse Redundancy and can therefore handle both systematic and random faults:
| Pattern | Primary Tactics | Fault Coverage | Key Characteristics |
|---|---|---|---|
| Heterogeneous Duplex | Diverse Redundancy, Override, Condition Monitoring | Both | Two diverse hardware implementations |
| M-out-of-N | Replication/Diverse Redundancy, Voting | Both* | N channels (identical or diverse) |
| M-out-of-N-D | Replication/Diverse Redundancy, Voting, Diagnostics (Override, Condition Monitoring, Sanity Check) | Both* | N channels with diagnostic capability |
| N-Version Programming | Diverse Redundancy, Voting | Both | N diverse software versions |
| Acceptance Voting | Diverse Redundancy, Voting, Sanity Check | Both | N diverse versions with acceptance tests |
| Recovery Block | Diverse Redundancy, Override, Sanity Check | Both | Sequential execution of diverse versions |
| N-Self Checking Programming | Diverse Redundancy, Voting, Comparison | Both | Diverse components with self-checking |
*M-out-of-N and M-out-of-N-D cover both fault types only when built from diverse channels; with identical channels they cover random faults only.
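As a rough illustration of M-out-of-N voting, the sketch below assumes each channel outputs a boolean trip demand and the system trips when at least M of the N channels demand it. That boolean interpretation and the name `m_out_of_n_trip` are assumptions made for this example; the voter is the same whether the channels are identical or diverse.

```python
from typing import Sequence


def m_out_of_n_trip(demands: Sequence[bool], m: int) -> bool:
    """Trip when at least m of the N channels demand it."""
    if not 1 <= m <= len(demands):
        raise ValueError("m must be between 1 and the number of channels")
    return sum(demands) >= m


# 2-out-of-3: one channel disagreeing does not change the outcome.
print(m_out_of_n_trip([True, False, True], m=2))   # True
print(m_out_of_n_trip([True, False, False], m=2))  # False

# 1-out-of-2: a single demand is enough to trip.
print(m_out_of_n_trip([False, True], m=1))         # True
```

Choosing M trades safety against availability: a low M trips on any single demand, while a high M tolerates more spurious demands before shutting down.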
Fault Detection/Monitoring Only Patterns
These patterns focus on detection and safe shutdown rather than fault tolerance:
| Pattern | Primary Tactics | Fault Coverage | Key Characteristics |
|---|---|---|---|
| Sanity Check | Override, Sanity Check | Detection Only | Range/validity checking with safe shutdown |
| Monitor-Actuator | Override, Condition Monitoring | Detection Only | Reference-based monitoring with shutdown |
| Watchdog | Override, Heartbeat, Sanity Check | Detection Only | Timing fault detection with shutdown |
| Safety Executive | Override, Degradation, Heartbeat, Sanity Check | Detection Only | Centralized safety coordination |
| Protected Single Channel | Override, Condition Monitoring, Sanity Check | Detection Only | Single channel with monitoring |
| 3-Level Safety Monitoring | Override, Condition Monitoring, Sanity Check, Heartbeat | Detection Only | Multi-level monitoring hierarchy |
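The detection patterns combine a check with an Override to a safe state. The sketch below combines a Heartbeat timeout and a range-based Sanity Check in the spirit of the Watchdog row. All names (`Watchdog`, `guarded_output`, `SAFE_OUTPUT`) and the timing values are hypothetical, and time is passed in explicitly so the example is deterministic.

```python
from dataclasses import dataclass

SAFE_OUTPUT = 0.0  # hypothetical safe state, e.g. actuator de-energized


@dataclass
class Watchdog:
    timeout_s: float          # maximum allowed gap between heartbeats
    last_kick_s: float = 0.0
    tripped: bool = False

    def kick(self, now_s: float) -> None:
        """Heartbeat from the monitored channel; ignored once tripped."""
        if not self.tripped:
            self.last_kick_s = now_s

    def check(self, now_s: float) -> bool:
        """Return True once a heartbeat was missed; the trip is latched."""
        if now_s - self.last_kick_s > self.timeout_s:
            self.tripped = True
        return self.tripped


def guarded_output(value: float, now_s: float, watchdog: Watchdog,
                   low: float, high: float) -> float:
    """Override: forward the value only if heartbeat and range check pass."""
    if watchdog.check(now_s) or not (low <= value <= high):
        return SAFE_OUTPUT
    return value


wd = Watchdog(timeout_s=0.1)
wd.kick(0.00)
print(guarded_output(5.0, 0.05, wd, 0.0, 10.0))   # 5.0: healthy
print(guarded_output(50.0, 0.06, wd, 0.0, 10.0))  # 0.0: out of range
print(guarded_output(5.0, 0.30, wd, 0.0, 10.0))   # 0.0: heartbeat missed
wd.kick(0.31)                                     # too late, trip is latched
print(guarded_output(5.0, 0.32, wd, 0.0, 10.0))   # 0.0
```

Nothing here masks the fault: once a check fails, the function stops delivering the value, which is why these patterns are classified as "Detection Only".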
Detailed Fault Coverage Analysis
Random Fault Handling Mechanisms:
- Replication Redundancy: Multiple identical components mask random hardware failures
- Voting: Majority decision masks minority failures
- Condition Monitoring: Detects random deviations from expected behavior
Systematic Fault Handling Mechanisms:
- Diverse Redundancy: Different implementations avoid common systematic errors
- N-Version Programming: Independent development teams produce the versions, which reduces the chance of shared design errors
- Acceptance Testing: Different validation approaches for each version
Pattern Categories by Fault Tolerance Strategy
1. Masking Patterns (Continue Operation)
- Random Only: Homogeneous Duplex, Triple Modular Redundancy
- Both Random & Systematic: Heterogeneous Duplex, M-out-of-N (with diversity), M-out-of-N-D (with diversity), N-Version Programming, Acceptance Voting, N-Self Checking Programming
2. Detection + Safe Shutdown Patterns
- Sanity Check, Monitor-Actuator, Watchdog, Safety Executive, Protected Single Channel, 3-Level Safety Monitoring
3. Hybrid Patterns (Detection + Recovery)
- Both Random & Systematic: Recovery Block (detects via acceptance test, recovers via diverse versions)
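The Recovery Block idea (run diverse versions one after another until an acceptance test passes) can be sketched as follows. The square-root example, the deliberately wrong `primary_sqrt`, and all function names are hypothetical. The versions here are pure functions, so the state restoration a real recovery block performs before trying an alternate is omitted.

```python
import math
from typing import Callable, Optional, Sequence


def recovery_block(
    x: float,
    versions: Sequence[Callable[[float], float]],
    acceptance_test: Callable[[float, float], bool],
) -> Optional[float]:
    """Try each diverse version in order; return the first accepted result."""
    for version in versions:
        try:
            result = version(x)
        except Exception:
            continue  # a crashing version counts as a failed attempt
        if acceptance_test(x, result):
            return result
    return None  # every version was rejected; the caller must go to a safe state


def primary_sqrt(x: float) -> float:
    return x / 2  # hypothetical systematic fault in the primary version


def alternate_sqrt(x: float) -> float:
    return math.sqrt(x)


def accept_sqrt(x: float, result: float) -> bool:
    return result >= 0 and abs(result * result - x) < 1e-6


print(recovery_block(9.0, [primary_sqrt, alternate_sqrt], accept_sqrt))  # 3.0
print(recovery_block(9.0, [primary_sqrt], accept_sqrt))                  # None
```

The acceptance test plays the Sanity Check role and the fallback to the alternate plays the Override role, matching the tactics listed for this pattern.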
Safety Tactics Usage Table
| Safety Pattern | Override | Replication Redundancy | Diverse Redundancy | Voting | Condition Monitoring | Sanity Check | Heartbeat | Degradation | Comparison | Barrier | Substitution | Simplicity | Repair |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Homogeneous Duplex | ✓ | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Heterogeneous Duplex | ✓ | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Triple Modular Redundancy | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| M-out-of-N | ✗ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| M-out-of-N-D | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| N-Version Programming | ✗ | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Acceptance Voting | ✗ | ✗ | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Recovery Block | ✓ | ✗ | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| N-Self Checking Programming | ✗ | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ |
| Sanity Check | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Monitor-Actuator | ✓ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Watchdog | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Safety Executive | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Protected Single Channel | ✓ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| 3-Level Safety Monitoring | ✓ | ✗ | ✗ | ✗ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
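For readers who want to query the table, the sketch below encodes the ✓ entries as a plain dictionary and derives per-tactic usage counts. The data is transcribed from the table above; the names `TACTICS`, `USAGE`, and `usage_count` are illustrative choices.

```python
TACTICS = [
    "Override", "Replication Redundancy", "Diverse Redundancy", "Voting",
    "Condition Monitoring", "Sanity Check", "Heartbeat", "Degradation",
    "Comparison", "Barrier", "Substitution", "Simplicity", "Repair",
]

# Pattern -> set of tactics marked with a check mark in the table.
USAGE = {
    "Homogeneous Duplex": {"Override", "Replication Redundancy", "Condition Monitoring"},
    "Heterogeneous Duplex": {"Override", "Diverse Redundancy", "Condition Monitoring"},
    "Triple Modular Redundancy": {"Replication Redundancy", "Voting"},
    "M-out-of-N": {"Replication Redundancy", "Diverse Redundancy", "Voting"},
    "M-out-of-N-D": {"Override", "Replication Redundancy", "Diverse Redundancy",
                     "Voting", "Condition Monitoring", "Sanity Check"},
    "N-Version Programming": {"Diverse Redundancy", "Voting"},
    "Acceptance Voting": {"Diverse Redundancy", "Voting", "Sanity Check"},
    "Recovery Block": {"Override", "Diverse Redundancy", "Sanity Check"},
    "N-Self Checking Programming": {"Diverse Redundancy", "Voting", "Comparison"},
    "Sanity Check": {"Override", "Sanity Check"},
    "Monitor-Actuator": {"Override", "Condition Monitoring"},
    "Watchdog": {"Override", "Sanity Check", "Heartbeat"},
    "Safety Executive": {"Override", "Sanity Check", "Heartbeat", "Degradation"},
    "Protected Single Channel": {"Override", "Condition Monitoring", "Sanity Check"},
    "3-Level Safety Monitoring": {"Override", "Condition Monitoring",
                                  "Sanity Check", "Heartbeat"},
}

# How many of the 15 patterns use each tactic.
usage_count = {t: sum(t in used for used in USAGE.values()) for t in TACTICS}
unused = [t for t in TACTICS if usage_count[t] == 0]

print(usage_count["Override"])      # 10
print(usage_count["Sanity Check"])  # 8
print(unused)  # ['Barrier', 'Substitution', 'Simplicity', 'Repair']
```

Override and Sanity Check turn out to be the most widely used tactics, while four columns of the table are empty.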
Key Insights
- Random vs. Systematic Fault Patterns: Only 2 patterns (13%) handle random faults exclusively; 7 patterns (47%) can handle both random and systematic faults, although M-out-of-N and M-out-of-N-D do so only with diverse channels; 6 patterns (40%) focus on detection rather than fault tolerance.
- Critical Tactical Differences:
  - Replication Redundancy → Random faults only
  - Diverse Redundancy → Both random and systematic faults
  - Override + Monitoring → General fault detection and safe shutdown
- Design Trade-offs:
  - Random-only patterns: Lower cost, simpler implementation
  - Both-fault patterns: Higher cost, more complex, but better fault coverage
  - Detection patterns: Lowest cost, but reduced availability (safe shutdown vs. continued operation)
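The tactic-to-coverage mapping in these insights can be written as a small rule, and the percentages checked with simple arithmetic. This is a sketch of this post's classification only; the function name `fault_coverage` is an assumption, and the rule treats any pattern that can use Diverse Redundancy as covering both fault types.

```python
from typing import Iterable


def fault_coverage(tactics: Iterable[str]) -> str:
    """Classify a pattern by the redundancy tactic it uses."""
    tactics = set(tactics)
    if "Diverse Redundancy" in tactics:
        return "Both"
    if "Replication Redundancy" in tactics:
        return "Random Only"
    return "Detection Only"


print(fault_coverage({"Replication Redundancy", "Voting"}))            # Random Only
print(fault_coverage({"Diverse Redundancy", "Override", "Sanity Check"}))  # Both
print(fault_coverage({"Override", "Heartbeat", "Sanity Check"}))       # Detection Only

# Share of the 15 patterns in each class, rounded to whole percent.
counts = {"Random Only": 2, "Both": 7, "Detection Only": 6}
total = sum(counts.values())
shares = {name: round(100 * n / total) for name, n in counts.items()}
print(shares)  # {'Random Only': 13, 'Both': 47, 'Detection Only': 40}
```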
Unused Tactics in Safety Architecture Patterns
The following tactics from the paper’s taxonomy are NOT used in any of the 15 safety architecture patterns:
- Barrier: Protects subsystems from unintentional influences. Not used because patterns focus on fault tolerance, not isolation.
- Substitution: Replaces components with more reliable alternatives. Not used because patterns focus on runtime handling, not design-time selection.
- Simplicity: Avoids failures by keeping systems simple. Not used as a structural pattern, but could be meta-guidance.
- Repair: Restores failed systems to full functionality. Not used because patterns prioritize immediate fault response.
Note: “Rollback” is not an official safety tactic in the taxonomy, though it appears in the Recovery Block pattern as a recovery mechanism.
Analysis of Unused Tactics
- Scope Mismatch: Some tactics are design-time decisions, not runtime structures.
- Safety Focus: Patterns prioritize immediate fault response over long-term repair.
- Pattern Granularity: Patterns focus on component interactions, not fine-grained isolation or maintenance.
- Historical Context: Patterns represent established approaches and may not include newer concepts.
Potential Integration Opportunities:
- Barrier: Could enhance mixed-criticality systems.
- Repair: Valuable for autonomous/self-healing systems.
- Substitution: Could be added as implementation guidance.
- Simplicity: Useful as a meta-principle for pattern design.
Conclusion
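Mapping the 15 patterns to their tactics shows that fault coverage follows largely from the redundancy tactic chosen: replication covers random faults, diversity adds systematic faults, and monitoring combined with override provides detection and safe shutdown. The same view helps architects see both the building blocks the patterns use and the four they leave unused.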
References
- Christopher Preschern, Nermin Kajtazovic, and Christian Kreiner. “Building a Safety Architecture Pattern System.” Proceedings of the 18th European Conference on Pattern Languages of Programs (EuroPLoP ’13). https://dl.acm.org/doi/abs/10.1145/2739011.2739028