Safety Tactics Analysis in Architecture Patterns

Based on the paper “Building a Safety Architecture Pattern System,” this post analyzes which safety tactics each of 15 safety architecture patterns uses and what fault coverage follows from them.

Fault Coverage Classification

Random Faults Only Patterns

These patterns use Replication Redundancy. Because every replica shares the same design, they tolerate random hardware faults but not systematic faults, which would appear in all replicas at once:

| Pattern | Primary Tactics | Fault Coverage | Key Characteristics |
|---|---|---|---|
| Homogeneous Duplex | Replication Redundancy, Override, Condition Monitoring | Random Only | Two identical hardware modules |
| Triple Modular Redundancy | Replication Redundancy, Voting | Random Only | Three identical channels, majority voting |

Systematic and Random Faults Patterns

These patterns use Diverse Redundancy, or can be built with it, and can then handle both systematic and random faults:

| Pattern | Primary Tactics | Fault Coverage | Key Characteristics |
|---|---|---|---|
| Heterogeneous Duplex | Diverse Redundancy, Override, Condition Monitoring | Both | Two diverse hardware implementations |
| M-out-of-N | Replication/Diverse Redundancy, Voting | Both* | N channels (identical or diverse) |
| M-out-of-N-D | Replication/Diverse Redundancy, Voting, Diagnostics | Both* | N channels with diagnostic capability |
| N-Version Programming | Diverse Redundancy, Voting | Both | N diverse software versions |
| Acceptance Voting | Diverse Redundancy, Voting, Sanity Check | Both | N diverse versions with acceptance tests |
| Recovery Block | Diverse Redundancy, Override, Sanity Check | Both | Sequential execution of diverse versions |
| N-Self Checking Programming | Diverse Redundancy, Voting, Comparison | Both | Diverse components with self-checking |

*M-out-of-N and M-out-of-N-D handle both fault types only when built with diverse channels; with identical channels they cover random faults only.
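
The Voting tactic shared by Triple Modular Redundancy and M-out-of-N can be sketched as follows. This is a minimal illustration rather than code from the paper: the function name `m_out_of_n_vote` and the convention of returning `None` when no quorum is reached are assumptions made for the example.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def m_out_of_n_vote(outputs: Sequence[Hashable], m: int) -> Optional[Hashable]:
    """Return the value that at least m of the n channel outputs agree on.

    Returns None when no value reaches the quorum, so the caller can
    fall back to a safe state instead of guessing.
    """
    if not 1 <= m <= len(outputs):
        raise ValueError("m must be between 1 and the number of channels")
    value, count = Counter(outputs).most_common(1)[0]
    return value if count >= m else None


# Triple Modular Redundancy is the 2-out-of-3 case: one faulty channel is masked.
tmr_result = m_out_of_n_vote([42, 42, 17], m=2)

# No value is shared by two channels, so the voter reports "no quorum".
no_quorum = m_out_of_n_vote([1, 2, 3], m=2)
```

The voter itself does not care whether the channels are identical or diverse; that choice decides the fault coverage. Choosing m greater than n/2 keeps the result unambiguous, since two different values cannot both reach the quorum.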

Fault Detection/Monitoring Only Patterns

These patterns focus on detection and safe shutdown rather than fault tolerance:

| Pattern | Primary Tactics | Fault Coverage | Key Characteristics |
|---|---|---|---|
| Sanity Check | Override, Sanity Check | Detection Only | Range/validity checking with safe shutdown |
| Monitor-Actuator | Override, Condition Monitoring | Detection Only | Reference-based monitoring with shutdown |
| Watchdog | Override, Heartbeat, Sanity Check | Detection Only | Timing fault detection with shutdown |
| Safety Executive | Override, Degradation, Heartbeat, Sanity Check | Detection Only | Centralized safety coordination |
| Protected Single Channel | Override, Condition Monitoring, Sanity Check | Detection Only | Single channel with monitoring |
| 3-Level Safety Monitoring | Override, Condition Monitoring, Sanity Check, Heartbeat | Detection Only | Multi-level monitoring hierarchy |
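
As one illustration of detection followed by safe shutdown, the Watchdog row can be sketched as a Heartbeat check that triggers an Override action. The class below is a hypothetical sketch, not taken from the paper: the names `Watchdog`, `kick`, `check`, and `on_timeout`, and the injectable `clock` parameter, are all assumptions chosen to keep the example testable.

```python
import time
from typing import Callable


class Watchdog:
    """Detects a missing heartbeat and triggers an override (safe shutdown)."""

    def __init__(
        self,
        timeout_s: float,
        on_timeout: Callable[[], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.timeout_s = timeout_s
        self.on_timeout = on_timeout  # override action, e.g. cut actuator power
        self.clock = clock
        self.last_kick = clock()
        self.tripped = False

    def kick(self) -> None:
        """Heartbeat sent by the monitored channel while it is alive."""
        if not self.tripped:
            self.last_kick = self.clock()

    def check(self) -> bool:
        """Called periodically by the monitor; True while the channel is healthy."""
        if not self.tripped and self.clock() - self.last_kick > self.timeout_s:
            self.tripped = True  # latch: a late heartbeat does not undo the shutdown
            self.on_timeout()
        return not self.tripped
```

The watchdog only detects the timing fault and forces the safe state; it does not mask the fault, which is why availability drops compared with the redundancy patterns.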

Detailed Fault Coverage Analysis

Random Fault Handling Mechanisms:

  • Replication Redundancy: Multiple identical components mask random hardware failures
  • Voting: Majority decision masks minority failures
  • Condition Monitoring: Detects random deviations from expected behavior

Systematic Fault Handling Mechanisms:

  • Diverse Redundancy: Different implementations avoid common systematic errors
  • N-Version Programming: Versions developed by independent teams are less likely to share the same design fault
  • Acceptance Testing: Different validation approaches for each version

Pattern Categories by Fault Tolerance Strategy

1. Masking Patterns (Continue Operation)

  • Random Only: Homogeneous Duplex, Triple Modular Redundancy
  • Both Random & Systematic: Heterogeneous Duplex, M-out-of-N (with diversity), M-out-of-N-D (with diversity), N-Version Programming, Acceptance Voting, N-Self Checking Programming

2. Detection + Safe Shutdown Patterns

  • Sanity Check, Monitor-Actuator, Watchdog, Safety Executive, Protected Single Channel, 3-Level Safety Monitoring

3. Hybrid Patterns (Detection + Recovery)

  • Both Random & Systematic: Recovery Block (detects via acceptance test, recovers via diverse versions)
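
The Recovery Block idea (run a primary version, check its result with an acceptance test, and fall back to a diverse alternate on failure) can be sketched as below. This is a simplified illustration under stated assumptions: the names `recovery_block`, `AllVersionsFailed`, `halving_sqrt`, and `accept_sqrt` are hypothetical, and because the input is immutable no explicit state checkpoint is needed before trying the next version.

```python
import math
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class AllVersionsFailed(Exception):
    """Raised when no version passes the acceptance test."""


def recovery_block(
    x: T,
    versions: Iterable[Callable[[T], R]],
    acceptance_test: Callable[[T, R], bool],
) -> R:
    """Try diverse versions in order; return the first accepted result."""
    for version in versions:
        try:
            result = version(x)
        except Exception:
            continue  # a crash counts as a failed version
        if acceptance_test(x, result):
            return result
    raise AllVersionsFailed("no version passed the acceptance test")


def halving_sqrt(x: float) -> float:
    """Hypothetical faulty primary: only correct for x == 4 or x == 0."""
    return x / 2


def accept_sqrt(x: float, r: float) -> bool:
    """Acceptance test: squaring the result must give back the input."""
    return r >= 0 and abs(r * r - x) < 1e-9


# The primary returns 4.5, the test rejects it, and the alternate takes over.
recovered = recovery_block(9.0, [halving_sqrt, math.sqrt], accept_sqrt)
```

If every version fails the test, the sketch raises `AllVersionsFailed`, which is the point where an Override tactic would move the system to a safe state.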

Safety Tactics Usage Table

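Marks (x) follow the primary tactics listed in the classification tables above. For M-out-of-N and M-out-of-N-D both redundancy columns are marked because either can be chosen; the Diagnostics entry of M-out-of-N-D has no matching column.

| Safety Pattern | Override | Replication Redundancy | Diverse Redundancy | Voting | Condition Monitoring | Sanity Check | Heartbeat | Degradation | Comparison | Barrier | Substitution | Simplicity | Repair |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Homogeneous Duplex | x | x | - | - | x | - | - | - | - | - | - | - | - |
| Heterogeneous Duplex | x | - | x | - | x | - | - | - | - | - | - | - | - |
| Triple Modular Redundancy | - | x | - | x | - | - | - | - | - | - | - | - | - |
| M-out-of-N | - | x | x | x | - | - | - | - | - | - | - | - | - |
| M-out-of-N-D | - | x | x | x | - | - | - | - | - | - | - | - | - |
| N-Version Programming | - | - | x | x | - | - | - | - | - | - | - | - | - |
| Acceptance Voting | - | - | x | x | - | x | - | - | - | - | - | - | - |
| Recovery Block | x | - | x | - | - | x | - | - | - | - | - | - | - |
| N-Self Checking Programming | - | - | x | x | - | - | - | - | x | - | - | - | - |
| Sanity Check | x | - | - | - | - | x | - | - | - | - | - | - | - |
| Monitor-Actuator | x | - | - | - | x | - | - | - | - | - | - | - | - |
| Watchdog | x | - | - | - | - | x | x | - | - | - | - | - | - |
| Safety Executive | x | - | - | - | - | x | x | x | - | - | - | - | - |
| Protected Single Channel | x | - | - | - | x | x | - | - | - | - | - | - | - |
| 3-Level Safety Monitoring | x | - | - | - | x | x | x | - | - | - | - | - | - |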

Key Insights

  • Random vs Systematic Fault Patterns: Of the 15 patterns, only 2 (13%) handle random faults exclusively; 7 (47%) can handle both random and systematic faults, two of them (M-out-of-N and M-out-of-N-D) only when built with diverse channels; 6 (40%) focus on detection rather than fault tolerance.
  • Critical Tactical Differences:
    • Replication Redundancy → Random faults only
    • Diverse Redundancy → Both random and systematic faults
    • Override + Monitoring → General fault detection and safe shutdown
  • Design Trade-offs:
    • Random-only patterns: Lower cost, simpler implementation
    • Both-fault patterns: Higher cost, more complex, but better fault coverage
    • Detection patterns: Lowest cost, but reduced availability (safe shutdown vs. continued operation)
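
The percentages in the first insight can be reproduced with a short tally. The dictionary below simply restates the fault coverage column of the classification tables; the category labels and variable names are assumptions for this sketch, and M-out-of-N and M-out-of-N-D are counted under "both" as in the post, although that holds only for diverse channels.

```python
from collections import Counter

# Fault coverage per pattern, as listed in the classification tables.
COVERAGE = {
    "Homogeneous Duplex": "random only",
    "Triple Modular Redundancy": "random only",
    "Heterogeneous Duplex": "both",
    "M-out-of-N": "both",  # only with diverse channels
    "M-out-of-N-D": "both",  # only with diverse channels
    "N-Version Programming": "both",
    "Acceptance Voting": "both",
    "Recovery Block": "both",
    "N-Self Checking Programming": "both",
    "Sanity Check": "detection only",
    "Monitor-Actuator": "detection only",
    "Watchdog": "detection only",
    "Safety Executive": "detection only",
    "Protected Single Channel": "detection only",
    "3-Level Safety Monitoring": "detection only",
}

counts = Counter(COVERAGE.values())
total = len(COVERAGE)

# Share of each category, rounded to whole percent.
shares = {category: round(100 * n / total) for category, n in counts.items()}

for category, n in counts.items():
    print(f"{category}: {n} of {total} ({shares[category]}%)")
```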

Unused Tactics in Safety Architecture Patterns

The following tactics from the paper’s taxonomy are NOT used in any of the 15 safety architecture patterns:

  • Barrier: Protects subsystems from unintentional influences. Not used because patterns focus on fault tolerance, not isolation.
  • Substitution: Replaces components with more reliable alternatives. Not used because patterns focus on runtime handling, not design-time selection.
  • Simplicity: Avoids failures by keeping systems simple. Not used as a structural pattern, but could be meta-guidance.
  • Repair: Restores failed systems to full functionality. Not used because patterns prioritize immediate fault response.

Note: “Rollback” is not an official safety tactic in the taxonomy, though it appears in the Recovery Block pattern as a recovery mechanism.
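
The list of unused tactics can be derived mechanically as a set difference between the taxonomy and the tactics that appear in at least one pattern. The sketch below is illustrative: `TAXONOMY` holds the 13 tactics named in the usage table header, `PATTERN_TACTICS` restates the primary tactics from the classification tables, and both variable names are assumptions. The "Diagnostics" entry of M-out-of-N-D is left out because it does not appear among those 13 tactics.

```python
TAXONOMY = {
    "Override", "Replication Redundancy", "Diverse Redundancy", "Voting",
    "Condition Monitoring", "Sanity Check", "Heartbeat", "Degradation",
    "Comparison", "Barrier", "Substitution", "Simplicity", "Repair",
}

# Primary tactics per pattern, as listed in the classification tables.
PATTERN_TACTICS = {
    "Homogeneous Duplex": {"Replication Redundancy", "Override", "Condition Monitoring"},
    "Heterogeneous Duplex": {"Diverse Redundancy", "Override", "Condition Monitoring"},
    "Triple Modular Redundancy": {"Replication Redundancy", "Voting"},
    "M-out-of-N": {"Replication Redundancy", "Diverse Redundancy", "Voting"},
    "M-out-of-N-D": {"Replication Redundancy", "Diverse Redundancy", "Voting"},
    "N-Version Programming": {"Diverse Redundancy", "Voting"},
    "Acceptance Voting": {"Diverse Redundancy", "Voting", "Sanity Check"},
    "Recovery Block": {"Diverse Redundancy", "Override", "Sanity Check"},
    "N-Self Checking Programming": {"Diverse Redundancy", "Voting", "Comparison"},
    "Sanity Check": {"Override", "Sanity Check"},
    "Monitor-Actuator": {"Override", "Condition Monitoring"},
    "Watchdog": {"Override", "Heartbeat", "Sanity Check"},
    "Safety Executive": {"Override", "Degradation", "Heartbeat", "Sanity Check"},
    "Protected Single Channel": {"Override", "Condition Monitoring", "Sanity Check"},
    "3-Level Safety Monitoring": {
        "Override", "Condition Monitoring", "Sanity Check", "Heartbeat",
    },
}

# Every tactic that appears in at least one pattern.
used = set().union(*PATTERN_TACTICS.values())

# Tactics in the taxonomy that no pattern uses.
unused = sorted(TAXONOMY - used)
print(unused)
```

Keeping the mapping as data makes it easy to re-run the check when a pattern or tactic is added.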


Analysis of Unused Tactics

  • Scope Mismatch: Some tactics are design-time decisions, not runtime structures.
  • Safety Focus: Patterns prioritize immediate fault response over long-term repair.
  • Pattern Granularity: Patterns focus on component interactions, not fine-grained isolation or maintenance.
  • Historical Context: Patterns represent established approaches and may not include newer concepts.

Potential Integration Opportunities:

  • Barrier: Could enhance mixed-criticality systems.
  • Repair: Valuable for autonomous/self-healing systems.
  • Substitution: Could be added as implementation guidance.
  • Simplicity: Useful as a meta-principle for pattern design.

Conclusion

This analysis shows how the tactical approach enables systematic pattern design and helps architects understand both the utilized and unutilized building blocks of safety-critical systems.


References

  • Christopher Preschern, Nermin Kajtazovic, and Christian Kreiner. “Building a Safety Architecture Pattern System.” Proceedings of the 18th European Conference on Pattern Languages of Program (EuroPLoP ’13). https://dl.acm.org/doi/abs/10.1145/2739011.2739028