Phase 3: Containment - How Do You Contain OT Incidents Without Shutdowns?
OT containment is fundamentally different from IT containment. In IT, the standard containment action is to isolate the affected system: disconnect it from the network, block it from communicating, quarantine it for forensic investigation. In OT, disconnecting a running production system from the network may cause a process upset, a safety event, or equipment damage, because many OT systems depend on real-time network communication for their control function. Containment actions in OT must be evaluated for operational impact before execution, with operations engineering sign-off required for any action that could affect running processes.
OT-specific containment strategies include network-level containment without device isolation: blocking specific traffic flows through firewall rule changes while maintaining required production communications. This can isolate a compromised component from lateral movement paths without disrupting its control function. Segmentation blocking that prevents IT-to-OT traffic while maintaining OT-internal communications is another approach. Full isolation (disconnecting the affected system completely) is reserved for situations where the system is already offline due to the incident, or where continued operation poses a confirmed safety risk.
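The network-level containment approach above can be sketched as a rule generator: given a compromised host and the production flows its control function depends on, emit permit rules for those flows first (first-match semantics), then deny everything else to and from the host. This is a minimal illustration; the `Flow` structure and rule tuple format are hypothetical, not any vendor's firewall syntax.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Flow:
    src: str
    dst: str
    port: int
    proto: str

def containment_rules(compromised_host, required_flows):
    """Cut lateral movement from a compromised host while preserving
    the traffic its control function requires."""
    rules = []
    # Permits come first: under first-match evaluation, required
    # production flows survive the blanket denies that follow.
    for f in required_flows:
        if compromised_host in (f.src, f.dst):
            rules.append(("PERMIT", f.src, f.dst, f.proto, f.port))
    # Deny all other traffic to or from the compromised host.
    rules.append(("DENY", compromised_host, "any", "any", "any"))
    rules.append(("DENY", "any", compromised_host, "any", "any"))
    return rules

# Example: a PLC keeps its Modbus/TCP link to the engineering HMI;
# every other path to or from it is blocked.
rules = containment_rules(
    "10.20.1.15",
    [Flow("10.20.1.10", "10.20.1.15", 502, "tcp")],
)
```

Ordering the permits ahead of the denies is the entire design: the same three rules in the opposite order would sever the control link the containment is meant to preserve.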
Containment Decision Authority
Containment decision authority must be pre-defined in the OT IR plan. For each category of containment action, the plan should specify: who has authority to authorize the action, who must be consulted before execution, what process engineering validation is required, and what the rollback procedure is if the containment causes unintended operational impact. Ambiguity about decision authority during an active incident consistently produces delays that allow attackers to extend their access. Clear authority assignment eliminates that delay.
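One way to make that pre-definition concrete is an authority matrix the IR plan maintains as data, so responders look up rather than debate. The roles, action names, and fields below are illustrative placeholders, not a standard; the useful property is failing closed when an action has no pre-defined authority.

```python
# Hypothetical containment authority matrix. Every action category the
# plan covers gets an entry; anything absent must be escalated.
CONTAINMENT_AUTHORITY = {
    "block_it_ot_traffic": {
        "authorize": "OT incident commander",
        "consult": ["network engineering"],
        "validation": "confirm no production flows cross the boundary",
        "rollback": "restore previous firewall policy from backup",
    },
    "isolate_device": {
        "authorize": "plant operations manager",
        "consult": ["process engineering", "safety officer"],
        "validation": "process engineering sign-off on control impact",
        "rollback": "reconnect and verify control loop behavior",
    },
}

def authority_for(action):
    """Look up the pre-defined authority entry; fail closed on unknowns
    so an undocumented action triggers escalation, not improvisation."""
    entry = CONTAINMENT_AUTHORITY.get(action)
    if entry is None:
        raise KeyError(f"no pre-defined authority for '{action}': escalate")
    return entry
```

Keeping the matrix as data rather than prose also makes it testable during tabletop exercises: every containment action in the playbook should resolve without raising.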
Phase 4: ICS Forensics - How Do You Investigate OT Incidents?
ICS forensics requires specialized tools and techniques that differ significantly from IT forensics. Standard IT forensic techniques such as memory analysis with Volatility, disk imaging with FTK Imager, and network capture analysis with Wireshark apply to OT workstations running Windows. They do not apply to PLCs, RTUs, or embedded controllers, which require vendor-specific diagnostic tools or specialized ICS forensics platforms to extract and preserve evidence safely.
PLC forensics focuses on preserving the state of the device at the time of the incident: the current program, data memory values, event logs, and communication logs. This requires using the vendor's engineering software or a specialized OT forensics tool to extract this information without modifying the device state. PLC event logs are often small and may overwrite older events quickly, so forensic preservation of PLC logs must happen as early in the investigation as possible without compromising containment or safety priorities.
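The preservation urgency described above is an order-of-volatility problem: the artifacts most likely to be overwritten should be captured first. A simple sketch, with illustrative artifact names and priorities (actual artifact sets vary by PLC vendor and model):

```python
# Order-of-volatility list for PLC evidence; priorities and rationale
# are illustrative, not vendor-specific.
PLC_ARTIFACTS = [
    # (priority, artifact, rationale)
    (1, "event_log", "small ring buffer; oldest entries overwritten first"),
    (2, "data_memory", "live process values change continuously"),
    (3, "communication_log", "may roll over under normal traffic"),
    (4, "control_program", "stable, but needed to detect logic tampering"),
]

def acquisition_order(artifacts):
    """Sort artifacts so the most volatile evidence is captured first."""
    return [name for _, name, _ in sorted(artifacts)]
```

The actual extraction still happens through the vendor's engineering software or an OT forensics platform, as noted above; the list only decides what to ask that tooling for first.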
Forensic Evidence Preservation in OT
Evidence preservation in OT incidents requires a chain of custody for OT-specific evidence types: PLC program backups taken at incident discovery, historian data covering the incident period, SCADA alarm and event logs, network traffic captures from the OT monitoring platform, and engineering software session logs showing who connected to OT systems and what actions they performed. Each evidence type should be preserved in its original form and never analyzed directly; all investigation should be conducted on working copies to preserve forensic integrity.
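The copy-then-analyze discipline depends on being able to prove a working copy still matches the preserved original. A minimal sketch using a SHA-256 hash taken at collection time; the record fields and names here are illustrative, not a prescribed chain-of-custody schema.

```python
import hashlib
from datetime import datetime, timezone

def custody_record(label, data, collected_by):
    """Record identifying metadata and a content hash at collection
    time, so later copies can be verified against the original."""
    return {
        "label": label,
        "sha256": hashlib.sha256(data).hexdigest(),
        "size_bytes": len(data),
        "collected_by": collected_by,
        "collected_at": datetime.now(timezone.utc).isoformat(),
    }

def verify_copy(record, copy_data):
    """Confirm a working copy is byte-identical to the preserved
    original before any analysis is performed on it."""
    return hashlib.sha256(copy_data).hexdigest() == record["sha256"]

# Example: hash a PLC program backup at discovery, verify the analysis
# copy against it before the investigation touches the copy.
original = b"PLC program backup bytes"
rec = custody_record("plc7-program-backup", original, "analyst.a")
```

In practice the record itself also needs tamper protection (signed, or stored on write-once media), but the hash-at-collection step is the foundation everything else builds on.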
Phase 5: Recovery - How Do You Restore OT After an Incident?
OT recovery requires two distinct validation steps that have no IT equivalent. First, security validation: confirm that the malware, unauthorized access, or compromised component has been fully remediated before restoring systems to production. Second, process validation: confirm that the restored OT systems are operating correctly and safely before returning to full production operation. IT teams can execute security validation. Process validation requires operations engineers who understand the expected process behavior and can confirm that restored control systems are producing correct process outputs.
Recovery sequencing for OT follows the Purdue Model hierarchy in reverse: restore safety systems first, then field device network connectivity, then control systems, then supervisory systems, then historian and data systems, and finally IT/OT boundary systems. Each level must be validated before restoring connectivity to the next level. Restoring production before the full safety system validation is complete creates the risk of running a production process with compromised safety instrumentation.
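The sequencing rule above is naturally expressed as a gated pipeline: restoration halts at the first level that fails validation, and nothing downstream of it is brought online. A minimal sketch; the `validate` callable stands in for whatever combination of security checks and engineering sign-off each level actually requires.

```python
# Purdue-reversed restoration order, matching the text: safety first,
# IT/OT boundary last.
RECOVERY_SEQUENCE = [
    "safety systems",
    "field device network",
    "control systems",
    "supervisory systems",
    "historian and data systems",
    "IT/OT boundary systems",
]

def restore(sequence, validate):
    """Restore levels in order; stop at the first failed validation so
    no level comes online above an unvalidated one."""
    restored = []
    for level in sequence:
        if not validate(level):
            return restored, level  # halt; downstream levels stay offline
        restored.append(level)
    return restored, None  # full restoration, nothing halted
```

The halt-and-return-the-failure shape matters: it forces the team to resolve the failed validation rather than skip a level, which is exactly the compromised-safety-instrumentation risk the text warns about.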
Post-Incident Process Validation
Post-incident process validation is a formal engineering review confirming that restored OT systems are producing expected process outputs. This validation compares post-restoration process parameters against historical baselines and confirms that all safety interlocks are functioning correctly. For high-consequence processes, this validation may include a supervised startup period where operations staff observe process behavior more closely than normal before returning to automated operation. The validation period duration depends on process complexity and the nature of the incident.
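The baseline-comparison step can be sketched as a deviation check: flag any post-restoration parameter that drifts beyond a fractional tolerance from its historical baseline. The parameter names and the 2% default tolerance are illustrative; real tolerances come from process engineering, per parameter.

```python
def baseline_deviations(parameters, baselines, tolerance=0.02):
    """Return parameters deviating more than `tolerance` (fractional)
    from their historical baselines; empty dict means all in range."""
    deviations = {}
    for name, value in parameters.items():
        base = baselines[name]
        if base == 0:
            # No fractional comparison possible against a zero baseline;
            # flag any nonzero reading for engineering review.
            if value != 0:
                deviations[name] = value
            continue
        if abs(value - base) / abs(base) > tolerance:
            deviations[name] = value
    return deviations

# Example: restored reactor temperature reads 1% above baseline, which
# falls inside a 2% tolerance and is not flagged.
devs = baseline_deviations({"temp_c": 181.8}, {"temp_c": 180.0})
```

An empty result supports, but does not replace, the formal sign-off: interlock function tests and the supervised startup period described above remain human checks.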
Frequently Asked Questions
What are the NIS2 incident reporting obligations during an OT incident?
Under NIS2, covered entities must submit an early warning to their national CSIRT within 24 hours of becoming aware of a significant incident. A significant OT incident includes any incident causing severe operational disruption, physical damage, or significant financial loss. A detailed incident notification follows within 72 hours, and a final report within 30 days. The OT IR plan must include a regulatory notification decision tree that helps responders quickly determine whether an incident meets the significance threshold ([NIS2 Directive, Article 23, 2022](https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32022L2555)).
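The three deadlines form a fixed clock anchored to the moment of awareness, which makes them easy to compute automatically when an incident is declared. A minimal sketch of that calculation (the function and field names are my own, not NIS2 terminology):

```python
from datetime import datetime, timedelta, timezone

def nis2_deadlines(aware_at):
    """Compute the NIS2 Article 23 reporting deadlines from the moment
    the entity becomes aware of a significant incident."""
    return {
        "early_warning": aware_at + timedelta(hours=24),
        "incident_notification": aware_at + timedelta(hours=72),
        "final_report": aware_at + timedelta(days=30),
    }

# Example: awareness at 09:00 UTC on 1 March puts the early warning
# due at 09:00 UTC on 2 March.
deadlines = nis2_deadlines(datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc))
```

Stamping these deadlines into the incident ticket at declaration time keeps the regulatory clock visible to responders who are otherwise focused on containment.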
Should OT systems be shut down during a ransomware incident?
Not automatically. The decision to shut down OT systems during a ransomware incident requires assessment of whether the ransomware has reached OT networks, whether continued operation poses a safety risk, and whether a controlled shutdown is operationally feasible. Many industrial ransomware incidents affect IT networks without reaching OT directly; shutting down OT in those cases causes production loss without security benefit. The containment decision should be made by the OT IR team based on evidence, not as a reflexive response to ransomware detection on IT systems.
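The three assessment questions above can be encoded as a decision aid for the OT IR team. This is one possible encoding and the recommendation strings are illustrative; the actual decision remains an engineering judgment, not an automated outcome.

```python
def shutdown_recommendation(ot_reached, safety_risk, controlled_feasible):
    """Map the three assessment questions from the playbook to a
    recommended posture. Inputs are evidence-based determinations made
    by the OT IR team, not automated detections."""
    if safety_risk:
        # Confirmed safety risk overrides production concerns.
        return ("controlled shutdown" if controlled_feasible
                else "emergency shutdown")
    if ot_reached:
        # OT compromise without safety risk: contain first, preserve
        # production unless containment fails.
        return "contain within OT; shutdown only if containment fails"
    # Ransomware confined to IT: shutting down OT yields production
    # loss with no security benefit.
    return "continue operation; contain on IT side"
```

The point of writing it down, even this crudely, is that the default path for IT-only ransomware is explicit: keep OT running.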
How long does OT incident recovery typically take?
OT incident recovery timelines range widely: days for incidents confined to IT systems in OT-adjacent environments, weeks for incidents that reach OT historian or SCADA systems requiring forensic investigation and system rebuild, and months for incidents involving physical process disruption or safety system compromise requiring engineering revalidation. The Colonial Pipeline incident (2021) resulted in six days of pipeline shutdown. The Norsk Hydro ransomware incident (2019) required months to fully restore affected systems. Recovery time is directly correlated with the scope of the incident and the quality of pre-incident preparation ([CISA, 2021](https://www.cisa.gov/uscert/ncas/alerts/aa21-131a)).
Conclusion
An OT incident response playbook is the operational artifact that separates organizations that manage industrial cyber incidents effectively from those that improvise under pressure and pay the consequences in extended downtime, safety events, and regulatory penalties. The playbook's value comes not from its existence but from its specificity: named decision authorities, pre-validated containment options, OT-specific forensic procedures, and tested recovery sequences that operations engineers have confirmed match their process requirements.
Building a comprehensive OT IR plan takes 3-6 months of collaborative work across IT security, OT operations, engineering, safety, and legal. Testing it takes another 2-3 months of tabletop exercises and process-specific scenario development. The investment is substantial. It's also far less costly than discovering the gaps in your IR plan during an actual incident with production systems offline, safety systems compromised, and regulators asking for incident notifications within 24 hours.