Deep Reinforcement Learning for Automated Cyber-Attack Path Prediction in Communication Networks
Building an intelligent agent that mimics attackers and autonomously identifies attack paths in a network has emerged as a crucial strategy for discovering and maintaining control over potential security breaches in a communication network. Full and realistic automation of network security analysis requires discarding assumptions of prior knowledge about the network structure; hence, the process cannot be considered completely observable. Instead, the network should be treated as a black box that is partially observable and dynamically discoverable. This can be achieved through deep reinforcement learning (RL), representing the target network as a graph-based Partially Observable Markov Decision Process (POMDP). We use CyberBattleSim, an experimental research platform designed to offer a simulated, abstract network environment suited to RL training. We have enhanced its partial observability and redefined the observation and action spaces around a local abstraction of the problem, allowing a neural network structure that generalizes across topologies. The observation space consists of partially visible, evolving features of the source and target nodes of the attack; the action space is the set of all exploitable vulnerabilities. Preliminary convergence results were obtained with a CyberBattleSim environment representing a chain of alternating vulnerable Windows and Linux nodes leading to a terminal node holding a goal flag. These initial results demonstrate the potential of value-based, policy-based, and actor-critic techniques to discover an optimal policy that compromises all network nodes, regardless of the chain size, in close to the minimum number of steps.
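To make the local (source, target) abstraction concrete, the sketch below shows one plausible encoding of such an observation and action space. It is an illustrative assumption, not the authors' implementation or the CyberBattleSim API: the feature names, vulnerability catalogue, and encode_observation helper are all hypothetical.

```python
# Minimal sketch (assumed names, not CyberBattleSim's API) of a fixed-size
# local observation built from the attack's source and target nodes.
import numpy as np

# Hypothetical per-node features that evolve as the attack progresses.
NODE_FEATURES = ["discovered", "owned", "os_windows", "os_linux", "num_open_ports"]

# Hypothetical global catalogue of exploitable vulnerabilities; an action is
# an index into this list, so the action space is fixed across topologies.
VULNERABILITIES = ["SMBGhost", "SudoCaching", "ScanExplorerRecentFiles", "CrackKeepPassX"]

def encode_observation(source: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Concatenate the partially observed feature vectors of the source and
    target nodes into one fixed-size input, independent of network size."""
    return np.concatenate([source, target])

# Example: an owned Windows source node and a freshly discovered Linux target.
source = np.array([1, 1, 1, 0, 3], dtype=np.float32)
target = np.array([1, 0, 0, 1, 2], dtype=np.float32)
obs = encode_observation(source, target)  # shape: (2 * len(NODE_FEATURES),)
n_actions = len(VULNERABILITIES)          # action = which vulnerability to attempt
```

Because the observation dimension depends only on the per-node feature set and the action space only on the vulnerability catalogue, the same policy network can, under these assumptions, be trained and reused across chains of any length.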