Learning Generalizable Graph Heuristics via Reinforcement Learning with Applications to Cyber-Attack Path Discovery
Date:
Graph-based combinatorial optimization (GCO) is central to many real-world applications, as graph structures are well suited to model complex interactions and dependencies. However, exact solvers are often computationally intractable, while handcrafted heuristics are difficult to design and rarely generalize well across diverse graph instances. Recent advances have explored reinforcement learning (RL) to automatically learn graph-based heuristics through sequential decision-making over discrete graph components that collectively form a solution. Nevertheless, designing scalable learning heuristics that effectively generalize across diverse graph instances remains a major challenge.
In this talk, I will present our “projection”-based RL agents, whose key innovation is operating in continuous latent action spaces defined by GNN embeddings, rather than directly searching large discrete combinatorial spaces. The agents learn to project target decision variables into this latent space, where their latent representations are decoded into discrete graph solutions. Our approach outperforms state-of-the-art discrete and iterative methods on classical GCO benchmarks such as Maximum Cut, while achieving even stronger gains in real-world networking applications, including critical attack path discovery and traffic engineering.
The talk will then particularly focus on the cyber-attack path discovery benchmark, demonstrating how learned heuristics can proactively anticipate attack paths before they occur. It will also present complementary contributions on the supporting simulation and training framework, including enhanced realism in simulation dynamics and training scenarios, moving toward an end-to-end solution for more effective cybersecurity mitigation strategies.