Self-driving Telescope Schedules: A Framework for Training RL Agents based on Telescope Data
Date:
Reinforcement Learning (RL) is becoming one of the most effective Machine Learning paradigms for training autonomous systems. In the context of astronomical observing campaigns, it can be used to train autonomous telescopes that optimize sequential schedules against a given scientific reward, avoiding manual optimization, which may yield suboptimal policies given the complexity of the problem. This work presents a Python software framework, currently under development, for training RL algorithms on any offline dataset containing simulated or real interactions between a telescope and the sky.

The dataset must contain a discrete set of sky sites to be visited; the RL algorithms optimize the schedule of these sites, producing an agent that changes the telescope pointing from site to site so as to maximize a cumulative reward based on the observation variables included in the dataset. Given a dataset in the predefined format required by the framework and a set of experiment configurations, a training buffer adapts the data to act as an environment, and several RL algorithms for discrete action spaces are supported, all based on the PTAN (PyTorch AgentNet) library.

The framework provides several well-known features: the holdout method and K-Fold cross-validation with splitting strategies adapted to Modified Julian Days, and data preprocessing strategies such as normalization, observation space reduction, and missing-value handling. Rendering options visualize the autonomous telescope's decisions over time, ensuring interpretability, and metrics can be plotted to monitor performance. Additional features include hyperparameter optimization based on the Optuna library, seed control, and multiple parallel training environments. The framework enables the monitoring of multiple metrics throughout the training of the RL agent, and a dedicated testing module uses a predefined metric to compare agents and to visualize the transitions within each agent's schedule.
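As an illustration of how an offline dataset could be adapted to act as an environment, the following sketch (not the framework's actual API; the column names mjd, site_id, and reward, and the episode convention are assumptions made for this example) wraps a table of telescope-sky interactions as a Gym-style environment with a discrete action space over sky sites, which value-based agents such as those built with PTAN could then interact with.

    # Illustrative sketch only: column names, reward definition, and episode
    # convention are assumptions, not the framework's actual interface.
    import numpy as np
    import pandas as pd
    import gymnasium as gym
    from gymnasium import spaces

    class OfflineScheduleEnv(gym.Env):
        """Replays an offline telescope/sky dataset as an RL environment.

        Each discrete action selects one sky site; the observation is the vector
        of observation variables recorded for that site at the current time step
        (Modified Julian Day), and the reward is the scientific reward stored in
        the dataset for visiting it.
        """
        def __init__(self, df, obs_cols, reward_col="reward",
                     site_col="site_id", time_col="mjd"):
            super().__init__()
            self.df = df.sort_values(time_col)
            self.obs_cols = list(obs_cols)
            self.reward_col, self.site_col, self.time_col = reward_col, site_col, time_col
            self.mjds = self.df[time_col].unique()            # one scheduling decision per time step
            self.sites = np.sort(self.df[site_col].unique())  # discrete set of sky sites
            self.action_space = spaces.Discrete(len(self.sites))
            self.observation_space = spaces.Box(-np.inf, np.inf,
                                                shape=(len(self.obs_cols),), dtype=np.float32)
            self._t = 0

        def _row(self, t, action):
            # Fetch the dataset row for the chosen site at time step t.
            mask = (self.df[self.time_col] == self.mjds[t]) & \
                   (self.df[self.site_col] == self.sites[action])
            return self.df[mask].iloc[0]

        def reset(self, *, seed=None, options=None):
            super().reset(seed=seed)
            self._t = 0
            # Assumed convention: start from the first site at the first MJD.
            obs = self._row(0, 0)[self.obs_cols].to_numpy(dtype=np.float32)
            return obs, {}

        def step(self, action):
            row = self._row(self._t, action)
            reward = float(row[self.reward_col])
            self._t += 1
            terminated = self._t >= len(self.mjds)            # episode ends when the data are exhausted
            if terminated:
                obs = np.zeros(len(self.obs_cols), dtype=np.float32)
            else:
                obs = self._row(self._t, action)[self.obs_cols].to_numpy(dtype=np.float32)
            return obs, reward, terminated, False, {}

The MJD-adapted splitting could likewise be sketched, under the assumption that all rows sharing the same integer Modified Julian Day must fall into the same fold (so that a single day is never split between training and validation schedules), with scikit-learn's GroupKFold:

    # Sketch of an MJD-aware K-Fold split; grouping by integer MJD is an
    # assumption about the splitting strategy, not the framework's implementation.
    from sklearn.model_selection import GroupKFold

    def mjd_kfold(df, n_splits=5, time_col="mjd"):
        groups = df[time_col].astype(int)                     # one group per Modified Julian Day
        for train_idx, val_idx in GroupKFold(n_splits=n_splits).split(df, groups=groups):
            yield df.iloc[train_idx], df.iloc[val_idx]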