Astronomy and Astrophysics Seminars

Deep Reinforcement Learning for High-Dimensional Optimization in Full Waveform Inversion

by Dr Bhaskararao Illa (DAA, TIFR)

Asia/Kolkata
A-269

Description

The Earth is a highly heterogeneous medium, and understanding its internal structure at high resolution is fundamental to both exploration-scale and global-scale seismology. Full Waveform Inversion (FWI) is a powerful seismic imaging technique that uses the complete recorded wavefield to recover high-resolution subsurface physical properties. However, FWI is a highly nonlinear and high-dimensional optimization problem, often affected by cycle skipping, strong dependence on the initial model, and convergence toward local minima. Although global optimization methods can mitigate some of these issues, they become computationally expensive for large-scale problems. Supervised deep learning approaches partially address these challenges, but they require large labeled datasets of seismic data and corresponding velocity models, which are rarely available in practical applications. Deep Reinforcement Learning (DRL) provides an alternative framework because it is model-free, requiring neither labeled training data nor adjoint-based gradient computations, while its stochastic exploration capability can help escape local minima that commonly trap gradient-based methods. In this study, we employ Proximal Policy Optimization (PPO), a policy-gradient-based DRL algorithm, to iteratively search for optimal subsurface velocity models. To make the approach computationally feasible for FWI, we parameterize the velocity model using the Discrete Cosine Transform (DCT), thereby significantly reducing the dimensionality of the optimization space. Guided by a waveform-misfit reward and a multiscale frequency approach, the PPO agent updates the model without any pre-training data. Preliminary results on complex models demonstrate a significant reduction in waveform misfit within a limited number of iterations, highlighting the potential of PPO-based reinforcement learning for high-dimensional FWI optimization.
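The DCT parameterization mentioned in the abstract can be illustrated with a minimal sketch. It is not the speaker's implementation: the grid size, the synthetic velocity values, the number of retained coefficients, and all function names below are assumptions chosen for illustration. The idea shown is only that a smooth velocity model is well approximated by a small block of low-order DCT coefficients, so an optimizer (here, hypothetically, a PPO agent) would search over far fewer parameters than there are grid cells.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix C of size n x n, so that coeffs = C @ signal."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def dct2(model):
    """Separable 2-D DCT: coeffs = Cz @ model @ Cx^T."""
    cz, cx = dct_matrix(len(model)), dct_matrix(len(model[0]))
    return matmul(matmul(cz, model), transpose(cx))


def idct2(coeffs):
    """Inverse 2-D DCT: model = Cz^T @ coeffs @ Cx (the matrices are orthogonal)."""
    cz, cx = dct_matrix(len(coeffs)), dct_matrix(len(coeffs[0]))
    return matmul(matmul(transpose(cz), coeffs), cx)


def truncate(coeffs, kz, kx):
    """Keep only the kz x kx block of lowest-order coefficients; zero the rest."""
    return [[c if (i < kz and j < kx) else 0.0 for j, c in enumerate(row)]
            for i, row in enumerate(coeffs)]


def rms_error(a, b):
    """Root-mean-square difference between two equally sized grids."""
    n = len(a) * len(a[0])
    return math.sqrt(sum((x - y) ** 2 for ra, rb in zip(a, b)
                         for x, y in zip(ra, rb)) / n)


# Hypothetical 16 x 16 velocity model in m/s (illustrative values only):
# velocity increases with depth, with a gentle lateral variation.
nz, nx = 16, 16
model = [[1500.0 + 100.0 * iz + 20.0 * math.sin(math.pi * ix / nx)
          for ix in range(nx)] for iz in range(nz)]

coeffs = dct2(model)            # 256 coefficients, same size as the grid
low = truncate(coeffs, 4, 4)    # keep 16 low-order coefficients (assumed choice)
approx = idct2(low)             # smooth model rebuilt from the 16 parameters

print("grid cells:", nz * nx)
print("retained parameters:", 4 * 4)
print("RMS error of truncated model (m/s):", rms_error(model, approx))
```

In this sketch the search space shrinks from 256 grid values to 16 coefficients. How many coefficients to retain, and whether that number grows across the multiscale frequency stages, are design choices the abstract does not specify.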

Organised by

DAA