BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//CERN//INDICO//EN
BEGIN:VEVENT
SUMMARY:Deep Reinforcement Learning for High-Dimensional Optimization in F
 ull Waveform Inversion
DTSTART:20260602T053000Z
DTEND:20260602T063000Z
DTSTAMP:20260618T121900Z
UID:indico-event-9359@scitalks.tifr.res.in
DESCRIPTION:Speakers: Bhaskararao Illa (DAA\, TIFR)\n\nThe Earth is a high
 ly heterogeneous medium\, and understanding its internal structure at high
  resolution is fundamental to both exploration-scale and global-scale seis
 mology. Full Waveform Inversion (FWI) is a powerful seismic imaging techni
 que that uses the complete recorded wavefield to recover high-resolution s
 ubsurface physical properties. However\, FWI is a highly nonlinear and hig
 h-dimensional optimization problem\, often affected by cycle skipping\, st
 rong dependence on the initial model\, and convergence toward local minima
 . Although global optimization methods can mitigate some of these issues\,
  they become computationally expensive for large-scale problems. Supervise
 d deep learning approaches partially address these challenges\, but they r
 equire large labeled datasets of seismic data and corresponding velocity m
 odels\, which are rarely available in practical applications. Deep Reinfor
 cement Learning (DRL) provides an alternative framework because it is mode
 l-free\, requiring neither labeled training data nor adjoint-based gradien
 t computations\, while its stochastic exploration capability can help esca
 pe local minima that commonly trap gradient-based methods. In this study\,
  we employ Proximal Policy Optimization (PPO)\, a policy-gradient-based DR
 L algorithm\, to iteratively search for optimal subsurface velocity models
 . To make the approach computationally feasible for FWI\, we parameterize 
 the velocity model using the Discrete Cosine Transform (DCT)\, thereby sig
 nificantly reducing the dimensionality of the optimization space. Guided b
 y a waveform-misfit reward and a multiscale frequency approach\, the PPO a
 gent updates the model without any pre-training data. Preliminary results 
 on complex models demonstrate a significant reduction in waveform misfit w
 ithin a limited number of iterations\, highlighting the potential of PPO-b
 ased reinforcement learning for high-dimensional FWI optimization.\n\n
 https://scitalks.tifr.res.in/event/9359/
LOCATION:A-269
URL:https://scitalks.tifr.res.in/event/9359/
END:VEVENT
END:VCALENDAR
