Stable Diffusion is an AI model that lets you enter text to generate images. Spectrograms visually represent sound. Seth Forsgren and Hayk Martiros combined the two for Riffusion, which lets you enter text and the model generates a spectrogram that is converted to audio.
Read more about the process here.