A sound that is more complex than a sine wave can be decomposed into a sum of multiple harmonics, each of which is a sine wave. That was the work of Joseph Fourier, and it's called Fourier analysis. It's similar to how any color on a screen can be broken down into a sum of red, green, and blue values.
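To make the decomposition concrete, here is a minimal sketch in plain Python. The frequencies, amplitudes, sample rate, and the helper name `dft_amplitudes` are all made up for illustration. It builds a signal from three sine waves and then uses a naive discrete Fourier transform to recover which sine waves went in.

```python
import cmath
import math

def dft_amplitudes(samples):
    """Naive DFT: amplitude of each frequency bin from 0 up to N/2."""
    n = len(samples)
    amps = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        # A real sine splits its energy between bins k and N-k,
        # so double every bin except DC and the Nyquist bin.
        scale = 1 if k == 0 or 2 * k == n else 2
        amps.append(scale * abs(acc) / n)
    return amps

# Illustrative values only: 100 samples at 1000 Hz gives 10 Hz wide bins.
sample_rate = 1000
n = 100
components = {100: 1.0, 200: 0.5, 300: 0.25}  # frequency in Hz -> amplitude

# The "complex" sound: a sum of three pure sine waves.
signal = [
    sum(a * math.sin(2 * math.pi * f * i / sample_rate)
        for f, a in components.items())
    for i in range(n)
]

# Keep only the bins that carry a noticeable amount of energy.
amps = dft_amplitudes(signal)
found = {k * sample_rate // n: round(a, 3)
         for k, a in enumerate(amps) if a > 0.01}
print(found)  # {100: 1.0, 200: 0.5, 300: 0.25}
```

The analysis gives back exactly the three sine waves that were mixed together, which is the sense in which a complex sound "is" a sum of sine waves.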
If a complex sound has harmonics above 20 kHz and you sample it at a sample rate of 40 kHz, then you'll record all the harmonics below 20 kHz, but you won't record any of the harmonics above 20 kHz. (In practice those are filtered out before sampling, because otherwise they would fold back into the audible range as aliasing.) However, since you can't hear them anyway, you don't need to record them!
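As a small illustration of why content above half the sample rate cannot be captured, the sketch below samples a 25 kHz tone at 40 kHz. The tone frequencies and the helper name `alias_frequency` are chosen for the example. The samples come out identical to those of a 15 kHz tone, so the sampled data cannot tell the two apart. This is why, in practice, anything above half the sample rate is filtered out before sampling.

```python
import math

def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency that a sampled tone appears at, folded into 0..sample_rate/2."""
    return abs(freq_hz - sample_rate_hz * round(freq_hz / sample_rate_hz))

def sample_cosine(freq_hz, sample_rate_hz, count):
    """Sample a unit-amplitude cosine at the given rate."""
    return [math.cos(2 * math.pi * freq_hz * i / sample_rate_hz)
            for i in range(count)]

sample_rate = 40_000.0
too_high = sample_cosine(25_000.0, sample_rate, 16)  # above 20 kHz
audible = sample_cosine(15_000.0, sample_rate, 16)   # below 20 kHz

# Both tones produce the same sample values.
same = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(too_high, audible))
print(same)                                   # True
print(alias_frequency(25_000.0, sample_rate)) # 15000.0
```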
Think of every sound as a sum of pure sine waves. You can't hear any of the sine waves that are above roughly 20 kHz, so if you filter them out, it won't make any difference to you. Take any recording, run it through an EQ that filters out everything above 20 kHz, and you won't hear any difference at all.
So the idea behind the Nyquist-Shannon sampling theorem is that if you need to be able to record all kinds of sine waves (in order to reproduce any sound, no matter how complex) up to a certain frequency, you need to sample at a rate of at least twice that frequency (strictly speaking, slightly more than twice).
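The other half of the theorem is that the samples are enough to get the original signal back. Here is a rough sketch using Whittaker-Shannon (sinc) interpolation. The 1 kHz tone, the 8 kHz sample rate, and the function names are assumptions picked for the example. It rebuilds the value of the tone at an instant that falls halfway between two samples. Because only a finite number of samples is used, the result is a close approximation rather than an exact match.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, sample_rate_hz, t):
    """Estimate the signal at time t (seconds) from its samples."""
    return sum(s * sinc(sample_rate_hz * t - n) for n, s in enumerate(samples))

tone_hz = 1000.0       # well below half the sample rate
sample_rate = 8000.0
samples = [math.sin(2 * math.pi * tone_hz * n / sample_rate)
           for n in range(2000)]

# An instant halfway between sample 1000 and sample 1001.
t = 1000.5 / sample_rate
estimate = reconstruct(samples, sample_rate, t)
exact = math.sin(2 * math.pi * tone_hz * t)
error = abs(estimate - exact)
print(error < 0.01)  # True
```

The value was never sampled directly, yet it can be recovered from the neighbouring samples because the tone is below half the sample rate.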