It actually didn't add that much code.
I added a function that records the current velocity, measured as the movement since the last frame. It loops with requestAnimationFrame and stores the coordinates in a variable so that the next call can compare the current coordinates with the previous ones.
The velocity itself is computed as a moving average. The window size is dynamic: 2 if the new velocity is higher than the recorded one, and 15 if it's lower. The larger the window, the less a new value affects the average. This is necessary because you typically slow down just before lifting your finger, which would otherwise make the final velocity lower than expected.
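Roughly, the velocity tracking described above could look like the sketch below. It is not the exact code: all names (`updateVelocity`, `trackVelocity`, `state`, `getPointer`, `isDragging`) are made up for illustration, the running-average formula is just one simple way to get a window that weights new samples less as it grows, and comparing speeds by their magnitude is an assumption. Only the window sizes 2 and 15 come from the description.

```javascript
// Window sizes from the text; every name here is hypothetical.
const FAST_WINDOW = 2;  // new sample is faster than the recorded velocity
const SLOW_WINDOW = 15; // new sample is slower than the recorded velocity

// Blend one sample into a running average (assumed formula).
// A larger windowSize gives the new sample less weight.
function smooth(average, sample, windowSize) {
  return average + (sample - average) / windowSize;
}

// Compare current and previous coordinates and update the recorded velocity.
// Velocity is measured in pixels per frame.
function updateVelocity(state, x, y) {
  const sampleX = x - state.lastX;
  const sampleY = y - state.lastY;

  // Assumption: "higher" and "lower" are judged by speed (vector length).
  const sampleSpeed = Math.hypot(sampleX, sampleY);
  const recordedSpeed = Math.hypot(state.vx, state.vy);
  const windowSize = sampleSpeed > recordedSpeed ? FAST_WINDOW : SLOW_WINDOW;

  state.vx = smooth(state.vx, sampleX, windowSize);
  state.vy = smooth(state.vy, sampleY, windowSize);
  state.lastX = x;
  state.lastY = y;
  return state;
}

// Sample once per frame while the drag is active (browser only by default).
// getPointer and isDragging are hypothetical callbacks supplied by the caller.
function trackVelocity(state, getPointer, isDragging, raf = requestAnimationFrame) {
  function frame() {
    if (!isDragging()) return;
    const { x, y } = getPointer();
    updateVelocity(state, x, y);
    raf(frame);
  }
  raf(frame);
}
```

With this shape, speeding up is picked up almost immediately (half of the new sample is taken over), while the slowdown right before release only nudges the recorded velocity by one fifteenth per frame.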
The inertia is also implemented with a function that calls itself in a requestAnimationFrame loop. On each frame I simply multiply the velocity variables by 0.95 and then add the velocity to the image coordinates.
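As a sketch, the inertia loop could be written like this. The 0.95 factor is the one mentioned above; the names (`stepInertia`, `startInertia`, `render`, `state`) and the `MIN_SPEED` cutoff that ends the loop are assumptions added so the example is complete.

```javascript
const FRICTION = 0.95; // decay factor from the text
const MIN_SPEED = 0.1; // assumed cutoff (pixels per frame) so the loop stops

// Advance the image by one frame. Returns true while it is still moving.
function stepInertia(state) {
  state.vx *= FRICTION;
  state.vy *= FRICTION;
  state.x += state.vx;
  state.y += state.vy;
  return Math.hypot(state.vx, state.vy) >= MIN_SPEED;
}

// Keep stepping once per frame until the speed drops below the cutoff.
// render is a hypothetical callback that applies state.x / state.y to the image.
function startInertia(state, render, raf = requestAnimationFrame) {
  function frame() {
    const moving = stepInertia(state);
    render(state);
    if (moving) raf(frame);
  }
  raf(frame);
}
```

Because the velocity shrinks geometrically, the total distance covered after release is bounded: with a factor of 0.95 the image travels at most 19 times the release velocity (0.95 / 0.05) before it stops.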