Headlines are acting like Claude raising the context limit to 1M tokens is a huge improvement, but the far more important metric is how coherent the model stays across that context. A big context window doesn't mean shit if the model does nothing but repeat "lasagna" after the first 20%.