AMD Comments on Threadripper 2990WX Scheduling Issues


Ever since AMD launched its Threadripper 2990WX, there have been questions about how effectively it could scale in multi-threaded workloads. Initially, the performance drops and slowdowns in certain workloads were attributed to the asymmetric memory controller configuration: only some of the 2990WX's dies have direct access to memory controllers, while the others reach memory only indirectly. This appeared to cause significant performance loss in certain benchmarks.

But information quickly surfaced suggesting the issue wasn’t in hardware. Under Linux, the 2990WX maintained high levels of performance, even when it sagged in the same configuration in Windows. Clearly, there was more to the story. AnandTech, which has been cooperating with Wendell from Level1Techs, has published its own update on the situation.


It appears that Windows maintains a “Best NUMA Node” setting and attempts to run threads on that node’s cores. The scheduler tries to move threads onto those cores as often as possible, but in doing so it kicks out earlier threads that were also supposed to be assigned to the same core cluster.

This creates enormous core contention as different threads jockey for position and the OS mindlessly attempts to stuff everything into the same over-used node. A fully multi-threaded application could spend up to 50 percent of its time shuffling data endlessly (which, it seems, is exactly what happens). The point of this system was originally for VMs, such that each VM would have its own runtime and be assigned to the “best” NUMA node, regardless of where it was. At some point, Microsoft realized that this eternal core contention is a major problem and created a patch that disables the “best NUMA node” behavior on any two-node NUMA system. This is why the Threadripper 1950X and 2950X aren’t affected.
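To make the contention argument concrete, here is a toy model rather than a description of the actual Windows scheduler. It assumes a 2990WX-like layout of four NUMA nodes with 16 logical CPUs each, and compares steering every thread to one preferred node against dealing threads out evenly. The function names and the "threads per logical CPU" metric are illustrative inventions, not anything taken from Windows or AMD.

```python
from collections import Counter

# Assumed layout, loosely modeled on a 2990WX: 4 NUMA nodes x 16 logical CPUs.
NODES = 4
CPUS_PER_NODE = 16


def place_best_node(num_threads, best_node=0):
    """Toy 'best node' policy: every thread is steered to the same node."""
    return [best_node] * num_threads


def place_spread(num_threads, nodes=NODES):
    """Toy alternative: threads are dealt out across nodes round-robin."""
    return [i % nodes for i in range(num_threads)]


def oversubscription(placement, cpus_per_node=CPUS_PER_NODE):
    """Runnable threads per logical CPU on the busiest node."""
    busiest = max(Counter(placement).values())
    return busiest / cpus_per_node


threads = NODES * CPUS_PER_NODE  # one thread per logical CPU, 64 in total
print(oversubscription(place_best_node(threads)))  # 4.0
print(oversubscription(place_spread(threads)))     # 1.0
```

In this simplified picture, the single-node policy leaves four runnable threads competing for each logical CPU on one node while the other three nodes sit idle. The model says nothing about migration cost, so it should not be read as reproducing the 50 percent figure above.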


Systems with three or more nodes, however, are still impacted, which is why we’re seeing the issue hit both Epyc 7551 and Threadripper 2990WX. The reason the CorePrio NUMA Disassociator works is that it probes active software every few seconds and adjusts thread affinity while the application is running. Think of it like a manual sorting operation being run periodically to ensure the operating system’s built-in scheduler is functioning appropriately.
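The periodic "manual sorting" idea can be sketched as follows. This is not CorePrio's actual code: the round-robin policy, the contiguous CPU numbering, and the `list_threads` / `apply_affinity` callbacks are all assumptions made for illustration. A real tool would supply callbacks backed by the operating system's thread-enumeration and affinity APIs, and would repeat the pass every few seconds.

```python
def numa_node_cpus(nodes=4, cpus_per_node=16):
    """Hypothetical node -> logical CPU map (contiguous numbering assumed)."""
    return {
        n: list(range(n * cpus_per_node, (n + 1) * cpus_per_node))
        for n in range(nodes)
    }


def plan_affinity(thread_ids, node_cpus):
    """Assign each thread to one node, round-robin; return thread -> allowed CPUs."""
    node_ids = sorted(node_cpus)
    return {
        tid: node_cpus[node_ids[i % len(node_ids)]]
        for i, tid in enumerate(sorted(thread_ids))
    }


def rebalance_pass(list_threads, apply_affinity, node_cpus):
    """One pass: snapshot the current threads, plan a spread, then apply it."""
    plan = plan_affinity(list_threads(), node_cpus)
    for tid, cpus in plan.items():
        apply_affinity(tid, cpus)
    return plan


# Stand-in callbacks so the sketch runs without touching real threads.
applied = {}
rebalance_pass(
    list_threads=lambda: [101, 102, 103, 104, 105],
    apply_affinity=applied.__setitem__,
    node_cpus=numa_node_cpus(),
)
print(applied[101][0], applied[102][0], applied[105][0])  # 0 16 0
```

Keeping the planning step as a pure function separates the placement policy from the OS-specific calls, which makes the policy easy to test on its own.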

According to AMD, it has tickets open with Microsoft and is exploring methods of resolving this problem as quickly as possible. Wendell’s understanding of the problem is supposedly “very close” to what’s actually going on, but AMD did not specify where it differs. Microsoft is said to now be working on a fix, though the timeline for inclusion is unclear. The next logical point of inclusion would be Redstone 6, the Windows 10 feature update expected in the first half of 2019.

There’s been a lot of user speculation about whose “fault” this is. To some extent, the question is poorly framed. AMD can inform Microsoft of a problem with scheduling at any point, but this doesn’t automatically mean Microsoft flags the problem for resolution — particularly in the context of a just-launched CPU at the very top of the market with a negligible user base.

We know that AMD now has a line of communication open to Microsoft and we can make some guesses about when solutions might hit the market. It certainly won’t hurt anything that Intel’s Cascade Lake Advanced Performance CPUs, with up to 48 cores and a chiplet design, will be coming to market this year. From Microsoft’s perspective, it may have made the most sense to wait and introduce all of the tweaks required to support a range of new NUMA configurations, including the Threadripper 2990WX’s, at the same time, thereby debuting stronger support for 7nm Epyc, 32-core and above Threadripper CPUs, Cascade Lake AP, and future products from Intel in the same update.

Users with a 2990WX should download the modified version of CorePrio to improve performance. We’ll have to wait and see whether formal scheduler support from Microsoft improves performance beyond what the application already delivers.
