The launch of a new CPU, particularly a gaming-centric one, is supposed to be a flashy event. And when Intel unveiled their 9th generation desktop CPUs, including what they termed "the best gaming processor ever", that was undoubtedly the plan.
But instead of flashy headlines and discussion about its gap over the competition, Intel has found itself in a scandal related to benchmarks, third-party firms, and the role the press plays in validating and refuting hype.
Everything kicked off at the launch of Intel's i5-9600K, i7-9700K and i9-9900K CPUs at an event in New York. Even without testing there was little doubt that Intel's new CPUs would be the best gaming processors: games favour high clock speeds more than extra cores, and Intel has had an advantage in that department for aeons.
So given that the octacore i9-9900K and i7-9700K could hit 5.0GHz and 4.9GHz respectively out of the box, it wasn't a question of if Intel would have a lead in gaming. It was a question of "how much".
And when the i9-9900K launched, that "how much" was surprisingly high.
If you've seen an Intel or AMD press release before, you'll know that there are a substantial number of footnotes. These footnotes contained a link to a report from Principled Technologies, a third-party firm that does white papers and testing for various manufacturers. Intel has contracted Principled Technologies before, although a lot of that testing has revolved around Chromebooks, Optane memory and enterprise tech.
What was interesting about the supplied report was that Principled Technologies had benchmarked the new i9 CPUs not only against previous Intel generations, but also against AMD's Ryzen 7 2700X and the recent Threadripper CPUs. The report also ran tests in 19 separate games, the kind of exhaustive detail you'd expect from a review.
Small problem: reviewers are still under NDA until later this week. Worse, experienced heads started noticing substantial quirks. All of the Intel CPUs, for instance, pummelled AMD in Ashes of the Singularity: Escalation. That's a game that has historically favoured AMD.
But in Principled Technologies' testing, the 9900K was supposedly almost 50% better than the Ryzen 2700X. So tech enthusiasts started digging into the testing methodology, and the more you read, the worse it got.
Two of the strongest criticisms came from Gamers Nexus and the Melbourne-based Hardware Unboxed, both of whom immediately called bullshit. The flaws included enabling Game Mode in AMD's software suite for the AMD Ryzen CPU - a mode designed for the high-core Threadripper CPUs to improve their performance in games. The Ryzen CPUs don't require this, however, as they're already suited for gaming as is.
So what does enabling Game Mode on the Ryzen 2700X do? It disables half the cores. Not for any performance benefit, but then, it wasn't a feature designed for use with the 2700X (or any 3, 5 or 7-series AMD CPU) to begin with.
It got worse from there. Rather than reporting the average from multiple runs, Intel's commissioned benchmarks used the median result of three test runs.
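To illustrate how the two summaries can diverge, here is a minimal sketch. The frame-rate figures are made up for the example and are not taken from the Principled Technologies report; `summarise` is a hypothetical helper name.

```python
from statistics import mean, median

# Hypothetical average-FPS results from three runs of a single benchmark.
# Illustrative numbers only; they do not come from the report.
runs_fps = [118.0, 121.0, 97.0]

def summarise(runs):
    """Return the mean, median and spread (max - min) of benchmark runs."""
    return {
        "mean": mean(runs),
        "median": median(runs),
        "spread": max(runs) - min(runs),
    }

summary = summarise(runs_fps)
for name, value in summary.items():
    print(f"{name}: {value:.1f} FPS")
```

With only three runs, the median is simply the middle value, so one slow run lowers the mean but leaves the median untouched. Reporting the spread alongside either figure shows readers how much the runs varied.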
Other quirks included:
- Using the GPU benchmark for Civilization 6 instead of the AI benchmark, which is more apt given the game is CPU-bound;
- Running the RAM in the Intel machine at 2666MHz and the AMD machines at 2933MHz, rather than the official memory speeds for each CPU (such as 3200MHz, which the i7-8700K and 2700X both support). When questioned about this, a later response stated "we wanted to make sure none of the processors were overclocking", although it was never explained how RAM speeds would result in a CPU running at higher clock speeds;
- Using the stock AMD cooler with the Ryzen 7 2700X, but the vastly more efficient Noctua NH-U14S for the Intel and the Threadripper CPUs;
- Tests were often run at the High preset, unless the game being tested only had three preset settings for graphics, in which case the highest preset was used;
- Some tests were also poorly representative of anything approximating real-world gameplay. CS:GO, for instance, was tested by benching a game against Easy Bots, instead of a real-world server with a player count closer to what players would experience. Final Fantasy XV was tested using the external benchmark rather than the actual game, even though the benchmark is full of technical issues, while the Warhammer 2 results included the Laboratory benchmark, an "experimental test environment outside of the main game" that carries Intel's branding and is not as representative of gameplay as the Battle or Skaven benchmarks.
Despite all of this, Intel sanctioned the release of the benchmarks by publishing them on their site and providing them to press through releases. Some outlets jumped on that, but without fully questioning or cautioning readers against the problems in the testing methodology.
"If benchmarks look flat out wrong like was the case with these Intel benchmarks, the best route in our opinion is to advise readers that something looks off, so they don’t fall for a company’s misleading marketing, or they don’t fall for a publication re-posting the benchmarks at face value," Hardware Unboxed told Kotaku over Twitter.
Steve Burke, the editor at Gamers Nexus, raised further concerns about the supposed breadth of info that was being released. Principled Technologies weren't bound to any NDA, having conducted testing on behalf of Intel. But testing the CPUs against so many games helps give the illusion of a proper test - a review, almost - which increases the propensity to mislead.
"Pre-release figures should be relegated to just a few titles (not 19 - that's basically a review, and undermines all of us) and/or a few products," Burke said. "Our concern is that releasing what is functionally a review will leave consumers with apparent third-party data, despite first-party involvement, and no other data to counter deceptive testing."
"Because reviewers are still embargoed, a third-party test looks like the only data out there. Consumers will assume it to be valid, as it appears in-depth and it is the only set of charts on the web."
To be clear, there's nothing wrong per se with Intel - or any other manufacturer - publishing in-house or even outsourced benchmarks prior to launch. "There’s no harm in saying 'Intel expects their 9900K to perform like this', or something along those lines, as that at least gives viewers a ballpark of what to expect when the product is released," Hardware Unboxed said.
Burke pointed out another issue, one that hasn't been addressed by Intel or fully acknowledged by Principled Technologies themselves. "While we believe [Intel] 'validated' PT's testing, it is still invalid procedure. In other words, the test is reproduce-able, but the testing methods are bad."
By chance, the Principled Technologies offices are in North Carolina. That's the same state the Gamers Nexus offices are in, so Burke decided to pay Principled Technologies a visit unannounced.
"[It] seemed more likely that we'd get somewhere than phoning ahead, as we figured this would likely allow PR enough time to shut everything down," Burke said.
Mark Van Name, one of the Principled Technologies co-founders, met Burke and his camera operator in the car park. "They were already ready with their own camera crew," Burke said. "Theirs was for legal purposes."
That's where the below video begins: in the car park. To Principled Technologies' credit, however, co-founder Bill Catchings answered questions for around 40 minutes.
Most of Catchings' answers didn't fully address the criticisms raised by enthusiast tech media, or Burke himself, and in some instances there were no answers at all. Around a week after the publication of the original figures, and still before the embargo lifts on the new Intel CPUs, Principled Technologies published an updated report as well as a supplementary statement.
"Our overall goal - and Intel's specific request for this project - was to create as level a playing field as possible for comparing the AMD and Intel processors as the majority of the gaming market would likely use them," the statement read.
"We are confident in our test methodology and results. We welcome questions and we are doing our best to respond to questions from our interim report, but doing so takes time."
It was never fully explained why the majority of the gaming market would deliberately downclock their RAM below what their CPU supported. Nor was it explained why all of the systems - 16 separate systems, raising more questions about why so few test runs were done given the silicon lottery - were configured with 64GB of RAM each. Just over 3.4% of all PCs surveyed in the latest Steam Hardware report have more than 16GB of RAM, with 8GB RAM still being the most common.
But despite question marks still hovering over the quality of the testing, the third-party firm won plaudits for the on-camera interview nonetheless. If anything, the spotlight has shifted more to Intel, who put a stamp of approval on the report by broadcasting it to the wider world. Plus, it's not like Intel doesn't have experience testing hardware internally, leaving enthusiasts to question why Intel stood by the figures when they surely must have known something was off.
"The real blame here has to go to Intel," Hardware Unboxed said, "for commissioning the PT report and publishing it before independent reviewers are allowed to refute the claims. They also likely published the report knowing full well that the AMD results were incorrect (because Intel should be 100% aware of how their competitors perform with properly configured systems), allowing PT to take the blame for the results while making their products look good to those who don’t see the backlash to the report."
It's worth noting that there isn't anything wrong in principle with Intel - or any other manufacturer - outsourcing testing to a third-party lab. There is an obvious marketing benefit for the manufacturer, but that also increases the need for transparency from the third party involved. And Principled Technologies published a pretty exhaustive set of instructions for replicating their results. But due to a lack of experience in specialist testing for games and gaming-focused hardware, much of that methodology was off from the start.
Hardware Unboxed praised the third-party firm for the transparency they did show, and Burke also praised Principled Technologies for their handling of the backlash. "Our take-away from the interview left us with more questions for Intel than for [Principled Technologies]; we believe PT addressed the questions adequately and pointed more toward inexperience with game testing, not toward malice," Burke said.
"Intel, on the other hand, claimed in its public statement to have validated PT's findings. Intel knows better, and knows that Game Mode disables half the CCXs on Ryzen. Intel's validation process is therefore questionable at best, and deceptive and malicious at worst."
Intel's first statement said that Principled Technologies conducted their tests "in spec, configured to show CPU performance". "The data is consistent with what we have seen in our labs," Intel said.
In a follow-up statement, Intel clarified that the new third-party results continued to highlight the i9-9900K's dominance.
"Given the feedback from the tech community, we are pleased that Principled Technologies ran additional tests. They’ve now published these results along with even more detail on the configurations used and the rationale. The results continue to show that the 9th Gen Intel Core™ i9-9900K is the world’s best gaming processor."
"We are thankful for Principled Technologies’ time and transparency throughout this process. We always appreciate feedback from the tech community and are looking forward to comprehensive third party reviews coming out on October 19."
The problem, mind you, wasn't that the new Intel CPUs wouldn't showcase the best gaming performance. The question is by how much. An Intel i9-9900K retails for $859 locally; the Ryzen 7 2700X goes for a little over half that at $469. For some, the difference in price is too great. Even the i7-9700K is pretty pricey at $659.
The fastest CPU is really only something that matters to people flush with cash. For everyone else, the value proposition matters more. Being able to precisely define that gap is the role of the press and enthusiast media - but some responsibility lies with the manufacturers to not muddy the waters, deliberately or otherwise.
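As a rough sketch of that price gap, the arithmetic below uses only the local retail prices quoted above. The break-even figure assumes a simple price-per-frame comparison, which is an illustrative simplification rather than a measure used by any of the reviewers; the variable names are hypothetical.

```python
# Local retail prices as quoted in the article.
prices = {
    "i9-9900K": 859,
    "i7-9700K": 659,
    "Ryzen 7 2700X": 469,
}

# How much of the 9900K's price the 2700X costs, and the dollar gap.
ratio = prices["Ryzen 7 2700X"] / prices["i9-9900K"]
premium = prices["i9-9900K"] - prices["Ryzen 7 2700X"]

# Assumption: value is judged purely on price per frame. Under that
# simplification, this is how much faster the 9900K would need to be
# to match the 2700X on performance per dollar.
breakeven_uplift = prices["i9-9900K"] / prices["Ryzen 7 2700X"] - 1

print(f"2700X costs {ratio:.1%} of the 9900K's price")
print(f"Price premium for the 9900K: ${premium}")
print(f"Break-even performance uplift: {breakeven_uplift:.1%}")
```

This is why the exact size of the performance lead matters: the same lead can look like good or poor value depending on how it compares with the price gap.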
"I think the saga will have an impact on how critical we are of footnotes going forward," Burke said. "It is the journalist's job to be discerning and note in commentary that data is first-party and not yet validated by third-party, independent reviewers."
Hardware Unboxed noted that there have been times where their internal testing has borne out figures similar to the manufacturer's testing. But it's necessary to verify that information where possible and, where it isn't, to discuss those figures with that disclaimer in mind.
"Ignoring [supplied] benchmarks isn’t the right approach," Hardware Unboxed said. "But reporting on them with a disclaimer keeps users informed while managing expectations ahead of full reviews, which is ultimately better than letting often very inaccurate rumours run wild."
"I think the bare minimum for these sorts of reports would be all system specifications, so CPU frequencies, memory capacity, timings and frequencies, etc ... If a report is transparent about all these elements of their setup and everything looks to be done/configured in a standard way, that adds credibility."
And at the end of the day, that's all that has traded hands: credibility. Intel's latest desktop CPUs are still undoubtedly going to be the fastest gaming processors. That was the case before, though, and has been for years. But instead of discussing the advancements of the latest generation, or the gap between the manufacturers, enthusiasts have been left with a sour taste - all over CPUs that were almost certainly going to be chart-toppers regardless.