Intel on Tuesday pushed microcode updates to fix a high-severity CPU bug that has the potential to be maliciously exploited against cloud-based hosts.
The flaw, affecting virtually all modern Intel CPUs, causes them to “enter a glitch state where the normal rules don’t apply,” Tavis Ormandy, one of several security researchers inside Google who discovered the bug, reported. Once triggered, the glitch state results in unexpected and potentially serious behavior, most notably system crashes that occur even when untrusted code is executed within a guest account of a virtual machine, which, under most cloud security models, is assumed to be safe from such faults. Escalation of privileges is also a possibility.
Very strange behavior
The bug, tracked under the common name Reptar and the designation CVE-2023-23583, is related to how affected CPUs manage prefixes, which change the behavior of instructions sent by running software. Intel x64 decoding generally allows redundant prefixes (meaning those that don’t make sense in a given context) to be ignored without consequence. During testing in August, Ormandy noticed that the REX prefix was generating “unexpected results” when running on Intel CPUs that support a newer feature known as fast short repeat move (FSRM), which was introduced in the Ice Lake architecture to fix microcoding bottlenecks.
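For readers who want to see whether a Linux machine advertises the fast short repeat move feature, the following is a minimal sketch. It assumes the kernel exposes the feature as the `fsrm` flag in `/proc/cpuinfo`; the helper names `cpu_flags` and `has_fsrm` are made up for this illustration. A positive result only means the feature is present. It says nothing about whether the CPU is affected or already patched.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags of the first CPU listed in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def has_fsrm(cpuinfo_text: str) -> bool:
    """True if the 'fsrm' (fast short repeat move) flag is reported."""
    return "fsrm" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # Linux only; assumed location
    if path.exists():
        print("FSRM reported:", has_fsrm(path.read_text()))
    else:
        print("No /proc/cpuinfo on this system")
```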
The unexpected behavior occurred when adding the redundant rex.r prefixes to the FSRM-optimized rep movs operation. Ormandy wrote:
We observed some very strange behavior while testing. For example, branches to unexpected locations, unconditional branches being ignored and the processor no longer accurately recording the instruction pointer in xsave or call instructions.
Oddly, when trying to understand what was happening we would see a debugger reporting impossible states!
This already seemed like it could be indicative of a serious problem, but within a few days of experimenting we found that when multiple cores were triggering the same bug, the processor would begin to report machine check exceptions and halt.
We verified this worked even inside an unprivileged guest VM, so this already has serious security implications for cloud providers. Naturally, we reported this to Intel as soon as we confirmed this was a security issue.
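To make the idea of instruction prefixes more concrete, here is a rough Python sketch that splits the leading prefix bytes off an x86-64 instruction and names them. The function names are hypothetical. The decoder is deliberately simplified: it treats every leading prefix byte alike, whereas real hardware only honors a REX byte that sits directly before the opcode. The sample bytes in the usage note are chosen only to mirror the rex.r-plus-repeat-move combination described above and are not claimed to be Ormandy's test case.

```python
# Legacy prefix bytes defined by the x86 instruction encoding.
LEGACY_PREFIXES = {
    0xF0: "lock",
    0xF2: "repne",
    0xF3: "rep",
    0x2E: "cs", 0x36: "ss", 0x3E: "ds", 0x26: "es", 0x64: "fs", 0x65: "gs",
    0x66: "operand-size",
    0x67: "address-size",
}


def describe_rex(byte: int) -> str:
    """Name a REX prefix byte (0x40-0x4F) by its W, R, X and B bits."""
    flags = (("w", 8), ("r", 4), ("x", 2), ("b", 1))
    bits = "".join(name for name, mask in flags if byte & mask)
    return "rex." + bits if bits else "rex"


def split_prefixes(code: bytes) -> tuple[list[str], bytes]:
    """Split leading prefix bytes (64-bit mode) from the rest of the instruction."""
    names = []
    i = 0
    while i < len(code):
        b = code[i]
        if b in LEGACY_PREFIXES:
            names.append(LEGACY_PREFIXES[b])
        elif 0x40 <= b <= 0x4F:
            names.append(describe_rex(b))
        else:
            break  # first non-prefix byte: the opcode starts here
        i += 1
    return names, code[i:]
```

Calling `split_prefixes(bytes([0xF3, 0x44, 0xA4]))` separates the `rep` and `rex.r` prefixes from the remaining byte `0xA4`, the opcode for `movsb`. Because `movsb` takes no register operand that REX.R could extend, the prefix is redundant in that position.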
Jerry Bryant, Intel’s senior director of Incident Response & Security Communications, said on Tuesday that company engineers were already aware of a “functional bug” in older CPU platforms that could result in a temporary denial of service and had scheduled a fix for next March. The severity rating had tentatively been set at 5 out of a possible 10. Those plans were disrupted following discoveries within Intel and later inside Google. Bryant wrote:
Thanks to the diligence and expertise of Intel security researchers, a vector was later discovered that could allow a possible escalation of privilege (EoP). With an updated CVSS 3.0 score of 8.8 (high), this discovery changed our approach to mitigating this issue for our customers and we pulled the update forward to align with disclosures already planned for November 2023.
While preparing the February 2024 Intel Platform Update bundle for customer validation, we received a report from a Google researcher for the same TDoS issue discovered internally. The researcher cited a Google 90 day disclosure policy and that they would go public on November 14, 2023.
Crisis (hopefully) averted
Google worked with industry partners to identify and test a successful mitigation so all users are protected from this risk in a timely manner. In particular, Google’s response team ensured a successful rollout of the mitigation to systems before it posed a risk to customers, mainly Google Cloud and ChromeOS customers.
Intel’s official bulletin lists two classes of affected products: those that were already fixed and those that are fixed using microcode updates released Tuesday. Specifically, these products have the new microcode update:
|Product Collection|Vertical Segment|CPU ID|Platform ID|
|10th Generation Intel Core Processor Family|Mobile|706E5|80|
|3rd Generation Intel Xeon Processor Scalable Family|Server|606A6|87|
|Intel Xeon D Processor|Server|606C1|10|
|11th Generation Intel Core Processor Family|Desktop|||
|11th Generation Intel Core Processor Family|Mobile|||
|Intel Server Processor|Server|||
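The CPU ID column holds the processor signature that the CPUID instruction reports. As a hedged sketch, the signature can be rebuilt from the family, model, and stepping values that Linux prints in `/proc/cpuinfo`. The function name `cpuid_signature` and the set `LISTED_CPU_IDS` are illustrative, the processor-type bits are assumed to be zero, and the set holds only the three IDs that appear in the table above. A CPU missing from it may still be affected.

```python
def cpuid_signature(family: int, model: int, stepping: int) -> int:
    """Encode display family/model/stepping as a CPUID leaf 1 signature."""
    if family >= 15:
        base_family, ext_family = 15, family - 15
    else:
        base_family, ext_family = family, 0
    # The extended model field is only used when the base family is 6 or 15.
    if base_family in (6, 15):
        base_model, ext_model = model & 0xF, model >> 4
    else:
        base_model, ext_model = model & 0xF, 0
    return (
        (ext_family << 20)
        | (ext_model << 16)
        | (base_family << 8)
        | (base_model << 4)
        | (stepping & 0xF)
    )


# Only the CPU IDs visible in the table above; not an exhaustive list.
LISTED_CPU_IDS = {0x706E5, 0x606A6, 0x606C1}


def is_listed(family: int, model: int, stepping: int) -> bool:
    """True if the signature matches one of the CPU IDs in the table."""
    return cpuid_signature(family, model, stepping) in LISTED_CPU_IDS
```

For example, a CPU reporting family 6, model 126, and stepping 5 encodes to `706E5`, the first row of the table.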
An exhaustive list of affected CPUs is available here. As usual, the microcode updates will be available from device or motherboard manufacturers. While individuals aren’t likely to face any immediate threat from this vulnerability, they should check with the manufacturer for a fix.
People with expertise in x86 instructions and decoding should read Ormandy’s post in its entirety. For everyone else, the most important takeaway is this: “However, we simply don’t know if we can control the corruption precisely enough to achieve privilege escalation.” That means it’s not possible for people outside of Intel to know the true severity of the vulnerability. That said, anytime code running inside a virtual machine can crash the hypervisor the VM runs on, cloud providers like Google, Microsoft, Amazon, and others are going to immediately take notice.
In a separate post, Google officials wrote:
The impact of this vulnerability is demonstrated when exploited by an attacker in a multi-tenant virtualized environment, as the exploit on a guest machine causes the host machine to crash resulting in a Denial of Service to other guest machines running on the same host. Additionally, the vulnerability could potentially lead to information disclosure or privilege escalation.
The post said that Google worked with industry partners to identify and test successful mitigations that have been rolled out. It’s likely any potential crisis has now been averted, at least in the biggest cloud environments. Smaller cloud services may still have work to do.