What Are ARM CPUs, and Are They Going To Replace x86 (Intel)?

Can an x86 CPU read or write a physical address that is larger than RAM?

The physical address space contains RAM, ROM, memory mapped devices (some PCI and some built into the chipset) and unused space.

An OS can access all of it, including unused space (even though there's no sane reason to deliberately access unused space).

The total amount of physical address space depends on the CPU, and is a "size in bits" (which you can obtain from the CPUID instruction) that ranges from 32 bits to 52 bits, but is often in the 36 to 48 bits range. If you try to use paging to access a "too high, not supported by the CPU" physical address you will get a page fault with the reserved-bit flag set in its error code (because the "not supported by CPU" physical address bits are treated as reserved, and the CPU checks whether reserved bits are set in page table entries, etc).
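As a rough illustration of the reserved-bit check described above, here is a minimal sketch in Python. It assumes the 64-bit entry layout used by PAE and long mode paging, where bits 12 to 51 of an entry hold the physical page address, and it takes the CPU's physical address width as a plain parameter (on real hardware that value comes from CPUID leaf 0x80000008, EAX bits 7:0). The function names are made up for this example.

```python
# Sketch only: models which address bits of a 64-bit page table entry are
# reserved for a CPU with a given physical address width. The entry layout
# (address field in bits 12..51) is the PAE / long mode one.

ADDR_FIELD_TOP = 52  # architectural upper limit on physical address width


def reserved_address_mask(maxphyaddr: int) -> int:
    """Mask of entry bits maxphyaddr..51, which the CPU requires to be zero."""
    if not 32 <= maxphyaddr <= ADDR_FIELD_TOP:
        raise ValueError("physical address width must be 32..52 bits")
    all_address_bits = (1 << ADDR_FIELD_TOP) - 1
    supported_bits = (1 << maxphyaddr) - 1
    return all_address_bits & ~supported_bits


def entry_sets_reserved_bits(entry: int, maxphyaddr: int) -> bool:
    """True if the entry points above what this CPU can address."""
    return (entry & reserved_address_mask(maxphyaddr)) != 0
```

For a CPU with 36 physical address bits, an entry pointing at 1 << 40 would trip the check, while one pointing at 1 << 35 would not.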

Note that when writing an OS (for modern CPUs) it's easier to assume that physical addresses are 64 bits (regardless of what the CPU supports) and that the physical address space includes a reserved area that can't be accessed (where the size of the reserved area depends on what the CPU supports). This simplifies the code and data structures used for physical memory management (e.g. C has a uint64_t type, but nothing has a uint52_t).
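One way to picture the "64-bit addresses plus a reserved area" view is the small Python sketch below. It is an illustration under assumptions, not how any particular OS does it: the helper names are invented, and the supported width is passed in rather than read from CPUID.

```python
# Sketch only: treat every physical address as a 64-bit value and describe
# the part the CPU cannot reach as one reserved area at the top.

ADDRESS_BITS = 64  # width used by the OS's own data structures


def reserved_area(maxphyaddr: int) -> tuple[int, int]:
    """Return (start, size) of the unreachable top of the 64-bit space."""
    if not 32 <= maxphyaddr <= 52:
        raise ValueError("physical address width must be 32..52 bits")
    start = 1 << maxphyaddr
    return start, (1 << ADDRESS_BITS) - start


def is_accessible(address: int, maxphyaddr: int) -> bool:
    """True if the address lies below the reserved area."""
    start, _size = reserved_area(maxphyaddr)
    return 0 <= address < start
```

The memory manager can then store every address in one fixed-width field and treat the reserved area like any other unusable region.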

I'm doing an operating system lab on QEMU. I found that reads and writes are allowed when accessing a physical address (after paging) that is larger than RAM. Is it the same on a real x86 machine?

Yes; both QEMU and real hardware work the same.

Will x32 or x64 cause different results?

The CPU supports several types of paging structures - "plain 32-bit paging", PSE36, PAE (Physical Address Extensions), and long mode. For x32 you can't use long mode paging, but PAE normally has the same layout and the same physical address restrictions (the only case where it doesn't is some Xeon Phi accelerator cards).

If x32 is using "plain 32-bit paging" physical addresses will be restricted to 32 bits; and if it's using PSE36 physical addresses will be restricted to 36 bits.

The other possibility is that x32 isn't using any paging at all. In this case addresses are masked so that only 32 bits can be used (e.g. if you create a segment with a base address of 0xFFFFF000 and "high enough" limit, then use an offset within the segment that's 0x00001000 or more, the result will be masked, causing physical addresses to wrap around, like (0xFFFFF000 + 0x00001234) & 0xFFFFFFFF = 0x00000234).
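The wrap-around arithmetic from that example can be checked with a couple of lines of Python. This is only a model of the masking step (the function name is made up), not of segmentation as a whole.

```python
# Sketch only: with paging off, segment base + offset is truncated to 32 bits.

ADDRESS_MASK_32 = 0xFFFFFFFF


def physical_address_without_paging(segment_base: int, offset: int) -> int:
    """Add base and offset, keeping only the low 32 bits of the sum."""
    return (segment_base + offset) & ADDRESS_MASK_32


wrapped = physical_address_without_paging(0xFFFFF000, 0x00001234)
print(hex(wrapped))  # 0x234
```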

Apart from that, it still works the same (you can still access unused parts of the physical address space, there's just less of it, and you might not be able to access all RAM).

What Are ARM CPUs, and Are They Going To Replace x86 (Intel)?

Everyone is going ARM these days. Amazon and Apple are both shipping in-house CPUs with crazy performance increases, and Microsoft is rumored to be developing its own. ARM has historically been used for low-power mobile chips, so why exactly is ARM crushing x86 in the desktop and server space?

Everyone Is Going ARM These Days

The processor world is a complex industry, and only a few designs from a few companies are able to compete on the high end of performance. Usually, Intel or AMD holds the crown of performance, with both of them manufacturing x86 CPUs. But recently, CPUs from Apple and Amazon based on ARM have been giving Intel (and the x86 architecture) a run for their money.

Amazon has their Graviton2 CPU, which isn’t faster than Intel’s server counterparts, but is more cost effective and uses less power. With how much of an improvement it was over Graviton1, their next iteration will likely be fierce competition in the server space.

Apple hit it out of the park with their first non-mobile CPU, the Apple Silicon M1 processor, which runs faster than desktop Intel CPUs and nearly as fast as AMD’s Ryzen 5000 series, the current holder of the performance crown. It’s custom silicon which makes Apple MacBooks the current fastest laptops in the world, much to the chagrin of PC enthusiasts (myself included).

In fact, they’re so far ahead in the laptop space that Windows on the M1 MacBook runs faster than the Surface Pro X, despite Windows on ARM only running through an emulator. And as if that wasn’t humiliating enough, it absolutely crushes it with a Geekbench Single-Core score of 1,390 compared to the Surface’s 802, which is laughably bad in comparison. The M1 is seriously fast.

Microsoft is also rumored to be developing their own in-house ARM processor, and though that rumor comes from the Azure server space, they’d likely use the same chip for the Surface if they can match Apple’s performance.

What’s The Difference Between ARM and x86?

At the end of the day, there isn’t too much of a difference between ARM and x86. You can still run Google Chrome and watch YouTube on either one. In fact, you may be doing so right now, as nearly all Android phones and every iPhone use an ARM-based processor.

The biggest difference for most people is that older applications meant for x86 will need to be recompiled to run on ARM as well. For some things this is easy, but not everything will be supported, especially legacy software. However, even that can usually run through x86 emulation, which Windows is starting to support.

For developers, there are a lot of differences in how applications get compiled, but these days, most compilers do a good job of supporting the major instruction sets, and you won’t really have to make many changes to get it compiling for multiple platforms.

But How is ARM Running Faster?

To answer this question, we’ll have to delve deeper into how CPUs work under the hood.

ARM and x86 are both instruction sets, also known as architectures, which are basically lists of the instructions that a CPU supports. This is why you don’t need to worry about running a Windows app on a specific AMD or Intel CPU; they’re both x86 CPUs, and while the exact designs are different (and perform differently), they both support the same instructions. This means any program compiled for x86 will, in general, run on both CPUs.

CPUs basically execute operations sequentially, like a machine given a list of tasks to do. Each instruction is identified by an opcode, and architectures like x86 have a lot of opcodes, especially considering they’ve been around for decades. Because of this complexity, x86 is known as a “Complex Instruction Set” architecture, or CISC.

CISC architectures generally take the design approach of packing a lot of stuff into a single instruction. For example, an instruction for multiplication may move data from a memory bank to a register, then perform the steps for the multiplication, and shuffle the results around in memory. All in one instruction.

Under the hood though, this instruction gets unpacked into many “micro-ops,” which the CPU executes. The benefit of CISC is code density: programs take up less memory, and since memory was at a premium back in the day, CISC used to be better.
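To make the unpacking idea concrete, here is a toy Python model of one CISC-style multiply being split into micro-ops. Everything in it is hypothetical: the MULM instruction, the micro-op names, and the temporary registers are invented for illustration and do not correspond to real x86 encodings or to how any real decoder works.

```python
# Toy model only: one memory-to-memory multiply "instruction" is cracked
# into simple load / multiply / store steps, which are then executed.


def crack(instruction):
    """Split one toy CISC-style instruction into simple micro-ops."""
    op, dst, a, b = instruction
    if op != "MULM":
        raise ValueError(f"unknown toy instruction: {op}")
    return [
        ("LOAD", "t0", a),          # memory -> temporary register
        ("LOAD", "t1", b),          # memory -> temporary register
        ("MUL", "t2", "t0", "t1"),  # register-to-register multiply
        ("STORE", dst, "t2"),       # temporary register -> memory
    ]


def run(micro_ops, memory):
    """Execute micro-ops in order against a dict standing in for memory."""
    regs = {}
    for uop in micro_ops:
        kind = uop[0]
        if kind == "LOAD":
            regs[uop[1]] = memory[uop[2]]
        elif kind == "MUL":
            regs[uop[1]] = regs[uop[2]] * regs[uop[3]]
        elif kind == "STORE":
            memory[uop[1]] = regs[uop[2]]
        else:
            raise ValueError(f"unknown micro-op: {kind}")
    return memory


memory = {"x": 6, "y": 7, "z": 0}
run(crack(("MULM", "z", "x", "y")), memory)
print(memory["z"])  # 42
```

The program stores a single instruction, but the core still has to do four steps of work. A RISC program would spell out those four steps as separate instructions instead.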

However, that’s not the bottleneck anymore, and this is where RISC comes into play. RISC, or Reduced Instruction Set Computer, basically does away with complex multi-part instructions. Most instructions can execute in a single clock cycle, though many long operations will need to wait on results from other areas of the CPU or memory.

While this seems like going backwards, it has huge implications for CPU design. CPUs need to load all their instructions from RAM and execute them as fast as possible. It turns out it’s far easier to do that when you have many simple instructions versus a lot of complex ones. The CPU runs faster when the instruction buffer can be filled up, and that’s a lot easier to do when the instructions are smaller and easier to process.

RISC designs also pair well with something called Out-of-Order execution, or OoOE. (Modern x86 CPUs use it too, but simple, fixed-length instructions make it easier to keep it fed.) Essentially, the CPU has a unit inside of it that reorders and optimizes instructions coming into it. For example, if an application needs to calculate two things, but they don’t depend on each other, the CPU can execute both in parallel. Usually, parallel code is very complicated for developers to write, but at the lowest levels, the CPU can find this parallelism on its own to speed things up. The Apple M1 chip uses OoOE to great effect.
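The "two independent calculations" case can be sketched with a toy scheduler in Python. This is a heavily simplified model under stated assumptions: every instruction takes one cycle, there are unlimited execution units, and each destination name is written only once (which stands in for register renaming). Real hardware is far more constrained, and the function name is made up.

```python
# Toy model only: an instruction may start as soon as all of its inputs
# are available, regardless of its position in the program.


def schedule(instructions):
    """Return the earliest cycle each (dest, sources) instruction can start."""
    ready_at = {}  # value name -> cycle in which it becomes available
    start_cycles = []
    for dest, sources in instructions:
        # Inputs never produced by an earlier instruction are ready at cycle 0.
        start = max((ready_at.get(src, 0) for src in sources), default=0)
        start_cycles.append(start)
        ready_at[dest] = start + 1  # result is usable one cycle later
    return start_cycles


program = [
    ("a", ("x", "y")),  # a = x + y
    ("b", ("z", "w")),  # b = z + w  (independent of a)
    ("c", ("a", "b")),  # c = a * b  (needs both results)
]
print(schedule(program))  # [0, 0, 1]
```

The first two instructions start in the same cycle because neither needs the other's result, so the three instructions finish in two cycles instead of three.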

If you’re interested in the inner workings, you should read this fantastic write-up by Erik Engheim on what makes the Apple M1 chip so fast. In short, it makes heavy use of specialized silicon and Out-of-Order execution, and has way more instruction decoders to support its speed.

Is ARM Going To Replace x86?

The honest answer is, probably. Intel has been feeling the end of Moore’s law for years now, and while AMD has been able to make performance leaps in recent years, they’re not far ahead.

This isn’t to say that x86 will die off anytime soon, but it’s clear that ARM has more potential than just being a mobile architecture—a stigma which is no longer valid given the current direction of the industry. The benefits of RISC architectures are clear, and with how much the Apple M1 chip has already improved, the future of the industry looks promising.

Plus, ARM isn’t the only RISC architecture out there. ARM is still proprietary, though ARM licenses its designs to third-party designers, like Qualcomm, Samsung, and Apple. RISC-V is open source, and is similarly promising. It’s a standardized instruction set architecture, leaving the exact implementations up to the manufacturer. If the industry does move towards RISC in general, there will be both open and proprietary options available.
