The CPU microarchitecture is a great unknown, but it is one you should take into account when choosing your next microprocessor: the performance and other features of the chip you buy depend on it.
It is sometimes obscured by parameters that are more a matter of marketing than anything else, such as cache size, clock frequency, or the numerous proprietary brand names used to designate a technology that competitors may also have but have not bothered to trademark.
ISA vs microarchitecture: differences
It is important not to confuse a CPU microarchitecture with an architecture or ISA (Instruction Set Architecture), nor with the system as a whole or macroarchitecture. Although the terms may seem similar, there are differences between them:
- Macroarchitecture or system design: the design of the complete system that makes up the computer in question, that is, not only the CPU but also the rest of the basic hardware: the buses, the memory controller, main memory, extra processing units that work alongside the CPU, controllers, the DMA system, I/O (input and output for peripherals), etc. In other words, most of the elements found on the motherboard.
- ISA: refers to the instructions the CPU can understand and execute, and it also defines the word size, the number of available registers, the memory addressing modes, and the format of the data that can be handled. It is closely related to ASM or assembly language, since the mnemonics used for the instructions are also used in this lower-level language (e.g., ADD, SUB, MUL, etc.).
- Microarchitecture or computer organization: nothing more than an implementation of an ISA, that is, the concrete arrangement of the data paths, execution units, register banks, buses, etc. This microarchitecture is turned into a logic design and then into a physical/electronic one: the integrated circuit of the CPU.
There is also what is known as UISA or μISA, that is, a Microcode Instruction Set Architecture, although this is something we will not go into.
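The ISA-versus-microarchitecture distinction can be illustrated with a small sketch. The three-instruction "ISA" and both implementations below are invented for illustration (a real ISA also defines word sizes, registers, and addressing modes); the point is that two internally different implementations must produce the same architectural result.

```python
# The "ISA": the contract visible to software (a made-up 3-instruction set).
PROGRAM = [("ADD", "r0", 5), ("MUL", "r0", 3), ("SUB", "r0", 2)]

def simple_core(program):
    """One hypothetical implementation: executes instructions one at a time."""
    regs = {"r0": 0}
    for op, reg, imm in program:
        if op == "ADD":
            regs[reg] += imm
        elif op == "SUB":
            regs[reg] -= imm
        elif op == "MUL":
            regs[reg] *= imm
    return regs

def predecoded_core(program):
    """Another hypothetical implementation of the SAME ISA with different
    internals (it pre-decodes instructions into callables, a stand-in for
    a different internal organization)."""
    ops = {"ADD": lambda a, b: a + b,
           "SUB": lambda a, b: a - b,
           "MUL": lambda a, b: a * b}
    regs = {"r0": 0}
    for op, reg, imm in program:
        regs[reg] = ops[op](regs[reg], imm)
    return regs

# Both "microarchitectures" must agree on the architectural result.
assert simple_core(PROGRAM) == predecoded_core(PROGRAM) == {"r0": 13}
```

Software compiled for the ISA cannot tell the two implementations apart; only their performance characteristics differ.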
What is a CPU microarchitecture?
The CPU microarchitecture, or µarch, is an implementation of an ISA. It determines how instructions are executed and which instructions or data types are accepted. This implementation covers the different parts of the CPU and the technologies or paradigms it can use. For example:
- Microcode and control unit type: some older or simpler microprocessors used a faster and more efficient hardwired control unit; however, most of today's microprocessors use more complex, programmable units. This unit determines which instructions the CPU can interpret, that is, whether it accepts only a subset or the complete ISA (with or without extensions). As you may know, this microcode can be updated.
- Cache size, type, and hierarchy: this is another of the technical issues decided within a microarchitecture, and it is important for speeding up memory accesses and minimizing the latency impact of main memory or system I/O. For example, the design takes into account:
- The LLC or last-level cache and, therefore, the number of cache levels.
- The TLB and how to act in case of a cache miss.
- Special cache levels, such as the trace cache.
- The types of memory used.
- Split (separate instruction and data caches) or unified.
- The clock frequency at which it will run (the same as the CPU core or a different one).
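The cache hierarchy's role can be sketched with a toy two-level model. All capacities and latencies below are made up for illustration; real caches also work in whole lines, have associativity, and use more refined replacement policies.

```python
# Illustrative sketch: a miss at one cache level falls through to the next,
# and only the last level pays the (assumed) main-memory latency.
class Level:
    def __init__(self, name, capacity, latency, backing=None):
        self.name, self.capacity, self.latency = name, capacity, latency
        self.backing = backing            # next level (None = main memory)
        self.lines = {}                   # address -> data; dict order = LRU

    def read(self, addr):
        """Return (data, total_cycles) for a read at addr."""
        if addr in self.lines:                       # hit at this level
            self.lines[addr] = self.lines.pop(addr)  # refresh LRU position
            return self.lines[addr], self.latency
        if self.backing is not None:                 # miss: ask next level
            data, cycles = self.backing.read(addr)
        else:                                        # miss in LLC: go to DRAM
            data, cycles = f"mem[{addr}]", 200       # assumed DRAM latency
        if len(self.lines) >= self.capacity:         # evict least recently used
            self.lines.pop(next(iter(self.lines)))
        self.lines[addr] = data
        return data, self.latency + cycles

l2 = Level("L2", capacity=8, latency=12)
l1 = Level("L1", capacity=2, latency=4, backing=l2)

_, cold = l1.read(0x40)   # cold miss: 4 (L1) + 12 (L2) + 200 (memory)
_, warm = l1.read(0x40)   # now resident in L1: just 4 cycles
assert (cold, warm) == (216, 4)
```

The two reads show why the hierarchy matters: the same access costs 216 cycles cold but only 4 once the data is cached.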
- Pipeline: most modern CPUs are pipelined, that is, execution is divided into several stages in which each stage's output is the next stage's input. This splits up the work and increases the number of instructions in flight at once. Everything depends on the depth, or number of levels, of the pipeline. For example:
- CPU without pipeline: an instruction enters and the next one will not be able to enter until the previous one is completely finished.
- CPU with 4 stages: if those stages are fetch, decode, execute, and write-back, the first instruction enters the fetch stage. When that stage finishes, it moves to decode so it can be interpreted, then to the corresponding execution unit to obtain the result, and finally the result is written back. As soon as the first instruction has moved on to decode, a new one can enter fetch, and so on. After a few clock cycles the CPU will have filled its stages and will be processing several instructions at the same time.
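The throughput gain described above can be put into a formula. Assuming an idealized pipeline (one cycle per stage, no stalls, independent instructions), S stages and N instructions take S + (N - 1) cycles instead of S × N:

```python
# Idealized pipeline timing sketch (stage names and the no-stall assumption
# are simplifications; real pipelines stall on hazards and mispredictions).
STAGES = ["fetch", "decode", "execute", "write-back"]

def cycles_unpipelined(n_instructions, n_stages=len(STAGES)):
    # Each instruction must fully finish before the next one may start.
    return n_instructions * n_stages

def cycles_pipelined(n_instructions, n_stages=len(STAGES)):
    # Fill the pipeline once; then one instruction completes per cycle.
    return n_stages + (n_instructions - 1)

assert cycles_unpipelined(10) == 40   # 10 instructions, 4 stages each
assert cycles_pipelined(10) == 13     # 4 to fill + 9 more completions
```

For long instruction streams the pipelined CPU approaches one instruction per cycle, which is the whole point of the technique.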
- Number and type of execution units: the designer chooses the number of functional units that will execute the instructions, such as ALUs, FPUs, AGUs, MULs, MACs, etc., some of them aimed at integer data and others at floating-point data. This is what differentiates a scalar processor from a superscalar one (with multiple units of the same type to speed up execution).
- Speculative execution: many modern microprocessors also use a technique that allows the instructions after a branch to be executed before the outcome of that branch is known. If the guess is correct, the result is already available; if not, the pipeline must be flushed and the correct path processed. To achieve the highest possible hit rate and minimize the penalty for failure, prediction units are used, which can be of many different types and can even use neural networks.
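One classic prediction scheme is the 2-bit saturating counter; the sketch below is a minimal version of it (real predictors are far more elaborate and combine several mechanisms). States 0-1 predict "not taken" and states 2-3 predict "taken", so a single surprising outcome does not immediately flip the prediction.

```python
# Minimal 2-bit saturating-counter branch predictor sketch.
class TwoBitPredictor:
    def __init__(self):
        self.state = 2  # start weakly "taken" (an arbitrary choice)

    def predict(self):
        return self.state >= 2          # True = predict "taken"

    def update(self, taken):
        # Saturate: step toward 3 on taken, toward 0 on not-taken.
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

# A typical loop branch: taken 9 times, then falls through once at loop exit.
history = [True] * 9 + [False]
p = TwoBitPredictor()
hits = 0
for outcome in history:
    hits += p.predict() == outcome
    p.update(outcome)
assert hits == 9   # only the final, surprising outcome is mispredicted
```

This is why loop-heavy code predicts so well: the counter saturates at "taken" and only the final loop exit pays a misprediction penalty.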
- Execution order: some CPUs have in-order execution and are simpler, but offer lower performance. Most high-performance CPUs use the OoOE (Out-of-Order Execution) paradigm. This means that instructions whose data is already available can be processed even if they are not in the sequential order of the program. They are then reordered so that the final result is the same as with in-order execution. What this achieves is avoiding idle time when an instruction's data is not ready and the CPU would otherwise have to wait.
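The idea can be sketched with a toy scheduler. The three-instruction program and its latencies are invented; the sketch only models issue order (a real OoO core also retires results in program order through a reorder buffer, which the comment notes but the code omits).

```python
# Toy out-of-order issue sketch: an instruction may issue as soon as all of
# its inputs have finished, regardless of its position in the program.
program = [
    ("i0", "load", [],     3),   # (name, op, input dependencies, latency)
    ("i1", "add",  ["i0"], 1),   # must wait 3 cycles for the load
    ("i2", "mul",  [],     1),   # independent: free to issue early
]

def schedule(program):
    """Issue at most one instruction per cycle; return the issue order.
    (Results would still be retired in program order by a reorder buffer.)"""
    cycle, finish, waiting, order = 0, {}, list(program), []
    while waiting:
        ready = [i for i in waiting
                 if all(finish.get(d, float("inf")) <= cycle for d in i[2])]
        if ready:
            name, op, deps, lat = ready[0]   # oldest ready instruction wins
            finish[name] = cycle + lat
            waiting.remove(ready[0])
            order.append(name)
        cycle += 1
    return order

# i2 issues before i1 instead of idling behind the slow load.
assert schedule(program) == ["i0", "i2", "i1"]
```

An in-order core would have stalled i2 behind i1 for the full load latency; the out-of-order one fills that gap with useful work.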
- Register renaming: another technique widely used in high-performance microprocessors, in which there is an abstraction between logical and physical registers. Each logical register has an associated set of physical registers. When a logical register is referenced in the executing binary code, the CPU maps it to a physical one on the fly. The physical registers are invisible to the programmer, but thanks to this technique some false data dependencies can be eliminated.
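A simplified renaming pass can make this concrete. The register names and the unbounded physical register supply below are illustrative simplifications (real hardware has a finite physical register file and frees registers at retirement):

```python
# Simplified register-renaming sketch: every write to a logical register
# gets a fresh physical register, removing false (write-after-write and
# write-after-read) dependencies between unrelated uses of the same name.
from itertools import count

def rename(instructions):
    phys = count()        # pretend-endless supply of physical registers
    mapping = {}          # logical register -> current physical register
    renamed = []
    for dst, srcs in instructions:            # (destination, source regs)
        new_srcs = [mapping[s] for s in srcs]  # reads use current mappings
        mapping[dst] = f"p{next(phys)}"        # fresh register for the write
        renamed.append((mapping[dst], new_srcs))
    return renamed

# r1 is written twice for unrelated purposes; after renaming, the two
# writes target different physical registers and no longer conflict.
code = [("r1", []), ("r2", ["r1"]), ("r1", []), ("r3", ["r1"])]
out = rename(code)
assert out[0][0] != out[2][0]      # the two r1 writes were separated
assert out[1][1] == [out[0][0]]    # the first read still sees the first write
assert out[3][1] == [out[2][0]]    # the later read sees the later write
```

With the name conflict gone, the second pair of instructions could execute in parallel with the first pair, which is exactly the dependency-elimination benefit described above.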
- Multiprocessing and multithreading: of course, when designing the microarchitecture, one also thinks about increasing parallel processing with additional methods:
- You can design a single-core or a multi-core (even many-core) chip, varying the number of cores that work simultaneously from one to several, tens, or thousands.
- Or build an MP system, mounting several microprocessors on the same motherboard that cooperate by communicating through interconnects such as HyperTransport, Infinity Fabric, Intel Mesh, etc.
- And even parallelism at the thread level, with implementations such as SMT (Simultaneous MultiThreading). While each core normally runs one thread, with SMT each core can handle two (or more) hardware threads at the same time, sharing its execution resources between them so those resources stay busy. That is, it is as if one physical core were divided into several logical cores.
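The SMT payoff can be shown with a toy model. The thread traces and the one-op-per-cycle core below are invented for illustration; the point is that one thread's stall cycles (say, cache misses) become the other thread's issue opportunities.

```python
# Toy SMT sketch: one core that can issue one real op per cycle, shared by
# several hardware threads. "stall" means that thread cannot issue that
# cycle, but stall cycles of different threads overlap.
def smt_cycles(threads):
    ptrs = [0] * len(threads)
    cycles = 0
    while any(p < len(t) for p, t in zip(ptrs, threads)):
        cycles += 1
        issued = False
        for tid, t in enumerate(threads):
            p = ptrs[tid]
            if p >= len(t):
                continue                # this thread already finished
            if t[p] == "stall":
                ptrs[tid] += 1          # stall cycles elapse in parallel
            elif not issued:
                ptrs[tid] += 1          # only one op can issue per cycle
                issued = True
    return cycles

A = ["op", "stall", "stall", "op"]      # a thread that stalls mid-stream
B = ["op", "op", "op", "op"]            # a thread with no stalls

# Run serially (no SMT): 4 + 4 = 8 cycles. With SMT, thread B fills
# thread A's stall cycles and the pair finishes in 6.
assert smt_cycles([A]) + smt_cycles([B]) == 8
assert smt_cycles([A, B]) == 6
```

This is also why SMT gains depend heavily on the workload: two threads that never stall would gain nothing from sharing one issue slot.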
All of this is what really determines how well a CPU performs and which areas it excels in. Other technical characteristics, such as clock frequency, are less relevant.
Compatibility in microarchitectures and implementation
As I mentioned, a CPU microarchitecture is an implementation of an ISA, so there can be several different implementations of the same ISA. In fact, for the AMD64 ISA (also called EM64T or x86-64, whatever you prefer), there are many different implementations or microarchitectures, from AMD, from Intel, and even from others such as Zhaoxin (VIA Technologies), Hygon with its Dhyana (Zen 1 licensed from AMD), etc. And not only that: within the same company there can be generation after generation of microarchitectures, each developed to obtain better performance.
For example, if we look at AMD and the AMD64 ISA, we find microarchitectures like:
- K8/Hammer (Turion64, Sempron, Athlon64, Opteron)
- K10 (Phenom, Athlon, Sempron, A-Series,…)
- Bulldozer (Opteron, FX-Series,…)
- Zen (Ryzen, Threadripper, EPYC…)
- Zen 2
- Zen 3…
Of course, there are also other microarchitectures that are not compatible with this ISA but belong to different ISAs. For example, a chip based on the ARM (A64) ISA, such as one with the Avalanche+Blizzard microarchitecture (Apple A15 Bionic), would not be able to run binaries compiled for an x86 or any ISA other than its own. For this reason, running software for another ISA on a computer whose ISA is not the same is impossible, except by using emulators like QEMU, Rosetta 2, etc.
Even a microprocessor with a microarchitecture belonging to the same ISA is not guaranteed to be compatible with the entire instruction set (it may be designed to execute only a subset) or with all of its extensions. For example, not all chips support the AVX-512 instruction extension. This means that not even software compiled for the same architecture will necessarily work the same on all CPUs. For example, a binary compiled to run on a Raspberry Pi (ARM) might not run on an Apple M1 (ARM), or vice versa.
When software is compiled, it is usually done in the most generic way possible, without optimizations for a specific microarchitecture. This ensures that it works on as many processors as possible. For example, even if an AMD chip lacks certain instructions or extensions present on an Intel chip, it can still run the same software; it just may have to execute more instructions to do the same work. However, if the software is compiled with optimizations for a particular microarchitecture, performance can improve considerably by taking advantage of all the available instructions.
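With gcc this trade-off is made through the `-march` flag; the flag behavior is gcc's, but the file names below are made up for the example:

```shell
# Hypothetical builds of the same source (app.c is an invented name).
gcc -O2 -march=x86-64    -o app_generic app.c  # baseline x86-64: runs anywhere
gcc -O2 -march=native    -o app_native  app.c  # tuned for THIS machine's CPU
gcc -O2 -march=x86-64-v3 -o app_v3      app.c  # assumes AVX2-era CPUs or newer
```

The generic binary runs on any x86-64 chip; the tuned ones may use newer extensions (e.g. AVX2) and will crash with an illegal-instruction error on CPUs that lack them.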
The previous section may also raise the question of whether these implementations are free or under some kind of control. It is important to note that an ISA is registered and protected by a license, just like software. Therefore, not just anyone can create an implementation (a CPU microarchitecture) of an ISA, since doing so without permission would break the law. However, there are some ISAs under open licenses that allow them to be used without such restrictions.
Let’s look at some practical examples:
- Closed licenses: these are protected by a proprietary license and cannot legally be implemented by others without an agreement.
- IA-32 (x86-32): created by Intel and licensed to others such as AMD, WinChip, NexGen, IDT, VIA, etc. However, there was a great deal of litigation at the time.
- AMD64 (x86-64): this time it was AMD who took the lead and designed this new ISA, a 64-bit extension of the old one. Intel calls it EM64T; it should not be confused with Intel's IA-64, a 64-bit architecture intended for HPC with its Itanium microprocessors. Intel can use AMD64 thanks to a cross-licensing agreement between the two companies. In addition, others such as Hygon have obtained licenses from AMD (only for Zen 1) for the Chinese market.
- Flexible licenses: what I call flexible licenses are those that allow both the ISA itself and IP cores to be licensed to clients, under various payment or subscription models, similar to other services.
- SPARC: initially created by Sun Microsystems, which was absorbed by Oracle years ago. Oracle now holds the rights to the ISA and has a system similar to ARM's, allowing others to create their own implementations.
- OpenPOWER: similar to the previous one. IBM continues to build its POWER processors for HPC but allows others to come up with their own implementations.
- ARM: one of the most relevant cases today, with licensing options ranging from €0 to several million euros. For example, ARM allows the ISA to be used for academic or research purposes, sells IP cores from its Cortex-A, Cortex-M, Cortex-R, and Neoverse families, offers a BoC (Built on Cortex) license that lets third parties obtain IP cores and modify their microarchitecture, and even licenses the ISA alone. Practical examples:
- Kryo cores for Qualcomm Snapdragons use a BoC-type license.
- Apple Silicon simply pays for the ISA (architectural) license and creates its microarchitectures itself.
- Others buy Neoverse IP cores, as is the case with the European processor of the EPI project.
- Open licenses: these do allow the creation of microarchitectures, following a model similar to open source in software. For example:
- RISC-V: is an open ISA under a BSD license (permissive), which allows anyone to create microarchitectures from it, without paying anything, and these microarchitectures can also be open or closed.
- MIPS: very similar to RISC-V, but especially aimed at the IoT sector.
The ISA is not the only thing that is usually proprietary; the CPU implementation or microarchitecture is typically a proprietary design as well. However, here too there are open ones, distributed as soft cores and written in languages such as Verilog, VHDL, Chisel, etc.
With this, we finish our article on CPU microarchitectures. Did you find it interesting? Are you interested in us writing about something in particular? We hear you!
Abram left his e-business studies to devote himself to his entrepreneurial projects. In 2017, he created the company Inbound Media and wrote articles about high-tech products for his Chromebookeur site. In 2019, Chromebookeur was renamed Macbound and became a general purchasing advice site. Today, Abram manages the development and growth of Macbound, surrounded by a young and talented team.