The encoding of x86 and x86-64 instructions is well documented in Intel’s and AMD’s manuals. However, those manuals are not easy for beginners to start with. In this post, I will give a list of useful manuals for studying the x86-64 instruction encoding, a brief introduction, and an example to help you get started with the formats and encodings of x86-64 instructions. For further details, you can go on to read the listed reference documents using the technique shown in this post.
Before you move on to the following parts, let’s clear up one point that is especially confusing for beginners when reading various documents: there are two assembly syntaxes, AT&T syntax and Intel syntax. Intel documents usually use Intel syntax. With the GNU toolchain on Linux, the default syntax is usually AT&T.
The most significant difference between these two syntaxes is that they use opposite orders for source and destination operands. Intel syntax uses “dest, source” while AT&T syntax uses “source, dest”. Note that instructions with more than one source operand, such as the enter instruction, do not have reversed order. The representation of operands with a SIB byte or a displacement also differs. For example, a memory operand written as [rdi+0xa] in Intel syntax is 0xa(%rdi) in AT&T syntax. For a list of notable differences, please check AT&T Syntax versus Intel Syntax.
Here is a list of references and useful documents that I will refer to in this post and that you can check later to encode more instructions.
- The x86-64 (and x86) ISA Reference from Intel and AMD’s x86-64 (and x86) ISA Reference are the authoritative documents here. In particular,
  - “CHAPTER 2 INSTRUCTION FORMAT” of the Intel 64 and IA-32 Architectures Software Developer’s Manuals is a good starting point. There are also references for each instruction.
  - “APPENDIX B INSTRUCTION FORMATS AND ENCODINGS” of the Intel 64 and IA-32 Architectures Software Developer’s Manuals is a good reference.
- x86-64 Instruction Encoding from OSDev is another very good page for quick reference.
To quickly find the encoding of an instruction, you can use the GNU assembler as and the objdump tool together. For example, to find the encoding of the instruction addq 10(%rdi), %r8, you can do as follows.
First, create a file add.s containing one line
addq 10(%rdi), %r8
Second, assemble add.s into an object file by
$ as add.s -o add.o
Last, disassemble the object file with objdump -d by
$ objdump -d add.o
It will print out
add.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <.text>:
   0:   4c 03 47 0a             add    0xa(%rdi),%r8
4c 03 47 0a is the 4-byte encoding of the addq instruction.
If you want to check the instructions in Intel syntax, you may do
$ objdump -d --disassembler-options=intel-mnemonic add.o
You will get
add.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <.text>:
   0:   4c 03 47 0a             add    r8,QWORD PTR [rdi+0xa]
x86-64 instructions are encoded one after another, each as a variable number of bytes. Each instruction’s encoding consists of:
- optional prefixes (such as the REX prefix used in the example below)
- an opcode
- a register and/or address mode specifier consisting of the ModR/M byte and sometimes the scale-index-base (SIB) byte (if required)
- a displacement and an immediate data field (if required)
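To make this layout concrete, here is a minimal Python sketch that slices the four bytes 4c 03 47 0a from the objdump output above into these components. It assumes exactly one layout (a REX prefix, a one-byte opcode, a ModR/M byte without SIB, and a one-byte displacement), so it is not a general decoder. The function name `split_simple` is made up for illustration.

```python
def split_simple(code: bytes) -> dict:
    """Split a 4-byte instruction laid out as REX, opcode, ModR/M, disp8.

    Assumption: the input has exactly this fixed layout. Real x86-64
    instructions vary in length and need a proper decoder.
    """
    if len(code) != 4:
        raise ValueError("this sketch only handles the 4-byte layout")
    rex, opcode, modrm, disp8 = code
    if rex >> 4 != 0b0100:
        raise ValueError("first byte is not a REX prefix (0100WRXB)")
    return {"rex": rex, "opcode": opcode, "modrm": modrm, "disp8": disp8}


parts = split_simple(bytes.fromhex("4c03470a"))
print({name: f"{value:02x}" for name, value in parts.items()})
# {'rex': '4c', 'opcode': '03', 'modrm': '47', 'disp8': '0a'}
```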
Let’s take a look at the encoding of the instruction add r8,QWORD PTR [rdi+0xa] (in Intel syntax) from the previous part, and see how it is encoded to 4c 03 47 0a.
In the “ADD” entry of the “INSTRUCTION SET REFERENCE” chapter in ISA reference Volume 2A, find the line for the encoding ADD r64, r/m64, which corresponds to this instruction:

Opcode          Instruction      Op/En   64-bit Mode   Compat/Leg Mode   Description
REX.W + 03 /r   ADD r64, r/m64   RM      Valid         N.E.              Add r/m64 to r64.
and, from the REX description
In 64-bit mode, the instruction’s default operation size is 32 bits. … Using a REX prefix in the form of REX.W promotes operation to 64 bits.
So, we get
REX.W = 1
The ‘R’, ‘X’ and ‘B’ bits are related to the operand encoding (check “Table 2-4. REX Prefix Fields [BITS: 0100WRXB]” of the reference volume 2A).
REX.X bit modifies the SIB index field.
SIB is not used in this instruction. Hence,
REX.X = 0
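As a cross-check on the bit layout, the following Python sketch unpacks a REX prefix byte into its W, R, X and B bits, following the 0100WRXB layout of Table 2-4. The helper name `decode_rex` is hypothetical. Applied to the first byte of our encoding, 4c, it reports W = 1 and X = 0 as derived above.

```python
def decode_rex(byte: int) -> dict:
    """Return the W, R, X and B bits of a REX prefix byte (0100WRXB)."""
    if byte >> 4 != 0b0100:
        raise ValueError(f"{byte:#04x} is not a REX prefix")
    return {
        "W": (byte >> 3) & 1,  # 1 = 64-bit operand size
        "R": (byte >> 2) & 1,  # extends the ModRM reg field
        "X": (byte >> 1) & 1,  # extends the SIB index field
        "B": byte & 1,         # extends the ModRM r/m field (or SIB base)
    }


print(decode_rex(0x4C))  # {'W': 1, 'R': 1, 'X': 0, 'B': 0}
```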
Let’s further look at the encoding of the operands. From the “Instruction Operand Encoding” table for the ADD instruction:

Op/En   Operand 1          Operand 2       Operand 3   Operand 4
RM      ModRM:reg (r, w)   ModRM:r/m (r)   NA          NA
There are 2 operand parts for the RM encoding. The first part is ModRM:reg(r,w) and the second part is ModRM:r/m(r). “Figure 2-4. Memory Addressing Without an SIB Byte; REX.X Not Used” from Volume 2 shows the encoding for this case.
The REX.R and REX.B bits and the ModRM byte will be decided accordingly. There are 3 parts in the ModRM byte: ‘mod’, ‘reg’ and ‘r/m’.
Volume 2 has a table, “Table 2-2. 32-Bit Addressing Forms with the ModR/M Byte”, which maps operand combinations to the bit values of ‘mod’. The table is for 32-bit operands, but the REX prefix section of Volume 2 states that “In 64-bit mode, these formats do not change. Bits needed to define fields in the 64-bit context are provided by the addition of REX prefixes”, and hence the same values can be used.
Although the table applies to 64-bit mode too, it does not show the additional registers like r8. Hence, we use it only to find the ‘Mod’ bits for the addq instruction we are encoding. As 0xa can be encoded in a byte, we can use disp8 to keep the instruction encoding short. From the row of [EDI]+disp8 (actually, all disp8 rows share the same ‘Mod’ bits),
Mod = 01 (in bits)
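The choice between no displacement, disp8 and disp32 can be sketched in Python as below. This is a simplified illustration under stated assumptions, not the assembler’s actual logic: the function name `choose_mod` is made up, and it ignores the special cases in the ModR/M table (for example, an rbp or r13 base always needs a displacement, an rsp or r12 base needs a SIB byte, and Mod = 11 selects a register operand rather than memory).

```python
def choose_mod(disp: int) -> int:
    """Pick the ModRM 'mod' bits for a [base + disp] memory operand.

    Simplified: special-case base registers and register-direct
    operands (mod = 11) are deliberately not handled.
    """
    if disp == 0:
        return 0b00  # no displacement byte
    if -128 <= disp <= 127:
        return 0b01  # disp8: one signed byte
    if -2**31 <= disp <= 2**31 - 1:
        return 0b10  # disp32: four signed bytes
    raise ValueError("displacement does not fit in 32 bits")


print(f"{choose_mod(0xa):02b}")  # 01
```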
For the encoding of the registers, I compiled a table for the general purpose 64-bit registers for your reference:

_.Reg   Register
---------------
0.000   RAX
0.001   RCX
0.010   RDX
0.011   RBX
0.100   RSP
0.101   RBP
0.110   RSI
0.111   RDI
1.000   R8
1.001   R9
1.010   R10
1.011   R11
1.100   R12
1.101   R13
1.110   R14
1.111   R15
The ‘_’ in ‘_.Reg’ is usually a bit in the REX prefix, such as REX.B or REX.R, depending on the specific instruction and operand combination. For the addq instruction in this case, r8 is 1.000 and rdi is 0.111. Hence, in bits, we get
reg = 000
r/m = 111
REX.B = 0 (from `rdi`)
REX.R = 1 (from `r8`)
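The register table can be expressed as a small Python lookup that splits a register number into its REX extension bit and its 3-bit field. This is only a sketch covering the sixteen 64-bit general purpose registers listed above; the names `REGISTERS` and `split_register` are made up for illustration.

```python
# Index in this list = 4-bit register number from the table above.
REGISTERS = [
    "rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
    "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
]


def split_register(name: str) -> tuple:
    """Return (REX extension bit, 3-bit field) for a 64-bit register name."""
    number = REGISTERS.index(name)  # raises ValueError for unknown names
    return number >> 3, number & 0b111


rex_r, reg = split_register("r8")   # goes into REX.R and ModRM:reg
rex_b, rm = split_register("rdi")   # goes into REX.B and ModRM:r/m
print(f"REX.R = {rex_r}, reg = {reg:03b}")  # REX.R = 1, reg = 000
print(f"REX.B = {rex_b}, r/m = {rm:03b}")   # REX.B = 0, r/m = 111
```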
Now, let’s put them together.
By putting the ‘WRXB’ bits ([BITS: 0100WRXB]) together, we get the REX prefix for this instruction (in bits)
0100 1100
Together with the 03 from REX.W+03/r in the reference for the ADD instruction, the opcode part, in hexadecimal, is
4c 03
By putting mod, reg and r/m together, we get the ModRM byte (in bits)
01 000 111
which is, in hexadecimal,
47
Following the ModRM byte is the displacement, which is
0a
(10’s hexadecimal representation) in one byte (disp8).
Putting all these together, we finally get the encoding of add r8,QWORD PTR [rdi+0xa]:
4c 03 47 0a
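The whole walkthrough can be condensed into a short Python sketch that assembles the REX prefix, opcode, ModRM byte and disp8 for this one instruction form. It only covers ADD r64, [base + disp8] with the RM operand encoding and no SIB byte. The function name `encode_add_r64_mem_disp8` and the `REGISTERS` list are assumptions made for this illustration, not part of any real assembler.

```python
REGISTERS = [
    "rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
    "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
]


def encode_add_r64_mem_disp8(dest: str, base: str, disp: int) -> bytes:
    """Encode ADD dest, [base + disp8] (REX.W + 03 /r), without a SIB byte."""
    if base in ("rsp", "r12"):
        raise ValueError("this base register needs a SIB byte; not handled")
    if not -128 <= disp <= 127:
        raise ValueError("displacement does not fit in disp8")
    reg_num = REGISTERS.index(dest)  # destination goes into ModRM:reg
    rm_num = REGISTERS.index(base)   # base register goes into ModRM:r/m
    w, r, x, b = 1, reg_num >> 3, 0, rm_num >> 3
    rex = 0b0100_0000 | (w << 3) | (r << 2) | (x << 1) | b
    modrm = (0b01 << 6) | ((reg_num & 0b111) << 3) | (rm_num & 0b111)
    return bytes([rex, 0x03, modrm, disp & 0xFF])


code = encode_add_r64_mem_disp8("r8", "rdi", 10)
print(" ".join(f"{byte:02x}" for byte in code))  # 4c 03 47 0a
```

The result matches the bytes printed by objdump earlier, which is a quick way to check a manual encoding against the assembler.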
In this example, I have shown how to manually encode an instruction, which is usually done by the assembler. You can use the same method to encode other instructions by checking the reference documents for the details of each instruction and operand combination. Enjoy low-level system programming!