- A register that is used as the implicit source and destination of
an operation (the register doesn't have to be specified separately).
has the best example in this document.
RISC processors use a load/store architecture instead - to add
memory to a register, it must be loaded into an intermediate
- Asynchronous Design
- A design which does not synchronize individual circuits using a
clock signal, as synchronous designs do.
Some other method (such as a "dummy circuit" which does nothing but
consume the same amount of time as the real circuit) is used to
generate a signal when the result is ready/valid, and the valid
signals can be used to start the next operation.
There is an asynchronous version of the
ARM architecture, and Sun is researching
Transport-triggered architecture with a project
- Branch Prediction
- The general method of keeping track of which path was taken by a
particular branch instruction, and following that path the next time
the same instruction is encountered. Generally a history table is
used to indicate how often a branch at a given address is taken or
not.
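As a rough sketch of the history table idea, the Python below keeps one small counter per table slot, indexed by branch address. The two-bit saturating counter, the table size, and the names `BranchPredictor` and `count_correct` are all assumptions made for illustration, not something the entry above specifies.

```python
TABLE_SIZE = 16  # assumed number of history table entries


class BranchPredictor:
    def __init__(self):
        # Counter values 0-1 predict "not taken", 2-3 predict "taken".
        # Every entry starts at 1 (weakly not taken).
        self.counters = [1] * TABLE_SIZE

    def predict(self, address):
        return self.counters[address % TABLE_SIZE] >= 2

    def update(self, address, taken):
        i = address % TABLE_SIZE
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_correct(predictor, address, outcomes):
    """Count how many outcomes of one branch were predicted correctly."""
    correct = 0
    for taken in outcomes:
        if predictor.predict(address) == taken:
            correct += 1
        predictor.update(address, taken)
    return correct


# A loop branch that is taken nine times, then falls through on exit.
loop_outcomes = [True] * 9 + [False]
print(count_correct(BranchPredictor(), 0x40, loop_outcomes))  # 8
```

Only the first iteration and the loop exit are mispredicted here, which is the usual argument for keeping some history rather than just the last outcome.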
- Branch Target Cache
- The practice of saving one or more instructions which are executed
immediately after a branch instruction, so that the next time the
branch is encountered, the instructions have already been loaded.
- You should know this term already. But if you don't, it refers to
a small amount of fast memory which holds recently accessed data or
instructions so that if they are used by the programs again, the cache
can supply them transparently faster than main memory.
Cache memory is typically organised into lines (several bytes are
loaded at once, on the assumption that nearby memory will be used
next). The lines are organised into sets; each set is mapped to a
separate group of memory addresses, and there are usually between two
and sixty-four lines per set (fewer lines per set are simpler, but
access to more addresses than cache lines in the same set can cause
data in the cache to be discarded before it can be used).
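To make the set mapping concrete, here is a small Python sketch of a set-associative cache. The line size, number of sets, lines per set, and the least-recently-used replacement policy are all assumptions picked for the example; real caches vary on every one of them.

```python
from collections import OrderedDict

LINE_SIZE = 16  # bytes per line (assumed)
NUM_SETS = 4    # number of sets (assumed)
WAYS = 2        # lines per set (assumed)


def split_address(addr):
    """Return (tag, set_index, offset) for a byte address."""
    offset = addr % LINE_SIZE
    line_number = addr // LINE_SIZE
    return line_number // NUM_SETS, line_number % NUM_SETS, offset


class Cache:
    def __init__(self):
        # One ordered dict per set, ordered from least to most recently used.
        self.sets = [OrderedDict() for _ in range(NUM_SETS)]

    def access(self, addr):
        """Return True on a hit, False on a miss (the line is then loaded)."""
        tag, index, _ = split_address(addr)
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)
            return True
        if len(lines) >= WAYS:
            lines.popitem(last=False)  # discard the least recently used line
        lines[tag] = True
        return False


cache = Cache()
# Three addresses that all map to set 0, which only holds two lines.
conflicting = [0x000, 0x040, 0x080]
results = [cache.access(a) for a in conflicting * 2]
print(results)  # [False, False, False, False, False, False]
```

With three addresses competing for a two-line set, every access misses because each line is discarded just before it is needed again. With only two of those addresses, the second pass would hit both times.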
Smaller caches are faster, so often a small level 1 cache is used,
with a larger but slower level 2 cache supporting it. Level 3 caches
can even be used in some cases.
Some cache controllers monitor the memory bus to detect when a
cached memory value has been modified by another CPU, or a
- Digital Signal Processor, a CPU designed mainly for performing
simple, repetitious operations on a stream or buffer of data - for
example, decoding digital audio data from a CD.
Generally meant for embedded applications, leaving out features of
general purpose CPUs which aren't needed in a DSP application.
There is usually little or no interrupt support, or memory management
- Electrically Erasable Programmable ROM.
- The order in which a multi-byte binary number is stored in
byte-addressable memory. "Little-endian" means the least significant
byte (the "little end") is stored in the first (lowest) address,
"big-endian" means the most significant byte ("big end") has the
first position in memory.
A potential source of code and communications incompatibility,
but with no significant advantages to either, making the decision
arbitrary (except for compatibility requirements). The term comes
from an equally arbitrary disagreement in Lilliputian society (from
Jonathan Swift's book "Gulliver's Travels") over which end to break
boiled eggs (the big or little end), a distinction which caused
civil wars. Swift was satirizing differences in the treatment of
Catholics in his own time - fortunately there's been no documented
case of CPU designers coming to blows over CPU endian-ness, despite
the heated discussions that once took place (but which later became
unfashionable after network endian
order was standardised in TCP/IP).
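A quick way to see the two byte orders is to pack the same number both ways. This is only an illustrative sketch using Python's standard `struct` module; the value 0x12345678 is an arbitrary choice.

```python
import struct
import sys

value = 0x12345678  # a 4-byte number; 0x12 is the most significant byte

little = struct.pack("<I", value)   # least significant byte at the lowest address
big = struct.pack(">I", value)      # most significant byte at the lowest address
network = struct.pack("!I", value)  # network order, which is big-endian

print(little.hex())   # 78563412
print(big.hex())      # 12345678
print(network == big) # True
print(sys.byteorder)  # byte order of whatever machine runs this script
```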
- Explicitly Parallel Instruction Computing
- The HP/Intel term for a form of VLIW
with Variable Length Instruction Groupings which uses fields in the
instruction stream or instructions themselves to group (specify
instruction dependencies), rather than using a fixed length
instruction word. Used in the
Two problems are usually identified with VLIW processors (like the
Philips TriMedia). One is that if the instruction word can't be filled,
the rest of the entries need to be filled with NOP instructions, which
waste space. The other is that it prevents future versions which may be
able to execute more instructions in parallel, or lower cost versions
which execute fewer. EPIC solves this, but requires a small semantic
change that instructions within a group must be independent - that is,
act the same whether they were executed in order or parallel. By
contrast, in the MultiFlow TRACE systems a pair of instructions such as
"MOVE A, B" and "MOVE B, A" could be in the same word because they were
guaranteed to execute in parallel, with the result that values in A and
B would be swapped.
- Erasable Programmable ROM (erased by exposing the EPROM to
- Harvard Architecture
- Strictly speaking, refers to a CPU with separate program and data
spaces (specifically the PIC
embedded processors), but it's often generally used to refer to
separate program and data busses (and usually caches too) for
improved speed, though the address spaces are actually shared.
Originally Harvard architecture computers were programmed using plug
boards or something similar, and data was in a writable storage
area. The von Neumann architecture introduced the idea of a stored
program in the same writable memory that data was stored in.
- Indirection Bit
- Some designs used one address bit as an indirection bit, meaning
that the value in memory is the address of the actual value. Other
designs used a separate addressing mode for indirect addressing.
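A minimal Python sketch of how an indirection bit might be handled when fetching an operand. The 16-bit word, the choice of the top bit as the indirection bit, and the single level of indirection are assumptions for the example; `fetch_operand` is a hypothetical helper, not taken from any particular machine.

```python
INDIRECT_BIT = 1 << 15           # assumed: top bit of a 16-bit address field
ADDRESS_MASK = INDIRECT_BIT - 1  # the remaining 15 bits hold the address


def fetch_operand(memory, address_field):
    """Return the operand named by an address field, following one indirection."""
    address = address_field & ADDRESS_MASK
    if address_field & INDIRECT_BIT:
        # The word in memory is the address of the actual value.
        address = memory[address] & ADDRESS_MASK
    return memory[address]


memory = {0x10: 0x20, 0x20: 99}
print(fetch_operand(memory, 0x10))                 # direct: 32 (0x20)
print(fetch_operand(memory, INDIRECT_BIT | 0x10))  # indirect: 99
```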
- An actual programming language designed to be as evil as possible.
- Earlier CPUs were designed to execute instructions with the
circuitry directly decoding and executing program instructions.
Microcode was a way of simplifying CPU design by allowing simpler
hardware which executes simple microinstructions to interpret more
complex machine instructions, first used commercially in the mid and
low range IBM System/360.
Microcode is often slower and increases CPU size (compare
transistor count of microcoded
(68,000) with hardwired Zilog Z-8000
(17,500) - and the fact that the Z-8000 was both late and
Implementations generally use either 'horizontal' or 'vertical'
microcode, which differ mainly in number of bits. Microinstructions
include a condition code and jump address (jump if condition is true,
next instruction if false), and the operation to be performed.
In horizontal microcode, each operation bit triggers an individual
control line (simple CPU controller but large microcode storage); in
vertical microcode, the operation field is decoded to produce the
control signals (smaller microcode but more complex controller). Some
CPUs used a combination.
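The difference between the two styles can be sketched in a few lines of Python. The control line names and the decoder table below are invented for illustration and do not describe any real CPU.

```python
# Hypothetical control lines of a very small CPU.
CONTROL_LINES = ["reg_read", "reg_write", "alu_add", "alu_sub", "mem_read", "mem_write"]


def horizontal_decode(op_bits):
    """Horizontal: each bit of the operation field drives one control line."""
    return {name for i, name in enumerate(CONTROL_LINES) if op_bits & (1 << i)}


# Vertical: a short operation code is decoded into a set of control signals.
VERTICAL_DECODER = {
    0: set(),
    1: {"reg_read", "alu_add", "reg_write"},
    2: {"reg_read", "alu_sub", "reg_write"},
    3: {"mem_read", "reg_write"},
}


def vertical_decode(op_code):
    return VERTICAL_DECODER[op_code]


add_horizontal = 0b000111  # bits for reg_read, reg_write and alu_add
add_vertical = 1           # two bits are enough to select among four operations
print(horizontal_decode(add_horizontal) == vertical_decode(add_vertical))  # True
```

The horizontal form needs six bits here but no decoder table, while the vertical form needs two bits plus the table, which is the trade-off described above.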
- The ability to share CPU resources among multiple
threads. 'Vertical' multithreading allows
a CPU to switch execution between threads without needing to save
thread state (generally using duplicated registers, and usually used
to continue execution with another thread when one thread hits a
delay due to a cache miss and must wait). 'Horizontal'
multithreading allows threads to share functional units without
halting the execution of a thread (an idle functional unit can be
assigned to any thread that needs it).
A simpler variation called a "barrel processor" cycles through
threads every clock cycle whether there is a delay or not, so when there are
enough thread "slots" to cover any expected execution delay, it
appears to the program that each instruction takes one cycle (in
addition, no hardware is required to check for data dependencies in
- Network order
- Big-endian, used in TCP/IP standards.
- Out Of Order Execution
- A superscalar CPU may issue instructions in an order different
than that in the program if state conflicts can be resolved (with
renaming for example). For example:
1: add r1,r2->r8
2: sub r8,r3->r3
3: add r4,r5->r8
4: sub r8,r6->r6
Instructions 1 and 3 can be executed in parallel if r8 is
renamed, and instructions 2 and 4 can then be
executed in parallel. Instruction 3 is executed before 2, out of the
order in which they appear in the program.
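The four-instruction example can be worked through with a short Python sketch. It assumes every result gets a fresh physical register (named p0, p1, ... here), that every instruction takes one cycle, and that there are enough functional units; `rename` and `schedule` are hypothetical helpers written for this illustration.

```python
# (operation, source 1, source 2, destination), as in the listing above
program = [
    ("add", "r1", "r2", "r8"),
    ("sub", "r8", "r3", "r3"),
    ("add", "r4", "r5", "r8"),
    ("sub", "r8", "r6", "r6"),
]


def rename(program):
    """Give every result a fresh physical register (p0, p1, ...)."""
    current = {}  # architectural register -> latest physical register
    renamed = []
    for n, (op, a, b, dest) in enumerate(program):
        srcs = (current.get(a, a), current.get(b, b))
        current[dest] = f"p{n}"
        renamed.append((op, srcs, current[dest]))
    return renamed


def schedule(renamed):
    """Issue each instruction in the cycle after its last source is produced."""
    ready = {}  # physical register -> cycle in which its value is produced
    cycles = []
    for op, srcs, dest in renamed:
        cycle = 1 + max(ready.get(s, 0) for s in srcs)
        ready[dest] = cycle
        cycles.append(cycle)
    return cycles


print(schedule(rename(program)))  # [1, 2, 1, 2]
```

Instructions 1 and 3 land in the first cycle and 2 and 4 in the second, because after renaming the two writes to r8 no longer conflict.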
- Predicated instructions
- Instructions which are executed only if conditions are true,
usually bits in a condition code register. This eliminates some
branches, and in a superscalar machine can allow both branches in
certain conditions to be executed in parallel,
and the incorrect one discarded with no branch penalty. Used in
some HP PA-RISC instructions, and the
upcoming HP/Intel IA-64.
- Programmable ROM (not erasable).
- If you don't know what Random Access Memory is, why are you reading
this in the first place?
- Register Renaming
- A number of extra registers can be assigned to hold the data that
would normally be written to the destination register (in other words,
the extra register is renamed as far as that particular instruction is
concerned). One use for this is for
speculative execution of
branches - if the branch is eventually taken, then data in the rename
register can be written to the real register, if not then the data is
discarded. Another use is for out of order execution: rename
registers can produce an 'image' of the processor state which an
instruction expects, while the actual processor state has already been
modified by another instruction (known as write conflicts).
The circuitry required to keep track of renamed registers can be
- Resource Renaming
- A more general form of register renaming
where resources other than registers are renamed.
- Read Only RAM. It's really spelled ROR. Engineers know this, but
don't tell anybody so that they can laugh at everyone who says 'ROM'.
Really, this is the truth.
- Saturation Arithmetic
- When arithmetic operations produce values too large or too small for
registers, the largest or smallest value that can be represented is
used instead.
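A small Python sketch of the idea, assuming signed 8-bit registers, with ordinary wraparound shown for comparison. The function names are made up for the example.

```python
def saturating_add(a, b, bits=8):
    """Add two signed integers, clamping to the range of a `bits`-wide register."""
    largest = (1 << (bits - 1)) - 1
    smallest = -(1 << (bits - 1))
    return max(smallest, min(largest, a + b))


def wrapping_add(a, b, bits=8):
    """The same addition with two's-complement wraparound, for comparison."""
    mask = (1 << bits) - 1
    result = (a + b) & mask
    # Reinterpret the top bit as the sign bit.
    return result - (1 << bits) if result >> (bits - 1) else result


print(saturating_add(100, 100))    # 127
print(wrapping_add(100, 100))      # -56
print(saturating_add(-100, -100))  # -128
```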
- Properly, a section of memory of almost any size and at any
address, accessed through an identifier tag which includes protection
bits, particularly useful for object oriented programming. A good idea
which was missed by a painful margin with the
- Speculative Execution
- In a pipelined processor, branch instructions in the execute stage
affect the instruction fetch stage - there are two possible paths of
execution, and the correct one isn't known until the conditional
branch executes. If the CPU waits until the conditional branch executes,
the stages between fetch and execute become empty, leading to a delay
before execution can resume after a branch (the time taken for new
instructions to fill the pipeline again). The alternative is to
choose an execution path, and if that is the correct one, there is no
branch delay. But if it's the wrong one, any results from the
speculative execution have to either be discarded or undone.
- Stack Frame
- A segment of a stack which holds parameters, local variables,
previous stack frame pointer and return address, created when calling
a procedure, function (procedure which returns a value), or method
(function or procedure which can access private data in an object) in
most high level languages.
- Refers to a processor which executes more than one instruction
simultaneously, but more properly refers to the issuing of
instructions (the CDC 6600
issues one, but executes many simultaneously).
- Synchronous Design
- A design which ensures that when two circuits take different amounts
of time to perform a function, further operations will wait until a
voltage signal (which switches between on and off at a specified
frequency) changes. The changing signal is called the circuit's
clock, and changes at the speed of the slowest circuit, in order to
keep the faster circuits synchronized with it.
Designs which don't use a clock signal are called asynchronous designs.
- A thread is a stream or path of execution where the state is
entirely stored in the CPU, while a process includes extra state
information - mainly operating system support to protect processes
from unexpected and unwanted interferences (either from bugs or
intentional attack). Threads are sometimes called lightweight
- Transport Triggered Architecture
- Also called a Transfer Triggered Architecture, or Move Machine, a
TTA is a design where operations are triggered by moving data to the
functional units which operate on it, instead of moving data in
response to the CPU operations (an Operation Triggered
Architecture, or OTA).
For example, a TTA would have one unit for add, one for
subtract, one for load, and so on. A number would be loaded by
moving the address to the load unit, triggering it to load. The
result could be transferred to the add unit, and another number from
a register or another unit could be transferred, triggering the
unit to add them together.
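The load-then-add example might look something like this as a toy Python model. The `LoadUnit` and `AddUnit` classes and their port names are hypothetical and only meant to show that the program consists of nothing but moves.

```python
class LoadUnit:
    """Hypothetical load unit: moving an address to it triggers the load."""

    def __init__(self, memory):
        self.memory = memory
        self.result = 0

    def move_trigger(self, address):
        self.result = self.memory[address]


class AddUnit:
    """Hypothetical adder with a plain operand port and a trigger port."""

    def __init__(self):
        self.operand = 0
        self.result = 0

    def move_operand(self, value):
        self.operand = value  # only stores the value, nothing happens yet

    def move_trigger(self, value):
        self.result = self.operand + value  # writing here starts the addition


memory = {0x100: 40}
registers = {"r1": 2}
load, add = LoadUnit(memory), AddUnit()

# The whole "program" is a sequence of moves:
load.move_trigger(0x100)           # address -> load unit, triggering the load
add.move_operand(load.result)      # load result -> add unit operand port
add.move_trigger(registers["r1"])  # register -> add unit trigger port
registers["r2"] = add.result       # result -> register
print(registers["r2"])  # 42
```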
TTAs are primarily experimental, with research into using the
very regular design properties for automated custom CPU designs. The
TI MSP430 implements the multiplier
as an on-chip peripheral, and Sun is researching high-speed
- Very Long Instruction Word (VLIW)
- An instruction which includes more than one operation, intended to
be executed concurrently - either a fixed number of operations per
instruction, or a variable number (Variable Length Instruction
Grouping or Explicitly Parallel Instruction Computing
- Virtual Machine
- A software emulation of a CPU, usually including an OS