Floating Point Coprocessors


The designer of any microprocessor would like to extend its instruction set almost indefinitely but is limited by the quantity of silicon available (not to mention the problems of testability and complexity). Consequently, a real microprocessor represents a compromise between what is desirable and what is acceptable to the majority of the chip’s users. For example, the 68020 microprocessor is not optimized for workloads that require a large volume of scientific (i.e. floating point) calculations. One method of significantly enhancing the performance of such a microprocessor is to add a coprocessor. To increase the power of a microprocessor, it does not suffice to add a few more instructions to the instruction set; it involves adding an auxiliary processor that works in parallel with the MPU (Micro Processing Unit). A system involving concurrently operating processors can be very complex, since there need to be dedicated communication paths between the processors, as well as software to divide the tasks among them. A practical multiprocessing system should be as simple as possible and require minimal overhead in terms of both hardware and software.


There are various techniques for arranging a coprocessor alongside a microprocessor. One technique is to provide the coprocessor with an instruction interpreter and program counter. Each instruction fetched from memory is examined by both the MPU and the coprocessor. If it is an MPU instruction, the MPU executes it; otherwise the coprocessor executes it. This solution is feasible but by no means simple, as it would be difficult to keep the MPU and coprocessor in step. Another technique is to equip the microprocessor with a special bus to communicate with the external coprocessor. Whenever the microprocessor encounters an operation that requires the intervention of the coprocessor, the special bus provides a dedicated high-speed communication path between the MPU and the coprocessor. Once again, this solution is not simple. There are further methods of connecting two (or more) concurrently operating processors, which are covered in more detail in the specific discussions of the Intel and Motorola floating point coprocessors.


Motorola Floating Point Coprocessor (FPC) 68882


The designers of the 68000-family coprocessors decided to implement coprocessors that could work with existing and future generations of microprocessors with minimal hardware and software overhead. The approach taken by the Motorola engineers was to couple the coprocessor tightly to the host microprocessor and to treat the coprocessor as a memory-mapped peripheral lying inside the CPU address space. In effect, the MPU fetches instructions from memory, and, if an instruction is a coprocessor instruction, the MPU passes it to the coprocessor by means of the MPU’s asynchronous data transfer bus. By adopting this approach, the coprocessor does not have to fetch or interpret instructions itself. Consequently, if the coprocessor requires data from memory, the MPU must fetch it. There are advantages and disadvantages to this design. Most notably, the coprocessor does not have to deal with, for example, bus errors, as all fetching is performed by the host MPU. On the other hand, the FPC cannot act as a bus master (making it a non-DMA device), so memory accesses by the FPC are slower than if it were directly connected to the address and data bus.


In order for the coprocessor to work as a memory-mapped device, the designers of the 68000 series of MPUs had to set aside certain bit patterns to represent opcodes for the FPC. In the case of the 68000 family, the FPC is accessed through opcodes whose four most significant bits are 1111(2). This bit pattern is the same as ‘F’ in hexadecimal notation, so such opcodes are often referred to as F-line instructions.


Interface


The 68882 FPC employs an entirely conventional asynchronous bus interface like all 68000 class devices, and no new signals are required to connect the unit to an MC 68020 MPU. The 68882 can be configured to run under a variety of different circumstances, including various data bus widths and clock speeds. What follows is a diagram of the connections necessary to connect the 68882 to a 68020 or 68030 MPU using a 32-bit data path.


As mentioned previously, all instructions for the FPC are of the F-line format, that is, they begin with the bit pattern 1111(2). A generic coprocessor instruction has the following format: the first four bits must be 1111, which identifies the instruction as being for the coprocessor. The next three bits identify the coprocessor type, followed by three bits representing the instruction type. The meaning of the remaining bits varies depending on the specific instruction.
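The field layout just described can be sketched in a few lines of Python. This is only an illustration: it assumes that the "first" bits are the most significant bits of a 16-bit operation word, and the names decode_f_line and FLineFields are invented here rather than taken from any Motorola documentation.

```python
from typing import NamedTuple, Optional


class FLineFields(NamedTuple):
    """Fields of a 16-bit F-line operation word (names are illustrative)."""
    cp_id: int       # three bits identifying the coprocessor type
    instr_type: int  # three bits identifying the instruction type
    remainder: int   # the remaining six bits, whose meaning varies


def decode_f_line(word: int) -> Optional[FLineFields]:
    """Split a 16-bit word into F-line fields, or return None if the
    word does not start with the bit pattern 1111."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("operation word must fit in 16 bits")
    if (word >> 12) != 0b1111:
        return None  # not a coprocessor instruction
    return FLineFields(
        cp_id=(word >> 9) & 0b111,
        instr_type=(word >> 6) & 0b111,
        remainder=word & 0b111111,
    )


if __name__ == "__main__":
    # 0xF200 is used purely as a sample bit pattern.
    print(decode_f_line(0xF200))
    print(decode_f_line(0x4E75))
```

A word that does not start with 1111 yields None, which mirrors the MPU's decision to execute the instruction itself instead of passing it on.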


Coprocessor Operation


When the MPU detects an F-line instruction, it writes the instruction into the coprocessor’s memory-mapped command register in CPU space. Having sent a command to the coprocessor, the host processor reads the reply from the coprocessor’s response register. The response could, for example, instruct the processor to fetch data from memory. Once the host processor has complied with the demands of the coprocessor, it is free to continue with instruction processing; that is, the processor and coprocessor then act concurrently. This concurrency is why system speed can be dramatically improved by installing a coprocessor.
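The command and response exchange described above can be pictured with a toy model. Everything in this sketch is hypothetical: the class ToyCoprocessor, the response names FETCH_OPERAND and DONE, and the dictionary used as memory are invented for illustration and do not reflect the real 68882 register encodings.

```python
from collections import deque


class ToyCoprocessor:
    """Hypothetical coprocessor with a command register and a response register."""

    def __init__(self):
        self._responses = deque()
        self.operands = []

    def write_command(self, command):
        # Assumption: every command in this toy model needs one memory operand.
        self._responses.append(("FETCH_OPERAND", command["address"]))
        self._responses.append(("DONE", None))

    def read_response(self):
        return self._responses.popleft()

    def write_operand(self, value):
        self.operands.append(value)


def dispatch(memory, coprocessor, command):
    """Host side: send the command, then service requests until released."""
    coprocessor.write_command(command)
    log = []
    while True:
        kind, argument = coprocessor.read_response()
        log.append(kind)
        if kind == "FETCH_OPERAND":
            # The host performs the memory access on the coprocessor's behalf.
            coprocessor.write_operand(memory[argument])
        elif kind == "DONE":
            # The host is now free to continue with its own instructions.
            return log


if __name__ == "__main__":
    memory = {0x1000: 2.5}
    fpc = ToyCoprocessor()
    print(dispatch(memory, fpc, {"op": "FADD", "address": 0x1000}))
```

The point of the sketch is the division of labour: the coprocessor never touches memory, and the host only stays involved until the coprocessor releases it.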


MC 68882 Specifics


The MC 68882 floating point coprocessor is basically a very simple device, though its data manual is nearly as thick as that of the MC 68000. This complexity is due to the IEEE floating point arithmetic standards rather than the nature of the FPC. The 68882 contains eight 80-bit floating point data registers, FP0 to FP7, one 32-bit control register, FPCR, and one 32-bit status register, FPSR. Because the FPC is memory mapped in CPU space, these registers are directly accessible to the programmer within the register space of the host MPU. In addition to the standard byte, word and longword operations, the FPC supports four new operand sizes: single precision real (.S), double precision real (.D), extended precision real (.X) and packed decimal string (.P). All on-chip calculations take place in extended precision format and all floating point registers hold extended precision values. The single real and double real formats are used to input and output operands. All three real floating point formats comply with the corresponding IEEE floating point number standards. The FPC has built-in functions to convert between the various data formats it supports, for example a register move with a specified operand type (.P, .B, etc.).
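To make the single and double real formats more concrete, the following sketch splits a value into its IEEE sign, exponent and fraction fields using Python's struct module. It assumes the standard IEEE 754 field widths (1/8/23 bits for single, 1/11/52 bits for double) and big-endian byte order as used by the 68000 family; the function names are invented for this example. Python has no native 80-bit extended type, so the extended format is not shown.

```python
import struct


def single_fields(x: float):
    """Return (sign, biased exponent, fraction) of x as an IEEE single."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def double_fields(x: float):
    """Return (sign, biased exponent, fraction) of x as an IEEE double."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)


if __name__ == "__main__":
    print(single_fields(1.0))
    print(double_fields(-2.0))
```

The same value therefore has a different bit pattern in each format, which is why the coprocessor has to convert operands on the way in and out of its extended precision registers.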


The 68882 FPC has a substantial instruction set designed to satisfy many number-crunching situations. All instructions native to the FPC start with the bit pattern 1111(2) to show that the instruction is intended for the coprocessor. Some instructions supported by the FPC include FCOSH, FETOX, FLOG2, FTENTOX, FADD, FMUL and FSQRT. There are many more instructions available, but this selection demonstrates the versatility of the 68882 unit.
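For readers unfamiliar with the mnemonics, the sketch below pairs each instruction named above with the mathematical function it computes, using Python's math module as a stand-in. Note the assumption: Python floats are double precision, so the results only approximate what the coprocessor would produce in extended precision, and the table name FPC_OPERATIONS is invented here.

```python
import math

# Mnemonic -> equivalent mathematical operation (double precision stand-ins).
FPC_OPERATIONS = {
    "FCOSH": math.cosh,              # hyperbolic cosine
    "FETOX": math.exp,               # e to the power x
    "FLOG2": math.log2,              # logarithm to base 2
    "FTENTOX": lambda x: 10.0 ** x,  # 10 to the power x
    "FADD": lambda a, b: a + b,      # addition
    "FMUL": lambda a, b: a * b,      # multiplication
    "FSQRT": math.sqrt,              # square root
}

if __name__ == "__main__":
    print(FPC_OPERATIONS["FLOG2"](8.0))
    print(FPC_OPERATIONS["FTENTOX"](2.0))
```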


One of the registers within the FPC is the status register. It is very similar in function to the status register in a CPU; it is updated to show the outcome of the most recently executed instruction. Flags within the status register of the FPC include divide by zero, infinity, zero, overflow, underflow and not a number. Some of the conditions signaled by the status register of the FPC (for example divide by zero) require an exception routine to be executed, so that the user is informed of the situation. These exception routines are stored and executed within the host MPU, which means that the FPC can be used to control loops and tests within user programs, further extending the functionality of the coprocessor.
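As a rough illustration of how such flags can drive tests in a program, the sketch below derives a few status flags from a result value. It is a simplified model and not the actual FPSR layout: the flag names and the function condition_flags are assumptions made for this example, and overflow and underflow are left out.

```python
import math


def condition_flags(result: float) -> dict:
    """Derive illustrative status flags from the result of an operation."""
    is_nan = math.isnan(result)
    return {
        "negative": (not is_nan) and math.copysign(1.0, result) < 0,
        "zero": result == 0.0,
        "infinity": math.isinf(result),
        "nan": is_nan,
    }


if __name__ == "__main__":
    # A program can branch on the flags, much as it would on CPU status bits.
    flags = condition_flags(4.0 - 4.0)
    if flags["zero"]:
        print("result is zero, leave the loop")
```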


Intel Math Coprocessor 80387 DX


In many respects, the Intel 80387 math coprocessor (MCP) is very similar to the MC 68882. Both designs were influenced by such factors as cost, usability and performance. There are, however, subtle differences in the designs of the two units.


I shall first discuss the similarities between the designs and then the differences. Like the 68882, the 80387 requires no additional hardware to be connected to an 80386. It is a non-DMA device, having no direct access to the address bus of the motherboard. All memory and I/O accesses are handled by the CPU, which upon detection of an MCP instruction passes it along to the MCP. If additional memory reads are necessary to load operands or data, the MCP instructs the CPU to perform them. This design, although reducing MCP performance when compared to a direct connection to the address bus, significantly decreases the complexity of the MCP, as no separate address decoding or error handling logic is necessary. The connection between the CPU and the MCP is via a synchronous bus, while the internal operation of the MCP can run asynchronously (at a higher clock speed). Moreover, the three functional units of the MCP can work in parallel to increase system performance. The CPU can be transferring commands and data to the MCP bus control logic while the MCP floating point unit is executing the current instruction. Similar to the 68882, the 80387 has a bit pattern (11011(2)) reserved to identify instructions intended for it. Also, the CPU reaches the MCP through reserved I/O addresses, and the internal registers of the MCP are made available to programmers through the coprocessor instructions.


Internally, the 80387 contains three distinct units: the bus control logic (BCL), the data interface and control unit, and the actual floating point unit (FPU). The data interface and control unit directs the data to the instruction decoder. The instruction decoder decodes the ESC instructions sent to it by the CPU and generates controls that direct the data flow in the instruction buffer. It also triggers the microinstruction sequencer that controls execution of each instruction. If the ESC instruction is FINIT, FCLEX, FSTSW, FSTSW AX, or FSTCW, the control unit executes it independently of the FPU and the sequencer. The data interface and control unit is the unit that generates the BUSY#, PEREQ and ERROR# signals that synchronize Intel 387 DX MCP activities with the Intel 80386 DX CPU. It also supports the FPU in all operations that it cannot perform alone (e.g. exception handling, transcendental operations, etc.).


The FPU executes all instructions that involve the register stack, including arithmetic, logical, transcendental, constant, and data transfer instructions. The data path in the FPU is 84 bits wide (68 significand bits, 15 exponent bits, and a sign bit), which allows internal operand transfers to be performed at very high speeds.


Interface


The MCP is connected to the MPU via a synchronous connection, while the numeric core can operate at a different clock speed, making it asynchronous. The following diagram will clarify this.


The following diagram shows the specific connections necessary between the 80386 MPU and the 80387 MCP.


A typical coprocessor instruction must begin with the bit pattern 11011(2) to identify the instruction as being for the coprocessor. The bus control logic of the MCP (BCL) communicates solely with the CPU using I/O bus cycles. The BCL appears to the CPU as a special peripheral device. It is special in one important respect: the CPU uses reserved I/O addresses to communicate with the BCL. The BCL does not communicate directly with memory. The CPU performs all memory accesses, transferring input operands from memory to the MCP and transferring outputs from the MCP to memory.
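The 11011(2) test mentioned at the start of this paragraph is easy to express in code. The sketch below assumes the pattern occupies the five most significant bits of the first opcode byte; the function name is_esc_opcode is invented for this example.

```python
def is_esc_opcode(first_byte: int) -> bool:
    """Return True if the top five bits of the opcode byte are 11011."""
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("opcode byte must fit in 8 bits")
    return (first_byte >> 3) == 0b11011


if __name__ == "__main__":
    # List every byte value that carries the coprocessor bit pattern.
    matches = [hex(b) for b in range(256) if is_esc_opcode(b)]
    print(matches)
```

Because three bits remain free below the fixed pattern, exactly eight opcode byte values (0xD8 to 0xDF) are set aside for the coprocessor.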


Coprocessor Operation


When the CPU detects the arrival of a coprocessor instruction, it passes the instruction to the coprocessor through the reserved I/O addresses described above. Having sent a command to the coprocessor, the host processor reads the reply from the coprocessor’s signals. The response could, for example, request that the processor fetch data from memory. Once the host processor has complied with the demands of the coprocessor, it is free to continue with instruction processing; that is, the processor and coprocessor then act concurrently. This concurrency is why system speed can be dramatically improved by installing a coprocessor.


80387 Specifics


Just like the MC 68882 floating point coprocessor, the Intel 80387 is basically a very simple device. Like any reasonable math coprocessor, it conforms to the IEEE standards for floating point number representation. The 80387 contains eight 82-bit floating point data registers (each including a 2-bit tag field), R0 to R7, one 16-bit control register, one 16-bit status register and a tag word (which contains the tag fields for the eight data registers). The MCP also indirectly uses the 48-bit instruction and data pointer registers of the 80386 host processor, even though these are external to the unit. The registers of the MCP are accessible to the programmer through the coprocessor instructions, as if they were part of the register set of the host MPU. In addition to the standard word, short and long (16, 32 and 64-bit) integer operations, the MCP supports four new operand sizes: single precision real, double precision real, extended precision real and packed binary coded decimal strings. All on-chip calculations take place in extended precision format and all floating point registers hold extended precision values. The single real and double real formats are used to input and output operands. All three real floating point formats comply with the corresponding IEEE floating point number standards. The MCP has built-in functions to convert between the various data formats it supports.


The 80387 has a substantial instruction set designed to satisfy many number-crunching situations. All instructions native to the MCP start with the bit pattern 11011(2) to show that the instruction should be directed to the coprocessor. Some of the more than 70 instructions supported by the MCP are FCOMP, FDIV, FSQRT, FSINCOS and FINIT. There are many more instructions available, but this selection demonstrates the versatility of the 80387 unit, which is very similar to that of the 68882 unit.


One of the registers within the MCP is the status register. Just as in the 68882, the status register shows the outcome of the most recently executed instruction. Flags within the status register of the MCP include divide by zero, infinity, zero, overflow, underflow and invalid operation. Some of the conditions signaled by the status register of the MCP (for example divide by zero) require an exception routine to be executed by the host MPU, so that the user is informed of the situation. These exception routines are stored and executed within the host MPU, which means that the MCP can again be used to control loops and tests within user programs, further extending the functionality of the coprocessor.


The Intel 80387 DX MCP register set can be accessed either as a stack, with instructions operating on the top one or two stack elements, or as a fixed register set, with instructions operating on explicitly designated registers. The TOP field in the status word identifies the current top-of-stack register. A “push” operation decrements TOP by one and loads a value into the new TOP register. A “pop” operation stores the value from the current top register and then increments TOP by one. Like the 80386 DX microprocessor stacks in memory, the MCP register stack grows “down” toward lower-addressed registers. Instructions may address the data registers either implicitly or explicitly. The explicit register addressing is also relative to TOP.


A notable feature of the 80387 is the addition of a tag field of 2 bits to each of the eight floating point registers. The tag word marks the content of each numeric data register, as Figure 2.1 shows. Each two-bit tag represents one of the eight numeric registers. The principal function of the tag word is to optimize the MCP’s performance and stack handling by making it possible to distinguish between empty and nonempty register locations. It also enables exception handlers to check the contents of a stack location without the need to perform complex decoding of the actual data.
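The interplay of TOP, push, pop and the tag word can be modelled in a short sketch. It is a simplified model under stated assumptions: Python floats stand in for extended precision values, denormals are not detected, stack faults are raised as Python exceptions instead of being signaled through the status word, and the class and method names are invented here. The tag encoding used (00 valid, 01 zero, 10 special, 11 empty, with the tag for register n in bits 2n+1 and 2n of the tag word) follows Intel's documentation.

```python
import math

TAG_VALID, TAG_ZERO, TAG_SPECIAL, TAG_EMPTY = 0b00, 0b01, 0b10, 0b11


class RegisterStack:
    """Toy model of the eight data registers, the TOP field and the tag word."""

    def __init__(self):
        self.regs = [0.0] * 8
        self.tags = [TAG_EMPTY] * 8  # all registers start out empty
        self.top = 0

    @staticmethod
    def _tag_for(value: float) -> int:
        if value == 0.0:
            return TAG_ZERO
        if math.isnan(value) or math.isinf(value):
            return TAG_SPECIAL  # denormals are not modelled here
        return TAG_VALID

    def push(self, value: float) -> None:
        new_top = (self.top - 1) % 8  # push decrements TOP, wrapping 0 -> 7
        if self.tags[new_top] != TAG_EMPTY:
            raise OverflowError("stack overflow: target register is not empty")
        self.top = new_top
        self.regs[new_top] = value
        self.tags[new_top] = self._tag_for(value)

    def pop(self) -> float:
        if self.tags[self.top] == TAG_EMPTY:
            raise IndexError("stack underflow: top register is empty")
        value = self.regs[self.top]
        self.tags[self.top] = TAG_EMPTY
        self.top = (self.top + 1) % 8  # pop increments TOP
        return value

    def st(self, i: int) -> float:
        """Explicit addressing relative to TOP: ST(i)."""
        return self.regs[(self.top + i) % 8]

    def tag_word(self) -> int:
        word = 0
        for register, tag in enumerate(self.tags):
            word |= tag << (2 * register)
        return word


if __name__ == "__main__":
    stack = RegisterStack()
    stack.push(1.5)
    stack.push(0.0)
    print(stack.top, stack.st(0), stack.st(1), hex(stack.tag_word()))
```

The empty check in push and pop uses only the two tag bits, which is the kind of shortcut the tag word is meant to provide: the model never has to inspect the stored value to know whether a register is free.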


Evaluation of the Two Coprocessors


I started this paper thinking that the Motorola math coprocessor had to be better in design, implementation and features than its Intel counterpart. Through my research I came to realize that my opinions were based on nothing but myths. In many respects the two coprocessors are very similar to each other, while in other respects they differ radically in design and implementation. I will sum up the points I consider most important.


1. Intel uses a synchronous bus between the CPU and the MCP, while the actual internal floating point unit can run asynchronously to this. This increases the complexity of the design, as synchronization logic must exist between the two processors, but in this way the floating point unit can run at a higher clock speed than the CPU if a dedicated clock generator is installed.


2. The (logical, not physical) addition of tag fields to the data registers in the 80387 to signal certain conditions of the data registers makes operations that support tags much faster, as certain information does not need to be decoded because it is “cached” in the tag fields.


3. The 80387 can use its registers either in stack mode or in absolute addressing mode. Though some operations require stack addressing, this feature adds a little more flexibility to the MCP (even though the stack operations might be a legacy from the 8087 or 80287).


In most other respects, the coprocessors are equals. They have the same number of data registers, both add their own instruction set and registers for programmers in a transparent fashion, and both support the same IEEE numeric representation standards. Both coprocessors probably have similar processing power at equal clock speed as well. Even though the Motorola coprocessor seems to be superior by name, I have to admit that the 80387 gets my vote for its greater flexibility and thoughtful optimizations (tags).
