1. IBM 3838
Introduction: The IBM 3838 is a multiple-pipeline scientific processor that evolved from the
earlier IBM 2938 array processor. Both processors are specially designed to attach to IBM
mainframes, such as the System/370, to enhance the vector-processing capability of the host
machines. These attached pipeline processors reflect recent progress in scientific processing at
IBM beyond the level of the 360/91 and 370/195. Vector instructions that can be executed in
the 3838 include componentwise vector add, vector multiply, inner product, sum of
vector components, convolving multiply, vector move, vector format conversion, fast Fourier
transforms, table interpolations, vector trigonometric and transcendental functions,
polynomial evaluation, and matrix operations. Like the AP-120B and the FPS-164, both the IBM
2938 and the 3838 are microprogrammed pipeline processors that can be supplied with
custom-ordered instruction sets for specific vector applications.
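To make a few of the listed vector instructions concrete, the following is a minimal sketch of
their arithmetic meaning in plain Python. It only illustrates what each operation computes, not
how the 3838 microcode implements it. The function names are hypothetical, and "convolving
multiply" is assumed here to mean an ordinary discrete convolution.

```python
def _check_same_length(a, b):
    # Componentwise operations are only defined for equal-length vectors.
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")


def vector_add(a, b):
    """Componentwise vector add: c[i] = a[i] + b[i]."""
    _check_same_length(a, b)
    return [x + y for x, y in zip(a, b)]


def vector_multiply(a, b):
    """Componentwise vector multiply: c[i] = a[i] * b[i]."""
    _check_same_length(a, b)
    return [x * y for x, y in zip(a, b)]


def inner_product(a, b):
    """Inner product: sum over i of a[i] * b[i]."""
    _check_same_length(a, b)
    return sum(x * y for x, y in zip(a, b))


def component_sum(a):
    """Sum of vector components."""
    return sum(a)


def convolve(signal, kernel):
    """Full discrete convolution (assumed meaning of 'convolving multiply')."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out
```

Each of these reduces to a stream of multiplies and adds over vector elements, which is the
kind of work that pipelined adders and multipliers are suited to.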
Architecture: The hardware architecture of the IBM 3838 array processor is shown in
Figure 1. The processor can attach to a System/370 via a block-multiplexer I/O channel with a
data transfer rate of 1.5 Mbytes per second. With an optional two-byte interface, the maximum
data transfer rate can be doubled to 3 Mbytes per second. The 3838 appears to the host processor
I/O channel as a shared control unit. Up to seven users can be simultaneously active in the 3838.
The tasks of each user are pipelined through the various subsystems of the 3838. The control
processor assists the user with a set of scalar instructions and the necessary registers for
preparing vector instructions. The bulk memory is used to hold large-volume vector operands.
The I/O unit supervises the transfer of data or programs between the host and the bulk memory.
The data-word size of the 3838 is 32 bits, matching that of the System/370.
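As a rough illustration of what the channel rates mean for 32-bit operands, the sketch below
estimates how long a vector takes to cross the I/O channel. It assumes the quoted rates are
1.5 and 3 megabytes per second (10^6 bytes) and ignores channel protocol overhead. The
constant and function names are made up for this example.

```python
WORD_BYTES = 4                    # 32-bit data word, as stated in the text
RATE_ONE_BYTE_INTERFACE = 1.5e6   # bytes per second (assumed: 1.5 Mbytes/s)
RATE_TWO_BYTE_INTERFACE = 3.0e6   # bytes per second (assumed: 3 Mbytes/s)


def transfer_time_seconds(num_words, bytes_per_second):
    """Ideal time to move num_words 32-bit words over the channel."""
    return num_words * WORD_BYTES / bytes_per_second


# A vector of 375,000 words is 1.5 million bytes.
words = 375_000
t_one = transfer_time_seconds(words, RATE_ONE_BYTE_INTERFACE)
t_two = transfer_time_seconds(words, RATE_TWO_BYTE_INTERFACE)
print(t_one, t_two)
```

Under these assumptions the two-byte interface halves the transfer time, so the channel rather
than the pipelines can become the limit for large operands.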
The transfer of working sets of vector segments between the bulk memory and
the working stores is supervised by the data transfer controller (DTC). Each working store can
hold 8192 bytes. Vector-addressing parameters are supplied to the DTC by the control processor.
The DTC is microprogrammed to generate the effective memory addresses for both the bulk and
working memories before data can be properly transferred. Furthermore, the DTC can perform
data-format conversion during the data flow. The arithmetic controller is also a
microprogrammed unit. The microprogram sequences performed by the arithmetic pipelines
are initiated by this controller. The use of the working stores by the arithmetic pipelines and
by the DTC is synchronized. The basic pipeline cycle time is 100 ns in the 3838.
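The address generation done by the DTC can be pictured with a small sketch. The text does not
say which vector-addressing parameters the DTC receives, so the base address, stride, and
element count used here are assumptions chosen for illustration, as are the function names.

```python
def effective_addresses(base, stride, count, word_bytes=4):
    """Byte addresses of `count` elements starting at `base`.

    `stride` is measured in words (hypothetical parameter); each word is
    `word_bytes` long, 4 bytes for a 32-bit data word.
    """
    return [base + i * stride * word_bytes for i in range(count)]


def plan_transfer(bulk_base, bulk_stride, ws_base, count):
    """Pair each bulk-memory address with a working-store address.

    Elements are gathered from bulk memory with the given stride and
    packed contiguously (stride 1) into the working store.
    """
    bulk = effective_addresses(bulk_base, bulk_stride, count)
    ws = effective_addresses(ws_base, 1, count)
    return list(zip(bulk, ws))


# Gather every third word from bulk memory into a packed working store.
print(plan_transfer(bulk_base=0, bulk_stride=3, ws_base=0, count=4))
```

Generating both address streams in one place is what lets a strided vector in bulk memory
arrive in the working store as a dense segment ready for the pipelines.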
There are five arithmetic units in the 3838. The pipeline units, as diagrammed in Figure 1,
include two floating-point adders of four stages each; a four-stage floating-point multiplier; a
three-stage sine/cosine pipeline; and a five-stage reciprocal estimator. Even the working store
appears as a four-stage pipeline. The delay of each stage is 100 ns. The interconnection paths
between these functional pipelines are under the microprogrammed control of the arithmetic
element controller. Access to the writable control store is also pipelined, with a two-stage
delay.
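The stage counts and the 100 ns stage delay allow a simple timing estimate. The sketch below
uses the standard linear-pipeline formula, (k + n - 1) cycles for n results from a k-stage
pipeline, assuming one new operand pair enters every cycle and there are no stalls. The
dictionary keys and function names are made up for this example.

```python
CYCLE_NS = 100  # delay of each pipeline stage, from the text

# Stage counts quoted in the text for each functional pipeline.
STAGES = {
    "adder": 4,
    "multiplier": 4,
    "sine_cosine": 3,
    "reciprocal_estimator": 5,
}


def pipeline_time_ns(stages, n):
    """Time for n results: fill the pipe (stages cycles), then one per cycle."""
    if n <= 0:
        return 0
    return (stages + n - 1) * CYCLE_NS


def speedup_over_unpipelined(stages, n):
    """Ratio of non-overlapped time (stages * n cycles) to pipelined time."""
    return (stages * n) / (stages + n - 1)


for name, k in STAGES.items():
    print(name, pipeline_time_ns(k, 1000), "ns for 1000 elements")
```

For long vectors the fill time becomes negligible and a single pipeline approaches one result
per 100 ns, that is, about 10 million results per second under these idealized assumptions.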
The programs and the data to be processed by the 3838 are prepared by the host
computer. Both vector and scalar instructions can be contained in these 3838 programs. The
host sends the programs and the data to the 3838 through the I/O channel. Data are stored
in the bulk store. The instructions are executed by the control processor. After decoding
each instruction, the control processor provides linked lists of microprogram sequences for
supervising the pipeline execution of the instruction. While the arithmetic pipelines are
updating vector data from one working store, the DTC can load the other working store. Data
loading and instruction execution can thus proceed simultaneously on the two banks of
working stores. This facilitates the multiprogrammed use of the 3838. Concurrent pipelining
allows multiple users to share the hardware resources in achieving high system throughput. The
maximum speed of the 3838 has been estimated to be 30 megaflops.
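The alternation between the two working stores is a form of double buffering. The sketch below
is a simplified model of it, not a description of the actual 3838 control logic: it assumes
fixed load and compute times per vector segment, ignores writing results back to bulk memory,
and labels the two working stores "A" and "B" purely for illustration.

```python
def schedule(segments):
    """List (store being loaded, store being computed on) for each step.

    Segment i is loaded into store i % 2 at step i and processed by the
    arithmetic pipelines at step i + 1, so the two roles never collide.
    """
    names = ("A", "B")
    steps = []
    for t in range(segments + 1):
        loading = names[t % 2] if t < segments else None
        computing = names[(t - 1) % 2] if t >= 1 else None
        steps.append((loading, computing))
    return steps


def serial_time(load, compute, segments):
    """Total time if each segment is loaded and then processed in turn."""
    return segments * (load + compute)


def overlapped_time(load, compute, segments):
    """Total time when loading segment i+1 overlaps computing segment i."""
    if segments == 0:
        return 0
    return load + (segments - 1) * max(load, compute) + compute


print(schedule(3))
print(serial_time(2, 3, 4), overlapped_time(2, 3, 4))
```

In this model only the first load and the last compute are exposed. Every other step costs the
longer of the two activities, which is why overlapping the DTC with the arithmetic pipelines
raises throughput.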
Figure 1: The arithmetic processor in the IBM 3838
arifch2009@gmail.com M. TECH , CSE DEPARTMENT, NIT SILCHAR