The R520 architecture is referred to by ATI as an "Ultra Threaded Dispatch Processor", a name that reflects ATI's plan to boost the efficiency of its GPU rather than relying on a brute-force increase in the number of processing units. A central pixel shader "dispatch unit" breaks shaders down into threads (batches) of 16 pixels (4×4) and can track and distribute up to 128 threads per pixel "quad" (4 pipelines each). When a shader quad becomes idle, either because it has completed a task or because it is waiting for other data, the dispatch engine assigns it another task in the meantime. The overall result is, in theory, greater utilization of the shader units. To support so many threads per quad, ATI created a very large processor register array that is capable of multiple concurrent reads and writes and has a high-bandwidth connection to each shader array, providing the temporary storage needed to keep the pipelines fed by having work available as much as possible. With chips such as RV530 and R580, where the number of shader units per pipeline triples, the efficiency of pixel shading drops off slightly because these shaders still have the same level of threading resources as the less endowed RV515 and R520.[3]
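To illustrate why switching threads on a stall raises utilization, here is a minimal cycle-level sketch in Python. It is a toy model, not ATI's actual dispatch logic: the function name `simulate_quad`, the one-cycle ALU instruction, the fixed texture latency and the `max_in_flight` residency limit are all hypothetical; only the 128-thread-per-quad ceiling comes from the text above.

```python
from collections import deque

MAX_THREADS_PER_QUAD = 128  # per-quad tracking limit quoted in the text


def simulate_quad(num_threads, alu_ops, tex_latency, max_in_flight):
    """Toy model of one shader quad fed by a dispatcher (hypothetical).

    Assumptions: every thread runs `alu_ops` one-cycle ALU instructions,
    and after each one except the last it waits `tex_latency` cycles for a
    texture fetch. Up to `max_in_flight` threads may be resident at once;
    any ready resident thread can issue. Returns (busy_cycles, total_cycles).
    """
    if not 1 <= max_in_flight <= MAX_THREADS_PER_QUAD:
        raise ValueError("max_in_flight must be between 1 and 128")
    pending = deque(range(num_threads))  # threads not yet resident
    ready = deque()                      # resident threads able to issue
    waiting = {}                         # thread id -> cycle its fetch returns
    remaining = {}                       # thread id -> ALU instructions left
    cycle = busy = 0
    while pending or ready or waiting:
        # Admit new threads while residency slots are free.
        while pending and len(ready) + len(waiting) < max_in_flight:
            tid = pending.popleft()
            remaining[tid] = alu_ops
            ready.append(tid)
        # Threads whose texture data has arrived become ready again.
        for tid, ready_at in list(waiting.items()):
            if ready_at <= cycle:
                del waiting[tid]
                ready.append(tid)
        # Issue one ALU instruction from the oldest ready thread, if any.
        if ready:
            tid = ready.popleft()
            busy += 1
            remaining[tid] -= 1
            if remaining[tid] > 0:
                waiting[tid] = cycle + 1 + tex_latency
        cycle += 1
    return busy, cycle


if __name__ == "__main__":
    for slots in (1, 8):
        busy, total = simulate_quad(8, alu_ops=3, tex_latency=4, max_in_flight=slots)
        print(f"{slots} resident thread(s): {busy}/{total} cycles busy")
```

With these illustrative numbers, a single resident thread leaves the quad idle during every fetch (24 of 88 cycles busy), while eight resident threads hide the four-cycle latency completely (24 of 24 cycles busy).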
The vertex shader engines were already at the required FP32 precision in ATI's older products. Changes necessary for SM3.0 included longer instruction lengths, dynamic flow control instructions (branches, loops and subroutines) and a larger temporary register space. The pixel shader engines are actually quite similar in computational layout to their R420 counterparts, although they were heavily optimized and tweaked to reach high clock speeds on the 90 nm process. ATI had been working for years on a high-performance shader compiler in its driver for its older hardware, so staying with a similar, compatible basic design offered obvious cost and time savings.[3]
At the end of the pipeline, the texture addressing processors are decoupled from the pixel shaders, so any unused texturing units can be dynamically allocated to pixels that need more texture layers. Other improvements include support for 4096×4096 textures and a better compression ratio for ATI's 3Dc normal map compression in certain situations.[3]
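A simplified way to see the benefit of decoupling: in the hypothetical model below, each texture fetch takes one cycle on one texture unit. If units are tied to pipelines, the pipeline with the most layers sets the pace; if they form a shared pool, the work spreads across idle units. The function names and the one-cycle cost are assumptions for illustration, not a description of R520's actual texture arbitration.

```python
import math


def cycles_coupled(demands):
    """Each pipeline owns one texture unit, so the busiest pipeline sets the pace."""
    return max(demands)


def cycles_pooled(demands, num_units):
    """Any free texture unit may serve any pixel, so only total demand matters.

    Idealized assumption: every fetch is independent and costs one cycle.
    """
    return math.ceil(sum(demands) / num_units)


if __name__ == "__main__":
    # Hypothetical texture layers requested by the pixels in four pipelines.
    demands = [4, 1, 1, 2]
    print("coupled:", cycles_coupled(demands), "cycles")               # 4
    print("pooled: ", cycles_pooled(demands, len(demands)), "cycles")  # 2
```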
The RV530 has a 3:1 ratio of pixel shaders to texture units. It possesses 12 pixel shaders while retaining RV515's four texture units and four ROPs. It also gains three extra vertex shaders, bringing the total to five. The chip's single "quad" has three pixel shader processors per pipeline, similar to the design of R580's four quads. This means that RV530 has the same texturing ability as the X1300 at the same clock speed, but with its 12 pixel shaders it is on par with the X1800 in shader computational performance. Given the content of the games available at the time, the RV530-based X1600 was greatly hampered by its lack of texturing power.[3]
Because of R520's delayed release, it faced far stronger competition than it would have if the chip had made its originally scheduled spring/summer release. Like its predecessor, the X850, the R520 chip carries 4 "quads", which means it has similar texturing capability at the same clock speed as its ancestor and the NVIDIA 6800 series. Unlike the X850's, the R520's shader units are vastly improved: they are Shader Model 3 capable and received advancements in shader threading that can greatly improve their efficiency. Unlike the X1900, the X1800 has 16 pixel shader processors and an equal ratio of texturing to pixel shading capability. The chip also increases the number of vertex shaders from six on the X800 to eight. With the 90 nm low-K fabrication process, these high-transistor-count chips could still be clocked at very high frequencies, which allowed the X1800 series to be competitive with GPUs that have more pipelines but lower clock speeds, such as the NVIDIA 7800 and 7900 series, which use 24 pipelines.[3]
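The clock-versus-pipelines trade-off can be stated as a first-order estimate: peak throughput scales with pipelines × clock, so a 16-pipeline chip matches a 24-pipeline rival once its clock is 24/16 = 1.5 times higher. The sketch below computes that break-even clock. The 400 MHz rival clock is a hypothetical input, not a quoted specification, and the model assumes one operation per pipeline per cycle, ignoring differences in per-pipeline work and memory bandwidth.

```python
def peak_rate(pipelines, clock_mhz):
    """First-order peak throughput in millions of pipeline operations per second."""
    return pipelines * clock_mhz


def break_even_clock(rival_pipelines, rival_clock_mhz, own_pipelines):
    """Clock (MHz) at which own_pipelines matches the rival's peak_rate."""
    return peak_rate(rival_pipelines, rival_clock_mhz) / own_pipelines


if __name__ == "__main__":
    needed = break_even_clock(rival_pipelines=24, rival_clock_mhz=400, own_pipelines=16)
    print(f"16 pipelines need {needed:.0f} MHz to match 24 pipelines at 400 MHz")
```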
The X1900 and X1950 series fixed several flaws in the X1800 design and added a significant boost in pixel shading performance. The R580 core is pin-compatible with R520 PCBs, so a redesign of the X1800 PCB was not needed. The boards carry either 256 MB or 512 MB of onboard GDDR3 memory, depending on the variant. The primary change between the R580 and the R520 is that ATI changed the ratio of pixel shader processors to texture processors. The X1900 cards have three pixel shaders on each pipeline instead of one, giving a total of 48 pixel shader units. ATI took this step expecting that future 3D software would be more pixel-shader intensive.[15]
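The shader and texture counts in the preceding paragraphs follow from simple multiplication: pipelines = quads × 4, and pixel shaders = pipelines × shaders per pipeline. The sketch below tabulates this for the four chips discussed so far. The `Chip` class is hypothetical, and it assumes one texture unit per pipeline, which is consistent with the counts given or implied in the text (4 for RV515 and RV530, 16 for R520 and R580).

```python
from dataclasses import dataclass
from fractions import Fraction

PIPELINES_PER_QUAD = 4  # a "quad" is 4 pipelines, per the text


@dataclass(frozen=True)
class Chip:
    name: str
    quads: int
    shaders_per_pipeline: int

    @property
    def pipelines(self) -> int:
        return self.quads * PIPELINES_PER_QUAD

    @property
    def pixel_shaders(self) -> int:
        return self.pipelines * self.shaders_per_pipeline

    @property
    def texture_units(self) -> int:
        # Assumption: one texture unit per pipeline for every chip listed.
        return self.pipelines

    @property
    def shader_to_texture(self) -> Fraction:
        return Fraction(self.pixel_shaders, self.texture_units)


CHIPS = [
    Chip("RV515", quads=1, shaders_per_pipeline=1),
    Chip("RV530", quads=1, shaders_per_pipeline=3),
    Chip("R520", quads=4, shaders_per_pipeline=1),
    Chip("R580", quads=4, shaders_per_pipeline=3),
]

if __name__ == "__main__":
    for c in CHIPS:
        print(f"{c.name}: {c.pixel_shaders} pixel shaders, "
              f"{c.texture_units} texture units, ratio {c.shader_to_texture}:1")
```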
The X1950 Pro was released on October 17, 2006, and was intended to replace the X1900 GT in the competitive sub-$200 market segment. The X1950 Pro GPU is built on the 80 nm RV570 core with only 12 texture units and 36 pixel shaders, and it was the first ATI card to support native CrossFire through a pair of internal CrossFire connectors, which eliminated the unwieldy external dongle found in older CrossFire systems.[17]