Name: T3

Large language models increasingly rely on distributed techniques for training and inference. These techniques require communication between devices, which can reduce scaling efficiency as the number of devices grows. Some distributed techniques can overlap communication with independent computations and thereby hide it. Others, such as tensor parallelism (TP), inherently serialize communication with model execution. One way to hide this serialized communication is to interleave it, at fine granularity, with the producer operation that generates the data to be communicated. However, implementing such fine-grained interleaving of communication and computation in software is difficult. Furthermore, as with any concurrent execution, it requires computation and communication to share compute and memory resources, and the resulting contention reduces overlap efficiency. To overcome these challenges, we propose T3, which applies hardware-software co-design to transparently overlap serialized communication while minimizing resource contention with computation. T3 transparently fuses producer operations with the subsequent communication through a simple configuration of the producer's output address space, requiring only minor software changes. At the hardware level, T3 adds a lightweight track-and-trigger mechanism to orchestrate the producer's computation and communication. It further uses compute-enhanced memories for communication's attendant compute. As a result, T3 reduces resource contention and efficiently overlaps serialized communication with computation. For important Transformer models such as T-NLG, T3 speeds up communication-heavy sublayers by 30% geomean (max 47%) and reduces data movement by 22% geomean (max 36%). Furthermore, T3's benefits persist as models scale: it achieves a geomean speedup of 29% for sublayers in ~500-billion-parameter models, PaLM and MT-NLG.
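The abstract describes the track-and-trigger idea only at a high level. The toy simulation below illustrates the general principle under our own assumptions, not T3's actual design. It models a row-parallel (TP) layer whose per-device partial outputs must be summed and scattered, so that device c ends up owning the reduced output chunk c. A software ChunkTracker stands in for T3's hardware tracker: it counts the producer's tile writes per chunk and triggers the send as soon as a chunk is complete. A NearMemoryBuffer, whose in-place add stands in for compute-enhanced memory, performs the reduction at the destination. All class and function names, as well as the staggered chunk order, are hypothetical illustrations.

```python
"""Toy track-and-trigger simulation (hypothetical; not T3's hardware or software API)."""
from typing import Callable, Dict, List, Tuple


class ChunkTracker:
    """Counts tile writes per output chunk; fires a callback once a chunk is fully written."""

    def __init__(self, tiles_per_chunk: int, on_ready: Callable[[int], None]) -> None:
        self.tiles_per_chunk = tiles_per_chunk
        self.on_ready = on_ready
        self.counts: Dict[int, int] = {}

    def record_write(self, chunk: int) -> None:
        self.counts[chunk] = self.counts.get(chunk, 0) + 1
        if self.counts[chunk] == self.tiles_per_chunk:
            self.on_ready(chunk)  # trigger communication for this chunk right away


class NearMemoryBuffer:
    """Destination memory that accumulates incoming data in place (models a near-memory add)."""

    def __init__(self, size: int) -> None:
        self.data = [0.0] * size

    def reduce_into(self, offset: int, values: List[float]) -> None:
        for i, v in enumerate(values):
            self.data[offset + i] += v


def simulate(partials: List[List[float]], tiles_per_chunk: int,
             tile_len: int) -> Tuple[List[List[float]], List[str]]:
    """Each device produces its partial output tile by tile; finished chunks are
    sent to their owner and reduced there while other tiles are still computing."""
    n = len(partials)
    chunk_len = tiles_per_chunk * tile_len
    if any(len(p) != n * chunk_len for p in partials):
        raise ValueError("each partial must hold n * tiles_per_chunk * tile_len values")
    local = [[0.0] * (n * chunk_len) for _ in range(n)]  # producer output buffers
    owners = [NearMemoryBuffer(chunk_len) for _ in range(n)]  # device c owns chunk c
    log: List[str] = []

    trackers = []
    for d in range(n):
        def send(chunk: int, d: int = d) -> None:
            start = chunk * chunk_len
            owners[chunk].reduce_into(0, local[d][start:start + chunk_len])
            log.append(f"send d{d}->c{chunk}")
        trackers.append(ChunkTracker(tiles_per_chunk, send))

    # Staggered chunk order (an assumption) so devices target different owners at once.
    orders = [[(d + 1 + k) % n for k in range(n)] for d in range(n)]
    for step in range(n * tiles_per_chunk):
        chunk_idx, tile = divmod(step, tiles_per_chunk)
        for d in range(n):
            chunk = orders[d][chunk_idx]
            start = chunk * chunk_len + tile * tile_len
            local[d][start:start + tile_len] = partials[d][start:start + tile_len]  # "compute"
            log.append(f"compute d{d} c{chunk} t{tile}")
            trackers[d].record_write(chunk)
    return [o.data for o in owners], log
```

For example, `simulate([[1, 2, 3, 4], [10, 20, 30, 40]], tiles_per_chunk=2, tile_len=1)` returns the reduced chunks `[[11.0, 22.0], [33.0, 44.0]]`, and the log shows the first sends occurring before the producers have finished all their tiles. That early triggering is the overlap the tracker provides. Because the sender reads from the producer's buffer, a premature trigger would ship unwritten zeros and corrupt the sums, so the correct result also shows that the tracker fires only once a chunk is complete.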
