AMD on MI355X Inference: Multi-Token Prediction to Reduce Latency, with a Focus on Throughput in Interactive Latency Ranges
AMD’s technical article outlines inference optimizations for the MI355X, including multi-token prediction, and frames performance as the throughput achievable under interactive latency constraints.