Wednesday, September 15, 2021

OPEN TALK: Low Latency and High Throughput Chat Moderation on a CPU
Neha Rao
Stream, Data Scientist

Transformer-based models have dominated the NLP landscape thanks to their state-of-the-art performance on a wide variety of benchmarks and tasks. However, deploying such large models at scale can be difficult and costly. Learn about the techniques we've used at Stream to overcome these challenges and moderate real-time chat messages efficiently on relatively inexpensive hardware. While this talk focuses on BERT and its offshoots, many of these techniques can also be applied to other models.
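The abstract does not name the specific techniques used at Stream. As one illustrative example only, a widely used recipe for CPU-efficient BERT-family inference is to pair a distilled model with post-training dynamic quantization. The sketch below assumes PyTorch and Hugging Face Transformers, and uses a publicly available DistilBERT sentiment classifier purely as a stand-in for a moderation model.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Stand-in checkpoint; a real moderation model would be fine-tuned on chat data.
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Dynamic quantization: replace Linear layers' weights with int8,
# which typically speeds up CPU inference with little accuracy loss.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Score a small batch of chat messages.
messages = ["great game everyone", "this is such a fun stream"]
inputs = tokenizer(messages, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = quantized_model(**inputs).logits
labels = logits.argmax(dim=-1)
print(labels)
```

Batching messages and keeping sequence lengths short (as chat messages usually are) further improves throughput on CPU; whether these particular steps match Stream's production pipeline is not stated in the abstract.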