Andriy Burkov

@andriyburkov

PhD in AI, author of 📖 The Hundred-Page Language Models Book and 📖 The Hundred-Page Machine Learning Book

Read my just-released 📖 The Hundred-Page Language Models Book: https://thelmbook.com
Read my bestselling 📖 The Hundred-Page Machine Learning Book: https://themlbook.com
Read 📖 Machine Learning Engineering: https://mlebook.com
Subscribe to my weekly ✉️ Artificial Intelligence newsletter: https://www.linkedin.com/newsletters/artificial-intelligence-6598352935271358464/
Subscribe to my weekly ✉️ Data Science newsletter: https://www.linkedin.com/newsletters/7102511020270608384/

About me: Ph.D. in Artificial Intelligence, passionate about data, fluent in English, French, and Russian. Solid scientific programming and team leadership skills, with over 20 years of experience working on various computing projects, including several of my own startups. More than 15 years of hands-on experience in automated data analysis, machine learning, and natural language processing. Trained a Transformer from scratch and fine-tuned pretrained transformers for various tasks. Built a robot that crawls the internet, finds websites with business-critical information, and retrieves updated information periodically. Developed an enterprise chatbot that doesn’t hallucinate. Expert in Python and Java with several years of daily design and development experience in big data contexts.

Specialties: machine learning, natural language processing, conversational interfaces (chatbots), information retrieval.

Experience: ChapterPal · Location: Greater Quebec City Metropolitan Area


nbluemer shared this post · Mar 27

Andriy Burkov

Mar 25 · archived Mar 27

Google just published TurboQuant https://lnkd.in/ecbpBSpB, a model compression technique that can quantize the transformer's key-value cache to just 3 bits without requiring training or fine-tuning and without compromising model accuracy, all while running faster than the original LLMs.

As you might already know, LLMs store intermediate computations in something called a key-value cache — essentially a running memory of what the model has processed so far — and this cache grows linearly with the length of the input, eating up GPU memory fast.
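To make the linear growth concrete, here is a rough back-of-the-envelope sketch in Python. The model shape (32 layers, 8 key-value heads, head dimension 128) is a hypothetical example, not taken from the TurboQuant announcement, and the function `kv_cache_bytes` is an illustrative helper, not part of any library. It also ignores any extra metadata a real quantization scheme might store, so the 3-bit figures are a lower bound.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim,
                   bits_per_value, batch_size=1):
    """Approximate KV cache size in bytes (hypothetical helper).

    Each layer stores two tensors (keys and values), each of shape
    [batch_size, n_kv_heads, seq_len, head_dim].
    """
    n_values = 2 * n_layers * batch_size * n_kv_heads * seq_len * head_dim
    return n_values * bits_per_value / 8


# Hypothetical model shape, chosen only for illustration.
config = dict(n_layers=32, n_kv_heads=8, head_dim=128)

for seq_len in (1_000, 10_000, 100_000):
    fp16 = kv_cache_bytes(seq_len, bits_per_value=16, **config)
    q3 = kv_cache_bytes(seq_len, bits_per_value=3, **config)
    print(f"{seq_len:>7} tokens: "
          f"16-bit = {fp16 / 2**30:.2f} GiB, "
          f"3-bit = {q3 / 2**30:.2f} GiB")
```

Under these assumptions, the cache size scales in direct proportion to the number of tokens, and going from 16 bits to 3 bits per stored value shrinks it by a factor of 16/3, a bit more than 5x.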


Evan Powell · Mar 26

Reason #432 to bet on transformers

Uzair Javaid, Ph.D. Milad Abdollahzadeh Jiayu Li Zilong ZHAO Mar 26
