Doing More With Less
Small Models, Coding Agents, and AI in the ER
📬 Distill AI delivers the most important AI papers worth your time, tailored to your interests and field, every morning. 💬 Chat with any of them, instantly!
A practical edition, and most of it runs lean. Our lead paper puts machine learning to work in the emergency room — flagging at-risk patients the standard hospital codes miss. The rest leans small and efficient: a coding agent made for tiny models, a 1-billion-parameter model that reasons, compact open models from Microsoft and Meta, and a cleaner way to scale a classic piece of math.
📰 THE QUICK BRIEF
• NVIDIA open-sourced Nemotron 3 Ultra — a 550-billion-parameter mixture-of-experts model (55B active) built for long-running agents, with a one-million-token context and up to ~6x the inference speed of comparable open models. It’s billed as the fastest US open-weight model yet.
• Anthropic confidentially filed to go public, submitting a draft S-1 to the SEC — on the back of a revenue run-rate that reportedly climbed roughly 5x in a year, to around $47 billion.
• Microsoft rolled out MAI-Thinking-1 and a wider family of in-house models — a clear move to lean less on OpenAI for its core AI.
⭐ TODAY’S HIGHLIGHT
Transferable Self-Harm Surveillance from Emergency Department Triage Notes
Hospitals and public-health teams track self-harm cases to know where support is needed, but the diagnostic codes they rely on miss a large share of them. This work trains a three-stage machine-learning system to read the free-text notes a nurse writes at ER intake and flag likely cases the codes overlook.
The key word is “transferable”: the method is built to keep working when it’s moved to a hospital it wasn’t trained on, which is usually where these systems fall apart.
Why it matters: better detection means a truer count of who is struggling — and a better chance of reaching the people the current system quietly misses.
📄 MORE PAPERS WORTH READING
A Biconvex Formulation for Stable Transport of Mixture Models with a Unique Solution
Optimal transport is the math of reshaping one distribution into another as cheaply as possible — handy across economics, graphics, and ML, but slow because it usually works point by point. This paper (Optimal Mixture Transport) moves whole groups instead of single points, and rewrites the problem so it has one stable solution and scales to much larger datasets.
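For a feel of the point-by-point baseline that the paper is trying to get away from, here is a minimal sketch of optimal transport in the simplest possible setting: two equal-sized sets of numbers on a line, where the cheapest plan is known to pair the sorted points in order. This is not the paper's method (it does not move groups or handle mixtures), and the function name `wasserstein_1d` is an illustrative choice, not something from the paper.

```python
def wasserstein_1d(xs, ys):
    """Average cost of the cheapest one-to-one matching between two
    equal-sized sets of points on a line, with cost |x - y| per pair.

    In one dimension the optimal plan simply pairs the sorted points,
    so no search over matchings is needed.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and the same length")
    # Pair the smallest x with the smallest y, the next with the next, etc.
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)


# Shifting every point by 1 costs exactly 1 per point on average.
shift_cost = wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])
# Two copies of the same set, in a different order, cost nothing to match.
same_cost = wasserstein_1d([3.0, 1.0, 2.0], [1.0, 2.0, 3.0])
```

Outside one dimension there is no sorting shortcut, and the per-point bookkeeping is what gets expensive as datasets grow.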
💻 ON GITHUB
A 1-billion-parameter text model built on the Hierarchical Reasoning Model design, with extra machinery for finishing multi-step tasks and “thinking” internally before it answers. A small model trying to punch above its weight, with the Python code to try it yourself.
A coding agent made specifically for small language models. It’s a TypeScript project that shows the tricks needed to get usable code generation out of models too small to lean on sheer parameter count.
thinkpixellab/polymarket-ai-trading
An AI trading setup for the Polymarket prediction market: a GPT model reads the markets, the Kelly criterion decides how much to stake, and a mean-reversion strategy guides the calls — with a dashboard and a paper-trading mode so you can test without real money.
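To make the Kelly criterion mentioned above concrete, here is a minimal sketch of the textbook formula applied to a binary contract that costs `price` and pays 1 if the outcome happens. It is not taken from the repository: the function names, the `scale` parameter, and the half-Kelly default are assumptions for illustration only.

```python
def kelly_fraction(price: float, prob: float) -> float:
    """Fraction of bankroll the Kelly criterion would stake on a binary
    contract that costs `price` and pays 1 if the outcome happens.

    price: market price of the contract, strictly between 0 and 1
    prob:  your own estimated probability of the outcome, in [0, 1]
    """
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must be between 0 and 1")
    # Net odds are b = (1 - price) / price. The general Kelly formula
    # f = (b * prob - (1 - prob)) / b simplifies to the line below.
    fraction = (prob - price) / (1.0 - price)
    # A negative value means no edge on this side, so stake nothing.
    return max(0.0, fraction)


def stake(bankroll: float, price: float, prob: float, scale: float = 0.5) -> float:
    """Amount to stake; `scale` below 1 bets a fraction of full Kelly
    (hypothetical default: half-Kelly) to soften estimation errors."""
    return bankroll * scale * kelly_fraction(price, prob)


# Contract priced at 0.50 when you believe the true chance is 0.60.
example_fraction = kelly_fraction(0.50, 0.60)
example_stake = stake(1000.0, 0.50, 0.60)
```

The stake grows with the gap between your estimate and the market price, and drops to zero when the market already prices the outcome at or above your estimate. Everything depends on the quality of `prob`, which is the part the model has to get right.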
A proxy that puts 113+ models — Claude, GPT, Gemini, Grok, Kimi — behind one OpenAI/Anthropic-style endpoint, with image upload and Cursor support. Useful for juggling providers from one place, though it’s worth checking each provider’s terms before you rely on it.
🤗 ON HUGGING FACE
microsoft/phi-2 — small / reasoning.
A compact 2.7-billion-parameter model from Microsoft that reasons surprisingly well for its size and is light enough to run on a single consumer GPU.
meta-llama/Meta-Llama-3-8B-Instruct — small / workhorse.
Meta’s 8-billion-parameter instruct model — a dependable everyday pick for summarizing, answering questions, and chatting, without the cost of a giant model.
bigcode/starcoder — code.
An open code-generation model from BigCode, trained across many programming languages to complete code and draft it from plain-language prompts.
h94/IP-Adapter-FaceID — image / identity.
An image-generation tool that keeps a face consistent: give it a text prompt plus a reference photo, and it produces new images that still look like the same person.
⏰ DEADLINES CLOSING SOON
• ICDM 2026 — IEEE International Conference on Data Mining — Shenyang, China — paper deadline June 6, 2026
• PRICAI 2026 — Pacific Rim International Conference on AI — Shenyang, China — full-paper deadline June 7, 2026
• ACML 2026 — Asian Conference on Machine Learning — Melbourne, Australia — paper deadline June 27, 2026



