We are Bagel Labs - a distributed machine learning research lab working towards open-source superintelligence.
We ignore years of experience and pedigree. If you have high agency - meaning your default assumption is that you can control the outcome of whatever situation you are in - we want to hear from you. Every requirement below is flexible for a candidate with high enough agency and tolerance for ambiguity.
Role Overview
We encourage curiosity-driven research and welcome bold, untested ideas that challenge conventional paradigms. You will push the boundaries of diffusion models and distributed learning systems, testing hypotheses at the intersection of generative AI and scalable infrastructure.
Key Responsibilities
- Prototype AI methods that can redefine distributed machine learning.
- Pioneer next-generation diffusion architectures that scale across distributed infrastructure, including rectified flows, EDM variants, and latent consistency models.
- Develop novel sampling algorithms, guidance mechanisms, and conditioning strategies that unlock new capabilities in controllable generation.
- Partner with cryptographers and economists to embed secure, incentive-aligned protocols into model pipelines.
- Publish papers at top-tier ML venues, organize workshops, and keep our roadmap aligned with the latest academic advances.
- Share insights through internal notes, external blog posts, and conference-grade write-ups (e.g., blog.bagel.com).
- Contribute to open-source code and stay active in the ML community.
Who You Might Be
You are extremely curious. You actively consume the latest ML research - scanning arXiv, attending conferences, dissecting new open-source releases, and integrating breakthroughs into your own experimentation. You thrive on first-principles reasoning, see potential in unexplored ideas, and view learning as a perpetual process.
Desired Skills (Flexible)
- Deep expertise in modern diffusion models: score matching, flow matching, consistency training, and distillation techniques.
- Hands-on experience with distributed training frameworks: FairScale, DeepSpeed, Megatron-LM, or custom implementations of tensor/pipeline parallelism.
- Strong mathematical foundation in SDEs, ODEs, optimal transport, and variational inference for designing novel generative objectives.
- Clear, concise communication.
- Bonus: Experience with model quantization (QLoRA, GPTQ), knowledge distillation for diffusion models, or cryptographic techniques for secure distributed training.
What We Offer
- Top-of-market compensation and time to pursue open-ended research.
- A deeply technical culture where bold, frontier ideas are debated, stress-tested, and built.
- Full remote flexibility within North American time zones.
- Ownership of work that can set the direction for decentralized AI.
- Paid travel opportunities to the top ML conferences around the world.