AI Research Scientist
Job Title: AI Research Scientist
Location: Singapore
Job Type: Full-time
About the Opportunity
Our client is seeking an AI Research Scientist who combines frontier research curiosity with deep engineering discipline. This is not a purely theoretical role; you will be at the core of their research efforts, training state-of-the-art (SOTA) models and simultaneously building and maintaining the core training infrastructure.
This role is ideal for a high-performance individual who understands the complex nuances of training large models, is obsessed with fast experimentation, and is "productively combative"—willing to ask meaningful questions and scrutinize results to find the best-applied solutions.
Key Responsibilities
- Research & Experimentation: Generate new ideas, design and implement experiments, and relentlessly debug training runs to accelerate iteration and find solutions to everyday problems.
- Core Infrastructure Development: Build, own, and maintain modular, high-quality training codebases and efficient data loading pipelines.
- Distributed Training & Optimization: Scale training jobs effectively across multiple GPUs and nodes (e.g., using DDP, NCCL). Optimize model training for maximum performance, stability, and hardware utilization.
- Engineering Rigor: Write clean, testable, and reproducible code to maintain long-term code health, applying a high standard of engineering to the research process.
- Collaboration: Contribute to open-source dependencies and collaborate closely with the wider research team.
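To give a flavor of the infrastructure work above, here is a minimal sketch of one building block of an efficient data-loading pipeline: streaming samples into fixed-size batches. All names are hypothetical, and a production codebase would typically use PyTorch's `DataLoader` rather than hand-rolled batching.

```python
from typing import Iterable, Iterator, List

def batched(samples: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Yield fixed-size batches from a sample stream (last partial batch kept)."""
    batch: List[int] = []
    for sample in samples:
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch

# Example: a stream of 10 samples grouped into batches of 4
print(list(batched(range(10), 4)))  # → [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

The generator form matters at scale: it keeps only one batch in memory at a time, so the same pattern works whether the "stream" is a list or a multi-terabyte dataset read lazily from disk.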
Required Qualifications
- Deep, hands-on expertise in modern training codebases, especially PyTorch (or equivalents).
- Proven, real-world experience training large-scale deep learning models in a research or production setting (not just academic/toy problems).
- Strong engineering and software development skills in Python (performance-critical C++ is a plus).
- Crucial: A deep, intuitive understanding of training dynamics: what goes wrong during training (e.g., non-convergence, instability) and the advanced debugging skills to fix it.
- Experience working with large datasets and complex data pipelines.
- Familiarity with job launchers, logging tools (e.g., Weights & Biases, TensorBoard), and checkpointing systems.
- A firm mindset of applying engineering rigor to research (readable code, thoughtful design, and reproducibility).
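As one concrete example of the checkpointing systems mentioned above, the sketch below shows atomic checkpoint writes, a common pattern for making long training runs resumable after a crash. Filenames and the checkpoint contents are illustrative; a real PyTorch codebase would serialize with `torch.save` rather than JSON.

```python
import json
import os
import tempfile

def save_checkpoint(path: str, step: int, state: dict) -> None:
    """Write training state atomically so a crash never leaves a corrupt file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp, path)  # atomic rename: readers see old or new file, never half-written

def load_checkpoint(path: str) -> dict:
    """Restore the most recently saved training state."""
    with open(path) as f:
        return json.load(f)

# Illustrative usage: save at step 1000, then resume
ckpt_path = os.path.join(tempfile.gettempdir(), "ckpt_example.json")
save_checkpoint(ckpt_path, step=1000, state={"lr": 3e-4})
resumed = load_checkpoint(ckpt_path)
print(resumed["step"])  # → 1000
```

The write-to-temp-then-rename step is the key design choice: if the process dies mid-write, the previous checkpoint is still intact on disk.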
Preferred Qualifications
- Experience with Transformer models, diffusion models, or other large-scale vision/NLP tasks.
- Knowledge of model optimization for inference (e.g., TorchScript, ONNX).
- Significant open-source contributions to PyTorch or related ML tooling.
- Familiarity with cluster environments: batch schedulers (SLURM) and GPU resource management.
- Experience collaborating with MLOps or systems engineering teams.
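For candidates less familiar with the cluster environments mentioned above, a typical multi-node training launch pairs a SLURM batch script with PyTorch's `torchrun` launcher. All resource numbers, names, and the `train.py` entry point below are purely illustrative.

```shell
#!/bin/bash
#SBATCH --job-name=train-example    # all values here are illustrative
#SBATCH --nodes=2
#SBATCH --gpus-per-node=8
#SBATCH --time=24:00:00

# Launch one training process per GPU on every allocated node
srun torchrun --nnodes="$SLURM_NNODES" --nproc_per_node=8 train.py
```

SLURM handles node allocation and scheduling, while `torchrun` handles per-GPU process spawning and the rendezvous needed for DDP/NCCL communication.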
Why Join This Opportunity?
- Collaborate with a world-class, high-impact research team.
- Take direct ownership of the core training code infrastructure used daily by the entire team.
- Work on real models, real data, and real-world scale—not "toy" problems.
- Be a key part of bridging the gap between rapid research velocity and high-quality engineering.
- A flexible work environment with a culture that values depth, clarity, and intellectual curiosity.
How to Apply
Interested candidates are invited to submit a resume detailing their experience training large-scale models and building research infrastructure.