Employment Information
Overview:
You will join our product team in a role that sits at the intersection of artificial intelligence research and real-world solutions. We foster a highly collaborative work culture: expect to work closely with your teammates and to communicate frequently across teams through practices such as pair and mob programming.
Your responsibilities:
- Model Inference: Focus on inference optimization to ensure rapid response times and efficient resource utilization during real-time model interactions.
- Hardware Optimization: Run models on a range of hardware platforms, from high-performance GPUs to edge devices, ensuring optimal compatibility and performance.
- Experimentation and Testing: Regularly run experiments, analyze outcomes, and refine your strategies to achieve peak performance across varying deployment scenarios.
- Research: Stay up to date with the current MLSys literature.
Your profile:
- You care about making something people want. You want to ship something that brings value to our users, and to deliver AI solutions end to end rather than stopping at a prototype.
- You hold a Bachelor's degree or higher in computer science or a related field.
- You understand how multimodal transformers work.
- You understand the characteristics of LLM inference (KV caching, flash attention, and model parallelization).
- You have hands-on experience with large language models or other complex AI architectures.
- You have experience in system design and optimization, particularly within AI or deep learning contexts.
- You are proficient in Python and have a deep understanding of deep learning frameworks such as PyTorch.
- You have a deep understanding of the challenges associated with scaling AI models for large user bases.
Nice if you have:
- Previous experience in a high-growth tech environment or in a role focused on scaling AI solutions.
- Expertise with CUDA and Triton programming and GPU optimization for neural network inference.
- Experience with Rust.
- Experience in adapting AI models to suit a range of hardware, including different accelerators.
- Experience in model quantization, pruning, and other neural network optimization methodologies.
- A track record of contributions to open-source projects (please provide links).
- Some Twitter presence discussing MLSys topics.
What you can expect from us:
- Become part of an AI revolution!
- 30 days of paid vacation
- Access to a variety of fitness & wellness offerings via Wellhub
- Mental health support through nilo.health
- Substantially subsidized company pension plan for your future security
- Subsidized Germany-wide transportation ticket
- Budget for additional technical equipment
- Flexible working hours and a hybrid working model for better work-life balance
- Virtual Stock Option Plan
- JobRad® Bike Lease