Job Description
We are seeking a software engineer to drive the implementation and
performance optimization of generative AI workloads on Intel GPUs as part of
the OpenVINO GPU team.
This role focuses on building high-performance, hardware-aware software that
enables efficient execution of AI models on current and future Intel GPU
architectures. You will work across multiple layers of the stack—AI models,
runtime systems, and GPU hardware—and take ownership of complex performance
problems that require deep technical insight and careful trade-off analysis.
You will work on state-of-the-art AI models that push the limits of GPU
performance. Your work directly impacts real-world AI performance experienced
by developers and customers.
About OpenVINO
OpenVINO (https://github.com/openvinotoolkit/openvino)
is a performance-focused AI inference runtime designed to efficiently execute
deep learning models across Intel architectures.
The GPU plugin is a core component of OpenVINO that bridges high-level
AI models and low-level GPU execution, covering areas such as graph
transformation, kernel dispatch, memory management, and hardware-specific
optimizations.
The codebase is performance-critical, largely written in modern C++, and
requires a strong understanding of system-level software design, debugging,
and optimization.
What You Will Do
- Take technical ownership of
performance-critical paths for generative AI workloads (e.g., LLMs,
diffusion models) on Intel GPUs
- Analyze end-to-end execution of AI models to
identify compute, memory, bandwidth, and parallelism bottlenecks
- Implement and optimize generative AI
techniques, adapting state-of-the-art ideas to efficiently run on Intel
GPU architectures
- Translate deep understanding of GPU hardware
architecture into efficient, scalable, and maintainable software designs
- Optimize workloads for both current and future
Intel GPU platforms, including hardware that is still under development
- Diagnose and resolve complex issues that span
runtime, kernel, driver, and hardware boundaries
- Collaborate with global teams across software,
hardware architecture, and validation to deliver optimized solutions
Required Qualifications
- Background in computer science, computer engineering, or a
related field, with 3+ years of professional software engineering
experience
- Strong programming skills in C and C++;
working experience with Python
- Experience working with large and complex C++
codebases, with attention to performance, correctness, and maintainability
- Proven analytical thinking and strong
problem-solving abilities, especially for ambiguous or under-specified
problems
Preferred Qualifications
- Experience with GPU programming or parallel
computing, such as multi-threading, SIMD, or accelerator programming
models
- Strong understanding of computer and GPU
architecture, and how hardware characteristics impact software performance
- Technical understanding of generative AI
models from a system and performance perspective
- Familiarity with AI runtimes or frameworks
- Solid foundation in computer science
fundamentals, including data structures, algorithms, and operating systems
- Ability to communicate technical ideas clearly
in written and spoken English
Work Model
- This role follows a structured hybrid work
model. The team regularly combines remote work and in-office
collaboration, with designated in-office days each week, while the
remaining days are remote.