Location: Singapore

Job Function: Technology Group

Job Type: Permanent

Req ID: 17138

About GIC
GIC is one of the world's largest sovereign wealth funds. With over 2,000 employees across 11 locations around the world, we invest in more than 40 countries globally across asset classes and businesses. Working at GIC gives you exposure to an extraordinary network of the world's industry leaders. As a leading global long-term investor, we work at the Point of Impact for Singapore's financial future, and the communities we invest in worldwide.

Technology Group
We experiment, design, and lead a 24×7 global business where we support core capabilities in asset management, trading, investment operations, and risk management. We deliver secure, reliable, and integrated solutions, and provide insights on new and emerging technologies.

Strategy, Architecture, and Transformation Group
The Strategy, Architecture & Transformation (SAT) group shapes and drives GIC’s technology strategy, ensuring alignment with business priorities and enterprise goals. Bringing together expertise in strategy, architecture, engineering, and transformation, the team strengthens governance, promotes consistency, and accelerates delivery across the Technology Group. Through modern practices and close collaboration, SAT leads the development of an architectural strategy that reinforces oversight and accountability while enabling reliable, scalable solutions and informed decision making across the Technology Group and, more broadly, across GIC.

AI Engineering
The AI Engineering team within SAT is driving GIC's transformation from AI-enabled to AI-native. We build and operate the foundational AI platform — gateway, agent runtime, agentic IAM, memory, observability, and more — so that every team across GIC can develop and deploy AI agents that are secure, observable, and production-grade.

What impact can you make in this role?

As a Senior AI Engineer on the core platform team, you will be the technical anchor for the most complex problems in the stack — how agents run, scale, fail, recover, and compose in production. You will own the architecture and technical direction of the agent runtime, orchestrating multi-step workflows, managing durable state, scaling agent workloads, and connecting agents to tools, data, and other agents.

You will make pragmatic technology choices — knowing when to adopt a framework, when to extend one, and when to build an abstraction layer that keeps the team from getting locked in. You are a platform engineer who enables dozens of teams to run hundreds of agents reliably, securely, and at enterprise scale.

- Own the agent runtime architecture: design and evolve the execution layer, including lifecycle management, workflow orchestration, durable execution (Temporal), state checkpointing, and compute scheduling
- Scale agentic workflows in production: solve the hard problems of running agents at scale, including long-running workflows, parallelism, delegation, backpressure, rate limiting, and graceful degradation under load
- Drive technical direction across the platform: lead architecture decisions for distributed-systems concerns such as state management, event-driven communication, context engineering, multi-agent coordination, and protocol selection (MCP, A2A)
- Build for resilience from day one: embed reliability patterns like circuit breakers, retries with jitter, dead-letter queues, idempotent operations, and chaos-testable interfaces
- Shape the agent framework strategy: evaluate, adopt, and extend frameworks (LangGraph, CrewAI, OpenAI Agents SDK, Semantic Kernel) with clear trade-offs
- Architect the memory and state layer: design boundaries between runtime state and durable agent memory to ensure agents can remember, resume, and improve across sessions
- Harden the platform as we build it: embed production-readiness from the first sprint, covering structured logging, distributed tracing, health checks, deployment safety, and operational runbooks
- Mentor and raise the engineering bar: set technical standards for code review, system design, testing strategy, and documentation; make the team better through reviews and pairing

What will you do as an AI Engineer?
You will design and evolve the agent runtime — the execution layer that orchestrates multi-step workflows, manages durable state, scales workloads, and connects agents to tools and other agents. You will work across all platform capabilities (gateway, IAM, memory, observability) because the runtime touches all of them.

- Lead architecture and technical direction for the agent runtime and related distributed systems.
- Operate agentic workflows at scale, ensuring reliability, fault tolerance, and performance under tight latency budgets.
- Collaborate with SRE and platform teams to embed resilience and observability patterns.
- Evaluate and extend agent frameworks and build abstraction layers for portability and stability.
- Design the memory and state layers for long-term agent learning and continuity.
- Contribute to the early-stage buildout of a new AI platform, embedding production-readiness from day one.
- Mentor engineers and uphold high standards for design, testing, and documentation.

What makes you a successful candidate?

Must Have:
- 8+ years in software or platform engineering, with at least 3 years building or operating AI/ML infrastructure or agentic systems at scale
- Deep distributed-systems expertise: hands-on experience with state management, consistency, fault tolerance, and concurrency
- Production agentic systems experience: operating multi-step workflows or autonomous systems with non-deterministic behaviour and complex failure modes
- Durable execution mastery: experience with workflow engines (Temporal preferred; Step Functions, Inngest, or equivalent also considered)
- Python mastery: building production-grade services (FastAPI / gRPC), shared libraries, and SDKs
- Agent framework depth: hands-on experience with LangGraph, CrewAI, AutoGen, Semantic Kernel, or OpenAI Agents SDK
- LLM integration at scale: experience with model routing, fallback strategies, structured output parsing, streaming, and provider failover
- System design leadership: producing architecture decisions, API contracts, and technical specs that withstand scrutiny
- Comfort with containers, orchestration, and IaC (Docker, Kubernetes/EKS, Terraform/CDK) and CI/CD pipelines (GitHub Actions, ArgoCD)
- Cloud platform experience (AWS preferred): EKS, Bedrock, SageMaker, Lambda, SQS/SNS, DynamoDB, ElastiCache

Nice to Have:
- Experience designing or operating MCP server fleets or registries at scale
- Background in agentic memory systems and context engineering
- Experience building multi-agent systems and understanding their failure modes
- Depth in event-driven architectures — event sourcing, CQRS, message brokers (Kafka, SQS/SNS, Redis Streams)
- Track record of framework evaluation and migration
- Experience with AI observability and evaluation tools (Arize, Langfuse, OpenTelemetry)
- Contributions to open-source AI/ML infrastructure or frameworks
- Working knowledge of Go or Rust for performance-critical components

Mindset & Working Style:
- Scale-first thinker — design for hundreds of agents from day one
- Pragmatic architect — make technology bets with clear trade-offs
- Operationally minded — design for failure and recovery
- Strong communicator — explain complex trade-offs clearly
- Force multiplier — mentor and elevate the team
- Builder at heart — thrive in early-stage environments defining foundational architecture

Work at the Point of Impact
We need to be forward-looking to attract the right people to help us become the Leading Global Long-term Investor. Join our ambitious, agile, and diverse teams, where you will be empowered to push boundaries and pursue innovative ideas, share your views, and be heard. Be anchored on our PRIME Values: Prudence, Respect, Integrity, Merit and Excellence, which guide us in how we make our day-to-day decisions. We strive to inspire. To make an impact.

Flexibility at GIC
At GIC, our offices are vibrant hubs for ideation, professional growth, and interpersonal connection. At the same time, we believe that flexibility allows us to do our best work and be our best selves. Thus, our teams come into the office four days per week to harness the benefits of in-person collaboration, but have the flexibility to choose which days they work from home and adjust this arrangement as situational needs arise.

GIC is an equal opportunity employer
GIC is an equal opportunity employer, and we value diversity. We do not discriminate based on race, religion, color, national origin, sex, gender, gender expression, sexual orientation, age, marital status, veteran status, or disability status. We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment.

Learn more about our Technology Group here:
https://gic.careers/group/technology-group/

Career Opportunities: AVP/VP, AI Engineer, Technology Group (17138)
