On behalf of Atlas Invest, SD Solutions is looking for a talented, research-oriented Senior Data Engineer / Data-Focused Backend Developer who can take a feature idea from concept through research, data validation, modeling approach, and full implementation. You will play a key role in designing, developing, and maintaining our core services, with a focus on performance, reliability, and scalability.
SD Solutions is a staffing company operating globally. Contact us to get more details about the benefits we offer.
As a Data-Focused Backend Developer, you will own the full arc from idea to impact. "End-to-end" here isn't just a buzzword: it means you translate abstract problems into testable hypotheses, and it means the same person who reads a paper on hybrid document classification prototypes it in a notebook, evaluates it with DSPy metrics, wires it into a LangGraph node, and deploys it into our production Python/TypeScript monorepo.
You will bridge the gap between abstract research and concrete engineering. You won't stop at a "notebook win" or building isolated models; you will build the pipelines, FastAPI services, and TypeScript integrations that serve them to the real world, ensuring reliability and measurable business value. We are looking for an engineer who moves seamlessly across the boundary between high-level AI orchestration and low-level system reliability.
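To make that final step concrete, here is a minimal sketch of turning a notebook prototype into a typed service. It is an illustration under assumptions, not our actual API: classify_document, the /classify endpoint, and the response fields are hypothetical stand-ins.

```python
# Minimal sketch: promoting a prototyped classifier into a typed FastAPI service.
# Hypothetical names throughout: classify_document, /classify, and the fields
# below are illustrative stand-ins, not the real Atlas service.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ClassifyRequest(BaseModel):
    document_text: str

class ClassifyResponse(BaseModel):
    doc_type: str
    confidence: float

def classify_document(text: str) -> tuple[str, float]:
    # Placeholder for the real model call (e.g., a DSPy module optimized offline).
    return ("rent_roll", 0.93)

@app.post("/classify", response_model=ClassifyResponse)
def classify(req: ClassifyRequest) -> ClassifyResponse:
    doc_type, confidence = classify_document(req.document_text)
    return ClassifyResponse(doc_type=doc_type, confidence=confidence)
```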
Your First 90 Days
Month 1: Codebase Mastery & First Shipped Wins
- Get fully onboarded: run the monorepo locally and trace a live data request through our core AI and data services within your first few days.
- Ship your first pipeline improvement to production (e.g., an extraction fix or a schema normalization) by the end of Week 1.
- Reproduce a notebook experiment, publish a short gap analysis, and transition your first DSPy or LangGraph prototype into a tested FastAPI service.
Month 2: Pipeline Ownership & The Research Flywheel
- Take end-to-end ownership of a complex pipeline component (like due diligence intelligence or multi-source data fusion).
- Deliver a new evaluation harness tied to a live pipeline, and immediately use it to measure and drive a real-world performance increase.
- Productionize a research-driven upgrade (like a new DSPy optimizer strategy) with clear before/after metrics.
Month 3: Architecture & Scale
- Lead the architecture of a next-generation research initiative (e.g., advanced GraphRAG or a new autonomous diligence agent) from abstract idea to production deployment.
- Define a repeatable “research-to-release” playbook for your domain and accelerate its adoption, setting the standard for how we bridge AI research and production engineering.
What You Will Own
- AI Extraction Pipelines: Design and ship improvements to the OCR → Classify → Extract pipeline (using PaddleOCR, LangGraph, DSPy) to reduce extraction error and latency for complex document types like T12 financials, rent rolls, and appraisals.
- Scale Data Normalization: Expand our property data aggregation layer. You will pull data from various top-tier real estate and demographic APIs, optimizing schema normalization and conflict resolution to unify external datasets with our internal systems.
- Strengthen Automated Risk Engines: Improve the underlying engine to generate smarter, cleaner, and higher-quality risk assessments.
- Optimize Property Intelligence Pipelines: Enhance automated data enrichment to deliver instantaneous, actionable insights on asset-specific attributes and external risk factors.
- External Provider Resilience: Expand and maintain our TypeScript-based provider ecosystem, ensuring reliability against third-party outages via robust caching, retries, and observability.
- Drive the Research Flywheel: Conduct systematic gap analyses using custom evaluation suites (accuracy, precision/recall) on current modules. You will identify the next 2-3 bottlenecks, feed them back into the engineering loop, and implement academic approaches (e.g., state-of-the-art chunking, multi-step RLM reasoning) to continuously boost precision and recall.
- Orchestrate Agentic Workflows: Use LangGraph to build complex, fault-tolerant state machines that connect our document classification, OCR, and schema extraction modules.
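To give that last bullet some shape, here is a minimal LangGraph sketch of such a state machine. Only the graph-wiring API is real LangGraph; the node bodies are hypothetical placeholders for the actual classifier, PaddleOCR call, and schema extractor.

```python
# Minimal sketch of a classify -> OCR -> extract pipeline as a LangGraph graph.
# Node bodies are hypothetical placeholders; real nodes would call the
# classifier, PaddleOCR, and a schema extractor, plus retry/error handling.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class DocState(TypedDict):
    raw_bytes: bytes
    doc_type: str
    text: str
    fields: dict

def classify(state: DocState) -> dict:
    return {"doc_type": "rent_roll"}           # placeholder decision

def run_ocr(state: DocState) -> dict:
    return {"text": "Unit 1A  $1,450/mo ..."}  # placeholder OCR output

def extract(state: DocState) -> dict:
    return {"fields": {"units": 42}}           # placeholder schema extraction

builder = StateGraph(DocState)
builder.add_node("classify", classify)
builder.add_node("ocr", run_ocr)
builder.add_node("extract", extract)
builder.add_edge(START, "classify")
builder.add_edge("classify", "ocr")
builder.add_edge("ocr", "extract")
builder.add_edge("extract", END)

graph = builder.compile()
result = graph.invoke({"raw_bytes": b"", "doc_type": "", "text": "", "fields": {}})
```

In production the linear edges would become conditional edges (routing by doc_type, retry-on-failure branches), which is exactly where LangGraph's state-machine model earns its keep.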
What hard skills do we need?
Note: We don't expect you to have every single skill listed below; that's nearly impossible. We value equivalent skills and a proven ability to learn fast, especially when it comes to specific technologies like DSPy or Neo4j Cypher.
- Languages: Python 3.12+ (FastAPI/Pydantic), TypeScript (strict mode/Zod), SQL/Cypher, and the newest programming language: English.
- AI/ML/LLM Systems: Prompts/DSPy optimization, LangGraph orchestration, vector retrieval (Weaviate, Elastic, or alternatives), prompt/eval loops, and multi-model integrations (OpenAI, Gemini, vLLM).
- Data & Graphs: Neo4j modeling, schema design, multi-source data fusion, and ORMs (SQLAlchemy, Prisma, or Drizzle are an advantage).
- Document Intelligence: Working with pre-implemented OCR pipelines, document parsing, and classification under noisy, real-world inputs/files/tables.
- Production Engineering: Monorepo tooling, Docker/Docker Compose, message queues (RabbitMQ or others), and observability (tracing, structured logging).
- Experimentation: Comfortable in Jupyter Notebooks for rapid prototyping, benchmark/evaluation harnesses, reproducible experiments, and A/B metric tracking.
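As a taste of what "prompt/eval loops" and "evaluation harnesses" mean here day to day, below is a minimal DSPy sketch. The model name, dev examples, and metric are assumptions made for illustration; dspy.LM, dspy.Predict, dspy.Example, and dspy.Evaluate are standard DSPy APIs.

```python
# Minimal DSPy evaluation-harness sketch (assumes an OpenAI key is configured;
# the model name and the two dev examples are illustrative assumptions).
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# A bare classifier program; in practice this is the module under study.
classify = dspy.Predict("document_text -> doc_type")

devset = [
    dspy.Example(document_text="Unit 1A  monthly rent $1,450  lease ends 06/30",
                 doc_type="rent_roll").with_inputs("document_text"),
    dspy.Example(document_text="Trailing twelve month income statement, NOI ...",
                 doc_type="t12").with_inputs("document_text"),
]

def exact_match(example, prediction, trace=None):
    # Simple accuracy metric; swap in precision/recall for richer harnesses.
    return example.doc_type == prediction.doc_type

evaluate = dspy.Evaluate(devset=devset, metric=exact_match, display_progress=True)
baseline = evaluate(classify)  # score the program before any optimizer run
```

The same harness then measures a DSPy optimizer's before/after delta, which is the research flywheel described above.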
Core Responsibilities:
- Identify and onboard new data sources
- Perform data comparisons & validation
- Assess data quality and usability
- Define the modeling approach
- Implement and productionize solutions
- Work independently with minimal structure
Team X @ Atlas: Mission & Culture
Atlas Invest’s Team X is building the intelligence layer for real estate. We ingest, normalize, and reason over the messiest data in one of the world's largest asset classes: property records scattered across multiple external providers, complex ownership networks buried in public filings, and financial details locked inside massive, unstructured rent rolls and appraisals.
Team X is a diverse, high-performing squad of engineers and researchers within Atlas. We value ownership, velocity, and craftsmanship. We ship from a polyglot monorepo and treat the boundary between research and production as a feature, not friction. You will join a culture where people are trusted to run with ambiguity, publish Jupyter experiments on Monday, and deploy those results to production by Friday.
About the company:
Atlas Invest is transforming the bridge-loan landscape, seamlessly connecting investors with real estate developers and using advanced big-data analytics to deliver a personalized investment experience.
By applying for this position, you agree to the terms outlined in our Privacy Policy. Please take a moment to review our Privacy Policy (https://sd-solutions.breezy.hr/privacy-notice) and make sure you understand its contents. If you have any questions or concerns regarding our Privacy Policy, please feel free to contact us.