
Data Engineer

Key Facts

Full time
Senior (5-10 years)
English

Other Skills

  • Communication
  • Teamwork
  • Problem Solving

Roles & Responsibilities

  • Design, develop, and maintain robust ETL/ELT pipelines to ingest, transform, and deliver data across enterprise platforms, including XBRL filings and financial datasets; ensure reliability, performance, and cloud scalability; leverage AI-assisted development tools
  • Develop and manage data solutions leveraging AWS services and Apache Iceberg; optimize lakehouse architectures and data storage for performance and cost; enable AI/ML workloads and downstream generative AI use cases
  • Design and implement CI/CD pipelines for data pipelines, infrastructure, and analytics code; automate build, test, and deployment; enforce DataOps governance and integrate AI-driven testing and monitoring
  • Apply context engineering and metadata governance to enrich data with metadata, lineage, and business context; support data modeling, schema design, and governance integration with data catalogs

Requirements:

  • Bachelor’s degree in Computer Science, Engineering, Data Science, or a related field
  • 5+ years of experience in data engineering, ETL development, or data platform engineering
  • Strong hands-on experience with ETL/ELT tools and AWS data services (S3, Glue, Lambda, Redshift) and Apache Iceberg/modern data lake architectures
  • Experience designing and implementing CI/CD pipelines for data platforms and ETL workflows, and proficiency with AI-assisted development workflows

Job description

Anika Systems is seeking a skilled Data Engineer to design, build, and optimize scalable data pipelines and platforms supporting federal clients. This role will play a critical part in enabling enterprise data strategies, supporting Office of the Chief Data Officer (OCDO) initiatives, and delivering high-quality, trusted data for analytics, reporting, and mission operations.

This opportunity is 100% remote. 


The ideal candidate has hands-on experience with ETL/ELT pipelines, XBRL data processing, Apache Iceberg-based architectures, and advanced data optimization techniques such as materialized views and context-aware data engineering. This role also requires proficiency in AI tools and AI-assisted development workflows, along with experience building and deploying CI/CD pipelines for data and analytics platforms.


Key Responsibilities
Data Pipeline Development & ETL/ELT
  • Design, develop, and maintain robust ETL/ELT pipelines to ingest, transform, and deliver data across enterprise platforms.
  • Build scalable data ingestion frameworks for structured and semi-structured data, including XBRL filings and financial datasets.
  • Implement data transformation logic to support analytics, reporting, and regulatory use cases.
  • Ensure data pipelines are reliable, performant, and scalable in cloud environments.
  • Leverage AI-assisted development tools to accelerate pipeline development, testing, and optimization.
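The ETL/ELT duties above can be sketched as a minimal extract–transform–load flow. All names and data here are illustrative, not part of the posting; a production pipeline would read from real sources (S3, APIs) and write to a warehouse such as Redshift.

```python
# Minimal ETL sketch: extract raw records, normalize them, load to a target.
# The records and the in-memory "warehouse" are illustrative placeholders.

def extract():
    """Pretend source: semi-structured filing records as strings."""
    return [
        {"ticker": "ABC", "revenue": "1200.50", "period": "FY2023"},
        {"ticker": "XYZ", "revenue": "980.00", "period": "FY2023"},
    ]

def transform(rows):
    """Normalize types so downstream analytics get consistent schemas."""
    out = []
    for row in rows:
        out.append({
            "ticker": row["ticker"],
            "revenue_usd": float(row["revenue"]),
            "period": row["period"],
        })
    return out

def load(rows, target):
    """Append transformed rows to the target store (a list, for this sketch)."""
    target.extend(rows)
    return len(rows)

warehouse = []
loaded = load(transform(extract()), warehouse)
```

Keeping each stage a pure function like this is what makes the automated testing described later in the posting practical.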
Cloud Data Platforms & Iceberg Architecture
  • Develop and manage data solutions leveraging AWS services (e.g., S3, Glue, Lambda, Redshift), with workflow orchestration via Apache Airflow DAGs.
  • Implement and optimize Apache Iceberg table formats for large-scale, ACID-compliant data lakes.
  • Support lakehouse architectures that unify data lakes and data warehouses.
  • Optimize data storage and retrieval strategies for performance and cost efficiency.
  • Enable data platforms that support AI/ML workloads and downstream generative AI use cases.
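One concrete way the Iceberg work above surfaces in practice is table DDL with partition transforms. This sketch only builds a Spark SQL `CREATE TABLE ... USING iceberg` statement as a string; the table and column names are invented for illustration.

```python
def iceberg_create_table_ddl(table, columns, partition_by):
    """Build a Spark SQL CREATE TABLE statement for an Apache Iceberg table.

    `columns` maps column name -> type; `partition_by` lists Iceberg
    partition transforms (e.g. "days(filed_at)"). All names are examples.
    """
    cols = ",\n  ".join(f"{name} {ctype}" for name, ctype in columns.items())
    parts = ", ".join(partition_by)
    return (
        f"CREATE TABLE {table} (\n  {cols}\n)\n"
        f"USING iceberg\n"
        f"PARTITIONED BY ({parts})"
    )

ddl = iceberg_create_table_ddl(
    "lake.filings",
    {"filing_id": "string", "filed_at": "timestamp", "revenue": "decimal(18,2)"},
    ["days(filed_at)"],
)
```

Partitioning by a transform of the timestamp (rather than a separate date column) is one of the Iceberg features that distinguishes it from Hive-style layouts.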
CI/CD & DataOps Engineering
  • Design and implement CI/CD pipelines for data pipelines, infrastructure, and analytics code using tools such as GitHub Actions, GitLab CI, Jenkins, or AWS-native services.
  • Automate build, test, and deployment processes for ETL pipelines and data platform components.
  • Implement DataOps best practices, including version control, automated testing, environment promotion, and rollback strategies.
  • Ensure reproducibility, reliability, and governance of data pipeline deployments across environments.
  • Integrate AI-driven testing and monitoring tools to improve pipeline quality and reduce operational risk.
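The automated-testing bullet above usually reduces to small quality gates a CI job (GitHub Actions, GitLab CI, etc.) runs before promoting a pipeline. The specific rules below — non-empty IDs, non-negative revenue — are illustrative assumptions, not rules from the posting.

```python
def validate_batch(rows):
    """Data-quality gate a CI job could run before deploying a pipeline change.

    Returns a list of human-readable failures; an empty list means the
    batch passes and the promotion can proceed.
    """
    failures = []
    for i, row in enumerate(rows):
        if not row.get("filing_id"):
            failures.append(f"row {i}: missing filing_id")
        if row.get("revenue_usd", 0) < 0:
            failures.append(f"row {i}: negative revenue")
    return failures

good = [{"filing_id": "F-1", "revenue_usd": 10.0}]
bad = [{"filing_id": "", "revenue_usd": -5.0}]
```

Returning failures rather than raising on the first one gives the CI log a complete picture of what broke, which supports the rollback strategies mentioned above.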
Data Optimization & Performance Engineering
  • Design and implement materialized views and other performance optimization techniques to improve query efficiency.
  • Tune data pipelines and queries for performance, scalability, and cost.
  • Implement partitioning, indexing, and caching strategies aligned to workload patterns.
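The materialized-view technique above can be sketched with SQLite, which has no native materialized views but makes the underlying pattern visible: precompute an aggregate into a summary table and refresh it on a schedule, so readers scan the small summary instead of the full fact table. The schema and data are illustrative.

```python
import sqlite3

# Illustrative fact table of filings; real data would live in Redshift
# or an Iceberg table, but the refresh pattern is the same.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE filings (ticker TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO filings VALUES (?, ?)",
    [("ABC", 100.0), ("ABC", 50.0), ("XYZ", 75.0)],
)

def refresh_revenue_summary(conn):
    """Rebuild the 'materialized' summary table from the fact table."""
    conn.execute("DROP TABLE IF EXISTS revenue_summary")
    conn.execute(
        "CREATE TABLE revenue_summary AS "
        "SELECT ticker, SUM(revenue) AS total_revenue "
        "FROM filings GROUP BY ticker"
    )

refresh_revenue_summary(conn)
rows = dict(conn.execute("SELECT ticker, total_revenue FROM revenue_summary"))
```

The trade-off being tuned here is staleness versus query cost: the summary is only as fresh as its last refresh, but every read of it avoids a full aggregation.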
XBRL & Financial Data Processing
  • Develop pipelines to ingest, parse, and normalize XBRL (eXtensible Business Reporting Language) data.
  • Support regulatory and financial data use cases requiring high accuracy and traceability.
  • Ensure alignment with data standards and validation rules for financial reporting datasets.
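At its core, the XBRL ingestion described above is namespaced-XML parsing: facts are elements carrying a `contextRef` attribute. This toy instance document and the concepts in it are invented for illustration; a real pipeline would also resolve contexts, units, and taxonomy links.

```python
import xml.etree.ElementTree as ET

# Toy XBRL instance fragment; real filings are far larger, but the shape
# (namespaced fact elements with contextRef/unitRef attributes) is the same.
XBRL_DOC = """\
<xbrl xmlns="http://www.xbrl.org/2003/instance"
      xmlns:us-gaap="http://fasb.org/us-gaap/2023">
  <us-gaap:Revenues contextRef="FY2023" unitRef="usd" decimals="0">500000</us-gaap:Revenues>
  <us-gaap:Assets contextRef="FY2023" unitRef="usd" decimals="0">1250000</us-gaap:Assets>
</xbrl>"""

def extract_facts(doc):
    """Pull concept -> (context, value) pairs from an XBRL instance.

    Only collects elements that carry a contextRef; context and unit
    resolution are left out of this sketch.
    """
    facts = {}
    root = ET.fromstring(doc)
    for el in root.iter():
        ctx = el.get("contextRef")
        if ctx is not None:
            concept = el.tag.split("}")[-1]  # strip the namespace URI prefix
            facts[concept] = (ctx, float(el.text))
    return facts

facts = extract_facts(XBRL_DOC)
```

Keeping the context identifier attached to every fact is what preserves the traceability the regulatory use cases above require.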
Context Engineering & Data Modeling Support
  • Apply context engineering principles to ensure data is enriched with meaningful metadata, lineage, and business context.
  • Collaborate with Data Architects to support data modeling, schema design, and entity relationships.
  • Enable downstream analytics and AI use cases by structuring data for usability, discoverability, and governance.
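The context-engineering idea above — enriching data with business meaning — can be sketched as pairing a physical schema with descriptions and ownership, and flagging gaps. The field names, owner, and placeholder markers are all assumptions for illustration.

```python
def enrich_schema(columns, business_context):
    """Merge physical columns with business metadata, flagging missing context.

    `columns` maps column name -> type; `business_context` maps column
    name -> metadata dict. Explicit MISSING/UNASSIGNED markers make
    governance gaps easy to report on.
    """
    enriched = []
    for name, dtype in columns.items():
        ctx = business_context.get(name, {})
        enriched.append({
            "column": name,
            "type": dtype,
            "description": ctx.get("description", "MISSING"),
            "owner": ctx.get("owner", "UNASSIGNED"),
        })
    return enriched

schema = enrich_schema(
    {"filing_id": "string", "revenue_usd": "double"},
    {"filing_id": {"description": "Unique filing identifier", "owner": "OCDO"}},
)
```

A report of MISSING descriptions is often the first deliverable of a metadata-enrichment effort, since it quantifies how discoverable the data actually is.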
Metadata, Data Catalog, and Governance Integration
  • Integrate pipelines with enterprise data catalogs and metadata management systems.
  • Support automated metadata capture, lineage tracking, and data quality monitoring.
  • Ensure alignment with data governance frameworks and standards established by OCDO organizations, including AI data readiness and traceability.
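The automated lineage capture above can be sketched as a decorator that records what each pipeline step reads and produces. Here the records land in an in-memory log; a real system would push them to an enterprise catalog. The dataset and step names are illustrative.

```python
import functools

# In-memory stand-in for a catalog's lineage store.
LINEAGE_LOG = []

def track_lineage(inputs, output):
    """Decorator recording dataset-level lineage for a pipeline step."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            LINEAGE_LOG.append({
                "step": fn.__name__,
                "inputs": list(inputs),
                "output": output,
            })
            return result
        return wrapper
    return decorator

@track_lineage(inputs=["raw.filings"], output="curated.filings")
def curate_filings():
    # Placeholder body; a real step would transform and write data.
    return "curated"

curate_filings()
```

Because lineage is captured as a side effect of running the step, it cannot drift out of date the way hand-maintained lineage documentation does.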
Stakeholder Collaboration & Agile Delivery
  • Collaborate with data architects, analysts, and business stakeholders to understand data needs and deliver solutions.
  • Participate in stakeholder listening campaigns, workshops, and data discovery efforts.
  • Work in Agile teams to iteratively deliver data capabilities and enhancements.
  • Contribute to identifying and implementing AI-driven efficiencies and automation opportunities across the data lifecycle.
Required Qualifications
  • Bachelor’s degree in Computer Science, Engineering, Data Science, or related field.
  • 5+ years of experience in data engineering, ETL development, or data platform engineering.
  • Strong hands-on experience with:
    • ETL/ELT tools and frameworks
    • AWS data services (S3, Glue, Lambda, Redshift, etc.)
    • Apache Iceberg and modern data lake architectures
  • Experience designing and implementing CI/CD pipelines for data platforms and ETL workflows.
  • Demonstrated proficiency using AI tools and AI-assisted development workflows (e.g., LLM copilots, automated code generation, pipeline optimization tools).
  • Experience processing XBRL or complex financial/regulatory datasets.
  • Proficiency in SQL and Python.
  • Experience implementing materialized views and query optimization techniques.
  • Understanding of data modeling concepts and metadata management.
  • Familiarity with data governance, data quality practices, and data readiness for AI/ML use cases.
  • Ability to work in Agile, DevOps-oriented environments.
  • U.S. Citizenship required; ability to obtain and maintain a federal clearance.
Preferred Qualifications
  • Experience supporting federal agencies such as SEC, DHS, Treasury, or Federal Reserve System.
  • Familiarity with data catalog tools (e.g., Collibra, Alation, ServiceNow).
  • Experience with Apache Spark, Kafka, or other distributed data processing frameworks.
  • Experience enabling data pipelines for AI/ML or generative AI applications.
  • Knowledge of data maturity frameworks (e.g., EDM DCAM, TDWI).
  • Exposure to context engineering or semantic data layer design.
  • AWS or data engineering certifications.
  • Experience with infrastructure-as-code (IaC) tools (e.g., Terraform, CloudFormation) in support of CI/CD pipelines.
