Job Description
· Engineer data solutions in support of Sustainability reporting and analytics initiatives.
· Engage with the product owner, analysts, visualization developers, and business partners to understand capability requirements, and to develop and support data solutions based on product backlog priorities.
Responsibilities
General Purpose Python Programming:
· Python has been your primary coding language (daily use) for at least 3 years.
· You have authored distributable Python packages (packages which can be built, installed, and distributed using setuptools, pip, and twine).
· You have a solid understanding of how pip dependency resolution works.
· You are proficient in authoring and automating unit and integration tests for Python packages using (minimally) unittest, pytest, and tox.
· You are meticulous about code quality, including readability. You know the PEP 8 guidelines inside and out, and you can author code which passes validation by commonly used static analysis tools, including mypy and flake8.
Database Design and SQL
· You are proficient in authoring readable, well-structured SQL SELECT statements using ISO/ANSI-standard SQL.
· You have hands-on professional experience in data warehouse design and modeling, including authoring DDL statements.
Version Control and CI/CD
· You have experience with trunk-based development (short-lived feature branches) using Git for version control, with fully automated deployments (CI/CD).
Required Skillsets
General Purpose Python Programming:
· You have a deep understanding of Python’s standard library and Python internals. You understand Python memory management, how CPython implements built-in data structures, and which data structures are best suited for different scenarios.
· You understand and can compare and contrast CPython’s built-in concurrency models, including when to use each and what obstacles might prevent the use of each mechanism.
Database Design, SQL, and Object Relational Models:
· You are adept at performance-tuning SQL queries for both OLAP and OLTP databases.
· You understand and are prepared to discuss how, when, and where to use more esoteric and/or modern SQL features such as window functions and common table expressions.
· You understand and are prepared to discuss the performance implications of columnar vs. row-oriented databases.
· You have firsthand experience in managing database schema migrations (ideally using SQLAlchemy’s ORM + Alembic).
Cloud Infrastructure and Amazon Web Services
· You have firsthand experience using boto3 to interact with Amazon Web Services’ resource APIs, particularly Amazon S3 (Simple Storage Service).
· You have hands-on experience authoring unit and integration tests utilizing LocalStack to emulate AWS resources.
· You have firsthand experience using HashiCorp Terraform to manage cloud infrastructure.
· You have firsthand experience developing serverless ASGI applications using AWS Lambda and Amazon API Gateway.
Web API Server and Client Development:
· You have experience planning and executing the design and development of web APIs using a modern Python ASGI framework (preferably FastAPI).
· You have authored, validated, and maintained OpenAPI documents describing your web APIs accurately.
· You have experience developing and testing Python web API client libraries based on an OpenAPI document.
Distributed Computing and Apache Spark
· You have experience using Apache Spark for ingestion and manipulation of data sets which are too large to process efficiently in-memory.
· You have firsthand experience translating algorithms and procedures designed by subject matter experts, who have varying levels of engineering experience, into well-designed data pipelines.
· You have experience configuring and tuning Spark clusters to optimize use of computing resources for varying workloads.
· You understand and can discuss when and why to use distributed computing frameworks, such as Apache Spark, versus alternate concurrency models such as asyncio or multiprocessing.
Database Design, SQL, and Object Relational Models
· You have experience modeling databases using SQLAlchemy’s ORM framework.
Required Soft Skills
· You are proficient in communicating effectively and efficiently within a hybrid remote/in-person team structure:
  · You are meticulous about managing your calendar to accurately reflect your free/busy hours.
  · You respect and seek to learn digital communications etiquette, including region-specific, industry-specific, and organization-specific etiquette.
  · You proactively initiate constructive discussions while curating and targeting your communications with respect for your colleagues’ time and schedules.
· You are adept at discovering and navigating the complex bureaucratic resources of a large organization.
Top 3 Skills
· Python
· SQL
· Spark