This is a remote position.
Key Responsibilities:
Database Management:
Design, implement, and maintain scalable relational and non-relational databases.
Optimize database performance to keep latency low and availability high.
Develop and manage database schemas, indexes, and partitions for optimal query performance.
Monitor and troubleshoot database issues to maintain system integrity.
Cloud Storage Solutions:
Manage and optimize cloud-based storage systems (e.g., Amazon S3, Azure Blob Storage, Google Cloud Storage).
Ensure data storage systems are secure, cost-efficient, and compliant with organizational policies.
Implement lifecycle policies for efficient cloud storage management and cost control.
Data Pipeline Development:
Design and build ETL/ELT pipelines to move data between systems efficiently and reliably.
Automate data ingestion, transformation, and loading processes.
Handle batch and real-time data processing using tools like Apache Airflow, Apache Kafka, or AWS Glue.
Data Governance & Security:
Implement and enforce data governance best practices, ensuring data quality and consistency.
Apply encryption and access control measures to protect sensitive data in transit and at rest.
Ensure compliance with data regulations such as GDPR, CCPA, or HIPAA.
Collaboration:
Work closely with data scientists, analysts, and software engineers to understand data requirements and support their workflows.
Support DevOps teams with database-related CI/CD processes and cloud infrastructure setup.
Documentation & Monitoring:
Create and maintain documentation for data architecture, pipelines, and storage systems.
Implement monitoring tools to track system performance, storage utilization, and data pipeline health.
Required Skills and Qualifications:
Experience:
3+ years of experience in data engineering, with a focus on databases and cloud storage.
Technical Expertise:
Proficiency in SQL and experience with relational database systems like PostgreSQL, MySQL, or Microsoft SQL Server.
Knowledge of NoSQL databases like MongoDB, Cassandra, or DynamoDB.
Hands-on experience with cloud platforms (AWS, Azure, or GCP) and cloud storage systems.
Familiarity with data warehousing solutions like Snowflake, BigQuery, or Redshift.
Strong programming skills in Python, Java, or Scala.
Experience with data pipeline tools such as Apache Airflow, AWS Glue, or Apache NiFi.
Experience with streaming data platforms like Apache Kafka or Amazon Kinesis.
Knowledge of file formats like Parquet, ORC, Avro, and JSON.
Experience with version control (e.g., Git) and with CI/CD pipelines.
Understanding of data modeling and normalization techniques.