Job Description
This is a remote position.
Requirements
Strong experience as an AWS Data Engineer; AWS Databricks experience is a must.
Expert proficiency in Spark Scala, Python, and PySpark is a plus
Must have data migration experience from on-premises to cloud
Hands-on experience with Kinesis for processing and analyzing streaming data, and with AWS DynamoDB
In-depth understanding of AWS cloud, AWS data lake, and analytics solutions.
Expert-level hands-on experience designing and developing applications on Databricks, Databricks Workflows, AWS Managed Airflow, and Apache Airflow is required.
Extensive hands-on experience implementing data migration and data processing using AWS services and related tools: VPC/SG, EC2, S3, Auto Scaling, CloudFormation, Lake Formation, DMS, Kinesis, Kafka, NiFi, CDC processing, EMR, Redshift, Athena, Snowflake, RDS, Aurora, Neptune, DynamoDB, CloudTrail, CloudWatch, Docker, Lambda, Spark, Glue, SageMaker, AI/ML, API Gateway, etc.
Hands-on experience with the technology stack available in the industry for data management, data ingestion, capture, processing, and curation: Kafka, StreamSets, Attunity, GoldenGate, MapReduce, Hadoop, Hive, HBase, Cassandra, Spark, Flume, Impala, etc.
Knowledge of different programming and scripting languages
Good working knowledge of code versioning tools (such as Git, Bitbucket, or SVN)
Hands-on experience using Spark SQL with various data sources such as JSON, Parquet, and key-value pairs
Experience preparing data for Data Science and Machine Learning.
Experience preparing data for use in SageMaker and AWS Databricks.
Demonstrated experience preparing data, automating, and building data pipelines for AI use cases (text, voice, image, IoT data, etc.).
Good to have programming language experience with .NET or Spark/Scala
Experience in creating tables, partitioning, bucketing, loading and aggregating data using Spark Scala, Spark SQL/PySpark
Knowledge of AWS/Azure DevOps processes like CI/CD as well as Agile tools and processes including Git, Jenkins, Jira, and Confluence
Working experience with Visual Studio, PowerShell Scripting, and ARM templates.
Strong understanding of data modeling and of defining conceptual, logical, and physical data models.
Experience with big data, analytics, information analysis, and database management in the cloud
Experience with IoT, event-driven, and microservices architectures in the cloud
Experience with private and public cloud architectures, their pros and cons, and migration considerations.
Ability to remain up to date with industry standards and technological advancements that will enhance data quality and reliability to advance strategic initiatives
Basic experience with or knowledge of agile methodologies
Working knowledge of RESTful APIs, OAuth2 authorization framework and security best practices for API Gateways
Responsibilities:
Work closely with team members to lead and drive enterprise solutions, advising on key decision points on trade-offs, best practices, and risk mitigation
Manage data related requests, analyze issues, and provide efficient resolution. Design all program specifications and perform required tests
Design and develop the data ingestion layer using Glue, AWS Managed Airflow, and Apache Airflow, and the processing layer using Databricks.
Work with the SMEs to implement data strategies and build data flows.
Prepare code for all modules according to the required specifications.
Monitor all production issues and inquiries and provide efficient resolution.
Evaluate all functional requirements, map documents, and troubleshoot all development processes
Document all technical specifications and associated project deliverables.
Design all test cases to provide support to all systems and perform unit tests.
Qualifications:
2+ years of hands-on experience designing and implementing multi-tenant solutions using AWS Databricks for data governance, data pipelines for near real-time data warehouse, and machine learning solutions.
5+ years of experience in a software development, data engineering, or data analytics field using Python, PySpark, Scala, Spark, Java, or equivalent technologies.
Bachelor's or Master's degree in Big Data, Computer Science, Engineering, Mathematics, or a similar area of study, or equivalent work experience
Strong written and verbal communication skills
Ability to manage competing priorities in a fast-paced environment
Ability to resolve issues
Self-motivated, with the ability to work independently
Nice to have:
- AWS Certified: Solutions Architect Professional
- Databricks Certified Associate Developer for Apache Spark
Salary
0 - 3000000 INR (Per Year)