Job description
Job Name: Data Engineer - REMOTE
Job Type: Contract
Job Authorization: US Citizen / GC / EAD (H4/L2/TN) preferred. No 3rd parties. C2C resumes accepted.
Responsibilities:
Develop and automate large-scale, high-performance data processing systems (batch and/or streaming) to drive Airbnb business growth and improve the product experience
Build scalable Spark data pipelines leveraging the Airflow scheduler/executor framework
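To make the responsibilities above concrete, here is a minimal sketch of the extract/transform/load shape such a pipeline takes. It is plain Python for illustration only; in production the transform step would be a Spark (Scala) job reading from S3/Hive and the schedule an Airflow DAG, and the record fields used here (`listing_id`, `nightly_rate`) are hypothetical.

```python
# Hypothetical raw partition: one malformed row and one duplicate,
# the kinds of inconsistencies a production pipeline must handle.
RAW_EVENTS = [
    {"listing_id": 1, "nightly_rate": 120.0},
    {"listing_id": 2, "nightly_rate": 95.0},
    {"listing_id": 2, "nightly_rate": 95.0},   # duplicate record
    {"listing_id": 3, "nightly_rate": None},   # inconsistent: missing rate
]

def extract():
    """Stand-in for reading a partition from S3/Hive."""
    return list(RAW_EVENTS)

def transform(rows):
    """Drop malformed rows and deduplicate on listing_id."""
    seen, clean = set(), []
    for row in rows:
        if row["nightly_rate"] is None:
            continue
        if row["listing_id"] in seen:
            continue
        seen.add(row["listing_id"])
        clean.append(row)
    return clean

def load(rows, warehouse):
    """Stand-in for writing the cleaned partition to the warehouse."""
    warehouse.extend(rows)
    return len(rows)

warehouse = []
loaded = load(transform(extract()), warehouse)
```

The same three-stage structure maps directly onto an Airflow DAG, with each stage as a task and Spark doing the heavy lifting in `transform`.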
Minimum Requirements:
4+ years of relevant industry experience
Demonstrated ability to analyze large data sets to identify gaps and inconsistencies, provide data insights, and advance effective product solutions
Working knowledge of relational databases and query authoring (SQL)
Good communication skills, both written and verbal
Strong experience using ETL frameworks (e.g., Airflow, Flume, Oozie) to build and deploy production-quality ETL pipelines
Experience building batch data pipelines in Spark (Scala)
Strong understanding of distributed storage and compute (S3, Hive, Spark)
General software engineering skills (Java or Python, GitHub)
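As a sketch of the kind of SQL query authoring and gap analysis the requirements describe, the following uses an in-memory SQLite database to find missing days in a daily metrics table. The table and column names (`daily_bookings`, `day`, `bookings`) are hypothetical; a real check would run against Hive/Spark SQL at much larger scale.

```python
import sqlite3

# In-memory table of daily booking counts, with 2024-01-03 missing (a gap).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_bookings (day TEXT PRIMARY KEY, bookings INTEGER)")
conn.executemany(
    "INSERT INTO daily_bookings VALUES (?, ?)",
    [("2024-01-01", 10), ("2024-01-02", 12), ("2024-01-04", 9)],
)

# Self-join each day to the next calendar day; a day whose successor has no
# row (and that is not the last day overall) sits just before a gap.
gaps = conn.execute(
    """
    SELECT d.day
    FROM daily_bookings AS d
    LEFT JOIN daily_bookings AS nxt
      ON nxt.day = date(d.day, '+1 day')
    WHERE nxt.day IS NULL
      AND d.day < (SELECT MAX(day) FROM daily_bookings)
    """
).fetchall()
```

Here `gaps` flags `2024-01-02`, the last day before the missing `2024-01-03`. The same self-join pattern translates directly to Hive or Spark SQL with its date arithmetic functions.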