Remote Data Engineer

77% Flex
Full Remote
120 - 140 K yearly

  • Remote from: United States

LookFar Labs
Energy & Chemicals
2 - 10 Employees

Job description

Your missions

Remote AWS Data Engineer

We have an existing commercial SaaS platform that consists of 3 components: a web application, several 3rd party databases integrated into our backend, and a Natural Language Processing ML model based on a custom taxonomy.

We are looking to build 2.0 of our platform, with a brand new front end based on new algorithms, and scalable data science models that use a confluence of data from various data sources (e.g., patent, financial, and people). It's a challenge and a fun opportunity for someone looking to make the next big platform that the world is going to use.

Our Data Engineer would need to create a new data pipeline, ETL process, and architecture for 2.0 of our platform. This could include multi-modal databases, and should consider the delineation between production, development, and staging/testing data pipelines and environments.

The data pipeline should easily integrate new data sources, with both structured and unstructured data, and should enable associations between data as well. It should also enable and further enhance the strong entity resolution that we have already started building, so that our disparate, large data sets can be cleanly integrated.

You should also not rely solely on off-the-shelf tools or default pipelines. This role will require creativity and customization.

Your solutions should keep in mind scalability, to enable optimized usage of distributed computing frameworks like Spark. You should also have strong familiarity and experience with how to leverage the AWS ecosystem to bring in relevant AWS tools, services, and resources to enable substantial processing of very large datasets before runtime, entity resolution between very large datasets, and real-time processing in a scalable, distributed computing environment.

Role Responsibilities:

  • Create and maintain a scalable ETL data pipeline that ingests multiple large data sets of both structured data (in the form of financial and patent data) and unstructured data (in the form of white papers, scraped websites, etc.), and enables entity resolution and other transformations for clean data integration and usage
  • Create and maintain a multi-modal data storage system that enables scalable, real-time processing for production-level data
  • Work with the data science team to enable ML Ops
  • Have curiosity and passion for data, and demonstrate strong and extensive understanding of our data, including ability to efficiently query and obtain data via SQL
  • Demonstrate a strong sense of ownership, of both technical and business outcomes
  • Assist dev and data science teams with processing and integrating data analysis
  • Clearly document processes, methodologies, and tools used

Experience:

  • B.S. in relevant technical degree
  • Significant use and experience (at least 3-5 years) as a data engineer in the AWS ecosystem, including strong familiarity with structured and unstructured large datasets, enabling scalable and distributed compute, and ensuring real-time processing at scale
  • Significant use and experience (at least 3-5 years) with writing complex SQL queries and analysis of data correlations
  • Significant experience (at least 3-5 years) with the AWS ecosystem, including tools, services, and resources that enable scalable, distributed computing
  • Project management skills, ability to scope out timeline, methodology, and deliverables for development, testing, and integration into the platform
  • Excellent communication and story-telling skills (written and verbal)

Our Current Tech Stack:
AWS to host the infrastructure, including CI/CD, Spring Boot, Angular, Python, PySpark, Kubernetes, EMR, Spark, Elasticsearch, Redshift, AWS (S3, CodeCommit, CodeBuild, CodeDeploy, EC2, EMR, etc.), Docker, spaCy, scikit-learn, openpyxl, Streamlit, Watchdog, sklearn, seaborn, NLTK, matplotlib, pandas, SQLAlchemy, and additional ML and Python libraries. This stack is subject to change as we build v2.0.

We want to modernize and streamline our models, MLOps, code, deployment, front-end, and distributed processing capabilities.

Logistics: Geography, Work Status, Etc.

The position is full-time on a W2 and fully remote. The candidate must have the legal right to work in the United States.

Interview Process:

We will conduct 3 rounds of interviews.

  • First Round: Culture, fit, and background interview with the Founders
  • Second Round: Technical Interview
  • Technical Project: Execute a small data engineering project, if selected for the third round of interviews
  • Third Round: In-Person Day in Washington, D.C. (We will have the candidate fly out to D.C. to meet the founders and team.) Present the results of the data engineering project during the In-Person Day.

How to Apply:

Please provide the following:

  • Resume
  • Cover Letter
  • Any links to Git repositories or data engineering projects that we can review

About the Company:

We are the source of truth for patent intelligence. Patents protect revenue and investment in the market. Given that, patent intelligence is not complete UNLESS it integrates financial and market data. We provide SaaS platforms that correlate multiple data sets (patent, financial, and people data) using scalable data science models, in order to answer fundamental questions related to patent and innovation strategy.

We provide patent intelligence to corporate IP departments and the defense sector. We are expanding to a larger commercial market, including technology transfer, venture capital, and financial institutions.

We are committed to creating a diverse environment and are proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, pregnancy, disability, age, veteran status, or other characteristics.

Job Type: Full-time

Salary: $120,000.00 - $140,000.00 per year

Benefits:


  • Dental insurance
  • Health insurance
  • Paid time off
  • Vision insurance

Experience level:

  • 4 years

Schedule:


  • Monday to Friday

Application Question(s):

  • Do you now or in the future need work authorization sponsorship?
  • If selected for the final interview round, are you able to fly to Washington, DC for an in-person interview (all travel expenses paid)?

Experience:


  • complex SQL queries and analysis: 3 years (Required)
  • AWS: 3 years (Required)
  • ML Ops: 3 years (Required)
  • building scalable ETL data pipelines: 3 years (Required)
  • Spark framework: 2 years (Required)

Work Location: Remote
