Bachelor’s degree in Linguistics, Data Analytics, Engineering, Computer Science, Statistics, Artificial Intelligence, NLP or similar.
Proficiency in multiple languages, with preference for candidates fluent in several.
Understanding of syntax and structural analysis of languages.
Experience with SQL, Microsoft Excel, and data analysis tools like Python.
Key responsibilities:
Analyze and improve data quality of multilingual text classifiers.
Collaborate with linguistics and engineering teams to develop new parsers across languages.
Translate taxonomies such as Skills, Titles, and Occupations into various languages.
Annotate data for model training and validation, ensuring adherence to Lightcast standards.
Emsi Burning Glass is now Lightcast.
Our name changed to Lightcast in 2022, but our dedication to providing the world’s best data-driven talent strategies remains the same. We’re going to continue bringing clarity to the labor market, guiding our customers through a complex and changing world and giving them the competitive advantage they demand.
Our mission is to unlock new possibilities in the labor market.
The primary expectation for this role, a data analyst on the linguistics team, is proficiency in multiple languages, enabling you to effectively manage, develop, and optimize linguistic resources. You will be assigned languages based on your fluency, and your role will be to foster these languages and develop them for the range of products delivered to customers. You will build and maintain these languages according to Lightcast standards and help develop further features. To fill this role, we are looking for a dynamic, multilingual person who will quickly learn the ins and outs of the role and become an active part of a multicultural team.
Major Responsibilities:
Analyze and improve data quality of multilingual text classifiers
Work with linguistics and engineering teams to build out new parsers across languages
Translate various taxonomies such as Skills, Titles, and Occupations
Create crosswalks from origin language titles and skills to Lightcast taxonomies
Use SQL for data handling and database management
Annotate data used for model training and validation
Skills/Abilities:
Competency in any of the following languages: Arabic, Chinese, Croatian, Czech, Danish, Dutch, Finnish, French, German, Hebrew, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Romanian, Spanish, Swedish, Turkish, Vietnamese.
(Preference will be given to candidates who are competent in more than one of these languages.)
Understanding of syntax and structural analysis of languages
Microsoft Excel experience (including VLOOKUP, data cleanup, and functions)
Knowledge of query languages such as SQL
Knowledge of text analysis or machine learning principles
Experience with data analysis using tools such as Excel or Python
Knowledge of RegEx
Education and Experience:
Bachelor’s degree in Linguistics, Data Analytics, Engineering, Computer Science, Statistics, Artificial Intelligence, NLP or similar.
Strong linguistics knowledge
Lightcast is a global leader in labor market insights, with headquarters in Moscow (ID) and offices in the United Kingdom, Europe, and India. We work with partners across six continents to help drive economic prosperity and mobility by providing the insights needed to build and develop our people, our institutions and companies, and our communities.
Lightcast is proud to be an equal opportunity workplace and is committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status. Lightcast has always been, and always will be, committed to diversity, equity and inclusion. We seek dynamic professionals from all backgrounds to join our teams, and we encourage our employees to bring their authentic, original, and best selves to work.