Lead Data Scientist - NLP & Machine Learning

Posted 6 hours 30 minutes ago by Norton Blake

£100,000 - £120,000 Annual

Permanent

England, United Kingdom

Job Description

Lead Data Scientist - NLP & Machine Learning, Fully Remote, £100,000 - £120,000 per annum

Overview

A great client of mine, a leader in the AI space, is seeking a highly skilled Lead Data Scientist with a robust background in Natural Language Processing (NLP) and machine learning engineering. This role is pivotal in driving their advanced ML projects, from initial proof-of-concept through to full-scale production deployment. The ideal candidate will combine deep technical expertise with a pioneering spirit and be capable of transforming open-ended specifications into concrete, scalable solutions.

Key Responsibilities

Advanced NLP Development:

Design, build, and deploy NLP models, including synthetic data generation, classifier training (SetFit), and fine-tuning large language models (LLMs) using modern techniques such as LoRA, QLoRA, and reinforcement learning (RL).

Data Modeling & Semantic Analysis:

Utilise semi-structured data models such as RDF and RDFS within machine learning contexts to enhance data integration and semantic understanding.
Build RDF-based components for use cases such as semantic similarity and search/retrieval.

Engineering & Deployment:

Develop robust APIs and scalable solutions using FastAPI, Docker, and Azure.
Lead projects through all phases, from technical spikes/POCs to full production deployment.

Greenfield Innovation:

Work on projects with vague or open-ended specifications, transforming abstract ideas into actionable, concrete components.

Software Design Patterns:

Implement and advocate for standard design patterns (e.g., ports and adapters, provider patterns) to ensure system reliability and scalability.

Data Source Integration:

Interface with a variety of data sources including Postgres, Gremlin, and Neptune to build comprehensive and cohesive data solutions.

Required Skills & Experience

Expertise in NLP:

Proven experience in creating synthetic data, training classifiers, and deploying NLP models.
Experience with fine-tuning and deploying LLMs.
Familiarity with standard paradigms such as FTI (feature, training, and inference).
High proficiency working within the Hugging Face ecosystem.

Data Modeling Proficiency:

Solid understanding and hands-on experience with semi-structured data models (RDF, RDFS) in a machine learning environment.
Experience working with RDF-based data formats, e.g., JSON-LD and Turtle (TTL).

Engineering Acumen:

Strong software development skills with a track record of building and deploying applications using Python, FastAPI, Docker, and Azure.
Experience in taking projects from initial technical proof-of-concept to production-ready deployment.

Adaptability & Innovation:

Demonstrated ability to work in greenfield environments, converting vague requirements into practical solutions.

Design Patterns Knowledge:

Familiarity with standard design patterns (e.g., ports and adapters, provider patterns) to architect scalable systems.

Data Source Versatility:

Proficient in handling a diverse range of data sources such as Postgres, Gremlin, and Neptune.

Preferred Skills

Ontologies & Taxonomies:

Knowledge of SKOS taxonomies and OWL ontologies.
Experience applying NLP techniques in this context (e.g., ontology embedding, text-to-SPARQL conversion).

Vector Search & RAG Applications:

Familiarity with vector search methodologies and experience developing Retrieval-Augmented Generation (RAG) applications, including reranking, embedding, and the use of distributed vector databases such as Milvus and Qdrant.

Advanced Visualisation:

Experience with advanced data visualisation tools such as Plotly and Graphviz.

APPLY NOW!