Lead Data Scientist - NLP & Machine Learning
Posted 6 hours 30 minutes ago by Norton Blake
£100,000 - £120,000 Annual
Permanent
England, United Kingdom
Job Description
Lead Data Scientist - NLP & Machine Learning, Fully Remote, £100,000 - £120,000 per annum
Overview
A great client of mine, a leader in the AI space, is seeking a highly skilled Lead Data Scientist with a robust background in Natural Language Processing (NLP) and machine learning engineering. This role is pivotal in driving their advanced ML projects, from initial proof-of-concept through to full-scale production deployment. The ideal candidate will combine deep technical expertise with a pioneering spirit and be capable of transforming open-ended specifications into concrete, scalable solutions.
Key Responsibilities
- Advanced NLP Development:
- Design, build, and deploy NLP models, including synthetic data generation, classifier training (SetFit), and fine-tuning of large language models (LLMs) using modern techniques such as LoRA, QLoRA, and reinforcement learning (RL).
- Data Modeling & Semantic Analysis:
- Utilise semi-structured data models such as RDF and RDFS within machine learning contexts to enhance data integration and semantic understanding.
- Build RDF-based components for use cases such as semantic similarity and search/retrieval.
- Engineering & Deployment:
- Develop robust APIs and scalable solutions using FastAPI, Docker, and Azure.
- Lead projects through all phases, from technical spikes/POCs to full production deployment.
- Greenfield Innovation:
- Work on projects with vague or open-ended specifications, transforming abstract ideas into actionable, concrete components.
- Software Design Patterns:
- Implement and advocate for standard design patterns (e.g. ports and adapters, provider patterns) to ensure system reliability and scalability.
- Data Source Integration:
- Interface with a variety of data sources including Postgres, Gremlin, and Neptune to build comprehensive and cohesive data solutions.
Required Skills & Experience
- Expertise in NLP:
- Proven experience in creating synthetic data, training classifiers, and deploying NLP models.
- Experience with fine-tuning and deploying LLMs.
- Familiarity with standard paradigms such as FTI (feature, training, and inference).
- Highly proficient in working within the Hugging Face ecosystem.
- Data Modeling Proficiency:
- Solid understanding and hands-on experience with semi-structured data models (RDF, RDFS) in a machine learning environment.
- Experience working with RDF-based data sources and serialisations, e.g. JSON-LD and Turtle (TTL).
- Engineering Acumen:
- Strong software development skills with a track record of building and deploying applications using Python, FastAPI, Docker, and Azure.
- Experience in taking projects from initial technical proof-of-concept to production-ready deployment.
- Adaptability & Innovation:
- Demonstrated ability to work in greenfield environments, converting vague requirements into practical solutions.
- Design Patterns Knowledge:
- Familiarity with standard design patterns (e.g. ports and adapters, provider patterns) for architecting scalable systems.
- Data Source Versatility:
- Proficient in handling a diverse range of data sources such as Postgres, Gremlin, and Neptune.
Preferred Skills
- Ontologies & Taxonomies:
- Knowledge of SKOS taxonomies and OWL ontologies.
- Experience applying NLP techniques in this context (e.g. ontology embedding, text-to-SPARQL conversion).
- Vector Search & RAG Applications:
- Familiarity with vector search methodologies and with developing Retrieval-Augmented Generation (RAG) applications, including reranking, embedding, and the use of distributed vector databases such as Milvus and Qdrant.
- Advanced Visualisation:
- Experience with advanced data visualisation tools such as Plotly and Graphviz.