**Machine Learning Developer/Natural Language Processing Engineer**
**Location: Remote (Romania)**
**Work Schedule: Pacific Standard Time**
Responsibilities:
- Model Development and Pre-Training:
  - Research and implement advanced pre-training techniques for NLP models, including self-supervised learning and masked language modeling.
  - Fine-tune transformer-based architectures (e.g., BERT, GPT, T5) to achieve state-of-the-art performance on domain-specific tasks.
- Post-Training and RAG Implementation:
  - Develop and optimize retrieval-augmented generation pipelines, leveraging tools like LangChain for knowledge integration.
- Speech-to-Text Systems:
  - Evaluate and deploy automatic speech recognition (ASR) tools such as Whisper, DeepSpeech, or Kaldi, ensuring high accuracy across diverse audio datasets.
- Model Deployment and Optimization:
  - Collaborate with DevOps teams to deploy NLP models in production environments, ensuring scalability, low latency, and reliability.
- Data Processing and Feature Engineering:
  - Build and maintain pipelines for tokenization, text normalization, and feature extraction tailored to specific use cases.
  - Optimize large-scale datasets for efficient model training and inference.
Requirements:
Educational Background:
- Bachelor's, Master's, or PhD in Natural Language Processing, Machine Learning, Artificial Intelligence, Data Science, or a related field.
Experience:
- Minimum 3 years of professional experience in NLP, machine learning, or related fields.
Technical Skills:
- Expertise in transformer architectures (e.g., BERT, GPT, T5) and pre-training methodologies.
- Experience with retrieval-augmented generation (RAG) pipelines and orchestration tools such as LangChain.
- Familiarity with ASR engines, such as Whisper, DeepSpeech, Kaldi, or Julius, and their integration into NLP pipelines.
- Strong programming skills in Python, with experience in libraries like Hugging Face Transformers, PyTorch, and TensorFlow.
- Knowledge of model deployment technologies (e.g., Docker, Kubernetes) and serving frameworks like FastAPI or TorchServe.
- Proficiency with text processing libraries such as spaCy and NLTK, and with the OpenAI APIs.
Preferred Skills:
- Experience optimizing NLP models for real-time, low-latency inference.
- Familiarity with vector search engines like FAISS, Pinecone, or Weaviate.
- Understanding of domain-specific language tasks such as named entity recognition (NER), text summarization, or sentiment analysis.