Machine Learning Intern
WadhwaniAI LEHS
- Delhi
- Contract
- Full-time
- Conducts experiments and reports results reliably, with guidance
- Experiments include (but are not limited to):
- Benchmark open-source LLMs (gpt-oss-120b/20b, Llama models, etc.) against proprietary LLMs (e.g., OpenAI's GPT-4o/GPT-4o-mini, Gemini)
- Experiment with LLMs to build a scalable, search-efficient KB, generate synthetic QA pairs from digital documents, and optimize prompts for scalable chatbots
- Evaluate language translation models/services for Indian languages (e.g., Bhashini, Sarvam, Google Translate)
- Assess speech models for Indian languages on ASR (speech-to-text) and TTS (text-to-speech) tasks (e.g., Amazon Polly, AI4Bharat's Conformer models, Sarvam)
- Improve the existing RAG-QA pipeline by identifying performance gaps and benchmarking different retrieval (embedding) and chunking techniques
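As a flavor of the chunking/retrieval benchmarking work above, here is a minimal toy sketch comparing two chunking strategies with a simple lexical retriever. The function names, chunk sizes, and term-overlap scorer are illustrative assumptions for this posting, not the team's actual pipeline:

```python
import re

def chunk_fixed(text, size=200, overlap=50):
    """Split text into fixed-size character windows with overlap."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def chunk_by_sentence(text, max_chars=200):
    """Greedily pack whole sentences into chunks of at most max_chars."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) + 1 > max_chars:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks

def recall_at_k(chunks, query_terms, k=3):
    """Toy lexical retrieval: rank chunks by query-term overlap,
    then check whether any top-k chunk contains every query term."""
    scored = sorted(
        chunks,
        key=lambda c: sum(t.lower() in c.lower() for t in query_terms),
        reverse=True,
    )
    return any(all(t.lower() in c.lower() for t in query_terms)
               for c in scored[:k])
```

A real benchmark would swap the lexical scorer for embedding-based retrieval (e.g., FAISS or Chroma over an embedding model) and sweep chunk sizes and overlaps against a labeled QA set.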
- Gathers, cleans, analyzes, and processes text and speech data for building the knowledge base (KB) for conversational chatbots
- Learns to derive insights and next steps from experiments
- Monitors incoming data regularly and performs quality checks
- Collaborates with cross-functional teams to complete tasks on time
- Proactively seeks help and required information from peers
- Communicates research findings in a clear and concise manner
- Supports development of clean, well-documented codebases and works consistently to high standards
- Communicates and presents results effectively with peers
- Stays updated with recent advancements in GenAI/LLMs, ASR (STT and TTS), RAG-QA, LLM evaluation, etc., that can be applied in our product
- Develops expertise with typical ML tooling such as pandas, ML frameworks (PyTorch, scikit-learn), Excel (pivot tables), visualization libraries, experiment tracking (Weights & Biases), and GitHub
- Learns to work efficiently with tooling: Unix, VS Code, Google office suite, Calendar, Slack
- Ability to work in a fast-paced startup environment
- Eagerness to learn and apply the latest research in the domain to the solution
- Strong grasp of AI/ML fundamentals
- Strong grasp of LLM fundamentals such as RAG-QA, prompt engineering, evaluations, vector stores, retrieval and chunking, fine-tuning, synthetic data generation, model deployment, etc.
- Experience with GenAI tools such as (but not limited to) LangChain, LlamaIndex, LlamaParse, Langfuse, FAISS, Chroma, vLLM, and the OpenAI toolkits and SDKs
- Familiarity with OpenAI models and tools, and with open-source models such as gpt-oss, Llama, Gemma, Mistral, Wav2Vec2, and Bhashini's or AI4Bharat's language translation models
- Familiarity with Docker, AWS, GCP is a plus
- Strong Python coding and debugging skills, with hands-on experience with data science toolkits such as pandas, NumPy, and Matplotlib/Seaborn, and preferably at least one deep learning framework among PyTorch (preferred), Keras, and TensorFlow
- Should have completed coursework in Probability, Linear Algebra, and Calculus, and preferably have some exposure to AI / Machine Learning.
- Demonstrated experience of working in the field via an internship or project is highly preferred. Do provide links to some of your open-source projects.
- Prior exposure to Linux/Unix is expected before joining for the internship.