This is a remote position.

As our Data Engineering Intern, you’ll be at the core of how we train AI. From scraping raw data to refining, annotating, and scripting it into usable formats, you’ll learn how real AI products are built from the ground up. This is not a “clean spreadsheets” internship. This is for someone who wants to see how messy, massive, and magical data becomes intelligence.

What You’ll Own
- Scrape, collect, and organize datasets from multiple sources (web, APIs, communities)
- Refine and clean messy data into structured, usable formats
- Annotate and label datasets for machine learning training
- Write basic scripts (Python/SQL/automation) to process and transform data
- Support dataset quality checks, error spotting, and pipeline improvements
- Work closely with the AI training team to test and iterate data workflows
- Contribute to building scalable data playbooks for future AI projects

Requirements

We’re Excited About You If…
- You’re curious about how data powers AI models
- You know your way around Excel/Sheets, and are comfortable experimenting with Python or SQL
- You can handle repetitive tasks without losing attention to detail
- You enjoy solving small data puzzles: spotting errors, fixing inconsistencies, refining structures
- You’re a self-starter who can Google your way out of blockers
- Bonus: You’ve worked on a data project (scraping, analysis, or ML experiments) in college or personally

Benefits

Why Parikshak.ai
We’re not just training another AI model. We’re building India’s first AI-native, prompt-to-hire™ platform, and data is our foundation.
This is your chance to:
- Get hands-on experience with real datasets used in AI training
- Learn the “behind-the-scenes” work that powers AI products
- Work directly with product + AI teams, not just sit siloed on data cleaning
- Build a portfolio of data projects that go beyond classroom assignments

Perks & Details
- Remote-first, async-flexible
- Real mentorship and feedback loops
- Performance-based stipend, with conversion opportunities
- Great for students / freshers curious about data + AI
- Your work will directly fuel the training of AI models that ship live