
Assistant Manager
- Noida, Uttar Pradesh
- Permanent
- Full-time
- Design, develop, and maintain ETL data pipelines using BigQuery, SQL, and Python/PySpark
- Process and transform large-scale structured and unstructured datasets, including cleaning, preprocessing, and feature engineering for advanced ML algorithms
- Analyze, understand, and migrate existing ETL processes from SAS to BigQuery/Python/PySpark
- Develop workflows for data ingestion, transformation, and loading on GCP (BigQuery, Cloud Storage, Dataflow)
- Generate effective data visualizations and insights using Tableau/Power BI
- Ensure data quality, consistency and reliability during migration and in production pipelines
- 3+ years of experience in ETL development, data engineering, and data management, with a strong focus on GCP, SQL, Python/PySpark, and AI/ML, including building production-ready Python-based AI/ML pipelines and APIs for integration with production systems.
- Strong experience with HEOR/RWE/clinical trials or US healthcare data (medical/pharmacy claims, labs, and enrollment data).
- Strong understanding of data preprocessing, feature selection, and model evaluation techniques; ability to apply ML/DL frameworks (e.g., TensorFlow, PyTorch, Keras) for NLP and predictive analytics.
- Strong hands-on experience with GCP BigQuery, Dataproc, Airflow DAGs, Dataflow, GCS, Pub/Sub, Secret Manager, Cloud Functions, Apache Beam, CI/CD workflows, and Git.
- Strong data visualization skills with Tableau/Power BI.
- Good understanding of SAS/SQL DATA step logic and procedures; experience migrating ETL jobs from SAS to Python.
- Strong problem-solving and communication skills.
- Good understanding of Generative AI and LLM fine-tuning.
- Familiarity with cloud AI services (AWS, Azure ML/AI Platform) is a plus.