Data Engineer
Weekday AI
- Thiruvananthapuram, Kerala
- Permanent
- Full-time
Responsibilities:
- Design, develop, and maintain robust, scalable, and high-performance data pipelines using Python and PySpark.
- Build and optimize batch and real-time data processing systems capable of handling large-scale structured and unstructured data.
- Architect and implement data solutions on modern data platforms, ensuring high availability, fault tolerance, and performance efficiency.
- Collaborate with data scientists and analysts to understand data requirements and deliver clean, well-structured datasets for analytics and machine learning use cases.
- Ensure data quality, integrity, and governance by implementing validation, monitoring, and alerting mechanisms.
- Optimize data workflows and processing jobs to reduce latency and improve throughput.
- Work with cloud-based data ecosystems and distributed computing frameworks to manage large datasets efficiently.
- Mentor junior engineers and contribute to best practices in data engineering, code quality, and system design.
- Participate in code reviews, technical discussions, and architectural decisions.
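To make the pipeline and data-quality responsibilities above concrete, here is a minimal sketch of a validated transform step in plain Python (plain Python rather than PySpark so it runs without a cluster; the record schema, field names, and validation rules are illustrative assumptions, not part of the role description):

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Event:
    user_id: str
    amount: float


def validate(records: Iterable[dict]) -> Tuple[List[Event], int]:
    """Drop malformed rows; return clean events plus a dropped-row count.

    In a production pipeline the dropped count would feed a monitoring
    and alerting system rather than being returned directly.
    """
    events: List[Event] = []
    dropped = 0
    for row in records:
        uid = row.get("user_id")
        amt = row.get("amount")
        # Reject missing IDs, non-numeric amounts, and negative values.
        if not uid or not isinstance(amt, (int, float)) or amt < 0:
            dropped += 1
            continue
        events.append(Event(user_id=str(uid), amount=float(amt)))
    return events, dropped


def total_by_user(events: Iterable[Event]) -> Dict[str, float]:
    """Aggregate amounts per user, the kind of rollup analysts consume."""
    totals: Dict[str, float] = {}
    for e in events:
        totals[e.user_id] = totals.get(e.user_id, 0.0) + e.amount
    return totals
```

The same validate-then-aggregate shape maps directly onto a PySpark job (filter plus `groupBy`/`agg`), with the dropped-row count exported as a data-quality metric.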
Requirements:
- 8–16 years of experience in Data Engineering or related roles.
- Strong programming expertise in Python with a focus on writing clean, maintainable, and efficient code.
- Hands-on experience with PySpark and distributed data processing frameworks.
- Proven experience building large-scale data pipelines and ETL/ELT processes.
- Strong understanding of data modeling, data warehousing concepts, and database systems (SQL and NoSQL).
- Experience with big data technologies such as Apache Spark, Hadoop ecosystem, or similar frameworks.
- Familiarity with cloud platforms (AWS, Azure, or GCP) and their data services.
- Solid understanding of data structures, algorithms, and system design principles.
- Experience with workflow orchestration tools such as Airflow or similar.
- Strong problem-solving skills and ability to troubleshoot complex data issues.
- Experience working with real-time data streaming technologies (e.g., Kafka, Spark Streaming).
- Knowledge of containerization and orchestration tools like Docker and Kubernetes.
- Exposure to CI/CD pipelines and DevOps practices in data engineering.
- Experience in handling data security, compliance, and governance frameworks.
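Several requirements above (workflow orchestration with Airflow, troubleshooting complex pipelines) come down to reasoning about tasks as a dependency DAG. A minimal dependency-resolution sketch in plain Python, as a toy stand-in for what an orchestrator formalizes at scale (task names and the graph shape are illustrative):

```python
from collections import deque
from typing import Dict, List


def run_order(deps: Dict[str, List[str]]) -> List[str]:
    """Return a valid execution order for tasks via Kahn's algorithm.

    `deps` maps each task to the upstream tasks that must finish first.
    Raises ValueError if the graph contains a cycle (an unrunnable DAG).
    """
    # Count unmet upstream dependencies per task.
    indegree: Dict[str, int] = {}
    for task, ups in deps.items():
        indegree.setdefault(task, 0)
        for u in ups:
            indegree.setdefault(u, 0)
        indegree[task] = len(ups)

    # Invert the edges so finishing a task unblocks its downstreams.
    downstream: Dict[str, List[str]] = {t: [] for t in indegree}
    for task, ups in deps.items():
        for u in ups:
            downstream[u].append(task)

    ready = deque(sorted(t for t, d in indegree.items() if d == 0))
    order: List[str] = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for d in downstream[t]:
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)

    if len(order) != len(indegree):
        raise ValueError("cycle detected in task graph")
    return order
```

An Airflow DAG expresses the same structure declaratively (`extract >> transform >> load`); the scheduler performs this ordering and unblocking for you.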