Sr. Site Reliability Engineer (Hadoop, Product Support, Hive)
Visa
- Bangalore, Karnataka
- Permanent
- Full-time
- Single window support: Leverage deep understanding of Hadoop and its related tools, especially Hive, Spark, and HDFS, to perform complete root cause analysis, whether the issue is platform-, data-, or user-code-related.
- System configuration: Recommend necessary system changes to the DAP platform engineering team, checking system activity and user logs to triage and troubleshoot issues.
- Performance tuning: Direct team members on crafting efficient queries, leveraging expertise in performance tuning and optimization strategies for big data technologies.
- Issue resolution across tech teams: Troubleshoot and resolve complex technical issues. Identify root causes, determine which tech/data platform team can fix them, and coordinate across those teams.
- Reliability engineering: Create reports to define performance and resolution metrics for proactively identifying issues and generating alerts.
- Office hours and liaising: Participate in calls across regions in multiple time zones to ensure timely client delivery.
- Knowledge cataloging and sharing: Share knowledge and cross-train peers across geographic regions using wikis and other communications. Provide communications around issues/outages affecting multiple users.
- Develop standards: Prepare standard configurations for a variety of VCA workloads so that jobs run with optimal settings, maintaining good cluster health while executing jobs efficiently.
- Continuous learning of VCA workloads: Continuously learn and stay current with the changing nature of data science jobs to help improve cluster utilization. Through active engagement, collaboration, effective communication, quality, integrity, and reliable delivery, develop and maintain a trusted and valued relationship with the team, customers, and business partners.
- 2+ years of relevant work experience and a Bachelor's degree, OR 5+ years of relevant work experience
- Hands-on experience as a Hadoop systems engineer managing Hadoop platforms.
- Ability to solve complex production problems and debug code.
- Strong understanding of data pipelines built with PySpark, Hive, and Airflow.
- Experience working with scheduling tools (Airflow, Oozie) or building data processing orchestration workflows.
- Experience in tuning application performance on Hadoop platforms.
- Good knowledge of Hadoop ecosystem components such as ZooKeeper, HDFS, YARN, Hive, and Spark.
- Hands-on experience debugging Hadoop issues at both the platform and application level.
- Understanding of Linux, networking, CPU, memory, and storage.
- Knowledge of, or experience with, Python.
- Excellent written and verbal communication skills are a must.
- Enjoy working fast and smart, with the ability to grasp complex concepts and functionalities.