Role and responsibilities
· Understands the process flow and its impact on the project module outcome.
· Works on coding assignments for specific technologies based on the project requirements and available documentation.
· Debugs basic software components and identifies code defects.
· Focuses on building depth in project-specific technologies.
· Expected to develop domain knowledge along with technical skills.
· Communicates effectively with team members, project managers and clients, as required.
· A proven high performer and team player, with the ability to take the lead on projects.
· Design and create S3 buckets and folder structures (raw, cleansed_data, output, script, temp-dir, spark-ui)
· Develop AWS Lambda functions (Python/Boto3) to download the Bhav Copy via REST API and ingest it into S3 (see the illustrative sketches after the skills lists below)
· Author and maintain AWS Glue Spark jobs (sketch below) to:
  – partition data by scrip, year and month
  – convert CSV to Parquet with Snappy compression
· Configure and run AWS Glue Crawlers to populate the Glue Data Catalog (sketch below)
· Write and optimize AWS Athena SQL queries to generate business-ready datasets (sketch below)
· Monitor, troubleshoot and tune data workflows for cost and performance
· Document architecture, code and operational runbooks
· Collaborate with analytics and downstream teams to understand requirements and meet SLAs

Technical skills requirements
The candidate must demonstrate proficiency in:
· 3+ years' hands-on experience with AWS data services (S3, Lambda, Glue, Athena)
· PostgreSQL basics
· SQL and data partitioning strategies
· Experience with Parquet file formats and compression techniques (Snappy)
· Ability to configure Glue Crawlers and manage the AWS Glue Data Catalog
· Understanding of serverless architecture and best practices in security, encryption and cost control
· Good documentation, communication and problem-solving skills

Nice-to-have skills
· SQL databases
· Experience in Python (or Ruby) scripting to integrate with AWS services
· Familiarity with RESTful API consumption and JSON processing
· Background in financial markets or working with large-scale time-series data
· Knowledge of CI/CD pipelines for data workflows
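
Illustrative sketches

The sketches below illustrate the kind of work described in the responsibilities above; all endpoint URLs, bucket names, database, table and column names are placeholder assumptions, not values taken from this description. First, a minimal sketch of the Lambda ingestion step: download the daily Bhav Copy over HTTP and land it under the raw/ prefix in S3, assuming the standard Lambda Python runtime where Boto3 is preinstalled.

# Hypothetical Lambda handler: download the daily Bhav Copy and land it in the raw/ prefix.
# BHAV_COPY_URL, RAW_BUCKET and the key layout are illustrative placeholders.
import datetime
import os
import urllib.request

import boto3

BHAV_COPY_URL = os.environ.get("BHAV_COPY_URL", "https://example.com/bhavcopy.csv")  # placeholder endpoint
RAW_BUCKET = os.environ.get("RAW_BUCKET", "my-market-data-bucket")                   # placeholder bucket

s3 = boto3.client("s3")

def lambda_handler(event, context):
    today = datetime.date.today()
    # Fetch the CSV from the REST endpoint.
    with urllib.request.urlopen(BHAV_COPY_URL, timeout=30) as resp:
        body = resp.read()
    # Land the file under raw/, keyed by date, for the Glue job to pick up.
    key = f"raw/bhavcopy/{today:%Y/%m/%d}/bhavcopy_{today:%Y%m%d}.csv"
    s3.put_object(Bucket=RAW_BUCKET, Key=key, Body=body)
    return {"bucket": RAW_BUCKET, "key": key, "bytes": len(body)}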
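
Next, a sketch of the Glue Spark job responsibility: read the raw CSV, derive scrip/year/month partition columns, and write Snappy-compressed Parquet. The source column names (SYMBOL, TIMESTAMP) and date format follow the common Bhav Copy layout but are assumptions here, as are the job parameters.

# Illustrative Glue Spark job: CSV in raw/ -> partitioned, Snappy-compressed Parquet in cleansed_data/.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME", "SOURCE_PATH", "TARGET_PATH"])
glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the daily Bhav Copy CSVs landed by the Lambda function.
df = spark.read.option("header", "true").csv(args["SOURCE_PATH"])

# Derive partition columns: scrip from the symbol, year/month from the trade date.
df = (
    df.withColumn("trade_date", F.to_date("TIMESTAMP", "dd-MMM-yyyy"))
      .withColumn("scrip", F.col("SYMBOL"))
      .withColumn("year", F.year("trade_date"))
      .withColumn("month", F.month("trade_date"))
)

# Write Snappy-compressed Parquet, partitioned by scrip/year/month.
(
    df.write.mode("append")
      .option("compression", "snappy")
      .partitionBy("scrip", "year", "month")
      .parquet(args["TARGET_PATH"])
)

job.commit()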
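
A sketch of configuring a Glue Crawler over the Parquet output with Boto3 so the table appears in the Glue Data Catalog for Athena. The crawler name, IAM role ARN, catalog database and S3 path are placeholders.

# Illustrative Boto3 calls to create and start a Glue Crawler over the cleansed_data/ prefix.
import boto3

glue = boto3.client("glue")

glue.create_crawler(
    Name="bhavcopy-cleansed-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # placeholder role
    DatabaseName="market_data",                             # placeholder catalog database
    Targets={"S3Targets": [{"Path": "s3://my-market-data-bucket/cleansed_data/"}]},
    SchemaChangePolicy={
        "UpdateBehavior": "UPDATE_IN_DATABASE",
        "DeleteBehavior": "LOG",
    },
)

glue.start_crawler(Name="bhavcopy-cleansed-crawler")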
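
Finally, a sketch of running an Athena query from Python against the crawled table and writing results to the output prefix. The aggregation shown is only an assumed example of a "business-ready dataset" query; the database, table and column names are placeholders.

# Illustrative Athena query execution with Boto3; database, table and output location are placeholders.
import time

import boto3

athena = boto3.client("athena")

QUERY = """
SELECT scrip,
       year,
       month,
       AVG(CAST(close AS double)) AS avg_close,
       SUM(CAST(tottrdqty AS bigint)) AS total_volume
FROM market_data.bhavcopy_cleansed
GROUP BY scrip, year, month
"""

response = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "market_data"},
    ResultConfiguration={"OutputLocation": "s3://my-market-data-bucket/output/athena/"},
)

# Poll until the query finishes (simplified; production code would inspect failure reasons).
query_id = response["QueryExecutionId"]
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

print(query_id, state)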