Python, PySpark, ETL Developer
Infosys Limited, Hyderabad
Applications close on July 25, 2026
Job Description
Responsibilities
Data Pipeline Development
- Develop and maintain scalable batch ETL pipelines using Python and PySpark for data ingestion, transformation, and loading.
- Implement reusable transformation logic, ensuring pipelines are modular, testable, and easy to maintain.
- Optimize Spark jobs for performance (partitioning, caching, joins, shuffles) and cost efficiency.
Data Quality & Reliability
- Apply data validation checks, handle schema evolution, and ensure accuracy and completeness of processed datasets.
- Troubleshoot pipeline failures, analyze logs, and implement robust error handling and retry mechanisms.
- Monitor job runs and support operational stability through alerts, runbooks, and timely incident resolution.
Collaboration & Delivery
- Work with cross-functional teams to gather requirements, define data mappings, and deliver datasets aligned to business needs.
- Participate in code reviews, follow engineering best practices, and contribute to continuous improvement of standards and tooling.
- Document pipeline logic, dependencies, and operational procedures for smooth handovers and long-term maintainability.
Qualifications
- Bachelor’s degree in Computer Science, Engineering, Information Systems, or a related field (or equivalent practical experience).
- 2–5 years of hands-on experience building data pipelines using Python and PySpark.
- Strong understanding of ETL concepts, data transformations, and handling large-scale datasets.
- Proficiency in writing clean, maintainable code and debugging production issues.
- Working knowledge of data structures, algorithms, and software development best practices.
Technical and Professional Requirements
- Technology->Analytics – Packages->Python – Big Data
- Technology->Big Data – Data Processing->PySpark, ETL
Preferred Skills
- Python – Big Data
- PySpark
Educational Requirements
Bachelor of Engineering