Python + Spark Scala

Applications close on July 29, 2026

Job Description

Big Data & Spark Development

Develop and maintain data processing pipelines using Apache Spark (PySpark & Scala)

Work with Spark DataFrames, RDDs, and Spark SQL

Implement transformations, joins, aggregations, and optimizations

Tune Spark jobs for performance, scalability, and reliability

Python & Scala Programming

Write clean, efficient, and scalable code in Python and Scala

Develop modular and reusable components

Integrate data pipelines with various applications and APIs

ETL & Data Engineering

Design and build ETL workflows for structured and unstructured data

Extract data from multiple sources (databases, APIs, flat files)

Perform data cleansing, transformation, and validation

Ensure data accuracy, consistency, and completeness

Data Platforms & Integration

Work with the Hadoop ecosystem (HDFS, Hive, Spark)

Handle large datasets in data lakes and warehouses

Process data in formats such as Parquet, ORC, JSON, and CSV

Collaboration & Support

Work with data engineers, analysts, and business stakeholders

Troubleshoot pipeline issues and provide production support

Participate in Agile/Scrum processes

Maintain technical documentation

Core Skills

2–5 years of experience in Python development

Hands-on experience with Apache Spark (PySpark and/or Scala)

Strong understanding of data processing and ETL concepts

Good knowledge of SQL and relational databases

Primary skills: Technology -> Big Data – Data Processing -> Spark, Technology -> Java -> Apache, Technology -> Machine Learning -> Python

MCA, MSc, MTech, Bachelor of Engineering, BCA, BSc, BTech