Posted on 29/06/2026

Scala, Spark/PySpark

  • Infosys Limited
  • Bangalore
  • 5 - 9 Years
  • Full Time
  • PySpark
  • Scala
  • SparkSQL

Applications close on July 29, 2026


Job Description

Responsibilities

Big Data & Spark Development

Design and implement scalable data pipelines using Apache Spark (Scala and/or PySpark)

Work extensively with Spark Core, Spark SQL, DataFrames, and Datasets

Develop batch and real-time data processing solutions using Spark Streaming / Structured Streaming

Optimize Spark jobs for performance, memory management, and parallel processing

Scala & Python Development

Develop robust and efficient applications using Scala and Python

Write reusable, modular, and maintainable code

Implement business logic and transformations on large datasets

Data Engineering & ETL

Build and maintain ETL/ELT pipelines for large-scale data ingestion and transformation

Process structured and unstructured data from multiple sources

Ensure data validation, quality, and consistency

Work with file formats such as Parquet, ORC, Avro, JSON, and CSV

Big Data Ecosystem

Work with the Hadoop ecosystem (HDFS, Hive, YARN)

Integrate Spark jobs with data lakes and warehouses

Handle large datasets using distributed computing techniques

Cloud & Integration (Optional but Preferred)

Work with cloud platforms (AWS/Azure/GCP) for big data solutions

Use services such as AWS EMR, Glue, and S3, or Azure Databricks and Synapse

Integrate pipelines with APIs and external systems

Collaboration & Leadership

Collaborate with data engineers, architects, and business teams

Lead technical discussions and provide guidance to junior developers

Participate in code reviews and help implement best practices

Work in Agile/Scrum environments

Additional Responsibilities

Core Skills

5–9 years of experience in data engineering / big data development

Strong hands-on expertise in Scala (mandatory for this role)

Extensive experience with Apache Spark (Scala and/or PySpark)

Solid understanding of ETL processes and data pipelines

Strong proficiency in SQL and database concepts

Technical Skills

Deep knowledge of Spark architecture and execution model

Experience with Spark performance tuning and optimization

Strong grasp of data modeling and data warehousing concepts

Familiarity with version control tools (Git)

Understanding of distributed computing principles

Preferred Skills

Experience with Spark Streaming / Kafka

Hands-on experience with the Databricks platform

Knowledge of Airflow or workflow orchestration tools

Familiarity with Docker/Kubernetes

Exposure to NoSQL databases (Cassandra, MongoDB, HBase)

Technical and Professional Requirements

  • Primary skills: Domain -> Finacle-Core-Functional -> Finacle-Core-WMS -> Grand Master, Technology -> Big Data – Data Processing -> Spark, Technology -> Java -> Apache

Preferred Skills

  • SparkSQL
  • Scala
  • PySpark

Educational Requirements

MCA, MSc, MTech, Bachelor of Engineering, BCA, BSc, BTech