Scala, Spark/PySpark
- Infosys Limited
- Bangalore
- 5 - 9 Years
- Full Time
- PySpark
- Scala
- SparkSQL
Applications close on July 29, 2026
Job Description
Responsibilities
Big Data & Spark Development
Design and implement scalable data pipelines using Apache Spark (Scala and/or PySpark)
Work extensively with Spark Core, Spark SQL, DataFrames, and Datasets
Develop batch and real-time data processing solutions using Spark Streaming / Structured Streaming
Optimize Spark jobs for performance, memory management, and parallel processing
Scala & Python Development
Develop robust and efficient applications using Scala and Python
Write reusable, modular, and maintainable code
Implement business logic and transformations on large datasets
Data Engineering & ETL
Build and maintain ETL/ELT pipelines for large-scale data ingestion and transformation
Process structured and unstructured data from multiple sources
Ensure data validation, quality, and consistency
Work with file formats such as Parquet, ORC, Avro, JSON, and CSV
Big Data Ecosystem
Work with the Hadoop ecosystem (HDFS, Hive, YARN)
Integrate Spark jobs with data lakes and warehouses
Handle large datasets with distributed computing techniques
Cloud & Integration (Optional but Preferred)
Work with cloud platforms (AWS/Azure/GCP) for big data solutions
Use services such as AWS EMR, Glue, and S3, or Azure Databricks and Synapse
Integrate pipelines with APIs and external systems
Collaboration & Leadership
Collaborate with data engineers, architects, and business teams
Lead technical discussions and provide guidance to junior developers
Participate in code reviews and help implement best practices
Work in Agile/Scrum environments
Additional Responsibilities
Core Skills
5–9 years of experience in data engineering / big data development
Strong hands-on expertise in Scala (mandatory for this role)
Extensive experience with Apache Spark (Scala and/or PySpark)
Solid understanding of ETL processes and data pipelines
Strong proficiency in SQL and database concepts
Technical Skills
Deep knowledge of Spark architecture and execution model
Experience with Spark performance tuning and optimization
Strong data modeling and warehousing concepts
Familiarity with version control tools (Git)
Understanding of distributed computing principles
Preferred Skills
Experience with Spark Streaming / Kafka
Hands-on experience with the Databricks platform
Knowledge of Airflow or workflow orchestration tools
Familiarity with Docker/Kubernetes
Exposure to NoSQL databases (Cassandra, MongoDB, HBase)
Technical and Professional Requirements
- Primary skills: Domain->Finacle-Core-Functional->Finacle-Core-WMS->Grand Master, Technology->Big Data – Data Processing->Spark, Technology->Java->Apache
Preferred Skills
- SparkSQL
- Scala
- PySpark
Educational Requirements
MCA, MSc, MTech, Bachelor of Engineering, BCA, BSc, BTech