
Spark And Hadoop Developer Resume

New York


  • 5+ years of experience in the IT industry delivering Database/ETL/Hadoop development solutions that meet clients' complex technical and business requirements; customer-centric professional familiar with the offshore-onsite software services framework
  • In-depth understanding of Spark Architecture including Spark Core, Spark SQL, Spark Streaming, Spark MLlib, Spark GraphX
  • Technical expertise in developing applications using the Hadoop ecosystem: HDFS, Spark, Scala, Hive, Pig, Sqoop, Flume, YARN, Kafka, Oozie
  • Proficient in NoSQL databases such as HBase, MongoDB and Cassandra
  • Extensive experience working with Teradata, Oracle, Sybase and SQL Server Relational databases
  • Experience in developing and maintaining data warehouses and data marts using ETL tools such as Informatica and Pentaho
  • Quickly acquired working knowledge of new technologies while executing projects:
  • HDFS, YARN, Spark with Scala, Pig, Hive and HBase concepts
  • ETL, SQL Server and Teradata, to implement the Authentication module for the SFG, iFM project
  • PERL and the business logic of BFM applications, to implement new functionality in the Banking Financial Metrics project
  • Good knowledge of Teradata, Netezza and data warehouse modeling, including star and snowflake schemas
  • Experience in working with Slowly Changing Dimensions and setting up Change Data Capture (CDC) mechanisms
  • Supported all phases of the SDLC, including Requirement Analysis, Coding, Testing, Defect Tracking, Implementation, Post-Implementation Support and End-User Training
  • Worked on large sets of structured, semi-structured and unstructured data
  • Worked under both Waterfall and Agile SDLC methodologies to deliver highly collaborative and effective IT solutions to customers
  • Experience working as an ETL Developer, with proficiency in Extraction, Transformation & Loading (ETL) of data from various sources into data warehouses and data marts using Informatica concepts such as CDC, Slowly Changing Dimensions and concurrent workflows
  • Competent in SQL performance tuning, in identifying and resolving performance bottlenecks at various levels (sources, targets, mappings and sessions), and in assisting with unit testing and SQL coding
  • Reviewing specifications provided by clients and developing High Level Design (HLD) & Low Level Design (LLD) Documents
  • Administering complete Design, Development, Coding change, Unit & Integration Testing, Troubleshooting & Debugging of the Software
  • Effective communicator with strong problem-solving, analytical & team-building skills; proven talent in guiding team members and enabling knowledge-sharing amongst them
  • Coordinated with the business on User Acceptance Testing (UAT) and obtained business approval of the design
  • Demonstrated ability to complete projects in a challenging, fast-paced environment, meeting deliverables on time with good code quality and following best practices such as pair programming and code review using Fisheye
  • Expertise in using source code control systems such as Subversion (SVN) and Git, along with JIRA, under a continuous integration platform
  • Performed troubleshooting when issues arose with the applications and reported findings and fixes to clients.
  • Self-motivated and able to handle multiple priorities, with excellent time management skills.
  • Willing and able to adapt quickly and learn new technologies or software.


Big Data & Ecosystem: Hadoop 2.x, Spark, Scala, YARN, Hive, Pig, Sqoop, Flume, Oozie

NoSQL Databases: HBase, MongoDB, Apache Cassandra

Database: Sybase 15.7, Oracle 12g, SQL Server 2012, PostgreSQL, Teradata

Technologies: Data Warehousing, HTML

Database: Sybase 15.7, Oracle 10g, SQL Server 2008

Technologies: Data Warehousing, HTML, JavaScript

Tools: Informatica PC 10.1.1, Pentaho 5.4, JIRA, ServiceNow, Control-M

Version Control: PTC Integrity Client 10, Apache SVN, GIT, JIRA

IDE: Embarcadero Rapid SQL 8.7, Toad

Operating Systems: UNIX (IBM AIX, Ubuntu, Fedora), Windows NT/7/8/10

Programming Languages: T-SQL, PL/SQL, C++, Java

Scripting Languages: UNIX Shell Scripting, PERL, Python


Spark and Hadoop Developer

Confidential, New York


  • Worked closely with administrators, architects, and application teams to ensure applications perform well and within agreed-upon SLAs
  • Created file-to-Hadoop frameworks to ingest data from 20 different sources into Hadoop
  • Loaded and transformed large sets of structured, semi-structured and unstructured data into HDFS using Sqoop
  • Used Spark to perform the necessary transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive
  • Designed and implemented Spark DataFrames to read data from HDFS
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs
  • Created tables in Hive and wrote Hive queries using Spark HiveContext
  • Worked with the Oozie workflow engine and schedulers to run multiple Hive jobs
  • Involved in Agile process, monthly Sprints, and daily Scrums to discuss the development of the Application.

Environment: Linux, Cloudera Hadoop Distribution (CDH 5.6), Scala 2.11, Java 8, Hive, Apache Spark 1.6.0, HDFS, Sqoop, Oozie, Kafka, Spark Streaming 1.6.0, Control-M

Hadoop and ETL Developer

Confidential, New York


  • Loaded and transformed large sets of structured, semi-structured and unstructured data into HDFS
  • Worked extensively with Sqoop for importing metadata from Teradata and implemented schema extraction for Parquet and Avro file formats in Hive, using compression techniques such as Snappy, LZO and GZip for efficient management of cluster resources
  • Wrote and implemented Teradata FastLoad, MultiLoad and BTEQ scripts, as well as DML and DDL
  • Created Hive tables, loaded and analyzed data using Hive queries, and implemented partitioning, dynamic partitions and buckets in Hive
  • Analyzed the load process of all tables in the existing DB2 process, exported the data using DB2 export utilities, and loaded it into Teradata staging and permanent tables using a load operator chosen according to data volume
  • Expertise in UNIX shell scripting with Bash for process automation and for scheduling jobs using wrappers
  • Knowledge and experience of the life cycle from ELT for the data lake to ETL for the data servicing layer
  • Proficient in developing strategies for Extraction, Transformation and Loading (ETL) mechanisms
  • Provided support in enhancing:
  • Performance of the SFG application by 80%, by optimizing the Informatica workflows using CDC, unconnected lookups, unconnected stored procedures, the concurrent workflow technique, etc.
  • Functionality of the SFG project to meet new requirements
  • Performance of the application, by optimizing the SQL queries
  • Developed the DB model to incorporate the change requests suggested by the client
  • Planned the jobs and workflows to meet the business requirements
  • Experience in extracting source data from Sequential files, XML files, Excel files, transforming and loading it into the target Data warehouse

Environment: Informatica, Sybase, Oracle, UNIX shell scripting, Teradata, HDFS, Hive, Python, Control-M

Programmer Analyst Trainee (Internship)



  • Contributed to enhancing:
  • FIG Credit Opinion application, to automate all the executables and improve the performance of the UNIX-based application
  • PWB, FIGWB and XRATIOS applications, to include ratios and other levels of validation as part of M&E changes
  • Enhanced the PERL-based business validations as part of the BRM3 project
  • Migrated a few components of BFM applications from PERL to Java
  • Provided support during migration of the applications from Sybase 12.5 to Sybase 15 and optimized queries to improve application performance

Environment: UNIX, Sybase, Oracle, PERL, C++, Java
