
Senior Hadoop & Spark Developer Resume


Dallas, TX

PROFESSIONAL SUMMARY:

  • 8+ years of IT experience, including 4+ years working with Big Data and Cloudera in the Banking, Medical, Telecom, and Retail domains. Involved in all SDLC phases from analysis, design, development, and testing through implementation and maintenance, with timely delivery against aggressive deadlines in both Agile/Scrum and Waterfall methodologies.
  • Good experience in team leadership with Green Belt certification, sound knowledge of DMAIC and Kaizen methodologies, and excellent communication, management, and presentation skills.
  • Experience with the Big Data ecosystem using the Hadoop framework and related technologies such as HDFS, HBase, MapReduce, Hive, Pig, Flume, Oozie, Kafka, Sqoop, ZooKeeper, YARN, Spark (PySpark & Spark shell), Cassandra, and NiFi.
  • Experience in Python, Scala, Java, SQL and R programming
  • Experience with multiple relational databases, primarily Oracle, SQL Server, and MySQL, and knowledge of non-relational (NoSQL) databases.
  • Experience with AWS components and services, particularly EMR, S3 and EC2
  • Hands-on experience with object-oriented programming (OOP) methodologies and features such as inheritance, polymorphism, and exception handling, along with development experience in Java technologies.
  • Extensive experience with SQL, PL/SQL and database concepts, developed stored procedures and queries using PL/SQL.
  • Effective in working independently and collaboratively in teams.
  • Strong experience with Hadoop distributions such as Hortonworks.
  • Experience in designing and developing applications in Spark using the Java API, PySpark, and the Scala API.
  • Experienced in developing NiFi data flow processors that work with different file formats such as Text, JSON, Parquet, and Avro.
  • Extensive experience importing and exporting data using stream processing platforms such as Kafka.
  • Developed multiple Kafka producers and consumers from scratch per business requirements (a minimal producer sketch follows this summary).
  • Executed Hive commands for reading, writing, and managing large datasets residing in distributed storage (HDFS).
  • Imported and exported data from relational databases and NoSQL databases.
  • Managed large sets of hosts, coordinating and managing services in a distributed environment using ZooKeeper.
  • Used column-based NoSQL databases for flexibility, performance, and horizontal scaling of big data.
  • Ability to analyze different file formats such as Avro and Parquet.
  • Worked as part of an Agile team, serving as a developer to customize, maintain, and enhance a variety of applications that handle big data.
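Below is a minimal sketch of the kind of Kafka producer described above, written in Scala against the standard Kafka client API; the broker address, topic name, and record contents are hypothetical placeholders.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}

object TransactionProducer {
  def main(args: Array[String]): Unit = {
    // Basic producer configuration; the broker address is a placeholder
    val props = new Properties()
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092")
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    try {
      // Publish one record per input line; "transactions" is a hypothetical topic
      scala.io.Source.stdin.getLines().foreach { line =>
        producer.send(new ProducerRecord[String, String]("transactions", line))
      }
    } finally {
      producer.flush()
      producer.close()
    }
  }
}
```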

SKILL:

DATA INGESTION: SQOOP, KAFKA, FLUME, NIFI, APACHE HADOOP ECOSYSTEM

DATA PROCESSING: SPARK, IMPALA, YARN, MAP REDUCE

DISTRIBUTED STORAGE AND COMPUTING: HDFS, ZOOKEEPER

DATA FORMATS: PARQUET, SEQUENCE, AVRO, ORC, CSV, JSON

PROGRAMMING LANGUAGES: PYTHON, SCALA, SQL & JAVA

MONITORING: AMBARI, CLOUDERA MANAGER

RELATIONAL DATABASES: ORACLE, MYSQL, MICROSOFT SQL SERVER

NOSQL DATABASES: MONGODB, CASSANDRA, HBASE, DYNAMODB

CLOUD (AWS): EMR, EC2, S3, DYNAMODB

VERSION CONTROL: GIT, SVN

OPERATING SYSTEM: LINUX, WINDOWS, UNIX

EXPERIENCE:

Senior Hadoop & Spark Developer

Confidential, DALLAS, TX

Environment: Spark Streaming, Spark SQL, Spark Core, HDFS, S3, EMR, Impala, Kafka, Sqoop, Oozie, Cloudera Manager, Apache NiFi, ZooKeeper.

Responsibilities:

  • Involved in the full project life cycle, from analysis to production implementation, with emphasis on identifying sources and validating source data, developing logic and transformations per requirements, creating mappings, and loading data into different targets
  • Loaded periodic incremental imports of structured batch data from various RDBMS to HDFS using Sqoop
  • Implemented Kafka consumers for HDFS and Spark Streaming
  • Used Spark Streaming to preprocess the data for real-time data analysis (see the sketch after this list)
  • Involved in writing queries using Impala for better and faster processing of data; implemented partitioning in Impala for faster and more efficient data access
  • Worked on reading multiple data formats such as Avro, Parquet, ORC, JSON, and Text
  • Wrote Spark transformation scripts using PySpark and the Spark Core and Spark SQL APIs in Scala
  • Used Amazon AWS to spin up EMR clusters to process data stored in Amazon S3
  • Wrote custom Spark Streaming jobs to ingest data into Elasticsearch after data enrichment in Spark.
  • Worked on Apache NiFi, implementing basic workflows using prebuilt processors
  • Worked with the team in visualizing data using Tableau
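Below is a minimal sketch of the Kafka-to-HDFS preprocessing flow described above, using Spark's Structured Streaming API as one possible implementation; the broker address, topic name, and HDFS paths are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object KafkaToHdfsStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-hdfs")
      .getOrCreate()
    import spark.implicits._

    // Read the raw stream from Kafka; broker and topic names are placeholders
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "events")
      .load()

    // Minimal preprocessing: cast the payload to string and drop empty records
    val cleaned = raw
      .select(col("value").cast("string").as("payload"))
      .filter(length(trim($"payload")) > 0)

    // Land the preprocessed data on HDFS as Parquet; paths are placeholders
    val query = cleaned.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/events/clean")
      .option("checkpointLocation", "hdfs:///checkpoints/events")
      .start()

    query.awaitTermination()
  }
}
```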

Senior Hadoop Developer

Confidential, GA

Environment: Hadoop, HDFS, AWS, Scala, Kafka, MapReduce, YARN, Spark, Pig, Hive, Java, NiFi, HBase, IMS Mainframe, Maven. Utilized Sqoop, Kafka, Flume, and Hadoop File System APIs to implement data ingestion pipelines from heterogeneous data sources.

Responsibilities:

  • Created storage with Amazon S3 for storing data. Worked on transferring data from Kafka topic into AWS S3 storage.
  • Worked on real time streaming, performed transformations on the data using Kafka and Spark Streaming.
  • Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing.
  • Created data pipelines for ingestion and aggregation events, loaded consumer response data from an AWS S3 bucket into Hive external tables, and generated views to serve as feeds for Tableau dashboards (see the sketch after this list).
  • Worked on various data formats like AVRO, Sequence File, JSON, Map File, Parquet and XML.
  • Used Apache NiFi to automate data movement between different Hadoop components and perform conversion of raw XML data into JSON, AVRO.
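A minimal sketch of the S3-to-Hive feed described above, assuming a Spark session with Hive support; the bucket, schema, table, and view names are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession

object HiveFeedForTableau {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark SQL create and query tables in the Hive metastore
    val spark = SparkSession.builder()
      .appName("s3-to-hive-feed")
      .enableHiveSupport()
      .getOrCreate()

    // External table over consumer response data landed in S3;
    // bucket, columns, and table name are placeholders
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS consumer_response (
        |  event_id STRING, campaign STRING, responded INT, event_ts TIMESTAMP)
        |STORED AS PARQUET
        |LOCATION 's3://example-bucket/consumer-response/'""".stripMargin)

    // Aggregated view used as a feed for dashboards
    spark.sql(
      """CREATE OR REPLACE VIEW consumer_response_daily AS
        |SELECT campaign, to_date(event_ts) AS event_date, SUM(responded) AS responses
        |FROM consumer_response
        |GROUP BY campaign, to_date(event_ts)""".stripMargin)

    spark.stop()
  }
}
```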

HADOOP ENGINEER

Confidential, AR

Environment: Java/J2EE, JSP, HTML, Eclipse, Spark, MapReduce, Pig Scripts, HBase, Hive, HDFS & Sqoop.

Responsibilities:

  • Developed Mobile Manager User interface (UI) which logs all the transaction flow between different interfaces.
  • Developed the application in Eclipse Environment.
  • Used Hibernate 3.0 to develop the persistence layer; developed custom DAOs to retrieve records from the Oracle database.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Implemented POCs using Spark shell commands to process large datasets and compare processing times (see the sketch after this list).
  • Involved in creating Hive tables, loading data, and writing queries that run internally as MapReduce jobs.
  • Used Pig to perform transformations, event joins, filtering, and some pre-aggregations.
  • Involved in processing ingested raw data using MapReduce, Apache Pig and HBase.
  • Involved in developing Pig Scripts for change data capture and delta record processing between newly arrived data and already existing data in HDFS.
  • Used the SVN version control system to check in and check out developed artifacts.
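A minimal sketch of the kind of Spark shell POC mentioned above, timing a simple aggregation over a large dataset; the input path, column name, and output location are hypothetical placeholders.

```scala
// Run inside spark-shell, where the `spark` session is predefined;
// the input path, column name, and output path are placeholders
val start = System.nanoTime()

val df = spark.read
  .option("header", "true")
  .csv("hdfs:///data/transactions/")

// A representative aggregation used only to time the processing path
val counts = df.groupBy("account_id").count()
counts.write.mode("overwrite").parquet("hdfs:///tmp/poc/account_counts")

val elapsedSec = (System.nanoTime() - start) / 1e9
println(f"Processing took $elapsedSec%.1f seconds")
```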

HADOOP PYTHON DEVELOPER

Confidential, CA

Environment: Python, Django, HTML, CSS, Oracle SQL, SQL Database, XML, JSON.

Responsibilities:

  • Wrote MapReduce code to parse data from various sources and store the parsed data in HBase and Hive.
  • Developed Kafka producers and consumers, HBase and Hadoop MapReduce jobs, along with components on HDFS and Hive (a consumer sketch follows this list).
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data, and analyzed them by running Hive queries and Pig scripts.
  • Experienced in managing and reviewing Hadoop log files; used Pig as an ETL tool to perform transformations, event joins, and some pre-aggregations before storing the data in HDFS.
  • Wrote helper classes using the Java Collections Framework and JUnit test cases for the classes developed.
  • Utilized Flume to filter the input data so that only the data needed for analytics is retained, by implementing Flume interceptors; implemented the logic using Python.
  • Involved in multiple phases of Software development lifecycle (SDLC).
  • Involved in Requirement analysis, Project planning, Database designing and Report creation
  • Manipulated and fetched data from SQL databases.
  • Developed automated scripts using Python Shell Scripting for data collection and transfer to databases.
  • Worked with source version control tool GIT to manage the code repository.
  • Tested, debugged, and fixed issues related to the automation framework.
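A minimal sketch of a Kafka consumer along the lines described above, written against the standard Kafka client API (shown here in Scala); the broker address, topic, and consumer group are hypothetical placeholders.

```scala
import java.time.Duration
import java.util.Properties
import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}

object ResponseConsumer {
  def main(args: Array[String]): Unit = {
    // Basic consumer configuration; broker address and group id are placeholders
    val props = new Properties()
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092")
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "response-processors")
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringDeserializer")
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringDeserializer")
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(java.util.Collections.singletonList("responses"))

    try {
      while (true) {
        // Poll for new records and hand each payload to downstream processing
        val records = consumer.poll(Duration.ofMillis(500)).iterator()
        while (records.hasNext) {
          val r = records.next()
          println(s"offset=${r.offset()} value=${r.value()}")
        }
      }
    } finally {
      consumer.close()
    }
  }
}
```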

Java Programmer

Confidential

Environment: Java 1.6, Struts 1.x, J2EE, HTML, JavaScript, Servlets, JSP, JDBC, Eclipse, XML, XSLT, XSD, EJB, Ant, JUnit, Tomcat, Windows XP, UNIX, Oracle 10g, CSS.

Responsibilities:

  • Qualified in UNIX shell scripting as part of the implementation process; used the Eclipse Galileo IDE for application development.
  • Wrote SQL queries, PL/SQL stored procedures, involved in modifications to the existing database structure.
  • Actively worked on developing automated build scripts using Ant to organize and test the application on servers. Wrote JUnit 3 test cases for unit testing various modules of the application.
  • Worked with XML documents using XSLT and CSS to render content as HTML; validated XML using DTDs and XSDs. Used Struts validators and the Struts 1.x Tiles concept for front-end rendering.
  • Involved in analyzing business needs as part of releases, working as a Java developer.
  • Developed the application using various design patterns such as DAO, Singleton, and Session Facade.
