Hadoop Developer Resume

Chicago, IL

SUMMARY:

  • 8+ years of technical expertise across the complete software development life cycle (SDLC), including 6+ years of Hadoop development and 2+ years of core Java development, design and testing.
  • Hands-on experience working with Apache Spark and Hadoop ecosystem components such as MapReduce (MRv1 and YARN), Sqoop, Hive, Oozie, Flume, Kafka, ZooKeeper, and NoSQL databases like Cassandra.

Apache Spark:
  • Excellent knowledge of the Spark Core architecture.
  • Hands-on expertise in writing RDD (Resilient Distributed Dataset) transformations and actions using Scala, Python and Java.
  • Created DataFrames and performed analysis using Spark SQL (a brief sketch follows this summary).
  • Solid knowledge of Spark Streaming and the Spark machine learning libraries.

Apache Sqoop:
  • Used Sqoop to import data from relational databases (RDBMS) into HDFS and Hive, storing it in formats such as Text, Avro, Parquet, SequenceFile and ORC, with compression codecs like Snappy and Gzip.
  • Performed transformations on the imported data and exported the results back to the RDBMS.

Apache Hive:
  • Experience in writing HQL (Hive Query Language) queries to perform data analysis.
  • Created Hive external and managed tables.
  • Implemented partitioning and bucketing on Hive tables for query optimization.

Apache Oozie:
  • Experienced in writing Oozie workflows and coordinator jobs to schedule sequential Hadoop jobs.

Apache Flume and Apache Kafka:
  • Used Apache Flume to ingest data from different sources into sinks such as Avro and HDFS.
  • Implemented custom Flume interceptors to filter data and defined channel selectors to multiplex the data into different sinks.
  • Excellent knowledge of, and hands-on experience with, fan-out and multiplexing flows.
  • Excellent knowledge of the Kafka architecture.
  • Integrated Flume with Kafka, using Flume as both a producer and a consumer (the Flafka pattern).
  • Used Kafka for activity tracking and log aggregation.
  • Good understanding of Relational Databases like MySQL.
  • Ability to write complex SQL queries to analyze structured data.
  • Experienced in using Git and SVN.
  • Comfortable with build tools such as Apache Maven and SBT.
  • Excellent knowledge of object-oriented analysis and design; adept at analyzing user requirements and applying design patterns.
  • Designed and developed Java enterprise and web applications using Java, J2EE, Spring framework, JDBC API and Hibernate.
  • Utilized the concepts of multi-threaded programming in developing applications.
  • Implemented unit test cases and documented all the code and applications.
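
As a quick illustration of the RDD and DataFrame work summarized above, the following minimal Scala sketch assumes a hypothetical comma-separated input at hdfs:///data/events.csv with user_id, event_type and amount fields; the path, column names and logic are illustrative only, not a specific production job.

```scala
import org.apache.spark.sql.SparkSession

object SummarySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-and-dataframe-sketch")
      .getOrCreate()
    import spark.implicits._

    // RDD transformations and actions: map, filter, reduceByKey.
    val lines = spark.sparkContext.textFile("hdfs:///data/events.csv") // hypothetical path
    val amountsByUser = lines
      .map(_.split(","))
      .filter(_.length == 3)                          // user_id, event_type, amount
      .map(fields => (fields(0), fields(2).toDouble))
      .reduceByKey(_ + _)

    // DataFrame and Spark SQL analysis over the same data.
    val totalsDF = amountsByUser.toDF("user_id", "total_amount")
    totalsDF.createOrReplaceTempView("user_totals")
    spark.sql("SELECT user_id, total_amount FROM user_totals ORDER BY total_amount DESC LIMIT 10")
      .show()

    spark.stop()
  }
}
```

The temporary view keeps the sketch self-contained; the same DataFrame could equally be written to or read from Hive-backed tables.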

TECHNICAL SKILLS:

Languages/Others: Java, Scala, Linux, AWS
Hadoop: Apache Spark, MapReduce, Hive, Python, StreamSets, Kafka, Flume, Oozie, Sqoop

PROFESSIONAL EXPERIENCE:

Confidential, Tampa, FL

Hadoop Developer

Responsibilities:

  • Interact with Business Analysts to understand the requirements behind BRD/FRD/SR documents.
  • Prepare low-level and high-level design documents for the application and framework components.
  • Design, develop and enhance enterprise applications in the risk technology area using big data technologies built on Spark.
  • Develop Spark framework components that process data and integrate with the Hadoop, Oozie and Hive systems to perform CECL reserve calculation and aggregation (see the sketch after this list).
  • Develop batch processing for model output validations.
  • Develop the data layer to persist account-level and non-account-level data.
  • Participate in Unit Testing, Integration Testing and UAT/SIT support
  • Responsible for fixing high-priority issues in the production environment and supporting all application activities.
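
The CECL reserve logic mentioned above is proprietary and not reproduced here; the sketch below only illustrates the general Spark-to-Hive aggregation shape of such a component, with hypothetical table names (cecl.account_reserves, cecl.reserve_summary) and columns.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

object ReserveAggregationSketch {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark read and write the warehouse tables shared by Oozie-scheduled jobs.
    val spark = SparkSession.builder()
      .appName("reserve-aggregation-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical account-level input table.
    val accountReserves = spark.table("cecl.account_reserves")

    // Roll account-level reserves up to portfolio level (a stand-in for the real calculation).
    val summary = accountReserves
      .groupBy("portfolio_id", "as_of_date")
      .agg(sum("reserve_amount").alias("total_reserve"))

    // Persist the aggregated result back to Hive for downstream reporting.
    summary.write.mode("overwrite").saveAsTable("cecl.reserve_summary")

    spark.stop()
  }
}
```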

Confidential, Chicago, IL

Hadoop Developer

Responsibilities:

  • Building a Data Quality framework, which consists of a common set of model components and patterns that can be extended to implement complex process controls and data quality measurements using Hadoop.
  • Created and populated bucketed tables in Hive to allow for faster map-side joins, more efficient jobs and more efficient sampling. Also partitioned the data to optimize Hive queries.
  • Implemented the DDL Curated Data Store logic using Spark, Scala and DataFrame concepts.
  • Used Spark and Hive to implement the transformations needed to join the daily ingested data to historic data (see the sketch after this list).
  • Enhanced the performance of queries and daily Spark jobs through efficient design of partitioned Hive tables and Spark logic.
  • Implemented Spark Scala code for data validation in Hive.
  • Implemented automated workflows for all jobs using Oozie and shell scripts.
  • Used Spark SQL functions to move data from staging Hive tables to fact and dimension tables.
  • Implemented dynamic partitioning in Hive tables and used appropriate file formats and compression codecs.
  • Worked with the Data Engineering Platform team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
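
A rough sketch of the partitioned daily-to-historic join pattern referred to above, assuming hypothetical stage.daily_ingest, curated.history and curated.history_enriched tables partitioned by load_date; the real schemas and business keys differ.

```scala
import org.apache.spark.sql.SparkSession

object DailyToHistoricJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-to-historic-join-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Dynamic partition settings so Hive derives the load_date partition from the data.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    val daily   = spark.table("stage.daily_ingest")                       // hypothetical staging table
    val history = spark.table("curated.history")                          // hypothetical curated table
      .select("account_id", "first_seen_date")

    // Join the daily delta to the historic attributes on the business key.
    val enriched = daily.join(history, Seq("account_id"), "left_outer")
      .select("account_id", "balance", "first_seen_date", "load_date")

    // Write into the partitioned Hive table; load_date is the partition column.
    enriched.createOrReplaceTempView("enriched_daily")
    spark.sql(
      """INSERT OVERWRITE TABLE curated.history_enriched PARTITION (load_date)
        |SELECT account_id, balance, first_seen_date, load_date FROM enriched_daily""".stripMargin)

    spark.stop()
  }
}
```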

Confidential

Intern Hadoop & Java Developer

Responsibilities:

  • Imported trading and derivatives data into the Hadoop Distributed File System using ecosystem components such as MapReduce, Pig, Hive and Sqoop.
  • Took part in setting up the Hadoop ecosystem in the development and QA environments.
  • Managed and reviewed Hadoop Log files.
  • Responsible for writing Pig scripts and Hive queries for data processing.
  • Ran Sqoop to import data from Oracle and other databases.
  • Created shell scripts to collect raw logs from different machines.
  • Created static and dynamic partitions in Hive (see the sketch after this list).
  • Implemented Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT and UNION.
  • Defined Pig UDFs for functions such as swap, hedging, speculation and arbitrage.
  • Coded MapReduce programs to process unstructured log files.
  • Worked on importing and exporting data into HDFS and Hive using Sqoop.
  • Used parameterized Pig scripts and optimized them using ILLUSTRATE and EXPLAIN.
  • Involved in configuring HA and Kerberos security, and in NameNode failure restoration activities from time to time as part of maintaining zero downtime.
  • Implemented the Fair Scheduler as well.
  • Used the Spring framework to handle application logic and make calls to the business layer, exposing components as Spring beans.
  • Implemented and configured data sources and the session factory, and used HibernateTemplate to integrate the Spring framework with Hibernate.
  • Developed JUnit test cases for application unit testing.
  • Used SVN as version control to check in the code, created branches and tagged the code in SVN.
  • Used RESTful services to interact with the client by providing RESTful URL mappings.
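
To illustrate the static versus dynamic Hive partition loads mentioned above: the trades and staging_trades tables below are hypothetical, and the HQL is issued through Spark SQL only to keep the sketch self-contained in one language; at the time, the equivalent statements were run in Hive directly.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partition-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Static partition: the partition value is fixed in the statement itself.
    spark.sql(
      """INSERT OVERWRITE TABLE trades PARTITION (trade_date = '2015-06-01')
        |SELECT trade_id, symbol, quantity, price
        |FROM staging_trades
        |WHERE trade_date = '2015-06-01'""".stripMargin)

    // Dynamic partitions: Hive derives the partition value from the last SELECT column.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT OVERWRITE TABLE trades PARTITION (trade_date)
        |SELECT trade_id, symbol, quantity, price, trade_date
        |FROM staging_trades""".stripMargin)

    spark.stop()
  }
}
```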

Confidential

Hadoop Developer

Responsibilities:

  • Worked with Apache Kafka to get data from web servers through Flume.
  • Leveraged Flume to stream data from a Spool Directory source to an HDFS sink using the Avro protocol.
  • Developed Scala scripts to parse clickstream data using complex regular expressions (see the sketch after this list).
  • Developed Pig UDFs for processing complex data making use of Eval, Load and Filter Functions.
  • Created Hive internal or external tables as required, defined with appropriate static and dynamic partitions for efficiency.
  • Implemented Hive queries using indexes and bucketing for time efficiency.
  • Implemented UDFs, UDAFs and UDTFs in Java for Hive to process data in ways that cannot be handled by Hive's built-in functions.
  • Used the RegEx, JSON and Avro SerDes packaged with Hive for serialization and deserialization when parsing the contents of streamed data.
  • Implemented Oozie Coordinator to schedule the workflow, leveraging both data and time dependent properties.
  • Worked closely with BI and Data Science teams to gather requirements on data.
  • Debugged and troubleshot issues in MapReduce development using test frameworks such as MRUnit and JUnit.
  • Used Git as Version Control System and extensively used Maven as build tool.
  • Implemented batch data import and also worked on stream processing using Spark Streaming.
  • Developed this project using Spark in YARN mode, with in-depth knowledge of standalone mode.
  • Created RDDs on the log files and converted them to Data Frames.
  • Developed Spark SQL queries to perform analysis on the log data.
  • Used HiveContext to connect to the Hive metastore and write HQL queries.
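
A small Scala sketch of the clickstream parsing and Spark SQL analysis described above, using a deliberately simplified, hypothetical log format (timestamp, client IP, URL); the production regular expressions and fields were considerably more complex.

```scala
import org.apache.spark.sql.SparkSession

object ClickstreamParseSketch {
  // Simplified pattern: timestamp, client IP, requested URL.
  private val LogLine = """^(\S+)\s+(\d{1,3}(?:\.\d{1,3}){3})\s+(\S+)$""".r

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("clickstream-parse-sketch")
      .getOrCreate()
    import spark.implicits._

    // Build an RDD from the raw log files, parse each line with the regex, drop malformed lines.
    val clicks = spark.sparkContext
      .textFile("hdfs:///logs/clickstream/*")          // hypothetical path
      .flatMap {
        case LogLine(ts, ip, url) => Some((ts, ip, url))
        case _                    => None
      }

    // Convert the parsed RDD to a DataFrame and analyse it with Spark SQL.
    val clicksDF = clicks.toDF("event_time", "client_ip", "url")
    clicksDF.createOrReplaceTempView("clicks")
    spark.sql("SELECT url, COUNT(*) AS hits FROM clicks GROUP BY url ORDER BY hits DESC LIMIT 20")
      .show(truncate = false)

    spark.stop()
  }
}
```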

Tools and Technologies: Cloudera Manager (CDH5), MapReduce, HDFS, Sqoop, Pig, Hive, Oozie, Kafka, Flume, Java, Git, Maven, Jenkins.

Confidential

Hadoop Developer

Responsibilities:

  • Hands-on experience loading data from the UNIX file system into HDFS. Also performed parallel transfers of data from the landing zone to HDFS using DistCp.
  • Experienced in loading and transforming large sets of structured and semi-structured data ingested through Sqoop and placed in HDFS for further processing.
  • Designed appropriate partitioning/bucketing schemas to allow faster data retrieval during analysis using Hive.
  • Involved in processing the data in Hive tables using high-performance, low-latency HQL queries.
  • Transferred the analyzed data from HDFS to relational databases using Sqoop, enabling the BI team to visualize analytics.
  • Developed custom aggregate functions using Spark SQL and performed interactive querying (see the sketch after this list).
  • Managing and scheduling Jobs on a Hadoop cluster using Airflow DAG.
  • Involved in creating Hive tables, loading data, and running Hive queries on that data.
  • Extensive working knowledge of partitioned tables, UDFs, performance tuning and compression-related properties in Hive.
  • Worked with the Data Engineering Platform team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Deployed Informatica objects to the production repository.
  • Monitored and debugged Informatica components in case of failures or performance issues.
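
As a hedged illustration of a custom aggregate function in Spark SQL, the sketch below computes a simple average over a hypothetical amount column using the typed Aggregator API (registered Spark 3.x style); it stands in for, and is not, the production aggregate functions.

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.functions.udaf

// Running (sum, count) buffer for a simple average aggregate.
case class AvgBuffer(sum: Double, count: Long)

object MyAverage extends Aggregator[Double, AvgBuffer, Double] {
  def zero: AvgBuffer = AvgBuffer(0.0, 0L)
  def reduce(b: AvgBuffer, value: Double): AvgBuffer = AvgBuffer(b.sum + value, b.count + 1)
  def merge(b1: AvgBuffer, b2: AvgBuffer): AvgBuffer = AvgBuffer(b1.sum + b2.sum, b1.count + b2.count)
  def finish(b: AvgBuffer): Double = if (b.count == 0) 0.0 else b.sum / b.count
  def bufferEncoder: Encoder[AvgBuffer] = Encoders.product[AvgBuffer]
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

object CustomAggregateSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("custom-aggregate-sketch")
      .getOrCreate()
    import spark.implicits._

    // Register the aggregator so it can be called from SQL.
    spark.udf.register("my_avg", udaf(MyAverage))

    // Hypothetical in-memory data; in practice this came from Hive tables.
    Seq(("acct1", 10.0), ("acct1", 30.0), ("acct2", 5.0))
      .toDF("account_id", "amount")
      .createOrReplaceTempView("payments")

    spark.sql("SELECT account_id, my_avg(amount) AS avg_amount FROM payments GROUP BY account_id")
      .show()

    spark.stop()
  }
}
```

The typed Aggregator keeps the merge logic explicit, which makes the sum/count buffer easy to unit test outside Spark.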

Tools and Technologies: Hadoop technologies (Spark, Hive, Impala, Sqoop), Informatica 9.1, Oracle, AutoSys, UNIX

Confidential

Intern Java Developer

Responsibilities:

  • Involved in building and implementing the application using MVC architecture with Java Spring framework.
  • Used Hibernate as the Object-Relational mapping framework to simplify the transformation of business data between an application and relational database.
  • Used JUnit as the testing framework. Involved in developing test plans and test cases. Performed unit testing for each module and prepared code documentation.
  • Responsible for testing, analyzing, and debugging the software.
  • Applied design patterns and OO design concepts to improve the existing code base.
  • Involved in documentation of the module and project. Involved in providing post-production support.
  • Followed Agile methodologies to manage the life cycle of the project. Provided daily updates, sprint review reports, and regular snapshots of project progress.
