Senior Hadoop Developer Resume

Torrance, CA

SUMMARY:

  • 7 years of professional experience in implementation and application support projects, including 4+ years with Big Data technologies such as Hadoop, Kafka, NiFi, and Couchbase.
  • Expertise in Big Data technologies using the Hortonworks distribution and its ecosystem, including HDFS, MapReduce (MRv1, MRv2/YARN), Apache Pig, Apache Spark, Apache HBase, Apache Hive, Apache Sqoop, Apache Zookeeper, Apache Flume, Apache Oozie, Apache Cassandra, and Cloudera Hue.
  • In-depth understanding of the Hadoop architecture and its components, such as HDFS, Resource Manager, Node Manager, Job Tracker, Task Tracker, Name Node, Data Node, and Secondary Name Node, and of MapReduce concepts.
  • Experienced in importing data from relational databases into HDFS, and exporting it back, using Sqoop.
  • Extensively worked on creating complex MapReduce (MR) batch programs for Big Data processing and analysis using Pig Latin and custom core Java UDFs.
  • Developed Pig Latin scripts to perform complex Big Data processing and analysis on HDFS.
  • Experience in implementing partitioning and bucketing techniques in Hive.
  • Experience in writing HiveQL queries to store processed data in Hive tables for Big Data-oriented analysis.
  • Developed projects using Apache Spark with its in-memory processing features.
  • Experience in working with NoSQL Column-Oriented Databases like HBase and their Integration with HDFS.
  • Used AWS services like EC2 and S3 for small data sets.
  • Experience in tuning and debugging Spark applications and applying Spark optimization techniques.
  • Experience in loading log files from multiple sources directly into HDFS using Flume.
  • Experience in analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java, and in extending Hive and Pig core functionality by writing custom UDFs.
  • Experience in performance tuning using partitioning, bucketing, and indexing in Hive.
  • Experienced in job workflow scheduling and monitoring tools like Oozie and Zookeeper.
  • Extensively worked in core Java and developed various custom Java UDFs.
  • Passionate about working in Big Data and analytics environments.
  • Excellent knowledge of Spark and Hadoop architecture and ecosystem components such as HDFS, Job Tracker, Task Tracker, Name Node, and Data Node, and of the MapReduce programming paradigm.
  • Analytical thinker and fast learner who consistently resolves ongoing issues and defects and is often called upon to consult on problems.
  • Excellent interpersonal and communication skills, strong business acumen and work ethic, technical competency, team-player spirit, and leadership skills.
  • Highly motivated self-starter with creative problem-solving skills, a positive attitude, and a willingness to learn new concepts and accept challenges.
  • Experienced in team management and project management.

TECHNICAL SKILLS:

Big Data: HDFS, MapReduce (MR), Apache Spark, Apache Pig, Apache Hive, Apache Sqoop, Apache Zookeeper, Apache Flume, Apache Oozie, Cloudera Hue

Databases: Oracle 11g/10g/9i, MySQL, DB2, MS SQL Server, Teradata, MongoDB, Apache HBase, Apache Cassandra, CouchDB

Languages: Pig Latin, C++, UNIX Shell Scripts, Perl Scripts, Python, R

Hadoop distributions: Cloudera, YARN, Hortonworks

Java & J2EE Technologies: Core Java, JSP, JDBC

Version Control: Tortoise SVN, Git

Tools: SSIS, Tableau, Eclipse, NetBeans

Methodologies: Waterfall and Agile SCRUM

PROFESSIONAL EXPERIENCE:

Confidential, Torrance, CA

Senior Hadoop Developer

Responsibilities:

  • Worked in a live Big Data Hadoop production environment.
  • Gathered client requirements and created formal business requirement specifications.
  • Transformed data according to business logic in Hive and Pig.
  • Worked with the Big Data policy and security teams to create data policies and develop interfaces to anonymize data.
  • Wrote MapReduce programs in Java in an MRv2/YARN environment.
  • Created workflows in NiFi and supported performance testing of the NiFi cluster.
  • Developed Spark Streaming jobs to consume data from sources such as Kafka and S3 and ingest it into Couchbase and Kafka.
  • Migrated an existing on-premises application to AWS.
  • As part of a POC, used Amazon S3 as the underlying file system for Hadoop and ran Elastic MapReduce jobs on the data in S3 buckets.
  • Developed Spark Streaming jobs to ingest data into HDFS/HBase and validated the data.
  • Extracted structured data from Teradata, DB2 UDB, and DB2 z/OS relational databases into HDFS using Sqoop.
  • Created Hive tables, loaded them with data, and wrote Hive queries that execute internally as MapReduce jobs.
  • Generated the required reports for the operations team from the ingested data using Oozie workflows and Hive queries.
  • Created complex Pig Latin scripts to process the extracted data per the business requirement specifications and developed Pig scripts to store unstructured data in HDFS.
  • Wrote custom Hive UDFs for complex processing.
  • Created managed and external tables in Hive and loaded data from HDFS.
  • Designed and implemented complex MapReduce (MR) jobs to support distributed processing using Pig Latin.
  • Worked with structured, semi-structured, and unstructured data.
  • Resolved performance issues in Hive and Pig scripts by understanding how joins, grouping, and aggregation translate to MapReduce (MR) jobs.
  • Created HBase tables with Hive integration to store data.
  • Performed unit testing and integration testing.
  • Handled production deployment and maintenance.

Environment: Hortonworks, HDFS, MapReduce (MR), Apache PIG, Apache Sqoop, Apache Zookeeper, Apache Flume, Hive, Core JAVA, Teradata, DB2 UDB, Apache HBase, Apache Cassandra, UNIX Shell Scripts, Agile SCRUM.

Confidential, Minneapolis, MN

Hadoop Engineer

Responsibilities:

  • Developed Big Data Solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.
  • Migrated existing data from RDBMS (SQL Server and Oracle) to Hadoop using Sqoop for processing.
  • Created Hive queries that supported analysis of customer purchase trends by comparing fresh data with EDW reference data and historical metrics.
  • Developed shell scripts to automate data loads and perform analysis.
  • Responsible for loading unstructured and semi-structured data from different sources into the Hadoop cluster using Flume.
  • Developed MapReduce programs to cleanse and parse data in HDFS obtained from various data sources and to perform map-side joins using the distributed cache.
  • Developed custom MapReduce programs and custom User Defined Functions (UDFs) in Hive to transform large volumes of data according to business requirements.
  • Used the Hive data warehouse tool to analyze data in HDFS and developed Hive queries.
  • Created internal and external tables with properly defined static and dynamic partitions for efficiency.
  • Implemented custom Hive UDFs to achieve comprehensive data analysis.
  • Implemented Sqoop jobs for incremental imports of a few hundred terabytes of data.
  • Created HBase tables on Hive for handling updates in Hadoop.
  • Responsible for landing multi-source data in HDFS using Spark Streaming.

Environment: HDFS, Map Reduce, Apache Hive, Apache Pig, Apache Spark, Sqoop, Oozie, HBase, EDW

Confidential, Chicago, IL

Hadoop Developer

Responsibilities:

  • Worked on distributed/cloud computing technologies (MapReduce/Hadoop, Hive, Pig, HBase, Sqoop, Spark, Zookeeper, etc.) using Hortonworks for Hadoop.
  • Imported data into, and exported data from, HDFS, Hive, and HBase using Sqoop.
  • Loaded and transformed large sets of structured and semi-structured data into the Hadoop system.
  • Loaded data from various data sources into HDFS using Flume.
  • Developed Pig UDFs to pre-process the data for analysis.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Stored parsed data in HBase and Hive using HBase-Hive integration.
  • Loaded data into the Cassandra NoSQL database and worked on Cassandra integration and merging with SQL data.
  • Protected customers' personal information by encrypting data at the HDFS level using Voltage encryption.
  • Performed Hive query optimization for better performance.
  • Built reusable Hive UDF libraries for business requirements, enabling users to call these UDFs in Hive queries.
  • Used SQL scripts and stored procedures to manage data in databases.
  • Tuned the performance of Hive queries using partitioning and bucketing.
  • Worked on large datasets from Hive to understand and visualize the data for analysis.
  • Experienced in the design, development, and creation of HBase schemas.
  • Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Hive.

Environment: Cloudera Apache Hadoop (CDH 4), HDFS, MapReduce (MR), Apache PIG, Apache Sqoop, Apache Zookeeper, Core JAVA, Teradata, DB2 UDB (LUW), HBase, UNIX Shell Scripts, Tortoise SVN, Agile SCRUM

Confidential

Java/Hadoop Developer

Responsibilities:

  • Gathered requirements from the business and designed an implementation plan.
  • Extracted files from DB2 using Sqoop, placed them in HDFS, and processed them.
  • Analyzed large data sets by running Hive queries and Pig scripts.
  • Created Hive tables, loaded data into them, and analyzed it using Hive queries.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Ran Hadoop jobs to process millions of records of text data.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Unit tested MapReduce jobs using MRUnit.
  • Loaded data from the Linux file system into HDFS.
  • Responsible for managing data from multiple sources.
  • Ran Hadoop streaming jobs to process terabytes of XML-format data.
  • Loaded and transformed large sets of structured and semi-structured data.
  • Assisted in exporting analyzed data to relational databases using Sqoop.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.

Environment: Hadoop MapReduce, HDFS, Hive, Sqoop, Pig, Linux and MySQL

Confidential

Java Developer

Responsibilities:

  • Responsible for gathering requirements from business users.
  • Participated in discussions with teammates and in designing the system.
  • Developed web pages using JSP and handled requests using Java and Servlets.
  • Developed client-side validations using JavaScript.
  • Implemented server-side validation based on supported file formats.
  • Developed Java code following the MVC architecture.
  • Created database objects such as tables, complex stored procedures, views, UDFs, cursors, and DML and DDL triggers, based on requirements, using T-SQL.
  • Applied query optimization techniques and tuned queries by creating indexes to improve query and system performance.
  • Used a SQL database in the project and developed the admin screens using JSP and JavaScript.
  • Worked on SQL Server Integration Services (SSIS) to integrate and analyze data from multiple homogeneous and heterogeneous information sources.
  • Fixed bugs for priority-one issues.

Environment: Java, J2EE, JSP, HTML, CSS, JavaScript, Tomcat, Servlets, JDBC, Oracle, SQL, DB2 UDB (LUW), DB2 z/OS (Mainframe), UNIX Shell Scripts and Perl Scripts, Tortoise SVN, Waterfall, MS SQL Server 2008, T-SQL, SQL Server Integration Services (SSIS).
