Senior Hadoop Developer Resume
Los Angeles, CA
SUMMARY:
- 7 years of professional experience in implementation and application support projects, including 4+ years with Big Data technologies such as Hadoop, Kafka, NiFi, and Couchbase.
- Expertise in Big Data technologies using the Hortonworks distribution and its ecosystem, including HDFS, MapReduce (MRv1, MRv2/YARN), Apache Pig, Apache Spark, Apache HBase, Apache Hive, Apache Sqoop, Apache ZooKeeper, Apache Flume, Apache Oozie, Apache Cassandra, and Cloudera Hue.
- In-depth understanding of the Hadoop architecture and its components, such as HDFS, ResourceManager, NodeManager, JobTracker, TaskTracker, NameNode, DataNode, and Secondary NameNode, as well as MapReduce concepts.
- Experienced in importing data from relational databases into HDFS and exporting data from HDFS back to relational databases using Sqoop.
- Experienced in applying HDFS-level encryption using Voltage encryption to protect customers' personal information.
- Extensively worked on creating complex MapReduce (MR) batch programs to perform Big Data processing and analysis using Pig Latin and customized core Java UDFs.
- Developed Pig Latin scripts to perform complex Big Data processing and analysis on HDFS.
- Experience in implementing partitioning and bucketing techniques in Hive.
- Experience in writing HiveQL queries to store processed data in Hive tables for Big Data analysis.
- Developed projects using Apache Spark and its in-memory processing features.
- Experience in working with NoSQL column-oriented databases such as HBase and their integration with HDFS.
- Experience in tuning and debugging Spark applications and applying Spark optimization techniques.
- Experience in loading log files from multiple sources directly into HDFS using Flume.
- Experience in analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java, and in extending Hive and Pig core functionality by writing custom UDFs.
- Experience in tuning performance using partitioning, bucketing, and indexing in Hive.
- Experienced in job workflow scheduling and monitoring with Oozie and in cluster coordination with ZooKeeper.
- Extensively worked in core Java and developed various customized Java UDFs.
- Passionate about working in Big Data and analytics environments.
- Excellent knowledge of Spark and Hadoop architecture and ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode, and of the MapReduce programming paradigm.
- Analytical thinker and fast learner who consistently resolves ongoing issues and defects and is often called upon to consult on problems.
- Excellent interpersonal and communication skills, strong business acumen and work ethic, technical competency, team-player spirit, and leadership skills.
- Highly motivated self-starter with creative problem-solving skills, a positive attitude, and a willingness to learn new concepts and accept challenges.
- Experienced in team management and project management.
PROFESSIONAL EXPERIENCE:
Senior Hadoop Developer
Confidential, Los Angeles, CA
Environment: Hortonworks, HDFS, MapReduce (MR), Apache Pig, Apache Sqoop, Apache ZooKeeper, Apache Flume, Hive, Core Java, Teradata, DB2 UDB, Apache HBase, Apache Cassandra, UNIX Shell Scripts, Agile Scrum.
Responsibilities:
- Worked in a live Big Data Hadoop production environment.
- Gathered client requirements and created formal business requirement specifications.
- Transformed data according to business logic in Hive and Pig.
- Worked with the Big Data Policy and Security teams to create data policies and develop interfaces to anonymize data.
- Wrote MapReduce programs in Java in an MRv2/YARN environment.
- Created workflows in NiFi and supported performance testing of the NiFi cluster.
- Developed Spark Streaming jobs to consume data from sources such as Kafka/S3 and ingest it into Couchbase/Kafka.
- Developed Spark Streaming jobs to ingest data into HDFS/HBase and validated the ingested data.
- Extracted structured data from Teradata, DB2 UDB, and DB2 z/OS relational databases into HDFS using Sqoop.
- Created Hive tables, loaded them with data, and wrote Hive queries that execute internally as MapReduce jobs.
- Generated the required reports for the operations team from the ingested data using Oozie workflows and Hive queries.
- Created complex Pig Latin scripts to process the extracted data per the business requirement specifications and developed Pig scripts to store unstructured data in HDFS.
- Wrote customized Hive UDFs for complex processing.
- Created managed and external tables in Hive and loaded data from HDFS.
- Designed and implemented complex MapReduce (MR) jobs to support distributed processing using Pig Latin.
- Worked with structured, semi-structured, and unstructured data.
- Resolved performance issues in Hive and Pig scripts by applying an understanding of joins, grouping, and aggregation and how they translate to MapReduce (MR) jobs.
- Created HBase tables with Hive integration to store data.
- Performed unit testing and integration testing.
- Handled production deployment and maintenance.
Hadoop Engineer
Confidential
Environment: HDFS, MapReduce, Apache Hive, Apache Pig, Apache Spark, Sqoop, Oozie, HBase, EDW
Responsibilities:
- Developed Big Data Solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.
- Migrated the existing data to Hadoop from RDBMS (SQL Server and Oracle) using Sqoop for processing the data.
- Created Hive queries that supported analysis of customer purchase trends by comparing fresh data with EDW reference data and historical metrics.
- Developed shell scripts to automate data loads and perform analysis.
- Responsible for loading unstructured and semi-structured data into Hadoop cluster coming from different sources using Flume.
- Developed MapReduce programs to cleanse and parse data in HDFS obtained from various data sources and to perform joins on the Map side using distributed cache.
- Developed custom MapReduce programs and custom User Defined Functions (UDFs) in Hive to transform large volumes of data according to business requirements.
- Used the Hive data warehouse tool to analyze data in HDFS and developed Hive queries.
- Created internal and external tables with properly defined static and dynamic partitions for efficiency.
- Implemented custom Hive UDFs to achieve comprehensive data analysis.
- Implemented Sqoop jobs for incremental imports of a few hundred TB of data.
- Created HBase tables integrated with Hive to handle updates in Hadoop.
- Responsible for landing multi-source data in HDFS using Spark Streaming.
Hadoop Developer
Confidential - Chicago, IL
Environment: Cloudera Apache Hadoop (CDH 4), HDFS, MapReduce (MR), Apache Pig, Apache Sqoop, Apache ZooKeeper, Core Java, Teradata, DB2 UDB (LUW), HBase, UNIX Shell Scripts, Tortoise SVN, Agile Scrum
Responsibilities:
- Worked on distributed/cloud computing technologies (MapReduce/Hadoop, Hive, Pig, HBase, Sqoop, Spark, ZooKeeper, etc.) and Hortonworks for Hadoop.
- Imported data into and exported data from HDFS, Hive, and HBase using Sqoop.
- Worked on loading and transforming large sets of structured and semi-structured data into the Hadoop system.
- Loaded data from various data sources into HDFS using Flume.
- Developed the Pig UDFs to pre-process the data for analysis.
- Implemented partitioning, dynamic partitions, and buckets in Hive.
- Stored parsed data in HBase and Hive using HBase-Hive integration.
- Loaded data into the Cassandra NoSQL database and integrated and merged it with SQL data.
- Protected customers' personal information by applying HDFS-level encryption using Voltage encryption.
- Performed Hive Query Optimization for better performance.
- Built reusable Hive UDF libraries for business requirements which enabled users to use these UDFs in Hive Querying.
- Used SQL scripting and stored procedures to manage data in databases.
- Tuned the performance of Hive queries using partitioning and bucketing.
- Worked on huge datasets from Hive to understand and visualize the data for analysis.
- Experienced in the design, development, and creation of HBase schemas.
- Developed workflows in Oozie to automate loading data into HDFS and pre-processing it with Hive.
Hadoop Developer
Confidential
Description: This was a pilot project intended to train employees on the latest advancements in Hadoop and its ecosystem components and to set up a Hadoop environment for big data analysis.
Environment: Hadoop MapReduce, HDFS, Hive, Sqoop, Pig, Linux and MySQL
Responsibilities:
- Gathered requirements from the business and designed an implementation plan.
- Extracted data from DB2 using Sqoop, placed it in HDFS, and processed it.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Created Hive tables, loaded data into them, and analyzed the data using Hive queries.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Involved in running Hadoop jobs for processing millions of records of text data
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Performed unit testing of MapReduce jobs using MRUnit.
- Loaded data from the Linux file system into HDFS.
- Responsible for managing data from multiple sources.
- Ran Hadoop streaming jobs to process terabytes of XML-format data.
- Loaded and transformed large sets of structured and semi-structured data.
- Responsible for managing data coming from different sources.
- Assisted in exporting analyzed data to relational databases using Sqoop.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
Software Developer
Confidential
Environment: Java, J2EE, JSP, HTML, CSS, JavaScript, Tomcat, Servlets, JDBC, Oracle, SQL, DB2 UDB (LUW), DB2 z/OS (Mainframe), UNIX Shell Scripts and Perl Scripts, Tortoise SVN, Waterfall, MS SQL Server 2008, T-SQL, SQL Server Integration Services (SSIS)
Responsibilities:
- Responsible for gathering requirements from business users.
- Participated in discussions with teammates and in designing the system.
- Developed web pages using JSP and handled requests using Java and Servlets.
- Developed client-side validations using JavaScript.
- Implemented server-side validation based on supported file formats.
- Developed Java code according to the MVC architecture.
- Created database objects such as tables, complex stored procedures, views, UDFs, cursors, and DML and DDL triggers based on requirements using T-SQL.
- Applied query optimization techniques and tuned queries by creating indexes to improve query and system performance.
- Used a SQL database in the project and developed the admin screens using JSP and JavaScript.
- Worked on SQL Server Integration Services (SSIS) to integrate and analyze data from multiple homogeneous and heterogeneous information sources.
- Fixed bugs for priority-one issues.
