Hadoop Developer Resume
Dearborn, MI
SUMMARY:
- Seven years of work experience in the IT industry, with over three years of professional experience in Big Data / Hadoop (Cloudera distributions CDH3 and CDH4) on clusters of 300 nodes
- Extensive experience in MapReduce (MRv1)
- Extensive experience in testing, debugging, and deploying MapReduce jobs on Hadoop platforms
- Extensive experience in working with HDFS, PIG, Hive, Sqoop, Flume, Oozie, Zookeeper and Cassandra
- Experience with Cloudera CDH3, CDH4 distributions
- Extensive experience with ETL and Big Data query languages such as Pig Latin and HiveQL
- Expertise in installing, designing, sizing, configuring, provisioning, and upgrading Hadoop environments
- Experience in tuning and troubleshooting performance issues in Hadoop clusters holding over 70 TB of data
- Experience in monitoring, performance tuning, SLA management, scaling, and security of Big Data systems.
- Hands-on NoSQL database experience with HBase and Cassandra
- Good working knowledge of the Eclipse IDE for developing and debugging Java applications
- Expertise in creating UIs using JSP, HTML, XML, and JavaScript.
- Good experience with SQL Server databases, including stored procedures, constraints, and triggers.
- Created data maps from source databases to dimension and fact tables.
- Carried out QA deployments and worked on process flow diagrams.
- Created dimension and fact jobs and scheduled job runs.
- Well experienced with remote access and file transfer tools such as PuTTY and WinSCP
- Extensive experience in documenting requirements, functional specifications, technical specifications
- Highly motivated, adaptive and quick learner
- Exhibited excellent communication and leadership capabilities
- Excellent Analytical, Problem solving and technical skills
- Strong ability to handle multiple priorities and workloads, and to understand and adapt to new technologies and environments quickly
TECHNICAL SKILLS:
Big Data/ Hadoop Framework: HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, Zookeeper, Flume and HBase.
Databases: Microsoft SQL Server, MySQL, Oracle, Cassandra
Languages: C, C++, Java, SQL, PLSQL, Pig Latin, HiveQL
Web Technologies: JSP, JavaBeans, JDBC, XML
Operating Systems: Windows, Unix and Linux
Front-End: HTML, CSS3, JavaScript/jQuery
Development Tools: Microsoft SQL Studio, Eclipse, NetBeans, MySQL Workbench.
Office Tools: Microsoft Office Suite
Development Methodologies: Agile/Scrum, Waterfall
PROFESSIONAL EXPERIENCE:
Confidential, Dearborn, MI
Hadoop Developer
Responsibilities:
- Worked on a live Big Data Hadoop production environment with 400 nodes
- Worked with highly unstructured and semi structured data of 40 TB in size
- Designed and developed Pig ETL scripts to process data in a nightly batch
- Created Pig macros to improve code reusability and modularity
- Developed Hive scripts for end user / analyst requirements for ad-hoc analysis
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables for optimized performance (see the Hive sketch after this section)
- Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation, and how they translate to MapReduce jobs
- Worked in tuning Hive and Pig scripts to improve performance
- Good experience in troubleshooting performance issues and tuning Hadoop cluster
- Good working knowledge of Sqoop for performing incremental imports from Oracle to HDFS.
- Good experience in working with compressed files and related formats.
- Developed Oozie workflow for scheduling and orchestrating the ETL process
- Hands on experience with Cassandra and its architecture
- Performed data analysis on HBase data using Hive external tables mapped to HBase
- Very good understanding of Single Point Of Failure (SPOF) of Hadoop Daemons and recovery procedures
- Worked with the infrastructure and admin team in designing, modeling, sizing and configuring Hadoop cluster of 60 nodes
- Currently planning a migration from CDH3 to CDH4
- Good understanding of Impala
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, Cassandra, Java, Oracle 10g, MySQL, Ubuntu.
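Below is a minimal Java sketch of the kind of Hive DDL behind the managed/external table and partitioning work above, submitted over Hive JDBC. It assumes a HiveServer2 endpoint; the connection URL, table name, columns, and HDFS paths are illustrative, not the actual production schema.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch: create a partitioned external Hive table over HDFS data and
// query a single partition. All identifiers and paths are assumptions.
public class HiveExternalTableSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {

            // External table: Hive manages only metadata, data stays in HDFS
            stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS web_events ("
                    + " user_id STRING, url STRING, latency_ms INT)"
                    + " PARTITIONED BY (event_date STRING)"
                    + " ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'"
                    + " LOCATION '/data/web_events'");

            // Register one day's partition (e.g. output of the nightly batch)
            stmt.execute("ALTER TABLE web_events ADD IF NOT EXISTS"
                    + " PARTITION (event_date='2013-06-01')"
                    + " LOCATION '/data/web_events/2013-06-01'");

            // Partition pruning limits the scan to a single day of data
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT url, AVG(latency_ms) FROM web_events"
                    + " WHERE event_date='2013-06-01' GROUP BY url")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getDouble(2));
                }
            }
        }
    }
}
```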
Confidential, Danville, IL
Hadoop Developer
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing
- Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration (see the MapReduce sketch after this section)
- Worked on moving all log files generated from various sources to HDFS for further processing
- Developed workflows using custom MapReduce, Pig, Hive and Sqoop
- Developed a predictive analytics product using SQL/HiveQL, JavaScript, and Highcharts.
- Developed data pipeline programs with Spark Scala APIs, performed data aggregations with Hive, and formatted data (JSON) for visualization and for generating Highcharts such as outlier, data distribution, and correlation/comparison charts
- Created various views over HBase tables and leveraged Hive on top of HBase for query performance.
- Developed an Apache Storm and HDFS integration project for real-time data analysis.
- Designed and developed Apache Storm topologies for inbound and outbound data, performing real-time ETL to identify the latest trends and keywords.
- Developed MapReduce programs for parsing information and loading it into HDFS
- Built reusable Hive UDF libraries for business requirements, enabling users to apply these UDFs in Hive queries.
- Wrote a Hive UDF to sort struct fields and return a complex data type (see the UDF sketch after this section)
- Responsible for loading data from UNIX file system to HDFS
- Developed a suite of unit test cases for Mapper, Reducer, and Driver classes using the MRUnit testing library
- Designed and developed a distributed processing system running to process binary files in parallel and crunch the analysis metrics into a Data Warehousing platform for reporting.
- Developed workflow in Control M to automate tasks of loading data into HDFS and preprocessing with PIG.
- Cluster co-ordination services through Zookeeper
- Used Maven extensively for building jar files of MapReduce programs and deployed to Cluster
- Modelled Hive partitions extensively for data separation and faster data processing and followed Pig and Hive best practices for tuning
Environment: Hive QL, MySQL, HBase, HDFS, HIVE, Eclipse, Hadoop, Oracle 11g, PL/SQL, SQL*PLUS, Flume, PIG, Sqoop, Spark, UNIX.
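The following is a hedged sketch of the MapReduce log parsing and MRUnit-style unit testing described above. The tab-separated log layout, field positions, and class names are assumptions made for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;

// Mapper sketch: parse tab-separated log lines and emit (statusCode, 1),
// skipping malformed records. The field layout is an assumption.
public class LogParseMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text status = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\t");
        if (fields.length < 3) {
            return; // skip malformed line
        }
        status.set(fields[2]);
        context.write(status, ONE);
    }

    // MRUnit test (would normally live in its own test source file):
    // feed one well-formed line and assert the expected (key, value) pair.
    public static class LogParseMapperTest {
        @Test
        public void emitsStatusCode() throws IOException {
            MapDriver.newMapDriver(new LogParseMapper())
                    .withInput(new LongWritable(0), new Text("u1\t/home\t200"))
                    .withOutput(new Text("200"), new IntWritable(1))
                    .runTest();
        }
    }
}
```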
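A minimal sketch in the spirit of the Hive UDF work above. It assumes Hive's reflection-based UDF bridge maps array<string> to List<Text>; the actual struct-sorting UDF would more likely be a GenericUDF built with ObjectInspectors, and the function and class names here are made up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Simple Hive UDF sketch: returns a sorted copy of an array<string> column.
// Assumed registration in Hive:
//   ADD JAR sort_udf.jar;
//   CREATE TEMPORARY FUNCTION sort_array_udf AS 'SortArrayUdf';
@Description(name = "sort_array_udf",
             value = "_FUNC_(array<string>) - returns the array sorted ascending")
public class SortArrayUdf extends UDF {
    public List<Text> evaluate(List<Text> input) {
        if (input == null) {
            return null; // preserve NULL semantics
        }
        List<Text> sorted = new ArrayList<Text>(input);
        Collections.sort(sorted);
        return sorted;
    }
}
```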
Confidential, Austin, TX
Hadoop Developer
Responsibilities:
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Responsible for building scalable distributed data solutions using Hadoop.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Developed simple to complex MapReduce jobs using Hive and Pig
- Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team
- Extensively used Pig for data cleansing.
- Created partitioned tables in Hive.
- Managed and reviewed Hadoop log files.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Installed and configured Pig and wrote Pig Latin scripts.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data
- Responsible for managing data coming from different sources
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required
- Involved in writing, testing, and running MapReduce pipelines using Apache Crunch (see the Crunch sketch after this section)
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Java, SQL, Sqoop, Java (jdk 1.6), Eclipse
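A short sketch of an Apache Crunch pipeline of the kind referenced above: read raw log lines, keep only error records, and write them back to HDFS. The input/output paths and the filtering rule are illustrative assumptions.

```java
import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;
import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.types.writable.Writables;
import org.apache.hadoop.conf.Configuration;

// Crunch pipeline sketch: filter raw log lines down to error records.
// Paths and the "ERROR" marker are assumptions.
public class ErrorLogPipeline {
    public static void main(String[] args) {
        Pipeline pipeline = new MRPipeline(ErrorLogPipeline.class, new Configuration());

        PCollection<String> lines = pipeline.readTextFile("/data/raw/logs");

        PCollection<String> errors = lines.parallelDo(
            new DoFn<String, String>() {
                @Override
                public void process(String line, Emitter<String> emitter) {
                    if (line.contains("ERROR")) {
                        emitter.emit(line);
                    }
                }
            }, Writables.strings());

        pipeline.writeTextFile(errors, "/data/out/errors");
        pipeline.done(); // triggers the underlying MapReduce job(s)
    }
}
```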
Confidential, Omaha, NE
ETL QA Tester
Responsibilities:
- Tested ETL jobs as per business rules using ETL design document
- Promoted Unix/DataStage application releases from development to QA and to UAT environments
- Assisted in creating fact and dimension table implementation in Star Schema model based on requirements.
- Expert in writing complex SQL/PLSQL scripts for querying Teradata and Oracle.
- Defined data requirements and elements used in XML transactions.
- Tested the database schema with the help of data architects using Erwin
- Involved in the testing of the Data Mart using PowerCenter
- Identified and documented additional data cleansing needs and consistent error patterns that could be averted by modifying the ETL code.
- Extensively used Teradata load utilities FastLoad, Multiload and FastExport to extract, transform and load the Teradata data warehouse
- Responsible for different Data mapping activities from Source systems to Teradata.
- Queried Teradata Database and validated the data using SQL Assistant.
- Tested the messages published by DataStage and the data loaded into various databases during the extraction, transformation, and loading process
- Used import and export facilities of the application to download/upload XMLs of failed test cases so as to re-verify.
- Wrote UNIX scripts to perform certain tasks and assisted developers with problems and SQL optimization.
- Configured QuickTest Pro with Quality Center and maintained the project information in Quality Center.
- Extensively used Autosys to automate job scheduling on daily, bi-weekly, weekly, and monthly bases with proper dependencies.
- Wrote complex SQL queries using joins, sub queries and correlated sub queries
- Performed Unit testing and System Integration testing by developing and documenting test cases in Quality Center.
- Designed and developed UNIX shell scripts as part of the ETL process to automate loading and pulling of data.
- Tested several complex reports generated by MicroStrategy, including dashboards, summary reports, master-detail, drill-down, and scorecards
- Involved in testing the MicroStrategy reports by writing complex SQL queries
- Involved in extensive data validation using SQL queries and back-end testing (see the JDBC sketch after this section)
- Tested complex objects added to the universe to enhance report functionality.
- Responsible for migrating the code changes from development environment to SIT, UAT and Production environments.
- Validated cube and query data from the reporting system back to the source system.
Environment: DataStage, Flat files, Perl, Erwin 4.0, DTS, MS SQL Server 2008, Oracle 10g, SQL, PL/SQL, IBM DB2 8.0, Agile, Teradata V2R6, Teradata SQL Assistant, MicroStrategy, COBOL, HP QTP 9.0, HP Quality Center 10, Autosys, Toad, Unix Shell Scripting, Windows XP/2000
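As a hypothetical illustration of the source-to-target validation above, the Java JDBC sketch below compares row counts between an Oracle source table and the Teradata target loaded by the ETL job. Driver classes, URLs, credentials, and table names are assumptions; in practice the checks were run as SQL through Teradata SQL Assistant and tracked in Quality Center.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch: reconcile row counts between an Oracle source table and the
// Teradata target table. All identifiers and credentials are assumptions.
public class EtlRowCountCheck {
    public static void main(String[] args) throws Exception {
        long sourceCount = count("oracle.jdbc.OracleDriver",
                "jdbc:oracle:thin:@srchost:1521:ORCL", "etl_qa", "secret",
                "SELECT COUNT(*) FROM sales_src");
        long targetCount = count("com.teradata.jdbc.TeraDriver",
                "jdbc:teradata://tdhost/DATABASE=edw", "etl_qa", "secret",
                "SELECT COUNT(*) FROM sales_fact");

        if (sourceCount == targetCount) {
            System.out.println("PASS: counts match (" + sourceCount + ")");
        } else {
            System.out.println("FAIL: source=" + sourceCount + " target=" + targetCount);
        }
    }

    private static long count(String driver, String url, String user,
                              String password, String sql) throws Exception {
        Class.forName(driver);
        try (Connection conn = DriverManager.getConnection(url, user, password);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            rs.next();
            return rs.getLong(1);
        }
    }
}
```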
Confidential
Java/J2EE Developer
Responsibilities:
- Involved in analysis and design of the application.
- Involved in preparing the detailed design document for the project.
- Developed the application using J2EE architecture.
- Involved in developing JSP forms.
- Designed and developed web pages using HTML and JSP.
- Designed various applets using JBuilder.
- Designed and developed Servlets to communicate between presentation and business layer.
- Used EJB as a middleware in developing a three-tier distributed application.
- Developed session beans and entity beans for business and data processing.
- Used JMS in the project for sending and receiving the messages on the queue.
- Developed the Servlets for processing the data on the server.
- The processed data was transferred to the database through entity beans.
- Used JDBC for database connectivity with MySQL Server (see the servlet sketch after this section).
- Used CVS for version control.
- Involved in unit testing using JUnit.
Environment: Core Java, J2EE, JSP, Servlets, XML, XSLT, EJB, JDBC, JBuilder 8.0, JBoss, Swing, JavaScript, JMS, HTML, CSS, MySQL Server, CVS, Windows 2000
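A compact sketch of the servlet-to-database flow described above; in the real application persistence went through entity beans rather than direct JDBC, and the table, columns, and connection details here are assumptions.

```java
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Servlet sketch: accept a form POST, persist it via JDBC, and forward to a JSP.
// The orders table and MySQL connection details are assumptions.
public class OrderServlet extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String customer = request.getParameter("customer");
        String item = request.getParameter("item");
        try {
            Class.forName("com.mysql.jdbc.Driver");
            Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/shop", "appuser", "secret");
            try {
                PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO orders (customer, item) VALUES (?, ?)");
                ps.setString(1, customer);
                ps.setString(2, item);
                ps.executeUpdate();
                ps.close();
            } finally {
                conn.close();
            }
        } catch (Exception e) {
            throw new ServletException("Order could not be saved", e);
        }
        // Hand the result back to the presentation layer
        request.getRequestDispatcher("/orderConfirmation.jsp").forward(request, response);
    }
}
```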