Hadoop Developer Resume
Dearborn, MI
SUMMARY:
- Seven years of work experience in the IT industry, with over three years of professional experience in Big Data / Hadoop (Cloudera distributions CDH3 and CDH4) on clusters of 300 nodes
- Extensive experience in MapReduce (MRv1)
- Extensive experience in testing, debugging, and deploying MapReduce jobs on Hadoop platforms
- Extensive experience in working with HDFS, PIG, Hive, Sqoop, Flume, Oozie, Zookeeper and Cassandra
- Experience with Cloudera CDH3, CDH4 distributions
- Extensive experience with ETL and Big Data query languages such as Pig Latin and HiveQL
- Expertise in installing, designing, sizing, configuring, provisioning, and upgrading Hadoop environments
- Experience in tuning and troubleshooting performance issues in Hadoop clusters holding over 70 TB of data
- Experience in monitoring, performance tuning, SLA management, scaling, and security of Big Data systems.
- Hands-on NoSQL database experience with HBase and Cassandra
- Good working knowledge of the Eclipse IDE for developing and debugging Java applications
- Expertise in creating UIs using JSP, HTML, XML, and JavaScript.
- Good experience with SQL Server databases, including stored procedures, constraints, and triggers.
- Created data maps from source databases to dimension and fact tables.
- Carried out QA deployments and worked on process flow diagrams.
- Created dimension and fact jobs and scheduled job runs.
- Well experienced with remote access and file transfer tools such as PuTTY and WinSCP
- Extensive experience in documenting requirements, functional specifications, technical specifications
- Highly motivated, adaptive and quick learner
- Exhibited excellent communication and leadership capabilities
- Excellent Analytical, Problem solving and technical skills
- Strong ability to handle multiple priorities and workloads, and to understand and adapt to new technologies and environments quickly
TECHNICAL SKILLS:
Big Data/ Hadoop Framework: HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, Zookeeper, Flume and HBase.
Databases: Microsoft SQL Server, MySQL, Oracle, Cassandra
Languages: C, C++, Java, SQL, PLSQL, Pig Latin, HiveQL
Web Technologies: JSP, JavaBeans, JDBC, XML
Operating Systems: Windows, Unix and Linux
Front-End: HTML, CSS3, JavaScript/jQuery
Development Tools: Microsoft SQL Studio, Eclipse, NetBeans, MySQL Workbench.
Office Tools: Microsoft Office Suite
Development Methodologies: Agile/Scrum, Waterfall
PROFESSIONAL EXPERIENCE:
Confidential, Dearborn, MI
Hadoop Developer
Responsibilities:
- Worked on a live Big Data Hadoop production environment with 400 nodes
- Worked with highly unstructured and semi structured data of 40 TB in size
- Designed and developed Pig ETL scripts to process data in a nightly batch
- Created Pig macros to improve code reusability and modularity
- Developed Hive scripts for end user / analyst requirements for ad-hoc analysis
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables for optimized performance (see the Hive sketch after this section)
- Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation, and how they translate to MapReduce jobs
- Worked in tuning Hive and Pig scripts to improve performance
- Good experience in troubleshooting performance issues and tuning Hadoop cluster
- Good working knowledge of Sqoop for performing incremental imports from Oracle to HDFS.
- Good experience in working with compressed files and related formats.
- Developed Oozie workflow for scheduling and orchestrating the ETL process
- Hands on experience with Cassandra and its architecture
- Performed data analysis on HBase data using Hive external tables mapped to HBase
- Very good understanding of Single Point Of Failure (SPOF) of Hadoop Daemons and recovery procedures
- Worked with the infrastructure and admin team in designing, modeling, sizing and configuring Hadoop cluster of 60 nodes
- Currently planning a migration from CDH3 to CDH4
- Good understanding of Impala
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, Cassandra, Java, Oracle 10g, MySQL, Ubuntu.
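Below is a minimal Java sketch of the kind of Hive DDL behind the managed/external table and partitioning work above, submitted over Hive JDBC. It assumes a HiveServer2 endpoint; the connection URL, table name, columns, and HDFS paths are illustrative, not the actual production schema.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch: create a partitioned external Hive table over HDFS data and
// query a single partition. All identifiers and paths are assumptions.
public class HiveExternalTableSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {

            // External table: Hive manages only metadata, data stays in HDFS
            stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS web_events ("
                    + " user_id STRING, url STRING, latency_ms INT)"
                    + " PARTITIONED BY (event_date STRING)"
                    + " ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'"
                    + " LOCATION '/data/web_events'");

            // Register one day's partition (e.g. output of the nightly batch)
            stmt.execute("ALTER TABLE web_events ADD IF NOT EXISTS"
                    + " PARTITION (event_date='2013-06-01')"
                    + " LOCATION '/data/web_events/2013-06-01'");

            // Partition pruning limits the scan to a single day of data
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT url, AVG(latency_ms) FROM web_events"
                    + " WHERE event_date='2013-06-01' GROUP BY url")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getDouble(2));
                }
            }
        }
    }
}
```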
Confidential, Danville, IL
Hadoop Developer
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing
- Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration (see the MapReduce sketch after this section)
- Worked on moving all log files generated from various sources to HDFS for further processing
- Developed workflows using custom MapReduce, Pig, Hive and Sqoop
- Developed a predictive analytics product using SQL/HiveQL, JavaScript, and Highcharts.
- Developed data pipeline programs with Spark Scala APIs, performed data aggregations with Hive, and formatted data (JSON) for visualization and for generating Highcharts such as outlier, data distribution, and correlation/comparison charts
- Created various views over HBase tables and leveraged Hive on top of HBase for query performance.
- Developed an Apache Storm and HDFS integration project for real-time data analysis.
- Designed and developed Apache Storm topologies for inbound and outbound data, performing real-time ETL to identify the latest trends and keywords.
- Developed MapReduce programs for parsing information and loading it into HDFS
- Built reusable Hive UDF libraries for business requirements, enabling users to apply these UDFs in Hive queries.
- Wrote a Hive UDF to sort struct fields and return a complex data type (see the UDF sketch after this section)
- Responsible for loading data from UNIX file system to HDFS
- Developed a suite of unit test cases for Mapper, Reducer, and Driver classes using the MRUnit testing library
- Designed and developed a distributed processing system running to process binary files in parallel and crunch the analysis metrics into a Data Warehousing platform for reporting.
- Developed workflow in Control M to automate tasks of loading data into HDFS and preprocessing with PIG.
- Cluster co-ordination services through Zookeeper
- Used Maven extensively for building jar files of MapReduce programs and deployed to Cluster
- Modelled Hive partitions extensively for data separation and faster data processing and followed Pig and Hive best practices for tuning
Environment: Hive QL, MySQL, HBase, HDFS, HIVE, Eclipse, Hadoop, Oracle 11g, PL/SQL, SQL*PLUS, Flume, PIG, Sqoop, Spark, UNIX.
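The following is a hedged sketch of the MapReduce log parsing and MRUnit-style unit testing described above. The tab-separated log layout, field positions, and class names are assumptions made for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;

// Mapper sketch: parse tab-separated log lines and emit (statusCode, 1),
// skipping malformed records. The field layout is an assumption.
public class LogParseMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text status = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\t");
        if (fields.length < 3) {
            return; // skip malformed line
        }
        status.set(fields[2]);
        context.write(status, ONE);
    }

    // MRUnit test (would normally live in its own test source file):
    // feed one well-formed line and assert the expected (key, value) pair.
    public static class LogParseMapperTest {
        @Test
        public void emitsStatusCode() throws IOException {
            MapDriver.newMapDriver(new LogParseMapper())
                    .withInput(new LongWritable(0), new Text("u1\t/home\t200"))
                    .withOutput(new Text("200"), new IntWritable(1))
                    .runTest();
        }
    }
}
```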
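A minimal sketch in the spirit of the Hive UDF work above. It assumes Hive's reflection-based UDF bridge maps array<string> to List<Text>; the actual struct-sorting UDF would more likely be a GenericUDF built with ObjectInspectors, and the function and class names here are made up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Simple Hive UDF sketch: returns a sorted copy of an array<string> column.
// Assumed registration in Hive:
//   ADD JAR sort_udf.jar;
//   CREATE TEMPORARY FUNCTION sort_array_udf AS 'SortArrayUdf';
@Description(name = "sort_array_udf",
             value = "_FUNC_(array<string>) - returns the array sorted ascending")
public class SortArrayUdf extends UDF {
    public List<Text> evaluate(List<Text> input) {
        if (input == null) {
            return null; // preserve NULL semantics
        }
        List<Text> sorted = new ArrayList<Text>(input);
        Collections.sort(sorted);
        return sorted;
    }
}
```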
Confidential, Austin, TX
Hadoop Developer
Responsibilities:
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Responsible for building scalable distributed data solutions using Hadoop.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Developed simple to complex MapReduce jobs using Hive and Pig
- Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team
- Extensively used Pig for data cleansing.
- Created partitioned tables in Hive.
- Managed and reviewed Hadoop log files.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Installed and configured Pig and wrote Pig Latin scripts.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data
- Responsible for managing data coming from different sources
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required
- Involved in writing, testing, and running MapReduce pipelines using Apache Crunch (see the Crunch sketch after this section)
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Java, SQL, Sqoop, Java (jdk 1.6), Eclipse
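A short sketch of an Apache Crunch pipeline of the kind referenced above: read raw log lines, keep only error records, and write them back to HDFS. The input/output paths and the filtering rule are illustrative assumptions.

```java
import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;
import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.types.writable.Writables;
import org.apache.hadoop.conf.Configuration;

// Crunch pipeline sketch: filter raw log lines down to error records.
// Paths and the "ERROR" marker are assumptions.
public class ErrorLogPipeline {
    public static void main(String[] args) {
        Pipeline pipeline = new MRPipeline(ErrorLogPipeline.class, new Configuration());

        PCollection<String> lines = pipeline.readTextFile("/data/raw/logs");

        PCollection<String> errors = lines.parallelDo(
            new DoFn<String, String>() {
                @Override
                public void process(String line, Emitter<String> emitter) {
                    if (line.contains("ERROR")) {
                        emitter.emit(line);
                    }
                }
            }, Writables.strings());

        pipeline.writeTextFile(errors, "/data/out/errors");
        pipeline.done(); // triggers the underlying MapReduce job(s)
    }
}
```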
Confidential, Omaha, NE
ETL QA Tester
Responsibilities:
- Tested ETL jobs as per business rules using ETL design document
- Promoted Unix/DataStage application releases from development to QA and to UAT environments
- Assisted in creating fact and dimension table implementation in Star Schema model based on requirements.
- Expert in writing complex SQL/PLSQL scripts for querying Teradata and Oracle.
- Defined data requirements and elements used in XML transactions.
- Tested the database schema with the help of data architects using Erwin
- Involved in the testing of the Data Mart using PowerCenter
- Identified and documented additional data cleansing needs and consistent error patterns that could be averted by modifying the ETL code.
- Extensively used Teradata load utilities FastLoad, Multiload and FastExport to extract, transform and load the Teradata data warehouse
- Responsible for different Data mapping activities from Source systems to Teradata.
- Queried Teradata Database and validated the data using SQL Assistant.
- Tested the messages published by DataStage and the data loaded into various databases during the extraction, transformation, and loading process
- Used import and export facilities of the application to download/upload XMLs of failed test cases so as to re-verify.
- Wrote UNIX scripts to perform certain tasks and assisted developers with problems and SQL optimization.
- Configured QuickTest Pro with Quality Center and maintained the project information in Quality Center.
- Extensively used Autosys to automate job scheduling on daily, bi-weekly, weekly, and monthly bases with proper dependencies.
- Wrote complex SQL queries using joins, sub queries and correlated sub queries
- Performed Unit testing and System Integration testing by developing and documenting test cases in Quality Center.
- Designed and developed UNIX shell scripts as part of the ETL process to automate loading and pulling of data.
- Tested several complex reports generated by MicroStrategy, including dashboards, summary reports, master-detail, drill-down, and scorecards
- Involved in testing the MicroStrategy reports by writing complex SQL queries
- Involved in extensive data validation using SQL queries and back-end testing (see the JDBC sketch after this section)
- Tested complex objects added to the universe to enhance report functionality.
- Responsible for migrating the code changes from development environment to SIT, UAT and Production environments.
- Validated cube and query data from the reporting system back to the source system.
Environment: DataStage, Flat files, Perl, Erwin 4.0, DTS, MS SQL Server 2008, Oracle 10g, SQL, PL/SQL, IBM DB2 8.0, Agile, Teradata V2R6, Teradata SQL Assistant, MicroStrategy, COBOL, HP QTP 9.0, HP Quality Center 10, Autosys, Toad, Unix Shell Scripting, Windows XP/2000
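As a hypothetical illustration of the source-to-target validation above, the Java JDBC sketch below compares row counts between an Oracle source table and the Teradata target loaded by the ETL job. Driver classes, URLs, credentials, and table names are assumptions; in practice the checks were run as SQL through Teradata SQL Assistant and tracked in Quality Center.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch: reconcile row counts between an Oracle source table and the
// Teradata target table. All identifiers and credentials are assumptions.
public class EtlRowCountCheck {
    public static void main(String[] args) throws Exception {
        long sourceCount = count("oracle.jdbc.OracleDriver",
                "jdbc:oracle:thin:@srchost:1521:ORCL", "etl_qa", "secret",
                "SELECT COUNT(*) FROM sales_src");
        long targetCount = count("com.teradata.jdbc.TeraDriver",
                "jdbc:teradata://tdhost/DATABASE=edw", "etl_qa", "secret",
                "SELECT COUNT(*) FROM sales_fact");

        if (sourceCount == targetCount) {
            System.out.println("PASS: counts match (" + sourceCount + ")");
        } else {
            System.out.println("FAIL: source=" + sourceCount + " target=" + targetCount);
        }
    }

    private static long count(String driver, String url, String user,
                              String password, String sql) throws Exception {
        Class.forName(driver);
        try (Connection conn = DriverManager.getConnection(url, user, password);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            rs.next();
            return rs.getLong(1);
        }
    }
}
```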
Confidential
Java/J2EE Developer
Responsibilities:
- Involved in analysis and design of the application.
- Involved in preparing the detailed design document for the project.
- Developed the application using J2EE architecture.
- Involved in developing JSP forms.
- Designed and developed web pages using HTML and JSP.
- Designed various applets using JBuilder.
- Designed and developed Servlets to communicate between presentation and business layer.
- Used EJB as a middleware in developing a three-tier distributed application.
- Developed session beans and entity beans for business and data processing.
- Used JMS in the project for sending and receiving the messages on the queue.
- Developed the Servlets for processing the data on the server.
- The processed data was transferred to the database through entity beans.
- Used JDBC for database connectivity with MySQL Server (see the servlet sketch after this section).
- Used CVS for version control.
- Involved in unit testing using JUnit.
Environment: Core Java, J2EE, JSP, Servlets, XML, XSLT, EJB, JDBC, JBuilder 8.0, JBoss, Swing, JavaScript, JMS, HTML, CSS, MySQL Server, CVS, Windows 2000
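A compact sketch of the servlet-to-database flow described above; in the real application persistence went through entity beans rather than direct JDBC, and the table, columns, and connection details here are assumptions.

```java
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Servlet sketch: accept a form POST, persist it via JDBC, and forward to a JSP.
// The orders table and MySQL connection details are assumptions.
public class OrderServlet extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String customer = request.getParameter("customer");
        String item = request.getParameter("item");
        try {
            Class.forName("com.mysql.jdbc.Driver");
            Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/shop", "appuser", "secret");
            try {
                PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO orders (customer, item) VALUES (?, ?)");
                ps.setString(1, customer);
                ps.setString(2, item);
                ps.executeUpdate();
                ps.close();
            } finally {
                conn.close();
            }
        } catch (Exception e) {
            throw new ServletException("Order could not be saved", e);
        }
        // Hand the result back to the presentation layer
        request.getRequestDispatcher("/orderConfirmation.jsp").forward(request, response);
    }
}
```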