Hadoop Developer Resume
MN
SUMMARY
- Eight years of impeccable work experience in the IT industry, with over four years of professional experience in Big Data / Hadoop (Cloudera distributions CDH3, CDH4 and CDH5) on clusters of 600 nodes
- Extensive experience in both MapReduce MRv1 and MapReduce MRv2 (YARN)
- Extensive experience in testing, debugging and deploying MapReduce applications on Hadoop platforms
- Extensive experience in working with HDFS, Pig, Hive, Sqoop, Flume, Oozie, Zookeeper and Cassandra
- Experience with Cloudera CDH3, CDH4 and CDH5 distributions
- Extensive experience with ETL and Big Data query languages such as Pig Latin and HiveQL
- Experience with SequenceFile, Avro and HAR file formats and compression
- Expertise in installing, designing, sizing, configuring, provisioning and upgrading Hadoop environments
- Experience in tuning and troubleshooting performance issues in Hadoop clusters with data volumes of over 120 TB
- Experience in monitoring, performance tuning, SLAs, scaling and security in Big Data systems
- Strong experience in working with Elastic MapReduce and setting up environments on Amazon AWS EC2 instances
- Hands-on NoSQL database experience with HBase, MongoDB and Cassandra
- Good working knowledge of the Eclipse IDE for developing and debugging Java applications
- Expertise in creating UI using JSP, HTML, XML and JavaScript.
- Good experience in using databases (MongoDB, SQL Server), including stored procedures, constraints and triggers.
- Created data maps from source databases to dimension and fact tables.
- Carried out QA deployments and worked on process flow diagrams.
- Created dimension and fact jobs and scheduled job runs.
- Well experienced in using networking tools such as PuTTY and WinSCP
- Extensive experience in documenting requirements, functional specifications and technical specifications
- Highly motivated, adaptive and a quick learner
- Exhibited excellent communication and leadership capabilities
- Excellent Analytical, Problem solving and technical skills
- Strong ability to handle multiple priorities and workloads, with the ability to quickly understand and adapt to new technologies and environments
- Open to relocation
TECHNICAL SKILLS
Big Data / Hadoop Framework: HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, Zookeeper, Flume, HBase, Amazon AWS (EMR)
Databases: MongoDB, Microsoft SQL Server, MySQL, Oracle, Cassandra
Languages: C, C++, Java, Python, SQL, T-SQL, Pig Latin, HiveQL
Web Technologies: JSP, JavaBeans, JDBC, XML
Operating Systems: Windows, Unix and Linux
Front-End: HTML/HTML5, CSS3, JavaScript/jQuery
Development Tools: Microsoft SQL Studio, Toad, Eclipse, NetBeans, MySQL Workbench, Tableau
Reporting Tools: SSRS, Excel
Office Tools: Microsoft Office Suite
Development Methodologies: Agile/Scrum, Waterfall
PROFESSIONAL EXPERIENCE
Confidential, MN
Hadoop Developer
Responsibilities:
- Worked on a live Big Data Hadoop production environment with 600 nodes
- Worked with highly unstructured and semi-structured data of 40 TB in size
- Designed and developed Pig ETL scripts to process data in a Nightly batch
- Created Pig macros to improve code reusability and modularity
- Developed Hive scripts for end user / analyst requirements for ad-hoc analysis
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables for optimized performance
- Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping and aggregation and how they translate to MapReduce jobs
- Worked in tuning Hive and Pig scripts to improve performance
- Good experience in writing MapReduce programs in Java in an MRv2 / YARN environment (see the sketch after this list)
- Good experience in troubleshooting performance issues and tuning Hadoop cluster
- Good working knowledge of using Sqoop to perform incremental imports from Oracle to HDFS.
- Experience in using SequenceFile, Avro and HAR file formats.
- Good experience in working with compressed files and related formats.
- Developed Oozie workflows for scheduling and orchestrating the ETL process
- Hands-on experience with Cassandra and its architecture
- Performed data analysis on HBase data using Hive external tables mapped to HBase
- Very good understanding of Single Point Of Failure (SPOF) of Hadoop Daemons and recovery procedures
- Experience in setting up Cloudera CDH3 / CDH4 nodes on Amazon EC2
- Worked with the infrastructure and admin teams in designing, modeling, sizing and configuring a Hadoop cluster of 60 nodes
- Currently planning a migration from CDH4 to CDH5
- Good understanding of Impala
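The Java MapReduce bullet above references the following sketch: a minimal, self-contained example of the style of MapReduce job written against the MRv2 / YARN (org.apache.hadoop.mapreduce) API. It is illustrative only; the class name RecordCountJob, the tab-delimited key layout and the command-line paths are hypothetical placeholders, not code from the production project.

// Counts records per key taken from the first column of tab-delimited input,
// using the MRv2 (org.apache.hadoop.mapreduce) API.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RecordCountJob {

    public static class RecordMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text outKey = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split("\t");
            if (fields.length > 0 && !fields[0].isEmpty()) {
                outKey.set(fields[0]);          // first column is the grouping key
                context.write(outKey, ONE);
            }
        }
    }

    public static class SumReducer
            extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> counts, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable c : counts) {
                sum += c.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "record count");
        job.setJarByClass(RecordCountJob.class);
        job.setMapperClass(RecordMapper.class);
        job.setCombinerClass(SumReducer.class);   // reducer is associative, so it doubles as a combiner
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}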
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, Cassandra, Java, Oracle 10g, MySQL, Ubuntu, AWS
Confidential - Windsor, CT
Hadoop Developer
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing
- Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration
- Worked on moving all log files generated from various sources to HDFS for further processing
- Developed workflows using custom MapReduce, Pig, Hive and Sqoop
- Developed a predictive analytics product using Apache Spark, SQL/HiveQL, JavaScript and Highcharts.
- Wrote Spark programs to load, parse, refine and store sensor data in Hadoop, and to process, analyze and aggregate the data for visualizations.
- Developed data pipeline programs with the Spark Scala APIs and data aggregations with Hive, and formatted data (JSON) for visualization, generating Highcharts such as outlier, data distribution and correlation/comparison charts.
- Created various views over HBase tables and leveraged the performance of Hive on top of HBase.
- Developed an Apache Storm, Kafka and HDFS integration project for real-time data analysis.
- Designed and developed Apache Storm topologies for inbound and outbound data for real-time ETL to find the latest trends and keywords.
- Developed MapReduce programs for parsing data and loading it into HDFS
- Built reusable Hive UDF libraries for business requirements, enabling users to apply these UDFs in Hive queries.
- Wrote Hive UDFs to sort struct fields and return complex data types (see the UDF sketch after this list)
- Responsible for loading data from UNIX file system to HDFS
- Developed a suite of unit test cases for the Mapper, Reducer and Driver classes using an MR unit-testing library (see the test sketch after this list)
- Designed and developed a distributed processing system that processes binary files in parallel and loads the resulting analysis metrics into a data warehousing platform for reporting.
- Developed workflows in Control-M to automate loading data into HDFS and preprocessing it with Pig.
- Provided cluster coordination services through Zookeeper
- Used Maven extensively to build JAR files of MapReduce programs and deploy them to the cluster
- Modelled Hive partitions extensively for data separation and faster data processing and followed Pig and Hive best practices for tuning
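The Hive UDF bullet above references the following sketch. The actual UDFs sorted struct fields; this simplified, hedged variant sorts an array<string> and returns a complex (array) type, just to show the general shape of a Hive GenericUDF. The class name SortStringArrayUDF and the function name are hypothetical. Once packaged into a JAR and registered with CREATE TEMPORARY FUNCTION, such a function can be called from HiveQL like any built-in.

// Hedged GenericUDF sketch: sorts an array<string> argument and returns the
// sorted array (a complex type). Simplified stand-in for the struct-sorting UDFs.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

@Description(name = "sort_string_array", value = "_FUNC_(array<string>) - returns the array sorted")
public class SortStringArrayUDF extends GenericUDF {

    private ListObjectInspector listOI;

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        if (arguments.length != 1 || !(arguments[0] instanceof ListObjectInspector)) {
            throw new UDFArgumentException("sort_string_array expects a single array argument");
        }
        listOI = (ListObjectInspector) arguments[0];
        // Return type: a standard list of Java strings (a complex type).
        return ObjectInspectorFactory.getStandardListObjectInspector(
                PrimitiveObjectInspectorFactory.javaStringObjectInspector);
    }

    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        Object listObj = arguments[0].get();
        if (listObj == null) {
            return null;
        }
        int size = listOI.getListLength(listObj);
        List<String> result = new ArrayList<String>(size);
        for (int i = 0; i < size; i++) {
            Object element = listOI.getListElement(listObj, i);
            if (element != null) {
                result.add(element.toString());   // drop nulls so natural-order sort is safe
            }
        }
        Collections.sort(result);
        return result;
    }

    @Override
    public String getDisplayString(String[] children) {
        return "sort_string_array(" + children[0] + ")";
    }
}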
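The unit-testing bullet above references the following sketch, assuming the MRUnit library as the "MR testing library". It reuses the hypothetical RecordCountJob mapper and reducer from the earlier MapReduce sketch, so the classes, inputs and expected outputs are illustrative rather than the project's actual tests.

// MRUnit test sketch for a Mapper/Reducer pair (JUnit 4 style).
import java.util.Arrays;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Before;
import org.junit.Test;

public class RecordCountJobTest {

    private MapDriver<LongWritable, Text, Text, LongWritable> mapDriver;
    private ReduceDriver<Text, LongWritable, Text, LongWritable> reduceDriver;

    @Before
    public void setUp() {
        mapDriver = MapDriver.newMapDriver(new RecordCountJob.RecordMapper());
        reduceDriver = ReduceDriver.newReduceDriver(new RecordCountJob.SumReducer());
    }

    @Test
    public void mapperEmitsFirstColumnAsKey() throws Exception {
        mapDriver.withInput(new LongWritable(0), new Text("userA\tclick\t2015-01-01"))
                 .withOutput(new Text("userA"), new LongWritable(1))
                 .runTest();
    }

    @Test
    public void reducerSumsCountsPerKey() throws Exception {
        reduceDriver.withInput(new Text("userA"),
                               Arrays.asList(new LongWritable(1), new LongWritable(2)))
                    .withOutput(new Text("userA"), new LongWritable(3))
                    .runTest();
    }
}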
Environment: Hive QL, MySQL, HBase, HDFS, HIVE, Eclipse (Kepler), Hadoop, Oracle 11g, PL/SQL, SQL*PLUS, Toad 9.6, Flume, PIG, Sqoop, Spark, UNIX, Cosmos.
Confidential - Austin, TX
Hadoop Developer
Responsibilities:
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Responsible for building scalable distributed data solutions using Hadoop.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Developed simple to complex MapReduce jobs using Hive and Pig
- Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from MySQL into HDFS using Sqoop
- Worked with the Apache Crunch library to write, test and run Hadoop MapReduce pipeline jobs
- Involved in joining and data aggregation using Apache Crunch
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team
- Extensively used Pig for data cleansing.
- Created partitioned tables in Hive.
- Managed and reviewed Hadoop log files.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Installed and configured Pig and wrote Pig Latin scripts.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Load and transform large sets of structured, semi structured and unstructured data
- Responsible for managing data coming from different sources
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required
- Involved in writing, testing and running MapReduce pipelines using Apache Crunch (see the sketch after this list)
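The Apache Crunch bullet above references the following sketch: a minimal Crunch pipeline that reads text from HDFS, filters out malformed records and writes the cleaned data back out. The paths, the five-field validation rule and the class name are hypothetical placeholders, not the project's actual pipeline.

// Minimal Apache Crunch pipeline: read, clean, write.
import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;
import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.types.writable.Writables;
import org.apache.hadoop.conf.Configuration;

public class CleanRecordsPipeline {

    public static void main(String[] args) {
        Pipeline pipeline = new MRPipeline(CleanRecordsPipeline.class, new Configuration());

        PCollection<String> lines = pipeline.readTextFile(args[0]);

        // Keep only lines with the expected number of tab-separated fields.
        PCollection<String> cleaned = lines.parallelDo(new DoFn<String, String>() {
            @Override
            public void process(String line, Emitter<String> emitter) {
                if (line.split("\t").length == 5) {
                    emitter.emit(line);
                }
            }
        }, Writables.strings());

        pipeline.writeTextFile(cleaned, args[1]);
        pipeline.done();   // triggers the underlying MapReduce job(s)
    }
}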
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Java, SQL, Apache Crunch, Sqoop, Java (jdk 1.6), Eclipse
Confidential, Township of Warren, NJ
ETL QA Tester
Responsibilities:
- Tested ETL jobs as per business rules using ETL design document
- Promoted Unix/Data Stage application releases from development to QA and to UAT environments
- Assisted in creating fact and dimension table implementation in Star Schema model based on requirements.
- Wrote complex SQL/PL-SQL scripts to query Teradata and Oracle.
- Defined data requirements and elements used in XML transactions.
- Tested the database schema with the help of data architects using Erwin
- Involved in the testing of the Data Mart using PowerCenter
- Identified and documented additional data cleansing needs and consistent error patterns that could be averted by modifying the ETL code.
- Extensively used the Teradata load utilities FastLoad, MultiLoad and FastExport to extract, transform and load the Teradata data warehouse
- Responsible for different Data mapping activities from Source systems to Teradata.
- Queried the Teradata database and validated the data using SQL Assistant.
- Tested the messages published by DataStage and the data loaded into various databases for the extraction, transformation and loading process
- Used the import and export facilities of the application to download/upload XMLs of failed test cases for re-verification.
- Wrote UNIX scripts to perform certain tasks and assisted developers with problems and SQL optimization.
- Configured QuickTest Pro with Quality Center and maintained the project information in Quality Center.
- Extensively used Autosys for automation and scheduling of jobs on a daily, bi-weekly, weekly and monthly basis with proper dependencies.
- Wrote complex SQL queries using joins, sub queries and correlated sub queries
- Performed Unit testing and System Integration testing by developing and documenting test cases in Quality Center.
- Designed and developed UNIX shell scripts as part of the ETL process to automate loading and pulling the data.
- Tested several complex reports generated by MicroStrategy, including Dashboard, Summary, Master Detail, Drill Down and Scorecard reports
- Involved in testing the MicroStrategy reports by writing complex SQL queries
- Involved in extensive data validation using SQL queries and back-end testing
- Tested complex objects in the universe to enhance the report functionality.
- Responsible for migrating the code changes from the development environment to the SIT, UAT and Production environments.
- Validated cube and query data from the reporting system back to the source system.
Environment: Data stage, Flat files, Perl, Erwin 4.0, DTS, MS SQL Server 2008, Oracle 10g, SQL, PL/SQL, IBM DB2 8.0, AGILE, Teradata V2R6, Teradata SQL Assistant, Micro strategy, COBOL, HP QTP 9.0, HP Quality Center 10, Autosys, Toad, Unix Shell Scripting, Windows XP/2000
Confidential
Java Developer
Responsibilities:
- Involved in the analysis, design, development and testing phases of the Software Development Life Cycle (SDLC).
- Designed and developed framework components, involved in designing MVC pattern using Struts and Spring framework.
- Responsible for developing Use Case, Class and Sequence diagrams for the modules using UML and Rational Rose.
- Developed the Action classes and ActionForm classes, created JSPs using Struts tag libraries and configured them in the struts-config.xml and web.xml files.
- Involved in deploying and configuring applications on WebLogic Server.
- Used SOAP for exchanging XML based messages.
- Used Microsoft Visio for developing Use Case, Sequence and Class diagrams in the design phase.
- Developed custom tags to simplify the JSP code and designed UI screens using JSP and HTML.
- Actively involved in designing and implementing Factory method, Singleton, MVC and Data Access Object design patterns.
- Used web services to send and receive data from different applications via SOAP messages, then used a DOM XML parser for data retrieval (see the sketch after this list).
- Used the JUnit framework for unit testing of the application and Ant to build and deploy the application on WebLogic Server.
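The DOM-parsing bullet above references the following sketch: a minimal example of pulling values out of an XML payload (such as the body of a SOAP response) with the JDK's built-in DOM parser. The element names and sample XML are hypothetical, chosen only to show the pattern.

// Parse an XML string with the JDK DOM API and read child element values.
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ResponseParser {

    public static void main(String[] args) throws Exception {
        String xml = "<response><order><id>42</id><status>SHIPPED</status></order></response>";

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // Walk every <order> element and read its child values.
        NodeList orders = doc.getElementsByTagName("order");
        for (int i = 0; i < orders.getLength(); i++) {
            Element order = (Element) orders.item(i);
            String id = order.getElementsByTagName("id").item(0).getTextContent();
            String status = order.getElementsByTagName("status").item(0).getTextContent();
            System.out.println("order " + id + " -> " + status);
        }
    }
}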
Environment: Java, J2EE, JSP, Oracle, VSAM, Eclipse, HTML, MVC, ANT, WebLogic.