Hadoop Developer Resume
MN
SUMMARY
- Eight years of impeccable work experience in the IT industry, with over four years of professional experience in Big Data / Hadoop (Cloudera distributions CDH3, CDH4 and CDH5) on clusters of 600 nodes
- Extensive experience in both MapReduce MRv1 and MapReduce MRv2 (YARN)
- Extensive experience in testing, debugging and deploying MapReduce applications on Hadoop platforms
- Extensive experience in working with HDFS, Pig, Hive, Sqoop, Flume, Oozie, Zookeeper and Cassandra
- Experience with Cloudera CDH3, CDH4 and CDH5 distributions
- Extensive experience with ETL and Big Data query languages such as Pig Latin and HiveQL
- Experience with SequenceFile, Avro and HAR file formats and compression
- Expertise in installing, designing, sizing, configuring, provisioning and upgrading Hadoop environments
- Experience in tuning and troubleshooting performance issues in Hadoop clusters with data volumes of over 120 TB
- Experience in monitoring, performance tuning, SLAs, scaling and security in Big Data systems
- Strong experience in working with Elastic MapReduce and setting up environments on Amazon AWS EC2 instances
- Hands-on NoSQL database experience with HBase, MongoDB and Cassandra
- Good working knowledge of the Eclipse IDE for developing and debugging Java applications
- Expertise in creating UI using JSP, HTML, XML and JavaScript.
- Good experience in using databases (MongoDB, SQL Server), including stored procedures, constraints and triggers.
- Created data maps from source databases to dimension and fact tables.
- Carried out QA deployments and worked on process flow diagrams.
- Created dimension and fact jobs and scheduled job runs.
- Well experienced in using networking tools such as PuTTY and WinSCP
- Extensive experience in documenting requirements, functional specifications and technical specifications
- Highly motivated, adaptive and a quick learner
- Exhibited excellent communication and leadership capabilities
- Excellent Analytical, Problem solving and technical skills
- Strong ability to handle multiple priorities and workloads, with the ability to quickly understand and adapt to new technologies and environments
- Open to relocation
TECHNICAL SKILLS
Big Data / Hadoop Framework: HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, Zookeeper, Flume, HBase, Amazon AWS (EMR)
Databases: MongoDB, Microsoft SQL Server, MySQL, Oracle, Cassandra
Languages: C, C++, Java, Python, SQL, T-SQL, Pig Latin, HiveQL
Web Technologies: JSP, JavaBeans, JDBC, XML
Operating Systems: Windows, Unix and Linux
Front-End: HTML/HTML5, CSS3, JavaScript/jQuery
Development Tools: Microsoft SQL Studio, Toad, Eclipse, NetBeans, MySQL Workbench, Tableau
Reporting Tools: SSRS, Excel
Office Tools: Microsoft Office Suite
Development Methodologies: Agile/Scrum, Waterfall
PROFESSIONAL EXPERIENCE
Confidential, MN
Hadoop Developer
Responsibilities:
- Worked on a live Big Data Hadoop production environment with 600 nodes
- Worked with highly unstructured and semi-structured data of 40 TB in size
- Designed and developed Pig ETL scripts to process data in a Nightly batch
- Created Pig macros to improve code reusability and modularity
- Developed Hive scripts for end user / analyst requirements for ad-hoc analysis
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables for optimized performance
- Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping and aggregation and how they translate to MapReduce jobs
- Worked in tuning Hive and Pig scripts to improve performance
- Good experience in writing MapReduce programs in Java in an MRv2 / YARN environment (see the sketch after this list)
- Good experience in troubleshooting performance issues and tuning Hadoop cluster
- Good working knowledge of using Sqoop to perform incremental imports from Oracle to HDFS.
- Experience in using SequenceFile, Avro and HAR file formats.
- Good experience in working with compressed files and related formats.
- Developed Oozie workflows for scheduling and orchestrating the ETL process
- Hands-on experience with Cassandra and its architecture
- Performed data analysis on HBase data using Hive external tables mapped to HBase
- Very good understanding of Single Point Of Failure (SPOF) of Hadoop Daemons and recovery procedures
- Experience in setting up Cloudera CDH3 / CDH4 nodes on Amazon EC2
- Worked with the infrastructure and admin teams in designing, modeling, sizing and configuring a Hadoop cluster of 60 nodes
- Currently planning a migration from CDH4 to CDH5
- Good understanding of Impala
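The Java MapReduce bullet above references the following sketch: a minimal, self-contained example of the style of MapReduce job written against the MRv2 / YARN (org.apache.hadoop.mapreduce) API. It is illustrative only; the class name RecordCountJob, the tab-delimited key layout and the command-line paths are hypothetical placeholders, not code from the production project.

// Counts records per key taken from the first column of tab-delimited input,
// using the MRv2 (org.apache.hadoop.mapreduce) API.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RecordCountJob {

    public static class RecordMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text outKey = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split("\t");
            if (fields.length > 0 && !fields[0].isEmpty()) {
                outKey.set(fields[0]);          // first column is the grouping key
                context.write(outKey, ONE);
            }
        }
    }

    public static class SumReducer
            extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> counts, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable c : counts) {
                sum += c.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "record count");
        job.setJarByClass(RecordCountJob.class);
        job.setMapperClass(RecordMapper.class);
        job.setCombinerClass(SumReducer.class);   // reducer is associative, so it doubles as a combiner
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}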
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, Cassandra, Java, Oracle 10g, MySQL, Ubuntu, AWS
Confidential - Windsor, CT
Hadoop Developer
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing
- Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration
- Worked on moving all log files generated from various sources to HDFS for further processing
- Developed workflows using custom MapReduce, Pig, Hive and Sqoop
- Developed a predictive analytics product using Apache Spark, SQL/HiveQL, JavaScript and Highcharts.
- Wrote Spark programs to load, parse, refine and store sensor data in Hadoop, and to process, analyze and aggregate the data for visualizations.
- Developed data pipeline programs with the Spark Scala APIs and data aggregations with Hive, and formatted data (JSON) for visualization, generating Highcharts such as outlier, data distribution and correlation/comparison charts.
- Created various views over HBase tables and leveraged the performance of Hive on top of HBase.
- Developed an Apache Storm, Kafka and HDFS integration project for real-time data analysis.
- Designed and developed Apache Storm topologies for inbound and outbound data for real-time ETL to find the latest trends and keywords.
- Developed MapReduce programs for parsing data and loading it into HDFS
- Built reusable Hive UDF libraries for business requirements, enabling users to apply these UDFs in Hive queries.
- Wrote Hive UDFs to sort struct fields and return complex data types (see the UDF sketch after this list)
- Responsible for loading data from UNIX file system to HDFS
- Developed a suite of unit test cases for the Mapper, Reducer and Driver classes using an MR unit-testing library (see the test sketch after this list)
- Designed and developed a distributed processing system that processes binary files in parallel and loads the resulting analysis metrics into a data warehousing platform for reporting.
- Developed workflows in Control-M to automate loading data into HDFS and preprocessing it with Pig.
- Provided cluster coordination services through Zookeeper
- Used Maven extensively to build JAR files of MapReduce programs and deploy them to the cluster
- Modelled Hive partitions extensively for data separation and faster data processing and followed Pig and Hive best practices for tuning
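The Hive UDF bullet above references the following sketch. The actual UDFs sorted struct fields; this simplified, hedged variant sorts an array<string> and returns a complex (array) type, just to show the general shape of a Hive GenericUDF. The class name SortStringArrayUDF and the function name are hypothetical. Once packaged into a JAR and registered with CREATE TEMPORARY FUNCTION, such a function can be called from HiveQL like any built-in.

// Hedged GenericUDF sketch: sorts an array<string> argument and returns the
// sorted array (a complex type). Simplified stand-in for the struct-sorting UDFs.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

@Description(name = "sort_string_array", value = "_FUNC_(array<string>) - returns the array sorted")
public class SortStringArrayUDF extends GenericUDF {

    private ListObjectInspector listOI;

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        if (arguments.length != 1 || !(arguments[0] instanceof ListObjectInspector)) {
            throw new UDFArgumentException("sort_string_array expects a single array argument");
        }
        listOI = (ListObjectInspector) arguments[0];
        // Return type: a standard list of Java strings (a complex type).
        return ObjectInspectorFactory.getStandardListObjectInspector(
                PrimitiveObjectInspectorFactory.javaStringObjectInspector);
    }

    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        Object listObj = arguments[0].get();
        if (listObj == null) {
            return null;
        }
        int size = listOI.getListLength(listObj);
        List<String> result = new ArrayList<String>(size);
        for (int i = 0; i < size; i++) {
            Object element = listOI.getListElement(listObj, i);
            if (element != null) {
                result.add(element.toString());   // drop nulls so natural-order sort is safe
            }
        }
        Collections.sort(result);
        return result;
    }

    @Override
    public String getDisplayString(String[] children) {
        return "sort_string_array(" + children[0] + ")";
    }
}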
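The unit-testing bullet above references the following sketch, assuming the MRUnit library as the "MR testing library". It reuses the hypothetical RecordCountJob mapper and reducer from the earlier MapReduce sketch, so the classes, inputs and expected outputs are illustrative rather than the project's actual tests.

// MRUnit test sketch for a Mapper/Reducer pair (JUnit 4 style).
import java.util.Arrays;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Before;
import org.junit.Test;

public class RecordCountJobTest {

    private MapDriver<LongWritable, Text, Text, LongWritable> mapDriver;
    private ReduceDriver<Text, LongWritable, Text, LongWritable> reduceDriver;

    @Before
    public void setUp() {
        mapDriver = MapDriver.newMapDriver(new RecordCountJob.RecordMapper());
        reduceDriver = ReduceDriver.newReduceDriver(new RecordCountJob.SumReducer());
    }

    @Test
    public void mapperEmitsFirstColumnAsKey() throws Exception {
        mapDriver.withInput(new LongWritable(0), new Text("userA\tclick\t2015-01-01"))
                 .withOutput(new Text("userA"), new LongWritable(1))
                 .runTest();
    }

    @Test
    public void reducerSumsCountsPerKey() throws Exception {
        reduceDriver.withInput(new Text("userA"),
                               Arrays.asList(new LongWritable(1), new LongWritable(2)))
                    .withOutput(new Text("userA"), new LongWritable(3))
                    .runTest();
    }
}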
Environment: Hive QL, MySQL, HBase, HDFS, HIVE, Eclipse (Kepler), Hadoop, Oracle 11g, PL/SQL, SQL*PLUS, Toad 9.6, Flume, PIG, Sqoop, Spark, UNIX, Cosmos.
Confidential - Austin, TX
Hadoop Developer
Responsibilities:
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Responsible for building scalable distributed data solutions using Hadoop.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Developed simple to complex MapReduce jobs using Hive and Pig
- Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from MySQL into HDFS using Sqoop
- Worked with the Apache Crunch library to write, test and run Hadoop MapReduce pipeline jobs
- Involved in joining and data aggregation using Apache Crunch
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team
- Extensively used Pig for data cleansing.
- Created partitioned tables in Hive.
- Managed and reviewed Hadoop log files.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Installed and configured Pig and wrote Pig Latin scripts.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Load and transform large sets of structured, semi structured and unstructured data
- Responsible for managing data coming from different sources
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required
- Involved in writing, testing and running MapReduce pipelines using Apache Crunch (see the sketch after this list)
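The Apache Crunch bullet above references the following sketch: a minimal Crunch pipeline that reads text from HDFS, filters out malformed records and writes the cleaned data back out. The paths, the five-field validation rule and the class name are hypothetical placeholders, not the project's actual pipeline.

// Minimal Apache Crunch pipeline: read, clean, write.
import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;
import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.types.writable.Writables;
import org.apache.hadoop.conf.Configuration;

public class CleanRecordsPipeline {

    public static void main(String[] args) {
        Pipeline pipeline = new MRPipeline(CleanRecordsPipeline.class, new Configuration());

        PCollection<String> lines = pipeline.readTextFile(args[0]);

        // Keep only lines with the expected number of tab-separated fields.
        PCollection<String> cleaned = lines.parallelDo(new DoFn<String, String>() {
            @Override
            public void process(String line, Emitter<String> emitter) {
                if (line.split("\t").length == 5) {
                    emitter.emit(line);
                }
            }
        }, Writables.strings());

        pipeline.writeTextFile(cleaned, args[1]);
        pipeline.done();   // triggers the underlying MapReduce job(s)
    }
}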
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Java, SQL, Apache Crunch, Sqoop, Java (jdk 1.6), Eclipse
Confidential, Township of Warren, NJ
ETL QA Tester
Responsibilities:
- Tested ETL jobs as per business rules using ETL design document
- Promoted Unix/Data Stage application releases from development to QA and to UAT environments
- Assisted in creating fact and dimension table implementation in Star Schema model based on requirements.
- Wrote complex SQL/PL-SQL scripts to query Teradata and Oracle.
- Defined data requirements and elements used in XML transactions.
- Tested the database schema with the help of data architects using Erwin
- Involved in the testing of the Data Mart using PowerCenter
- Identified and documented additional data cleansing needs and consistent error patterns that could be averted by modifying the ETL code.
- Extensively used the Teradata load utilities FastLoad, MultiLoad and FastExport to extract, transform and load the Teradata data warehouse
- Responsible for different Data mapping activities from Source systems to Teradata.
- Queried the Teradata database and validated the data using SQL Assistant.
- Tested the messages published by DataStage and the data loaded into various databases for the extraction, transformation and loading process
- Used the import and export facilities of the application to download/upload XMLs of failed test cases for re-verification.
- Wrote UNIX scripts to perform certain tasks and assisted developers with problems and SQL optimization.
- Configured QuickTest Pro with Quality Center and maintained the project information in Quality Center.
- Extensively used Autosys for automation and scheduling of jobs on a daily, bi-weekly, weekly and monthly basis with proper dependencies.
- Wrote complex SQL queries using joins, sub queries and correlated sub queries
- Performed Unit testing and System Integration testing by developing and documenting test cases in Quality Center.
- Designed and developed UNIX shell scripts as part of the ETL process to automate loading and pulling the data.
- Tested several complex reports generated by MicroStrategy, including Dashboard, Summary, Master Detail, Drill Down and Scorecard reports
- Involved in testing the MicroStrategy reports by writing complex SQL queries
- Involved in extensive data validation using SQL queries and back-end testing
- Tested complex objects in the universe to enhance the report functionality.
- Responsible for migrating the code changes from the development environment to the SIT, UAT and Production environments.
- Validated cube and query data from the reporting system back to the source system.
Environment: Data stage, Flat files, Perl, Erwin 4.0, DTS, MS SQL Server 2008, Oracle 10g, SQL, PL/SQL, IBM DB2 8.0, AGILE, Teradata V2R6, Teradata SQL Assistant, Micro strategy, COBOL, HP QTP 9.0, HP Quality Center 10, Autosys, Toad, Unix Shell Scripting, Windows XP/2000
Confidential
Java Developer
Responsibilities:
- Involved in the analysis, design, development and testing phases of the Software Development Life Cycle (SDLC).
- Designed and developed framework components, involved in designing MVC pattern using Struts and Spring framework.
- Responsible for developing Use Case, Class and Sequence diagrams for the modules using UML and Rational Rose.
- Developed the Action classes and ActionForm classes, created JSPs using Struts tag libraries and configured them in the struts-config.xml and web.xml files.
- Involved in deploying and configuring applications on WebLogic Server.
- Used SOAP for exchanging XML based messages.
- Used Microsoft Visio for developing Use Case, Sequence and Class diagrams in the design phase.
- Developed custom tags to simplify the JSP code and designed UI screens using JSP and HTML.
- Actively involved in designing and implementing Factory method, Singleton, MVC and Data Access Object design patterns.
- Used web services to send and receive data from different applications via SOAP messages, then used a DOM XML parser for data retrieval (see the sketch after this list).
- Used the JUnit framework for unit testing of the application and Ant to build and deploy the application on WebLogic Server.
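The DOM-parsing bullet above references the following sketch: a minimal example of pulling values out of an XML payload (such as the body of a SOAP response) with the JDK's built-in DOM parser. The element names and sample XML are hypothetical, chosen only to show the pattern.

// Parse an XML string with the JDK DOM API and read child element values.
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ResponseParser {

    public static void main(String[] args) throws Exception {
        String xml = "<response><order><id>42</id><status>SHIPPED</status></order></response>";

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // Walk every <order> element and read its child values.
        NodeList orders = doc.getElementsByTagName("order");
        for (int i = 0; i < orders.getLength(); i++) {
            Element order = (Element) orders.item(i);
            String id = order.getElementsByTagName("id").item(0).getTextContent();
            String status = order.getElementsByTagName("status").item(0).getTextContent();
            System.out.println("order " + id + " -> " + status);
        }
    }
}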
Environment: Java, J2EE, JSP, Oracle, VSAM, Eclipse, HTML, MVC, ANT, WebLogic.