Hadoop Developer Resume

MN

SUMMARY

  • Eight years of work experience in the IT industry, with over four years of professional experience in Big Data Hadoop (Cloudera distributions CDH3, CDH4 and CDH5) on clusters of 600 nodes
  • Extensive experience in both MapReduce MRv1 and MapReduce MRv2 (YARN)
  • Extensive experience in testing, debugging and deploying MapReduce applications on Hadoop platforms
  • Extensive experience in working with HDFS, Pig, Hive, Sqoop, Flume, Oozie, ZooKeeper and Cassandra
  • Experience with Cloudera CDH3, CDH4 and CDH5 distributions
  • Extensive experience with ETL and Big data query tools like Pig Latin and Hive QL
  • Experience with Sequence files, AVRO and HAR file formats and compression
  • Expertise in installing, designing, sizing, configuring, provisioning and upgrading Hadoop environments
  • Experience in tuning and troubleshooting performance issues in Hadoop cluster with size of data over 120 TB
  • Experience in monitoring, performance tuning, SLAs, scaling and security in Big Data systems.
  • Strong experience in working with Elastic MapReduce and setting up environments on Amazon AWS EC2 instances
  • Hands-on NoSQL database experience with HBase, MongoDB and Cassandra
  • Good working knowledge of the Eclipse IDE for developing and debugging Java applications
  • Expertise in creating UI using JSP, HTML, XML and JavaScript.
  • Good experience in using databases - MongoDB, SQL Server, Stored Procedures, Constraints and Triggers.
  • Created the data maps from database to dimension and fact tables.
  • Carried out the QA deployments and worked on the process flow diagram.
  • Created dimension and fact jobs and scheduled job runs.
  • Well experienced in using networking tools like PuTTY and WinSCP
  • Extensive experience in documenting requirements, functional specifications, technical specifications
  • Highly motivated, adaptive and quick learner
  • Exhibited excellent communication and leadership capabilities
  • Excellent Analytical, Problem solving and technical skills
  • Strong ability to handle multiple priorities and workloads, and to understand and adapt to new technologies and environments quickly
  • Open to relocation

TECHNICAL SKILLS

Big Data / Hadoop Framework: HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, ZooKeeper, Flume, HBase, Amazon AWS (EMR)

Databases: MongoDB, Microsoft SQL Server, MySQL, Oracle, Cassandra

Languages: C, C++, Java, Python, SQL, T-SQL, Pig Latin, HiveQL

Web Technologies: JSP, JavaBeans, JDBC, XML

Operating Systems: Windows, Unix and Linux

Front-End: HTML/HTML5, CSS3, JavaScript/jQuery

Development Tools: Microsoft SQL Studio, Toad, Eclipse, NetBeans, MySQL Workbench, Tableau

Reporting Tools: SSRS, Excel

Office Tools: Microsoft Office Suite

Development Methodologies: Agile/Scrum, Waterfall

PROFESSIONAL EXPERIENCE

Confidential, MN

Hadoop Developer

Responsibilities:

  • Worked on a live Big Data Hadoop production environment with 600 nodes
  • Worked with highly unstructured and semi structured data of 40 TB in size
  • Designed and developed Pig ETL scripts to process data in a Nightly batch
  • Created Pig macros to improve reusability and modularize the code
  • Developed Hive scripts for end user / analyst requirements for ad-hoc analysis
  • Very good understanding of Partitions, Bucketing concepts in Hive and designed both Managed and External tables in Hive for optimized performance
  • Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping and aggregation and how they translate to MapReduce jobs
  • Worked in tuning Hive and Pig scripts to improve performance
  • Good experience in writing MapReduce programs in Java in an MRv2 / YARN environment; a representative driver sketch follows this list
  • Good experience in troubleshooting performance issues and tuning Hadoop cluster
  • Good working knowledge of using Sqoop to perform incremental imports from Oracle to HDFS.
  • Experience in using Sequence files, AVRO and HAR file formats.
  • Good experience in working with compressed files and related formats.
  • Developed Oozie workflows for scheduling and orchestrating the ETL process
  • Hands-on experience with Cassandra and its architecture
  • Performed data analysis on HBase data using Hive external tables mapped to HBase
  • Very good understanding of Single Point Of Failure (SPOF) of Hadoop Daemons and recovery procedures
  • Experience in setting up Cloudera CDH3 / CDH4 nodes on Amazon EC2
  • Worked with the infrastructure and admin teams in designing, modeling, sizing and configuring a Hadoop cluster of 60 nodes
  • Currently planning a migration from CDH4 to CDH5
  • Good understanding of Impala
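
For illustration only, a minimal sketch of the kind of Java MapReduce job written for an MRv2 / YARN environment, as referenced in the list above. It is a generic word-count example; the class names and paths are assumptions, not the actual production batch code.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Illustrative MRv2 word-count job; not the production nightly batch.
    public class WordCountSketch {

        public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer tokens = new StringTokenizer(value.toString());
                while (tokens.hasMoreTokens()) {
                    word.set(tokens.nextToken());
                    context.write(word, ONE);           // emit (token, 1)
                }
            }
        }

        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();                      // total per token
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word-count-sketch");
            job.setJarByClass(WordCountSketch.class);
            job.setMapperClass(TokenMapper.class);
            job.setCombinerClass(SumReducer.class);      // combiner is safe for sums
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }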

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, Cassandra, Java, Oracle 10g, MySQL, Ubuntu, AWS

Confidential - Windsor, CT

Hadoop Developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing
  • Wrote MapReduce code to process and parse the data from various sources and store the parsed data into HBase and Hive using HBase-Hive integration
  • Worked on moving all log files generated from various sources to HDFS for further processing
  • Developed workflows using custom MapReduce, Pig, Hive and Sqoop
  • Developed a predictive analytics product using Apache Spark, SQL/HiveQL, JavaScript and Highcharts.
  • Wrote Spark programs to load, parse, refine and store sensor data into Hadoop, and to process, analyze and aggregate data for visualizations.
  • Developed data pipeline programs with Spark Scala APIs and data aggregations with Hive, and formatted data (JSON) for visualization in Highcharts, e.g. outlier, data distribution and correlation/comparison charts
  • Created various views for HBase tables and leveraged Hive on top of HBase for query performance.
  • Developed the Apache Storm, Kafka and HDFS integration project for real-time data analysis.
  • Designed and developed Apache Storm topologies for inbound and outbound data for real-time ETL to find the latest trends and keywords.
  • Developed MapReduce programs for parsing information and loading it into HDFS
  • Built reusable Hive UDF libraries for business requirements, which enabled users to use these UDFs in Hive queries.
  • Wrote a Hive UDF to sort struct fields and return a complex data type; a representative UDF sketch follows this list
  • Responsible for loading data from UNIX file system to HDFS
  • Developed a suite of unit test cases for Mapper, Reducer and Driver classes using the MRUnit testing library
  • Designed and developed a distributed processing system to process binary files in parallel and crunch the analysis metrics into a data warehousing platform for reporting.
  • Developed workflows in Control-M to automate tasks of loading data into HDFS and preprocessing with Pig.
  • Provided cluster coordination services through ZooKeeper
  • Used Maven extensively for building jar files of MapReduce programs and deployed to Cluster
  • Modelled Hive partitions extensively for data separation and faster data processing and followed Pig and Hive best practices for tuning
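
For illustration only, a minimal sketch of a Java Hive UDF of the kind described in the list above. The class name and the trivial text-normalization logic are assumptions, not the proprietary struct-sorting UDF.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Illustrative Hive UDF; Hive resolves evaluate() by reflection.
    public final class NormalizeText extends UDF {
        public Text evaluate(final Text input) {
            if (input == null) {
                return null;                             // null-safe
            }
            return new Text(input.toString().trim().toLowerCase());
        }
    }

Such a UDF would typically be packaged into a jar, added to the session with ADD JAR and registered with CREATE TEMPORARY FUNCTION before being used in Hive queries.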

Environment: HiveQL, MySQL, HBase, HDFS, Hive, Eclipse (Kepler), Hadoop, Oracle 11g, PL/SQL, SQL*Plus, Toad 9.6, Flume, Pig, Sqoop, Spark, UNIX, Cosmos.

Confidential - Austin, TX

Hadoop Developer

Responsibilities:

  • Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Developed Simple to complex MapReduce Jobs using Hive and Pig
  • Optimized Map Reduce Jobs to use HDFS efficiently by using various compression mechanisms
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted the data from MySQL into HDFS using Sqoop
  • Worked with the Apache Crunch library to write, test and run Hadoop MapReduce pipeline jobs; a pipeline sketch follows this list
  • Involved in joining and data aggregation using Apache Crunch
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team
  • Extensively used Pig for data cleansing.
  • Created partitioned tables in Hive.
  • Managed and reviewed Hadoop log files.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Installed and configured Pig and wrote Pig Latin scripts.
  • Developed Pig Latin scripts to extract the data from the web server output files and load it into HDFS.
  • Load and transform large sets of structured, semi structured and unstructured data
  • Responsible to manage data coming from different sources
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required
  • Involved in writing, testing, and running MapReduce pipelines using Apache Crunch
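
For illustration only, a minimal sketch of an Apache Crunch MapReduce pipeline in Java, as referenced in the list above. The tab-delimited input, key extraction and paths are assumptions, not the actual pipeline that was built.

    import org.apache.crunch.MapFn;
    import org.apache.crunch.PCollection;
    import org.apache.crunch.PTable;
    import org.apache.crunch.Pipeline;
    import org.apache.crunch.impl.mr.MRPipeline;
    import org.apache.crunch.types.writable.Writables;

    // Illustrative Crunch pipeline: counts records per key in a text input.
    public class CrunchCountSketch {
        public static void main(String[] args) throws Exception {
            Pipeline pipeline = new MRPipeline(CrunchCountSketch.class);
            PCollection<String> lines = pipeline.readTextFile(args[0]);

            // Key each line by its first tab-delimited field, then count per key.
            PTable<String, Long> counts = lines
                .parallelDo(new MapFn<String, String>() {
                    @Override
                    public String map(String line) {
                        return line.split("\t", 2)[0];
                    }
                }, Writables.strings())
                .count();

            pipeline.writeTextFile(counts, args[1]);
            pipeline.done();                             // run the MapReduce job(s)
        }
    }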

Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Java (JDK 1.6), SQL, Apache Crunch, Sqoop, Eclipse

Confidential, Township of Warren, NJ

ETL QA Tester

Responsibilities:

  • Tested ETL jobs as per business rules using ETL design document
  • Promoted Unix/Data Stage application releases from development to QA and to UAT environments
  • Assisted in creating fact and dimension table implementation in Star Schema model based on requirements.
  • Expert in writing complex SQL/PL-SQL scripts for querying Teradata and Oracle.
  • Defined data requirements and elements used in XML transactions.
  • Tested the database schema with the help of data architects using Erwin
  • Involved in the testing of the Data Mart using PowerCenter
  • Identified and documented additional data cleansing needs and consistent error patterns that could be avoided by modifying the ETL code.
  • Extensively used the Teradata load utilities FastLoad, MultiLoad and FastExport to extract, transform and load the Teradata data warehouse
  • Responsible for different Data mapping activities from Source systems to Teradata.
  • Queried the Teradata database and validated the data using SQL Assistant.
  • Tested the messages published by DataStage and the data loaded into various databases for the extraction, transformation and loading process
  • Used the import and export facilities of the application to download/upload XMLs of failed test cases so as to re-verify them.
  • Wrote UNIX scripts to perform certain tasks and assisted developers with problems and SQL optimization.
  • Configured QuickTest Pro with Quality Center and maintained the project information in Quality Center.
  • Extensively used Autosys for automation of scheduling jobs on a daily, bi-weekly, weekly and monthly basis with proper dependencies.
  • Wrote complex SQL queries using joins, sub queries and correlated sub queries
  • Performed Unit testing and System Integration testing by developing and documenting test cases in Quality Center.
  • Designed and developed UNIX shell scripts as part of the ETL process to automate the loading and pulling of data.
  • Tested several complex reports generated by MicroStrategy including Dashboards, Summary Reports, Master Detail, Drill Down and Scorecards
  • Involved in testing the MicroStrategy reports by writing complex SQL queries
  • Involved in extensive data validation using SQL queries and back-end testing
  • Tested complex objects added to the universe to enhance report functionality.
  • Responsible for migrating the code changes from the development environment to SIT, UAT and Production environments.
  • Validated cube and query data from the reporting system back to the source system.

Environment: Data stage, Flat files, Perl, Erwin 4.0, DTS, MS SQL Server 2008, Oracle 10g, SQL, PL/SQL, IBM DB2 8.0, AGILE, Teradata V2R6, Teradata SQL Assistant, Micro strategy, COBOL, HP QTP 9.0, HP Quality Center 10, Autosys, Toad, Unix Shell Scripting, Windows XP/2000

Confidential

Java Developer

Responsibilities:

  • Involved in the analysis, design, development and testing phases of the Software Development Life Cycle (SDLC).
  • Designed and developed framework components; involved in designing the MVC pattern using the Struts and Spring frameworks.
  • Responsible for developing Use Case, Class and Sequence diagrams for the modules using UML and Rational Rose.
  • Developed the Action classes and ActionForm classes, created JSPs using Struts tag libraries and configured them in the struts-config.xml and web.xml files.
  • Involved in deploying and configuring applications in WebLogic Server.
  • Used SOAP for exchanging XML based messages.
  • Used Microsoft VISIO for developing Use Case Diagrams, Sequence Diagrams and Class Diagrams in teh design phase.
  • Developed custom tags to simplify the JSP code. Designed UI screens using JSP and HTML.
  • Actively involved in designing and implementing Factory method, Singleton, MVC and Data Access Object design patterns.
  • Used web services for sending and receiving data from different applications via SOAP messages, then used the DOM XML parser for data retrieval.
  • Used the JUnit framework for unit testing of the application and ANT to build and deploy the application on WebLogic Server.

Environment: Java, J2EE, JSP, Oracle, VSAM, Eclipse, HTML, MVC, ANT, WebLogic.
