Sr. Hadoop Developer Resume
San Antonio, TX
SUMMARY:
- Results-driven IT professional with 8+ years of experience in core and enterprise software development using Big Data, Java/J2EE, and open-source technologies.
- 3+ years of hands-on experience with Hadoop ecosystem components such as MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, ZooKeeper, Flume, and Kafka, including their installation and configuration.
- Good experience with AWS, Hortonworks, and Cloudera Hadoop distributions.
- In-depth knowledge of Hadoop architecture and components such as HDFS, JobTracker, NameNode, DataNode, MapReduce, and YARN.
- Experience in writing Hive queries to analyze data in the Hive warehouse using Hive Query Language (HQL).
- Experience in using Talend Big Data components to connect to third-party tools such as Sqoop, MongoDB, and BigQuery for quickly loading, extracting, transforming, and processing large and diverse data sets.
- Experience in extending Hive and Pig core functionality by writing custom UDFs, UDAFs, and UDTFs (a brief UDF sketch follows this summary).
- Good knowledge of Apache NiFi for automating data movement between Hadoop systems.
- Experience with Apache Spark for advanced processing such as text analytics, using its in-memory computing capabilities and Scala.
- Good knowledge of NoSQL databases such as HBase, MongoDB and Cassandra.
- Strong experience in developing Shell Scripts and Python Scripts for system management.
- Experience in developing distributed web and enterprise applications using Java/J2EE technologies (Core Java, JDK 6+).
- Excellent programming skills in Java, C, SQL, and Python.
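A brief sketch of the kind of simple custom Hive UDF referenced above, assuming the classic org.apache.hadoop.hive.ql.exec.UDF API; the class name and masking logic are illustrative only, not taken from any project below.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Illustrative one-argument Hive UDF; Hive calls evaluate() once per row.
public class MaskEmailUDF extends UDF {
    public Text evaluate(Text email) {
        if (email == null) {
            return null;                       // UDFs must be null-safe
        }
        String value = email.toString();
        int at = value.indexOf('@');
        if (at <= 1) {
            return email;                      // nothing sensible to mask
        }
        // Keep the first character and the domain, mask the rest of the local part
        return new Text(value.charAt(0) + "***" + value.substring(at));
    }
}
```

Such a UDF is packaged into a JAR, added to the session with ADD JAR, and registered with CREATE TEMPORARY FUNCTION before it can be called from HQL.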
TECHNICAL SKILLS:
Hadoop Core Services: HDFS, MapReduce, Spark, YARN
Hadoop Distribution: Hortonworks, Cloudera, Apache
NoSQL Databases: HBase, Cassandra, MongoDB
Hadoop Data Services: Hive, Pig, Impala, Sqoop, Flume, Kafka, Storm, Solr
Hadoop Operational Services: Zookeeper, Oozie
Languages & Frameworks: Core Java, Servlets, Hibernate, Spring, Struts, Scala, Python
Databases: Oracle, MySQL, SQL Server
Application Servers: WebLogic, WebSphere, JBoss, Tomcat
Operating Systems: UNIX, Linux, Windows
Development Tools: Microsoft SQL Studio, Eclipse, NetBeans, IntelliJ
PROFESSIONAL EXPERIENCE:
Confidential, San Antonio, TX
Sr. Hadoop Developer
Roles & Responsibilities:
- Analyzed the Hadoop cluster and various big data analytics tools, including Pig, HBase, and Sqoop.
- Responsible for building scalable distributed data solutions using Hadoop.
- Monitored the health of all processes related to HDFS, YARN, Pig, and Hive using Cloudera Manager.
- Reviewed and managed log files.
- Involved in installing Cloudera Manager, Hadoop, ZooKeeper, Hive, Pig, etc.
- Used Oozie to develop automatic workflows of Sqoop and Hive jobs.
- Wrote XML workflow definitions to build Oozie functionality.
- Wrote shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Responsible for analyzing and cleansing raw data by performing Hive/Impala queries and running Pig scripts on data.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
- Implemented a twenty-node CDH5 Hadoop cluster on Ubuntu Linux.
- Upgraded Cloudera Manager from version 5.9 to 5.12.
- Implemented dynamic partitions in the Hive shell.
- Migrated data from MySQL server to Hadoop using Sqoop for processing data.
- Partitioned Hive tables and ran the scripts in parallel to reduce their run time.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive (see the sketch following this list).
- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
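A minimal Java sketch of the Spark-on-YARN over Hive pattern described in the bullets above, assuming Spark 2.x with Hive support; the database, table, and column names are placeholders rather than the project's actual schema.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HiveDynamicPartitionSketch {
    public static void main(String[] args) {
        // Hive-aware Spark session; submitted to YARN with spark-submit --master yarn
        SparkSession spark = SparkSession.builder()
                .appName("hive-dynamic-partition-sketch")
                .enableHiveSupport()
                .getOrCreate();

        // Allow dynamic (non-static) partition values on insert
        spark.sql("SET hive.exec.dynamic.partition = true");
        spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict");

        // Clean and filter the imported data with Spark SQL
        Dataset<Row> cleaned = spark.sql(
                "SELECT id, amount, event_date FROM staging.transactions WHERE id IS NOT NULL");
        cleaned.createOrReplaceTempView("cleaned_transactions");

        // Hive routes each row to its partition based on the event_date value
        spark.sql("INSERT OVERWRITE TABLE warehouse.transactions PARTITION (event_date) "
                + "SELECT id, amount, event_date FROM cleaned_transactions");

        spark.stop();
    }
}
```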
Environment: Hadoop 2.8.0-cdh5.12, MapReduce, HDFS, Pig, Hive, Impala, Oozie, Java, Linux, Spark SQL, Shell Scripting, Python Scripting
Confidential, Princeton, NJ
Sr. Hadoop Developer
Roles & Responsibilities:
- Involved in the complete big data flow of the application, from ingesting upstream data into HDFS to processing and analyzing the data in HDFS.
- Developed a NiFi workflow to pick up multiple files from an FTP location and move them to HDFS on a daily basis.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Integrated Hive and Tableau Desktop reports and published to Tableau Server.
- Involved in developing Impala scripts for ad hoc queries.
- Worked with developer teams on a NiFi workflow to pick up data from a REST API server, the data lake, and an SFTP server and send it to the Kafka broker.
- Developed Hive queries to process the data and generate the data cubes for visualizing.
- Effectively used Oozie to develop automatic workflows of Sqoop, MapReduce and Hive jobs. Created Hive tables to store the processed results in tabular format.
- Handled Hive queries using Spark SQL integrated with the Spark environment.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data in formats such as text, zip, XML, and JSON.
- Pre-processed all input files with Python code to make them ready to load into tables.
- Used Pig to perform data transformations, event joins, filter and some pre-aggregations before storing the data onto HDFS.
- Designed and implemented custom NiFi processors that received and processed data for the data pipeline.
- Analyzed large data sets to determine the optimal way to aggregate and report on them. Involved in migrating data from existing RDBMSs (Oracle and SQL Server) to Hadoop using Sqoop for processing.
- Implemented partitioning, dynamic partitions, and buckets in the Hive shell.
- Used SFTP to transfer and receive the files from various upstream and downstream systems.
- Developed and executed shell scripts to automate the jobs
- Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL database for huge volume of data.
- Worked on tuning Hive and Pig scripts to improve performance.
- Worked on performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN (a brief sketch follows this list).
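A brief Java sketch of the pair-RDD and DataFrame styles referenced in the last bullet, assuming Spark 2.x; the input path, column names, and partition count are illustrative assumptions.

```java
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import scala.Tuple2;

public class AggregationSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("aggregation-sketch").getOrCreate();
        JavaSparkContext jsc = JavaSparkContext.fromSparkContext(spark.sparkContext());

        // Pair-RDD version: sum amounts per key with reduceByKey (map-side combining)
        JavaRDD<String> lines = jsc.textFile("/data/raw/events", 48);
        JavaPairRDD<String, Double> totals = lines
                .mapToPair(line -> {
                    String[] parts = line.split(",");
                    return new Tuple2<>(parts[0], Double.parseDouble(parts[1]));
                })
                .reduceByKey(Double::sum);
        totals.saveAsTextFile("/data/out/totals_rdd");

        // DataFrame version of the same aggregation, planned by the Catalyst optimizer
        Dataset<Row> events = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .csv("/data/raw/events");
        Dataset<Row> byKey = events.groupBy("account_id").sum("amount");
        byKey.write().mode("overwrite").parquet("/data/out/totals_df");

        spark.stop();
    }
}
```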
Environment: Hadoop 2.6.0-cdh5.8.3, MapReduce, HDFS, Pig, Hive, Impala, Oozie, Java, Linux, Spark SQL, NiFi, Shell Scripting, Python Scripting
Confidential, Waltham, MA
Sr. Hadoop Developer
Roles & Responsibilities:
- Developed simple and complex MapReduce programs in Java for Data Analysis on different data formats.
- Developed MapReduce programs that filter out bad and unnecessary records and identify unique records based on different criteria.
- Developed multiple POCs using Scala and deployed them on the YARN cluster; compared the performance of Spark with Cassandra and SQL.
- Worked on Talend job migration and deployment to different environments and successfully scheduled jobs in TAC.
- Developed secondary-sort implementations to receive sorted values on the reduce side and improve MapReduce performance.
- Implemented custom Writable, InputFormat, RecordReader, OutputFormat, and RecordWriter classes for MapReduce computations to handle custom business requirements (a composite-key sketch follows this list).
- Implemented MapReduce programs to classify data records into different categories based on record type.
- Involved in developing and designing POCs using Scala and deploying them on the YARN cluster; compared the performance of Spark with Hive and SQL/Teradata.
- Created complex mappings in Talend using components such as tMap, tJoin, tReplicate, tParallelize, tAggregateRow, tSort, and tFilterRow.
- Worked on sequence files, RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
- Implemented daily Oozie coordinator jobs that automate the parallel tasks of loading data into HDFS and pre-processing it with Pig.
- Responsible for performing extensive data summarization using Hive.
- Imported data into Spark from a Kafka consumer group using the Spark Streaming APIs.
- Developed Pig UDFs to pre-process the data for analysis, using Java or Python.
- Involved in migrating Hive queries into Spark transformations using DataFrames, Spark SQL, SQLContext, and Scala.
- Involved in submitting and tracking MapReduce jobs using the JobTracker.
- Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
- Developed a data pipeline using Kafka and Storm to store data into HDFS.
- Implemented test scripts to support test driven development and continuous integration.
- Implemented unit tests with MRUnit and PigUnit.
- Configured build scripts for multi module projects with Maven and Jenkins CI.
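A composite-key sketch of the secondary-sort pattern referenced above: the natural key and the sort field travel together in the map output key, while a partitioner and grouping comparator on the natural key alone complete the pattern. The class and field names are illustrative, not taken from the project.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

// Composite key: group by customerId, sort by eventTime within each group.
public class CustomerEventKey implements WritableComparable<CustomerEventKey> {
    private String customerId;   // natural (grouping) key
    private long eventTime;      // secondary sort field

    public CustomerEventKey() { }

    public CustomerEventKey(String customerId, long eventTime) {
        this.customerId = customerId;
        this.eventTime = eventTime;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(customerId);
        out.writeLong(eventTime);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        customerId = in.readUTF();
        eventTime = in.readLong();
    }

    @Override
    public int compareTo(CustomerEventKey other) {
        int byCustomer = customerId.compareTo(other.customerId);
        // Sort by customer first, then by event time within each customer
        return byCustomer != 0 ? byCustomer : Long.compare(eventTime, other.eventTime);
    }

    @Override
    public int hashCode() {
        // Partition on the natural key only, so all of a customer's events reach the same reducer
        return customerId.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CustomerEventKey
                && customerId.equals(((CustomerEventKey) o).customerId)
                && eventTime == ((CustomerEventKey) o).eventTime;
    }
}
```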
Environment: HDP 2.2, MapReduce, HDFS, Pig, Hive, Impala, Oozie, Java, Kafka, Storm, Linux, Maven, Oracle 11g/10g, SVN, MongoDB, Talend
Confidential, New York, NY
Hadoop Developer
Roles & Responsibilities:
- Involved in the complete software development life cycle (SDLC) to develop the application.
- Worked on analyzing the Hadoop cluster and various big data analytics tools, including Pig, HBase, and Sqoop.
- Involved in loading data from the Linux file system to HDFS.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Imported data into HDFS and Hive using Sqoop.
- Integrated Spark with NoSQL databases such as HBase and Cassandra and with the Kafka message broker in Cloudera.
- Implemented test scripts to support test driven development and continuous integration.
- Installed and configured Hadoop, MapReduce, and HDFS (Hadoop Distributed File System).
- Developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
- Performed performance tuning for Spark Streaming, e.g. setting the right batch interval, the correct level of parallelism, and appropriate serialization and memory settings (a brief sketch follows this list).
- Developed Spark jobs using Scala in test environment for faster data processing and used Spark SQL for querying.
- Created Pig Latin scripts to sort, group, join, and filter enterprise-wide data to produce transformed data sets.
- Involved in maintaining and debugging MapReduce programs.
- Worked on tuning the performance of Pig scripts.
- Mentored analysts and the test team in writing Hive queries.
- Installed Oozie workflow engine to run multiple MapReduce jobs.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
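A short Java sketch of the Spark Streaming tuning levers mentioned above (batch interval, parallelism, serialization); all values are illustrative assumptions, not the settings used on the project.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class StreamingTuningSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("streaming-tuning-sketch")
                .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") // cheaper serialization
                .set("spark.default.parallelism", "48");                               // sized to total executor cores

        // Batch interval chosen so each batch finishes processing within its own window
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        // Kafka/Flume sources, transformations, and jssc.start() would be defined here.
        jssc.stop();
    }
}
```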
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Linux, Java, Oozie, Cassandra
Confidential
Java Developer
Roles & Responsibilities:
- Designed, developed, maintained, tested, and troubleshot Java and PL/SQL programs in support of payroll employees.
- Developed documentation for new and existing programs and designed specific enhancements to the application.
- Implemented web layer using JSF.
- Implemented business layer using Spring MVC.
- Implemented report generation based on start date using SQL.
- Implemented session management using the SessionFactory in Hibernate.
- Developed the DOs and DAOs using Hibernate (a brief sketch follows this list).
- Implemented a SOAP web service to validate zip codes using Apache Axis.
- Wrote complex queries, PL/SQL Stored Procedures, Functions and Packages to implement Business Rules.
- Wrote a PL/SQL program to send email to a group from the backend.
- Developed scripts triggered monthly to provide current monthly analysis.
- Scheduled Jobs to be triggered on a specific day and time.
- Modified SQL statements to increase the overall performance as a part of basic performance tuning and exception handling.
- Used cursors, arrays, tables, and BULK COLLECT concepts.
- Extensively used Log4j for logging.
- Performed unit testing in all environments.
- Used Subversion as the version control system.
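A brief sketch of the SessionFactory-based session management and DAO pattern mentioned above, assuming a Hibernate 3-style hibernate.cfg.xml on the classpath; the generic save method stands in for the project's actual DOs and DAOs.

```java
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;
import org.hibernate.cfg.Configuration;

public class GenericDao {
    // One SessionFactory per application, built from hibernate.cfg.xml
    private static final SessionFactory sessionFactory =
            new Configuration().configure().buildSessionFactory();

    // Persist any mapped data object (DO); the mapping lives in hbm.xml or annotations
    public void save(Object entity) {
        Session session = sessionFactory.openSession();
        Transaction tx = session.beginTransaction();
        try {
            session.save(entity);
            tx.commit();
        } catch (RuntimeException e) {
            tx.rollback();       // never leave the transaction open on failure
            throw e;
        } finally {
            session.close();     // return the JDBC connection to the pool
        }
    }
}
```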
Environment: Java 1.4.2, Spring MVC, JMS, Java Mail API 1.3, Hibernate, HTML, CSS, JSF, JavaScript, Junit, RAD, Web Service, UNIX
Confidential
Java/J2ee Developer
Roles & Responsibilities:
- Involved in all the phases of the life cycle of the project from requirements gathering to quality assurance testing.
- Developed Class diagrams, Sequence diagrams using Rational Rose.
- Responsible for developing rich web interface modules with Struts tags, JSP, JSTL, CSS, JavaScript, Ajax, and GWT.
- Developed presentation layer using Struts framework, and performed validations using Struts Validator plugin.
- Created SQL scripts for the Oracle database.
- Implemented the business logic using Java, Spring transactions, and Spring AOP.
- Implemented persistence layer using Spring JDBC to store and update data in database.
- Produced a web service using the WSDL/SOAP standard.
- Implemented J2EE design patterns such as Singleton combined with Factory.
- Extensively involved in creating session beans and MDBs using EJB 3.0.
- Used Hibernate framework for Persistence layer.
- Extensively involved in writing Stored Procedures for data retrieval and data storage and updates in Oracle database using Hibernate.
- Deployed and built the application using Maven.
- Performed unit testing using JUnit.
- Used JIRA to track bugs.
- Extensively used Log4j for logging throughout the application.
- Produced a RESTful web service with the Jersey implementation to provide customer information (a brief sketch follows this list).
- Used SVN for source code versioning and code repository.
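A brief sketch of a Jersey (JAX-RS) resource along the lines of the customer-information service mentioned above; the path, payload type, and stubbed lookup are illustrative assumptions (a JSON provider such as Jackson is assumed to be registered).

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

@Path("/customers")
public class CustomerResource {

    // Simple payload type; the registered JSON provider serializes it
    public static class Customer {
        public String id;
        public String name;
        public Customer() { }
        public Customer(String id, String name) { this.id = id; this.name = name; }
    }

    @GET
    @Path("/{id}")
    @Produces(MediaType.APPLICATION_JSON)
    public Customer getCustomer(@PathParam("id") String id) {
        // The real service delegated to a DAO; a stub keeps the sketch self-contained
        return new Customer(id, "placeholder-name");
    }
}
```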
Environment: Java (JDK1.5), J2EE, Eclipse, JSP, JavaScript, JSTL, Ajax, GWT, Log4j, CSS, XML, Spring, EJB, MDB, Hibernate, WebLogic, REST, Rational Rose, Junit, Maven, JIRA, SVN