Hadoop Consultant Resume
Chicago, IL
SUMMARY
- Results-oriented professional with 8+ years of progressive experience in software development, including application design and development, along with 3+ years of Big Data/Hadoop experience across the Hadoop ecosystem: HDFS, MapReduce, Hive, Pig, Flume, Sqoop, Zookeeper, HBase, and Spark.
- Big Data development experience with Google Cloud.
- Experience working with various Hadoop distributions - Cloudera and Hortonworks.
- Experience in migrating data between Hadoop and relational database systems using Sqoop.
- Expertise in Hadoop administration tasks such as managing clusters and reviewing Hadoop log files.
- Experience leveraging Hadoop ecosystem components, including Pig and Hive for data analysis, Sqoop for data migration, Oozie for scheduling, and HBase as a NoSQL data store.
- Good exposure to Apache Hadoop MapReduce programming, Pig scripting, distributed applications, and HDFS.
- Experience with the NoSQL databases MongoDB and Cassandra.
- Proficient in configuring Zookeeper, Cassandra, and Flume on existing Hadoop clusters.
- Experience in installing, configuring, supporting, and managing Cloudera's Hadoop platform, including CDH3, CDH4, and CDH5 clusters.
- Familiarity with real-time streaming data using Spark and Kafka.
- Experience in ETL analytics on ingested data using scripts built with Hive, Pig, Spark, and MapReduce, covering interactive, batch, and real-time processing.
- Expertise in Java/J2EE technologies such as Core Java, Spring, Hibernate, JDBC, JSON, HTML, Struts, Servlets, JSP, JBoss, and JavaScript.
- Experience using integrated development environments such as Eclipse, NetBeans, JDeveloper, and MyEclipse.
- Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
- Good experience in writing complex SQL queries against databases such as DB2, Oracle 10g, MySQL, and MS SQL Server 2005/2008.
- Extensive experience in developing test cases and performing unit and integration testing, using source code management tools such as Git, SVN, and Perforce.
- Strong team player with the ability to work both independently and in a team, the ability to adapt to a rapidly changing environment, and a commitment to learning.
- Ability to blend technical expertise with strong conceptual, business, and analytical skills to provide quality solutions, with results-oriented problem solving and leadership.
TECHNICAL SKILLS
Big Data: Hadoop, MapReduce, Pig, Hive, HBase, Sqoop, Oozie, Cassandra, MongoDB, Hortonworks, Kafka, Spark, Zookeeper, BigQuery
Web Development: HTML, JavaScript, XML, PHP, JSP, Servlets
Databases: DB2, MySQL, MS Access, MS SQL Server, Teradata, NoSQL, Vertica, Aster nCluster, SSAS, Oracle, Oracle Essbase
Languages: Java/J2EE, HTML, SQL, Spring, Hibernate, JDBC, JSON, JavaScript
Operating Systems: Mac OS, Unix, Linux (various versions), Windows 2003/XP/Vista/7/8/8.1
Web/Application Servers: Apache Tomcat, WebLogic, WebSphere
Tools: Eclipse, NetBeans
Version Control: Git, SVN, Perforce
IDEs: IntelliJ, Eclipse, NetBeans, JDeveloper
PROFESSIONAL EXPERIENCE
Confidential - Chicago, IL
Hadoop Consultant
Responsibilities:
- Installed and configured multi-node Cloudera Hadoop clusters using Cloudera Manager and CDH 4.x/5.x.
- Prepared the low-level design document and estimated effort for the project.
- Developed UNIX shell scripts for loading, filtering, and storing data.
- Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
- Installed and configured Hadoop MapReduce, HDFS, Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster.
- Installed, upgraded, and managed Hadoop clusters on Hortonworks and within AWS.
- Experience with NoSQL data modeling in Cassandra, HBase, and MongoDB.
- Involved in loading data from UNIX file system to HDFS.
- Performed data analysis in Hive by creating tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Ran process-improvement initiatives to reduce defects, close production issues, and improve applications.
- Analyzed the Hadoop cluster and various big data analytic tools, including Pig, the HBase NoSQL database, and Sqoop.
- Developed and implemented jobs on an MR2 Hortonworks cluster and on a cluster within AWS.
- Developed Pig scripts for loading, filtering, and storing data.
- Developed Hive scripts (HQL) to automate joins across different sources.
- Developed various Big Data workflows using Oozie.
- Performed Big Data development on Google Cloud.
- Developed MapReduce programs and migrated data from existing data sources using Sqoop.
- Developed custom Python programs to load data into HBase.
- Developed MapReduce programs using MRv1 and MRv2 (YARN).
- Developed Spark SQL jobs that read data from the data lake using Hive transforms and save it to HBase.
- Strong application DBA skills with data modeling skills for NoSQL and relational databases.
- Built a Java client responsible for receiving XML files via REST calls and publishing them to Kafka.
- Built a Kafka + Spark Streaming job responsible for reading XML messages from Kafka and transforming them to POJOs using JAXB.
- Loaded log data into HDFS using Flume. Worked extensively on creating MapReduce jobs to power data for search and aggregation.
- Responsible for migrating workloads from Hadoop to Spark, using in-memory distributed computing for real-time fraud detection.
- Effectively used Oozie to develop automatic workflows of Sqoop, MapReduce and Hive jobs.
- Involved in running Hadoop jobs that process millions of text records for batch and online processes, using tuned and modified SQL.
- Responsible for designing highly scalable big data clusters to support data storage and computation across varied big data systems - Hadoop, Cassandra, MongoDB, and Elasticsearch.
- Designed and published workbooks and dashboards using Tableau Dashboard/Server 6.x/7.x.
Environment: Hadoop (HDFS), HBase, MapReduce, Hive, Spark, Kafka, Oozie, Flume, Cassandra, Hortonworks, UNIX shell scripting, MongoDB, MySQL, Eclipse, Toad, HP Vertica 6.x/7.x.
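The batch MapReduce jobs described above follow the classic map/shuffle/reduce pattern. As a framework-free illustration (plain JDK, no Hadoop dependencies; an illustrative sketch, not project code), a word count can be expressed as:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MiniMapReduce {
    // Map phase: split each line into lowercase words (one (word, 1) pair each);
    // shuffle + reduce: group by word and sum the counts per key.
    public static Map<String, Integer> wordCount(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.toMap(w -> w, w -> 1, Integer::sum));
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("Hadoop Spark", "hadoop hive")));
    }
}
```

In Hadoop proper, the map and reduce steps run as separate distributed tasks and the shuffle happens over the network; the logic per key is the same.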
Confidential - Hartford
Hadoop/Spark Developer
Responsibilities:
- Reviewed business requirements documents with the test team to provide insights into data scenarios and test cases.
- Analyzed and understood the business requirements, and verified the business requirements document and technical design document against them.
- Experience in Extract, Transform, and Load (ETL) design, development, and testing.
- Experience working on Spark and Scala.
- Experience in scheduling and monitoring workflows. Provided proactive production support after go-live.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs for data cleaning and preprocessing.
- Extracted and processed the data from Legacy systems and stored it on HDFS.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Involved in creating Hive tables, writing complex Hive queries to populate Hive tables.
- Generating user reports using HQL on the data stored on HDFS.
- Experience in tuning the HQL queries to improve the performance.
- Experienced in managing and reviewing Hadoop log files.
- Load and transform large sets of structured, semi structured and unstructured data.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Used Oozie as an automation tool for running the jobs.
- Experience working on Hadoop and utilities such as HDFS, MapReduce, Sqoop, Hive, Oozie, Kafka, Impala, and Hue.
- Experience in Unix scripting.
- Experience in utilizing Teradata utilities FastLoad, MultiLoad, BTEQ scripting, TPT and FastExport.
- Identified and performed field level compression on Teradata tables.
- Experience in testing Data Marts, Data Warehouse/ETL Applications developed in mainframe/Teradata.
- Experience in loading from various data sources like Teradata, Oracle, Fixed Width and Delimited Flat Files.
- Involved in Data Extraction from Teradata and Flat Files using SQL assistant.
- Written several complex SQL queries for validating Reports.
- Tested several stored procedures.
- Attended reviews and status meetings and participated in customer interaction.
- Debugged SQL statements and stored procedures for business scenarios.
- Performed extensive Data Validation, Data Verification against Data Warehouse.
- Analyzed the bug reports running SQL queries against the source system(s) to perform root-cause analysis.
- Created SQL queries to generate ad-hoc reports for the business.
- Worked on data profiling and data validation to ensure the accuracy of the data between the warehouse and source systems.
- Created and validated the test data environment for the staging area, loading the staging area with data from multiple sources.
- Created data masking rules to mask sensitive data before extracting test data from various sources and loading it into tables.
- Created ETL test data for all transformation rules, covering all scenarios required for implementing business logic.
- Developed and tested various stored procedures as part of process automation in Teradata.
- Tested the ETL process both before and after the data cleansing process.
- Validated the data passed to downstream systems.
- Experience in generating the following Hadoop performance metrics using Cloudera Manager, portraying overall cluster health status on a weekly basis for senior management at Confidential (the bank):
- CPU and Memory Utilization for all edge nodes and Data nodes
- Disk Space Utilization on all mount points for all the edge nodes
- Disk Space and Memory Utilization on Name Node
- Edge Node Disk Utilization by application
- JobTracker memory used
- Average map & reduce tasks running
- Average RPC (remote procedure call) processing time
- HDFS cluster disk usage by applications
- Healthy TaskTrackers
- Block distribution across all PROD data nodes
Environment: Teradata, Hadoop, Unix, Spark, Scala, Subversion, Git, Bitbucket, DM Express, Mainframe, MS Visio, MS Office Suite, MS Outlook, HP Quality Centre.
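The data-masking rules mentioned above can be sketched as a small Java utility. This is a hypothetical rule (the method name, field format, and keep-last-four policy are illustrative assumptions, not the project's actual rules):

```java
public class MaskingRules {
    // Hypothetical rule: replace every digit that is followed by at least
    // four more digits with 'X', so only the last four digits stay readable.
    public static String maskSsn(String ssn) {
        return ssn.replaceAll("\\d(?=(?:\\D*\\d){4})", "X");
    }

    public static void main(String[] args) {
        System.out.println(maskSsn("123-45-6789")); // XXX-XX-6789
    }
}
```

Applying rules like this before copying production rows into a staging area keeps test data realistic without exposing sensitive values.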
Confidential - Dallas, TX
Hadoop Consultant
Responsibilities:
- Responsible for loading the customer's data and event logs from Kafka into HBase using REST API.
- Responsible for Cluster maintenance, adding and removing cluster nodes, Cluster Monitoring and Troubleshooting, Manage and review data backups and log files.
- Worked on debugging, performance tuning, and analyzing data using the Hadoop components Hive and Pig.
- Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Developed and implemented jobs on an MR2 Hortonworks cluster.
- Developed and executed Hive, Spark, and Pig queries to de-normalize the data.
- Created Hive tables from JSON data using data serialization frameworks such as Avro.
- Implemented generic export framework for moving data from HDFS to RDBMS and vice-versa.
- Worked on installing cluster, commissioning & decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
- Created Hive external tables, loaded data into them, and queried the data using HQL.
- Wrote and automated shell scripts for rolling day-to-day processes.
- Worked on loading data from LINUX file system to HDFS.
- Created HBase tables to store various formats of PII data coming from different portfolios. Implemented MapReduce to load data from an Oracle database into the NoSQL database.
- Used Cloudera Manager for installation and management of Hadoop Cluster.
- Moved data from Hadoop to Cassandra using the bulk output format class.
- Automated all the jobs, for pulling data from FTP server to load data into Hive tables, using Oozie workflows.
- Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
- Responsible for processing unstructured data using Pig and Hive.
- Adding nodes into the clusters & decommission nodes for maintenance.
- Created Pig script jobs while maintaining query optimization.
- Worked on various Business Objects reporting functionalities, such as slice and dice, master/detail, the user response function, and various formulas.
- Strong experience in Apache server configuration.
Environment: Hadoop, HDFS, HBase, Pig, Hive, Spark, Hortonworks, Oozie, MapReduce, Sqoop, Cloudera, MongoDB, Cassandra, Kafka, Linux, Java APIs, Java collections, Windows.
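The de-normalization work above amounts to joining a fact stream against a dimension lookup. A minimal in-memory sketch of that inner join (field names such as customerId are illustrative assumptions, not from the project; a Hive or Pig JOIN produces the same flattened shape at cluster scale):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class DenormalizeJoin {
    // Inner-join event rows {customerId, eventType} against a
    // customerId -> name lookup, emitting flattened CSV rows.
    public static List<String> join(Map<String, String> profiles, List<String[]> events) {
        List<String> out = new ArrayList<>();
        for (String[] event : events) {
            String name = profiles.get(event[0]);
            if (name != null) { // inner join: drop events with no matching profile
                out.add(String.join(",", event[0], name, event[1]));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(join(Map.of("c1", "Alice"),
                List.of(new String[]{"c1", "login"}, new String[]{"c2", "click"})));
    }
}
```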
Confidential, Seattle, WA
Hadoop Admin/Developer
Responsibilities:
- Supported MapReduce programs running on the cluster.
- Used Pig Latin to analyze large-scale data.
- Involved in loading data from UNIX file system to HDFS.
- Interacted with business users on a regular basis to consolidate and analyze requirements, and presented them with design results.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Involved in data visualization; analyzed data in Hive to provide the files required by the team, and developed Pig scripts for advanced analytics on the data.
- Created many user-defined routines, functions, and before/after subroutines that facilitated implementing some of the complex logical solutions.
- Monitored Hadoop scripts that take input from HDFS and load the data into Hive.
- Worked on improving the performance by using various performance tuning strategies.
- Managed the evaluation of ETL and OLAP tools and recommended the most suitable solutions depending on business needs.
- Migrated jobs from development to test and production environments.
- Created external tables with proper partitions for efficiency and loaded the structured data resulting from MR jobs into HDFS.
- Involved in moving all log files generated from various sources to HDFS for further processing.
- Used shell scripts for loading, unloading, file validation, and records auditing.
- Used the Teradata Aster bulk-load feature to bulk load flat files into Aster.
- Used Aster UDFs to unload data from staging tables and client data for SCD, which resided on the Aster database.
- Extensively used SQL and PL/SQL to develop procedures, functions, packages, and triggers.
Environment: Java, SQL, PL/SQL, Unix Shell Scripting, XML, Teradata Aster, Hive, Pig, Hadoop, MapReduce, Clear Case, HP Unix, Windows XP professional.
Confidential
Java Developer / Hadoop Developer
Responsibilities:
- Involved in requirements analysis, design, development, and testing.
- Involved in developing the Group portal and Member portal applications.
- Developed the front end using Struts and JSP.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Involved in data visualization; analyzed data in Hive to provide the files required by the team, and developed Pig scripts for advanced analytics on the data.
- Created many user-defined routines, functions, and before/after subroutines that facilitated implementing some of the complex logical solutions.
- Monitored Hadoop scripts that take input from HDFS and load the data into Hive.
- Developed webpages using HTML, JavaScript, jQuery, and CSS.
- Developed customized reports and performed unit testing using JUnit.
- Used Java 1.6, Spring, Hibernate, and Oracle to build the product suite.
- Responsible for building projects in deployable files (WAR files and JAR files).
- Coded Java Servlets to control and maintain session state and handle user requests.
- Involved in the development and testing phases of the project, following Agile methodology.
- Implemented the logging mechanism using log4j framework.
- Developed Web Services.
- Verified software errors and interacted wif developers to resolve the technical issues.
- Used Maven to build the J2EE application.
- Wrote complex SQL queries and stored procedures.
- Involved in maintenance of different applications.
Environment: Servlets, Enterprise JavaBeans, Custom Tags, Stored Procedures, JavaScript, Java, Spring Framework, Struts, Web Services, Oracle.
Confidential
Java Developer
Responsibilities:
- Involved in the designing of the project using UML.
- Followed J2EE Specifications in the project.
- Designed the user interface pages in JSP.
- Used XML and XSL for mapping the fields in the database.
- Used JavaScript for client-side validations.
- Created the stored procedures and triggers required for the project.
- Created functions and views in Oracle.
- Enhanced the performance of the whole application using the stored procedures and prepared statements.
- Responsible for updating database tables and designing SQL queries using PL/SQL.
- Created bean classes for communicating with the database.
- Involved in documentation of the module and project.
- Prepared test cases and test scenarios as per business requirements.
- Involved in bug fixing.
- Prepared coded applications for unit testing using JUnit.
Environment: Java, JSP, Servlets, J2EE, EJB 3, Java Beans, Oracle, HTML, DHTML, XML, XSL, JavaScript, BEA WebLogic.