
Senior Hadoop Consultant Resume


Morristown, NJ

SUMMARY

  • Results-driven IT professional with 8+ years of experience in Big Data and Hadoop development and ecosystem analytics using the Cloudera Distribution of Hadoop (CDH3, CDH4, CDH5) and the Hortonworks Data Platform (HDP).
  • Expertise in setting up and configuring Hadoop clusters using Apache tarballs and Hortonworks Ambari on Ubuntu, Red Hat, CentOS, and Windows.
  • Hands-on experience with the Big Data Management Platform (BMP) using MapReduce, HDFS, HBase, Oozie, Hive, Pig, Sqoop, Flume, Pentaho, NiFi, Cassandra, Kafka, and ZooKeeper.
  • Excellent understanding of Hadoop architecture, its daemons, and components such as HDFS, YARN, ResourceManager, NodeManager, NameNode, DataNode, and the MapReduce programming paradigm.
  • Experience with the Oozie Workflow Engine, running workflow jobs with actions that run Hadoop MapReduce and Pig jobs.
  • Excellent understanding and knowledge of NoSQL databases such as MongoDB, HBase, and Cassandra.
  • Hands-on experience in troubleshooting errors in HBase Shell/API, Pig, Hive and MapReduce.
  • Experience in setting up clusters on Amazon EC2 and S3, including automating the setup and extension of clusters in the AWS cloud.
  • Experience with Talend Big Data Studio 6.0, including pushing data as delimited files into HDFS.
  • Loaded streaming log data from various web servers into HDFS using Flume.
  • Created Hive internal and external tables defined with appropriate static and dynamic partitions.
  • Involved in the data staging validation, MapReduce validation, and output validation phases.
  • Good knowledge of ETL bugs and ETL mapping sheets.
  • Extensive experience working with Teradata, Oracle, Netezza, SQL Server, and MySQL databases.
  • Strong knowledge of Spark concepts such as RDD operations, caching, and persistence (see the sketch after this list).
  • Analyzed and developed mappings using Informatica transformations.
  • Hands-on experience with Linux administration activities.
  • Excellent understanding of relational databases as they pertain to application development, using several RDBMS including IBM DB2, Oracle 10g, MS SQL Server … and MySQL, with strong database skills including SQL, stored procedures, and PL/SQL.
  • Working knowledge of J2EE development with the Spring, Struts, and Hibernate frameworks in various projects; expertise in web services development (JAXB, SOAP, WSDL, RESTful); experience writing tests using Specs2, ScalaTest, Selenium, TestNG, and JUnit.
  • Excellent communication skills, team player, quick learner, organized, resilient and self-motivated.
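
A minimal sketch of the Spark RDD caching and persistence pattern referred to above, using the Spark Java API; the HDFS path, filter predicates, and storage level are illustrative assumptions rather than details from any of the projects below.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.storage.StorageLevel;

    public class RddCachingExample {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("rdd-caching-example");
            try (JavaSparkContext sc = new JavaSparkContext(conf)) {
                // Load raw log lines from HDFS (path is hypothetical).
                JavaRDD<String> lines = sc.textFile("hdfs:///data/logs/*");

                // Keep only error lines and persist them: the filtered RDD is reused
                // by two actions below, so caching avoids re-reading the source.
                JavaRDD<String> errors = lines.filter(line -> line.contains("ERROR"))
                                              .persist(StorageLevel.MEMORY_AND_DISK());

                long errorCount = errors.count();  // first action materializes the cache
                long timeouts = errors.filter(l -> l.contains("timeout")).count();  // reuses cached data

                System.out.println("errors=" + errorCount + ", timeouts=" + timeouts);
            }
        }
    }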

TECHNICAL SKILLS

Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, MongoDB, Cassandra, Power Pivot, Puppet, Oozie, ZooKeeper.

Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI, Java Beans

IDEs: Eclipse, NetBeans.

Big Data Analytics: Datameer 2.0.5

Frameworks: MVC, Struts, Hibernate, Spring

Programming languages: C, C++, Java, Python, Ant scripts, Linux shell scripts

Databases: Oracle 11g/10g/9i, MySQL, DB2, MS-SQL Server

Web Servers: WebLogic, WebSphere, Apache Tomcat

Web Technologies: HTML, XML, JavaScript, AJAX, SOAP, WSDL

Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP

ETL Tools: Informatica, Pentaho

Testing: WinRunner, LoadRunner, QTP, Selenium, JIRA, Jenkins, HP ALM

PROFESSIONAL EXPERIENCE

Confidential, Morristown, NJ

Senior Hadoop Consultant

Responsibilities:

  • Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured the launched instances for specific applications.
  • Involved in developing and testing Pig Latin scripts.
  • Involved in requirement gathering, design, development, and testing.
  • Launched and set up the Hadoop/HBase cluster, which included configuring the different components of the Hadoop and HBase cluster.
  • Hands on experience in loading data from UNIX file system to HDFS.
  • Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data from HBase through Sqoop, placing it in HDFS for further processing.
  • Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
  • Managing and scheduling Jobs on a Hadoop cluster using Oozie.
  • Involved in creating Hive tables, loading data, and running Hive queries on that data.
  • Extensive working knowledge of partitioned tables, UDFs, and performance tuning in Hive.
  • Developed the Sqoop framework in Unix, with a generic script used to ingest data from various sources.
  • Connection details are fed through metadata so that a common script is maintained, and logs are audited and stored in HDFS as a Hive table.
  • Fine-tuned Sqoop jobs and parameters to improve performance.
  • Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
  • Used Pig as an ETL tool to perform transformations, event joins, and some pre-aggregations before storing the data in HDFS.
  • Worked on MapReduce Joins in querying multiple semi-structured data as per analytic needs.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Created many Java UDFs and UDAFs in Hive for functions that were not pre-existing in Hive, such as rank, cumulative sum (csum), etc. (a sketch of one such UDF follows this list).
  • Used Hive and created Hive tables and involved in data loading and writing Hive UDFs.
  • Developed a POC for Apache Kafka.
  • Explored Spark for improving the performance and optimizing the existing algorithms in Hadoop, using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Gained knowledge of building Apache Spark applications using Scala.
  • Hands-on with various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Stored and loaded data from HDFS to Amazon S3 and backed up the namespace data onto NFS filers.
  • Enabled concurrent access to Hive tables with shared and exclusive locking, which can be turned on in Hive with the help of the ZooKeeper implementation in the cluster.
  • Created and implemented business, validation-and-coverage, and price-gap rules on Hive using Talend.
  • Involved in development of Talend components to validate the data quality across different data sources.
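
A minimal sketch of the kind of custom Java Hive UDF described above (an in-group rank); the package and class names are hypothetical, and the query using it must DISTRIBUTE BY and SORT BY the key for the result to be correct (register with ADD JAR and CREATE TEMPORARY FUNCTION).

    package com.example.hive.udf;  // hypothetical package

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.hive.ql.udf.UDFType;
    import org.apache.hadoop.io.Text;

    // Stateful, non-deterministic: keeps a counter across rows within a task.
    @UDFType(deterministic = false, stateful = true)
    public final class Rank extends UDF {
        private final Text lastKey = new Text();
        private long counter = 0;

        // Returns 1, 2, 3, ... within each key group; rows of the same key
        // must arrive at the same task, sorted by the key.
        public long evaluate(final Text key) {
            if (key == null) {
                return 0;
            }
            if (!key.equals(lastKey)) {
                counter = 0;
                lastKey.set(key);
            }
            return ++counter;
        }
    }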

Environment: CDH 5.7.6, Hadoop 2.6, Spark 1.6.0, Scala 2.10, HBase 1.2.0, Apache Phoenix 4.7, Maven, Apache NiFi, Sqoop 1.4.6, MapReduce, HDFS, Pig, Hive 0.13, IntelliJ, Oracle EDB, DataStax Cassandra 4.8, CentOS, Windows, Python 3.0, Tableau 9.0.

Confidential, New York, NY

Sr. Hadoop Consultant

Responsibilities:

  • Developed the architecture of the data pipeline for data analysis.
  • Evaluated the suitability of Hadoop and its ecosystem for the project, implementing and validating various proof-of-concept (POC) applications to eventually adopt them as part of the Big Data Hadoop initiative.
  • Wrote a Scala script to load processed data into DataStax Cassandra 4.8.
  • Migrated the Ab Initio process to Hadoop using Pig and Hive.
  • Wrote a MapReduce job to compare two TSV files and save the processed output into Oracle.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Wrote MapReduce jobs to parse the web logs stored in HDFS (a sketch of such a mapper follows this list).
  • Developed the services to run the MapReduce jobs on an as-needed basis.
  • Developed design documents considering all possible approaches and identifying the best of them.
  • Managed and scheduled jobs on a Hadoop cluster.
  • Implemented a nine-node CDH3 Hadoop cluster on Red Hat Linux.
  • Good experience writing MapReduce programs in Java in an MRv2/YARN environment.
  • Administered, installed, upgraded, and managed HDP 2.2, Pig, Hive, and HBase.
  • Designed and Developed Talend Jobs to extract data from Oracle into MongoDB.
  • Implemented Capacity Scheduler to share the resources of the cluster for the MapReduce jobs given by the users.
  • Managed and reviewed data backups and log files; experienced in deploying Java applications on the cluster.
  • Scheduled jobs through the Walgreens EBS internal scheduling system.
  • Prepared a multi-cluster test harness to exercise the system for performance and failover.
  • Developed a high-performance cache, making the site stable and improving its performance.
  • Created a complete processing engine, enhanced for performance.
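
A minimal sketch of the kind of web-log parsing mapper described above; the log format (Apache combined style), regular expression, and counter names are illustrative assumptions. Paired with a summing reducer such as Hadoop's built-in IntSumReducer, it yields hit counts per requested URL.

    import java.io.IOException;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class WebLogMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        // Roughly matches: <ip> - <user> [<timestamp>] "<method> <url> <proto>" <status> <bytes>
        private static final Pattern LOG_LINE = Pattern.compile(
            "^(\\S+) \\S+ \\S+ \\[[^\\]]+\\] \"\\S+ (\\S+)[^\"]*\" (\\d{3}) .*");

        private final Text url = new Text();
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            Matcher m = LOG_LINE.matcher(value.toString());
            if (m.matches()) {
                url.set(m.group(2));          // requested URL
                context.write(url, ONE);      // count one hit per URL
            } else {
                context.getCounter("weblog", "malformed_lines").increment(1);
            }
        }
    }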

Environment: Amazon EC2, Apache Hadoop 1.0.1, MapReduce, HDFS, HBase, Hive, Pig, Oozie, Flume, Java (JDK 1.6), Eclipse, Spark, SQL, Hortonworks Data Platform 2.3.4, Hadoop 2.7, Spark 1.4.1, Scala 2.10, SBT 0.13, Sqoop 1.4.6, Hive 0.13, Oracle 11g, DataStax Cassandra 4.8, CentOS, Windows.

Confidential

Hadoop Admin

Responsibilities:

  • Created a new project solution based on the company's technology direction and ensured that infrastructure services were projected based on current standards.
  • Upgraded the cluster from CDH 4.x to CDH 5.x.
  • Implemented HA for the NameNode and Hue using Cloudera Manager.
  • Configured HAProxy for the Impala service.
  • Created snapshots for in-cluster backup of the data instance.
  • Created Sqoop scripts for ingesting data from transactional systems into Hadoop.
  • Conducted technology evaluation sessions on Big Data, data governance, Hadoop and Amazon Web Services, Tableau and R, data analysis, statistical analysis, and data-driven business decisions.
  • Integrated Tableau, Teradata, DB2, ORACLE via ODBC/JDBC drivers with Hadoop.
  • Worked with application teams to install the operating system, Hadoop updates, patches, version upgrades as required.
  • Created scripts for automating balancing data across the cluster using the HDFS load balancer utility.
  • Created a POC implementing a streaming use case with the Kafka and HBase services (a sketch follows this list).
  • Working experience creating and maintaining MySQL databases, setting up users, and maintaining database backups.
  • Integrated the existing LLE and production clusters with LDAP.
  • Implemented TLS for CDH Services and for Cloudera Manager.
  • Worked with data delivery teams to set up new Hadoop users; this included setting up Linux users, setting up Kerberos principals, and testing HDFS and Hive access.
  • Managed the backup and disaster recovery for Hadoop data. Coordinated root cause analysis efforts to minimize future system issues.
  • Served as lead technical infrastructure Architect and Big Data subject matter expert.
  • Deployed Big Data solutions in the cloud. Built, configured, monitored and managed end to end Big Data applications on Amazon Web Services (AWS).
  • Screened Hadoop cluster job performance and handled capacity planning.
  • Spun up clusters in Azure using Cloudera Director; implemented this as a POC for the cloud migration project.
  • Leveraged AWS cloud services such as EC2, Auto Scaling, and VPC to build secure, highly scalable, and flexible systems that handled expected and unexpected load bursts.
  • Defined the migration strategy to move the application to the cloud; developed architecture blueprints and detailed documentation; created a bill of materials, including required cloud services (such as EMR, EC2, S3, etc.) and tools; experience in scheduling cron jobs on EMR.
  • Created bash scripts frequently, depending on the project requirements.
  • Implemented VPC, Auto scaling, S3, EBS, ELB, Cloud Formation templates and Cloud Watch services from AWS.
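
A minimal sketch of the Kafka-to-HBase streaming POC pattern mentioned above; the broker address, topic, table, column family, and row-key scheme are all illustrative assumptions, and the consumer shown uses the modern Kafka Java client API.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class KafkaToHBase {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");  // assumed broker
            props.put("group.id", "hbase-writer");
            props.put("key.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");

            Configuration hbaseConf = HBaseConfiguration.create();
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
                 Connection conn = ConnectionFactory.createConnection(hbaseConf);
                 Table table = conn.getTable(TableName.valueOf("events"))) {  // assumed table

                consumer.subscribe(Collections.singletonList("events"));      // assumed topic
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> rec : records) {
                        // Row key: Kafka message key, or partition-offset when the key is null.
                        String rowKey = rec.key() != null ? rec.key()
                                                          : rec.partition() + "-" + rec.offset();
                        Put put = new Put(Bytes.toBytes(rowKey));
                        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"),
                                      Bytes.toBytes(rec.value()));
                        table.put(put);
                    }
                }
            }
        }
    }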

Environment: Over 1,500 nodes, approximately 5 PB of data, Cloudera's Distribution of Hadoop (CDH) 5.5, HA NameNode, MapReduce, YARN, Hive, Impala, Pig, Sqoop, Flume, Cloudera Navigator, Control-M, Oozie, Hue, White Elephant, Ganglia, Nagios, HBase, Cassandra, Kafka, Storm, Puppet.

Confidential

Java Developer

Responsibilities:

  • Developed the user interface using Java Server Pages (JSP), HTML, and JavaScript for the presentation tier.
  • Developed JSP pages and client-side validation in JavaScript.
  • Involved in team meetings with corporate webmaster UI team and end user client for understanding needs.
  • Developed web Components using JSP, Servlets and Server-side components using EJB under J2EE.
  • Developed a custom realm for the Apache Tomcat server for authenticating users.
  • Developed a front-end controller servlet to handle all requests.
  • Developed a controller servlet to handle database access.
  • Used the Spring Framework as the middle-tier application framework, with a persistence strategy using Spring's Hibernate support for integrating with the database.
  • Extensively worked on J2EE technologies to develop the project, following the Spring MVC framework.
  • Implemented Hibernate in the data access object layer to access and update information in the Oracle 10g database.
  • Created a front-end application using JSPs and Spring MVC for registering a new entry, and configured it to connect to the database using Hibernate (a sketch follows this list).
  • Configured the Hibernate configuration files to persist the data to the Oracle 10g Database.
  • Used Hibernate as ORM tool for accessing database.
  • Designed, Developed and analyzed the front-end and back-end using JSP, Servlets and Spring.
  • Evaluated JavaScript frameworks for real-time applications, settling on AngularJS for the front end.
  • Designed and developed very complex and large web pages using AngularJS, HTML5, and CSS.
  • Designed and developed a RESTful service interface using Spring Boot to the underlying customer event API.
  • Designed and developed front-end using Servlet, JSP, JSF, DHTML, Java Script and AJAX.
  • Developed various J2EE components like Servlet, JSP.
  • Developed custom tags to display the data in JSP pages.
  • Tested/De-bugged on browser using Firebug.
  • Coded JavaScript for page functionality and pop-up screens, and used HTML to build dropdown menus on web pages and display part of a web page upon user request.
  • Deployed the application in Production environment.
  • Supported the application on the client side.
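
A minimal sketch of the Spring MVC registration flow described above; the Entry form bean, the EntryDao interface (standing in for a Hibernate-backed DAO), the URL mapping, and the view names are hypothetical, not the project's actual classes.

    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.stereotype.Controller;
    import org.springframework.ui.Model;
    import org.springframework.web.bind.annotation.ModelAttribute;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RequestMethod;

    @Controller
    @RequestMapping("/entries")
    public class EntryController {

        // Hypothetical Hibernate-backed DAO; the real persistence layer is not shown.
        public interface EntryDao {
            void save(Entry entry);
        }

        // Hypothetical form-backing bean bound to the JSP form fields.
        public static class Entry {
            private String name;
            public String getName() { return name; }
            public void setName(String name) { this.name = name; }
        }

        @Autowired
        private EntryDao entryDao;

        // GET: show the JSP registration form.
        @RequestMapping(method = RequestMethod.GET)
        public String showForm(Model model) {
            model.addAttribute("entry", new Entry());
            return "entryForm";  // resolved by the view resolver to /WEB-INF/jsp/entryForm.jsp
        }

        // POST: persist the submitted entry via the Hibernate DAO, then redirect.
        @RequestMapping(method = RequestMethod.POST)
        public String register(@ModelAttribute("entry") Entry entry) {
            entryDao.save(entry);
            return "redirect:/entries";
        }
    }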

Environment: Java 1.5/J2EE, Core Java, JSF, Hibernate, JDBC, Eclipse, Spring, JSP, XML, XSL, JSTL, JavaScript, jQuery, MVC, Servlets, AJAX, HTML, CSS, UML, POJO, Log4j, JUnit, SOAP, JMS, ANT, SVN, DAO, DTO, Apache Tomcat, Oracle SQL.

Confidential

J2EE Developer

Responsibilities:

  • Created stored procedures using PL/SQL for data modification (using DML insert, update, and delete) in Oracle.
  • Involved in requirement analysis, development, and documentation.
  • Used MVC architecture for the web tier.
  • Hands-on in developing the form beans and action mappings required for the Struts implementation, and the validation framework using Struts.
  • Developed front-end screens with JSP using Eclipse.
  • Coded DAO objects using JDBC, following the DAO pattern (a sketch follows this list).
  • Used XML and XSDs to define data formats.
  • Implemented J2EE design patterns (Value Object, Singleton, DAO) for the presentation, business, and integration tiers of the project.
  • Involved in Bug fixing and functionality enhancements.
  • Designed and developed excellent Logging Mechanism for each order process using Log4j.
  • Involved in writing Oracle SQL Queries.
  • Involved in requirement analysis and complete development of client-side code.
  • Followed Sun standard coding and documentation standards.
  • Participated in project planning with business analysts and team members to analyze the Business requirements and translated business requirements into working software.
  • Developed software application modules using a disciplined software development process.
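
A minimal sketch of a JDBC-based DAO of the kind described above; the ORDERS table, column names, and DataSource wiring are illustrative assumptions, and it is written in the newer try-with-resources style for brevity rather than the explicit finally blocks a Java 1.5 project would have used.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import javax.sql.DataSource;

    public class OrderDao {
        private final DataSource dataSource;  // injected, e.g. from the app server's JNDI pool

        public OrderDao(DataSource dataSource) {
            this.dataSource = dataSource;
        }

        // Look up a single order status by primary key; returns null when absent.
        public String findStatusById(long orderId) throws SQLException {
            String sql = "SELECT STATUS FROM ORDERS WHERE ORDER_ID = ?";
            try (Connection con = dataSource.getConnection();
                 PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setLong(1, orderId);
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() ? rs.getString("STATUS") : null;
                }
            }
        }

        // Update the status for one order; returns the number of rows changed.
        public int updateStatus(long orderId, String status) throws SQLException {
            String sql = "UPDATE ORDERS SET STATUS = ? WHERE ORDER_ID = ?";
            try (Connection con = dataSource.getConnection();
                 PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setString(1, status);
                ps.setLong(2, orderId);
                return ps.executeUpdate();
            }
        }
    }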

Environment: Java, J2EE, JSP, EJB, ANT, Struts 1.2, Log4j, WebLogic 7.0, JDBC, MyEclipse, Windows XP, CVS, Oracle.
