Sr. Hadoop Consultant Resume
Boston, MA
SUMMARY:
- 7+ years of experience in IT, including around 6 years of hands-on experience as a Hadoop Administrator.
- Hands-on experience deploying and managing multi-node development, testing, and production Hadoop clusters with different Hadoop components (Hive, Pig, Sqoop, Oozie, Flume, HCatalog, ZooKeeper, HBase) using Cloudera Manager and Hortonworks Ambari.
- Hands-on experience with Big Data technologies/frameworks such as Hadoop, HDFS, YARN, MapReduce, HBase, Hive, Pig, Sqoop, NoSQL, Flume, and Oozie.
- Proficiency with application servers such as WebSphere, WebLogic, JBoss, and Tomcat.
- Acted as liaison between software and technical teams for automation, installation, and configuration tasks.
- Hadoop cluster capacity planning, performance tuning, cluster monitoring, and troubleshooting.
- Experience commissioning, decommissioning, balancing, and managing nodes, and tuning servers for optimal cluster performance (a representative command sequence is sketched at the end of this summary).
- As an admin, involved in cluster maintenance, troubleshooting, and monitoring, and followed proper backup and recovery strategies.
- Used Namespace support to map Phoenix schemas to HBase namespaces.
- Designed and implemented database software migration procedures and guidelines.
- Performed administrative tasks on Hadoop clusters using Hortonworks.
- Expertise with the tools in the Hadoop ecosystem, including Pig, Hive, HDFS, MapReduce, Sqoop, Apache NiFi, Spark, Kafka, YARN, Oozie, and ZooKeeper.
- Installed and monitored Hadoop cluster resources using Ganglia and Nagios.
- Experience in designing and implementation of secure Hadoop cluster using Kerberos.
- Experience working with Hadoop architects and Big Data users to implement new Hadoop ecosystem technologies supporting a multi-tenant cluster.
- Skilled in monitoring servers using Nagios, Datadog, CloudWatch, and the EFK stack (Elasticsearch, Fluentd, Kibana).
- Implemented DB2/LUW replication, federation, and partitioning (DPF).
- Areas of expertise and accomplishment include database installation/upgrade and backup/recovery.
- Hands-on experience installing, configuring, supporting, and managing Hadoop clusters using Apache, Cloudera (CDH3, CDH4), and YARN-based distributions (CDH 5.x).
- Experience in capacity planning, HDFS management, and YARN resource management.
- Hands-on experience configuring Hadoop clusters in enterprise environments, on VMware, and on Amazon Web Services (AWS) using EC2 instances.
- Installed and configured Hortonworks HDP 2.3.0 using Ambari 2.1.1.
- Hands-on experience upgrading clusters from HDP 2.0 to HDP 2.3.
- Strong experience/expertise in data warehouse tooling, including ETL tools such as Ab Initio and Informatica, and BI tools such as Cognos, MicroStrategy, and Tableau.
- Expertise in interactive data visualization and analysis with BI tools such as Tableau.
- Worked with different relational database systems such as Oracle (PL/SQL). Used Unix shell scripting and Python, and have experience working on AWS EMR instances.
- Used Python and shell scripts to check data consistency between different systems and for data loading and replication.
- Used NoSQL databases such as Cassandra and MongoDB and designed tables.
- Worked on setting up Name Node High Availability for major production cluster and designed automatic failover control using Zookeeper and Quorum Journal Nodes.
- Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller.
- Familiar with writing Oozie workflows and Job Controllers for job automation.
- Experience dealing with structured, semi-structured, and unstructured data in the Hadoop ecosystem.
- Imported data from various data sources, transformed it using Hive and Pig, and loaded the data into HBase.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
- Extracted data from Teradata into HDFS, databases, and dashboards using Spark Streaming for data analysis.
- Analyzed the client's existing Hadoop infrastructure to understand performance bottlenecks and provided performance tuning accordingly.
- Good understanding of deploying Hadoop clusters using automated Puppet scripts.
- Troubleshooting, Security, Backup, Disaster Recovery, Performance Monitoring on Linux systems.
- Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment.
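The commissioning, decommissioning, and balancing work called out above usually comes down to a short command sequence. The following is a minimal sketch assuming a typical HDP/CDH-style layout; the hostname and exclude-file path are placeholders, not values from any specific cluster.

```bash
# Minimal sketch: decommissioning a DataNode and rebalancing HDFS.
# The hostname and exclude-file path are placeholders.

# 1. Add the host to the exclude file referenced by dfs.hosts.exclude
echo "datanode07.example.com" >> /etc/hadoop/conf/dfs.exclude

# 2. Ask the NameNode to re-read its include/exclude lists
hdfs dfsadmin -refreshNodes

# 3. Confirm the node moves through "Decommission in progress" to "Decommissioned"
hdfs dfsadmin -report

# 4. Rebalance block placement across the remaining DataNodes
#    (threshold = allowed deviation from average disk usage, in percent)
hdfs balancer -threshold 10
```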
TECHNICAL SKILLS:
Hadoop Ecosystem: Hadoop, HDFS, MapReduce, Hive, Pig, ZooKeeper, Sqoop, Oozie, YARN, Apache NiFi, Apache Phoenix, Spark, Kafka, Storm, Ambari 1.0-2.1.1, Kerberos, Liaison, Flume and Mahout.
Hadoop Management & Security: Hortonworks, Cloudera Manager.
Web Technologies: HTML, XHTML, XML, XSL, CSS, JavaScript
Server Side Scripting: Shell, Perl, Python.
Database: Oracle 10g, Microsoft SQL Server, MySQL, DB2, SQL, RDBMS.
Web Servers: Apache Tomcat 5.x, BEA WebLogic 8.x, IBM WebSphere 6.0/5.1.1
Programming Languages: C, Java, PL/SQL, NoSQL
Databases: HBase, MongoDB
Virtualization: VMware ESXi, vSphere, vCenter Server.
SDLC Methodology: Agile (SCRUM), Waterfall.
Operating Systems: Windows 2000 Server, Windows 2000 Advanced Server, Windows Server 2003, CentOS, Debian, Fedora, Windows NT, Windows 98/XP, UNIX, Linux RHEL
WORK EXPERIENCE:
Sr. Hadoop Consultant
Confidential, Boston, MA
Responsibilities:
- Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools like Hive, Pig, HBase, ZooKeeper, and Sqoop.
- Involved in creating a Spark cluster in HDInsight by creating Azure compute resources with Spark installed and configured.
- Worked on security components like Kerberos, Ranger, Sentry, and HDFS encryption.
- Created Oozie workflows to automate data ingestion using Sqoop and process incremental log data ingested by Flume using Pig.
- Loaded log data into HDFS using Flume and Kafka and performed ETL integrations.
- Involved in migrating ETL processes from Oracle to Hive to test easier data manipulation.
- Extracted data from Teradata into HDFS using Sqoop.
- Also worked with core HDP components such as YARN and HDFS, which were used to architect the platform.
- Completed end-to-end design and integration of Apache NiFi.
- Involved in setting up Linux users, setting up Kerberos principals, and testing HDFS, Hive, Pig, and MapReduce access for the new users (see the sketch at the end of this section).
- Used the Phoenix-Hive integration to join huge tables to other huge tables.
- Resolved the PHOENIX-3164 bug affecting the deployed Phoenix Query Server.
- Executed aggregate queries in Phoenix through server-side hooks (coprocessors).
- Architected and designed a 30-node Hadoop innovation cluster with Sqrrl, Spark, Solr, Puppet, and HDP 2.2.4.
- Managed a 900+ node HDP 2.2.4 cluster with 4 petabytes of data using Ambari 2.0 and Linux CentOS 6.5.
- Installed and configured the Hortonworks Data Platform (HDP 2.3) on Amazon EC2 instances.
- Used Python to instantiate multi-threaded applications and run them alongside other applications.
- Used a RESTful web services API to connect to MapR tables; the database connection was also developed through the RESTful web services API.
- Responsible for everything related to clusters totaling 100 nodes, ranging from POC to PROD clusters.
- Monitored various components using monitoring/logging tools (e.g., Icinga, Splunk, Graphite, Prometheus).
- Automated and monitored system configuration using Puppet and Zenoss.
- Created Splunk app for Enterprise Security to identify and address emerging security threats using continuous monitoring, alerting and analytics.
- Expertise with NoSQL databases like HBase, Cassandra, DynamoDB (AWS), and MongoDB.
- Involved in helping the UNIX and Splunk administrators deploy Splunk across the UNIX and Windows environments.
- Created interactive dashboards utilizing parameters, actions, and calculated fields utilizing Tableau Desktop.
- Led Big Data Hadoop/YARN operations and managed an offshore team.
- Provided support on Kerberos-related issues and coordinated Hadoop installations/upgrades and patch installations in the environment.
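For the Linux user and Kerberos principal setup referenced above, a minimal sketch looks like the following; the realm, usernames, and keytab paths are illustrative placeholders, not values from this engagement.

```bash
# Minimal sketch: onboarding a new user on a Kerberized HDP cluster.
# Realm, usernames, and paths are illustrative placeholders.

# Create the OS account and an HDFS home directory (run with HDFS superuser rights)
useradd -m analyst1
sudo -u hdfs hdfs dfs -mkdir -p /user/analyst1
sudo -u hdfs hdfs dfs -chown analyst1:hdfs /user/analyst1

# Create a Kerberos principal and export its keytab
kadmin -p admin/admin -q "addprinc -randkey analyst1@EXAMPLE.COM"
kadmin -p admin/admin -q "xst -k /etc/security/keytabs/analyst1.keytab analyst1@EXAMPLE.COM"

# Verify the new user can authenticate and reach HDFS
su - analyst1 -c "kinit -kt /etc/security/keytabs/analyst1.keytab analyst1@EXAMPLE.COM && hdfs dfs -ls /user/analyst1"
```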
Environment: Hive, Pig, HBase, Zookeeper and Sqoop, ETL, Azure, Hortonworks, Apache Phoenix, Ambari 2.0, Apache NiFi, Linux Cent OS, HBase, Splunk, Liaison, MongoDB, Elasticsearch, Teradata, Puppet, Kerberos, Kafka, Cassandra, Python, Agile/scrum.
Hadoop Administrator
Confidential, Irving, TX
Responsibilities:
- Managed mission-critical Hadoop clusters and Kafka at production scale, primarily on the Cloudera distribution.
- Worked with Kafka for the proof of concept for carrying out log processing on a distributed system.
- Involved in capacity planning, with reference to the growing data size and the existing cluster size.
- Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, HBase, NoSQL databases, Flume, Oozie, and Sqoop.
- Experience designing, implementing, and maintaining high-performing Big Data/Hadoop clusters and integrating them with existing infrastructure.
- Deployed and tested the application on WebSphere Application Server.
- Configured SSL for Ambari, Ranger, Hive and Knox.
- Experience in methodologies such as Agile, Scrum, and Test driven development.
- Created principals for new users in Kerberos, implemented and maintained the Kerberos setup, and integrated it with Active Directory (AD).
- Worked on a data pipeline using Kafka and Storm to store data in HDFS.
- Created event-processing data pipelines and handled messaging services using Apache Kafka.
- Involved in migrating a Java test framework to Python Flask.
- Shell scripting for Linux/Unix systems administration and related tasks; point of contact for vendor escalation.
- Monitored and analyzed MapReduce jobs, watching for potential issues and addressing them.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Moved data from Oracle, Teradata, and MySQL into HDFS using Sqoop and imported various formats of flat files into HDFS (a representative Sqoop sketch appears at the end of this section).
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Created Splunk app for Enterprise Security to identify and address emerging security threats using continuous monitoring, alerting and analytics.
- Used the Phoenix Query Server, which creates a new ZooKeeper connection for each client session.
- Used Phoenix support for updatable views to extend the primary key of a base table.
- Commissioned and decommissioned Hadoop cluster nodes, including load balancing of HDFS block data.
- Worked in an Agile/Scrum environment and used Jenkins and GitHub for continuous integration and deployment.
- Good knowledge of implementing NameNode Federation and High Availability of the NameNode and Hadoop cluster using ZooKeeper and the Quorum Journal Manager.
- Good knowledge in adding security to the cluster using Kerberos and Sentry.
- Monitored multiple Hadoop clusters environments using Ganglia and Nagios. Monitored workload, job performance and capacity planning using Cloudera Manager.
- Exported the patterns analyzed back to Teradata using Sqoop.
- Hands-On experience in setting up ACL (Access Control Lists) to secure access to the HDFS file system.
- Analyzed escalated incidents within the Azure SQL database.
- Captured data logs from web servers into HDFS using Flume and Splunk for analysis.
- Fine-tuned the Hadoop cluster by setting the proper number of map and reduce slots for the TaskTrackers.
- Experience in tuning the heap size to avoid any disk spills and to avoid OOM issues.
- Familiar with job scheduling using Fair Scheduler so that CPU time is well distributed amongst all the jobs.
- Experience managing users and permissions on the cluster, using different authentication methods.
- Involved in regular Hadoop Cluster maintenance such as updating system packages.
- Experience in managing and analyzing Hadoop log files to troubleshoot issues.
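The RDBMS-to-HDFS data movement described above is typically driven by Sqoop commands along the following lines; the connection strings, credentials, tables, and paths are placeholders, and the Teradata export assumes the appropriate JDBC connector is installed.

```bash
# Minimal sketch: moving data between an RDBMS and Hadoop with Sqoop.
# Hosts, credentials, tables, and paths are placeholders.

# Import a MySQL table into HDFS as a Hive table
sqoop import \
  --connect jdbc:mysql://dbhost.example.com:3306/sales \
  --username etl_user -P \
  --table orders \
  --hive-import --hive-table analytics.orders \
  --num-mappers 4

# Export analyzed results from HDFS back to Teradata
sqoop export \
  --connect jdbc:teradata://tdhost.example.com/Database=analytics \
  --username etl_user -P \
  --table ORDER_PATTERNS \
  --export-dir /user/hive/warehouse/analytics.db/order_patterns
```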
Environment: Hadoop, YARN, Hive, HBase, Flume, Cloudera Manager, Apache Phoenix, Kafka, ZooKeeper, Oozie, Sqoop, MapReduce, Ambari, HDFS, Teradata, Splunk, Elasticsearch, Jenkins, GitHub, Kerberos, MySQL, Apache NiFi, NoSQL, MongoDB, Java, Shell Script, Python.
Hadoop Administrator
Confidential, Greenville, SC
Responsibilities:
- Responsible for architecting Hadoop cluster.
- Involved in source system analysis, data analysis, and data modeling through to ETL (Extract, Transform and Load) and HiveQL.
- Strong experience in installation and configuration of Hadoop ecosystem components like YARN, HBase, Flume, Hive, Pig, and Sqoop.
- Expertise in Hadoop cluster tasks such as adding and removing nodes without affecting running jobs or data.
- Managed and reviewed Hadoop log files.
- Loaded log data into HDFS using Flume. Worked extensively on creating MapReduce jobs to power data for search and aggregation.
- Worked extensively with Sqoop for importing data.
- Designed a data warehouse using Hive.
- Created partitioned tables in Hive (see the sketch at the end of this section).
- Mentored the analyst and test teams in writing Hive queries.
- Extensively used Pig for data cleansing.
- Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Troubleshoot network connectivity issues within the monitoring environment.
- Worked with various support teams to optimize their monitoring offering packages.
- Collaborated with the Windows/UNIX administration team, assisting them in configuring and troubleshooting OS and hardware monitoring-related issues.
- Worked on Oozie workflows for daily incremental loads, which pull data from Teradata and import it into Hive tables.
- Built and published customized interactive reports and dashboards using Tableau Server.
- Developed Pig scripts to transform the data into a structured format, automated through Oozie coordinators.
- Worked on pulling data from relational databases into Hive on the Hadoop cluster using Sqoop import for visualization and analysis.
- Used Flume to collect, aggregate, and store web log data from different sources such as web servers, mobile, and network devices, and pushed it to HDFS.
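A minimal sketch of the partitioned Hive tables mentioned above, driven from the shell; the database, table, column, and partition names are illustrative only.

```bash
# Minimal sketch: create a partitioned, ORC-backed Hive table and add a
# daily partition. All names are illustrative placeholders.
hive -e "
CREATE DATABASE IF NOT EXISTS weblogs_dw;

CREATE TABLE IF NOT EXISTS weblogs_dw.access_logs (
  host    STRING,
  request STRING,
  status  INT
)
PARTITIONED BY (log_date STRING)
STORED AS ORC;

ALTER TABLE weblogs_dw.access_logs
  ADD IF NOT EXISTS PARTITION (log_date='2016-01-01');
"
```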
Environment: Hadoop, HDFS, MapReduce, Hive, Hortonworks, Pig, Kafka, Oozie, ZooKeeper, Sqoop, Nagios, MySQL, NoSQL, MongoDB, Java.
Hadoop Administrator
Confidential, Buffalo, NY
Responsibilities:
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Installed and configured Hadoop and Ecosystem components in Hortonworks environments. Configured Hadoop, Hive and Pig on Amazon EC2 servers.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action. Documented system processes and procedures for future reference.
- Installed and configured Hive, Pig, Sqoop and Oozie on the HDP 2.2 cluster and Implemented Sentry for the Dev Cluster.
- Worked directly with the networking and build teams on configuring and troubleshooting monitoring routes and firewalls in an enterprise network environment.
- Provided on-call support addressing monitoring-related maintenance and outages.
- Configured MySQL Database to store Hive metadata.
- Involved in managing and reviewing Hadoop log files.
- Involved in running Hadoop streaming jobs to process terabytes of text data (a representative sketch appears at the end of this section).
- Worked with Linux systems and MySQL database on a regular basis.
- Supported MapReduce programs that ran on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Monitored cluster job performance and was involved in capacity planning.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
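A minimal sketch of the kind of Hadoop streaming job referenced above, using shell utilities as mapper and reducer; the streaming jar path and HDFS directories are placeholders and vary by distribution.

```bash
# Minimal sketch: a word-count style Hadoop streaming job over text data.
# The jar path and HDFS directories are placeholders.

# Stage raw text from the local file system into HDFS
hdfs dfs -mkdir -p /data/raw
hdfs dfs -put /var/log/app/*.log /data/raw/

# Run a streaming job; the shuffle sorts the mapper output, so `uniq -c`
# in the reducer yields per-word counts
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar \
  -input /data/raw \
  -output /data/wordcount \
  -mapper "tr -s ' ' '\n'" \
  -reducer "uniq -c"
```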
Environment: HDFS, Hive, Pig, sentry, Kerberos, LDAP, YARN, Ambari, Python.
Hadoop Administrator
Confidential
Responsibilities:
- Involved in Requirement gathering, Business Analysis and translated business requirements into Technical design in Hadoop and Big Data.
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
- Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using the HBase-Hive integration.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Involved in creating workflows to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
- Involved in developing shell scripts and automating end-to-end data management and integration work.
- Used Pig as an ETL tool for transformations, joins, and some pre-aggregations before storing data in HDFS.
- Built reusable Hive UDF libraries for business requirements, enabling users to use these UDFs in Hive queries.
- Configured and optimized the Cassandra cluster and developed a real-time Java-based application to work with the Cassandra database.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop (see the Oozie CLI sketch at the end of this section).
- Experienced in using ZooKeeper and Oozie operational services for coordinating the cluster and scheduling workflows.
- Used HBase to store the majority of data, which needed to be divided based on region.
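A minimal sketch of driving interdependent Hadoop jobs through the Oozie CLI, as referenced above; the Oozie URL, job id, and property file contents are placeholders.

```bash
# Minimal sketch: managing an Oozie workflow from the command line.
# The Oozie URL and job id are placeholders.

export OOZIE_URL=http://oozie-host.example.com:11000/oozie

# Submit and start a workflow whose definition lives in HDFS
# (oozie.wf.application.path is set inside job.properties)
oozie job -config job.properties -run

# Check the status of a running workflow
oozie job -info 0000012-160501123456789-oozie-oozi-W

# Re-run only the failed actions of that workflow
oozie job -rerun 0000012-160501123456789-oozie-oozi-W -Doozie.wf.rerun.failnodes=true
```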
Environment: HDFS, HBase, MapReduce, Storm, Zookeeper, Hive, Pig, SQOOP, Cassandra, Spark, Scala, OOZIE, Hue, ETL, Java, JDK, J2EE, Struts.
Java Developer
Confidential
Responsibilities:
- As a programmer, involved in designing and implementation of MVC pattern.
- Extensively used XML where in process details are stored in the database and used the stored XML whenever needed.
- Part of core team to develop process engine.
- Developed Action classes and validation using the Struts framework.
- Created project related documentations like user guides based on role.
- Implemented modules like Client Management, Vendor Management.
- Implemented Access Control Mechanism to provide various access levels to the user.
- Designed and developed the application using J2EE, JSP, Struts, Hibernate, Spring technologies.
- Coded DAO and Hibernate implementation classes for data access.
- Coded Spring service classes and transfer objects to pass data between layers.
- Implemented Web Services using Axis.
- Used different features of Struts like MVC, Validation framework and tag library.
- Created detail design document, Use cases, and Class Diagrams using UML.
- Written ANT scripts to build JAR, WAR and EAR files.
- Developed a standalone Java component that interacts with Crystal Reports on Crystal Enterprise Server to view and schedule reports, storing data as XML and sending data to consumers using SOAP.
- Deployed and tested the application on WebSphere Application Server.
- Coordinated with the onsite, offshore and QA team to facilitate the quality delivery from offshore on schedule.
Environment: Java, J2EE, Spring, Spring Web Service, JSP, JavaScript, Hibernate, SOAP, CSS, Struts, WebSphere, MQ Series, JUnit, Apache, Windows XP and Linux.