Sr. Hadoop Consultant Resume
Boston, MA
SUMMARY:
- 9+ years of professional experience, including around 7 years as a Hadoop Administrator and 2+ years as a System Administrator.
- Experience in architecting, designing, installing, configuring and managing Apache Hadoop clusters on the MapR, Hortonworks and Cloudera distributions.
- Experience in configuring and maintaining high availability (HA) for HDFS, the YARN (Yet Another Resource Negotiator) ResourceManager, MapReduce, Hive, HBase and Kafka.
- Practical knowledge of the functionality of every Hadoop daemon, the interactions between them, resource utilization and dynamic tuning to keep the cluster available and efficient.
- Experience in managing Hadoop infrastructure tasks such as commissioning and decommissioning nodes, log rotation and rack topology implementation.
- Experience in understanding and managing Hadoop Log Files.
- Configured ZooKeeper to coordinate the servers in clusters and to maintain data consistency.
- Experience with Hadoop's multiple data processing engines, such as interactive SQL, real-time streaming, data science and batch processing, handling data stored on a single YARN-managed platform.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop (command sketch after this list).
- Experience in collecting logs from log collectors into HDFS using Flume.
- Experience in Kafka multi node cluster setup.
- Experience in setting up and managing the batch scheduler Oozie.
- Extending Hive functionalities by writing custom UDFs.
- Experience in integrating AD/LDAP users with Ambari and Ranger.
- Good experience in implementing Kerberos & Ranger in Hadoop Ecosystem.
- Experience in configuring Ranger policies to secure Hadoop services (Hive, HBase, HDFS, etc.).
- Good Understanding of Rack Awareness in the Hadoop cluster.
- Experience in using monitoring tools like Cloudera Manager and Ambari.
- Experienced in adding, installing, configuring and removing services through Ambari.
- Experienced in configuring Ambari alerts for various components and managing those alerts.
- Involved in migration of cluster to AWS.
- Good understanding of Lambda functions.
- Actively worked on enabling SSL for Hadoop services in EMR.
- Analyzed and tuned performance of Spark jobs in EMR, matching the type and size of the input data to specific instance types.
- Good Understanding of data ingestion pipelines.
- Set up disks for MapR, handled disk failures, configured storage pools and worked with the Logical Volume Manager.
- Managed Data with Volumes and worked with Snapshots, Mirror Volumes, Data protection and Scheduling in MapR.
- Experience with UNIX commands and shell scripting.
- Excellent interpersonal, communication, documentation and presentation skills.
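A minimal Sqoop sketch of the HDFS-to-RDBMS transfer pattern summarized above; the JDBC URL, credentials, table names and HDFS paths are illustrative placeholders rather than values from any engagement.

    #!/usr/bin/env bash
    # Import an RDBMS table into HDFS (placeholder connection string; -P prompts for the password).
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username etl_user -P \
      --table orders \
      --target-dir /data/raw/orders \
      --num-mappers 4

    # Export processed results from HDFS back into an RDBMS table.
    sqoop export \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username etl_user -P \
      --table orders_summary \
      --export-dir /data/curated/orders_summary \
      --num-mappers 4

The same pattern extends to incremental loads via Sqoop's --incremental append/lastmodified modes.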
PROFESSIONAL EXPERIENCE:
Sr. Hadoop Consultant
Confidential, Boston, MA
Responsibilities:
- Installed, configured and maintained Apache Hadoop clusters for application development along with Hadoop tools like Hive, Pig, HBase, ZooKeeper and Sqoop.
- Involved in creating a Spark cluster in HDInsight by provisioning Azure compute resources with Spark installed and configured.
- Worked on security components such as Kerberos, Ranger, Sentry and HDFS encryption.
- Created Oozie workflows to automate data ingestion using Sqoop and process incremental log data ingested by Flume using Pig.
- Loaded log data into HDFS using Flume and Kafka and performed ETL integrations.
- Involved in migrating ETL processes from Oracle to Hive to simplify data manipulation.
- Extracted data from Teradata into HDFS using Sqoop.
- Worked with core HDP components such as YARN and HDFS as the foundation of the platform architecture.
- Completed end-to-end design and integration of Apache NiFi.
- Involved in setting up Linux users, creating Kerberos principals and testing HDFS, Hive, Pig and MapReduce access for the new users (onboarding sketch after this list).
- Used Apache Phoenix with Hive integration to join very large tables to other large tables.
- Resolved the PHOENIX-3164 bug affecting the deployed Phoenix Query Server.
- Executed aggregate queries in Phoenix through server-side hooks (coprocessors).
- Architected and designed a 30-node Hadoop innovation cluster with Sqrrl, Spark, Solr, Puppet and HDP 2.2.4.
- Managed a 900+ node HDP 2.2.4 cluster with 4 petabytes of data using Ambari 2.0 on Linux CentOS 6.5.
- Installed and configured the Hortonworks Data Platform (HDP 2.3) on Amazon EC2 instances.
- Used Python to instantiate a multi-threaded application and run it alongside other applications.
- Used a RESTful web services API to connect to MapR tables; the database connection was likewise established through the RESTful web services API.
- Responsible for all aspects of clusters totaling 100 nodes, ranging from POC to production.
- Monitored various components using monitoring/logging tools (e.g., Icinga, Splunk, Graphite, Prometheus).
- Automated and monitored system configuration using Puppet and Zenoss.
- Created Splunk app for Enterprise Security to identify and address emerging security threats using continuous monitoring, alerting and analytics.
- Expertise with NoSQL databases like HBase, Cassandra, DynamoDB (AWS) and MongoDB.
- Involved in helping the UNIX and Splunk administrators deploy Splunk across the UNIX and Windows environments.
- Created interactive dashboards using parameters, actions and calculated fields in Tableau Desktop.
- Led big data Hadoop/YARN operations and managed an offshore team.
- Provided support for Kerberos-related issues and coordinated Hadoop installations, upgrades and patch installations in the environment.
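A hedged sketch of the new-user onboarding flow referenced in the bullet on Linux users and Kerberos principals; the realm, principal, keytab path and HDFS paths are hypothetical.

    #!/usr/bin/env bash
    # Create the OS account and a matching Kerberos principal (hypothetical realm EXAMPLE.COM).
    useradd -m analyst1
    kadmin -p admin/admin@EXAMPLE.COM -q "addprinc -randkey analyst1@EXAMPLE.COM"
    kadmin -p admin/admin@EXAMPLE.COM -q "xst -k /home/analyst1/analyst1.keytab analyst1@EXAMPLE.COM"
    chown analyst1: /home/analyst1/analyst1.keytab && chmod 600 /home/analyst1/analyst1.keytab

    # Provision an HDFS home directory as the hdfs superuser.
    sudo -u hdfs hdfs dfs -mkdir -p /user/analyst1
    sudo -u hdfs hdfs dfs -chown analyst1:hadoop /user/analyst1

    # Smoke-test HDFS access with the new principal.
    su - analyst1 -c "kinit -kt /home/analyst1/analyst1.keytab analyst1@EXAMPLE.COM && hdfs dfs -ls /user/analyst1"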
Environment: Hive, Pig, HBase, ZooKeeper, Sqoop, ETL, Azure, Hortonworks, Apache Phoenix, Ambari 2.0, Apache NiFi, Linux CentOS, Splunk, Liaison, MongoDB, Elasticsearch, Teradata, Puppet, Kerberos, Kafka, Cassandra, Python, Agile/Scrum.
Hadoop Administrator
Confidential, Irving, TX
Responsibilities:
- Managed mission-critical Hadoop and Kafka clusters at production scale, primarily on the Cloudera distribution.
- Worked with Kafka on a proof of concept for log processing on a distributed system.
- Involved in capacity planning, with reference to the growing data size and the existing cluster size.
- Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, HBase, NoSQL databases, Flume, Oozie and Sqoop.
- Experience in designing, implementing and maintaining high-performing big data Hadoop clusters and integrating them with existing infrastructure.
- Deployed and tested the application on WebSphere Application Server.
- Configured SSL for Ambari, Ranger, Hive and Knox.
- Experience in methodologies such as Agile, Scrum, and Test driven development.
- Created principals for new users in Kerberos, and implemented and maintained the Kerberos cluster, integrating it with Active Directory (AD).
- Worked with a data pipeline using Kafka and Storm to store data in HDFS.
- Created event-processing data pipelines and handled messaging services using Apache Kafka.
- Involved in migrating a Java test framework to Python Flask.
- Performed shell scripting for Linux/UNIX systems administration and related tasks; served as point of contact for vendor escalation.
- Monitored and analyzed MapReduce jobs, watching for potential issues and addressing them.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Moved data from Oracle, Teradata and MySQL into HDFS using Sqoop and imported various formats of flat files into HDFS.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Used the Phoenix Query Server, which creates a new ZooKeeper connection for each client session.
- Worked with Phoenix support for updatable views that extend the primary key of the base table.
- Commissioned and decommissioned Hadoop cluster nodes, including rebalancing HDFS block data (command sketch after this list).
- Worked in an Agile/Scrum environment and used Jenkins and GitHub for continuous integration and deployment.
- Good knowledge of implementing NameNode federation and NameNode/cluster high availability using ZooKeeper and the Quorum Journal Manager.
- Good knowledge in adding security to the cluster using Kerberos and Sentry.
- Monitored multiple Hadoop clusters environments using Ganglia and Nagios. Monitored workload, job performance and capacity planning using Cloudera Manager.
- Exported the patterns analyzed back to Teradata using Sqoop.
- Hands-on experience setting up ACLs (Access Control Lists) to secure access to the HDFS file system (sketch after this list).
- Analyzed escalated incidents within Azure SQL Database.
- Captured data logs from web servers into HDFS using Flume and Splunk for analysis.
- Fine-tuned the Hadoop cluster by setting the proper number of map and reduce slots for the TaskTrackers.
- Experience in tuning the heap size to avoid any disk spills and to avoid OOM issues.
- Familiar with job scheduling using Fair Scheduler so that CPU time is well distributed amongst all the jobs.
- Experience managing users and permissions on the cluster, using different authentication methods.
- Involved in regular Hadoop Cluster maintenance such as updating system packages.
- Experience in managing and analyzing Hadoop log files to troubleshoot issues.
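A minimal sketch of the node decommissioning and rebalancing flow described above, assuming the exclude file referenced by dfs.hosts.exclude lives at /etc/hadoop/conf/dfs.exclude; the hostname is a placeholder.

    #!/usr/bin/env bash
    # Add the node being retired to the HDFS exclude file.
    echo "datanode07.example.com" >> /etc/hadoop/conf/dfs.exclude

    # Ask the NameNode to re-read its host lists and begin decommissioning.
    hdfs dfsadmin -refreshNodes

    # Check decommissioning status, then rebalance the remaining DataNodes
    # (threshold = allowed % difference in disk usage between nodes).
    hdfs dfsadmin -report | grep -i "decommission"
    hdfs balancer -threshold 10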
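An illustrative HDFS ACL sketch for the access-control bullet above; the dataset path and group name are assumptions, and ACLs must be enabled via dfs.namenode.acls.enabled.

    #!/usr/bin/env bash
    # Grant the analytics group read/execute on a curated dataset (hypothetical path).
    hdfs dfs -setfacl -m group:analytics:r-x /data/curated/orders

    # Add a default ACL so newly created files and directories inherit the same access.
    hdfs dfs -setfacl -m default:group:analytics:r-x /data/curated/orders

    # Verify the resulting ACL entries.
    hdfs dfs -getfacl /data/curated/orders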
Environment: Hadoop, YARN, Hive, HBase, Flume, Cloudera Manager, Apache Phoenix, Kafka, Zookeeper, Oozie and Sqoop, MapReduce, Ambari, HDFS, Teradata, Splunk, Elasticsearch, Jenkins, GitHub, Kerberos, MySQL, Apache NiFi, NoSQL, MongoDB, Java, Shell Script, Python.
Hadoop Administrator
Confidential, Greenville, SC
Responsibilities:
- Responsible for architecting Hadoop cluster.
- Involved in source system analysis, data analysis and data modeling for ETL (Extract, Transform and Load) and HiveQL.
- Strong experience in installing and configuring Hadoop ecosystem components such as YARN, HBase, Flume, Hive, Pig and Sqoop.
- Expertise in Hadoop cluster tasks such as adding and removing nodes without affecting running jobs or data.
- Managed and reviewed Hadoop log files.
- Loaded log data into HDFS using Flume; worked extensively on creating MapReduce jobs to power data for search and aggregation.
- Worked extensively with Sqoop for importing data.
- Designed a data warehouse using Hive.
- Created partitioned tables in Hive (HiveQL sketch after this list).
- Mentored the analyst and test teams on writing Hive queries.
- Extensively used Pig for data cleansing.
- Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Troubleshoot network connectivity issues within the monitoring environment.
- Worked with various support teams to optimize their monitoring packages.
- Collaborated with the Windows/UNIX administration team, assisting them in configuring and troubleshooting OS and hardware monitoring issues.
- Worked on Oozie workflows for daily incremental loads, which pull data from Teradata and import it into Hive tables.
- Built and published customized interactive reports and dashboards using Tableau Server.
- Developed Pig scripts to transform the data into a structured format, automated through Oozie coordinators.
- Worked on pulling data from relational databases into Hive tables on the Hadoop cluster using Sqoop import, for visualization and analysis.
- Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
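A short HiveQL sketch of the partitioned-table work noted above, run through the Hive CLI; the table, columns and HDFS path are illustrative placeholders.

    #!/usr/bin/env bash
    # Create a date-partitioned table and load one day of raw log data (placeholder names/paths).
    hive -e "
      CREATE TABLE IF NOT EXISTS web_logs (
        host   STRING,
        url    STRING,
        status INT
      )
      PARTITIONED BY (log_date STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

      LOAD DATA INPATH '/data/raw/web_logs/2015-06-01'
      INTO TABLE web_logs PARTITION (log_date='2015-06-01');
    "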
Environment: Hadoop, HDFS, MapReduce, Hive, Hortonworks, Pig, Kafka, Oozie, Zookeeper, Sqoop, Nagios, MySQL, NoSQL, MongoDB, Java.
Hadoop Administrator
Confidential, Buffalo, NY
Responsibilities:
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Installed and configured Hadoop and Ecosystem components in Hortonworks environments. Configured Hadoop, Hive and Pig on Amazon EC2 servers.
- Involved in analyzing system failures, identifying root causes and recommending courses of action; documented system processes and procedures for future reference.
- Installed and configured Hive, Pig, Sqoop and Oozie on the HDP 2.2 cluster and Implemented Sentry for the Dev Cluster.
- Worked directly with the networking and build teams on configuring and troubleshooting monitoring routes and firewalls in an enterprise network environment.
- Provided on-call support addressing monitoring-related maintenance and outages.
- Configured MySQL Database to store Hive metadata.
- Involved in managing and reviewing Hadoop log files.
- Involved in running Hadoop streaming jobs to process terabytes of text data (command sketch after this list).
- Worked with Linux systems and MySQL database on a regular basis.
- Supported MapReduce programs that ran on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Monitored cluster job performance and was involved in capacity planning.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
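A minimal Hadoop Streaming sketch for the text-processing bullet above, patterned on the stock streaming example; the jar location and HDFS paths are assumptions for an HDP layout.

    #!/usr/bin/env bash
    # Stream raw text through shell utilities: cat as the mapper, wc as the reducer
    # (each reducer emits line/word/byte counts for its shuffled input). Paths are placeholders.
    hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar \
      -D mapreduce.job.reduces=2 \
      -input  /data/raw/text \
      -output /data/out/text_counts \
      -mapper  /bin/cat \
      -reducer /usr/bin/wc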
Environment: HDFS, Hive, Pig, sentry, Kerberos, LDAP, YARN, Ambari, Python.
Hadoop Administrator
Confidential
Responsibilities:
- Involved in Requirement gathering, Business Analysis and translated business requirements into Technical design in Hadoop and Big Data.
- Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data.
- Wrote MapReduce code to process and parse data from various sources, storing the parsed data in HBase and Hive using HBase-Hive integration.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Involved in creating workflows to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
- Involved in developing shell scripts and automating data management for end-to-end integration work.
- Used Pig as an ETL tool for transformations, event joins and some pre-aggregations before storing data in HDFS.
- Built reusable Hive UDF libraries for business requirements, enabling users to apply these UDFs in Hive queries (registration sketch after this list).
- Configured and optimized the Cassandra cluster and developed real-time java based application to work along with the Cassandra database.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig and Sqoop.
- Experienced in using ZooKeeper and Oozie operational services for coordinating the cluster and scheduling workflows.
- Used HBase to store the majority of the data, which needs to be divided based on region.
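A hedged sketch of how a custom Hive UDF library like the one described above would be registered and invoked; the jar path, class name, function name and table are hypothetical.

    #!/usr/bin/env bash
    # Register a packaged UDF jar in the session and expose it as a SQL function (placeholder names).
    hive -e "
      ADD JAR /tmp/custom-udfs-1.0.jar;
      CREATE TEMPORARY FUNCTION normalize_phone AS 'com.example.udf.NormalizePhone';
      SELECT normalize_phone(contact_number) FROM customers LIMIT 10;
    "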
Environment: HDFS, HBase, MapReduce, Storm, Zookeeper, Hive, Pig, SQOOP, Cassandra, Spark, Scala, OOZIE, Hue, ETL, Java, JDK, J2EE, Struts
System Admin
Confidential
Responsibilities:
- Installed and maintained all server hardware and software systems, administered server performance and ensured their availability.
- Monitored systems daily, evaluated the availability of all server resources and performed routine activities for the Linux servers.
- Maintained and monitored all system frameworks, provided on-call support for all systems and maintained up-to-date Linux knowledge.
- Performed tests on all new software, maintained patches for management services and audited all security processes.
- Gathered test data requirements for data conditioning from Business Units to test total application functionality.
- Developed automation shell scripts to check log file sizes and report on the application (sketch after this list).
- Responsible for writing cron jobs to start processes at regular intervals.
- Involved in database testing, using SQL to pull data from the database and check whether it matches the GUI.
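A minimal version of the log-size check and cron scheduling described above; the log directory, size threshold, script path and mail recipient are assumptions.

    #!/usr/bin/env bash
    # check_log_size.sh - report application log files that exceed a size threshold (placeholder values).
    LOG_DIR=/var/log/app
    THRESHOLD_MB=500

    report=$(find "$LOG_DIR" -name '*.log' -size +"${THRESHOLD_MB}M" -exec du -m {} +)
    if [ -n "$report" ]; then
      echo "$report" | mail -s "Oversized application logs on $(hostname)" ops-team@example.com
    fi

    # Example crontab entry to run the check hourly:
    # 0 * * * * /opt/scripts/check_log_size.sh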
Environment: RedHat Linux 6.3, MySQL, VMware, Shell, Perl.