Hadoop Administrator/big Data Engineer Resume

Englewood, CO

SUMMARY

  • 10+ years of IT operations experience, including 7+ years in Hadoop administration, development, and architecture and 2+ years on Linux-based systems.
  • 4+ years of experience with Elasticsearch, including cluster maintenance, rolling upgrades, and building Kibana dashboards for data visualization.
  • Excellent understanding of Distributed Systems and Parallel Processing Architecture.
  • Worked on components such as HDFS, MapReduce, JobTracker, TaskTracker, Sqoop, Zookeeper, YARN, Oozie, Hive, Hue, Flume, HBase, Fair Scheduler, Spark, Storm, SmartSense, and Kafka.
  • Experience in managing Cloudera, Hortonworks, MapR Distributions.
  • Expert in securing Kafka clusters with Kerberos, Ranger, and SSL (see the broker configuration sketch after this list).
  • Implemented real-time data processing using Kafka and Storm topologies into Cosmos DB.
  • Strong knowledge of Hadoop HDFS architecture and the MapReduce framework.
  • Secured Hadoop clusters against unauthorized access with Kerberos, LDAP integration, and SSL for data transfer among cluster nodes.
  • Experience in Hadoop with Kerberos, TLS, and HDFS encryption.
  • Expert level in architecting, building, and maintaining enterprise-grade Hadoop clusters.
  • Successfully implemented a blockchain project with IBM.
  • Experience in setting up Spark standalone clusters on-premises and on cloud platforms - DataProc, EMR.
  • Worked on Spark transformation processes, RDD operations, DataFrames, and validating the Spark plug-in for the Avro data format.
  • Managed Spark cluster environments, on bare-metal and container infrastructure, including service allocation and configuration for the cluster, capacity planning, performance tuning, and ongoing monitoring.
  • Set up Apache Zeppelin and added Hive and DB2 interpreters.
  • Successfully Deployed Elasticsearch 5.3.0 and Kibana with Ansible in Production Cluster.
  • Upgraded Elasticsearch clusters from 5.3 to 6.5 and performed rolling upgrades from 6.4.3 to 6.8.11 and 7.9.
  • Expert in enabling Elasticsearch X-Pack security.
  • Upgraded Docker from version 1.8 to 18.09.2.
  • Involved in vendor selection and capacity planning for the Hadoop cluster.
  • Experience in Administering the Linux systems to deploy Hadoop cluster and monitoring using Nagios and Ganglia.
  • Created Ansible playbooks for automated multi-node cluster installation.
  • Experience in performing backup, recovery, failover, and DR practices on multiple platforms.
  • Experience with automation for provisioning system resources using puppet.
  • Strong knowledge in configuring Name Node High Availability and Name Node Federation.
  • Experienced in writing the automation scripts for monitoring the file systems.
  • Implemented Hadoop-based solutions to store archives and backups from multiple sources.
  • Experienced in importing and exporting data with Sqoop between HDFS and RDBMS sources (MySQL, Oracle, Teradata) using fast loaders and connectors (see the import sketch after this list).
  • Worked with architecture team in building Hadoop hardware and software design.
  • Configured and implemented Amazon AWS services.
  • Familiar with implementations on the Azure-Omni platform.
  • Working knowledge of Amazon Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.
  • Experience in deploying Hadoop cluster on Public and Private Cloud Environment like Amazon AWS, OpenStack, HDInsight Azure.
  • Hands-on experience with various Hadoop distributions (Cloudera CDH 4/CDH 5, Hortonworks, MapR, IBM BigInsights, Apache, and Amazon EMR).
  • Built ingestion framework using flume for streaming logs and aggregating the data into HDFS.
  • Worked with application team to provide operational support, install Hadoop updates, patches, and version upgrades.
  • Experience in installing, upgrading, configuring, monitoring, supporting, and managing Hadoop clusters using Cloudera CDH4/CDH5 and Hortonworks HDP 2.1, 2.2, 2.3, 2.4, 2.6, and 3.1 on Ubuntu, Red Hat, and CentOS systems.
  • Experience with NIFI specifically with developing custom processors and workflows.
  • Experience in Installing and monitoring standalone multi-node Clusters of Kafka.
  • Experience in Performance tuning of Apache Kafka.
  • Successfully loaded files to Hive and HDFS from MongoDB, Cassandra, and HBase.
  • Exposure in installing Hadoop and its ecosystem components such as Hive and Pig.
  • Experience in systems & network design, physical system consolidation through server and storage virtualization, remote access solutions.
  • Experience in understanding and managing Hadoop Log Files.
  • Worked on Vblock servers from VCE (a Cisco-backed provider) and configured database servers on them.
  • Experience in Data Modeling (Logical and Physical Design of Databases), Normalization and building Referential Integrity Constraints.
  • Worked with highly transactional merchandise and investment SQL databases with PCI compliance, involving data encryption with certificates and security keys at various levels.
  • Experience in upgrading SQL server software, service packs and patches.
  • Actively involved in System Performance by tuning SQL queries and stored procedures by using SQL Profiler, Database Engine Tuning Advisor.
  • Experience providing 24x7 production support, including weekends, on a rotation basis.
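
A minimal sketch of the broker-side settings behind the Kafka security work noted above (the bullet on securing Kafka with Kerberos, Ranger, and SSL): a SASL_SSL listener with Kerberos (GSSAPI) plus the Ranger Kafka plugin for authorization. All hostnames, paths, and passwords here are placeholders, not values from the actual clusters.

    # Append Kerberos (GSSAPI) + TLS listener settings and the Ranger authorizer to a
    # broker's server.properties (placeholder hostnames, paths, and passwords).
    printf '%s\n' \
      'listeners=SASL_SSL://0.0.0.0:9093' \
      'advertised.listeners=SASL_SSL://broker1.example.com:9093' \
      'security.inter.broker.protocol=SASL_SSL' \
      'sasl.mechanism.inter.broker.protocol=GSSAPI' \
      'sasl.enabled.mechanisms=GSSAPI' \
      'sasl.kerberos.service.name=kafka' \
      'ssl.keystore.location=/etc/security/ssl/kafka.server.keystore.jks' \
      'ssl.keystore.password=changeit' \
      'ssl.truststore.location=/etc/security/ssl/kafka.server.truststore.jks' \
      'ssl.truststore.password=changeit' \
      'authorizer.class.name=org.apache.ranger.authorization.kafka.authorizer.RangerKafkaAuthorizer' \
      >> /etc/kafka/conf/server.properties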
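
A minimal Sqoop import sketch of the kind referenced in the RDBMS import/export bullet above; the connection string, credentials, table, and target directory are placeholders.

    # Import a MySQL table into HDFS (placeholder connection string, user, table, and target dir)
    sqoop import \
      --connect jdbc:mysql://dbhost.example.com:3306/sales \
      --username etl_user -P \
      --table orders \
      --target-dir /data/raw/orders \
      --num-mappers 4 \
      --fields-terminated-by '\t'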

TECHNICAL SKILLS

Hadoop Ecosystem: Hadoop, MapReduce, HDFS, HBase, Zookeeper, Hive, Pig, Sqoop, Oozie, Kafka, Storm, YARN, Flume, SmartSense, Impala, Knox, Ranger, Spark, Ganglia, Zeppelin, Superset

Hadoop Distributions: Cloudera, Hortonworks, MapR.

Database: SQL, Oracle, NOSQL, MySQL, MongoDB, Cosmos DB, Cassandra, HBase.

Operating System: Windows 2000/2003/2008/2012, Linux (CentOS, Ubuntu, Red Hat).

Programming Languages: Java, JavaScript, C, C++, SQL, T-SQL, PL/SQL.

Scripting: PowerShell 3.0/2.0, UNIX shell scripting, Python

ETL Tools: DTS, SSIS, Informatica, Sqoop.

Tools: SCOM, NetMon, SMB, SFTP, SQL Sentry.

PROFESSIONAL EXPERIENCE

Confidential - Englewood, CO

Hadoop Administrator/Big Data Engineer

Responsibilities:

  • Set up a new Kafka cluster on HDP 3.1.5, integrating Kerberos, Ranger, and SSL.
  • Implemented advertised listeners for external access in Kafka.
  • Upgraded Elasticsearch from 6.4.3 to 7.9, enabling X-Pack features.
  • Worked on Blockchain Project technology with IBM.
  • Implemented Ranger, Kerberos, SSL on Kafka cluster.
  • Successfully reassigned and rebalanced Kafka partitions under live traffic (see the reassignment sketch after this list).
  • Performed maintenance on the Elasticsearch cluster.
  • Successfully upgraded a secured cluster from HDP 2.6.3 to 3.1.5.
  • Installed CDP 7.1.2 in Datacenter.
  • Migrated Hadoop cluster from HDP 3.1.5 to CDP 7.1.2.
  • Managed Hadoop environments and performed setup, administration, and monitoring tasks.
  • Performed root cause analysis on failed components and implemented corrective measures.
  • Implemented Kafka monitoring by adding Burrow and InfluxDB.
  • Worked on the MongoDB API and DocumentDB API of Azure Cosmos DB, writing into Cosmos DB from Storm.
  • Set up cross-realm trust between two MIT KDCs on HDP 2.6.3 and 3.1.5.
  • Worked on telemetry projects within Confidential.
  • Created custom spouts and bolts in the Storm application to write into Cosmos DB according to business rules.
  • Installed and Setup Zeppelin with hive/jdbc interpreter.
  • Worked on the Storm-MongoDB design to map Storm tuple values to either an update or an insert operation.
  • Implemented real time processing using HDInsight Kafka and Storm topology.
  • Added parallelism to the Storm topology (worker processes, executors, tasks) and implemented acking.
  • Worked on setting up Kafka and Storm clusters in Azure.
  • Created and built multiple Docker Swarm managers and deployed Java applications in Docker containers using Jenkins.
  • Upgraded Docker from 1.8 to 18.09.
  • Provide solutions for data engineers such as NIFI processors and workflows.
  • Used core Java concepts such as multi-threading, serialization, garbage collection, exception handling, and the Collections API to implement various features and enhancements.
  • Resolved idle-timeout issues in MongoDB by tuning the connection pool size.
  • Successfully performed a rolling upgrade from HDP 3.0 to 3.1.5.
  • Used Kafka MirrorMaker to replicate topics to the Kafka cluster on Azure (see the MirrorMaker sketch after this list).
  • Worked on the IBM MQ adapter to pull data into the Kafka cluster.
  • Debugged MongoDB API issues with mongo-java-driver 3.6.1.
  • Worked with JSON and EDIFACT data formats, manipulating them in the Storm application.
  • Implemented Kafka ACLs and successfully tested with anonymous users and with different hostnames.
  • Worked with Apache Storm, creating a topology that runs continuously over a stream of incoming data.
  • Using HDInsight Storm, created a topology that ingests data from HDInsight Kafka and writes it to Cosmos DB.
  • Implemented Kafka Connect to fetch data from MQ into Kafka brokers.
  • Worked on creating Cosmos DB, DocumentDB, and MongoDB instances.
  • Worked on external and internal (managed) tables in Hive.
  • Implemented AppD V 2.4 and deployed through Jenkins.
  • Worked on the Kafka backup index, minimized logs via the Log4j appender, and pointed Ambari server logs to NAS storage.
  • Worked on SNMP Trap Issues in Production Cluster.
  • Experience in writing test cases in JUnit for unit testing of classes and worked on Fortify Scanner for security.
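
A minimal sketch of the partition reassignment and rebalancing referenced above, using the stock Kafka tooling; the topic name, broker IDs, ZooKeeper address, and throttle value are placeholders.

    # Describe which topic(s) to move (placeholder topic name)
    echo '{"version":1,"topics":[{"topic":"telemetry-events"}]}' > topics.json

    # 1. Generate a candidate assignment; save the "Proposed partition reassignment
    #    configuration" JSON from the output as proposed.json
    kafka-reassign-partitions.sh --zookeeper zk1.example.com:2181 \
      --topics-to-move-json-file topics.json --broker-list "1,2,3" --generate

    # 2. Execute it with a replication throttle so live traffic is not starved
    kafka-reassign-partitions.sh --zookeeper zk1.example.com:2181 \
      --reassignment-json-file proposed.json --throttle 50000000 --execute

    # 3. Verify completion (this also removes the throttle once finished)
    kafka-reassign-partitions.sh --zookeeper zk1.example.com:2181 \
      --reassignment-json-file proposed.json --verify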
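
A minimal sketch of the MirrorMaker replication to the Azure cluster referenced above, using the classic kafka-mirror-maker.sh tool; the two .properties files and the topic whitelist are placeholders.

    # onprem-consumer.properties: bootstrap.servers + group.id of the source cluster
    # azure-producer.properties: bootstrap.servers of the Azure cluster plus its SASL/SSL settings
    kafka-mirror-maker.sh \
      --consumer.config onprem-consumer.properties \
      --producer.config azure-producer.properties \
      --whitelist "telemetry.*" \
      --num.streams 4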

Confidential - Englewood, CO

Hadoop Administration- Kafka

Responsibilities:

  • Designed, implemented, and configured topics and partitions in new Kafka clusters across all environments.
  • Successfully secured Kafka cluster with Kerberos.
  • Involved in Storm batch-mode processing over massive data sets, analogous to a Hadoop job that runs as a batch process over a fixed data set.
  • Implemented Kafka security features using SSL, both with and without Kerberos, adding finer-grained security for users and groups to enable advanced security features.
  • Tested the advertised listeners property via ZooKeeper data while securing Kafka brokers.
  • Deployed Spark Cluster and other services in AWS using console.
  • Installed Kerberos-secured Kafka clusters (without encryption) in all environments.
  • Successfully set up a no-authentication Kafka listener in parallel with the Kerberos (SASL) listener.
  • Tested non-authenticated (anonymous) users in parallel with Kerberos users.
  • Integrated LDAP configuration to secure Ambari servers and managed authorization and permissions for users and groups.
  • Installed HDP 2.6 in all environments.
  • Installed Ranger in all environments for Second Level of security in Kafka Broker.
  • Worked on Oozie Job Scheduler.
  • Worked on Spark transformation processes, RDD operations, DataFrames, and validating the Spark plug-in for the Avro data format.
  • Installed Docker to run ELK, InfluxDB, and Kerberos.
  • Created an InfluxDB database for Kafka metrics to monitor consumer lag in Grafana.
  • Created a Bash script with AWK-formatted text to send metrics to InfluxDB (see the lag-export sketch after this list).
  • Enabled InfluxDB and configured it as a data source in the Grafana interface.
  • Deployed Elasticsearch 5.3.0 and InfluxDB 1.2 in Docker containers.
  • Installed Ansible and used automated playbooks to install Elasticsearch nodes in multiple environments.
  • Created a cron job to start the ingestion process: data is read in, converted to Avro, and written to HDFS.
  • Designed Data Flow Ingestion Chart Process.
  • Set up a new Grafana dashboard with real-time consumer lag in all environments, pulling only consumer-lag metrics and sending them to InfluxDB (via a script in crontab).
  • Worked on DDL Oracle schema issues at the time of the Ambari upgrade.
  • Successfully upgraded HDP 2.5 to 2.6 in all environments, along with software patches.
  • Tested all services, including Hadoop, ZooKeeper, Spark, HiveServer, and Hive Metastore.
  • Worked on heap optimization and configurations for hardware optimization.
  • Involved in working with production Ambari Views.
  • Implemented Rack Awareness in Production Environment.
  • Ran Apache Hadoop, CDH, and MapR distros on Elastic MapReduce (EMR) over EC2.
  • Worked on disk-space issues in the production environment by monitoring how fast space filled up, reviewing what was being logged, and creating a long-term fix (minimizing INFO, DEBUG, FATAL, and audit logs).
  • Worked on Nagios Monitoring tool.
  • Installed Kafka Manager to track consumer lag and monitor Kafka metrics such as topics and partitions.
  • Worked with the Hortonworks support team on Grafana consumer-lag issues (at the time, no consumer lag was being generated in Grafana visualizations within HDP).
  • Successfully generated consumer-group lag from Kafka using the API.
  • Installed and configured Ambari Log Search, which under the hood requires a Solr instance that collects and indexes all cluster-generated logs in real time and displays them in one interface.
  • Set up Ansible 2.3.0 for installing Elasticsearch.
  • Worked on Elasticsearch cluster maintenance by adding more partitioned disks, which increases disk-write throughput and lets Elasticsearch write to multiple disks at the same time, while a segment of a given shard is written to the same disk.
  • Worked on Spark cluster security, networking connectivity and IO throughput along with other factors that affect distributed system performance.
  • Upgraded Elasticsearch from 5.3.0 to 5.3.2 following the rolling-upgrade process, using Ansible to deploy the new packages in all clusters (see the rolling-upgrade sketch after this list).
  • Built visualizations in Kibana, deployed Kibana with Ansible, and connected it to the Elasticsearch cluster.
  • Tested Kibana and ELK by creating a test index and injecting sample data.
  • Successfully tested Kafka ACLs with anonymous users and with different hostnames.
  • Worked on HBase to store variable data formats of data coming from different applications.
  • Worked on 24X7 Production Support Issues.
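
A minimal sketch of the Bash/AWK consumer-lag export to InfluxDB referenced above; the group, broker, and InfluxDB addresses are placeholders, and the column positions in the CLI output depend on the Kafka version.

    #!/usr/bin/env bash
    # kafka_lag_to_influx.sh -- scrape consumer-group lag and push it to InfluxDB (placeholder values).
    BROKERS="broker1.example.com:9092"
    GROUP="storm-ingest"
    INFLUX="http://influx.example.com:8086/write?db=kafka_metrics"

    # Columns in the --describe output vary by Kafka version; here $2=topic, $3=partition, $6=lag.
    kafka-consumer-groups.sh --bootstrap-server "$BROKERS" --describe --group "$GROUP" \
      | awk -v grp="$GROUP" '$6 ~ /^[0-9]+$/ {
          printf "consumer_lag,group=%s,topic=%s,partition=%s value=%s\n", grp, $2, $3, $6
        }' \
      | curl -s -XPOST "$INFLUX" --data-binary @-

    # Crontab entry to run it every minute:
    # * * * * * /opt/scripts/kafka_lag_to_influx.sh >/dev/null 2>&1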
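
A minimal sketch of the per-node Elasticsearch rolling-upgrade steps referenced above (5.3.0 to 5.3.2), assuming a yum-based install; in practice the package step was driven by Ansible, and the node address is a placeholder.

    ES=http://es-node1.example.com:9200

    # 1. Disable shard allocation so shards are not shuffled while the node restarts
    curl -s -XPUT "$ES/_cluster/settings" -H 'Content-Type: application/json' \
      -d '{"transient":{"cluster.routing.allocation.enable":"none"}}'

    # 2. Stop the node, upgrade the package, start it again
    systemctl stop elasticsearch
    yum -y update elasticsearch-5.3.2
    systemctl start elasticsearch

    # 3. Re-enable allocation and wait for green before moving to the next node
    curl -s -XPUT "$ES/_cluster/settings" -H 'Content-Type: application/json' \
      -d '{"transient":{"cluster.routing.allocation.enable":"all"}}'
    curl -s "$ES/_cluster/health?wait_for_status=green&timeout=10m"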

Environment: Kafka Brokers, Kafka Security, Kerberos, ACL, Elasticsearch, Kibana, Ambari Log Search, Nagios, Kafka Manager, Grafana, YARN, Spark, Ranger.

Confidential - Richmond, VA

BigData Engineer -Hadoop Administrator

Responsibilities:

  • Responsible for Implementation and Support of the Enterprise Hadoop Environment.
  • Installation, Configuration, Hadoop Cluster and Maintenance, Cluster Monitoring, Troubleshooting and Certifying Environments for production readiness.
  • Experience in Implementing Hadoop Cluster Capacity Planning.
  • Involved in the installation of CDH5 and Up-grades from CDH4 to CDH5.
  • Upgraded Cloudera Manager from version 5.3 to 5.5.
  • Responsible for onboarding new users to the Hadoop cluster (adding a user home directory and providing access to datasets).
  • Helped users in production deployments throughout the process.
  • Installed and configured apache airflow for workflow management and created workflows in python.
  • Managed and reviewed Hadoop Log files as part of administration for troubleshooting purposes, Communicate and Escalate issues appropriately.
  • Responsible for building Scalable Distributed data solutions using Hadoop.
  • Continuous monitoring and managing the Hadoop cluster through Ganglia and Nagios.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs that run independently based on time and data availability.
  • Developed data pipeline using Flume, Sqoop, Pig and Java Map Reduce to ingest customer behavioral data into HDFS for analysis.
  • Integrated Hadoop with Active Directory and Enabled Kerberos for Authentication.
  • Upgraded Cloudera Hadoop Ecosystems in the cluster using Cloudera distribution packages.
  • Experienced in stress and performance testing, benchmark for the cluster.
  • Commissioned and decommissioned DataNodes in the cluster in case of problems (see the decommission sketch after this list).
  • Debugged and solved key issues with Cloudera Manager by interacting with the Cloudera team.
  • Monitored system activity, performance, and resource utilization.
  • Deep understanding of Monitoring and Troubleshooting mission critical Linux machines.
  • Used Kafka for building real-time data pipelines between clusters.
  • Executed Log Aggregations, Website Activity Tracking and Commit log for Distributed Systems using Apache Kafka.
  • Focused on High-availability, Fault tolerance and Auto-Scaling.
  • Managed Critical bundles and patches on Production Servers.
  • Managed Disk File Systems, Server Performance, Users Creation and Granting file access Permissions and RAID Configurations.
  • Integrated Apache Kafka for data ingestion.
  • Configured Domain Name System (DNS) for Hostname to IP resolution.
  • Involved in data migration from Oracle database to MongoDB.
  • Queried and Analyzed data from Cassandra for Quick Searching, Sorting and Grouping.
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
  • Preparation of operational testing scripts for Log Check, Backup and Recovery and Failover.
  • Troubleshooting and fixing issues at the user, system, and network levels.
  • Performed all system administration tasks such as cron jobs, installing packages, and applying patches.
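
A minimal sketch of a graceful DataNode decommission as referenced above; the excludes-file path and hostname are placeholders and must match dfs.hosts.exclude in hdfs-site.xml (on a Cloudera-managed cluster the same steps are typically driven through Cloudera Manager).

    # Add the node to the HDFS excludes file and tell the NameNode to re-read it
    echo "datanode07.example.com" >> /etc/hadoop/conf/dfs.exclude
    hdfs dfsadmin -refreshNodes

    # Watch until the node shows "Decommissioned" before stopping its DataNode process
    hdfs dfsadmin -report | grep -A 3 "datanode07.example.com"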

Environment: YUM, RAID, MySQL 5.1.4, MySQL Workbench, PHP, shell scripting, Linux 5.0/5.1, Flume, Oozie, Pig, Sqoop, MongoDB, MapReduce, YARN, Cloudera Manager, NoSQL, Oracle 9i/10g/11g RAC with Solaris/Red Hat, Oracle Enterprise Manager (OEM), Hive, Zookeeper, HBase, Windows 2000/2003, Unix, Linux, Java, RMAN, Golden Gate, Red Hat, Cassandra, EM Cloud Control, Apache Hadoop, Airflow, Toad

Confidential - Atlanta, GA

BigData Operations Engineer - Consultant

Responsibilities:

  • Cluster Administration, Releases and Upgrades, Managed Multiple Hadoop Clusters with highest capacity of 7 PB (400+ nodes) while working on Hortonworks Distribution.
  • Responsible for Implementation and Ongoing Administration of Hadoop Infrastructure.
  • Used Hadoop cluster as a Staging Environment from Heterogeneous sources in Data Import Process.
  • Configured High Availability on the name node for the Hadoop cluster - part of the Disaster Recovery Roadmap.
  • Configured Ganglia and Nagios to monitor clusters.
  • Involved working on Cloud architecture.
  • Performed both Major and Minor upgrades to the existing clusters and rolling back to the previous version.
  • Implemented Commissioning and Decommissioning of data nodes, killing the unresponsive task tracker and dealing with blacklisted task trackers.
  • Implemented Fair scheduler on the job tracker to allocate the fair amount of resources to small jobs.
  • Maintained, audited and built new clusters for testing purposes using Ambari and Hortonworks.
  • Designed and allocated HDFS quotas for multiple groups (see the quota sketch after this list).
  • Configured Flume for efficiently collecting, aggregating, and moving enormous amounts of log Data from many diverse sources to HDFS.
  • Upgraded HDP 2.2 to 2.3 manually as part of software patches and upgrades.
  • Scripting Hadoop Package Installation and Configuration to support fully automated deployments.
  • Configuring Rack Awareness on HDP.
  • Adding New Nodes to an existing cluster, recovering from a Name Node failure.
  • Instrumental in building scalable distributed data solutions using Hadoop eco-system.
  • Adding New Data Nodes when needed and re-balancing the cluster.
  • Handled import of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and Extracted data from MySQL into HDFS using Sqoop.
  • Involved in database backup and recovery, database connectivity, and security.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Tracked utilization based on running statistics of Map and Reduce tasks.
  • Changed cluster configuration properties based on the volume of data being processed and cluster performance.
  • Provided input to the development team regarding efficient utilization of resources like memory and CPU.
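
A minimal sketch of the HDFS quota allocation referenced above; the directory and limits are placeholders.

    # Cap a group's directory by object count and by raw space (placeholder path and limits);
    # the space quota counts bytes after replication.
    hdfs dfsadmin -setQuota 1000000 /data/groups/analytics
    hdfs dfsadmin -setSpaceQuota 10t /data/groups/analytics
    hdfs dfs -count -q -h /data/groups/analytics   # verify remaining quota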

Environment: MapReduce, SmartSense, Knox, MySQL Plus, HDFS, Ranger, Pig, Hive, HBase, Flume, Sqoop, YARN, Kafka.

Confidential - Atlanta, GA

Hadoop Administrator/ Linux Administrator

Responsibilities:

  • Installation and Configuration of Linux for new build environment.
  • Day-to-day user access, permissions, installation, and maintenance of Linux servers.
  • Created Volume Groups, Logical Volumes and Partitions on Linux servers and mounted File systems.
  • Experienced in Installation and Configuration of Cloudera CDH4 in all environments.
  • Resolved tickets for P1 issues and troubleshot the errors.
  • Migrated an existing on-premises application to AWS. Used AWS services like EC2 and S3 for small data sets processing and storage.
  • Experienced in Maintaining the Hadoop cluster on AWS EMR.
  • Balancing HDFS Manually to decrease network utilization and increase job performance.
  • Responsible for building Scalable distributed data solutions using Hadoop.
  • Performed major and minor upgrades to the Hadoop cluster.
  • Upgraded Cloudera Hadoop ecosystems in cluster using Cloudera distribution packages.
  • Used Sqoop to import and export data between HDFS and RDBMS.
  • Experienced in stress and performance testing, benchmark for the cluster.
  • Commissioned and Decommissioned the Data Nodes in cluster in case of problems.
  • Installed CentOS on multiple servers using the Pre-Execution Environment (PXE) boot and Kickstart method; performed remote installation of Linux using PXE boot.
  • Monitored system activity, performance, and resource utilization.
  • Develop and optimize physical design of MySQL database systems.
  • Deep understanding of Monitoring and Troubleshooting Mission Critical Linux Machines.
  • Responsible for Maintenance of Raid-Groups, LUN Assignments as per Requirement Documents.
  • Extensive use of LVM, creating volume groups and logical volumes (see the LVM sketch after this list).
  • Performed Red Hat Package Manager (RPM) and YUM package Installations.
  • Tested and Performed Enterprise wide installation, Configuration and Support for Hadoop Using MapR Distribution.
  • Set up the cluster and installed all ecosystem components through MapR, and manually via the command line in the lab cluster.
  • Set up Automated Processes to Archive/Clean data on cluster on Name Node and Secondary Name Node.
  • Involved in Estimation and setting-up Hadoop Cluster in Linux.
  • Prepared PIG scripts to validate Time Series Rollup Algorithm.
  • Responsible for Support, Troubleshooting of Map Reduce Jobs, Pig Jobs.
  • Maintained incremental loads on a daily, weekly, and monthly basis.
  • Implemented Oozie workflows for Map Reduce, Hive and Sqoop Actions.
  • Performed Scheduled backup and Necessary Restoration.
  • Build and Maintain Scalable Data Using Hadoop Ecosystem and other open source components like Hive and HBase.
  • Monitor Data Streaming between Web Sources and HDFS.
  • Closely monitored and analyzed MapReduce job executions on the cluster at the task level.
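
A minimal sketch of the LVM work referenced above (creating a volume group and logical volume and mounting it); the device, names, and mount point are placeholders.

    # Carve a data volume out of a new disk and mount it (placeholder device, names, mount point)
    pvcreate /dev/sdb
    vgcreate vg_data /dev/sdb
    lvcreate -n lv_grid01 -l 100%FREE vg_data
    mkfs.ext4 /dev/vg_data/lv_grid01
    mkdir -p /grid/01
    mount /dev/vg_data/lv_grid01 /grid/01
    echo "/dev/vg_data/lv_grid01 /grid/01 ext4 defaults,noatime 0 0" >> /etc/fstab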

Environment: YUM, RAID, MySQL 5.1.4, MySQL Workbench, PHP, shell scripting, Linux 5.0/5.1, AWS EMR, Flume, Oozie, Pig, Sqoop, MongoDB, HBase, Hive, MapReduce, YARN, Cloudera Manager, NoSQL, Oracle 9i/10g/11g RAC with Solaris/Red Hat, Oracle Enterprise Manager (OEM), Windows 2000/2003, Unix, Linux, Java, RMAN, Golden Gate, Red Hat, Cassandra, EM Cloud Control, Apache Hadoop, Toad
