Hadoop Administrator/big Data Engineer Resume

Englewood, CO

SUMMARY

  • 10+ years of IT operations experience, including 7+ years in Hadoop administration, development, and architecture and 2+ years on Linux-based systems.
  • 4+ years of experience with Elasticsearch, including cluster maintenance, rolling upgrades, and building Kibana dashboards for data visualization.
  • Excellent understanding of Distributed Systems and Parallel Processing Architecture.
  • Worked on components such as HDFS, MapReduce, JobTracker, TaskTracker, Sqoop, Zookeeper, YARN, Oozie, Hive, Hue, Flume, HBase, Fair Scheduler, Spark, Storm, SmartSense, and Kafka.
  • Experience in managing Cloudera, Hortonworks, MapR Distributions.
  • Expert in securing Kafka clusters with Kerberos, Ranger, and SSL (see the broker configuration sketch after this list).
  • Implemented real-time data processing using Kafka and Storm topologies into Cosmos DB.
  • Strong knowledge of Hadoop HDFS architecture and the MapReduce framework.
  • Secured Hadoop clusters against unauthorized access with Kerberos, LDAP integration, and SSL for data transfer among cluster nodes.
  • Experience in Hadoop with Kerberos, TLS, and HDFS encryption.
  • Expert level in architecting, building, and maintaining enterprise-grade Hadoop clusters.
  • Successfully implemented a blockchain project with IBM.
  • Experience in setting up Spark standalone clusters on-premises and on cloud platforms - DataProc, EMR.
  • Worked on Spark transformation processes, RDD operations, DataFrames, and validating the Spark plug-in for the Avro data format.
  • Managed Spark cluster environments, on bare-metal and container infrastructure, including service allocation and configuration for the cluster, capacity planning, performance tuning, and ongoing monitoring.
  • Set up Apache Zeppelin and added Hive and DB2 interpreters.
  • Successfully Deployed Elasticsearch 5.3.0 and Kibana with Ansible in Production Cluster.
  • Upgraded Elasticsearch clusters from 5.3 to 6.5 and performed rolling upgrades from 6.4.3 to 6.8.11 and 7.9.
  • Expert in enabling Elasticsearch X-Pack security.
  • Upgraded Docker from version 1.8 to 18.09.2.
  • Involved in vendor selection and capacity planning for the Hadoop cluster.
  • Experience in Administering the Linux systems to deploy Hadoop cluster and monitoring using Nagios and Ganglia.
  • Created Ansible playbooks for automated multi-node cluster installation.
  • Experience in performing backup, recovery, failover, and DR practices on multiple platforms.
  • Experience with automation for provisioning system resources using puppet.
  • Strong knowledge in configuring Name Node High Availability and Name Node Federation.
  • Experienced in writing the automation scripts for monitoring the file systems.
  • Implemented Hadoop-based solutions to store archives and backups from multiple sources.
  • Experienced in importing and exporting data with Sqoop between HDFS and RDBMS sources (MySQL, Oracle, Teradata) using fast loaders and connectors (see the import sketch after this list).
  • Worked with architecture team in building Hadoop hardware and software design.
  • Configured and implemented Amazon AWS services.
  • Familiar with implementations on the Azure-Omni platform.
  • Working knowledge of Amazon Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.
  • Experience in deploying Hadoop cluster on Public and Private Cloud Environment like Amazon AWS, OpenStack, HDInsight Azure.
  • Hands-on experience with various Hadoop distributions (Cloudera CDH 4/CDH 5, Hortonworks, MapR, IBM BigInsights, Apache, and Amazon EMR).
  • Built ingestion framework using flume for streaming logs and aggregating the data into HDFS.
  • Worked with application team to provide operational support, install Hadoop updates, patches, and version upgrades.
  • Experience in installing, upgrading, configuring, monitoring, supporting, and managing Hadoop clusters using Cloudera CDH4/CDH5 and Hortonworks HDP 2.1, 2.2, 2.3, 2.4, 2.6, and 3.1 on Ubuntu, Red Hat, and CentOS systems.
  • Experience with NIFI specifically with developing custom processors and workflows.
  • Experience in Installing and monitoring standalone multi-node Clusters of Kafka.
  • Experience in Performance tuning of Apache Kafka.
  • Successfully loaded files to Hive and HDFS from MongoDB, Cassandra, and HBase.
  • Exposure in installing Hadoop and its ecosystem components such as Hive and Pig.
  • Experience in systems & network design, physical system consolidation through server and storage virtualization, remote access solutions.
  • Experience in understanding and managing Hadoop Log Files.
  • Worked on Vblock servers from VCE (a Cisco-backed provider) and configured database servers on them.
  • Experience in Data Modeling (Logical and Physical Design of Databases), Normalization and building Referential Integrity Constraints.
  • Worked with highly transactional merchandise and investment SQL databases with PCI compliance, involving data encryption with certificates and security keys at various levels.
  • Experience in upgrading SQL server software, service packs and patches.
  • Actively involved in System Performance by tuning SQL queries and stored procedures by using SQL Profiler, Database Engine Tuning Advisor.
  • Experience providing 24x7 production support, including weekends, on a rotation basis.
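
A minimal sketch of the broker-side settings behind the Kafka security work noted above (the bullet on securing Kafka with Kerberos, Ranger, and SSL): a SASL_SSL listener with Kerberos (GSSAPI) plus the Ranger Kafka plugin for authorization. All hostnames, paths, and passwords here are placeholders, not values from the actual clusters.

    # Append Kerberos (GSSAPI) + TLS listener settings and the Ranger authorizer to a
    # broker's server.properties (placeholder hostnames, paths, and passwords).
    printf '%s\n' \
      'listeners=SASL_SSL://0.0.0.0:9093' \
      'advertised.listeners=SASL_SSL://broker1.example.com:9093' \
      'security.inter.broker.protocol=SASL_SSL' \
      'sasl.mechanism.inter.broker.protocol=GSSAPI' \
      'sasl.enabled.mechanisms=GSSAPI' \
      'sasl.kerberos.service.name=kafka' \
      'ssl.keystore.location=/etc/security/ssl/kafka.server.keystore.jks' \
      'ssl.keystore.password=changeit' \
      'ssl.truststore.location=/etc/security/ssl/kafka.server.truststore.jks' \
      'ssl.truststore.password=changeit' \
      'authorizer.class.name=org.apache.ranger.authorization.kafka.authorizer.RangerKafkaAuthorizer' \
      >> /etc/kafka/conf/server.properties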
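
A minimal Sqoop import sketch of the kind referenced in the RDBMS import/export bullet above; the connection string, credentials, table, and target directory are placeholders.

    # Import a MySQL table into HDFS (placeholder connection string, user, table, and target dir)
    sqoop import \
      --connect jdbc:mysql://dbhost.example.com:3306/sales \
      --username etl_user -P \
      --table orders \
      --target-dir /data/raw/orders \
      --num-mappers 4 \
      --fields-terminated-by '\t'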

TECHNICAL SKILLS

Hadoop Ecosystem: Hadoop, MapReduce, HDFS, HBase, Zookeeper, Hive, Pig, Sqoop, Oozie, Kafka, Storm, YARN, Flume, SmartSense, Impala, Knox, Ranger, Spark, Ganglia, Zeppelin, Superset

Hadoop Distributions: Cloudera, Hortonworks, MapR.

Database: SQL, Oracle, NOSQL, MySQL, MongoDB, Cosmos DB, Cassandra, HBase.

Operating System: Windows 2000/2003/2008/2012, Linux (CentOS, Ubuntu, Red Hat).

Programming Languages: Java, JavaScript, C, C++, SQL, T-SQL, PL/SQL.

Scripting: PowerShell 3.0/2.0, UNIX shell scripting, Python

ETL Tools: DTS, SSIS, Informatica, Sqoop.

Tools: SCOM, NetMon, SMB, SFTP, SQL Sentry.

PROFESSIONAL EXPERIENCE

Confidential - Englewood, CO

Hadoop Administrator/Big Data Engineer

Responsibilities:

  • Set up a new Kafka cluster on HDP 3.1.5, integrating Kerberos, Ranger, and SSL.
  • Implemented advertised listeners for external access in Kafka.
  • Upgraded Elasticsearch from 6.4.3 to 7.9, enabling X-Pack features.
  • Worked on Blockchain Project technology with IBM.
  • Implemented Ranger, Kerberos, SSL on Kafka cluster.
  • Successfully reassigned and rebalanced Kafka partitions under live traffic (see the reassignment sketch after this list).
  • Performed maintenance on the Elasticsearch cluster.
  • Successfully upgraded a secured cluster from HDP 2.6.3 to 3.1.5.
  • Installed CDP 7.1.2 in Datacenter.
  • Migrated Hadoop cluster from HDP 3.1.5 to CDP 7.1.2.
  • Managed Hadoop environments and performed setup, administration, and monitoring tasks.
  • Performed root cause analysis on failed components and implemented corrective measures.
  • Implemented Kafka monitoring by adding Burrow and InfluxDB.
  • Worked on the MongoDB API and DocumentDB API of Azure Cosmos DB, writing into Cosmos DB from Storm.
  • Set up cross-realm trust between two MIT KDCs on HDP 2.6.3 and 3.1.5.
  • Worked on telemetry projects within Confidential.
  • Created custom spouts and bolts in the Storm application to write into Cosmos DB according to business rules.
  • Installed and Setup Zeppelin with hive/jdbc interpreter.
  • Worked on the Storm-MongoDB design to map Storm tuple values to either an update or an insert operation.
  • Implemented real time processing using HDInsight Kafka and Storm topology.
  • Added parallelism to the Storm topology (worker processes, executors, tasks) and implemented acking.
  • Worked on setting up Kafka and Storm clusters in Azure.
  • Created and built multiple Docker Swarm managers and deployed Java applications in Docker containers using Jenkins.
  • Upgraded Docker from 1.8 to 18.09.
  • Provide solutions for data engineers such as NIFI processors and workflows.
  • Used core Java concepts such as multi-threading, serialization, garbage collection, exception handling, and the Collections API to implement various features and enhancements.
  • Resolved idle-timeout issues in MongoDB by tuning the connection pool size.
  • Successfully performed a rolling upgrade from HDP 3.0 to 3.1.5.
  • Used Kafka MirrorMaker to replicate topics to the Kafka cluster on Azure (see the MirrorMaker sketch after this list).
  • Worked on the IBM MQ adapter to pull data into the Kafka cluster.
  • Debugged MongoDB API issues with mongo-java-driver 3.6.1.
  • Worked with JSON and EDIFACT data formats, manipulating them in the Storm application.
  • Implemented Kafka ACLs and successfully tested with anonymous users and with different hostnames.
  • Worked with Apache Storm, creating a topology that runs continuously over a stream of incoming data.
  • Using HDInsight Storm, created a topology that ingests data from HDInsight Kafka and writes it to Cosmos DB.
  • Implemented Kafka Connect to fetch data from MQ into Kafka brokers.
  • Worked on creating Cosmos DB, DocumentDB, and MongoDB instances.
  • Worked on external and internal (managed) tables in Hive.
  • Implemented AppD V 2.4 and deployed through Jenkins.
  • Worked on the Kafka backup index, minimized logs via the Log4j appender, and pointed Ambari server logs to NAS storage.
  • Worked on SNMP Trap Issues in Production Cluster.
  • Experience in writing test cases in JUnit for unit testing of classes and worked on Fortify Scanner for security.
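
A minimal sketch of the partition reassignment and rebalancing referenced above, using the stock Kafka tooling; the topic name, broker IDs, ZooKeeper address, and throttle value are placeholders.

    # Describe which topic(s) to move (placeholder topic name)
    echo '{"version":1,"topics":[{"topic":"telemetry-events"}]}' > topics.json

    # 1. Generate a candidate assignment; save the "Proposed partition reassignment
    #    configuration" JSON from the output as proposed.json
    kafka-reassign-partitions.sh --zookeeper zk1.example.com:2181 \
      --topics-to-move-json-file topics.json --broker-list "1,2,3" --generate

    # 2. Execute it with a replication throttle so live traffic is not starved
    kafka-reassign-partitions.sh --zookeeper zk1.example.com:2181 \
      --reassignment-json-file proposed.json --throttle 50000000 --execute

    # 3. Verify completion (this also removes the throttle once finished)
    kafka-reassign-partitions.sh --zookeeper zk1.example.com:2181 \
      --reassignment-json-file proposed.json --verify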
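
A minimal sketch of the MirrorMaker replication to the Azure cluster referenced above, using the classic kafka-mirror-maker.sh tool; the two .properties files and the topic whitelist are placeholders.

    # onprem-consumer.properties: bootstrap.servers + group.id of the source cluster
    # azure-producer.properties: bootstrap.servers of the Azure cluster plus its SASL/SSL settings
    kafka-mirror-maker.sh \
      --consumer.config onprem-consumer.properties \
      --producer.config azure-producer.properties \
      --whitelist "telemetry.*" \
      --num.streams 4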

Confidential - Englewood, CO

Hadoop Administration- Kafka

Responsibilities:

  • Designed, implemented, and configured topics and partitions in new Kafka clusters across all environments.
  • Successfully secured Kafka cluster with Kerberos.
  • Involved in Storm batch-mode processing over massive data sets, analogous to a Hadoop job that runs as a batch process over a fixed data set.
  • Implemented Kafka security features using SSL, both with and without Kerberos, adding finer-grained security for users and groups to enable advanced security features.
  • Tested the advertised listeners property via ZooKeeper data while securing Kafka brokers.
  • Deployed Spark Cluster and other services in AWS using console.
  • Installed Kerberos-secured Kafka clusters (without encryption) in all environments.
  • Successfully set up a no-authentication Kafka listener in parallel with the Kerberos (SASL) listener.
  • Tested non-authenticated (anonymous) users in parallel with Kerberos users.
  • Integrated LDAP configuration to secure Ambari servers and managed authorization and permissions for users and groups.
  • Installed HDP 2.6 in all environments.
  • Installed Ranger in all environments for Second Level of security in Kafka Broker.
  • Worked on Oozie Job Scheduler.
  • Worked on Spark transformation processes, RDD operations, DataFrames, and validating the Spark plug-in for the Avro data format.
  • Installed Docker to run ELK, InfluxDB, and Kerberos.
  • Created an InfluxDB database for Kafka metrics to monitor consumer lag in Grafana.
  • Created a Bash script with AWK-formatted text to send metrics to InfluxDB (see the lag-export sketch after this list).
  • Enabled InfluxDB and configured it as a data source in the Grafana interface.
  • Deployed Elasticsearch 5.3.0 and InfluxDB 1.2 in Docker containers.
  • Installed Ansible and used automated playbooks to install Elasticsearch nodes in multiple environments.
  • Created a cron job to start the ingestion process: data is read in, converted to Avro, and written to HDFS.
  • Designed Data Flow Ingestion Chart Process.
  • Set up a new Grafana dashboard with real-time consumer lag in all environments, pulling only consumer-lag metrics and sending them to InfluxDB (via a script in crontab).
  • Worked on DDL Oracle schema issues at the time of the Ambari upgrade.
  • Successfully upgraded HDP 2.5 to 2.6 in all environments, along with software patches.
  • Tested all services, including Hadoop, ZooKeeper, Spark, HiveServer, and Hive Metastore.
  • Worked on heap optimization and configurations for hardware optimization.
  • Involved in working with production Ambari Views.
  • Implemented Rack Awareness in Production Environment.
  • Ran Apache Hadoop, CDH, and MapR distros on Elastic MapReduce (EMR) over EC2.
  • Worked on disk-space issues in the production environment by monitoring how fast space filled up, reviewing what was being logged, and creating a long-term fix (minimizing INFO, DEBUG, FATAL, and audit logs).
  • Worked on Nagios Monitoring tool.
  • Installed Kafka Manager to track consumer lag and monitor Kafka metrics such as topics and partitions.
  • Worked with the Hortonworks support team on Grafana consumer-lag issues (at the time, no consumer lag was being generated in Grafana visualizations within HDP).
  • Successfully generated consumer-group lag from Kafka using the API.
  • Installed and configured Ambari Log Search, which under the hood requires a Solr instance that collects and indexes all cluster-generated logs in real time and displays them in one interface.
  • Set up Ansible 2.3.0 for installing Elasticsearch.
  • Worked on Elasticsearch cluster maintenance by adding more partitioned disks, which increases disk-write throughput and lets Elasticsearch write to multiple disks at the same time, while a segment of a given shard is written to the same disk.
  • Worked on Spark cluster security, networking connectivity and IO throughput along with other factors that affect distributed system performance.
  • Upgraded Elasticsearch from 5.3.0 to 5.3.2 following the rolling-upgrade process, using Ansible to deploy the new packages in all clusters (see the rolling-upgrade sketch after this list).
  • Built visualizations in Kibana, deployed Kibana with Ansible, and connected it to the Elasticsearch cluster.
  • Tested Kibana and ELK by creating a test index and injecting sample data.
  • Successfully tested Kafka ACLs with anonymous users and with different hostnames.
  • Worked on HBase to store variable data formats of data coming from different applications.
  • Worked on 24X7 Production Support Issues.
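
A minimal sketch of the Bash/AWK consumer-lag export to InfluxDB referenced above; the group, broker, and InfluxDB addresses are placeholders, and the column positions in the CLI output depend on the Kafka version.

    #!/usr/bin/env bash
    # kafka_lag_to_influx.sh -- scrape consumer-group lag and push it to InfluxDB (placeholder values).
    BROKERS="broker1.example.com:9092"
    GROUP="storm-ingest"
    INFLUX="http://influx.example.com:8086/write?db=kafka_metrics"

    # Columns in the --describe output vary by Kafka version; here $2=topic, $3=partition, $6=lag.
    kafka-consumer-groups.sh --bootstrap-server "$BROKERS" --describe --group "$GROUP" \
      | awk -v grp="$GROUP" '$6 ~ /^[0-9]+$/ {
          printf "consumer_lag,group=%s,topic=%s,partition=%s value=%s\n", grp, $2, $3, $6
        }' \
      | curl -s -XPOST "$INFLUX" --data-binary @-

    # Crontab entry to run it every minute:
    # * * * * * /opt/scripts/kafka_lag_to_influx.sh >/dev/null 2>&1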
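
A minimal sketch of the per-node Elasticsearch rolling-upgrade steps referenced above (5.3.0 to 5.3.2), assuming a yum-based install; in practice the package step was driven by Ansible, and the node address is a placeholder.

    ES=http://es-node1.example.com:9200

    # 1. Disable shard allocation so shards are not shuffled while the node restarts
    curl -s -XPUT "$ES/_cluster/settings" -H 'Content-Type: application/json' \
      -d '{"transient":{"cluster.routing.allocation.enable":"none"}}'

    # 2. Stop the node, upgrade the package, start it again
    systemctl stop elasticsearch
    yum -y update elasticsearch-5.3.2
    systemctl start elasticsearch

    # 3. Re-enable allocation and wait for green before moving to the next node
    curl -s -XPUT "$ES/_cluster/settings" -H 'Content-Type: application/json' \
      -d '{"transient":{"cluster.routing.allocation.enable":"all"}}'
    curl -s "$ES/_cluster/health?wait_for_status=green&timeout=10m"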

Environment: Kafka Brokers, Kafka Security, Kerberos, ACL, Elasticsearch, Kibana, Ambari Log Search, Nagios, Kafka Manager, Grafana, YARN, Spark, Ranger.

Confidential - Richmond, VA

BigData Engineer -Hadoop Administrator

Responsibilities:

  • Responsible for Implementation and Support of the Enterprise Hadoop Environment.
  • Installation, Configuration, Hadoop Cluster and Maintenance, Cluster Monitoring, Troubleshooting and Certifying Environments for production readiness.
  • Experience in Implementing Hadoop Cluster Capacity Planning.
  • Involved in the installation of CDH5 and Up-grades from CDH4 to CDH5.
  • Upgraded Cloudera Manager from version 5.3 to 5.5.
  • Responsible for onboarding new users to the Hadoop cluster (adding a user home directory and providing access to datasets).
  • Helped users in production deployments throughout the process.
  • Installed and configured apache airflow for workflow management and created workflows in python.
  • Managed and reviewed Hadoop Log files as part of administration for troubleshooting purposes, Communicate and Escalate issues appropriately.
  • Responsible for building Scalable Distributed data solutions using Hadoop.
  • Continuous monitoring and managing the Hadoop cluster through Ganglia and Nagios.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs that run independently based on time and data availability.
  • Developed data pipeline using Flume, Sqoop, Pig and Java Map Reduce to ingest customer behavioral data into HDFS for analysis.
  • Integrated Hadoop with Active Directory and Enabled Kerberos for Authentication.
  • Upgraded Cloudera Hadoop Ecosystems in the cluster using Cloudera distribution packages.
  • Experienced in stress and performance testing, benchmark for the cluster.
  • Commissioned and decommissioned DataNodes in the cluster in case of problems (see the decommission sketch after this list).
  • Debugged and solved key issues with Cloudera Manager by interacting with the Cloudera team.
  • Monitored system activity, performance, and resource utilization.
  • Deep understanding of Monitoring and Troubleshooting mission critical Linux machines.
  • Used Kafka for building real-time data pipelines between clusters.
  • Executed Log Aggregations, Website Activity Tracking and Commit log for Distributed Systems using Apache Kafka.
  • Focused on High-availability, Fault tolerance and Auto-Scaling.
  • Managed Critical bundles and patches on Production Servers.
  • Managed Disk File Systems, Server Performance, Users Creation and Granting file access Permissions and RAID Configurations.
  • Integrated Apache Kafka for data ingestion.
  • Configured Domain Name System (DNS) for Hostname to IP resolution.
  • Involved in data migration from Oracle database to MongoDB.
  • Queried and Analyzed data from Cassandra for Quick Searching, Sorting and Grouping.
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
  • Preparation of operational testing scripts for Log Check, Backup and Recovery and Failover.
  • Troubleshooting and fixing issues at the user, system, and network levels.
  • Performed all system administration tasks such as cron jobs, installing packages, and applying patches.
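
A minimal sketch of a graceful DataNode decommission as referenced above; the excludes-file path and hostname are placeholders and must match dfs.hosts.exclude in hdfs-site.xml (on a Cloudera-managed cluster the same steps are typically driven through Cloudera Manager).

    # Add the node to the HDFS excludes file and tell the NameNode to re-read it
    echo "datanode07.example.com" >> /etc/hadoop/conf/dfs.exclude
    hdfs dfsadmin -refreshNodes

    # Watch until the node shows "Decommissioned" before stopping its DataNode process
    hdfs dfsadmin -report | grep -A 3 "datanode07.example.com"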

Environment: YUM, RAID, MySQL 5.1.4, MySQL Workbench, PHP, shell scripting, Linux 5.0/5.1, Flume, Oozie, Pig, Sqoop, MongoDB, MapReduce, YARN, Cloudera Manager, NoSQL, Oracle 9i/10g/11g RAC with Solaris/Red Hat, Oracle Enterprise Manager (OEM), Hive, Zookeeper, HBase, Windows 2000/2003, Unix, Linux, Java, RMAN, Golden Gate, Red Hat, Cassandra, EM Cloud Control, Apache Hadoop, Airflow, Toad

Confidential - Atlanta, GA

BigData Operations Engineer - Consultant

Responsibilities:

  • Cluster Administration, Releases and Upgrades, Managed Multiple Hadoop Clusters with highest capacity of 7 PB (400+ nodes) while working on Hortonworks Distribution.
  • Responsible for Implementation and Ongoing Administration of Hadoop Infrastructure.
  • Used Hadoop cluster as a Staging Environment from Heterogeneous sources in Data Import Process.
  • Configured High Availability on the name node for the Hadoop cluster - part of the Disaster Recovery Roadmap.
  • Configured Ganglia and Nagios to monitor clusters.
  • Involved working on Cloud architecture.
  • Performed both Major and Minor upgrades to the existing clusters and rolling back to the previous version.
  • Implemented Commissioning and Decommissioning of data nodes, killing the unresponsive task tracker and dealing with blacklisted task trackers.
  • Implemented Fair scheduler on the job tracker to allocate the fair amount of resources to small jobs.
  • Maintained, audited and built new clusters for testing purposes using Ambari and Hortonworks.
  • Designed and allocated HDFS quotas for multiple groups (see the quota sketch after this list).
  • Configured Flume for efficiently collecting, aggregating, and moving enormous amounts of log Data from many diverse sources to HDFS.
  • Upgraded HDP 2.2 to 2.3 manually as part of software patches and upgrades.
  • Scripting Hadoop Package Installation and Configuration to support fully automated deployments.
  • Configuring Rack Awareness on HDP.
  • Adding New Nodes to an existing cluster, recovering from a Name Node failure.
  • Instrumental in building scalable distributed data solutions using Hadoop eco-system.
  • Adding New Data Nodes when needed and re-balancing the cluster.
  • Handled import of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and Extracted data from MySQL into HDFS using Sqoop.
  • Involved in database backup and recovery, database connectivity, and security.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Tracked utilization based on running statistics of Map and Reduce tasks.
  • Changed cluster configuration properties based on the volume of data being processed and cluster performance.
  • Provided input to the development team regarding efficient utilization of resources like memory and CPU.
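
A minimal sketch of the HDFS quota allocation referenced above; the directory and limits are placeholders.

    # Cap a group's directory by object count and by raw space (placeholder path and limits);
    # the space quota counts bytes after replication.
    hdfs dfsadmin -setQuota 1000000 /data/groups/analytics
    hdfs dfsadmin -setSpaceQuota 10t /data/groups/analytics
    hdfs dfs -count -q -h /data/groups/analytics   # verify remaining quota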

Environment: MapReduce, SmartSense, Knox, MySQL Plus, HDFS, Ranger, Pig, Hive, HBase, Flume, Sqoop, YARN, Kafka.

Confidential - Atlanta, GA

Hadoop Administrator/ Linux Administrator

Responsibilities:

  • Installation and Configuration of Linux for new build environment.
  • Day-to-day user access, permissions, installation, and maintenance of Linux servers.
  • Created Volume Groups, Logical Volumes and Partitions on Linux servers and mounted File systems.
  • Experienced in Installation and Configuration of Cloudera CDH4 in all environments.
  • Resolved tickets for P1 issues and troubleshot the errors.
  • Migrated an existing on-premises application to AWS. Used AWS services like EC2 and S3 for small data sets processing and storage.
  • Experienced in Maintaining the Hadoop cluster on AWS EMR.
  • Balancing HDFS Manually to decrease network utilization and increase job performance.
  • Responsible for building Scalable distributed data solutions using Hadoop.
  • Performed major and minor upgrades to the Hadoop cluster.
  • Upgraded Cloudera Hadoop ecosystems in cluster using Cloudera distribution packages.
  • Used Sqoop to import and export data between HDFS and RDBMS.
  • Experienced in stress and performance testing, benchmark for the cluster.
  • Commissioned and Decommissioned the Data Nodes in cluster in case of problems.
  • Installed CentOS on multiple servers using the Pre-Execution Environment (PXE) boot and Kickstart method; performed remote installation of Linux using PXE boot.
  • Monitored system activity, performance, and resource utilization.
  • Develop and optimize physical design of MySQL database systems.
  • Deep understanding of Monitoring and Troubleshooting Mission Critical Linux Machines.
  • Responsible for Maintenance of Raid-Groups, LUN Assignments as per Requirement Documents.
  • Extensive use of LVM, creating volume groups and logical volumes (see the LVM sketch after this list).
  • Performed Red Hat Package Manager (RPM) and YUM package Installations.
  • Tested and Performed Enterprise wide installation, Configuration and Support for Hadoop Using MapR Distribution.
  • Set up the cluster and installed all ecosystem components through MapR, and manually via the command line in the lab cluster.
  • Set up Automated Processes to Archive/Clean data on cluster on Name Node and Secondary Name Node.
  • Involved in Estimation and setting-up Hadoop Cluster in Linux.
  • Prepared PIG scripts to validate Time Series Rollup Algorithm.
  • Responsible for Support, Troubleshooting of Map Reduce Jobs, Pig Jobs.
  • Maintained incremental loads on a daily, weekly, and monthly basis.
  • Implemented Oozie workflows for Map Reduce, Hive and Sqoop Actions.
  • Performed Scheduled backup and Necessary Restoration.
  • Build and Maintain Scalable Data Using Hadoop Ecosystem and other open source components like Hive and HBase.
  • Monitor Data Streaming between Web Sources and HDFS.
  • Closely monitored and analyzed MapReduce job executions on the cluster at the task level.
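
A minimal sketch of the LVM work referenced above (creating a volume group and logical volume and mounting it); the device, names, and mount point are placeholders.

    # Carve a data volume out of a new disk and mount it (placeholder device, names, mount point)
    pvcreate /dev/sdb
    vgcreate vg_data /dev/sdb
    lvcreate -n lv_grid01 -l 100%FREE vg_data
    mkfs.ext4 /dev/vg_data/lv_grid01
    mkdir -p /grid/01
    mount /dev/vg_data/lv_grid01 /grid/01
    echo "/dev/vg_data/lv_grid01 /grid/01 ext4 defaults,noatime 0 0" >> /etc/fstab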

Environment: YUM, RAID, MySQL 5.1.4, MySQL Workbench, PHP, shell scripting, Linux 5.0/5.1, AWS EMR, Flume, Oozie, Pig, Sqoop, MongoDB, HBase, Hive, MapReduce, YARN, Cloudera Manager, NoSQL, Oracle 9i/10g/11g RAC with Solaris/Red Hat, Oracle Enterprise Manager (OEM), Windows 2000/2003, Unix, Linux, Java, RMAN, Golden Gate, Red Hat, Cassandra, EM Cloud Control, Apache Hadoop, Toad
