Hadoop Administrator/Big Data Engineer Resume
Englewood, CO
SUMMARY:
- 8+ years of IT Operations experience, including 3+ years in Hadoop administration, development, and architecture, and 2+ years on Linux-based systems.
- Excellent understanding of Distributed Systems and Parallel Processing Architecture.
- Worked on components such as HDFS, MapReduce, JobTracker, TaskTracker, Sqoop, ZooKeeper, YARN, Oozie, Hive, Hue, Flume, HBase, Fair Scheduler, Spark, Storm, SmartSense, and Kafka.
- Experience in managing Cloudera, Hortonworks, MapR Distributions.
- Implemented real time data processing using Kafka and Storm topology into CosmosDB.
- Strong knowledge of Hadoop HDFS architecture and the MapReduce framework.
- Secured Hadoop clusters from unauthorized access with Kerberos, LDAP integration, and SSL for data transfer among cluster nodes.
- Expert level in architecting, building, and maintaining enterprise-grade Hadoop clusters.
- Set up a secure Kafka cluster with Kerberos-based authentication, in addition to Kafka ACLs for IP addresses and anonymous users.
- Worked on the Spark transformation process, RDD operations, and DataFrames, and validated the Spark plug-in for the Avro data format.
- Successfully Deployed Elasticsearch 5.3.0 and Kibana with Ansible in Production Cluster.
- Involved in vendor selection and capacity planning for the Hadoop cluster.
- Experience in administering Linux systems to deploy Hadoop clusters and monitoring them using Nagios and Ganglia.
- Familiar with creating Ansible playbook scripts for automated multi-node cluster setup.
- Experience in performing backup, recovery, failover and DR practices on multiple platforms.
- Experience with automation for provisioning system resources using Puppet.
- Strong knowledge of configuring NameNode High Availability and NameNode Federation.
- Experienced in writing automation scripts for monitoring file systems.
- Implemented Hadoop-based solutions to store archives and backups from multiple sources.
- Familiar with importing and exporting data using Sqoop from RDBMSs (MySQL, Oracle, Teradata) using fast loaders and connectors.
- Worked with the architecture team on Hadoop hardware and software design.
- Configured and Implemented Amazon AWS.
- Familiar with implementing on the Azure-Omni platform.
- Experience in deploying Hadoop clusters on public and private cloud environments such as Amazon AWS, OpenStack, and Azure HDInsight.
- Built an ingestion framework using Flume for streaming logs and aggregating the data into HDFS.
- Worked with application teams to provide operational support and to install Hadoop updates, patches, and version upgrades.
- Experience in installation, upgrading, configuration, monitoring, supporting, and managing Hadoop clusters using Cloudera CDH4/CDH5 and Hortonworks HDP 2.1, 2.2, 2.3, 2.4, and 2.6 on Ubuntu, Red Hat, and CentOS systems.
- Experience in Installing and monitoring standalone multi-node Clusters of Kafka.
- Experience in Performance tuning of Apache Kafka.
- Successfully loaded files to Hive and HDFS from MongoDB, Cassandra, and HBase.
- Exposure to installing Hadoop and its ecosystem components such as Hive and Pig.
- Experience in systems & network design, physical system consolidation through server and storage virtualization, remote access solutions.
- Experience in understanding and managing Hadoop log files.
- Worked on VCE Vblock servers (Cisco) and configured database servers on them.
- Experience in Data Modeling (Logical and Physical Design of Databases), Normalization and building Referential Integrity Constraints.
- Worked with highly transactional merchandise and investment SQL databases with PCI compliance, involving data encryption with certificates and security keys at various levels.
- Experience in upgrading SQL server software, service packs and patches.
- Actively involved in System Performance by tuning SQL queries and stored procedures by using SQL Profiler, Database Engine Tuning Advisor.
- Experience in production support for 24x7 over weekends on rotation basis.
TECHNICAL SKILLS:
Hadoop Ecosystem: Hadoop, MapReduce, HDFS, HBase, ZooKeeper, Hive, Pig, Sqoop, Oozie, Kafka, Storm, YARN, Flume, SmartSense, Impala, Knox, Ranger, Spark, Ganglia.
Hadoop Distributions: Cloudera, Hortonworks, MapR.
Database: SQL, Oracle, NOSQL, MySQL, MongoDB, CosmosDB, Cassandra, Hbase.
Operating System: Windows 2000/2003/2008/2012, Linux (CentOS, Ubuntu, Red Hat).
Programming Languages: Java, JavaScript, C, C++, SQL, T-SQL, PL/SQL.
Scripting: PowerShell 3.0/2.0, UNIX Shell Scripting, Python.
ETL Tools: DTS, SSIS, Informatica, Sqoop.
Tools: SCOM, NetMon, SMB, SFTP, SQL Sentry.
PROFESSIONAL EXPERIENCE
Confidential, Englewood, CO
Hadoop Administrator/Big Data Engineer
Responsibilities:
- Implemented real-time processing using HDInsight Kafka and a Storm topology.
- Worked on creating comprehensive MongoDB API and DocumentDB API integrations, using Storm to write into Azure Cosmos DB.
- Created custom Spouts and Bolts in the Storm application to write into Cosmos DB according to the business rules.
- Worked on the Storm-MongoDB design to map Storm tuple values to either an update operation or an insert.
- Added parallelism to the Storm topology (worker processes, executors, tasks) and implemented the acking method.
- Installed Hadoop clusters using Hortonworks HDP 2.6.
- Worked on setting up Kafka and Storm clusters in Azure.
- Created and built multiple Docker Swarm managers by deploying Java applications in Docker containers using Jenkins.
- Used various Core Java concepts such as Multi-Threading, Serialization, Garbage Collection, Exception Handling, Collection API's to implement various features and enhancements.
- Resolved idle-timeout issues in MongoDB by adjusting the connection pool size.
- Worked on the Kafka cluster, using MirrorMaker to copy data to the Kafka cluster on Azure.
- Worked on an IBM MQ adapter to pull data into the Kafka cluster.
- Debugged MongoDB API issues for mongo-java-driver version 3.6.1.
- Worked on JSON- and EDIFACT-format data and manipulated it using the Storm application.
- Implemented Kafka ACL’S and successfully tested with anonymous users and with different hostnames.
- Implemented Kerberos in HDP Cluster 2.6.
- Applied Storm terminology to create a topology that runs continuously over a stream of incoming data.
- Using HDInsight Storm, created a topology that ingests data from HDInsight Kafka and writes it to Cosmos DB.
- Implemented Kafka Connect to fetch data from MQ into Kafka brokers.
- Worked on creating Cosmos DB, DocumentDB, and MongoDB instances.
- Implemented AppD v2.4 and deployed it through Jenkins.
- Worked on the Kafka backup index, minimized Log4j appender logs, and pointed Ambari server logs to NAS storage.
- Worked on SNMP Trap Issues in Production Cluster.
- Experience in writing test cases in JUnit for unit testing of classes and worked on Fortify Scanner for security.
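The update-or-insert routing described above (mapping each Storm tuple to either an update or an insert in Cosmos DB) can be sketched as pure logic. This is a minimal illustration, not the production code: the `_id` field and the in-memory `documents` dict standing in for a MongoDB-API collection are hypothetical.

```python
def route_tuple(documents, tuple_fields):
    """Map an incoming Storm-style tuple to an upsert decision: update the
    existing document when its _id is already present, otherwise insert.

    `documents` is a dict standing in for a Cosmos DB (MongoDB API)
    collection; `tuple_fields` is a dict of values emitted by a bolt.
    Both names are illustrative placeholders."""
    doc_id = tuple_fields["_id"]
    if doc_id in documents:
        documents[doc_id].update(tuple_fields)  # update operation
        return "update"
    documents[doc_id] = dict(tuple_fields)      # insert operation
    return "insert"
```

In a real bolt the same decision would be delegated to the driver's upsert support rather than checked by hand; the sketch only shows the routing rule.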
Confidential, CO
Hadoop Administration- Kafka
Responsibilities:
- Designed, implemented, and configured topics and partitions in new Kafka clusters in all environments.
- Successfully secured Kafka cluster with Kerberos.
- Implemented Kafka security features using SSL (without Kerberos), with finer-grained security for users and groups to enable advanced security features.
- Tested the advertised listener property through ZooKeeper data for securing Kafka brokers.
- Implemented AWS and Azure-Omni for the Couchbase load.
- Designed Azure storage for Kafka topics and merged the data into Couchbase with constant query components.
- Deployed Spark Cluster and other services in AWS using console.
- Installed Kerberos secured Kafka cluster with no encryption in all environments.
- Successfully set up a no-authentication Kafka listener in parallel with the Kerberos (SASL) listener.
- Tested non-authenticated (anonymous) users in parallel with Kerberos users.
- Integrated LDAP configuration to secure Ambari servers and managed authorization with permissions against users and groups.
- Implemented Knox, Ranger, Spark, and SmartSense in the Hadoop cluster.
- Installed HDP 2.6 in all environments.
- Installed Ranger in all environments for Second Level of security in Kafka Broker.
- Worked on Oozie Job Scheduler.
- Worked on Spark Transformation Process, RDD Operations, Data Frames, Validate Spark Plug-in for Avro Data format
- Experience compressing data with gzip and converting it to Avro in HDFS files.
- Installed Docker for utilizing ELK, InfluxDB, and Kerberos.
- Created an InfluxDB database for Kafka metrics to monitor consumer lag in Grafana.
- Created a Bash script with AWK-formatted text to send metrics to InfluxDB.
- Enabled InfluxDB and configured the InfluxDB data source in the Grafana interface.
- Succeeded in deploying ElasticSearch 5.3.0, Influx DB 1.2 in a Docker container.
- Used Ansible with automated scripts to install Elasticsearch nodes in multiple environments.
- Created a cron job to execute a program that starts the ingestion process; the data is read in, converted to Avro, and written to HDFS files.
- Designed Data Flow Ingestion Chart Process.
- Set up a new Grafana dashboard with real-time consumer lag in all environments, pulling only consumer-lag metrics and sending them to InfluxDB (via a script in crontab).
- Worked on DDL Oracle schema issues at the time of the Ambari upgrade.
- Successfully upgraded HDP 2.5 to 2.6 in all environments and applied software patches.
- Tested all services, such as Hadoop, ZooKeeper, Spark, Hive Server, and Hive Metastore.
- Worked on heap optimization and configurations for hardware optimization.
- Involved working in Production Ambari Views.
- Implemented Rack Awareness in Production Environment.
- Worked on disk-space issues in the production environment by monitoring how fast space filled up, reviewing what was being logged, and creating a long-term fix (minimizing Info, Debug, Fatal, and Audit logs).
- Worked on Nagios Monitoring tool.
- Installed Kafka Manager to track consumer lag and monitor Kafka metrics by adding topics, partitions, etc.
- Worked with the Hortonworks support team on Grafana consumer-lag issues (currently no consumer-lag metrics are generated in Grafana visualizations within HDP).
- Successfully generated consumer-group lag from Kafka using the API.
- Installed and configured Ambari Log Search; under the hood it requires a Solr instance, which collects and indexes all cluster-generated logs in real time and displays them in one interface.
- Installed Ansible 2.3.0 in all environments.
- Maintained the Elasticsearch cluster by adding more partitioned disks; this increases disk-write throughput by letting Elasticsearch write to multiple disks at the same time, while any one segment of a given shard is written to the same disk.
- Upgraded Elasticsearch from 5.3.0 to 5.3.2 following the rolling-upgrade process, using Ansible to deploy the new packages in all clusters.
- Built visualizations in Kibana, deployed Kibana with Ansible, and connected it to the Elasticsearch cluster.
- Tested Kibana and ELK by creating a test index and injecting sample data.
- Successfully tested Kafka ACLs with anonymous users and with different hostnames.
- Created HBase tables to store variable data formats of data coming from different applications.
- Worked on 24X7 Production Support Issues.
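The consumer-lag pipeline described above (a cron-driven script computing lag and shipping it to InfluxDB for Grafana) boils down to two steps: compute lag as log-end offset minus committed offset, and format the sample in InfluxDB line protocol. A rough Python sketch, with illustrative measurement and tag names rather than the real ones:

```python
def lag_to_line_protocol(group, topic, partition,
                         log_end_offset, committed_offset, ts_ns):
    """Format one Kafka consumer-lag sample as an InfluxDB line-protocol
    record. Lag is how far the consumer group's committed offset trails the
    partition's log-end offset (clamped at zero). The measurement name
    'kafka_consumer_lag' and the tag keys are hypothetical choices."""
    lag = max(log_end_offset - committed_offset, 0)
    return (f"kafka_consumer_lag,group={group},topic={topic},"
            f"partition={partition} lag={lag}i {ts_ns}")
```

In practice the offsets would come from `kafka-consumer-groups.sh` output (the AWK step) and the resulting lines would be POSTed to InfluxDB's write endpoint.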
Environment: Kafka brokers, Kafka security, Kerberos, ACLs, Elasticsearch, Kibana, Ambari Log Search, Nagios, Kafka Manager, Grafana, YARN, Spark, Ranger.
Confidential - Richmond, VA
BigData Engineer -Hadoop Administrator
Responsibilities:
- Responsible for implementation and support of the enterprise Hadoop environment.
- Installation, Configuration, Hadoop Cluster and Maintenance, Cluster Monitoring, Troubleshooting and Certifying Environments for production readiness.
- Experience in Implementing Hadoop Cluster Capacity Planning.
- Involved in the installation of CDH5 and upgrades from CDH4 to CDH5.
- Upgraded Cloudera Manager from version 5.3 to 5.5.
- Responsible for onboarding new users to the Hadoop cluster (adding a user home directory and providing access to datasets).
- Helped users in production deployments throughout the process.
- Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes; communicated and escalated issues appropriately.
- Responsible for building Scalable Distributed data solutions using Hadoop.
- Continuously monitored and managed the Hadoop cluster through Ganglia and Nagios.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Involved in Storm batch-mode processing over massive data sets, analogous to a Hadoop job that runs as a batch process over a fixed data set.
- Developed data pipeline using Flume, Sqoop, Pig and Java Map Reduce to ingest customer behavioral data into HDFS for analysis.
- Integrated Hadoop with Active Directory and enabled Kerberos for authentication.
- Upgraded Cloudera Hadoop ecosystems in the cluster using Cloudera distribution packages.
- Experienced in stress and performance testing and benchmarking the cluster.
- Commissioned and decommissioned the DataNodes in the cluster in case of problems.
- Debugged and solved key issues with Cloudera Manager by interacting with the Cloudera team.
- Monitored system activity, performance, and resource utilization.
- Deep understanding of Monitoring and Troubleshooting mission critical Linux machines.
- Used Kafka for building real-time data pipelines between clusters.
- Executed Log Aggregations, Website Activity Tracking and Commit log for Distributed Systems using Apache Kafka.
- Focused on High-availability, Fault tolerance and Auto-Scaling.
- Managed Critical bundles and patches on Production Servers.
- Managed Disk File Systems, Server Performance, Users Creation and Granting file access Permissions and RAID Configurations.
- Integrated Apache Kafka for data ingestion.
- Configured Domain Name System (DNS) for Hostname to IP resolution.
- Involved in data migration from Oracle database to MongoDB.
- Queried and analyzed data from Cassandra for quick searching, sorting, and grouping.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Preparation of operational testing scripts for Log Check, Backup and Recovery and Failover.
- Troubleshooting and fixing issues at the user, system, and network levels.
- Performed all system administration tasks, such as cron jobs, installing packages, and patches.
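Reviewing Hadoop log files for troubleshooting, as described above, usually starts with tallying severity levels across daemon logs to spot spikes before escalating. A small self-contained sketch of that first pass (the sample regex assumes standard Log4j levels; file handling is omitted):

```python
import re
from collections import Counter

def summarize_log_levels(lines):
    """Tally Log4j severity levels (INFO/WARN/ERROR/FATAL) across Hadoop
    daemon log lines - a quick first pass when reviewing logs to decide
    what to escalate. Lines without a recognizable level are skipped."""
    level_re = re.compile(r"\b(INFO|WARN|ERROR|FATAL)\b")
    counts = Counter()
    for line in lines:
        match = level_re.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it the tail of a NameNode or DataNode log gives an at-a-glance severity profile for the monitoring window.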
Environment: YUM, RAID, MySQL 5.1.4, PHP, shell scripting, MySQL Workbench, Linux 5.0/5.1, Flume, Oozie, Pig, Sqoop, MongoDB, HBase, Hive, MapReduce, YARN, Cloudera Manager, NoSQL, Oracle 9i/10g/11g RAC with Solaris/Red Hat, Oracle Enterprise Manager (OEM), ZooKeeper, Windows 2000/2003, Unix, Linux, Java, RMAN, Golden Gate, Cassandra, EM Cloud Control, Apache Hadoop, Toad
Confidential, Atlanta, GA
BigData Operations Engineer - Consultant.
Responsibilities:
- Cluster administration, releases, and upgrades; managed multiple Hadoop clusters with a highest capacity of 7 PB (400+ nodes) while working on the Hortonworks distribution.
- Responsible for Implementation and Ongoing Administration of Hadoop Infrastructure.
- Used the Hadoop cluster as a staging environment for data imported from heterogeneous sources.
- Configured High Availability on the NameNode for the Hadoop cluster as part of the disaster recovery roadmap.
- Configured Ganglia and Nagios to monitor clusters.
- Involved working on Cloud architecture.
- Performed both major and minor upgrades to the existing clusters and rolled back to the previous version when needed.
- Implemented commissioning and decommissioning of DataNodes, killing unresponsive TaskTrackers and dealing with blacklisted TaskTrackers.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Maintained, audited and built new clusters for testing purposes using Ambari and Hortonworks.
- Designed and allocated HDFS quotas for multiple groups.
- Configured Flume for efficiently collecting, aggregating and moving enormous amounts of log Data from many diverse sources to HDFS.
- Manually upgraded HDP 2.2 to HDP 2.3 as part of software patches and upgrades.
- Scripted Hadoop package installation and configuration to support fully automated deployments.
- Configured rack awareness on HDP.
- Added new nodes to an existing cluster and recovered from NameNode failures.
- Instrumental in building scalable distributed data solutions using Hadoop eco-system.
- Added new DataNodes when needed and rebalanced the cluster.
- Handled import of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Involved working in Database Backup and Recovery, Database Connectivity and Security.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Tracked utilization based on running statistics of Map and Reduce tasks.
- Changed cluster configuration properties based on the volume of data being processed and the performance of the cluster.
- Provided input to the development team regarding efficient utilization of resources such as memory and CPU.
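Configuring rack awareness, mentioned above, means giving HDFS a topology script (`net.topology.script.file.name`) that maps each host or IP to a rack path so block replicas land on different racks. A minimal sketch of the mapping logic; the subnet prefixes and rack names are invented examples, not a real cluster layout:

```python
# Hypothetical subnet-to-rack mapping; a real deployment would load this
# from a data file referenced by net.topology.script.file.name.
RACKS = {
    "10.0.1.": "/dc1/rack1",
    "10.0.2.": "/dc1/rack2",
}

def resolve_rack(host):
    """Return the rack path for a host/IP the way a Hadoop topology script
    does, falling back to /default-rack for unknown hosts (Hadoop's own
    default when no script is configured)."""
    for prefix, rack in RACKS.items():
        if host.startswith(prefix):
            return rack
    return "/default-rack"
```

The real script receives one or more hostnames/IPs as command-line arguments and prints one rack path per argument; this function is just the lookup at its core.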
Environment: MapReduce, SmartSense, Knox, MySQL Plus, HDFS, Ranger, Pig, Hive, HBase, Flume, Sqoop, YARN, Kafka.
Confidential - Atlanta, GA
Hadoop Administrator/ Linux Administrator
Responsibilities:
- Installation and Configuration of Linux for new build environment.
- Day-to-day user access, permissions, installation, and maintenance of Linux servers.
- Created Volume Groups, Logical Volumes and Partitions on Linux servers and mounted File systems.
- Experienced in installation and configuration of Cloudera CDH4 in all environments.
- Resolved tickets on P1 issues and troubleshot the errors.
- Balanced HDFS manually to decrease network utilization and increase job performance.
- Responsible for building Scalable distributed data solutions using Hadoop.
- Performed major and minor upgrades to the Hadoop cluster.
- Upgraded Cloudera Hadoop ecosystems in cluster using Cloudera distribution packages.
- Used Sqoop to import and export data between HDFS and RDBMSs.
- Experienced in stress and performance testing and benchmarking the cluster.
- Commissioned and decommissioned the DataNodes in the cluster in case of problems.
- Installed CentOS using Preboot Execution Environment (PXE) boot and the Kickstart method on multiple servers; performed remote installation of Linux using PXE boot.
- Monitoring System activity, Performance, Resource Utilization.
- Developed and optimized the physical design of MySQL database systems.
- Deep understanding of Monitoring and Troubleshooting Mission Critical Linux Machines.
- Responsible for Maintenance of Raid-Groups, LUN Assignments as per Requirement Documents.
- Extensive use of LVM, Creating Volume Groups, Logical volumes.
- Performed Red Hat Package Manager (RPM) and YUM package Installations.
- Tested and performed enterprise-wide installation, configuration, and support for Hadoop using the MapR distribution.
- Set up the cluster and installed all ecosystem components through MapR, and manually through the command line in the lab cluster.
- Set up automated processes to archive/clean data on the cluster's NameNode and Secondary NameNode.
- Involved in estimation and setup of a Hadoop cluster on Linux.
- Prepared Pig scripts to validate the Time Series Rollup algorithm.
- Responsible for support and troubleshooting of MapReduce and Pig jobs.
- Maintained incremental loads on a daily, weekly, and monthly basis.
- Implemented Oozie workflows for MapReduce, Hive, and Sqoop actions.
- Performed Scheduled backup and Necessary Restoration.
- Built and maintained scalable data pipelines using the Hadoop ecosystem and other open-source components such as Hive and HBase.
- Monitored data streaming between web sources and HDFS.
- Closely monitored and analyzed MapReduce job executions on the cluster at the task level.
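The scheduled archive/clean jobs mentioned above reduce to a retention policy: anything whose modification time falls outside the retention window is a cleanup candidate. A minimal sketch of that selection step, with a plain dict standing in for a directory listing (all names and times are illustrative):

```python
def files_to_clean(files, now, retention_days):
    """Select files older than the retention window, as a scheduled
    archive/clean job would before moving or deleting them.

    `files` maps path -> modification time in epoch seconds; `now` is the
    current epoch time. Returns candidate paths in sorted order."""
    cutoff = now - retention_days * 86400  # 86400 seconds per day
    return sorted(path for path, mtime in files.items() if mtime < cutoff)
```

A real job would walk the target directories (e.g. via `os.scandir` or `hdfs dfs -ls`) to build the mapping, then archive or delete the returned paths under cron.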
Environment: YUM, RAID, MySQL 5.1.4, PHP, shell scripting, MySQL Workbench, Linux 5.0/5.1, Flume, Oozie, Pig, Sqoop, MongoDB, HBase, Hive, MapReduce, YARN, Cloudera Manager, NoSQL, Oracle 9i/10g/11g RAC with Solaris/Red Hat, Oracle Enterprise Manager (OEM), ZooKeeper, Windows 2000/2003, Unix, Linux, Java, RMAN, Golden Gate, Cassandra, EM Cloud Control, Apache Hadoop, Toad
Confidential
Systems Engineer
Responsibilities:
- Worked on SQL Server 2005, 2008R2 Database Administration.
- Migrated SQL Server 2005 Databases to SQL Server 2008.
- Created Stored procedures, Functions and Triggers for retrieval and update of data.
- Extensive Experience in MS SQL Server 2008/2005/2000, BI tools like SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), and SQL Server Analysis Services (SSAS).
- Migrated DTS packages into SSIS packages.
- Managed backup, recovery, and DR practices for the environment.
- Created data transformation tasks like BCP, BULK INSERT to import/export data from client.
- Involved in source data analysis and designing mappings for data extraction; also responsible for design and development of SSIS packages to load data from various databases and files.
- Developed and optimized database structures, stored procedures, Dynamic Management views, DDL triggers and user-defined functions.
- Experience in implementing and maintaining new T-SQL features added in SQL Server 2005, such as data partitioning, error handling through TRY-CATCH statements, and Common Table Expressions (CTEs).
- Used Data Partitioning, Snapshot Isolation in SQL Server 2005.
- Expertise and interests include administration, database design, performance analysis, and production support for large (VLDB) and complex databases up to 2.5 terabytes.
- Managed the clustering environment.
- Worked on file system management, monitoring, and capacity planning in Big Data.
- Moved large data sets from HDFS to MySQL databases and vice versa using Sqoop.
- Implemented NFS, NAS, FTP and HTTP servers on Linux servers.
- Installed CentOS using Preboot Execution Environment (PXE) boot and the Kickstart method on multiple servers; performed remote installation of Linux using PXE boot.
- Created a local YUM repository for installing and updating packages.
- Migrated data from one cluster to another using DistCp, and automated the dumping procedure using shell scripts.
- Involved in designing, capacity arrangement, cluster setup, performance fine-tuning, monitoring, structure planning, scaling, and administration.
- Involved in implementing and integrating various NoSQL databases such as HBase and Cassandra.
- Implemented automatic failover with ZooKeeper and the ZooKeeper Failover Controller.
- Configured Oozie for workflow automation and coordination.
- Tuned long running queries to optimize application and system.
Environment: ETL tools, ASP.NET, VB.NET, EMC storage arrays, MS SQL Server 2008/2005/2000, SSRS, SSAS, SSIS, T-SQL, Windows 2003/2000 Advanced Server, Unix, Visual Studio 2010, C#.NET 2005.
