Hadoop Administrator/Big Data Engineer Resume
Englewood, CO
SUMMARY:
- 8+ years of IT Operations experience, including 3+ years in Hadoop administration, development, and architecture, and 2+ years on Linux-based systems.
- Excellent understanding of Distributed Systems and Parallel Processing Architecture.
- Worked on components such as HDFS, MapReduce, JobTracker, TaskTracker, Sqoop, ZooKeeper, YARN, Oozie, Hive, Hue, Flume, HBase, Fair Scheduler, Spark, Storm, SmartSense, and Kafka.
- Experience in managing Cloudera, Hortonworks, MapR Distributions.
- Implemented real time data processing using Kafka and Storm topology into CosmosDB.
- Strong knowledge of Hadoop HDFS architecture and the MapReduce framework.
- Secured Hadoop clusters from unauthorized access with Kerberos, LDAP integration, and SSL for data transfer among cluster nodes.
- Expert level in architecting, building, and maintaining enterprise-grade Hadoop clusters.
- Set up a secure Kafka cluster with Kerberos-based authentication, in addition to Kafka ACLs for IP addresses and anonymous users.
- Worked on the Spark transformation process, RDD operations, and DataFrames, and validated the Spark plug-in for the Avro data format.
- Successfully Deployed Elasticsearch 5.3.0 and Kibana with Ansible in Production Cluster.
- Involved in vendor selection and capacity planning for the Hadoop cluster.
- Experience in administering Linux systems to deploy Hadoop clusters and monitoring them using Nagios and Ganglia.
- Familiar with creating Ansible playbook scripts for automated multi-node cluster setup.
- Experience in performing backup, recovery, failover and DR practices on multiple platforms.
- Experience with automation for provisioning system resources using Puppet.
- Strong knowledge of configuring NameNode High Availability and NameNode Federation.
- Experienced in writing automation scripts for monitoring file systems.
- Implemented Hadoop-based solutions to store archives and backups from multiple sources.
- Familiar with importing and exporting data using Sqoop from RDBMSs (MySQL, Oracle, Teradata) using fast loaders and connectors.
- Worked with the architecture team on Hadoop hardware and software design.
- Configured and Implemented Amazon AWS.
- Familiar with implementing on the Azure-Omni platform.
- Experience in deploying Hadoop clusters on public and private cloud environments such as Amazon AWS, OpenStack, and Azure HDInsight.
- Built an ingestion framework using Flume for streaming logs and aggregating the data into HDFS.
- Worked with application teams to provide operational support and to install Hadoop updates, patches, and version upgrades.
- Experience in installation, upgrading, configuration, monitoring, supporting, and managing Hadoop clusters using Cloudera CDH4/CDH5 and Hortonworks HDP 2.1, 2.2, 2.3, 2.4, and 2.6 on Ubuntu, Red Hat, and CentOS systems.
- Experience in Installing and monitoring standalone multi-node Clusters of Kafka.
- Experience in Performance tuning of Apache Kafka.
- Successfully loaded files to Hive and HDFS from MongoDB, Cassandra, and HBase.
- Exposure to installing Hadoop and its ecosystem components such as Hive and Pig.
- Experience in systems & network design, physical system consolidation through server and storage virtualization, remote access solutions.
- Experience in understanding and managing Hadoop log files.
- Worked on VCE Vblock servers (Cisco) and configured database servers on them.
- Experience in Data Modeling (Logical and Physical Design of Databases), Normalization and building Referential Integrity Constraints.
- Worked with highly transactional merchandise and investment SQL databases with PCI compliance, involving data encryption with certificates and security keys at various levels.
- Experience in upgrading SQL server software, service packs and patches.
- Actively involved in System Performance by tuning SQL queries and stored procedures by using SQL Profiler, Database Engine Tuning Advisor.
- Experience in production support for 24x7 over weekends on rotation basis.
TECHNICAL SKILLS:
Hadoop Ecosystem: Hadoop, MapReduce, HDFS, HBase, ZooKeeper, Hive, Pig, Sqoop, Oozie, Kafka, Storm, YARN, Flume, SmartSense, Impala, Knox, Ranger, Spark, Ganglia.
Hadoop Distributions: Cloudera, Hortonworks, MapR.
Database: SQL, Oracle, NOSQL, MySQL, MongoDB, CosmosDB, Cassandra, Hbase.
Operating System: Windows 2000/2003/2008/2012, Linux (CentOS, Ubuntu, Red Hat).
Programming Languages: Java, JavaScript, C, C++, SQL, T-SQL, PL/SQL.
Scripting: PowerShell 3.0/2.0, UNIX Shell Scripting, Python.
ETL Tools: DTS, SSIS, Informatica, Sqoop.
Tools: SCOM, NetMon, SMB, SFTP, SQL Sentry.
PROFESSIONAL EXPERIENCE
Confidential, Englewood, CO
Hadoop Administrator/Big Data Engineer
Responsibilities:
- Implemented real-time processing using HDInsight Kafka and a Storm topology.
- Worked on creating comprehensive MongoDB API and DocumentDB API integrations, using Storm to write into Azure Cosmos DB.
- Created custom Spouts and Bolts in the Storm application to write into Cosmos DB according to the business rules.
- Worked on the Storm-MongoDB design to map Storm tuple values to either an update operation or an insert.
- Added parallelism to the Storm topology (worker processes, executors, tasks) and implemented the acking method.
- Installed Hadoop clusters using Hortonworks HDP 2.6.
- Worked on setting up Kafka and Storm clusters in Azure.
- Created and built multiple Docker Swarm managers by deploying Java applications in Docker containers using Jenkins.
- Used various Core Java concepts such as Multi-Threading, Serialization, Garbage Collection, Exception Handling, Collection API's to implement various features and enhancements.
- Resolved idle-timeout issues in MongoDB by adjusting the connection pool size.
- Worked on the Kafka cluster, using MirrorMaker to copy data to the Kafka cluster on Azure.
- Worked on an IBM MQ adapter to pull data into the Kafka cluster.
- Debugged MongoDB API issues for mongo-java-driver version 3.6.1.
- Worked on JSON- and EDIFACT-format data and manipulated it using the Storm application.
- Implemented Kafka ACL’S and successfully tested with anonymous users and with different hostnames.
- Implemented Kerberos in HDP Cluster 2.6.
- Applied Storm terminology to create a topology that runs continuously over a stream of incoming data.
- Using HDInsight Storm, created a topology that ingests data from HDInsight Kafka and writes it to Cosmos DB.
- Implemented Kafka Connect to fetch data from MQ into Kafka brokers.
- Worked on creating Cosmos DB, DocumentDB, and MongoDB instances.
- Implemented AppD v2.4 and deployed it through Jenkins.
- Worked on the Kafka backup index, minimized Log4j appender logs, and pointed Ambari server logs to NAS storage.
- Worked on SNMP Trap Issues in Production Cluster.
- Experience in writing test cases in JUnit for unit testing of classes and worked on Fortify Scanner for security.
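The update-or-insert routing described above (mapping each Storm tuple to either an update or an insert in Cosmos DB) can be sketched as pure logic. This is a minimal illustration, not the production code: the `_id` field and the in-memory `documents` dict standing in for a MongoDB-API collection are hypothetical.

```python
def route_tuple(documents, tuple_fields):
    """Map an incoming Storm-style tuple to an upsert decision: update the
    existing document when its _id is already present, otherwise insert.

    `documents` is a dict standing in for a Cosmos DB (MongoDB API)
    collection; `tuple_fields` is a dict of values emitted by a bolt.
    Both names are illustrative placeholders."""
    doc_id = tuple_fields["_id"]
    if doc_id in documents:
        documents[doc_id].update(tuple_fields)  # update operation
        return "update"
    documents[doc_id] = dict(tuple_fields)      # insert operation
    return "insert"
```

In a real bolt the same decision would be delegated to the driver's upsert support rather than checked by hand; the sketch only shows the routing rule.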
Confidential, CO
Hadoop Administration- Kafka
Responsibilities:
- Designed, implemented, and configured topics and partitions in new Kafka clusters in all environments.
- Successfully secured Kafka cluster with Kerberos.
- Implemented Kafka security features using SSL (without Kerberos), with finer-grained security for users and groups to enable advanced security features.
- Tested the advertised listener property through ZooKeeper data for securing Kafka brokers.
- Implemented AWS and Azure-Omni for the Couchbase load.
- Designed Azure storage for Kafka topics and merged the data into Couchbase with constant query components.
- Deployed Spark Cluster and other services in AWS using console.
- Installed Kerberos secured Kafka cluster with no encryption in all environments.
- Successfully set up a no-authentication Kafka listener in parallel with the Kerberos (SASL) listener.
- Tested non-authenticated (anonymous) users in parallel with Kerberos users.
- Integrated LDAP configuration to secure Ambari servers and managed authorization with permissions against users and groups.
- Implemented Knox, Ranger, Spark, and SmartSense in the Hadoop cluster.
- Installed HDP 2.6 in all environments.
- Installed Ranger in all environments for Second Level of security in Kafka Broker.
- Worked on Oozie Job Scheduler.
- Worked on Spark Transformation Process, RDD Operations, Data Frames, Validate Spark Plug-in for Avro Data format
- Experience compressing data with gzip and converting it to Avro in HDFS files.
- Installed Docker for utilizing ELK, InfluxDB, and Kerberos.
- Created an InfluxDB database for Kafka metrics to monitor consumer lag in Grafana.
- Created a Bash script with AWK-formatted text to send metrics to InfluxDB.
- Enabled InfluxDB and configured the InfluxDB data source in the Grafana interface.
- Succeeded in deploying ElasticSearch 5.3.0, Influx DB 1.2 in a Docker container.
- Used Ansible with automated scripts to install Elasticsearch nodes in multiple environments.
- Created a cron job to execute a program that starts the ingestion process; the data is read in, converted to Avro, and written to HDFS files.
- Designed Data Flow Ingestion Chart Process.
- Set up a new Grafana dashboard with real-time consumer lag in all environments, pulling only consumer-lag metrics and sending them to InfluxDB (via a script in crontab).
- Worked on DDL Oracle schema issues at the time of the Ambari upgrade.
- Successfully upgraded HDP 2.5 to 2.6 in all environments and applied software patches.
- Tested all services, such as Hadoop, ZooKeeper, Spark, Hive Server, and Hive Metastore.
- Worked on heap optimization and configurations for hardware optimization.
- Involved working in Production Ambari Views.
- Implemented Rack Awareness in Production Environment.
- Worked on disk-space issues in the production environment by monitoring how fast space filled up, reviewing what was being logged, and creating a long-term fix (minimizing Info, Debug, Fatal, and Audit logs).
- Worked on Nagios Monitoring tool.
- Installed Kafka Manager to track consumer lag and monitor Kafka metrics by adding topics, partitions, etc.
- Worked with the Hortonworks support team on Grafana consumer-lag issues (currently no consumer-lag metrics are generated in Grafana visualizations within HDP).
- Successfully generated consumer-group lag from Kafka using the API.
- Installed and configured Ambari Log Search; under the hood it requires a Solr instance, which collects and indexes all cluster-generated logs in real time and displays them in one interface.
- Installed Ansible 2.3.0 in all environments.
- Maintained the Elasticsearch cluster by adding more partitioned disks; this increases disk-write throughput by letting Elasticsearch write to multiple disks at the same time, while any one segment of a given shard is written to the same disk.
- Upgraded Elasticsearch from 5.3.0 to 5.3.2 following the rolling-upgrade process, using Ansible to deploy the new packages in all clusters.
- Built visualizations in Kibana, deployed Kibana with Ansible, and connected it to the Elasticsearch cluster.
- Tested Kibana and ELK by creating a test index and injecting sample data.
- Successfully tested Kafka ACLs with anonymous users and with different hostnames.
- Created HBase tables to store variable data formats of data coming from different applications.
- Worked on 24X7 Production Support Issues.
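The consumer-lag pipeline described above (a cron-driven script computing lag and shipping it to InfluxDB for Grafana) boils down to two steps: compute lag as log-end offset minus committed offset, and format the sample in InfluxDB line protocol. A rough Python sketch, with illustrative measurement and tag names rather than the real ones:

```python
def lag_to_line_protocol(group, topic, partition,
                         log_end_offset, committed_offset, ts_ns):
    """Format one Kafka consumer-lag sample as an InfluxDB line-protocol
    record. Lag is how far the consumer group's committed offset trails the
    partition's log-end offset (clamped at zero). The measurement name
    'kafka_consumer_lag' and the tag keys are hypothetical choices."""
    lag = max(log_end_offset - committed_offset, 0)
    return (f"kafka_consumer_lag,group={group},topic={topic},"
            f"partition={partition} lag={lag}i {ts_ns}")
```

In practice the offsets would come from `kafka-consumer-groups.sh` output (the AWK step) and the resulting lines would be POSTed to InfluxDB's write endpoint.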
Environment: Kafka brokers, Kafka security, Kerberos, ACLs, Elasticsearch, Kibana, Ambari Log Search, Nagios, Kafka Manager, Grafana, YARN, Spark, Ranger.
Confidential - Richmond, VA
BigData Engineer -Hadoop Administrator
Responsibilities:
- Responsible for implementation and support of the enterprise Hadoop environment.
- Installation, Configuration, Hadoop Cluster and Maintenance, Cluster Monitoring, Troubleshooting and Certifying Environments for production readiness.
- Experience in Implementing Hadoop Cluster Capacity Planning.
- Involved in the installation of CDH5 and upgrades from CDH4 to CDH5.
- Upgraded Cloudera Manager from version 5.3 to 5.5.
- Responsible for onboarding new users to the Hadoop cluster (adding a user home directory and providing access to datasets).
- Helped users in production deployments throughout the process.
- Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes; communicated and escalated issues appropriately.
- Responsible for building Scalable Distributed data solutions using Hadoop.
- Continuously monitored and managed the Hadoop cluster through Ganglia and Nagios.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Involved in Storm batch-mode processing over massive data sets, analogous to a Hadoop job that runs as a batch process over a fixed data set.
- Developed data pipeline using Flume, Sqoop, Pig and Java Map Reduce to ingest customer behavioral data into HDFS for analysis.
- Integrated Hadoop with Active Directory and enabled Kerberos for authentication.
- Upgraded Cloudera Hadoop ecosystems in the cluster using Cloudera distribution packages.
- Experienced in stress and performance testing and benchmarking the cluster.
- Commissioned and decommissioned the DataNodes in the cluster in case of problems.
- Debugged and solved key issues with Cloudera Manager by interacting with the Cloudera team.
- Monitored system activity, performance, and resource utilization.
- Deep understanding of Monitoring and Troubleshooting mission critical Linux machines.
- Used Kafka for building real-time data pipelines between clusters.
- Executed Log Aggregations, Website Activity Tracking and Commit log for Distributed Systems using Apache Kafka.
- Focused on High-availability, Fault tolerance and Auto-Scaling.
- Managed Critical bundles and patches on Production Servers.
- Managed Disk File Systems, Server Performance, Users Creation and Granting file access Permissions and RAID Configurations.
- Integrated Apache Kafka for data ingestion.
- Configured Domain Name System (DNS) for Hostname to IP resolution.
- Involved in data migration from Oracle database to MongoDB.
- Queried and analyzed data from Cassandra for quick searching, sorting, and grouping.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Preparation of operational testing scripts for Log Check, Backup and Recovery and Failover.
- Troubleshooting and fixing issues at the user, system, and network levels.
- Performed all system administration tasks, such as cron jobs, installing packages, and patches.
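Reviewing Hadoop log files for troubleshooting, as described above, usually starts with tallying severity levels across daemon logs to spot spikes before escalating. A small self-contained sketch of that first pass (the sample regex assumes standard Log4j levels; file handling is omitted):

```python
import re
from collections import Counter

def summarize_log_levels(lines):
    """Tally Log4j severity levels (INFO/WARN/ERROR/FATAL) across Hadoop
    daemon log lines - a quick first pass when reviewing logs to decide
    what to escalate. Lines without a recognizable level are skipped."""
    level_re = re.compile(r"\b(INFO|WARN|ERROR|FATAL)\b")
    counts = Counter()
    for line in lines:
        match = level_re.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it the tail of a NameNode or DataNode log gives an at-a-glance severity profile for the monitoring window.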
Environment: YUM, RAID, MySQL 5.1.4, PHP, shell scripting, MySQL Workbench, Linux 5.0/5.1, Flume, Oozie, Pig, Sqoop, MongoDB, HBase, Hive, MapReduce, YARN, Cloudera Manager, NoSQL, Oracle 9i/10g/11g RAC with Solaris/Red Hat, Oracle Enterprise Manager (OEM), ZooKeeper, Windows 2000/2003, Unix, Linux, Java, RMAN, Golden Gate, Cassandra, EM Cloud Control, Apache Hadoop, Toad
Confidential, Atlanta, GA
BigData Operations Engineer - Consultant.
Responsibilities:
- Cluster administration, releases, and upgrades; managed multiple Hadoop clusters with a highest capacity of 7 PB (400+ nodes) while working on the Hortonworks distribution.
- Responsible for Implementation and Ongoing Administration of Hadoop Infrastructure.
- Used the Hadoop cluster as a staging environment for data imported from heterogeneous sources.
- Configured High Availability on the NameNode for the Hadoop cluster as part of the disaster recovery roadmap.
- Configured Ganglia and Nagios to monitor clusters.
- Involved working on Cloud architecture.
- Performed both major and minor upgrades to the existing clusters and rolled back to the previous version when needed.
- Implemented commissioning and decommissioning of DataNodes, killing unresponsive TaskTrackers and dealing with blacklisted TaskTrackers.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Maintained, audited and built new clusters for testing purposes using Ambari and Hortonworks.
- Designed and allocated HDFS quotas for multiple groups.
- Configured Flume for efficiently collecting, aggregating and moving enormous amounts of log Data from many diverse sources to HDFS.
- Manually upgraded HDP 2.2 to HDP 2.3 as part of software patches and upgrades.
- Scripted Hadoop package installation and configuration to support fully automated deployments.
- Configured rack awareness on HDP.
- Added new nodes to an existing cluster and recovered from NameNode failures.
- Instrumental in building scalable distributed data solutions using Hadoop eco-system.
- Added new DataNodes when needed and rebalanced the cluster.
- Handled import of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Involved working in Database Backup and Recovery, Database Connectivity and Security.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Tracked utilization based on running statistics of Map and Reduce tasks.
- Changed cluster configuration properties based on the volume of data being processed and the performance of the cluster.
- Provided input to the development team regarding efficient utilization of resources such as memory and CPU.
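Configuring rack awareness, mentioned above, means giving HDFS a topology script (`net.topology.script.file.name`) that maps each host or IP to a rack path so block replicas land on different racks. A minimal sketch of the mapping logic; the subnet prefixes and rack names are invented examples, not a real cluster layout:

```python
# Hypothetical subnet-to-rack mapping; a real deployment would load this
# from a data file referenced by net.topology.script.file.name.
RACKS = {
    "10.0.1.": "/dc1/rack1",
    "10.0.2.": "/dc1/rack2",
}

def resolve_rack(host):
    """Return the rack path for a host/IP the way a Hadoop topology script
    does, falling back to /default-rack for unknown hosts (Hadoop's own
    default when no script is configured)."""
    for prefix, rack in RACKS.items():
        if host.startswith(prefix):
            return rack
    return "/default-rack"
```

The real script receives one or more hostnames/IPs as command-line arguments and prints one rack path per argument; this function is just the lookup at its core.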
Environment: MapReduce, SmartSense, Knox, MySQL Plus, HDFS, Ranger, Pig, Hive, HBase, Flume, Sqoop, YARN, Kafka.
Confidential - Atlanta, GA
Hadoop Administrator/ Linux Administrator
Responsibilities:
- Installation and Configuration of Linux for new build environment.
- Day-to-day user access, permissions, installation, and maintenance of Linux servers.
- Created Volume Groups, Logical Volumes and Partitions on Linux servers and mounted File systems.
- Experienced in installation and configuration of Cloudera CDH4 in all environments.
- Resolved tickets on P1 issues and troubleshot the errors.
- Balanced HDFS manually to decrease network utilization and increase job performance.
- Responsible for building Scalable distributed data solutions using Hadoop.
- Performed major and minor upgrades to the Hadoop cluster.
- Upgraded Cloudera Hadoop ecosystems in cluster using Cloudera distribution packages.
- Used Sqoop to import and export data between HDFS and RDBMSs.
- Experienced in stress and performance testing and benchmarking the cluster.
- Commissioned and decommissioned the DataNodes in the cluster in case of problems.
- Installed CentOS using Preboot Execution Environment (PXE) boot and the Kickstart method on multiple servers; performed remote installation of Linux using PXE boot.
- Monitoring System activity, Performance, Resource Utilization.
- Developed and optimized the physical design of MySQL database systems.
- Deep understanding of Monitoring and Troubleshooting Mission Critical Linux Machines.
- Responsible for Maintenance of Raid-Groups, LUN Assignments as per Requirement Documents.
- Extensive use of LVM, Creating Volume Groups, Logical volumes.
- Performed Red Hat Package Manager (RPM) and YUM package Installations.
- Tested and performed enterprise-wide installation, configuration, and support for Hadoop using the MapR distribution.
- Set up the cluster and installed all ecosystem components through MapR, and manually through the command line in the lab cluster.
- Set up automated processes to archive/clean data on the cluster's NameNode and Secondary NameNode.
- Involved in estimation and setup of a Hadoop cluster on Linux.
- Prepared Pig scripts to validate the Time Series Rollup algorithm.
- Responsible for support and troubleshooting of MapReduce and Pig jobs.
- Maintained incremental loads on a daily, weekly, and monthly basis.
- Implemented Oozie workflows for MapReduce, Hive, and Sqoop actions.
- Performed Scheduled backup and Necessary Restoration.
- Built and maintained scalable data pipelines using the Hadoop ecosystem and other open-source components such as Hive and HBase.
- Monitored data streaming between web sources and HDFS.
- Closely monitored and analyzed MapReduce job executions on the cluster at the task level.
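The scheduled archive/clean jobs mentioned above reduce to a retention policy: anything whose modification time falls outside the retention window is a cleanup candidate. A minimal sketch of that selection step, with a plain dict standing in for a directory listing (all names and times are illustrative):

```python
def files_to_clean(files, now, retention_days):
    """Select files older than the retention window, as a scheduled
    archive/clean job would before moving or deleting them.

    `files` maps path -> modification time in epoch seconds; `now` is the
    current epoch time. Returns candidate paths in sorted order."""
    cutoff = now - retention_days * 86400  # 86400 seconds per day
    return sorted(path for path, mtime in files.items() if mtime < cutoff)
```

A real job would walk the target directories (e.g. via `os.scandir` or `hdfs dfs -ls`) to build the mapping, then archive or delete the returned paths under cron.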
Environment: YUM, RAID, MySQL 5.1.4, PHP, shell scripting, MySQL Workbench, Linux 5.0/5.1, Flume, Oozie, Pig, Sqoop, MongoDB, HBase, Hive, MapReduce, YARN, Cloudera Manager, NoSQL, Oracle 9i/10g/11g RAC with Solaris/Red Hat, Oracle Enterprise Manager (OEM), ZooKeeper, Windows 2000/2003, Unix, Linux, Java, RMAN, Golden Gate, Cassandra, EM Cloud Control, Apache Hadoop, Toad
Confidential
Systems Engineer
Responsibilities:
- Worked on SQL Server 2005, 2008R2 Database Administration.
- Migrated SQL Server 2005 Databases to SQL Server 2008.
- Created Stored procedures, Functions and Triggers for retrieval and update of data.
- Extensive Experience in MS SQL Server 2008/2005/2000, BI tools like SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), and SQL Server Analysis Services (SSAS).
- Migrated DTS packages into SSIS packages.
- Managed backup, recovery, and DR practices for the environment.
- Created data transformation tasks like BCP, BULK INSERT to import/export data from client.
- Involved in source data analysis and designing mappings for data extraction; also responsible for design and development of SSIS packages to load data from various databases and files.
- Developed and optimized database structures, stored procedures, Dynamic Management views, DDL triggers and user-defined functions.
- Experience in implementing and maintaining new T-SQL features added in SQL Server 2005, such as data partitioning, error handling through TRY-CATCH statements, and Common Table Expressions (CTEs).
- Used Data Partitioning, Snapshot Isolation in SQL Server 2005.
- Expertise and interests include administration, database design, performance analysis, and production support for large (VLDB) and complex databases up to 2.5 terabytes.
- Managed the clustering environment.
- Worked on file system management, monitoring, and capacity planning in Big Data.
- Moved large data sets from HDFS to MySQL databases and vice versa using Sqoop.
- Implemented NFS, NAS, FTP and HTTP servers on Linux servers.
- Installed CentOS using Preboot Execution Environment (PXE) boot and the Kickstart method on multiple servers; performed remote installation of Linux using PXE boot.
- Created a local YUM repository for installing and updating packages.
- Migrated data from one cluster to another using DistCp, and automated the dumping procedure using shell scripts.
- Involved in designing, capacity arrangement, cluster setup, performance fine-tuning, monitoring, structure planning, scaling, and administration.
- Involved in implementing and integrating various NoSQL databases such as HBase and Cassandra.
- Implemented automatic failover with ZooKeeper and the ZooKeeper Failover Controller.
- Configured Oozie for workflow automation and coordination.
- Tuned long running queries to optimize application and system.
Environment: ETL tools, ASP.NET, VB.NET, EMC storage arrays, MS SQL Server 2008/2005/2000, SSRS, SSAS, SSIS, T-SQL, Windows 2003/2000 Advanced Server, Unix, Visual Studio 2010, C#.NET 2005.
