Big Data Engineer - Kafka Administration Resume
Englewood, CO
SUMMARY
- 8+ years of IT Operations experience, with 3+ years of experience in Hadoop Development and Administration and 2+ years of experience in Linux-based systems
- Excellent understanding of Distributed Systems and Parallel Processing architecture.
- Worked on components such as HDFS, MapReduce, JobTracker, TaskTracker, Sqoop, ZooKeeper, YARN, Oozie, Hive, Hue, Flume, HBase, Fair Scheduler, Spark, SmartSense, and Kafka.
- Experience in managing Cloudera and Hortonworks distributions.
- Strong knowledge of Hadoop HDFS architecture and the MapReduce framework.
- Secured the Hadoop cluster from unauthorized access with Kerberos, LDAP integration, and TLS for data transfer among the cluster nodes.
- Expert-level experience architecting, building, and maintaining enterprise-grade Hadoop clusters.
- Set up a secure Kafka cluster and tested Kerberos-based authentication in addition to Kafka ACLs, including IP-based ACL rules.
- Implemented a Kafka cluster and secured it with Kerberos and ACLs (a hedged ACL sketch follows this summary).
- Worked on Spark transformations, RDD operations, and DataFrames, and validated the Spark plug-in for the Avro data format.
- Deployed Elasticsearch 5.3.0 and Kibana with Ansible in the production cluster.
- Involved in vendor selection and capacity planning for the Hadoop cluster in production.
- Experience in administering Linux systems to deploy Hadoop clusters and monitoring the clusters using Nagios and Ganglia.
- Experience in performing backup, recovery, failover and DR practices on multiple platforms.
- Implemented Kerberos for authenticating all the services in the Hadoop cluster.
- Experience with automation for provisioning system resources using Puppet.
- Strong knowledge in configuring Name Node High Availability and Name Node Federation.
- Experienced in writing automated scripts for monitoring file systems.
- Implemented Hadoop-based solutions to store archives and backups from multiple sources.
- Familiar with importing and exporting data using Sqoop from RDBMSs such as MySQL, Oracle, and Teradata, as well as using fast loaders and connectors.
- Worked on Hadoop hardware and software architecture design.
- Configured and Implemented Amazon AWS
- Experience in deploying Hadoop clusters on public and private cloud environments such as Amazon AWS and OpenStack.
- Built an ingestion framework using Flume for streaming logs and aggregating the data into HDFS.
- Worked with the application team via Scrum to provide operational support and install Hadoop updates, patches, and version upgrades as required.
- Expertise in importing and exporting data into HDFS and preprocessing data into commercial analytic databases (RDBMS).
- Experience in installing, upgrading, configuring, monitoring, supporting, and managing Hadoop clusters using Cloudera CDH4 and CDH5 and Hortonworks HDP 2.1, 2.2, 2.3, 2.4, and 2.6 on Ubuntu, Red Hat, and CentOS systems.
- Experience in installing and monitoring standalone and multi-node Kafka clusters.
- Performance-tuned Apache Kafka clusters.
- Loaded files to Hive and HDFS from MongoDB, Cassandra, and HBase.
- Created a role in the Sentry app through Hue.
- Exposure to installing Hadoop and its ecosystem components such as Hive and Pig.
- Experience in systems and network design, physical system consolidation through server and storage virtualization, and remote access solutions.
- Used Zendesk for support tracking and JIRA for development tracking.
- Experience in understanding and managing Hadoop log files and managing the Hadoop infrastructure with Cloudera Manager.
- Worked on Vblock servers by VCE (Cisco provider) and configured database servers on them.
- Experience in Data Modeling (Logical and Physical Design of Databases), Normalization and building Referential Integrity Constraints.
- Worked with highly transactional merchandise and investment SQL databases with PCI and HIPAA compliance, involving data encryption with certificates and security keys at various levels.
- Experience in upgrading SQL Server software to new versions and applying service packs and patches.
- Actively involved in system performance tuning of SQL queries and stored procedures using SQL Profiler and Database Engine Tuning Advisor.
- Experience in providing 24x7 production support, including weekends, on a rotation basis.
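A minimal sketch of the kind of Kerberos-secured Kafka ACL setup summarized above; the principal, topic, consumer group, host, and ZooKeeper address are placeholder values, not the actual environment:

```bash
# Authenticate as an admin principal (placeholder realm)
kinit kafka-admin@EXAMPLE.COM

# Allow a user principal to produce to a topic
kafka-acls.sh --authorizer-properties zookeeper.connect=zk1.example.com:2181 \
  --add --allow-principal User:appuser --producer --topic app-events

# Allow the same principal to consume through a consumer group
kafka-acls.sh --authorizer-properties zookeeper.connect=zk1.example.com:2181 \
  --add --allow-principal User:appuser --consumer --topic app-events --group app-group

# IP-based rule, as referenced above (anonymous user restricted to one host)
kafka-acls.sh --authorizer-properties zookeeper.connect=zk1.example.com:2181 \
  --add --allow-principal User:ANONYMOUS --allow-host 10.0.0.15 \
  --operation Read --topic app-events
```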
TECHNICAL SKILLS
Hadoop Ecosystem: Hadoop, MapReduce, HDFS, HBase, ZooKeeper, Hive, Pig, Sqoop, Oozie, Kafka, YARN, Flume, SmartSense, Impala, Knox, Ranger, Spark, Ganglia.
Hadoop Distributions: Cloudera, Hortonworks, MapR.
Databases: SQL, Oracle, NoSQL, MySQL, MongoDB, Cassandra, HBase.
Web Servers: IIS 6.0, IIS 7.0, IIS 7.5.
Operating Systems: Windows 2000/2003/2008/2012, Linux (CentOS, Ubuntu, Red Hat).
Programming Languages: Java, JavaScript, C, C++, SQL, T-SQL, PL/SQL.
Scripting: PowerShell 3.0/2.0, UNIX shell scripting, Python.
ETL Tools: DTS, SSIS, Informatica, Sqoop.
Tools: SCOM, NetMon, SMB, SFTP, SQL Sentry.
PROFESSIONAL EXPERIENCE
Confidential - Englewood, CO
Big Data Engineer-Kafka Administration
Responsibilities:
- Designed and implemented topic configuration in the new Kafka cluster across all environments.
- Secured the Kafka cluster with Kerberos.
- Implemented Kafka security features using SSL, initially without Kerberos; for finer-grained security, set up Kerberos with users and groups to enable more advanced security features.
- Designed and architected the build-out of a new Hadoop cluster.
- Implemented AWS and Azure-Omni for the Couchbase load.
- Developed and deployed custom Hadoop applications for data analysis, storage, and processing in Amazon EMR.
- Deployed a Spark cluster and other services in AWS using the console.
- Installed a Kerberos-secured Kafka cluster with no encryption on Dev and Prod, and set up Kafka ACLs on it.
- Set up a no-authentication Kafka listener in parallel with the Kerberos (SASL) listener, and tested a non-authenticated (anonymous) user in parallel with a Kerberos user.
- Integrated LDAP configuration, including LDAP integration for securing Ambari servers and managing authorization and permissions for users and groups.
- Implemented Knox, Ranger, Spark, and SmartSense in the Hadoop cluster.
- Installed HDP 2.6 in all environments
- Installed Ranger in all environments as a second level of security on the Kafka brokers.
- Involved in Data Ingestion Process to Production cluster.
- Worked on Oozie Job Scheduler
- Worked on Spark transformations, RDD operations, and DataFrames, and validated the Spark plug-in for the Avro data format (receiving gzip-compressed data and producing Avro data into HDFS files).
- Installed Docker for running ELK, InfluxDB, and Kerberos.
- Created a database in InfluxDB, worked on the interface created for Kafka, and checked the measurements in the databases.
- Created a Bash script with Awk-formatted text to send metrics to InfluxDB.
- Enabled InfluxDB and configured the InfluxDB data source in the Grafana interface.
- Deployed Elasticsearch 5.3.0 and InfluxDB 1.2 on the Prod machine in Docker containers.
- Created a cron job that executes a program to start the ingestion process; the data is read in, converted to Avro, and written to HDFS files.
- Designed the data flow ingestion process chart.
- Set up a new Grafana dashboard with real-time consumer lag in the Dev and PP clusters, pulling only the consumer lag metric and sending it to InfluxDB via a script in crontab (see the sketch after this list).
- Worked on database schema (Oracle DDL) issues at the time of the Ambari upgrade.
- Upgraded HDP 2.5 to 2.6 in all environments, including software patches and upgrades.
- Worked on the Kafka backup index and Log4j appenders, minimized logs, and pointed Ambari server logs to NAS storage.
- Tested all services, including Hadoop, ZooKeeper, Spark, HiveServer, and Hive Metastore.
- Worked on SNMP Trap Issues in Production Cluster.
- Worked on heap optimization and changed some of the configurations for hardware optimization.
- Worked on production Ambari Views.
- Implemented Rack Awareness in Production Environment.
- Worked on disk space issues in the production environment by monitoring how fast the space filled up and reviewing what was being logged, and created a long-term fix for the issue (minimizing Info, Debug, Fatal, and audit logs).
- Worked on Nagios Monitoring tool.
- Installed Kafka Manager for consumer lag and Kafka metrics monitoring; it was also used for adding topics, partitions, etc.
- Worked with the Hortonworks support team on Grafana consumer lag issues (currently no consumer lag is generated in the Grafana visualization within HDP).
- Generated consumer group lag from Kafka using its API.
- Installed and configured Ambari Log Search, which under the hood requires a Solr instance that collects and indexes all cluster-generated logs in real time and displays them in one interface.
- Installed Ansible 2.3.0 in Production Environment
- Maintained the Elasticsearch cluster by adding more partitioned disks; this increases disk write throughput and enables Elasticsearch to write to multiple disks at the same time, while a segment of a given shard is written to the same disk.
- Upgraded Elasticsearch from 5.3.0 to 5.3.2 following the rolling upgrade process and using Ansible to deploy new packages in the Prod cluster.
- Built visualizations in Kibana.
- Deployed Kibana with Ansible and connected it to the Elasticsearch cluster; tested Kibana and ELK by creating a test index and injecting sample data into it.
- Tested Kafka ACLs with anonymous users and with different hostnames.
- Created HBase tables to store variable data formats of data coming from different applications.
- Worked on Production Support Issues
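A minimal sketch of the crontab-driven consumer-lag collector described above (the Bash/Awk script that pushes Kafka consumer lag into InfluxDB for Grafana); the broker address, consumer group, InfluxDB URL, and database name are placeholders, and the awk field index may need adjusting for the Kafka version in use:

```bash
#!/usr/bin/env bash
# Sum consumer lag for one group and write it to InfluxDB (line protocol).
BROKERS="kafka1.example.com:9092"            # placeholder broker list
GROUP="app-group"                            # placeholder consumer group
INFLUX_URL="http://influxdb.example.com:8086/write?db=kafka_metrics"

# kafka-consumer-groups.sh prints one row per topic/partition; the LAG column
# is summed with awk (field position varies slightly between Kafka versions).
LAG=$(kafka-consumer-groups.sh --bootstrap-server "$BROKERS" \
        --describe --group "$GROUP" 2>/dev/null \
      | awk 'NR > 1 && $6 ~ /^[0-9]+$/ { sum += $6 } END { print sum + 0 }')

curl -s -XPOST "$INFLUX_URL" --data-binary "consumer_lag,group=$GROUP value=$LAG"
```

A crontab entry such as * * * * * /usr/local/bin/consumer_lag.sh (hypothetical path) would push the metric every minute for the Grafana dashboard.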
Environment: Kafka brokers, Kafka security, Kerberos, ACLs, Elasticsearch, Kibana, Ambari Log Search, Nagios, Kafka Manager, Grafana, YARN, Spark, Ranger.
Confidential - Richmond, VA
BigData Engineer -Hadoop Administrator
Responsibilities:
- Responsible for implementation and support of the enterprise Hadoop environment.
- Installation and configuration, Hadoop Cluster and Maintenance, Cluster Monitoring, Troubleshooting and certifying environments for production readiness.
- Experience in Implementing Hadoop Cluster Capacity Planning
- Involved in the installation of CDH5 and the upgrade from CDH4 to CDH5.
- Upgraded Cloudera Manager from version 5.3 to 5.5.
- Responsible for onboarding new users to the Hadoop cluster (adding a user home directory and providing access to the datasets).
- Helped the users in production deployments throughout the process.
- Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes; communicated and escalated issues appropriately.
- Responsible for building scalable distributed data solutions using Hadoop.
- Continuously monitored and managed the Hadoop cluster through Ganglia and Nagios.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs, which run independently with time and data availability.
- Involved in Storm batch-mode processing over massive data sets, which is analogous to a Hadoop job that runs as a batch process over a fixed data set.
- Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data into HDFS for analysis (a Sqoop sketch appears after this list).
- Worked with Storm topologies, creating a topology that runs continuously over a stream of incoming data.
- Integrated Hadoop with Active Directory and enabled Kerberos for authentication.
- Upgraded the Cloudera Hadoop ecosystems in the cluster using Cloudera distribution packages.
- Performed stress and performance testing and benchmarking for the cluster.
- Commissioned and decommissioned DataNodes in the cluster in case of problems.
- Debugged and solved major issues with Cloudera Manager by interacting with the Cloudera team.
- Monitored system activity, performance, and resource utilization.
- Deep understanding of monitoring and troubleshooting mission critical Linux machines.
- Used Kafka for building real-time data pipelines between clusters.
- Ran log aggregation, website activity tracking, and commit logs for distributed systems using Apache Kafka.
- Designed and implemented Amazon Web Services; as a passionate advocate of AWS within Gracenote, migrated from a physical data center environment to AWS.
- Focused on high-availability, fault tolerance, and auto-scaling.
- Managed critical bundles and patches on the production servers after successfully navigating through the testing phase in the test environments.
- Managed disk file systems, server performance, user creation, granting of file access permissions, and RAID configurations.
- Integrated Apache Kafka for data ingestion
- Configured Domain Name System (DNS) for hostname to IP resolution.
- Involved in data migration from Oracle database to MongoDB.
- Queried and analyzed data from Cassandra for quick searching, sorting, and grouping.
- Moved all log files generated from various sources to HDFS through Flume for further processing.
- Prepared operational testing scripts for log checks, backup and recovery, and failover.
- Troubleshot and fixed issues at the user, system, and network levels using various tools and utilities.
- Performed all system administration tasks such as cron jobs, installing packages, and applying patches.
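A minimal sketch of the Sqoop legs of the ingestion and export pipeline described above; the JDBC URL, credentials, table names, and HDFS paths are placeholders, not the actual systems:

```bash
# Ingest a relational table into HDFS as Avro files for downstream analysis
sqoop import \
  --connect jdbc:mysql://db.example.com:3306/sales \
  --username etl_user -P \
  --table customer_events \
  --target-dir /data/raw/customer_events \
  --num-mappers 4 \
  --as-avrodatafile

# Export analyzed results back to the relational database for BI reporting
sqoop export \
  --connect jdbc:mysql://db.example.com:3306/sales \
  --username etl_user -P \
  --table customer_summary \
  --export-dir /data/processed/customer_summary
```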
Environment: YUM, RAID, MySQL 5.1.4, PHP, shell scripting, MySQL Workbench, Linux 5.0/5.1, Flume, Oozie, Pig, Sqoop, MongoDB, HBase, Hive, MapReduce, YARN, Cloudera Manager, NoSQL, Oracle 9i/10g/11g RAC with Solaris/Red Hat, Oracle Enterprise Manager (OEM), ZooKeeper, Windows 2000/2003, Unix, Linux, Java, RMAN, GoldenGate, Cassandra, EM Cloud Control, Apache Hadoop, Toad
Confidential - Atlanta, GA
BigData Operations Engineer - Consultant
Responsibilities:
- Cluster administration, releases, and upgrades: managed multiple Hadoop clusters with the highest capacity of 7 PB (400+ nodes) with PAM enabled; worked on the Hortonworks distribution.
- Responsible for implementation and ongoing administration of Hadoop infrastructure.
- Used the Hadoop cluster as a staging environment for data from heterogeneous sources in the data import process.
- Configured NameNode High Availability for the Hadoop cluster as part of the disaster recovery roadmap.
- Configured Ganglia and Nagios to monitor the cluster and was on call with the EOC for support.
- Worked on cloud architecture.
- Performed both major and minor upgrades to the existing cluster, as well as rollbacks to the previous version.
- Implemented commissioning and decommissioning of DataNodes, killing unresponsive TaskTrackers, and dealing with blacklisted TaskTrackers.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Maintained, audited, and built new clusters for testing purposes using Ambari (Hortonworks).
- Designed and allocated HDFS quotas for multiple groups.
- Configured Flume for efficiently collecting, aggregating, and moving large amounts of log data from many different sources to HDFS.
- Upgraded manually from HDP 2.2 to HDP 2.3, including software patches and upgrades.
- Scripting Hadoop package installation and configuration to support fully automated deployments.
- Configuring Rack Awareness on HDP.
- Adding new Nodes to an existing cluster, recovering from a Name Node failure.
- Instrumental in building scalable distributed data solutions using Hadoop eco-system.
- Added new DataNodes when needed and rebalanced the cluster (see the quota and rebalancing sketch after this list).
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Worked on database backup and recovery, database connectivity, and security.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Assessed utilization based on the running statistics of Map and Reduce tasks.
- Changed the configuration properties of the cluster based on the volume of data being processed and the performance of the cluster.
- Provided inputs to development regarding efficient utilization of resources such as memory and CPU.
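A minimal sketch of the HDFS quota allocation and cluster rebalancing mentioned above; the group directory, limits, and threshold are placeholder values:

```bash
# Allocate HDFS quotas for a group's project directory
hdfs dfsadmin -setQuota 1000000 /data/groups/analytics      # max number of files and directories
hdfs dfsadmin -setSpaceQuota 10t /data/groups/analytics     # max raw space consumed (10 TB)
hdfs dfs -count -q /data/groups/analytics                   # verify quota and current usage

# Rebalance the cluster after adding new DataNodes
hdfs balancer -threshold 10    # move blocks until each node is within 10% of average utilization
```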
Environment: MapReduce, SmartSense, Knox, MySQL Plus, HDFS, Ranger, Pig, Hive, HBase, Flume, Sqoop, YARN, Kafka.
Confidential - Atlanta, GA
Hadoop Admin/ Linux Administrator
Responsibilities:
- Installation and configuration of Linux for new build environment.
- Day-to-day user access and permissions; installing and maintaining Linux servers.
- Created volume groups, logical volumes, and partitions on the Linux servers, and mounted file systems.
- Experienced in installation and configuration of Cloudera CDH4 in the testing environment.
- Resolved tickets submitted by users and P1 issues, troubleshooting and resolving errors.
- Balancing HDFS manually to decrease network utilization and increase job performance.
- Responsible for building scalable distributed data solutions using Hadoop.
- Performed major and minor upgrades to the Hadoop cluster.
- Upgraded the Cloudera Hadoop ecosystems in the cluster using Cloudera distribution packages.
- Used Sqoop to import and export data between HDFS and RDBMS in both directions.
- Performed stress and performance testing and benchmarking for the cluster.
- Commissioned and decommissioned DataNodes in the cluster in case of problems.
- Debugged and solved major issues with Cloudera Manager by interacting with the Cloudera team.
- Installed CentOS using Preboot Execution Environment (PXE) boot and the Kickstart method on multiple servers, including remote installation of Linux using PXE boot.
- Monitored system activity, performance, and resource utilization.
- Develop and optimize physical design of MySQL database systems.
- Deep understanding of monitoring and troubleshooting mission critical Linux machines.
- Responsible for maintaining RAID groups and LUN assignments as per agreed design documents.
- Extensive use of LVM, creating volume groups and logical volumes.
- Performed Red Hat Package Manager (RPM) and YUM package installations, patch and other server management.
- Tested and performed enterprise-wide installation, configuration, and support for Hadoop using the MapR distribution.
- Set up the cluster and installed all the ecosystem components through MapR and manually through the command line in the lab cluster.
- Set up automated processes to archive and clean unwanted data on the cluster, in particular on the NameNode and Secondary NameNode (see the cleanup sketch after this list).
- Involved in estimation and setup of the Hadoop cluster on Linux.
- Prepared PIG scripts to validate Time Series Rollup Algorithm.
- Responsible for support and troubleshooting of MapReduce and Pig jobs, and for maintaining incremental loads on a daily, weekly, and monthly basis.
- Implemented Oozie workflows for Map Reduce, Hive and Sqoop actions.
- Channeled MapReduce outputs based on requirements using partitioners.
- Performed scheduled backup and necessary restoration.
- Built and maintained scalable data solutions using the Hadoop ecosystem and other open-source components such as Hive and HBase.
- Monitored the data streaming between web sources and HDFS.
- Closely monitored and analyzed MapReduce job executions on the cluster at the task level.
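A minimal sketch of the kind of automated HDFS cleanup job described above; the staging path, retention window, and script location are placeholder assumptions:

```bash
#!/usr/bin/env bash
# Remove HDFS staging data older than a retention window (placeholder values).
RETENTION_DAYS=30
STAGING_DIR=/tmp/hadoop-staging
CUTOFF=$(date -d "-$RETENTION_DAYS days" +%Y-%m-%d)

# hdfs dfs -ls prints: perms repl owner group size date time path
hdfs dfs -ls "$STAGING_DIR" | awk '{print $6, $8}' | while read -r day path; do
  if [[ -n "$path" && "$day" < "$CUTOFF" ]]; then
    hdfs dfs -rm -r -skipTrash "$path"
  fi
done
```

A crontab entry such as 0 2 * * * /usr/local/bin/hdfs_cleanup.sh (hypothetical path) would run it nightly.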
Environment: YUM, RAID, MySQL 5.1.4, PHP, shell scripting, MySQL Workbench, Linux 5.0/5.1, Flume, Oozie, Pig, Sqoop, MongoDB, HBase, Hive, MapReduce, YARN, Cloudera Manager, NoSQL, Oracle 9i/10g/11g RAC with Solaris/Red Hat, Oracle Enterprise Manager (OEM), ZooKeeper, Windows 2000/2003, Unix, Linux, Java, RMAN, GoldenGate, Cassandra, EM Cloud Control, Apache Hadoop, Toad
Confidential
Systems Engineer - DBA
Responsibilities:
- Worked on SQL Server 2005 and 2008 R2 database administration.
- Managed the migration of SQL Server 2005 databases to SQL Server 2008.
- Created stored procedures, functions, and triggers for retrieval and update of data in the database.
- Extensive experience in MS SQL Server 2008/2005/2000 , BI tools like SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), and SQL Server Analysis Services (SSAS).
- Migrated DTS packages into SSIS packages.
- Managed backup, recovery, and DR practices for the environment.
- Created data transformation tasks such as BCP and BULK INSERT to import/export data from clients.
- Involved in source data analysis and designing mappings for data extraction; also responsible for design and development of SSIS packages to load data from various databases and files.
- Experience in implementing Key Performance Indicator (KPI) objects in SSAS and creating calculated members in MOLAP cubes using MDX in SSAS.
- Developed and optimized database structures, stored procedures, Dynamic Management views, DDL triggers and user-defined functions.
- Experience in implementing and maintaining new T-SQL features added in SQL Server 2005, such as data partitioning, error handling through the TRY-CATCH statement, and Common Table Expressions (CTEs).
- Used Data Partitioning, Snapshot Isolation in SQL Server 2005.
- Expertise and Interest include Administration, Database Design, Performance Analysis, and Production Support for Large (VLDB) and Complex Databases up to 2.5 Terabytes.
- Managed the clustering environment.
- Worked on file system management, monitoring, and capacity planning in Big Data.
- Moved large data sets from HDFS to MySQL databases and vice versa using Sqoop.
- Implemented NFS, NAS, FTP and HTTP servers on Linux servers.
- Installed CentOS using Preboot Execution Environment (PXE) boot and the Kickstart method on multiple servers, including remote installation of Linux using PXE boot.
- Created a local YUM repository for installing and updating packages.
- Migrated data from one cluster to another using DistCp, and automated the dumping procedure using shell scripts (see the sketch after this list).
- Involved in designing, capacity planning, cluster setup, performance fine-tuning, monitoring, structure planning, scaling, and administration.
- Involved in implementing and integrating various NoSQL databases such as HBase and Cassandra.
- Implemented automatic failover with ZooKeeper and the ZooKeeper Failover Controller.
- Configured Oozie for workflow automation and coordination.
- Analyzed long-running slow queries and tuned them to optimize application and system performance.
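A minimal sketch of the DistCp cluster-to-cluster copy mentioned above; the NameNode hostnames, paths, and mapper count are placeholder values:

```bash
# -m sets the number of map tasks; -update/-delete keep the target in sync with the source
hadoop distcp -m 20 -update -delete \
  hdfs://source-nn.example.com:8020/data/archive \
  hdfs://target-nn.example.com:8020/data/archive
```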
Environment: ETL tools, ASP.NET, VB.NET, EMC storage, MS SQL Server 2008/2005/2000, SSRS, SSAS, SSIS, T-SQL, Windows 2003/2000 Advanced Server, Unix, Visual Studio 2010, C#.NET 2005.