
Big Data Engineer - Kafka Administration Resume


Englewood, CO

SUMMARY

  • 8+ years of IT Operations experience, with 3+ years in Hadoop development and administration and 2+ years in Linux-based systems.
  • Excellent understanding of Distributed Systems and Parallel Processing architecture.
  • Worked on components such as HDFS, MapReduce, JobTracker, TaskTracker, Sqoop, ZooKeeper, YARN, Oozie, Hive, Hue, Flume, HBase, Fair Scheduler, Spark, SmartSense, and Kafka.
  • Experience in managing Cloudera and Hortonworks distributions.
  • Strong knowledge of Hadoop HDFS architecture and the MapReduce framework.
  • Secured the Hadoop cluster against unauthorized access with Kerberos, LDAP integration, and TLS for data transfer among the cluster nodes.
  • Expert-level experience architecting, building, and maintaining enterprise-grade Hadoop clusters.
  • Set up a secure Kafka cluster and tested Kerberos-based authentication alongside Kafka ACLs, including IP-based ACL rules.
  • Implemented a Kafka cluster and secured it with Kerberos and ACLs.
  • Worked on Spark transformations, RDD operations, and DataFrames, and validated the Spark plug-in for the Avro data format.
  • Successfully deployed Elasticsearch 5.3.0 and Kibana with Ansible in the production cluster.
  • Involved in vendor selection and capacity planning for the Hadoop cluster in production.
  • Experience in administering Linux systems to deploy Hadoop clusters and monitoring the clusters using Nagios and Ganglia.
  • Experience in performing backup, recovery, failover and DR practices on multiple platforms.
  • Implemented Kerberos for authenticating all the services in the Hadoop cluster.
  • Experience with automation for provisioning system resources using Puppet.
  • Strong knowledge of configuring NameNode High Availability and NameNode Federation.
  • Experienced in writing automated scripts for monitoring file systems (a minimal monitoring sketch follows this summary).
  • Implemented Hadoop-based solutions to store archives and backups from multiple sources.
  • Familiar with importing and exporting data using Sqoop from RDBMS sources such as MySQL, Oracle, and Teradata, including fast loaders and connectors.
  • Worked with architects on Hadoop hardware and software design.
  • Configured and implemented Amazon AWS services.
  • Experience in deploying Hadoop clusters on public and private cloud environments such as Amazon AWS and OpenStack.
  • Built an ingestion framework using Flume for streaming logs and aggregating the data into HDFS.
  • Worked with the application team via Scrum to provide operational support and install Hadoop updates, patches, and version upgrades as required.
  • Expertise in importing and exporting data into HDFS and preprocessing data into commercial analytic RDBMS databases.
  • Experience in installation, upgrading, configuration, monitoring, supporting, and managing Hadoop clusters using Cloudera CDH4 and CDH5 and Hortonworks HDP 2.1, 2.2, 2.3, 2.4, and 2.6 on Ubuntu, Red Hat, and CentOS systems.
  • Experience in installing and monitoring standalone, multi-node Kafka clusters.
  • Performance-tuned Apache Kafka clusters.
  • Successfully loaded files to Hive and HDFS from MongoDB, Cassandra, and HBase.
  • Created a role in the Sentry app through Hue.
  • Exposure to installing Hadoop and its ecosystem components such as Hive and Pig.
  • Experience in systems and network design, physical system consolidation through server and storage virtualization, and remote access solutions.
  • Used Zendesk for support and JIRA for development tracking.
  • Experience in understanding and managing Hadoop log files, and in managing the Hadoop infrastructure with Cloudera Manager.
  • Worked on VCE Vblock servers from the Cisco provider and configured database servers on them.
  • Experience in Data Modeling (Logical and Physical Design of Databases), Normalization and building Referential Integrity Constraints.
  • Worked with highly transactional merchandising and investment SQL databases under PCI and HIPAA compliance, involving data encryption with certificates and security keys at various levels.
  • Experience in upgrading SQL server software to new versions and applying service packs and patches.
  • Actively involved in system performance tuning of SQL queries and stored procedures using SQL Profiler and Database Engine Tuning Advisor.
  • Experience providing 24x7 production support, including weekends, on a rotation basis.
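
As a minimal sketch of the file-system monitoring scripts mentioned above: the 85% threshold, the excluded filesystem types, and the alert mailbox are illustrative assumptions, not values from the original environment.

    #!/usr/bin/env bash
    # Alert when any mounted filesystem crosses the usage threshold.
    THRESHOLD=85
    df -P -x tmpfs -x devtmpfs | awk 'NR > 1 {gsub("%", "", $5); print $6, $5}' |
    while read -r mount used; do
        if [ "$used" -ge "$THRESHOLD" ]; then
            echo "WARNING: ${mount} is at ${used}% capacity on $(hostname)" |
                mail -s "Disk usage alert: $(hostname)" ops-team@example.com
        fi
    done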

TECHNICAL SKILLS

Hadoop Ecosystem: Hadoop, MapReduce, HDFS, HBase, ZooKeeper, Hive, Pig, Sqoop, Oozie, Kafka, YARN, Flume, SmartSense, Impala, Knox, Ranger, Spark, Ganglia.

Hadoop Distributions: Cloudera, Hortonworks, MapR.

Databases: SQL, Oracle, NoSQL, MySQL, MongoDB, Cassandra, HBase.

Web Servers: IIS 6.0, IIS 7.0, IIS 7.5.

Operating Systems: Windows 2000/2003/2008/2012, Linux (CentOS, Ubuntu, Red Hat).

Programming Languages: Java, JavaScript, C, C++, SQL, T-SQL, PL/SQL.

Scripting: PowerShell 3.0/2.0, UNIX shell scripting, Python.

ETL Tools: DTS, SSIS, Informatica, Sqoop.

Tools: SCOM, NetMon, SMB, SFTP, SQL Sentry.

PROFESSIONAL EXPERIENCE

Confidential - Englewood, CO

Big Data Engineer-Kafka Administration

Responsibilities:

  • Designed and implemented topic configuration in the new Kafka cluster in all environments.
  • Successfully secured the Kafka cluster with Kerberos.
  • Implemented Kafka security features using SSL without Kerberos; then, for finer-grained security, set up Kerberos with users and groups to enable more advanced security features.
  • Designed and architected a new Hadoop cluster.
  • Implemented AWS and Azure-Omni for the Couchbase load.
  • Developed and deployed custom Hadoop applications for data analysis, data storage, and processing in Amazon EMR.
  • Deployed a Spark cluster and other services in AWS using the console.
  • Installed a Kerberos-secured Kafka cluster with no encryption on Dev and Prod, and set up Kafka ACLs on it (see the ACL sketch after this list).
  • Set up a no-authentication Kafka listener in parallel with the Kerberos (SASL) listener, and tested a non-authenticated (anonymous) user in parallel with a Kerberos user.
  • Integrated LDAP configuration, including LDAP for securing Ambari servers and managing authorization and permissions for users and groups.
  • Implemented Knox, Ranger, Spark, and SmartSense in the Hadoop cluster.
  • Installed HDP 2.6 in all environments.
  • Installed Ranger in all environments as a second level of security on the Kafka brokers.
  • Involved in the data ingestion process to the production cluster.
  • Worked on the Oozie job scheduler.
  • Worked on Spark transformations, RDD operations, and DataFrames, and validated the Spark plug-in for the Avro data format (receiving gzip-compressed data and producing Avro data into HDFS files).
  • Installed Docker for utilizing ELK, InfluxDB, and Kerberos.
  • Created a database in InfluxDB, worked on the interface created for Kafka, and checked the measurements in the databases.
  • Created a Bash script with Awk-formatted text to send metrics to InfluxDB (see the lag-collection sketch after this list).
  • Enabled InfluxDB and configured the InfluxDB data source in the Grafana interface.
  • Deployed Elasticsearch 5.3.0 and InfluxDB 1.2 on the Prod machine in Docker containers.
  • Created a cron job that executes a program to start the ingestion process; the data is read in, converted to Avro, and written to HDFS files.
  • Designed the data-flow ingestion chart process.
  • Set up a new Grafana dashboard with real-time consumer lag for the Dev and PP clusters, pulling only the consumer lag metric and sending it to InfluxDB via a script in crontab.
  • Worked on database schema DDL and Oracle schema issues at the time of the Ambari upgrade.
  • Successfully upgraded HDP 2.5 to 2.6 in all environments, including software patches and upgrades.
  • Worked on the Kafka backup index and Log4j appenders to minimize logs, and pointed Ambari server logs to NAS storage.
  • Tested all services such as Hadoop, ZooKeeper, Spark, HiveServer, and Hive Metastore.
  • Worked on SNMP trap issues in the production cluster.
  • Worked on heap optimization and changed some of the configurations for hardware optimization.
  • Worked on production Ambari Views.
  • Implemented rack awareness in the production environment.
  • Worked on disk space issues in the production environment by monitoring how fast space filled up and reviewing what was being logged, and created a long-term fix (minimizing Info, Debug, Fatal, and audit logs).
  • Worked on the Nagios monitoring tool.
  • Installed Kafka Manager for consumer lag and Kafka metrics monitoring; it was also used for adding topics, partitions, etc.
  • Worked with the Hortonworks support team on Grafana consumer lag issues (currently no consumer lag is generated in the Grafana visualization within HDP).
  • Successfully generated consumer group lag from Kafka using its API.
  • Installed and configured Ambari Log Search; under the hood it requires a Solr instance that collects and indexes all cluster-generated logs in real time and displays them in one interface.
  • Installed Ansible 2.3.0 in the production environment.
  • Maintained the Elasticsearch cluster by adding more partitioned disks; this increases disk write throughput and lets Elasticsearch write to multiple disks at the same time, while a segment of a given shard is written to the same disk.
  • Upgraded Elasticsearch from 5.3.0 to 5.3.2 following the rolling upgrade process and using Ansible to deploy new packages in the Prod cluster.
  • Successfully built visualizations in Kibana.
  • Deployed Kibana with Ansible and connected it to the Elasticsearch cluster; tested Kibana and ELK by creating a test index and injecting sample data into it.
  • Successfully tested Kafka ACLs with anonymous users and with different hostnames.
  • Created HBase tables to store variable formats of data coming from different applications.
  • Worked on production support issues.
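
The ACL work above can be illustrated with the stock kafka-acls.sh tool. This is a minimal sketch: the principal, topic, consumer group, client IP, and ZooKeeper address are illustrative assumptions, not values from the original cluster.

    # Grant a principal read access to a topic and its consumer group,
    # restricted to one client IP (IP-based ACL).
    kafka-acls.sh --authorizer-properties zookeeper.connect=zk1.example.com:2181 \
      --add --allow-principal User:app_user --allow-host 10.20.30.40 \
      --operation Read --operation Describe \
      --topic ingest-events --group ingest-consumers

    # List the ACLs on the topic to confirm the rule took effect.
    kafka-acls.sh --authorizer-properties zookeeper.connect=zk1.example.com:2181 \
      --list --topic ingest-events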
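
The crontab-driven lag collection can be sketched as below, assuming the consumer group, broker address, InfluxDB URL and database, and the output column layout of kafka-consumer-groups.sh (which varies by Kafka release).

    #!/usr/bin/env bash
    # Pull consumer lag and push it to InfluxDB over the HTTP line protocol.
    BROKERS="broker1.example.com:9092"
    GROUP="ingest-consumers"
    INFLUX_URL="http://influxdb.example.com:8086/write?db=kafka_metrics"

    # Assumes columns TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG ...;
    # adjust the awk field numbers for your Kafka version.
    kafka-consumer-groups.sh --bootstrap-server "$BROKERS" --describe --group "$GROUP" |
      awk -v grp="$GROUP" '$5 ~ /^[0-9]+$/ {
          printf "consumer_lag,group=%s,topic=%s,partition=%s value=%s\n", grp, $1, $2, $5
      }' |
      curl -s -XPOST "$INFLUX_URL" --data-binary @-

    # Example crontab entry to collect lag every minute:
    # * * * * * /opt/scripts/kafka_lag_to_influx.sh >> /var/log/kafka_lag.log 2>&1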

Environment: Kafka brokers, Kafka security, Kerberos, ACLs, Elasticsearch, Kibana, Ambari Log Search, Nagios, Kafka Manager, Grafana, YARN, Spark, Ranger.

Confidential - Richmond, VA

BigData Engineer -Hadoop Administrator

Responsibilities:

  • Responsible for implementation and support of the enterprise Hadoop environment.
  • Installation, configuration, and maintenance of the Hadoop cluster; cluster monitoring, troubleshooting, and certifying environments for production readiness.
  • Experience in implementing Hadoop cluster capacity planning.
  • Involved in the installation of CDH5 and the upgrade from CDH4 to CDH5.
  • Upgraded Cloudera Manager from version 5.3 to 5.5.
  • Responsible for onboarding new users to the Hadoop cluster (adding a user home directory and providing access to the datasets).
  • Helped the users with production deployments throughout the process.
  • Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes; communicated and escalated issues appropriately.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Continuously monitored and managed the Hadoop cluster through Ganglia and Nagios.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability (see the Oozie sketch after this list).
  • Involved in Storm batch-mode processing over massive data sets, which is analogous to a Hadoop job that runs as a batch process over a fixed data set.
  • Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data into HDFS for analysis (see the Sqoop sketch after this list).
  • Worked with Storm topologies and created a topology that runs continuously over a stream of incoming data.
  • Integrated Hadoop with Active Directory and enabled Kerberos for authentication.
  • Upgraded the Cloudera Hadoop ecosystems in the cluster using Cloudera distribution packages.
  • Performed stress and performance testing and benchmarking for the cluster.
  • Commissioned and decommissioned DataNodes in the cluster as needed.
  • Debugged and solved major issues with Cloudera Manager by interacting with the Cloudera team.
  • Monitored system activity, performance, and resource utilization.
  • Deep understanding of monitoring and troubleshooting mission critical Linux machines.
  • Used Kafka for building real-time data pipelines between clusters.
  • Ran log aggregation, website activity tracking, and commit logs for distributed systems using Apache Kafka.
  • Designed and implemented Amazon Web Services; as a passionate advocate of AWS within Gracenote, helped migrate from a physical data center environment to AWS.
  • Focused on high availability, fault tolerance, and auto-scaling.
  • Managed critical bundles and patches on the production servers after successfully navigating the testing phase in the test environments.
  • Managed disk file systems, server performance, user creation, file access permissions, and RAID configurations.
  • Integrated Apache Kafka for data ingestion.
  • Configured Domain Name System (DNS) for hostname to IP resolution.
  • Involved in data migration from Oracle database to MongoDB.
  • Queried and analyzed data from Cassandra for quick searching, sorting, and grouping.
  • Moved all log files generated from various sources to HDFS through Flume for further processing.
  • Prepared operational testing scripts for log checks, backup and recovery, and failover.
  • Troubleshot and fixed issues at the user, system, and network levels using various tools and utilities.
  • Performed all system administration tasks such as cron jobs, installing packages, and patching.
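
A minimal Oozie sketch of submitting and checking a time- or data-triggered job; the Oozie server URL and job.properties path are illustrative assumptions.

    # Submit and start the workflow/coordinator defined by job.properties.
    oozie job -oozie http://oozie-host.example.com:11000/oozie \
      -config /home/hadoop/jobs/daily-etl/job.properties -run

    # Check the status of a submitted job by its ID.
    oozie job -oozie http://oozie-host.example.com:11000/oozie -info <job-id>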
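
A minimal Sqoop sketch of the ingestion step in the pipeline above; the JDBC URL, credentials, table, and target directory are illustrative assumptions.

    # Import a MySQL table into HDFS as Avro, prompting for the password.
    sqoop import \
      --connect jdbc:mysql://db1.example.com:3306/customers \
      --username etl_user -P \
      --table customer_events \
      --target-dir /data/raw/customer_events \
      --num-mappers 4 \
      --as-avrodatafile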

Environment: YUM, RAID, MySQL 5.1.4, PHP, shell scripting, MySQL Workbench, Linux 5.0/5.1, Flume, Oozie, Pig, Sqoop, MongoDB, HBase, Hive, MapReduce, YARN, Cloudera Manager, NoSQL, Oracle 9i/10g/11g RAC with Solaris/Red Hat, Oracle Enterprise Manager (OEM), ZooKeeper, Windows 2000/2003, Unix, Linux, Java, RMAN, GoldenGate, Cassandra, EM Cloud Control, Apache Hadoop, Toad.

Confidential - Atlanta, GA

BigData Operations Engineer - Consultant

Responsibilities:

  • Cluster administration, releases, and upgrades; managed multiple Hadoop clusters with the highest capacity of 7 PB (400+ nodes) with PAM enabled, working on the Hortonworks distribution.
  • Responsible for implementation and ongoing administration of Hadoop infrastructure.
  • Used the Hadoop cluster as a staging environment for data from heterogeneous sources in the data import process.
  • Configured NameNode High Availability for the Hadoop cluster as part of the disaster recovery roadmap.
  • Configured Ganglia and Nagios to monitor the cluster and was on call with the EOC for support.
  • Worked on cloud architecture.
  • Performed both major and minor upgrades to the existing cluster, as well as rollbacks to the previous version.
  • Commissioned and decommissioned DataNodes, killed unresponsive TaskTrackers, and dealt with blacklisted TaskTrackers.
  • Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
  • Maintained, audited, and built new clusters for testing purposes using Ambari on Hortonworks.
  • Designed and allocated HDFS quotas for multiple groups (see the quota and rebalancing sketch after this list).
  • Configured Flume for efficiently collecting, aggregating, and moving large amounts of log data from many different sources to HDFS.
  • Manually upgraded from HDP 2.2 to HDP 2.3, including software patches and upgrades.
  • Scripting Hadoop package installation and configuration to support fully automated deployments.
  • Configured rack awareness on HDP.
  • Added new nodes to an existing cluster and recovered from a NameNode failure.
  • Instrumental in building scalable distributed data solutions using the Hadoop ecosystem.
  • Added new DataNodes when needed and rebalanced the cluster.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Worked on database backup and recovery, database connectivity, and security.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Reviewed cluster utilization based on the running statistics of Map and Reduce tasks.
  • Changed cluster configuration properties based on the volume of data being processed and the performance of the cluster.
  • Provided inputs to development regarding efficient utilization of resources such as memory and CPU.
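
A minimal sketch of the HDFS quota and rebalancing commands behind the items above; the directory path, quota limits, and balancer threshold are illustrative assumptions.

    # Cap a group's directory at 1M names and 10 TB of space
    # (the space quota counts post-replication bytes).
    hdfs dfsadmin -setQuota 1000000 /data/teams/analytics
    hdfs dfsadmin -setSpaceQuota 10t /data/teams/analytics
    hdfs dfs -count -q /data/teams/analytics   # verify the quotas

    # After adding new DataNodes, rebalance until no node deviates from the
    # cluster-average utilization by more than 5%.
    hdfs balancer -threshold 5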

Environment: MapReduce, SmartSense, Knox, MySQL Plus, HDFS, Ranger, Pig, Hive, HBase, Flume, Sqoop, YARN, Kafka.

Confidential - Atlanta, GA

Hadoop Admin/ Linux Administrator

Responsibilities:

  • Installation and configuration of Linux for new build environment.
  • Day-to-day user access and permissions; installing and maintaining Linux servers.
  • Created volume groups, logical volumes, and partitions on the Linux servers and mounted file systems (see the LVM sketch after this list).
  • Experienced in installation and configuration of Cloudera CDH4 in the testing environment.
  • Resolved tickets submitted by users and P1 issues, troubleshooting and resolving the errors.
  • Balanced HDFS manually to decrease network utilization and increase job performance.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Performed major and minor upgrades to the Hadoop cluster.
  • Upgraded the Cloudera Hadoop ecosystems in the cluster using Cloudera distribution packages.
  • Used Sqoop to import and export data between HDFS and RDBMS.
  • Performed stress and performance testing and benchmarking for the cluster.
  • Commissioned and decommissioned DataNodes in the cluster as needed.
  • Debugged and solved major issues with Cloudera Manager by interacting with the Cloudera team.
  • Installed CentOS using Preboot Execution Environment (PXE) boot and the Kickstart method on multiple servers, including remote installation of Linux using PXE boot.
  • Monitored system activity, performance, and resource utilization.
  • Developed and optimized the physical design of MySQL database systems.
  • Deep understanding of monitoring and troubleshooting mission critical Linux machines.
  • Responsible for maintaining RAID groups and LUN assignments as per agreed design documents.
  • Extensive use of LVM, creating volume groups and logical volumes.
  • Performed Red Hat Package Manager (RPM) and YUM package installations, patch and other server management.
  • Tested and performed enterprise-wide installation, configuration, and support for Hadoop using the MapR distribution.
  • Set up the cluster and installed all the ecosystem components through MapR and manually through the command line in the lab cluster.
  • Set up automated processes to archive and clean unwanted data on the cluster, in particular on the NameNode and Secondary NameNode.
  • Involved in estimation and setup of the Hadoop cluster on Linux.
  • Prepared Pig scripts to validate the time-series rollup algorithm.
  • Responsible for support and troubleshooting of MapReduce and Pig jobs, and for maintaining incremental loads on a daily, weekly, and monthly basis.
  • Implemented Oozie workflows for MapReduce, Hive, and Sqoop actions.
  • Channeled MapReduce outputs as required using partitioners.
  • Performed scheduled backup and necessary restoration.
  • Built and maintained scalable data solutions using the Hadoop ecosystem and other open-source components like Hive and HBase.
  • Monitored data streaming between web sources and HDFS.
  • Closely monitored and analyzed MapReduce job executions on the cluster at the task level.
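
A minimal sketch of the LVM layout work described above; the device names, volume group and logical volume names, size, and mount point are illustrative assumptions.

    pvcreate /dev/sdb /dev/sdc                   # initialize physical volumes
    vgcreate data_vg /dev/sdb /dev/sdc           # create a volume group across them
    lvcreate -L 500G -n hdfs_lv data_vg          # carve out a 500 GB logical volume
    mkfs.ext4 /dev/data_vg/hdfs_lv               # build a filesystem on it
    mkdir -p /data/hdfs1
    mount /dev/data_vg/hdfs_lv /data/hdfs1       # mount it for DataNode storage
    echo '/dev/data_vg/hdfs_lv /data/hdfs1 ext4 defaults,noatime 0 0' >> /etc/fstab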

Environment: YUM, RAID, MySQL 5.1.4, PHP, shell scripting, MySQL Workbench, Linux 5.0/5.1, Flume, Oozie, Pig, Sqoop, MongoDB, HBase, Hive, MapReduce, YARN, Cloudera Manager, NoSQL, Oracle 9i/10g/11g RAC with Solaris/Red Hat, Oracle Enterprise Manager (OEM), ZooKeeper, Windows 2000/2003, Unix, Linux, Java, RMAN, GoldenGate, Cassandra, EM Cloud Control, Apache Hadoop, Toad.

Confidential

Systems Engineer - DBA

Responsibilities:

  • Worked on SQL Server 2005 and 2008 R2 database administration.
  • Managed the migration of SQL Server 2005 databases to SQL Server 2008.
  • Created stored procedures, functions, and triggers for retrieval and update of data in the database.
  • Extensive experience in MS SQL Server 2008/2005/2000 and BI tools such as SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), and SQL Server Analysis Services (SSAS).
  • Migrated DTS packages into SSIS packages.
  • Managed backup, recovery, and DR practices for the environment.
  • Created data transformation tasks such as BCP and BULK INSERT to import/export client data.
  • Involved in source data analysis and designing mappings for data extraction; also responsible for design and development of SSIS packages to load data from various databases and files.
  • Experience in implementing Key Performance Indicator (KPI) objects in SSAS and creating calculated members in MOLAP cubes using MDX in SSAS.
  • Developed and optimized database structures, stored procedures, Dynamic Management views, DDL triggers and user-defined functions.
  • Experience implementing and maintaining new T-SQL features added in SQL Server 2005, such as data partitioning, error handling through TRY-CATCH statements, and Common Table Expressions (CTEs).
  • Used Data Partitioning, Snapshot Isolation in SQL Server 2005.
  • Expertise and Interest include Administration, Database Design, Performance Analysis, and Production Support for Large (VLDB) and Complex Databases up to 2.5 Terabytes.
  • Managed the clustering environment.
  • Worked on file system management, monitoring, and capacity planning in Big Data.
  • Moved large data sets between HDFS and MySQL databases using Sqoop.
  • Implemented NFS, NAS, FTP and HTTP servers on Linux servers.
  • Installed CentOS using Preboot Execution Environment (PXE) boot and the Kickstart method on multiple servers, including remote installation of Linux using PXE boot.
  • Created a local YUM repository for installing and updating packages.
  • Migrated data from one cluster to another using DistCp and automated the dumping procedure using shell scripts (see the DistCp sketch after this list).
  • Involved in design, capacity planning, cluster setup, performance tuning, monitoring, structure planning, scaling, and administration.
  • Involved in implementing and integrating various NoSQL databases such as HBase and Cassandra.
  • Implemented automatic failover with ZooKeeper and the ZooKeeper Failover Controller.
  • Configured Oozie for workflow automation and coordination.
  • Analyzed long-running slow queries and tuned them to optimize application and system performance.
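
A minimal sketch of a shell-scripted DistCp migration like the one above; the NameNode addresses, paths, and log file are illustrative assumptions.

    #!/usr/bin/env bash
    SRC="hdfs://nn-old.example.com:8020/data/warehouse"
    DST="hdfs://nn-new.example.com:8020/data/warehouse"

    # -update copies only changed files; -p preserves attributes such as
    # permissions, block size, and replication.
    if hadoop distcp -update -p "$SRC" "$DST"; then
        echo "$(date) distcp OK" >> /var/log/distcp_migration.log
    else
        echo "$(date) distcp FAILED" >> /var/log/distcp_migration.log
    fi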

Environment: ETL tools, ASP.NET, VB.NET, EMC storage area network, MS SQL Server 2008/2005/2000, SSRS, SSAS, SSIS, T-SQL, Windows 2003/2000 Advanced Server, Unix, Visual Studio 2010, C#.NET 2005.
