Hadoop Kafka Admin Resume
Atlanta, GA
SUMMARY
- 10+ years of expertise in Hadoop, Big Data analytics, and Linux, including architecture, design, installation, configuration, and management of Apache Hadoop clusters and the MapR, Hortonworks, and Cloudera Hadoop distributions.
- Experience in configuring, installing and managing MapR, Hortonworks & Cloudera Distributions.
- Hands-on experience in installing, configuring, monitoring, and using Hadoop components such as MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, Oozie, Apache Spark, and Impala.
- Working experience with large-scale Hadoop environment builds and support, including design, configuration, installation, performance tuning, and monitoring.
- Experience in designing and developing mappings with various Informatica transformations such as Source Qualifier, Expression, Unconnected and Connected Lookup, Router, Filter, Aggregator, Union, Joiner, Sorter, Normalizer, Sequence Generator, Rank, SQL, and Update Strategy.
- Working knowledge of monitoring tools and frameworks such as Splunk, InfluxDB, Prometheus, Sysdig, Datadog, AppDynamics, New Relic, and Nagios.
- Experience in setting up automated monitoring and escalation infrastructure for Hadoop Cluster using Ganglia and Nagios.
- Standardized Splunk forwarder deployment, configuration, and maintenance across a variety of Linux platforms. Also worked on DevOps tools such as Puppet and Git.
- Hands-on experience configuring Hadoop clusters in a professional environment and on Amazon Web Services (AWS) using EC2 instances.
- Experience with the complete Software Development Lifecycle, including design, development, testing, and implementation of moderately to highly complex systems.
- Hands-on experience in installing, configuring, supporting, and managing Hadoop clusters using Apache, Hortonworks, Cloudera, and MapR distributions.
- Extensive experience in installing, configuring, and administering Hadoop clusters for major Hadoop distributions such as CDH5 and HDP.
- Good knowledge of using Apache NiFi to automate data movement between different Hadoop systems.
- Experience configuring Ranger and Knox to secure Hadoop services (Hive, HBase, HDFS, etc.). Experience administering Kafka and Flume streaming using the Cloudera distribution.
- Developed automated Unix shell scripts for performing RUNSTATS, REORG, REBIND, COPY, LOAD, BACKUP, IMPORT, EXPORT, and other database-related activities.
- Experienced with deployments, maintenance and troubleshooting applications on Microsoft Azure Cloud infrastructure.
- Excellent knowledge of NoSQL databases such as HBase and Cassandra.
- Experience with large-scale Hadoop clusters, handling all Hadoop environment builds, including design, cluster setup, and performance tuning.
- Involved in the release process from development to production for Informatica.
- Experience in HBase replication and MapR-DB replication setup between two clusters.
- Implemented release processes such as DevOps and Continuous Delivery methodologies for existing builds and deployments. Experience with scripting languages such as Python, Perl, and shell.
- Modified reports and Talend ETL jobs based on feedback from QA testers and users in development and staging environments.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Deployed Grafana dashboards for monitoring cluster nodes, using Graphite as a data source and collectd as a metric sender.
- Imported and exported data into HDFS and Hive using Sqoop (see the Sqoop sketch after this list).
- Experienced in the workflow scheduling and monitoring tools Rundeck and Control-M.
- Proficient with application servers such as WebSphere, WebLogic, JBoss, and Tomcat.
- Working experience designing and implementing complete end-to-end Hadoop infrastructure.
- Experienced in developing MapReduce programs using Apache Hadoop for working with Big Data.
- Responsible for designing highly scalable big data clusters to support various data storage and computation needs across varied big data clusters: Hadoop, Cassandra, MongoDB, and Elasticsearch.
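A minimal sketch of the Sqoop import/export pattern referenced in the Sqoop bullet above, assuming a MySQL source; the host, database, table, user, and directory names are illustrative placeholders, not the actual systems.

```bash
#!/usr/bin/env bash
# Hypothetical connection details; replace with the real source system.
JDBC_URL="jdbc:mysql://dbhost.example.com:3306/sales"
PASS_FILE="/user/etl/.db_password"   # HDFS file holding the DB password

# Import a table into HDFS and register it as a Hive table.
sqoop import \
  --connect "$JDBC_URL" \
  --username etl_user \
  --password-file "$PASS_FILE" \
  --table orders \
  --target-dir /data/raw/orders \
  --hive-import \
  --hive-table staging.orders \
  --num-mappers 4

# Export aggregated results from HDFS back to the RDBMS.
sqoop export \
  --connect "$JDBC_URL" \
  --username etl_user \
  --password-file "$PASS_FILE" \
  --table order_summary \
  --export-dir /data/curated/order_summary \
  --num-mappers 4
```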
PROFESSIONAL EXPERIENCE
Hadoop Kafka Admin
Confidential - Atlanta, GA
Responsibilities:
- Experience in architecting, designing, installing, configuring, and managing Apache Hadoop clusters and the MapR, Hortonworks, and Cloudera Hadoop distributions.
- Developed data pipelines using Spark, Hive, Pig, Python, Impala, and HBase to ingest customer behavioral data and financial histories into the Hadoop cluster for analysis.
- Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, Hive, and Sqoop.
- Developed Spark jobs and Hive Jobs to summarize and transform data.
- Experienced in developing Spark scripts for data analysis in Python.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Working to implement MapR Streams to facilitate real-time data ingestion and meet business needs.
- Built on-premise data pipelines using Kafka and Spark for real-time data analysis.
- Performed streaming data ingestion with Kafka into the Spark processing environment.
- Implemented complex Hive UDFs to execute business logic within Hive queries.
- Set up security using Kerberos and AD on Hortonworks and Cloudera CDH clusters.
- Responsible for installing, configuring, supporting, and managing Cloudera Hadoop clusters.
- Installed a Kerberos-secured Kafka cluster with no encryption for a POC and set up Kafka ACLs (see the ACL sketch after this list).
- Created a NoSQL solution for a legacy RDBMS using Kafka, Spark, Solr, and the HBase indexer for ingestion, with Solr and HBase for real-time querying.
- Experience in designing and developing mappings with various Informatica transformations such as Source Qualifier, Expression, Unconnected and Connected Lookup, Router, Filter, Aggregator, Union, Joiner, Sorter, Normalizer, Sequence Generator, Rank, SQL, and Update Strategy.
- Built a prototype for real-time analysis using Spark Streaming and Kafka.
- Experienced in Administration, Installing, Upgrading and Managing distributions of Hadoop clusters with MapR 5.1 on a cluster of 200+ nodes in different environments such as Development, Test and Production (Operational & Analytics) environments.
- Created end-to-end Spark applications using Scala to perform data cleansing, validation, transformation, and summarization activities on user behavioral data.
- Troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Worked extensively on Elasticsearch querying and indexing to retrieve documents at high speed.
- Installed, configured, and maintained several Hadoop clusters including HDFS, YARN, Hive, HBase, Knox, Kafka, Oozie, Ranger, Atlas, Infra Solr, ZooKeeper, and NiFi in Kerberized environments.
- Involved in deploying a Hadoop cluster using Hortonworks Ambari (HDP 2.2) integrated with SiteScope for monitoring and alerting.
- Converted MapReduce programs into Spark transformations using Spark RDDs and Scala.
- Worked on NiFi data pipelines to process large sets of data and configured lookups for data validation and integrity.
- Imported and exported data into HDFS and Hive using Sqoop.
- Worked extensively on building NiFi data pipelines in a Docker container environment during the development phase.
- Implemented Kerberos security in all environments. Defined file system layout and data set permissions.
- Experience in managing Hadoop clusters with IBM BigInsights and the Hortonworks Data Platform.
- Scheduled Oozie workflow engine to run multiple Hive and Pig jobs, which independently run with time and data availability.
- Documented EDL (Enterprise Data Lake) best practices and standards, including data management.
- Performed regular maintenance, commissioning and decommissioning nodes as disk failures occurred, using the MapR File System.
- Worked on installing cluster, commissioning & decommissioning of Data Nodes, Name Node recovery, capacity planning, and slots configuration in MapR Control System (MCS).
- Experience with innovative and, where possible, automated approaches to system administration tasks.
- Experience on Ambari (Hortonworks) for management of Hadoop Ecosystem.
- Used Sqoop to import and export data from HDFS to RDBMS and vice-versa.
- Worked on setting up high availability for the major production cluster and designed automatic failover control using ZooKeeper and quorum journal nodes.
- Implemented release processes such as DevOps and Continuous Delivery methodologies for existing builds and deployments. Experience with scripting languages such as Python, Perl, and shell.
- Designed, developed, and provided ongoing support for data warehouse environments.
- Involved in the release process from development to production for Informatica.
- Worked on Oracle Big Data SQL to integrate big data analysis into existing applications.
- Used Oracle Big Data Appliance for Hadoop and NoSQL processing, and integrated data in Hadoop and NoSQL with data in Oracle Database.
- Worked with different relational database systems such as Oracle (PL/SQL). Used Unix shell scripting and Python, and have experience working on AWS EMR instances.
- Developed applications, which access the database with JDBC to execute queries, prepared statements, and procedures.
- Worked with the DevOps team to clusterize the NiFi pipeline on EC2 nodes, integrated with Spark, Kafka, and Postgres running on other instances using SSL handshakes, in QA and production environments.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java and NiFi flows for data cleaning and preprocessing.
- Experience with Cloudera Navigator and Unravel Data for auditing Hadoop access.
- Performed data blending of Cloudera Impala and Teradata ODBC data sources in Tableau.
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing. Mentored the EQM team in creating Hive queries to test use cases.
- Configured Sqoop JDBC drivers for the respective relational databases; handled controlling parallelism, the distributed cache, and the import process, compression codecs, imports into Hive and HBase, incremental imports, saved jobs and passwords, the free-form query option, and troubleshooting (see the incremental import sketch after this list).
- Created MapR DB tables and involved in loading data into those tables.
- Collected and aggregated large amounts of streaming data into HDFS using Flume: configured multiple agents with Flume sources, sinks, channels, and interceptors, defined channel selectors to multiplex data into different sinks, and tuned log4j properties (see the Flume agent sketch after this list).
- Used NiFi processors to build and deploy end-to-end data processing pipelines and to schedule the workflows.
- Worked on setting up Apache NiFi and performing POC with NiFi in orchestrating a data pipeline.
- Worked extensively on ETL mappings, analysis, and documentation of OLAP reports.
- Responsible for implementation and ongoing administration of MapR 4.0.1 infrastructure.
- Maintained operations, installation, and configuration of a 150+ node cluster with the MapR distribution.
- Monitored the health of the cluster and set up alert scripts for memory usage on the edge nodes (see the alert script sketch after this list).
- Experience in Linux systems administration on production and development servers (Red Hat Linux, CentOS, and other UNIX utilities). Worked on NoSQL databases such as HBase and created Hive tables on top.
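A minimal sketch of the Kafka ACL setup referenced in the Kafka POC bullet above, assuming an older ZooKeeper-backed authorizer (newer clusters would use --bootstrap-server instead); the principal, topic, group, and host names are illustrative assumptions.

```bash
# Grant a producer principal write access to a topic.
kafka-acls.sh --authorizer-properties zookeeper.connect=zk1.example.com:2181 \
  --add --allow-principal User:appproducer \
  --operation Write --operation Describe \
  --topic clickstream

# Grant a consumer principal read access to the topic and its consumer group.
kafka-acls.sh --authorizer-properties zookeeper.connect=zk1.example.com:2181 \
  --add --allow-principal User:appconsumer \
  --operation Read --operation Describe \
  --topic clickstream --group clickstream-readers

# Verify what was applied.
kafka-acls.sh --authorizer-properties zookeeper.connect=zk1.example.com:2181 \
  --list --topic clickstream
```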
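A minimal sketch of a saved Sqoop job with incremental append, as referenced in the Sqoop configuration bullet; the job name, connection string, check column, and paths are illustrative assumptions. Sqoop records the last imported value in its metastore, so the saved job can be re-run from cron or Oozie.

```bash
# Create a saved job that remembers the last imported value of the check column.
sqoop job --create orders_incr -- import \
  --connect jdbc:mysql://dbhost.example.com:3306/sales \
  --username etl_user \
  --password-file /user/etl/.db_password \
  --table orders \
  --target-dir /data/raw/orders \
  --incremental append \
  --check-column order_id \
  --last-value 0 \
  --num-mappers 4

# Run the saved job; Sqoop updates --last-value automatically after each run.
sqoop job --exec orders_incr
```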
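A minimal sketch of a single Flume agent of the kind described in the streaming-collection bullet, assuming a tail-style exec source, a memory channel, and an HDFS sink; the agent name, log path, and HDFS directory are illustrative.

```bash
# Write a minimal agent definition (source -> channel -> sink) and start it.
cat > /etc/flume/conf/weblogs.conf <<'EOF'
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

a1.sources.r1.type     = exec
a1.sources.r1.command  = tail -F /var/log/httpd/access_log
a1.sources.r1.channels = c1

a1.channels.c1.type     = memory
a1.channels.c1.capacity = 10000

a1.sinks.k1.type                   = hdfs
a1.sinks.k1.channel                = c1
a1.sinks.k1.hdfs.path              = /data/raw/weblogs/%Y-%m-%d
a1.sinks.k1.hdfs.fileType          = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
EOF

flume-ng agent --conf /etc/flume/conf \
  --conf-file /etc/flume/conf/weblogs.conf \
  --name a1 -Dflume.root.logger=INFO,console
```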
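A minimal sketch of an edge-node memory alert script of the kind mentioned above; the 90% threshold and the mail recipient are assumptions. A script like this would typically be scheduled from cron every few minutes.

```bash
#!/usr/bin/env bash
# Alert when memory usage on an edge node crosses a threshold (assumed 90%).
THRESHOLD=90
ALERT_TO="hadoop-ops@example.com"   # hypothetical distribution list

# Percentage of physical memory currently in use.
used_pct=$(free | awk '/^Mem:/ {printf "%d", $3/$2 * 100}')

if [ "$used_pct" -ge "$THRESHOLD" ]; then
  msg="$(hostname): memory usage at ${used_pct}% (threshold ${THRESHOLD}%)"
  echo "$msg" | mail -s "Edge node memory alert" "$ALERT_TO"
fi
```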
Hadoop Admin
Confidential - Flint, MI
Responsibilities:
- Installed and configured Hadoop on YARN and other ecosystem components.
- Configured and used HCatalog to access table data maintained in the Hive metastore and to use the same table information for processing in Pig.
- Responsible for cluster maintenance, monitoring, commissioning and decommissioning data nodes, troubleshooting, cluster planning, managing and reviewing data backups, and managing and reviewing log files.
- Worked with the Data Science team to gather requirements for various data mining projects.
- Installed 5 Hadoop clusters for different teams and developed a data lake that serves as a base layer for storage and analytics for developers. Provided services to developers: installed their custom software, upgraded Hadoop components, resolved their issues, and helped them troubleshoot long-running jobs. Served as L3 and L4 support for the data lake and also managed clusters for other teams.
- Built automation frameworks for data ingestion and processing in Python and Scala with NoSQL and SQL databases, using Chef, Puppet, Kibana, Elasticsearch, Tableau, GoCD, and Red Hat infrastructure for data ingestion, processing, and storage.
- Worked as a mix of DevOps engineer and Hadoop admin: handled L3 issues, installed new components as requirements came in, automated as much as possible, and implemented a CI/CD model.
- Involved in implementing security on the Cloudera Hadoop cluster using Kerberos, working with the operations team to move the non-secured cluster to a secured cluster.
- Responsible for upgrading Cloudera CDH5 and MapReduce 2.0 with YARN in a multi-node clustered environment. Handled importing of data from various data sources, performed transformations using Hive, MapReduce, and Spark, and loaded data into HDFS.
- Hadoop security setup using MIT Kerberos, AD integration (LDAP) and Sentry authorization.
- Migrated services from a managed hosting environment to AWS including: service design, network layout, data migration, automation, monitoring, deployments and cutover, documentation, overall plan, cost analysis, and timeline.
- Managed Amazon Web Services (AWS) infrastructure with automation and configuration management tools such as Chef, Ansible, Puppet, or custom-built tooling; experience designing cloud-hosted solutions with the specific AWS product suite.
- Configured ZooKeeper to implement node coordination in clustering support.
- Loaded log data into HDFS using Flume and Kafka and performed ETL integrations.
- Configured Kafka for efficiently collecting, aggregating, and moving large amounts of clickstream data from many different sources to MapR-FS.
- Developed a data pipeline using Kafka and Storm to store data into HDFS.
- Performed a Major upgrade in production environment from CDH4 to CDH5.
- As an admin, followed standard backup policies to ensure high availability of the cluster.
- Monitored multiple Hadoop cluster environments using Ganglia and Nagios. Monitored workload, job performance, and capacity planning using Cloudera Manager. Installed and configured Hortonworks and Cloudera distributions on single-node clusters for POCs.
- Used Trifacta for data cleansing and data wrangling.
- Wrote MapReduce job using Java API for data Analysis.
- Developed Python, shell/Perl, and PowerShell scripts for automation purposes.
- Implemented a Continuous Delivery framework using Jenkins, Puppet, Maven, and Nexus in a Linux environment. Integrated Maven/Nexus, Jenkins, and UrbanCode Deploy with Patterns/Release, Git, Confluence, Jira, and Cloud Foundry.
- Involved in running Hadoop jobs to process millions of records of text data. Troubleshot build issues during the Jenkins build process. Implemented Docker to create containers for Tomcat servers and Jenkins (see the Docker sketch after this list).
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Used ServiceNow and JIRA to track issues; managed and reviewed log files as part of administration for troubleshooting purposes, meeting SLAs on time.
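A minimal sketch of the Docker container setup for Tomcat and Jenkins referenced above; image tags, container names, host ports, and volume names are illustrative assumptions.

```bash
# Run a Tomcat container for application deployments,
# mounting a host directory as the webapps deployment folder.
docker run -d --name app-tomcat \
  -p 8080:8080 \
  -v /opt/webapps:/usr/local/tomcat/webapps \
  tomcat:9

# Run a Jenkins controller with a named volume so jobs and plugins persist;
# 8081 avoids clashing with Tomcat on the host, 50000 is the agent port.
docker run -d --name ci-jenkins \
  -p 8081:8080 -p 50000:50000 \
  -v jenkins_home:/var/jenkins_home \
  jenkins/jenkins:lts

docker ps   # verify both containers are up
```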
Hadoop Admin
Confidential - Heathrow, FL
Responsibilities:
- Worked on setting up Hadoop cluster for the Production Environment.
- Supported 200+ servers and 50+ users on the Hadoop platform, resolved tickets and issues they ran into, and provided training to users to keep Hadoop usage simple and keep them up to date on best practices.
- Responsible for implementation and ongoing administration of Hadoop infrastructure.
- Installed, configured, and deployed a 50-node MapR Hadoop cluster for development and production.
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Responsible for Cluster maintenance, commissioning and decommissioning Data nodes, Cluster Monitoring, Troubleshooting, Manage and review data backups, Manage & review Hadoop log files.
- Involved in implementing security on the Hortonworks Hadoop cluster using Kerberos, working with the operations team to move the non-secured cluster to a secured cluster.
- Configured, installed, and monitored MapR Hadoop on 10 AWS EC2 instances, and configured MapR on Amazon EMR with AWS S3 as the default file system for the cluster.
- Experience in installation, configuration, supporting and monitoring Hadoop clusters using Apache, Cloudera distributions and AWS.
- Monitoring systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
- Aligning with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
- Worked with data delivery teams to set up new Hadoop users, including setting up Linux users, setting up Kerberos principals, and testing HDFS, Hive, Pig, and MapReduce access for the new users (see the onboarding sketch after this list).
- Used Informatica Power Center to create mappings, mapplets, User defined functions, workflows, worklets, sessions and tasks.
- Addressed data quality using the Informatica Data Quality (IDQ) tool.
- Used Informatica Data Explorer (IDE) to find hidden data problems.
- Utilized Informatica Data Explorer (IDE) to analyze legacy data for data profiling.
- Development of Informatica mappings and workflows using Informatica 7.1.1.
- Worked on identifying and eliminating duplicates in datasets through IDQ 8.6.1 components.
- Optimized the full-text search function by connecting MongoDB and Elasticsearch (see the search query sketch after this list).
- Utilized the AWS framework for content storage and Elasticsearch for document search.
- Developed a framework for automated testing of Elasticsearch index validation (Java, MySQL).
- Created User defined types to store specialized data structures in Cloudera.
- Wrote a technical paper and created slideshow outlining the project and showing how Cloudera can be potentially used to improve performance.
- Set up monitoring tools for Hadoop monitoring and alerting. Monitored and maintained the Hadoop/HBase/ZooKeeper cluster.
- Wrote scripts to automate application deployments and configurations. Performed Hadoop cluster performance tuning and monitoring. Troubleshot and resolved Hadoop cluster related system problems.
- As an admin, followed standard backup policies to ensure high availability of the cluster.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action. Documented system processes and procedures for future reference.
- Performance tuning of Hadoop clusters and Hadoop MapReduce routines.
- Screened Hadoop cluster job performance and performed capacity planning.
- Monitored Hadoop cluster connectivity and security, and was involved in managing and monitoring Hadoop log files.
- Assembled Puppet Master, Agent and Database servers on Red Hat Enterprise Linux Platforms.
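A minimal sketch of onboarding a new Hadoop user as described in the user-setup bullet above, assuming an MIT KDC reachable via kadmin.local; the user name, realm, and HiveServer2 host are hypothetical.

```bash
#!/usr/bin/env bash
NEWUSER=jdoe            # hypothetical new user
REALM=EXAMPLE.COM       # assumed Kerberos realm

# 1. Local Linux account on the gateway/edge node.
useradd -m "$NEWUSER"

# 2. Kerberos principal and keytab (MIT KDC assumed).
kadmin.local -q "addprinc -randkey ${NEWUSER}@${REALM}"
kadmin.local -q "xst -k /home/${NEWUSER}/${NEWUSER}.keytab ${NEWUSER}@${REALM}"
chown "${NEWUSER}:" "/home/${NEWUSER}/${NEWUSER}.keytab"

# 3. HDFS home directory owned by the new user.
sudo -u hdfs hdfs dfs -mkdir -p "/user/${NEWUSER}"
sudo -u hdfs hdfs dfs -chown "${NEWUSER}:${NEWUSER}" "/user/${NEWUSER}"

# 4. Smoke-test HDFS and Hive access as the new user.
sudo -u "$NEWUSER" kinit -kt "/home/${NEWUSER}/${NEWUSER}.keytab" "${NEWUSER}@${REALM}"
sudo -u "$NEWUSER" hdfs dfs -ls "/user/${NEWUSER}"
sudo -u "$NEWUSER" beeline \
  -u "jdbc:hive2://hiveserver.example.com:10000/default;principal=hive/_HOST@${REALM}" \
  -e "show databases;"
```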
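A minimal sketch of the Elasticsearch full-text query pattern referenced above, assuming a node on localhost:9200 and a hypothetical "documents" index with a "content" field.

```bash
# Full-text match query against a hypothetical "documents" index.
curl -s -X GET 'http://localhost:9200/documents/_search' \
  -H 'Content-Type: application/json' \
  -d '{
        "query": { "match": { "content": "hadoop cluster tuning" } },
        "size": 10
      }'

# Quick index health/size check used while validating the index.
curl -s 'http://localhost:9200/_cat/indices/documents?v'
```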
Hadoop Admin
Confidential - Bloomington, IL
Responsibilities:
- Worked on analyzing, writing Hadoop MapReduce jobs using API, Pig and Hive.
- Gathered the business requirements from the Business Partners and Subject Matter Experts.
- Involved in installing Hadoop Ecosystem components.
- Responsible to manage data coming from different sources.
- Supported MapReduce programs running on the cluster.
- Wrote MapReduce jobs using the Java API for data analysis and dimension/fact generation.
- Installed and configured Pig and wrote Pig Latin scripts.
- Wrote MapReduce jobs using Pig Latin.
- Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Developed Java MapReduce programs on mainframe data to transform it into a structured format.
- Performed data analysis in Hive by creating tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs (see the Hive sketch after this list).
- Developed optimal strategies for distributing the mainframe data over the cluster. Importing and exporting the stored mainframe data into HDFS and Hive.
- Implemented Hive generic UDFs to incorporate business logic into Hive queries.
- Used the HBase API to store data from Hive tables into HBase tables.
- Wrote Hive queries joining multiple tables based on business requirements.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Involved in building applications using Maven and integrating with CI servers like Jenkins to build jobs.
- Conducted POC for Hadoop and Spark as part of NextGen platform implementation.
- Used Storm to analyze large amounts of non-unique data points with low latency and high throughput.
- Weekly meetings with technical collaborators and active participation in code review sessions with senior and junior developers.
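A minimal sketch of the Hive create/load/query flow described in the data-analysis bullet above, run through the hive CLI; the database, table, and HDFS path names are illustrative assumptions.

```bash
hive -e "
CREATE DATABASE IF NOT EXISTS analytics;

CREATE TABLE IF NOT EXISTS analytics.page_views (
  view_time STRING,
  user_id   STRING,
  page_url  STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- Load raw files already landed in HDFS into the table.
LOAD DATA INPATH '/data/raw/page_views' INTO TABLE analytics.page_views;

-- Aggregation that Hive executes as a MapReduce job.
SELECT page_url, COUNT(*) AS views
FROM analytics.page_views
GROUP BY page_url
ORDER BY views DESC
LIMIT 20;
"
```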
Hadoop Admin
Confidential
Responsibilities:
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Installed and Configured Sqoop to import and export the data into MapR-FS, HBase and Hive from Relational databases.
- Administered large MapR Hadoop environment builds and supported cluster setup, performance tuning, and monitoring in an enterprise environment.
- Installed and configured MapR-zookeeper, MapR-cldb, MapR-jobtracker, MapR-tasktracker, MapR-resourcemanager, MapR-nodemanager, MapR-fileserver, and MapR-webserver.
- Installed and configured the Knox gateway to secure Hive through ODBC, WebHCat, and Oozie services.
- Loaded data from relational databases into the MapR-FS filesystem and HBase using Sqoop, and set up MapR metrics with a NoSQL database to log metrics data.
- Close monitoring and analysis of the MapReduce job executions on cluster at task level and optimized Hadoop clusters components to achieve high performance.
- Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data into HDFS for analysis.
- Integrated HDP clusters with Active Directory and enabled Kerberos for Authentication.
- Worked on commissioning and decommissioning of data nodes, NameNode recovery, and capacity planning, and installed the Oozie workflow engine to run multiple Hive and Pig jobs (see the decommissioning sketch after this list).
- Worked on creating the Data Model for HBase from the current Oracle Data model.
- Implemented high availability and automatic failover infrastructure to overcome the single point of failure for the NameNode, utilizing ZooKeeper services.
- Leveraged Chef to manage and maintain builds in various environments and planned for hardware and software installation on production cluster and communicated with multiple teams to get it done.
- Monitoring the Hadoop cluster functioning through MCS and worked on NoSQL databases including HBase.
- Used Hive and created Hive tables and involved in data loading and writing Hive UDFs and worked with Linux server admin team in administering the server hardware and operating system.
- Worked closely with data analysts to construct creative solutions for their analysis tasks and managed and reviewed Hadoop and HBase log files.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports and worked on importing and exporting data from Oracle into HDFS and HIVE using Sqoop.
- Collaborating with application teams to install operating system and Hadoop updates, patches, version upgrades when required.
- Automated workflows using shell scripts to pull data from various databases into Hadoop.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
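A minimal sketch of the DataNode decommissioning flow mentioned above for a vanilla HDFS cluster; the exclude-file path and hostname are assumptions, and a MapR cluster would use MCS/maprcli instead.

```bash
# 1. Add the node to the HDFS exclude file referenced by dfs.hosts.exclude.
echo "worker42.example.com" >> /etc/hadoop/conf/dfs.exclude

# 2. Tell the NameNode to re-read the include/exclude lists.
sudo -u hdfs hdfs dfsadmin -refreshNodes

# 3. Watch the node until it reports "Decommissioned"
#    (i.e., its blocks are fully re-replicated elsewhere).
sudo -u hdfs hdfs dfsadmin -report | grep -A 3 worker42.example.com

# 4. Afterwards, stop the DataNode process, remove the host from the
#    include/exclude lists, and refresh nodes again.
```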