Sr. Hadoop/Kafka Administrator Resume
East Lansing, MI
SUMMARY
- 10+ years of professional IT experience, including 5+ years of solid experience in Hadoop administration, from requirement gathering through deploying, maintaining, monitoring, and upgrading Hadoop clusters using Hortonworks (HDP) and Cloudera (CDH) distributions and Confluent Kafka
- Experience in administering clusters with both Ambari and Cloudera Manager
- Hands-on experience using Hadoop ecosystem components like Hadoop MapReduce, HDFS, Spark 1/2, Kafka, HBase, Zookeeper, Oozie, Hive, and Sqoop.
- Good experience in multi-clustered environments and in setting up Hadoop distributions using underlying technologies like the MR framework, Pig, Oozie, HBase, Hive, Sqoop, Spark, Kafka, and related Java APIs for capturing, storing, managing, integrating, and analyzing data.
- Experience in understanding the security requirements for Hadoop and integrating with the Kerberos authentication infrastructure, including KDC server setup with both AD and MIT as the Key Distribution Center (KDC).
- Strong knowledge in configuring HA for HDFS, Resource Manager, HiveServer2 (HS2), HBase, and Impala to avoid single points of failure (SPOF) and for load balancing.
- Experience in NoSQL databases such as HBase, Cassandra.
- Experience in designing and implementing secure Hadoop clusters using MIT and AD Kerberos, Apache Sentry, Knox, and Ranger.
- Maintaining the availability of the cluster by troubleshooting the cluster issues and monitoring the cluster based on the alerts.
- Defining job flows in Hadoop environment using tools like Oozie for data scrubbing and processing.
- Good knowledge of deploying Hadoop clusters on public and private cloud environments like Amazon AWS using EC2 and VPC.
- Experienced in integrating BI and analytical tools like Tableau and Business Objects with Hadoop clusters.
- Experience in building the clusters from scratch, major/minor upgrades, support tiering, data migration, security, POCs etc.
- Experience in configuring AWS EC2, S3, VPC, RDS, CloudWatch, Cloud Formation, IAM, and SNS.
- Experience in administration of Kafka and Flume streaming using Cloudera Distribution
- Troubleshooting, Security, Backup, Disaster Recovery, Performance Monitoring on Linux systems.
- Strong knowledge in configuring High Availability for NameNode, Kafka, HBase, Oozie, HiveServer2, and Resource Manager.
- Experience in performing backup and disaster recovery of NameNode metadata and important sensitive data residing on the cluster.
- In-depth understanding/knowledge of Hadoop architecture and its ecosystem components.
- Experience in developing Shell Scripts for system management.
- Excellent at communicating with clients, customers, managers, and other teams in the enterprise at all levels.
- Over seven (7+) years of administration experience with the DataStage ETL product in IBM InfoSphere Information Server (IIS) / InfoSphere Foundation Tools (IFT) 8.1/9.1/11.3/11.5/11.7 suites and WebSphere Application Server (WAS) 6.0.2.11/6.0.2.27 Base and 7.0.0.11/8.0/8.5/9 Network Deployment versions.
- Nine (9+) years of development experience in DataStage 8.x/9.x/11.x of the IBM InfoSphere Information Server (IIS) suite on Windows/Unix/Linux platforms.
- Understanding of the project onboarding process and handling of new project onboarding requests.
- Effective problem-solving skills and outstanding interpersonal skills; able to work independently as well as within a team environment. Driven to meet deadlines.
TECHNICAL SKILLS
Operating Systems: RHEL 7/8.x, CentOS, Windows 7/10
Big Data: HDFS, YARN, Spark, Kafka, Airflow, Zookeeper, Hive, Impala, Hive LLAP, Sqoop, Oozie, HBase
Hadoop Distributions: HDP 2.x/3.x, CDP 7.x, CDH 5.x/6.x/7.x, AWS EMR 5.x/6.x, MapR 5.x/6.x
Database: MySQL, SQL Server, Oracle, HBase (NoSQL), Cassandra (NoSQL)
Scripting Languages: Unix shell, Bash, Python
Monitoring/Performance Mgmt. Tools: Splunk, Ambari, Cloudera Manager, Pepperdata, Dr. Elephant, ELK
ETL/BI Tools: DataStage 9.x/11.x, Talend 6.x/7.x, Cognos 8/10.x, Tableau 8.x
Methodologies: Agile, Waterfall Model
Cloud Technologies: AWS, Azure
PROFESSIONAL EXPERIENCE
Confidential, East Lansing, MI
Sr. Hadoop/Kafka Administrator
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Experienced in Installing, Configuring, Monitoring, Maintaining and Troubleshooting Hadoop/Kafka clusters.
- Created Apache Kafka clusters on Linux using Confluent Kafka in different environments; secured them by plugging in SSL encryption, SASL authentication, and ACLs (a configuration sketch follows this list).
- Kafka replaces the traditional pub-sub model with ease, offering fault tolerance, high throughput, and low latency.
- Installed multi-node, multi-broker clusters encrypted with SSL/TLS and authenticated with SASL/PLAIN, SASL/SCRAM, and SASL/GSSAPI (Kerberos).
- Performed high-level, day-to-day operational maintenance, support, and upgrades for the Kafka Cluster.
- Monitored the Confluent Kafka ecosystem and addressed alerts and issues; experienced with Confluent Kafka CLI commands and Control Center.
- Experienced in administering Kafka brokers, Zookeeper, topics, connectors, and KSQL.
- Researched and wrote Kafka consumers and producers using the KafkaConsumer API (0.10) and the KafkaProducer API.
- The project plan was to build and set up the big data environment, support operations, and effectively manage and monitor the Hadoop cluster through Cloudera Manager.
- Worked on installing and configuring CDH 6.x/CDP 7.x Hadoop clusters on AWS using Cloudera Manager.
- Managed a 60+ node CDH 6.x cluster with 1 petabyte of data using Cloudera Manager on Red Hat Linux 7.4.
- Involved in the end-to-end Hadoop cluster setup process: installation, configuration, and monitoring of the cluster in Cloudera.
- Installed Kafka Manager to monitor consumer lag and Kafka metrics; it was also used for adding topics, partitions, etc.
- Responsible for creating, modifying, and deleting topics (Kafka queues) as and when required by the business team (see the CLI sketch after this list).
- Managing, monitoring and troubleshooting Hadoop Cluster using Cloudera Manager.
- Installed and configured RHEL7 EC2 instances for Production, QA and Development environment.
- Installed MIT Kerberos for authentication of application and Hadoop service users.
- Installing, configuring and administering Jenkins CI tool on AWS EC2 instances.
- Configured Nagios to monitor EC2 Linux instances with Ansible automation.
- Used cron jobs to back up Hadoop service databases to S3 buckets (illustrated after this list).
- Supported technical team in management and review of Hadoop logs.
- Implemented Hadoop security solutions (Kerberos) for securing Hadoop clusters.
- Created queues in the YARN Queue Manager to share cluster resources among the users' MapReduce jobs.
- Used NiFi to pull data from different sources and push it to HBase and Hive.
- Wrote Lambda functions in Python for AWS Lambda that invoke Python scripts to perform various transformations and analytics on large data sets in EMR clusters.
- Worked with developer teams on NiFi workflows to pick up data from a REST API server, the Data Lake, and an SFTP server and send it to the Kafka broker.
- Installed Kerberos-secured Kafka clusters, without encryption, in all environments.
- Worked with Kafka on a proof of concept for carrying out log processing on a distributed system.
- Performed manual upgrades and MRv1 installations with Cloudera Manager; coordinated Kafka operations and monitoring (via JMX) with DevOps personnel.
- Completed a proof of concept using Apache NiFi workflows in place of Oozie to automate loading tasks.
- Configured CDH Dynamic Resource Pools to schedule and allocate resources to YARN applications.
- Worked on installing cluster, commissioning & decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
- Scheduled jobs using Oozie workflows.
- Worked on the cluster disaster recovery plan for the Hadoop cluster by implementing cluster data backup in Amazon S3 buckets.
- Used AWS S3 and Local Hard Disk as underlying File System (HDFS) for Hadoop.
- Created Cluster utilization reports for capacity planning and tuning resource allocation for YARN Jobs.
- Implemented high availability for Cloudera production clusters.
- Configured Apache Sentry for fine-grained authorization and role-based access control of data in Hadoop.
- Monitoring performance and tuning configuration of services in Hadoop Cluster.
- Worked on resolving production issues and documenting root cause analysis and updating the tickets using ITSM.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Creation of Users, Groups and mount points for NFS support.
- Involved in creating Hive databases and tables and loading flat files.
- Configured Apache Phoenix on top of HBase to query data through SQL.
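A minimal sketch of the broker-side security settings described above (an SSL listener, SASL mechanisms, and the ACL authorizer). Every hostname, path, and password is a placeholder, not a value from the actual clusters:

```bash
# Append illustrative security settings to the Confluent broker config;
# all names and secrets below are stand-ins for real values.
cat >> /etc/kafka/server.properties <<'EOF'
listeners=SASL_SSL://broker1.example.com:9093
security.inter.broker.protocol=SASL_SSL
sasl.enabled.mechanisms=PLAIN,SCRAM-SHA-256,GSSAPI
sasl.mechanism.inter.broker.protocol=SCRAM-SHA-256
ssl.keystore.location=/var/ssl/private/kafka.broker.keystore.jks
ssl.keystore.password=changeit
ssl.truststore.location=/var/ssl/private/kafka.broker.truststore.jks
ssl.truststore.password=changeit
authorizer.class.name=kafka.security.auth.SimpleAclAuthorizer
EOF
```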
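Typical topic-management commands of the kind referenced above; the topic name, counts, and ZooKeeper address are illustrative (Confluent 5.1-era tooling uses the --zookeeper flag):

```bash
# Create a topic with replication for fault tolerance
kafka-topics --zookeeper zk1.example.com:2181 --create \
  --topic orders-events --partitions 12 --replication-factor 3

# Add partitions to an existing topic (partition counts can only grow)
kafka-topics --zookeeper zk1.example.com:2181 --alter \
  --topic orders-events --partitions 24

# Delete a topic no longer needed by the business team
kafka-topics --zookeeper zk1.example.com:2181 --delete --topic orders-events
```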
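A hedged sketch of the cron-driven service-database backup to S3 mentioned above; the schedule, dump command, and bucket name are assumptions:

```bash
# Crontab entry: nightly 2 AM dump of the Hadoop service databases, gzipped
# and copied to S3 (bucket and paths are placeholders; % is escaped for cron)
0 2 * * * mysqldump --single-transaction --all-databases | gzip > /backup/svc_dbs_$(date +\%F).sql.gz && aws s3 cp /backup/svc_dbs_$(date +\%F).sql.gz s3://hadoop-svc-db-backups/
```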
Environment: Cloudera CDP 7.x, Cloudera CDH 5.x/6.x (Hive, YARN, Spark, Kafka, HBase, HDFS, Oozie), Cloudera Manager 5.x/6.x, Ambari 2.6 and HDP 2.6 (NiFi), OpenShift 4.x, Kubernetes 1.15, Confluent Enterprise 5.1 (Control Center, ksqlDB, Kafka Connect), Bitbucket, Git, Ansible, NiFi, AWS, EC2, S3, Python, Elasticsearch (ELK 7.x), Flume, RHEL 7 EC2, Sqoop, Teradata, Splunk, SQL, IBM InfoSphere Information Server 9.1.2/11.5.0.2/11.7 Suite (DataStage, QS, IA, IGC), RHEL 7.3, Automic/UC4 12.2, AWS S3, AWS EMR, Oracle 12c/11g
Confidential, Lansing, MI
Sr. Hadoop/Kafka Administrator
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Experienced in Installing, Configuring, Monitoring, Maintaining and Troubleshooting Hadoop clusters.
- Experience in setting up Hortonworks cluster and installing all the ecosystem components through Ambari.
- Extensively involved in cluster capacity planning, hardware planning, and performance tuning of the Hadoop cluster.
- Experience managing clusters with sizes between 30 and 60 nodes.
- Extensively worked with the Hortonworks Distribution of Hadoop: HDP 2.5/2.6 and 3.0.
- Performed minor HDP upgrades (2.5 to 2.6) and major upgrades (2.6 to 3.0).
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Hands-on experience working with HDFS, MapReduce, Hive, Pig, Sqoop, Impala, Hadoop HA, YARN, and Hue.
- Monitored systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures
- Deployed high availability on the Hadoop cluster using quorum journal nodes.
- Monitored and controlled local file system disk space usage and log files, cleaning log files with automated scripts. Integrated tools like SAS and Tableau with Hadoop so that users can pull data from HDFS and Hive.
- Loaded datasets into Hive for ETL operations.
- Implemented Capacity Scheduler to share the resources of the cluster for the MapReduce jobs given by the users.
- Enabled Kerberos authentication for the cluster and was responsible for generating keytab files for both user accounts and service accounts (see the sketch after this list).
- Worked on integration of HiveServer2 with BI tools.
- Managing and scheduling jobs on a Hadoop cluster using Oozie.
- Used Sqoop to import and export data between RDBMS and HDFS (example after this list).
- Enabled Ranger for user authorization to the cluster and developed a script that creates Ranger policies when a new user or service account is created (see the REST sketch after this list).
- Supporting Hadoop developers and assisting in optimization of HQL and Spark.
- Experience in troubleshooting errors in HBase Shell/API, Hive, and MapReduce.
- Experienced in creating shell scripts that collect running-job info every hour and load it into Hive tables to generate cluster usage reports.
- Involved in creating and managing SOLR collections using curl commands.
- Implemented Hue over HTTPS with Hue SSL client configurations.
- Worked on user onboarding to set up new Hadoop users, including creating Hue users and adding them to appropriate AD groups for data access.
- Automated regular data backups to a DR cluster using DistCp scripts (sketched after this list).
- Worked with network and Linux system engineers/admins to define optimal network configurations, server hardware, and operating systems.
- Evaluate and propose new tools and technologies to meet the needs of the organization
- Provided 24x7 support.
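A sketch of MIT Kerberos principal and keytab creation for a service account, as referenced above; the realm, principal, and paths are placeholders:

```bash
# Create a service principal with a random key, export it to a keytab,
# and lock the file down (names and paths are illustrative)
kadmin.local -q "addprinc -randkey hive/node01.example.com@EXAMPLE.COM"
kadmin.local -q "xst -k /etc/security/keytabs/hive.service.keytab hive/node01.example.com@EXAMPLE.COM"
chown hive:hadoop /etc/security/keytabs/hive.service.keytab
chmod 400 /etc/security/keytabs/hive.service.keytab
```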
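A typical Sqoop import/export pair of the kind described above; the connection strings, tables, and HDFS paths are illustrative:

```bash
# Import an RDBMS table into HDFS (-P prompts for the password)
sqoop import \
  --connect jdbc:mysql://dbhost.example.com:3306/sales \
  --username etl_user -P \
  --table transactions \
  --target-dir /data/raw/transactions \
  --num-mappers 4

# Export analyzed data back to the RDBMS for the BI team
sqoop export \
  --connect jdbc:mysql://dbhost.example.com:3306/reports \
  --username etl_user -P \
  --table daily_summary \
  --export-dir /data/curated/daily_summary
```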
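A policy-creation script like the one mentioned above could call Ranger's public REST API; a minimal sketch, assuming a JSON policy payload is generated per new account (host, credentials, and file name are placeholders):

```bash
# Create a Ranger policy from a JSON payload prepared for the new account
curl -u admin:'***' -X POST \
  -H 'Content-Type: application/json' \
  -d @new_user_policy.json \
  http://ranger-host.example.com:6080/service/public/v2/api/policy
```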
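A hedged sketch of the scripted DistCp copy to the DR cluster; NameNode addresses and paths are assumptions:

```bash
# Mirror the warehouse path to the DR cluster, preserving file attributes;
# -update/-delete keep the destination in sync with the source
hadoop distcp -update -delete -p \
  hdfs://prod-nn.example.com:8020/data/warehouse \
  hdfs://dr-nn.example.com:8020/data/warehouse
```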
Environment: Cloudera, Flume, Kafka, HDP 2.x/3.x, Pig, Oozie, Hive, Sqoop, Impala, Kerberos, UNIX shell scripts, Python, ZooKeeper, SQL, MapReduce.
Confidential, Irving, TX
Sr. DataStage Admin
Responsibilities:
- Platform setup and Maintenance (Access, Capacity, Administration & Performance Monitoring) of Unix and Linux servers prior to builds of InfoSphere Information Server 8.7 and 9.1 suites using multi-tier topology.
- Installed, configured and customized the behavior of the out of box capabilities of IBM InfoSphere Information Server software.
- Installed IIS client tier components on each user Desktop PCs to utilize thick clients and web-based applications.
- Coordinated, planned, and installed software and hardware upgrades as needed to keep IBM InfoSphere products current, while mentoring others.
- Responsible for monitoring the InfoSphere environment (and related software) for hardware, software, memory, storage, user, and performance metrics that are critical to normal production operations, making sure the system continues to function in a consistent and expected manner.
- Version control (Subversion source repository, including branching, merging, promoting, access security for multiple projects, and integration with DevOps tools)
- Coordinated/Planned Execution activities related to platform maintenance, Disaster Recovery and Software release management
- L3+ escalation for platform issue resolution, troubleshooting and root cause analysis.
- Configured, managed, and monitored DataStage Engine
- DataStage project configuration, Engine’s development and runtime environments, and Engine’s data source connectivity
- Ran and monitored DataStage jobs through the command line and GUI and used engine utilities (see the dsjob sketch after this list).
- Strong analytical and problem-solving skills, the ability to visualize complex data relationships and to understand the relationship to business processes and systems where data is managed.
- Migrated all ETL projects and jobs into IIS 9.1 suite and documented the steps.
- 24x7 data warehousing and governance support and resolution of errors related to all InfoSphere suite products.
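Command-line job runs and monitoring of the kind mentioned above typically go through the dsjob engine utility; the project and job names here are illustrative:

```bash
# Run a job, wait for completion, and return its exit status
dsjob -run -jobstatus DW_PROJECT LoadCustomerDim

# Inspect current job state and the latest log summary
dsjob -jobinfo DW_PROJECT LoadCustomerDim
dsjob -logsum DW_PROJECT LoadCustomerDim
```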
Environment: IBM InfoSphere Information Server/Foundation Tools 8.7/9.1 Suite DataStage, DB2 9.7/10.1 ESE, WAS 7.0.0.17 ND, SOA, Windows 7, Unix Sun Solaris 10, AIX 6.3, Linux RHEL AP 5 & Windows 2008 Server R2, Teradata
Confidential, MN
Sr. DataStage Admin
Responsibilities:
- Involved in analysis based on requirements and developed Cluster & Grid Architecture for data integration.
- Selected Installation Topology wif layers for the products of Information Server.
- Upgraded and configured all the tools from the IIS Information Server 8.1 base version to Fix Pack 1, Fix Pack 1a, Fix Pack 2, and then to 8.5 on different tiers.
- Installed, Configured and Administered DB2 Metadata Repository Database and WebSphere Application Server components for IIS suites.
- Prepared disk, file and network resources. Modified kernel parameters and user limits.
- Installed, Configured and Administered IBM IS 8.1 suite.
- Worked with IBM to troubleshoot data integration suite issues. Resolved incompatible JRE errors during installation.
- Added, deleted and moved projects and jobs. Purged job log files and traced server activity.
- Issued DataStage engine commands from Administrator. Started and stopped the DataStage engine and managed logs (see the sketch after this list).
- Managed DataStage services like DSRPC, Telnet and Engine Resource Services.
- Configured NLS for WebSphere DataStage in dsenv file.
- Migrated DataStage jobs from earlier versions (7.5.2) to DataStage 8.1 and 8.5 versions.
- Created design documents and prototype jobs based on the business requirements and high-level design (HLD).
- Used Lookup, Join, Funnel, Merge, Transformer, and other processing stages to design server, parallel, and sequence jobs.
- Worked on setting up users for IIS suite and suite components for different products.
- Used IIS Web Console for administration areas: security, licensing, logging, and scheduling.
- Created different type of reports like Administrative reports, IA Column Analysis reports, Metadata reports and Fast Track Mapping specification reports from IIS console.
- Used Control-M to schedule jobs and e-mailed the status of jobs to operations team daily.
- Worked on Oracle 10gR2 and DataStage Server performance tuning approaches.
- Worked on performance tuning of ETL jobs via SQL queries, data connections, configuration files, job parameters and environment variables.
- Assisted data quality specialists in using Information Analyzer (IA) to scan samples and full volumes of data to determine their quality and structure, ranging from individual fields to high-level data entities.
- Resolved the issues during creation of data quality rules to assess and monitor heterogeneous data sources for trends, patterns, and exception conditions.
- Provided production support and resolved tickets in IBM ESR tool.
- Performed knowledge transfer (KT) to team members and led the whole project in all phases.
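Engine stop/start commands of the kind referenced above; the install path is an assumption:

```bash
# From the DSEngine directory, load the engine environment and
# stop/start the DataStage engine (install path is illustrative)
cd /opt/IBM/InformationServer/Server/DSEngine
. ./dsenv
bin/uv -admin -stop
bin/uv -admin -start
bin/uv -admin -info   # check engine status
```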
Environment: InfoSphere Information Server V8.1.0.2/8.5.0.2 Suite DataStage, Oracle 11g, SQL Server 2005, WAS 6.0.2.27/7.0.0.11, Shell Scripts, Windows XP, IBM AIX 6.1, HP-UX 11i and Red Hat Linux Advanced Platform 5.2
Confidential, MN
DataStage Admin
Responsibilities:
- Participated in meetings with business users and Subject Matter Experts (SMEs) to understand the business requirements.
- Installed, Configured and Administered DataStage, QualityStage, ProfileStage, IS Manager and Metadata Server of InfoSphere Information Server 8.1 suite in Dev, UAT and Prod Phases.
- Installed, Configured and Administered Oracle 10gR2 Metadata Repository Database and WebSphere Application Server 6.0.2.27.
- Unlocked jobs. Set up Configuration Files, Job Parameters, Environment Variables and Data Connections across jobs in a project.
- Prepared disk, file, network resources and packages. Modified kernel parameters and user limits. Resolved incompatible JRE errors during upgrades.
- Involved in DataStage security - assigned IIS suite and suite component roles in IIS Web Console.
- DataStage disk usage - projects, system files, datasets & scratch.
- Installed and Configured fix pack 1 on IIS 8.1 and resolved errors.
- Involved in migration of DataStage and QualityStage projects and jobs from earlier versions to 8.1.1 version.
- Worked with the Administrator client to manage projects and set their properties.
- Worked with the Director client to schedule, run, and view the logs of jobs.
- Worked with the Designer client to design server, parallel, and sequence jobs using stages.
- Designed technical Specification and Mapping Docs for DataStage jobs.
- Worked with stages like Data Set, File Set, Sequential File, Change Apply, Change Capture, Aggregator, Sort, Funnel, Remove Duplicates, Merge, Join, Transformer, and Lookup for performing lookup functions on the source and lookup datasets.
- Used Development/Debugging stages like Row generator, Column Generator, Head, Tail and Peek.
- Extensively used the various Partitioning Methods like Hash, Entire, Auto, Same, Round Robin to increase the job performance
- Performed unit testing and system testing to check data reliability.
- Used Quality stages like Investigation, Standardization, Match Frequency, Unduplicate Match and Data Survive.
- Used IIS Console for Information Analyzer and Information Services Director tasks.
- Worked with Information Analyzer and QualityStage to understand and cleanse metadata to build the data model.
- Performance tuning of DataStage jobs, IIS suite products and WAS.
- Determined Distinguished Names (DN) and configured IBM IS tools to use an LDAP user registry with Active Directory.
- Stopped and Started Logging Agent, ASBAgent, DataStage Services & WAS and managed logs.
- Participated in application and infrastructure engineering review and security audit.
- Developed Common Error Management and Audit Management process at Project level.
- Enabled IIS and WAS Security and configured internal and external user registries.
- Created PMRs with IBM for interim fixes, refresh packs, patches, and fix packs.
- Backup and Restore of IIS 8.1 Suite.
- Used Control-M for auto scheduling of DataStage jobs.
- Used ISTOOL to migrate job objects between different environments in IS Manager (see the sketch below).
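A hedged istool export/import sketch for moving job objects between environments; the hosts, credentials, and project names are placeholders, and the exact DataStage asset-path quoting varies by Information Server version:

```bash
# Export DataStage jobs from the dev project into an .isx archive
istool export -domain dev-svcs.example.com:9080 -username isadmin -password '***' \
  -archive /tmp/etl_release.isx \
  -datastage ' "dev-engine.example.com/DW_PROJECT/Jobs/*/*.pjb" '

# Import the archive into the QA project, replacing objects on name conflicts
istool import -domain qa-svcs.example.com:9080 -username isadmin -password '***' \
  -archive /tmp/etl_release.isx \
  -datastage '-nameconflicts replace "qa-engine.example.com/DW_PROJECT"'
```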
Environment: IBM InfoSphere Information Server V8.1 Suite DataStage, QualityStage, Information Analyzer, IS Manager & Metadata Server, Oracle 10gR2, DB2 9.5, Teradata V2R7, SQL Server 2005, WAS 8.x, Java, J2EE, SQL, PL/SQL, Control-M, Shell Scripts, Windows XP, Unix (AIX 6.1/Solaris 10) and Linux, XML/XSD