Hadoop Admin Resume

San Antonio, TX

SUMMARY:

  • ETL/Hadoop Admin with eight (8) years of total IT experience in software development, design and implementation, with a major focus on Big Data Hadoop and ecosystem technologies, ETL (Informatica), data warehousing and Business Intelligence applications in the travel, telecommunications and retail industries
  • Hands-on experience in installing, configuring and using Hadoop ecosystem components like MapReduce, HDFS, HBase, Spark, Kafka, Oozie, Hive, Sqoop, Pig, ZooKeeper and Flume
  • Excellent understanding of Hadoop architecture and its components such as HDFS, YARN, MapReduce, ResourceManager, NodeManager, ApplicationMaster, NameNode and DataNode
  • Well versed in installing, configuring, supporting and managing clusters on the Cloudera CDH platform (CDH 5.x)
  • Monitored cluster resources and configured alerts using Cloudera Manager
  • Excellent understanding of the classic Hadoop (MRv1) components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and MapReduce
  • Experience setting up separate Kafka clusters to handle high volumes of streaming data
  • Experience setting up 3-node NiFi clusters and creating NiFi flows to move data between applications
  • Expertise in importing and exporting data using Sqoop between HDFS and RDBMS, creating and scheduling Sqoop jobs
  • Experience in data analysis using Hive and Pig Latin, and in querying HBase data using Phoenix and SQuirreL
  • Experience with Amazon Web Services (AWS) such as IAM, S3, EMR, EC2 and Route 53
  • AWS VPC design and implementation; deployed and configured EC2, Elastic Beanstalk and RDS instances
  • Experience with CloudFormation scripts, AMIs and Consolidated Billing for multiple AWS accounts
  • Understanding of AWS Storage Services such as S3, Glacier, Elastic File System (EFS), AWS Storage Gateway (SGW), and Snowball
  • Knowledge in Perl and shell scripting for administration, maintenance and troubleshooting
  • Experience in data migration, data integration and data conversion
  • Expertise in setting up load strategies and dynamically passing parameters to mappings and workflows in Informatica
  • Successfully installed and upgraded Informatica from lower versions to higher versions, 8.x to 9.x (8.5.1 to 9.0.1 and 8.6.1 HF11 to 9.5.1 HF2 to 9.6.1)
  • Involved in code migration across environments and in deployment efforts
  • Substantial work experience with Informatica. Expertise in reusability, parameterization, workflow design, designing and developing the ETL mappings and scripts
  • Implemented LDAP authentication for Informatica users
  • Experience working with the reporting tools Cognos and QlikView

SKILLS:

Big Data: Cloudera CDH, Apache Hadoop, Hortonworks

Big Data Ecosystem: HDFS, MapReduce, Spark, Kafka, Sqoop, Flume, ZooKeeper, Oozie, Hive, Pig, Impala, Solr, R, RStudio, R Shiny

NoSQL: HBase, Cassandra, DynamoDB

Databases: Oracle, Greenplum, MS SQL, Teradata, Redshift

ETL Tools: Informatica 7.1, 8.X, 9.X and 10.X

Reporting Tools: Cognos, QlikView, OBIEE

Operating Systems: Linux, CentOS, Unix, AIX, Windows family

Dev Tools: Eclipse, IntelliJ

Programming Languages: SQL, T-SQL, PL/SQL, Scala, Java, Python

Scheduling Tools: Opswise, Control-M, Tidal, DAC, Splunk, TOAD, Phoenix, SQuirreL, FileZilla, SuperPuTTY, Yum, Git, Jenkins

Other Tools: Ansible, JIRA, Stash, AWS, S3, EC2, Docker Container, Azure

EXPERIENCE:

Hadoop Admin

Confidential, San Antonio, TX

Responsibilities:

  • Responsible for the build-out, day-to-day management and support of Big Data clusters based on Hadoop and other technologies, on-premises and in the cloud. Responsible for cluster availability.
  • Installed Cloudera distribution Hadoop (CDH 5.x) on Azure GS4 Virtual Machines
  • Involved in design, capacity planning, cluster setup, performance tuning, monitoring, structure planning, scaling and administration
  • Used Cloudera Manager to configure YARN, Hive, Spark, Isilon, HBase, Impala, Oozie and Hue
  • Installed and configured Cloudera Navigator, Data Science Workbench, Key Trustee Server and SSL/TLS for CM services.
  • Configured and enabled LDAP and Active Directory KDC Authentication.
  • Installed and configured Sentry for CM. Enabled security by creating various roles and assigning roles to groups to control access to Hive databases and HDFS locations.
  • Installed and configured Hue and Impala and enabled Sentry for Impala.
  • Added and removed additional nodes and created host templates
  • Implemented a separate Kafka cluster on Cloudera and monitored Kafka topics and stats via Kafka Manager and other Kafka command-line tools.
  • Installed and configured RStudio and R Shiny and enabled LDAP and Kerberos for login
  • Installed various libraries and driver packages to connect to different data sources such as Teradata, Hive, Spark and Impala.
  • Created Sqoop scripts to load data from Teradata into Hive and scheduled incremental jobs in Oozie.
  • Imported data into Hive using Sqoop from EDW, created partitions on Hive tables and explored various formats for storing the data such as Parquet, CSV and JSON
  • Set up new Hadoop users. This included setting up Linux users and groups, setting up Kerberos principals and Sentry, and testing their provided access.

Environment: Cloudera, Hadoop, Azure, R, RStudio, Spark, Kafka, Jenkins, Splunk, Atlassian tools, Docker container, Git, Linux

Hadoop Admin

Confidential, Atlanta, GA

Responsibilities:

  • Installed Cloudera distribution Hadoop (CDH 5.x) on AWS EC2 instances
  • Converted and deployed physical and VMware virtual systems into EC2 AMI instances.
  • Involved in design, capacity planning, cluster setup, performance tuning, monitoring, structure planning, scaling and administration
  • Used Cloudera Manager to configure YARN, Hive, Spark, Isilon, HBase, Impala, Oozie and Hue
  • Added and removed additional nodes and created host templates
  • Implemented a separate Kafka cluster to handle large volumes of streaming data, monitored Kafka topics and stats via Confluent Control Center, and created alerts to monitor Kafka brokers.
  • Created a 3-node Percona MySQL cluster for HA of Cloudera metadata
  • Used HAProxy as a load balancer among the MySQL nodes
  • Developed data flows using processors in NiFi, loading the Kafka streaming data into HBase
  • Used Phoenix and SQuirreL for data analysis on HBase
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managed and reviewed the data backups and Hadoop log files
  • Worked with release management technologies such as Jenkins, GitHub, GitLab and Ansible
  • Used source code management tools like Git and Subversion; familiar with concepts like branches, merges and tags
  • Monitored services and hosts with intelligent service health checks and metrics using CM
  • Deployed Hadoop jobs automatically with the help of Oozie
  • Debugged and troubleshot failed jobs and identified the correct solutions for the ecosystem
  • Configured MirrorMaker to replicate data between two Kafka clusters
  • Configured HBase to replicate data between two data centers (clusters) by adding peers
  • Created Hive and HBase tables and worked with Apache Phoenix to retrieve the data
  • Imported data into Hive using Sqoop from EDW, created partitions on Hive tables and explored various formats for storing the data such as Parquet, CSV and JSON
  • Experience with Atlassian issue-tracking and collaboration tools such as Jira, Stash and Confluence
  • Worked in a DevOps model with Continuous Integration and Continuous Deployment (CI/CD), automating deployments using Jenkins and Ansible. Worked closely with infrastructure, network, database, business intelligence and application teams to ensure business applications are highly available and perform within agreed-on service levels.
  • Worked with data delivery teams to set up new Hadoop users, including setting up Linux users, setting up Kerberos principals and testing HDFS, Hive, Pig and MapReduce access for new users.
  • Designed a logging framework using AWS CloudWatch for all deployed apps.
  • Responsible for maintaining ACLs, security groups and firewall configurations for AWS instances. Understanding of system capacity, bottlenecks and the basics of memory, CPU, OS, storage and networks.

Environment: Cloudera, Hadoop, AWS, Spark, Confluent Kafka, NiFi, Jenkins, Ansible, Splunk, Atlassian tools, Docker container, Git, Linux

ETL/Hadoop Admin

Confidential, Alpharetta, GA

Responsibilities:

  • Collected log data from the web servers and integrated it into HDFS using Flume
  • Responsible for commissioning and decommissioning data nodes, cluster monitoring, troubleshooting and capacity planning
  • Installed Oozie workflow engine to run multiple Hive jobs
  • Worked with Kafka on a proof of concept for log processing on distributed systems
  • Developed a data pipeline using Flume, Sqoop and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis
  • Performed file system checks (fsck) periodically to check for over-replicated blocks, under-replicated blocks, corrupt blocks and missing replicas
  • Automated installations using Puppet
  • Created Hive tables to store the processed results in a tabular format
  • Installed and configured ZooKeeper
  • Moved relational database data using Sqoop into Hive dynamic partition tables via staging tables
  • Involved in collecting metrics for Hadoop clusters using Cloudera Manager
  • Configured Sqoop and exported/imported the data into HDFS
  • Managed and scheduled jobs on a Hadoop cluster using Oozie
  • Successfully upgraded Informatica from 8.6.1 to 9.5.1 in all environments
  • Maintained user security in all environments
  • Regularly measured system availability and, when necessary, took measures to improve it, providing 98% system availability
  • Involved in code deployment across environments
  • Managed and maintained the existing Informatica interfaces, including Tidal, PowerCenter and ODBC connections
  • Participated in defining Informatica standards and best practices
  • Provided user support, answered questions, investigated the user issues and enforced standards

Environment: Informatica 8.6.1, 9.5.1 (HF4) on Linux, Oracle, AWS, QlikView, Tidal, Solr

Informatica Admin/Developer

Confidential, Chicago, IL

Responsibilities:

  • Responsible for requirements gathering, analysis and end-user meetings
  • Facilitated architecture and development with high volume transaction systems
  • Responsible for performance tuning at the Source level, Target level, Mapping level and Session level
  • Solid expertise in using both connected and unconnected Lookup transformations
  • Worked with various lookup caches like Static, Dynamic and Persistent Cache
  • Responsible for best practices like naming conventions, performance tuning and error handling
  • Involved in defining the overall strategies for design and standards by creating checklists for successful deployment
  • Developed Slowly Changing Dimension mappings for Type 1 SCD and Type 2 SCD
  • Facilitated data integration, data conversion and migrated the data from source to target
  • Applied reusability, parameterization, workflow design and mapping design in Informatica
  • Performance tuned Informatica mappings
  • Involved in estimation, design and architectural discussions with the architecture team for data warehouse
  • Worked on code reviews for maintaining the code standards across all environments
  • Involved in design reviews and approvals before onboarding projects to the shared platforms, along with day-to-day activities
  • Established the platform architecture and implemented user security measures by creating the appropriate roles and groups
  • Worked on upgrade, enhancement and migration of environments in Informatica
  • Ensured that services were up and running in all environments
  • Created and maintained connections (relational, application, etc.) and the system usernames/passwords
  • Worked with the Informatica product team to identify bugs and raise feature requests. Worked with product support on new versions and hotfixes

Environment: Informatica 9.1, 9.5.0 on UNIX, Greenplum, Oracle, Cognos 10.1, QlikView, Opswise