Hadoop Admin Resume San Antonio, TX - Hire IT People

SUMMARY:

ETL/Hadoop Admin with eight (8) years of total IT experience in software development, design and implementation, with a major focus on Big Data Hadoop and ecosystem technologies, ETL (Informatica/data warehousing) and Business Intelligence applications in the travel, telecommunications and retail industries
Hands-on experience in installing, configuring and using Hadoop ecosystem components such as MapReduce, HDFS, HBase, Spark, Kafka, Oozie, Hive, Sqoop, Pig, ZooKeeper and Flume
Excellent understanding of Hadoop architecture and its components, such as HDFS, YARN, MapReduce, ResourceManager, NodeManager, ApplicationMaster, NameNode and DataNode
Well versed in installing, configuring, supporting and managing clusters on the Cloudera CDH platform (CDH 5.x)
Monitored cluster resources and configured alerts using Cloudera Manager
Solid understanding of the classic Hadoop (MRv1) architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and MapReduce
Experience setting up separate Kafka clusters to handle high volumes of streaming data
Experience setting up 3-node NiFi clusters and creating NiFi flows to move data between applications
Expertise in importing and exporting data using Sqoop between HDFS and RDBMS, creating and scheduling Sqoop jobs
Experience in data analysis using Hive and Pig Latin, and in querying HBase data using Phoenix and SQuirreL
Experience in Amazon Web Services (AWS) such as IAM, S3, EMR, EC2 and Route 53
Experience in AWS VPC design and implementation, and in deploying and configuring EC2, Elastic Beanstalk and RDS instances
Experience with CloudFormation scripts, AMIs and Consolidated Billing for multiple AWS accounts
Understanding of AWS Storage Services such as S3, Glacier, Elastic File System (EFS), AWS Storage Gateway (SGW), and Snowball
Knowledge in Perl and shell scripting for administration, maintenance and troubleshooting
Experience in data migration, data integration and data conversion
Expertise in setting up load strategies and dynamically passing parameters to mappings and workflows in Informatica
Successfully installed and upgraded Informatica from lower versions to higher versions, 8.x to 9.x (8.5.1 to 9.0.1 and 8.6.1 HF11 to 9.5.1 HF2 to 9.6.1)
Involved in code migration across environments and in deployment efforts
Substantial work experience with Informatica, with expertise in reusability, parameterization, workflow design, and designing and developing ETL mappings and scripts
Implemented LDAP authentication for Informatica users
Experience working with the reporting tools Cognos and QlikView

SKILLS:

Big Data: Cloudera CDH, Apache Hadoop, Hortonworks

Big Data Ecosystem: HDFS, MapReduce, Spark, Kafka, Sqoop, Flume, ZooKeeper, Oozie, Hive, Pig, Impala, Solr, R, RStudio, R Shiny

NoSQL: HBase, Cassandra, DynamoDB

Databases: Oracle, Greenplum, MS SQL, Teradata, Redshift

ETL Tools: Informatica 7.1, 8.X, 9.X and 10.X

Reporting Tools: Cognos, QlikView, OBIEE

Operating Systems: Linux, CentOS, Unix, AIX, Windows family

Dev Tools: Eclipse, IntelliJ

Programming Languages: SQL, T-SQL, PL/SQL, Scala, Java, Python

Scheduling Tools: Opswise, Control-M, Tidal, DAC

Other Tools: Splunk, TOAD, Phoenix, SQuirreL, FileZilla, SuperPuTTY, Yum, Git, Jenkins, Ansible, JIRA, Stash, AWS, S3, EC2, Docker Container, Azure

EXPERIENCE:

Hadoop Admin

Confidential, San Antonio, TX

Responsibilities:

Responsible for the build-out, day-to-day management and support of Big Data clusters based on Hadoop and other technologies, both on-premises and in the cloud, and for cluster availability
Installed Cloudera distribution Hadoop (CDH 5.x) on Azure GS4 Virtual Machines
Involved in designing, capacity arrangement/planning, cluster set up, performance fine-tuning, monitoring, structure planning, scaling and administration
Used Cloudera Manager to configure YARN, Hive, Spark, Isilon, HBase, Impala, Oozie and Hue
Installed and configured Cloudera Navigator, Data Science Workbench, Key Trustee Server and SSL/TLS for CM services.
Configured and enabled LDAP and Active Directory KDC authentication.
Installed and configured Sentry for CM. Enabled security by creating various roles and assigning them to groups to control access to Hive databases and HDFS locations.
Installed and configured Hue and Impala and enabled Sentry for Impala.
Added and removed additional nodes and created host templates
Implemented a separate Kafka cluster on Cloudera and monitored Kafka topics and stats via Kafka Manager and other Kafka command-line tools.
Installed and configured RStudio and R Shiny and enabled LDAP and Kerberos for login
Installed various libraries and driver packages to connect different data sources such as Teradata, Hive, Spark, Impala.
Created Sqoop scripts to load data from Teradata into Hive and scheduled incremental jobs in Oozie.
Imported data into Hive using Sqoop from EDW, created partitions on hive tables and explored various forms of storing the data such as Parquet, CSV and JSON
Set up new Hadoop users, which included creating Linux users and groups, setting up Kerberos principals and Sentry permissions, and testing the access provided.

Environment: Cloudera, Hadoop, Azure, R, RStudio, Spark, Kafka, Jenkins, Splunk, Atlassian tools, Docker container, Git, Linux

Hadoop Admin

Confidential, Atlanta, GA

Responsibilities:

Installed Cloudera distribution Hadoop (CDH 5.x) on AWS EC2 instances
Converted and deployed physical and VMware virtual systems as EC2 AMI instances.
Involved in designing, capacity arrangement/planning, cluster set up, performance fine-tuning, monitoring, structure planning, scaling and administration
Used Cloudera Manager to configure YARN, Hive, Spark, Isilon, HBase, Impala, Oozie and Hue
Added and removed additional nodes and created host templates
Implemented a separate Kafka cluster to handle large volumes of streaming data, monitored Kafka topics and stats via Confluent Control Center, and created alerts to monitor the Kafka brokers.
Created a 3-node Percona MySQL cluster for high availability (HA) of Cloudera metadata
Used HAProxy as a load balancer across the MySQL nodes
Developed data flows using processors in NiFi, loading Kafka streaming data into HBase
Used Phoenix and SQuirreL for data analysis on HBase
Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managed and reviewed the data backups and Hadoop log files
Worked with release management technologies such as Jenkins, GitHub, GitLab and Ansible
Used source code management tools such as Git and Subversion; familiar with concepts such as branches, merges and tags
Monitored services and hosts using Cloudera Manager (CM) service health checks and metrics
Automated the deployment of Hadoop jobs using Oozie
Debugged and troubleshot failed jobs and identified appropriate fixes across the ecosystem
Configured MirrorMaker to replicate data between two Kafka clusters
Configured HBase to replicate data between two data centers (clusters) by adding peers
Created Hive and HBase tables and worked with Apache Phoenix to retrieve the data
Imported data into Hive using Sqoop from EDW, created partitions on hive tables and explored various forms of storing the data such as Parquet, CSV and JSON
Experience with Atlassian issue-tracking and collaboration tools such as Jira, Stash and Confluence
Worked in a DevOps model with Continuous Integration and Continuous Deployment (CI/CD), automating deployments using Jenkins and Ansible. Worked closely with infrastructure, network, database, business intelligence and application teams to ensure business applications were highly available and performed within agreed service levels.
Worked with data delivery teams to set up new Hadoop users, including setting up Linux users and Kerberos principals and testing HDFS, Hive, Pig and MapReduce access for new users.
Designed a logging framework on AWS CloudWatch for all deployed apps.
Responsible for maintaining ACLs, Security Groups and firewall configurations for AWS instances. Understanding of system capacity, bottlenecks and the basics of memory, CPU, OS, storage and networks.

Environment: Cloudera, Hadoop, AWS, Spark, Confluent Kafka, NiFi, Jenkins, Ansible, Splunk, Atlassian tools, Docker container, Git, Linux

ETL/Hadoop Admin

Confidential, Alpharetta, GA

Responsibilities:

Collected log data from web servers and ingested it into HDFS using Flume
Responsible for commissioning and decommissioning data nodes, cluster monitoring, troubleshooting and capacity planning
Installed Oozie workflow engine to run multiple Hive jobs
Worked with Kafka for the proof of concept for carrying out log processing on distributed systems
Developed a data pipeline using Flume, Sqoop and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis
Performed periodic file system checks (fsck) to detect over-replicated blocks, under-replicated blocks, corrupt blocks and missing replicas
Automated installations using Puppet
Created Hive tables to store the processed results in a tabular format
Installed and configured Zookeeper
Moved relational database data using Sqoop into Hive Dynamic Partition Tables using Staging Tables
Involved in collecting metrics for Hadoop clusters using Cloudera Manager
Configured Sqoop and exported/imported the data into HDFS
Managed and scheduled jobs on a Hadoop cluster using Oozie
Successfully upgraded Informatica from 8.6.1 to 9.5.1 in all environments
Maintained user security in all environments
Measured system availability regularly and took corrective measures where necessary, providing 98% system availability
Involved in code deployment across environments
Managed and maintained the existing Informatica interfaces, including Tidal, PowerCenter and ODBC connections
Participated in defining Informatica standards and best practices
Provided user support, answered questions, investigated the user issues and enforced standards

Environment: Informatica 8.6.1, 9.5.1(HF4) on Linux, Oracle, AWS, Qlikview, Tidal, Solr

Informatica Admin/Developer

Confidential, Chicago, IL

Responsibilities:

Responsible for requirements gathering, analysis and end-user meetings
Facilitated architecture and development with high volume transaction systems
Responsible for performance tuning at the Source level, Target level, Mapping level and Session level
Solid expertise in using both connected and unconnected Lookup transformations
Worked with various lookup caches like Static, Dynamic and Persistent Cache
Responsible for best practices like naming conventions, performance tuning and error handling
Involved in defining the overall strategies for design and standards by creating checklists for successful deployment
Developed Slowly Changing Dimension mappings for Type 1 SCD and Type 2 SCD
Facilitated data integration, data conversion and migrated the data from source to target
Applied reusability and parameterization in Informatica workflow and mapping design
Performance tuned Informatica mappings
Involved in estimation, design and architectural discussions with the architecture team for data warehouse
Worked on code reviews for maintaining the code standards across all environments
Involved in design review/approvals before on-boarding projects to the shared platforms along with the day to day activities
Established the platform architecture and implemented user security measures by creating the appropriate roles and groups
Worked on upgrade, enhancement and migration of environments in Informatica
Ensured that services were up and running in all environments
Created and maintained connections (relational, application, etc.) and the system usernames/passwords
Worked with the Informatica product team to identify bugs and raise feature requests. Worked with product support on new versions and hotfixes

Environment: Informatica 9.1, 9.5.0 on UNIX, Greenplum, Oracle, Cognos 10.1, Qlikview, Opswise
