Hadoop/AWS Cloud Engineer Resume
Charlotte, NC
SUMMARY:
- Highly motivated IT professional with 7 years of industry experience in large environments, including around a year in an AWS cloud environment and around 3 years in a Big Data environment supporting big data engineers, developers, and scientists. Seeking challenging projects that utilize my skills and expertise.
- Strong knowledge of CloudFormation templates, EC2 instance types, inbound/outbound security groups, S3 buckets, IAM policies, CloudWatch, Lambda, Route53, subnets, VPC, Git, Bitbucket, and Bamboo, used to spin up EMR clusters with the necessary requirements.
- Experience developing Lambda functions to create Route53 records dynamically when EC2 instances are spun up (see the sketch after this list).
- Good knowledge of Hue implementation and of importing LDAP users and groups into Hue.
- Migrated on-prem data to S3 and performed ETL operations using Sqoop, FTP, Hive, and Spark.
- Involved in Ranger Hive and HDFS plugin installations.
- Actively worked on enabling SSL for Hadoop services in EMR.
- Analyzed and tuned performance of Spark jobs in EMR based on the type and size of the input data and the specific EC2 instance types used.
- Good understanding of PySpark and Hive on Spark for performance tuning, and of Spark memory management.
- Knowledge of Ganglia for server monitoring.
- Good understanding of Hive authorization and Presto integration with Hive on the EMR platform.
- Knowledge of data analysis and experience working with data analytics teams.
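The Lambda-based Route53 automation above can be summarized by the following minimal shell sketch of the record upsert, using the AWS CLI rather than the Lambda runtime; the hosted zone ID, domain, and instance ID are hypothetical placeholders, and in practice this logic runs inside a Lambda function triggered on instance launch.

#!/usr/bin/env bash
# Sketch: upsert a Route53 A record for a newly launched EC2 instance.
# The hosted zone ID, domain, and instance ID are hypothetical placeholders.
set -euo pipefail

INSTANCE_ID="$1"                      # e.g. passed in from the launch event
HOSTED_ZONE_ID="Z0000000EXAMPLE"      # placeholder private hosted zone
DOMAIN="emr.example.internal"         # placeholder domain

# Look up the instance's private IP and Name tag.
PRIVATE_IP=$(aws ec2 describe-instances --instance-ids "$INSTANCE_ID" \
  --query 'Reservations[0].Instances[0].PrivateIpAddress' --output text)
NAME_TAG=$(aws ec2 describe-instances --instance-ids "$INSTANCE_ID" \
  --query 'Reservations[0].Instances[0].Tags[?Key==`Name`].Value | [0]' --output text)

# Upsert the A record so the node resolves as <name>.<domain>.
aws route53 change-resource-record-sets \
  --hosted-zone-id "$HOSTED_ZONE_ID" \
  --change-batch "{
    \"Changes\": [{
      \"Action\": \"UPSERT\",
      \"ResourceRecordSet\": {
        \"Name\": \"${NAME_TAG}.${DOMAIN}\",
        \"Type\": \"A\",
        \"TTL\": 300,
        \"ResourceRecords\": [{\"Value\": \"${PRIVATE_IP}\"}]
      }
    }]
  }"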
Hadoop Engineer
- Installed and configured the complete Hadoop ecosystem (components such as HDFS, Hive, Impala, Spark, Oozie, HBase, Flume, and ZooKeeper).
- Deep expertise in managing Hadoop infrastructure with Cloudera Manager.
- Implemented Hadoop cluster security with Kerberos, Active Directory/LDAP, and TLS/SSL.
- Developed and scheduled ETL workflows in Hadoop using Oozie (see the sketch after this list).
- Experience in commissioning and decommissioning nodes, performing major and minor cluster upgrades to the latest release, and installing Hadoop patches.
- Tuned, configured, and troubleshot Linux-based operating systems such as RedHat and CentOS, along with virtualization, across a large set of servers.
- Experience troubleshooting and managing various network-related services such as TCP/IP, NFS, DNS, DHCP, and SMTP.
- Proficient in OS upgrades and patch loading as required.
- Coordinated and worked with teams on the integration of various data sources such as Oracle, DB2, and SQL Server.
- Good knowledge of Tableau and its integration with Hadoop.
- Experience in setting up disaster recovery.
- Involved in customer interactions, business user meetings, vendor calls, and technical team discussions to make the right design and implementation choices and to provide best practices for the organization.
- Supported different business units using Tableau with Impala connectivity drop issues.
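The Oozie-based ETL scheduling above typically comes down to a coordinator submission; the sketch below assumes a hypothetical job.properties and Oozie server URL.

#!/usr/bin/env bash
# Sketch: submit and check an Oozie-scheduled ETL coordinator.
# The Oozie URL and job.properties contents are hypothetical placeholders.
set -euo pipefail

export OOZIE_URL="http://oozie-host.example.com:11000/oozie"   # placeholder

# job.properties (hypothetical) would define, for example:
#   nameNode=hdfs://nn.example.com:8020
#   jobTracker=rm.example.com:8032
#   oozie.coord.application.path=${nameNode}/user/etl/coordinators/daily-load

# Submit and start the coordinator, capturing the job id from the output.
JOB_ID=$(oozie job -config job.properties -run | awk -F': ' '{print $2}')
echo "Submitted coordinator: ${JOB_ID}"

# Check status and materialized actions.
oozie job -info "${JOB_ID}"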
TECHNICAL SKILLS:
Big Data: Apache Hadoop, Cloudera, Hive, Hue, MapReduce, ZooKeeper, Oozie, Sqoop, Flume, AWS, EMR, S3, IAM, Lambda functions, VPC, ELB, VPN, RDS, and CloudFormation
Languages: Shell scripting, Python, HiveQL, PySpark, JSON
Operating Systems: RedHat Linux, CentOS
Databases: IBM DB2, Oracle, SQL Server, MySQL
Networking and Protocols: TCP/IP, HTTP, FTP, SNMP, LDAP, DNS
Application Servers: Apache HTTP Server, WebSphere, WebLogic
Tools: Ansible, Ganglia
PROFESSIONAL EXPERIENCE:
Hadoop/AWS Cloud Engineer
Confidential, Charlotte, NC
Responsibilities:
- Performed CM/CDH upgrades from 5.7 to 5.9 and from 5.9.0 to 5.10.1, and resolved various issues encountered during the upgrades.
- Worked through and supported the RedHat Linux OS-level upgrade to 6.8, an Oracle Database switchover for DR testing, and the Oracle Database upgrade to 12c.
- Developed and automated benchmark tests for services such as Hive, Impala, Oozie actions, and Spark before and after performing upgrades.
- Analyzed PySpark code for Avro-to-ORC conversion and tuned job performance based on the input data, setting driver/executor memory, the number of executors, and related parameters.
- Wrote shell scripts to migrate data from on-prem to AWS EMR (S3) and helped application teams copy data from on-prem to the AWS cloud by spinning up an EMR cluster and then using sync/DistCp to S3 (see the sketches after this section).
- Updated CloudFormation templates to use PasswordVault to retrieve public/private SSH keys, and updated the AWS role ARN for S3 by defining specific policies and permissions for the S3 buckets the EMR cluster should access.
- Modified security groups, subnet IDs, EC2 instance types, ports, and AWS tags; worked with Bitbucket, Git, and Bamboo to deploy EMR clusters.
- Supported end-to-end CloudFormation template development and platform setup for different business units in elevating a data transmission project to AWS, where vendor files landing in on-prem HDFS are sent to S3 in parallel.
- Configured and enabled SSL for Hadoop web UIs (HDFS, YARN, JobHistory, Spark, Tez, and Hue) in AWS.
- Generated certificates in both PEM and JKS formats as required for different services and created truststores to establish a mutual handshake between services as part of the SSL implementation (see the sketches after this section).
- Dynamically created DNS records for core nodes in an AWS Auto Scaling group using Lambda functions and Route53.
- Tuned the Hadoop ecosystem for effective performance and monitored for performance drops.
- Assisted development teams in loading data from sources such as DB2, Oracle, and SQL Server into HDFS/Hive tables using Sqoop, or via FTP for flat files from different vendors.
- Provided on-prem support to users by troubleshooting issues at the Hadoop service and job level.
- Supported a POC project to stream data using Flume to a Kafka sink, perform transformations with Spark on the Kafka topics, and store the resulting data in HBase.
- Worked with application teams to tune cluster-level settings as they onboarded to the cloud to test their jobs, estimating memory requirements and the type and number of core nodes based on application load, and helped troubleshoot jobs that failed due to improper tuning.
- Automated the installation of Ranger Hive and HDFS plugins when a new cluster spins up in AWS.
- Installed MySQL on an RDS instance and externalized the Hue and Hive Metastore databases.
- Onboarded new users to Hadoop, performed a manual sync of new LDAP users into Hue, and granted the necessary permissions on different Hue objects.
- Worked through and supported the IDVault environment setup on on-prem Hadoop servers.
- Worked with Unix teams to provision a staging/landing server for business teams to land data before pushing it to AWS.
- Managed AWS cluster configuration using Ansible YAML playbooks.
- Involved in creating Hive tables in Avro and Parquet formats and in loading and analyzing data using Hive queries.
- Good knowledge of CI/CD pipelines for automating application deployment with Git, Jenkins, Nexus, and Ansible.
- Used Python (Boto3) scripts to manage AWS resources.
- Supported Tableau integration and resolved Tableau user issues.
Environment: Hadoop, MapReduce, Hive, HDFS, Sqoop, Oozie, Cloudera, AWS, EMR, S3, IAM, Lambda, Route53, RDS, CloudFormation, Flume, HBase, ZooKeeper, CDH5, Oracle, MySQL, NoSQL and Unix/Linux.
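The on-prem-to-S3 data copy described above comes down to a DistCp-style transfer; below is a minimal sketch with a hypothetical HDFS path and S3 bucket, using s3-dist-cp as available on EMR nodes and plain DistCp as an alternative.

#!/usr/bin/env bash
# Sketch: copy a dataset from the cluster's HDFS out to S3.
# The source path and bucket name are hypothetical placeholders.
set -euo pipefail

SRC_HDFS="hdfs:///data/landing/vendor_feed"                  # placeholder source
DEST_S3="s3://example-datalake-bucket/landing/vendor_feed"   # placeholder bucket

# Option 1: s3-dist-cp, the S3-optimized DistCp shipped on EMR nodes.
s3-dist-cp --src "${SRC_HDFS}" --dest "${DEST_S3}"

# Option 2: plain DistCp; skip CRC checks because HDFS and S3 checksums differ.
hadoop distcp -update -skipcrccheck "${SRC_HDFS}" "${DEST_S3}"

# Quick sanity check on the number of objects copied.
aws s3 ls --recursive "${DEST_S3}/" | wc -l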
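For the PEM/JKS certificate and truststore generation above, the following is a minimal sketch using keytool and openssl with a self-signed certificate; the hostname, alias, validity, and password are hypothetical placeholders (a real deployment would use CA-signed certificates).

#!/usr/bin/env bash
# Sketch: build a JKS keystore/truststore pair and PEM copies for SSL-enabled
# Hadoop services. Hostname and password are hypothetical placeholders.
set -euo pipefail

HOST="emr-master.example.internal"   # placeholder hostname
STOREPASS="changeit"                 # placeholder password

# 1. Key pair for the host in a JKS keystore.
keytool -genkeypair -alias "${HOST}" -keyalg RSA -keysize 2048 \
  -dname "CN=${HOST},OU=BigData,O=Example,C=US" \
  -keystore keystore.jks -storepass "${STOREPASS}" -keypass "${STOREPASS}" \
  -validity 365

# 2. Export the certificate and import it into a truststore so peer services
#    can complete the handshake.
keytool -exportcert -alias "${HOST}" -keystore keystore.jks \
  -storepass "${STOREPASS}" -file "${HOST}.crt"
keytool -importcert -noprompt -alias "${HOST}" -file "${HOST}.crt" \
  -keystore truststore.jks -storepass "${STOREPASS}"

# 3. Convert to PEM for services that expect PEM instead of JKS.
keytool -importkeystore -srckeystore keystore.jks -srcstorepass "${STOREPASS}" \
  -destkeystore keystore.p12 -deststoretype PKCS12 -deststorepass "${STOREPASS}"
openssl pkcs12 -in keystore.p12 -passin "pass:${STOREPASS}" -nodes -out "${HOST}.pem"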
Hadoop Admin
Confidential, Foster City, CA
Responsibilities:
- Maintained, monitored, and configured the Hadoop cluster using Cloudera Manager on the CDH5 distribution.
- Upgraded CDH from 5.3 to 5.5.
- Benchmarked the Hadoop cluster after the upgrade.
- Involved in the Linux kernel upgrade process across the cluster.
- Balanced HDFS manually to decrease network utilization and increase job performance.
- Rebalanced the cluster when production jobs were failing due to space issues on the DataNodes.
- Developed a shell script, scheduled to run every six hours, to check the space utilization of local file system directories on gateway servers and master nodes (see the sketch after this section).
- Developed an ad hoc shell script to check the YARN application load on the cluster using YARN backend commands.
- Developed an audit shell script to verify that data in Hive tables was copied correctly via DistCp between the production and analytical clusters.
- Debugged failed production jobs by reviewing log files and providing resolutions.
- Checked NameNode, ResourceManager, and corresponding service logs when services were down and jobs were failing.
- Documented the Solr admin dashboard for logging failed jobs, viewing existing collections, and performing basic operations such as read/write/modify on collections.
- Ran Hive queries to extract data from Hive tables based on partition values for data analysis.
- Configured periodic incremental imports of data into HDFS using Sqoop.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
- Closely monitored and analyzed MapReduce job executions on the cluster at the task level.
- Provided input to the development team on efficient utilization of resources such as memory and CPU, based on the running statistics of map and reduce tasks.
- Changed cluster configuration properties based on the volume of data being processed and the performance of the cluster.
- Commissioned and decommissioned DataNodes in case of issues such as disk failure.
- Day-to-day tasks included adding new users to groups, changing a user's directory space quota, and cleaning the tmp folder when it filled up.
- Set up user authentication against Kerberos by generating Kerberos principals and keytab files.
- Held regular discussions with other technical teams regarding upgrades, process changes, special processing, and feedback.
Environment: Hadoop, MapReduce, Hive, HDFS, PIG, Sqoop, Solr, Oozie, Cloudera, Flume, HBase, ZooKeeper, CDH5, MongoDB, Cassandra, Oracle, NoSQL and Unix/Linux
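The six-hourly space utilization check above could look like the following minimal sketch; the directories, threshold, and alert address are hypothetical placeholders, with cron providing the schedule.

#!/usr/bin/env bash
# Sketch: alert when local filesystem directories on gateway/master nodes
# exceed a usage threshold. Paths, threshold, and recipient are placeholders.
set -euo pipefail

THRESHOLD=80                           # percent used
DIRS=(/ /var /opt /data /home)         # placeholder mount points
ALERT_TO="hadoop-admins@example.com"   # placeholder recipient
REPORT=$(mktemp)

for dir in "${DIRS[@]}"; do
  # df -P prints: filesystem, size, used, avail, use%, mount point
  usage=$(df -P "$dir" | awk 'NR==2 {gsub("%",""); print $5}')
  if [ "$usage" -ge "$THRESHOLD" ]; then
    echo "$(hostname): ${dir} is at ${usage}% (threshold ${THRESHOLD}%)" >> "$REPORT"
  fi
done

# Mail the report only if something crossed the threshold.
if [ -s "$REPORT" ]; then
  mail -s "Disk usage alert: $(hostname)" "$ALERT_TO" < "$REPORT"
fi
rm -f "$REPORT"

# Crontab entry for the six-hour schedule:
#   0 */6 * * * /opt/scripts/check_space.sh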
Hadoop Admin
Confidential, Atlanta, GA
Responsibilities:
- Planned, installed, configured, maintained, and monitored the Hadoop cluster using Ambari on the Hortonworks distribution of Hadoop.
- Hands-on experience with major Hadoop ecosystem components including HDFS, YARN, Hive, Impala, Flume, ZooKeeper, Oozie, and other ecosystem products.
- Experience with upgrades, patches, and installation of ecosystem products through Ambari.
- Capacity planning, hardware recommendations, performance tuning and benchmarking.
- Performed cluster balancing and performance tuning of Hadoop components such as HDFS, Hive, Impala, MapReduce, and Oozie workflows.
- Took backups of metadata and databases before upgrading the BDA cluster and deploying patches.
- Implemented NameNode backup using NFS for high availability.
- Added and decommissioned Hadoop cluster nodes, including balancing HDFS block data.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
- Configured Kerberos and AD/LDAP for the Hadoop cluster.
- Worked with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
- Implemented Kerberos security across the cluster.
- Worked with data delivery teams to set up new Hadoop users, including creating Linux users, setting up Kerberos principals, and testing HDFS, Hive, Pig, and MapReduce/YARN access for the new users.
- Experience setting up data ingestion tools such as Flume, Sqoop, and SFTP.
- Installed and set up HBase and Impala.
- Set up quotas on HDFS, implemented rack topology scripts, and configured Sqoop to import and export data between RDBMS and HDFS (see the sketch after this section).
- Handled data exchange between HDFS, web applications, and databases using Flume and Sqoop.
- Created Hive tables and was involved in data loading.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Set up automated processes to analyze system and Hadoop log files for predefined errors and send alerts to the appropriate groups.
- Set up automated processes to archive/clean unwanted data on the cluster, in particular on the NameNode and Secondary NameNode.
- Supported technical team members in the management and review of Hadoop log files and data backups.
- Participated in the development and execution of system and disaster recovery processes.
- Monitored the Hadoop cluster through Ambari and implemented alerts based on error messages; provided cluster usage metrics reports to management.
- Performed performance tuning, client/server connectivity checks, and database consistency checks using various utilities.
Environment: Hadoop, MapReduce, Hive, HDFS, PIG, Sqoop, Oozie, Hortonworks, Flume, HBase, ZooKeeper, LDAP, NoSQL, DB2 and Unix/Linux.
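The HDFS quota and Sqoop import/export work above is illustrated by the minimal sketch below; the quota size, JDBC URL, credentials file, tables, and paths are hypothetical placeholders.

#!/usr/bin/env bash
# Sketch: set an HDFS space quota and move data between an RDBMS and HDFS
# with Sqoop. Connection details, tables, and paths are placeholders.
set -euo pipefail

# Cap the analytics landing directory at 5 TB.
hdfs dfsadmin -setSpaceQuota 5t /data/analytics/landing

# Import a DB2 table into HDFS (password kept in a protected options file).
sqoop import \
  --connect jdbc:db2://db2-host.example.com:50000/SALESDB \
  --username etl_user --password-file /user/etl/.db2_password \
  --table ORDERS \
  --target-dir /data/analytics/landing/orders \
  --num-mappers 4

# Export aggregated results back to the RDBMS.
sqoop export \
  --connect jdbc:db2://db2-host.example.com:50000/SALESDB \
  --username etl_user --password-file /user/etl/.db2_password \
  --table ORDERS_SUMMARY \
  --export-dir /data/analytics/output/orders_summary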
Linux Admin
Confidential
Responsibilities:
- Designed, built, installed, and configured Red Hat Enterprise Linux servers (RHEL5, RHEL6) on HP ProLiant DL380 bare-metal servers.
- Installed and configured RHEL and CentOS on physical servers and on virtual machines (VMware and Proxmox).
- Configured FortiGate firewalls and monitored firewall logs to enhance security, balance network load, and monitor applications and users.
- Created and monitored users and groups, and maintained logs of system status/health using Linux commands, Nagios, top, and the GNOME System Monitor.
- Installed and configured RedHat Linux Kickstart and booting from SAN/NAS.
- Performed SAN Migration on RHEL servers.
- Experience using protocols such as NFS, SSH, SFTP, and DNS.
- Discussed new hardware requirements with vendors.
- Performed capacity assessments for new server requests, i.e., calculated the CPU and memory for new servers according to the current and future applications running on the system.
- System performance monitoring and tuning.
- Performed package installations, maintenance, periodic updates and patch management.
- Created Linux file systems.
- Installed and administered TCP/IP, NFS, DNS, NTP, automounts, Sendmail, and print servers as per the client's requirements.
- Experience troubleshooting Samba-related issues.
- Performed disk administration using Linux Volume Manager (LVM) and Veritas Volume Manager 4.x/5.x (see the sketch after this section).
- Monitored performance on Linux servers using iostat, netstat, vmstat, sar, top, and prstat.
- Installed VMware ESX 4.1 to virtualize RHEL servers.
- Installed and configured DHCP, DNS, NFS.
- Configured iptables on Linux servers.
- Performed package administration on Linux using rpm, yum, and Satellite server.
- Automated various administrative tasks on multiple servers using Puppet.
- Deployed Puppet, Puppet Dashboard, and PuppetDB for configuration management of the existing infrastructure.
- Proficient in the installation, configuration, and maintenance of applications such as Apache, LDAP, and PHP.
- Resolved configuration issues and problems related to the OS, NFS mounts, LDAP user IDs, and DNS.
- Worked on VMware, VMware View, and vSphere 4.0; dealt with ESX and ESXi servers.
- Enhanced and simplified vCenter Server 4.0.
- Installed, configured, and troubleshot web servers such as IBM HTTP Server, Apache Web Server, WebSphere Application Server, and Samba Server on Linux (RedHat and CentOS).
- Managed routine system backups and scheduled jobs, such as disabling and enabling cron jobs, and enabled system logging and network logging on servers for maintenance, performance tuning, and testing.
- Created and updated technical documentation for team members.
Environment: Red Hat Enterprise Linux servers (HP ProLiant DL 585, BL ... ML series), SAN (NetApp), VMware Virtual Client 3.5, VMware Infrastructure 3.5, Bash, CentOS, LVM, Windows 2003 Server, NetBackup, Veritas Volume Manager, Samba, NFS.
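The LVM disk administration above follows the standard PV/VG/LV workflow; the sketch below uses hypothetical device, volume group, and mount point names.

#!/usr/bin/env bash
# Sketch: carve a filesystem out of a new disk with LVM.
# Device, VG/LV names, and mount point are hypothetical placeholders.
set -euo pipefail

DISK=/dev/sdb     # placeholder new disk
VG=vg_data        # placeholder volume group
LV=lv_app         # placeholder logical volume
MOUNT=/app        # placeholder mount point

pvcreate "$DISK"                       # initialize the disk as a physical volume
vgcreate "$VG" "$DISK"                 # create the volume group on it
lvcreate -n "$LV" -l 100%FREE "$VG"    # give the logical volume all free extents
mkfs.ext4 "/dev/${VG}/${LV}"           # build the filesystem
mkdir -p "$MOUNT"
mount "/dev/${VG}/${LV}" "$MOUNT"

# Persist the mount across reboots.
echo "/dev/${VG}/${LV} ${MOUNT} ext4 defaults 0 2" >> /etc/fstab

# Later growth: extend the LV and resize the filesystem online, e.g.
#   lvextend -L +50G /dev/vg_data/lv_app && resize2fs /dev/vg_data/lv_app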
Linux Admin
Confidential
Responsibilities:
- Administered RHEL, including installation, testing, tuning, upgrading, and patch loading, and troubleshot both physical and virtual server issues.
- Created and cloned Linux virtual machines.
- Installed RedHat Linux using Kickstart and applied security policies to harden servers based on company policies.
- Performed RPM and YUM package installations, patching, and other server management.
- Managed routine system backups and scheduled jobs, such as disabling and enabling cron jobs, and enabled system logging and network logging on servers for maintenance, performance tuning, and testing.
- Performed tech and non-tech refreshes of Linux servers, including new hardware, OS upgrades, application installation, and testing.
- Set up user and group login IDs, printing parameters, network configurations, passwords, and user and group quotas, and resolved permissions issues.
- Created physical volumes, volume groups, and logical volumes.
- Gathered requirements from customers and business partners and designed, implemented, and provided solutions for building the environment.
- Installed, configured, and supported Apache on Linux production servers.
- Troubleshot Linux network and security-related issues and captured packets using tools such as iptables, firewalls, TCP Wrappers, and Nmap (see the sketch after this section).
Environment: Red Hat Enterprise Linux servers (HP ProLiant DL 585, BL ... ML series), SAN (NetApp), Veritas Cluster Server 5.0, Windows 2003 Server, shell programming, JBoss 4.2, JDK 1.5/1.6, VMware Virtual Client 3.5, VMware Infrastructure 3.5.
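The firewall and packet troubleshooting above is sketched below with hypothetical subnets, ports, and hostnames; tcpdump is added for the capture step as an assumption, alongside the iptables and Nmap tools named above.

#!/usr/bin/env bash
# Sketch: basic iptables rules plus packet capture and port checks used while
# troubleshooting. Subnets, interfaces, ports, and hosts are placeholders.
set -euo pipefail

# Keep established/related traffic flowing, allow SSH only from the admin
# subnet, and drop other inbound SSH.
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -p tcp -s 10.10.0.0/24 --dport 22 -j ACCEPT
iptables -A INPUT -p tcp --dport 22 -j DROP

# Capture traffic on a suspect port for offline analysis (tcpdump assumed
# available; not named in the bullet above).
tcpdump -i eth0 -nn port 8080 -c 200 -w /tmp/port8080.pcap

# Verify which ports a remote host actually exposes.
nmap -sT -p 22,80,443,8080 app-server.example.com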