Big Data/DevOps Admin/Engineer Resume

SUMMARY

AWS Certified DevOps Engineer with 5+ years of professional IT experience, including extensive hands-on work in Linux/Unix Administration, Systems Configuration, Big Data Engineering, Data Science, Cloud Engineering and DevOps Engineering.
4+ years of experience as a Hadoop Administrator.
Experience with the complete Software Development Life Cycle (SDLC), including design, development, testing and implementation of moderately to highly complex systems based on Java and Python.
Good experience in installation, configuration and management of clusters in Hortonworks Data Platform (HDP 2.x and HDP 3.x) and Hortonworks Data Flow (HDF 2.x and HDF 3.x) distributions using Apache Ambari and Ansible, both on-premises and on public Cloud Service Providers (CSPs).
Experience in Hadoop architecture and its various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node and MapReduce concepts.
Good understanding of and hands-on experience with Hadoop cluster capacity planning, performance tuning, monitoring and troubleshooting.
Hands-on experience in installing, managing and operating various Hadoop Ecosystem components like Apache Hadoop, Apache Spark, Apache Hive, Apache Tez, Apache ZooKeeper, Apache Zeppelin, Apache Ranger, Apache Knox and Apache Sqoop.
Hands-on experience with configuring security for HDP clusters using Apache Knox, Ranger and Kerberos.
Hands-on experience with installing and setting up Data Science platforms using Anaconda, Jupyter and Apache Zeppelin.
Hands-on experience in installing, managing and operating Apache NiFi clusters using Ansible.
Hands-on experience in installing, managing and operating Elastic Stack (5.x, 6.x and 7.x) components including but not limited to Elasticsearch, Kibana and Beats.
Hands-on experience in installing and configuring Ansible/Ansible Engine on RHEL 7 and RHEL 8 systems to serve as Ansible control machines.
Extensive experience with Ansible architecture and Ansible internals, as well as developing Roles and Playbooks for Automation, Configuration Management and IaC.
Experience with installing and setting up AnsibleWorks (AWX) on Red Hat OpenShift Container Platform.
Experience working with various Red Hat products, including but not limited to Red Hat Enterprise Linux (6, 7 and 8), Red Hat JBoss Middleware Suite, Red Hat OpenShift, Red Hat CloudForms, Red Hat Identity Management and Red Hat Ansible Automation Platform.
Hands-on experience in installing, managing and operating various Red Hat JBoss Middleware Suite products like JBoss EAP, JWS, EWS, JBCS Apache HTTPD, JBoss Fuse, JBoss AMQ.
Hands-on experience in installing, managing and operating various Open Source Middleware components like Apache HTTPD, Apache Tomcat, WildFly Application Server, Apache ZooKeeper, Apache Solr and Nginx.
Experience with design and development of automation tools and infrastructure to run service-oriented stacks in internal data centers and on public and private clouds.
Experience in working with data models for databases and Data Warehouse/DataMart/ODS for OLAP and OLTP environments.
Excellent communication, interpersonal and leadership skills, with the ability to work efficiently both independently and in team environments.
Major strengths include familiarity with multiple software systems, the ability to learn new technologies quickly and adapt to new environments, and a self-motivated, focused, team-oriented approach.

TECHNICAL SKILLS

Big Data Ecosystem: Apache Hadoop, Apache Spark, Apache NiFi, Apache Hive, Apache Tez, Apache Sqoop, Apache Ranger, Apache Knox, Apache ZooKeeper, Apache Solr, Apache Kafka, Apache Tika.

Elastic Stack: Elasticsearch, Kibana, Beats, Logstash, APM, ES-Hadoop, X-Pack

Cloud: AWS, Azure and Red Hat CloudForms.

Operating Systems: Red Hat Enterprise Linux (8, 7 and 6), CentOS (7 and 8), openSUSE.

Middleware Components: Apache HTTPD, Apache Tomcat, WildFly AS, JBoss EAP, JWS, JBCS, JBoss Fuse, AMQ.

Container Orchestration: Docker Engine, Kubernetes, ECS, Red Hat OpenShift Container Platform

DevOps Deploy / Configuration Management: Ansible, Puppet, Foreman, Chef

DevOps SCM & VCS: SVN & Git (GitHub, GitLab & BitBucket)

DevOps Build: Jenkins, Maven, sbt, Nexus, JFrog, npm, AWS CodeBuild.

DevOps Monitoring: AppDynamics, Elastic APM, OpenNMS, AWS CloudWatch, Prometheus, Grafana.

DevOps Management: BMC Remedy, ServiceNow, JIRA, CA Agile Rally

Scripting: Python 3 and Shell

Data Science: Anaconda

PROFESSIONAL EXPERIENCE

Confidential

Big Data/DevOps Admin/Engineer

Responsibilities:

Provide Hadoop Developer and Administration support, which includes infrastructure setup, software installation (Hortonworks Data Platform and Hortonworks Data Flow), configuration, upgrading/patching, monitoring, troubleshooting and maintenance, as well as working with the development team to install components (Hive, Pig, etc.) and manage the design and development of Spark, Hive, Apache NiFi and YARN applications.
Daily production support for big data technologies and platforms (Hadoop, Spark, MapReduce, Hive, NiFi, etc.).
Design and develop data integration/engineering workflows on big data technologies and platforms (Hadoop, Spark, MapReduce, Hive, NiFi, etc.).
Working on 3 Hadoop clusters for different teams, supporting 25+ users by keeping the Hadoop platform simple to use and updating them on best practices.
Installed/Configured/Maintained Apache Hadoop clusters for application development which included components Ambari, Hadoop, Hive, Spark2, Zeppelin, NiFi, ZooKeeper, Tez, Ranger and Knox.
Also installed, configured and managed plain vanilla Apache Hadoop with Apache Spark.
Installed, configured and managed Elasticsearch and Kibana.
Installed, configured and managed Apache NiFi to be used with Hadoop clusters for data transfer and workflow scheduling.
Installed Apache NiFi, Ambari Server, Ambari Agents and Elasticsearch using Ansible.
Working on hardware sizing, capacity management, cluster management, maintenance, performance monitoring and configuration of big data systems and applications for high volume throughput and processing.
Responsible for day-to-day activities including HDFS support and maintenance, cluster maintenance, creation/removal of nodes, cluster monitoring/troubleshooting, managing and reviewing Hadoop log files, backup and restore, and capacity planning.
Developed Ansible Playbooks with Ansible roles. Used the copy and file modules in Ansible playbooks to copy and remove files on remote systems.
Monitor, maintain, provision and upgrade Hadoop, Hive and Spark systems to support a complex Data Pipeline Platform.
Participate in an on-call rotation responding to alerts and systems issues for Hadoop, Hive, Spark and more.
Troubleshoot, repair and recover from hardware or software failures. Identify and resolve faults, inconsistencies and systemic issues. Coordinate and communicate with impacted constituencies.
Manage user access and resource allocations to Data Pipeline Platform.
Develop tools to automate routine day-to-day tasks such as security patching, software upgrades, hardware allocation. Utilize automated system monitoring tools to verify the integrity and availability of all hardware, server resources, and critical processes.
Create new standard operating procedures for the team and focus on updating existing documentation for the team.
Engage other teams during outages or planned maintenance.
Administer development, test, QA and production servers.
Design, build and maintain near real-time big data applications and pipelines to process billions of records into and out of high-performance layers and data lakes, powering various data products and services.
Characterize and optimize application and pipeline performance, and troubleshoot and resolve data processing issues.
Actively incubate new technologies and tools for big data initiatives.
Participated in On-call Rotation (as required) for emergency technical production support and planned maintenance activities.

Environment: Puppet 3.5, Ansible 2.6, Hadoop 2.7, MapReduce 2.7, Hive 1.2, Spark 2.1, Ambari 2.6, ZooKeeper 3.4, Hortonworks Data Platform 2.6, Zeppelin 0.7, RHEL 7, Red Hat Identity Management, Red Hat CloudForms, Red Hat Satellite, Elasticsearch 5.x, X-Pack, Kibana, Apache NiFi 1.3, JBoss EAP 7, JBoss EWS, JBCS 2.4, NGINX 1.14, Apache HTTPD Server, Apache Tomcat 7, JBoss Fuse, Apache Solr, Anaconda3, R, Python, Ruby.

Confidential, Baton Rouge, LA

DevOps Engineer/Hadoop Administrator

Responsibilities:

Work on production systems to support reproduction and troubleshooting, system characterization and analysis, root cause analysis of production issues, and bug tracking and resolution as part of production support.
Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, ZooKeeper and Sqoop.
Managed a 25+ node CDH 5.2 cluster holding 4 petabytes of data using Cloudera Manager on Red Hat Enterprise Linux 6.5.
Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
Upgraded the Hadoop cluster from CDH 5.2 to CDH 5.5.
Installed the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning and slot configuration.
Involved in migrating ETL processes from Oracle to Hive to evaluate ease of data manipulation.
Worked on importing and exporting data between Oracle and DB2 databases and HDFS and Hive using Sqoop and Spark.
Installed Cloudera Manager and CDH, installed the JCE policy file, created a Kerberos principal for the Cloudera Manager Server and enabled Kerberos using the wizard.
Monitored the cluster for performance, networking and data integrity issues.
Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
Created test strategies, test plans and test cases to provide test coverage across various products, systems and platforms.
Created 25+ Linux Bash scripts for users, groups, data distribution, capacity planning, and system monitoring.
Installed the OS and administered the Hadoop stack with the CDH5 (with YARN) Cloudera distribution, including configuration management, monitoring, debugging and performance tuning.
Supported MapReduce Programs and distributed applications running on the Hadoop cluster.
Scripted Hadoop package installation and configuration to support fully automated deployments.
Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
Performed maintenance, monitoring, deployments and upgrades across the infrastructure that supports all Hadoop clusters.
Worked on Hive for further analysis and for transforming files from different analytical formats into text files.
Created Hive external tables, loaded data into the tables and queried the data using HQL.
Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
Monitored the Hadoop cluster using tools like Nagios, Ganglia and Cloudera Manager.
Maintained the cluster by adding and removing nodes using tools like Ganglia, Nagios and Cloudera Manager.
Participated in On-call Rotation (as required) for emergency technical production support and planned maintenance activities.

Environment: Hadoop, MapReduce, Hive, Pig, Sqoop, Spark, Oozie, Flume, HBase, Nagios, Ganglia, Hue, Cloudera Manager, ZooKeeper, Cloudera, Oracle, Kerberos and Red Hat 6.5

Confidential, Merrimack, NH

Hadoop Developer/Administrator & Cloud Engineer

Responsibilities:

Involved in design, development and implementation of a Hadoop POC to transition a large, complex on-premises data warehouse project to on-premises Cloudera CDH clusters and Amazon Web Services.
Worked on setting up and configuring Hortonworks HDP and Amazon Web Services EMR (Elastic MapReduce, a managed Hadoop framework) clusters.
Involved in loading data from the on-premises data warehouse to Cloudera CDH clusters and the AWS Cloud using different approaches like Sqoop, Spark and AWS services.
Worked extensively with Apache Spark to move data from IBM Netezza, an on-premises data warehouse, to the AWS Cloud.
Worked on Apache Spark to extract, load, transform and analyze very large data sets.
Used Apache Spark to perform advanced analytics on very large datasets using DataFrame APIs.
Good knowledge of Hadoop architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node and the MapReduce programming paradigm.
Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, ZooKeeper and Sqoop.
Involved in migrating a Java test framework to Python Flask.
Responsible for developing a data pipeline using HDInsight, Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
Experience in working with Ranger in enabling metadata management, governance and audit.
Installed and Configured Oozie for workflow automation and coordination.
Installed and configured Hadoop, MapReduce and HDFS (Hadoop Distributed File System), and developed multiple MapReduce jobs in Java for data cleaning.
Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
Involved in scheduling and coordinating Oozie workflow manager jobs to run multiple Hive, MapReduce and Pig jobs in batch processing mode.
Experience in methodologies such as Agile, Scrum and test-driven development.
Worked on cluster installation, commissioning and decommissioning of Data Nodes, Name Node recovery, capacity planning, Cassandra and slot configuration.
Installed, configured and administered a small Hortonworks HDP cluster consisting of 10 nodes. Monitored the cluster for performance, networking and data integrity issues.
Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
Worked with data delivery teams to set up new Hadoop users, which includes setting up Linux users, setting up Kerberos principals and testing HDFS and Hive access.
Created 25+ Linux Bash scripts for users, groups, data distribution, capacity planning, and system monitoring.
Created S3 event triggers that fire whenever a particular file is uploaded to an AWS S3 bucket.
Wrote AWS Lambda functions in Python that invoke Python scripts to perform various transformations and analytics on large data sets.
Integrated LDAP with Cloudera Manager to secure access and manage authorization through permissions on users and groups.
Worked extensively with ORC and Parquet file formats using the DataFrame API in Apache Spark.
Supported MapReduce Programs and distributed applications running on the Hadoop cluster.
Scripted Hadoop package installation and configuration to support fully automated deployments.
Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.

Environment: Hive, Pig, HBase, ZooKeeper, Sqoop, ETL, Linux, RHEL 6, MongoDB, Cassandra, Ganglia and Cloudera Manager.
