
Hadoop Administrator Resume

Charlotte, NC

PROFESSIONAL SUMMARY:

  • 7 years of IT Operations experience, including 4+ years in Hadoop Administration and 3 years as a SQL Database Administrator.
  • Excellent understanding of Distributed Systems and Parallel Processing architecture.
  • Worked on components such as HDFS, MapReduce, JobTracker, TaskTracker, Sqoop, ZooKeeper, YARN, Oozie, Hive, Hue, Flume, HBase, Fair Scheduler, Spark, SmartSense, and Kafka.
  • Experience in managing Cloudera, Hortonworks and MapR distributions.
  • Strong knowledge of Hadoop HDFS architecture and the MapReduce framework.
  • Involved in vendor selection and capacity planning for the Hadoop cluster in production.
  • Experience administering Linux systems to deploy Hadoop clusters and monitoring the clusters using Nagios and Ganglia.
  • Experience in backup, recovery, failover, and DR practices on multiple platforms. Experience managing Linux servers (especially Ubuntu) and hands-on experience with Red Hat Linux.
  • Expertise in deploying Hadoop, YARN, Spark, and Storm integrated with Cassandra, Ignite, RabbitMQ, and Kafka.
  • Responsible for upgrading Hortonworks HDP 2.2.0 and MapReduce 2.0 with YARN in a multi-node clustered environment. Handled importing of data from various sources, performed transformations using Hive, MapReduce, and Spark, and loaded data into HDFS. Set up Hadoop security using MIT Kerberos, AD integration (LDAP), and Sentry authorization.
  • Hadoop Cluster capacity planning, performance tuning, cluster Monitoring, Troubleshooting.
  • Used Network Monitoring Daemons like Ganglia and Service monitoring tools like Nagios.
  • Backup configuration and recovery from a NameNode failure.
  • Installation of various Hadoop Ecosystems and Hadoop Daemons.
  • Installation and configuration of Sqoop and Flume.
  • Experience designing, configuring, and managing backup and disaster recovery for Hadoop data.
  • Hands-on experience analyzing log files for Hadoop and ecosystem services and finding root causes.
  • Experience in commissioning, decommissioning, balancing, and managing nodes, and in tuning servers for optimal cluster performance.
  • Experience copying files within a cluster or between clusters using the DistCp command-line utility.
  • Experience in HDFS data storage and support for running MapReduce jobs.
  • Experience installing and configuring Hadoop ecosystem components such as Sqoop, Pig, and Hive.
  • Experience importing and exporting data between HDFS and relational database systems/mainframes using Sqoop.
  • Hands-on experience installing Kerberos security and setting up permissions, and establishing standards and processes for Hadoop-based application design and implementation.
  • Experience with cloud deployments: Hadoop on Azure, AWS EMR, and Cloudera Manager (including direct Hadoop on EC2, non-EMR).
  • Brief exposure to implementing and maintaining Hadoop and Hive security.
  • Experience in database administration, performance tuning, backup and recovery, and troubleshooting in large-scale customer-facing environments.
  • Expertise in commissioning and decommissioning nodes in clusters, backup configuration, and recovery from NameNode failures.
  • Good working knowledge of importing and exporting data from databases such as MySQL into HDFS and Hive using Sqoop.
  • Performed data analysis, feature selection, feature extraction using Apache Spark Machine Learning streaming libraries in Python.
  • Participate in development/implementation of Cloudera Hadoop environment.
  • Strong knowledge of YARN (Hadoop 2.x) terminology and high-availability Hadoop clusters.
  • Developed various MapReduce applications to perform ETL workloads on terabytes of data.
  • Very good experience with high-volume transactional systems running on Unix/Linux and Windows.
  • Involved in all phases of Software Development Life Cycle (SDLC) in large scale enterprise software using Object Oriented Analysis and Design.
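The cluster capacity planning mentioned above reduces to straightforward sizing arithmetic. As an illustrative sketch only, with hypothetical ingest, retention, and disk figures (none taken from the clusters described here):

```python
# Rough HDFS capacity-planning sketch: how much raw storage and how many
# worker nodes a given ingest rate implies. All inputs are hypothetical.

def raw_storage_tb(daily_ingest_tb, retention_days, replication=3, headroom=0.25):
    """Raw HDFS capacity (TB) needed: logical data x replication x headroom."""
    logical = daily_ingest_tb * retention_days
    return logical * replication * (1 + headroom)

def nodes_needed(raw_tb, disks_per_node=12, disk_tb=4):
    """Smallest node count whose combined raw disk capacity covers raw_tb."""
    per_node = disks_per_node * disk_tb
    return int(-(-raw_tb // per_node))  # ceiling division

if __name__ == "__main__":
    raw = raw_storage_tb(daily_ingest_tb=2, retention_days=90)
    print(raw)                 # 675.0 TB raw (2 TB/day, 90 days, 3x, 25% headroom)
    print(nodes_needed(raw))   # 15 nodes at 48 TB raw each
```

The replication factor and headroom are the usual knobs: HDFS default replication is 3, and leaving 20-30% free space for intermediate MapReduce/Spark output is a common rule of thumb.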

TECHNICAL SKILLS:

Hadoop/Big Data Technologies: HDFS, MapReduce, YARN, Pig, Hive, HBase, Oozie, Sqoop, Spark, Cassandra, Solr, Hue, Kafka, HCatalog, AWS, Data Modeling, MongoDB, Flume & ZooKeeper.

Languages and technologies: Java, SQL, NoSQL, Phoenix

Operating Systems: Linux, UNIX, Windows, macOS.

Databases: MySQL, Oracle, Teradata, Greenplum, PostgreSQL, DB2.

Scripting: Shell Scripting, Perl Scripting, Python

Web/Application Servers: Apache 2.4, Tomcat, WebSphere, WebLogic.

NOSQL Databases: HBase, Cassandra, MongoDB

Office Tools: MS Word, MS Excel, MS PowerPoint, MS Project

PROFESSIONAL EXPERIENCE:

Confidential, Charlotte, NC

Hadoop Administrator

Responsibilities:

  • Worked on the Hadoop stack, ETL tools like Talend, reporting tools like Tableau, security with Kerberos, user provisioning with LDAP, and many other Big Data technologies for multiple use cases.
  • Responsible for cluster maintenance, monitoring, commissioning and decommissioning DataNodes, troubleshooting, cluster planning, and managing and reviewing data backups and log files.
  • Worked with the Data Science team to gather requirements for various data mining projects.
  • Installed 5 Hadoop clusters for different teams and developed a data lake that serves as a base layer for storage and analytics. Provided services to developers, including installing custom software, upgrading Hadoop components, resolving issues, and helping troubleshoot long-running jobs. Served as L3 and L4 support for the data lake and also managed clusters for other teams.
  • Built automation frameworks for data ingestion and processing in Python and Scala with NoSQL and SQL databases, using Chef, Puppet, Kibana, Elasticsearch, Tableau, GoCD, and Red Hat infrastructure for data ingestion, processing, and storage.
  • Developed Spark SQL jobs to load tables into HDFS and run select queries on top of them.
  • Worked as a mix of DevOps engineer and Hadoop administrator, handling L3 issues, installing new components as requirements came in, automating wherever possible, and implementing a CI/CD model.
  • Involved in implementing security on the Hortonworks Hadoop cluster using Kerberos, working with the operations team to move from a non-secured cluster to a secured cluster.
  • Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
  • Responsible for upgrading Hortonworks HDP 2.2.0 and MapReduce 2.0 with YARN in a multi-node clustered environment. Handled importing of data from various sources, performed transformations using Hive, MapReduce, and Spark, and loaded data into HDFS. Set up Hadoop security using MIT Kerberos, AD integration (LDAP), and Sentry authorization.
  • Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
  • Migrated services from a managed hosting environment to AWS including: service design, network layout, data migration, automation, monitoring, deployments and cutover, documentation, overall plan, cost analysis, and timeline.
  • Used R for effective data handling and storage.
  • Managed Amazon Web Services (AWS) infrastructure with automation and configuration management tools such as Chef, Ansible, Puppet, or custom-built tooling. Designed cloud-hosted solutions drawing on specific AWS product suite experience.
  • Performed a major upgrade in the production environment from HDP 1.3 to HDP 2.2, following standard backup policies to ensure high availability of the cluster.
  • Monitored multiple Hadoop clusters environments using Ganglia and Nagios. Monitored workload, job performance and capacity planning using Ambari. Installed and configured Hortonworks and Cloudera distributions on single node clusters for POCs.
  • Created Teradata database macros for application developers to assist with performance and space analysis, as well as object dependency analysis, on the Teradata database platforms.
  • Implemented a Continuous Delivery framework using Jenkins, Puppet, Maven, and Nexus in a Linux environment. Integrated Maven/Nexus, Jenkins, UrbanCode Deploy with Patterns/Release, Git, Confluence, Jira, and Cloud Foundry.
  • Defined the Chef server and workstation to manage and configure nodes.
  • Set up the Chef repo, Chef workstations, and Chef nodes.
  • Involved in running Hadoop jobs to process millions of records of text data. Troubleshot build issues during the Jenkins build process. Implemented Docker to create containers for Tomcat servers and Jenkins.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
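Log review and root-cause analysis like that described above is easy to script. A minimal triage sketch, counting ERROR/FATAL lines per logging class to spot the failing daemon (the log format and class names are merely illustrative of typical Hadoop daemon logs):

```python
import re
from collections import Counter

# Typical Hadoop daemon log line: "<date> <time> <LEVEL> <logger class>: message"
LOG_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2} \S+ (ERROR|FATAL|WARN) (\S+):")

def error_counts(lines):
    """Count ERROR/FATAL lines per logging class to spot the failing component."""
    counts = Counter()
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m and m.group(1) in ("ERROR", "FATAL"):
            counts[m.group(2)] += 1
    return counts

# Hypothetical sample lines, not from a real cluster.
sample = [
    "2017-03-01 10:00:01 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: heartbeat",
    "2017-03-01 10:00:02 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: disk failure",
    "2017-03-01 10:00:03 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: disk failure",
    "2017-03-01 10:00:04 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: edit log corrupt",
]

if __name__ == "__main__":
    print(error_counts(sample).most_common(1))
```

In practice the same approach runs over files under the daemon log directory; the most frequent failing class usually points straight at the component to investigate first.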

Environment: Hortonworks Hadoop, Cassandra, Flat files, Oracle 11g/10g, MySQL, Toad 9.6, Windows NT, Sqoop, Hive, Oozie, Ambari, SAS, SPSS, Unix Shell Scripts, Zoo Keeper, SQL, Map Reduce, Pig.

Confidential - Boston, MA

Hadoop Administrator

Responsibilities:

  • Involved in the end-to-end process of Hadoop cluster setup, including installation, configuration, and monitoring of the cluster.
  • Automated Hadoop cluster setup and implemented Kerberos security for various Hadoop services using Hortonworks.
  • Detailed experience integrating and aggregating large data sets into high-volume data warehouse applications using highly varied, high-velocity data collection systems.
  • Responsible for Cluster maintenance, commissioning and decommissioning Data nodes, Cluster Monitoring, Troubleshooting, Manage and review data backups, Manage & review Hadoop log files.
  • Monitoring systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
  • Installation of various Hadoop Ecosystems and Hadoop Daemons.
  • Responsible for Installation and configuration of Hive, Pig, HBase and Sqoop on the Hadoop cluster.
  • Configured various property files like core-site.xml, hdfs-site.xml, mapred-site.xml based upon the job requirement.
  • Experience with Netezza and Greenplum data warehouse appliance technologies.
  • Engaged in activities involving the Netezza data warehouse appliance, data warehousing, very large data sets, and SQL.
  • Involved in loading data from the UNIX file system to HDFS and importing and exporting data into HDFS using Sqoop; experienced in managing and reviewing Hadoop log files.
  • Responsible for data extraction and data ingestion from different data sources into Hadoop Data Lake by creating ETL pipelines using Pig, and Hive
  • Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes. Communicate and escalate issues appropriately.
  • Extracted meaningful data from dealer CSV, text, and mainframe files and generated Python pandas reports for data analysis.
  • Developed Python code using version control tools like GitHub and SVN on Vagrant machines.
  • Performed data analysis, feature selection, feature extraction using Apache Spark Machine Learning streaming libraries in Python.
  • Involved in Analyzing system failures, identifying root causes, and recommended course of actions. Documented the systems processes and procedures for future references.
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters. Involved in Installing and configuring Kerberos for the authentication of users and Hadoop daemons.
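The property files mentioned above (core-site.xml, hdfs-site.xml, mapred-site.xml) all share one name/value XML layout, which makes them easy to generate when automating cluster setup. A small sketch (the NameNode host and directory values are placeholders, not from any cluster described here):

```python
import xml.etree.ElementTree as ET

def hadoop_conf(props):
    """Render a dict of Hadoop properties as a *-site.xml configuration document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    # Hypothetical core-site.xml values for illustration only.
    core_site = hadoop_conf({
        "fs.defaultFS": "hdfs://nn1.example.com:8020",
        "hadoop.tmp.dir": "/data/hadoop/tmp",
    })
    print(core_site)
```

Configuration-management tools such as Puppet or Chef typically template these files the same way, then push them to every node so the cluster stays consistent.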

Environment: Hortonworks, Hadoop, HDFS, Pig, Hive, Sqoop, Flume, Kafka, Storm, UNIX, Cloudera Manager, Zookeeper and HBase, Python, Spark, Apache, SQL, ETL.

Confidential - Washington, DC

Hadoop Administrator

Responsibilities:

  • Installed and configured Cloudera CDH 5.7.1 with Hadoop ecosystem components like Hive, Oozie, Hue, Spark, Kafka, HBase, and YARN.
  • Configured AD and Centrify and integrated them with Kerberos.
  • Installed and configured Kafka Cluster
  • Installed MySQL and set up MySQL master-slave replication.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs.
  • Commissioned DataNodes as data grew and decommissioned DataNodes from the cluster when hardware degraded.
  • Set up and managed HA NameNode to avoid a single point of failure in large clusters.
  • Worked with different applications teams to integrate with Hadoop.
  • Worked with data delivery teams to set up new Hadoop users and Linux users, set up Kerberos principals, and test HDFS and Hive.
  • Upgraded from CDH 5.7.1 to CDH 5.7.2
  • Involved in cluster capacity planning, Hardware planning, Installation, Performance tuning of the Hadoop cluster.
  • Hands on experience in provisioning and managing multi-node Hadoop Clusters on public cloud environment Amazon Web Services (AWS) - EC2 and on private cloud infrastructure.
  • Assisted in developing DFD with Architect team and Networking team.
  • Integrated Attunity and Cassandra with CDH Cluster.
  • Worked closely with Data Center team and Linux team in configuring VM and Linux boxes.
  • Involved in finalizing the SOW and MSA with Cloudera.

Environment: RHEL 6.7, CentOS 7.2, Shell Scripting, Java (JDK 1.7), Map Reduce, Oracle, SQL server, Attunity, Cloudera CDH 5.7.x, Hive, Zookeeper and Cassandra.

Confidential, Jersey city, New Jersey

Hadoop Administrator

Responsibilities:

  • Responsible for cluster maintenance, monitoring, managing, commissioning and decommissioning DataNodes, troubleshooting, reviewing data backups, and managing and reviewing log files for Hortonworks.
  • Added and installed new components, and removed them, through Cloudera Manager.
  • Monitored workload, job performance, and capacity planning using Cloudera Manager.
  • Major and Minor upgrades and patch updates.
  • Installed Hadoop eco system components like Pig, Hive, HBase and Sqoop in a Cluster.
  • Experience in setting up tools like Ganglia for monitoring Hadoop cluster.
  • Handling the data movement between HDFS and different web sources using Flume and Sqoop.
  • Extracted files from NoSQL database like HBase through Sqoop and placed in HDFS for processing.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs.
  • Building and maintaining scalable data pipelines using the Hadoop ecosystem and other open source components like Hive and HBase.
  • Installed and configured HA for Hue, pointing to the Hadoop cluster in Cloudera Manager.
  • Deep and thorough understanding of ETL tools and how they can be applied in a Big Data environment, supporting and managing Hadoop clusters.
  • Installed and configured MapReduce, HDFS and developed multiple MapReduce jobs in java for data cleaning and pre-processing.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Responsible for developing data pipeline using HDInsight, Flume, Sqoop and Pig to extract the data from weblogs and store in HDFS.
  • Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
  • Commissioned DataNodes as data grew and decommissioned DataNodes from the cluster when hardware degraded.
  • Set up and managed HA NameNode to avoid a single point of failure in large clusters.
  • Worked with data delivery teams to set up new Hadoop users and Linux users, set up Kerberos principals, and test HDFS and Hive.
  • Discussions with other technical teams on regular basis regarding upgrades, process changes, any special processing and feedback.
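The Flume-based movement of weblog data into HDFS described above is driven by an agent properties file. A representative sketch, in which the agent name, tailed log path, and NameNode host are all placeholders rather than details of any cluster described here:

```properties
# Hypothetical Flume agent: tail a web access log and land it in HDFS.
agent1.sources = weblog
agent1.channels = mem
agent1.sinks = hdfs-sink

# Source: follow the access log as lines are appended.
agent1.sources.weblog.type = exec
agent1.sources.weblog.command = tail -F /var/log/httpd/access_log
agent1.sources.weblog.channels = mem

# Channel: buffer events in memory between source and sink.
agent1.channels.mem.type = memory
agent1.channels.mem.capacity = 10000

# Sink: write events into date-partitioned HDFS directories as plain text.
agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.channel = mem
agent1.sinks.hdfs-sink.hdfs.path = hdfs://namenode:8020/data/weblogs/%Y-%m-%d
agent1.sinks.hdfs-sink.hdfs.fileType = DataStream
```

A memory channel favors throughput over durability; a file channel is the usual swap-in when events must survive an agent restart.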

Environment: Linux, Shell Scripting, Java (JDK 1.7), Tableau, Map Reduce, Teradata, SQL server, NoSQL, Cloudera, Flume, Sqoop, Chef, Puppet, Pig, Hive, Zookeeper and HBase.

Confidential

Hadoop Administrator

Responsibilities:

  • Installation, configuration, support, and maintenance of Hadoop clusters using Apache and Hortonworks distributions with YARN.
  • Involved on Hadoop Cluster capacity planning, performance tuning, cluster Monitoring, Troubleshooting.
  • Installed and configured Hadoop ecosystem components such as Sqoop, Pig, and Hive.
  • Involved in commissioning, decommissioning, balancing, and managing nodes, and in tuning servers for optimal cluster performance.
  • Used Network Monitoring Daemons like Ganglia and Service monitoring tools like Nagios.
  • Loading log data directly into HDFS using Flume.
  • Importing and exporting data into HDFS using Sqoop.
  • Backup configuration and Recovery from a NameNode failure.
  • NameNode high availability with the quorum journal manager and shared edit logs.
  • Involved in setting up Access Control Lists on HDFS.
  • Configured rack awareness on HDP.
  • Started and stopped Hadoop daemons such as the NameNode, Standby NameNode, DataNodes, ResourceManager, and NodeManagers.
  • Tuned JVM memory configuration parameters.
  • Involved in copying files within a cluster or between clusters using the DistCp command-line utility.
  • Involved in commissioning and decommissioning slave nodes such as DataNodes, HBase RegionServers, and NodeManagers.
  • Involved in configuring the cluster Capacity Scheduler.
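The NameNode HA setup with the quorum journal manager noted above is expressed in hdfs-site.xml. A representative fragment, in which the nameservice name ("mycluster") and journal node hostnames are placeholders:

```xml
<!-- Sketch of quorum-journal-manager HA properties for hdfs-site.xml.
     Nameservice and hostnames are hypothetical. -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
```

Both NameNodes write edits to the shared JournalNode quorum; automatic failover additionally relies on ZooKeeper and the ZKFC daemons on each NameNode host.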

Environment: Hadoop 2.0, Map Reduce, HDFS, Hive, Zookeeper, Oozie, Java (jdk1.6), Hortonworks, NoSQL, Oracle 11g, 10g, Red hat Linux.

Confidential

SQL Database Administrator

Responsibilities:

  • Installed, Configured, and Maintained SQL Server 2014, 2012, and 2008 R2 in development, test, and production environment
  • Configured and Maintained Fail-Over Clustering using SQL Server 2012
  • Installed and Configured SQL Server Reporting Services (SSRS)
  • Configured and Maintained Replications, Log Shipping, and Mirroring for High Availability.
  • Upgraded/migrated SQL Server instances/databases from older versions to newer ones, such as SQL Server 2000 to 2008 R2 and 2008 R2 to 2012
  • Migrated MS Access Databases into MS SQL Server 2008 R2, and 2012
  • Migrated Oracle 10gR2/11gR2 and MySQL 5.1.23 databases to SQL Server 2008 R2/2012
  • Applied Service Packs and hotfixes on SQL Server instances to address security and upgrade-related issues
  • Performed database and SQL/TSQL Performance Tuning
  • Wrote SQL/T-SQL queries, Stored-Procedures, functions, and Triggers
  • Scheduled many jobs to automate different database related activities including backup, monitoring database health, disk space, backup verification
  • Developed Different Maintenance Plans for database monitoring
  • Setup Jobs, Maintenance plans for backups, Rebuilding indexes, check server health, alert, notifications
  • Created and managed different types of indexes (clustered/non-clustered) and constraints (unique/check)
  • Worked on data modeling projects and backward engineering; developed E-R diagrams using tools like ERwin, Toad Data Modeler, and SQL Server database diagrams
  • Developed SSIS packages from different sources like SQL Server databases, flat files, CSV, Excel, and many other data sources supporting ODBC and OLE DB.
  • Deployed SSIS packages to move data across server, move logins, load data from different data sources
  • Setup jobs from SSIS Packages.
  • Used the Import/Export tool to export and import data from different sources like SQL Server databases, flat files, CSV, Excel, and many other data sources supporting ODBC and OLE DB.

Environment: Microsoft SQL Server 2012/2008 R2/2008, Windows 2012/2008 Servers, T-SQL, SQL Server Profiler, SSIS, MS Office, Performance Monitor, SQL Server Cluster.

Confidential

Database Administrator

Responsibilities:

  • Configured and installed SQL Server 2012 and 2014 with high availability (AlwaysOn Availability Groups).
  • Managing ETL implementation and enhancements, testing and quality assurance, troubleshooting issues and ETL/Query performance tuning.
  • Database administration including installation, configuration, upgrades, capacity planning, performance tuning, backup and recovery and managing clusters of SQL servers.
  • Experience with setup and administration of SQL Server Database Security environments using database Privileges and Roles.
  • Provide SQL Server database physical model creation and implementation (data type, indexing, and table design).
  • Perform day-to-day administration on SQL Server environments.
  • Diagnosed and troubleshot issues and conducted performance tuning for optimization.
  • Manage database Capacity growth, refresh data from production to lower environment.
  • Installed SQL Server with standard-access-privilege service accounts to improve security and attain high ratings in SOX and PI audits.
  • Identify problems, find root cause and perform tuning to ensure application performance.
  • Manage database Capacity growth, write stored procedures, triggers.
  • Installed and configured SQL Server 2008 R2 clustering on Microsoft Cluster Service (MSCS) for Active-Passive and Active-Active cluster nodes.

Environment: Microsoft SQL Server 2008/2008 R2/2005, Microsoft Windows 2008 Server, MS Visual Source Safe, XML.
