Hadoop Administrator Resume
Mayfield, OH
PROFESSIONAL SUMMARY:
- Around 8 years of professional Information Technology experience in Hadoop and SQL administration activities such as installation, configuration, and maintenance of systems/clusters.
- Extensive experience in Hadoop Map Reduce programming, Spark, Scala, Pig, NoSQL, and Hive.
- Experience with Hortonworks and Cloudera Manager administration, as well as installing and updating Hadoop and its related components in single-node and multi-node cluster environments using Apache, Cloudera, and Hortonworks distributions.
- Good experience in UNIX/Linux administration along with SQL administration, designing and implementing relational database models as per business needs in different domains.
- Hands-on experience with major components of the Hadoop ecosystem, including HDFS and the MR framework, YARN, HBase, Hive, Pig, Sqoop, and Zookeeper.
- Experience in managing and handling Linux platform servers (especially Ubuntu) and hands-on experience with Red Hat Linux.
- Hadoop Cluster capacity planning, performance tuning, cluster Monitoring, Troubleshooting.
- Used Network Monitoring Daemons like Ganglia and Service monitoring tools like Nagios.
- Backup configuration and recovery from a NameNode failure.
- Installation of various Hadoop Ecosystems and Hadoop Daemons.
- Installation and configuration of Sqoop and Flume.
- Good experience in designing, configuring, and managing backup and disaster recovery for Hadoop data.
- Hands-on experience in analyzing log files for Hadoop and ecosystem services and finding root causes.
- Experience on Commissioning, Decommissioning, Balancing, Managing Nodes and tuning server for optimal performance of the cluster.
- Experience in copying files within a cluster or between clusters using the DistCp command-line utility.
- Experience in HDFS data storage and support for running MapReduce jobs.
- Installing and configuring Hadoop ecosystem components such as Sqoop, Pig, and Hive.
- Experience in importing and exporting the data using Sqoop from HDFS to Relational Database systems/mainframe and vice-versa.
- Hands-on experience with installing Kerberos security and setting up permissions; set up standards and processes for Hadoop-based application design and implementation.
- Experience with cloud platforms: Hadoop on Azure, AWS/EMR, Cloudera Manager, and Hadoop deployed directly on EC2 (non-EMR).
- Brief exposure to implementing and maintaining Hadoop security and Hive security.
- Experience in database administration, performance tuning, backup and recovery, and troubleshooting in large-scale customer-facing environments.
- Expertise in deployment of Hadoop, Yarn, Spark and Storm integration with Cassandra.
- Expertise in commissioning and decommissioning nodes in clusters, backup configuration, and recovery from a NameNode failure.
- Good working knowledge of importing and exporting data from databases such as MySQL into HDFS and Hive using Sqoop.
- Strong knowledge of YARN terminology and high-availability Hadoop clusters.
- Developed various MapReduce applications to perform ETL workloads on terabytes of data.
- Very good knowledge of YARN (Hadoop) terminology and high-availability Hadoop clusters.
- Very good experience with high-volume transactional systems running on Unix/Linux and Windows.
PROFESSIONAL EXPERIENCE:
Hadoop Administrator
Confidential, Mayfield, OH
Responsibilities:
- Responsible for cluster maintenance, monitoring, commissioning and decommissioning DataNodes, troubleshooting, cluster planning, and managing and reviewing data backups and log files.
- Worked on the Hadoop stack, ETL tools such as Talend, security with Kerberos, user provisioning with LDAP, and many other Big Data technologies for multiple use cases.
- Worked with the Data Science team to gather requirements for various data mining projects.
- Installed 5 Hadoop clusters for different teams and developed a data lake that serves as a base layer for storage and analytics. Provided services to developers: installed their custom software, upgraded Hadoop components, resolved their issues, and helped troubleshoot their long-running jobs.
- Built automation frameworks in Python for data ingestion, processing, and storage across SQL databases and Red Hat infrastructure.
- Involved in implementing security on the Hortonworks Hadoop cluster using Kerberos, working with the operations team to move a non-secured cluster to a secured cluster.
- Responsible for upgrading Hortonworks Hadoop HDP and MapReduce with YARN in a multi-node cluster environment. Handled importing of data from various data sources, performed transformations using Hive, MapReduce, and Spark, and loaded data into HDFS. Set up Hadoop security using Kerberos, AD integration (LDAP), and Sentry authorization.
- Migrated services from a managed hosting environment to AWS including service design, network layout, data migration, automation, monitoring, deployments and cutover, documentation, overall plan, cost analysis, and timeline.
- Managed Amazon Web Services (AWS) infrastructure with automation and configuration management tools such as Ansible or custom-built tooling. Designed cloud-hosted solutions, with specific AWS product suite experience.
- Performed a major upgrade in the production environment from a lower HDP version to a higher HDP version. As an admin, followed standard backup policies to ensure high availability of the cluster.
- Monitored multiple Hadoop cluster environments using Ganglia and Nagios. Monitored workload, job performance, and capacity planning using Ambari. Installed and configured Hortonworks and Cloudera distributions on single-node clusters for POCs.
- Created a Teradata database for application developers to assist them in conducting performance and space analysis, as well as object dependency analysis, on the Teradata database platforms.
- Involved in running Hadoop jobs for processing millions of records of text data. Troubleshot build issues during the Jenkins build process.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
Environment: Hortonworks Hadoop, Cassandra, MSSQL, MySQL, Toad 9.6, Windows NT, Sqoop, Hive, Oozie, Ambari, Unix Shell Scripts, Zookeeper, SQL, Map Reduce, Pig.
Hadoop Administrator
Confidential, New York, NY
Responsibilities:
- Involved in the end-to-end process of Hadoop cluster setup, including installation, configuration, and monitoring of the Hadoop cluster.
- Automated Hadoop cluster setup and implemented Kerberos security for various Hadoop services using Hortonworks.
- Responsible for cluster maintenance, commissioning and decommissioning DataNodes, cluster monitoring, troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
- Monitoring systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
- Installation of various Hadoop Ecosystems and Hadoop Daemons.
- Responsible for Installation and configuration of Hive, Pig, HBase and Sqoop on the Hadoop cluster.
- Configured various property files such as core-site.xml, hdfs-site.xml, and mapred-site.xml based on job requirements.
- Loaded data from the UNIX file system into HDFS, imported and exported data to and from HDFS using Sqoop, and managed and reviewed Hadoop log files.
- Responsible for data extraction and data ingestion from different data sources into the Hadoop data lake by creating ETL pipelines using Pig and Hive.
- Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes. Communicated and escalated issues appropriately.
- Extracted meaningful data from dealer CSV files, text files, and mainframe files.
- Performed data analysis, feature selection, and feature extraction using Apache Spark machine learning and streaming libraries in Python.
- Analyzed system failures, identified root causes, and recommended courses of action. Documented system processes and procedures for future reference.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters. Involved in Installing and configuring Kerberos for the authentication of users and Hadoop daemons.
Environment: Hortonworks, Hadoop, HDFS, Pig, Hive, Sqoop, Flume, Kafka, Storm, UNIX, Cloudera Manager, Zookeeper and HBase, Python, Spark, Apache, SQL, ETL.
Hadoop Administrator
Confidential, New York, NY
Responsibilities:
- Responsible for cluster maintenance, commissioning and decommissioning DataNodes, cluster monitoring, troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
- Monitoring systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
- Performed both major and minor upgrades to the existing Hortonworks Hadoop cluster.
- Built an automated setup for the cluster monitoring and issue escalation process.
- Administered, installed, upgraded, managed, and tuned Hadoop distributions and clusters (Cloudera Manager), including HBase and Hive.
- Worked on the Hadoop stack and ETL tools, and implemented security through user provisioning with LDAP and Kerberos.
- Expertise in the Hadoop stack: MapReduce, Sqoop, Pig, Hive, HBase, Kafka, and Spark.
- Worked with the systems team to plan and execute system upgrades for existing Hadoop clusters.
- Able to work with incomplete or imperfect data; experienced with real-time transactional data. Strong collaborator and team player with Agile experience and hands-on experience with Impala.
- Installed, managed, and configured Hadoop clusters; generated tables and reports.
- Monitored Hadoop jobs and performance.
- Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
- Implemented complex MapReduce programs to perform joins on the Map side using distributed cache.
- Participated in the development and implementation of the Cloudera Hadoop environment.
- Developed Spark SQL jobs to load tables into HDFS and run SELECT queries on them.
- Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
- Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
Environment: Hadoop, Java, HDFS, MapReduce, Spark, Pig, Hive, Impala, Sqoop, Flume, Docker, Kafka, HBase, Oozie, SQL scripting, Linux shell scripting, and Cloudera.
SQL DBA
Confidential, MN
Responsibilities:
- Working experience on designing and implementing complete end-to-end Hadoop Infrastructure including Pig, Hive, Sqoop, Oozie and Zookeeper.
- Used Sqoop to migrate data between HDFS and MySQL or Oracle, and deployed Hive and HBase integration to perform OLAP operations on HBase data.
- Designed, planned, and delivered a proof of concept and business function/division-based implementation of a Big Data roadmap and strategy project.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
- Involved in exporting the analyzed data to databases such as Teradata, MySQL, and Oracle using Sqoop, for visualization and to generate reports for the BI team.
- Worked on an Oozie scheduler to automate the pipeline workflow and orchestrate the Sqoop, Hive, and Pig jobs that extract the data in a timely manner.
- The Hive tables created as per requirement were internal or external tables defined with appropriate static and dynamic partitions, intended for efficiency.
- Experience in analyzing Cassandra database and comparing it with other open-source NoSQL databases to find which one of them best suits the current requirements.
- Transformed the data using Hive, Pig for BI team to perform visual analytics, according to the client’s requirement.
- Implemented Fair schedulers on the JobTracker to share cluster resources among the MapReduce jobs submitted by the users.
Environment: Cloudera CDH 4 Distribution, HDFS, MapReduce, Cassandra, Hive, Oozie, Pig, Shell Scripting, MySQL.
TECHNICAL SKILLS:
Hadoop/Big Data Technologies: HDFS, MapReduce, YARN, Pig, Hive, HBase, Oozie, Sqoop, Spark, Cassandra, Solr, Hue, Kafka, HCatalog, AWS, Data Modeling, MongoDB, Flume & Zookeeper.
Languages and technologies: Java, SQL, NoSQL, Python.
Operating Systems: Linux, UNIX, Windows, Mac.
Databases: MySQL, Oracle, Teradata, PostgreSQL, DB2.
Scripting: Shell Scripting, Perl Scripting, Python
NoSQL Databases: HBase, Cassandra.
