Hadoop Admin Resume
Irvine, CA
SUMMARY
- 8+ years of IT industry experience in administering Linux, managing databases, developing MapReduce applications, and designing, building and administering large-scale Hadoop production clusters.
- 2.5 years of experience in big data technologies: Hadoop HDFS, MapReduce, Pig, Hive, Oozie, Flume, Sqoop, ZooKeeper, and the NoSQL databases Cassandra and HBase.
- Experience in deploying and managing multi-node development, testing and production Hadoop clusters with different Hadoop components (HIVE, PIG, SQOOP, OOZIE, FLUME, HCATALOG, HBASE, ZOOKEEPER) using Hortonworks Ambari.
- Strong knowledge of Hadoop HDFS architecture and the MapReduce framework.
- Strong knowledge of the Apache Hive data warehouse, data cubes, HiveServer, partitioning, bucketing, clustering, and writing UDFs, UDAFs and UDTFs for Hive in Java (a UDF sketch follows this summary).
- Solid experience in Pig administration and development, including writing Pig UDFs (Eval, Filter, Load and Store) and macros.
- Experience in administering Linux systems to deploy Hadoop clusters and monitoring the clusters using Nagios and Ganglia.
- Experience in benchmarking and in performing backup and disaster recovery of NameNode metadata and sensitive data residing on the cluster.
- Experience in performing minor and major upgrades and in commissioning and decommissioning DataNodes on Hadoop clusters.
- Strong knowledge of configuring NameNode High Availability and NameNode Federation.
- Familiar with writing Oozie workflows and job controllers for automating shell, Hive and Sqoop jobs.
- Familiar with importing and exporting data using Sqoop from RDBMSs such as MySQL, Oracle and Teradata, including the use of fast loaders and connectors.
- Experience in using Flume to stream data from various sources into HDFS.
- Hands-on experience in provisioning and managing multi-tenant Hadoop clusters in a public cloud environment, Amazon Web Services (AWS) EC2.
- Experience in installing and administering a PXE server with Kickstart, setting up FTP, DHCP and DNS servers, and Logical Volume Management.
- Experience in configuring and managing NAS (file-level access, NFS) and SAN (block-level access, iSCSI) storage devices.
- Experience in storage management including JBOD, RAID levels 1, 5, 6 and 10, logical volumes, volume groups and partitioning.
- Exposure to Maven/Ant and Git, along with shell scripting, for the build and deployment process.
- Experience in understanding Hadoop security requirements and integrating with Kerberos authentication infrastructure: KDC server setup, creating realms/domains, managing principals, generating a keytab file for each service and managing keytabs using keytab tools.
- Experience in handling multiple relational databases: MySQL, SQL Server.
- Familiar with Agile methodology (Scrum) and software testing.
- Effective problem-solving and outstanding interpersonal skills; able to work independently as well as within a team, driven to meet deadlines, and quick to learn and use new technologies.
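As a concrete illustration of the Hive UDF work noted above, here is a minimal sketch of a string-normalizing UDF against the classic org.apache.hadoop.hive.ql.exec.UDF API; the class name, function name and registration paths are placeholders, not taken from an actual project.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Illustrative Hive UDF: trims whitespace and lower-cases a string column.
// Registered in Hive with (paths/packages are placeholders):
//   ADD JAR /path/to/udfs.jar;
//   CREATE TEMPORARY FUNCTION normalize_str AS 'com.example.NormalizeString';
public class NormalizeString extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;                              // pass NULLs through unchanged
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```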
TECHNICAL SKILLS
Hadoop Ecosystem: HDFS, SQOOP, FLUME, MAP-REDUCE, HIVE, PIG, OOZIE, ZOOKEEPER
NoSQL Databases: HBase, Cassandra
Security: Kerberos
Database: MySQL, SQL Server
Cluster Management Tools: Cloudera Manager, Ambari
OS: Linux (CentOS, RHEL), Windows, Mac
PROFESSIONAL EXPERIENCE
Confidential, Irvine, CA
Hadoop Admin
Responsibilities:
- Managing 5 Hortonworks clusters totaling 1,500 nodes (Development, R&D, Discovery PROD, MARS HBase and MARS PROD)
- Designed and architected the R&D cluster with HDP 2.3.2 and Ambari 2.2.0
- Worked on 4 different versions of HDP (1.3.2, 2.1.5, 2.2.6 and 2.3.2, the latest enterprise release)
- Upgraded HDP 1.3.2 and 2.1.5 to 2.2.6 using Blueprints, and 2.2.6 to 2.3.2 using a rolling upgrade with no downtime to the PROD cluster
- Configured Hadoop High Availability for NameNode, HBase, Hive, YARN and Storm (Nimbus)
- Configured Hadoop security with Kerberos, Ranger and Knox for a secured cluster
- Configured HDFS data-at-rest encryption using Ranger KMS
- Configured Storm HA
- Installed and configured Spark
- Created Kafka topics and produced and consumed messages (a producer sketch follows this list)
- Cluster performance tuning
- Set up 3 ZooKeeper instances dedicated to HBase, Storm and Kafka; the first instance is managed by Ambari and the other 2 are outside Ambari
- Configured Apache Ranger for centralized security and auditing of HDFS, YARN, Hive, HBase, Storm and Kafka.
- Installed and configured Informatica 9.6.1 HF1 Big Data Edition for Hadoop ETL
- Commissioned and decommissioned DataNodes
- Troubleshot issues reported by Nagios
- Built and configured log data loading into HDFS using Flume.
- Wrote shell scripts to monitor a few components outside Ambari
- Imported and exported data into HDFS and Hive using Sqoop.
- Provisioned, installed, configured, monitored and maintained HDFS, YARN, HBase, Flume, Sqoop, Oozie, Pig and Hive
- Recovered from node failures and troubleshot common Hadoop cluster issues
- Supported Hadoop developers and assisted in optimizing MapReduce jobs, Pig Latin scripts, Hive scripts and HBase ingest as required
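A minimal sketch of producing messages to a Kafka topic, as in the "created Kafka topics and produced and consumed messages" item above; the broker address and topic name are illustrative assumptions, and the code uses the standard org.apache.kafka.clients producer API.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Assumed broker endpoint; replace with the cluster's real bootstrap servers.
        props.put("bootstrap.servers", "broker1:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Topic name is illustrative; the topic would have been created beforehand.
            producer.send(new ProducerRecord<>("test-topic", "key1", "hello from the producer"));
            producer.flush();   // block until the message has been handed to the broker
        }
    }
}
```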
Confidential, San Francisco, CA
Hadoop Admin
Responsibilities:
- Designed and developed data solutions to help business and product teams make data-driven decisions
- Worked closely with data analysts to construct creative solutions for their analysis tasks
- Led end-to-end efforts to design, develop and implement data warehousing and business intelligence solutions
- Performed a major upgrade of the cluster from CDH3u6 to CDH 4.4.0
- Developed Puppet modules to automate the installation, configuration and deployment of software, operating systems and network infrastructure at the cluster level
- Implemented NameNode HA and automatic failover infrastructure, utilizing ZooKeeper services, to eliminate the NameNode single point of failure
- Implemented Cloudera Manager on the existing cluster
- Optimized our Hadoop infrastructure at both the software and hardware level
- Ensured our Hadoop clusters are built and tuned in the most optimal way to support the activities of our Big Data teams
- Developed MapReduce programs to extract and transform the data sets; results were exported back to an RDBMS using Sqoop (a sketch follows this list)
- Installed, Configured and managed Flume Infrastructure
- Was responsible for importing the data (mostly log files) from various sources into HDFS using Flume
- Created tables in Hive and loaded the structured data resulting from MapReduce jobs
- Configured the Hive metastore to use a MySQL database, making all tables created in Hive available to different users simultaneously.
- Developed many HiveQL queries and extracted the information required by the business.
- Exported the required business information to an RDBMS using Sqoop, making the data available to the BI team for generating reports.
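A minimal sketch of the kind of extract-and-transform MapReduce program described above, written against the standard org.apache.hadoop.mapreduce API; the tab delimiter and field positions are assumptions for illustration, and the output directory would then be exported to the RDBMS with Sqoop as noted.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ExtractTransformJob {

    // Map-only job: keep two illustrative fields from each tab-delimited record.
    public static class ExtractMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            if (fields.length > 3) {
                // Field positions are placeholders, not from a real schema.
                context.write(NullWritable.get(), new Text(fields[0] + "\t" + fields[3]));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "extract-transform");
        job.setJarByClass(ExtractTransformJob.class);
        job.setMapperClass(ExtractMapper.class);
        job.setNumReduceTasks(0);                      // map-only: output written straight to HDFS
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```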
Confidential, Jersey City, NJ
Hadoop Admin
Responsibilities:
- Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment.
- Performed various configurations, including networking and iptables, hostname resolution, user accounts and file permissions, HTTP, FTP and password-less SSH (key-based) login.
- Implemented authentication service using Kerberos authentication protocol.
- Created volume groups, logical volumes and partitions on the Linux servers and mounted file systems on the created partitions.
- Configured master node disks with RAID 1+0
- Performed benchmarking on the Hadoop cluster using different benchmarking mechanisms.
- Tuned the cluster by commissioning and decommissioning DataNodes.
- Upgraded the Hadoop cluster.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Deployed high availability on the Hadoop cluster using quorum journal nodes.
- Implemented automatic failover with ZooKeeper and the ZooKeeper Failover Controller.
- Configured Ganglia, including installing the gmond and gmetad daemons, which collect metrics across the distributed cluster and present them in real-time dynamic web pages that help with debugging and maintenance.
- Implemented Kerberos for authenticating all the services in Hadoop Cluster.
- Deployed a network file system (NFS) mount for NameNode metadata backup.
- Performed cluster backup using DistCp, Cloudera Manager BDR and parallel ingestion.
- Designed and allocated HDFS quotas for multiple groups.
- Configured and deployed the Hive metastore using MySQL and the Thrift server.
- Used Hive schemas to create relations in Pig using HCatalog.
- Developed Pig scripts for handling raw data for analysis.
- Deployed Sqoop server to perform imports from heterogeneous data sources to HDFS.
- Deployed and configured flume agents to stream log events into HDFS for analysis.
- Deployed YARN, which enables multiple applications to run on the cluster.
- Configured Oozie for workflow automation and coordination.
- Wrote custom monitoring scripts for Nagios to monitor the daemons and cluster status (a health-check sketch follows this section).
- Wrote custom shell scripts for automating redundant tasks on the cluster.
- Worked with BI teams in generating the reports and designing ETL workflows on Pentaho.
Environment: LINUX, HDFS, SQOOP, FLUME, MAP-REDUCE, HIVE, PIG, OOZIE, ZOOKEEPER
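The Nagios checks above were shell-based; as an illustration of the same idea, here is a minimal Java sketch of an HDFS capacity check using the standard Hadoop FileSystem API. The 85% threshold and the Nagios-style exit codes are assumptions, not values from the original scripts.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;

public class HdfsCapacityCheck {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from the core-site.xml on the classpath.
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(conf)) {
            FsStatus status = fs.getStatus();
            double usedPct = 100.0 * status.getUsed() / status.getCapacity();
            System.out.printf("HDFS used: %.1f%% of %d bytes%n", usedPct, status.getCapacity());
            // 85% is an illustrative threshold; Nagios treats exit code 2 as CRITICAL.
            System.exit(usedPct > 85.0 ? 2 : 0);
        }
    }
}
```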
Confidential
Java Developer
Responsibilities:
- Involved in the design and followed Agile Software Development Methodology throughout the software development lifecycle.
- Designed Use Cases, Class Diagrams, and Sequence Diagrams using Visual Paradigm to model the detail design of the application.
- Developed the user interface for the presentation layer using JSP standard tags, JavaScript, HTML and CSS.
- Used Spring validation for web form validation by implementing the Validator interface (a sketch follows this list).
- Built the application on the Spring MVC framework with Hibernate as the ORM.
- Used the Spring Core module for dependency injection and integrated the view layer using Apache Tiles.
- Consumed third-party web services (WSDL, SOAP, UDDI) for authorizing payments to/from customers using the CXF framework.
- Used JMS queue communication in the authorization module.
- Mapped DTOs (one-to-many, one-to-one and many-to-one relations) to Oracle database tables, and Java data types to SQL data types, by creating Hibernate mapping XML files.
- Used an Oracle database and wrote stored procedures for common SQL queries.
- Used Ant for building the enterprise application modules, CVS for version control and Log4j to monitor error logs, and performed unit testing using JUnit.
- Deployed the applications on IBM WebSphere Application Server 5.0.
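A minimal sketch of the Validator-interface approach mentioned above, shown against the current org.springframework.validation.Validator signature; the form bean, field name and error codes are illustrative only, not taken from the original application.

```java
import org.springframework.validation.Errors;
import org.springframework.validation.ValidationUtils;
import org.springframework.validation.Validator;

// Illustrative form bean; the real application used its own command objects.
class PaymentForm {
    private String accountNumber;
    public String getAccountNumber() { return accountNumber; }
    public void setAccountNumber(String accountNumber) { this.accountNumber = accountNumber; }
}

public class PaymentFormValidator implements Validator {

    @Override
    public boolean supports(Class<?> clazz) {
        return PaymentForm.class.isAssignableFrom(clazz);
    }

    @Override
    public void validate(Object target, Errors errors) {
        // Reject empty required fields; error codes resolve to messages in the resource bundle.
        ValidationUtils.rejectIfEmptyOrWhitespace(errors, "accountNumber", "accountNumber.required");
        PaymentForm form = (PaymentForm) target;
        if (form.getAccountNumber() != null && !form.getAccountNumber().matches("\\d{10,16}")) {
            errors.rejectValue("accountNumber", "accountNumber.invalid");
        }
    }
}
```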