Hadoop Consultant Resume
Baltimore, Maryland
OBJECTIVE
- With over 6 years of IT experience, I would like to work with a dynamic and progressive IT firm where I can apply my technical experience and interpersonal skills efficiently and effectively for the growth of the company.
SUMMARY
- Expertise in writing Hadoop Jobs for analyzing structured and unstructured data using HDFS, Hive, HBase, Spark, Oozie and Sqoop.
- Good knowledge of Hadoop Architecture and various core components such as HDFS, YARN and MapReduce concepts.
- Experience in developing different kinds of MapReduce programs in Hadoop for Big Data analysis.
- Experience in importing/exporting data using Sqoop into HDFS from RDBMS.
- Working experience in designing and implementing complete end-to-end Hadoop infrastructure including Hive, Sqoop, Oozie and ZooKeeper.
- Experience in writing shell scripts to dump shared data from MySQL servers to HDFS (a representative ingestion sketch follows this summary).
- Experience with the Oozie Workflow Engine in running workflow jobs with actions that run Hadoop MapReduce, Hive, and Spark jobs.
- Experience in working with various Cloudera distributions (CDH/CDP), Hortonworks and AWS.
- Experience with CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters.
- Extensively worked on Hive and Sqoop for sourcing and transformations.
- Experience in Hadoop installation, configuration, and cluster maintenance.
- Hands-on experience with NoSQL databases like HBase.
- Good understanding of data ingestion tools such as Sqoop and Talend.
- Good working knowledge of Hadoop ecosystem tools such as Hue.
- Exceptional ability to quickly master new concepts, capable of working in a group as well as independently, with excellent communication skills.
- Strong experience in analyzing large data sets by writing PySpark scripts and Hive queries.
- Extensive experience in working with structured data using HiveQL and join operations, and in optimizing Hive queries.
- Experience in administering the Linux systems to deploy Hadoop cluster and monitoring the cluster.
- Experience in commissioning, decommissioning, balancing, and managing nodes, and tuning servers for optimal cluster performance.
- Integrated Spring schedulers with the Oozie client as beans to handle cron jobs.
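A minimal illustrative sketch of the MySQL-to-HDFS ingestion pattern described in this summary; the hostnames, credential files, database, table, and HDFS paths below are placeholders used only for illustration, not details from any engagement.

```bash
#!/usr/bin/env bash
# Illustrative sketch: pull a shared MySQL table into HDFS with Sqoop and expose it to Hive.
# All connection details, paths, and names are placeholders.
set -euo pipefail

SRC_JDBC="jdbc:mysql://mysql-host.example.com:3306/sales"
TARGET_DIR="/data/raw/sales/orders"

# Import the table into HDFS as tab-delimited text with 4 parallel mappers.
sqoop import \
  --connect "$SRC_JDBC" \
  --username etl_user \
  --password-file /user/etl_user/.mysql.pwd \
  --table orders \
  --target-dir "$TARGET_DIR" \
  --num-mappers 4 \
  --fields-terminated-by '\t' \
  --delete-target-dir

# Point an external Hive table at the imported files.
hive -e "
  CREATE DATABASE IF NOT EXISTS raw;
  CREATE EXTERNAL TABLE IF NOT EXISTS raw.orders (
    order_id BIGINT, customer_id BIGINT, amount DOUBLE, order_ts STRING
  )
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  LOCATION '$TARGET_DIR';
"
```

In practice a script like this is scheduled through cron or an Oozie coordinator rather than run by hand.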
TECHNICAL SKILLS
Big Data/Hadoop platform: Cloudera CDP 7.x, Cloudera CDH 5.x and 6.x, Hortonworks HDP 2.x
Hadoop related tools: HDFS, YARN, MapReduce, Sqoop, Hive, Impala, Spark, Oozie, ZooKeeper, Sentry, Ranger, Solr, HBase
Databases and Data warehouses: Oracle, MySQL, SQL Server, MongoDB, PostgreSQL, DB2, Hive, HBase
Version Control: Git, GitHub, Bitbucket
Project Management Tool: Jira
3rd Party Tools: Ganglia, Airflow, PuTTY, SecureCRT, Splunk, Talend
Methodologies: Agile, Waterfall
Operating Systems: MS Windows, Linux, Unix, CentOS
IDE: Eclipse, IntelliJ, PyCharm, Jupyter Notebook
PROFESSIONAL EXPERIENCE
Confidential, Baltimore, Maryland
Hadoop Consultant
Responsibilities:
- Collaborated in identifying current problems, constraints, and root causes in data sets to identify descriptive and predictive solutions with support of Hadoop HDFS, MapReduce, Hive, and HBase.
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, HBase, Zookeeper and Sqoop.
- Installed and configured Sqoop to import and export data between relational databases and HDFS/Hive.
- Monitored the cluster and troubleshot Hadoop issues.
- Administered large Hadoop environments, including cluster builds, setup, performance tuning, and monitoring in an enterprise environment.
- Closely monitored and analyzed MapReduce job executions on the cluster at the task level and optimized Hadoop cluster components to achieve high performance.
- Integrated CDH clusters with Active Directory and enabled Kerberos for Authentication.
- Worked on commissioning and decommissioning of DataNodes, NameNode recovery, and capacity planning, and installed the Oozie workflow engine to run multiple Hive jobs.
- Implemented High Availability and automatic failover infrastructure to overcome the single point of failure for the NameNode, utilizing ZooKeeper services.
- Created Hive tables, loaded data, and wrote Hive queries; worked with the Linux server admin team in administering the server hardware and operating system.
- Worked closely with data analysts to construct creative solutions for their analysis tasks and managed and reviewed Hadoop and Hive log files.
- Exported analyzed data to relational databases using Sqoop for visualization and report generation, and worked on importing and exporting data between Oracle and HDFS/Hive using Sqoop (a representative export sketch follows this role).
- Collaborated with application teams to install operating system and Hadoop updates and version upgrades when required.
- Automated workflows using shell scripts to pull data from various databases into Hadoop.
ENVIRONMENT: HADOOP, HDFS, MAP REDUCE, HIVE, HBASE, ZOOKEEPER, OOZIE, IMPALA, CLOUDERA, ORACLE, SPARK, SQOOP, MYSQL, YARN, SENTRY, KERBEROS AND ETL.
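A hedged sketch of the Sqoop export and Oozie submission pattern mentioned above; the Oracle connection string, table, HDFS export directory, and Oozie URL are placeholder values used only to illustrate the shape of the commands.

```bash
#!/usr/bin/env bash
# Illustrative sketch: export an analyzed Hive result set back to Oracle for reporting,
# then launch an Oozie workflow that chains the downstream Hive jobs.
# Connection strings, tables, and paths are placeholders.
set -euo pipefail

# Export the HDFS directory backing the results table into an Oracle reporting table.
sqoop export \
  --connect jdbc:oracle:thin:@oracle-host.example.com:1521/REPORTS \
  --username report_user \
  --password-file /user/etl_user/.oracle.pwd \
  --table DAILY_SALES_SUMMARY \
  --export-dir /warehouse/analytics.db/daily_sales_summary \
  --input-fields-terminated-by '\001' \
  --num-mappers 8

# Start the Oozie workflow described in job.properties (workflow.xml already deployed to HDFS).
oozie job -oozie http://oozie-host.example.com:11000/oozie \
  -config /opt/etl/daily_sales/job.properties -run
```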
Confidential - Atlanta, Georgia
Hadoop Admin
Responsibilities:
- Performed a major version upgrade of the Hadoop cluster.
- Deployed High Availability on the Hadoop cluster using quorum journal nodes.
- Implemented automatic failover with ZooKeeper and the ZooKeeper Failover Controller (ZKFC).
- Tuned the cluster by commissioning and decommissioning nodes.
- Configured Oozie for workflow automation and coordination.
- Implemented rack awareness topology in the Hadoop cluster.
- Performed data backups between clusters using distcp (a representative sketch follows this role).
- Deployed a network file system (NFS) mount for NameNode metadata backup.
- Involved in creating Hive tables and loading and analyzing data using hive queries.
- Designed and allocated HDFS quotas for multiple groups.
- Configured and deployed the Hive metastore using MySQL and the Thrift server.
- Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Deployed Sqoop server to perform imports from heterogeneous data sources to HDFS.
- Used Sqoop to move the data from RDBMS to HDFS.
- Built on-premises data pipelines using Spark for real-time data analysis.
- Created a local YUM repository for installing and updating packages.
- Configured Flume agents to stream log events into HDFS for analysis.
- Implemented an authentication service using the MIT Kerberos authentication protocol.
- Wrote custom shell scripts to automate repetitive tasks on the cluster.
- Monitored and configured a test cluster on Amazon Web Services for further testing and gradual migration.
ENVIRONMENT: HADOOP HDFS, MAPREDUCE, HIVE, PIG, FLUME, OOZIE, SQOOP, CLOUDERA MANAGER
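A brief sketch of the inter-cluster distcp backup and HDFS quota allocation described above; the cluster hostnames, paths, and quota sizes are assumed placeholder values.

```bash
#!/usr/bin/env bash
# Illustrative sketch: back up a warehouse directory to a second cluster with distcp,
# then allocate HDFS quotas for a project group. Hosts, paths, and sizes are placeholders.
set -euo pipefail

# Copy only changed files (-update) and preserve permissions, replication, and block size (-p).
hadoop distcp -update -p \
  hdfs://prod-nn.example.com:8020/warehouse \
  hdfs://backup-nn.example.com:8020/backups/warehouse

# Limit the group's working directory to 1,000,000 namespace objects and 10 TB of space.
hdfs dfsadmin -setQuota 1000000 /data/projects/marketing
hdfs dfsadmin -setSpaceQuota 10t /data/projects/marketing
```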
Confidential - Alpharetta, GA
Hadoop Consultant
Responsibilities:
- Analyzed the existing enterprise data warehouse setup and provided design and architecture suggestions for converting to the Hadoop ecosystem.
- Deployed a Hadoop cluster on the Hortonworks Distribution and installed ecosystem components (HDFS, YARN, ZooKeeper, HBase, Hive, MapReduce, Pig, and Spark) on Linux servers using Ambari.
- Designed and implemented Disaster Recovery Plan for Hadoop Clusters.
- Implemented High Availability and automatic failover infrastructure to overcome single point of failure for Name node utilizing Zookeeper services.
- Integrated the Hadoop cluster with Active Directory and enabled Kerberos for Authentication.
- Implemented the Capacity Scheduler on the YARN ResourceManager to share cluster resources among the MapReduce jobs submitted by users.
- Set up Linux Users, and tested HDFS, Hive, Pig and MapReduce Access for the new users.
- Monitored Hadoop Jobs and Reviewed Logs of the failed jobs to debug the issues based on the errors.
- Optimized Hadoop cluster components (HDFS, YARN, Hive, Kafka) to achieve high performance.
- Worked with Linux server admin team in administering the server Hardware and operating system.
- Interacted with the Networking team to improve bandwidth.
- Provided User, Platform and Application support on Hadoop Infrastructure.
- Applied Patches and Bug Fixes on Hadoop Cluster.
- Conducted Root Cause Analysis and resolved production problems and data issues.
- Performed Disk Space management to the users and groups in the cluster.
- Added Nodes to the cluster and Decommissioned nodes from the cluster whenever required.
- Performed backup and recovery processes in order to upgrade the Hadoop cluster.
- Used Sqoop and WinSCP utilities for data copying and data migration.
- Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs such as MapReduce, Pig, Hive, and Sqoop as well as system specific jobs such as Java programs and Shell scripts.
- Proactively involved in ongoing Monitoring, Maintenance, Support and Improvements in Hadoop clusters.
- Monitored cluster stability, used tools to gather statistics and improved performance.
- Used Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs (a representative Hive-on-Tez invocation follows this role).
ENVIRONMENT: LINUX, HORTONWORKS, MAP REDUCE, SQL SERVER, NOSQL, CLOUDERA, SQOOP, HIVE, ZOOKEEPER AND HBASE.
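A short sketch of a Hive-on-Tez invocation of the kind referenced in the last role; the queue name, container size, database, and table are assumed placeholder values rather than settings from an actual cluster.

```bash
#!/usr/bin/env bash
# Illustrative sketch: run a Hive query on the Tez execution engine instead of classic
# MapReduce, with a target YARN queue and container size. All values are placeholders.
set -euo pipefail

hive \
  --hiveconf hive.execution.engine=tez \
  --hiveconf tez.queue.name=analytics \
  --hiveconf hive.tez.container.size=4096 \
  -e "SELECT region, COUNT(*) AS order_count
      FROM analytics.daily_sales_summary
      GROUP BY region;"
```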