Hadoop Admin Resume
Atlanta, Georgia
SUMMARY:
- 8+ years of extensive IT experience, including 4+ years as a Hadoop Administrator operationalizing and managing small to medium clusters using distributions like Cloudera, Hortonworks and ECS.
- Hands-on experience in installing, configuring and using Hadoop ecosystem components like HDFS, MapReduce, YARN, ZooKeeper, Sqoop, Flume, Hive, HBase, Spark, Pig and Oozie.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
- Experience in installing and configuring Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
- Experience in deploying a Hadoop cluster using Cloudera 5.x integrated with Ambari for monitoring and alerting.
- Experience in launching and setting up of Hadoop Cluster on AWS as well as physical servers, which includes configuring different components of Hadoop.
- Experience in developing and monitoring Puppet Configuration Manager to automate the configuration files of Hadoop Ecosystem.
- Good experience in Hadoop infrastructure, which includes MapReduce, Hive, Oozie, Sqoop, HBase, Pig, HDFS, YARN, Spark and Impala configuration projects in direct client-facing roles.
- Good knowledge on implementation and design of big data pipelines.
- Knowledge in installing, configuring and administrating Hadoop cluster for major Hadoop distributions like CDH5 and HDP.
- Knowledge in implementing ETL/ELT processes with MapReduce, Pig and Hive.
- Hands-on experience with major components in the Hadoop ecosystem including Hive, HBase, HBase and Hive integration, Sqoop and Flume, and knowledge of the MapReduce/HDFS framework.
- Extensive experience in installing, configuring and administering Hadoop clusters for major Hadoop distributions like CDH5 and HDP.
- Strong knowledge of creating and monitoring Hadoop clusters on VMs, Hortonworks Data Platform 2.1 & 2.2, CDH3, CDH4 and Cloudera Manager on Linux and Ubuntu OS.
- Knowledge in developing MapReduce programs using Apache Hadoop for working with Big Data.
- Strong knowledge of the Software Development Life Cycle (SDLC).
- Strong understanding of Agile and Waterfall SDLC methodologies.
- Experienced in developing MapReduce programs using Apache Hadoop for working with Big Data.
- Good knowledge of creating reports using QlikView/Qlik Sense.
- Exposure to exporting data from relational databases to the Hadoop Distributed File System (HDFS).
- Experience in cluster maintenance, commissioning and decommissioning the data nodes.
- Experience in monitoring systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
- Experience working with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Experience in monitoring multiple Hadoop cluster environments using Ganglia and Nagios, as well as workload, job performance and capacity planning using Cloudera Manager.
- Experience in installing and configuring Kerberos for the authentication of users and Hadoop daemons.
- Hands-on experience in Linux Hadoop activities on RHEL & CentOS.
- Knowledge on Cloud technologies like AWS Cloud.
- Experience in benchmarking, backup and disaster recovery of NameNode metadata.
- Experience in performing minor and major Upgrades of Hadoop Cluster.
- Experience working with popular frameworks like Spring MVC and Hibernate.
- Experience with source code management tools and proficient in Git.
- Excellent interpersonal and communication skills, creative, research-minded with problem solving skills.
- Ensure that critical customer issues are addressed quickly and effectively.
- Apply troubleshooting techniques to provide solutions to customers' individual needs.
- Troubleshoot, diagnose and, where needed, escalate customer inquiries during their engineering and operations efforts.
PROFESSIONAL EXPERIENCE:
Confidential, Atlanta, Georgia
Hadoop Admin
Responsibilities:
- Hadoop installation and configuration of multiple nodes using the Cloudera platform.
- Installed and configured Hortonworks HDP 2.2 using Ambari and manually through the command line. Performed cluster maintenance as well as addition and removal of nodes using tools like Ambari and Cloudera Manager Enterprise.
- Handling the installation and configuration of a Hadoop cluster.
- Building and maintaining scalable data pipelines using the Hadoop ecosystem and other open source components like Hive and HBase.
- Involved in installing and configuring Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Involved in cluster-level security: perimeter security (authentication: Cloudera Manager, Active Directory and Kerberos), access (authorization and permissions: Sentry), visibility (audit and lineage: Navigator) and data (encryption at rest).
- Handling the data exchange between HDFS and different web sources using Flume and Sqoop.
- Monitoring data streaming between web sources and HDFS, and verifying that it functions correctly, using monitoring tools.
- Close monitoring and analysis of the MapReduce job executions on cluster at task level.
- Provided input to development teams on efficient use of resources such as memory and CPU, based on the running statistics of Map and Reduce tasks.
- Installed the OS and administered the Hadoop stack with the CDH5 (with YARN) Cloudera distribution, including configuration management, monitoring, debugging and performance tuning. Scripted Hadoop package installation and configuration to support fully automated deployments.
- Day-to-day operational support of our Cloudera Hadoop clusters in lab and production, at multi-petabyte scale.
- Changed cluster configuration properties based on the volume of data being processed by the cluster.
- Worked on developing the architecture document and proper guidelines.
- Worked on installing Kafka on Virtual Machine.
- Created topics for different users.
- Installed ZooKeeper, brokers, Schema Registry and Control Center on multiple machines.
- Set up ACL/SSL security for different users and assigned users to multiple topics.
- Developed security for users so that they can connect over SSL.
- Assigned access to users through multiple user logins.
- Experience in performing backup, recovery, failover and DR practices on multiple platforms.
- Installed and configured a multi-node Hadoop cluster and maintained it using Nagios.
- Install and configure different Hadoop ecosystem components such as Spark, HBase, Hive, Pig etc. as per requirement.
- Involved in creating a Spark cluster in HDInsight by creating Azure compute resources with Spark installed and configured.
- Setting up automated processes to analyze the system and Hadoop log files for predefined errors and send alerts to appropriate groups. Excellent working knowledge of SQL with databases.
- Commissioning and decommissioning of data nodes from the cluster in case of problems.
- Setting up automated processes to archive/clean the unwanted data on the cluster, in particular on the NameNode and Secondary NameNode.
- Setting up and managing HA NameNode to avoid single points of failure in large clusters.
- Discussions with other technical teams on regular basis regarding upgrades, process changes, any special processing and feedback.
- Involved in analyzing system failures, identifying root causes and recommending courses of action. Documented system processes and procedures for future reference.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Ran benchmark tools to test cluster performance.
- Performed a POC to assess workloads and evaluate resource utilization, and configured Hadoop properties based on the benchmark results.
- Tuning the cluster based on the POC and benchmark results.
- Commissioning and de-commissioning of cluster nodes.
- Monitoring system metrics and logs for any problems.
- Experience with AWS Cloud (EC2, S3 & EMR)
- Expertise building Cloudera, Hortonworks Hadoop clusters on bare metal and Amazon EC2 cloud.
- Experienced in installation, configuration, troubleshooting and maintenance of Kafka & Spark clusters.
- Experience in setting up Kafka cluster on AWS EC2 Instances.
- Good understanding of cluster configurations and resource management using YARN.
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Oozie, Sqoop, Ambari, Storm, GFS, ZooKeeper, NiFi, Kafka
Confidential
Hadoop Admin
Responsibilities:
- Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, Hive and Sqoop.
- Created a POC on Hortonworks and suggested best practices for the HDP and HDF platforms.
- Set up Hortonworks infrastructure, from configuring clusters down to individual nodes.
- Installed Ambari server in the cloud.
- Performed architecture design, data modeling and implementation of the Big Data platform and analytic applications for the consumer products.
- Worked as Administrator for Hadoop Cluster (180 nodes)
- Performed Requirement Analysis, Planning, Architecture Design and Installation of the Hadoop cluster
- Experience in Upgrades and Patches and Installation of Ecosystem Products through Ambari.
- Automated the configuration management for several servers using Chef and Puppet.
- Monitored job performance, file system/disk-space management, cluster and database connectivity, log files, and management of backup/security, and troubleshot various user issues.
- Responsible for day-to-day activities which include HDFS support and maintenance, Cluster maintenance, creation/removal of nodes, Cluster Monitoring/Troubleshooting, Manage and review Hadoop log files, Backup restoring and capacity planning.
- Design and deployment of clustered HPC monitoring systems, including a dedicated monitoring cluster.
- Developed and documented best practices, provided HDFS support and maintenance, and set up new Hadoop users.
- Responsible for administration of new and existing Hadoop infrastructure.
- Handled DBA responsibilities such as data modeling, design and implementation, software installation and configuration, database backup and recovery, and database connectivity and security.
- Built data platforms, pipelines and storage systems using Apache Kafka, Apache Storm and search technologies such as Elasticsearch.
- Implemented concepts of the Hadoop ecosystem such as YARN, MapReduce, HDFS, HBase, ZooKeeper, Pig and Hive.
- In charge of installing, administering, and supporting Windows and Linux operating systems in an enterprise environment.
- Involved in installing and configuring Ranger for the authentication of users and Hadoop daemons.
- Experience in methodologies such as Agile, Scrum, and Test-driven development.
- Worked with cloud services like Amazon Web Services (AWS) and was involved in ETL, data integration, data warehousing, migration and Kafka installation.
- Used Flume extensively for gathering and moving log data files from application servers to a central location in the Hadoop Distributed File System (HDFS). Used Python and Django for creating graphics, XML processing, data exchange and business logic.
- Created Oozie workflows to run multiple MapReduce, Hive and Pig jobs.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Developed Spark code using Scala and Spark SQL for faster testing and data processing.
- Involved in the development of Spark Streaming application for one of the data sources using Scala, Spark by applying
Environment: Hadoop, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Linux Red Hat, Spark, Scala, Hortonworks, Cloudera Manager, Apache Yarn, Python.
Confidential
Hadoop Admin
Responsibilities:
- Evaluate business requirements and prepare detailed specifications that follow project guidelines required to develop written programs.
- Installed the application on AWS EC2 instances and configured storage on S3 buckets.
- Created S3 buckets and bucket policies, worked on IAM role-based policies and customized the JSON template.
- Implemented and maintained the monitoring and alerting of production and corporate servers/storage using AWS CloudWatch.
- Installed, configured, monitored, and maintained HDFS, Yarn, HBase, Flume, Sqoop, Oozie, Pig, Hive.
- Worked on cluster installation, commissioning and decommissioning of data nodes, NameNode recovery, capacity planning, and Cassandra and slots configuration.
- Experience in developing programs in Spark using Python to compare the performance of Spark with Hive and SQL/Oracle.
- Hands on experience in provisioning and managing multi-node Hadoop Clusters on public cloud environment Amazon Web Services (AWS) - EC2 and on private cloud infrastructure.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially done using Python. Worked with business teams and created Hive queries for ad hoc access.
- Understanding of Hadoop's multiple data processing engines, such as interactive SQL, real-time streaming, data science and batch processing, which handle data stored in a single platform on YARN.
- Hands-on experience in installing and configuring MapR and Hortonworks clusters, and installed Hadoop ecosystem components like Hadoop, Pig, Hive, HBase, Sqoop, Kafka, Oozie, Flume and ZooKeeper.
- Monitoring systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
- Worked on scripting Hadoop package installation and configuration to support fully automated deployments.
- Supported Hadoop developers and assisted in optimizing MapReduce jobs, Pig Latin scripts, Hive scripts and HBase ingest as required.
- Defined job flows and managed and reviewed Hadoop and HBase log files.
- Ran Hadoop streaming jobs to process terabytes of text data.
- Worked on YARN capacity scheduler by creating queues to allocate resource guarantee to specific groups.
- Implemented the Hadoop stack and different big data analytic tools, and migration from different databases to Hadoop (HDFS).
- Developed backup policies for Hadoop systems and action plans for network failure.
- Involved in the User/Group Management in Hadoop with AD/LDAP integration.
- Resource management and load management using capacity scheduling and appending changes according to requirements.
- Implemented strategy to upgrade entire cluster nodes OS from RHEL5 to RHEL6 and ensured cluster remains up and running.
- Developed scripts in shell and Python to automate many day-to-day admin activities.
- Installed several projects on Hadoop servers and configured each project to run jobs and scripts successfully
- Created user accounts and gave users access to the Hadoop cluster.
- Resolved tickets submitted by users, and troubleshot, documented and resolved the errors.
- Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
Environment: Hadoop, HDFS, Map Reduce, Hive, Pig, Puppet, Zookeeper, HBase, Flume, Ganglia, Sqoop, Linux, CentOS, Ambari.
Confidential
Selenium Tester
Responsibilities:
- Responsible for implementation and ongoing administration of Hadoop infrastructure and setting up infrastructure
- Developed Test Plans, built test cases and test data sets based upon documented system requirements.
- Provided input on automating nightly builds, Integration, and regression, as well as exploratory and acceptance testing.
- Participated in process improvement and quality control activities.
- Review and understand business requirements/use cases and functional details.
- Analyzed and implemented testing methods and equipment that helped increase the team's confidence in the system.
- Evaluated test results and recommended changes in procedures to validate implementation against requirements.
- Participating in daily standups, Sprint planning, retrospective and grooming sessions.
- Conducting ATDD sessions with developers, UAT testers and product owner.
- Giving Demos of new features to PO and Stakeholders at the end of each Sprint.
- Analyzed and selected the test cases for automation of Java and web applications.
- Performing manual testing of features within each sprint and automating features from the previous sprint.
- Created a framework using TestNG and WebDriver.
- Parameterized the tests for multiple sets of data.
- Followed Agile methodology (Scrum) for this project.
- Arrange test suites to be able to upgrade tests easily in the event any feature changes.
- Write test plans and test cases for the new features.
- Modify the existing test cases based on change in a feature and requirements.
- Using JIRA as a defect tracking tool for Product backlog and reporting bugs.
- Tested mobile UIs on iPhone, iPad, Android, BlackBerry, Windows and other smartphones. Experienced in cross-platform web testing on web browsers, iOS and Android devices.
- Helped create automation scripts using Selenium WebDriver with Java, Eclipse, XPath, CSS, Firebug and FirePath across different browsers (IE, Firefox, Chrome). Automated mobile applications using Appium with Eclipse, Java, Android Driver and Virtual Device Simulator.
- Experience in data driving from Excel for feeding data into Appium test cases.
- Working on Android and iOS automation tools (Selenium and Appium) for testing native apps.
- Documenting test scenarios and test cases in a test case management system.
- Assisting UAT testers with data setup and execute business scenarios.
- Writing SQL queries to set up/modify test data in Oracle database.
- Performing web automation in Selenium using the JUnit framework and performing mobile web manual testing.
Environment: HP ALM, Selenium WebDriver, JUnit, Cucumber, AngularJS, Node.js, Jenkins, GitHub, Quality Center, WinRunner, LoadRunner, QTP, SQL Server 2000, VB.NET
