
Hadoop Admin Resume

New York, New York

SUMMARY:

  • 5 years of IT Operations experience, including 3+ years in Hadoop Administration and 2+ years in Software Development.
  • Excellent understanding of Distributed Systems and Parallel Processing architecture.
  • Worked on components including HDFS, MapReduce, JobTracker, TaskTracker, Sqoop, ZooKeeper, YARN, Oozie, Hive, Hue, Flume, HBase, Fair Scheduler, Spark, and Kafka.
  • Experience in managing Cloudera, Hortonworks and MapR distributions.
  • Strong knowledge of Hadoop HDFS architecture and the MapReduce framework.
  • Involved in vendor selection and capacity planning for the Hadoop cluster in production.
  • Experience administering Linux systems to deploy Hadoop clusters and monitoring the clusters using Nagios and Ganglia.
  • Experience in performing backup, recovery, failover and DR practices on multiple platforms.
  • Implemented Kerberos to authenticate all services in the Hadoop cluster.
  • Experience automating the provisioning of system resources using Puppet.
  • Strong knowledge of configuring NameNode High Availability and NameNode Federation.
  • Experienced in writing automated scripts to monitor file systems and key MapR services (see the monitoring sketch after this list).
  • Implemented a Hadoop-based solution to store archives and backups from multiple sources.
  • Familiar with importing and exporting data using Sqoop between HDFS and relational databases such as MySQL, Oracle, and Teradata, including fast loaders and connectors.
  • Worked with architects on Hadoop hardware and software design.
  • Experience deploying Hadoop clusters in public and private cloud environments such as Amazon AWS and OpenStack.
  • Built an ingestion framework using Flume to stream logs and aggregate the data into HDFS.
  • Worked with application teams via Scrum to provide operational support and install Hadoop updates, patches, and version upgrades as required.
  • Expertise in importing data into HDFS and exporting preprocessed data to a commercial analytic RDBMS.
  • Experience installing, upgrading, configuring, monitoring, supporting, and managing Hadoop clusters using Cloudera CDH4 and CDH5 and Hortonworks HDP 2.1, 2.2, and 2.3 on Ubuntu, Red Hat, and CentOS systems.
  • Experience installing and monitoring standalone multi-node Kafka clusters.
  • Performance-tuned Apache Kafka clusters.
  • Successfully loaded files into Hive and HDFS from MongoDB, Cassandra, and HBase.
  • Implemented authentication using Kerberos and authorization using Apache Sentry.
  • Exposure to installing Hadoop and its ecosystem components such as Hive and Pig.
  • Experience in systems and network design, physical system consolidation through server and storage virtualization, and remote access solutions.
  • Used Zendesk for support tracking and JIRA for development tracking.
  • Experience understanding and managing Hadoop log files and managing Hadoop infrastructure with Cloudera Manager.
  • SQL Server DBA experience with high availability and disaster recovery techniques such as replication, log shipping, mirroring, and clustering, as well as database security and permissions.
  • Worked with highly transactional merchandise and investment SQL databases under PCI and HIPAA compliance, involving data encryption with certificates and security keys at various levels.
  • Experience upgrading SQL Server software to new versions and applying service packs and patches.
  • Provided 24x7 production support, including weekends, on a rotation basis.
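
Below is a minimal sketch of the kind of file-system monitoring script mentioned in this list, written in Java against the Hadoop FileSystem API. The 80% threshold and the exit-code alerting convention are illustrative assumptions, not details from this resume.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;

public class HdfsCapacityCheck {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from core-site.xml on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        FsStatus status = fs.getStatus();
        double pctUsed = 100.0 * status.getUsed() / status.getCapacity();
        System.out.printf("HDFS used: %.1f%% (%d of %d bytes)%n",
                pctUsed, status.getUsed(), status.getCapacity());

        // Non-zero exit signals an alert to a Nagios-style check (assumed convention).
        if (pctUsed > 80.0) {
            System.exit(2);
        }
    }
}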

PROFESSIONAL EXPERIENCE:

Confidential, New York, New York

Hadoop Admin

Responsibilities:

  • Developed architecture documents and operational guidelines.
  • Installed Kafka on virtual machines.
  • Created Kafka topics for different users (see the topic/ACL sketch after this list).
  • Installed ZooKeeper, Kafka brokers, Schema Registry, and Confluent Control Center on multiple machines.
  • Set up ACL/SSL security for different users and assigned users to multiple topics.
  • Configured user security so clients connect over SSL.
  • Granted topic access on a per-user login basis.
  • Documented processes and server diagrams, prepared server requisition documents, and uploaded them to SharePoint.
  • Used Puppet to automate deployments to servers.
  • Monitored server errors and warnings using Splunk.
  • Set up Hortonworks infrastructure, from configuring clusters down to individual nodes.
  • Installed Ambari server on cloud instances.
  • Set up security using Kerberos and Active Directory (AD) on Hortonworks clusters.
  • Managed cluster configuration to meet the needs of analysis workloads, whether I/O-bound or CPU-bound.
  • Set up high availability for the major production cluster and performed Hadoop version upgrades using automation tools.
  • Automated Hadoop cluster setup and node creation.
  • Monitored CPU utilization and maintained improvements.
  • Performance-tuned and managed growth of the OS, disk usage, and network traffic.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Loaded data from the Linux file system into HDFS.
  • Performed architecture design, data modeling, and implementation of the Big Data platform and analytic applications for consumer products.
  • Analyzed the latest Big Data analytic technologies and their innovative applications in business intelligence analysis and new service offerings.
  • Worked on cluster installation, DataNode commissioning and decommissioning, NameNode recovery, capacity planning, and slot configuration.
  • Implemented test scripts to support test driven development and continuous integration.
  • Optimized and tuned applications.
  • Created user guides and training overviews for supporting teams.
  • Provided troubleshooting and best-practice guidance for development teams, including process automation and new application onboarding.
  • Designed monitoring solutions and baseline statistics reporting to support the implementation.
  • Designed and built real-time and batch data ingestion solutions using Sqoop, Pig, Impala, and Kafka.
  • Strong knowledge of and experience with MapReduce, Spark Streaming, and Spark SQL for data processing and reporting.
  • Used Spark Streaming to divide streaming data into micro-batches as input to the Spark engine for batch processing (see the streaming sketch after this list).
  • Implemented Spark jobs using Scala and Spark SQL for faster testing and processing of data.
  • Used Apache Kafka to import real-time network log data into HDFS.
  • Developed business-specific custom UDFs in Hive and Pig.
  • Configured Oozie workflows to run multiple Hive and Pig jobs that trigger independently on time and data availability.
  • Optimized MapReduce code by writing Pig Latin scripts.
  • Imported data from external tables into Hive using the LOAD command.
  • Created Hive tables and used static and dynamic partitioning as a data-slicing mechanism.
  • Monitored the cluster, identified risks, and established good practices to be followed in a shared environment.
  • Good understanding of cluster configuration and resource management using YARN.
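
As a sketch of the topic and ACL setup described in this list, the same steps can be driven programmatically with Kafka's Java AdminClient. The broker address, topic name, partition/replication counts, and principal are illustrative placeholders.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.acl.AccessControlEntry;
import org.apache.kafka.common.acl.AclBinding;
import org.apache.kafka.common.acl.AclOperation;
import org.apache.kafka.common.acl.AclPermissionType;
import org.apache.kafka.common.resource.PatternType;
import org.apache.kafka.common.resource.ResourcePattern;
import org.apache.kafka.common.resource.ResourceType;

public class TopicSetup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9093"); // placeholder host
        props.put("security.protocol", "SSL");          // TLS to the brokers

        try (AdminClient admin = AdminClient.create(props)) {
            // One topic per team: 6 partitions, replication factor 3 (example values).
            NewTopic topic = new NewTopic("team-a-events", 6, (short) 3);
            admin.createTopics(Collections.singleton(topic)).all().get();

            // Allow the team's principal to read its topic from any host.
            AclBinding readAcl = new AclBinding(
                new ResourcePattern(ResourceType.TOPIC, "team-a-events", PatternType.LITERAL),
                new AccessControlEntry("User:CN=team-a", "*",
                    AclOperation.READ, AclPermissionType.ALLOW));
            admin.createAcls(Collections.singleton(readAcl)).all().get();
        }
    }
}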
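And a minimal sketch of the streaming-ingestion pattern from this list (Kafka into HDFS in micro-batches), using the Spark Streaming Java API with the spark-streaming-kafka-0-10 integration. The broker, topic, batch interval, and output path are assumed placeholder values.

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class LogIngest {
    public static void main(String[] args) throws InterruptedException {
        // The master URL is supplied by spark-submit; 30-second micro-batches.
        SparkConf conf = new SparkConf().setAppName("kafka-log-ingest");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092"); // placeholder host
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "log-ingest");
        kafkaParams.put("auto.offset.reset", "latest");

        // Each micro-batch of log lines becomes one RDD.
        JavaDStream<String> lines = KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(Arrays.asList("weblogs"), kafkaParams))
            .map(record -> record.value());

        // Persist every batch to HDFS under a per-batch directory.
        lines.foreachRDD((rdd, time) ->
            rdd.saveAsTextFile("hdfs:///data/weblogs/batch-" + time.milliseconds()));

        jssc.start();
        jssc.awaitTermination();
    }
}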

Environment: Hadoop, Confluent Kafka, Hortonworks HDF, HDP, NiFi, Linux, Splunk, Java, Puppet, Apache YARN, Pig, Spark, Tableau, Machine Learning.

Confidential

Hadoop Admin

Responsibilities:

  • Analyzed the Hadoop cluster and various big data analytic tools, including Pig, Hive, and Sqoop.
  • Created a POC on Hortonworks and suggested best practices for the HDP and HDF platforms.
  • Set up Hortonworks infrastructure, from configuring clusters down to individual nodes.
  • Installed Ambari server on cloud instances.
  • Set up security using Kerberos and AD on Hortonworks and Cloudera CDH clusters.
  • Granted access on a per-user login basis.
  • Installed and configured the CDH cluster, using Cloudera Manager for easy management of the existing Hadoop cluster.
  • Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
  • Used Cloudera Manager extensively to manage multiple clusters with petabytes of data.
  • Documented processes, created server diagrams, and prepared server requisition documents.
  • Set up machines with network controls, static IPs, disabled firewalls, and configured swap memory.
  • Managed cluster configuration to meet the needs of analysis workloads, whether I/O-bound or CPU-bound.
  • Set up high availability for the major production cluster and performed Hadoop version upgrades using automation tools.
  • Set up a 100-node production cluster and a 40-node backup cluster at two different data centers.
  • Performance-tuned and managed growth of the OS, disk usage, and network traffic.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Loaded data from the Linux file system into HDFS.
  • Performed architecture design, data modeling, and implementation of the Big Data platform and analytic applications for consumer products.
  • Analyzed the latest Big Data analytic technologies and their innovative applications in business intelligence analysis and new service offerings.
  • Worked on cluster installation, DataNode commissioning and decommissioning, NameNode recovery, capacity planning, and slot configuration.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Managed and reviewed Hadoop log files.
  • Managed jobs using the Fair Scheduler.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Used Pig built-in functions to convert fixed-width files to delimited files.
  • Tuned Hive and Pig scripts to resolve performance issues, with a good understanding of joins, grouping, aggregation, and how they translate into MapReduce jobs.
  • Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
  • Managed datasets using Pandas data frames and MySQL, querying the MySQL database from Python via the MySQLdb connector package.
  • Created Oozie workflows to run multiple MapReduce, Hive, and Pig jobs.
  • Supported QA environment setup and updated configurations for implementing Pig and Sqoop scripts.
  • Developed Spark code using Scala and Spark SQL for faster testing and data processing (see the Spark SQL sketch after this list).
  • Developed a Spark Streaming application for one of the data sources using Scala and Spark, applying the required transformations.
  • Imported data from sources such as HDFS and MySQL into Spark RDDs.
  • Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
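
A minimal sketch of converting a Hive-style aggregation into Spark transformations, shown here with the Spark Java API (the work above used Scala; the table, columns, and output path are hypothetical). enableHiveSupport() assumes a reachable Hive metastore.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HiveToSpark {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("hive-query-offload")
                .enableHiveSupport()
                .getOrCreate();

        // Equivalent of:
        //   SELECT event_date, COUNT(*) FROM web_logs WHERE status = 200 GROUP BY event_date
        Dataset<Row> daily = spark.table("web_logs")   // hypothetical Hive table
                .filter("status = 200")
                .groupBy("event_date")
                .count();

        daily.write().mode("overwrite").parquet("hdfs:///tmp/daily_counts");
        spark.stop();
    }
}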

Environment: Hadoop, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Red Hat Linux, Spark, Scala, Hortonworks, Cloudera Manager, Apache YARN, Python, Machine Learning, NLP (Natural Language Processing)

Confidential

SDET

Responsibilities:

  • Launched Amazon EC2 cloud instances (Linux/Ubuntu/RHEL) using Amazon Web Services and configured the launched instances for specific applications.
  • Involved in various meetings with business analysts and developers.
  • Conducted functional and regression testing using Selenium with data-driven and keyword-driven frameworks.
  • Created automated test scripts using a data-driven framework and the Page Factory model to test web applications with Selenium WebDriver, Java, and Maven.
  • Expertise in using Selenium Grid to run test scripts on different platforms and against different browsers in parallel to save time.
  • Experience working with Protractor.
  • Used Jenkins to execute the test scripts periodically on Selenium Grid for different platforms like desktop, tablet and mobile.
  • Performed cross-browser and parallel testing on Chrome, Firefox, and Safari using TestNG and Maven on Selenium Grid.
  • Grouped test suites, test cases, and test methods for regression and functional testing using TestNG features such as groups, parameters, data providers, and tags.
  • Utilized Maven to manage dependencies for test execution and plug-ins, and created profiles of grouped test cases to run sanity and regression testing.
  • Integrated Jenkins with version control (Git) and scheduled builds to run automatically during a build release by invoking predefined Maven commands.
  • Used Linux/Unix commands to work with GitHub from the command line.
  • Developed a BDD framework from scratch.
  • Used the BDD framework to develop Cucumber step definitions, scenarios, and features from acceptance criteria.
  • Utilized the Apache POI library to read test data from Excel spreadsheets and load it into the required test cases.
  • Administered and engineered Jenkins to manage the weekly build, test, and deploy chain, using SVN/Git with a Dev/Test/Prod branching model for weekly releases.
  • Handled Selenium synchronization problems using explicit and implicit waits during regression testing (see the explicit-wait sketch after this list).
  • Experience writing complex XPath expressions using following and preceding axes and functions such as contains() and not(contains()).
  • Executed test cases on real devices for both the mobile app and the mobile website.
  • Thorough experience implementing automation tools: Selenium WebDriver, JUnit, TestNG, Eclipse, Git/GitHub, Jenkins, SoapUI, and REST with Postman.
  • Used Cucumber to automate service tests against REST APIs.
  • Used Cucumber runner classes to generate step definitions and used tags to run different test suites such as smoke, health check, and regression.
  • Created Maven profiles to launch specific TestNG suites from Jenkins jobs.
  • Used SoapUI to test SOAP-based services and REST APIs.
  • Used Groovy to verify web services through SoapUI.
  • Experience testing cloud platforms.
  • Shared daily status reports with team members, team leads, and managers.
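
A minimal sketch of the explicit-wait synchronization pattern noted in this list, as a TestNG test with Selenium WebDriver in Java. The URL and element locator are hypothetical, and chromedriver is assumed to be on the PATH.

import java.time.Duration;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;
import org.testng.Assert;
import org.testng.annotations.AfterClass;
import org.testng.annotations.BeforeClass;
import org.testng.annotations.Test;

public class LoginSmokeTest {
    private WebDriver driver;

    @BeforeClass
    public void setUp() {
        driver = new ChromeDriver();
    }

    @Test(groups = "smoke")
    public void loginButtonIsClickable() {
        driver.get("https://example.com/login"); // placeholder URL
        // Explicit wait: poll up to 10 seconds for the element instead of sleeping.
        WebElement button = new WebDriverWait(driver, Duration.ofSeconds(10))
                .until(ExpectedConditions.elementToBeClickable(By.id("submit")));
        button.click();
        Assert.assertFalse(driver.getTitle().isEmpty());
    }

    @AfterClass
    public void tearDown() {
        driver.quit();
    }
}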

Environment: Selenium IDE, Groovy, Selenium RC/WebDriver, Cucumber, HP QC, MyEclipse, JIRA, MySQL, Oracle, Java, JavaScript, .NET, Python, Microservices, RESTful API Testing, JMeter, VBScript, JUnit, TestNG, Firebug, XPath, Windows

Confidential

SDET

Responsibilities:

  • Responsible for the implementation and ongoing administration of Hadoop infrastructure.
  • Analyzed technical and functional requirements documents, and designed and developed QA test plans, test cases, and test scenarios covering the end-to-end process flow.
  • Developed test scripts for an internal brokerage application used by branch and financial market representatives to recommend and manage customer portfolios, including international and capital markets.
  • Designed and developed smoke and regression automation scripts and a functional testing automation framework for all modules using Selenium WebDriver.
  • Created data-driven scripts for adding multiple customers, checking online accounts, and validating user interfaces and reports.
  • Cross-verified trade entries between the mainframe system, its web application, and downstream systems.
  • Extensively used the Selenium WebDriver API (XPath and CSS locators) to test the web application.
  • Configured Selenium WebDriver, TestNG, Maven, Cucumber, and a BDD framework, and created Selenium automation scripts in Java using TestNG.
  • Performed data-driven testing by developing a Java-based library to read test data from Excel and properties files.
  • Performed extensive DB2 database testing to validate trade entries from the mainframe to the backend system.
  • Developed a data-driven framework with Java, Selenium WebDriver, and Apache POI to perform multiple trade order entries (see the POI sketch after this list).
  • Developed an internal application using AngularJS and Node.js, connecting to Oracle on the backend.
  • Expertise in debugging front-end issues in web applications developed with HTML5, CSS3, AngularJS, Node.js, and Java.
  • Developed an automated smoke test suite as a subset of the regression test suite.
  • Applied various testing techniques in test cases to cover all business scenarios for quality coverage.
  • Interacted with the development team to understand the design flow, review code, and discuss unit test plans.
  • Executed system and integration regression tests in the testing environment.
  • Conducted defect triage meetings and defect root cause analysis, tracked defects in HP ALM Quality Center, managed defects by following up on open items, and retested defects during regression testing.
  • Provided QA/UAT sign-off after closely reviewing all test cases in Quality Center, along with receiving the policy sign-off for the project.
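
A minimal sketch of the Apache POI reader behind the data-driven framework described in this list. The spreadsheet layout (first sheet, one header row) is an assumption; a TestNG @DataProvider can return read("testdata.xlsx").toArray(new Object[0][]) to feed each row into a test method.

import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.List;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

public class ExcelTestData {
    // Reads every data row of the first sheet as a String[] (the header row is skipped).
    public static List<String[]> read(String path) throws Exception {
        DataFormatter fmt = new DataFormatter(); // renders each cell as its displayed text
        List<String[]> rows = new ArrayList<>();
        try (FileInputStream in = new FileInputStream(path);
             Workbook wb = WorkbookFactory.create(in)) {
            Sheet sheet = wb.getSheetAt(0);
            for (Row row : sheet) {
                if (row.getRowNum() == 0) continue; // skip header
                String[] values = new String[row.getLastCellNum()];
                for (int c = 0; c < row.getLastCellNum(); c++) {
                    values[c] = fmt.formatCellValue(row.getCell(c));
                }
                rows.add(values);
            }
        }
        return rows;
    }
}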

Environment: HP ALM, Selenium WebDriver, JUnit, Cucumber, AngularJS, Node.js, Jenkins, GitHub, Windows, UNIX, Agile, MS SQL, IBM DB2, PuTTY, WinSCP, FTP Server, Notepad++, C#, DbVisualizer.
