
Hadoop Admin/Developer Resume


Houston, TX

SUMMARY:

  • Over 7 years of IT experience as a Developer, Designer, and Quality Tester, with cross-platform integration experience using the Hadoop ecosystem.
  • Experience in configuring, installing, and managing MapR, Hortonworks, and Cloudera distributions.
  • Hands-on experience in installing, configuring, monitoring, and using Hadoop components such as MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, Oozie, Apache Spark, and Impala.
  • Working experience building and supporting large-scale Hadoop environments, including design, configuration, installation, performance tuning, and monitoring.
  • Working knowledge of monitoring tools and frameworks such as Splunk, InfluxDB, Prometheus, Sysdig, Datadog, AppDynamics, New Relic, and Nagios.
  • Experience in setting up automated monitoring and escalation infrastructure for Hadoop clusters using Ganglia and Nagios.
  • Standardized Splunk forwarder deployment, configuration, and maintenance across a variety of Linux platforms; also worked on DevOps tools such as Puppet and Git.
  • Hands-on experience configuring Hadoop clusters both in professional on-premises environments and on Amazon Web Services (AWS) EC2 instances.
  • Experience with the complete software development lifecycle, including design, development, testing, and implementation of moderately to highly complex systems.
  • Hands-on experience in the installation, configuration, support, and management of Hadoop clusters using the Apache, Hortonworks, Cloudera, and MapR distributions.
  • Extensive experience in installing, configuring, and administering Hadoop clusters for major Hadoop distributions such as CDH5 and HDP.
  • Experience configuring Ranger and Knox to provide security for Hadoop services (Hive, HBase, HDFS, etc.).
  • Experience in the administration of Kafka and Flume streaming using the Cloudera distribution.
  • Developed automated scripts using Unix shell for performing RUNSTATS, REORG, REBIND, COPY, LOAD, BACKUP, IMPORT, EXPORT, and other database-related activities.
  • Installed, configured, and maintained several Hadoop clusters including HDFS, YARN, Hive, HBase, Knox, Kafka, Oozie, Ranger, Atlas, Infra Solr, ZooKeeper, and NiFi in Kerberized environments.
  • Experienced in deploying, maintaining, and troubleshooting applications on Microsoft Azure cloud infrastructure. Excellent knowledge of NoSQL databases such as HBase and Cassandra.
  • Experience with large-scale Hadoop clusters, handling all Hadoop environment builds, including design, cluster setup, and performance tuning.
  • Implemented release processes such as DevOps and Continuous Delivery methodologies for existing builds and deployments. Experience with scripting languages including Python, Perl, and shell.
  • Involved in the architecture of the storage service to meet changing requirements for scaling, reliability, performance, and manageability.
  • Modified reports and Talend ETL jobs based on feedback from QA testers and users in development and staging environments.
  • Worked with the systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Deployed Grafana dashboards for monitoring cluster nodes, using Graphite as the data source and collectd as the metric sender.
  • Experienced in the workflow scheduling and monitoring tools Rundeck and Control-M.
  • Proficient with application servers such as WebSphere, WebLogic, JBoss, and Tomcat.
  • Working experience designing and implementing complete end-to-end Hadoop infrastructure.
  • Good experience designing, configuring, and managing backup and disaster recovery for Hadoop data.
  • Experienced in developing MapReduce programs using Apache Hadoop for working with big data (see the MapReduce sketch after this list).
  • Responsible for designing highly scalable big data clusters to support varied data storage and computation needs across Hadoop, Cassandra, MongoDB, and Elasticsearch.
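
As a brief illustration of the MapReduce development noted above, the sketch below shows a standard Hadoop word-count job in Java. It is a minimal, generic example; the class name and the input/output paths are placeholders, not code from any specific engagement.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every token in the input split.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sums the counts emitted for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);   // combiner cuts shuffle volume
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. an HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // must not already exist
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Reusing the reducer as the combiner, as above, is typically the first step when tuning MapReduce jobs, since it reduces the data moved during the shuffle.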

PROFESSIONAL EXPERIENCE:

Confidential, Houston, TX

Hadoop Admin/Developer

Responsibilities:

  • Worked on developing architecture documents and proper guidelines.
  • Worked on a 450-node Hadoop cluster on Cloudera distribution 7.7.0.
  • Tested loading the raw data, populating staging tables, and storing the refined data in partitioned tables in the EDW.
  • Installed a Kerberos-secured Kafka cluster with no encryption for a POC and also set up Kafka ACLs.
  • Experience integrating Solr with HBase using the Lily HBase Indexer (Key-Value Indexer).
  • Created a NoSQL solution for a legacy RDBMS using Kafka, Spark, Solr, and the HBase indexer for ingestion, and Solr and HBase for real-time querying.
  • Used TIBCO Administrator to manage TIBCO components and to monitor and manage deployments.
  • Experience in the setup, configuration, and management of Apache Sentry for role-based authorization and privilege validation for the Hive and Impala services.
  • Implemented, documented, and configured Splunk: wrote queries, developed custom apps, and supported Splunk indexers, indexing, and field extractions using Splunk IFX, forwarders, lightweight forwarders, Splunk Web, and search heads for Splunk 5.x/6.x.
  • Set up a no-authentication Kafka listener in parallel with the Kerberos (SASL) listener, and tested non-authenticated (anonymous) users alongside Kerberos users (see the Kafka client sketch after this list).
  • Used Hive join queries to join multiple tables of a source system and load them into Elasticsearch.
  • Worked on the Kafka backup index, minimized Log4j logging, and pointed Ambari server logs to NAS storage.
  • Configured Sqoop: JDBC drivers for the respective relational databases, parallelism, the distributed cache, the import process, compression codecs, imports into Hive and HBase, incremental imports, saved jobs and passwords, the free-form query option, and troubleshooting.
  • Involved in installation of MapR and upgrade from MapR 5.0 to MapR 5.2.
  • Worked with both CDH4 and CDH5 applications. Transferred large data sets back and forth between development and production clusters.
  • Managed mission-critical Hadoop clusters and Kafka at production scale, primarily on the Cloudera distribution.
  • Involved in cluster-level security: perimeter security (authentication via Cloudera Manager, Active Directory, and Kerberos), access (authorization and permissions via Sentry), visibility (audit and lineage via Navigator), and data (encryption at rest).
  • Troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Extensively worked on Elasticsearch querying and indexing to retrieve documents at high speed.
  • Installed, configured, and maintained several Hadoop clusters including HDFS, YARN, Hive, HBase, Knox, Kafka, Oozie, Ranger, Atlas, Infra Solr, ZooKeeper, and NiFi in Kerberized environments.
  • Worked on setting up high availability for the major production cluster and designed automatic failover control using ZooKeeper and quorum journal nodes.
  • Assisted in the configuration, development, and testing of Autosys JIL and other scripts.
  • Worked with the systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Deployed dashboards for monitoring cluster nodes, using Graphite as the data source and collectd as the metric sender.
  • Worked on the Navigator API to export denied-access events from the cluster to help prevent security threats.
  • Worked with Hadoop tools such as Flume, Hive, Sqoop, and Oozie on the Hadoop cluster.
  • Experience in the workflow scheduling and monitoring tools Rundeck and Control-M.
  • Used Flume to collect, aggregate, and store web log data from different sources such as web servers and mobile and network devices, and pushed it to HDFS.
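
As a companion to the Kafka bullets above (a Kerberos-secured cluster with a no-authentication listener running in parallel), here is a minimal sketch of how a Java producer might be pointed at the SASL/GSSAPI (Kerberos) listener. The broker address, port, and topic name are hypothetical placeholders, and the JAAS/keytab configuration is assumed to be supplied outside the code.

```java
import java.util.Properties;

import org.apache.kafka.clients.CommonClientConfigs;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.config.SaslConfigs;
import org.apache.kafka.common.serialization.StringSerializer;

public class KerberizedProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical broker address pointing at the SASL (Kerberos) listener.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1.example.com:9093");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // Kerberos (SASL/GSSAPI) settings; a JAAS config with a keytab/principal is
        // assumed to be provided via -Djava.security.auth.login.config=... at runtime.
        props.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_PLAINTEXT");
        props.put(SaslConfigs.SASL_MECHANISM, "GSSAPI");
        props.put(SaslConfigs.SASL_KERBEROS_SERVICE_NAME, "kafka");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Topic name is a placeholder; the cluster's ACLs decide whether this
            // principal may write to it.
            producer.send(new ProducerRecord<>("test-topic", "key", "hello from a Kerberos client"));
            producer.flush();
        }
    }
}
```

A non-authenticated (anonymous) client would instead point at the parallel PLAINTEXT listener and omit the SASL properties.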

Environment: Hadoop, Confluent Kafka, Hortonworks HDF, HDP, NiFi, Linux, Splunk, Java, Puppet, Apache YARN, Pig, Spark, Tableau, Machine Learning.

Confidential

Hadoop Admin/Architect

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, Hive, and Sqoop.
  • Created a POC on Hortonworks and suggested best practices for the HDP, HDF, and NiFi platforms.
  • Set up the Hortonworks infrastructure, from configuring clusters down to individual nodes.
  • Installed Ambari Server in the cloud.
  • Set up security using Kerberos and AD on Hortonworks clusters and Cloudera CDH.
  • Assigned access to users via multi-user login.
  • Installed and configured the CDH cluster, using Cloudera Manager for easy management of the existing Hadoop cluster.
  • Installed, configured, and maintained Apache Hadoop clusters for application development and Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
  • Extensively used Cloudera Manager to manage multiple clusters with petabytes of data.
  • Knowledgeable in documenting processes, creating server diagrams, and preparing server requisition documents.
  • Created Kafka topics for different users.
  • Installed ZooKeeper, Kafka brokers, Schema Registry, and Control Center on multiple machines.
  • Set up ACL/SSL security for different users and assigned users to multiple topics.
  • Developed security so that users connect over SSL.
  • Created process documentation, server diagrams, and server requisition documents, and uploaded them to SharePoint.
  • Used Puppet to automate deployments to the servers.
  • Monitored errors and warnings on the servers using Splunk.
  • Set up the machines with network controls, static IPs, disabled firewalls, and swap memory.
  • Created a POC on AWS based on the services required by the project.
  • Managed the configuration of the cluster to meet the needs of the analysis, whether I/O-bound or CPU-bound.
  • Worked on setting up high availability for the major production cluster. Performed Hadoop version updates using automation tools.
  • Worked on setting up a 100-node production cluster and a 40-node backup cluster at two different data centers.
  • Automated the setup of Hadoop clusters and the creation of nodes.
  • Monitored and maintained improvements in CPU utilization.
  • Performance-tuned and managed growth of the OS, disk usage, and network traffic.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in loading data from the Linux file system to HDFS.
  • Performed architecture design, data modeling, and implementation of the big data platform and analytic applications for consumer products.
  • Analyzed the latest big data analytic technologies and their innovative applications in both business intelligence analysis and new service offerings.
  • Worked on installing clusters, commissioning and decommissioning DataNodes, NameNode recovery, capacity planning, and slot configuration.
  • Implemented test scripts to support test driven development and continuous integration.
  • Optimized and tuned the application.
  • Created user guides and training overviews for supporting teams.
  • Provided troubleshooting and best-practice methodology for development teams, including process automation and new application onboarding.
  • Designed monitoring solutions and baseline statistics reporting to support the implementation.
  • Experience designing and building solutions for data ingestion, both real-time and batch, using Sqoop, Pig, Impala, and Kafka.
  • Strong knowledge of and experience with MapReduce, Spark Streaming, and Spark SQL for data processing and reporting.
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing (see the Spark Streaming sketch after this list).
  • Implemented Spark jobs using Scala and Spark SQL for faster testing and processing of data.
  • Used Apache Kafka for importing real time network log data into HDFS.
  • Developed business-specific custom UDFs in Hive and Pig (see the Hive UDF sketch after this list).
  • Configured Oozie workflows to run multiple Hive and Pig jobs, which run independently based on time and data availability.
  • Optimized MapReduce code by writing Pig Latin scripts.
  • Imported data from external tables into Hive using the LOAD command.
  • Created tables in Hive and used static and dynamic partitioning as a data-slicing mechanism.
  • Working experience monitoring the cluster, identifying risks, and establishing good practices to be followed in a shared environment.
  • Good understanding of cluster configuration and resource management using YARN.
  • Worked on tuning the performance of MapReduce Jobs.
  • Responsible for managing data coming from different sources.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Experience in managing and reviewing Hadoop log files.
  • Job management using Fair scheduler.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Used Pig built-in functions to convert fixed-width files to delimited files.
  • Worked on tuning Hive and Pig to improve performance and solve performance-related issues in Hive and Pig scripts, with a good understanding of joins, grouping, and aggregation and how they translate into MapReduce jobs.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
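
As an illustration of the business-specific Hive UDFs mentioned in the list above, the following is a minimal sketch of a custom UDF in Java. The class name and the masking rule are hypothetical examples, not the actual business logic.

```java
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Simple Hive UDF: masks all but the last four characters of a string,
// e.g. for account numbers. Registered and used from HiveQL with:
//   ADD JAR my-udfs.jar;
//   CREATE TEMPORARY FUNCTION mask_id AS 'MaskIdUDF';
//   SELECT mask_id(account_no) FROM accounts;
@Description(name = "mask_id", value = "_FUNC_(str) - masks all but the last 4 characters")
public class MaskIdUDF extends UDF {

    public Text evaluate(Text input) {
        if (input == null) {
            return null;                       // preserve SQL NULL semantics
        }
        String s = input.toString();
        if (s.length() <= 4) {
            return new Text(s);                // nothing to mask
        }
        String masked = s.substring(0, s.length() - 4).replaceAll(".", "*")
                + s.substring(s.length() - 4);
        return new Text(masked);
    }
}
```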
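
The Spark Streaming bullet above describes dividing a stream into micro-batches that the Spark engine processes as ordinary batches. The sketch below outlines that pattern with the Java DStream API against a hypothetical socket source; the production jobs used Scala and read from Kafka, so this is only the shape of the idea.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

import scala.Tuple2;

public class StreamingBatchSketch {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("streaming-batch-sketch");
        // Incoming data is grouped into 30-second micro-batches; each batch is
        // handed to the Spark engine as an RDD for normal batch processing.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

        // Hypothetical source; a production pipeline would consume Kafka topics instead.
        JavaReceiverInputDStream<String> lines = jssc.socketTextStream("localhost", 9999);

        JavaPairDStream<String, Integer> counts = lines
                .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                .mapToPair(word -> new Tuple2<>(word, 1))
                .reduceByKey(Integer::sum);

        counts.print();          // emit a sample of each batch's results
        jssc.start();
        jssc.awaitTermination();
    }
}
```

The batch interval (30 seconds here) is the knob the bullet refers to: each interval's data becomes one RDD that the regular Spark engine processes.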

Environment: Hadoop, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Red Hat Linux, Spark, Scala, Hortonworks, Cloudera Manager, Apache YARN, Python.

Confidential

Hadoop Admin

Responsibilities:

  • Launched Amazon EC2 cloud instances using Amazon Web Services (Linux/Ubuntu/RHEL) and configured the launched instances for specific applications.
  • Managed critical data pipelines that power analytics for various business units.
  • Responsible for installing, configuring, supporting, and managing Hadoop clusters.
  • Worked on performance tuning of Hive SQL queries.
  • Created external tables with proper partitions for efficiency and loaded the structured data in HDFS that resulted from MapReduce jobs.
  • Monitored all MapReduce read jobs running on the cluster using Cloudera Manager and ensured that they were able to read the data from HDFS without any issues.
  • Involved in moving all log files generated from various sources to HDFS for further processing.
  • Involved in collecting metrics for Hadoop clusters using Ganglia.
  • Worked on a Kerberized Hadoop cluster with 250 nodes.
  • Used Hive, created Hive tables, and loaded data from the local file system to HDFS.
  • Responsible for deploying patches and remediating vulnerabilities.
  • Experience in setting up Test, QA, and Prod environments.
  • Involved in loading data from the UNIX file system to HDFS.
  • Led root cause analysis (RCA) efforts for high-severity incidents.
  • Worked hands-on with the ETL process, handling data imports from various data sources and performing transformations.
  • Coordinated with on-call support when human intervention was required for problem solving.
  • Ensured that the analytics data was available on time for customers, which in turn provided them insight and helped them make key business decisions.
  • Aimed at providing a delightful data experience to our customers, the different business groups across the organization.
  • Worked on alerting mechanisms to support production clusters, workflows, and daily jobs effectively and meet SLAs.
  • Provided operational support for the platform and followed best practices to optimize the performance of the environment.
  • Worked with various onshore and offshore teams to understand the data imported from their sources.
  • Provided updates in the daily Scrum, planned work at the start of each sprint, and tracked planned tasks using JIRA; synced with the team to pick up priority tasks and updated the necessary documentation in the wiki.
  • Held weekly meetings with business partners and actively participated in review sessions with other developers and the manager.
  • Installed applications on AWS EC2 instances and configured storage in S3 buckets.
  • Created S3 buckets and bucket policies, worked on IAM role-based policies, and customized the JSON templates.
  • Implemented and maintained monitoring and alerting for production and corporate servers/storage using AWS CloudWatch.
  • Managed server instances on the Amazon Web Services (AWS) platform using Puppet and Chef configuration management.
  • Developed Pig scripts to transform raw data into intelligent data as specified by business users.
  • Worked in the AWS environment on the development and deployment of custom Hadoop applications.
  • Worked closely with the data modelers to model the new incoming data sets.
  • Involved in the end-to-end process of Hadoop jobs that used various technologies such as Sqoop, Pig, Hive, MapReduce, Spark, and shell scripts (for scheduling a few jobs).
  • Expertise in designing and deploying Hadoop clusters and different big data analytic tools, including Pig, Hive, Oozie, ZooKeeper, Sqoop, Flume, Spark, Impala, and Cassandra, with the Hortonworks distribution.
  • Involved in creating Hive and Pig tables, loading data, and writing Hive queries and Pig scripts.
  • Assisted in upgrading, configuring, and maintaining various Hadoop components such as Pig, Hive, and HBase.
  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data. Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
  • Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
  • Worked on tuning Hive and Pig to improve performance and solve performance-related issues in Hive and Pig scripts, with a good understanding of joins, grouping, and aggregation and how they translate into MapReduce jobs.
  • Imported data from different sources such as HDFS and HBase into Spark RDDs.
  • Developed a data pipeline using Kafka and Storm to store data in HDFS.
  • Performed real-time analysis on the incoming data.
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data (see the Spark SQL sketch after this list).
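
The Spark bullets above cover importing data from HDFS and querying it with Spark SQL. As a minimal sketch (in Java rather than the Scala used on the project, with hypothetical HDFS paths and column names), the pattern looks like this:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkSqlSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("spark-sql-sketch")
                .getOrCreate();

        // Hypothetical HDFS location of structured log data produced upstream.
        Dataset<Row> logs = spark.read()
                .option("header", "true")
                .csv("hdfs:///data/weblogs/*.csv");

        // Register the DataFrame as a temporary view so it can be queried with SQL.
        logs.createOrReplaceTempView("weblogs");

        Dataset<Row> topPages = spark.sql(
                "SELECT page, COUNT(*) AS hits " +
                "FROM weblogs GROUP BY page ORDER BY hits DESC LIMIT 20");

        topPages.show();

        // Write the aggregated result back to HDFS for downstream reporting.
        topPages.write().mode("overwrite").parquet("hdfs:///reports/top_pages");

        spark.stop();
    }
}
```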

Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBase, Oozie, Scala, Spark, Linux.

Confidential

SDET

Responsibilities:

  • Responsible for the implementation and ongoing administration of Hadoop infrastructure, including initial setup.
  • Analyzed technical and functional requirements documents, then designed and developed the QA test plan, test cases, and test scenarios while maintaining the end-to-end process flow.
  • Developed testing scripts for an internal brokerage application used by branch and financial-market representatives to recommend and manage customer portfolios, including international and capital markets.
  • Designed and developed smoke and regression automation scripts and a functional test automation framework for all modules using Selenium WebDriver.
  • Created data-driven scripts for adding multiple customers, checking online accounts, validating user interfaces, and validating reports.
  • Performed cross-verification of trade entries between the mainframe system, its web application, and downstream systems.
  • Extensively used Selenium WebDriver API (XPath and CSS locators) to test the web application.
  • Configured Selenium WebDriver, TestNG, Maven, Cucumber, and a BDD framework, and created Selenium automation scripts in Java using TestNG.
  • Performed data-driven testing by developing a Java-based library to read test data from Excel and properties files (see the TestNG sketch after this list).
  • Extensively performed DB2 database testing to validate trade entries from the mainframe to the backend system. Developed a data-driven framework with Java, Selenium WebDriver, and Apache POI, used for multiple trade-order entries.
  • Developed an internal application using AngularJS and Node.js, connecting to Oracle on the backend.
  • Expertise in debugging issues in the front end of web-based applications developed using HTML5, CSS3, AngularJS, Node.js, and Java.
  • Developed a smoke automation test suite as a subset of the regression test suite.
  • Applied various testing techniques in test cases to cover all business scenarios for quality coverage.
  • Interacted with the development team to understand the design flow, review code, and discuss unit test plans.
  • Executed system, integration, and regression tests in the testing environment.
  • Conducted defect triage meetings and defect root cause analysis, tracked defects in HP ALM Quality Center, managed defects by following up on open items, and retested defects during regression testing.
  • Provided QA/UAT sign-off after closely reviewing all test cases in Quality Center, along with receiving policy sign-off for the project.
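
As referenced in the data-driven testing bullets above, the sketch below shows the general shape of a TestNG test that reads login data from an Excel sheet with Apache POI and drives the browser with Selenium WebDriver. The URL, locators, and file path are hypothetical placeholders, not details of the actual brokerage application.

```java
import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.testng.Assert;
import org.testng.annotations.AfterClass;
import org.testng.annotations.BeforeClass;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;

public class LoginDataDrivenTest {

    private WebDriver driver;

    @BeforeClass
    public void setUp() {
        driver = new ChromeDriver();             // assumes chromedriver is on PATH
    }

    // Reads username/password pairs from an Excel sheet using Apache POI.
    @DataProvider(name = "loginData")
    public Object[][] loginData() throws Exception {
        List<Object[]> rows = new ArrayList<>();
        try (XSSFWorkbook workbook = new XSSFWorkbook(new FileInputStream("testdata/logins.xlsx"))) {
            Sheet sheet = workbook.getSheetAt(0);
            for (Row row : sheet) {
                if (row.getRowNum() == 0) continue;           // skip header row
                rows.add(new Object[] {
                        row.getCell(0).getStringCellValue(),  // username
                        row.getCell(1).getStringCellValue()   // password
                });
            }
        }
        return rows.toArray(new Object[0][]);
    }

    @Test(dataProvider = "loginData")
    public void loginShowsDashboard(String username, String password) {
        driver.get("https://app.example.com/login");          // placeholder URL
        driver.findElement(By.id("username")).sendKeys(username);
        driver.findElement(By.id("password")).sendKeys(password);
        driver.findElement(By.cssSelector("button[type='submit']")).click();

        Assert.assertTrue(driver.findElement(By.id("dashboard")).isDisplayed(),
                "Dashboard should be visible after a successful login");
    }

    @AfterClass
    public void tearDown() {
        driver.quit();
    }
}
```

Properties files can be read the same way with java.util.Properties; keeping test data in the DataProvider rather than the test body is what makes the suite data-driven.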

Environment: HP ALM, Selenium WebDriver, JUnit, Cucumber, AngularJS, Node.js, Jenkins, GitHub, Windows, UNIX, Agile, MS SQL, IBM DB2, PuTTY, WinSCP, FTP Server, Notepad++, C#, DbVisualizer.

Confidential

Software Test Engineer

Responsibilities:

  • Involved in preparing Test Plan and Test cases.
  • Developed test cases for automation team for regression testing.
  • Formulated methods to perform positive and negative testing against requirements.
  • Performed backend testing using SQL queries.
  • Reported bugs found during testing using Quality Center.
  • Conducted functional, regression, black box and system testing.
  • Reviewed functional design for internal product documentation.
  • Used Quality Center for requirements management, planning, scheduling, running tests, defect tracking, and managing defects.
  • Analyzed, tested, and certified application-specific software and performed ambiguity reviews of business requirements and functional specification documents.
  • Developed Manual Test cases and test scripts to test the functionality of the application.
  • Provided test results, graphs, and analysis of application performance data by email or phone during testing to the application developer and manager.
  • Implemented automated testing methodologies such as data-driven testing and keyword-driven testing.
  • Created and executed regression scripts using Quick Test Professional.
  • Inserted various checkpoints, parameterized the test scripts, and applied regular expressions in scripts.
  • Documented test bugs in Quality Center.
