Hadoop Developer Resume

Jersey City, New Jersey

PROFESSIONAL SUMMARY:

  • Over 7 years of IT experience as a Developer, Designer & Quality Tester with cross-platform integration experience using the Hadoop ecosystem.
  • Hands-on experience in installing, configuring, and using Hadoop ecosystem components - HDFS, MapReduce, Pig, Hive, Oozie, Flume, HBase, Spark, and Sqoop.
  • Strong understanding of various Hadoop services, MapReduce and YARN architecture.
  • Responsible for writing MapReduce programs.
  • Experienced in importing and exporting data into HDFS using Sqoop.
  • Experience loading data to Hive partitions and creating buckets in Hive.
  • Developed MapReduce jobs to automate data transfer from HBase.
  • Expertise in analysis using Pig, Hive, and MapReduce.
  • Experienced in developing UDFs for Hive and Pig using Java (a sketch follows this list).
  • Strong understanding of NoSQL databases like HBase, MongoDB & Cassandra.
  • Scheduled all Hadoop/Hive/Sqoop/HBase jobs using Oozie.
  • Experience in setting up clusters on Amazon EC2 & S3, including automating cluster setup and extension in the AWS cloud.
  • Good understanding of Scrum methodologies, Test Driven Development and continuous integration.
  • Major strengths: familiarity with multiple software systems; ability to learn new technologies quickly and adapt to new environments; self-motivated, focused team player and quick learner with excellent interpersonal, technical, and communication skills.
  • Experience in defining detailed application software test plans, including organization, participant, schedule, test and application coverage scope.
  • Experience in gathering and defining functional and user interface requirements for software applications.
  • Experience in real-time analytics with Apache Spark (RDD, DataFrames, and Streaming API).
  • Used the Spark DataFrames API over the Cloudera platform to perform analytics on Hive data.
  • Experience in integrating Hadoop with Kafka; expertise in uploading clickstream data from Kafka to HDFS.
  • Expert in utilizing Kafka as a publish-subscribe messaging system.
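
To give a flavor of the Java UDF work mentioned above, here is a minimal sketch of a classic Hive UDF; the class name and normalization logic are hypothetical examples, not taken from an actual project.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical Hive UDF that normalizes free-text region codes.
// Hive locates the evaluate() method by reflection.
public final class NormalizeRegion extends UDF {
    public Text evaluate(final Text input) {
        if (input == null) {
            return null; // preserve SQL NULL semantics
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}
```

Once packaged into a JAR, a UDF like this would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called from queries.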

PROFESSIONAL EXPERIENCE:

Confidential, Jersey City, New Jersey

Hadoop Developer

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different Big Data analytic tools including Pig, Hive, the HBase database, and Sqoop.
  • Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Coordinated with business customers to gather business requirements, interacted with technical peers to derive technical requirements, and delivered the BRD and TDD documents.
  • Extensively involved in the Design phase and delivered Design documents.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Wrote Hive jobs to parse logs and structure them in a tabular format to facilitate effective querying of the log data.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Experienced in defining job flows.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Experienced in managing and reviewing the Hadoop log files.
  • Used Pig as an ETL tool to perform transformations, event joins, and some pre-aggregations before storing the data in HDFS.
  • Loaded and transformed large sets of structured data.
  • Responsible for managing data coming from different sources.
  • Utilized the Apache Hadoop environment from Cloudera.
  • Created the data model for Hive tables.
  • Involved in Unit testing and delivered Unit test plans and results in documents.
  • Exported data from HDFS environment into RDBMS using Sqoop for report generation and visualization purpose.
  • Worked on Oozie workflow engine for job scheduling.
  • Created external tables with proper partitions for efficiency and loaded the structured data resulting from MR jobs into HDFS.
  • Monitored all MapReduce read jobs running on the cluster using Cloudera Manager and ensured that they were able to read the data from HDFS without any issues.
  • Involved in moving all log files generated from various sources to HDFS for further processing.
  • Involved in collecting metrics for Hadoop clusters using Ganglia.
  • Worked on a Kerberized Hadoop cluster with 250 nodes.
  • Created Hive tables and loaded data from the local file system into HDFS.
  • Involved in loading data from UNIX file system to HDFS.
  • Led root cause analysis (RCA) efforts for high-severity incidents.
  • Worked hands-on with the ETL process: handled importing data from various data sources and performed transformations.
  • Coordinated with on-call support when human intervention was required for problem solving.
  • Ensured that analytics data was available on time for customers, in turn providing them insight and helping them make key business decisions.
  • Aimed to provide a delightful data experience to our customers, the different business groups across the organization.
  • Provided updates in the daily Scrum, self-planned at the start of each sprint, and tracked the planned tasks using JIRA; synced up with the team to pick up priority tasks and update the necessary documentation in the wiki.
  • Held weekly meetings with business partners and actively participated in review sessions with other developers and managers.
  • Installed applications on AWS EC2 instances and configured storage on S3 buckets.
  • Created S3 buckets and bucket policies, and worked on IAM role-based policies, customizing the JSON templates.
  • Implemented and maintained monitoring and alerting of production and corporate servers/storage using AWS CloudWatch.
  • Managed server instances on the Amazon Web Services (AWS) platform using Puppet and Chef configuration management.
  • Developed Pig scripts to transform raw data into intelligent data as specified by business users.
  • Worked in AWS environment for development and deployment of Custom Hadoop Applications.
  • Worked closely with the data modelers to model the new incoming data sets.
  • Involved in the end-to-end process of Hadoop jobs that used various technologies such as Sqoop, Pig, Hive, MapReduce, Spark, and shell scripts (for scheduling of a few jobs).
  • Expertise in designing and deploying Hadoop clusters and different Big Data analytic tools including Pig, Hive, Oozie, ZooKeeper, Sqoop, Flume, Spark, Impala, and Cassandra with the Hortonworks distribution.
  • Involved in creating Hive tables and Pig relations, loading data, and writing Hive queries and Pig scripts.
  • Assisted in upgrading, configuration and maintenance of various Hadoop infrastructures like Pig, Hive, and HBase.
  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN (a Spark sketch follows this list).
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data; configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
  • Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
  • Worked on tuning Hive and Pig scripts to improve performance and solve performance-related issues, with a good understanding of joins, grouping, and aggregation and how they translate into MapReduce jobs.
  • Imported data from different sources like HDFS and HBase into Spark RDDs.
  • Developed a data pipeline using Kafka and Storm to store data in HDFS (a Kafka sketch follows this list).
  • Performed real-time analysis on the incoming data.
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
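
As a concrete flavor of the Spark-on-Hive work above, here is a minimal sketch using the Spark Java API; the database, table, and column names are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public final class HiveAnalyticsJob {
    public static void main(String[] args) {
        // enableHiveSupport() lets Spark SQL read tables from the Hive metastore.
        SparkSession spark = SparkSession.builder()
                .appName("HiveAnalyticsJob")
                .enableHiveSupport()
                .getOrCreate();

        // Hypothetical partitioned Hive table of click events.
        Dataset<Row> hits = spark.sql(
                "SELECT page, COUNT(*) AS hits FROM clickstream.events "
                + "WHERE dt = '2017-01-01' GROUP BY page");

        hits.show(20);
        spark.stop();
    }
}
```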
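
And a minimal sketch of the producing side of a Kafka pipeline like the one described above, using the standard Java client; the broker addresses, topic, and record contents are placeholders.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public final class ClickstreamProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092"); // placeholder brokers
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        // Publish one clickstream record to a hypothetical topic; downstream
        // consumers (e.g. Storm or a Kafka-to-HDFS sink) pick it up from there.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("clickstream", "user-42", "/home"));
        }
    }
}
```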

Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBase, Oozie, Scala, Spark, Linux.

Confidential, New York, New York

Hadoop Developer

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different big data analytic tools including Pig, Hive, and Sqoop.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Used Pig predefined functions to convert fixed-width files to delimited files.
  • Used Python and Django for creating graphics, XML processing, data exchange, and business logic.
  • Installed the NameNode, Secondary NameNode, YARN daemons (ResourceManager, NodeManager, ApplicationMaster), and DataNodes using Cloudera.
  • Installed and configured Cloudera Manager for easy management of the existing Hadoop cluster.
  • Installed and configured a fully distributed, multi-node Hadoop cluster with a large number of nodes.
  • Provided Hadoop, OS, and hardware optimizations.
  • Set up the machines with network control, static IPs, disabled firewalls, and swap memory.
  • Identified performance bottlenecks by analyzing the existing Hadoop cluster and provided performance tuning accordingly.
  • Regularly commissioned and decommissioned nodes depending on the amount of data.
  • Installed and configured the Hadoop components HDFS, Hive, and HBase.
  • Communicated with the development teams and attended daily meetings.
  • Addressed and troubleshot issues on a daily basis.
  • Worked with data delivery teams to set up new Hadoop users, including setting up Linux users, setting up Kerberos principals, and testing HDFS and Hive access.
  • Performed cluster maintenance as well as creation and removal of nodes.
  • Monitored Hadoop cluster connectivity and security.
  • Managed and reviewed Hadoop log files.
  • Fine-tuned the cluster configuration to achieve optimal results.
  • Dumped data from one cluster to another using DistCp and automated the dumping procedure using shell scripts.
  • Designed shell scripts for backing up important metadata and rotating the logs on a monthly basis.
  • Implemented the open-source monitoring tool Ganglia for monitoring the various services across the cluster.
  • Testing, evaluation and troubleshooting of different NoSQL database systems and cluster configurations to ensure high-availability in various crash scenarios.
  • Performance tuning and stress testing of NoSQL database environments to ensure acceptable database performance in production mode.
  • Designed the cluster so that only one Secondary NameNode daemon could run at any given time.
  • Implemented commissioning and decommissioning of DataNodes, killed unresponsive TaskTrackers, and dealt with blacklisted TaskTrackers.
  • Dumped data from HDFS to a MySQL database and vice versa using Sqoop.
  • Provided the necessary support to the ETL team when required.
  • Integrated Nagios into the Hadoop cluster for alerts.
  • Performed both major and minor upgrades to the existing cluster, as well as rollbacks to the previous version.
  • Created Oozie workflows to run multiple MR, Hive, and Pig jobs.
  • Involved in developing a Spark Streaming application for one of the data sources using Scala and Spark, applying the necessary transformations.
  • Imported data from different sources like HDFS and MySQL into Spark RDDs (a sketch follows this list).
  • Experienced with Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
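
A minimal sketch of pulling a MySQL table into Spark over JDBC, as described above; the connection URL, credentials, and table name are placeholders.

```java
import java.util.Properties;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public final class MySqlToSpark {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("MySqlToSpark")
                .getOrCreate();

        Properties props = new Properties();
        props.put("user", "etl_user");                 // placeholder credentials
        props.put("password", "secret");
        props.put("driver", "com.mysql.jdbc.Driver");

        // Read a hypothetical orders table over JDBC; the resulting DataFrame
        // can be dropped down to an RDD for existing RDD-based transformations.
        Dataset<Row> orders = spark.read()
                .jdbc("jdbc:mysql://dbhost:3306/sales", "orders", props);

        System.out.println("row count = " + orders.javaRDD().count());
        spark.stop();
    }
}
```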

Environment: Hadoop, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Linux Red Hat, Spark, Scala, Hortonworks, Cloudera Manager, Apache YARN, Python, Machine Learning, NLP (Natural Language Processing)

Confidential, New York, New York

Selenium Tester

Responsibilities:

  • Hands-on experience with Java development using IntelliJ and connecting to the Hadoop environment.
  • Responsible for the implementation and ongoing administration of the Hadoop infrastructure, including its initial setup.
  • Analyzed technical and functional requirements documents, and designed and developed the QA test plan, test cases, and test scenarios while maintaining the end-to-end process flow.
  • Developed testing scripts for an internal brokerage application used by branch and financial market representatives to recommend and manage customer portfolios, including international and capital markets.
  • Designed and developed smoke and regression automation scripts and a functional testing automation framework for all modules using Selenium WebDriver.
  • Created data-driven scripts for adding multiple customers, checking online accounts, and validating user interfaces and reports.
  • Performed cross-verification of trade entries between the mainframe system, its web application, and the downstream system.
  • Extensively used the Selenium WebDriver API (XPath and CSS locators) to test the web application.
  • Configured Selenium WebDriver, TestNG, Maven, Cucumber, and a BDD framework, and created Selenium automation scripts in Java using TestNG.
  • Performed data-driven testing by developing a Java-based library to read test data from Excel and properties files (a sketch follows this list).
  • Extensively performed DB2 database testing to validate trade entries from the mainframe to the backend system.
  • Developed a data-driven framework with Java, Selenium WebDriver, and Apache POI, used for multiple trade order entries.
  • Developed an internal application using AngularJS and Node.js, connecting to Oracle on the backend.
  • Expertise in debugging issues in the front end of web-based applications developed using HTML5, CSS3, AngularJS, Node.js, and Java.
  • Developed a smoke automation suite as a subset of the regression test suite.
  • Applied various testing techniques in test cases to cover all business scenarios for quality coverage.
  • Interacted with the development team to understand the design flow, review code, and discuss unit test plans.
  • Executed system, integration, and regression tests in the testing environment.
  • Conducted defect triage meetings and defect root cause analysis, tracked defects in HP ALM Quality Center, managed defects by following up on open items, and retested defects with regression testing.
  • Provided QA/UAT sign-off after closely reviewing all test cases in Quality Center, along with receiving policy sign-off for the project.
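
A minimal sketch of the data-driven pattern described above: a TestNG @DataProvider backed by Apache POI feeding a Selenium WebDriver test. The spreadsheet path, sheet layout (text-only cells, no header row), page URL, and locators are all hypothetical.

```java
import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;

public class TradeEntryTest {

    // Reads one trade per spreadsheet row: column 0 = symbol, column 1 = quantity.
    @DataProvider(name = "trades")
    public Object[][] trades() throws Exception {
        List<Object[]> rows = new ArrayList<>();
        try (FileInputStream in = new FileInputStream("testdata/trades.xlsx"); // placeholder path
             XSSFWorkbook wb = new XSSFWorkbook(in)) {
            Sheet sheet = wb.getSheetAt(0);
            for (Row row : sheet) {
                rows.add(new Object[] {
                        row.getCell(0).getStringCellValue(),
                        row.getCell(1).getStringCellValue()
                });
            }
        }
        return rows.toArray(new Object[0][]);
    }

    @Test(dataProvider = "trades")
    public void enterTrade(String symbol, String quantity) {
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("https://example.test/trade-entry");        // placeholder URL
            driver.findElement(By.id("symbol")).sendKeys(symbol);  // placeholder locators
            driver.findElement(By.id("quantity")).sendKeys(quantity);
            driver.findElement(By.xpath("//button[@type='submit']")).click();
        } finally {
            driver.quit();
        }
    }
}
```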

Environment: HP ALM, Selenium WebDriver, JUnit, Cucumber, AngularJS, Node.js, Jenkins, GitHub, Windows, UNIX, Agile, MS SQL, IBM DB2, PuTTY, WinSCP, FTP Server, Notepad++, C#, DbVisualizer.
