
Hadoop Developer Resume

New York, NY

SUMMARY:

  • Over 8 years of professional IT industry experience and over 4 years of working experience in Big Data technologies and systems
  • Hands on experience in using Cloudera and Hortonworks Hadoop ecosystem components like Hadoop, MapReduce, Yarn, Hive, Pig, Sqoop, HBase, Cassandra, Spark, Oozie, Zookeeper, Kafka and Flume
  • Configured Spark Streaming to receive real-time data from Apache Kafka and store the streamed data to HDFS using Scala
  • Experience in importing and exporting data using stream processing platforms like Flume and Kafka.
  • Wrote Hive UDFs as required and executed complex HiveQL queries to extract data from Hive tables
  • Used partitioning and bucketing in Hive and designed both managed and external tables for performance optimization
  • Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala
  • Used Spark Data Frames API over Cloudera platform to perform analytics on Hive data
  • Good understanding and knowledge of NoSQL databases like MongoDB, HBase and Cassandra
  • Experienced in workflow scheduling and locking tools/services like Oozie and Zookeeper
  • Practiced ETL methods in enterprise-wide solutions, data warehousing, reporting and data analysis
  • Experienced in working with AWS, using EMR and EC2 for computing and S3 as a storage mechanism
  • Involved in developing Impala scripts for extraction, transformation, loading of data into data warehouse
  • Good knowledge in using Apache NiFi to automate data movement between different Hadoop systems
  • Used Pig scripts for transformations, event joins, filters and pre-aggregations before storing the data onto HDFS
  • Experience in importing and exporting the data using Sqoop from HDFS to Relational Database systems
  • Good knowledge of UNIX shell scripting for automating deployments and other routine tasks
  • Experience in relational databases like Oracle, MySQL and SQL Server
  • Experienced in using IDEs like Eclipse, NetBeans, IntelliJ IDEA, Spring Tool Suite
  • Experience in JIRA and Rally for bug tracking, and GitHub and SVN for version control, code reviews and unit testing
  • Experienced in working in all phases of SDLC - both agile and waterfall methodologies
  • Good understanding of Scrum methodologies, Test Driven Development and continuous integration
  • Major strengths include familiarity with multiple software systems; the ability to learn new technologies quickly and adapt to new environments; and being a self-motivated, focused team player with excellent interpersonal, technical and communication skills
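The Hive partitioning mentioned above can be sketched in plain Python — a hypothetical illustration of the `col=value` directory layout Hive uses for partitioned tables on HDFS (the table path, columns, and sample record below are made up for illustration):

```python
# Sketch of how Hive lays out partitioned data on HDFS, in plain Python.
# Table location, partition columns, and the record are hypothetical.

def partition_path(table_root, partition_cols, record):
    """Build the HDFS-style directory path Hive would use for a record:
    one `col=value` segment per partition column, in declaration order."""
    segments = [f"{col}={record[col]}" for col in partition_cols]
    return "/".join([table_root] + segments)

path = partition_path(
    "/warehouse/web_logs",            # hypothetical external-table location
    ["dt", "country"],                # partition columns, coarsest first
    {"dt": "2017-03-01", "country": "US", "url": "/home"},
)
print(path)  # /warehouse/web_logs/dt=2017-03-01/country=US
```

Because queries that filter on `dt` or `country` only read the matching directories, this layout is what makes partition pruning (and hence the performance optimization noted above) possible.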

PROFESSIONAL EXPERIENCE:

HADOOP DEVELOPER

Confidential, New York, NY

Responsibilities:

  • Developed data pipelines using StreamSets Data Collector to store data from Kafka into HDFS, Solr, Elasticsearch, HBase and MapR-DB
  • Implemented event streaming across different StreamSets Data Collector stages, running a MapReduce job on event triggers to convert Avro to Parquet
  • Performed real-time streaming and transformations on the data using Kafka and Kafka Streams.
  • Built a NiFi dataflow to consume data from Kafka, transform it, place it in HDFS and expose a port to run a Spark Streaming job.
  • Developed Spark Streaming jobs in Scala to consume data from Kafka topics, transform the data and insert it into HBase.
  • Worked on NoSQL databases including HBase and MongoDB, configured MySQL Database to store Hive metadata.
  • Utilized an Apache Hadoop environment based on the Cloudera Distribution.
  • Deployed and administered Splunk and the Hortonworks Distribution.
  • Performed analysis on unused user-navigation data by loading it into HDFS and writing MapReduce jobs; the analysis provided inputs to the new APM front-end developers and the Lucent team.
  • Configured ZooKeeper, Cassandra and Flume on the existing Hadoop cluster.
  • Automated all the jobs for pulling data from the FTP server and loading it into Hive tables, using Oozie workflows.
  • Developed complex queries using Hive and Impala.
  • Worked on Spark Streaming to consume ongoing data from Kafka and store the streamed data to HDFS.
  • Developed Map-Reduce jobs on Yarn and Hadoop clusters to produce daily and monthly reports.
  • Developed a workflow using Oozie to automate the tasks of loading data into HDFS and analyzing it.
  • Created Phoenix tables, mapped to HBase tables and implemented SQL queries to retrieve data.
  • Streamed events from HBase to Solr using the Lily HBase Indexer.
  • Loaded data from CSV files into Spark, created DataFrames and queried the data using Spark SQL.
  • Created external tables in Hive, loaded JSON-format log files and ran queries using HiveQL.
  • Involved in designing Avro schemas for serialization and converting JSON data to Avro format.
  • Designed HBase row keys and data models for inserting data into HBase tables, using the concepts of lookup tables and staging tables.
  • Created HBase tables using HBase API and HBase Shell commands and loaded data into the Tables.
  • Captured metrics for logs with Kibana, Logstash and Elasticsearch; used Grafana for monitoring.
  • Worked with MapR, Cloudera and Hortonworks platforms as part of a Proof of concept.
  • Used GIT for version control.
  • Used the Agile Scrum methodology for development.
  • Involved in developing Hive DDLs to create, alter and drop tables.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig Scripts.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, Pair RDDs, Storm and Spark on YARN.
  • Used features like parallelize, partitioning, caching (both in-memory and disk serialization) and Kryo serialization; implemented Spark using Scala and Spark SQL for faster testing and processing of data
  • Provided input on the long-term strategic vision and direction for the data-delivery infrastructure, including Microsoft BI stack implementations and Azure advanced data-analytics solutions
  • Evaluated the existing data platform and applied technical expertise to create a data-modernization roadmap and architect solutions meeting business and IT needs.
  • Ensured the technical feasibility of new projects and successful deployments, orchestrating key resources and infusing key data technologies (e.g. Azure Data Lake, Azure Blob Storage, Azure SQL DB, Analysis Services)
  • Utilized Azure Databricks to process Spark jobs and Blob Storage services to process data.
  • Worked on data fabrics to process the data silos of a big data system.
  • Built data fabrics to simplify and integrate data management across cloud and on-premises environments to accelerate digital transformation.
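The HBase row-key design noted above can be illustrated with a minimal sketch in Python, assuming a salted-key scheme; the bucket count, key format, and sample key are hypothetical and not taken from the original design:

```python
import hashlib

SALT_BUCKETS = 8  # hypothetical number of salt buckets (roughly one per region)

def salted_row_key(natural_key: str) -> str:
    """Prefix the natural key with a deterministic, hash-derived salt so
    sequential keys spread across regions instead of hot-spotting one
    region server. Scans must then fan out over all salt prefixes."""
    digest = hashlib.md5(natural_key.encode("utf-8")).hexdigest()
    salt = int(digest, 16) % SALT_BUCKETS
    return f"{salt}|{natural_key}"

key = salted_row_key("user123|2017-03-01")  # hypothetical user+date key
print(key)
```

Staging and lookup tables fit naturally with such keys: raw rows land in a staging table, the lookup table resolves natural identifiers, and the final put uses the salted key.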

Environment: Hadoop, HDFS, Kafka, MapReduce, NiFi, Elasticsearch, Spark, Impala, Hive, Avro, Parquet, Grafana, Scala, Java, HBase, Cassandra, Hortonworks, ZooKeeper, MS Azure Data Lake, Azure Blob Storage.

HADOOP DEVELOPER

Confidential, Detroit, MI

Responsibilities:

  • Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, Hive and Sqoop.
  • Designed and deployed infrastructure in Azure.
  • Handled front-end communication with clients for onboarding their new projects/deliverables.
  • Contributed to preparing High-Level Diagrams (HLD) and Low-Level Diagrams (LLD) for onboarding clients.
  • Clear Understanding of Microsoft Azure Subscription Components like EA Administrator, Account Owner, Service Admin and Co-Administrator.
  • Configured Azure autoscaling using the Azure CPU scale metric; configured and managed Network Security Groups.
  • Created and managed custom roles using Role-Based Access Control (RBAC).
  • Installed and configured Windows Server Failover Clustering using SIOS DataKeeper for disk sharing.
  • Worked on JavaScript frameworks (AngularJS, Backbone, Bootstrap) to augment browser-based applications with MVC capability
  • Migration of Existing Infrastructure on Azure Service Manager (ASM) to Azure Resource Manager (ARM).
  • Established connection from Azure to On-premise datacenter using Azure Express Route for Single and Multi-subscription connectivity.
  • Converted VMware disks (VMDK) to Azure disks (VHD) using Microsoft Virtual Machine Converter (MVMC).
  • Responsible for managing data coming from different sources.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data
  • Managed and reviewed Hadoop log files.
  • Managed jobs using the Fair Scheduler.
  • Migrating Services from On-premise to Azure Cloud Environments.
  • Collaborate with development and QA teams to maintain high-quality deployment
  • Designed Client/Server telemetry adopting latest monitoring techniques.
  • Worked on a Continuous Integration (CI)/Continuous Delivery (CD) pipeline for Azure Cloud Services using Chef.
  • Configured Azure Traffic Manager to build routing for user traffic
  • Infrastructure migrations: drove operational efforts to migrate all legacy services to a fully virtualized infrastructure.
  • Implemented HA deployment models with Azure Classic and Azure Resource Manager.
  • Configured Azure Active Directory and managed users and groups
  • Tuned Hive and Pig to improve performance and resolve performance-related issues in Hive and Pig scripts, with a good understanding of joins, grouping and aggregation and how they translate into MapReduce jobs
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
  • Managed datasets using pandas DataFrames and MySQL; queried the MySQL database from Python using the MySQLdb connector package to retrieve information.
  • Developed various algorithms for generating data patterns; used JIRA for bug and issue tracking.
  • Imported data from different sources like HDFS/MySQL into Spark RDDs using Scala and applied transformations.
  • Experienced with Spark Context, Spark SQL, DataFrames, Pair RDDs and Spark on YARN.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
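The Hive-to-Spark conversion described above can be sketched in plain Python, using an ordinary list as a stand-in for an RDD; the table, columns, and query below are hypothetical, chosen only to show how a `WHERE`/`GROUP BY` query maps onto `filter`/`map`/`reduceByKey`-style transformations:

```python
from collections import defaultdict

# Stand-in for an RDD of rows; table and column names are hypothetical.
employees = [
    {"dept": "eng", "salary": 90000},
    {"dept": "eng", "salary": 45000},
    {"dept": "sales", "salary": 60000},
]

# HiveQL: SELECT dept, COUNT(*) FROM employees WHERE salary > 50000 GROUP BY dept
filtered = [r for r in employees if r["salary"] > 50000]   # rdd.filter(...)
pairs = [(r["dept"], 1) for r in filtered]                 # rdd.map(...)
dept_counts = defaultdict(int)
for dept, n in pairs:                                      # rdd.reduceByKey(add)
    dept_counts[dept] += n

print(dict(dept_counts))  # {'eng': 1, 'sales': 1}
```

In real Spark the same three steps run distributed and lazily; the shape of the translation is what this sketch illustrates.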

Environment: Hadoop, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Red Hat Linux, Spark, Scala, Microsoft Azure, Azure Resource Manager, YARN

HADOOP ADMIN

Confidential, New York, NY

Responsibilities:

  • Launched Amazon EC2 cloud instances using Amazon Web Services (Linux/Ubuntu/RHEL) and configured the launched instances for specific applications.
  • Installed applications on AWS EC2 instances and configured storage on S3 buckets.
  • Created S3 buckets and policies, applied IAM role-based policies and customized the JSON templates.
  • Implemented and maintained monitoring and alerting of production and corporate servers/storage using AWS CloudWatch.
  • Managed servers on the AWS platform using Puppet and Chef configuration management.
  • Developed PIG scripts to transform the raw data into intelligent data as specified by business users.
  • Worked in AWS environment for development and deployment of Custom Hadoop Applications.
  • Worked closely with the data modelers to model the new incoming data sets.
  • Involved in the start-to-end process of Hadoop jobs that used various technologies such as Sqoop, Pig, Hive, MapReduce, Spark and shell scripts (for scheduling of a few jobs).
  • Designed and deployed Hadoop clusters and different Big Data analytic tools, including Pig, Hive, Oozie, ZooKeeper, Sqoop, Flume, Spark, Impala and Cassandra, with the Hortonworks Distribution.
  • Involved in creating Hive tables and Pig tables, loading data, and writing Hive queries and Pig scripts
  • Assisted in upgrading, configuration and maintenance of Hadoop infrastructures like Pig, Hive, and HBase.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, Pair RDDs and Spark on YARN.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data; configured, deployed and maintained multi-node Dev and Test Kafka clusters.
  • Transformed, cleaned and filtered imported data using Hive and MapReduce, and loaded the data into HDFS
  • Tuned Hive and Pig to improve performance and resolve performance-related issues in Hive and Pig scripts, with a good understanding of joins, grouping and aggregation and how they translate into MapReduce jobs
  • Imported data from different sources like HDFS/HBase into Spark RDDs, and developed a data pipeline using Kafka and Storm to store data in HDFS and perform real-time analysis on the incoming data
  • Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
  • Implemented Spark jobs using Scala and Spark SQL for faster testing and processing of data.
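The Spark Streaming batching mentioned above can be sketched as follows — a simplified stand-in for the DStream model, using a hypothetical count-based trigger instead of Spark's time-based batch interval:

```python
def micro_batches(events, batch_size):
    """Yield fixed-size batches from an event stream: the core idea behind
    Spark Streaming, where a live stream is sliced into small batches and
    each batch is handed to the regular Spark engine for batch processing."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:          # flush the final partial batch
        yield batch

batches = list(micro_batches(range(7), 3))
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

In Spark the trigger is a wall-clock interval rather than a count, but the consequence is the same: each batch is an ordinary dataset, so batch-mode code can be reused on streaming input.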

Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBASE, Oozie, Scala, Spark, Linux.

SDET

Confidential, New York, NY

Responsibilities:

  • Implemented and supported administration of the Hadoop infrastructure, including its initial setup
  • Analyzed technical and functional requirements documents and designed and developed the QA test plan, test cases and test scenarios, maintaining the end-to-end (E2E) process flow.
  • Developed testing scripts for an internal brokerage application used by branch and financial-market representatives to recommend and manage customer portfolios, including international and capital markets.
  • Designed and developed smoke and regression automation scripts and a functional-testing automation framework for all modules using Selenium WebDriver.
  • Created Data Driven scripts for adding multiple customers, checking online accounts, user interfaces validations, and reports validations.
  • Performed cross verification of trade entry between mainframe system, its web application and downstream system
  • Extensively used Selenium WebDriver API (XPath and CSS locators) to test the web application.
  • Configured Selenium WebDriver, TestNG, Maven, Cucumber and a BDD framework, and created Selenium automation scripts in Java using TestNG.
  • Performed data-driven testing by developing a Java-based library to read test data from Excel and properties files.
  • Extensively performed DB2 database testing to validate trade entries from the mainframe to the backend system.
  • Developed a data-driven framework with Java, Selenium WebDriver and Apache POI, used for multiple trade-order entry.
  • Developed an internal application using AngularJS and Node.js connecting to Oracle on the backend.
  • Debugged issues in the front end of the web-based application, developed using HTML5, CSS3, AngularJS, Node.js and Java.
  • Developed a smoke automation test suite for the regression test suite.
  • Applied various testing techniques in test cases to cover all business scenarios for quality coverage.
  • Interacted with the development team to understand the design flow, review code and discuss unit test plans.
  • Executed system and integration regression tests in the testing environment.
  • Conducted defect triage meetings and defect root-cause analysis, tracked defects in HP ALM Quality Center, managed defects by following up on open items, and retested defects with regression testing.
  • Provided QA/UAT sign-off after closely reviewing all the test cases in Quality Center, along with receiving the policy sign-off for the project.
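The data-driven pattern above can be sketched in Python: rows that would normally be read from Excel or properties files are inlined, and the function under test is a hypothetical stand-in, not the brokerage application itself:

```python
# Hypothetical test rows; in the real framework these came from Excel/properties files.
test_data = [
    {"principal": 1000.0, "rate": 0.05, "expected_interest": 50.0},
    {"principal": 2500.0, "rate": 0.04, "expected_interest": 100.0},
]

def simple_interest(principal: float, rate: float) -> float:
    """Stand-in for the system under test."""
    return principal * rate

# One generic loop drives every row, so adding a case means adding data, not code.
failures = [
    row for row in test_data
    if simple_interest(row["principal"], row["rate"]) != row["expected_interest"]
]
assert not failures, f"data-driven cases failed: {failures}"
```

The same separation of test logic from test data is what the Selenium/Apache POI framework above provided, with WebDriver actions in place of the arithmetic.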

Environment: HP ALM, Selenium WebDriver, JUnit, Cucumber, AngularJS, Node.js, Jenkins, GitHub, Windows, UNIX, Agile, MS SQL, IBM DB2, PuTTY, WinSCP, FTP Server, Notepad++, C#, DbVisualizer

SOFTWARE TEST ENGINEER

Confidential, Jersey City, NJ

Responsibilities:

  • Involved in preparing Test Plan and Test cases.
  • Used Java with the TestNG framework for automating scripts.
  • Developed test scripts to automate the testing process in Selenium WebDriver.
  • Implemented data-driven frameworks to create parameterized test scripts and generate XSLT reports using Selenium WebDriver and the TestNG framework.
  • Involved in writing Gherkin scenarios and generating step definitions and methods using Cucumber for different functionalities.
  • Created automated Test scripts using automated tools and ran the test scripts on various Builds and instances.
  • Implemented robust logging mechanisms within new and existing automated tests using log4j.
  • Used a Git repository for version control.
  • Performed Cross browser testing using Selenium Grid and Java on various browsers.
  • Responsible for attending daily scrums to discuss daily issues and testing activities.
  • Executed automated selenium scripts and reproduced failures manually.
  • Developed and executed test cases for various web services using SOAPUI
  • Prepared a traceability matrix to show test coverage of requirements vs. test scripts.
  • Participated in walkthroughs and peer reviews with team members and other project teams.
  • Involved in web-service testing with SoapUI and validated various responses against annotations.
  • Performed database testing by executing SQL queries to retrieve data.
  • Performed usability, GUI, Functionality and regression testing of the new builds.
  • Performed browser (IE, Firefox, and Chrome) and platform (Windows 7) compatibility testing.
  • Identified and isolated software defects and reported them via JIRA.
  • Attended daily scrums, reporting daily activities and issues to the Scrum Master.
  • Performed functional, compatibility testing on different browsers like Firefox, Chrome and IE.
  • Used GIT as a version control tool to store the test scripts.
  • Responsible for tracking daily testing activities and provide daily testing updates to higher management.
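The traceability-matrix idea above can be sketched as a simple coverage check in Python; the requirement IDs and script names are hypothetical:

```python
requirements = {"REQ-1", "REQ-2", "REQ-3"}   # hypothetical requirement IDs

# Traceability matrix: each test script -> the requirements it exercises.
matrix = {
    "test_login.py": {"REQ-1"},
    "test_checkout.py": {"REQ-2", "REQ-1"},
}

covered = set().union(*matrix.values())
uncovered = sorted(requirements - covered)
print(uncovered)  # ['REQ-3']
```

A gap in `uncovered` is exactly what the matrix is for: it surfaces requirements with no test script before sign-off, rather than after a defect escapes.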

Environment: Java, Cucumber, Selenium WebDriver, Data-Driven Framework, TestNG, Eclipse, JIRA, SoapUI v4.5, Oracle v9i/8i, XML, SQL, Windows 7, MS Project, HTML, Firebug, FirePath, Git.
