Hadoop Developer Resume
New York, New York
PROFESSIONAL SUMMARY:
- 9+ years of professional IT industry experience encompassing a wide range of skills in Big Data technologies.
- 5+ years of experience working with Big Data technologies on systems comprising massive amounts of data running in highly distributed Hadoop environments.
- Hands-on experience with Hadoop ecosystem components such as HDFS, Hive, Pig, Sqoop, HBase, Cassandra, Spark, Spark Streaming, Spark SQL, Oozie, ZooKeeper, Kafka, Flume, MapReduce and YARN.
- Strong knowledge of Spark architecture and components; efficient in working with Spark Core, Spark SQL and Spark Streaming.
- Implemented Spark Streaming jobs by developing RDDs (Resilient Distributed Datasets) and worked with Spark and spark-shell accordingly.
- Experience configuring Spark Streaming to receive real-time data from Apache Kafka and store the streamed data to HDFS using Scala.
- Experience in importing and exporting data using stream processing platforms like Flume and Kafka.
- Authored complex HiveQL queries to extract required data from Hive tables and wrote Hive UDFs as required.
- Solid experience with partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
- Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.
- Used the Spark DataFrames API on the Cloudera platform to perform analytics on Hive data.
- Used Spark DataFrame operations to perform required validations on the data.
- Experience integrating Hive queries into the Spark environment using Spark SQL (see the sketch at the end of this summary).
- Good understanding and knowledge of NoSQL databases like MongoDB, HBase and Cassandra.
- Worked on HBase to load and retrieve data for real-time processing using its REST API.
- Excellent understanding and knowledge of job workflow scheduling and locking tools/services like Oozie and Zookeeper.
- Experienced in designing time-driven and data-driven automated workflows using Oozie.
- Knowledge of ETL methods for data extraction, transformation and loading in corporate-wide ETL solutions, and of data warehouse tools for reporting and data analysis.
- Developed ETL workflows in Python to process data in HDFS and HBase, orchestrated with Oozie.
- Experience configuring ZooKeeper to coordinate servers in clusters and maintain data consistency.
- Experienced in working with Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
- Capable of using AWS utilities such as EMR, S3 and CloudWatch to run and monitor Hadoop and Spark jobs on AWS.
- Involved in developing Impala scripts for extraction, transformation and loading of data into the data warehouse.
- Good knowledge of using Apache NiFi to automate data movement between different Hadoop systems.
- Experienced in using Pig scripts to perform transformations, event joins, filters and pre-aggregations before storing the data in HDFS.
- Experience importing and exporting data between HDFS and relational database systems using Sqoop.
- Good Knowledge in UNIX Shell Scripting for automating deployments and other routine tasks.
- Experience in relational databases like Oracle, MySQL and SQL Server.
- Experienced in using integrated development environments like Eclipse, NetBeans, IntelliJ and Spring Tool Suite.
- Used project management services like JIRA for tracking issues and code-related bugs, GitHub for code reviews, and version control tools such as Git and SVN.
- Experienced in working with SDLC, Agile and Waterfall methodologies.
- Excellent communication, interpersonal and problem-solving skills; a team player with the ability to quickly adapt to new environments and technologies.
- Hands-on experience installing, configuring and using Hadoop ecosystem components: HDFS, MapReduce, Pig, Hive, Oozie, Flume, HBase, Spark and Sqoop.
- Strong understanding of various Hadoop services, MapReduce and YARN architecture.
- Responsible for writing Map Reduce programs.
- Good understanding of Scrum methodologies, Test Driven Development and continuous integration.
- Major strengths include familiarity with multiple software systems and the ability to quickly learn new technologies and adapt to new environments; a self-motivated, focused and adaptive team player with excellent interpersonal, technical and communication skills.
- Experience in defining detailed application software test plans, including organization, participants, schedule, and test and application coverage scope.
- Experience in gathering and defining functional and user interface requirements for software applications.
- Experience in real-time analytics with Apache Spark (RDDs, DataFrames and the Streaming API).
- Experience integrating Hadoop with Kafka; expertise in loading clickstream data from Kafka to HDFS.
- Expert in utilizing Kafka as a messaging and publish-subscribe system.
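The following is a minimal, illustrative Scala sketch of the Hive and Spark SQL work summarized above: creating a partitioned, bucketed external Hive table and running an aggregation over it through a Hive-enabled SparkSession. Table, column and HDFS path names are placeholders, not taken from any specific project.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveOnSparkSketch {
  def main(args: Array[String]): Unit = {
    // Hive-enabled session so Spark SQL reads and writes the Hive metastore.
    val spark = SparkSession.builder()
      .appName("hive-on-spark-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // External table: data stays under the supplied HDFS location when the
    // table is dropped. Partitioned by event date, bucketed by customer id.
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS sales_events (
        |  customer_id BIGINT,
        |  product     STRING,
        |  amount      DOUBLE)
        |PARTITIONED BY (event_date STRING)
        |CLUSTERED BY (customer_id) INTO 16 BUCKETS
        |STORED AS PARQUET
        |LOCATION 'hdfs:///data/warehouse/sales_events'""".stripMargin)

    // Hive-style aggregation expressed through the DataFrame API.
    val dailyTotals = spark.table("sales_events")
      .where(col("event_date") >= "2018-01-01")
      .groupBy("event_date", "product")
      .agg(sum("amount").alias("total_amount"))

    dailyTotals.show(20)
    spark.stop()
  }
}
```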
PROFESSIONAL EXPERIENCE:
Confidential, New York, New York
Hadoop Developer
Responsibilities:
- Developed data pipelines using StreamSets Data Collector to store data from Kafka into HDFS, Solr, Elasticsearch, HBase and MapR-DB.
- Implemented event streaming across different stages in StreamSets Data Collector, running a MapReduce job on event triggers to convert Avro to Parquet.
- Performed real-time streaming and transformations on the data using Kafka and Kafka Streams.
- Built a NiFi dataflow to consume data from Kafka, transform the data, place it in HDFS and expose a port to run a Spark Streaming job.
- Developed Spark Streaming jobs in Scala to consume data from Kafka topics, transform the data and insert it into HBase (see the sketch at the end of this list).
- Worked on NoSQL databases including HBase and MongoDB. Configured MySQL Database to store Hive metadata.
- Utilized an Apache Hadoop environment on the Cloudera distribution.
- Deployed and administered Splunk and the Hortonworks distribution.
- Performed analysis on unused user navigation data by loading it into HDFS and writing MapReduce jobs; the analysis provided inputs to the new APM front-end developers and the Lucent team.
- Configured ZooKeeper, Cassandra and Flume on the existing Hadoop cluster.
- Automated all jobs for pulling data from the FTP server and loading it into Hive tables using Oozie workflows.
- Developed complex queries using Hive and Impala.
- Worked with Spark Streaming to receive ongoing data from Kafka and store the streamed data in HDFS.
- Developed MapReduce jobs on YARN and Hadoop clusters to produce daily and monthly reports.
- Developed an Oozie workflow to automate the tasks of loading data into HDFS and analyzing it.
- Created Phoenix tables, mapped to HBase tables and implemented SQL queries to retrieve data.
- Streamed events from HBase to Solr using the Lily HBase Indexer.
- Loaded data from CSV files into Spark, created DataFrames and queried the data using Spark SQL.
- Created external tables in Hive, loaded JSON-format log files and ran queries using HiveQL.
- Involved in designing Avro schemas for serialization and converting JSON data to Avro format.
- Designed HBase row keys and data models for inserting data into HBase tables using lookup-table and staging-table concepts.
- Created HBase tables using HBase API and HBase Shell commands and loaded data into the Tables.
- Captured metrics for logs with Kibana, Logstash and Elasticsearch, and used Grafana for monitoring.
- Worked with MapR, Cloudera and Hortonworks platforms as part of a Proof of concept.
- Used GIT for version control.
- Used the Agile Scrum methodology (Scrum Alliance) for development.
- Involved in developing Hive DDLs to create, alter and drop tables.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig Scripts.
- Also explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, Storm and Spark on YARN.
- Used features such as parallelize, partitioning, caching (both in-memory and on-disk serialization) and Kryo serialization; implemented Spark jobs in Scala for faster testing and processing of data.
- Provided input on the long-term strategic vision and direction for the data delivery infrastructure, including Microsoft BI stack implementations and Azure advanced data analytics solutions.
- Evaluated the existing data platform and applied technical expertise to create a data modernization roadmap and architect solutions to meet business and IT needs.
- Ensured the technical feasibility of new projects and successful deployments, orchestrating key resources and infusing key data technologies (e.g. Azure Data Lake, Azure Blob Storage, Azure SQL DB, Analysis Services).
- Utilized Azure Databricks to run Spark jobs and Blob Storage services to process data.
- Worked on data fabrics to process data silos in a big data system.
- Built data fabrics to simplify and integrate data management across cloud and on-premises environments to accelerate digital transformation.
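A minimal Scala sketch of the kind of Spark Streaming job described in this list (Kafka topics consumed, records transformed, results written to HBase). It assumes the spark-streaming-kafka-0-10 integration and the standard HBase client; broker, topic, table and column-family names are placeholders, and the transformation is deliberately trivial.

```scala
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.{Seconds, StreamingContext}

object KafkaToHBaseSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("kafka-to-hbase"), Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "events-consumer",
      "auto.offset.reset" -> "latest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Simple transformation: keep the message value, drop empty records.
    val rows = stream.map(record => record.value()).filter(_.nonEmpty)

    rows.foreachRDD { rdd =>
      rdd.foreachPartition { partition =>
        // One HBase connection per partition, not per record.
        val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = conn.getTable(TableName.valueOf("events"))
        partition.foreach { value =>
          val put = new Put(Bytes.toBytes(value.hashCode.toString))
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes(value))
          table.put(put)
        }
        table.close()
        conn.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```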
Environment: HDFS, Hadoop, Kafka, MapReduce, NiFi, Elasticsearch, Spark, Impala, Hive, Avro, Parquet, Grafana, Scala, Java, HBase, Cassandra, Hortonworks, ZooKeeper, Microsoft Azure, Azure Data Lake, Azure Blob Storage.
Confidential, San Jose, California
Hadoop Developer
Responsibilities:
- Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, Hive and Sqoop.
- Designed and deployed infrastructure in Azure.
- Handled front-end communication with clients for onboarding their new projects/deliverables.
- Contributed to preparing High-Level Diagrams (HLD) and Low-Level Diagrams (LLD) for onboarding clients.
- Clear understanding of Microsoft Azure subscription roles such as EA Administrator, Account Owner, Service Admin and Co-Administrator.
- Configured Azure autoscaling using the CPU scale metric; configured and managed Network Security Groups (NSGs).
- Created and managed custom roles using Role-Based Access Control (RBAC).
- Installed and configured Windows Server Failover Clustering using SIOS DataKeeper for disk sharing.
- Worked on JavaScript frameworks (AngularJS, Backbone, Bootstrap) to augment browser-based applications with MVC capability.
- Migrated existing infrastructure from Azure Service Manager (ASM) to Azure Resource Manager (ARM).
- Established connectivity from Azure to the on-premises datacenter using Azure ExpressRoute for single- and multi-subscription connectivity.
- Converted VMware images (VMDK) to Azure format (VHD) using Microsoft Virtual Machine Converter (MVMC).
- Responsible for managing data coming from different sources.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Experience in managing and reviewing Hadoop log files.
- Managed jobs using the Fair Scheduler.
- Migrated services from on-premises to Azure cloud environments.
- Collaborated with development and QA teams to maintain high-quality deployments.
- Designed Client/Server telemetry adopting latest monitoring techniques.
- Worked on Continuous Integration (CI)/Continuous Delivery (CD) pipelines for Azure Cloud Services using Chef.
- Configured Azure Traffic Manager to build routing for user traffic.
- Infrastructure migrations: drove operational efforts to migrate all legacy services to a fully virtualized infrastructure.
- Implemented HA deployment models with Azure Classic and Azure Resource Manager.
- Configured Azure Active Directory and managed users and groups.
- Worked on tuning Hive and Pig to improve performance and solve performance-related issues in Hive and Pig scripts, with a good understanding of joins, grouping and aggregation and how they translate into MapReduce jobs.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Responsible for cluster maintenance: adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
- Managed datasets using pandas DataFrames and MySQL; queried the MySQL database from Python using the Python MySQL connector (MySQLdb) package to retrieve information.
- Developed various algorithms for generating several data patterns; used JIRA for bug and issue tracking.
- Imported data from different sources like HDFS and MySQL into Spark RDDs, applying transformations using Scala and Spark.
- Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the sketch at the end of this list).
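A minimal Scala sketch of converting a Hive/SQL-style query into Spark DataFrame transformations over data pulled from MySQL (via JDBC) and from HDFS, as referenced above. Hostnames, credentials, table and column names are placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SqlToSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("sql-to-spark-sketch").getOrCreate()

    // Dimension data pulled from MySQL over JDBC.
    val customers = spark.read.format("jdbc")
      .option("url", "jdbc:mysql://db-host:3306/sales")
      .option("dbtable", "customers")
      .option("user", "report")
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .load()

    // Fact data already landed on HDFS as Parquet.
    val orders = spark.read.parquet("hdfs:///data/orders")

    // Equivalent of:
    //   SELECT c.region, SUM(o.amount) AS total
    //   FROM orders o JOIN customers c ON o.customer_id = c.id
    //   GROUP BY c.region
    val totalsByRegion = orders
      .join(customers, orders("customer_id") === customers("id"))
      .groupBy(customers("region"))
      .agg(sum(orders("amount")).alias("total"))

    totalsByRegion.show()
    spark.stop()
  }
}
```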
Environment: Hadoop, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Red Hat Linux, Spark, Scala, Microsoft Azure, Azure Resource Manager, YARN
Confidential, New Brunswick, New Jersey
Hadoop Developer
Responsibilities:
- Launched Amazon EC2 cloud instances using Amazon Web Services (Linux/Ubuntu/RHEL) and configured the launched instances with respect to specific applications.
- Installed applications on AWS EC2 instances and configured storage on S3 buckets.
- Performed S3 bucket creation and policy configuration, worked on IAM role-based policies and customized the JSON templates.
- Implemented and maintained monitoring and alerting of production and corporate servers/storage using AWS CloudWatch.
- Managed servers on the Amazon Web Services (AWS) platform using Puppet and Chef configuration management.
- Developed PIG scripts to transform the raw data into intelligent data as specified by business users.
- Worked in AWS environment for development and deployment of Custom Hadoop Applications.
- Worked closely with the data modelers to model the new incoming data sets.
- Involved in the end-to-end process of Hadoop jobs that used various technologies such as Sqoop, Pig, Hive, MapReduce, Spark and shell scripts (for scheduling a few jobs).
- Expertise in designing and deploying Hadoop clusters and different big data analytic tools including Pig, Hive, Oozie, ZooKeeper, Sqoop, Flume, Spark, Impala and Cassandra with the Hortonworks distribution.
- Involved in creating Hive and Pig tables, loading data, and writing Hive queries and Pig scripts.
- Assisted in upgrading, configuring and maintaining various Hadoop infrastructure components like Pig, Hive and HBase.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data; configured, deployed and maintained multi-node Dev and Test Kafka clusters.
- Performed transformations, cleaning and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Worked on tuning Hive and Pig to improve performance and solve performance-related issues in Hive and Pig scripts, with a good understanding of joins, grouping and aggregation and how they translate into MapReduce jobs.
- Imported data from different sources like HDFS and HBase into Spark RDDs (see the sketch at the end of this list).
- Developed a data pipeline using Kafka and Storm to store data into HDFS.
- Performed real time analysis on the incoming data.
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
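A minimal Scala sketch of importing HBase data into a Spark RDD with TableInputFormat, as mentioned above; the table name, column family and qualifier are placeholders.

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

object HBaseToRddSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hbase-to-rdd"))

    // Tell TableInputFormat which HBase table to scan.
    val hbaseConf = HBaseConfiguration.create()
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "events")

    // Each record is a (row key, Result) pair.
    val hbaseRdd = sc.newAPIHadoopRDD(
      hbaseConf,
      classOf[TableInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result])

    // Pull one column out of each row: family "d", qualifier "payload".
    val payloads = hbaseRdd.map { case (_, result) =>
      Bytes.toString(result.getValue(Bytes.toBytes("d"), Bytes.toBytes("payload")))
    }

    println(s"rows read: ${payloads.count()}")
    sc.stop()
  }
}
```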
Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBase, Oozie, Scala, Spark, Linux.
Confidential, New York, New York
SDET
Responsibilities:
- Responsible for the implementation and ongoing administration of Hadoop infrastructure and for setting up that infrastructure.
- Analyzed technical and functional requirements documents, and designed and developed the QA test plan, test cases and test scenarios maintaining the end-to-end process flow.
- Developed testing scripts for an internal brokerage application used by branch and financial market representatives to recommend and manage customer portfolios, including international and capital markets.
- Designed and developed smoke and regression automation scripts and a functional test automation framework for all modules using Selenium WebDriver.
- Created data-driven scripts for adding multiple customers, checking online accounts, and validating user interfaces and reports.
- Performed cross verification of trade entry between mainframe system, its web application and downstream system.
- Extensively used Selenium WebDriver API (XPath and CSS locators) to test the web application.
- Configured Selenium WebDriver, TestNG, Maven, Cucumber and a BDD framework, and created Selenium automation scripts in Java using TestNG.
- Performed data-driven testing by developing a Java-based library to read test data from Excel and properties files (see the sketch at the end of this list).
- Extensively performed DB2 database testing to validate trade entries from the mainframe to the backend system. Developed a data-driven framework with Java, Selenium WebDriver and Apache POI that was used for multiple trade order entries.
- Developed an internal application using AngularJS and Node.js connecting to Oracle on the backend.
- Expertise in debugging issues in the front end of web-based applications developed using HTML5, CSS3, AngularJS, Node.js and Java.
- Developed an automated smoke test suite for regression testing.
- Applied various testing techniques in test cases to cover all business scenarios for quality coverage.
- Interacted with the development team to understand the design flow, review code and discuss unit test plans.
- Executed system, integration and regression tests in the testing environment.
- Conducted defect triage meetings and defect root-cause analysis, tracked defects in HP ALM Quality Center, managed defects by following up on open items, and retested defects with regression testing.
- Provided QA/UAT sign-off after closely reviewing all test cases in Quality Center, along with receiving policy sign-off for the project.
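A minimal sketch of the data-driven testing pattern described above, written in Scala against the Java TestNG, Selenium WebDriver and Apache POI APIs: a @DataProvider reads rows from an Excel workbook and feeds them to a WebDriver test. The file path, sheet layout, URL and locators are placeholders.

```scala
import java.io.FileInputStream

import org.apache.poi.xssf.usermodel.XSSFWorkbook
import org.openqa.selenium.chrome.ChromeDriver
import org.openqa.selenium.{By, WebDriver}
import org.testng.Assert
import org.testng.annotations.{AfterClass, BeforeClass, DataProvider, Test}

class LoginDataDrivenTest {
  private var driver: WebDriver = _

  @BeforeClass
  def setUp(): Unit = {
    driver = new ChromeDriver()
  }

  // Reads every data row (skipping the header) from the first sheet:
  // column 0 = username, column 1 = password.
  @DataProvider(name = "loginData")
  def loginData(): Array[Array[AnyRef]] = {
    val workbook = new XSSFWorkbook(new FileInputStream("testdata/logins.xlsx"))
    val sheet = workbook.getSheetAt(0)
    val rows = (1 to sheet.getLastRowNum).map { i =>
      val row = sheet.getRow(i)
      Array[AnyRef](row.getCell(0).getStringCellValue, row.getCell(1).getStringCellValue)
    }.toArray
    workbook.close()
    rows
  }

  @Test(dataProvider = "loginData")
  def loginShowsDashboard(username: String, password: String): Unit = {
    driver.get("https://example.test/login")
    driver.findElement(By.id("username")).sendKeys(username)
    driver.findElement(By.id("password")).sendKeys(password)
    driver.findElement(By.cssSelector("button[type='submit']")).click()
    Assert.assertTrue(driver.getTitle.contains("Dashboard"))
  }

  @AfterClass
  def tearDown(): Unit = {
    if (driver != null) driver.quit()
  }
}
```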
Environment: HP ALM, Selenium WebDriver, JUnit, Cucumber, AngularJS, Node.js, Jenkins, GitHub, Windows, UNIX, Agile, MS SQL, IBM DB2, PuTTY, WinSCP, FTP Server, Notepad++, C#, DbVisualizer.
Confidential, Jersey City, New Jersey.
Software Test Engineer
Responsibilities:
- Involved in preparing Test Plan and Test cases.
- Used Java with the TestNG framework for automating scripts.
- Developed test scripts to automate the testing process in Selenium WebDriver.
- Implemented data-driven frameworks to create parameterized test scripts and generate XSLT reports using Selenium WebDriver and the TestNG framework.
- Involved in writing Gherkin scenarios and generating step definitions and methods using Cucumber for different functionalities.
- Created automated test scripts using automation tools and ran them on various builds and instances.
- Implemented robust logging mechanisms within new and existing automated tests using log4j.
- Performed cross-browser testing on various browsers using Selenium Grid and Java (see the sketch at the end of this list).
- Responsible for attending daily scrums and discussing daily issues and testing activities.
- Executed automated Selenium scripts and reproduced failures manually.
- Developed and executed test cases for various web services using SoapUI.
- Prepared a traceability matrix to show test coverage of requirements vs. test scripts.
- Participated in walkthroughs and peer reviews with team members and other project teams.
- Involved in web service testing with SoapUI and validated various responses against assertions.
- Performed database testing by executing SQL queries to retrieve data.
- Performed usability, GUI, functionality and regression testing of the new builds.
- Performed browser (IE, Firefox, and Chrome) and platform (Windows 7) compatibility testing.
- Identified and isolated software defects and reported them via JIRA.
- Attended daily scrums and reported daily activities or issues to the Scrum Master.
- Performed functional, compatibility testing on different browsers like Firefox, Chrome and IE.
- Used GIT as a version control tool to store the test scripts.
- Responsible for tracking daily testing activities and providing daily testing updates to upper management.
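A minimal Scala sketch of cross-browser execution against a Selenium Grid, as mentioned above: a TestNG parameter selects the browser and a RemoteWebDriver session is opened on the hub. The hub URL and page under test are placeholders.

```scala
import java.net.URL

import org.openqa.selenium.WebDriver
import org.openqa.selenium.chrome.ChromeOptions
import org.openqa.selenium.firefox.FirefoxOptions
import org.openqa.selenium.remote.RemoteWebDriver
import org.testng.Assert
import org.testng.annotations.{AfterMethod, BeforeMethod, Parameters, Test}

class CrossBrowserTest {
  private var driver: WebDriver = _

  // The "browser" parameter comes from the TestNG suite XML,
  // so the same test runs once per configured browser on the grid.
  @BeforeMethod
  @Parameters(Array("browser"))
  def setUp(browser: String): Unit = {
    val hub = new URL("http://grid-hub:4444/wd/hub")
    driver = browser.toLowerCase match {
      case "firefox" => new RemoteWebDriver(hub, new FirefoxOptions())
      case _         => new RemoteWebDriver(hub, new ChromeOptions())
    }
  }

  @Test
  def homePageLoads(): Unit = {
    driver.get("https://example.test/")
    Assert.assertTrue(driver.getTitle.nonEmpty)
  }

  @AfterMethod
  def tearDown(): Unit = {
    if (driver != null) driver.quit()
  }
}
```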
Environment: Java, Cucumber, Selenium WebDriver, Data-Driven Framework, TestNG, Eclipse, Jira, SoapUI v4.5, Oracle 9i/8i, XML, SQL, Windows 7, MS Project, HTML, Firebug, FirePath, Git.