
Hadoop Developer Resume


New York, New York

SUMMARY:

  • Over 10 years of professional IT experience, including 4 years of Hadoop experience, processing large sets of structured, semi-structured, and unstructured data and supporting systems and application architecture
  • Strong working experience with Big Data and Hadoop Ecosystems including HDFS, PIG, HIVE, HBase, Yarn, Sqoop, Flume, Oozie, Hue, MapReduce and Spark.
  • Hands-on experience in installing, supporting, and managing Hadoop clusters using the Cloudera and Hortonworks distributions of Hadoop.
  • Extensive experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Python.
  • Expertise in deploying Hadoop, YARN, and Spark, and in integrating Spark with Cassandra.
  • Good knowledge of Hive partitioning and bucketing concepts; designed and managed partitioned and bucketed tables and created external tables in Hive to optimize performance (see the sketch after this summary).
  • Experience in importing and exporting data between HDFS and Relational Database Management systems using Sqoop.
  • Strong experience and knowledge of real time data analytics using Spark, Kafka and Flume.
  • Working knowledge of Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.
  • Ran Apache Hadoop, CDH, and MapR distributions on Amazon Elastic MapReduce (EMR) over EC2.
  • Collected log data from various sources and integrated it into HDFS using Flume.
  • Implemented POCs on migrating to Spark Streaming to process live data.
  • Experienced in Apache Spark for implementing advanced procedures such as text analytics, using its in-memory computing capabilities with code written in Scala.
  • Hands-on experience with real-time data processing using the distributed technologies Storm and Kafka.
  • Experience in implementing Kerberos authentication protocol in Hadoop for data security.
  • Extensive knowledge of Hadoop storage, query writing, and data processing and analysis.
  • Used different Spark modules such as Spark Core, Spark RDDs, Spark DataFrames, and Spark SQL.
  • Worked with cloud services like Amazon Web Services (AWS) and involved in ETL, Data Integration and Migration.
  • Converted various Hive queries into the required Spark transformations and actions.
  • Worked with data serialization formats such as Avro, Parquet, JSON, and CSV for converting complex objects into byte sequences.
  • Good experience with use-case development, with Software methodologies like Agile and Waterfall.
  • Strong command over relational databases: MySQL, Oracle, SQL Server and MS Access.
  • Hands-on experience with JUnit and Log4j for developing test cases and verifying application functionality.
  • Experience in setting up clusters on Amazon EC2 and S3, including automating cluster setup and scaling in the AWS cloud. Experience in upgrading SQL Server to new versions and applying service packs and patches.
  • Experience providing 24x7 production support, including weekends, on a rotation basis.
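
Below is a minimal PySpark sketch of the partitioned external Hive table pattern referenced in the summary above. The table name, columns, and HDFS path are hypothetical placeholders chosen for illustration, not details taken from any actual project.

    # Sketch only: partitioned external Hive table managed through Spark SQL.
    # Table name, columns, and HDFS location are illustrative assumptions.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-table-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # An external table leaves the data in place on HDFS; partitioning by
    # event_date limits the data scanned per query.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS web_events (
            user_id BIGINT,
            url     STRING,
            ts      TIMESTAMP
        )
        PARTITIONED BY (event_date STRING)
        STORED AS PARQUET
        LOCATION '/data/warehouse/web_events'
    """)

    # Dynamic-partition insert from a (hypothetical) staging table; bucketing
    # could additionally be declared in Hive with CLUSTERED BY ... INTO N BUCKETS.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
        INSERT INTO TABLE web_events PARTITION (event_date)
        SELECT user_id, url, ts, date_format(ts, 'yyyy-MM-dd') AS event_date
        FROM web_events_staging
    """)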

PROFESSIONAL EXPERIENCE:

Confidential, New York, New York

Hadoop Developer

Responsibilities:

  • Job duties included designing and developing various modules on the Hadoop Big Data platform and processing data using MapReduce, Hive, Sqoop, Kafka, and Oozie.
  • Developed job processing scripts using Oozie workflows.
  • Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Python.
  • Used the DataFrame API in Scala for working with distributed collections of data organized into named columns.
  • Used different Hadoop components in Talend to design the framework.
  • Involved in Hadoop cluster tasks such as commissioning and decommissioning nodes without any effect on running jobs and data.
  • Used Spark Streaming to collect data from Kafka in near real time, perform the necessary transformations and aggregations on the fly to build the common learner data model, and persist the data in MongoDB (see the sketch after this list).
  • Wrote MapReduce jobs to discover trends in data usage by users.
  • Worked extensively with Sqoop for importing metadata from Oracle. Used Sqoop to import data from SQL server to Cassandra.
  • Streamed data in real time using Spark with Kafka.
  • Designed, developed, and maintained data integration programs in a Hadoop and RDBMS environment with both traditional and non-traditional source systems, as well as RDBMS and NoSQL data stores, for data access and analysis.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
  • Involved in installing, configuring and managing Hadoop Ecosystem components like Hive, Pig, Sqoop, Kafka and Flume.
  • Assisted in exporting analyzed data to relational databases using Sqoop.
  • Wrote Hive queries and UDFs.
  • Developed Hive queries to process the data and generate data cubes for visualization.
  • Provided troubleshooting and best-practice methodology for development teams, including process automation and new application onboarding.
  • Designed monitoring solutions and baseline statistics reporting to support the implementation.
  • Experience designing and building solutions for both real-time and batch data ingestion using Sqoop, Pig, Impala, and Kafka.
  • Strong knowledge of and experience with MapReduce, Spark Streaming, and Spark SQL for data processing and reporting.
  • Extensively used AWS services (API Gateway, S3, DynamoDB, SNS, SQS, Lambda, ECR, and Elasticsearch) for scalable solutions.
  • Managing orchestration, configuration and provisioning of Apache Hadoop using OpenStack and Ansible.
  • Implemented services using industry-standard best practices such as the twelve-factor app methodology and SOLID principles.
  • Involved in deployment of services through containerization (Docker, Kubernetes, and/or AWS Elastic Container Service (ECS)).
  • Wrote Ansible playbooks for setting up and configuring tools, including Jenkins and Artifactory, on remote servers over REST APIs, and created Ansible playbooks for bug fixes.
  • Created unit test cases with Spock/Groovy/JUnit and automated test cases with Cucumber (Ruby) Behavior-Driven Development (BDD).
  • Used CircleCI and Jenkins Blue Ocean pipeline tools for CI/CD.
  • Used Terraform and CloudFormation as infrastructure as code to build PDH infrastructure.
  • Supported an infrastructure environment comprising RHEL and Ubuntu.
  • Built and supported development, testing, and production server environments.
  • Knowledge of firewalls, DNS, DMZs, and private IP blocks, and of assigning private and public IPs to virtual machines.
  • Configured Nagios for server and application monitoring and developed customized plugins.
  • Migrated on-premises applications to the cloud and performed cloud orchestration using Terraform, CloudFormation, and Azure Resource Manager (ARM) templates.
  • Wrote automation scripts for Ansible and an in-house custom AWS framework.
  • Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
  • Implemented Spark jobs using Scala and Spark SQL for faster testing and processing of data.
  • Used Apache Kafka for importing real time network log data into HDFS.
  • Developed business-specific custom UDFs in Hive and Pig.
  • Configured Oozie workflow to run multiple Hive and Pig jobs which run independently with time and data availability.
  • Optimized MapReduce code by writing Pig Latin scripts.
  • Imported data from external tables into Hive using the LOAD command.
  • Created tables in Hive and used static and dynamic partitions as a data slicing mechanism.
  • Working experience monitoring the cluster, identifying risks, and establishing good practices to be followed in a shared environment.
  • Good understanding of cluster configuration and resource management using YARN.
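
The Kafka-to-MongoDB streaming flow described in this list could be sketched roughly as follows. This assumes Spark 2.x's DStream-based Kafka integration (pyspark.streaming.kafka, removed in Spark 3) and the pymongo client; the topic, broker, database, and field names are illustrative assumptions, not details of the actual pipeline.

    # Sketch only: Kafka -> Spark Streaming -> MongoDB with per-batch aggregation.
    import json
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils
    from pymongo import MongoClient

    sc = SparkContext(appName="learner-model-sketch")
    ssc = StreamingContext(sc, 10)  # 10-second micro-batches

    stream = KafkaUtils.createDirectStream(
        ssc, ["learner-events"], {"metadata.broker.list": "broker1:9092"})

    # Parse JSON events and aggregate counts per learner within each batch.
    events = stream.map(lambda kv: json.loads(kv[1]))
    counts = (events.map(lambda e: (e["learner_id"], 1))
                    .reduceByKey(lambda a, b: a + b))

    def save_partition(records):
        # One MongoClient per partition; upsert the aggregated counts.
        client = MongoClient("mongodb://mongo-host:27017")
        coll = client.analytics.learner_counts
        for learner_id, count in records:
            coll.update_one({"_id": learner_id},
                            {"$inc": {"events": count}}, upsert=True)
        client.close()

    counts.foreachRDD(lambda rdd: rdd.foreachPartition(save_partition))

    ssc.start()
    ssc.awaitTermination()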

Environment: Linux, Hadoop 2, Python, SQL, Sqoop, HBase, Hive, Spark, Oozie, Cloudera Manager, Oracle, Windows, Yarn, Spring, Sentry, AWS, Microsoft Azure

Confidential, Irving, Texas

Hadoop Developer

Responsibilities:

  • Involved in the complete software development life cycle (SDLC) to develop the application.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Created AWS CloudFormation templates to deploy the artifacts into an AWS VPC (Virtual Private Cloud).
  • Extensively used AWS Lambda, DynamoDB, Kinesis Firehose, Redshift, and ElastiCache (Redis).
  • Involved in writing Lambda (Node.js) function to load data (device analytics and API Metering) from S3 to Redshift to analyze using existing business intelligence tools.
  • Wrote a data pipeline to move data from DynamoDB to S3.
  • Designed and implemented an API management solution for the Thunderdome stack.
  • Explored Cloud Foundry UAA (User Account and Authentication) and served as core architect of NDG UAA.
  • Implemented Jenkins and built pipelines to drive all microservice builds out to the Docker registry and then deployed to Kubernetes.
  • Set up and built AWS infrastructure using CloudWatch, CloudTrail, Security Groups, Auto Scaling, and RDS via CloudFormation templates.
  • Created additional Docker slave nodes for Jenkins using custom Docker images and pulled them onto cloud instances. Worked on all major components of Docker, such as the Docker daemon, Hub, images, and registries.
  • Implemented and administered network file systems using the automounter, and administered user and OS data files in an NIS/NFS environment.
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and Extracted the data from MySQL into HDFS using Sqoop.
  • Generated Java APIs for retrieval and analysis on NoSQL databases such as HBase and Cassandra.
  • Used Ambari to configure the initial development environment on the Hortonworks standalone sandbox and to monitor the Hadoop ecosystem.
  • Created partitioned and bucketed Hive tables in Parquet file format with Snappy compression, and loaded data into the Parquet Hive tables from Avro Hive tables.
  • Experienced in managing and reviewing Hadoop log files.
  • Captured the data logs from web server into HDFS using Flume for analysis.
  • Worked on extracting data from CSV and JSON files and storing it in Avro and Parquet formats (see the sketch after this list).
  • Implemented Partition, bucketing concepts in Hive and designed both Managed and External tables in Hive to optimize performance.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala.
  • Worked in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the data obtained from Kafka and persisted it into Cassandra.
  • Knowledge of handling Hive queries using Spark SQL, which integrates with the Spark environment.
  • Executed tasks for upgrading cluster on the staging platform before doing it on production cluster.
  • Worked on moving some of the data pipelines from CDH cluster to run on AWS.
  • Monitoring and controlling local file system disk space usage, log files, cleaning log files with automated scripts.
  • Used SnapLogic for extraction, transformation, and loading (ETL) of information from numerous sources such as flat files, XML documents, and databases.
  • Involved in writing Oozie jobs for workflow automation.
  • Involved in collecting metrics for Hadoop clusters using Ganglia and Ambari.
  • Worked across the enterprise to coordinate with the various stakeholders, serving as an experienced individual contributor working with cross-functional team members on day-to-day activities.
  • Supported lift-and-shift migration of workloads to the Azure cloud using several Azure products, including Operations Management Suite, Azure Site Recovery, and Azure Backup, and supported the build-out of Azure infrastructure as needed for the project.
  • Provided support, assistance, and guidance on post-migration issues, and served as an Azure advisor and escalation point for Azure-related issues during the project.
  • Designed and implemented Azure storage solutions and deployed Azure IaaS virtual machines (VMs) into secure VNets and subnets.
  • Worked on ASC and on tagging standards and policies that needed to be assigned according to the project requirements.
  • Designed and implemented firewall appliances (Barracuda WAF/NG) in Azure.
  • Migrated the current data center environment to the Azure cloud using tools such as Azure Site Recovery (ASR).
  • Implemented changes to firewalls and VNets to comply with customer security policies (e.g., Network Security Groups).
  • Enhanced the customer's logging capabilities (centralizing log files and using OMS or App Insights to identify problems and take action).
  • Worked on baselining, capacity planning, data center infrastructure and network design, and Windows Server migrations.
  • Developed effective and interactive Dashboards using Parameters and Actions.
  • Designed, developed, tested, and maintained Tableau functional reports based on user requirements.
  • Created demos in Tableau Desktop and published onto Tableau Server.
  • Worked on motion charts, bubble charts, and drill-down analysis using Tableau Desktop.
  • Developed Spark code using Scala and Spark SQL for faster testing and data processing.
  • Involved in developing a Spark Streaming application for one of the data sources using Scala and Spark, applying the required transformations.
  • Imported data from different sources such as HDFS and MySQL into Spark RDDs.
  • Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
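
A rough PySpark sketch of the CSV/JSON-to-Parquet conversion mentioned in this list (writing Avro output would additionally require the spark-avro package). The input and output paths are placeholders, not actual project locations.

    # Sketch only: convert raw CSV and JSON files to Snappy-compressed Parquet.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("format-conversion-sketch").getOrCreate()

    # Raw files landed on HDFS (placeholder paths).
    csv_df = (spark.read
              .option("header", "true")
              .option("inferSchema", "true")
              .csv("/data/raw/devices_csv/"))
    json_df = spark.read.json("/data/raw/devices_json/")

    # Snappy is Spark's default Parquet codec; it is set explicitly for clarity.
    for name, df in [("devices_csv", csv_df), ("devices_json", json_df)]:
        (df.write
           .mode("overwrite")
           .option("compression", "snappy")
           .parquet("/data/curated/parquet/{}".format(name)))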

Environment: Windows, Unix, Linux, Hortonworks, Scala, HDFS, MapReduce, Hive, HBase, Flume, Sqoop, Oracle, SnapLogic, Impala, SQL Server, Python, Tableau, MySQL, AWS, Microsoft Azure.

Confidential, New York, New York

Hadoop Developer

Responsibilities:

  • Launched Amazon EC2 cloud instances using Amazon Web Services (Linux/Ubuntu/RHEL) and configured the launched instances for specific applications.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Imported the data from different sources like HDFS/HBase into Spark RDD, developed a data pipeline using Kafka to store data into HDFS. Performed real time analysis on the incoming data.
  • Worked extensively with Sqoop for importing and exporting data between the HDFS data lake and relational database systems such as Oracle and MySQL.
  • Developed Python scripts to collect data from source systems and store it on HDFS.
  • Involved in converting Hive or SQL queries into Spark transformations using Python and Scala.
  • Built a Kafka REST API to collect events from the front end.
  • Built real time pipeline for streaming data using Kafka and Spark Streaming.
  • Worked on integrating Apache Kafka with Spark Streaming process to consume data from external sources and run custom functions
  • Explored Spark and improved the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Used the DataFrame API in Scala for working with distributed collections of data organized into named columns.
  • Developed Spark jobs and Hive Jobs to summarize and transform data.
  • Optimized performance when dealing with large datasets using partitioning, broadcasts in Spark, and effective and efficient joins and transformations during the ingestion process (see the sketch after this list).
  • Used Spark for interactive queries, processing of streaming data and integration with HBase database for huge volume of data.
  • Stored the data in tabular formats using Hive tables and Hive SerDes.
  • Implemented Partitioning, Dynamic Partitions and Bucketing in Hive for efficient data access.
  • Redesigned the HBase tables to improve the performance according to the query requirements.
  • Developed MapReduce jobs in Java to convert data files into Parquet file format.
  • Developed Hive queries for data sampling and analysis for the analysts.
  • Executed Hive queries that helped in analysis of trends by comparing the new data with existing data warehouse reference tables and historical data.
  • Developed Hive user-defined functions in Java, compiled them into JARs, added them to HDFS, and executed them from Hive queries.
  • Worked on Sequence files, ORC files, bucketing, partitioning for Hive performance enhancement and storage improvement.
  • Developed MapReduce programs in Java to search production logs and web analytics logs for application issues.
  • Used the Oozie engine to create workflow and coordinator jobs that schedule and execute various Hadoop jobs such as MapReduce, Hive, and Spark jobs, and to automate Sqoop jobs.
  • Configured Oozie workflow to run multiple Hive jobs which run independently with time and data availability.
  • Shared daily status reports with all team members, team leads, and managers.
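
The ingestion-time join and partition tuning mentioned in this list might look roughly like the following PySpark sketch; the table paths and column names are hypothetical.

    # Sketch only: broadcast a small dimension table and repartition before a
    # wide aggregation to balance the shuffle.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast, col

    spark = SparkSession.builder.appName("join-tuning-sketch").getOrCreate()

    events = spark.read.parquet("/data/curated/events/")   # large fact data
    users = spark.read.parquet("/data/reference/users/")   # small dimension

    # Broadcasting the small table avoids shuffling the large one for the join.
    enriched = events.join(broadcast(users), on="user_id", how="left")

    # Repartition on the aggregation key so the groupBy shuffle is balanced.
    daily = (enriched
             .repartition(200, col("event_date"))
             .groupBy("event_date", "country")
             .count())

    daily.write.mode("overwrite").parquet("/data/marts/daily_counts/")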

Environment: CDH5, Hue, Eclipse, CentOS Linux, HDFS, MapReduce, Kafka, Python, Scala, Java, Hive, Sqoop, Spark, Spark SQL, Spark Streaming, HBase, Oracle 10g, Oozie, Red Hat Linux.

Confidential, New York, New York

Selenium Tester

Responsibilities:

  • Responsible for the implementation and ongoing administration of the Hadoop infrastructure, including its initial setup.
  • Analyzed technical and functional requirements documents and designed and developed the QA test plan, test cases, and test scenarios, maintaining the end-to-end (E2E) process flow.
  • Developed testing scripts for an internal brokerage application used by branch and financial market representatives to recommend and manage customer portfolios, including international and capital markets.
  • Designed and developed smoke and regression automation scripts and a functional test automation framework for all modules using Selenium WebDriver.
  • Created data-driven scripts for adding multiple customers, checking online accounts, user interface validations, and report validations (see the sketch after this list).
  • Performed cross verification of trade entry between mainframe system, its web application and downstream system.
  • Extensively used Selenium WebDriver API (XPath and CSS locators) to test the web application.
  • Configured Selenium WebDriver, TestNG, Maven, Cucumber, and a BDD framework, and created Selenium automation scripts in Java using TestNG.
  • Performed Data-Driven testing by developing Java based library to read test data from Excel & Properties files.
  • Extensively performed DB2 database testing to validate trade entry from the mainframe to the backend system. Developed a data-driven framework with Java, Selenium WebDriver, and Apache POI, used for multiple trade order entry.
  • Developed internal application using Angular.js and Node.js connecting to Oracle on the backend.
  • Expertise in debugging issues in the front end of the web-based application, which was developed using HTML5, CSS3, AngularJS, Node.js, and Java.
  • Developed a smoke automation test suite in addition to the regression test suite.
  • Applied various testing techniques in test cases to cover all business scenarios for quality coverage.
  • Interacted with the development team to understand the design flow, review code, and discuss unit test plans.
  • Executed system, integration, and regression tests in the testing environment.
  • Conducted defect triage meetings and defect root cause analysis, tracked defects in HP ALM Quality Center, managed defects by following up on open items, and retested defects with regression testing.
  • Provided QA/UAT sign-off after closely reviewing all test cases in Quality Center, along with receiving policy sign-off for the project.
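
The data-driven Selenium scripts described above were built in Java with TestNG and Apache POI; the short Python sketch below only illustrates the same pattern of driving a browser through CSS/XPath locators from an external data file. The URL, locators, and CSV columns are hypothetical.

    # Sketch only: data-driven login check using Selenium's Python bindings.
    import csv
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    def login_and_check(driver, username, password):
        # URL and locators are illustrative placeholders.
        driver.get("https://example.test/login")
        driver.find_element(By.CSS_SELECTOR, "input#username").send_keys(username)
        driver.find_element(By.CSS_SELECTOR, "input#password").send_keys(password)
        driver.find_element(By.XPATH, "//button[@type='submit']").click()
        # Validate that the accounts dashboard header is displayed.
        return driver.find_element(By.XPATH, "//h1[contains(., 'Accounts')]").is_displayed()

    if __name__ == "__main__":
        driver = webdriver.Chrome()
        try:
            with open("customers.csv") as f:  # test data, one row per customer
                for row in csv.DictReader(f):
                    assert login_and_check(driver, row["user"], row["password"])
        finally:
            driver.quit()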

Environment: HP ALM, Selenium WebDriver, JUnit, Cucumber, AngularJS, Node.js, Jenkins, GitHub, Windows, UNIX, Agile, MS SQL, IBM DB2, PuTTY, WinSCP, FTP Server, Notepad++, C#, DbVisualizer.
