
Sr. Hadoop Developer Resume


San Jose, CA

SUMMARY

  • I am an experienced Hadoop Developer with 8 years of overall IT experience, including 4+ years of relevant experience in building distributed applications and high-quality software, object-oriented methods, project leadership, and rapid, reliable development.
  • Experience with the Hadoop ecosystem and related Big Data tools, including Spark, Hive, Kafka, Apache Mesos, Cascading, and Hadoop MapReduce, using Scala, Python, and Java.
  • Solid background in DBMS technologies such as Oracle, MySQL, and NoSQL databases and in data warehousing architectures; performed migrations from databases such as SQL Server, Oracle, and MySQL to Hadoop.
  • Experienced in implementing ETL solutions between OLTP and OLAP databases in support of Decision Support Systems, with expertise in all phases of the SDLC.
  • Developed automated Unix shell scripts for running the Balancer, file system tasks, schema creation in Hive, and user/group creation on HDFS.
  • Experienced in developing MapReduce programs using Apache Hadoop for working with big data, and in Hadoop architecture using the MapReduce programming paradigm.
  • Experience with the Oozie workflow scheduler to manage Hadoop jobs as a Directed Acyclic Graph (DAG) of actions with control flows.
  • Proficient in configuring ZooKeeper, Cassandra, and Flume on the existing Hadoop cluster.
  • Experience in converting Hive or SQL queries into Spark transformations using Python and Scala.
  • Experience with Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, PySpark, pair RDDs, and Spark on YARN.
  • Experience in deploying and managing multi-node development, testing, and production Hadoop clusters with different Hadoop components using Hortonworks Ambari.
  • Experience in Cloudera Hadoop upgrades, patches, and installation of ecosystem products through Cloudera Manager, along with Cloudera Manager upgrades.
  • Worked on provisioning and deploying multi-tenant Hadoop clusters on the Amazon Web Services (AWS) public cloud and on private cloud infrastructure using various AWS components.
  • Experienced in working with Amazon Web Services, using EC2 for compute and S3 as a storage mechanism.
  • Experience with different file formats such as text, Avro, and Parquet for Hive querying and processing.
  • Experienced in managing Linux platform servers and in installing, configuring, supporting, and managing Hadoop clusters.
  • Worked with different IDE tools such as Oracle JDeveloper, WebLogic Workshop, NetBeans, and Eclipse.
  • Hands-on experience with different software development approaches such as Spiral, Waterfall, and Agile iterative models.
  • Experience in using Sqoop to import data into Cassandra tables from different relational databases.
  • Expertise in support activities including installation, configuration and successful deployment of changes across all environments.

TECHNICAL SKILLS

  • AWS Cloud
  • Unix
  • GIT
  • Chef
  • Jira
  • Nagios
  • Tomcat
  • Jenkins
  • SAN
  • Virtualization
  • Windows and Linux Operating Systems
  • Workflow & Approvals
  • ITSM remedy
  • Reports
  • Network Protocols
  • SQL Database and Monitoring Tools.

PROFESSIONAL EXPERIENCE

Sr. Hadoop Developer

Confidential - San Jose, CA

Responsibilities:

  • Processed and analyzed large volumes of log data for marketing campaigns, sales management, inventory management, etc.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Monitored and reviewed Hadoop log files and wrote queries to analyze them.
  • Conducted POCs and mock sessions with the client to understand the business requirements; also attended defect triage meetings with the UAT and QA teams to ensure defects were resolved in a timely manner.
  • Worked with Kafka on a proof of concept for log processing on a distributed system.
  • Created sessions and configured workflows to extract data from various sources, transform it, and load it into the data warehouse.
  • Studied the existing enterprise data warehouse setup and provided design and architecture suggestions for converting it to Hadoop using MRv2, Hive, Sqoop, and Pig Latin.
  • Loaded data from different data sources (Teradata and DB2) into HDFS using Sqoop and Flume, and then into partitioned Hive tables.
  • Wrote various MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed formats, and stored the refined data in partitioned tables in the EDW.
  • Developed HiveQL queries, mappings, tables, and external tables in Hive for analysis across different banners, and worked on partitioning, optimization, compilation, and execution.
  • Wrote complex queries to get data into HBase and was responsible for executing Hive queries using the Hive command line, Hue, and Impala to read and write data.
  • Designed and implemented proprietary data solutions by correlating data from SQL and NoSQL databases using Kafka.
  • Used Pig as an ETL tool to perform transformations, event joins, and some pre-aggregations before storing the analyzed data in HDFS.
  • Worked with AWS cloud services (VPC, EC2, S3, RDS, Redshift, Data Pipeline, EMR, DynamoDB, WorkSpaces, Lambda, Kinesis, SNS, SQS).
  • Was part of the team that set up the infrastructure in AWS.
  • Used Amazon S3 as a storage mechanism and wrote Python scripts that dump data into S3.
  • Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata to identify performance lags.
  • Developed PySpark code to save data in Avro and Parquet formats and built Hive tables on top of them.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Experience in using Pentaho Data Integration for data integration, OLAP analysis, and ETL processes.
  • Automated workflows using shell scripts to pull data from various databases into Hadoop.
  • Developed bash scripts to bring the TLog files from an FTP server and then process them for loading into Hive tables. All the bash scripts are scheduled using Resource Manager Scheduler.
  • Developed Oozie workflows for daily incremental loads, which get data from Teradata and then import it into Hive tables.
  • Developed Spark programs using Scala, created Spark SQL queries, and developed Oozie workflows for Spark jobs.
  • Experienced in using ZooKeeper and Oozie operational services for coordinating the cluster and scheduling workflows.
  • Continuously monitored and managed the Hadoop cluster using Hortonworks Ambari.

Environment: HDFS, Hadoop 2.x YARN, Teradata, NoSQL, PySpark, Data warehouse, MapReduce, Pig, Hive, Sqoop, Spark, Scala, Oozie, Hortonworks, Java, Oracle 10g, Python, MongoDB, Shell and Bash Scripting.

Hadoop Developer

Confidential - London, UT

Responsibilities:

  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly for building the common learner data model, which gets data from Kafka in near real time and persists it into MongoDB.
  • Developed Spark scripts using Scala shell commands as per the requirements.
  • Used the Spark API over Hortonworks to perform analytics on data in Hive.
  • Developed Scala scripts using both DataFrames/SQL and RDDs/MapReduce in Spark 1.6 for data aggregation and queries, and for writing data back into the OLTP system through Sqoop.
  • Experienced in performance tuning of Spark applications, including setting the right batch interval, the correct level of parallelism, and memory tuning.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.
  • Experienced in handling large datasets during the ingestion process itself using partitions, Spark in-memory capabilities, broadcasts, effective and efficient joins, transformations, and other techniques.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
  • Worked closely with business customers on requirements gathering.
  • Developed Sqoop jobs with incremental loads from heterogeneous RDBMSs (IBM and Oracle) using native DB connectors.
  • Designed a Hive repository with external tables, internal tables, buckets, partitions, ACID properties, and ORC compression for incremental loads of parsed data for analytical and operational dashboards.
  • Built the business layer with Hive transformations.
  • Implemented dynamic column binding on HBase tables whenever new attributes of functional entities are added.
  • Used Oozie for job automation.
  • Used ZooKeeper for cluster coordination services.

Environment: AWS Cloud, Unix, GIT, Chef, Jira, Nagios, Tomcat, Jenkins, SAN, Virtualization, Windows and Linux Operating Systems, Workflow & Approvals, ITSM remedy, Reports, Network Protocols, SQL Database and Monitoring Tools.

Hadoop Administrator/ Developer

Confidential - Woonsocket, RI

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, Hive, the HBase database, and Sqoop. Involved in unit testing and delivered unit test plans and results documents.
  • Developed Hive UDFs to bring all customers' email IDs into a structured format.
  • Collected and aggregated large amounts of web log data from various sources such as web servers, mobile, and network devices using Apache Flume, and stored the data in HDFS for analysis.
  • Exported data from the HDFS environment into an RDBMS using Sqoop for report generation and visualization purposes.
  • Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Installed, monitored, and maintained hardware and software, and resolved related issues on Linux/Unix systems.
  • Investigated, installed, and configured a software failover system for production Linux servers.
  • Designed, developed, debugged, tested, and promoted Java/ETL code into various environments from DEV through PROD.
  • Administered Pig, Hive, and HBase, installing updates, patches, and upgrades.
  • Extensively involved in the design phase and delivered design documents.
  • Extensively involved in writing ETL specifications for development and conversion projects.
  • Worked on the Oozie workflow engine for job scheduling.
  • Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Supported all teams engaged in onboarding new customers, including vendors helping to establish new products.
  • Implemented a six-node CDH4 Hadoop cluster on CentOS.
  • Developed Spark jobs using Scala in the test environment for faster data processing and used Spark SQL for querying.
  • Experienced in managing and reviewing Hadoop log files.
  • Responsible for importing log files from various sources into HDFS using Flume.
  • Monitored and managed the Hadoop cluster using the Cloudera Manager web interface.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Defined and created data models, tables, views, queries, etc. to support business requirements.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Populated HDFS and Cassandra with vast amounts of data using Apache Kafka.

Environment: HDFS, MapReduce, Flume, Hive, Informatica 9.1/8.1/7.1/6.1, Oracle 11g, Sqoop, Oozie, Pig, ETL, Hadoop 2.x, NoSQL, Talend, Flat files, AWS (Amazon Web Services), Hortonworks, Shell Scripting

Hadoop Developer

Confidential, NJ

Responsibilities:

  • Developed optimal strategies for distributing the web log data over the cluster, and for importing and exporting the stored web log data into HDFS and Hive using Sqoop.
  • Hands-on experience in loading data from the UNIX file system and Teradata into HDFS.
  • Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most purchased product on the website.
  • Installed, configured, and operated the Apache stack (Hive, HBase, Pig, Sqoop, ZooKeeper, Oozie, Flume, and Mahout) on the Hadoop cluster.
  • Created a high-level design for the data ingestion and data extraction module, and enhanced a Hadoop MapReduce job that joins the incoming slices of data and picks only the fields needed for further processing.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Involved in writing optimized Pig scripts and in developing and testing Pig Latin scripts.
  • Tested raw data and executed performance scripts.
  • Worked with the NoSQL database HBase to create tables and store data.
  • Developed industry-specific UDFs (user-defined functions).
  • Used Flume to collect, aggregate, and store the web log data from various sources such as web servers, mobile, and network devices, and pushed it to HDFS.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
  • Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
  • Monitored workload, job performance, and capacity planning using Cloudera Manager.
  • Resolved issues, answered questions, and provided day-to-day support for users and clients on Hadoop and its ecosystem, including the HAWQ database.
  • Installed, configured, and operated data integration and analytic tools (Informatica, Chorus, SQLFire, GemFire XD) for business needs.

Environment: Cloudera, MapReduce, Hive, Solr, Pig, HDFS, HBase, Flume, Sqoop, ZooKeeper, Python, Flat files, AWS, Teradata, Unix/Linux.

Java Developer

Confidential

Responsibilities:

  • Actively participated in requirements gathering, analysis, design, and testing phases.
  • Designed use case diagrams, class diagrams, and sequence diagrams as a part of Design Phase.
  • Developed the entire application implementing the MVC architecture, integrating JSF with the Hibernate and Spring frameworks.
  • Developed Enterprise JavaBeans (stateless session beans) to handle different transactions such as online funds transfers and bill payments to service providers.
  • Implemented a Service Oriented Architecture (SOA) using JMS for sending and receiving messages while creating web services.
  • Developed XML documents and generated XSL files for the Payment Transaction and Reserve Transaction systems.
  • Enhanced an address book application developed using AngularJS by destroying unwanted watches and moving business logic into services. Designed a Backbone collection to offer combinations of promotions for given SKUs.
  • Developed SQL queries and stored procedures.
  • Developed web services for data transfer from client to server and vice versa using Apache Axis, SOAP, and WSDL.
  • Used the JUnit framework for unit testing of all the Java classes.
  • Implemented various J2EE design patterns such as Singleton, Service Locator, and SOA.
  • Worked with AJAX to develop an interactive web application and used JavaScript for data validation.

Environment: J2EE, JDBC, Servlets, JSP, Struts, Hibernate, Web services, SOAP, WSDL, Design Patterns, MVC, HTML, JavaScript, WebLogic, XML, JUnit, Oracle, WebSphere, Eclipse.

Java Developer

Confidential

Responsibilities:

  • Designed use cases, activities, states, objects and components.
  • Developed the UI pages using HTML, DHTML, JavaScript, Ajax, jQuery, JSP, and tag libraries.
  • Developed front-end screens using JSP and tag libraries.
  • Performed validations between various users.
  • Designed Java servlets and objects using J2EE standards.
  • Coded HTML, JSP, and servlets.
  • Developed an internal application using Angular and Node.js, connecting to Oracle on the backend.
  • Coded XML validation and file segmentation classes for splitting large XML files into smaller segments using a SAX parser.
  • Created new connections through application code for better access to the DB2 database, and wrote SQL and PL/SQL stored procedures, functions, sequences, triggers, cursors, object types, etc.
  • Implemented the application using the Struts MVC framework for maintainability.
  • Involved in testing and deploying on the development server.
  • Wrote Oracle stored procedures (PL/SQL) and called them using JDBC.
  • Involved in designing the database tables in Oracle.

Environment: Java, J2EE, Apache Tomcat, Confidential, JSP, servlets, Struts, PL/SQL and Oracle.
