Hadoop Developer Resume

Richardson, TX

PROFESSIONAL SUMMARY:

  • Experienced Hadoop Developer with 8 years of overall IT experience and 4+ years of relevant experience building distributed applications and high-quality software, with strengths in object-oriented methods, project leadership, and rapid, reliable development.
  • Experience with Hadoop and related Big Data tools, including Spark, Hive, Kafka, Apache Mesos, Cascading and Hadoop MapReduce, using Scala, Python, and Java.
  • Solid background in DBMS technologies such as Oracle, MySQL, and NoSQL, and in data warehousing architectures; performed migrations from SQL Server, Oracle, and MySQL to Hadoop.
  • Experienced in successfully implementing ETL solutions between OLTP and OLAP databases in support of Decision Support Systems, with expertise in all phases of the SDLC.
  • Developed automated Unix shell scripts for running the Balancer, file system operations, schema creation in Hive, and user/group creation on HDFS.
  • Experienced in developing MapReduce programs using Apache Hadoop for working with Big Data, and familiar with Hadoop architecture and the MapReduce programming paradigm.
  • Experience with the Oozie workflow scheduler to manage Hadoop jobs as a Directed Acyclic Graph (DAG) of actions with control flows.
  • Proficient in configuring ZooKeeper, Cassandra, and Flume on an existing Hadoop cluster.
  • Experience in converting Hive or SQL queries into Spark transformations using Python and Scala.
  • Experience using Spark to improve the performance of and optimize existing algorithms in Hadoop using SparkContext, Spark SQL, PySpark, pair RDDs, and Spark on YARN.
  • Experience in deploying and managing multi-node development, testing, and production Hadoop clusters with different Hadoop components using Hortonworks Ambari.
  • Experience in Cloudera Hadoop upgrades, patches, and installation of ecosystem products through Cloudera Manager, along with Cloudera Manager upgrades.
  • Worked on provisioning and deploying multi-tenant Hadoop clusters on the public cloud environment Amazon Web Services (AWS) and on private cloud infrastructure using various AWS components.
  • Experienced in working with Amazon Web Services, using EC2 for compute and S3 for storage.
  • Experience with different file formats such as text, Avro, and Parquet for Hive querying and processing.
  • Experienced in managing Linux platform servers and in installing, configuring, supporting, and managing Hadoop clusters.
  • Worked with different IDE tools such as Oracle JDeveloper, WebLogic Workshop, NetBeans, and Eclipse.
  • Hands-on experience in different software development approaches such as Spiral, Waterfall, and Agile iterative models.
  • Experience in using Sqoop to import data into Cassandra tables from different relational databases.
  • Expertise in support activities including installation, configuration and successful deployment of changes across all environments.

TECHNOLOGIES:

Operating Systems: Windows, Linux distributions like Ubuntu, CentOS, RHEL.

Data stores: Oracle, NoSQL, MySQL, HBase, Cassandra, MongoDB

Big data: MapReduce, HDFS, Flume, Hive, Pig, Oozie, YARN, Hortonworks, Cassandra, Hadoop, Kafka, Sqoop, Impala, ZooKeeper, Spark, Ambari, Mahout, MongoDB, Avro, Storm and Parquet.

ETL: Talend Open Studio, Informatica

Programming Languages: Java, Scala, Python, C, C++, C#, JavaScript, HTML, DHTML, XML, Bash, Perl, SQL and *nix tools.

NoSQL Databases: MongoDB and HBase

Amazon Stacks: AWS EMR, S3, Aurora, DynamoDB, AWS Lambda and EC2

Application Servers: WebLogic 11g and 12c, Tomcat 5.x and 6.x

Java Technologies: Servlets, JSP, JDBC

PROFESSIONAL EXPERIENCE:

Confidential, Richardson, TX

Hadoop Developer

Responsibilities:

  • Processed and analyzed large volumes of log data for marketing campaigns, sales management, inventory management, etc.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Monitored and reviewed Hadoop log files and wrote queries to analyze them.
  • Conducted POCs and mocks with the client to understand business requirements; attended defect triage meetings with the UAT and QA teams to ensure defects were resolved in a timely manner.
  • Worked with Kafka on a proof of concept for log processing on a distributed system.
  • Created sessions and configured workflows to extract data from various sources, transform it, and load it into the data warehouse.
  • Studied the existing enterprise data warehouse setup and provided design and architecture suggestions for converting it to Hadoop using MRv2, Hive, Sqoop, and Pig Latin.
  • Loaded data from different data sources (such as Teradata and DB2) into HDFS using Sqoop and Flume, then loaded it into partitioned Hive tables.
  • Wrote various MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats including XML, JSON, CSV, and other compressed file formats, and stored the refined data in partitioned tables in the EDW.
  • Developed Hive SQL queries, mappings, tables, and external tables in Hive for analysis across different banners, and worked on partitioning, optimization, compilation, and execution.
  • Wrote complex queries to load data into HBase and was responsible for executing Hive queries for reads and writes using the Hive command line, Hue, and Impala.
  • Designed and implemented proprietary data solutions by correlating data from SQL and NoSQL databases using Kafka.
  • Used Pig as an ETL tool to perform transformations, event joins, and some pre-aggregations before storing the analyzed data in HDFS.
  • Worked with AWS cloud services (VPC, EC2, S3, RDS, Redshift, Data Pipeline, EMR, DynamoDB, WorkSpaces, Lambda, Kinesis, SNS, SQS).
  • Was part of the team that set up the infrastructure in AWS.
  • Used Amazon S3 as a storage mechanism and wrote Python scripts that dump data into S3.
  • Developed multiple POCs using PySpark and deployed them on the YARN cluster; compared the performance of Spark with Hive and SQL/Teradata to identify any performance lags.
  • Developed PySpark code for saving data in Avro and Parquet formats and building Hive tables on top of them.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Experience in using the Pentaho Data Integration tool for data integration, OLAP analysis, and ETL processes.
  • Automated workflows using shell scripts to pull data from various databases into Hadoop.
  • Developed Bash scripts to fetch the TLog files from the FTP server and process them for loading into Hive tables. All the Bash scripts are scheduled using Resource Manager Scheduler.
  • Developed Oozie workflows for daily incremental loads, which get data from Teradata and import it into Hive tables.
  • Developed Spark programs using Scala, created Spark SQL queries, and developed Oozie workflows for Spark jobs.
  • Experienced in using ZooKeeper and Oozie operational services for coordinating the cluster and scheduling workflows.
  • Continuously monitored and managed the Hadoop cluster using Hortonworks Ambari.

Environment: HDFS, Hadoop 2.x YARN, Teradata, NoSQL, PySpark, Data warehouse, MapReduce, Pig, Hive, Sqoop, Spark, Scala, Oozie, Hortonworks, Java, Oracle 10g, Python, MongoDB, Shell and Bash scripting.

Confidential, Lindon, UT

Hadoop Developer

Responsibilities:

  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, which gets data from Kafka in near real time and persists it into MongoDB.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Used the Spark API over Hortonworks to perform analytics on data in Hive.
  • Developed Scala scripts using both DataFrames/SQL and RDD/MapReduce in Spark 1.6 for data aggregation, queries, and writing data back into the OLTP system through Sqoop.
  • Experienced in performance tuning of Spark applications, including setting the right batch interval, the correct level of parallelism, and memory tuning.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Performed advanced procedures such as text analytics and processing, using the in-memory computing capabilities of Spark with Scala.
  • Experienced in handling large datasets using partitions, Spark's in-memory capabilities, broadcasts, efficient joins, and transformations during the ingestion process itself.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
  • Worked closely with business customers on requirements gathering.
  • Developed Sqoop jobs with incremental loads from heterogeneous RDBMSs (IBM and Oracle) using native DB connectors.
  • Designed a Hive repository with external tables, internal tables, buckets, partitions, ACID properties, and ORC compression for incremental loading of parsed data for analytical and operational dashboards.
  • Built the business layer with Hive transformations.
  • Implemented dynamic column binding on HBase tables whenever a new attribute of a functional entity is added.
  • Used Oozie for job automation.
  • Provided cluster coordination services through ZooKeeper.

Environment: AWS Cloud, Unix, Git, Chef, Jira, Nagios, Tomcat, Jenkins, SAN, Virtualization, Windows and Linux Operating Systems, Workflow & Approvals, ITSM Remedy, Reports, Network Protocols, SQL Database and Monitoring Tools.

Confidential, Woonsocket, RI

Hadoop Administrator/ Developer

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different Big Data analytic tools including Pig, Hive, the HBase database, and Sqoop. Involved in unit testing and delivered unit test plans and results documents.
  • Collected and aggregated large amounts of web log data from various sources such as web servers, mobile and network devices using Apache Flume and stored the data in HDFS for analysis.
  • Exported data from HDFS into RDBMS using Sqoop for report generation and visualization purposes.
  • Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Installed, monitored, and maintained hardware and software on Linux/Unix systems, handling related issues.
  • Investigated, installed, and configured a software fail-over system for production Linux servers.
  • Designed, developed, debugged, tested, and promoted Java/ETL code into various environments from DEV through to PROD.
  • Served as administrator for Pig, Hive, and HBase, installing updates, patches, and upgrades.
  • Extensively involved in the design phase and delivered design documents.
  • Extensively involved in writing ETL specifications for development and conversion projects.
  • Worked on Oozie workflow engine for job scheduling.
  • Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Supported all teams engaged in onboarding new customers, including vendors helping to establish new products.
  • Implemented a six-node CDH4 Hadoop cluster on CentOS.
  • Developed Spark jobs using Scala in a test environment for faster data processing and used Spark SQL for querying.
  • Experienced in managing and reviewing the Hadoop log files.
  • Responsible for importing log files from various sources into HDFS using Flume.
  • Monitored and managed the Hadoop cluster using the Cloudera Manager web interface.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Defined and created data models, tables, views, queries, etc. to support business requirements.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Populated HDFS and Cassandra with vast amounts of data using Apache Kafka.

Environment: HDFS, MapReduce, Flume, Hive, Informatica 9.1/8.1/7.1/6.1, Oracle 11g, Sqoop, Oozie, Pig, ETL, Hadoop 2.x, NoSQL, Talend, Flat files, AWS (Amazon Web Services), Hortonworks, Shell Scripting

Confidential, Bloomberg, NJ

Hadoop Developer

Responsibilities:

  • Developed optimal strategies for distributing the web log data over the cluster, and imported and exported the stored web log data into HDFS and Hive using Sqoop.
  • Hands-on experience in loading data from the UNIX file system and Teradata to HDFS.
  • Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most purchased product on the website.
  • Installed, configured, and operated the Apache stack (Hive, HBase, Pig, Sqoop, ZooKeeper, Oozie, Flume, and Mahout) on the Hadoop cluster.
  • Created a high-level design for the data ingestion and data extraction module, and enhanced the Hadoop MapReduce job that joins the incoming slices of data and picks only the fields needed for further processing.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Involved in writing optimized Pig scripts and in developing and testing Pig Latin scripts.
  • Tested raw data and executed performance scripts.
  • Worked with NoSQL database HBase to create tables and store data.
  • Developed industry-specific UDFs (user-defined functions).
  • Used Flume to collect, aggregate, and store web log data from various sources such as web servers, mobile and network devices, and pushed it to HDFS.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
  • Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
  • Monitored workload and job performance and performed capacity planning using Cloudera Manager.
  • Resolved issues, answered questions, and provided day-to-day support for users and clients on Hadoop and its ecosystem, including the HAWQ database.
  • Installed, configured, and operated data integration and analytic tools (Informatica, Chorus, SQLFire, GemFireXD) for business needs.

Environment: Cloudera, MapReduce, Hive, Solr, Pig, HDFS, HBase, Flume, Sqoop, ZooKeeper, Python, Flat files, AWS, Teradata, Unix/Linux.

Confidential

Java developer

Responsibilities:

  • Actively participated in requirements gathering, analysis, design, and testing phases.
  • Designed use case diagrams, class diagrams, and sequence diagrams as a part of Design Phase.
  • Developed the entire application implementing MVC architecture, integrating JSF with the Hibernate and Spring frameworks.
  • Developed Enterprise JavaBeans (stateless session beans) to handle different transactions such as online funds transfers and bill payments to service providers.
  • Implemented Service Oriented Architecture (SOA) using JMS for sending and receiving messages while creating web services.
  • Developed XML documents and generated XSL files for Payment Transaction and Reserve Transaction systems.
  • Enhanced the address book application developed using AngularJS by destroying unwanted watches and separating business logic into services. Designed a Backbone collection to offer combinations of promotions for given SKUs.
  • Developed SQL queries and stored procedures.
  • Developed web services for data transfer from client to server and vice versa using Apache Axis, SOAP, and WSDL.
  • Used the JUnit framework for unit testing of all the Java classes.
  • Implemented various J2EE design patterns such as Singleton, Service Locator, and SOA.
  • Worked on AJAX to develop an interactive Web Application and JavaScript for Data Validations.

Environment: J2EE, JDBC, Servlets, JSP, Struts, Hibernate, Web services, SOAP, WSDL, Design Patterns, MVC, HTML, JavaScript, WebLogic, XML, JUnit, Oracle, WebSphere, Eclipse.

Confidential

Java developer

Responsibilities:

  • Designed use cases, activities, states, objects and components.
  • Developed the UI pages using HTML, DHTML, JavaScript, Ajax, jQuery, JSP, and tag libraries.
  • Developed front-end screens using JSP and Tag Libraries.
  • Performed validations between various users.
  • Designed Java Servlets and objects using J2EE standards.
  • Coded HTML, JSP and Servlets.
  • Developed internal application using Angular and Node.js connecting to Oracle on the backend.
  • Coded XML validation and file segmentation classes for splitting large XML files into smaller segments using SAXParser.
  • Created new connections through application coding for better access to the DB2 database and wrote SQL and PL/SQL stored procedures, functions, sequences, triggers, cursors, object types, etc.
  • Implemented the application using the Struts MVC framework for maintainability.
  • Involved in testing and deploying on the development server.
  • Wrote Oracle stored procedures (PL/SQL) and called them using JDBC.
  • Involved in designing the database tables in Oracle.

Environment: Java, J2EE, Apache Tomcat, Confidential, JSP, Servlets, Struts, PL/SQL and Oracle.
