
Hadoop Developer Resume


Detroit, Michigan

SUMMARY

  • A professional with 8 years of software development experience, deep insight into the Telecom domain, strong business acumen, and hands-on technical expertise in Big Data technologies.
  • Hands-on experience working with Big Data technologies including HDFS, MapReduce, Pig, HBase, Hive, Oozie, Sqoop, Kafka, and Storm.
  • Excellent understanding/knowledge of Hadoop architecture and its various components, such as HDFS, Job Tracker, Task Tracker, Name Node, and Data Node.
  • Experienced in cluster management on Hortonworks using Ambari and data management using Hue.
  • Hands-on experience productionizing Hadoop applications, including debugging and performance tuning.
  • Expertise in data load management, importing and exporting data using Sqoop.
  • Wrote Java programs to load data from local or remote systems into HDFS (a minimal sketch follows this list).
  • Built a multi-threaded environment in Java to improve the performance of merging operations.
  • Used the Hadoop Java API to develop the code.
  • Wrote a merge-and-push Java program to add or remove headers from files.
  • Expertise in creating Hive internal/external tables and views using a shared metastore, writing HiveQL scripts, and performing data transformation and file processing with HQL scripts.
  • Familiar with data ingestion pipeline design, deduplication, and windowing.
  • Experienced in creating Oozie workflows and coordinators for data ingestion and downstream processing.
  • Capable of processing large sets of structured, semi-structured, and unstructured data and supporting systems and application architecture.
  • Worked on all phases of the data warehouse development life cycle: ETL design, implementation, and support of new and existing applications.
  • Skilled in deploying applications on AWS as EC2 instances and creating snapshots of data to be stored in AWS S3.
  • Partnered in building effective Hive, Pig and MapReduce scripts.
  • Built complex workflows using hPDL in Oozie.
  • Optimized HiveQL queries using Spark SQL.
  • Working experience on HDP with Tez.
  • Experienced in analyzing business requirements and translating them into functional and technical design specifications.
  • Implemented proofs of concept on migrating data from different databases (Teradata and MySQL) to Hadoop.
  • Involved in analysis, database design and development of BI, client/server and enterprise applications using SSIS, SSAS and SQL Server.
  • Extensive knowledge of data warehousing, OLAP, and dimensional data modelling for fact and dimension tables using Analysis Services.
  • Experience working with different and complex datasets, such as flat files, JSON, XML files, and databases, in combination with big data technologies.
  • Capable of designing fast and durable algorithms, with a strong understanding of statistical algorithms such as linear and logistic regression.
  • Expertise in implementing Web Services using SOAP, WSDL.
  • Expertise in design and development of various web applications with n-tier architecture using MVC pattern in J2EE environment.
  • Experienced in calculating measures and members in SQL Server Analysis Services (SSAS) using Multidimensional Expressions (MDX) and mathematical formulas.
  • Skilled at data transformations, tasks, containers, sources, and destinations, such as Derived Column, Conditional Split, Sort, and Merge Join transformations, to load data into the data warehouse.
  • Excellent experience in Database Design and Data Modelling (both OLTP & OLAP).
  • Excellent problem-solving, analytical, and interpersonal skills.
  • A fast learner and smart worker, quick to adapt to new technologies.
  • Good team player, strong interpersonal and communication skills combined with self-motivation, initiative and the ability to think outside the box.
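
The Java-based HDFS loading called out above could look roughly like the minimal sketch below. It assumes a standard Hadoop client configuration on the classpath; the class name and directory paths are illustrative placeholders, not the original project code.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsLoader {
        public static void main(String[] args) throws Exception {
            // Hypothetical locations; the real jobs were parameterized per source system.
            String localDir = args.length > 0 ? args[0] : "/tmp/incoming";
            String hdfsDir = args.length > 1 ? args[1] : "/data/raw";

            Configuration conf = new Configuration(); // reads core-site.xml / hdfs-site.xml from the classpath
            FileSystem fs = FileSystem.get(conf);

            Path target = new Path(hdfsDir);
            if (!fs.exists(target)) {
                fs.mkdirs(target);
            }
            // Copy the local directory into HDFS (keep the source, overwrite existing files).
            fs.copyFromLocalFile(false, true, new Path(localDir), target);
            fs.close();
        }
    }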

TECHNICAL SKILLS

Big Data/Hadoop: HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, Hue, Flume, Kafka, Spark, ZooKeeper, Cloudera Manager, Ambari, and Spark SQL.

Web/ Application Servers: Weblogic, Tomcat, JBoss

Web Technologies: HTML4/5, CSS3/2, XML, JavaScript, JQuery, AJAX, WSDL, SOAP

Tools and IDEs: Eclipse, NetBeans, Maven, DB Visualizer, Visual Studio 2008, SQL Server Management Studio

Databases: MS-SQL Server …, Oracle 9i/10g, Microsoft Access

Methodologies: Software Development Lifecycle (SDLC), Waterfall and Agile models, STLC (Software Testing Life Cycle), UML, Design Patterns (Core Java and J2EE), SSIS, SSAS, SSRS

Programming Languages: Java, AngularJS, Python, SQL, PL/SQL, AWS, HiveQL, Unix Shell Scripting, Scala

Web Tools: HTML, JavaScript, XML, ODBC, JDBC, Hibernate, JSP, Java, Struts, Spring, JUnit, JSON and Avro

Java/J2EE Technologies: Servlets, JSP, JDBC, JSTL, EJB, JAXB, JAXP, JMS, JAX-RPC, JAX- WS.

NoSQL Databases: HBase, MongoDB & Cassandra.

Operating Systems: Windows 2000/XP/Vista/7/8, Windows Server 2003, MS Office

PROFESSIONAL EXPERIENCE

Confidential, Detroit, Michigan

Hadoop Developer

Responsibilities:

  • Designed and migrated the existing RAN MSBI system to Hadoop 2.7.2.
  • Designed the control/job tables in HBase 2.x and MySQL 5.7; created external Hive 2.0 tables on HBase 2.0.
  • Developed a batch processing framework to ingest data into HDFS 2.7.2, Hive 2.0, and HBase 2.0.
  • Worked extensively on Hive and Pig 0.16 to analyze network data.
  • Automated data pulls from SQL Server to the Hadoop ecosystem via Sqoop 1.4.6.
  • Tuned Hive and Pig job parameters along with native MapReduce parameters to avoid excessive disk spills, and enabled temp file compression between jobs in the data pipeline (2.5) to handle production-size data in a multi-tenant cluster environment (Ambari Views, Analyze, Explain plan, etc.).
  • Designed workflows and coordinators in Oozie 2.4.0 to automate and parallelize Hive and Pig jobs in an Apache Hadoop environment from Hortonworks (HDP 2.2).
  • Hands-on writing complex Hive queries involving external Hive tables dynamically partitioned on date, which store a rolling-window time period of user viewing history.
  • Experienced in performance-tuning Hive scripts, Pig scripts, and MR jobs in a production environment by altering job parameters.
  • Involved in data modeling.
  • Delivered Hadoop migration strategy, roadmap and technology fitment.
  • Designed & implemented HBase tables, Hive UDFs & Sqoop'ed the data with complete ownership.
  • Automated many cross-technology tasks using shell scripting & defining the cron-tabs.
  • Worked collaboratively with different teams to smoothly move the project to production.
  • Built process automation for various jobs using Oozie.
  • Integrated HBase with Hive.
  • Worked extensively on performance tuning in Hive.
  • Experienced in working with Hortonworks and AWS on real-time issues and bringing them to closure.
  • Used Apache Kafka 0.10 for importing real-time network log data into HDFS.
  • POCs on moving existing Hive/Pig Latin jobs to Spark 1.6.1.
  • Deployed and configured Flume agents to stream log events into HDFS for analysis.
  • Loaded data into Hive tables using HiveQL, along with deduplication and windowing.
  • Generated ad-hoc reports using Hive to validate customer viewing history and debug issues in production.
  • Worked on HCatalog, which allows Pig and MapReduce to take advantage of the SerDe data-format transformation definitions written for Hive.
  • Worked on connecting Tableau 10.0 to Hive data and on using Spark as the execution engine for Tableau instead of MapReduce 2.0.
  • Worked on different file formats (ORCFILE, TEXTFILE) and different Compression Codecs (GZIP, SNAPPY, LZO)
  • Worked with multiple Input Formats such as Text File, Key Value, Sequence File input format.
  • Installed and configured various components of Hadoop ecosystem and maintained their integrity.
  • Planned production cluster hardware and software installation and communicated with multiple teams to get it done.
  • Designed, configured and managed the backup and disaster recovery for HDFS data.
  • Migrated data across clusters using DISTCP.
  • Experience in collecting metrics for Hadoop clusters using Ambari.
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand Hadoop clusters.
  • Worked with BI teams in generating the reports in Tableau
  • Worked with Java development teams on data parsing.
  • Worked on loading source data to HDFS by writing Java code.
  • Merged all small files and loaded them into HDFS using Java code; tracking history for the merged files is maintained in HBase (see the sketch after this list).
  • Developed a multi-threaded environment to improve the performance of merging operations.
  • Used the Hadoop Java API to develop the code.
  • Wrote a Java program to add or remove headers from the files.
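
As referenced in the bullets above, the small-file merge could be sketched roughly as below. It assumes plain-text files with a single header line; directory names are hypothetical, and the HBase history tracking and the thread pool that ran one merge per input directory are omitted for brevity.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.PrintWriter;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SmallFileMerger {

        // Merge every small file under inputDir into a single HDFS file,
        // keeping the header line only from the first file.
        public static void merge(FileSystem fs, Path inputDir, Path mergedFile) throws Exception {
            try (PrintWriter out = new PrintWriter(fs.create(mergedFile, true))) {
                boolean headerWritten = false;
                for (FileStatus status : fs.listStatus(inputDir)) {
                    if (!status.isFile()) {
                        continue;
                    }
                    try (BufferedReader in = new BufferedReader(
                            new InputStreamReader(fs.open(status.getPath())))) {
                        String line = in.readLine();          // header of this file
                        if (line != null && !headerWritten) { // keep the header once
                            out.println(line);
                            headerWritten = true;
                        }
                        while ((line = in.readLine()) != null) {
                            out.println(line);
                        }
                    }
                }
            }
        }

        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            merge(fs, new Path("/data/incoming/small-files"), new Path("/data/merged/part-00000"));
        }
    }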

Environment: HDFS, MapReduce, Spark SQL, Pig, Hive, HBase, Flume, Sqoop.

Confidential, Los Angeles, CA

Hadoop Developer

Responsibilities:

  • Installed, configured, and maintained Apache Hadoop clusters for analytics and application development, along with Hadoop tools like Hive, HSQL, Pig, HBase, OLAP, ZooKeeper, Avro, Parquet, and Sqoop on Arch Linux.
  • Wrote the shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
  • Experience working with DevOps.
  • Installed and configured Hadoop, MapReduce, and HDFS (Hadoop Distributed File System), and developed multiple MapReduce jobs in Java for data cleaning (see the sketch after this list).
  • Experience in structured modeling on unstructured data models.
  • Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Worked on installing cluster, commissioning & decommissioning of Data Nodes, Name Node recovery, capacity planning, and slots configuration.
  • Developed PIG Latin scripts to extract the data from the web server output files to load into HDFS.
  • Worked on Hortonworks Data Platform (HDP)
  • Worked with SPLUNK to analyze and visualize data.
  • Worked on Mesos cluster and Marathon.
  • Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
  • Worked with Orchestration tools like Airflow.
  • Wrote test cases, analyzed results, and reported test outcomes to product teams.
  • Good experience on Clojure, Kafka and Storm.
  • Worked with AWS data pipeline.
  • Worked with Elastic Search, Postgres, Apache NIFI.
  • Hadoop workflow management using Oozie, Azkaban, Hamake.
  • Responsible for developing data pipeline using Azure HDInsight, flume, Sqoop and pig to extract the data from weblogs and store in HDFS.
  • Installed Oozie workflow engine to run multiple Hive and Pig Jobs, used Sqoop to import and export data from HDFS to RDBMS and vice-versa for visualization and to generate reports.
  • Involved in migration of ETL processes from Oracle to Hive to test the easy data manipulation.
  • Worked in functional, system, and regression testing activities with agile methodology.
  • Worked on Python plugin on MySQL workbench to upload CSV files.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Worked with HDFS storage formats like Avro and ORC.
  • Worked with Accumulo to modify server-side key-value pairs.
  • Working experience with Shiny and R.
  • Working experience with Vertica, QlikSense, QlikView, and SAP BOE.
  • Worked with NoSQL databases like HBase, Cassandra, DynamoDB
  • Worked with AWS based data ingestion and transformations.
  • Good experience with Python, Pig, Sqoop, Oozie, Hadoop Streaming, Hive and Phoenix.
  • Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop.
  • Responsible for building scalable distributed data solutions using Hadoop, including cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and log files.
  • Developed several new MapReduce programs to analyze and transform the data to uncover insights into the customer usage patterns.
  • Responsible for running Hadoop streaming jobs to process terabytes of XML data, utilizing cluster coordination services through ZooKeeper.
  • Extensive experience using MOM with ActiveMQ, Apache Storm, Apache Spark, Kafka, Maven, and ZooKeeper.
  • Worked on the core and Spark SQL modules of Spark extensively.
  • Worked on descriptive statistics using R.
  • Developed Kafka producers and consumers, HBase clients, Spark, Shark, Streams, and Hadoop MapReduce jobs, along with components on HDFS and Hive.
  • Strong working experience with Snowflake and clickstream data.
  • Worked on Hadoop, EMC Greenplum, GemStone, and GemFire.
  • Analyzed the SQL scripts and designed the solution to implement using PySpark.
  • Experience using Spark with Neo4j to acquire interrelated graph information about the insurer and to query data from the stored graphs.
  • Experience writing large Scala programs for batch processing.
  • Load and transform large sets of structured, semi structured, and unstructured data using Hadoop/Big Data concepts.
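
The Java MapReduce data-cleaning jobs mentioned in this role might look roughly like the map-only sketch below; the pipe delimiter, expected field count, and class names are assumptions made for illustration rather than the original project code.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CleanRecordsJob {

        // Map-only job: keep pipe-delimited records with the expected column count,
        // count and drop everything else.
        public static class CleanMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
            private static final int EXPECTED_FIELDS = 12; // assumed column count

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split("\\|", -1);
                if (fields.length == EXPECTED_FIELDS) {
                    context.write(value, NullWritable.get());
                } else {
                    context.getCounter("cleaning", "malformed").increment(1);
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "clean-records");
            job.setJarByClass(CleanRecordsJob.class);
            job.setMapperClass(CleanMapper.class);
            job.setNumReduceTasks(0); // map-only
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(NullWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }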

Environment: Hadoop Cluster, HDFS, Hive, Pig, Sqoop, OLAP, data modelling, Linux, Hadoop MapReduce, HBase, Shell Scripting, MongoDB, Cassandra, Apache Spark, Neo4j.

Confidential - Madison, WI

Hadoop Developer

Responsibilities:

  • Worked on distributed/cloud computing (MapReduce/Hadoop, Hive, Pig, HBase, Sqoop, Spark, Avro, ZooKeeper, etc.) on Cloudera Distributed Hadoop (CDH4).
  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and processing.
  • Involved in installing Hadoop Ecosystem components.
  • Importing and exporting data into HDFS, Pig, Hive and HBase using SQOOP.
  • Responsible for managing data coming from different sources, ingested using Flume and from relational database management systems using Sqoop.
  • Involved in gathering the requirements, designing, development and testing.
  • Worked on loading and transformation of large sets of structured, semi structured data into Hadoop system.
  • Developed simple and complex MapReduce programs in Java for Data Analysis.
  • Loaded data from various data sources into HDFS using Flume.
  • Developed Pig UDFs to pre-process the data for analysis (see the sketch after this list).
  • Worked on Hue interface for querying the data.
  • Created Hive tables to store the processed results in a tabular format.
  • Developed Hive Scripts for implementing dynamic Partitions.
  • Developed Pig scripts for data analysis and extended its functionality by developing custom UDF's.
  • Extensive knowledge of Pig scripts using bags and tuples.
  • Experience in managing and reviewing Hadoop log files.
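
One of the Pig pre-processing UDFs mentioned above could be sketched roughly as follows; the field semantics and class name are hypothetical.

    import java.io.IOException;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    // A simple Pig EvalFunc that trims and upper-cases a string field before analysis.
    public class NormalizeField extends EvalFunc<String> {
        @Override
        public String exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null;
            }
            return input.get(0).toString().trim().toUpperCase();
        }
    }

    // Illustrative usage from a Pig Latin script:
    //   REGISTER my-udfs.jar;
    //   cleaned = FOREACH raw GENERATE NormalizeField(name);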

Environment: Hadoop (CDH4), UNIX, Eclipse, HDFS, Java, MapReduce, Apache Pig, Hive, HBase, Oozie, SQOOP and MySQL.

Confidential

Java Developer

Responsibilities:

  • Worked on the design and development of multithreaded n-tier application in a distributed environment to support new CMS System.
  • Involved in the prototype of FSD (functional specification document) and TDS (Technical design specification) for each process.
  • Wrote PL/SQL procedures according to the rule configurations.
  • Implemented J2EE Design Patterns like MVC, Service Locator and Session Facade.
  • Developed Web services to communicate to other modules using XML based SOAP.
  • Developed a MessageHandler adapter, which converts data objects into XML messages and invokes an enterprise service (and vice versa) using Java, JMS, and MQ Series.
  • Implemented business logic using Struts action components within the Struts and Hibernate framework.
  • Used multithreading for invoking the database and implemented complex business-logic modules using the Collections, Reflection, and Generics APIs.
  • Developed various JSP custom tag libraries (JSTL) to maximize code reusability.
  • Involved in implementation of the presentation layer (GUI) for the application using HTML, XHTML, CSS and JavaScript.
  • Involved in writing PL/SQL Stored Procedures, and Functions for Oracle 10g database.
  • Developed the application front-end with HTML, JSP, JQuery and Ajax to create a dynamic and interactive experience.
  • Developed ADF Model components (creation, configuration, and tuning of entity objects, view objects, application modules, bindings and data controls).
  • Developed stateless session EJBs to encapsulate the business logic.
  • Developed web services using EJB 3.x stateless session beans (see the sketch after this list).
  • Implemented the Spring dependency injection of the Database helper instance to the action objects.
  • Involved in writing the Maven based pom.xml scripts to build and deploy the application.
  • Developed complex queries using JPA annotations in the POJOs.
  • Developed and executed unit test cases using JUnit.
  • Deployed the application and tested on WebSphere Application Server.
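
The EJB 3.x stateless-session-bean web services noted above could be structured roughly as in the sketch below, using standard Java EE annotations; the service name and operation are hypothetical.

    import javax.ejb.Stateless;
    import javax.jws.WebMethod;
    import javax.jws.WebService;

    // A stateless session bean exposed as a SOAP web service (EJB 3.x style).
    @Stateless
    @WebService
    public class AccountService {

        @WebMethod
        public double getBalance(String accountId) {
            // Business logic / DAO lookup would go here; hard-coded for the sketch.
            return 0.0;
        }
    }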

Environment: Java 1.5, JEE 6, Spring 2.5, Hibernate 3.3, JSP 2.1, Servlet 3.0, Struts 2.2, DB2, JUnit, Maven 3, XML, SOAP, JMS, JavaScript, ADF, Oracle 9i, PLSQL, JDBC, UML, EJB, JBOSS.

Confidential

Java Developer

Responsibilities:

  • Developed Entity Java Beans (EJB) classes to implement various business functionalities (session beans)
  • Developed various end users screens using JSF, Servlet technologies and UI technologies like HTML, CSS and JavaScript.
  • Performed necessary validations of each screen developed by using AngularJS and JQuery.
  • Configured the Spring configuration file to make use of the DispatcherServlet provided by Spring IoC.
  • Separated secondary functionality from primary functionality using Spring AOP.
  • Developed stored procedures for regular database cleanup.
  • Prepared test cases and provided support to QA team in UAT.
  • Consumed web services for transferring data between different applications using RESTful APIs along with the Jersey API and JAX-RS (see the sketch after this list).
  • Built the application using TDD (Test Driven Development) approach and involved in different phases of testing like Unit Testing.
  • Responsible for fixing bugs based on the test results.
  • Involved in SQL statements, stored procedures, handled SQL Injections and persisted data using Hibernate Sessions, Transactions and Session Factory Objects.
  • Responsible for Hibernate Configuration and integrated Hibernate framework.
  • Analyzed and fixed the bugs reported in QTP and effectively delivered the bug fixes reported with a quick turnaround time.
  • Extensively used Java Collections API like Lists, Sets and Maps.
  • Used PVCS for version control and deployed the application on the JBoss server.
  • Used Jenkins to deploy the application in testing environment.
  • Involved in Unit testing of the application using JUnit.
  • Used SharePoint for collaborative work.
  • Involved in configuring JMS and JNDI in rational application developer (RAD)
  • Implemented Log4j to maintain system log.
  • Used Spring Repository to load data from the Oracle database to implement the DAO layer.
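
Consuming a RESTful service with JAX-RS, as mentioned above, could look roughly like the sketch below using the standard JAX-RS 2.x client API (Jersey is one implementation; the original work may have used the older Jersey 1.x client). The URL and resource path are hypothetical.

    import javax.ws.rs.client.Client;
    import javax.ws.rs.client.ClientBuilder;
    import javax.ws.rs.core.MediaType;

    // Consume a REST endpoint with the JAX-RS client API.
    public class CustomerRestClient {
        public static void main(String[] args) {
            Client client = ClientBuilder.newClient();
            String json = client
                    .target("http://example.com/api")   // hypothetical base URL
                    .path("customers")
                    .path("42")
                    .request(MediaType.APPLICATION_JSON)
                    .get(String.class);
            System.out.println(json);
            client.close();
        }
    }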

Environment: JDK1.5, EJB, JSF, Servlets, Html, CSS, JavaScript, AngularJS, JQuery, Spring IOC & AOP, REST, Jersey, JAX-RS, JBOSS, JUnit, Log4J, JMS, JNDI, SharePoint, RAD, JMS API.
