Senior Hadoop Developer Resume
Piscataway, NJ
SUMMARY OF EXPERIENCE:
- Over 8 years of experience across a variety of industries, including 3+ years in Big Data technologies (Apache Hadoop stack and Apache Spark) and 5+ years in Java and web technologies.
- Hands-on experience installing, configuring, and using Apache Hadoop ecosystem components such as HDFS, Hadoop MapReduce, ZooKeeper, Oozie, Hive, Sqoop, Pig, and Spark.
- Experience working with Cloudera, Hortonworks, Amazon Web Services, and Microsoft Azure HDInsight Hadoop distributions.
- Responsible for ingesting structured data from traditional back-end databases into Hadoop and Hive using Sqoop.
- In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode, as well as MapReduce concepts.
- Experience analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Experience with job workflow scheduling and coordination tools such as Oozie and ZooKeeper.
- Worked with Flume to load log data from multiple sources directly into HDFS.
- Experience optimizing MapReduce algorithms using combiners and partitioners to reduce shuffle volume and improve job performance.
- Experienced in connecting Tableau Server to MySQL, HiveServer, and Impala to extract data and create reports and dashboards.
- Experience with AWS cloud services (EC2, EBS, S3).
- Developed UDFs in Java as needed for use in Pig and Hive queries.
- Experience managing and reviewing Hadoop log files.
- Experience implementing SOAP and RESTful web services.
- Good knowledge of Hadoop architecture, including HDFS and YARN and their components, and of the MapReduce programming paradigm.
- Knowledge of manipulating and analyzing large datasets and finding patterns and insights within structured and unstructured data.
- Proficient in UNIX shell scripting.
- Excellent working knowledge of popular frameworks such as Struts, Hibernate, and Spring MVC.
- Expertise in core Java, J2EE, multithreading, JDBC, and shell scripting; proficient in using Java APIs for application development.
- Hands-on experience implementing Java concepts such as the Collections Framework, garbage collection, exception handling, and XML and JSON parsing.
- Good knowledge of web/application servers such as Apache Tomcat and IBM WebSphere.
- Experience with Agile engineering practices.
- Techno-functional responsibilities include interfacing with users, identifying functional and technical gaps, preparing estimates, designing custom solutions, developing, leading developers, producing documentation, and providing production support.
- Excellent interpersonal and communication skills; creative, research-minded, technically competent, and results-oriented, with problem-solving and leadership skills.
TECHNICAL SKILLS
Domain: Hadoop, Big Data, ETL, Legacy Modernization and Integration
Languages: Spark, Scala, MapReduce, Pig Latin, Python, PL/I, SQL, CQL, Assembler
Integration Tools: Informatica Power Center
Databases & Data Warehouse: Hive, HBase, DB2, Cassandra
Frameworks / Tools: Hadoop, HDFS, Hive, Pig, ZooKeeper, Sqoop, Oozie, Databricks APIs, and various file formats (CSV, XML, Avro, Parquet, JSON)
Job Scheduling and Monitoring tool: Control-M
PROFESSIONAL EXPERIENCE
Confidential, Piscataway, NJ
Senior Hadoop Developer
Responsibilities:
- Research and recommend a suitable technology stack for Hadoop migration based on the current enterprise architecture.
- Responsible for building scalable distributed data solutions using Hadoop.
- Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Developed Spark jobs and Hive jobs to summarize and transform data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames, Scala, and Python.
- Expertise in implementing Spark Scala applications using higher-order functions for both batch and interactive analysis requirements.
- Experienced in developing Spark scripts for data analysis in both Python and Scala.
- Wrote Scala scripts to make Spark Streaming work with Kafka as part of Spark-Kafka integration efforts.
- Built on-premises data pipelines using Kafka and Spark for real-time data analysis.
- Created reports in Tableau to visualize the generated data sets, and tested native Drill, Impala, and Spark connectors.
- Implemented complex Hive UDFs to execute business logic with Hive queries.
- Responsible for bulk-loading data into HBase using MapReduce by directly creating HFiles and loading them.
- Developed various custom filters and applied predefined filters to HBase data using the HBase API.
- Evaluated the performance of Spark SQL vs. Impala vs. Drill on offline data as part of a POC.
- Worked on Solr configuration and customizations based on requirements.
- Implemented Spark applications in Scala, utilizing DataFrames and the Spark SQL API for faster data processing.
- Imported data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and then loaded the results into HDFS.
- Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
- Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
- Responsible for developing data pipelines by implementing Kafka producers and consumers and configuring brokers.
- Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms.
- Developed unit tests for MapReduce programs using the MRUnit testing library.
- Managed and reviewed Hadoop log files.
- Used the Oozie workflow engine to schedule multiple Hive and Pig jobs that run independently based on time and data availability.
- Set up Spark on Amazon EMR to process large volumes of data stored in Amazon S3.
- Developed Pig UDFs to manipulate data according to business requirements and developed custom Pig loaders.
- Used Gradle for building and testing the project.
- Fixed defects as needed during the QA phase, supported QA testing, troubleshot defects, and identified their root causes.
- Used Mingle, and later JIRA, for task and bug tracking.
- Used Git for version control.
Environment: MapR, Cloudera, Hadoop, HDFS, AWS, Pig, Hive, Impala, Drill, Spark SQL, MapReduce, Flume, Sqoop, Oozie, Storm, Docker, Kafka, Spark, Scala, HBase, ZooKeeper, MySQL, Tableau, Shell Scripting, Java.
Confidential, Bowie, MD
Hadoop Developer
Responsibilities:
- Wrote complex MapReduce code in Java to process data across multiple clusters.
- Created Hive tables using HiveQL, loaded data, and ran Hive queries that invoke the underlying MapReduce programs.
- Involved in loading data into HBase using the HBase shell, HBase client API, Pig, and Sqoop.
- Incrementally updated Hive tables using Sqoop.
- Set up and deployed Apache Solr to significantly improve search performance.
- Installed, configured, supported, and managed Hadoop clusters using Cloudera.
- Used the ingestion tool Flume to collect, aggregate, and transport streaming data from various web servers to HDFS as a centralized data store.
- Optimized Hive queries using data layout techniques such as partitioning and bucketing, as well as data sampling and custom file formats.
- Experienced in managing and reviewing Hadoop log files.
- Built a Pig application for an ETL transaction model that ingests data from files and streams, uses UDFs to perform selection, iteration, and other transformations, and stores the results in the Hadoop Distributed File System.
- Created workflows and managed coordination among jobs using Oozie to automate tasks.
- Worked with the Avro data serialization system to handle JSON data formats.
- Worked with different file formats, such as SequenceFiles, XML files, and MapFiles, using MapReduce programs.
Environment: Hadoop, MapReduce, Sqoop, Oozie, Pig, Hive, HBase, Flume, Linux, Java, Eclipse, PL/SQL, Toad 9.6, UNIX Shell Scripting, PuTTY, Hortonworks.
Confidential, Portland, OR
Hadoop Developer
Responsibilities:
- Transferred purchase transaction details from legacy systems to HDFS.
- Developed Java MapReduce programs to transform log data into a structured form in order to identify user location, age group, and time spent.
- Developed Pig UDFs to manipulate data according to business requirements and developed custom Pig loaders.
- Collected and aggregated large amounts of weblog data from different sources, such as web servers, mobile devices, and network devices, using Apache Flume, and stored the data in HDFS for analysis.
- Worked on ingesting files into HDFS from remote systems using MFT (Managed File Transfer).
- Monitored and managed a Cassandra cluster.
- Analyzed the weblog data using HiveQL, and integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
- Installed and configured Flume, Hive, Pig, Sqoop, and Oozie on the Hadoop cluster.
- Wrote MapReduce jobs to parse the weblogs stored in HDFS.
- Developed services to run MapReduce jobs on an as-needed basis.
- Imported and exported data between HDFS and an Oracle 10.2 database using Sqoop.
- Extracted data from the NoSQL database Cassandra through Sqoop and placed it in HDFS for processing.
- Responsible for managing data coming from different sources.
- Analyzed the data using Pig to extract the number of unique patients per day and the most purchased medicines.
- Implemented workflows using the Apache Oozie framework to automate tasks.
- Wrote UDFs for Hive and Pig that helped spot market trends.
- Good knowledge of running Hadoop Streaming jobs to process terabytes of XML-format data.
- Analyzed the functional specifications.
Environment: Hadoop, HDFS, Pig, Hive, Tez, Accumulo, Flume, Sqoop, Oozie, Cassandra.
Confidential, Kansas City, MO
Java/J2EE Developer
Responsibilities:
- Involved in various phases of the Software Development Life Cycle (SDLC), such as design, development, and unit testing.
- Developed and deployed the UI-layer logic of sites using JSP, XML, JavaScript, HTML/DHTML, and Ajax.
- Used CSS and JavaScript to build rich internet pages.
- Followed the Agile Scrum methodology for the development process.
- Created design specifications for application development, covering both front end and back end, using design patterns.
- Developed prototype test screens in HTML and JavaScript.
- Involved in developing JSPs for client data presentation and client-side data validation within the forms.
- Developed the application using the Spring MVC framework.
- Used the Collections Framework to transfer objects between the different layers of the application.
- Developed data mappings to create a communication bridge between various application interfaces using XML and XSL.
- Used Spring IoC to inject values for the dynamic parameters.
- Used the JUnit framework for unit-level testing.
- Actively involved in code review and bug fixing to improve performance.
- Documented the application's functionality and enhanced features.
- Created connections through JDBC and used JDBC statements to call stored procedures.
Environment: Spring MVC, Oracle 11g, J2EE, Java, JDBC, Servlets, JSP, XML, Design Patterns, CSS, HTML, JavaScript 1.2, JUnit, Apache Tomcat, MySQL.
Confidential
Java Developer
Responsibilities:
- Developed front end screens using JSP, HTML, CSS and JavaScript.
- Developed Java and J2EE applications using Rational Application Developer (RAD) and Eclipse.
- Followed Java and J2EE design patterns and the coding guidelines to design and develop the application.
- Interacted with the QA team to understand the information that is part of the QA weekly report, as well as the desired layout.
- Worked with the onsite team to come up with the design and implementation of the project.
- Developed modules to create, view, delete, and search the QA team's weekly reports using Java and JDBC.
- Performed front-end validation using JavaScript.
- Designed and created the database tables in DB2.
- Developed the data access layer using JDBC to connect to the database.
- Used RTC version control for maintaining source code.
Environment: Java, IBM z/VM, DB2, RTC
