Senior Hadoop Developer Resume
Piscataway, NJ
SUMMARY
- Overall 8+ years of experience in a variety of industries including 3+ years of experience in Big Data Technologies (Apache Hadoop stack and Apache Spark) and 5+ years of experience in Java and Web Technologies.
- Hands-on experience in installing, configuring, and using Apache Hadoop ecosystem components such as HDFS, Hadoop MapReduce, ZooKeeper, Oozie, Hive, Sqoop, Pig, and Spark.
- Experience working with the Cloudera, Hortonworks, Amazon Web Services, and Microsoft Azure HDInsight Hadoop distributions.
- Responsible for ingesting structured data residing in traditional back-end databases into Hadoop and Hive using Sqoop.
- In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode, and of MapReduce concepts.
- Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Experience in job workflow scheduling and monitoring with Oozie, and in cluster coordination with ZooKeeper.
- Worked with Flume to load the log data from multiple sources directly into HDFS.
- Experience in optimizing MapReduce jobs using combiners and custom partitioners.
- Experienced in using Tableau Server to connect to MySQL RDBMS, Hive Server, and Impala for extracting data and creating reports and dashboards.
- Experience with Amazon Web Services (AWS) cloud services (EC2, EBS, S3).
- Developed UDFs in Java as needed for use with Pig and Hive queries.
- Experience in managing and reviewing Hadoop log files.
- Experience in implementing SOAP and RESTful Web Services.
- Good knowledge of Hadoop architecture, including HDFS and YARN and their components, and of the MapReduce programming paradigm.
- Knowledge of manipulating and analyzing large datasets and finding patterns and insights within structured and unstructured data.
- Proficient in UNIX Shell scripting.
- Excellent working knowledge of popular frameworks like Struts, Hibernate, and Spring MVC.
- Expertise in core Java, J2EE, multithreading, JDBC, and shell scripting, and proficient in using Java APIs for application development.
- Hands-on experience with Java concepts such as the Collections Framework, garbage collection, exception handling, and XML and JSON parsing.
- Good knowledge of web/application servers such as Apache Tomcat and IBM WebSphere.
- Experience in Agile Engineering practices.
- Techno-functional responsibilities include interfacing with users, identifying functional and technical gaps, estimates, designing custom solutions, development, leading developers, producing documentation, and production support.
- Excellent interpersonal and communication skills, creative, research-minded, technically competent and result-oriented with problem solving and leadership skills.
TECHNICAL SKILLS
Domain: Hadoop, Big Data, ETL, Legacy Modernization and Integration
Languages: Spark, Scala, MapReduce, Pig Latin, Python, PL/I, SQL, CQL, Assembler
Integration Tools: Informatica Power Center
Database & Warehouse: Hive, HBase, DB2, Cassandra
Framework / Tools: Hadoop, HDFS, Hive, Pig, ZooKeeper, Sqoop, Oozie, Databricks APIs, and various file formats (CSV, XML, Avro, Parquet, JSON)
Job Scheduling and Monitoring tool: Control-M
PROFESSIONAL EXPERIENCE
Confidential, Piscataway, NJ
Senior Hadoop Developer
Responsibilities:
- Researched and recommended a suitable technology stack for the Hadoop migration, considering the current enterprise architecture.
- Responsible for building scalable distributed data solutions using Hadoop.
- Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Developed Spark jobs and Hive jobs to summarize and transform data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames, Scala, and Python.
- Implemented Spark applications in Scala using higher-order functions for both batch and interactive analysis requirements.
- Experienced in developing Spark scripts for data analysis in both Python and Scala.
- Wrote Scala code to integrate Spark Streaming with Kafka as part of the Spark-Kafka integration effort.
- Built on-premises data pipelines using Kafka and Spark for real-time data analysis.
- Created reports in Tableau to visualize the resulting data sets, and tested the native Drill, Impala, and Spark connectors.
- Implemented complex Hive UDFs to execute business logic within Hive queries.
- Responsible for bulk-loading data into HBase using MapReduce by directly generating HFiles and loading them.
- Developed various custom filters and applied predefined filters to HBase data using the HBase client API.
- Evaluated the performance of Spark SQL vs. Impala vs. Drill on offline data as part of a POC.
- Worked on Solr configuration and customizations based on requirements.
- Implemented Spark jobs in Scala using DataFrames and the Spark SQL API for faster data processing.
- Imported data from various sources into HDFS using Sqoop, transformed it using Hive and MapReduce, and loaded the results back into HDFS.
- Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
- Collected and aggregated large amounts of log data using Flume and staged it in HDFS for further analysis.
- Responsible for developing a data pipeline by implementing Kafka producers and consumers and configuring brokers.
- Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Developed unit tests for MapReduce programs using the MRUnit testing library.
- Experience in managing and reviewing Hadoop Log files.
- Used the Oozie workflow engine to schedule multiple Hive and Pig jobs that run independently based on time and data availability.
- Set up Spark on Amazon EMR to process large volumes of data stored in Amazon S3.
- Developed Pig UDFs to manipulate data per business requirements and developed custom Pig loaders.
- Used Gradle to build and test the project.
- Fixed defects as needed during the QA phase, supported QA testing, troubleshot defects, and identified their root causes.
- Used Mingle and later moved to JIRA for task/bug tracking.
- Used Git for version control.
Environment: MapR, Cloudera, Hadoop, HDFS, AWS, Pig, Hive, Impala, Drill, Spark SQL, MapReduce, Flume, Sqoop, Oozie, Storm, Docker, Kafka, Spark, Scala, HBase, ZooKeeper, MySQL, Tableau, Shell Scripting, Java.
Confidential, Bowie, MD
Hadoop Developer
Responsibilities:
- Wrote complex MapReduce code in Java to process data across multiple clusters.
- Created Hive tables using HiveQL, loaded data, and ran Hive queries that execute as underlying MapReduce jobs.
- Involved in loading data into HBase using HBase Shell, HBase Client API, Pig and Sqoop.
- Incrementally updated Hive tables using Sqoop.
- Set up and deployed Apache Solr to significantly improve search performance.
- Installed, configured, supported, and managed Hadoop clusters using the Cloudera distribution.
- Used Flume as the ingestion tool for collecting, aggregating, and transporting streaming data from various web servers into HDFS as a centralized data store.
- Optimized Hive queries using data layout techniques such as partitioning and bucketing, as well as data sampling and custom file formats.
- Experienced in managing and reviewing the Hadoop log files.
- Built a Pig ETL application for a transaction model that ingests data from files and streams, uses UDFs to perform selection, iteration, and other transformations over the data, and stores the results in the Hadoop Distributed File System (HDFS).
- Created workflows and coordinated jobs using Oozie to automate tasks.
- Used the Avro data serialization system to work with JSON data formats.
- Processed different file formats, such as SequenceFiles, XML files, and MapFiles, using MapReduce programs.
Environment: Hadoop, MapReduce, Sqoop, Oozie, Pig, Hive, HBase, Flume, Linux, Java, Eclipse, PL/SQL, Toad 9.6, UNIX Shell Scripting, PuTTY, Hortonworks.
Confidential, Portland, OR
Hadoop Developer
Responsibilities:
- Transferred purchase transaction details from legacy systems to HDFS.
- Developed Java MapReduce programs to transform log data into a structured format to derive user location, age group, and time spent.
- Developed Pig UDFs to manipulate data per business requirements and developed custom Pig loaders.
- Collected and aggregated large amounts of weblog data from different sources such as web servers, mobile and network devices using Apache Flume and stored the data into HDFS for analysis.
- Worked on the Ingestion of Files into HDFS from remote systems using MFT (Managed File Transfer).
- Monitored and managed the Cassandra cluster.
- Analyzed the weblog data using HiveQL, and integrated Oozie with the rest of the Hadoop stack to support several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
- Installed and configured Flume, Hive, Pig, Sqoop, and Oozie on the Hadoop cluster.
- Wrote MapReduce jobs to parse the weblogs stored in HDFS.
- Developed services to run the MapReduce jobs on an as-needed basis.
- Imported and exported data between an Oracle 10.2 database and HDFS using Sqoop.
- Extracted data from the Cassandra NoSQL database through Sqoop and placed it in HDFS for processing.
- Responsible for managing data coming from different sources.
- Analyzed the data using Pig to extract the number of unique patients per day and the most purchased medicines.
- Implemented workflows using the Apache Oozie framework to automate tasks.
- Wrote UDFs for Hive and Pig that helped spot market trends.
- Good knowledge of running Hadoop Streaming jobs to process terabytes of XML-format data.
- Analyzed the functional specifications.
Environment: Hadoop, HDFS, Pig, Hive, Tez, Accumulo, Flume, Sqoop, Oozie, Cassandra.
Confidential, Kansas City, MO
Java/J2EE Developer
Responsibilities:
- Involved in various phases of the Software Development Life Cycle (SDLC), such as design, development, and unit testing.
- Developed and deployed the UI layer logic of sites using JSP, XML, JavaScript, HTML/DHTML, and Ajax.
- Used CSS and JavaScript to build rich internet pages.
- Followed the Agile Scrum methodology for the development process.
- Prepared design specifications for application development, covering the front end and back end, using design patterns.
- Developed prototype test screens in HTML and JavaScript.
- Developed JSPs for client data presentation and client-side data validation within the forms.
- Developed the application by using the Spring MVC framework.
- Used the Collections Framework to transfer objects between the different layers of the application.
- Developed data mapping to create a communication bridge between various application interfaces using XML and XSL.
- Used Spring IoC to inject values for dynamic parameters.
- Developed unit tests using the JUnit testing framework.
- Actively involved in code reviews and bug fixing to improve performance.
- Documented the application's functionality and its enhanced features.
- Created connections through JDBC and used JDBC statements to call stored procedures.
Environment: Spring MVC, Oracle 11g, J2EE, Java, JDBC, Servlets, JSP, XML, Design Patterns, CSS, HTML, JavaScript 1.2, JUnit, Apache Tomcat, MySQL.
Confidential
Java Developer
Responsibilities:
- Developed front end screens using JSP, HTML, CSS and JavaScript.
- Developed Java and J2EE applications using IBM Rational Application Developer (RAD) and Eclipse.
- Followed Java & J2EE design patterns and the coding guidelines to design and develop the application.
- Interacted with the QA team to understand the information included in the QA weekly report and its desired layout.
- Worked with the onsite team to come up with the design and implementation of the project.
- Developed modules to create, view, delete, and search the QA team's weekly reports using Java and JDBC.
- Performed front-end validation using JavaScript.
- Designed and created the database tables in DB2.
- Developed Data Access layer using JDBC for connecting to database.
- Used RTC version control for maintaining source code.
Environment: Java, IBM z/VM, DB2, RTC
