Sr. Big Data/ Hadoop Developer Resume
Philadelphia, PA
PROFESSIONAL SUMMARY:
- 9+ years of IT experience with expertise in Hadoop, HDFS, HBase, Hive, Sqoop, Oozie, SQL, PL/SQL, Teradata, Netezza and SQL Server, with hands-on project experience in various vertical applications including Banking, Financial Services, Department of Health & Education, and eSales.
- Highly dedicated, results-oriented Hadoop Developer with 8+ years of end-to-end Hadoop development experience and varying levels of expertise across different Big Data Hadoop projects.
- Expertise in core Hadoop and the Hadoop technology stack, including HDFS, MapReduce, Oozie, Hive, Sqoop, Pig, Flume, HBase, Spark, Storm, Kafka and ZooKeeper.
- Hands-on experience in installing and deploying Hadoop ecosystem components such as MapReduce, YARN, HDFS, NoSQL, HBase, Oozie, Hive, Tableau, Sqoop, Pig, ZooKeeper and Flume.
- Well versed in installing, configuring, supporting and managing Big Data systems and the underlying infrastructure of Hadoop clusters.
- Experienced in implementing complex algorithms on semi-structured and unstructured data using MapReduce programs.
- Experience with the Big Data Hadoop ecosystem in the ingestion, storage, querying, processing and analysis of big data.
- Explored Spark, Kafka, and Storm along with other open source projects to create a POC.
- Hands-on experience in developing MapReduce programs using Apache Hadoop for analyzing Big Data.
- Experience in importing and exporting data between RDBMS and HDFS, Hive tables and HBase using Sqoop.
- Experienced in working with structured data using HiveQL, join operations, Hive UDFs, partitions, bucketing and internal/external tables.
- Experienced in migrating ETL operations using Pig transformations, operations and UDFs.
- Good knowledge of Python.
- Excellent working knowledge of Spark Core, Spark SQL and Spark Streaming.
- Developed a fan-out workflow using Flume to ingest data from various sources such as web servers and REST APIs, writing the data into Hadoop through the HDFS sink.
- Experience in importing and exporting data using Sqoop between HDFS and relational database systems such as MySQL and SQL Server.
- Actively involved in coding using Core Java and Collections API types such as Lists, Sets and Maps.
- Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
- Experience with different operating systems such as UNIX, Linux and Windows.
- Hands-on experience in Web Services using XML, HTML, JSON, jQuery and Ajax.
- Strong knowledge of Agile and Waterfall development methodologies, applied to minimize customer impact.
- Expertise in middle-tier design and development of web and enterprise applications using technologies such as JSP, Servlets, Struts, Hibernate, Spring, JDBC, shell script, XML, AJAX, and Web Services.
- Good understanding of all aspects of testing, such as Unit, Regression, Agile, White-box and Black-box.
- Self-motivated and highly organized, with the ability to manage deadlines effectively and multitask.
TECHNICAL SKILLS:
Big Data Platforms: Cloudera, Big Data, Hadoop, YARN, MapReduce, Pig, Hive, Storm, Kafka, Oozie, Impala, Ignite, Flume and Spark
Languages: Java, C++, Python
Databases: Oracle, MySQL, SQL Server
NoSQL Databases: HBase, Cassandra, MongoDB, Accumulo
Job Scheduling Framework: AutoSys, Quartz Scheduler
Operating Systems: Linux, Unix, Windows 7, Windows 8, XP, Windows Vista
Hadoop Distribution: Cloudera, Hortonworks, AWS
Web Technologies: HTML, XHTML, JavaScript
Data Modelling tools: MS Visio, Rational Rose
Work Environments: Eclipse
PROFESSIONAL EXPERIENCE:
Confidential, Philadelphia, PA
Sr. Big Data/ Hadoop Developer
Responsibilities:
- Extracted data from Teradata/MySQL into HDFS using Sqoop import/export.
- Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms.
- Expertise in using data organization design patterns in MapReduce to convert business data into custom formats.
- Worked extensively on importing data using Sqoop.
- Implemented custom joins in Spark SQL to create tables containing item records.
- Expertise in optimizing MapReduce algorithms using Combiners, Partitioners and Distributed Cache to deliver the best results.
- Experienced in passing data from multiple sources to the reducer at the same time using ObjectWritable in MapReduce programs.
- Working knowledge of RESTful APIs such as Elasticsearch.
- Loaded log data into HDFS using Flume and Kafka.
- Experienced with data processing and pipelining using Apache Crunch.
- Analyzed the data by performing Hive queries and running Pig scripts.
- Created and ran Sqoop jobs with incremental load to populate Hive external tables.
- Developed Hive scripts for end user / analyst requirements to perform ad hoc analysis.
- Involved in writing UNIX shell scripts for the Informatica ETL tool to run the sessions.
- Developed UDFs in Java as needed for use in Hive queries.
- Developed Oozie workflow for scheduling and orchestrating the ETL process.
- Implemented authentication using Kerberos and authorization using Apache Sentry.
- Deployed an Apache Solr search engine server to help speed up searches of government cultural assets.
- Developed and implemented a migration path from multiple Play instances to a clustered Akka actor system, using Scala capped collections as an event bus.
- Performed iterative algorithms using Apache Spark on top of Hadoop YARN.
Environment: Hadoop, HDFS, Flume, Sqoop, Spark, Pig, Hive, MapReduce, Elasticsearch, HBase, Oozie, MRUnit, Maven, Avro, Scala, Linux, SVN, MySQL, Kafka.
Confidential, Albany, NY
Sr. Big Data/ Hadoop Developer
Responsibilities:
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing, analyzing and testing the classifier using MapReduce, Pig and Hive jobs.
- Built real-time data pipelines by leveraging open-source tools such as Apache Kafka and Spark.
- Worked with Kafka for the proof of concept for carrying out log processing on a distributed system.
- Analyzed the data by performing Hive queries and running Pig scripts.
- Analyzed the user requirements for the Regional Office and how they relate to the existing NYSE-CON project.
- Developed the application using PL/SQL.
- Involved in the complete SDLC of a big data project, including requirement analysis, design, coding, testing and production.
- Developed scripts and AutoSys jobs to schedule an Oozie bundle (group of coordinators) consisting of various Hadoop programs.
- Worked with the Database Specialist and Technical Architect on the design of the application.
- Created Hive tables with appropriate static and dynamic partitions for efficiency and worked on them using HiveQL.
- Used Sqoop to import data from RDBMS into Hive tables.
- Managed and reviewed Hadoop logs.
- Responsible for moving the source code to Production.
- Involved in gathering, documenting and reviewing requirements from the work streams and performance teams.
- Created Visio diagrams for the complete flow of the application.
- Involved in the mock-up design work with the Java Architect and Analyst for the UI.
- Responsible for moving the source code to UAT.
- Responsible for installation of Oracle software on Windows.
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Linux, XML, Cloudera, CDH3/4 Distribution, Oracle 11i, MySQL, Flume, Oozie, HBase
Confidential, Long Island, NY
Sr. Big Data/ Hadoop Developer
Responsibilities:
- Involved in writing MapReduce jobs.
- Used Hive for transformations, event joins, bot traffic filtering and some pre-aggregations before storing the data in HDFS.
- Involved in developing Hive queries and UDFs for functionality not available out of the box in Apache Hive.
- Involved in using Sqoop for importing and exporting data into HDFS and Hive.
- Involved in extracting user data from various data sources into HDFS.
- Implemented commissioning and decommissioning of nodes in the existing cluster.
- Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
- Used Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive and Sqoop as well as system specific jobs.
- Used Avro and Parquet in MapReduce jobs with Hadoop, Sqoop, Hive and Impala.
- Collected and aggregated large amounts of log data and staged it in HDFS for further analysis.
- Analyzed the data by performing Hive queries and running Pig scripts to understand user behavior.
- Participated in evaluation and selection of new technologies to support system efficiency.
- Participated in development and execution of system and disaster recovery processes.
- Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster data processing.
- Actively contributed to developing a POC on streaming data using Apache Kafka and Spark Streaming.
- Involved in preparing the Proof of Concept and the presentations to demonstrate the Data Integration solution to business users.
- Worked with Agile Scrum methodologies.
- Analyzed new opportunities for the group, including daily interaction with the team to understand the business flow and analyze how technology could improve time efficiency in the business workflow.
Environment: Hadoop, Hive 1.2, Oozie, Spark, Kafka, SQL Developer, TOAD, Oracle, Data Point, Agile - Version One, Windows 8, Unix, Teradata SQL Assistant, Agility, SQL Server.
Confidential, New Jersey
Hadoop Developer
Responsibilities:
- Wrote MapReduce programs and Pig scripts to specify the conditions for separating fraudulent claims.
- Good knowledge and understanding of REST architecture style and its application to well-performing web sites for global usage.
- Worked on the Cloudera distribution of Hadoop.
- Worked on optimizing the shuffle and sort phase of MapReduce jobs.
- Experience in writing business logic using Hive UDFs to perform ad hoc queries on structured data.
- Experience with Hive DDL and Hive Query Language (HQL).
- Worked on dashboards that internally use Hive queries to perform analytics on structured, Avro and JSON data.
- Implemented the data interface to retrieve customer information through a REST API, pre-processed the data using MapReduce and stored it in HDFS.
- Experience with Sqoop to import/export data between RDBMS and HDFS.
- Configured Oozie workflows to automate data flow, preprocessing and cleaning tasks using Hadoop actions.
- Implemented GenericWritable to incorporate multiple data sources into the reducer for recommendation-based reports using MapReduce programs.
- Implemented MapReduce programs to find the top failure locations of ATMs using different tracking devices.
- Used Cassandra CQL with Java APIs to retrieve data from Cassandra tables.
- Implemented optimized joins to perform analysis on different data sets using MapReduce programs.
- Experienced in handling Avro and JSON data in Hive using Hive SerDes.
Environment: Hadoop, MapReduce, Yarn, Hive, Pig, HBase, Oozie, Sqoop, Flume, Oracle 11g, Cassandra, Eclipse
Confidential, NY
Hadoop Developer/ Admin
Responsibilities:
- Involved in requirement analysis, design, coding and implementation.
- Responsible for building scalable distributed data solutions using Cloudera Hadoop.
- Supported data analysis projects using Elastic MapReduce on the Amazon Web Services (AWS) cloud, and exported and imported data to and from S3.
- Processed data into HDFS by developing solutions and analyzed the data using Map Reduce, PIG, and Hive to produce summary results from Hadoop to downstream systems.
- Used Sqoop to export data from Hadoop Distributed File System (HDFS) to RDBMS.
- Developed custom MapReduce programs to analyze data and used Pig Latin to clean unwanted data.
- Participated in Solr schema design and ingested data into Solr for indexing.
- Extensive experience in designing and implementing Data Flow pipeline from RDBMS to Hadoop.
- Worked on S3 buckets on AWS to store CloudFormation templates.
- Worked on AWS to create EC2 instances.
- Worked on various performance optimizations such as using distributed cache for small datasets, partitioning, bucketing and map-side joins.
- Created Hive tables and ran HQL queries on them for data validation.
- Responsible for installation and configuration of Hive, Pig, HBase and Sqoop on the Hadoop cluster.
- Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts.
- Used ZooKeeper to manage coordination among the clusters.
- Worked with Impala to pull the data from Hive tables.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs which run independently with time and data availability.
- Hands-on experience with NoSQL databases such as MongoDB and Cassandra for a POC (proof of concept) storing URLs, images, products and supplements information in real time.
Environment: HDFS, Hadoop, Pig, Hive, Sqoop, Flume, MapReduce, Oozie, MongoDB, Java 6/7, Oracle 10g, Subversion, Toad, UNIX Shell Scripting, SOAP, REST services, Agile Methodology, JIRA, AutoSys.
Confidential, GA
Java Developer/ Hadoop Developer
Responsibilities:
- Participated in use case meetings to understand and analyze the requirements, and coded according to the prototype.
- Developed various UI (User Interface) components using Struts (MVC), JSP, and HTML.
- Developed Controllers, created JSPs and configured them in the struts-config.xml and web.xml files.
- Implemented the MVC architecture along with the Business Delegate, Service Locator, Session Facade, Data Access Object and Singleton patterns.
- Involved in writing all client-side validations using JavaScript and JSON.
- Involved in the complete development, testing and maintenance process of the application.
- Used Hibernate as the ORM tool to communicate with the database.
- Designed and created a web-based test client using Struts at the client’s request, used to test the different parts of the application.
- Involved in writing the test cases for the application using JUnit.
- Made extensive use of JSP, HTML, and CSS to develop a more user-friendly presentation layer.
- Involved in different Testing phases like Unit Test, Integration Test and Regression Test.
- Involved in the development process, using tracking tools such as JIRA.
- Involved in RESTful Web services with jQuery using the Jackson API.
- Involved in Web services (SOAP, RESTful) testing using the Infor EAM Web Service toolkit.
Environment: Core Java, JSP, Servlets, Struts, EJB 2.0, Ext JS, XML, Oracle 11g, PostgreSQL, JavaScript, Web Service, SQL Server 2008 R2, Eclipse, TOAD, JIRA, SVN, Tortoise, Log4j.
