Sr. Hadoop/Spark Developer Resume
Atlanta, GA
SUMMARY
- Around 10 years of progressive experience in the IT industry with proven expertise in architecting and implementing software solutions using Java and Big Data technologies.
- Around 7 years of experience in batch analytics using the Hadoop environment, including MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Oozie, and Sqoop.
- Worked extensively on real-time analytics using Storm and Spark Streaming; used ingestion tools such as Flume, Kafka, and Sqoop.
- In-depth understanding of Hadoop architecture and its components, such as ResourceManager, NodeManager, ApplicationMaster, NameNode, and DataNode.
- Experience in importing and exporting data between Relational Database Systems and HDFS using Sqoop.
- Extended Hive and Pig core functionality with custom User Defined Functions (UDF), User Defined Table-Generating Functions (UDTF), and User Defined Aggregating Functions (UDAF).
- Developed applications using the Spring Framework, leveraging the classic Model View Controller (MVC) architecture.
- Designed and built user interfaces using Spring and JavaScript, and employed collection libraries.
- Designed a website based on user requirements and validated the web pages using jQuery in conjunction with Java Spring/Hibernate.
- Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Developed Pig Latin scripts for data cleansing and transformation.
- Scheduled and monitored job workflows using tools such as Oozie and IBM Tivoli.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from different sources like Storm and Spark.
- Good experience with Cloudera, Hortonworks, and Apache Hadoop distributions.
- Worked with relational database systems (RDBMS) such as MySQL, MS SQL Server, and Oracle, as well as NoSQL databases such as HBase and Cassandra.
- Assisted with performance tuning and monitoring of Kafka, HBase, Storm, Pig and Hive.
- Used Shell scripting to move log files into HDFS.
- Good hands-on experience creating RDDs and DataFrames for the required input data and performing data transformations using Spark.
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark for data aggregation and queries, writing data back into RDBMS through Sqoop (a sketch of this pattern appears at the end of this summary).
- Good understanding of real-time data processing using Spark; imported data from sources such as HDFS and HBase into Spark RDDs.
- Experience in writing MapReduce jobs in Python for complex queries.
- Experienced with different file formats like Parquet, ORC, CSV, Text, Sequence, XML, JSON and Avro files.
- Good knowledge of data modeling and data mining to model data per business requirements.
- Involved in unit testing of MapReduce programs using Apache MRUnit.
- Installed, configured, upgraded and administrated Linux Operating Systems.
- Experience writing in-house UNIX shell scripts for Hadoop and Big Data development.
- Good knowledge of Python and Bash scripting.
- Expert in data visualization using Tableau to create complex and innovative dashboards.
- Extensively used Java and J2EE technologies such as Core Java, Java Beans, Servlets, JSP, Spring, Hibernate, JDBC, JSON, and design patterns.
- Experienced in Application Development using Java, J2EE, JSP, Servlets, RDBMS, Tag Libraries, JDBC, Hibernate and XML.
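The Spark/Scala pattern called out above (DataFrame transformations with a UDF, with the aggregated result handed back toward an RDBMS) can be sketched roughly as follows; the paths, column names, and table names are illustrative placeholders, not the actual project values.

```scala
// Minimal sketch: register a UDF, aggregate with DataFrames, persist the result.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object OrderAggregationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("order-aggregation-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical UDF that normalizes a free-form region code.
    val normalizeRegion = udf((raw: String) =>
      Option(raw).map(_.trim.toUpperCase).getOrElse("UNKNOWN"))

    // Read raw CSV data from HDFS into a DataFrame (placeholder path).
    val orders = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/raw/orders")

    // Aggregate order amounts per normalized region.
    val totals = orders
      .withColumn("region", normalizeRegion(col("region_code")))
      .groupBy("region")
      .agg(sum("amount").as("total_amount"), count(lit(1)).as("order_count"))

    // Persist as a Hive table; a Sqoop export (or JDBC write) can then move
    // the result into the downstream RDBMS for reporting.
    totals.write.mode("overwrite").saveAsTable("region_order_totals")

    spark.stop()
  }
}
```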
TECHNICAL SKILLS
Big Data/Hadoop Ecosystem: HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Flume, Oozie, Spark, Storm, Kafka, Impala.
Java / J2EE Technologies: Core Java, Servlets, JSP, JDBC, XML, REST, SOAP
Programming Languages: C, C++, Java, Scala, SQL, PL/SQL, Linux shell scripts, Python.
NoSQL Databases: MongoDB, Cassandra, HBase
Database: Oracle 11g/10g, DB2, MS-SQL Server, MySQL, Teradata.
Web Technologies: HTML, XML, JDBC, JSP, CSS, JavaScript, AJAX, SOAP, Angular JS
Frameworks: MVC, Hibernate 3, Spring 3/2.5/2
Tools: Eclipse, IntelliJ, PuTTY, WinSCP, NetBeans, QC, QlikView.
Operating System: Ubuntu (Linux), Win 95/98/2000/XP, Mac OS
Methodologies: Agile/Scrum, and Waterfall
Distributed platforms: Hortonworks, Cloudera, MapR.
PROFESSIONAL EXPERIENCE
Confidential, Atlanta, GA
Sr. Hadoop /Spark Developer
Responsibilities:
- Imported bulk data into HBase using MapReduce programs.
- Wrote a Storm topology to accept events from a Kafka producer and emit them into HBase.
- Developed a data pipeline using Kafka and Storm to store data in HDFS.
- Ingested large volumes of data into HDFS using Apache Kafka.
- Implemented proofs of concept (PoCs) using Kafka, Storm, and HBase for processing streaming data.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Experience in deploying data from various sources into HDFS and building reports using Tableau.
- Performed real time analysis on the incoming data.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Involved in loading data from the UNIX file system to HDFS.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data (a sketch follows this list).
- Worked with the Spark ecosystem using Spark SQL and Scala queries on different formats such as text and CSV files.
- Developed the UNIX shell scripts for creating the reports from Hive data.
- Developed Shell scripts and Python programs to automate tasks.
- Created UNIX shell scripts for parameterizing the Sqoop and Hive jobs.
- Continuously monitored and managed the Hadoop cluster using Hortonworks; worked on Oozie workflows to run multiple jobs.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
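A rough sketch of the RDD plus Spark SQL processing described in this role, assuming a pipe-delimited event log under a placeholder HDFS path; the field layout and names are hypothetical.

```scala
// Minimal sketch: pair-RDD aggregation exposed to Spark SQL for ad-hoc queries.
import org.apache.spark.sql.SparkSession

object EventCountSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("event-count-sketch").getOrCreate()
    import spark.implicits._

    // Load delimited event records from HDFS into an RDD (hypothetical path/layout).
    val lines = spark.sparkContext.textFile("hdfs:///data/events/input")

    // Build a pair RDD of (eventType, 1) and aggregate counts in memory.
    val counts = lines
      .map(_.split("\\|"))
      .filter(_.length > 2)
      .map(fields => (fields(2), 1L))
      .reduceByKey(_ + _)

    // Convert to a DataFrame and query it with Spark SQL.
    val countsDf = counts.toDF("event_type", "event_count")
    countsDf.createOrReplaceTempView("event_counts")
    spark.sql(
      "SELECT event_type, event_count FROM event_counts ORDER BY event_count DESC LIMIT 20"
    ).show()

    spark.stop()
  }
}
```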
Environment: Hadoop MapReduce, HDFS, Spark, Java, Kafka, Hive, HBase, Maven, Jenkins, Pig, UNIX, Python, MRUnit, Git, Storm, Hortonworks, Oozie.
Confidential, Atlanta, GA
Hadoop Developer
Responsibilities:
- Developed Java MapReduce programs for the analysis of sample log files stored in the cluster.
- Developed MapReduce programs for data analysis and data cleaning.
- Developed Pig Latin scripts for the analysis of semi-structured data.
- Worked on analyzing data in Hadoop clusters using Big Data analytic tools including MapReduce, Pig, and Hive.
- Responsible for managing data coming from different sources.
- Used Hive, created Hive tables, and was involved in data loading and writing Hive UDFs.
- Optimized Hive tables using techniques such as partitioning and bucketing to provide better performance for HiveQL queries (see the sketch after this list).
- Involved in developing Pig scripts and storing unstructured data in HDFS.
- Used Sqoop to import data into HDFS and Hive from other data systems.
- Migrated ETL processes from MySQL to Hive to enable easier data manipulation.
- Developed Hive queries to process the data for visualizing.
- Installed, configured, upgraded and administrated Linux Operating Systems.
- Developed UNIX shell scripts to perform ELT operations on big data, such as running Sqoop, creating external/internal Hive tables, and initiating HQL scripts.
- Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
- Used Impala to read, write, and query Hadoop data stored in HDFS and HBase.
- Utilized Tableau capabilities such as data extracts, data blending, forecasting, dashboard actions, and table calculations.
- Implemented Spark SQL to connect to Hive to read the data, using distributed processing to make it highly scalable.
- Collected data from an AWS S3 bucket in near real time using Spark Streaming, performed the necessary transformations and aggregations to build the data model, and persisted the data in HDFS.
- Implemented test scripts to support test driven development and continuous integration.
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in working with Elastic MapReduce (EMR) and setting up environments on Amazon AWS EC2 instances.
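The partition/bucket optimization mentioned above, expressed through Spark SQL with Hive support; the table, columns, and bucket count are hypothetical, and the dynamic-partition load shown in the trailing comment would normally be run from Hive itself.

```scala
// Sketch of a partitioned, bucketed Hive layout; names and sizes are placeholders.
import org.apache.spark.sql.SparkSession

object HiveLayoutSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-layout-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Partition by load date and bucket by customer id so HiveQL filters and
    // joins on those columns scan far less data.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS sales_optimized (
        |  order_id    BIGINT,
        |  customer_id BIGINT,
        |  amount      DOUBLE
        |)
        |PARTITIONED BY (load_date STRING)
        |CLUSTERED BY (customer_id) INTO 32 BUCKETS
        |STORED AS ORC""".stripMargin)

    // The dynamic-partition load is typically run from Hive (Beeline) so that
    // bucketing is enforced, e.g.:
    //   SET hive.exec.dynamic.partition.mode = nonstrict;
    //   INSERT OVERWRITE TABLE sales_optimized PARTITION (load_date)
    //   SELECT order_id, customer_id, amount, load_date FROM sales_staging;

    spark.stop()
  }
}
```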
Environment: HDFS, Map Reduce, Hive, Sqoop, Pig, Impala, HBase, Oozie, CDH distribution, MySQL, Tableau, Java, Eclipse, Shell Scripts, Spark, Windows, Linux.
Confidential, Cincinnati, OH
Hadoop Developer
Responsibilities:
- Involved in automating clickstream data collection and storage in HDFS using Flume.
- Involved in creating Data Lake by extracting customer's data from various data sources into HDFS.
- Used Sqoop to load data from Oracle Database into HDFS.
- Developed MapReduce programs to cleanse the data in HDFS obtained from multiple data sources.
- Involved in creating Hive tables as per requirement defined with appropriate static and dynamic partitions.
- Used Hive to analyze the data in HDFS to identify issues and behavioral patterns.
- Involved in production Hadoop cluster setup, administration, maintenance, monitoring and support.
- Successfully loaded files into Hive and HDFS from Cassandra (one possible approach is sketched after this list).
- Integrated Cassandra as a distributed persistent metadata store to provide metadata resolution for network entities on the network.
- Analyzed the Cassandra database and compared it with other open-source NoSQL databases to determine which best suited the current requirements.
- Provided cluster coordination services through ZooKeeper.
- Efficiently put and fetched data to/from HBase by writing MapReduce jobs.
- Developed MapReduce jobs to automate transfer of data from/to HBase.
- Created data queries and reports using QlikView and Excel; created custom queries/reports designed for qualifying verification and information sharing.
- Assisted with the addition of Hadoop processing to the IT infrastructure.
- Used Flume to collect web logs from the online ad servers and push them into HDFS.
- Implemented and executed MapReduce jobs to process the log data from the ad servers.
- Load and transform large sets of structured, semi structured and unstructured data.
- Back-end Java developer for the Data Management Platform (DMP), building RESTful APIs to build dashboards and to let other groups build their own.
- Responsible for building scalable distributed data solutions using Hortonworks.
- Integrated HiveServer2 with Tableau using the Hortonworks Hive ODBC driver for auto-generation of Hive queries for non-technical business users.
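One possible way the Cassandra-to-HDFS/Hive loads above could be expressed, sketched with the DataStax Spark Cassandra Connector (assumed to be on the classpath); the connection host, keyspace, table, and paths are placeholders, and the original project may have used a different mechanism entirely.

```scala
// Sketch: read a Cassandra table and land it in HDFS (Parquet) and Hive.
import org.apache.spark.sql.SparkSession

object CassandraToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cassandra-to-hdfs-sketch")
      .config("spark.cassandra.connection.host", "cassandra-host.example.com") // placeholder host
      .enableHiveSupport()
      .getOrCreate()

    // Read a Cassandra table through the connector's DataFrame source
    // (keyspace and table names are hypothetical).
    val metadata = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "dmp", "table" -> "network_entities"))
      .load()

    // Persist a copy to HDFS as Parquet ...
    metadata.write.mode("overwrite").parquet("hdfs:///data/lake/network_entities")

    // ... and expose the same data as a Hive table for downstream analysis.
    metadata.write.mode("overwrite").saveAsTable("network_entities")

    spark.stop()
  }
}
```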
Confidential
Java/J2EE/Hadoop Developer
Responsibilities:
- Participated in requirement gathering and converting the requirements into technical specifications.
- Created UML diagrams like use cases, class diagrams, interaction diagrams, and activity diagrams.
- Developed the application using the Struts Framework, leveraging the classic Model View Controller (MVC) architecture.
- Extensively worked on User Interface for few modules using JSPs, JavaScript and Ajax.
- Created business logic using Servlets and POJOs and deployed them on WebLogic Server.
- Wrote complex SQL queries and stored procedures.
- Developed the XML Schema and Web services for the data maintenance and structures.
- Implemented the Web Service client for the login authentication, credit reports and applicant information using Apache Axis 2 Web Service.
- Responsible for managing data coming from different sources.
- Developed MapReduce algorithms.
- Gained good experience with NoSQL databases.
- Installed and configured Hive and wrote Hive UDFs (a sketch follows this list).
- Integrated Hadoop with Solr and implemented search algorithms.
- Designed the logical and physical data model, generated DDL scripts, and wrote DML scripts for Oracle 10g database.
- Used Hibernate ORM framework with Spring framework for data persistence and transaction management.
- Used the Struts validation framework for form-level validation.
- Wrote test cases in JUnit for unit testing of classes.
- Involved in creating templates and screens in HTML and JavaScript.
- Involved in integrating Web Services using SOAP.
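A minimal sketch of a Hive UDF of the kind referenced in this role, written against the classic org.apache.hadoop.hive.ql.exec.UDF API; the masking logic and function name are hypothetical examples, not the original project's UDFs.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical UDF: masks the local part of an email address.
class MaskEmail extends UDF {
  // Hive resolves evaluate() by reflection; a null input yields a null output.
  def evaluate(input: Text): Text = {
    if (input == null) {
      null
    } else {
      val value = input.toString
      val at = value.indexOf('@')
      if (at <= 1) new Text(value)
      else new Text(value.substring(0, 1) + "***" + value.substring(at))
    }
  }
}

// Registered and used from Hive, e.g.:
//   ADD JAR /path/to/udfs.jar;
//   CREATE TEMPORARY FUNCTION mask_email AS 'MaskEmail';
//   SELECT mask_email(email) FROM applicants;
```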
Environment: Hive 0.7.1, Apache Solr 3.x, HBase 0.90.x/0.20.x, JDK 1.5, Struts 1.3, WebSphere 6.1, HTML, XML, JavaScript, JUnit 3.8, Oracle 10g, Amazon Web Services.
Confidential, Houston, TX
Java Developer
Responsibilities:
- Involved in prototyping, proof of concept, design, Interface Implementation, testing and maintenance.
- Designed and developed front view components using HTML and JSP.
- Developed Interactive web pages using AJAX and JavaScript.
- Developed UI navigation using Struts MVC architecture (JSP, JSF, tiles, JSTL, Custom Tags).
- Created services for various modules like Account (CD/Checking/Savings) Creation and Maintenance using Struts framework.
- Developed reusable utility classes in core java for validation which are used across all modules.
- Developed Java classes implementing business logic using EJB 3.0 (stateless session, entity, and message-driven beans).
- Used JNDI to support transparent access to distributed components, directories, and services.
- Provided data persistence via Hibernate for CRUD operations in the application.
- Configured and tested the application with database server Oracle 10g.
- Used Oracle server databases as the application back end and generated queries using Toad.
- Deployed and tested the application on Tomcat servers.
- Used CVS for version control.
- Responsible for writing JUnit test cases and Peer level testing.
- Involved in bug fixing using Jira.
- Involved in developing various reusable Helper and Utility classes using Core Java, which are being used across all the modules of the application.
Environment: Java 1.4, HTML 4, JavaScript, JSP 2.2, JSTL 1.2, Struts 2.0, EJB 3.0, Hibernate 3.0, JNDI, XML, AJAX, SOAP, WSDL, UML, Shell Scripting, JUnit, log4j, JMS, Apache Tomcat 6.0, JBoss 5.0, Oracle 10g Database, Toad, CVS, Eclipse, Windows NT, Unix/Linux.