
Sr. Hadoop/Spark Developer Resume


Cincinnati, OH

SUMMARY:

  • 8+ years of professional experience involving project development, implementation, deployment and maintenance using Java/J2EE and Big Data related technologies.
  • Hadoop Developer with 4+ years of working experience in designing and implementing complete end-to-end Hadoop infrastructure using HDFS, MapReduce, Spark, Yarn, Kafka, PIG, HIVE, Sqoop, Storm, Solr, Flume, Oozie, Impala, HBase, Zookeeper, etc.
  • Good experience in creating data ingestion pipelines, data transformations, data management, data governance and real time streaming at an enterprise level.
  • Profound experience in creating real-time data streaming solutions using Apache Spark/Spark Streaming, Kafka and Flume.
  • Expertise in developing MapReduce jobs to scrub, sort, filter, join and query data.
  • Experience developing Pig Latin and HiveQL scripts for data analysis and ETL purposes, and extended default functionality by writing User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) for custom, data-specific processing (a minimal UDF sketch appears after this list).
  • Good hands-on experience in full life cycle implementation using MapR, CDH (Cloudera) and HDP (Hortonworks Data Platform).
  • Strong Knowledge on Architecture of Distributed systems and Parallel processing, In-depth understanding of MapReduce Framework and Spark execution framework.
  • Experience in working with Python, Scala and Hadoop Streaming Command options.
  • Profound understanding of Partitions and Bucketing concepts in Hive and designed both Managed and External tables in Hive to optimize performance.
  • Experience in handling messaging services using Apache Kafka.
  • Experience integrating Kafka with Storm and Spark for real-time data processing.
  • Experience migrating data to and from relational databases into HDFS using Sqoop.
  • Experience in job workflow scheduling and monitoring with Oozie, and in cluster coordination with Zookeeper.
  • Worked on NoSQL databases including HBase, Cassandra and MongoDB.
  • Strong experience in collecting and storing stream data like log data in HDFS using Apache Flume.
  • Experience in working with the Java HBase API for ingesting processed data into HBase tables.
  • Experience with Oozie Workflow Engine to automate and parallelize Hadoop Map/Reduce, Hive and Pig jobs.
  • Proficient in using Cloudera Manager, an end-to-end tool to manage Hadoop operations in Cloudera Cluster.
  • Hands on experience in Capacity planning, monitoring and Performance Tuning of Hadoop Clusters.
  • Analyzed data with Hue, using Apache Hive via Hue's Beeswax and Catalog applications.
  • Good experience in working with cloud environment like Amazon Web Services (AWS) EC2 and S3.
  • Proficient in visualizing data using Tableau, QlikView, MicroStrategy and MS Excel.
  • Experience in developing ETL scripts for data acquisition and transformation using Informatica and Talend.
  • Knowledge of Enterprise Data Warehouse (EDW) architecture and data modeling concepts such as star schema and snowflake schema, as well as Teradata.
  • Worked on Amazon Redshift, the data warehouse service that is part of Amazon Web Services (AWS).
  • Experienced in Java application development, client/server applications, and Internet/intranet-based applications using Core Java, J2EE patterns, Spring, Hibernate, Struts, JMS, Web Services (SOAP/REST), Oracle, SQL Server and other relational databases.
  • Experience writing Shell scripts in Linux OS and integrating them with other solutions.
  • Experience in creating complex data warehouse and application roadmaps and BI implementation specializing in Teradata platforms and design, development and maintenance of ETL code.
  • Fluent with core Java concepts like I/O, multi-threading, exceptions, RegEx, collections, data structures and serialization.
  • Experienced in using Agile methodologies including extreme programming, SCRUM and Test Driven Development (TDD).
  • Experienced in creating and analyzing Software Requirement Specifications (SRS) and Functional Specification Documents (FSD). Strong knowledge of the Software Development Life Cycle (SDLC).
  • Highly organized and professional; able to work to strict deadlines with attention to detail; excellent written and verbal communication skills.
  • Excellent global exposure to various work cultures and client interaction with diverse teams.
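
For illustration, a minimal Hive UDF of the kind referenced in this summary might look like the sketch below; the class name and the normalization logic are hypothetical, not taken from a specific project.

```java
// Hypothetical Hive UDF: trims and lower-cases a string column.
// It would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before use.
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public final class NormalizeString extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;                                   // pass nulls through, as Hive expects
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```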

TECHNICAL SKILLS:

Programming Languages: C, Java, PL/SQL, Pig Latin, Python, HiveQL, Scala, SQL

Big Data Eco System: HDFS, MapReduce, Apache Crunch, Hive, Pig, Impala, HBase, Sqoop, NoSQL (HBase, Cassandra), Hadoop Streaming, ZooKeeper, Oozie, Solr, Kafka and Flume.

J2EE Technologies: J2EE, Servlets, JSP 2.1/2.2, EJB 2.1/3.0, JDBC, MVC Architecture, JavaBeans, JNDI, RMI, JMS, Java, ANT 1.8, JavaScript, Spring

J2EE Frameworks: Struts2, Hibernate, Spring 4.2/3.0, JUnit, Log4j, ANT, MAVEN

Distributed Platforms: Hortonworks, Cloudera, MapR

Scripting Languages: Unix Shell Scripting, Perl.

Methodologies: Agile, SDLC, Waterfall, RUP

Version Control Tools: SVN, CVS, GITHUB

Operating Systems: Windows 7/2000/98/XP/NT, UNIX, Linux, Mac OS

Web Technologies: JavaScript, HTML, jQuery, AJAX, JSF, Web Services, SOAP, REST, WSDL, UDDI, AngularJS

Web/Application Servers: Apache Tomcat 5.x, BEA WebLogic 9.0/10, WebSphere 8.5.5, JBoss 7.1, GlassFish

NoSQL Technologies: Cassandra, MongoDB, HBase.

Databases: Oracle 11g/12C, MySQL, MS-SQL Server, Teradata, IBM DB2

Software Tools: Eclipse IDE, NetBeans, Dreamweaver, Workbench, ANT, JUnit 4.1, DTD, XML Schema, TOAD, Visual Studio, Oracle SQL Developer, Tortoise SVN

PROFESSIONAL EXPERIENCE:

Sr. Hadoop/Spark Developer

Confidential, Cincinnati, OH

Responsibilities:

  • Developed simple to complex MapReduce jobs using Java language for processing and validating the data.
  • Developed data pipelines using Sqoop, Spark, MapReduce and Hive to ingest, transform and analyze customer behavioral data.
  • Implemented Spark jobs using Python and Spark SQL for faster data processing and for real-time analysis algorithms in Spark.
  • Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL databases for huge volumes of data.
  • Used the Spark-Cassandra Connector to load data to and from Cassandra, and streamed data in real time using Spark with Kafka.
  • Developed Kafka producers and consumers in Java, integrated them with Apache Storm, and ingested data into HDFS and HBase by implementing the rules in Storm (a minimal producer sketch appears after this list).
  • Developed efficient MapReduce programs in Python to perform batch processing on huge unstructured datasets.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python.
  • Handled importing data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and then loaded the data into HDFS.
  • Exported the analyzed data to the relational databases using Sqoop, to further visualize and generate reports for the BI team.
  • Analyzed the data by performing Hive queries (Hive QL) and running Pig scripts (Pig Latin) to study customer behavior.
  • Created HBase tables and column families to store the user event data and wrote automated HBase test cases for data quality checks using HBase command line tools.
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume .
  • Scheduled and executed workflows in Oozie to run Hive and Pig jobs, and created UDFs to store specialized data structures in HBase and Cassandra.
  • Developed NiFi workflows to pick up multiple retail files from an FTP location and move them to HDFS on a daily basis.
  • Worked with developer teams on NiFi workflows to pick up data from a REST API server, from the data lake and from an SFTP server, and send it to a Kafka broker.
  • Evaluated Hortonworks NiFi (HDF 2.0) and recommended a solution to ingest data from multiple data sources into HDFS and Hive using NiFi, including importing data from Linux servers with the NiFi tool.
  • Developed product profiles using Pig and commodity UDFs, and developed Hive scripts in HiveQL to de-normalize and aggregate the data.
  • Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames and pair RDDs.
  • Tuned Spark/Python code to improve the performance of machine learning algorithms for data analysis.
  • Performed data validation on the data ingested using MapReduce by building a custom model to filter all the invalid data and cleanse the data.
  • Developed interactive shell scripts for scheduling various data cleansing and data loading process.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Applied Teradata and DBMS concepts for early instance creation.
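
For illustration, a minimal Java Kafka producer along the lines of the producers described in this role; the broker address, topic name and event payload are placeholders rather than project values.

```java
// Hypothetical Kafka producer publishing customer events as JSON strings.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // placeholder broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // one record per customer event; a Storm topology or Spark Streaming job consumes the topic
            producer.send(new ProducerRecord<>("customer-events", "user-42",
                    "{\"action\":\"click\",\"page\":\"home\"}"));
        }
    }
}
```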

Environment: Hadoop, MapReduce, Yarn, Spark, Hive, Pig, Kafka, HBase, Oozie, Sqoop, Python, Bash/Shell Scripting, Flume, Cassandra, Oracle 11g, Core Java, Storm, HDFS, Unix, Teradata, NiFi, Eclipse.

Sr. Hadoop Developer

Confidential, Santa Clara, CA

Responsibilities:

  • Developed data pipelines using Flume, Sqoop, Pig, Java MapReduce and Spark to ingest customer behavioral data and purchase histories into HDFS for analysis.
  • Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDDs/MapReduce in Spark for data aggregation, queries, and writing data back into the OLTP system through Sqoop (an equivalent aggregation is sketched after this list).
  • Along with the infrastructure team, designed and developed a Kafka- and Storm-based data pipeline that also uses Amazon Web Services EMR, S3 and RDS.
  • Developed and ran MapReduce jobs on multi-petabyte YARN/Hadoop clusters that process billions of events every day, generating daily and monthly reports per user needs.
  • Used AWS Data Pipeline to schedule Amazon EMR cluster to clean and process web server logs stored in Amazon S3 bucket.
  • Migrated data on AWS between different database platforms, such as SQL Server to Amazon Aurora, using RDS tooling.
  • Used Storm to consume events coming through Kafka and generate sessions and publish them back to Kafka.
  • Worked with multi-node cluster tools that offer several commands to report HBase usage.
  • Developed end-to-end data processing pipelines that begin with receiving data via the distributed messaging system Kafka and end with persisting the data into HBase.
  • Optimized MapReduce code and Pig scripts, and performed user interface analysis, performance tuning and analysis.
  • Used Pig for transformations, event joins and pre-aggregations performed before loading JSON-format files onto HDFS.
  • Analyzed the SQL scripts and designed the solution to implement using Scala.
  • Worked on loading clickstream data into a structured format using Spark/Scala on top of HDFS.
  • Worked on migrating MapReduce programs into Spark transformations using Scala .
  • Written Hive queries using optimized ways like user-defined functions, customizing Hadoop shuffle & sort parameters.
  • Involved in creating Hive tables, working on them using HiveQL, and performing data analysis using Hive and Pig.
  • Developed workflow in Oozie to manage and schedule jobs on Hadoop cluster to trigger daily, weekly and monthly batch cycles.
  • Experienced in migrating HiveQL into Impala to minimize query response time.
  • Used MongoDB as part of a POC and migrated a few of the SQL stored procedures to MongoDB.
  • Wrote Unix shell scripts for business processes and for loading data from different interfaces to HDFS.
  • Delivered tuned, efficient and error-free code for new Big Data requirements using technical knowledge of Hadoop and its ecosystem.
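
The aggregation work in this role was done in Scala; the sketch below shows a roughly equivalent Spark DataFrame aggregation written in Java for consistency with the other samples here. The input path, column names and output table are illustrative assumptions.

```java
// Hypothetical aggregation: page views per user per day over clickstream JSON in HDFS.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;

public class ClickstreamAggregation {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("clickstream-aggregation")
                .enableHiveSupport()
                .getOrCreate();

        // raw clickstream events landed on HDFS as JSON (path is a placeholder)
        Dataset<Row> clicks = spark.read().json("hdfs:///data/clickstream/");

        // count page views per user per day
        Dataset<Row> daily = clicks
                .groupBy(col("user_id"), col("event_date"))
                .count()
                .withColumnRenamed("count", "page_views");

        // persist as a partitioned Hive table for downstream reporting
        daily.write().mode("overwrite").partitionBy("event_date")
             .saveAsTable("analytics.daily_page_views");

        spark.stop();
    }
}
```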

Environment: Hadoop, YARN, Cloudera Manager, Scala, Splunk, Redhat Linux, Bash/Shell Scripting, Unix, AWS, EMR, GIT, HBase, MongoDB, CentOS, Storm, Java, NoSQL, Kafka, Perl, Cloudera Navigator.

Hadoop Developer

Confidential, New York, NY

Responsibilities:

  • Installed and configured Hadoop Ecosystem components and Cloudera manager using CDH distribution.
  • Developed multiple MapReduce jobs in Java for complex business requirements including data cleansing and preprocessing (a minimal cleansing mapper is sketched after this list).
  • Developed MapReduce (YARN) programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
  • Developed Sqoop scripts to import/export data from Oracle to HDFS and into Hive tables.
  • Worked on analyzing data in Hadoop clusters using Big Data analytic tools including MapReduce, Pig and Hive.
  • Worked on automation of delta feeds from Teradata using Sqoop, also from FTP Servers to Hive.
  • Involved in developing and writing Pig scripts to process and store unstructured data in HDFS.
  • Involved in creating tables in Hive and writing scripts and queries to load data into Hive tables from HDFS.
  • Scripted complex Hive QL queries on Hive tables for analytical functions.
  • Optimized the Hive tables using optimization techniques like partitions and bucketing to provide better performance with Hive QL queries.
  • Modeled Hive partitions extensively for data separation and faster data processing and followed Pig and Hive best practices for tuning.
  • Developed custom Java record readers, partitioners and serialization techniques.
  • Used different data formats (Text format and Avro format) while loading the data into HDFS.
  • Created tables in HBase and loaded data into the HBase tables.
  • Developed scripts to load data from HBase to the Hive metastore and performed MapReduce jobs.
  • Created custom UDF’s in Pig and Hive.
  • Implemented a POC using Apache Impala for data processing on top of Hive.
  • Created partitioned tables and loaded data using both static and dynamic partition methods.
  • Installed the Oozie workflow engine and scheduled it to run date/time-dependent Hive and Pig jobs.
  • Designed and developed Dashboards for Analytical purposes using Tableau.
  • Analyzed the Hadoop log files using Pig scripts to track errors.
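
For illustration, a minimal data-cleansing mapper of the kind developed in this role; the delimiter and expected field count are assumptions, not values from the actual feeds.

```java
// Hypothetical cleansing mapper: keeps only well-formed pipe-delimited records.
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CleansingMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

    private static final int EXPECTED_FIELDS = 8;             // assumed record width

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\\|", -1);  // assumed pipe delimiter
        if (fields.length == EXPECTED_FIELDS && !fields[0].isEmpty()) {
            context.write(value, NullWritable.get());         // emit valid records unchanged
        }
        // malformed records are dropped; a counter could be added to track them
    }
}
```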

Environment: Hadoop, HDFS, MapReduce, Impala, Hive, Sqoop, Teradata, Pig, HBase, Unix, Oozie, CDH distribution, Java, Eclipse, Shell Scripts, Tableau, Windows, Linux.

Hadoop Developer

Confidential, Norwalk, CT

Responsibilities:

  • Understood business needs, analyzed functional specifications and mapped them to the design and development of MapReduce programs and algorithms.
  • Created Hive Tables, loaded transactional data from Teradata using Sqoop.
  • Developed MapReduce jobs for cleaning, accessing and validating the data.
  • Implemented Hive Generic UDFs to incorporate business logic into Hive queries.
  • Analyzed web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration and the most visited pages on the website (a sketch of this analysis over Hive JDBC appears after this list).
  • Written Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying on the log data.
  • Designed and developed Pig Latin scripts to process data in batch to perform trend analysis.
  • Wrote Pig scripts to transform raw data from several data sources.
  • Hands-on experience using Hive partitioning and bucketing, executing different types of joins on Hive tables, and implementing Hive SerDes such as JSON and Avro.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig and Sqoop.
  • Worked on Developing custom MapReduce programs and User Defined Functions (UDFs) in Hive to transform the large volumes of data with respect to business requirement.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Involved in building applications using Maven and integrating with continuous integration servers like Jenkins to build jobs.
  • Involved in End-to-End implementation of ETL logic.
  • Performing data migration from Legacy Databases RDBMS to HDFS using Sqoop.
  • Worked collaboratively with all levels of business stakeholders to architect, implement and test Big Data based analytical solution from disparate sources.
  • Involved in Agile methodologies, daily Scrum meetings, Sprint planning.
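
For illustration, the unique-visitors-per-day analysis described above could be run against HiveServer2 over JDBC as sketched below; the connection URL, table and column names are assumptions.

```java
// Hypothetical report: unique visitors per day from a web-log table in Hive.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class WebLogReport {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hiveserver:10000/weblogs", "hive", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT log_date, COUNT(DISTINCT visitor_id) AS unique_visitors "
                   + "FROM page_views GROUP BY log_date ORDER BY log_date")) {
            while (rs.next()) {
                System.out.println(rs.getString("log_date") + "\t" + rs.getLong("unique_visitors"));
            }
        }
    }
}
```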

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Unix, Oozie, Maven, Teradata, Shell Scripting, CDH3, Cloudera Manager

Sr. Java Developer

Confidential, Kenilworth, NJ

Responsibilities:

  • Involved in analysis and design phases of Software Development Life Cycle (SDLC). Developed the functionalities using Agile Methodology (Test Driven Development).
  • Used Eclipse and JBoss to develop source code and debug the application.
  • Integrated Spring with Hibernate for persistence with the database.
  • Developed various components using the Bounce framework, a customized Spring framework.
  • Used the Hibernate ORM framework as the persistence engine, configured O/R mappings and wrote Hibernate queries.
  • Configured the Hibernate SessionFactory, Hibernate mapping files, dependencies between delegate classes, DAOs, controller classes, validation classes and domain objects as part of the Spring configuration file.
  • Used Hibernate HQL to query the database. Involved in Migrating the JDBC Code into Hibernate and implemented various features using Collection APIs.
  • Used the Hibernate Criteria API to apply filtration rules and logical conditions to persistent objects (a minimal sketch appears after this list).
  • Implemented web services using WSDL/SOAP and created web services and clients to consume them.
  • Involved in front-end design using HTML, CSS, AJAX, JavaScript, AngularJS and Bootstrap.
  • Designed and implemented the Business Delegate and Session Façade design patterns.
  • Worked on FreeMarker templates to generate .ftl files and included them in JSP pages.
  • Worked on Spring Security for authentication of users in the application.
  • Developed JSP pages using Struts2 tags and used Tiles in JSP for reusable code.
  • Developed algorithms for various search functionality tasks using MarkLogic, XQuery and XPath, and used metadata to interact extensively with binary data in MarkLogic.
  • Used JSON and XML documents extensively with the MarkLogic NoSQL database; REST API calls were made using Node.js and the Java API.
  • Used jQuery to generate data tables and sort columns, applied jQuery validations to form fields, and used masking for input fields such as SSN and date.
  • Used IBM Data Studio to view and edit the tables.
  • Worked on Spring cron-trigger jobs to schedule automated nightly jobs that generate the Medicaid enrollment/disenrollment data file and FTP it to the welfare inbound folder.
  • Used Jenkins for continuous integration to load the JAR files from the database required for running the application.
  • Resolved bugs/defects in the application by coordinating with project team members to assure a positive outcome, using JIRA.
  • Experience in Object Oriented Analysis and Design (OOAD) techniques using UML in Rational Rose and MS Visio .
  • Developed Struts action classes and integrated Struts with Hibernate to connect to the database.
  • Wrote unit testing codes using JUnit, resolved bugs and other defects using Firebug and Eclipse's debugging tool.
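
For illustration, a minimal use of the Hibernate Criteria API as described above; the Member entity and its properties are assumptions rather than the actual domain model.

```java
// Hypothetical Criteria query: active members in a given state, ordered by last name.
import java.util.List;
import org.hibernate.Criteria;
import org.hibernate.Session;
import org.hibernate.criterion.Order;
import org.hibernate.criterion.Restrictions;

/** Assumed Hibernate-mapped entity; the real mapping lived in annotations or hbm.xml. */
class Member {
    private Long id;
    private String state;
    private Boolean active;
    private String lastName;
    // getters and setters omitted for brevity
}

public class MemberDao {

    @SuppressWarnings("unchecked")
    public List<Member> findActiveMembersByState(Session session, String state) {
        Criteria criteria = session.createCriteria(Member.class)
                .add(Restrictions.eq("state", state))          // logical condition
                .add(Restrictions.eq("active", Boolean.TRUE))  // filtration rule
                .addOrder(Order.asc("lastName"));
        return criteria.list();
    }
}
```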

Environment: Java, JSP, JDK, Spring MVC 3, Spring Security, Struts, Hibernate, Apache Tiles, MarkLogic, JavaScript, Ajax, jQuery, CSS3, JAX-WS, Web Services, SOAP, WSDL, DB2, JUnit 4, Log4j, SOAP UI, GIT, Maven, Unix, Eclipse, SVN, JBoss 7, WebSphere 8.5, JIRA, IBM Data Studio, SQL, FTP.

Java Developer

Confidential, Camp Hill, PA

Responsibilities:

  • Implemented the application using Agile methodologies with sprints and scrums.
  • Developed the web application using the Struts framework.
  • Developed user interfaces using JSP, HTML and CSS.
  • Developed the DAO design pattern to hide access to data source objects (a minimal sketch appears after this list).
  • Worked directly with product owners to gather requirements and implement them.
  • Actively participated in planning sessions for the Sprints , effort estimations, Backlog refinements and dividing features into User Stories and Tasks.
  • Involved in the implementation of REST- and SOAP-based web services.
  • Worked on improving the performance of the application.
  • Used SVN for software configuration management and version control.
  • Wrote scripts for AJAX implementations on the website, created components, used AngularJS/jQuery for client-side form validations, and used JSON for creating objects in JavaScript.
  • End-to-end design, setup, integration and maintenance of CI/CD pipeline from source-control to production.
  • Worked closely with QA, business and architecture teams to resolve defects quickly and meet deadlines.
  • Followed Use Case Design Specification and developed Class and Sequence Diagrams using RAD, MS Visio .
  • Involved in writing JUnit test cases and testing functionality, including smoke testing and integration testing.
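
For illustration, a minimal DAO of the kind described above: the interface hides how records are fetched, so callers never touch the data source directly. The Account type and the table and column names are assumptions.

```java
// Hypothetical DAO: interface plus a JDBC implementation backed by a DataSource.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

class Account {
    final long id;
    final String name;
    Account(long id, String name) { this.id = id; this.name = name; }
}

interface AccountDao {
    Account findById(long id) throws SQLException;
}

public class JdbcAccountDao implements AccountDao {
    private final DataSource dataSource;

    public JdbcAccountDao(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    @Override
    public Account findById(long id) throws SQLException {
        String sql = "SELECT id, name FROM accounts WHERE id = ?";
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? new Account(rs.getLong("id"), rs.getString("name")) : null;
            }
        }
    }
}
```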

Environment: Java/J2EE, Struts 2, JDBC, Hibernate, AngularJS, JSP, jQuery, MS Visio, Tomcat, WebLogic, Oracle 11g, REST Web Services, Apache CXF, JUnit, SVN, GIT, Maven, Rally.

Java Developer

Confidential, Broomfield, CO

Responsibilities:

  • Involved in design and development of core product with J2EE & Struts2 architecture for Application development.
  • Involved in developing the internal workflow using Action classes for online and data-migration transactions for this product, and implemented the application using the Struts framework, which follows the Model-View-Controller design pattern for clean separation of business logic from the presentation layer.
  • Developed UI modules using HTML, JSP, AngularJS and CSS.
  • Designed and developed microservice business components using Spring Boot.
  • Developed JSP custom tags for different JSP pages and client-side validations using JavaScript in application development.
  • Used Hibernate as an Object Relational Mapping tool for the data persistence.
  • Used SVN for version control across common source code used by developers.
  • Involved in the discussions with business users, testing team to finalize the technical design documents.
  • Designed and Implemented Unit testing using JUNIT Framework.
  • Developed Servlets and back-end java classes using WebSphere Application server.
  • Developed an API to write XML documents from a database, and utilized XML and XSL transformations for dynamic web content and database connectivity (a minimal XSLT sketch appears after this list).
  • Analyzed the performance of system software code and wrote code to tune it.
  • Performed usability testing for the application using JUnit Test cases.
  • Hibernate tools were used as the persistence layer, using the database and configuration data to provide persistence services (and persistent objects) to the application.
  • Created and maintained mapping files and transaction control in Hibernate.
  • Helped to integrate dynamic data into HTML and validated it using JavaScript.
  • Built user interfaces using JSP, JavaScript, custom tags and AJAX.
  • Extensively used XSLT transformations.
  • Used the OSS framework and designed a data flow to migrate data from SOAP XMLs to the database.
  • Written SQL queries and PL/SQL stored procedures.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
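
For illustration, a minimal XSL transformation using the standard JAXP API, of the kind used here to render dynamic web content from XML; the file names are placeholders.

```java
// Hypothetical transformation: apply an XSL stylesheet to XML produced from the database.
import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XmlToHtml {
    public static void main(String[] args) throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
                factory.newTransformer(new StreamSource(new File("order.xsl")));
        transformer.transform(new StreamSource(new File("order.xml")),
                              new StreamResult(new File("order.html")));
    }
}
```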

Environment: Java, J2EE, JSP, EJB, JavaScript, Ajax, XML, Struts2, Hibernate 3.0, Tortoise SVN, Log4J, ANT, WAS Server, Eclipse, WinSCP, WebSphere, WebLogic, Toad, Oracle 10g, PL/SQL, GIT, XSLT.

Java Developer

Confidential

Responsibilities:

  • Assisted in designing and programming for the system, which includes development of Process Flow Diagram, Entity Relationship Diagram, Data Flow Diagram and Database Design.
  • Involved in the transactions, login and reporting modules and in customized report generation using controllers; tested and debugged the whole project for proper functionality and documented the modules developed.
  • Designed front end components using JSF.
  • Involved in developing Java APIs, which communicates with the Java Beans.
  • Implemented MVC architecture using Java, Custom and JSTL tag libraries.
  • Involved in the development of POJO classes and in writing Hibernate Query Language (HQL) queries (a minimal HQL sketch appears after this list).
  • Implemented MVC architecture and DAO design pattern for maximum abstraction of the application and code reusability.
  • Created stored procedures using SQL and PL/SQL for data modification.
  • Used XML, XSL for Data presentation, Report generation and customer feedback documents.
  • Used Java Beans to automate the generation of Dynamic Reports and for customer transactions.
  • Developed JUnit test cases for regression testing and integrated with ANT build.
  • Implemented Logging framework using Log4J.
  • Involved in code review and documentation review of technical artifacts.
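
For illustration, a minimal HQL query against a POJO of the kind described above; the TransactionRecord entity and its properties are assumptions rather than the actual domain model.

```java
// Hypothetical HQL query: a customer's transactions since a given date.
import java.util.Date;
import java.util.List;
import org.hibernate.Query;
import org.hibernate.Session;

/** Assumed Hibernate-mapped POJO; the real mapping lived in hbm.xml or annotations. */
class TransactionRecord {
    private Long id;
    private Long customerId;
    private Date createdOn;
    // getters and setters omitted for brevity
}

public class TransactionDao {

    @SuppressWarnings("unchecked")
    public List<TransactionRecord> findByCustomerSince(Session session, long customerId, Date since) {
        Query query = session.createQuery(
                "from TransactionRecord t where t.customerId = :customerId and t.createdOn >= :since");
        query.setParameter("customerId", customerId);
        query.setParameter("since", since);
        return query.list();
    }
}
```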

Environment: J2EE/Java, JSP, Servlets, JSF, Hibernate, Spring, JavaBeans, XML, XSL, HTML, DHTML, JavaScript, CVS, JDBC, Log4J, Oracle 9i, IBM WebSphere Application Server
