
Sr. Hadoop Developer / Spark Developer Resume


SUMMARY

  • Over 8 years of professional IT experience with expertise in Java, J2EE, Hadoop and Big data ecosystem related technologies.
  • 5+ years of exclusive experience in Big Data technologies and Hadoop ecosystem components like Spark, MapReduce, Hive, Pig, YARN, HDFS, Oozie, Sqoop, Flume, Zookeeper, Hue and Kafka, as well as NoSQL systems like HBase and Cassandra.
  • Strong Knowledge of Architecture of Distributed systems and Parallel processing frameworks.
  • In-depth understanding of the MapReduce framework and the Spark execution model.
  • Worked extensively on fine-tuning long-running Spark applications to achieve better parallelism and free up executor memory for caching.
  • Strong experience working with both batch and real-time processing using Spark framework.
  • Hands on experience in installing, configuring and deploying Hadoop distributions in cloud environments (Amazon Web Services).
  • Performed Importing and exporting data into HDFS, Hive and HBase using Sqoop.
  • Hands-on experience on full life cycle implementation using MapR, CDH (Cloudera) and HDP (Hortonworks Data Platform).
  • Involved in Design and Architecting of Big Data solutions using Hadoop Eco System.
  • Experience working with large-scale databases like Oracle 11g and DB2, and with data sources such as XML, MS Excel and flat files.
  • Experience in optimizing Map-Reduce algorithms by using Combiners and Custom partitioners.
  • Expertise in back-end/server-side java technologies such as: Web services, Java persistence API (JPA), Java Messaging Service (JMS), Java Database Connectivity (JDBC)
  • Strong knowledge of performance-tuning Hive queries and troubleshooting various kinds of issues in Hive.
  • Experience in NoSQL Column-Oriented Databases like HBase, Apache Cassandra, MongoDB and its Integration with Hadoop cluster.
  • Experienced in writing custom MapReduce programs and UDFs in Java to extend Hive and Pig core functionality.
  • Extensive experience in ETL process consisting of data transformation, data sourcing, mapping, conversion and loading.
  • Created Talend Mappings to populate the data into dimensions and fact tables.
  • Broad design, development and testing experience with Talend Integration Suite and knowledge in Performance Tuning of mappings.
  • Hands on experience in Capacity planning, monitoring and Performance Tuning of Hadoop Clusters.
  • Involved in finding, evaluating and deploying new Big Data technologies and tools.
  • Proficient in Apache Spark and Scala programming to analyze large datasets, and in Storm and Kafka to process real-time data.
  • Worked on writing custom UDFs in Java for Hive and Pig.
  • Worked with Sqoop to move (import/export) data from a relational database into Hadoop.
  • Experience working with Hadoop clusters using Cloudera, Amazon AWS and Hortonworks distributions.
  • Experience in installation, configuration, support and management of a Hadoop Cluster.
  • Knowledge in UNIX Shell Scripting for automating deployments and other routine tasks.
  • Experienced in using agile methodologies including extreme programming, SCRUM and Test-Driven Development (TDD).
  • Experience in creating Hive tables with different file formats like Avro, Parquet, ORC.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
  • Experienced with row-based Avro and columnar file formats like RCFile, ORC and Parquet.
  • Proficient in integrating and configuring the Object-Relational Mapping tool Hibernate in J2EE applications, along with other open-source frameworks like Struts and Spring.
  • Experience in building and deploying web applications on multiple application servers and middleware platforms including WebLogic, WebSphere, Apache Tomcat and JBoss.
  • Experience in writing test cases in Java Environment using JUnit.
  • Hands on experience in development of logging standards and mechanism based on Log4j
  • Experience in building, deploying and integrating applications with ANT, Maven.
  • Good knowledge of Web Services (SOAP programming, WSDL), XML parsers like SAX and DOM, and front-end technologies such as AngularJS and responsive design with Bootstrap.
  • Flexible, enthusiastic and project-oriented team player with excellent communication and leadership skills, able to develop creative solutions for challenging client requirements.
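The combiner and custom-partitioner optimizations mentioned above can be illustrated with a small, framework-independent sketch; plain Python stands in for the Java MapReduce API here, and the word-count data is hypothetical:

```python
from collections import Counter

def map_with_combiner(lines):
    """Map phase with in-mapper combining: aggregate counts locally
    before emitting, which is the role a Combiner plays in reducing
    shuffle volume between mappers and reducers."""
    local_counts = Counter()
    for line in lines:
        for word in line.split():
            local_counts[word.lower()] += 1
    return list(local_counts.items())

def custom_partitioner(key, num_reducers):
    """Route keys starting with the same letter to the same reducer,
    mimicking a custom Partitioner overriding default hash partitioning."""
    return ord(key[0]) % num_reducers

pairs = map_with_combiner(["big data big", "data pipelines"])
partitions = {k: custom_partitioner(k, 4) for k, _ in pairs}
```

In a real job the same two hooks are supplied by subclassing `Reducer` (as the combiner class) and `Partitioner` in the Java API.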

    TECHNICAL SKILLS

    Big Data Ecosystem: Hadoop, MapReduce, YARN, HDFS, HBase, Zookeeper, Hive, Hue, Pig, Sqoop, Spark, Oozie, Storm, Flume, Talend, Cloudera Manager, Amazon AWS, NiFi, Apache Ambari, Hortonworks, Impala, Amazon Redshift, Airflow, Phoenix, Pachyderm, Tableau

    Languages: C, C++, Java, Advanced PL/SQL, Pig Latin, Python, HiveQL, Scala, SQL

    Java/J2EE: J2EE, Servlets, JSP

    Frameworks: Struts, Spring 3.x, ORM (Hibernate), JPA, JDBC

    Web Services: SOAP, RESTful, JAX-WS

    Web Servers: WebLogic, WebSphere, Apache Tomcat, GlassFish 4.0

    Scripting Languages: Shell Scripting, JavaScript, AngularJS

    Database: Oracle 9i/10g, Microsoft SQL Server, MySQL, DB2, Teradata, PostgreSQL

    NoSQL Databases: MongoDB, Cassandra, HBase

    IDE & Build Tools: NetBeans, Eclipse, ANT, Jenkins and Maven.

    Version Control Systems: GitHub, CVS, SVN

    PROFESSIONAL EXPERIENCE

    Confidential

    Sr. Hadoop Developer / Spark Developer

    Responsibilities:

    • Developed series of data ingestion jobs for collecting the data from multiple channels and external applications.
    • Worked on both batch and streaming ingestion of the data.
    • Worked on batch processing and stream processing of data using Spark and Spark Streaming.
    • Worked with Kafka extensively for writing the streaming data to Kafka topics.
    • Imported data from S3 and performed various data transformations and actions using Spark RDD API and Spark-SQL API.
    • Worked on developing Oozie workflows to automate the data pipelines.
    • Worked on ingesting data from SQL Server to S3 using Sqoop within AWS EMR.
    • Migrated Map-reduce jobs to Spark applications and integrated with Apache Phoenix and HBase.
    • Involved in loading and transforming large sets of data and analyzed them using Hive Scripts.
    • Created tables in Google Cloud using BigQuery and extracted data from Google Cloud for reporting in Tableau.
    • Loaded a portion of the processed data into Redshift tables and automated the process.
    • Worked on migrating Oozie workflows into Apache Airflow DAGs.
    • Wrote Hive queries for data analysis to meet business requirements.
    • Worked hand-in-hand with the Architect; enhanced and optimized product Spark code to aggregate, group and run data mining tasks using the Spark framework.
    • Monitored and tuned Spark jobs running on the cluster
    • Hands on experience in joining raw data with the reference data using Pig scripting.
    • Familiar with Teradata Database and its Load and Export utilities.
    • Wrote custom UDFs for Hive and Pig using Java and Python.
    • Involved in enhancing the speed performance using Apache Spark.
    • Architected, designed and implemented a Big Data initiative using the Hadoop framework (MapReduce, Pig, Hive, Spark, HBase) to process large volumes of structured and unstructured data.
    • Developed Scala scripts, UDFs in Spark for Data Aggregation, queries and writing data back into OLTP system through Sqoop.
    • Used Spark to perform analytics on data in Hive.
    • Created real time data ingestion using Spark streaming to Hadoop
    • Hands on extracting data from different databases and to copy into HDFS file system using Sqoop.
    • Created Oozie coordinated workflow to execute Sqoop incremental job daily.
    • Hands on experience in exporting the results into relational databases using Sqoop for visualization and to generate reports for the BI team.
    • Worked on various performance optimizations in Spark, like using distributed cache, dynamic allocation, proper resource allocation and custom Spark UDFs.
    • Worked on fine-tuning long-running Hive queries by applying proven techniques such as the Parquet columnar format, partitioning and vectorized execution.
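As an illustration of the Python Hive UDFs mentioned above, Hive's TRANSFORM clause streams rows to an external script over stdin/stdout. A minimal sketch follows; the two-column layout and the cleansing rules are assumptions for the example, not taken from the original projects:

```python
import sys

def transform_row(line):
    """Normalize one tab-separated Hive row: trim fields, lowercase the
    e-mail column (index 1), and drop rows with a missing key (index 0)."""
    fields = [f.strip() for f in line.rstrip("\n").split("\t")]
    if not fields[0]:
        return None
    fields[1] = fields[1].lower()
    return "\t".join(fields)

if __name__ == "__main__":
    # Hive streams rows to stdin and reads results from stdout, e.g.:
    #   SELECT TRANSFORM(id, email) USING 'python udf.py' AS (id, email) FROM t;
    for row in sys.stdin:
        out = transform_row(row)
        if out is not None:
            print(out)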

    Environment: Hadoop 2.x, Pig, HDFS, Scala, Spark, Apache Airflow, Kafka, Sqoop, HBase, Oozie, Java, Maven, IntelliJ, PuTTY, AWS EMR, S3, Redshift, Tableau, Machine Learning (MLlib), K-Means, MapReduce, Hive, Cassandra, Teradata, MySQL, SVN, Zookeeper, Linux Shell Scripting.

    Confidential

    Sr. Hadoop Developer

    Responsibilities:

    • Created custom input adapters for pulling the raw click stream data from FTP servers and AWS S3 buckets.
    • Created Kafka producers for streaming real time click stream events from third party Rest services into our topics.
    • Developed Spark streaming applications for consuming the data from Kafka topics.
    • Implemented Spark batch applications using Scala for performing various kinds of cleansing, de-normalization and aggregations on hourly click stream logs.
    • Worked on automation of delta feeds from Teradata using Sqoop.
    • Implemented Hive tables and HQL queries for the reports; wrote and used complex data types in Hive.
    • Worked on NiFi for tracking the data from ingestion to aggregation.
    • Successfully loaded files to HDFS from Teradata and from HDFS into Hive.
    • Implemented Kafka, Spark streaming and HBase for establishing real time pipeline.
    • Wrote Pig scripts and executed them using the Grunt shell.
    • Worked on the conversion of existing MapReduce batch applications for better performance.
    • Performed big data analysis using Pig and user-defined functions (UDFs).
    • Worked on loading tables to Impala for faster retrieval using different file formats.
    • The system was initially developed in Java; the Java filtering program was restructured to put the business rule engine in a jar that can be called from both Java and Hadoop.
    • Created Reports and Dashboards using structured and unstructured data.
    • Upgraded the operating system and/or Hadoop distribution as and when new versions were released, using Puppet.
    • Performed joins, group by and other operations in MapReduce by using Java and PIG.
    • Processed the output from PIG, Hive and formatted it before sending to the Hadoop output file.
    • Used HIVE definition to map the output file to tables.
    • Setup and benchmarked Hadoop/HBase clusters for internal use
    • Wrote data ingesters and map reduce programs
    • Reviewed the HDFS usage and system design for future scalability and fault-tolerance
    • Wrote MapReduce/HBase jobs
    • Worked with HBase, NOSQL database.
    • Used Apache Storm to process data from Kafka and eventually persist it into HDFS and HBase.
    • Responsible for troubleshooting and maintaining the accuracy of the jobs running in production.
    • Used Sqoop to import the data from databases to Hadoop Distributed File System (HDFS) and performed automated data auditing to validate the accuracy of the loads.
    • Involved in loading and transforming large sets of data and analyzed them by running Hive queries.
    • Scheduled and executed workflows in Oozie to run Hive and Pig jobs.
    • Developed end to end data processing pipelines that begin with receiving data using distributed messaging systems Kafka through the persistence of data into HBase.
    • Implemented daily workflow for extraction, processing and analysis of data with Oozie.
    • Created components like Hive UDFs for missing functionality in HIVE for analytics.
    • Created HBase tables to store various data formats coming from different portfolios.
    • Worked on running spark job using Maven dependencies.
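The hourly click-stream cleansing and aggregation described above can be sketched in plain Python; the Scala/Spark version would follow the same shape, and the `ts`/`page` field names are assumptions for the example:

```python
from collections import defaultdict
from datetime import datetime

def aggregate_hourly(events):
    """Cleanse raw click events and count clicks per (hour, page).
    Each event is a dict with 'ts' (ISO-8601 string) and 'page';
    malformed events are dropped, mirroring the cleansing step."""
    counts = defaultdict(int)
    for ev in events:
        try:
            ts = datetime.fromisoformat(ev["ts"])
        except (KeyError, ValueError):
            continue  # drop malformed records
        hour = ts.replace(minute=0, second=0, microsecond=0)
        counts[(hour.isoformat(), ev.get("page", "unknown"))] += 1
    return dict(counts)

sample = [
    {"ts": "2016-03-01T10:05:00", "page": "/home"},
    {"ts": "2016-03-01T10:45:00", "page": "/home"},
    {"ts": "bad-timestamp", "page": "/home"},
]
hourly = aggregate_hourly(sample)
```

In Spark the same logic becomes a `map` to an (hour, page) key followed by `reduceByKey`, running per partition instead of in a single loop.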

    Environment: Hadoop 2.x, Hive, HDFS, Scala, Spark, NiFi, Storm, Kafka, Sqoop, HBase, Oozie, MapReduce, Pig, Flume, Linux, Java 7, Eclipse, NoSQL, Maven, Cassandra, PuTTY, CDH 5.7.

    Confidential

    Sr. Hadoop Developer

    Responsibilities:

    • Involved in requirement analysis, design, coding and implementation.
    • Processed data into HDFS by developing solutions; analyzed the data using MapReduce, Pig and Hive and produced summary results from Hadoop for downstream systems.
    • Used Pig as an ETL tool to perform transformations, joins and pre-aggregations before storing the data on HDFS.
    • Responsible for developing, support and maintenance for the ETL (Extract, Transform and Load) process using Talend Integration Suite.
    • Implemented Change Data Capture technology in Talend to load deltas to a Data Warehouse.
    • Responsible for writing MapReduce programs.
    • Built custom MapReduce programs to analyze data and used Pig Latin to clean unwanted data.
    • Auto-populated HDFS with data coming from a Flume sink.
    • Created/modified shell scripts for scheduling data cleansing scripts and the ETL loading process.
    • Created tables, views in Teradata, according to the requirements.
    • Implemented Python scripts for auto deployments in AWS.
    • Developed Hive queries to analyze reducer output data.
    • Created Partitioned and Bucketed Hive tables in Parquet File Formats with Snappy compression and then loaded data into Parquet hive tables from Avro hive tables.
    • Used Sqoop to import data into the Hadoop Distributed File System (HDFS) from RDBMS.
    • Created components like Hive UDFs for missing functionality in HIVE for analytics.
    • Worked on various performance optimizations like using distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
    • Used Impala to analyze the data present in Hive tables.
    • Worked with Teradata Queryman to validate the data in the warehouse for sanity checks.
    • Designed and developed REST web service for validating the address.
    • Writing the recurring workflows using Oozie to automate the scheduling flow.
    • Addressing the issues occurring due to the huge volume of data and transitions.
    • Migration of database objects from previous versions to the latest releases using latest data pump methodologies, when the solution was upgraded.
    • Worked on ingesting the data from Amazon S3 buckets to podium data repository.
    • Built and ran RESTful web services using the Maven repository.
    • Set up Jenkins on Amazon EC2 servers and configured the notification server to notify the Jenkins server of any changes to the repository.
    • Supported in Production rollout which includes monitoring the solution post go-live and resolving any issues that are discovered by the client and client services teams.
    • Designed, documented operational problems by following standards and procedures using JIRA.
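The Change Data Capture delta load mentioned above boils down to classifying source rows against the warehouse snapshot. A minimal Python sketch of that core step (the `id` key column and the sample rows are hypothetical; Talend wires the same logic into its CDC components):

```python
def cdc_delta(source_rows, target_rows, key="id"):
    """Classify source rows against the target snapshot into inserts
    (key absent from target) and updates (key present, values changed).
    Unchanged rows produce no delta. Rows are plain dicts."""
    target_by_key = {row[key]: row for row in target_rows}
    inserts, updates = [], []
    for row in source_rows:
        existing = target_by_key.get(row[key])
        if existing is None:
            inserts.append(row)
        elif existing != row:
            updates.append(row)
    return inserts, updates

src = [{"id": 1, "city": "NYC"}, {"id": 2, "city": "LA"}]
tgt = [{"id": 1, "city": "Boston"}]
ins, upd = cdc_delta(src, tgt)
```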

    Environment: HDFS, Hadoop, Pig, Hive, HBase, Sqoop, Talend, Flume, MapReduce, Podium Data, Oozie, Java 6/7, Oracle 10g, YARN, UNIX Shell Scripting, SOAP, REST services, Maven, Agile Methodology, JIRA, AutoSys.

    Confidential

    Hadoop Developer

    Responsibilities:

    • Installed and configured Hadoop Ecosystem components and Cloudera manager using CDH distribution.
    • Developed multiple Map Reduce jobs in Java for complex business requirements including data cleansing and preprocessing.
    • Developed Sqoop scripts to import/export data from Oracle to HDFS and into Hive tables.
    • Worked on analyzing Hadoop clusters using Big Data Analytic tools including Map Reduce, Pig and Hive.
    • Involved in developing and writing Pig scripts and to store unstructured data into HDFS.
    • Involved in creating tables in Hive and writing scripts and queries to load data into Hive tables from HDFS.
    • Scripted complex HiveQL queries on Hive tables for analytical functions.
    • Optimized the Hive tables using techniques like partitioning and bucketing to improve the execution of HiveQL queries.
    • Worked on Hive/HBase vs. RDBMS trade-offs; imported data into Hive and created internal and external tables, partitions, indexes, views, queries and reports for BI data analysis.
    • Developed custom record readers, partitioners and serialization techniques in Java.
    • Used different data formats (Text format and Avro format) while loading the data into HDFS.
    • Created tables in HBase and loading data into HBase tables.
    • Developed scripts to load data from HBase to Hive Meta store and perform Map Reduce jobs.
    • Created custom UDF’s in Pig and Hive.
    • Worked Extensively on Cloudera Distribution.
    • Involved in loading data into HDFS from Teradata using Sqoop
    • Experienced in moving huge amounts of log file data from different servers
    • Worked on implementing complex data transformations using MapReduce framework.
    • Generated structured data through MapReduce jobs and stored it in Hive tables.
    • Worked on MapReduce programs to cleanse and pre-process data from various sources.
    • Worked on Sequence files and Avro files in MapReduce programs.
    • Created partitioned tables and loaded data using both static partition and dynamic partition methods.
    • Installed the Oozie workflow engine and scheduled it to run date/time-dependent Hive and Pig jobs.
    • Designed and developed Dashboards for Analytical purposes using Tableau.
    • Ran JSON scripts using Java with Maven repository.
    • Used Jenkins for mapping the maven and the source tree.
    • Analyzed the Hadoop log files using Pig scripts to track errors.
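Dynamic partitioning, as used above, ultimately writes each row into a `key=value` partition directory under the table path. A small Python sketch of that on-disk layout (the `dt` column, sample rows and file name are illustrative only):

```python
import os
import tempfile

def write_dynamic_partitions(rows, base_dir, part_col="dt"):
    """Write rows into Hive-style partition directories
    (e.g. dt=2016-01-01/part-00000), the layout a dynamic-partition
    INSERT produces: the partition column becomes a directory name
    and is not stored inside the data files themselves."""
    for row in rows:
        part_dir = os.path.join(base_dir, f"{part_col}={row[part_col]}")
        os.makedirs(part_dir, exist_ok=True)
        with open(os.path.join(part_dir, "part-00000"), "a") as f:
            data = {k: v for k, v in row.items() if k != part_col}
            f.write(",".join(str(v) for v in data.values()) + "\n")

base = tempfile.mkdtemp()
write_dynamic_partitions(
    [{"dt": "2016-01-01", "clicks": 10}, {"dt": "2016-01-02", "clicks": 7}], base
)
partitions = sorted(os.listdir(base))
```

A static-partition load differs only in that the `dt` value is fixed in the statement rather than read per row.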

    Environment: HDFS, MapReduce, Hive, Sqoop, Pig, HBase, Oozie, CDH distribution, Java, Eclipse, Shell Scripts, Tableau, Windows, Linux.

    Confidential

    Java/J2EE Developer with Hadoop

    Responsibilities:

    • Performed requirement gathering, design, coding, testing, implementation and deployment.
    • Worked on modeling of Dialog process, Business Processes and coding Business Objects, Query Mapper and JUnit files.
    • Installed and configured Oracle GoldenGate 11g using integrated Extract and Replicat processes.
    • Monitored the Oracle GoldenGate processes and checked their performance using GoldenGate Director.
    • Involved in the design and creation of Class diagrams, Sequence diagrams and Activity Diagrams using UML models
    • Created the Business Objects methods using Java and integrating the activity diagrams.
    • Involved in developing JSP pages using Struts custom tags, jQuery and Tiles Framework.
    • Used JavaScript to perform client side validations and Struts Validator Framework for server-side validation
    • Worked in web services using SOAP, WSDL.
    • Wrote Query Mappers and JUnit test cases; experience with MQ.
    • Developed the UI using XSL and JavaScript.
    • Managed software configuration using ClearCase and SVN.
    • Design, develop and test features and enhancements.
    • Performed error rate analysis of production issues and technical errors.
    • Developed test environment for testing all the Web Service exposed as part of the core module and their integration with partner services in an Integration test.
    • Analyze user requirement document and develop test plan, which includes test objectives, test strategies, test environment, and test priorities.
    • Assisted with the development of the call center's operations, quality and training processes. (Enterprise Contact Center Services)
    • Responsible for performing end-to-end system testing of application writing JUnit test cases
    • Perform Functional testing, Performance testing, Integration testing, Regression testing, Smoke testing and User Acceptance Testing (UAT).
    • Used Jenkins for building and configuring the Java application using Maven.
    • Converted complex SQL queries running on mainframes into Pig and Hive as part of a migration from mainframes to the Hadoop cluster.

    Environment: Shell Scripting, Java 6, JEE, Spring, Hibernate, Eclipse, Oracle 10g, JavaScript, Servlets, NodeJS, JMS, Ant, Log4j, JUnit, Hadoop (Pig & Hive), Oracle GoldenGate 11g.

    Confidential

    Java Developer

    Responsibilities:

    • Developed an MVC design pattern-based user interface using JSP, XML, PrimeFaces 5.1, HTML, CSS and Struts.
    • Involved in the design and development phases of Scrum Agile Software Development.
    • Responsible for creating the detailed design and technical documents based on the business requirements.
    • Used Struts validator framework to validate user input.
    • Creating activity diagrams, Class diagrams and Sequence diagrams for the tasks.
    • Used Spring framework configuration files to manage objects and to achieve dependency injection.
    • Involved in implementing DAO pattern for database connectivity and Hibernate for object persistence.
    • Configured batch jobs, job steps, job listeners, readers, writers and tasklets using Spring Batch.
    • Integrated Spring Batch and Apache Camel using Spring XML to define service beans, batch jobs, Camel routes and Camel endpoints.
    • Applied Object Oriented Programming (OOP) concepts (including UML use cases, class diagrams, and interaction diagrams).
    • Used jQuery for creating JavaScript behaviors.
    • Developed utility classes that allow easy translation from XML to Java and back, as well as a Property Reader to read properties from a flat file.
    • Used Java/J2EE Design patterns like Business Delegate and Data Transfer Object (DTO).
    • Developed window layouts and screen flows using Struts Tiles.
    • Used Ajax, JSTL and JavaScript, JSF & Struts frameworks in front end design.
    • Hands-on experience with various front-end technologies: JavaScript, jQuery and different versions of AngularJS. Experience in all aspects of AngularJS, such as routing, modularity, dependency injection, service calls and custom directives, for development of single-page applications.
    • Used ANT Script to build WAR and EAR files and deployed on WebSphere.
    • Used XML, XSL for Data presentation, Report generation and customer feedback documents.
    • Used Java Beans to automate the generation of Dynamic Reports and for customer transactions.
    • Developed JUnit test cases for regression testing and integrated with ANT build.
    • Implemented the logging framework using Log4j and unit testing using JUnit.
    • Involved in Iterative development using Agile Process.
    • Used SVN for version control of the source code.
    • Created web services using Apache Axis 2 for communication with other applications.
    • Created and executed unit and regression test scripts; created personal and common test data, tracked actual vs. expected results, and evaluated the quality of modules created.

    Environment: Java/J2EE, JSP, Servlets, Spring, Hibernate, WebSphere Application Server 6.x/7.x, Struts 2, XML Web services, SOAP, SOA, JAX-WS, Linux, UML, Unix, JNDI, Drools, MySQL, JavaScript, jQuery, AngularJS, SVN, XML, XSLT, Eclipse IDE, AJAX, DB Visualizer, Spring Batch, Apache Ant 1.7, JDBC, Windows XP, JUnit 3.8, Log4j, CSS, CVS, Apache Axis 2, Apache Jackrabbit.
