
Hadoop Spark Consultant Resume


Charlotte, NC

PROFESSIONAL SUMMARY:

  • 8+ years of professional IT experience in Big Data, Hadoop, Java/J2EE, and cloud technologies across the Financial, Retail, and Healthcare domains.
  • 4+ years of experience working with large data sets and distributed computing on Hadoop and Big Data technologies.
  • Experienced in building high-performance, scalable solutions using Hadoop ecosystem components such as YARN, Hive, Sqoop, Pig, Spark, NiFi, and Kafka.
  • Handled data movement, transformation, analysis, and visualization across the Data Lake by integrating it with various tools.
  • Extensively worked on Spark 1.6 and 2.1 and its components, such as Spark SQL and Spark Streaming, for data manipulation, preparation, and cleansing.
  • Defined extract-transform-load (ETL) and extract-load-transform (ELT) processes for the Data Lake; experienced with ETL tools such as Talend.
  • Experienced in writing Spark applications with the Spark SQL library, using both built-in and user-defined functions (see the sketch after this list).
  • Extensively worked on Hive, Sqoop, and Pig for analysis, transformation, and management of structured and semi-structured data.
  • Very good understanding and working knowledge of object-oriented programming (OOP) and of the Scala and Python languages.
  • Worked on all major Hadoop distributions: Cloudera (CDH4, CDH5) and Hortonworks (HDP 2.4, 2.6).
  • Good knowledge of analyzing streaming data; defined real-time streaming solutions across the cluster using Spark Streaming, Apache Storm, Kafka, and Flume.
  • Configured AWS EC2 instances, S3 buckets, and cloud services, and architected the flow of data to and from AWS.
  • Transformed and aggregated data for analysis through workflow management of Sqoop, Hive, and scripts.
  • Experienced in ETL operations from RDBMS databases such as Oracle and Teradata into Hadoop, and in performing the required analysis on the data.
  • Experience with file formats such as Avro, Parquet, ORC, and SequenceFile, and with compression techniques such as Gzip, LZO, and Snappy in Hadoop.
  • Good experience with Tableau and Spotfire, including enabling JDBC/ODBC connectivity from those tools to Hive tables.
  • Well versed in SQL and PL/SQL on Oracle: writing queries, stored procedures, triggers, and functions.
  • Expertise in Unix/Linux environments: writing scripts and scheduling and executing jobs.
  • Experience developing applications using Java, J2EE, JSP, MVC, Servlets, Struts, Hibernate, JDBC, JSF, EJB, XML, AJAX, and web-based development tools.
  • Expertise in web technologies such as HTML, CSS, PHP, and XML.
  • Worked with various tools and IDEs, including Eclipse, IBM Rational, Apache Ant, MS Office, PL/SQL Developer, and SQL*Plus.
  • Highly motivated, with the ability to work independently or as an integral part of a team, and committed to the highest standards of the profession.
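
As a concrete illustration of the Spark SQL work above: a minimal sketch combining a built-in function with a user-defined function. The column names and sample data are hypothetical, not taken from any of the projects below.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, trim, udf}

object UdfSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("spark-sql-udf-sketch").getOrCreate()
    import spark.implicits._

    // Hypothetical input: raw customer records with inconsistent casing and padding.
    val raw = Seq(("  alice ", "nc"), ("BOB", "ca")).toDF("name", "state")

    // A built-in function handles routine cleansing...
    val cleansed = raw.withColumn("name", trim(col("name")))

    // ...while a user-defined function covers a custom rule.
    val normalizeState = udf((s: String) => s.toUpperCase)
    cleansed.withColumn("state", normalizeState(col("state"))).show()

    spark.stop()
  }
}
```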

TECHNICAL SKILLS:

Big Data / Hadoop: HDFS, MapReduce, HBase, Kafka, Pig, Hive, Sqoop, Flume, Spark, Zeppelin, and NiFi

Realtime/Stream Processing: Apache Storm, Apache Spark

Cloud Technologies: Amazon Web Services (AWS): EC2, S3, EMR, Redshift

Operating Systems: Windows, Unix, and Linux

Programming Languages: C, Java, Scala, J2EE, SQL

Databases: Oracle 9i/10g, SQL Server, Teradata

Web Technologies: HTML, XML, JavaScript

Development & Build Tools: Eclipse, IntelliJ IDEA, SBT, and Gradle

Methodologies: Agile, Scrum, and Waterfall

PROFESSIONAL EXPERIENCE:

Confidential, Charlotte, NC

Hadoop Spark Consultant

Responsibilities:

  • Involved in the design and development of a Marketing Data application in a Hadoop and Spark environment.
  • Ingested data from sourcing layers such as Teradata and Oracle databases into HDFS and Hive using Sqoop import.
  • Developed Spark-on-Scala ETL data pipelines to move data from the landing HDFS directory to the target HDFS directory after transformation and analysis.
  • Designed and implemented a Change Data Capture (CDC) model over daily and historical customer-behavior and account data using Spark SQL on Scala (see the sketch after this list).
  • Archived data from the sourcing layer and separated the required analytical and production data from the raw data in Hive.
  • Worked extensively with Apache Spark functionality such as RDDs, DataFrames, and Datasets, along with various performance-tuning techniques.
  • Created Hive views and external and managed tables over published and production data for use in data analytics.
  • Wrote Hive scripts for target table creation and maintained the partition and bucket strategies needed for performance.
  • Used Apache Pig and wrote Pig scripts for a series of data operations such as cleansing, research on raw data, and iterative data processing.
  • Used Apache NiFi to automate and manage data movement from the base data layer to the target layer.
  • Involved in finalizing several performance-tuning methods to optimize the Spark jobs.
  • Wrote shell scripts for the execution and deployment of the developed Spark/Scala code.
  • Used the Apache Zeppelin notebook for data discovery, visualization, and analysis.
  • Used Gradle as the build tool in IntelliJ IDEA and SVN for version control of the code.
  • Rectified defects during the testing phase and supported QA in identifying the source of defects.
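
Below is a minimal sketch of the CDC approach referenced in the list above, assuming hypothetical Hive tables account_hist and account_daily keyed on account_id, each carrying a precomputed row_hash over the tracked columns; the actual schemas and change rules are not part of this page.

```scala
import org.apache.spark.sql.SparkSession

object CdcSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cdc-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical Hive tables: account_hist holds history, account_daily holds
    // today's load; both carry a row_hash computed over the tracked columns.
    spark.table("account_daily").createOrReplaceTempView("d")
    spark.table("account_hist").createOrReplaceTempView("h")

    // The delta: keys absent from history, plus keys whose tracked columns changed.
    val delta = spark.sql(
      """SELECT d.*
        |FROM d LEFT JOIN h ON d.account_id = h.account_id
        |WHERE h.account_id IS NULL OR d.row_hash <> h.row_hash""".stripMargin)

    // Land the captured changes in the target HDFS directory for downstream jobs.
    delta.write.mode("overwrite").parquet("/data/target/account_delta")
    spark.stop()
  }
}
```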

Environment: Hadoop 2.6, HDFS, YARN, Hive, Sqoop, Pig, Spark 2.1, Scala, SVN, IntelliJ, Gradle, AutoSys, UNIX, and Teradata.

Confidential, Sunnyvale, CA

Hadoop Consultant

Responsibilities:

  • Ingested data from RDBMS sources such as Oracle and Teradata into HDFS and Hive using Sqoop import.
  • Developed partition and bucket strategies for managed and external tables to optimize Hive performance.
  • Handled large datasets in Spark using broadcast joins, repartitioning, persistence, and the degree of parallelism to optimize performance.
  • Involved in data-flow management, data recovery, and secure data flow with prioritized queuing using Apache NiFi.
  • Wrote customized UDFs in Java to extend Hive and Pig Latin functionality.
  • Used REST APIs to establish communication between external applications and Hadoop, and implemented RESTful services to pull data from database tables.
  • Developed a data pipeline using Kafka and Storm to store data in HDFS and performed real-time analysis of the incoming data.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream in HDFS using Scala (see the sketch after this list).
  • Collected and aggregated large amounts of log data using Apache Flume and staged it in HDFS for further analysis.
  • Scheduled the Oozie workflow engine to run multiple Hive, Pig, and Spark jobs.
  • Wrote shell scripts for the execution and deployment of the developed Spark code.
  • Rectified defects during the testing phase and supported QA during testing.
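
A minimal sketch of the Spark Streaming ingestion described above, using the direct Kafka stream API from the Spark 1.6 era; the broker address, topic name, and output path are hypothetical.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("kafka-to-hdfs-sketch"), Seconds(30))

    // Hypothetical broker address and topic name.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val topics = Set("events")

    // Direct (receiver-less) Kafka stream, as used with Spark 1.6 / Kafka 0.8.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // Each micro-batch of message values lands under a timestamped HDFS path.
    stream.map(_._2).saveAsTextFiles("hdfs:///data/landing/events")

    ssc.start()
    ssc.awaitTermination()
  }
}
```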

Environment: Hadoop, YARN, HDFS, Pig, Hive, Sqoop, AWS, Linux, Spark 1.6, Kafka, HBase, UNIX, and IntelliJ

Confidential, Atlanta, GA

Sr. Hadoop Developer

Responsibilities:

  • Worked on performance analysis and improvements for Hive and Pig scripts at the MapReduce job-tuning level.
  • Used Sqoop to load data from RDBMS databases such as Oracle and MySQL into the Hadoop file system.
  • Worked on several POCs to validate and fit various Hadoop ecosystem tools on the CDH and Hortonworks distributions.
  • Designed and implemented error-free Data Warehouse ETL and Hadoop integration.
  • Modeled data with Hive partitioning, bucketing, and other optimization techniques (see the sketch after this list).
  • Developed Python scripts to automate and provide control flow to Pig scripts for extracting data and loading it into HDFS.
  • Developed an Oozie workflow to automate the tasks of loading data into HDFS and pre-processing it with Pig.
  • Set up standards and processes for Hadoop-based application design and implementation.
  • Wrote shell scripts for several day-to-day processes and worked on their automation and deployment.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
  • Established JDBC/ODBC connectivity from Tableau and Spotfire to Hive tables.
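
A sketch of the partitioned, bucketed Hive layout mentioned above. The table and column names are hypothetical, and the DDL is issued through a Spark HiveContext only for consistency with the other examples on this page; the same statement runs directly in the Hive CLI.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveLayoutSketch {
  def main(args: Array[String]): Unit = {
    val hive = new HiveContext(new SparkContext(new SparkConf().setAppName("hive-layout-sketch")))

    // Hypothetical table: partitioning by load date prunes scans to the days a
    // query touches; bucketing on customer_id speeds up joins and sampling.
    hive.sql(
      """CREATE TABLE IF NOT EXISTS sales_clean (
        |  customer_id BIGINT,
        |  amount      DOUBLE)
        |PARTITIONED BY (load_date STRING)
        |CLUSTERED BY (customer_id) INTO 32 BUCKETS
        |STORED AS ORC""".stripMargin)
  }
}
```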

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Oozie, Flume, Linux, UNIX.

Confidential, King of Prussia, PA

Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Collected and downloaded sensor-generated data on patients' body activity to HDFS.
  • Performed the necessary transformations and aggregations to build the common learner data model in a NoSQL store (HBase); see the sketch after this list.
  • Used Pig, Hive, and MapReduce to analyze health-insurance data and patient information.
  • Developed an Oozie workflow to orchestrate a series of Pig scripts that remove, merge, and compress files using Pig pipelines in the data-preparation stage.
  • Moved all log files generated by various sources to HDFS for further processing through Flume.
  • Extensively used Pig to communicate with Hive and HBase via HCatalog and storage handlers.
  • Involved in transforming data from legacy tables into HDFS and HBase tables using Sqoop.
  • Implemented test scripts to support test-driven development and continuous integration.
  • Exported analyzed data to relational databases using Sqoop for visualization and report generation for the BI team.
  • Good understanding of ETL tools and their application to Big Data environments.
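
A minimal sketch of writing one row of the learner data model to HBase, using the HBase 1.x client API (the project may well have used the older HTable API); the table name, column family, and row-key layout are hypothetical.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

object LearnerModelSketch {
  def main(args: Array[String]): Unit = {
    val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val table = conn.getTable(TableName.valueOf("learner"))

    // Hypothetical layout: patient id as the row key, aggregated sensor
    // readings in one column family.
    val put = new Put(Bytes.toBytes("patient-0001"))
    put.addColumn(Bytes.toBytes("activity"), Bytes.toBytes("avg_heart_rate"), Bytes.toBytes("72"))
    table.put(put)

    table.close()
    conn.close()
  }
}
```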

Environment: Hadoop, Map Reduce, HDFS, Hive, Pig, Oozie, Java, HBase, Flume, Oracle 10g, UNIX Shell Scripting.

Confidential, St Petersburg, FL

Java Developer

Responsibilities:

  • Designed the application in a J2EE architecture and developed dynamic, browser-compatible user interfaces for online account management and order and payment processing.
  • Used Hibernate object-relational mapping (ORM) to achieve data persistence.
  • Developed Servlets and JSPs based on the MVC pattern using the Spring Framework.
  • Developed the required helper classes following core Java multithreaded programming practices.
  • Developed the presentation layer using JSP, tag libraries, HTML, and CSS, with client-side validations in JavaScript.
  • Designed and developed web services based on SOAP and WSDL for handling transaction history.
  • Developed web applications using Spring MVC and jQuery and implemented Spring's dependency injection mechanism.
  • Developed data-access classes using JDBC, created SQL queries, and used PL/SQL procedures with the Oracle database (see the sketch after this list).
  • Used Log4j and JUnit for debugging, testing, and maintaining system state, and tested the website with older and latest versions/releases on multiple browsers.
  • Implemented test cases for unit testing of modules using JUnit and used Ant for building the project.
  • Provided production support for two of the applications, involving the Swing and Struts frameworks.
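
A minimal sketch of a JDBC data-access class of the kind described above, written in Scala for consistency with the rest of this page (the project itself was Java); the connection URL, credentials, accounts table, and apply_txn procedure are all hypothetical.

```scala
import java.sql.DriverManager

// Hypothetical connection details, table, and PL/SQL procedure name.
object AccountDao {
  private val url = "jdbc:oracle:thin:@//dbhost:1521/ORCL"

  // Plain SQL query through a PreparedStatement.
  def findBalance(accountId: Long): Option[Double] = {
    val conn = DriverManager.getConnection(url, "app_user", "secret")
    try {
      val stmt = conn.prepareStatement("SELECT balance FROM accounts WHERE account_id = ?")
      stmt.setLong(1, accountId)
      val rs = stmt.executeQuery()
      if (rs.next()) Some(rs.getDouble("balance")) else None
    } finally conn.close()
  }

  // Invoking a stored procedure through the JDBC call escape syntax.
  def applyTxn(accountId: Long, amount: Double): Unit = {
    val conn = DriverManager.getConnection(url, "app_user", "secret")
    try {
      val call = conn.prepareCall("{ call apply_txn(?, ?) }")
      call.setLong(1, accountId)
      call.setDouble(2, amount)
      call.execute()
    } finally conn.close()
  }
}
```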

Environment: JDK 1.6, JSP, HTML, JavaScript, JSON, XML, jQuery, Servlets, Spring MVC, Hibernate, Web Services, SOAP, NetBeans.

Confidential, Charlotte, NC

Java Developer

Responsibilities:

  • Worked with business analysts and product owners to analyze and understand requirements and provide estimates.
  • Implemented J2EE design patterns such as Singleton, DAO, DTO, and MVC (see the sketch after this list).
  • Developed this web application to store all system information in a central location, using Spring MVC, JSP, Servlets, and HTML.
  • Designed and developed database objects such as tables, views, stored procedures, and user functions using PL/SQL and SQL Developer, and used them in web components.
  • Developed JavaScript and jQuery functions for all client-side validations.
  • Developed JUnit test cases for unit testing and used Maven as the build and configuration tool.
  • Used shell scripting to create jobs that run on a daily basis.
  • Debugged the application using Firebug and traversed the nodes of the tree using DOM functions.
  • Monitored the error logs using Log4j and fixed the problems.
  • Used the Eclipse IDE and deployed the application on the WebLogic server.
  • Responsible for configuring and deploying the builds on the WebSphere application server.
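
A compact sketch of the Singleton, DAO, and DTO patterns named above, again in Scala for consistency with the other examples here (the project used Java); the types and seed data are hypothetical.

```scala
// DTO: an immutable carrier object passed between layers.
final case class UserDto(id: Long, name: String)

// DAO: isolates persistence behind an interface the rest of the app codes against.
trait UserDao {
  def findById(id: Long): Option[UserDto]
}

// Singleton: a Scala `object` is a single shared instance by construction
// (the classic Java version uses a private constructor plus getInstance()).
object InMemoryUserDao extends UserDao {
  private val users = Map(1L -> UserDto(1L, "alice")) // hypothetical seed data
  override def findById(id: Long): Option[UserDto] = users.get(id)
}
```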

Environment: Java, J2EE, JavaScript, XML, JDBC, Spring Framework, Hibernate, RESTful web services, WebLogic Server, Log4j, JUnit, ANT, SoapUI, Oracle 11g.

Confidential, Plano, TX

Java Developer

Responsibilities:

  • Designed and developed Java classes using object-oriented methodology.
  • Worked on the system using Java, JSP, and Servlets.
  • Developed Java classes and methods for handling data from the database.
  • Created and modified web pages using HTML and CSS with JavaScript validation.
  • Used JDBC/jConnect for Oracle.
  • Created SQL scripts to create/drop database objects such as tables, views, indexes, constraints, sequences, and synonyms.
  • Developed efficient queries and views to delight customers.
  • Created Servlets and JSPs for the administration module.
  • Created Unix shell scripts for the sequential execution of Java programs, covering data extraction, loading, and Oracle stored-procedure execution.
  • Developed many KSH scripts for data-file movement and scheduling.
  • Attended and conducted user meetings for requirement analysis and project reporting.
  • Tested and fixed bugs and provided production support.

Environment: Windows XP, Oracle 9i database, EJB 2.1, JSP, Struts Framework, BEA WebLogic 8.1, HTML, JavaScript, and Eclipse.

Confidential

Java Developer

Responsibilities:

  • Collected and understood user requirements and functional specifications.
  • Developed the GUI using HTML, CSS, JSP, and JavaScript.
  • Created components to isolate business logic.
  • Deployed the application in a J2EE architecture.
  • Used Oracle 8i as the database server.
  • Designed EJB 2.0 components with design patterns such as Service Locator and Business Delegate.
  • Finalized the design specifications for the new system.
  • Involved in the design, development, and maintenance of the application.
  • Performed unit, integration, and performance testing, with continuous interaction with the Quality Assurance group.
  • Provided on-call support based on the priority of the issues.

Environment: Java, JSP, SQL, MS-Access, JavaScript, HTML.
