Spark/Hadoop Developer Resume
VA
SUMMARY
- Cloudera Certified Developer for Apache Hadoop with over 9 years of experience in designing, implementing and executing client-server and analytical applications using Hadoop, Spark, Apache NiFi, Kafka, Hive, HBase, AWS, Java, Scala, Python, REST services and microservice technologies.
- Designed and developed custom processors and data flow pipelines between systems using flow-based programming in Apache NiFi; extensive experience using NiFi's web-based UI.
- Extensive experience working with real-time streaming applications and batch-style large-scale distributed computing applications; worked on integrating Kafka with NiFi and Spark.
- Expertise in fine-tuning Spark, Hive and MR applications to the available cluster capacity.
- Developed reusable and configurable components as part of project requirements in Java, Scala and Python.
- Sound knowledge of Scala's functional programming techniques such as anonymous functions (closures), currying, higher-order functions and pattern matching (see the sketch at the end of this summary).
- Delivered POCs on innovative business ideas and presented them to technology leaders. Mentored other team members to get them up to speed on challenging projects.
- Used Python scripting for automation and has a deep understanding of data analysis libraries such as NumPy, Pandas and Matplotlib.
- Hands-on experience with advanced Java 8 features such as lambda expressions and streams.
- Expertise in working with different kinds of data formats such as Avro, Parquet and ORC.
- Worked with the Kafka API to develop publisher and subscriber components.
- Developed a component to upload/download files from AWS S3 based on project-specific configurations.
- Developed machine learning POCs using R programming and Python modules for data analytics.
- Developed an end-to-end POC, from Data Ingestion to Data Transformation to Data Quality to Data Lineage, for the Big Data Platform.
- Strong knowledge of machine learning algorithms like Linear Regression, Logistic Regression, Decision Tree, KNN, Holt-Winters, SVM and K-Means.
- Excellent relational database understanding and experience with Oracle 10g/11i and MySQL.
- Good command of algorithms and data structures; active user on DZone and other technical forums.
- Experienced in using version control tools like SVN and Git, and build tools like Maven, including multi-module Maven applications.
- Participated in entire Software Development Life Cycle including Requirement Analysis, Design, Development, Testing, Implementation, Documentation and Support of software applications.
- Experience working with Agile Methodologies including SCRUM and Test-Driven Development.
- Strong work ethic with a desire to succeed and make significant contributions to the organization. Strong problem-solving, communication and interpersonal skills, and a good team player.
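A minimal illustration of the Scala functional-style techniques mentioned above (closures, currying, higher-order functions and pattern matching); the names used here are made up for the example:

```scala
object FunctionalStyleExamples {
  // Higher-order function: takes another function as an argument
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // Anonymous function acting as a closure: captures the surrounding `factor`
  val factor = 3
  val scale: Int => Int = n => n * factor

  // Curried function, partially applied to build a specialised variant
  def add(a: Int)(b: Int): Int = a + b
  val addTen: Int => Int = add(10)

  // Pattern matching over a sealed case-class hierarchy
  sealed trait Record
  case class Valid(id: Long) extends Record
  case class Invalid(reason: String) extends Record

  def describe(r: Record): String = r match {
    case Valid(id)       => s"record $id passed validation"
    case Invalid(reason) => s"rejected: $reason"
  }
}
```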
TECHNICAL SKILLS
Languages: Java, Scala, Python, R (beginner), Unix scripting, SQL, PL/SQL
BigData Ecosystem: Hadoop, Spark (PySpark and Scala), NiFi, HDFS, MapReduce, Pig, Hive, HBase, Kafka, Zeppelin, Oozie, Sqoop, Avro, Parquet
Web Technologies: Java/JEE, Django, REST, Microservices, Play, JSP, JSTL, XML, XSL, AJAX, JDBC, AngularJS, Bootstrap, jQuery
Machine Learning: Holt-Winters, K-Nearest Neighbors, Similarity Matrix, Decision Trees, Spark
Tools: IntelliJ IDEA, Eclipse, Scala IDE, PyCharm, Spring Tool Suite, Apache Lucene, Log4j, FileZilla, WinSCP, IzPack, Avro-Tools
XML Technologies: PyXB, XSD, XML, XSLT, XPath, SAX, DOM and JAXB
Operating Systems: Unix, Linux, Solaris, Mac, Windows 10/7/XP
Database: MySQL, MySQL Workbench, Oracle 11i/10g, SQL Developer, TOAD
Servers: Apache Tomcat, JBoss
Job Scheduling: Apache NiFi, Autosys, ACE
Editors: Vim, Sublime, IntelliJ IDEA, Eclipse, Scala IDE, PyCharm, Jupyter/IPython notebooks
Build Tools: Maven (multi-module in Java/Scala), Distutils/Setuptools (Python)
Code Version Ctrl: Git, GitHub, Bitbucket, Stash
Continuous Integration: Bamboo, Junit, EasyMock, Mockito, Jenkins
PROFESSIONAL EXPERIENCE
Confidential, VA
Spark/Hadoop Developer
Responsibilities:
- Involved in architectural design meetings for the product with all related stakeholders.
- Contributed to the design and implementation of the product.
- Developed custom NiFi processor to convert a custom source format to JSON (a processor skeleton is sketched after this list).
- Developed custom NiFi processor to upload and download files from AWS S3 based on project-specific configurations.
- Developed custom NiFi processor to create weekly-full/daily-incremental data extracts for reporting consumption based on configurations.
- Developed custom NiFi processor to launch Spark jobs based on the source file ingested.
- Developed Spark/Scala code to consume data in batch and streaming mode based on the type of source defined in the config definition (a config-driven sketch also follows this list).
- Developed file tracker component (Java/Scala/Spark) to track the status of the files ingested at each step of workflow.
- Developed rules engine component (Java/Scala) to apply business rules to each record based on the source file ingested.
- Developed reusable component (commons module) consisting of several user defined functions used across different modules in the project.
- Developed configurations module to drive end-to-end (ingestion/pre-processing/processing/loading) data from different sources based on xml/json configuration files.
- Ingested data from different sources like S3, SFTP, Custom data APIs and RDBMS.
- Handled different formats of data like Avro, JSON, XML, CSV and STIFF (custom format).
- Developed Spark code to process data from different sources and store it into Hive/HBase (data is pre-processed and stored in HDFS using NiFi before Spark consumption).
- Designed and executed modules of the BDP such as Ingestion, Organization, Transformation and Data Quality.
- Developed PowerShell scripts for auto-download of extract files from AWS S3 to the Tableau server and to trigger a Tableau refresh.
- Developed component to execute data quality checks on data from different domains on a daily basis, driven by settings in a configuration file.
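A sketch of what a custom NiFi processor of the kind described above can look like, written in Scala against the standard NiFi processor API. The class name and the `toJson` conversion are placeholders, not the project's actual code:

```scala
import java.io.{InputStream, OutputStream}
import org.apache.nifi.flowfile.FlowFile
import org.apache.nifi.processor.{AbstractProcessor, ProcessContext, ProcessSession, Relationship}
import org.apache.nifi.processor.io.StreamCallback

// Hypothetical processor skeleton; the real conversion logic is not shown here.
class ConvertCustomFormatToJson extends AbstractProcessor {

  private val RelSuccess: Relationship =
    new Relationship.Builder().name("success").description("Converted FlowFiles").build()

  override def getRelationships: java.util.Set[Relationship] =
    java.util.Collections.singleton(RelSuccess)

  override def onTrigger(context: ProcessContext, session: ProcessSession): Unit = {
    val flowFile: FlowFile = session.get()
    if (flowFile == null) return

    // Rewrite the FlowFile content, converting the custom source format to JSON
    val converted = session.write(flowFile, new StreamCallback {
      override def process(in: InputStream, out: OutputStream): Unit =
        out.write(toJson(in))
    })
    session.transfer(converted, RelSuccess)
  }

  // Naive placeholder conversion: wraps the raw payload in a trivial JSON envelope
  private def toJson(in: InputStream): Array[Byte] = {
    val raw = scala.io.Source.fromInputStream(in).mkString
    s"""{"payload": "${raw.trim}"}""".getBytes("UTF-8")
  }
}
```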
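A sketch of the config-driven batch vs. streaming consumption mentioned above. `SourceConfig` and its fields are assumed names standing in for the project's XML/JSON configuration model:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ConfigDrivenReader {
  // Assumed configuration model; the real project drove this from XML/JSON config files
  case class SourceConfig(mode: String, format: String, path: String,
                          kafkaServers: String, topic: String)

  def readSource(spark: SparkSession, cfg: SourceConfig): DataFrame = cfg.mode match {
    case "batch" =>
      // One-off load of files landed by NiFi (Avro/Parquet/ORC/CSV, per config)
      spark.read.format(cfg.format).load(cfg.path)
    case "streaming" =>
      // Continuous consumption from Kafka via Structured Streaming
      spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", cfg.kafkaServers)
        .option("subscribe", cfg.topic)
        .load()
    case other =>
      throw new IllegalArgumentException(s"Unsupported source mode: $other")
  }
}
```

Keeping the mode decision in one function lets the same downstream transformations run unchanged against batch and streaming DataFrames.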
Confidential, Philadelphia, PA
Spark/Hadoop Developer
Responsibilities:
- Contributed to the design and implementation of RDS and EDT
- Developed Change Data Capture (CDC) component using Spark/Scala (a minimal CDC sketch follows this list).
- Developed Bi-Temporal CDC component (captures changes with bi-temporal attributes) to store the change in two time dimensions.
- Worked on Metadata capture component to capture the change in metadata of new feeds
- Worked on Schema evolution component to deal with the changing schema.
- Worked on rules engine to detect changes which are not acceptable and trigger alerts to the source system.
- Worked on 3NF to Dimensional model converter with change data capture.
- Used configurable XML as inputs to the products; JAXB/SAX/DOM to marshal/unmarshal the XML config files.
- Worked for the development of Enterprise Data Transfer (EDT) for IMS
- Developed microservices using Akka HTTP as per the application requirements (a minimal route sketch follows this list).
- Developed microservices for finding the active NameNode and generating keytabs.
- Worked with Kafka API to develop Publisher, Subscriber components
- Developed consumer for 3NF data (Avro) from source Kafka topics to persist the data in the HDFS landing area (see the consumer sketch after this list).
- Developed cron jobs for monitoring and logging.
- Deployed the Akka cluster to production using Lightbend ConductR.
- Worked on data load (Drug Distribution Data) project, developed python scripts to automate end-to-end flow.
- Worked on loading files from mainframes to the data lake using Impala SQL and Python scripts.
- Used Sqoop for the history load of eight years' worth of fact data.
- Converted Netezza business logic SQL to Impala SQL
- Developed shell/Python scripts for referential integrity (RI) checks of the load process for multiple dimensions.
- Sent RI validation reports to the concerned departments using the Oozie email action.
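A minimal sketch of the change-data-capture approach, assuming changes are detected by hashing the non-key columns; the real component additionally stamped captured rows with bi-temporal effective/expiry attributes:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{concat_ws, sha2}

object ChangeCapture {
  // Rows in `incoming` whose key + column-hash do not appear in `current`
  // are treated as inserts or updates (the captured delta).
  def captureChanges(current: DataFrame, incoming: DataFrame, keys: Seq[String]): DataFrame = {
    def withHash(df: DataFrame): DataFrame = {
      val nonKeyCols = df.columns.filterNot(keys.contains).map(df(_))
      df.withColumn("row_hash", sha2(concat_ws("||", nonKeyCols: _*), 256))
    }
    val currentHashed  = withHash(current).select("row_hash", keys: _*)
    val incomingHashed = withHash(incoming)
    incomingHashed.join(currentHashed, keys :+ "row_hash", "left_anti").drop("row_hash")
  }
}
```

The left-anti join keeps only incoming rows with no matching key/hash pair in the current snapshot, which is exactly the delta a CDC step needs to persist.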
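A minimal Akka HTTP route in the spirit of the active-NameNode service (assumes Akka HTTP 10.2+; `activeNameNode()` is a stand-in for the real HA lookup):

```scala
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.server.Directives._

object NameNodeService extends App {
  implicit val system: ActorSystem = ActorSystem("edt-services")

  // Stand-in for the real lookup, which queried HDFS HA state for the active NameNode
  def activeNameNode(): String = "nn1.example.com"

  val route =
    path("namenode" / "active") {
      get {
        complete(activeNameNode())
      }
    }

  Http().newServerAt("0.0.0.0", 8080).bind(route)
}
```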
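A sketch of the Kafka consumer side using the plain Kafka consumer API; the broker address, topic name and the `writeToLanding` HDFS helper are placeholders:

```scala
import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.serialization.ByteArrayDeserializer

object LandingConsumer extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "broker1:9092")                        // placeholder broker
  props.put("group.id", "edt-landing")
  props.put("key.deserializer", classOf[ByteArrayDeserializer].getName)
  props.put("value.deserializer", classOf[ByteArrayDeserializer].getName)

  val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](props)
  consumer.subscribe(Collections.singletonList("source-3nf-topic"))     // placeholder topic

  // Assumed helper that appends the Avro-encoded payload to the HDFS landing area
  def writeToLanding(payload: Array[Byte]): Unit = ()

  while (true) {
    val records = consumer.poll(Duration.ofSeconds(1))
    records.forEach(record => writeToLanding(record.value()))
  }
}
```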
Confidential, Cupertino, CA
Hadoop Developer
Responsibilities:
- Worked on migration of data across Hadoop clusters as part of the initial phase (CR1 & CR2) of the Prineville DR cluster setup.
- Developed Python automation scripts for extracting DDL scripts from the source system and executing the statements on the new cluster.
- Developed validation scripts to verify the data copied to the new cluster by comparing it against statistics taken at the source system (an illustrative check follows this list).
- Worked on developing export/import scripts for Hive tables and data migration to avoid manual intervention.
- Experienced in Data/Application Migration. Has excellent insight into the challenges involved with such projects.
- Understood the business logic of Teradata procedures.
- Identified source systems and developed Hive scripts to point to those source systems.
- Worked with the framework team to finalize the metadata setup for enabling F2C on Phoenix.
- Configured clusters using Ansible playbooks.
- Wrote automation scripts for migrating the data using DistCp and DistCp2, and worked on optimizing the migration process.
- Migrated data from Teradata to Hive using Sqoop.
- Converted the existing relational database model to the Hadoop ecosystem.
- Developed HiveQL scripts by deriving logic from Teradata stored procedures.
- Optimized Hive queries by implementing SMB joins and map-side joins on partitioned and bucketed datasets.
- Compared Hive query outputs to the existing data model outputs.
- Customized batch Java programs and developed shell scripts.
- Scheduled Hive scripts to run Hive queries at regular intervals.
- Worked on python and shell scripting to create automation scripts.
- Wrote Sqoop incremental import jobs to move new/updated data from the database to HDFS and Hive.
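An illustrative version of the validation check: the original scripts were written in Python and compared a broader set of statistics, but the core idea, shown here in Scala/Spark SQL for consistency with the other sketches, is comparing stats between the source and migrated tables (table names are placeholders):

```scala
import org.apache.spark.sql.SparkSession

object MigrationValidation extends App {
  val spark = SparkSession.builder()
    .appName("migration-validation")
    .enableHiveSupport()
    .getOrCreate()

  // Placeholder table names: compare a basic statistic (row count) between clusters
  val sourceCount = spark.sql("SELECT COUNT(*) FROM legacy_db.orders").first().getLong(0)
  val targetCount = spark.sql("SELECT COUNT(*) FROM migrated_db.orders").first().getLong(0)

  if (sourceCount != targetCount)
    println(s"Row-count mismatch for orders: source=$sourceCount, target=$targetCount")
  else
    println("Row counts match")

  spark.stop()
}
```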
Confidential
Oracle Developer
Responsibilities:
- Understanding the systems requirement and functional design.
- Coding modules and following the Scripting specifications and standards.
- Developing Shell Scripts for Automation of Job Submission.
- Implemented alerting mechanism using shell scripts as existing system reached its end of life
- Packaging and building Beta/Production/FT releases.
- Deploying the Quarterly Releases and FT Releases.
- Developing shell scripts for monitoring the business critical streams.
- Upgrade of ACE (Job Scheduling Tool).
- Placing Blackouts/Outages on Servers being upgraded
- Installation and Configuration of ACE software on Solaris 10
- Upgrade/configuration of Mailbox (SFTP File Transfer Tool) software.
- Testing functionality of upgraded systems.
- Post Upgrade Support for the upgraded systems.
- Analysis of existing Incoming/Outgoing Interfaces.
- Implementation of RFS/SPIDER (Custom ETL tool).
- Configuration of RFS/SPIDER Transmissions using Oracle Forms.
- Implementation/Configuration of ACE (Job Scheduling Tool).
- Configuring Streams, Programs and Batches.
- Post Implementation System Support.
- Documentation of Changes in Process.
Confidential
Oracle Developer
Responsibilities:
- Archiving using UNIX tools when system space usage reaches the threshold limit.
- Printer Management on Solaris Environment.
- Monitoring Customized Namespaces like Load, Hold and Error.
- Supporting the Customized interfaces used for IN/OUT dataflow.
- Registering, Scheduling and Monitoring the Jobs, Batch Processes running in Scheduling Tool.
- Tracking and resolving the tickets within SLA.
- Applying the Release during the release cycle.
- Participated in planned DR test.
- Installation and Configuration of ACE software on Solaris 10
- Configuration of ACE job scheduling tool.
- Developed a UNIX shell script to monitor the daily failure process and take the appropriate action, automating this process. This led to a significant reduction in tickets and was highly appreciated by the customer.
- Configuration of Job Failure Alerting Systems.
- Implementation/Configuration of Mailbox (SFTP File Transfer Tool) software.
- Configuration of Mailboxes and related Components.
Confidential
Java Developer
Responsibilities:
- Involved in Development and Support phases of Software Development Life Cycle (SDLC).
- Participated in gathering business requirements of software for High Level Design.
- Supported various sub-projects and communicated with support teams, leads and other cross teams.
- Used XML Spy to design XML Schema (XSD) and WSDL.
- Used JAXB to marshal/unmarshal Java objects to/from XML (see the sketch after this list).
- The Spring IoC container was used for dependency injection.
- Used Spring JDBC and Spring DAO support to persist POJOs to the database.
- Used SoapUI as a test tool for testing SOAP and REST web services.
- Used Maven to clean, compile, build, install, deploy and manage jar and war archives.
- Used Log4J for debugging and error logging purposes.
- Used Prismy as a Defect/Base Change Tracking System within the team to enhance the service.
- Developed Unit and Functional Test cases using SoapUI.
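A minimal JAXB round-trip of the kind described above, written in Scala for consistency with the other sketches; `FeedConfig` and its fields are hypothetical, and the javax.xml.bind dependency is assumed to be on the classpath:

```scala
import java.io.File
import javax.xml.bind.{JAXBContext, Marshaller}
import javax.xml.bind.annotation.{XmlAccessType, XmlAccessorType, XmlRootElement}
import scala.beans.BeanProperty

// Hypothetical config class; JAXB binds via the generated bean getters/setters
@XmlRootElement(name = "feed")
@XmlAccessorType(XmlAccessType.PROPERTY)
class FeedConfig {
  @BeanProperty var name: String = _
  @BeanProperty var delimiter: String = _
}

object JaxbRoundTrip extends App {
  val context = JAXBContext.newInstance(classOf[FeedConfig])

  // XML -> object
  val config = context.createUnmarshaller()
    .unmarshal(new File("feed-config.xml"))
    .asInstanceOf[FeedConfig]

  // object -> XML
  val marshaller = context.createMarshaller()
  marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true)
  marshaller.marshal(config, new File("feed-config-out.xml"))
}
```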