Spark/Hadoop Developer Resume

VA

SUMMARY

  • Cloudera Certified Developer for Apache Hadoop with over 9 years of experience in designing, implementing, and executing client-server and analytical applications using Hadoop, Spark, Apache NiFi, Kafka, Hive, HBase, AWS, Java, Scala, Python, REST services, and microservice technologies.
  • Designed and developed custom processors and data flow pipelines between systems using flow-based programming in Apache NiFi; extensive experience using NiFi's web-based UI.
  • Extensive experience working with real-time streaming applications and batch-style large-scale distributed computing applications; worked on integrating Kafka with NiFi and Spark.
  • Expertise in fine-tuning Spark, Hive, and MR applications within the available cluster capacity.
  • Developed reusable and configurable components in Java, Scala, and Python as part of project requirements.
  • Sound knowledge of Scala's functional-style programming techniques such as anonymous functions (closures), currying, higher-order functions, and pattern matching (see the sketch at the end of this summary).
  • Delivered POCs on innovative business ideas and presented them to technology leaders. Mentored other team members to get them up to speed on challenging projects.
  • Used Python scripting for automation and has a deep understanding of data analysis libraries such as NumPy, Pandas, and Matplotlib.
  • Hands-on experience with advanced features of Java 8 such as lambda expressions and streams.
  • Expertise in working with different kinds of data formats such as Avro, Parquet, and ORC.
  • Worked with the Kafka API to develop publisher and subscriber components.
  • Developed a component to upload/download files from AWS S3 based on project-specific configurations.
  • Developed machine learning POCs using R programming and Python modules for data analytics.
  • Developed an end-to-end POC, from data ingestion to data transformation to data quality to data lineage, for a Big Data Platform.
  • Strong knowledge of machine learning algorithms such as Linear Regression, Logistic Regression, Decision Trees, KNN, Holt-Winters, SVM, and K-Means.
  • Excellent relational database understanding and experience with Oracle 10g/11g and MySQL.
  • Good command of algorithms and data structures; active user on DZone and other technical forums.
  • Experienced in using version control tools such as SVN and Git; proficient with build tools such as Maven, including multi-module Maven applications.
  • Participated in entire Software Development Life Cycle including Requirement Analysis, Design, Development, Testing, Implementation, Documentation and Support of software applications.
  • Experience working with Agile Methodologies including SCRUM and Test-Driven Development.
  • Strong work ethic with a desire to succeed and make significant contributions to the organization. Strong problem-solving, communication, and interpersonal skills; a good team player.
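
The following is a minimal, self-contained Scala sketch (illustrative only, not project code) of the functional-style features referenced above: closures, currying, higher-order functions, and pattern matching.

    // Closures, currying, higher-order functions, and pattern matching in Scala.
    object FunctionalStyleDemo {
      sealed trait Record
      case class Valid(id: Int) extends Record
      case class Invalid(reason: String) extends Record

      // Higher-order function: takes another function as an argument.
      def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

      // Curried function: can be partially applied.
      def add(a: Int)(b: Int): Int = a + b

      // Pattern matching over the sealed hierarchy.
      def describe(r: Record): String = r match {
        case Valid(id)       => s"valid record $id"
        case Invalid(reason) => s"rejected: $reason"
      }

      def main(args: Array[String]): Unit = {
        val offset = 10
        val shift: Int => Int = n => n + offset  // closure capturing `offset`
        val addFive = add(5) _                   // partial application of the curried add

        println(applyTwice(shift, 1))   // 21
        println(addFive(3))             // 8
        println(describe(Valid(42)))    // valid record 42
      }
    }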

TECHNICAL SKILLS

Languages: Java, Scala, Python, R (beginner), Unix shell scripting, SQL/PL/SQL

Big Data Ecosystem: Hadoop, Spark (PySpark and Scala), NiFi, HDFS, MapReduce, Pig, Hive, HBase, Kafka, Zeppelin, Oozie, Sqoop, Avro, Parquet

Web Technologies: Java/JEE, Django, REST, Microservices, Play, JSP, JSTL, XML, XSL, AJAX, JDBC, AngularJS, Bootstrap, jQuery

Machine Learning: Holt-Winters, K-Nearest Neighbors, Similarity Matrix, Decision Trees, Spark

Tools: IntelliJ IDEA, Eclipse, Scala IDE, PyCharm, Spring Tool Suite, Apache Lucene, Log4j, FileZilla, WinSCP, IzPack, Avro Tools

XML Technologies: PyXB, XSD, XML, XSLT, XPath, SAX, DOM and JAXB

Operating Systems: Unix, Linux, Solaris, Mac, Windows 10/7/XP

Databases: MySQL, MySQL Workbench, Oracle 11g/10g, SQL Developer, TOAD

Servers: Apache Tomcat, JBoss

Job Scheduling: Apache NiFi, Autosys, ACE

Editors: Vim, Sublime Text, IntelliJ IDEA, Eclipse, Scala IDE, PyCharm, Jupyter/IPython notebooks

Build Tools: Maven (multi-module Java/Scala), distutils/setuptools (Python)

Code Version Ctrl: Git, GitHub, Bitbucket, Stash.

Continuous Integration & Testing: Bamboo, Jenkins, JUnit, EasyMock, Mockito

PROFESSIONAL EXPERIENCE

Confidential, VA

Spark/Hadoop Developer

Responsibilities:

  • Participated in architectural design meetings for the product with all related stakeholders.
  • Contributed to the design and implementation of the product.
  • Developed a custom NiFi processor to convert a custom source format to JSON.
  • Developed a custom NiFi processor to upload and download files from AWS S3 based on project-specific configurations.
  • Developed a custom NiFi processor to create weekly-full/daily-incremental data extracts for reporting consumption based on configurations.
  • Developed a custom NiFi processor to launch Spark processes based on the source file ingested.
  • Developed Spark/Scala code to consume data in batch or streaming mode based on the type of source defined in the config definition (a simplified sketch follows this list).
  • Developed file tracker component (Java/Scala/Spark) to track the status of the files ingested at each step of workflow.
  • Developed a rules engine component (Java/Scala) to apply business rules to each record based on the source file ingested.
  • Developed reusable component (commons module) consisting of several user defined functions used across different modules in the project.
  • Developed a configurations module to drive end-to-end (ingestion/pre-processing/processing/loading) data flows from different sources based on XML/JSON configuration files.
  • Ingested data from different sources such as S3, SFTP, custom data APIs, and RDBMS.
  • Handled different formats of data such as Avro, JSON, XML, CSV, and STIFF (custom format).
  • Developed Spark code to process data from different sources and store it into Hive/HBase (data is pre-processed and stored in HDFS using NiFi before Spark consumption).
  • Designed and executed BDP modules such as Ingestion, Organization, Transformation, and Data Quality.
  • Developed PowerShell scripts to auto-download extract files from AWS S3 to the Tableau server and trigger Tableau refreshes.
  • Developed a component to execute daily data quality checks on data from different domains, driven by settings in a configuration file.
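
A simplified Spark/Scala sketch of the config-driven batch/streaming consumption described above. The SourceConfig case class and the source name, paths, broker address, and table name are hypothetical placeholders; the real pipeline derives its settings from XML/JSON configuration files.

    // Simplified sketch: pick batch vs. streaming consumption from a per-source config.
    import org.apache.spark.sql.{DataFrame, SparkSession}

    case class SourceConfig(name: String, mode: String, path: String, format: String)

    object ConfigDrivenIngest {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("config-driven-ingest")
          .enableHiveSupport()
          .getOrCreate()

        // In the real pipeline this would be parsed from an XML/JSON config file.
        val cfg = SourceConfig("claims", "batch", "hdfs:///landing/claims", "avro")

        cfg.mode match {
          case "batch" =>
            // One-shot read of pre-processed files landed by NiFi.
            val df: DataFrame = spark.read.format(cfg.format).load(cfg.path)
            df.write.mode("append").saveAsTable(s"curated_${cfg.name}")

          case "streaming" =>
            // Continuous consumption from a Kafka topic named after the source.
            spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", cfg.name)
              .load()
              .selectExpr("CAST(value AS STRING) AS value")   // raw message payload
              .writeStream
              .format("parquet")
              .option("path", s"${cfg.path}/stream")
              .option("checkpointLocation", s"${cfg.path}/_checkpoints")
              .start()
              .awaitTermination()
        }

        spark.stop()
      }
    }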

Confidential, Philadelphia, PA

Spark/Hadoop Developer

Responsibilities:

  • Contributed to the design and implementation of RDS and EDT.
  • Developed a Change Data Capture (CDC) component using Spark/Scala (a simplified sketch follows this list).
  • Developed a bi-temporal CDC component that captures changes with bi-temporal attributes, storing each change in two time dimensions.
  • Worked on a metadata capture component to capture changes in the metadata of new feeds.
  • Worked on a schema evolution component to deal with changing schemas.
  • Worked on a rules engine that detects unacceptable changes and triggers alerts to the source system.
  • Worked on a 3NF-to-dimensional-model converter with change data capture.
  • Used configurable XML as input to the products; used JAXB/SAX/DOM to marshal/unmarshal the XML config files.
  • Worked on the development of Enterprise Data Transfer (EDT) for IMS.
  • Developed microservices using Akka HTTP per the application requirements.
  • Developed microservices for finding the active NameNode and generating keytabs.
  • Worked with the Kafka API to develop publisher and subscriber components.
  • Developed a consumer for 3NF data (Avro) from source Kafka topics that persists the data to the HDFS landing area.
  • Developed cron jobs for monitoring and logging.
  • Deployed the Akka cluster to production using Lightbend ConductR.
  • Worked on a data load project (Drug Distribution Data) and developed Python scripts to automate the end-to-end flow.
  • Worked on loading files from mainframes to the data lake using Impala SQL and Python scripts.
  • Used Sqoop for a history load of eight years' worth of fact data.
  • Converted Netezza business-logic SQL to Impala SQL.
  • Developed shell/Python scripts for RI checks of the load process for multiple dimensions.
  • RI validation reports are sent to the concerned departments using an Oozie email action.
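
A simplified Spark/Scala sketch of the change data capture idea referenced above: compare the previous and current snapshots of a feed and classify each key as an insert, update, or delete. The snapshot paths and the key/payload column names are illustrative assumptions, and the bi-temporal attributes are only noted in a comment.

    // Simplified CDC sketch: classify each key as INSERT, UPDATE, or DELETE by
    // comparing the previous and current snapshots of a feed.
    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions._

    object CdcSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("cdc-sketch").getOrCreate()

        // Hypothetical snapshot locations and a single "payload" column for brevity.
        val prev: DataFrame = spark.read.parquet("hdfs:///snapshots/prev")
          .select(col("id"), col("payload").as("prev_payload"))
        val curr: DataFrame = spark.read.parquet("hdfs:///snapshots/curr")
          .select(col("id"), col("payload").as("curr_payload"))

        // Full outer join on the business key, then derive the change type.
        val changes = prev.join(curr, Seq("id"), "full_outer")
          .withColumn("change_type",
            when(col("prev_payload").isNull, lit("INSERT"))
              .when(col("curr_payload").isNull, lit("DELETE"))
              .when(col("prev_payload") =!= col("curr_payload"), lit("UPDATE"))
              .otherwise(lit("UNCHANGED")))
          .filter(col("change_type") =!= "UNCHANGED")

        // The bi-temporal variant would also stamp each change row with
        // valid-time and transaction-time columns before writing it out.
        changes.write.mode("append").parquet("hdfs:///cdc/changes")

        spark.stop()
      }
    }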

Confidential, Cupertino, CA

Hadoop Developer

Responsibilities:

  • Worked on migration of data across Hadoop clusters as part of the initial phase of the Prineville DR cluster setup (CR1 & CR2).
  • Developed Python automation scripts for extracting DDL scripts from the source system and executing the statements on the new cluster.
  • Developed validation scripts to verify the data copied to the new cluster by comparing it against stats taken on the source system (a simplified sketch follows this list).
  • Developed export/import scripts for Hive tables and data migration to avoid manual intervention.
  • Experienced in data/application migration, with excellent insight into the challenges involved in such projects.
  • Understood the business logic of Teradata procedures.
  • Identified source systems and developed Hive scripts pointing to those source systems.
  • Worked with the framework team to finalize the metadata setup for enabling F2C on Phoenix.
  • Configured clusters using Ansible playbooks.
  • Wrote automation scripts for migrating the data using DistCp and DistCp2; worked on optimizing the migration process.
  • Migrated data from Teradata to Hive using Sqoop.
  • Converted the existing relational database model to the Hadoop ecosystem.
  • Developed HiveQL scripts by deriving logic from Teradata stored procedures.
  • Optimized Hive queries by implementing SMB joins and map-side joins on partitioned and bucketed datasets.
  • Compared Hive query outputs to existing data model outputs.
  • Customized batch Java programs and developed shell scripts.
  • Scheduled Hive scripts to run Hive queries at regular intervals.
  • Worked on Python and shell scripting to create automation scripts.
  • Wrote a Sqoop incremental import job to move new/updated data from the database to HDFS and Hive.
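
A minimal sketch of the validation idea described above: compute a row count and a simple content checksum for a migrated Hive table so they can be diffed against the stats captured on the source cluster. The production validation scripts were Python; this Spark/Scala version, and the table name it uses, are illustrative only.

    // Minimal validation sketch: row count plus an order-independent content
    // checksum for a migrated Hive table, to be diffed against source-side stats.
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object MigrationValidation {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("migration-validation")
          .enableHiveSupport()
          .getOrCreate()

        val table = "analytics_db.orders"   // placeholder table name
        val df = spark.table(table)

        val stats = df.select(
          count(lit(1)).as("row_count"),
          sum(hash(df.columns.map(col): _*).cast("long")).as("content_checksum"))

        // In practice these values are written out and compared with the
        // stats captured on the source cluster before the copy.
        stats.show(truncate = false)

        spark.stop()
      }
    }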

Confidential

Oracle Developer

Responsibilities:

  • Understanding the system requirements and functional design.
  • Coding modules and following the Scripting specifications and standards.
  • Developing Shell Scripts for Automation of Job Submission.
  • Implemented an alerting mechanism using shell scripts as the existing system reached its end of life.
  • Packaging and building Beta/Production/FT releases.
  • Deploying the Quarterly Releases and FT Releases.
  • Developing shell scripts for monitoring the business critical streams.
  • Upgrade of ACE (Job Scheduling Tool).
  • Placing blackouts/outages on servers being upgraded.
  • Installation and configuration of ACE software on Solaris 10.
  • Upgrade/configuration of Mailbox (SFTP File Transfer Tool) software.
  • Testing functionality of upgraded systems.
  • Post Upgrade Support for the upgraded systems.
  • Analysis of existing Incoming/Outgoing Interfaces.
  • Implementation of RFS/SPIDER (Custom ETL tool).
  • Configuration of RFS/SPIDER Transmissions using Oracle Forms.
  • Implementation/Configuration of ACE (Job Scheduling Tool).
  • Configuring Streams, Programs and Batches.
  • Post Implementation System Support.
  • Documentation of Changes in Process.

Confidential

Oracle Developer

Responsibilities:

  • Archiving using UNIX tools when system space usage reaches the threshold limit.
  • Printer Management on Solaris Environment.
  • Monitoring Customized Namespaces like Load, Hold and Error.
  • Supporting the Customized interfaces used for IN/OUT dataflow.
  • Registering, Scheduling and Monitoring the Jobs, Batch Processes running in Scheduling Tool.
  • Tracking and resolving the tickets within SLA.
  • Applying the Release during the release cycle.
  • Participated in planned DR test.
  • Installation and configuration of ACE software on Solaris 10.
  • Configuration of ACE job scheduling tool.
  • Developed a UNIX shell script to monitor the daily failure process and take the appropriate action, automating the process. This significantly reduced ticket volume and was highly appreciated by the customer.
  • Configuration of Job Failure Alerting Systems.
  • Implementation/Configuration of Mailbox (SFTP File Transfer Tool) software.
  • Configuration of Mailboxes and related Components.

Confidential

Java Developer

Responsibilities:

  • Involved in Development and Support phases of Software Development Life Cycle (SDLC).
  • Participated in gathering business requirements of the software for the high-level design.
  • Supported various sub-projects and communicated with support teams, leads and other cross teams.
  • Used XML Spy to design XML Schema (XSD) and WSDL.
  • Used JAXB to marshal/unmarshal to/from XML from/to Java Objects.
  • The Spring IoC container was used for dependency injection.
  • Used Spring JDBC and Spring DAO support to persist POJOs into the database.
  • Used SoapUI as a test tool for testing SOAP and REST web services.
  • Used Maven to clean, compile, build, install, deploy and manage jar and war archives.
  • Used Log4j for debugging and error logging purposes.
  • Used Prismy as a Defect/Base Change Tracking System within the team to enhance the service.
  • Developed Unit and Functional Test cases using SoapUI.
