Spark/Hadoop Developer Resume
VA
SUMMARY
- Cloudera Certified Developer for Apache Hadoop with over 9 years of experience in designing, implementing and executing client-server and analytical applications using Hadoop, Spark, Apache NiFi, Kafka, Hive, HBase, AWS, Java, Scala, Python, REST services and microservice technologies.
- Designed and developed custom processors and data flow pipelines between systems using flow-based programming in Apache NiFi; extensive experience using NiFi's web-based UI.
- Extensive experience working with real-time streaming applications and batch-style large-scale distributed computing applications; worked on integrating Kafka with NiFi and Spark.
- Expertise in fine-tuning Spark, Hive and MR applications to the available cluster capacity.
- Developed reusable and configurable components as part of project requirements in Java, Scala and Python.
- Sound knowledge of Scala's functional programming techniques such as anonymous functions (closures), currying, higher-order functions and pattern matching (see the sketch at the end of this summary).
- Delivered POCs on innovative business ideas and presented them to technology leaders. Mentored other team members to get them up to speed on challenging projects.
- Used Python scripting for automation and has a deep understanding of data analysis libraries such as NumPy, Pandas and Matplotlib.
- Hands-on experience with advanced Java 8 features such as lambda expressions and streams.
- Expertise in working with different kinds of data formats such as Avro, Parquet and ORC.
- Worked with the Kafka API to develop publisher and subscriber components.
- Developed a component to upload/download files from AWS S3 based on project-specific configurations.
- Developed machine learning POCs using R programming and Python modules for data analytics.
- Developed an end-to-end POC, from Data Ingestion to Data Transformation to Data Quality to Data Lineage, for the Big Data Platform.
- Strong knowledge of machine learning algorithms like Linear Regression, Logistic Regression, Decision Tree, KNN, Holt-Winters, SVM and K-Means.
- Excellent relational database understanding and experience with Oracle 10g/11i and MySQL.
- Good command of algorithms and data structures; active user on DZone and other technical forums.
- Experienced in using version control tools like SVN and Git, and build tools like Maven, including multi-module Maven applications.
- Participated in entire Software Development Life Cycle including Requirement Analysis, Design, Development, Testing, Implementation, Documentation and Support of software applications.
- Experience working with Agile Methodologies including SCRUM and Test-Driven Development.
- Strong work ethic with a desire to succeed and make significant contributions to the organization. Strong problem-solving, communication and interpersonal skills, and a good team player.
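A minimal illustration of the Scala functional-style techniques mentioned above (closures, currying, higher-order functions and pattern matching); the names used here are made up for the example:

```scala
object FunctionalStyleExamples {
  // Higher-order function: takes another function as an argument
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // Anonymous function acting as a closure: captures the surrounding `factor`
  val factor = 3
  val scale: Int => Int = n => n * factor

  // Curried function, partially applied to build a specialised variant
  def add(a: Int)(b: Int): Int = a + b
  val addTen: Int => Int = add(10)

  // Pattern matching over a sealed case-class hierarchy
  sealed trait Record
  case class Valid(id: Long) extends Record
  case class Invalid(reason: String) extends Record

  def describe(r: Record): String = r match {
    case Valid(id)       => s"record $id passed validation"
    case Invalid(reason) => s"rejected: $reason"
  }
}
```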
TECHNICAL SKILLS
Languages: Java, Scala, Python, R (beginner), Unix scripting, SQL, PL/SQL
BigData Ecosystem: Hadoop, Spark (PySpark and Scala), NiFi, HDFS, MapReduce, Pig, Hive, HBase, Kafka, Zeppelin, Oozie, Sqoop, Avro, Parquet
Web Technologies: Java/JEE, Django, REST, Microservices, Play, JSP, JSTL, XML, XSL, AJAX, JDBC, AngularJS, Bootstrap, jQuery
Machine Learning: Holt-Winters, K-Nearest Neighbors, Similarity Matrix, Decision Trees, Spark
Tools: IntelliJ IDEA, Eclipse, Scala IDE, PyCharm, Spring Tool Suite, Apache Lucene, Log4j, FileZilla, WinSCP, IzPack, Avro-Tools
XML Technologies: PyXB, XSD, XML, XSLT, XPath, SAX, DOM and JAXB
Operating Systems: Unix, Linux, Solaris, Mac, Windows 10/7/XP
Database: MySQL, MySQL Workbench, Oracle 11i/10g, SQL Developer, TOAD
Servers: Apache Tomcat, JBoss
Job Scheduling: Apache NiFi, Autosys, ACE
Editors: Vim, Sublime, IntelliJ IDEA, Eclipse, Scala IDE, PyCharm, Jupyter/IPython notebooks
Build Tools: Maven (multi-module in Java/Scala), Distutils/Setuptools (Python)
Code Version Ctrl: Git, GitHub, Bitbucket, Stash
Continuous Integration: Bamboo, Junit, EasyMock, Mockito, Jenkins
PROFESSIONAL EXPERIENCE
Confidential, VA
Spark/Hadoop Developer
Responsibilities:
- Involved in architectural design meetings for the product with all related stakeholders.
- Contributed to the design and implementation of the product.
- Developed custom NiFi processor to convert a custom source format to JSON (a processor skeleton is sketched after this list).
- Developed custom NiFi processor to upload and download files from AWS S3 based on project-specific configurations.
- Developed custom NiFi processor to create weekly-full/daily-incremental data extracts for reporting consumption based on configurations.
- Developed custom NiFi processor to launch Spark jobs based on the source file ingested.
- Developed Spark/Scala code to consume data in batch and streaming mode based on the type of source defined in the config definition (a config-driven sketch also follows this list).
- Developed file tracker component (Java/Scala/Spark) to track the status of the files ingested at each step of workflow.
- Developed rules engine component (Java/Scala) to apply business rules to each record based on the source file ingested.
- Developed reusable component (commons module) consisting of several user defined functions used across different modules in the project.
- Developed configurations module to drive end-to-end (ingestion/pre-processing/processing/loading) data from different sources based on xml/json configuration files.
- Ingested data from different sources like S3, SFTP, Custom data APIs and RDBMS.
- Handled different formats of data like Avro, JSON, XML, CSV and STIFF (custom format).
- Developed Spark code to process data from different sources and store it into Hive/HBase (data is pre-processed and stored in HDFS using NiFi before Spark consumption).
- Designed and executed modules of the BDP such as Ingestion, Organization, Transformation and Data Quality.
- Developed PowerShell scripts for auto-download of extract files from AWS S3 to the Tableau server and to trigger a Tableau refresh.
- Developed component to execute data quality checks on data from different domains on a daily basis, driven by settings in a configuration file.
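A sketch of what a custom NiFi processor of the kind described above can look like, written in Scala against the standard NiFi processor API. The class name and the `toJson` conversion are placeholders, not the project's actual code:

```scala
import java.io.{InputStream, OutputStream}
import org.apache.nifi.flowfile.FlowFile
import org.apache.nifi.processor.{AbstractProcessor, ProcessContext, ProcessSession, Relationship}
import org.apache.nifi.processor.io.StreamCallback

// Hypothetical processor skeleton; the real conversion logic is not shown here.
class ConvertCustomFormatToJson extends AbstractProcessor {

  private val RelSuccess: Relationship =
    new Relationship.Builder().name("success").description("Converted FlowFiles").build()

  override def getRelationships: java.util.Set[Relationship] =
    java.util.Collections.singleton(RelSuccess)

  override def onTrigger(context: ProcessContext, session: ProcessSession): Unit = {
    val flowFile: FlowFile = session.get()
    if (flowFile == null) return

    // Rewrite the FlowFile content, converting the custom source format to JSON
    val converted = session.write(flowFile, new StreamCallback {
      override def process(in: InputStream, out: OutputStream): Unit =
        out.write(toJson(in))
    })
    session.transfer(converted, RelSuccess)
  }

  // Naive placeholder conversion: wraps the raw payload in a trivial JSON envelope
  private def toJson(in: InputStream): Array[Byte] = {
    val raw = scala.io.Source.fromInputStream(in).mkString
    s"""{"payload": "${raw.trim}"}""".getBytes("UTF-8")
  }
}
```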
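A sketch of the config-driven batch vs. streaming consumption mentioned above. `SourceConfig` and its fields are assumed names standing in for the project's XML/JSON configuration model:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ConfigDrivenReader {
  // Assumed configuration model; the real project drove this from XML/JSON config files
  case class SourceConfig(mode: String, format: String, path: String,
                          kafkaServers: String, topic: String)

  def readSource(spark: SparkSession, cfg: SourceConfig): DataFrame = cfg.mode match {
    case "batch" =>
      // One-off load of files landed by NiFi (Avro/Parquet/ORC/CSV, per config)
      spark.read.format(cfg.format).load(cfg.path)
    case "streaming" =>
      // Continuous consumption from Kafka via Structured Streaming
      spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", cfg.kafkaServers)
        .option("subscribe", cfg.topic)
        .load()
    case other =>
      throw new IllegalArgumentException(s"Unsupported source mode: $other")
  }
}
```

Keeping the mode decision in one function lets the same downstream transformations run unchanged against batch and streaming DataFrames.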
Confidential, Philadelphia, PA
Spark/Hadoop Developer
Responsibilities:
- Contributed to the design and implementation of RDS and EDT
- Developed Change Data Capture (CDC) component using Spark/Scala (a minimal CDC sketch follows this list).
- Developed Bi-Temporal CDC component (captures changes with bi-temporal attributes) to store the change in two time dimensions.
- Worked on Metadata capture component to capture the change in metadata of new feeds
- Worked on Schema evolution component to deal with the changing schema.
- Worked on rules engine to detect changes which are not acceptable and trigger alerts to the source system.
- Worked on 3NF to Dimensional model converter with change data capture.
- Used configurable XML as inputs to the products; JAXB/SAX/DOM to marshal/unmarshal the XML config files.
- Worked for the development of Enterprise Data Transfer (EDT) for IMS
- Developed microservices using Akka HTTP as per the application requirements (a minimal route sketch follows this list).
- Developed microservices for finding the active NameNode and generating keytabs.
- Worked with Kafka API to develop Publisher, Subscriber components
- Developed consumer for 3NF data (Avro) from source Kafka topics to persist the data in the HDFS landing area (see the consumer sketch after this list).
- Developed cron jobs for monitoring and logging.
- Deployed the Akka cluster to production using Lightbend ConductR.
- Worked on data load (Drug Distribution Data) project, developed python scripts to automate end-to-end flow.
- Worked on loading files from mainframes to the data lake using Impala SQL and Python scripts.
- Used Sqoop for the history load of eight years' worth of fact data.
- Converted Netezza business logic SQL to Impala SQL
- Developed shell/Python scripts for referential integrity (RI) checks of the load process for multiple dimensions.
- Sent RI validation reports to the concerned departments using the Oozie email action.
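A minimal sketch of the change-data-capture approach, assuming changes are detected by hashing the non-key columns; the real component additionally stamped captured rows with bi-temporal effective/expiry attributes:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{concat_ws, sha2}

object ChangeCapture {
  // Rows in `incoming` whose key + column-hash do not appear in `current`
  // are treated as inserts or updates (the captured delta).
  def captureChanges(current: DataFrame, incoming: DataFrame, keys: Seq[String]): DataFrame = {
    def withHash(df: DataFrame): DataFrame = {
      val nonKeyCols = df.columns.filterNot(keys.contains).map(df(_))
      df.withColumn("row_hash", sha2(concat_ws("||", nonKeyCols: _*), 256))
    }
    val currentHashed  = withHash(current).select("row_hash", keys: _*)
    val incomingHashed = withHash(incoming)
    incomingHashed.join(currentHashed, keys :+ "row_hash", "left_anti").drop("row_hash")
  }
}
```

The left-anti join keeps only incoming rows with no matching key/hash pair in the current snapshot, which is exactly the delta a CDC step needs to persist.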
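A minimal Akka HTTP route in the spirit of the active-NameNode service (assumes Akka HTTP 10.2+; `activeNameNode()` is a stand-in for the real HA lookup):

```scala
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.server.Directives._

object NameNodeService extends App {
  implicit val system: ActorSystem = ActorSystem("edt-services")

  // Stand-in for the real lookup, which queried HDFS HA state for the active NameNode
  def activeNameNode(): String = "nn1.example.com"

  val route =
    path("namenode" / "active") {
      get {
        complete(activeNameNode())
      }
    }

  Http().newServerAt("0.0.0.0", 8080).bind(route)
}
```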
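A sketch of the Kafka consumer side using the plain Kafka consumer API; the broker address, topic name and the `writeToLanding` HDFS helper are placeholders:

```scala
import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.serialization.ByteArrayDeserializer

object LandingConsumer extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "broker1:9092")                        // placeholder broker
  props.put("group.id", "edt-landing")
  props.put("key.deserializer", classOf[ByteArrayDeserializer].getName)
  props.put("value.deserializer", classOf[ByteArrayDeserializer].getName)

  val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](props)
  consumer.subscribe(Collections.singletonList("source-3nf-topic"))     // placeholder topic

  // Assumed helper that appends the Avro-encoded payload to the HDFS landing area
  def writeToLanding(payload: Array[Byte]): Unit = ()

  while (true) {
    val records = consumer.poll(Duration.ofSeconds(1))
    records.forEach(record => writeToLanding(record.value()))
  }
}
```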
Confidential, Cupertino, CA
Hadoop Developer
Responsibilities:
- Worked on migration of data across Hadoop clusters as part of the initial phase (CR1 & CR2) of the Prineville DR cluster setup.
- Developed Python automation scripts for extracting DDL scripts from the source system and executing the statements on the new cluster.
- Developed validation scripts to verify the data copied to the new cluster by comparing it against statistics taken at the source system (an illustrative check follows this list).
- Worked on developing export/import scripts for Hive tables and data migration to avoid manual intervention.
- Experienced in Data/Application Migration. Has excellent insight into the challenges involved with such projects.
- Understood the business logic of Teradata procedures.
- Identified source systems and developed Hive scripts to point to those source systems.
- Worked with the framework team to finalize the metadata setup for enabling F2C on Phoenix.
- Configured clusters using Ansible playbooks.
- Wrote automation scripts for migrating the data using DistCp and DistCp2, and worked on optimizing the migration process.
- Migrated data from Teradata to Hive using Sqoop.
- Converted the existing relational database model to the Hadoop ecosystem.
- Developed HiveQL scripts by deriving logic from Teradata stored procedures.
- Optimized Hive queries by implementing SMB joins and map-side joins on partitioned and bucketed datasets.
- Compared Hive query outputs to the existing data model outputs.
- Customized batch Java programs and developed shell scripts.
- Scheduled Hive scripts to run Hive queries at regular intervals.
- Worked on python and shell scripting to create automation scripts.
- Wrote Sqoop incremental import jobs to move new/updated data from the database to HDFS and Hive.
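An illustrative version of the validation check: the original scripts were written in Python and compared a broader set of statistics, but the core idea, shown here in Scala/Spark SQL for consistency with the other sketches, is comparing stats between the source and migrated tables (table names are placeholders):

```scala
import org.apache.spark.sql.SparkSession

object MigrationValidation extends App {
  val spark = SparkSession.builder()
    .appName("migration-validation")
    .enableHiveSupport()
    .getOrCreate()

  // Placeholder table names: compare a basic statistic (row count) between clusters
  val sourceCount = spark.sql("SELECT COUNT(*) FROM legacy_db.orders").first().getLong(0)
  val targetCount = spark.sql("SELECT COUNT(*) FROM migrated_db.orders").first().getLong(0)

  if (sourceCount != targetCount)
    println(s"Row-count mismatch for orders: source=$sourceCount, target=$targetCount")
  else
    println("Row counts match")

  spark.stop()
}
```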
Confidential
Oracle Developer
Responsibilities:
- Understanding the systems requirement and functional design.
- Coding modules and following the Scripting specifications and standards.
- Developing Shell Scripts for Automation of Job Submission.
- Implemented alerting mechanism using shell scripts as existing system reached its end of life
- Packaging and building Beta/Production/FT releases.
- Deploying the Quarterly Releases and FT Releases.
- Developing shell scripts for monitoring the business critical streams.
- Upgrade of ACE (Job Scheduling Tool).
- Placing Blackouts/Outages on Servers being upgraded
- Installation and Configuration of ACE software on Solaris 10
- Upgrade/configuration of Mailbox (SFTP File Transfer Tool) software.
- Testing functionality of upgraded systems.
- Post Upgrade Support for the upgraded systems.
- Analysis of existing Incoming/Outgoing Interfaces.
- Implementation of RFS/SPIDER (Custom ETL tool).
- Configuration of RFS/SPIDER Transmissions using Oracle Forms.
- Implementation/Configuration of ACE (Job Scheduling Tool).
- Configuring Streams, Programs and Batches.
- Post Implementation System Support.
- Documentation of Changes in Process.
Confidential
Oracle Developer
Responsibilities:
- Archiving using UNIX tools when system space usage reaches the threshold limit.
- Printer Management on Solaris Environment.
- Monitoring Customized Namespaces like Load, Hold and Error.
- Supporting the Customized interfaces used for IN/OUT dataflow.
- Registering, Scheduling and Monitoring the Jobs, Batch Processes running in Scheduling Tool.
- Tracking and resolving the tickets within SLA.
- Applying the Release during the release cycle.
- Participated in planned DR test.
- Installation and Configuration of ACE software on Solaris 10
- Configuration of ACE job scheduling tool.
- Developed a UNIX shell script to monitor the daily failure process and take the appropriate action, automating this process. This led to a significant reduction in tickets and was highly appreciated by the customer.
- Configuration of Job Failure Alerting Systems.
- Implementation/Configuration of Mailbox (SFTP File Transfer Tool) software.
- Configuration of Mailboxes and related Components.
Confidential
Java Developer
Responsibilities:
- Involved in Development and Support phases of Software Development Life Cycle (SDLC).
- Participated in gathering business requirements of software for High Level Design.
- Supported various sub-projects and communicated with support teams, leads and other cross teams.
- Used XML Spy to design XML Schema (XSD) and WSDL.
- Used JAXB to marshal/unmarshal Java objects to/from XML (see the sketch after this list).
- The Spring IoC container was used for dependency injection.
- Used Spring JDBC and Spring DAO support to persist POJOs to the database.
- Used SoapUI as a test tool for testing SOAP and REST web services.
- Used Maven to clean, compile, build, install, deploy and manage jar and war archives.
- Used Log4J for debugging and error logging purposes.
- Used Prismy as a Defect/Base Change Tracking System within the team to enhance the service.
- Developed Unit and Functional Test cases using SoapUI.
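A minimal JAXB round-trip of the kind described above, written in Scala for consistency with the other sketches; `FeedConfig` and its fields are hypothetical, and the javax.xml.bind dependency is assumed to be on the classpath:

```scala
import java.io.File
import javax.xml.bind.{JAXBContext, Marshaller}
import javax.xml.bind.annotation.{XmlAccessType, XmlAccessorType, XmlRootElement}
import scala.beans.BeanProperty

// Hypothetical config class; JAXB binds via the generated bean getters/setters
@XmlRootElement(name = "feed")
@XmlAccessorType(XmlAccessType.PROPERTY)
class FeedConfig {
  @BeanProperty var name: String = _
  @BeanProperty var delimiter: String = _
}

object JaxbRoundTrip extends App {
  val context = JAXBContext.newInstance(classOf[FeedConfig])

  // XML -> object
  val config = context.createUnmarshaller()
    .unmarshal(new File("feed-config.xml"))
    .asInstanceOf[FeedConfig]

  // object -> XML
  val marshaller = context.createMarshaller()
  marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true)
  marshaller.marshal(config, new File("feed-config-out.xml"))
}
```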