Big Data Spark Technical Analyst Resume

Bentonville, AR


  • 10 plus years of professional experience in IT, with 5 years on Big Data technologies - Spark-Scala, Hadoop, Cassandra, Pig, HBase, Hive and Impala.
  • Extensive experience in building Spark applications using Scala and Java APIs
  • Worked in Lead and Senior Developer roles for Spark ETL applications design and implementations
  • Good experience working on big data platforms like Amazon AWS, Cloudera and Hortonworks
  • Worked extensively with RDDs and DataFrames in Spark using SparkContext and used Scala to read multiple data formats
  • Worked on Spark SQL and DataFrames for faster execution of Hive queries using Spark SQLContext
  • Performed ETL process with Spark using Scala for processing and validation of raw data logs.
  • Performed data processing in Spark by handling multiple data repositories / data sources.
  • Hands-on with JSON object serialization/deserialization using JSON Serializer in Java and Play libraries in Scala
  • Good knowledge of PySpark.
  • Experience working on XSL transformations (XSLT) with XML data for uniform consumption by multiple applications
  • In-depth knowledge of Hadoop Architecture and Hadoop Daemons.
  • Experience in developing applications using Java and using Java libraries.
  • Experience in using Java Persistence API (JPA) for interaction between RDBMS and Java objects in the application.
  • Experience working with JSON and XML formats for data exchange as payloads for web services
  • Experience in writing MapReduce programs using the Apache Hadoop framework to analyze large volumes of data.
  • Hands-on experience with moving data from HDFS to HIVE to analyze data using HIVE query language.
  • Well versed with Pig Latin to analyze large data sets.
  • Experience in Integrating Hive and Sqoop with HBase and analyzing data in HBase.
  • Microsoft Certified Office Master (Excel, Access, Word and PowerPoint); very good at using statistical tools in Excel for data analysis.
  • Knowledge of Kafka Distributed Messaging System
  • Worked on extending Hive and Pig core functionality by writing custom UDFs like UDAFs and UDTFs.
  • Knowledge of architecture and functionality of NoSQL DBs like HBase, Cassandra and MongoDB.
  • Extensive experience with SQL, PL/SQL and database concepts.
  • Experience in Web Services using XML, HTML and SOAP.
  • Very good knowledge of XSLT.
  • Handled and executed projects the SCRUM way (as SCRUM Master) with a detailed product backlog along with active involvement in core product features development.
  • Experience in documenting Use Cases, Sequence diagrams and Class Diagrams in UML with Rational Rose, Visio
  • Experience working with Git and Jenkins for Continuous Integration.
  • 6+ years of experience with SAP in development and Business Rules Consultant roles.
  • Was part of the product development team at Confidential that created the server-side component of blueprinting.


Big Data Ecosystem: Spark, Hadoop 2.0 MapReduce, HDFS, Pig, Hive, HBase, Impala, Sqoop, YARN, Cassandra, Apache NiFi, StreamSets

Programming Languages: Scala, Java, SAP ABAP, C++, VBA.

Scripting Languages: JavaScript, HTML, XML, XSLT, Linux shell script

Relational Databases: Oracle 11g/10g, MaxDB, MS-SQL, MS-Access

NoSQL Databases: Cassandra, HBase, MongoDB

Modelling Languages: UML

Tools/Editors: Eclipse, Rational Rose, WinSCP, Microsoft Office, Visio, PuTTY, SPSS

Operating Systems: Windows Environments, Linux


Confidential, Bentonville, AR

Big Data Spark Technical Analyst


  • Built Spark applications for data governance of the data lake
  • Developed scripts for running the framework with different REST API calls
  • Scheduled jobs on Automic (Confidential's workflow scheduler) for running data governance jobs
  • Used JIRA for tasks and bug tracking.
  • Wrote SQL queries for specific metric calculations and executed them from shell scripts for automation
  • Used Java APIs and libraries for the Spark applications.
  • Identified and estimated tasks from design to implementation of the framework.
  • Developed and integrated code standards for security of the applications to run on the organization's cluster.
  • Identified multiple scenarios and use cases for the data governance framework across the business domains.

Confidential, O'Fallon, MO

Senior Big Data Spark Developer


  • Build NiFi workflows for process orchestration
  • Lead and develop Spark applications for scoring and capping of transactions
  • Deliver code in adherence to security requirements and compliance
  • Participate in requirements and task planning for the development iterations
  • Evaluate movement of NiFi configurations across development, staging and production environments
  • Develop Sqoop scripts for data ingestion to HDFS and Hive tables from Postgres DB
  • Plan and estimate the tasks for each user story identified for a feature
  • Perform code reviews along with peers to ensure code quality
  • Handle failures and exceptions for process workflows designed on NiFi
  • Create alert mechanisms for the data workflow
  • Load data into Cassandra and Postgres after data processing.

Confidential, NJ

Spark-Scala Lead


  • Designed and built a custom, generic ETL framework as a Spark application in Scala for data loading and transformations.
  • Managed a team of 14 from two offshore locations and an onsite location.
  • Involved in the data modelling of the new system for Cassandra from the existing legacy Oracle DB.
  • Handled data transformations as per the business and mapping rules.
  • Executed complex data aggregations on the calls and sales data for the BI dashboards.
  • Involved in the configuration of Spark jobs through Amazon Data Pipeline for weekly, monthly and ad hoc executions.
  • Created a custom logger to handle huge application log data.
  • Created an error reprocessing framework that handles flagged errors during the subsequent loads.
  • Used Zeppelin and Beeline for querying Cassandra tables
  • Executed queries using Spark SQL for complex joins and data validation.
  • Wrote Scala UDFs for handling complex transformation logic.
  • Involved in the design of partition and clustering keys as per the data volume and query patterns on Cassandra tables.
  • Analyzed the legacy data model and created a Cassandra data model for data loads from heterogeneous systems.
  • Created modular and independent components for Amazon AWS S3 connections, data reads and data stores.
  • Designed a custom referential integrity framework on the NoSQL Cassandra tables for maintaining data integrity and relations in the data.
  • Wrote Scala scripts for extracts from Cassandra Operational Data Store tables for comparing with legacy system data.
  • Created the data ingestion file validation component for checksum, last modified and threshold levels.

Environment: Spark 1.6.0, Cassandra, Scala IDE, Amazon AWS, DBeaver, Zeppelin, Beeline, Amazon workspace, S3 Browser, Amazon Data Pipeline, Git, JIRA, Mobax client, Shell scripting

Confidential, Dayton, OH

Spark-Scala Developer


  • Developed various POCs for the client and analyzed various Hadoop technologies.
  • Created Spark applications using Scala for file validations, data processing and transformations.
  • Pulled data from Veeva to Hadoop cluster using CData driver
  • Wrote Pig UDFs, HiveQL queries, Hive UDFs and Spark SQL queries.
  • Created a series of Spark jobs and processes that used YARN as Spark Resource Manager
  • Handled data processing from multiple data sources and repositories using Spark
  • Handled batch processing data in Spark using Scala
  • Configured Log4j in Spark for custom logging in Spark Applications
  • Responsible for creating domain and staging data models.
  • Created Hive tables, loaded the data and analyzed data using Hive queries.
  • Wrote custom MapReduce programs.
  • Responsible for creating HBase tables and loading aggregated data into them using Pig.
  • Developed Pig UDFs to customize various functions and make them reusable.
  • Responsible for scheduling workflows for daily delta loads.
  • Developed shell scripts for integrating all the components like Hive queries, MapReduce jobs, Pig files and other components.
  • Guided the team in their day-to-day activities and prepared them to meet deadlines.
  • Collaborated with infrastructure and security architects to integrate enterprise information architecture into overall enterprise architecture
  • Used Git for version control to maintain the code repository.
  • Provided documentation and trained the teams; built effective cross-team communication to ensure accuracy, consistency, problem solving, conflict resolution and on-time project completion.
  • Communicated with senior management to provide status, discuss strategic plans, develop road maps and identify critical success factors.

Environment: CDH5, Hadoop Ecosystem, Hive, Sqoop, SolrCloud, Impala, Teradata Connector, Spark SQL, HBase.

Confidential, PA

Hadoop Developer/ Engineer


  • Gave extensive presentations about the Hadoop ecosystem, best practices, data architecture in Hadoop.
  • Designed the ETL process from various sources into Hadoop/HDFS for analysis and further processing.
  • Provided review and feedback for existing physical architecture, data architecture, analysis, designs and code. Designed the next generation architecture for unstructured data.
  • Debugged and solved issues as the subject matter expert, focusing on issues around data science and processing.
  • Wrote Pig Latin scripts and Pig UDFs and optimized the code.
  • Worked on Data archival model on Hadoop framework.
  • Wrote Hector API code for Cassandra
  • Developed Information Strategy in alignment with all agency strategy for master data management, data integration, data virtualization, metadata management, data quality and profiling, data modeling and data governance.
  • Created Hive tables, loaded the data and analyzed data using Hive queries.
  • Worked on a Hive ranking algorithm to classify the patterns.
  • Defined business and technical requirements, designed a proof of concept for evaluating afms agencies' data evaluation criteria and scoring, and selected data integration and information management.
  • Captured and documented the volumetric analysis of CDC module with Informatica.
  • Generated huge records of data for volumetric testing.
  • Collaborated with infrastructure and security architects to integrate enterprise information architecture into overall enterprise architecture.

Environment: CDH4, Cassandra, Hector API, HDFS, MapReduce, Pig, Hive, Informatica, Shell scripting


Business Rules & ABAP Consultant


  • Identified business rule scenarios for various processes (new and existing)
  • Gave Product Demos to the internal customers and external prospects to showcase product features and capabilities
  • Used Java Persistence API (JPA) for connecting with Derby for Content Management Solutions.
  • Developed server side component (SOCO) for the product called ‘Business Process Blueprinting’ that is available for the customers with SAP Solution Manager (release versions of 7.1 and above)
  • Developed ABAP Units for testing server component using ABAP Unit Testing Framework.
  • Significant contributor to the product features development for three releases of the product.
  • Handled customer issues raised on the development component - “Business Process Blueprinting”.
  • Developed BSP applications using HTML and JavaScript for product administration that allow users to perform actions according to the roles assigned.
  • Provided inputs to the Knowledge management team for product guides that would eventually be available for the customers from SAP Service Market Place (SAP SMP).
  • Gave product demos at prominent technology events such as SAP TechEd and Sapphire.
  • Delivered hands-on product training and workshops to the pilot users.
