Big Data Spark Technical Analyst Resume
Bentonville, AR
SUMMARY:
- 10+ years of professional experience in IT, including 5 years on Big Data technologies: Spark (Scala), Hadoop, Cassandra, Pig, HBase, Hive and Impala.
- Extensive experience in building Spark applications using Scala and Java APIs
- Worked in Lead and Senior Developer roles on the design and implementation of Spark ETL applications
- Good experience working on big data platforms like Amazon AWS, Cloudera and Hortonworks
- Worked extensively with RDDs and DataFrames in Spark using SparkContext, and used Scala to read multiple data formats
- Worked on Spark SQL and DataFrames for faster execution of Hive queries using Spark SQLContext
- Performed ETL with Spark and Scala for processing and validation of raw data logs (a representative sketch follows this summary).
- Performed data processing in Spark by handling multiple data repositories / data sources.
- Hands-on with JSON object serialization/deserialization using JSON Serializer in Java and the Play libraries in Scala
- Good working knowledge of PySpark.
- Experience working on XSL Transformations (XSLT) with XML data for uniform consumption by multiple applications
- In-depth knowledge of Hadoop Architecture and Hadoop Daemons.
- Experience in developing applications using Java and its standard libraries.
- Experience in using the Java Persistence API (JPA) for mapping between the RDBMS and Java objects in the application.
- Experience working with JSON and XML formats as payloads for web service data exchange
- Experience in writing MapReduce programs using the Apache Hadoop framework to analyze large volumes of data.
- Hands-on experience moving data from HDFS to Hive to analyze data using the Hive query language (HiveQL).
- Well versed with Pig Latin to analyze large data sets.
- Experience in integrating Hive and Sqoop with HBase and analyzing data in HBase.
- Microsoft Certified Office Master (Excel, Access, Word and PowerPoint) - very good at using statistical tools in Excel for data analysis.
- Knowledge of Kafka Distributed Messaging System
- Worked on extending Hive and Pig core functionality by writing custom functions such as UDFs, UDAFs and UDTFs.
- Knowledge of the architecture and functionality of NoSQL databases like HBase, Cassandra and MongoDB.
- Extensive experience with SQL, PL/SQL and database concepts.
- Experience in Web Services using XML, HTML and SOAP.
- Very good knowledge of XSLT.
- Handled and executed projects the Scrum way (as Scrum Master), with a detailed product backlog and active involvement in core product feature development.
- Experience in documenting Use Cases, Sequence diagrams and Class Diagrams in UML with Rational Rose, Visio
- Experience working with Git and Jenkins for Continuous Integration.
- 6+ Years of experience with SAP on development and Business Rules Consultant roles.
- Was part of the product development team at Confidential, creating the server-side component of the blueprinting product.
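The Spark work summarized above generally followed a read-validate-write flow. Below is a minimal, illustrative sketch in the Spark 1.x style referenced in these bullets (SparkContext/SQLContext); the paths, column names and validation rule are placeholder assumptions, not taken from any specific engagement.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.functions._

    object RawLogEtl {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("RawLogEtl"))
        val sqlContext = new SQLContext(sc)

        // Read raw JSON logs into a DataFrame (input path is a placeholder)
        val rawLogs = sqlContext.read.json("hdfs:///data/raw/logs/")

        // Basic validation: drop records missing mandatory fields, derive a date column
        val validLogs = rawLogs
          .filter(col("eventId").isNotNull && col("eventTime").isNotNull)
          .withColumn("eventDate", to_date(col("eventTime")))

        // Write the cleansed data partitioned by date (output path is a placeholder)
        validLogs.write
          .mode("overwrite")
          .partitionBy("eventDate")
          .parquet("hdfs:///data/curated/logs/")

        sc.stop()
      }
    }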
TECHNICAL SKILLS:
Big Data Eco System: Spark, Hadoop 2.0 Map Reduce, HDFS, Pig, Hive, HBase, Impala, Sqoop, YARN, Cassandra, Apache Nifi, Streamsets
Programming Languages: Scala, Java, SAP ABAP, C++, VBA.
Scripting Languages: JavaScript, HTML, XML, XSLT, Linux shell script
Relational Databases: Oracle 11g/10g, Max DB, MS-SQL, MS-Access
NoSQL Databases: Cassandra, HBase, MongoDB
Modelling Languages: UML
Tools/Editors: Eclipse, Rational Rose, WinSCP, Microsoft Office, Visio, PuTTY, SPSS
Operating Systems: Windows Environments, Linux
PROFESSIONAL EXPERIENCE:
Confidential, Bentonville, AR
Big Data Spark Technical Analyst
Responsibilities:
- Built Spark applications for data governance of the data lake
- Developed scripts for running the framework with different REST API calls
- Scheduled data governance jobs on Automic (Confidential's workflow scheduler)
- Used JIRA for tasks and bug tracking.
- Wrote SQL queries for specific metric calculations and executed them from shell scripts for automation (see the sketch at the end of this role)
- Used Java APIs and libraries in the Spark applications.
- Identified and estimated tasks from design through implementation of the framework.
- Developed and integrated code standards for the security of applications running on the organization's cluster
- Identified multiple scenarios and use cases for the data governance framework across business domains
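As an illustration of the metric-style checks mentioned above (not the actual framework code), the sketch below runs a hypothetical data quality query from a Spark/Scala application and exits non-zero when a threshold is breached, so a wrapping shell script or scheduler can raise an alert; the table, column and threshold are assumptions.

    import org.apache.spark.sql.SparkSession

    object NullRateCheck {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("NullRateCheck")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical governance metric: fraction of NULL customer_ids in a Hive table
        val nullRate = spark.sql(
          """SELECT SUM(CASE WHEN customer_id IS NULL THEN 1 ELSE 0 END) / COUNT(*) AS null_rate
            |FROM sales.transactions""".stripMargin)
          .first()
          .getDouble(0)

        // Non-zero exit signals the calling shell script / scheduler to alert
        if (nullRate > 0.01) {
          System.err.println(s"Governance check failed: null_rate = $nullRate")
          sys.exit(1)
        }
        spark.stop()
      }
    }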
Confidential, O'Fallon, MO
Senior Big Data Spark Developer
Responsibilities:
- Built NiFi workflows for process orchestration
- Led and developed Spark applications for scoring and capping of transactions
- Delivered code in adherence to security and compliance requirements
- Participated in requirements and task planning for development iterations
- Evaluated movement of NiFi configurations across development, staging and production environments
- Developed Sqoop scripts for data ingestion from Postgres into HDFS and Hive tables
- Planned and estimated tasks for each user story identified for a feature
- Performed code reviews with peers to ensure code quality
- Handled failures and exceptions for process workflows designed in NiFi
- Created alert mechanisms for the data workflows
- Loaded data into Cassandra and Postgres after processing (see the sketch below).
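A hedged sketch of the final load step described above, writing a Spark DataFrame to Cassandra (via the DataStax Spark Cassandra Connector) and to Postgres over JDBC; the keyspace, table names and connection details are placeholders, and the connector and the Postgres JDBC driver are assumed to be on the classpath.

    import org.apache.spark.sql.{DataFrame, SaveMode}

    object ResultPersistence {
      def persistResults(scored: DataFrame): Unit = {
        // Write scored transactions to Cassandra (placeholder keyspace/table)
        scored.write
          .format("org.apache.spark.sql.cassandra")
          .options(Map("keyspace" -> "payments", "table" -> "scored_transactions"))
          .mode(SaveMode.Append)
          .save()

        // Write the same results to Postgres over JDBC (placeholder connection details)
        scored.write
          .format("jdbc")
          .option("url", "jdbc:postgresql://db-host:5432/payments")
          .option("dbtable", "scored_transactions")
          .option("user", sys.env("DB_USER"))
          .option("password", sys.env("DB_PASSWORD"))
          .mode(SaveMode.Append)
          .save()
      }
    }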
Confidential, NJ
Spark-Scala Lead
Responsibilities:
- Designed and built a custom, generic ETL framework as a Spark application using Scala for data loading and transformations.
- Managed a team of 14 from two offshore locations and an onsite location.
- Involved in the data modelling of the new system for Cassandra from the existing legacy Oracle DB.
- Handled data transformations as per the business and mapping rules.
- Executed complex data aggregations on the calls and sales data for the BI dashboards.
- Involved in the configuration of Spark jobs through Amazon Data Pipeline for weekly, monthly and ad hoc executions.
- Created custom logger to handle huge application log data.
- Created an error reprocessing framework that handles flagged errors during the subsequent loads.
- Used Zeppelin and Beeline for querying Cassandra tables
- Executed queries using Spark SQL for complex joins and data validation.
- Wrote Scala UDFs for handling complex transformation logic (see the sketch at the end of this role).
- Involved in the design of partition and clustering keys as per the data volume and query patterns on Cassandra tables.
- Analyzed the legacy data model and created a Cassandra data model for data loads from heterogeneous systems.
- Created modular and independent components for Amazon AWS S3 connections, data reads and data stores.
- Designed a custom referential integrity framework on the NoSQL Cassandra tables for maintaining data integrity and relations in the data.
- Wrote Scala scripts for extracts from Cassandra Operational Data Store tables for comparison with legacy system data.
- Created the data ingestion file validation component for checksum, last modified and threshold levels.
Environment: Spark 1.6.0, Cassandra, Scala IDE, Amazon AWS, DBeaver, Zeppelin, Beeline, Amazon workspace, S3 Browser, Amazon Datapipeline, Git, JIRA, Mobax client, Shell scripting
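A minimal sketch of a Scala UDF of the kind referred to above, usable both as a DataFrame column function and registered directly in Spark SQL (shown against the Spark 1.6 SQLContext listed in this role's environment); the normalization logic and column names are illustrative assumptions only.

    import org.apache.spark.sql.{DataFrame, SQLContext}
    import org.apache.spark.sql.functions.{col, udf}

    object TransformUdfs {
      // Hypothetical transformation: normalize phone numbers to digits only
      val normalizePhone = udf { raw: String =>
        Option(raw).map(_.replaceAll("[^0-9]", "")).orNull
      }

      // Column-based usage on a DataFrame
      def cleanCalls(calls: DataFrame): DataFrame =
        calls.withColumn("phone", normalizePhone(col("phone")))

      // Registration for use inside Spark SQL statements
      def registerUdfs(sqlContext: SQLContext): Unit =
        sqlContext.udf.register("normalize_phone",
          (raw: String) => Option(raw).map(_.replaceAll("[^0-9]", "")).orNull)
    }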
Confidential, Dayton, OH
Spark-Scala Developer
Responsibilities:
- Developed various POCs for the client and analyzed various Hadoop technologies.
- Created spark applications using Scala for file validations, data processing and transformations.
- Pulled data from Veeva to Hadoop cluster using CData driver
- Wrote Pig UDFs, HiveQL queries, Hive UDFs and Spark SQL queries.
- Created a series of Spark jobs and processes that used YARN as the Spark resource manager
- Handled data processing from multiple data sources and repositories using Spark
- Handled batch processing data in Spark using Scala
- Configured Log4j in Spark for custom logging in Spark Applications
- Responsible for creating domain and staging data models.
- Created Hive tables, loaded the data and analyzed it using Hive queries (see the sketch at the end of this role).
- Wrote custom MapReduce programs.
- Responsible for creating HBase tables and loading aggregated data into them using Pig.
- Developed Pig UDFs to customize various functions and make them reusable.
- Responsible for scheduling workflows for daily delta loads.
- Developed shell scripts for integrating components such as Hive queries, MapReduce jobs and Pig scripts.
- Guided the team in their day-to-day activities and helped them meet deadlines.
- Collaborated with infrastructure and security architects to integrate enterprise information architecture into the overall enterprise architecture
- Used Git as the version control tool to maintain the code repository.
- Provided documentation and trained the teams; built effective cross-team communication to ensure accuracy, consistency, problem solving, conflict resolution and on-time project completion.
- Communicated status to senior management, discussed strategic plans, developed road maps and identified critical success factors.
Environment: CDH5, Hadoop ecosystem, Hive, Sqoop, SolrCloud, Impala, Teradata Connector, Spark SQL, HBase.
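A brief, hedged sketch in Scala of creating, loading and querying a Hive table from Spark, along the lines of the Hive work listed above; the database, table, columns, paths and dates are placeholders.

    import org.apache.spark.sql.hive.HiveContext

    object DailySummary {
      def buildDailySummary(hiveContext: HiveContext): Unit = {
        // Create a partitioned Hive table for staged events (names are placeholders)
        hiveContext.sql(
          """CREATE TABLE IF NOT EXISTS staging.events (
            |  event_id STRING,
            |  account_id STRING,
            |  amount DOUBLE
            |) PARTITIONED BY (load_date STRING)
            |STORED AS PARQUET""".stripMargin)

        // Load a daily delta from HDFS into the matching partition
        hiveContext.sql(
          """LOAD DATA INPATH '/data/incoming/events/2016-01-01'
            |INTO TABLE staging.events PARTITION (load_date = '2016-01-01')""".stripMargin)

        // Simple analysis query over the loaded partition
        hiveContext.sql(
          """SELECT account_id, COUNT(*) AS events, SUM(amount) AS total
            |FROM staging.events
            |WHERE load_date = '2016-01-01'
            |GROUP BY account_id""".stripMargin).show()
      }
    }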
Confidential, PA
Hadoop Developer/ Engineer
Responsibilities:
- Gave extensive presentations about the Hadoop ecosystem, best practices, data architecture in Hadoop.
- Designed the ETL process from various sources into Hadoop/HDFS for analysis and further processing.
- Provided review and feedback for the existing physical architecture, data architecture, analysis, designs and code. Designed the next-generation architecture for unstructured data.
- Debugged and solved issues as the subject matter expert, focusing on issues around data science and processing.
- Wrote Pig Latin scripts and Pig UDFs and optimized the code.
- Worked on Data archival model on Hadoop framework.
- Wrote Hector API code for Cassandra (see the sketch at the end of this role)
- Developed an information strategy in alignment with the overall agency strategy for master data management, data integration, data virtualization, metadata management, data quality and profiling, data modeling and data governance.
- Created Hive tables, loaded the data and analyzed it using Hive queries.
- Worked on a Hive ranking algorithm to classify patterns.
- Defined business and technical requirements and designed a proof of concept for evaluating AFMS agencies' data evaluation criteria and scoring, and for selecting data integration and information management solutions.
- Captured and documented the volumetric analysis of CDC module with Informatica.
- Generated huge records of data for volumetric testing.
- Collaborated with infrastructure and security architects to integrate enterprise information architecture into overall enterprise architecture.
Environment: CDH4, Cassandra, Hector API, HDFS, MapReduce, Pig, Hive, Informatica, Shell scripting
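As a hedged illustration of the Hector-based Cassandra access mentioned above (written here in Scala against the Java Hector client), the snippet below inserts a single column into a column family; the cluster name, host, keyspace, column family and values are placeholder assumptions.

    import me.prettyprint.cassandra.serializers.StringSerializer
    import me.prettyprint.hector.api.factory.HFactory

    object HectorWriteSketch {
      def main(args: Array[String]): Unit = {
        // Connection details and names below are placeholders
        val cluster  = HFactory.getOrCreateCluster("archive-cluster", "cassandra-host:9160")
        val keyspace = HFactory.createKeyspace("archive", cluster)

        // Insert one column for a row key into the "documents" column family
        val mutator = HFactory.createMutator(keyspace, StringSerializer.get())
        mutator.insert("row-key-1", "documents",
          HFactory.createStringColumn("title", "quarterly-report"))

        HFactory.shutdownCluster(cluster)
      }
    }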
Confidential
Business Rules & ABAP Consultant
Responsibilities:
- Identified business rule scenarios for various processes (new and existing)
- Gave Product Demos to the internal customers and external prospects to showcase product features and capabilities
- Used Java Persistence API (JPA) for connecting with Derby for Content Management Solutions.
- Developed server side component (SOCO) for the product called ‘Business Process Blueprinting’ that is available for the customers with SAP Solution Manager (release versions of 7.1 and above)
- Developed ABAP Units for testing server component using ABAP Unit Testing Framework.
- Significant contributor to the product features development for three releases of the product.
- Handled customer issues raised on the development component - “Business Process Blueprinting”.
- Developed BSP applications using HTML and JavaScript for product administration, allowing users to perform actions according to their assigned roles.
- Provided inputs to the Knowledge management team for product guides that would eventually be available for the customers from SAP Service Market Place (SAP SMP).
- Gave product demos at prominent technology events such as SAP TechEd and SAPPHIRE.
- Delivered hands-on product training and workshops to pilot users.