Big Data Spark Technical Analyst Resume
Bentonville, AR
SUMMARY:
- 10+ years of professional experience in IT, including 5 years on Big Data technologies: Spark (Scala), Hadoop, Cassandra, Pig, HBase, Hive and Impala.
- Extensive experience in building Spark applications using Scala and Java APIs
- Worked in Lead and Senior Developer roles on the design and implementation of Spark ETL applications
- Good experience working on big data platforms like Amazon AWS, Cloudera and Hortonworks
- Worked extensively with RDDs and DataFrames in Spark using SparkContext, and used Scala to read multiple data formats
- Worked on Spark SQL and DataFrames for faster execution of Hive queries using Spark SQLContext
- Performed ETL with Spark and Scala for processing and validation of raw data logs (a representative sketch follows this summary).
- Performed data processing in Spark by handling multiple data repositories / data sources.
- Hands-on with JSON object serialization/deserialization using JSON Serializer in Java and the Play libraries in Scala
- Good working knowledge of PySpark.
- Experience working on XSL Transformations (XSLT) with XML data for uniform consumption by multiple applications
- In-depth knowledge of Hadoop Architecture and Hadoop Daemons.
- Experience in developing applications using Java and its standard libraries.
- Experience in using the Java Persistence API (JPA) for mapping between the RDBMS and Java objects in the application.
- Experience working with JSON and XML formats as payloads for web service data exchange
- Experience in writing MapReduce programs using the Apache Hadoop framework to analyze large volumes of data.
- Hands-on experience moving data from HDFS to Hive to analyze data using the Hive query language (HiveQL).
- Well versed with Pig Latin to analyze large data sets.
- Experience in integrating Hive and Sqoop with HBase and analyzing data in HBase.
- Microsoft Certified Office Master (Excel, Access, Word and PowerPoint) - very good at using statistical tools in Excel for data analysis.
- Knowledge of Kafka Distributed Messaging System
- Worked on extending Hive and Pig core functionality by writing custom functions such as UDFs, UDAFs and UDTFs.
- Knowledge of the architecture and functionality of NoSQL databases like HBase, Cassandra and MongoDB.
- Extensive experience with SQL, PL/SQL and database concepts.
- Experience in Web Services using XML, HTML and SOAP.
- Very good knowledge of XSLT.
- Handled and executed projects the Scrum way (as Scrum Master), with a detailed product backlog and active involvement in core product feature development.
- Experience in documenting Use Cases, Sequence diagrams and Class Diagrams in UML with Rational Rose, Visio
- Experience working with Git and Jenkins for Continuous Integration.
- 6+ Years of experience with SAP on development and Business Rules Consultant roles.
- Was part of the product development team at Confidential, creating the server-side component of the blueprinting product.
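The Spark work summarized above generally followed a read-validate-write flow. Below is a minimal, illustrative sketch in the Spark 1.x style referenced in these bullets (SparkContext/SQLContext); the paths, column names and validation rule are placeholder assumptions, not taken from any specific engagement.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.functions._

    object RawLogEtl {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("RawLogEtl"))
        val sqlContext = new SQLContext(sc)

        // Read raw JSON logs into a DataFrame (input path is a placeholder)
        val rawLogs = sqlContext.read.json("hdfs:///data/raw/logs/")

        // Basic validation: drop records missing mandatory fields, derive a date column
        val validLogs = rawLogs
          .filter(col("eventId").isNotNull && col("eventTime").isNotNull)
          .withColumn("eventDate", to_date(col("eventTime")))

        // Write the cleansed data partitioned by date (output path is a placeholder)
        validLogs.write
          .mode("overwrite")
          .partitionBy("eventDate")
          .parquet("hdfs:///data/curated/logs/")

        sc.stop()
      }
    }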
TECHNICAL SKILLS:
Big Data Eco System: Spark, Hadoop 2.0 Map Reduce, HDFS, Pig, Hive, HBase, Impala, Sqoop, YARN, Cassandra, Apache Nifi, Streamsets
Programming Languages: Scala, Java, SAP ABAP, C++, VBA.
Scripting Languages: JavaScript, HTML, XML, XSLT, Linux shell script
Relational Databases: Oracle 11g/10g, Max DB, MS-SQL, MS-Access
NoSQL Databases: Cassandra, HBase, MongoDB
Modelling Languages: UML
Tools/Editors: Eclipse, Rational Rose, WinSCP, Microsoft Office, Visio, PuTTY, SPSS
Operating Systems: Windows Environments, Linux
PROFESSIONAL EXPERIENCE:
Confidential, Bentonville, AR
Big Data Spark Technical Analyst
Responsibilities:
- Built Spark applications for data governance of the data lake
- Developed scripts for running the framework with different REST API calls
- Scheduled data governance jobs on Automic (Confidential's workflow scheduler)
- Used JIRA for tasks and bug tracking.
- Wrote SQL queries for specific metric calculations and executed them from shell scripts for automation (see the sketch at the end of this role)
- Used Java APIs and libraries in the Spark applications.
- Identified and estimated tasks from design through implementation of the framework.
- Developed and integrated code standards for the security of applications running on the organization's cluster
- Identified multiple scenarios and use cases for the data governance framework across business domains
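As an illustration of the metric-style checks mentioned above (not the actual framework code), the sketch below runs a hypothetical data quality query from a Spark/Scala application and exits non-zero when a threshold is breached, so a wrapping shell script or scheduler can raise an alert; the table, column and threshold are assumptions.

    import org.apache.spark.sql.SparkSession

    object NullRateCheck {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("NullRateCheck")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical governance metric: fraction of NULL customer_ids in a Hive table
        val nullRate = spark.sql(
          """SELECT SUM(CASE WHEN customer_id IS NULL THEN 1 ELSE 0 END) / COUNT(*) AS null_rate
            |FROM sales.transactions""".stripMargin)
          .first()
          .getDouble(0)

        // Non-zero exit signals the calling shell script / scheduler to alert
        if (nullRate > 0.01) {
          System.err.println(s"Governance check failed: null_rate = $nullRate")
          sys.exit(1)
        }
        spark.stop()
      }
    }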
Confidential, O'Fallon, MO
Senior Big Data Spark Developer
Responsibilities:
- Built NiFi workflows for process orchestration
- Led and developed Spark applications for scoring and capping of transactions
- Delivered code in adherence to security and compliance requirements
- Participated in requirements and task planning for development iterations
- Evaluated movement of NiFi configurations across development, staging and production environments
- Developed Sqoop scripts for data ingestion from Postgres into HDFS and Hive tables
- Planned and estimated tasks for each user story identified for a feature
- Performed code reviews with peers to ensure code quality
- Handled failures and exceptions for process workflows designed in NiFi
- Created alert mechanisms for the data workflows
- Loaded data into Cassandra and Postgres after processing (see the sketch below).
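A hedged sketch of the final load step described above, writing a Spark DataFrame to Cassandra (via the DataStax Spark Cassandra Connector) and to Postgres over JDBC; the keyspace, table names and connection details are placeholders, and the connector and the Postgres JDBC driver are assumed to be on the classpath.

    import org.apache.spark.sql.{DataFrame, SaveMode}

    object ResultPersistence {
      def persistResults(scored: DataFrame): Unit = {
        // Write scored transactions to Cassandra (placeholder keyspace/table)
        scored.write
          .format("org.apache.spark.sql.cassandra")
          .options(Map("keyspace" -> "payments", "table" -> "scored_transactions"))
          .mode(SaveMode.Append)
          .save()

        // Write the same results to Postgres over JDBC (placeholder connection details)
        scored.write
          .format("jdbc")
          .option("url", "jdbc:postgresql://db-host:5432/payments")
          .option("dbtable", "scored_transactions")
          .option("user", sys.env("DB_USER"))
          .option("password", sys.env("DB_PASSWORD"))
          .mode(SaveMode.Append)
          .save()
      }
    }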
Confidential, NJ
Spark-Scala Lead
Responsibilities:
- Designed and built a custom, generic ETL framework as a Spark application using Scala for data loading and transformations.
- Managed a team of 14 from two offshore locations and an onsite location.
- Involved in the data modelling of the new system for Cassandra from the existing legacy Oracle DB.
- Handled data transformations as per the business and mapping rules.
- Executed complex data aggregations on the calls and sales data for the BI dashboards.
- Involved in the configuration of Spark jobs through Amazon Data Pipeline for weekly, monthly and ad hoc executions.
- Created custom logger to handle huge application log data.
- Created an error reprocessing framework that handles flagged errors during the subsequent loads.
- Used Zeppelin and Beeline for querying Cassandra tables
- Executed queries using Spark SQL for complex joins and data validation.
- Wrote Scala UDFs for handling complex transformation logic (see the sketch at the end of this role).
- Involved in the design of partition and clustering keys as per the data volume and query patterns on Cassandra tables.
- Analyzed the legacy data model and created a Cassandra data model for data loads from heterogeneous systems.
- Created modular and independent components for Amazon AWS S3 connections, data reads and data stores.
- Designed a custom referential integrity framework on the NoSQL Cassandra tables for maintaining data integrity and relations in the data.
- Wrote Scala scripts for extracts from Cassandra Operational Data Store tables for comparison with legacy system data.
- Created the data ingestion file validation component for checksum, last modified and threshold levels.
Environment: Spark 1.6.0, Cassandra, Scala IDE, Amazon AWS, DBeaver, Zeppelin, Beeline, Amazon workspace, S3 Browser, Amazon Datapipeline, Git, JIRA, Mobax client, Shell scripting
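A minimal sketch of a Scala UDF of the kind referred to above, usable both as a DataFrame column function and registered directly in Spark SQL (shown against the Spark 1.6 SQLContext listed in this role's environment); the normalization logic and column names are illustrative assumptions only.

    import org.apache.spark.sql.{DataFrame, SQLContext}
    import org.apache.spark.sql.functions.{col, udf}

    object TransformUdfs {
      // Hypothetical transformation: normalize phone numbers to digits only
      val normalizePhone = udf { raw: String =>
        Option(raw).map(_.replaceAll("[^0-9]", "")).orNull
      }

      // Column-based usage on a DataFrame
      def cleanCalls(calls: DataFrame): DataFrame =
        calls.withColumn("phone", normalizePhone(col("phone")))

      // Registration for use inside Spark SQL statements
      def registerUdfs(sqlContext: SQLContext): Unit =
        sqlContext.udf.register("normalize_phone",
          (raw: String) => Option(raw).map(_.replaceAll("[^0-9]", "")).orNull)
    }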
Confidential, Dayton, OH
Spark-Scala Developer
Responsibilities:
- Developed various POCs for the client and analyzed various Hadoop technologies.
- Created spark applications using Scala for file validations, data processing and transformations.
- Pulled data from Veeva to Hadoop cluster using CData driver
- Wrote Pig UDFs, HiveQL queries, Hive UDFs and Spark SQL queries.
- Created a series of Spark jobs and processes that used YARN as the Spark resource manager
- Handled data processing from multiple data sources and repositories using Spark
- Handled batch processing data in Spark using Scala
- Configured Log4j in Spark for custom logging in Spark Applications
- Responsible for creating domain and staging data models.
- Created Hive tables, loaded the data and analyzed it using Hive queries (see the sketch at the end of this role).
- Wrote custom MapReduce programs.
- Responsible for creating HBase tables and loading aggregated data into them using Pig.
- Developed Pig UDFs to customize various functions and make them reusable.
- Responsible for scheduling workflows for daily delta loads.
- Developed shell scripts for integrating components such as Hive queries, MapReduce jobs and Pig scripts.
- Guided the team in their day-to-day activities and helped them meet deadlines.
- Collaborated with infrastructure and security architects to integrate enterprise information architecture into the overall enterprise architecture
- Used Git as the version control tool to maintain the code repository.
- Provided documentation and trained the teams; built effective cross-team communication to ensure accuracy, consistency, problem solving, conflict resolution and on-time project completion.
- Communicated status to senior management, discussed strategic plans, developed road maps and identified critical success factors.
Environment: CDH5, Hadoop ecosystem, Hive, Sqoop, SolrCloud, Impala, Teradata Connector, Spark SQL, HBase.
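A brief, hedged sketch in Scala of creating, loading and querying a Hive table from Spark, along the lines of the Hive work listed above; the database, table, columns, paths and dates are placeholders.

    import org.apache.spark.sql.hive.HiveContext

    object DailySummary {
      def buildDailySummary(hiveContext: HiveContext): Unit = {
        // Create a partitioned Hive table for staged events (names are placeholders)
        hiveContext.sql(
          """CREATE TABLE IF NOT EXISTS staging.events (
            |  event_id STRING,
            |  account_id STRING,
            |  amount DOUBLE
            |) PARTITIONED BY (load_date STRING)
            |STORED AS PARQUET""".stripMargin)

        // Load a daily delta from HDFS into the matching partition
        hiveContext.sql(
          """LOAD DATA INPATH '/data/incoming/events/2016-01-01'
            |INTO TABLE staging.events PARTITION (load_date = '2016-01-01')""".stripMargin)

        // Simple analysis query over the loaded partition
        hiveContext.sql(
          """SELECT account_id, COUNT(*) AS events, SUM(amount) AS total
            |FROM staging.events
            |WHERE load_date = '2016-01-01'
            |GROUP BY account_id""".stripMargin).show()
      }
    }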
Confidential, PA
Hadoop Developer/ Engineer
Responsibilities:
- Gave extensive presentations about the Hadoop ecosystem, best practices, data architecture in Hadoop.
- Designed the ETL process from various sources into Hadoop/HDFS for analysis and further processing.
- Provided review and feedback for the existing physical architecture, data architecture, analysis, designs and code. Designed the next-generation architecture for unstructured data.
- Debugged and solved issues as the subject matter expert, focusing on issues around data science and processing.
- Wrote Pig Latin scripts and Pig UDFs and optimized the code.
- Worked on Data archival model on Hadoop framework.
- Wrote Hector API code for Cassandra (see the sketch at the end of this role)
- Developed an information strategy in alignment with the overall agency strategy for master data management, data integration, data virtualization, metadata management, data quality and profiling, data modeling and data governance.
- Created Hive tables, loaded the data and analyzed it using Hive queries.
- Worked on a Hive ranking algorithm to classify patterns.
- Defined business and technical requirements and designed a proof of concept for evaluating AFMS agencies' data evaluation criteria and scoring, and for selecting data integration and information management solutions.
- Captured and documented the volumetric analysis of CDC module with Informatica.
- Generated huge records of data for volumetric testing.
- Collaborated with infrastructure and security architects to integrate enterprise information architecture into overall enterprise architecture.
Environment: CDH4, Cassandra, Hector API, HDFS, MapReduce, Pig, Hive, Informatica, Shell scripting
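As a hedged illustration of the Hector-based Cassandra access mentioned above (written here in Scala against the Java Hector client), the snippet below inserts a single column into a column family; the cluster name, host, keyspace, column family and values are placeholder assumptions.

    import me.prettyprint.cassandra.serializers.StringSerializer
    import me.prettyprint.hector.api.factory.HFactory

    object HectorWriteSketch {
      def main(args: Array[String]): Unit = {
        // Connection details and names below are placeholders
        val cluster  = HFactory.getOrCreateCluster("archive-cluster", "cassandra-host:9160")
        val keyspace = HFactory.createKeyspace("archive", cluster)

        // Insert one column for a row key into the "documents" column family
        val mutator = HFactory.createMutator(keyspace, StringSerializer.get())
        mutator.insert("row-key-1", "documents",
          HFactory.createStringColumn("title", "quarterly-report"))

        HFactory.shutdownCluster(cluster)
      }
    }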
Confidential
Business Rules & ABAP Consultant
Responsibilities:
- Identified business rule scenarios for various processes (new and existing)
- Gave Product Demos to the internal customers and external prospects to showcase product features and capabilities
- Used Java Persistence API (JPA) for connecting with Derby for Content Management Solutions.
- Developed server side component (SOCO) for the product called ‘Business Process Blueprinting’ that is available for the customers with SAP Solution Manager (release versions of 7.1 and above)
- Developed ABAP Units for testing server component using ABAP Unit Testing Framework.
- Significant contributor to the product features development for three releases of the product.
- Handled customer issues raised on the development component - “Business Process Blueprinting”.
- Developed BSP applications using HTML and JavaScript for product administration, allowing users to perform actions according to their assigned roles.
- Provided inputs to the Knowledge management team for product guides that would eventually be available for the customers from SAP Service Market Place (SAP SMP).
- Gave product demos at prominent technology events such as SAP TechEd and SAPPHIRE.
- Delivered hands-on product training and workshops to pilot users.