Spark & Hadoop Developer Resume
Overland Park, KS
SUMMARY:
- Certified Big Data Professional with 10+ years of experience in software development with deep business acumen and technical expertise in Big Data technologies.
- Hands-on experience working with Big Data technologies such as HDFS, MapReduce, Pig, HBase, Hive, Oozie, Sqoop, Flume, Scala, Spark, Storm, Kafka, and AWS.
- Worked on all phases of the data warehouse development life cycle, including ETL design, implementation, and support of new and existing applications.
- Extensive experience in Scala, Spark Core, Spark SQL, and Spark Streaming.
- Hands on experience in installing, configuring, and using Hadoop ecosystem components on AWS.
- Experience managing scalable Hadoop clusters, including cluster design, provisioning, custom configuration, monitoring, and maintenance, using different Hadoop distributions: Cloudera CDH, Hortonworks, and Apache Hadoop.
- Well versed in data transformation using custom MapReduce, Hive, and Spark scripts for different file formats: text, SequenceFile, Avro, ORC, and Parquet.
- Extended Hive and Pig core functionality using custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs), and User Defined Aggregate Functions (UDAFs) in Java.
- Good understanding of NoSQL databases and hands-on experience writing applications on NoSQL databases like HBase.
- Experience with the Oozie workflow scheduler to manage Hadoop jobs as a Directed Acyclic Graph (DAG) of actions with control flows.
- Experience building stream-processing systems with Kafka.
- Experienced in analyzing business requirements and translating requirements into functional and technical design specifications.
- Created sample data for the purpose of testing the developed mappings and scripts.
- Implemented proofs of concept on migrating data from different databases (i.e. Teradata, Oracle, and MySQL) to Hadoop.
- Experience working with different and complex datasets, such as flat files, JSON, XML files, and databases, in combination with big data technologies.
- Excellent problem-solving, analytical, and interpersonal skills.
- Involved in data modeling.
- Delivered Hadoop migration strategy, roadmap, and technology fitment.
- Effectively participated in defining data storage, data access, and data analytics patterns.
- Involved in application architecture designs with stakeholders.
- Developed Python and Scala scripts using both DataFrames/SQL and RDDs/MapReduce in Spark 1.x/2.x for data aggregation and queries, writing data back into the OLTP system through Sqoop (a representative sketch follows this summary).
- Designed and implemented HBase tables, Hive UDFs, and scripts, and Sqooped the data with complete ownership.
- Developed a data pipeline using Kafka, Spark, and Storm to store data in HDFS.
- Designed complex workflows using Oozie.
- Automated many cross-technology tasks using shell scripting and crontab scheduling.
- Worked collaboratively with different teams to move the project smoothly to production.
- Worked with different compression codecs such as LZO, Gzip, and Snappy.
- Brief experience working with Hadoop clusters on AWS.
- Designed a voice-activated, hands-free reporting solution using NLP assistants such as Amazon Alexa and Google Assistant.
- Worked with different clients for multiple Big Data projects across domains.
- Provided consulting to customers in identifying Big Data use cases.
- Provided Big Data strategic planning and technology roadmaps to clients across Big Data technologies such as Hadoop (HDFS and MapReduce), Pig, Hive, HBase, Sqoop, Flume, Storm, Kafka, Tableau, social graph, and Big Data analytics.
- Provided consulting services to clients in Hadoop migration strategy, roadmap and technology fitment.
- Participated in engineering advanced big data technologies, viz. Spark and Cassandra, for designing an online trading platform.
- Designed, validated, and assessed a strategy for data movement from Kafka to Hadoop using Storm and Hadoop ecosystem technologies to provide an analytics solution.
- Recommended, validated, and assessed Hadoop infrastructure and data center planning, considering data growth.
- Worked with pre-sales and delivery teams in designing strategic engagements with customers around Big Data technologies.
- Worked on the independent channel of a US-based insurance major to fix issues with past and current data and provide a predictive analytics solution, using Teradata as the major technology.
- Worked on an educators' console, which provides services to 200+ trainers for about 15,000+ trainees and their schedules, using Java and MySQL as the major technologies.
- Worked on an employee trainee portal, accessible by 120,000 employees, using a Java framework as the major technology.
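A representative sketch of the Spark DataFrame/RDD aggregation pattern referenced above. This is a minimal illustration, not the original code; the paths, table, and column names (orders, customer_id, amount) are hypothetical, and the Sqoop export to the OLTP system runs separately against the written output.

// Minimal sketch: aggregate with the DataFrame API, show the equivalent RDD path,
// and write results to HDFS for a downstream Sqoop export.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

object OrderAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("OrderAggregation").getOrCreate()

    // DataFrame/SQL path: total order amount per customer
    val orders = spark.read.parquet("/data/warehouse/orders")
    val totals = orders.groupBy("customer_id").agg(sum("amount").as("total_amount"))

    // Equivalent RDD path, shown only for comparison with the DataFrame version
    val totalsRdd = orders.select("customer_id", "amount").rdd
      .map(r => (r.getString(0), r.getDouble(1)))
      .reduceByKey(_ + _)

    // Results land in HDFS; a separate Sqoop export pushes them to the OLTP system
    totals.write.mode("overwrite").parquet("/data/export/customer_totals")
    spark.stop()
  }
}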
TECHNICAL SKILLS:
Big Data Technologies: HDFS, MapReduce, Pig, Hive, Oozie, Flume, Sqoop, Spark, Storm, Kafka
NOSQL Technologies: HBase, Cassandra
Programming: Java 8, Unix, Shell Scripting, Scala
Databases: SQL Server, Oracle, Teradata.
Operating Systems: Windows, Unix, Linux
Hadoop Distributions: Cloudera, Hortonworks
EXPERIENCE:
Confidential, Overland Park, KS
Spark & Hadoop Developer
Responsibilities:
- Analyzed the existing system processes.
- Identified business-critical measures by working closely with the SMEs.
- Streamlined the migration process from the existing system to Big Data architecture.
- Worked on ETL using Spark, Storm, Kafka, Hive, HBase, Oozie on Hadoop.
- Involved in the PoCs using Spark SQL with Python.
- Developed efficient Spark scripts to handle data at granularities ranging from 15-minute intervals up to monthly aggregates.
- Involved in tuning Spark jobs using Spark configurations and RDDs.
- Played a significant role in upgrading the system to Spark 2.0 with DataFrames and optimizing jobs to make the best use of the Tungsten engine.
- Wrote complex UDFs to handle various processing requirements.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Created Spark RDDs for data-centric task processing using Scala.
- Experienced in managing and reviewing Spark log files for troubleshooting and debugging.
- Designed, validated, and assessed a strategy for data movement from Kafka to Hadoop using Storm and Hadoop ecosystem technologies to provide an analytics solution.
- Designed and developed both managed and external tables using Spark SQL (see the sketch after this list).
- Set up cluster environments in AWS for PoCs and R&D purposes.
- Created HBase tables to store audit data.
- Worked with requirements team to calculate the complex KPIs.
- Designed various dimension tables using HBase and written scripts to automate the data loading to dimension tables.
- Designed workflows & coordinators for the task management and scheduling using Oozie.
- Wrote cron jobs to perform checks on the file system and data.
- Used Hive to pull data from the Hadoop system.
- Analyzed table index and partition selections for data storage and access.
- Worked closely on increasing system performance, reducing I/O by identifying process gaps and tuning queries.
- Developed automation scripts in UNIX shell and Python that incorporate the business processes for data processing.
- Developed scripts to back up current data sets from business-critical Teradata tables and move them to Hadoop tables.
- The data backup process was fully automated using UNIX shell scripts.
- Responsible for performance tuning.
- Developed a voice-activated, hands-free solution using NLP assistants in Java and JSON, and worked with various AWS services.
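As referenced in the Spark SQL item above, a minimal sketch of creating one managed and one external table. The database, table, column, and path names are hypothetical, and a Hive-enabled SparkSession is assumed.

import org.apache.spark.sql.SparkSession

object TableSetup {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TableSetup")
      .enableHiveSupport()   // back Spark SQL DDL with the Hive metastore
      .getOrCreate()

    // Managed table: Hive/Spark owns both the metadata and the data files
    spark.sql(
      """CREATE TABLE IF NOT EXISTS audit.kpi_daily (
        |  kpi_name STRING, kpi_value DOUBLE, load_date DATE)
        |STORED AS ORC""".stripMargin)

    // External table: metadata only; the data stays at the external HDFS location
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS audit.raw_events (
        |  event_id STRING, payload STRING)
        |STORED AS PARQUET
        |LOCATION '/data/raw/events'""".stripMargin)

    spark.stop()
  }
}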
Environment: Spark, Scala, Java, Kafka, Hive, Pig, Sqoop, Shell Script, Oozie, HBase, HDP, AWS.
Confidential, NY
Spark & Hadoop Developer
Responsibilities:
- Analyzed the existing system processes.
- Prepared design documents for the ACE process.
- Coded the ACE and zombification processes using Scala and Spark.
- Coded the 7-day reprocessing logic.
- Led quality assurance reviews, code inspections, and walkthroughs of the offshore developers' code.
- Acted as the technical interface between the development team and external groups.
- Prepared validation scripts to check the source data against the ACE enrichment.
- Developed common DataFrame utilities to save data as ORC (see the sketch after this list).
- Loaded and processed station and program data from Tribune Media stations in ACE.
- Created and configured jobs in the Tidal scheduler.
- Worked on a 70-node physical cluster running Hadoop 2.x.
- Used Sqoop to import data from Oracle into Hive.
- Used Hue to connect to the Hadoop cluster.
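A minimal sketch of the kind of shared DataFrame-to-ORC utility mentioned above; the object, method, and parameter names here are hypothetical.

import org.apache.spark.sql.{DataFrame, SaveMode}

object OrcUtils {
  // Writes a DataFrame as ORC, optionally partitioned, overwriting any existing output.
  def saveAsOrc(df: DataFrame, path: String, partitionCols: Seq[String] = Nil): Unit = {
    val writer = df.write.mode(SaveMode.Overwrite).format("orc")
    val partitioned = if (partitionCols.nonEmpty) writer.partitionBy(partitionCols: _*) else writer
    partitioned.save(path)
  }
}

A caller would then invoke something like saveAsOrc(enrichedDf, "/data/ace/stations", Seq("load_date")), with the DataFrame and paths being whatever the job at hand produces.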
Environment: Spark, Scala, Java, Sqoop, Kafka, Hive, Pig, Shell Script, Oozie, HBase, HDP, SQL Server
Confidential
Big Data Developer
Responsibilities:
- The client is a gaming organization with a customer base across the globe. Its business is spread across various platforms, which deliver their data in JSON format. The requirement was to receive the data generated by end users and process it so that it could be used by the organization's managers to make effective decisions, using big data technologies viz. Pig, Hive, Sqoop, Storm, and Kafka.
- Responsible for building scalable distributed data solutions using Hadoop
- Participated in designing Hadoop Migration Strategy.
- Worked on stream data processing to HDFS using Kafka & Storm.
- Developed data pipelines using Flume, Sqoop, Pig, and MapReduce to ingest data from various sources into HDFS for analysis.
- Used Pig as an ETL tool to perform transformations, event joins, bot-traffic filtering, and pre-aggregations before storing the data in HDFS (see the sketch after this list).
- Developed job flows in Oozie to automate the workflow for extraction of data from warehouses and weblogs.
- Optimized MapReduce code and Pig scripts; performed user interface analysis, performance tuning, and analysis.
- Developed Pig Latin scripts to extract and filter relevant data from the web server output files to load into HDFS.
- Handled importing of data from various data sources and performed transformations using Hive.
- Created HBase tables to store various data formats of PII data coming from different profiles.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Developed shell and Python scripts to automate and provide control flow for Pig scripts.
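The Pig flow above joined events, filtered bot traffic, and pre-aggregated data before landing it in HDFS. Since Scala is this resume's primary language, an equivalent of that flow is sketched below in Spark/Scala purely for illustration; the weblog schema (user_id, page, user_agent) and paths are hypothetical, and the actual project used Pig.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, count, lower}

object WeblogPreAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("WeblogPreAggregation").getOrCreate()

    val weblogs = spark.read.json("/data/raw/weblogs")        // raw events in JSON
    val profiles = spark.read.parquet("/data/raw/profiles")   // user profile data

    // Filter out bot traffic based on the user agent string
    val humanTraffic = weblogs.filter(!lower(col("user_agent")).contains("bot"))

    // Join events with profiles, then pre-aggregate page views per user
    val pageViews = humanTraffic
      .join(profiles, Seq("user_id"))
      .groupBy("user_id", "page")
      .agg(count("*").as("views"))

    pageViews.write.mode("overwrite").parquet("/data/curated/page_views")
    spark.stop()
  }
}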
Environment: Storm, Kafka, Hive, Pig, Sqoop, Java, Shell Script, Oozie, HBase, Cloudera
Confidential
Big Data Developer
Responsibilities:
- Created DAOs (Data Access Object design pattern) and performed data modeling in Hive.
- Wrote common UDFs and used the Phoenix client to connect to HBase from Hive, processing the data in the required format and sending it downstream.
- Created Pig and Hive scripts that implemented the business logic.
- Created UNIX scripts to copy feeds from the local file system to HDFS through the Oozie scheduler (see the sketch after this list).
- Created and configured coordinators, workflows, and bundles in Oozie.
- Worked on an 8-node cluster in AWS for the dev environment.
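The feed copy above was implemented as a UNIX script scheduled by Oozie; a minimal equivalent using the Hadoop FileSystem API is sketched here for illustration, with hypothetical local and HDFS paths.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object FeedCopy {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()          // picks up core-site.xml / hdfs-site.xml from the classpath
    val fs = FileSystem.get(conf)

    val localFeed = new Path("file:///data/incoming/feed.csv")
    val hdfsTarget = new Path("/data/landing/feed.csv")

    // delSrc = false (keep the local copy), overwrite = true (replace any existing target)
    fs.copyFromLocalFile(false, true, localFeed, hdfsTarget)
    fs.close()
  }
}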
Environment: Hadoop 1.x, HDFS, MapReduce, Java, Pig, Hive, Cassandra, Oracle, Cloudera, Sqoop, AWS.
Confidential, Columbus, OH
BI Developer
Responsibilities:
- Understanding the requirement specifications.
- Preparation of dashboards using Java.
- Extracting data from Oracle and flat files and loading it into the data warehouse by developing FLOAD scripts.
- Making effective use of transformations like SQL, EXP, Filter, AGG, LKP, UPD, Joiner, SG, Sorter, and Router.
- Resolving issues in transformations and mappings.
- Creating sessions and workflows.
- Testing the mappings and workflows and loading the data.
- Populating fact/dimension tables for ad hoc queries and summary tables.
- Preparation of Unit Test Plan, Unit Testing and Unit Test Reports.
- Fine-tuned complex queries and views.
Environment: Java 2.0, Hibernate, EJB, Teradata, FLOAD, Oracle, UNIX, AutoSys
Confidential, Boston, MA
Sr. Software Engineer
Responsibilities:
- Coding and implementation.
- Used EJB (stateless session beans) as business objects.
- Worked on the existing product Stockholm to fix change requests (CRs).
- Participated in design proposal reviews; worked on the design proposal (DP) for London and fixed the change requests (CRs).
- Prepared impact analysis and UTPs for CRs.
- Performed Installation and Configuration.
Environment: Java 2.0, JSP1.2, MS SQL server 2000/2005, JRUN 4, NFC Framework, Hibernate3.0, Tomcat 6.0, IIS6.0, MS Clarify, MS VSS6.0
Confidential
Sr. Software Engineer
Responsibilities:
- Designed front end screens for users using Java technologies.
- Implemented functionality to connect the front end with Oracle tables & stored procedures.
- Developed stored procedures to implement business logic.
- Met coding standards and policies during development.
- Fine-tuned complex queries and views.
- Analyzed table and index selections for data and access demographics.
- Checked the skew factor across the dev and prod schemas to ensure better data distribution.
- Responsible for performance tuning.
- Involved in root cause analysis on production issues.
- Prepared HLDs and DLDs.
- Involved in peer reviews.
Environment: Java, J2EE, JSP, WebLogic, Oracle