Big Data Spark Developer Resume
Lincolnshire, Illinois
SUMMARY:
- Dynamic, result-oriented professional with over 9.5 years of IT experience in business requirements analysis, application design, coding, and implementation of business applications with RDBMS, with strong technical and functional domain knowledge in Retail.
- Around 2 years' experience in Spark programming using the Python API (PySpark) and Python modules.
- 3.5 years' experience with the Hadoop ecosystem and its components HDFS, Sqoop, Hive, and Pig.
- Proficient in Spark for loading data from the local file system, HDFS, and relational databases; importing data into RDDs using Spark SQL; and ingesting data from a range of sources using Spark Streaming with JSON output, loading it to HDFS or text files.
- Good hands-on knowledge of Spark Streaming and Kafka.
- Excellent knowledge of ETL methodologies, Business Intelligence, Data Warehouse concepts.
- Extensively worked on huge Oracle databases of up to 90 TB on EXADATA.
- Designed and developed Oracle PL/SQL packages, procedures, and functions, and database objects such as tables, views, indexes, sequences, synonyms, and materialized views.
- Hands-on knowledge of advanced Oracle concepts such as partitions, sub-partitions, hints, indexing, and gathering statistics on large tables to improve performance.
- Expertise in writing high-quality SQL and UNIX shell scripting code based on business requirements, with solid knowledge of the vi editor and the cron job scheduler.
- Worked with SQL Server and Visual Studio 2008 R2 and 2012; designed SSIS packages to process data from SQL Server.
- Strong problem-solving, troubleshooting, and analytical skills; created multiple automation tools to stabilize systems.
- Experience in working with cross-functional teams. Team player and self-starter.
- Capacity planning and benchmarking, monitoring database growth, resource usage, etc.
- Able to handle multiple tasks and responsibilities independently, and to lead a small team (3-4 members).
TECHNOLOGY SKILLS:
Programming/Languages: PySpark, Spark SQL, Spark Streaming, Hive, Sqoop, Python, Kafka, Pig, SQL, PL/SQL, SSIS, shell scripting
Products/Packages: ETL, Business Intelligence, Data Warehouse
Database: HDFS, Oracle, EXADATA, SQL Server
Tools: WinSCP, Toad, PuTTY, SQL Developer, Hue, ServiceNow
PROFESSIONAL EXPERIENCE:
Confidential, Lincolnshire, Illinois
Big Data Spark Developer
Technology: PySpark, Spark SQL, Hive, HDFS (Hadoop), Oracle 12c, Python, Sqoop, Linux
Responsibilities:
- Sqooped data from Oracle to HDFS.
- Created user profiles using PySpark and Spark SQL.
- Used DataFrames and Hive queries to retrieve data.
- Created Confidential to calculate clicks versus orders on email campaigns.
- Once processed, loaded the data back to Oracle for report generation (a minimal sketch of this pipeline follows this list).
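A minimal PySpark sketch of such a pipeline, assuming hypothetical Hive table names (campaign.email_clicks, campaign.email_orders) and placeholder Oracle connection details:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("email-campaign-metrics")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical Hive tables landed in HDFS by Sqoop from Oracle
clicks = spark.table("campaign.email_clicks")
orders = spark.table("campaign.email_orders")

# Clicks vs. orders per campaign
metrics = (clicks.groupBy("campaign_id")
                 .agg(F.count("*").alias("clicks"))
                 .join(orders.groupBy("campaign_id")
                             .agg(F.count("*").alias("orders")),
                       on="campaign_id", how="left"))

# Load the processed data back to Oracle for report generation
# (URL, table, and credentials below are placeholders)
(metrics.write.format("jdbc")
        .option("url", "jdbc:oracle:thin:@//db-host:1521/ORCL")
        .option("dbtable", "RPT.CAMPAIGN_CLICK_ORDER")
        .option("user", "rpt_user")
        .option("password", "********")
        .option("driver", "oracle.jdbc.OracleDriver")
        .mode("overwrite")
        .save())
```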
Confidential
Project Lead
Technology: Pig, Hive, HDFS (Hadoop), Oracle 12c, Sqoop, Linux
Responsibilities:
- Due to the huge volume of data, the legacy process took 6-7 hours and was very expensive to run in Oracle.
- The new process does not even take an hour to complete.
- Created a daily process to Sqoop differential data from Oracle, with Pig and Hive scripts to calculate Confidential (see the sketch after this list).
- This enabled CR reps to target customers who had not placed an order in a while or were falling behind on orders.
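The Pig and Hive scripts themselves are project-specific; as an illustration only, the daily differential pull could be expressed in PySpark as below, where the last_update_dt watermark column, table names, and connection details are all assumptions:

```python
from datetime import date, timedelta
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("daily-differential-pull")
         .enableHiveSupport()
         .getOrCreate())

# Pull only rows changed since yesterday (assumes a last_update_dt audit column)
since = (date.today() - timedelta(days=1)).isoformat()
orders_delta = (spark.read.format("jdbc")
                .option("url", "jdbc:oracle:thin:@//db-host:1521/ORCL")  # placeholder
                .option("dbtable",
                        f"(SELECT * FROM sales.orders "
                        f"WHERE last_update_dt >= DATE '{since}') delta")
                .option("user", "etl_user")
                .option("password", "********")
                .option("driver", "oracle.jdbc.OracleDriver")
                .load())

# Append the daily slice to a date-partitioned Hive table for downstream aggregation
# (assumes an order_dt column exists on the source table)
(orders_delta.write.mode("append")
             .partitionBy("order_dt")
             .saveAsTable("stage.orders_delta"))
```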
Confidential
Project Lead
Technology: PySpark, Kafka, Hive, HDFS, Oracle 12c, Python, Sqoop, Linux
Responsibilities:
- Working on monitoring the ongoing load on the Confidential.
- Improving the search engine Confidential based on user activity.
- Determining the most-searched products on Confidential (a Kafka/Spark Streaming sketch follows this list).
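A minimal sketch of the Kafka-to-Spark piece, written with Structured Streaming for brevity; the search_events topic, JSON payload fields, and broker address are assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("search-activity-stream").getOrCreate()

# Hypothetical JSON schema of search events published to Kafka
schema = StructType([
    StructField("user_id", StringType()),
    StructField("search_term", StringType()),
    StructField("event_time", TimestampType()),
])

events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
          .option("subscribe", "search_events")               # placeholder topic
          .load()
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Rolling count of the most-searched terms per 15-minute window
top_searches = (events
                .withWatermark("event_time", "30 minutes")
                .groupBy(F.window("event_time", "15 minutes"), "search_term")
                .count())

query = (top_searches.writeStream
         .outputMode("update")
         .format("console")   # in practice this would land in Hive/HDFS
         .start())
query.awaitTermination()
```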
Confidential
Project Lead
Technology: PySpark, Spark SQL, Hive, HDFS (Hadoop), Oracle 12c, Linux
Responsibilities:
- The earlier process was rigid and time consuming.
- With Spark and Hive, we not only made the process run faster but also made it customizable, so the category can be changed and the job run on demand.
- Pulled differential data from Oracle for multiple tables (see the sketch after this list).
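A sketch of the customizable, on-demand flavor of such a job, assuming a hypothetical sales.product_sales Hive table and a category passed in as a command-line argument:

```python
import sys
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("category-sales-on-demand")
         .enableHiveSupport()
         .getOrCreate())

# Category is supplied at run time, so the same job can be re-run on demand
category = sys.argv[1] if len(sys.argv) > 1 else "ALL"

df = spark.table("sales.product_sales")  # hypothetical Hive table
if category != "ALL":
    df = df.where(df.category == category)

df.createOrReplaceTempView("sales_filtered")
summary = spark.sql("""
    SELECT category, product_id,
           SUM(quantity) AS units,
           SUM(amount)   AS revenue
    FROM sales_filtered
    GROUP BY category, product_id
""")

# Result table name is illustrative only
summary.write.mode("overwrite").saveAsTable(f"rpt.category_sales_{category.lower()}")
```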
Confidential
Project Lead
Technology: PySpark, Hive, HDFS (Hadoop), Oracle 12c, Linux
Responsibilities:
- Created Hive partitioned tables with dynamic partitioning, imported data using Sqoop from Oracle and SQL Server, and saved it with different compression types.
- Created DataFrames from RDDs using both reflection-based and programmatic schema inference (illustrated in the first sketch after this list).
- Developed Python Spark programs for processing HDFS files using RDDs, pair RDDs, Spark SQL, Spark Streaming, DataFrames, accumulators, and broadcast variables.
- Developed PySpark programs using various transformations and actions.
- Converted existing Pig and MapReduce jobs to Spark programs.
- Tuned performance of Pig, Hive, and Spark jobs using caching/persistence, partitioning, and best practices.
- Created customized Hadoop processes for business users, providing the flexibility to select and aggregate data as needed.
- Created Pig, Hive, and Python scripts for faster processing; many processes complete in less than 1 hour compared to 12 or more hours earlier in Oracle.
- Validated and loaded JSON data in Hadoop for processing.
- Analyzed, designed, and developed new processes and enhanced existing functionality per client requirements.
- Migrated the database from Oracle 10g to 11g and the server from UNIX to Linux; using automated scripting, completed the project in 7 months against an estimate of 1.5 years.
- Performance-tuned the ETL process by gathering statistics, re-indexing, multi-threading, etc.
- Created a standard package template with custom logging. Designed SSIS packages to accurately migrate and map data from SQL Server 2012 to Oracle 11g using SQL Server configuration.
- Replaced SQLLDR with SSIS to load data from SQL Server to the staging area, using a configuration table for validation and data quality across thousands of loads, saving significant time and resources.
- Created a process to monitor thousands of daily loads, ensure on-time completion, and alert on long-running jobs.
- Created shell scripts and UNIX functions using the vi editor; hands-on knowledge of utilities like find, grep, and sed.
- Created custom advanced SQL cleansing scripts to fix data discrepancies such as orphan records and duplicate data (see the second sketch after this list).
- Implemented data conversions, integration, testing, and validation using complex SQL, PL/SQL, and MERGE statements for BI reporting.
- Used cursors, ref cursors, user-defined exceptions, Oracle built-in exceptions, etc. in day-to-day development work.
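First sketch: programmatic and reflection-based schema inference over an RDD, followed by a write into a dynamically partitioned, compressed table. File paths, column names, and table names are assumptions, not the actual project objects:

```python
from pyspark.sql import SparkSession, Row
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = (SparkSession.builder
         .appName("schema-inference-demo")
         .enableHiveSupport()
         .getOrCreate())
sc = spark.sparkContext

# Raw HDFS file parsed into an RDD of fields (path and layout are hypothetical)
raw = sc.textFile("/data/raw/sales/*.txt").map(lambda line: line.split("|"))

# Reflection-based inference: build Rows and let Spark infer the types
rows = raw.map(lambda f: Row(store=f[0], category=f[1], amount=float(f[2])))
df_reflected = spark.createDataFrame(rows)

# Programmatic inference: declare the schema explicitly with StructType
schema = StructType([
    StructField("store", StringType()),
    StructField("category", StringType()),
    StructField("amount", DoubleType()),
])
df_programmatic = spark.createDataFrame(
    raw.map(lambda f: (f[0], f[1], float(f[2]))), schema)

# Dynamic-partition write into a compressed table (Parquet + Snappy here)
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")
(df_programmatic.write.mode("overwrite")
                .format("parquet")
                .option("compression", "snappy")
                .partitionBy("category")
                .saveAsTable("stage.sales_by_category"))
```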
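Second sketch: the data-cleansing side, removing duplicate rows and orphan records with SQL run from a small Python (cx_Oracle) wrapper. The schema, tables, columns, and connection details are hypothetical:

```python
import cx_Oracle

# Placeholder connection details
conn = cx_Oracle.connect("etl_user", "********", "db-host:1521/ORCL")
cur = conn.cursor()

# Remove duplicate rows, keeping the lowest ROWID per business key
cur.execute("""
    DELETE FROM stage.customer_orders o
     WHERE o.ROWID > (SELECT MIN(i.ROWID)
                        FROM stage.customer_orders i
                       WHERE i.order_id = o.order_id)
""")

# Remove orphan records that no longer have a parent customer
cur.execute("""
    DELETE FROM stage.customer_orders o
     WHERE NOT EXISTS (SELECT 1
                         FROM stage.customers c
                        WHERE c.customer_id = o.customer_id)
""")

conn.commit()
cur.close()
conn.close()
```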
Confidential, Schaumburg, Illinois
Project Lead
Tools: EXADATA, UNIX, Shell Scripting, SQL, PL/SQL, ETL, Data Warehouse, SQLLDR
Responsibilities:
- Gathered requirements, prepared design documents, developed code, tested, and deployed changes to production. Worked on multiple code enhancements for business-critical functionality and delivered them successfully.
- Managed a high-volume 90 TB Oracle database on EXADATA.
- Responsible for extracting data using heterogeneous database links and loading it for the business using shell scripts and Oracle procedures.
- Ensured production support tickets, including any research and code fixes, were resolved within the project SLA; also responsible for high-severity issues in the application.
- Created an automated process to compare data aggregations post-extraction to ensure quality deliverables, resulting in a 90% reduction in high-severity tickets.
- Migrated database from Oracle 10g to Oracle 11g.
- Redesigned the ETL process end to end for performance and achieved a 50% reduction in processing time.
- Created partitioned and sub-partitioned tables to improve performance given the huge table sizes (a hedged sketch follows this list).
- Used Oracle Enterprise Manager to improve query performance by analyzing parameters such as CPU, memory, SGA, and PGA.
- Delivered Walmart POC data by tweaking the ETL process to complete in 1 day instead of the regular 14 days, and was recognized for helping Nielsen win the Walmart deal.
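A hedged sketch of the partitioning approach, run here from Python with cx_Oracle; the table, columns, partition scheme, and connection details are illustrative only:

```python
import cx_Oracle

conn = cx_Oracle.connect("dw_admin", "********", "exadata-host:1521/DWPROD")  # placeholders
cur = conn.cursor()

# Range-partition by month with hash sub-partitions to spread I/O on a huge fact table
cur.execute("""
    CREATE TABLE dw.sales_fact (
        sale_id     NUMBER,
        store_id    NUMBER,
        sale_date   DATE,
        amount      NUMBER(12,2)
    )
    PARTITION BY RANGE (sale_date)
    SUBPARTITION BY HASH (store_id) SUBPARTITIONS 8
    (
        PARTITION p2015_01 VALUES LESS THAN (DATE '2015-02-01'),
        PARTITION p2015_02 VALUES LESS THAN (DATE '2015-03-01'),
        PARTITION pmax     VALUES LESS THAN (MAXVALUE)
    )
""")

# Gather fresh optimizer statistics so queries can prune partitions effectively
cur.callproc("DBMS_STATS.GATHER_TABLE_STATS", ["DW", "SALES_FACT"])

cur.close()
conn.close()
```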
Confidential, Schaumburg, Illinois
Project Lead
Tools: EXADATA, UNIX, Shell Scripting, SQL, PL/SQL, ETL, Data Warehouse, SQLLDR
Responsibilities:
- Responsible for loading extracted text files using SQL*Loader and delivering aggregated data to the business using shell scripts and Oracle procedures.
- Ensured production problems were resolved within SLA for the global support project and ensured high availability of the application during business hours.
- Delivered BI reports from the Oracle database using Actuate.
- Despite challenges with the application being written in Java, successfully migrated reports from Actuate 9 to Actuate 11.
- Used the DBMS utility package to capture logs.
- Managed the Retail ACView Admin tool to restrict user access control based on the service level agreement.
- Coordinated and communicated with clients on implementation and creating deliverables.