Senior Data Engineer Resume
SUMMARY:
- Senior Data Engineer with 8+ years of SDLC experience, with key emphasis on Big Data technologies (Spark, Scala, Python, Spark MLlib, Hadoop, Tableau, Cassandra), SAS & SQL programming, statistical analysis, data modeling, and risk analysis & reporting, covering analysis, design, development, testing, and implementation of projects in the financial, health, and pet care industries.
- Experience working remotely for clients such as Wells Fargo (100% remote), NBC Universal (100% remote), Nestle Purina (50% remote), CareSource (50% remote) and JPMorgan Chase (75% remote).
- Strong understanding of APIs, with experience producing and consuming APIs across different domains.
- Experience in direct marketing processes, List Processing and marketing analytical methods.
- Expertise in creating Packages using SQL Server Integration Services (SSIS).
- Experienced in database optimization and in developing stored procedures, triggers, cursors, joins, views, and SQL on MySQL and Oracle 12c, including use of the OMWB tool.
- Good knowledge of Big Data and data warehouse architecture, including designing star schemas, snowflake schemas, fact and dimension tables, and physical and logical data models using Erwin and ER/Studio.
- Architected, designed, and developed a Big Data solutions practice, including setting up the Big Data roadmap and building the supporting infrastructure and team to deliver Big Data solutions.
- Architected and implemented a Portfolio Recommendation Analytics Engine using Hadoop MapReduce, Oozie, Spark SQL, Spark MLlib, and Cassandra (see the illustrative sketch at the end of this summary).
- Excellent understanding of Hadoop architecture and underlying framework including storage management.
- Extensive experience in data modeling, data architecture, solution architecture, data warehousing, business intelligence, and master data management (MDM) concepts.
- Expertise in architecting Big Data solutions covering data ingestion, data storage, and data migration.
- Experienced with NoSQL databases - HBase, Cassandra, and MongoDB.
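Illustrative sketch of the Spark MLlib recommendation work referenced above: a minimal PySpark ALS pipeline, assuming hypothetical interaction data; the input path and column names are placeholders, not the engine's actual schema.

    # Minimal PySpark ALS sketch; input path and columns are assumed placeholders.
    from pyspark.sql import SparkSession
    from pyspark.ml.recommendation import ALS

    spark = SparkSession.builder.appName("portfolio-recs").getOrCreate()

    # Hypothetical interaction data: (customer_id, product_id, interest_score)
    interactions = spark.read.parquet("hdfs:///data/portfolio_interactions")

    als = ALS(
        userCol="customer_id",
        itemCol="product_id",
        ratingCol="interest_score",
        coldStartStrategy="drop",  # drop NaN predictions for unseen users/items
    )
    model = als.fit(interactions)

    # Top 5 product recommendations per customer, e.g. for writing back to Cassandra
    model.recommendForAllUsers(5).show(truncate=False)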
PROFESSIONAL EXPERIENCE:
Confidential
Senior Data Engineer
Responsibilities:
- Wrote Hadoop jobs for analyzing data using Hive and Pig, accessing text, sequence, and Parquet files.
- Supported the daily/weekly ETL batches in the production environment.
- Implemented a proof of concept deploying this product in Amazon Web Services (AWS).
- Developed SAS/Excel ad-hoc reports for the Priceless management team.
- Generated comprehensive data files (MS Excel), identified trends and patterns, and troubleshot inconsistencies.
- Effectively managed the image library, including image manipulation, selection, and uploading of digital media.
- Researched HR columns/values in the master list that are refreshed with each load, identified the key elements for the monthly HR data refresh, and verified the refresh with data reconciliation.
- Generated DDL and DML scripts using Python and managed aspects of QA testing (see the sketch following this section).
- Developed and validated the SAS EG import processes for SAS sandboxes, tables, and users using the data available in UAT (as provided by the SAS team in an Oracle database).
- Performed multiple regression modeling for load forecasting and sequentially tested additional variables and combinations for model improvement using R programming.
- Sent, collected, and analyzed surveys to risk-assess all Analytical Workspaces across the enterprise and remediated control effects via SAS.
- Processed pilot survey results from the Ultimate survey tool and imported them into SAS datasets for further analysis.
- Researched missing sandbox IDs in the SBX ASSESS OUTPUT1 file (on daily backup files), determined the root cause of the IS Recurring flag being set inconsistently, applied the fix, and validated it.
- Developed jobs, components, and joblets in Talend, and ETL jobs using Talend Integration Suite (TIS).
- Created complex mappings in Talend using tHash, tDenormalize, tMap, tUniqueRow, and tPivotToColumnsDelimited, as well as custom components such as tUnpivotRow.
- Used tStatsCatcher, tDie, and tLogRow to create a generic joblet that stores processing stats into a table for job history.
- Created Talend jobs to load data into various Oracle tables; utilized Oracle stored procedures and wrote Java code to capture global map variables and use them in the job.
- Created Talend mappings to populate the data into dimension and fact tables; frequently used the Talend Administration Center (TAC).
- Created new SAS EG processes for implementing sandbox changes from old SAS parent folders to new child SAS folders (sandboxes) for the monthly release process.
- Added new SAS sandboxes to the inventory (by deriving owner information from existing sandboxes and deactivating the parent-folder SAS sandboxes).
- Implemented new users, projects, and tasks within multiple TAC environments (Dev, Test, Prod, and DR).
- Developed complex Talend ETL jobs to migrate data from flat files to the database; implemented custom error handling in Talend jobs and worked on different logging methods.
- Created ETL/Talend jobs.
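A minimal sketch of the Python DDL/DML script generation mentioned above; the table and column names are hypothetical, not the client's schema.

    # Sketch: build CREATE TABLE and INSERT statements from a {column: type} mapping.
    # Table and column names below are illustrative only.
    from typing import Dict

    def build_create_table(table: str, columns: Dict[str, str]) -> str:
        """Generate a CREATE TABLE statement from a column -> datatype mapping."""
        cols = ",\n    ".join(f"{name} {dtype}" for name, dtype in columns.items())
        return f"CREATE TABLE {table} (\n    {cols}\n);"

    def build_insert(table: str, columns: Dict[str, str]) -> str:
        """Generate a parameterized INSERT statement for the same columns."""
        names = ", ".join(columns)
        placeholders = ", ".join(":" + name for name in columns)
        return f"INSERT INTO {table} ({names}) VALUES ({placeholders});"

    if __name__ == "__main__":
        schema = {"sandbox_id": "VARCHAR2(20)", "owner_name": "VARCHAR2(100)", "is_recurring": "CHAR(1)"}
        print(build_create_table("sbx_inventory", schema))
        print(build_insert("sbx_inventory", schema))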
Confidential
Senior Data Engineer
Responsibilities:
- Involved in deployments, automation, and scheduling of SAS Data Integration (DI) jobs and in process documentation.
- Created stored processes to provide users of the reporting data mart easy access using SAS Enterprise Guide.
- Integrated and managed the data mappings from the client's data sources (mainly Oracle) to the SAS analytical data mart using SAS DI Studio, and functioned as subject matter expert for SAS Data Integration Studio.
- Primarily responsible for extracting, transforming, and loading (ETL) data into SAS datasets and creating flat files, reports, and Excel files.
- Designed, produced, and implemented capabilities to execute the file processing (ETL) required to produce SAS data sets for modeling and analytics that include data from credit records, loans, GL, consumer, and mortgage.
- Developed data structures and performed data manipulation and application development.
- Designed, developed, and implemented the DI Studio job flows to load the data into SAS.
- Integrated the Salesforce API with Oracle database tables using Informatica Intelligent Cloud Services (IICS).
- Generated ODS and DDS source-to-target mappings with attributes, and generated and scheduled ETL jobs.
- Configured and set up user accounts, groups, and roles together with the SAS platform administrator, and generated Oracle schemas, views, procedures, and cursors.
- Created many complex ETL jobs for data exchange to and from the database server and various other systems, including RDBMS, XML, CSV, and flat-file structures.
- Fixed memory-related issues from the clockwork report and developed testing scripts in UNIX Bash shell.
- Created implicit, local, and global context variables in the job; worked on the Talend Administration Center (TAC) for scheduling jobs and adding users.
- Analyzed data and provided insights with R programming and Python Pandas.
- Used REST APIs with Python to ingest data into BigQuery (see the sketch following this section).
- Worked on various Talend components: tMap, tFilterRow, tAggregateRow, tFileExist, tFileCopy, tFileList, and tDie.
- Developed a stored procedure to automate the testing process, easing QA efforts and reducing test timelines for data comparison on tables.
- Developed and supported the production environment, following organizational trends in CI/CD, DevOps, and DevSecOps to mature and continually improve MISO IT services.
- Worked on ETL migration services by developing and deploying AWS Lambda functions to generate a serverless data pipeline that writes to the Glue Catalog and can be queried from Athena.
- Implemented a CI/CD process for the Apache Airflow binary and DAGs by building Airflow Docker images with Docker Compose, uploading them to JFrog Artifactory, and deploying through an AWS ECS cluster.
- Worked on Airflow performance tuning of the DAGs and task instances.
- Analyzed various types of raw files such as JSON, CSV, and XML with Python using Pandas, NumPy, etc.
- Worked on Airflow scheduler (Celery) and worker settings in the airflow.cfg file.
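A minimal sketch of the REST-to-BigQuery ingestion pattern noted above; the endpoint URL, table ID, and write disposition are placeholders, not the actual pipeline.

    # Sketch: pull JSON rows from a REST endpoint and load them into BigQuery.
    # API_URL and TABLE_ID are hypothetical placeholders.
    import requests
    from google.cloud import bigquery

    API_URL = "https://example.com/api/v1/records"
    TABLE_ID = "my-project.analytics.api_records"

    def ingest() -> None:
        rows = requests.get(API_URL, timeout=30).json()  # assumes the API returns a JSON list
        client = bigquery.Client()
        job = client.load_table_from_json(
            rows,
            TABLE_ID,
            job_config=bigquery.LoadJobConfig(write_disposition="WRITE_APPEND"),
        )
        job.result()  # block until the load job completes
        print(f"Loaded {job.output_rows} rows into {TABLE_ID}")

    if __name__ == "__main__":
        ingest()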
Confidential
Data Integration Engineer
Responsibilities:
- Performed configuration and support of SAS ECM metadata (Cases, Incidents, Parties, and Financial Items) and of ECM tables (all lookup tables).
- Updated workflows through SAS Workflow Studio for additional enhancements, including a quality check of approved cases for further investigation of their alerts.
- Developed data warehousing solutions and associated pipelines for internal/external sources, including ETL, ELT, and API development (SOAP & REST).
- Performed data analysis, compared automated SQL files with manual SQL extract files, and identified the data issues between them for the monthly release process.
- Automated data validation after the monthly release and generated summary metrics of the monthly load.
- Worked with Palantir Foundry tools including Workshop, Slate, and Object.
- Implemented solutions for ingesting data from various sources and processing the data at rest utilizing Big Data technologies such as Hadoop, MapReduce frameworks, HBase, and Hive.
- Implemented the Big Data ecosystem (Hive, Impala, Sqoop, Flume, Spark, Lambda) with a cloud architecture.
- Designed and deployed the full SDLC of an AWS Hadoop cluster based on the client's business needs.
- Experience in BI reporting with at-scale OLAP for Big Data.
- Created mappings and mapplets using Informatica PowerCenter to transform the data according to the business rules.
- Developed ad-hoc reports on sandboxes along with AU user details (AU number and AU name).
- Utilized the Kimball methodology for designing various data marts, evaluated database jobs, and supported overall database functioning.
- Utilized transformations such as Source Qualifier, Joiner, Lookup, SQL, Router, Filter, Expression, and Update Strategy.
- Reconciled pilot survey results from the Ultimate survey tool and imported them into SAS datasets for further analysis.
- Developed a Python program using Apache Beam and executed it on Cloud Dataflow to run data validation between raw source files and BigQuery tables (see the sketch following this section).
- Developed Scala programs for Spark transformations in Dataproc.
- Worked on Informatica Repository Manager, Designer, Workflow Manager, and Workflow Monitor.
- Integrated data into the CDW by sourcing it from different sources such as SQL, flat files, and mainframes (DB2) using PowerExchange.
- Monitored BigQuery, Dataproc, and Cloud Dataflow jobs via Stackdriver for all environments.
- Extensively worked on integrating data from mainframes through Informatica PowerExchange.
- Extensively worked on Informatica tools such as Source Analyzer, Data Warehouse Designer, Transformation Designer, Mapplet Designer, and Mapping Designer to design, develop, and test complex mappings and mapplets to load data from external flat files and RDBMS.
- Used output XML files to remove empty delta files and to FTP the output.
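A minimal sketch of the Apache Beam / Cloud Dataflow validation described above, reduced to a row-count comparison; bucket, project, region, and query values are placeholders.

    # Sketch: compare a raw-file row count against a BigQuery table row count on Dataflow.
    # File path, project, region, and query are placeholders.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    RAW_FILE = "gs://raw-bucket/source/extract.csv"
    BQ_QUERY = "SELECT * FROM `my-project.dw.target_table`"

    def run() -> None:
        options = PipelineOptions(
            runner="DataflowRunner", project="my-project",
            region="us-central1", temp_location="gs://tmp-bucket/tmp",
        )
        with beam.Pipeline(options=options) as p:
            raw_count = (
                p
                | "ReadRaw" >> beam.io.ReadFromText(RAW_FILE, skip_header_lines=1)
                | "CountRaw" >> beam.combiners.Count.Globally()
            )
            bq_count = (
                p
                | "ReadBQ" >> beam.io.ReadFromBigQuery(query=BQ_QUERY, use_standard_sql=True)
                | "CountBQ" >> beam.combiners.Count.Globally()
            )
            # Pass the BigQuery count as a side input and flag any mismatch.
            _ = (
                raw_count
                | "Compare" >> beam.Map(
                    lambda raw, bq: f"raw={raw} bq={bq} match={raw == bq}",
                    bq=beam.pvalue.AsSingleton(bq_count),
                )
                | "Log" >> beam.Map(print)
            )

    if __name__ == "__main__":
        run()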
Confidential
Data Engineer
Responsibilities:
- Defined and created SAS/Excel ad-hoc reports for NBC networks.
- Used PL/SQL procedures in Informatica mappings to truncate the data in target tables at run time.
- Performed data cleansing and validation while working with several million records and data points.
- Statistically validated the forecast results using error-estimation techniques.
- Developed SQL Server stored procedures and tuned SQL queries (using indexes and execution plans).
- Utilized Oozie workflows to run Pig and Hive jobs, and extracted files from MongoDB through Sqoop into HDFS.
- Migrated large volumes of network impressions data into HDFS.
- Utilized MS Azure services with a focus on big data engineering/analytics, enterprise data warehouse, and business intelligence solutions to ensure optimal architecture, scalability, flexibility, availability, and performance, and to provide meaningful, valuable information for better decision-making.
- Experience in data cleansing and data mining.
- Developed DOMO dashboards with customer data trends and plots for visual analytics.
- Used Informatica PowerCenter 8.6 for extraction, transformation, and loading (ETL) of data into the data warehouse.
- Developed stored procedures and triggers in MySQL to lower traffic between servers and clients.
- Performed correlation analysis using PROC HPCORR and generated pairwise Pearson correlation statistics.
- Developed Tableau data visualizations using crosstabs, heat maps, box-and-whisker charts, scatter plots, geographic maps, pie charts, bar charts, and density charts.
- Prepared detailed statistical reports in MS Access and MS Excel to track weekly, monthly, and quarterly customer and sales growth.
- Implemented a CI/CD process for the Apache Airflow binary and DAGs by building Airflow Docker images with Docker Compose, uploading them to JFrog Artifactory, and deploying through an AWS ECS cluster.
- Worked on Airflow performance tuning of the DAGs and task instances.
- Developed forecast models from scratch with SAS Forecast Studio for NBC's various networks: E, BRAVO, OXYG, SPROUT, USA, CHILLER, NBCSN, SYFY, and MSNBC.
- Developed ARIMA, Exponential Smoothing Models (ESM), and Intermittent Demand Models (IDM); fine-tuned them by identifying outliers and creating pulse or level-shift events, and automated them.
- Worked on Airflow scheduler (Celery) and worker settings in the airflow.cfg file.
- Created hooks and a custom operator that sense trigger files in S3 and kick off the data pipeline process (see the sketch following this section).
- Implemented multiple data pipeline DAGs and maintenance DAGs in Airflow orchestration.
- Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
- Installed and configured Hadoop; responsible for maintaining the cluster and managing and reviewing Hadoop log files.
- Worked on Informatica PowerCenter tools: Designer, Repository Manager, Workflow Manager, and Monitor.
- Parsed high-level design specifications into simple ETL coding and mapping standards.
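A minimal sketch of the S3-triggered Airflow DAG described above; the bucket, key pattern, and downstream task are placeholders, and the S3KeySensor import path assumes the Airflow 2.x Amazon provider.

    # Sketch: wait for a trigger file in S3, then kick off the ingest step.
    # Bucket, key pattern, and callable are hypothetical placeholders.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

    def load_to_hdfs(**context):
        # Placeholder for the actual Sqoop/ingest logic.
        print("Trigger file detected; starting ingest.")

    with DAG(
        dag_id="network_impressions_pipeline",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        wait_for_trigger = S3KeySensor(
            task_id="wait_for_trigger_file",
            bucket_name="impressions-landing",
            bucket_key="incoming/*.trigger",
            wildcard_match=True,
            poke_interval=300,
            timeout=6 * 60 * 60,
        )
        ingest = PythonOperator(task_id="ingest_to_hdfs", python_callable=load_to_hdfs)

        wait_for_trigger >> ingest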
Confidential
Data Integration / ETL Consultant
Responsibilities:
- Primarily responsible for extracting, transforming, and loading (ETL) data into SAS datasets and creating flat files, reports, and Excel files.
- Developed data structures and performed data manipulation and application development.
- Designed, developed, and implemented the DI Studio job flows to load the data into SAS.
- Gathered data requirements and designed and developed data marts using SAS Data Integration Studio.
- Generated baseline and promoted demand-planning forecasts using SAS Forecast Studio.
- Implemented MySQL database backup and recovery strategies, along with replication and synchronization.
- Created, tested, and maintained PHP scripts, MySQL programming, forms, reports, triggers, and procedures for the data warehouse.
- Analyzed data with Hive, Pig, and Hadoop Streaming.
- Worked on unit testing for three reports and created SQL test scripts for each report as required.
- Configured and developed triggers, workflows, and validation rules, and handled the deployment process from one sandbox to another.
- Created automatic field updates via workflows and triggers to satisfy an internal compliance requirement of stamping certain data on a call during submission.
- Extensively used Erwin as the main modeling tool, along with Visio.
- Established and maintained comprehensive data model documentation, including detailed descriptions of business entities, attributes, and data relationships.
- Worked on the Metadata Repository (MRM) to keep definitions and mapping rules up to date.
- Developed data marts for the base data in star schema and snowflake schema as part of developing the data warehouse (see the sketch following this section).
- Developed enhancements to the MongoDB architecture to improve performance and scalability.
- Forward-engineered data models and reverse-engineered existing data models.
- Performed data cleaning and data manipulation activities using a NoSQL utility.
- Extracted data from SQL Server and Oracle, and created and populated tables to centralize comprehensive data using SQL Server Management Studio.
- Produced use cases and data flow diagrams as supporting documentation using MS Visio.
- Assisted with conceptual, logical, and physical data modeling and interfaced effectively with data modelers.
- Involved in deployments, automation, and scheduling of SAS DI jobs and in process documentation.
- Created stored processes to provide users of the reporting data mart easy access using SAS Enterprise Guide.
- Integrated and managed the data mappings from the client's data sources (mainly Oracle) to the SAS analytical data mart using SAS DI Studio.
- Designed workflows with many sessions using decision, assignment, event-wait, and event-raise tasks, and used the Informatica scheduler to schedule jobs.
- Documented and set up files, SAS programs, and log files for the new production box.
- Oversaw production jobs and the production server, running daily/quarterly process jobs to support the business in reporting capability and day-to-day operations.
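A minimal sketch of the star-schema data-mart pattern described above, built in MySQL from Python; the database, credentials, and table/column names are illustrative only.

    # Sketch: create one fact table and two dimensions for a simple star schema.
    # Connection details and names are placeholders; assumes mysql-connector-python.
    import mysql.connector

    DDL = [
        """CREATE TABLE IF NOT EXISTS dim_customer (
               customer_key INT PRIMARY KEY AUTO_INCREMENT,
               customer_id  VARCHAR(20) NOT NULL,
               segment      VARCHAR(50)
           )""",
        """CREATE TABLE IF NOT EXISTS dim_date (
               date_key  INT PRIMARY KEY,
               full_date DATE NOT NULL
           )""",
        """CREATE TABLE IF NOT EXISTS fact_sales (
               sale_id      BIGINT PRIMARY KEY AUTO_INCREMENT,
               customer_key INT NOT NULL,
               date_key     INT NOT NULL,
               amount       DECIMAL(12, 2),
               FOREIGN KEY (customer_key) REFERENCES dim_customer (customer_key),
               FOREIGN KEY (date_key)     REFERENCES dim_date (date_key)
           )""",
    ]

    def build_mart() -> None:
        conn = mysql.connector.connect(
            host="localhost", user="etl_user", password="***", database="sales_mart"
        )
        cur = conn.cursor()
        for stmt in DDL:
            cur.execute(stmt)
        conn.commit()
        conn.close()

    if __name__ == "__main__":
        build_mart()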