Lead Data Engineer Resume
St. Louis, MO
SUMMARY
- Lead Data Engineer with 10+ years of experience in data engineering, including data ingestion, pipeline design, data transformation, data storage, data querying, data processing, data visualization, and analytics.
TECHNICAL SKILLS
Enterprise Data Warehouse: Teradata
ETL Tools: SSIS (SQL Server Integration Services), Informatica, Snowflake
Programming Languages: Python, SQL, PySpark (DataFrames, Spark SQL), Pandas, XlsxWriter, xlrd, C, C#, and VBA.
Cloud: AWS (S3, EC2, Lambda, EMR, Athena, Secrets Manager)
Servers: Windows Servers 2008/2008 R2/2012, Solaris 8 - 11, CentOS, Red Hat Enterprise and Ubuntu Linux
Database: MySQL, RDS, DynamoDB
OS: DOS, Windows 98, 2000/NT, UNIX and Linux
Tools: Jupyter Notebooks, Anaconda, PyCharm, Eclipse, Spyder, Notepad++, TextPad, PuTTY, SSH, FileZilla, ServiceNow, Jira
PROFESSIONAL EXPERIENCE
Confidential, St. Louis, MO
Lead Data Engineer
Responsibilities:
- Worked with the product owner to understand business requirements and translate them into actionable reports in a Palantir Foundry environment on AWS EC2 (currently serving over 200 providers across Medicaid, Medicare, and Marketplace plans)
- Worked with the Teradata enterprise data warehouse; developed Python/SQL scripts using PySpark DataFrames and Spark SQL datasets for data aggregation and querying, and for writing data back into the warehouse
- Built Python data ingestion scripts to ingest unstructured raw data from Teradata into the Palantir Foundry environment on AWS EC2 using SSIS (SQL Server Integration Services), Informatica, and Snowflake
- Created clean database tables from large sets of unstructured and semi-structured data arriving from multiple input sources; implemented a pipeline process to convert over 200 million rows of raw data into structured, clean data using Python, Pandas, PySpark, and Spark SQL
- Used SQL and Python to perform a wide range of data transformations and aggregations on clean datasets, creating multiple input sources that feed the reporting process
- Implemented a pipeline process to produce provider reports in multiple formats (flat delimited files, .xlsm, .txt, XML, HTML, and Parquet) from large-scale structured clean datasets, based on business requirements
- Performed data wrangling, data partitioning, large-scale dataset analysis, data visualization, enhancements, and bug fixes
- Experienced in working with Lambda architecture using Python; automated the provider reports production run schedule using AWS Lambda
- Worked on reporting migration services by developing and deploying AWS Lambda functions that deliver automated provider reports into AWS S3 buckets, partitioned by state and by plan
- Worked on loading and unloading data to and from sets of files in an Amazon S3 bucket
- Experienced in Agile engineering practices
- Proven experience in leading teams and complex data analytics projects; supervised and mentored a team of developers throughout the project
- Assessed code during the testing stage to identify potential glitches and bugs; collaborated with the business team to assist client stakeholders with emergent technical issues and developed effective solutions
- Followed the company's required standards and guidelines for tool usage, including technical documentation; assisted in creating documents that ensure consistency in development across cross-functional teams; implemented and improved core software infrastructure
- Worked across multiple cross-functional teams in high visibility roles and owned the data solution end-to-end
- Translated complex functional and technical requirements into detailed architecture, design, and high performing software
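The cleaning step in the pipeline above can be sketched at toy scale. This is a minimal illustration, not the production code: the field names, the pipe-delimited layout, and the validation rules are invented for the example, standing in for the semi-structured provider extracts described above.

```python
import csv
import io

# Hypothetical raw feed: pipe-delimited provider rows with inconsistent
# casing and blank fields, standing in for the semi-structured extracts.
RAW = """provider_id|name|plan|state
101|ACME CARE|medicaid|MO
102|Beta Health||mo
103||medicare|IL
"""

def clean_rows(raw: str):
    """Normalize casing, drop rows missing required fields, default the plan."""
    reader = csv.DictReader(io.StringIO(raw), delimiter="|")
    cleaned = []
    for row in reader:
        if not row["provider_id"] or not row["name"]:
            continue  # required fields missing -> reject the row
        cleaned.append({
            "provider_id": int(row["provider_id"]),
            "name": row["name"].title(),
            "plan": (row["plan"] or "unknown").lower(),
            "state": row["state"].upper(),
        })
    return cleaned

rows = clean_rows(RAW)
```

At the scale described above, the same normalize-validate-default logic would be expressed over PySpark DataFrames rather than plain Python lists, but the shape of the transformation is the same.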
Confidential, Burbank CA
Lead Data Engineer / Software Engineer 1
Responsibilities:
- Worked with the VP of Operations to understand and collect business requirements for the new TPSW project, and translated requirements into user stories using JIRA
- Involved in the architecture, design, and development of MySQL databases and ETL processes within the scope of application development
- Implemented 100+ algorithmic functional modules for landing-gear sensor data parameters (e.g., tire pressure) using Python, SQL, and Spark; used an AWS EMR Spark cluster to trigger batch jobs from the master node across worker nodes running on multiple EC2 instances
- Worked with AWS Athena for interactive query management
- Performed large-scale data integration from multiple sensor sources, complex mathematical computations on sensor data, and extensive data analysis using built-in Python libraries, PySpark, and VBA
- Migrated business reports into AWS S3 buckets in multiple formats (e.g., .txt, .csv, .html) with pass/fail criteria using CLI-driven syncs
- Provided direction and guidance to other big data developers to solve problems, improve efficiency and process, and adopt new technologies; managed junior members of the development team
- Led the software development lifecycle of new processing jobs and data pipelines
- Working knowledge of both relational and document-oriented database systems
- Provided scope, estimation, planning, design, development, and support services during the project
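One of the sensor-parameter modules described above might look like the sketch below. The threshold values, function name, and pass/fail shape are all hypothetical, chosen only to illustrate the kind of batch check these modules performed over landing-gear sensor readings.

```python
# Illustrative sketch of one "functional module": a tire-pressure check
# over a batch of landing-gear sensor samples. Thresholds are invented
# for the example, not real aircraft limits.
PSI_MIN, PSI_MAX = 180.0, 220.0  # hypothetical acceptable range

def tire_pressure_status(samples):
    """Return (mean_psi, verdict) for one batch of pressure readings."""
    mean_psi = sum(samples) / len(samples)
    verdict = "PASS" if PSI_MIN <= mean_psi <= PSI_MAX else "FAIL"
    return round(mean_psi, 1), verdict
```

In the EMR setup described above, a function like this would be applied per batch across worker nodes rather than called directly.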
Confidential, Carbondale IL
Graduate Research Assistant
Responsibilities:
- Developed a new interface to pull unsettled student library bursar claims from a MySQL database into Excel reports using a combination of Python and SQL
- Developed and deployed a wide range of server backend applications to monitor the health status of the Morris Library website, using VMs on VMware in Linux and Windows environments
- Developed, deployed, and maintained physical and virtual Windows and Linux server installations
- Experienced in system administration across Windows Server 2008/2008 R2/2012, Solaris 8-11, CentOS, Red Hat Enterprise, and Ubuntu Linux environments
- Developed scripts to automate network administration tasks and application deployments
- Experienced in administering virtualization environments (Microsoft Hyper-V, Ubuntu KVM, VMware vSphere, VMware ESX); familiar with Citrix XenServer and cloud platforms (AWS, Azure).
- Experienced with database administration for MSSQL and MySQL; experienced with Microsoft SCCM.
- Experienced in troubleshooting network connectivity issues, event/application logs, and firewall configuration. Knowledge of FreeBSD.
- Familiarity with installing and configuring file servers: NTFS, FTP and clustering, LDAP, DNS
- Assisted in developing plans, schedules and requirements for deployment of systems
- Experienced in installing, configuring and maintaining server hardware and operating systems
- Experienced with the Data center operations: racking/de-racking equipment, cable management, KVM configuration, UPS/PDU monitoring
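The website health-status monitoring described above can be sketched as a small classifier. The states, thresholds, and function name are invented for illustration, and the actual HTTP request is omitted so the decision logic stands on its own.

```python
# Hypothetical health classifier for a website monitor: maps an HTTP
# status code and observed latency to a health state. The thresholds
# and state names are illustrative, not from any real monitoring tool.
def classify_status(http_code: int, latency_ms: float, slow_ms: float = 2000.0) -> str:
    if http_code >= 500:
        return "DOWN"       # server-side failure
    if http_code >= 400:
        return "ERROR"      # client/config error (e.g., broken link)
    if latency_ms > slow_ms:
        return "DEGRADED"   # responding, but too slowly
    return "HEALTHY"
```

A monitor would call this after timing a request to each page and log or alert on any non-HEALTHY result.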
Confidential
Senior Data Engineer
Responsibilities:
- Worked on air data system software design, development, unit testing, data cleaning, data analysis, data visualization, deployment, and reporting using a combination of Python, SQL, MySQL, C, C#, VBA, and AWS resources.
- Provided technical leadership to clients on a team that designs and develops large-scale cluster data processing systems; managed a team of junior data engineers and big data specialists, leading technology initiatives related to data and analytics solutions for ADAS SIS (Sensors and Integrated Systems) within the Global Engineering Center; worked with business analysts, application developers, and technical staff in agile (Scrum/Kanban) environments
- Developed and maintained technical requirement documents, architecture designs, data flow diagrams, product design documents, test procedures, and data pipelines that leveraged structured and unstructured data integrated from multiple sources for efficient report delivery to end users
- Working knowledge of Data Warehousing tools and methodologies, reporting tools and ETL tools
- Provided accurate estimates for project development and implementation; worked with management to meet expectations
- Assisted with active management and coordination of IBM contracted resources performing operations functions; assisted in continuous development of tools, reporting improvements, and automation to create new, insightful reports
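The analysis-to-reporting step for sensor data described above can be sketched as a simple aggregation. The channel names and values are invented placeholders; the point is the group-then-summarize shape that feeds a report.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cleaned sensor readings as (channel, value) pairs,
# standing in for the air data system's per-channel measurements.
readings = [
    ("airspeed", 250.0), ("airspeed", 254.0),
    ("altitude", 31000.0), ("altitude", 31040.0),
]

def summarize(rows):
    """Group readings by channel and report the mean per channel."""
    by_channel = defaultdict(list)
    for channel, value in rows:
        by_channel[channel].append(value)
    return {ch: round(mean(vals), 1) for ch, vals in sorted(by_channel.items())}

summary = summarize(readings)
```

In the pipelines described above this aggregation would run in SQL or Spark over full datasets, with the summary table written out for the report-delivery stage.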
Confidential
AGENCY Data Engineer / Engineer-A
Responsibilities:
- Designed and implemented autopilot longitudinal control law software for commercial applications using Python, C, C++, and SQL.
- Performed real-time data acquisition and feedback data analysis at scale using MATLAB
- Involved in creating clean database tables, partitioned tables, join conditions, correlated subqueries, nested queries, views, sequences, and synonyms for business application development.
- Extensively involved in writing SQL queries (subqueries and join conditions) and PL/SQL programming.
- Understood existing business processes and interacted with super users and end users to finalize their requirements.
- Designed and developed logical and physical data flow models of the Iron Bird digital flight control computer process
- Developed database triggers, packages, functions, and stored procedures using PL/SQL and maintained the scripts for various data feeds.
- Created indexes for faster retrieval of customer information and to enhance database performance.
- Extensively used advanced PL/SQL features such as collections, nested tables, VARRAYs, REF CURSORs, materialized views, and dynamic SQL.
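The indexing and correlated-subquery work described above can be illustrated with a small SQLite stand-in (driven from Python rather than Oracle PL/SQL). The table, column names, and data are hypothetical; the example shows an index creation and a correlated subquery of the kind those bullets describe.

```python
import sqlite3

# SQLite stand-in for the SQL work above: create a table, add an index
# for faster customer lookups, and run a correlated subquery.
# Table and column names are illustrative, not from any real system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
CREATE INDEX idx_orders_customer ON orders(customer);
INSERT INTO orders (customer, amount) VALUES
  ('alice', 10.0), ('alice', 30.0), ('bob', 5.0), ('bob', 25.0);
""")

# Correlated subquery: each customer's orders at or above that
# customer's own average order amount.
top = conn.execute("""
SELECT o.customer, o.amount
FROM orders o
WHERE o.amount >= (SELECT AVG(i.amount)
                   FROM orders i
                   WHERE i.customer = o.customer)
ORDER BY o.customer
""").fetchall()
```

The inner query re-evaluates per outer row (correlated on `o.customer`), and the index on `customer` is what makes that per-row lookup cheap, which is the same reasoning behind the index-creation bullet above.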