Data Engineer Resume
Sunnyvale, CA
OBJECTIVE:
To participate in a major Data Analytics or Web Application project as a Senior Backend or Data Engineer.
SUMMARY:
- Over 15 years of hands-on experience as a Data Engineer on IT projects spanning Business Intelligence, Business Analytics, Data Cleansing, Data Mining, Data Quality, Master Data Management and Machine Learning.
- Experienced in all phases of the SDLC using RUP, Scrum and Agile methodologies.
- Hands-on experience in Functional Requirements Gathering, Data Exploration, Analysis, Profiling and Cleansing.
- Expertise in SQL, Python and NoSQL databases for Machine Learning, Business Analytics and Data Cleansing.
- Knowledge of Statistics and experience using statistical packages for analyzing datasets.
- Expertise in Predictive and Classification Modeling using Supervised and Unsupervised Machine Learning.
TECHNICAL SKILLS:
Relational Databases: SQL Server, Oracle 11g, Confidential RedShift, Teradata, DB2
NoSQL Databases: MongoDB 3.6
Statistical Software: SAS, MATLAB
Programming: Python, Scala
SQL: T-SQL, PL/SQL, Hive SQL
ETL: SSIS, DTS, Informatica PowerCenter 9.6, SAP BODS, Talend
Data Quality: Informatica Data Quality (IDQ), Ataccama
Cloud: AWS EC2, S3, Data Pipelines
Big Data: Apache HDFS, Spark, Hive, HBase, Pig, Airflow, Luigi, Kafka
Reporting: SQL Server Reporting Services, Power BI, Tableau
EXPERIENCE:
Confidential, Sunnyvale, CA
Data Engineer
Responsibilities:
- Collaborate with business teams to understand dashboard and reporting requirements for Geospatial Data Quality.
- Develop RESTful APIs using the Flask micro-framework to expose Geospatial data to front-end web applications (see the illustrative sketch after this entry).
- Leverage Python libraries and an ODM to model documents in MongoDB collections.
- Develop Data Pipelines using Python libraries; stage, transform and aggregate data in MongoDB.
- Implement Data validation and cleansing rules to combine and prep Geospatial data from multiple sources.
Environment: Python 3.6, MongoDB 3.6, Flask, PyMongo, MongoEngine, Pandas, Jenkins, ESRI GIS, Oracle 11g
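The Flask and MongoEngine work above could look roughly like the sketch below. This is a minimal illustration only; the collection, fields and endpoint are hypothetical stand-ins, not the project's actual code.

    # Minimal illustration only: a Flask endpoint serving geospatial documents
    # stored in MongoDB through the MongoEngine ODM. The PointOfInterest model,
    # its fields and the /api/poi/nearby route are hypothetical examples.
    from flask import Flask, jsonify, request
    from mongoengine import Document, PointField, StringField, connect

    app = Flask(__name__)
    connect("geodata")  # assumes a local MongoDB instance with a "geodata" database

    class PointOfInterest(Document):
        name = StringField(required=True)
        location = PointField(required=True)  # GeoJSON Point backed by a 2dsphere index

    @app.route("/api/poi/nearby")
    def nearby_pois():
        # Example query string: ?lng=-122.03&lat=37.37&meters=500
        lng = float(request.args["lng"])
        lat = float(request.args["lat"])
        meters = int(request.args.get("meters", 1000))
        docs = PointOfInterest.objects(location__near=[lng, lat], location__max_distance=meters)
        return jsonify([{"name": d.name, "location": d.location} for d in docs])

    if __name__ == "__main__":
        app.run(debug=True)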
Confidential, San Rafael, CA
Master Data Engineer
Responsibilities:
- Collaborate with business teams to understand issues with Vendor on-boarding process and P2P cycle
- Data exploration, Data Profiling, Data Quality and ETL to load and transform huge data sets.
- Implement Data validation and cleansing algorithms to combine and prep Vendor data from multiple sources.
- Developed a fuzzy-matching algorithm for de-duping Vendor records and creating a unique record set (see the illustrative sketch after this entry).
Environment: Oracle 11g, Python, Oracle EBS R12, IDQ, Collibra, Power BI, HDFS, Hive, Shell scripting, Apache Airflow
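The fuzzy matching mentioned above can be sketched as follows. This is a simplified illustration using only the Python standard library; the real rules combined multiple Vendor attributes and project-specific thresholds.

    # Simplified illustration of fuzzy matching for vendor de-duplication.
    # The normalization rules and the 0.9 threshold are example values, not the
    # project's actual business rules.
    from difflib import SequenceMatcher

    def normalize(name: str) -> str:
        # Lowercase, drop punctuation and common legal suffixes so that
        # "Acme Corp." and "ACME Corporation" compare cleanly.
        cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
        for suffix in (" incorporated", " corporation", " corp", " inc", " llc", " ltd"):
            if cleaned.endswith(suffix):
                cleaned = cleaned[: -len(suffix)]
        return " ".join(cleaned.split())

    def similarity(a: str, b: str) -> float:
        return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

    def find_duplicates(vendor_names, threshold=0.9):
        # Compare every pair and flag likely duplicates above the threshold.
        pairs = []
        for i in range(len(vendor_names)):
            for j in range(i + 1, len(vendor_names)):
                score = similarity(vendor_names[i], vendor_names[j])
                if score >= threshold:
                    pairs.append((vendor_names[i], vendor_names[j], round(score, 2)))
        return pairs

    print(find_duplicates(["Acme Corp.", "ACME Corporation", "Globex LLC"]))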
Confidential, Santa Clara, CA
Data Engineer
Responsibilities:
- Collaborate with business teams and Data Scientists to understand data needs for Predictive Analytics.
- Data exploration, Data Profiling to analyze trends, Data Quality and ETL to load and transform huge data sets.
- Build Data Pipelines and implement Data cleansing rules to transform and aggregate data from multiple sources (see the illustrative sketch after this entry).
- Developed fuzzy-matching algorithms for de-duping Forex Account IDs across multiple source systems.
Environment: Oracle 11g, Confidential RedShift, Python, Talend, Tableau, HDFS, Hive, Pig, Kafka, Luigi
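A cleansing and aggregation step like the one described above might be sketched with pandas as below. The file names and columns are hypothetical; the actual pipelines were orchestrated with Talend and Luigi.

    # Hypothetical sketch of combining and aggregating account data from two
    # source extracts with pandas; file paths and column names are made up.
    import pandas as pd

    def load_and_clean(path: str, source: str) -> pd.DataFrame:
        df = pd.read_csv(path)
        df["account_id"] = df["account_id"].astype(str).str.strip().str.upper()
        df["trade_amount"] = pd.to_numeric(df["trade_amount"], errors="coerce")
        df = df.dropna(subset=["account_id", "trade_amount"])
        df["source_system"] = source
        return df

    frames = [
        load_and_clean("system_a_extract.csv", "system_a"),
        load_and_clean("system_b_extract.csv", "system_b"),
    ]
    combined = pd.concat(frames, ignore_index=True)

    # Aggregate trade volume per account across both source systems.
    summary = (
        combined.groupby("account_id")
        .agg(total_volume=("trade_amount", "sum"), trade_count=("trade_amount", "count"))
        .reset_index()
    )
    print(summary.head())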
Confidential, San Francisco, CA
Data Analytics Consultant
Responsibilities:
- Collaborate with stakeholders to define functional and technical requirements for modeling Master Data
- Data exploration, Data Profiling, Data Quality and ETL to load and transform huge data sets.
- Data Profiling, Data Analysis; identify and implement business rules to uniquely identify Securities.
- Design and configure Match Rules and Trust Rules to cleanse, standardize, match and merge Securities records
- Developed custom Asset Classification logic in Python (see the illustrative sketch after this entry).
Environment: SQL Server, Erwin r9, SSIS, Informatica PowerCenter 9.6.1, IDQ, Informatica MDM, Python, HDFS, Hive
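The custom asset classification could be illustrated by a rule-based sketch like the one below. The categories, field names and rules are hypothetical, since the real rules were business-specific.

    # Hypothetical rule-based asset classification; categories and field names
    # are examples, not the project's actual classification rules.
    def classify_asset(security: dict) -> str:
        asset_type = (security.get("asset_type") or "").lower()
        if asset_type in {"common stock", "preferred stock", "adr"}:
            return "Equity"
        if asset_type in {"corporate bond", "treasury", "municipal bond"}:
            return "Fixed Income"
        if asset_type in {"etf", "mutual fund", "index fund"}:
            return "Fund"
        return "Unclassified"

    securities = [
        {"cusip": "CUSIP001", "asset_type": "Common Stock"},
        {"cusip": "CUSIP002", "asset_type": "Treasury"},
        {"cusip": "CUSIP003", "asset_type": "Swap"},
    ]
    for sec in securities:
        print(sec["cusip"], "->", classify_asset(sec))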
Confidential, Oakland, CA
Data Engineer
Responsibilities:
- Design and implement ETL interfaces for loading Member Eligibility data and Claims data.
- Data exploration, Data Profiling, Data Quality and ETL to load and transform huge data sets.
- Implement Hierarchy and Affiliation relationships between Patients, their Household Members and Providers.
Environment: SQL Server, SSIS, Informatica PowerCenter 9.1, IDQ, Informatica MDM 9.5, Python, HDFS, Hive
Confidential, San Francisco, CA
Data Engineer
Responsibilities:
- Involved in gathering business requirements and analysis for various data feeds from third party Fund Index data providers including State Street, S&P Dow Jones, Russell and Blackrock
- Designed and developed critical ETL processes.
- Developed a parameter-driven ETL framework that includes dynamic configurations, custom logging and reporting (see the illustrative sketch after this entry).
Environment: SQL Server 2008 R2, SQL Server Integration Services (SSIS) 2008 R2, Teradata 13.10
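The framework itself was built in SSIS; purely to illustrate the parameter-driven idea, a minimal Python sketch might look like this. The feed names, paths and parameters are hypothetical.

    # Illustration of the parameter-driven pattern only (the real framework was
    # implemented in SSIS). Feed names, paths and parameters are hypothetical.
    import json
    import logging

    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
    log = logging.getLogger("etl")

    FEED_CONFIG = {
        "state_street_index": {"source_path": "/feeds/state_street/", "delimiter": "|", "target_table": "stg_index_prices"},
        "sp_dow_jones_index": {"source_path": "/feeds/spdj/", "delimiter": ",", "target_table": "stg_index_prices"},
    }

    def run_feed(feed_name: str) -> None:
        cfg = FEED_CONFIG[feed_name]
        log.info("Starting feed %s with config %s", feed_name, json.dumps(cfg))
        # ... extract from cfg["source_path"], parse using cfg["delimiter"],
        # load into cfg["target_table"] ...
        log.info("Finished feed %s", feed_name)

    for feed in FEED_CONFIG:
        run_feed(feed)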
Confidential, San Francisco, CA
Data Integration Engineer
Responsibilities:
- Participated in the Confidential to Confidential Conversion Project as part of the Reporting and Analytics team.
- Performed Data Profiling and Data Analysis and defined Data cleansing rules prior to conversion.
- Involved in the design, development and implementation of ETL loads and reporting.
- Involved in the design and execution of ad hoc SQL queries and canned reports for analysis.
Environment: SQL Server 2008 R2, Integration Services, Informatica PowerCenter 8.6, IDQ, Oracle 10g, Erwin v7.3.
Confidential, NC
ETL Developer
Responsibilities:
- Design and develop a custom Audit and Logging Framework for the Regulatory Reporting & Compliance Team (see the illustrative sketch after this entry).
- Implemented Data Lineage tracking to trace data integration from operational systems through to BI reports.
- Designed a metadata repository used to store technical and business metadata.
Environment: SQL Server 2008 R2, SSIS, Oracle 9i, TFS 2012.
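The audit pattern behind that framework is sketched below in Python with SQLite so it runs anywhere; the real audit tables lived in SQL Server, and the table and column names are hypothetical.

    # Illustration of the audit/logging pattern only (the real framework was
    # SSIS on SQL Server). Table and column names are hypothetical.
    import sqlite3
    from datetime import datetime, timezone

    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE etl_audit (
               run_id INTEGER PRIMARY KEY AUTOINCREMENT,
               package_name TEXT, started_at TEXT, finished_at TEXT,
               rows_loaded INTEGER, status TEXT)"""
    )

    def audited_run(package_name: str, load_fn) -> None:
        # Record start/end timestamps, row counts and status for every run.
        started = datetime.now(timezone.utc).isoformat()
        try:
            rows = load_fn()
            status = "SUCCESS"
        except Exception:
            rows, status = 0, "FAILED"
        finished = datetime.now(timezone.utc).isoformat()
        conn.execute(
            "INSERT INTO etl_audit (package_name, started_at, finished_at, rows_loaded, status) "
            "VALUES (?, ?, ?, ?, ?)",
            (package_name, started, finished, rows, status),
        )
        conn.commit()

    audited_run("load_regulatory_positions", lambda: 1250)  # stand-in load returning a row count
    print(conn.execute("SELECT package_name, rows_loaded, status FROM etl_audit").fetchall())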
Confidential, South San Francisco, CA
Master Data Developer Analyst
Responsibilities:
- Capture Use cases for MDM; identify multiple sources of Physician data to develop ETL for integration into MDM.
- Performed Data Profiling and Analysis; developed complex fuzzy-matching algorithms for cleansing and standardizing data.
- Designed and developed the Master Data Model for Physician Data and Predictive Modeling for new Physicians.
Environment: SQL Server 2008 R2, Integration Services (SSIS), Microsoft Master Data Services (MDS) 2008 R2.