
Data Engineer Resume

Sunnyvale, CA

SUMMARY

  • Over 15 years of experience as a Data Engineer on IT projects spanning Business Intelligence, Business Analytics, Data Cleansing, Data Mining, Data Quality, Master Data Management and Machine Learning.
  • Experienced in all phases of the SDLC using RUP, Scrum and Agile methodologies.
  • Hands-on experience in Functional Requirements Gathering, Data Exploration, Analysis, Profiling and Cleansing.
  • Expertise in SQL, Python and NoSQL databases for Machine Learning, Business Analytics and Data Cleansing.
  • Knowledge of Statistics and experience using statistical packages for analyzing datasets.
  • Expertise in Predictive Modeling and Classification Modeling using Supervised and Unsupervised Machine Learning.

TECHNICAL SKILLS

Relational Databases: SQL Server, Oracle 11g, Confidential Redshift, Teradata, DB2

NoSQL Databases: MongoDB 3.6

Programming: Python, Scala

Python libraries: Numpy, Pandas, Flask, Multiprocessing

SQL: T - SQL, PL/SQL, Hive SQL

ETL: SSIS, DTS, Informatica Power Center 9.6, SAP BODS, Talend

Data Quality: Informatica Data Quality, Ataccama

Cloud: AWS EC2, S3, Data Pipelines

Apache: HDFS, Spark, Hive, HBase, Pig, Airflow, Luigi, Kafka

Reporting: SQL Server Reporting Services, Power BI, Tableau

PROFESSIONAL EXPERIENCE

Confidential, Sunnyvale, CA

Data Engineer

Responsibilities:

  • Collaborate with business teams to understand dashboard and reporting requirements for Geospatial Data Quality.
  • Develop RESTful APIs using the Flask micro-framework to expose Geospatial data to front-end Web applications.
  • Leverage Python libraries and an ODM to model documents in MongoDB collections.
  • Develop Data Pipelines using Python libraries; Data staging, transformation and aggregation in MongoDB.
  • Implement Data validation and cleansing rules to combine and prep Geospatial data from multiple sources.

Environment: - Python 3.6, MongoDB 3.6, Flask, PyMongo, MongoEngine, Pandas, Jenkins, ESRI GIS, Oracle 11g, AWS
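The validation and cleansing rules described for this role can be sketched in plain Python. This is an illustrative example, not code from the actual project; the field names (`lat`, `lon`) and the rounding-based de-duplication key are assumptions:

```python
def validate_point(record):
    """Return True if a record carries a usable WGS84 coordinate pair."""
    try:
        lat = float(record["lat"])
        lon = float(record["lon"])
    except (KeyError, TypeError, ValueError):
        return False
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


def cleanse(records):
    """Drop invalid points and de-duplicate on rounded coordinates."""
    seen, clean = set(), []
    for rec in records:
        if not validate_point(rec):
            continue
        key = (round(float(rec["lat"]), 5), round(float(rec["lon"]), 5))
        if key not in seen:  # keep the first record for each coordinate
            seen.add(key)
            clean.append(rec)
    return clean
```

In a setup like the one above, rules of this shape would typically run as transformation steps before documents are written to MongoDB collections.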

Confidential, San Rafael, CA

Master Data Quality Engineer

Responsibilities:

  • Collaborate with business teams to understand issues with Vendor on-boarding process and P2P cycle
  • Data exploration, Data Profiling, Data Quality and ETL to load and transform huge data sets.
  • Implement Data validation and cleansing algorithms to combine and prep Vendor data from multiple sources.
  • Developed algorithm for Fuzzy Matching of Vendor records for de-duping data and creating a unique record set.

Environment: - Oracle 11g, Python, Oracle EBS R12, IDQ, Collibra, Power BI, HDFS, Hive, Shell scripting, Apache Airflow
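A fuzzy-matching pass for de-duping vendor records, as described in this role, can be sketched with the standard library's `difflib`. The normalization rules and the 0.85 threshold here are hypothetical choices for illustration, not the project's actual algorithm:

```python
from difflib import SequenceMatcher

LEGAL_SUFFIXES = {"inc", "llc", "corp", "corporation", "co", "ltd"}


def normalize(name):
    """Crude vendor-name normalization: lowercase, strip punctuation and legal suffixes."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    tokens = [t for t in cleaned.split() if t not in LEGAL_SUFFIXES]
    return " ".join(tokens)


def is_duplicate(a, b, threshold=0.85):
    """Treat two vendor names as the same record above a similarity threshold."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold
```

A real de-duping pipeline would also block candidate pairs (e.g. by tax ID or postal code) before scoring, so the quadratic comparison stays tractable on large vendor masters.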

Confidential, Santa Clara, CA

Data Engineer

Responsibilities:

  • Collaborate with business teams and Data Scientists to understand data needs for Predictive Analytics.
  • Data exploration, Data Profiling to analyze trends, Data Quality and ETL to load and transform huge data sets.
  • Build Data Pipelines; implement Data cleansing rules to transform and aggregate data from multiple sources.
  • Developed fuzzy matching algorithms for de-duping Forex Account ids across multiple source systems

Environment: - Oracle 11g, Confidential Redshift, Python, Talend, Tableau, HDFS, Hive, Pig, Kafka, Luigi
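The combine-and-aggregate step of a pipeline like the one above can be sketched with nothing but the standard library. The field names (`account_id`, `amount`) and the skip-records-without-a-key cleansing rule are illustrative assumptions:

```python
from collections import defaultdict


def aggregate_by_account(*sources):
    """Merge record feeds from multiple source systems and total amounts per account id."""
    totals = defaultdict(float)
    for source in sources:
        for rec in source:
            acct = rec.get("account_id")
            if acct:  # cleansing rule: drop records with no usable key
                totals[acct] += float(rec.get("amount", 0))
    return dict(totals)
```

In production such a step would be one task in a Luigi or Airflow DAG, reading staged extracts rather than in-memory lists.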

Confidential, San Francisco, CA

Data Analytics Consultant

Responsibilities:

  • Collaborate with stakeholders to define functional and technical requirements for modeling Master Data
  • Data exploration, Data Profiling, Data Quality and ETL to load and transform huge data sets.
  • Data Profiling, Data Analysis; identify and implement business rules to uniquely identify Securities.
  • Design and configure Match Rules and Trust Rules to cleanse, standardize, match and merge Securities records
  • Custom Asset Classification using Python

Environment: - SQL Server, Erwin r9, SSIS, Informatica PowerCenter 9.6.1, IDQ, Informatica MDM, Python, HDFS, Hive
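The custom asset classification mentioned above can be sketched as a rule table in Python. The security types and asset-class buckets here are generic examples, not the classification scheme actually used:

```python
def classify_asset(security):
    """Assign a coarse asset class from a security's type attribute."""
    stype = security.get("type", "").lower()
    if stype in {"common stock", "adr", "etf"}:
        return "Equity"
    if stype in {"corporate bond", "treasury", "municipal bond"}:
        return "Fixed Income"
    if stype in {"call", "put", "future", "swap"}:
        return "Derivative"
    return "Unclassified"  # route to a steward queue for manual review
```

Keeping the rules in data (a lookup table or reference file) rather than code usually makes them easier for data stewards to maintain alongside the MDM match and trust rules.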

Confidential, Oakland, CA

Data Engineer

Responsibilities:

  • Design and implement ETL interfaces for loading Member Eligibility data and Claims data.
  • Data exploration, Data Profiling, Data Quality and ETL to load and transform huge data sets.
  • Implement Hierarchy and Affiliation relationships between Patients, their Household Members and Providers.

Environment: - SQL Server, SSIS, Informatica PowerCenter 9.1, IDQ, Informatica MDM 9.5, Python, HDFS, Hive

Confidential, San Francisco, CA

Data Engineer

Responsibilities:

  • Involved in gathering business requirements and analysis for various data feeds from third party Fund Index data providers including State Street, S&P Dow Jones, Russell and Blackrock
  • Designed and developed critical ETL processes.
  • Developed a parameter driven ETL framework that includes dynamic configurations, custom logging and reporting.

Environment: - SQL Server 2008 R2, SQL Server Integration Services (SSIS) 2008 R2, Teradata 13.10
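A parameter-driven ETL framework of the kind described above can be sketched in Python: steps live in configuration rather than code. The JSON shape, key names, and dispatch model here are hypothetical, chosen only to show the idea:

```python
import json
import logging


def run_etl(config_text):
    """Execute a sequence of ETL steps described by a JSON config instead of hard-coded logic."""
    config = json.loads(config_text)
    log = logging.getLogger(config.get("job_name", "etl"))
    results = []
    for step in config["steps"]:
        # A real framework would dispatch step["action"] to registered
        # loader/transformer callables; here we just record the plan.
        log.info("running %s: %s -> %s", step["action"], step["source"], step["target"])
        results.append((step["action"], step["source"], step["target"]))
    return results
```

The payoff of this design is that adding a new fund-index feed means adding a config entry, not deploying new package code.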

Confidential, San Francisco, CA

Data Integration Engineer

Responsibilities:

  • Participated in the Wachovia to Confidential Conversion Project as part of the Reporting and Analytics team.
  • Data Profiling, Data Analysis, Data cleansing rules prior to conversion.
  • Involved in the design, development and implementation of ETL loads and reporting
  • Involved in the design and execution of several other ad hoc SQL queries and canned reports for analysis.

Environment: - SQL Server 2008 R2, Integration Services, Informatica PowerCenter 8.6, IDQ, Oracle 10g, Erwin v7.3.

Confidential, NC

ETL Developer

Responsibilities:

  • Design and develop a custom Audit and Logging Framework for the Regulatory Reporting & Compliance Team
  • Implemented data lineage tracking from operational systems through to BI reports.
  • Designed a metadata repository used to store technical and business metadata

Environment: - SQL Server 2008 R2, SSIS, Oracle 9i, TFS 2012.
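The audit and logging framework described for this role can be sketched as a decorator that records row counts and timings for each ETL step. The in-memory `AUDIT_LOG` list stands in for what would really be an audit table; all names are illustrative:

```python
import datetime
import functools

AUDIT_LOG = []  # in a real framework this would be a database table


def audited(step_name):
    """Decorator that records start/end times and row counts for an ETL step."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(rows):
            started = datetime.datetime.now()
            out = fn(rows)
            AUDIT_LOG.append({
                "step": step_name,
                "rows_in": len(rows),
                "rows_out": len(out),
                "started": started,
                "finished": datetime.datetime.now(),
            })
            return out
        return inner
    return wrap


@audited("drop_nulls")
def drop_nulls(rows):
    """Example step: filter out null records."""
    return [r for r in rows if r is not None]
```

Row-count deltas captured this way are exactly what a regulatory-reporting team needs to reconcile loads and to feed a lineage view.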

Confidential, South San Francisco, CA

Master Data Developer Analyst

Responsibilities:

  • Capture Use cases for MDM; identify multiple sources of Physician data to develop ETL for integration into MDM.
  • Data Profiling, Analysis and developed complex fuzzy matching algorithms for cleansing, standardizing data
  • Design and development of Master Data Model for Physicians Data and Predictive Modeling for new Physicians

Environment: - SQL Server 2008 R2, Integration Services (SSIS), Microsoft Master Data Services (MDS) 2008 R2.

Confidential, East Rutherford, NJ

ETL Developer

Responsibilities:

  • Performed Data Profiling and Data Analysis and created Cleansing Lists for a Mainframe source system.
  • Drafted Technical Design Specifications, ETL workflows and Source to Target Mappings.
  • Developed ETL packages for loading data in Staging and Data Mart.
  • Developed complex canned reports feeding from Underwriting and Claims systems

Environment: - SQL Server 2005, SSRS 2005, SSIS 2005

Confidential, NJ

Database Developer

Responsibilities:

  • Migration of the legacy High Point systems and Interfaces to a Matrix architecture.
  • Designed, developed and implemented Interfaces specifically for external Vendor Systems including ISO, NetMap
  • Data Profiling and Analysis of third-party Vendor data for integration into High Point systems.
  • Defined Technical Design Specifications, ETL workflows and Source to Target Mapping (STTM) spreadsheets

Environment: - IBM DB2 UDB, IBM Control Center, SQL Server 2000, Microsoft DTS, SSIS 2005
