Data Architect Resume

SUMMARY:

  • Data Architect with extensive experience in relational and dimensional data modeling: conceptual, logical, and physical models; star and snowflake schemas; dimensions and hierarchies.
  • ETL Specialist with hands-on proficiency in Informatica, Sqoop, SQL, and PL/SQL.
  • Big data experience with Hadoop, HDFS, Hive, Pig, PostgreSQL, Spark, and Python.
  • Machine learning implemented with Python, R, and scikit-learn.

SPECIAL SKILLS:

Database Development: PostgreSQL, Oracle SQL, PL/SQL

Data Modeling and Data Warehouse Design: ERwin 9.5, Oracle Data Modeler

ETL: Informatica 9.5, Pentaho, Python

Machine Learning: Python, R, NumPy, Matplotlib, Pandas

Big Data: Hive, Pig, Greenplum, Spark, Scala

Website Development: PHP, Bootstrap, JavaScript, jQuery

WORK HISTORY:

Confidential

Data Architect

Responsibilities:

  • Developed data and process models for the US Pharmacopeia (USP) Reference Standard testing and production business, using BiZZdesign Horizzon as the modeling tool.
  • Designed a workflow pipeline for the Confidential IT Category Management project, which collected and enhanced government-wide transaction data using NLP text mining.
  • Designed, created, and loaded the Confidential Enterprise Acquisition and Spend Database (EASD) on PostgreSQL and Redshift.
  • Developed ETL for the EASD using Python and SQL on PostgreSQL, Redshift, and Excel (a Python sketch follows this list).
  • Developed and coded a Python GUI (PyQt5) for pipeline and ETL management.
  • Developed ETL for the 2020 Census Fraud Detection System involving Hive, PostgreSQL, Oracle, and Spark.
  • Tuned Hive for performance by refactoring existing code.
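
A minimal sketch of the kind of Python ETL step referenced above: read a spend extract from Excel, normalize a few columns, and stage it in PostgreSQL. The connection string, schema, table, and column names are hypothetical placeholders, not the actual EASD schema.

```python
# Sketch only: load one spend extract into a PostgreSQL staging table.
# All identifiers (DSN, schema, table, columns) are illustrative assumptions.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql+psycopg2://etl_user:***@localhost:5432/easd")  # placeholder DSN

def load_spend_extract(xlsx_path: str) -> int:
    """Load one Excel spend extract into staging; return the row count loaded."""
    df = pd.read_excel(xlsx_path)  # requires openpyxl for .xlsx files
    # Normalize column names to snake_case for consistent downstream SQL.
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    # Coerce the amount column to numeric and drop rows that fail to parse.
    df["obligation_amount"] = pd.to_numeric(df["obligation_amount"], errors="coerce")
    df = df.dropna(subset=["obligation_amount"])
    df.to_sql("stg_spend_transactions", engine, schema="staging",
              if_exists="append", index=False)
    return len(df)
```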

Confidential

Data Engineer

Responsibilities:

  • Designed and coded Informatica workflows, mappings, and stored procedures on the Greenplum (PostgreSQL-based) big data platform for the CADE2-to-ODS Refresh project.
  • Designed and coded automated SQL generation from requirements spreadsheets, using Greenplum PL/SQL, to replace legacy assembler code.
  • Applied scikit-learn machine learning algorithms and advanced statistics on the new Greenplum platform, including regression, classification, clustering, and dimensionality reduction. Used k-fold cross-validation and grid search for model selection, and XGBoost to improve the performance of base algorithms (a model-selection sketch follows this list).
  • Built a test harness in Python to extract sample data from the mainframe and run several algorithms for direct comparison of effectiveness, including k-NN, k-means, LDA, Naïve Bayes, decision trees, random forests, PCA, SVM, and linear and logistic regression.
  • Used Tableau to visualize data for stakeholders.
  • Designed and implemented automated test procedures using R that produced thousands of test cases and automatically analyzed results as a full regression test.
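
An illustrative sketch of the model-selection approach described above: k-fold cross-validation combined with a grid search. The data here is synthetic, and the estimator and parameter grid are examples only, not the models used on the Greenplum platform.

```python
# Sketch only: grid search over a random forest with 5-fold cross-validation.
# Synthetic data stands in for the mainframe sample extracts.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

param_grid = {"n_estimators": [100, 300], "max_depth": [None, 10, 20]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=cv, scoring="f1")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```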

Confidential

Consultant

Responsibilities:

  • Developed and implemented a B2B website using PHP and MySQL.
  • Developed a client-side UI using Bootstrap, HTML5, CSS3, JavaScript, and jQuery.
  • Rewrote numerous SQL queries, reducing runtime from 2.5 hours to 3 minutes.
  • Reverse-engineered the OLTP Oracle database into a 3NF model to analyze production reporting and uncover its primary bottleneck.
  • Re-architected the reporting mart to denormalize report data in one pass and pre-compute the most-used views.
  • Runtime dropped from hours to seconds, and some reports that previously could not be run at all became available.
  • Designed Hive tables to answer business transaction questions.
  • Coded HiveQL using Cloudera Hue (a query sketch follows this list).
  • Extracted text files from the client's data warehouse and loaded them into HDFS.
  • Coded UDFs in Java using the NetBeans IDE for use in HiveQL.
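
A hedged illustration of the kind of HiveQL used to answer business transaction questions. The original queries were written in Cloudera Hue; running them from Python via PyHive is an assumption made only for this sketch, and the host, table, and column names are hypothetical.

```python
# Sketch only: execute an aggregate HiveQL query from Python.
# PyHive, the host, and the transactions schema are illustrative assumptions.
from pyhive import hive

TOP_PRODUCTS_BY_MONTH = """
    SELECT month, product_id, SUM(sale_amount) AS total_sales
    FROM transactions
    GROUP BY month, product_id
    ORDER BY total_sales DESC
    LIMIT 10
"""

conn = hive.Connection(host="hive-gateway.example.com", port=10000)  # placeholder host
cur = conn.cursor()
cur.execute(TOP_PRODUCTS_BY_MONTH)
for month, product_id, total_sales in cur.fetchall():
    print(month, product_id, total_sales)
cur.close()
conn.close()
```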

Confidential

Data Architect

Responsibilities:

  • Developed extensive Oracle SQL extracts of statistical data.
  • Designed and coded Oracle PL/SQL load, extract, and QA processes based on SDTM (the Study Data Tabulation Model).
  • Implemented automated data cleansing procedures.
  • Designed and implemented the Confidential Clinical Trials Repository in Oracle with a team of six.
  • Developed multidimensional star schema models. Created logical and physical models.
  • Created a 3NF data model with 107 entities based on BRIDG model content.
  • Designed and created an Oracle object-relational database of ISO 21090 data types.

Confidential

Data Architect

Responsibilities:

  • Designed a star schema DB2 Data Warehouse for reservations, ticket issuance and ticket collection.
  • Analyzed disparate data sources for data quality improvement and common dimensions.
  • Created conceptual, logical and physical data models. Designed detailed Informatica ETL processes.
  • Coded Informatica workflows, sessions, mappings, and transformations.

Confidential

Data Architect

Responsibilities:

  • Designed and implemented data mapping for Confidential, the bi-directional health information exchange between DoD and VA systems.
  • Mentored a group of four new developers in Informatica coding, with an emphasis on performance.
  • Work was performed on an Oracle platform using ERwin and TOAD.
  • Developed Web Services with Informatica to deliver XML content.

Confidential

Data Architect/ETL Specialist

Responsibilities:

  • Designed and developed transformation processes to load data from source to Oracle.
  • Developed and tuned Informatica mappings and sessions.
  • Coded and documented scripts and stored procedures for data warehousing processes.
  • Tuned Informatica performance for parallel loads and cache optimization.
  • Documented ETL processing systems.
