Sr. Big Data Architect Resume
Tampa, FL
SUMMARY
- Over 10 years of strong IT experience in Data Architecture, Data Modeling and Big Data Architecture, including design and development.
- Experience working with Business Intelligence and Enterprise Data Warehouse (EDW) technologies including SSAS, Pentaho, Cognos, OBIEE, QlikView, Greenplum, Amazon Redshift and Azure Data Warehouse.
- Experienced in SQL queries and optimizing the queries in Oracle, Teradata, SQL Server, DB2, and Netezza.
- Proficient in managing the entire data science project life cycle and actively involved in all phases of the project.
- Experience in designing architecture for modeling a data warehouse using tools like Erwin r9.6/r9.5 and E/R Studio 9.6/9.x.
- Experience in metadata design, real time BI Architecture including Data Governance for greater ROI.
- Strong experience architecting high-performance databases using PostgreSQL, PostGIS, and Cassandra.
- Hands-on experience configuring a Hadoop cluster in a professional environment and on Amazon Web Services (AWS) using an EC2 instance.
- Hands-on experience in machine learning, big data, data visualization, R and Python development, Unix, SQL, Git/GitHub.
- Experienced in handling big data using Hadoop ecosystem components like MapReduce, HBase, Pig, Hive, Impala and Sqoop.
- Experience writing Spark Streaming and Spark batch jobs, using Spark MLlib for analytics.
- Experience in designing Enterprise Data Warehouses, Data Marts, Reporting data stores (RDS) and Operational data stores (ODS).
- Extensive experience with various data processing platforms and languages including Apache Spark (Scala), Apache Drill and PostgreSQL PL/pgSQL.
- Experience in Dimensional Data Modeling, Star/Snowflake schema, Fact & Dimension tables.
- Hands-on experience with clustering algorithms like K-means and K-medoids and with predictive algorithms.
- Experience in conducting Joint Application Development (JAD) sessions with SMEs, Stakeholders and other project team members for requirements gathering and analysis.
- Experienced in Teradata SQL queries, Teradata indexes, and utilities such as MultiLoad, TPump, FastLoad and FastExport.
- Experienced in Data Architecture and data modeling using Erwin, ER/Studio and SQL Data Modeler.
- Extensive experience using ETL and reporting tools like SQL Server Integration Services (SSIS) and SQL Server Reporting Services (SSRS).
- Experience in integration of various relational and non-relational sources such as DB2, Teradata, Oracle, Netezza, SQL Server and NoSQL databases.
- Hands-on experience with normalization (up to 3NF) and de-normalization techniques and design considerations for OLTP/OLAP databases and models.
TECHNICAL SKILLS
Data Modeling Tools: Erwin r9.6/r9.5, ER Studio 9.7/9.0
ETL Tools: SSIS, SSRS, Pentaho, Informatica PowerCenter 9.7/9.6/9.5/9
Database Tools: Oracle 12c/11g, MS SQL Server 2016/2014, Teradata, MS Access, PostgreSQL, Netezza etc.
Programming Languages: Java, Base SAS and SAS/SQL, SQL, T-SQL, HTML, JavaScript, CSS, UNIX shell scripting, PL/SQL
Reporting Tools: Business Objects, Crystal Reports etc.
Statistics: Decision Trees, Regression Models, KNN, K Means Clustering, PCA, Naïve Bayes
Operating Systems: Microsoft Windows 8/7 and UNIX
Tools & Software: Toad, MS Office, BTEQ, Teradata 15/14.1, SQL Assistant, PL/SQL, AWK etc.
Big Data: MapReduce, HBase, Pig, Hive, Impala, Sqoop
PROFESSIONAL EXPERIENCE
Confidential, Tampa, FL
Sr. Big Data Architect
Responsibilities:
- As a Big Data Architect, led designers and developers on the team and guided them toward the right technical solutions for the business.
- Built scalable distributed data solutions using Hive, Python, Spark, Informatica Big Data and Hadoop.
- Designed, architected and helped maintain scalable solutions on the big data analytics platform for the enterprise module.
- Worked on NoSQL databases such as MongoDB, HBase and Cassandra to enhance scalability and performance.
- Integrated Hadoop frameworks/technologies such as Hive and HBase to further operational and analytical experience.
- Analyzed large data sets (structured and unstructured) using Hive queries, R Programming & Pig Scripts
- Delivered multiple prototypes and POCs for the product and its modules.
- Analyzed multiple sources of structured and unstructured data to propose and design data architecture solutions for scalability, high availability, fault tolerance, and elasticity.
- Developed the warehouse-specific data lake using Hive and Pig scripts, along with Talend ETL pipelines, to populate the data marts for user/business consumption using Hive/Impala and Python.
- Worked on importing Impala data into Python.
- Provisioned EC2 infrastructure on AWS and deployed applications behind Elastic Load Balancing.
- Designed the real-time analytics and ingestion platform using Storm and Kafka.
- Responsible for data movement from the client library and relational databases to HDFS using Linux jobs and Sqoop.
- Performed incremental data movement using Sqoop and Oozie jobs.
- Implemented Big Data systems in a distributed cloud environment (AWS) using Amazon EMR.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Assisted in creating fact and dimension table implementation in Star Schema model based on requirements.
- Worked on installing cluster, commissioning & decommissioning of Data Nodes, Name Node recovery, capacity planning and slots configuration.
- Involved in customer portfolio management, using the right data to provide recommendations.
- Involved in migration of ETL processes from Oracle to Hive to test ease of data manipulation.
- Worked closely with the product management and development teams to rapidly translate the understanding of customer data and requirements into products and solutions.
- Participated in Rapid Application Development and Agile processes to deliver new cloud platform services.
- Worked on converting existing MapReduce jobs (Avro) to Spark using Spark Core with Scala.
Environment: Java, Scala, XML, Oracle BDA (Cloudera), Cloudera Manager, Hadoop MapReduce, YARN, Oozie, Flume, Kafka, Spark Core/SQL, HP Vertica/Teradata, Tableau, Hive
Confidential, Malvern, PA
Sr. Data Architect/Data Modeler
Responsibilities:
- Led the development of data models for information systems, ensuring a high level of trust and integrity in the data.
- Worked on the Master Data Management (MDM) Hub and interacted with multiple stakeholders.
- Connected to Amazon Redshift through Tableau to extract live data for real time analysis.
- Worked on implementing and executing enterprise data governance and data quality framework.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
- Extracted the data from Oracle into HDFS using Sqoop.
- Worked with the Teradata 15 RDBMS using the FastLoad, FastExport, MultiLoad, TPump, Teradata SQL Assistant and BTEQ utilities.
- Designed both 3NF data models for ODS and OLTP systems and dimensional data models using Star and Snowflake schemas.
- Analyzed all projects, prepared corporate strategies for them and provided support to all Cognos applications.
- Prepared all management reports using Cognos Report Studio.
- Performed data analysis and data profiling using complex SQL on various source systems including Oracle and Netezza.
- Improved efficiency by designing regular expressions to decipher the unique identity of grant codes.
- Translated business requirements into logical and physical data models.
- Created source to target mapping documents involving the Business rules and transformation rules.
- Provided architectures, patterns, tooling choices and standards for master data and hierarchy life cycle management.
- Developed data mapping, data governance, and transformation and cleansing rules for the Master Data Management architecture involving OLTP and ODS.
- Coordinated data models and dictionaries across multiple systems.
- Created Logical Data Models (LDM) and Physical Data Models (PDM) using the Erwin data modeling tool.
- Performed data cleaning and data preparation tasks to convert data into a meaningful data set using R.
- Reverse engineered data models from database instances and scripts.
- Worked on Netezza-to-Oracle shell scripts for loading tables required by QA tools from Netezza into Oracle.
- Implemented data management strategies across the various database domains and products for each business unit.
Environment: R 3.5, Netezza, Oracle 12c, Teradata 15, Hadoop, Big Data, T-SQL, SQL Server 2014, DB2, SSIS, ERWIN r9.6, MDM, PL/SQL, Informatica PowerCenter 9.7 etc.
Confidential, NYC, NY
Sr. Data Architect/Data Modeler
Responsibilities:
- Worked with Data Architects and IT Architects to understand the movement of data and its storage, using Erwin r9.5.
- Defined architecture documentation including system context and deployment diagrams using Visio.
- Involved in integration of various relational and non-relational sources such as DB2, SQL Server 2012, Hadoop, XML and Flat Files.
- Managed, designed and created the star schema and snowflake schema for a financial data mart using Erwin and DB2, following Ralph Kimball dimensional modeling techniques.
- Enabled the Accounting Division to perform indexed lookups beyond the scope of standard Excel functions by utilizing VBA to embed functions within Excel.
- Prepared all designs and code for various Cognos reporting objects and ensured compliance with industry best practices.
- Generated DDL scripts for implementation in a database.
- Analyzed the source data and worked with the Data Architect in designing and developing the logical and physical data models for the Enterprise Data Warehouse.
- Implemented Logical and Physical Data Models and reviewed them with the Application/Platform DBA.
- Developed data models for data warehouses and operational data stores.
- Managed and supported technical Sybase, DB2 UDB, Sybase IQ and MS SQL Server database and data warehouse operations.
- Provided leadership to developers and database administrators to ensure the model is effectively communicated.
- Provided data sets for user queries on a day-to-day basis to determine the performance and quality of the test data.
- Directed the data warehouse metadata capture and access effort and defined metadata standards for the data warehouse.
Environment: Metadata, Data Modeler, Hadoop, Big Data, T-SQL, SQL Server 2012, DB2, SSIS, ERWIN r9.5, MDM, PL/SQL, etc.
Confidential, New York, NY
Data Architect/Data Modeler
Responsibilities:
- Identified entities and attributes and developed Conceptual, Logical and Physical Models using ERWIN r9.1.
- Defined data governance policies including data classification, NPI identification and tracking, data retention and archiving, data integration, and performance.
- Involved in Big Data Analytics and Massively Parallel Processing (MPP) architectures like Greenplum and Teradata.
- Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
- Involved in Physical Modeling, Logical Modeling, Relational Modeling, Dimensional Modeling (Star Schema, Snowflake, Fact, and Dimensions), Entities, Attributes, OLAP, OLTP, Cardinality, and ER Diagrams.
- Used Teradata 14 utilities such as FastExport and MultiLoad for data migration/ETL tasks from OLTP source systems to OLAP target systems.
- Designed both 3NF data models for ODS and OLTP systems and dimensional data models using star and snowflake schemas.
- Maintained Conceptual, Logical and Physical Data Models along with corresponding metadata.
- Worked with data sources across multiple database platforms (DB2, SQL Server, Oracle, and Access) to determine domains, usage and constraints in order to clearly map critical data points to destinations in the new data warehouse.
- Worked with normalization and de-normalization techniques for both OLTP and OLAP systems in creating database objects like tables, constraints (Primary Key, Foreign Key, Unique, Default) and indexes.
- Extensively used Aginity Netezza Workbench to perform various DML and DDL operations on Netezza databases.
- Worked closely with the Data Stewards to ensure correct and related data is captured in the data warehouse.
- Designed and directed the implementation of security requirements for the data warehouse and directed the information access and delivery effort for the data warehouse.
Environment: Erwin 9.1, Teradata 14, Data Modeler, Oracle 10g, T-SQL, SQL Server, MDM, PL/SQL, ETL, Informatica PowerCenter 9.6 etc.
Confidential, Bridgewater, NJ
Sr. Data Analyst/Data Modeler
Responsibilities:
- Developed Conceptual, Logical and Physical Data Models for transactional and analytical systems using ER/Studio.
- Involved in data modeling on MS SQL Server 2008 and MS Access, with extraction of data from various database sources like Netezza, DB2 and flat files into DataStage.
- Designed star and snowflake schemas for sales data involving shared (conformed) dimensions using Data Modeler.
- Involved in database creation and maintenance of physical data models with Netezza, DB2 and SQL Server databases.
- Developed conceptual, normalized and de-normalized logical and physical database models to design the OLTP system.
- Developed stored procedures on Netezza and SQL Server for data manipulation and data warehouse population.
- Involved in extraction, transformation and loading of data directly from different source systems like flat files, Excel, Oracle and SQL Server.
- Created the Physical Data Model from the Logical Data Model in ER/Studio and worked with the naming standards utility.
- Involved in the development of the Physical Data Model for multiple platforms (SQL Server/DB2).
- Worked on various operational sources like DB2, flat files, XML, Mainframe and Excel files.
- Identified the key facts and dimensions necessary to support the business requirements.
- Performed source data quality assessments and developed strategies for data acquisition, storage, integrity and archival.
Environment: ER Studio, Teradata 13.1, Data Modeler, Netezza, SSIS, T-SQL, SQL Server 2008, DB2, MDM, PL/SQL, Informatica PowerCenter 9.6 etc.
Confidential
Data Modeler/Data Analyst
Responsibilities:
- Created Logical/Physical Models and Conceptual Models using Erwin r8.x for reporting.
- Involved in Relational Database Management System (RDBMS) and Data Warehouse concepts, OLTP and OLAP, Data Mapping, Data Quality and Data Profiling.
- Performed data analysis to support mapping and transformation of data from legacy systems to physical data models.
- Extensively used star schema and snowflake schema methodologies in building and designing the logical data model into dimensional models.
- Involved in system migration from Oracle to Teradata.
- Involved in ETL procedures that loaded data from various data sources (Oracle, SQL Server, flat files) into a Teradata data warehouse.
- Profiled data to understand inconsistencies and issues.
- Worked in BTEQ concepts, Database Management Systems, database physical and logical design, Data Mapping, table normalization and de-normalization, Data Modeling, and creating ER diagrams using tools such as MS Visio.
- Designed conceptual, logical and physical data models and proof of design for enterprise data management.
- Worked extensively on Oracle BI Publisher and provided the capability to download the reports in user-defined formats.
- Developed and maintained UNIX shell scripts for data extraction and manipulation.
- Provided a high-level Conceptual Data Model to the business users once the business requirements were evaluated and finalized.
- Applied client data naming standards, checked models in and out of the model repository, and documented data model translation decisions.
Environment: Erwin 8.x, Netezza, Oracle 8.x, SQL, PL/SQL, Teradata 13, T-SQL, SQL Server 2005, SSAS, MDM, ETL, etc.
