Big Data Consultant Resume
Washington, DC
SUMMARY
- 12+ years of experience leading the architecture and design of data processing, data warehousing, data quality, data analytics, and business intelligence development projects through the complete end-to-end SDLC process.
- Experience in the architecture, design, and development of large Enterprise Data Warehouses (EDW) and data marts for target user-base consumption.
- Experienced in designing key system architectures and integrating many modules and systems, including Big Data Hadoop systems, Talend, Java, and AWS, with hardware sizing, estimates, benchmarking, and data architecture.
- Expert in writing SQL queries and optimizing the queries in Oracle 10g/11g/12c, DB2, Netezza, SQL Server 2008/2012/2016 and Teradata 13/14.
- Performed data analysis and data profiling using complex SQL on various source systems including Oracle and Teradata, and worked on Teradata SQL queries, Teradata indexes, and utilities such as MultiLoad, TPump, FastLoad, and FastExport.
- Hands-on experience with the Hadoop framework and its ecosystem, including the Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, Sqoop, Flume, and Spark; proficient in Hive Query Language and experienced in Hive performance optimization using static partitioning, dynamic partitioning, bucketing, and parallel execution (see the Hive partitioning sketch following this summary).
- Good experience in Data Profiling, Data Mapping, Data Cleansing, Data Integration, Data Analysis, Data Quality, Data Architecture, Data Modelling, Data governance, Metadata Management & Master Data Management.
- Extensively worked on Spark with Scala on clusters for computational analytics, installed on top of Hadoop, and built advanced analytical applications using Spark and PySpark with Hive and SQL/Oracle/Snowflake.
- Expertise in data modeling; created various conceptual, logical, and physical data models for DWH projects, including a first-of-its-kind data model for an intelligence domain.
- Excellent knowledge of Perl and UNIX, with expertise in data modeling, database design and implementation of Oracle and AWS Redshift databases, administration, and performance tuning.
- Experienced in analyzing data using Hadoop Ecosystem including HDFS, Hive, Spark, Spark Streaming, Elastic Search, Kibana, Kafka, HBase, Zookeeper, PIG, Sqoop, and Flume.
- Experienced in working with Excel Pivot Tables and VBA macros for various business scenarios, and involved in data transformation using Pig scripts on AWS EMR, AWS RDS, and AWS Glue.
- Experienced in designing and implementing many projects using various sets of ETL/BI tools involving the latest features and product trends, with experience in technologies such as Big Data, cloud computing (AWS), and in-memory applications.
- Experience in importing data using Sqoop from various heterogeneous systems such as RDBMS (MySQL, Oracle, DB2, etc.), mainframe, and XML sources into HDFS, and vice versa.
- Experienced in continuous performance tuning, system optimization, and improvements for BI/OLAP systems and traditional databases such as Oracle, SQL Server, DB2, and other high-performance databases.
- Well versed in normalization/denormalization techniques for optimum performance in relational and dimensional database environments, and implemented various data warehouse projects using Agile Scrum and Waterfall methodologies.
- Expertise in writing SQL queries and optimizing queries in Oracle, SQL Server 2008/12/16, and Teradata; developed and managed SQL, Java, and Python code bases for data cleansing and data analysis using Git version control.
- Excellent Software Development Life Cycle (SDLC) experience with good working knowledge of testing methodologies, disciplines, tasks, resources, and scheduling.
- Extensive ETL testing experience using Informatica (PowerCenter/PowerMart): Designer, Workflow Manager, Workflow Monitor, and Server Manager.
- Expertise in Excel macros, Pivot Tables, VLOOKUPs, and other advanced functions; expert Python user with knowledge of the SAS statistical programming language.
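To illustrate the Hive performance-optimization bullet above, the following is a minimal Python sketch of dynamic partitioning, bucketing, and parallel-execution settings, assuming a reachable HiveServer2; the host, database, table, and column names are placeholders.

```python
from pyhive import hive  # assumes HiveServer2 is reachable

# Connection details, database, and table/column names are placeholders.
conn = hive.connect(host="hiveserver2.example.com", port=10000, database="default")
cur = conn.cursor()

# Dynamic partitioning lets the partition value come from the data itself.
cur.execute("SET hive.exec.dynamic.partition=true")
cur.execute("SET hive.exec.dynamic.partition.mode=nonstrict")
cur.execute("SET hive.exec.parallel=true")  # run independent stages in parallel

# Partitioned, bucketed table stored as ORC.
cur.execute("""
    CREATE TABLE IF NOT EXISTS sales_bucketed (
        order_id BIGINT,
        customer_id BIGINT,
        amount DECIMAL(12,2)
    )
    PARTITIONED BY (sale_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC
""")

# Dynamic-partition insert: each row lands in its own sale_date partition.
cur.execute("""
    INSERT OVERWRITE TABLE sales_bucketed PARTITION (sale_date)
    SELECT order_id, customer_id, amount, sale_date
    FROM staging_sales
""")
```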
TECHNICAL SKILLS
Analysis and Modeling Tools: Erwin 9.6/9.5/9.1, Oracle Designer, ER/Studio.
Languages: SQL, Python, PySpark, Scala, Java, T-SQL and Perl
Database Tools: Microsoft SQL Server 2016/2014/2012, Talend, Teradata 15/14, Oracle 12c/11g/10g, MS Access, PostgreSQL, Netezza, DB2, Snowflake, HBase, MongoDB and Cassandra
ETL Tools: SSIS, Informatica PowerCenter 9.6/9.5 and SAP Business Objects.
Cloud: AWS S3, AWS EC2, AWS EMR, AWS Airflow, SQS, AWS RDS and AWS Glue.
Operating System: Windows, DOS and UNIX.
Reporting Tools: SSRS, Business Objects, Crystal Reports.
Tools & Software: TOAD, MS Office, BTEQ, Teradata SQL Assistant and Netezza Aginity
Big Data Technologies: Hadoop, HDFS, Hive, MapReduce, Pig, HBase, Sqoop, Flume, Oozie and NoSQL databases
PROFESSIONAL EXPERIENCE
Big Data Consultant
Confidential, Washington DC
Responsibilities:
- Designed architecture collaboratively to develop methods of synchronizing data coming in from multiple source systems, and led the strategy, architecture, and process improvements for data architecture and data management, balancing the long- and short-term needs of the business.
- Performed System Analysis & Requirements Gathering related to Architecture, ETL, Talend, Data Quality, Cloudera, MDM, Dashboards and Reports. Captured enhancements from various entities and provided Impact analysis.
- Responsible for Big data initiatives and engagement including analysis, brainstorming, POC, and architecture.
- Designed a real-time stream processing application using Spark, Kafka, Scala, Oozie, and Hive to perform streaming ETL and apply machine learning (see the streaming sketch after this list).
- Designed the logical data model using Erwin 9.64 with the entities and attributes for each subject area, and was involved in dimensional modeling (star schema) of the data warehouse, using Erwin to design the business processes, dimensions, and measured facts; developed and maintained an Enterprise Data Model (EDM) to serve as both the strategic and tactical planning vehicle for managing the enterprise data warehouse.
- Developed and maintained mostly Python and some Perl ETL scripts to scrape data from external web sites and load cleansed data into a MySQL DB.
- Enhanced the traditional data warehouse based on the star schema, updated data models, and performed data analytics and reporting using Tableau; extracted data from MySQL and AWS into HDFS using Sqoop and created Airflow scheduling scripts in Python (see the DAG sketch after this list).
- Used AWS Lambda to perform data validation, filtering, sorting, and other transformations for every data change in a DynamoDB table and load the transformed data into another data store (see the handler sketch after this list).
- Used HiveContext, which provides a superset of the functionality provided by SQLContext, and preferred writing queries with the HiveQL parser to read data from Hive tables (fact, syndicate).
- Worked on AWS, architecting a solution to load data, create data models, and run BI on it; developed shell, Perl, and Python scripts to automate and provide control flow to Pig scripts.
- Involved in data modeling sessions, developed technical design documents, and used the DataStage ETL Designer to develop processes for extracting, cleansing, transforming, integrating, and loading data into the data warehouse database.
- Created Spark clusters and configured high-concurrency clusters using Databricks to speed up the preparation of high-quality data; created Databricks notebooks using SQL and Python and automated the notebooks using jobs.
- Involved in creating Hive tables and loading and analyzing data with Hive queries; developed Hive queries to process the data and generate data cubes for visualization.
- Selected the appropriate AWS services based on data, compute, database, or security requirements, and defined and deployed monitoring, metrics, and logging systems on AWS.
- Designed and developed a data lake using Hadoop for processing raw and processed claims via Hive and DataStage; designed both 3NF data models for ODS/OLTP systems and dimensional data models using star and snowflake schemas.
- Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift; performed data extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark; created external tables with partitions using Hive, AWS Athena, and Redshift (see the Glue sketch after this list).
- Created dimensional data models based on hierarchical source data and implemented them on Teradata, achieving high performance without special tuning.
- Involved in designing logical and physical data models for different database applications using Erwin, and in data modeling, design, implementation, and deployment of high-performance custom applications at scale on Hadoop/Spark.
- Involved in building predictive models to identify high-risk cases using regression and machine learning techniques in SAS and Python; performed data analysis and statistical analysis and generated reports, listings, and graphs using SAS tools: SAS/Base, SAS/Macros, SAS/Graph, SAS/SQL, SAS/Connect, and SAS/Access.
- Architected a solution on AWS to load data, create data models, and run BI on it; involved in creating, debugging, scheduling, and monitoring jobs using Airflow.
- Developed automated data pipelines in Python from various external data sources (web pages, APIs, etc.) to the internal data warehouse (SQL Server, AWS), then exported data to reporting tools such as Datorama.
- Worked on AWS utilities such as EMR, S3, and CloudWatch to run and monitor jobs on AWS, and was involved in converting SQL Server table DDL, views, and SQL queries to Snowflake.
- Involved in loading data from the Linux file system to HDFS, importing and exporting data to HDFS and Hive using Sqoop, and implementing partitioning, dynamic partitions, and buckets in Hive.
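The real-time stream processing bullet above describes a Kafka-to-Spark streaming ETL flow; the sketch below shows the equivalent pattern in PySpark (the project itself used Scala), with broker, topic, schema, and output paths as assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StringType, DoubleType

# PySpark rendering of the Kafka -> Spark streaming ETL pattern;
# broker, topic, schema, and output paths are illustrative only.
spark = (SparkSession.builder
         .appName("streaming-etl-sketch")
         .enableHiveSupport()
         .getOrCreate())

event_schema = (StructType()
                .add("event_id", StringType())
                .add("event_type", StringType())
                .add("amount", DoubleType()))

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")
       .option("subscribe", "events")
       .load())

# Parse the Kafka value payload as JSON and apply a simple streaming filter.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*")
          .filter(col("amount") > 0))

# Append the cleansed stream to an ORC location backing a Hive external table.
query = (events.writeStream
         .format("orc")
         .option("path", "/warehouse/events_clean")
         .option("checkpointLocation", "/tmp/checkpoints/events_clean")
         .outputMode("append")
         .start())

query.awaitTermination()
```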
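The Sqoop-plus-Airflow bullet above can be pictured as a small DAG like the one below; this is a hedged sketch, with the connection string, table, schedule, and paths all hypothetical.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Nightly ingest: Sqoop pulls a MySQL table into HDFS, then the new
# partition is registered in Hive. All identifiers are placeholders.
with DAG(
    dag_id="mysql_to_hdfs_sqoop",
    start_date=datetime(2023, 1, 1),
    schedule_interval="0 2 * * *",  # 02:00 every night
    catchup=False,
) as dag:

    sqoop_import = BashOperator(
        task_id="sqoop_import_orders",
        bash_command=(
            "sqoop import "
            "--connect jdbc:mysql://mysql-host/sales "
            "--username etl_user --password-file /user/etl/.mysql_pwd "
            "--table orders "
            "--target-dir /data/raw/orders/{{ ds }} "
            "--num-mappers 4"
        ),
    )

    register_partition = BashOperator(
        task_id="register_hive_partition",
        bash_command=(
            "hive -e \"ALTER TABLE raw.orders ADD IF NOT EXISTS "
            "PARTITION (load_date='{{ ds }}') "
            "LOCATION '/data/raw/orders/{{ ds }}'\""
        ),
    )

    sqoop_import >> register_partition
```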
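For the AWS Lambda bullet above, the handler below is a minimal sketch of the DynamoDB Streams trigger pattern: validate and lightly transform each change record, then write it to another store (an S3 bucket here); bucket and field names are assumptions.

```python
import json
import boto3

s3 = boto3.client("s3")
TARGET_BUCKET = "transformed-events-bucket"  # hypothetical destination

def lambda_handler(event, context):
    """Triggered by a DynamoDB Stream; validates and forwards changed items."""
    processed = 0
    for record in event.get("Records", []):
        if record.get("eventName") not in ("INSERT", "MODIFY"):
            continue  # this sketch ignores REMOVE events

        # Stream images are typed attribute maps, e.g. {"S": "abc"} or {"N": "42"}.
        new_image = record["dynamodb"].get("NewImage", {})
        item = {k: list(v.values())[0] for k, v in new_image.items()}

        # Simple validation/filtering before handing the record downstream.
        if not item.get("order_id"):
            continue

        s3.put_object(
            Bucket=TARGET_BUCKET,
            Key=f"orders/{item['order_id']}.json",
            Body=json.dumps(item).encode("utf-8"),
        )
        processed += 1

    return {"processed": processed}
```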
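The AWS Glue bullet above corresponds to a job along the lines of the following sketch: read campaign files from S3, aggregate with PySpark, and load the result into Redshift through a Glue connection; job arguments, paths, table names, and the connection name are placeholders.

```python
import sys

from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

# Job arguments, S3 paths, table names, and the Glue connection name are placeholders.
args = getResolvedOptions(sys.argv, ["JOB_NAME", "source_path", "redshift_tmp_dir"])

glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session

# Read campaign files (e.g. Parquet) landed in S3.
campaigns = spark.read.parquet(args["source_path"])

# Aggregate/consolidate with PySpark before loading.
daily = (campaigns
         .groupBy("campaign_id", "event_date")
         .agg(F.count("*").alias("events"),
              F.sum("revenue").alias("revenue")))

# Write the rollup to Redshift through a Glue catalog connection.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=DynamicFrame.fromDF(daily, glue_context, "daily_campaigns"),
    catalog_connection="redshift-connection",
    connection_options={"dbtable": "analytics.daily_campaigns", "database": "dw"},
    redshift_tmp_dir=args["redshift_tmp_dir"],
)
```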
Sr. Data Engineer
Confidential, Chicago IL
Responsibilities:
- Responsible for defining data architecture standards on Teradata/Hadoop platforms and defining process-specific zones for standardizing data and loading it into target tables; also responsible for defining audit columns for each zone in the Hadoop environment.
- Participated in requirements gathering meetings, translated business requirements into technical design and report visualization specification documents, and synthesized business data needs into creative visualizations in Tableau.
- Managed logical and physical data models in the ER Studio repository based on the different subject area requests for the integrated model; developed data mapping, data governance, and transformation and cleansing rules involving OLTP and ODS.
- Optimized and tuned the Redshift environment, enabling queries to run up to 100x faster for Tableau and SAS Visual Analytics.
- Understood the business requirements and designed the ETL flow in DataStage per the mapping sheet, including unit testing and review activities.
- Wrote Python scripts and mappers to run on the Hadoop Distributed File System (HDFS); performed troubleshooting and deployed many Python bug fixes for the two main applications that were a primary source of data for both customers and the internal customer service team.
- Implemented solutions for ingesting data from various sources and processing data at rest utilizing big data technologies such as Hadoop, MapReduce frameworks, HBase, and Hive.
- Enforced referential integrity in the OLTP data model for consistent relationship between tables and efficient database design.
- Wrote AWS Lambda code in Python for nested JSON files (converting, comparing, sorting, etc.) and managed metadata alongside the data for visibility into where data came from and its lineage, to quickly and efficiently find data for customer projects using the AWS data lake and services such as AWS Lambda and AWS Glue (see the flattening sketch after this list).
- Developed executive dashboards by collecting requirements from department directors and stakeholders, profiling the data with Informatica Developer, mapping data columns from source to target, performing further analysis by querying Hadoop Hive and Impala, and working closely with big data engineers to customize data structures so that Tableau visualizations met specific business requirements.
- Worked the full life cycle of data lake and data warehouse development with big data technologies such as Spark and Hadoop, and designed and deployed scalable, highly available, fault-tolerant systems on AWS.
- Loaded data into Amazon Redshift and used AWS CloudWatch to collect and monitor AWS RDS instances; designed and developed ETL/ELT processes to handle data migration from multiple business units and sources including Oracle, Postgres, Informix, MSSQL, Access, and others.
- Integrated NoSQL databases such as HBase with MapReduce to move bulk amounts of data into HBase; involved in loading and transforming large data sets and analyzing them by running Hive queries.
- Designed star and snowflake data models for the enterprise data warehouse using ER Studio and developed conceptual, logical, and physical data models for the ODS and dimensional delivery layer in the SQL data warehouse.
- Performed database health checks and tuned the databases using Teradata Manager; performed MapReduce and big data work on Hadoop and other NoSQL platforms.
- Developed, managed and validated existing data models including logical and physical models of the Data Warehouse and source systems utilizing a 3NF model.
- Implemented logical and physical relational databases and maintained database objects in the data model using ER Studio, and used star schema and snowflake schema methodologies in building the logical data model into dimensional models.
- Migrated data into the RV data pipeline using Databricks, Spark SQL, and Scala; migrated Confidential call center data from Oracle into HDFS in the RV data pipeline using Hive and Sqoop.
- Developed several behavioral reports and data points creating complex SQL queries and stored procedures using SSRS and Excel.
- Spun up HDInsight clusters and used Hadoop ecosystem tools such as Kafka, Spark, and Databricks for real-time analytics streaming, and Sqoop, Pig, Hive, and Cosmos DB for batch jobs.
- Generated periodic reports based on the statistical analysis of the data using SQL Server Reporting Services (SSRS) and generated reports using Global Variables, Expressions and Functions using SSRS.
- Used Base SAS, SAS/Macro, SAS/SQL, and Excel extensively to develop code and generate various analytical reports.
- Implemented Spark using Python/Scala, utilizing Spark Core, Spark Streaming, and Spark SQL for faster data processing in place of Java MapReduce (see the batch sketch after this list).
- Validated report data by writing SQL queries in PL/SQL Developer against the ODS, and participated in user sessions assisting with UAT (User Acceptance Testing).
- Involved in working on AWS CloudFormation templates and configured the SQS service through the Java API and microservices to send and receive information.
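The nested-JSON Lambda bullet above is illustrated by the sketch below: a small recursive flattener wrapped in a Lambda handler; the dotted-key convention and the "body" field are assumptions.

```python
import json

def flatten(obj, parent_key="", sep="."):
    """Recursively flatten nested dicts/lists into a single level of dotted keys."""
    items = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
            items.update(flatten(value, new_key, sep))
    elif isinstance(obj, list):
        for idx, value in enumerate(obj):
            new_key = f"{parent_key}{sep}{idx}" if parent_key else str(idx)
            items.update(flatten(value, new_key, sep))
    else:
        items[parent_key] = obj
    return items

def lambda_handler(event, context):
    # Each incoming record is assumed to carry a JSON document in a "body" field.
    rows = [flatten(json.loads(r["body"])) for r in event.get("Records", [])]
    return {"count": len(rows), "rows": rows}
```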
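For the Spark-instead-of-MapReduce bullet above, the following is an illustrative PySpark batch job that replaces a Java MapReduce aggregation with a single Spark SQL rollup; input paths and column names are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims-daily-rollup").getOrCreate()

# Raw files on HDFS; path and columns are placeholders.
claims = spark.read.option("header", True).csv("hdfs:///data/raw/claims/")

# One Spark job replaces the map/shuffle/reduce phases of the old Java MR code.
rollup = (claims
          .withColumn("amount", F.col("amount").cast("double"))
          .groupBy("claim_date", "status")
          .agg(F.count("*").alias("claims"),
               F.sum("amount").alias("total_amount")))

rollup.write.mode("overwrite").parquet("hdfs:///data/curated/claims_daily/")
```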
Sr. Data Modeler/ Data Analyst
Confidential, Minneapolis, MN
Responsibilities:
- Gathered and analyzed business data requirements and modeled these needs, working closely with the users of the information, application developers, and architects to ensure the information models could meet their needs.
- Coordinated with data architects on provisioning AWS EC2 infrastructure and deploying applications behind Elastic Load Balancing.
- Performed business area analysis and logical and physical data modeling for a data warehouse utilizing the Bill Inmon methodology, and designed a data mart application utilizing the Ralph Kimball star schema dimensional methodology.
- Worked on AWS utilities such as EMR, S3, and CloudWatch to run and monitor jobs on AWS.
- Designed and developed logical and physical data models and metadata to support the requirements using Erwin.
- Developed, maintained, and tested Unix shell and Perl DBI/DBD ETL scripts and developed Perl ETL scripts to scrape data from external marketing websites and populate a MySQL DB
- SAP Data Services Integrator ETL developer with strong ability to write procedures to ETL data into a Data Warehouse from a variety of data sources including flat files and database links (Postgres, MySQL, and Oracle).
- Designed the ER diagrams, logical model (relationships, cardinality, attributes, and candidate keys), and physical database (capacity planning, object creation, and aggregation strategies) for Oracle and Teradata per business requirements using Erwin.
- Used Python scripts to connect to Oracle, pull the data or build data files, and load that data into Snowflake (see the sketch after this list).
- Worked on multiple data marts in an Enterprise Data Warehouse (EDW) project and was involved in designing OLAP data models that extensively used slowly changing dimensions (SCD).
- Designed a third normal form target data model, mapped it to the logical model, and was involved in extensive data validation using ANSI SQL queries and back-end testing.
- Generated DDL statements for the creation of new Erwin objects such as tables, views, indexes, packages, and stored procedures.
- Designed MOLAP/ROLAP cubes on the Teradata database using SSAS, used SQL for querying the database in a UNIX environment, and created BTEQ, FastExport, MultiLoad, TPump, and FastLoad scripts for extracting data from various production systems.
- Developed automated procedures to produce data files using Microsoft SQL Server Integration Services (SSIS) and performed data analysis and data profiling using complex SQL on various source systems including Oracle and Netezza.
- Worked with AWS RDS, implementing models and data on RDS.
- Developed mapping spreadsheets for the ETL team with source-to-target data mappings, including physical naming standards, data types, volumetrics, domain definitions, and corporate metadata definitions.
- Used CA Erwin Data Modeler (Erwin) for Data Modeling (data requirements analysis, database design etc.) of custom developed information systems, including databases of transactional systems and data marts.
- Identified and tracked the slowly changing dimensions (SCD I, II, III & Hybrid/6) and determined the hierarchies in dimensions.
- Worked on data integration and workflow applications on the SSIS platform and was responsible for testing all new and existing ETL data warehouse components.
- Designed star schema and snowflake schema on dimension and fact tables, worked with the Data Vault methodology, and developed normalized logical and physical database models.
- Transformed Logical Data Model to Physical Data Model ensuring the Primary Key and Foreign key relationships in PDM, Consistency of definitions of Data Attributes and Primary Index considerations.
- Generated various reports using SQL Server Reporting Services (SSRS) for business analysts and the management team; wrote and ran SQL, BI, and other reports, analyzing data and creating metrics, dashboards, pivots, etc.
- Worked with the ETL team to document transformation rules for data migration from OLTP to the warehouse for reporting purposes.
- Involved in writing T-SQL and working on SSIS, SSRS, SSAS, data cleansing, data scrubbing, and data migration.
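The Oracle-to-Snowflake bullet above follows the pattern sketched below: pull rows from Oracle into a flat file, then stage and bulk-load it with Snowflake's PUT and COPY INTO; credentials, DSNs, and table names are placeholders.

```python
import csv

import cx_Oracle
import snowflake.connector

# Pull a table from Oracle into a flat file (credentials/DSN are placeholders).
ora = cx_Oracle.connect("etl_user", "secret", "orahost/ORCLPDB1")
ora_cur = ora.cursor()
ora_cur.execute("SELECT customer_id, name, state FROM customers")

with open("/tmp/customers.csv", "w", newline="") as fh:
    csv.writer(fh).writerows(ora_cur)  # the cursor iterates as tuples

ora.close()

# Stage the file and bulk-load it into Snowflake with PUT + COPY INTO.
sf = snowflake.connector.connect(
    account="xy12345", user="ETL_USER", password="secret",
    warehouse="LOAD_WH", database="DW", schema="STAGING",
)
sf_cur = sf.cursor()
sf_cur.execute("PUT file:///tmp/customers.csv @%CUSTOMERS OVERWRITE = TRUE")
sf_cur.execute("COPY INTO CUSTOMERS FROM @%CUSTOMERS FILE_FORMAT = (TYPE = CSV)")
sf.close()
```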
Data Analyst
Confidential
Responsibilities:
- Attended and participated in information and requirements gathering sessions and translated business requirements into working logical and physical data models for Data Warehouse, Data marts and OLAP applications.
- Performed extensive Data Analysis and Data Validation on Teradata and designed Star and Snowflake Data Models for Enterprise Data Warehouse using ERWIN.
- Created and maintained the Logical Data Model (LDM) for the project, including documentation of all entities, attributes, data relationships, primary and foreign key structures, allowed values, codes, business rules, glossary terms, etc.
- Integrated data from various data sources such as MS SQL Server, DB2, Oracle, Netezza, and Teradata using Informatica to perform extraction, transformation, and loading (ETL); worked on ETL development and data migration using SSIS, SQL*Loader, and PL/SQL.
- Created entity/relationship diagrams, grouped and created the tables, validated the data, and identified primary keys for lookup tables.
- Created components to extract application messages stored in XML files.
- Involved in designing and developing logical and physical data models and metadata to support the requirements using Erwin.
- Involved in using the ETL tool Informatica to populate the database and transform data from the old database to the new database using Oracle.
- Involved in modeling (star schema methodologies), building and designing the logical data model into dimensional models, and performance query tuning to improve performance along with index maintenance.
- Involved in the creation and maintenance of the Data Warehouse and repositories containing metadata.
- Wrote and executed unit, system, integration, and UAT scripts in Data Warehouse projects.
- Wrote and executed SQL queries to verify that data had been moved from the transactional system to the DSS, Data Warehouse, and data mart reporting systems in accordance with requirements (see the validation sketch after this list).
- Responsible for creating and modifying T-SQL stored procedures and triggers for validating the integrity of the data.
- Worked on Data Warehouse concepts and dimensional data modelling using Ralph Kimball methodology.
- Created a number of standard and complex reports to analyze data using slice-and-dice, drill-down, and drill-through in SSRS.
- Developed separate test cases for ETL process (Inbound & Outbound) and reporting.
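The data-movement verification bullet above typically reduces to reconciliation queries like the hedged sketch below, which compares per-date row counts between the transactional source and the warehouse target; DSNs, schemas, and column names are assumptions.

```python
import pyodbc

# ODBC DSNs, schemas, and columns are assumptions.
SRC = pyodbc.connect("DSN=OLTP_SRC")
TGT = pyodbc.connect("DSN=EDW_TGT")

CHECK_SQL = """
    SELECT CAST(order_date AS DATE) AS load_date, COUNT(*) AS row_count
    FROM {table}
    GROUP BY CAST(order_date AS DATE)
"""

def counts(conn, table):
    cur = conn.cursor()
    cur.execute(CHECK_SQL.format(table=table))
    return {row.load_date: row.row_count for row in cur.fetchall()}

source = counts(SRC, "dbo.orders")
target = counts(TGT, "dw.fact_orders")

# Any business date whose counts differ is flagged for investigation.
mismatches = {d: (source.get(d, 0), target.get(d, 0))
              for d in set(source) | set(target)
              if source.get(d, 0) != target.get(d, 0)}
print(f"{len(mismatches)} date(s) out of balance: {mismatches}")
```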