Sr. Big Data Engineer Resume
Battle Creek, MI
SUMMARY:
- Over 9 years of experience as a Big Data Engineer, Data Modeler, Data Architect, and Data Analyst, including designing, developing, and implementing data models for enterprise-level applications and systems.
- Expertise in writing Hadoop Jobs to analyze data using MapReduce, Apache Crunch, Hive, Pig, and Splunk.
- Experienced in using distributed computing architectures such as AWS products (e.g. EC2, Redshift, EMR, and Elasticsearch), Hadoop, Python, and Spark, and in effective use of MapReduce, SQL, and Cassandra to solve big data problems.
- Good experience in working with different ETL tool environments like SSIS, Informatica and reporting tool environments like SQL Server Reporting Services (SSRS), Cognos and Business Objects.
- Knowledge of and working experience with big data tools like Hadoop, Azure Data Lake, and AWS Redshift.
- Hands-on experience in normalization (1NF, 2NF, 3NF, and BCNF) and denormalization techniques for effective and optimum performance in OLTP and OLAP environments.
- Hands-on experience in installing, configuring, and using Apache Hadoop ecosystem components like Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase, Apache Crunch, ZooKeeper, Sqoop, Hue, Scala, and Chef.
- Experience in designing and developing POCs using Scala, Spark SQL, and MLlib libraries, then deploying them on the YARN cluster.
- Experience in Text Analytics, developing different Statistical Machine Learning, Data Mining solutions to various business problems and generating data visualizations using R, SAS and Python and creating dashboards using tools like Tableau.
- Experienced in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
- Expertise in integration of various data sources like RDBMS, Spreadsheets, Text files, JSON and XML files.
- Solid knowledge of Data Marts, Operational Data Store (ODS), OLAP, and Dimensional Data Modeling with the Ralph Kimball methodology (Star Schema and Snowflake modeling for fact and dimension tables) using Analysis Services.
- Expertise in Data Architecture, Data Modeling, Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import, and Data Export through the use of multiple ETL tools such as Informatica PowerCenter.
- Experience in designing, building, and implementing a complete Hadoop ecosystem comprising MapReduce, HDFS, Hive, Impala, Pig, Sqoop, Oozie, HBase, MongoDB, and Spark.
- Experience with Client-Server application development using Oracle PL/SQL, SQL PLUS, SQL Developer, TOAD, and SQL LOADER.
- Strong experience architecting highly performant databases using PostgreSQL, PostGIS, MySQL, and Cassandra.
- Extensive experience using ER modeling tools such as Erwin and ER/Studio, as well as Teradata, BTEQ, MLDM, and MDM.
- Experienced in R and Python for statistical computing; also experienced with MLlib (Spark), Matlab, Excel, Minitab, SPSS, and SAS.
- Extensive experience in loading and analyzing large datasets with Hadoop framework (MapReduce, HDFS, PIG, HIVE, Flume, Sqoop, SPARK, Impala, Scala), NoSQL databases like MongoDB, HBase, Cassandra.
- Experienced in implementing a log producer in Scala that watches application logs, transforms incremental log entries, and sends them to a Kafka- and ZooKeeper-based log collection platform.
- Excellent experience with NoSQL databases like MongoDB and Cassandra, and in writing Apache Spark Streaming applications on Big Data distributions in an active cluster environment.
- Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
- Strong Experience in working with Databases like Teradata and proficiency in writing complex SQL, PL/SQL for creating tables, views, indexes, stored procedures and functions.
- Experience in importing and exporting Terabytes of data between HDFS and Relational Database Systems using Sqoop.
- Performed performance tuning at the source, target, and DataStage job levels using indexes, hints, and partitioning in DB2, Oracle, and DataStage.
- Strong knowledge of Software Development Life Cycle (SDLC) and expertise in detailed design documentation.
- Good experience working with analysis tools like Tableau for regression analysis, pie charts, and bar graphs.
TECHNICAL SKILLS:
Big Data technologies: MapReduce, HBase 1.2, HDFS, Sqoop 1.4, Spark, Hadoop 3.0, Hive 2.3, Pig, Impala 2.1.
Cloud Architecture: Amazon AWS, EC2, Elasticsearch, Elastic Load Balancing & Basic MS Azure
Data Modeling Tools: ER/Studio V17, Erwin 9.7, Sybase PowerDesigner.
OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9/7
Programming Languages: SQL, PL/SQL, UNIX shell Scripting, R, AWK, SED
Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2.
Testing and defect tracking Tools: HP/Mercury (Quality Center, WinRunner, QuickTest Professional, Performance Center), Requisite, MS Visio & Visual SourceSafe
Operating System: Windows, Unix, Sun Solaris
ETL/Data warehouse Tools: Informatica 9.6/9.1, SAP Business Objects XIR3.1/XIR2, Talend, Tableau 10, and Pentaho.
Methodologies: Agile, RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Waterfall Model.
PROFESSIONAL EXPERIENCE:
Confidential - Battle Creek, MI
Sr. Big Data Engineer
Responsibilities:
- Architected, Designed and Developed Business applications and Data marts for Marketing and IT department to facilitate departmental reporting.
- Worked on implementation and maintenance of the Cloudera Hadoop cluster.
- Created Hive external tables to stage data and then moved the data from staging to the main tables.
- Worked on exporting data from Hive 2.0.0 tables into a Netezza 7.2.x database.
- Implemented the Big Data solution using Hadoop, Hive, and Informatica 9.5.1 to pull/load the data into the HDFS system.
- Pulled data from the data lake (HDFS) and massaged it with various RDD transformations.
- Active involvement in design, new development and SLA based support tickets of Big Machines applications.
- Experience in server infrastructure development on Gateway, ELB, Auto Scaling, DynamoDB, Elasticsearch, and Virtual Private Cloud (VPC).
- Worked with Kafka and built use cases relevant to our environment.
- Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark 2.0.0 for data aggregation and queries, and wrote data back into RDBMS through Sqoop.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
- Developed Oozie 3.1.0 workflow jobs to execute hive 2.0.0, Sqoop 1.4.6 and map-reduce actions.
- Provided thought leadership for the architecture and design of Big Data Analytics solutions for customers, actively drove Proof of Concept (POC) and Proof of Technology (POT) evaluations, and implemented a Big Data solution.
- Developed numerous MapReduce jobs in Scala 2.10.x for Data Cleansing and Analyzing Data in Impala 2.1.0.
- Created a data pipeline using Processor Groups and multiple processors in Apache NiFi for flat file and RDBMS sources as part of a POC using Amazon EC2.
- Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
- Loaded data from different sources such as HDFS or HBase into Spark RDDs and implemented in-memory data computation to generate the output response.
- Developed complete end-to-end big data processing in the Hadoop ecosystem.
- The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark and to provide visualization of the ETL orchestration using the CDAP tool.
- Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) on EC2.
- Conducted proofs of concept to determine feasibility and to evaluate Big Data products.
- Wrote Hive join queries to fetch information from multiple tables and wrote multiple MapReduce jobs to collect output from Hive.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Used Kafka and Storm for real-time data ingestion and processing.
- Hands-on experience developing integrations with Elasticsearch in various programming languages; knowledge of advanced reporting using Elasticsearch and Node.js.
- Worked in AWS cloud and on-premises environments, including infrastructure provisioning and configuration.
- Wrote Perl scripts covering data feed handling, implementing MarkLogic, and communicating with web services through the SOAP::Lite module and WSDL.
- Used Hive to analyze data ingested into HBase by using Hive-HBase integration and computed various metrics for reporting on the dashboard.
- Involved in developing the MapReduce framework, writing queries, and scheduling MapReduce jobs.
- Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
- Installed and configured Hadoop and responsible for maintaining cluster and managing and reviewing Hadoop log files.
- Developed Shell, Perl and Python scripts to automate and provide Control flow to Pig scripts.
- Designed the Redshift data model and performed Redshift performance analysis and improvements.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Worked on configuring and managing disaster recovery and backup on Cassandra Data.
- Performed File system management and monitoring on Hadoop log files.
- Utilized Oozie workflows to run Pig and Hive jobs.
- Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
- Implemented partitioning, dynamic partitions and buckets in HIVE.
- Developed customized classes for serialization and deserialization in Hadoop.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Implemented a proof of concept deploying this product in Amazon Web Services (AWS).
- Involved in migration of data from existing RDBMS (Oracle and SQL Server) to Hadoop using Sqoop for processing data.
Environment: Pig, Sqoop, Kafka, Apache Cassandra, Elasticsearch, Oozie, Impala, Cloudera, AWS, AWS EMR, Redshift, Flume, Apache Hadoop, HDFS, Hive, MapReduce, ZooKeeper, MySQL, Eclipse, DynamoDB, PL/SQL and Python.
Confidential - Houston, TX
Jr. Big Data Engineer
Responsibilities:
- Led architecture and design of data processing, warehousing, and analytics initiatives.
- Implemented solutions for ingesting data from various sources and processing the data at rest utilizing Big Data technologies such as Hadoop, MapReduce frameworks, HBase, and Hive with cloud architecture.
- Constructed and maintained an appropriate, scalable, and easy-to-use infrastructure with various tools to support the development of actionable reports used in decision-making across the strategy team.
- Developed and maintained reports, dashboards, cubes, and scorecards to deliver information requests and deepen the analytics capabilities of operations, fiscal, and strategy staff.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.
- Worked on AWS, implementing solutions using services such as EC2, S3, RDS, Redshift, and VPC.
- Worked with AWS to implement client-side encryption, as DynamoDB did not support encryption at rest at the time.
- Extracted data from MySQL and AWS Redshift into HDFS using Sqoop.
- Imported millions of structured records from relational databases using Sqoop import, processed them using Spark, and stored the data in HDFS in CSV format.
- Designed and developed a real-time stream processing application using Spark, Kafka, Scala, and Hive to perform streaming ETL and apply machine learning.
- Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Used the DataFrame API in Scala to work with distributed collections of data organized into named columns.
- Supported various human capital functions, including human capital strategy, workforce planning and analytics, recruiting, employee engagement and retention, and performance management.
- Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
- Developed Spark streaming application to pull data from cloud to Hive table.
- Used Spark SQL to process large amounts of structured data.
- Assigned names to the columns using Scala case classes.
- Used Talend for big data integration using Spark and Hadoop.
- Used Microsoft Windows Server and authenticated client-server connections via the Kerberos protocol.
- Identified query duplication, complexity, and dependencies to minimize migration efforts.
- Worked on Talend Magic Quadrant for performing fast integration tasks.
- Performed data profiling and transformation on the raw data using Pig, Python, and Java.
- Used Apache Spark for batch processing to source the data.
- Developed predictive analytics using Apache Spark Scala APIs.
- Performed big data analysis using Pig and user-defined functions (UDFs).
- Created Hive external tables, loaded data into the tables, and queried the data using HQL.
- Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
- Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra).
- Responsible for importing log files from various sources into HDFS using Flume.
- Expert in writing business analytics scripts using Hive SQL.
- Implemented continuous integration and deployment (CI/CD) through Jenkins for Hadoop jobs.
- Wrote Hadoop jobs for analyzing data using Hive and Pig, accessing text files, sequence files, and Parquet files.
- Experience in different Hadoop distributions like Cloudera (CDH3 & CDH4) and Hortonworks Distributions (HDP) and MapR.
Environment: Hadoop 3.0, HBase 1.2, Hive 2.3, AWS, EC2, S3, RDS, VPC, MySQL, Redshift, Sqoop, HDFS, Spark, ETL, YARN, Talend, Python, UDF, HQL, NoSQL, Flume 1.8, Cassandra 3.11, Hortonworks, MapR, Tableau r15
Confidential - Edison, NJ
Big Data Analyst
Responsibilities:
- Involved in different phases of the development life cycle, including Analysis, Design, Coding, Unit Testing, Integration Testing, Review, and Release, as per the business requirements.
- Worked in the Advanced Operational Analytics and Big Data Analysis team.
- Designed business layer, database layer, and implemented transaction management into the existing architecture.
- Worked in Agile environment and participated in daily Stand-ups/Scrum Meetings.
- Worked on NoSQL databases such as MongoDB, HBase, and Cassandra to enhance scalability and performance.
- Created Load Balancer on AWS EC2 for stable cluster and services which provide fast and effective processing of Data.
- Connected to Amazon Redshift through Tableau to extract live data for real time analysis.
- Used AWS Lambda to perform data validation, filtering, sorting or other transformations for every data change in HBase table and load the transformed data to another data store.
- Integrated Hadoop frameworks/technologies such as Hive and HBase to further operational and analytical experience.
- Loaded data from different servers into S3 buckets and set appropriate bucket permissions.
- Created Hive queries for supporting the existing application.
- Wrote HiveQL and managed the Hive Metastore server to control various advanced activities.
- Worked with statistical analysis patterns and created dashboards for quick reference, shared with internal customers on a daily, weekly, or monthly basis.
- Worked on partitioning Hive tables and running scripts in parallel to reduce the run time of the scripts.
- Implemented business logic by writing UDFs and configuring cron jobs.
- Worked on streaming and data warehousing projects.
- Installed and configured Hive and wrote Hive UDFs.
- Worked with JSON scripts, MongoDB, and the Unix environment for NoSQL data clean-up and grouping, and created analysis reports.
- Wrote Python scripts and Java code for business applications and MapReduce programs.
- Worked with the Hive warehouse directory and Hive tables and services.
- Used the Spark shell for interactive data analysis and processing, and used Spark SQL to query structured data.
- Created Stored Procedures to communicate with SQL database.
- Involved in writing complex SQL Queries and provided SQL Scripts for the Configuration Data which is used by the application.
- Developed Tableau data visualizations using cross tabs, heat maps, box-and-whisker charts, scatter plots, geographic maps, pie charts, bar charts, and density charts.
- Used Tableau to generate dashboards and statistical reports and created a portal using the Tableau JavaScript API.
- Worked closely with business analysts on requirement gathering and translating requirements into technical documentation.
Environment: NoSQL, MongoDB 3.6, HBase 1.2, Cassandra, AWS, EC2, Agile, Amazon Redshift, Hadoop frameworks, S3, UDFs, JSON scripts, UNIX, MapReduce, Python, R, Tableau
Confidential - Brentwood, TN
Data Modeler/Data Architect
Responsibilities:
- Responsible for the data architecture design delivery, data model development, review, approval and Data warehouse implementation.
- Designed and developed the conceptual, logical, and physical data models to meet the needs of reporting.
- Familiarity with a NoSQL database such as MongoDB.
- Involved in designing and developing Data Models and Data Marts that support the Business Intelligence Data Warehouse.
- Implemented logical and physical relational databases and maintained database objects in the data model using Erwin 9.5.
- Responsible for Big Data initiatives and engagement including analysis, brainstorming, POC, and architecture.
- Followed the SDLC methodology for data warehouse development using Kanbanize.
- Worked with the Hadoop ecosystem covering HDFS, HBase, YARN, and MapReduce.
- Performed data mapping and data design (data modeling) to integrate the data across multiple databases into the EDW.
- Designed both 3NF Data models and dimensional Data models using Star and Snowflake schemas.
- Involved in Normalization/Denormalization techniques for optimum performance in relational and dimensional database environments.
- Developed Master data management strategies for storing reference data.
- Worked with Data Stewards and Business analysts to gather requirements for MDM Project.
- Involved in unit, system integration, and regression testing.
- Worked with SQL Server Analysis Services (SSAS) and SQL Server Reporting Service (SSRS).
- Worked on Data modeling, Advanced SQL with Columnar Databases using AWS.
- Performed reverse engineering of the dashboard requirements to model the required data marts.
- Developed a Source to Target Matrix with ETL transformation logic for the ETL team.
- Cleansed, extracted, and analyzed business data on a daily basis and prepared ad-hoc analytical reports using Excel and T-SQL.
- Created Data Migration and Cleansing rules for the Integration Architecture (OLTP, ODS, DW).
- Handled performance requirements for databases in OLTP and OLAP models.
- Conducted meetings with business and development teams for data validation and end-to-end data mapping.
- Responsible for Metadata Management, keeping up to date centralized metadata repositories using Erwin modeling tools.
- Involved in debugging and tuning PL/SQL code, tuning queries, and optimizing the SQL database.
- Led data migration from legacy systems into modern data integration frameworks from conception to completion.
- Generated ad-hoc SQL queries using joins, database connections, and transformation rules to fetch data from legacy DB2 and SQL Server 2014 database systems.
- Managed the meta-data for the Subject Area models for the Data Warehouse environment.
- Generated DDL and created the tables and views in the corresponding architectural layers.
- Handled importing of data from various data sources, performed transformations using MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Involved in performing extensive back-end testing by writing SQL queries and PL/SQL stored procedures to extract the data from the SQL database.
- Participated in code/design reviews and provided input into best practices for reports and universe development.
- Involved in Netezza administration activities like backup/restore, performance tuning, and security configuration.
- Involved in the validation of the OLAP, unit testing, and system testing of the OLAP report functionality and the data displayed in the reports.
- Created a high-level, industry-standard, generalized data model to be converted into logical and physical models at later stages of the project using Erwin and Visio.
- Participated in Performance Tuning using Explain Plan and TKPROF.
- Involved in translating business needs into long-term architecture solutions and reviewing object models, data models and metadata.
Environment: Erwin 9.5, HDFS, HBase, Hadoop, Metadata, MS Visio, SQL Server 2014, SDLC, PL/SQL, ODS, OLAP, OLTP, flat files.
Confidential - Plano, TX
Data Modeler/ Data Analyst
Responsibilities:
- Created the Physical Data Model from the Logical Data Model using the Compare and Merge Utility in ER/Studio and worked with the naming standards utility.
- Developed normalized Logical and Physical database models for designing an OLTP application.
- Extensively used Star Schema methodologies in building and designing the logical data model into Dimensional Models.
- Created database objects like tables, views, materialized views, procedures, and packages using Oracle tools like PL/SQL and SQL*Loader, and handled exceptions.
- Enforced referential integrity in the OLTP data model for consistent relationship between tables and efficient database design.
- Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
- Utilized SDLC and Agile methodologies such as SCRUM.
- Involved in administrative tasks, including creation of database objects such as database, tables, and views, using SQL, DDL, and DML requests.
- Worked on data analysis, data profiling, data modeling, and data governance, identifying data sets, source data, source metadata, data definitions, and data formats.
- Loaded multi-format data from various sources like flat files, Excel, and MS Access, and performed file system operations.
- Used T-SQL stored procedures to transfer data from OLTP databases to staging area and finally transfer into data marts.
- Worked on physical design for both SMP and MPP RDBMS, with an understanding of RDBMS scaling features.
- Wrote SQL Queries, Dynamic-queries, sub-queries and complex joins for generating Complex Stored Procedures, Triggers, User-defined Functions, Views and Cursors.
- Wrote simple and advanced SQL queries and scripts to create standard and ad hoc reports for senior managers.
- Performed ETL SQL optimization, designed the OLTP system environment, and maintained metadata documentation.
- Involved with data analysis, primarily identifying data sets, source data, source metadata, data definitions, and data formats.
- Worked with developers on data Normalization and De-normalization, performance tuning issues, and provided assistance in stored procedures as needed.
- Used Teradata for OLTP systems by generating models to support Revenue Management Applications that connect to SAS.
- Created SSIS Packages for import and export of data between Oracle database and others like MS Excel and Flat Files.
- Worked in the capacity of ETL Developer (Oracle Data Integrator (ODI) / PL/SQL) to migrate data from different sources into the target Oracle Data Warehouse.
- Designed and Developed PL/SQL procedures, functions and packages to create Summary tables.
- Involved in creating tasks to pull and push data from Salesforce to Oracle Staging/Data Mart.
- Created VBA macros to convert the Excel input files into the correct format and loaded them into SQL Server.
- Helped the BI, ETL Developers in understanding the Data Model, data flow and the expected output for each model created.
Environment: ER/Studio 8.0, Oracle 10g Application Server, Oracle Developer Suite, PL/SQL, T-SQL, SQL plus, SSIS, Teradata 13, OLAP, OLTP, SAS, MS Excel.
Confidential - Denver, CO
Data Modeler/Data Architect
Responsibilities:
- Led the design and modeling of tactical architectures for development, delivery, and support of projects.
- Developed full life cycle software, including defining requirements, prototyping, designing, coding, testing, and maintaining software.
- Used Agile methodology for data warehouse development using Kanbanize.
- Interacted with business users to analyze the business process and requirements, transformed requirements into conceptual, logical, and physical data models, designed databases, and documented and rolled out the deliverables.
- Responsible for Master Data Management (MDM) and Data Lake design and architecture. The Data Lake was built using Cloudera Hadoop.
- Involved in normalization and denormalization of existing tables for faster query retrieval and designed both 3NF data models for ODS and OLTP systems and dimensional data models using star and snowflake schemas.
- Used forward engineering to create a physical data model with DDL that best suits the requirements from the Logical Data Model.
- Used Erwin for reverse engineering to connect to existing database and ODS to create graphical representation in the form of Entity Relationships and elicit more information.
- Implementation of full lifecycle in Data warehouses and Business Data marts with Star Schemas, Snowflake Schemas, SCD & Dimensional Modeling.
- Responsible for Dimensional Data Modeling and Modeling Diagrams using Erwin.
- Designed and documented Use Cases, Activity Diagrams, Sequence Diagrams, and OOD (Object Oriented Design) using UML and Visio.
- Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
- Exported data from the HDFS environment into RDBMS using Sqoop for report generation and visualization purposes.
- Extracted files from Cassandra and MongoDB through Sqoop and placed them in HDFS for processing.
- Implemented dynamic partitioning and bucketing in Hive as part of performance tuning for the workflow and coordination files, using the Oozie framework to automate tasks.
- Developed Pig Latin scripts to replace the existing legacy process with Hadoop, with the data fed to AWS S3.
- Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
- Involved with data profiling for multiple sources and answered complex business questions by providing data to business users.
- Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
- Developed SQL and BTEQ (Teradata) queries for extracting data from the production database and built data structures and reports.
- Wrote and executed unit, system, integration, and UAT scripts in data warehouse projects.
- Created queries using BI-Reporting variables, navigational attributes and Filters. Developed workbooks, info set Queries. Defined reports as per reporting requirements.
- Implemented slowly changing and rapidly changing dimension methodologies; created aggregate fact tables for the creation of ad-hoc reports.
- Created and maintained surrogate keys on the master tables to handle SCD type 2 changes effectively.
- Reverse engineered data models from database instances and scripts.
- Implemented the Slowly Changing Dimensions as per the requirement.
- Ran quality checks using SQL queries and kept all databases in sync with the Erwin model across all environments.
- Applied naming standards to the data model and followed company standards for project documentation.
Environment: Erwin 9.6, Agile, MDM, Kanbanize, SQL, BTEQ, Teradata r14, DBA, ODS, OLTP, OOD, UML, ETL, Hadoop 3.0, Cassandra, MongoDB, Sqoop 1.4, HDFS, Oozie, Pig.
Confidential - Denver, CO
Data Analyst/Data Modeler
Responsibilities:
- Worked in the team responsible for analyzing business requirements and designing and implementing the business solution.
- Developed logical and physical data models for central model consolidation.
- Worked with DBAs to create a best fit physical data model from the logical data model.
- Conducted data modeling JAD sessions and communicated data-related standards.
- Used Erwin r8 for effective model management of sharing, dividing and reusing model information and design for productivity improvement.
- Used Star/Snowflake schemas in the data warehouse architecture.
- Redefined many attributes and relationships in the reverse engineered model and cleansed unwanted tables/columns as part of data analysis responsibilities.
- Developed process methodology for the Reverse Engineering phase of the project.
- Used reverse engineering to connect to the existing database and create a graphical representation (E-R diagram).
- Utilized Erwin's reverse engineering and target database schema conversion process.
- Involved in logical and physical designs and transformed logical models into physical implementations.
- Created 3NF business area data models with denormalized physical implementations, and performed data and information requirements analysis using the Erwin tool.
- Involved in extensive data analysis on Teradata and Oracle systems, querying and writing SQL in TOAD.
- Used the ETL tool Informatica to populate the database and transform data from the old database to the new database using Oracle and SQL Server.
- Created database objects like tables, views, materialized views, procedures, and packages using Oracle tools like PL/SQL, SQL*Plus, and SQL*Loader, and handled exceptions.
- Used Informatica Designer, Workflow Manager, and Repository Manager to create source and target definitions, design mappings, create repositories, and establish users, groups, and their privileges.
- Involved in Data profiling in order to detect and correct inaccurate data and maintain the data quality.
- Developed Data Migration and Cleansing rules for the Integration Architecture (OLTP, ODS, DW).
- Involved in the creation and maintenance of the Data Warehouse and repositories containing metadata.
- Developed a dimensional model based on Star and Snowflake schemas to build the data warehouse.
- Involved in the study of the business logic and in understanding the physical system and the terms and conditions for the database.
- Worked closely with the ETL SQL Server Integration Services (SSIS) Developers to explain the Data Transformation.
- Created reports using SQL Reporting Services (SSRS) for customized and ad-hoc Queries.
- Created documentation and test cases, worked with users for new module enhancements and testing.
- Created simple and complex mappings using DataStage to load dimension and fact tables as per Star schema techniques.
- Designed and developed Oracle database tables, views, and indexes with proper privileges, and maintained and updated the database by removing old data.
- Generated ad-hoc reports using Crystal Reports.
Environment: Erwin r8, Informatica 7.0, Windows XP, Oracle 10g, SQL Server 2008, MS Excel, MS Visio, Microsoft Transaction Server, Crystal Reports, SQL*Loader.