
Sr. Big Data Engineer Resume

Dearborn, MI

SUMMARY

  • 12+ years of experience as a Big Data Engineer/Data Modeler/Data Architect and Data Analyst, including designing, developing and implementing data models for enterprise-level applications and systems.
  • Expertise in writing Hadoop Jobs to analyze data using MapReduce, Apache Crunch, Hive, Pig, and Splunk.
  • Experienced in using distributed computing architectures such as AWS products (e.g. EC2, Redshift, EMR and Elasticsearch), Hadoop, Python and Spark, and effective use of MapReduce, SQL and Cassandra to solve big data problems.
  • Solid experience working with different ETL tool environments like SSIS and Informatica and reporting tool environments like SQL Server Reporting Services (SSRS), Cognos and Business Objects.
  • Knowledge and working experience on big data tools like Hadoop, Azure Data Lake, AWS Redshift.
  • Hands-on experience in Normalization (1NF, 2NF, 3NF and BCNF) and Denormalization techniques for effective and optimal performance in OLTP and OLAP environments.
  • Hands-on experience in installing, configuring and using Apache Hadoop ecosystem components like Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase, Apache Crunch, ZooKeeper, Sqoop, Hue, Scala and Chef.
  • Experience in developing and designing POCs using Scala, Spark SQL and MLlib libraries, then deploying them on the YARN cluster.
  • Experience in Text Analytics, developing different Statistical Machine Learning, Data Mining solutions to various business problems and generating data visualizations using R, SAS and Python and creating dashboards using tools like Tableau.
  • Experienced in configuring and administering the Hadoop cluster using major Hadoop distributions like Apache Hadoop and Cloudera.
  • Expertise in integration of various data sources like RDBMS, Spreadsheets, Text files, JSON and XML files.
  • Solid knowledge of Data Marts, Operational Data Store (ODS), OLAP and Dimensional Data Modeling with the Ralph Kimball Methodology (Star Schema Modeling, Snowflake Modeling for fact and dimension tables) using Analysis Services.
  • Expertise in Data Architecture, Data Modeling, Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import, and Data Export using multiple ETL tools such as Informatica PowerCenter.
  • Experience in designing, building and implementing a complete Hadoop ecosystem comprising MapReduce, HDFS, Hive, Impala, Pig, Sqoop, Oozie, HBase, MongoDB, and Spark.
  • Experience with Client-Server application development using Oracle PL/SQL, SQL*Plus, SQL Developer, TOAD, and SQL*Loader.
  • Strong experience with architecting highly performant databases using PostgreSQL, PostGIS, MySQL and Cassandra.
  • Extensive experience in using ER modeling tools such as Erwin and ER/Studio, along with Teradata, BTEQ, MLDM and MDM.
  • Experienced in R and Python for statistical computing; also experienced with MLlib (Spark), Matlab, Excel, Minitab, SPSS, and SAS.
  • Extensive experience in loading and analyzing large datasets with the Hadoop framework (MapReduce, HDFS, Pig, Hive, Flume, Sqoop, Spark, Impala, Scala) and NoSQL databases like MongoDB, HBase, Cassandra.
  • Experienced in implementing a log producer in Scala that watches application logs, transforms incremental log entries and sends them to a Kafka and ZooKeeper based log collection platform (a Python sketch of this pattern follows this list).
  • Excellent experience with NoSQL databases like MongoDB and Cassandra, and in writing Apache Spark Streaming applications against a Big Data distribution in an active cluster environment.
  • Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
  • Strong experience in working with databases like Teradata and proficiency in writing complex SQL and PL/SQL for creating tables, views, indexes, stored procedures and functions.
  • Experience in importing and exporting Terabytes of data between HDFS and Relational Database Systems using Sqoop.
  • Performed performance tuning at the source, target and DataStage job levels using indexes, hints and partitioning in DB2, Oracle and DataStage.
  • Strong knowledge of the Software Development Life Cycle (SDLC) and expertise in detailed design documentation.
  • Experience working with analysis tools like Tableau for regression analysis, pie charts, and bar graphs.
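The log-producer bullet above describes a Scala process that tails application logs and ships new entries to Kafka. A minimal, hypothetical Python sketch of the same pattern, using the kafka-python client; the broker address, topic name and log path are illustrative assumptions, not details from the original project:

```python
import time

from kafka import KafkaProducer  # kafka-python client, assumed available

LOG_PATH = "/var/log/app/application.log"   # hypothetical log file
TOPIC = "app-logs"                          # hypothetical Kafka topic

producer = KafkaProducer(bootstrap_servers="broker1:9092")


def follow(path):
    """Yield new lines appended to the log file (a simple tail -f)."""
    with open(path, "r") as handle:
        handle.seek(0, 2)                   # jump to the end; ship only new entries
        while True:
            line = handle.readline()
            if not line:
                time.sleep(0.5)
                continue
            yield line.rstrip("\n")


for entry in follow(LOG_PATH):
    producer.send(TOPIC, entry.encode("utf-8"))
```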

TECHNICAL SKILLS

Big Data technologies: MapReduce, HBase 1.2, HDFS, Sqoop 1.4, Spark, Hadoop 3.0, Hive 2.3, PIG, Impala 2.1

Cloud Architecture: Amazon AWS, EC2, Elasticsearch, Elastic Load Balancing & Basic MS Azure

Data Modeling Tools: ER/Studio V17, Erwin 9.7, Sybase PowerDesigner.

OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9/7

Programming Languages: SQL, PL/SQL, UNIX shell Scripting, R, AWK, SED

Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2.

Testing and defect tracking Tools: HP/Mercury (Quality Center, WinRunner, QuickTest Professional, Performance Center), Requisite, MS Visio & Visual SourceSafe

Operating System: Windows, Unix, Sun Solaris

ETL/Data Warehouse Tools: Informatica 9.6/9.1, SAP Business Objects XIR3.1/XIR2, Talend, Tableau 10, and Pentaho.

Methodologies: Agile, RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Waterfall Model.

PROFESSIONAL EXPERIENCE

Confidential, Dearborn, MI

Sr. Big data engineer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Big Data technologies like Apache Hadoop, MapReduce, Shell Scripting, Hive.
  • Used Agile (SCRUM) methodologies for Software Development.
  • Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS.
  • Involved in all phases of data mining, data collection, data cleaning, developing models, validation and visualization.
  • Designed and developed end-to-end ETL processing from Oracle to AWS using Amazon S3, EMR, and Spark.
  • Developed the code to perform data extractions from an Oracle database and load it into the AWS platform using AWS Data Pipeline.
  • Installed and configured Hadoop ecosystem components like HBase, Flume, Pig and Sqoop.
  • Designed and developed Big Data analytic solutions on a Hadoop-based platform and engaged clients in technical discussions.
  • Developed workflows in Oozie to automate the tasks of loading the data into HDFS and pre-processing it with Pig.
  • Implemented an AWS cloud computing platform using S3, RDS, DynamoDB, Redshift, and Python.
  • Responsible for loading and transforming huge sets of structured, semi-structured and unstructured data.
  • Implemented business logic by writing UDFs and configuring CRON Jobs.
  • Extensively involved in writing PL/SQL, stored procedures, functions and packages.
  • Created logical and physical data models using Erwin and reviewed these models with the business team and data architecture team.
  • Involved in converting MapReduce programs into Spark transformations using the Spark Python API (see the sketch after this list).
  • Developed Spark scripts using Python and bash shell commands as per the requirement.
  • Worked with NoSQL databases like HBase, creating tables to load large sets of semi-structured data coming from source systems.
  • Responsible for translating business and data requirements into logical data models in support of Enterprise data models, ODS, OLAP, OLTP and operational data structures.
  • Created SSIS packages to migrate data from heterogeneous sources such as MS Excel, flat files and CSV files.
  • Worked closely with the SSIS and SSRS developers to explain the complex data transformation logic.
  • Designed Data Marts by following Star Schema and Snowflake Schema Methodology, using industry leading Data Modeling tools like Erwin.
  • Developed the Star Schema/Snowflake Schema for proposed warehouse models to meet the requirements.
  • Designed class and activity diagrams using Power Designer and UML tools like Visio.
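The bullets above mention converting MapReduce programs into Spark transformations with the Spark Python API and persisting Hive data-lake extracts into HDFS. A minimal, hypothetical PySpark sketch of that pattern; the paths, database and table names are illustrative assumptions, not details from the original project:

```python
from pyspark.sql import SparkSession

# Hive-enabled session; assumes the cluster exposes a Hive metastore.
spark = (SparkSession.builder
         .appName("hive-to-hdfs-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Equivalent of a classic word-count MapReduce job, expressed as Spark
# transformations through the Python API.
lines = spark.sparkContext.textFile("hdfs:///data/landing/app_logs/")  # hypothetical path
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.saveAsTextFile("hdfs:///data/curated/word_counts/")

# Pull curated rows out of the Hive data lake and persist them to HDFS as Parquet.
orders = spark.sql(
    "SELECT order_id, status, amount FROM datalake.orders WHERE status = 'OPEN'"  # hypothetical table
)
orders.write.mode("overwrite").parquet("hdfs:///data/curated/open_orders/")
```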

Environment: Hive 2.3, MapReduce, Hadoop 3.0, HDFS, Oracle, Spark 2.3, HBase 1.2, Flume 1.8, Pig 0.17, Sqoop 1.4, Oozie 4.3, Python, PL/SQL, Erwin 9.7, NoSQL, OLAP, OLTP, SSIS, MS Excel 2016, SSRS, Visio

Confidential, Battle Creek, MI

Sr. Big Data Engineer

Responsibilities:

  • Architected, Designed and Developed Business applications and Data marts for Marketing and IT department to facilitate departmental reporting.
  • Worked on implementation and maintenance of a Cloudera Hadoop cluster.
  • Created Hive external tables to stage data and then moved the data from staging to main tables.
  • Worked in exporting data from Hive 2.0.0 tables into Netezza 7.2.x database.
  • Implemented the Big Data solution using Hadoop, Hive and Informatica 9.5.1 to pull/load the data into the HDFS system.
  • Pulled the data from the data lake (HDFS) and massaged it with various RDD transformations.
  • Active involvement in design, new development and SLA based support tickets of Big Machines applications.
  • Experience in server infrastructure development on Gateway, ELB, Auto Scaling, DynamoDB, Elasticsearch, and Virtual Private Cloud (VPC).
  • Involved with Kafka and building use cases relevant to our environment.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark 2.0.0 for data aggregation, queries and writing data back into RDBMS through Sqoop.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster processing of data (see the streaming sketch after this list).
  • Developed Oozie 3.1.0 workflow jobs to execute Hive 2.0.0, Sqoop 1.4.6 and MapReduce actions.
  • Provided thought leadership for architecture and the design of Big Data Analytics solutions for customers, and actively drove Proof of Concept (POC) and Proof of Technology (POT) evaluations to implement a Big Data solution.
  • Developed numerous MapReduce jobs in Scala 2.10.x for Data Cleansing and Analyzing Data in Impala 2.1.0.
  • Created a data pipeline using processor groups and multiple processors in Apache NiFi for flat file and RDBMS sources as part of a POC using Amazon EC2.
  • Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
  • Loaded the data from various sources such as HDFS or HBase into Spark RDDs and implemented in-memory data computation to generate the output response.
  • Developed complete end-to-end Big Data processing in the Hadoop ecosystem.
  • The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark and provide visualization of the ETL orchestration using the CDAP tool.
  • Implemented Installation and configuration of multi-node cluster on Cloud using Amazon Web Services (AWS) on EC2.
  • Ran proofs of concept to determine feasibility and evaluate Big Data products.
  • Wrote Hive join queries to fetch information from multiple tables and multiple MapReduce jobs to collect output from Hive.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Used Kafka and Storm for real-time data ingestion and processing.
  • Hands-on experience in developing integration with Elasticsearch in multiple programming languages; knowledge of advanced reporting using Elasticsearch and Node.js.
  • Worked in AWS Cloud and on-premise environments with infrastructure provisioning/configuration.
  • Worked on writing Perl scripts covering data feed handling, implementing MarkLogic, and communicating with web services through the SOAP::Lite module and WSDL.
  • Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard.
  • Involved in developing the MapReduce framework, writing queries and scheduling MapReduce jobs.
  • Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
  • Installed and configured Hadoop and responsible for maintaining cluster and managing and reviewing Hadoop log files.
  • Developed Shell, Perl and Python scripts to automate and provide Control flow to Pig scripts.
  • Designed the Redshift data model and worked on Redshift performance improvements/analysis.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Worked on configuring and managing disaster recovery and backup on Cassandra Data.
  • Performed File system management and monitoring on Hadoop log files.
  • Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
  • Used Flume to collect, aggregate, and store the web log data from various sources like web servers, mobile and network devices and pushed it to HDFS.
  • Implemented partitioning, dynamic partitions and buckets in HIVE.
  • Developed customized classes for serialization and deserialization in Hadoop.
  • Analyzed copious amounts of data sets to determine optimal way to aggregate and report on it.
  • Implemented a proof of concept deploying this product in Amazon Web Services (AWS).
  • Involved in migration of data from the existing RDBMS (Oracle and SQL Server) to Hadoop using Sqoop for processing data.
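Several bullets above reference Kafka-based real-time ingestion processed with Spark Streaming. A minimal, hypothetical PySpark Structured Streaming sketch of that pattern; the broker address, topic, schema and paths are placeholder assumptions rather than details from the original engagement:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Assumed event layout; the real schema would come from the upstream producer.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("device", StringType()),
    StructField("value", DoubleType()),
])

# Read a stream of JSON events from Kafka (placeholder broker/topic).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")
       .option("subscribe", "weblogs")
       .load())

events = (raw.selectExpr("CAST(value AS STRING) AS json")
             .select(from_json(col("json"), schema).alias("e"))
             .select("e.*"))

# Persist the parsed stream to HDFS as Parquet, with checkpointing for recovery.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/streams/weblogs/")
         .option("checkpointLocation", "hdfs:///checkpoints/weblogs/")
         .start())

query.awaitTermination()
```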

Environment: Pig, Sqoop, Kafka, Apache Cassandra, Elastic search, Oozie, Impala, Cloudera, AWS, AWS EMR, Redshift, Flume, Apache Hadoop, HDFS, Hive, Map Reduce, Cassandra, Zookeeper, MySQL, Eclipse, Dynamo DB, PL/SQL and Python.

Confidential, Houston, TX

Big Data Engineer

Responsibilities:

  • Led architecture and design of data processing, warehousing and analytics initiatives.
  • Implemented solutions for ingesting data from various sources and processing the data-at-rest utilizing Big Data technologies such as Hadoop, MapReduce frameworks, HBase and Hive with a cloud architecture.
  • Constructed and maintained an appropriate, scalable, and easy-to-use infrastructure with various tools to support the development of actionable reports used in decision-making across the strategy team.
  • Developed and maintained reports, dashboards, cubes, and scorecards to deliver information requests and deepen the analytics capabilities of operations, fiscal, and strategy staff.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts.
  • Worked on AWS, implementing solutions using services like EC2, S3, RDS, Redshift and VPC.
  • Worked with AWS to implement client-side encryption, as DynamoDB did not support at-rest encryption at that time.
  • Extracted the data from MySQL and AWS Redshift into HDFS using Sqoop.
  • Imported millions of structured records from relational databases using Sqoop import, processed them using Spark and stored the data into HDFS in CSV format (see the sketch after this list).
  • Designed and Developed Real time Stream processing Application using Spark, Kafka, Scala and Hive to perform Streaming ETL and apply Machine Learning.
  • Explored Spark for improving the performance and optimization of the existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Used the DataFrame API in Scala for converting the distributed collection of data organized into named columns.
  • Supported various human capital functions, including human capital strategy, workforce planning and analytics, recruiting, employee engagement and retention, and performance management.
  • Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
  • Developed Spark streaming application to pull data from cloud to Hive table.
  • Used Spark SQL to process the huge amount of structured data.
  • Assigned a name to each of the columns using the case class option in Scala.
  • Used Talend for Big Data integration using Spark and Hadoop.
  • Used Microsoft Windows server and authenticated client server relationship via Kerberos protocol.
  • Identified query duplication, complexity and dependency to minimize migration efforts.
  • Worked on Talend Magic Quadrant for performing fast integration tasks.
  • Performed data profiling and transformation on the raw data using Pig, Python, and Java.
  • Used Apache Spark for batch processing to source the data.
  • Developed predictive analytics using Apache Spark Scala APIs.
  • Involved in big data analysis using Pig and user-defined functions (UDFs).
  • Created Hive external tables, loaded the data into tables and queried data using HQL.
  • Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
  • Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra).
  • Responsible for importing log files from various sources into HDFS using Flume.
  • Expert in writing business analytical scripts using Hive SQL.
  • Implemented continuous integration & deployment (CICD) through Jenkins for Hadoop jobs.
  • Worked in writing Hadoop Jobs for analyzing data using Hive, Pig accessing Text format files, sequence files, Parquet files.
  • Experience in different Hadoop distributions like Cloudera (CDH3 & CDH4) and Hortonworks Distributions (HDP) and MapR.
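The bullets above describe landing Sqoop extracts on HDFS as CSV, cleaning them with the DataFrame API, and exposing them through Spark SQL and Hive for reporting. A minimal, hypothetical PySpark sketch of that flow; the paths, columns and table names are illustrative assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date

spark = (SparkSession.builder
         .appName("sqoop-csv-to-hive-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Sqoop lands relational extracts on HDFS as CSV (hypothetical path and columns).
customers = (spark.read
             .option("header", "true")
             .option("inferSchema", "true")
             .csv("hdfs:///data/sqoop/customers/"))

# Typical DataFrame-API cleanup before the data is exposed for reporting.
curated = (customers
           .withColumn("signup_date", to_date(col("signup_date")))
           .filter(col("status") == "ACTIVE")
           .dropDuplicates(["customer_id"]))

# Register the frame, aggregate with Spark SQL, and persist to a Hive table.
curated.createOrReplaceTempView("customers_curated")
summary = spark.sql("""
    SELECT signup_date, COUNT(*) AS active_customers
    FROM customers_curated
    GROUP BY signup_date
""")
summary.write.mode("overwrite").saveAsTable("reporting.active_customers_by_day")
```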

Environment: Hadoop 3.0, HBase 1.2, Hive 2.3, AWS, EC2, S3, RDS, VPC, MySQL, Redshift, Sqoop, HDFS, Spark, ETL, YARN, Talend, Python, UDF, HQL, NoSQL, Flume 1.8, Cassandra 3.11, Hortonworks, MapR, Tableau r15

Confidential, Edison, NJ

Big Data Analyst

Responsibilities:

  • Involved in distinct phases of the development life cycle, including analysis, design, coding, unit testing, integration testing, review and release, as per the business requirements.
  • Worked in the Advanced Operational Analytics and Big Data Analysis team.
  • Designed the business layer and database layer, and implemented transaction management into the existing architecture.
  • Worked in Agile environment and participated in daily Stand-ups/Scrum Meetings.
  • Worked on NOSQL databases such as MongoDB, HBase and Cassandra to enhance scalability and performance.
  • Created a Load Balancer on AWS EC2 for a stable cluster and services which provide fast and effective processing of data.
  • Connected to Amazon Redshift through Tableau to extract live data for real time analysis.
  • Used AWS Lambda to perform data validation, filtering, sorting or other transformations for every data change in the HBase table and load the transformed data to another data store (see the sketch after this list).
  • Integrated Hadoop frameworks/technologies such as Hive and HBase to further operational and analytical experience.
  • Loaded data from different servers to an S3 bucket and set appropriate bucket permissions.
  • Created Hive queries for supporting the existing application.
  • Wrote HiveQL and managed the Hive metastore server to control different advanced activities.
  • Worked with statistical analysis patterns and created dashboards for quick reference, shared with internal customers on a daily, weekly or monthly basis.
  • Worked on partitioning Hive tables and running scripts in parallel to reduce the run time of the scripts.
  • Implemented business logic by writing UDFs and configuring CRON Jobs.
  • Worked with streaming and data warehousing projects.
  • Installed and configured Hive and written Hive UDFs.
  • Worked with JSON scripts, MongoDB and a UNIX environment for NoSQL data clean-up and grouping, and created the analysis reports.
  • Wrote Python scripts and Java code for business applications and MapReduce programs.
  • Worked with the Hive warehouse directory, Hive tables and services.
  • Used the Spark shell for interactive data analysis and processing, using Spark SQL to query structured data.
  • Created stored procedures to communicate with the SQL database.
  • Involved in writing complex SQL queries and provided SQL scripts for the configuration data used by the application.
  • Developed Tableau data visualization using Cross tabs, Heat maps, Box and Whisker charts, Scatter Plots, Geographic Map, Pie Charts and Bar Charts and Density Chart.
  • Used Tableau to generate dashboards and the statistical reports, and created a portal using the Tableau JavaScript API.
  • Worked closely with the business analyst for requirement gathering and translating it into technical documentation.
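One bullet above describes using AWS Lambda to validate, filter and sort changed records before loading them onward. A minimal, hypothetical handler sketch; the event shape, field names and return value are illustrative assumptions, since in practice the HBase change events would arrive through whatever trigger (stream, queue, or API) fronts the table:

```python
import json

REQUIRED_FIELDS = {"row_key", "metric", "value"}  # assumed record layout


def is_valid(record):
    """Basic validation: required fields present and value is numeric."""
    if not REQUIRED_FIELDS.issubset(record):
        return False
    try:
        float(record["value"])
    except (TypeError, ValueError):
        return False
    return True


def lambda_handler(event, context):
    """Validate, filter, and sort incoming change records.

    `event["records"]` is an assumed structure for this sketch; a real
    deployment would read from whatever trigger delivers the table changes.
    """
    records = [json.loads(r) if isinstance(r, str) else r
               for r in event.get("records", [])]

    clean = sorted((r for r in records if is_valid(r)),
                   key=lambda r: r["row_key"])

    # In the real pipeline the transformed rows would be written to the
    # target data store here; this sketch just returns them.
    return {"accepted": clean, "rejected_count": len(records) - len(clean)}
```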

Environment: NOSQL, MongoDB 3.6, HBase 1.2, Cassandra, AWS, EC2, Agile, Amazon Redshift, Hadoop frameworks, S3, UDFs, Json, Scripts, UNIX, MapReduce, Python, R, Tableau

Confidential, Brentwood, TN

Data Modeler/Data Architect

Responsibilities:

  • Responsible for the data architecture design delivery, data model development, review, approval and Data Warehouse implementation.
  • Designed and developed the conceptual, then logical and physical data models to meet the needs of reporting.
  • Familiarity with a NoSQL database such as MongoDB.
  • Involved in designing and developing Data Models and Data Marts that support the Business Intelligence Data Warehouse.
  • Implemented the logical and physical relational database and maintained database objects in the data model using Erwin 9.5.
  • Responsible for Big data initiatives and engagement including analysis, brainstorming, POC, and architecture.
  • Used an SDLC methodology of Data Warehouse development using Kanbanize.
  • Worked with the Hadoop ecosystem covering HDFS, HBase, YARN and MapReduce.
  • Performed data mapping and data design (data modeling) to integrate the data across the multiple databases into the EDW.
  • Designed both 3NF data models and dimensional data models using Star and Snowflake schemas (see the DDL sketch after this list).
  • Involved in Normalization/Denormalization techniques for optimum performance in relational and dimensional database environments.
  • Developed Master data management strategies for storing reference data.
  • Worked with Data Stewards and Business Analysts to gather requirements for the MDM project.
  • Involved in Testing like Unit testing, System integration and regression testing.
  • Worked with SQL Server Analysis Services (SSAS) and SQL Server Reporting Services (SSRS).
  • Worked on data modeling and advanced SQL with columnar databases using AWS.
  • Performed reverse engineering of the dashboard requirements to model the required data marts.
  • Developed a Source to Target Matrix with ETL transformation logic for the ETL team.
  • Cleansed, extracted and analyzed business data on a daily basis and prepared ad-hoc analytical reports using Excel and T-SQL.
  • Created Data Migration and Cleansing rules for the Integration Architecture (OLTP, ODS, DW).
  • Handled performance requirements for databases in OLTP and OLAP models.
  • Conducted meetings with business and development teams for data validation and end-to-end data mapping.
  • Responsible for Metadata Management, keeping up to date centralized metadata repositories using Erwin modeling tools.
  • Involved in debugging and tuning the PL/SQL code, tuning queries and optimization for the SQL database.
  • Led data migration from legacy systems into modern data integration frameworks from conception to completion.
  • Generated ad-hoc SQL queries using joins, database connections and transformation rules to fetch data from legacy DB2 and SQL Server 2014 database systems.
  • Managed the metadata for the Subject Area models for the Data Warehouse environment.
  • Generated DDL and created the tables and views in the corresponding architectural layers.
  • Handled importing of data from various data sources, performed transformations using MapReduce, loaded data into HDFS and extracted the data from MySQL into HDFS using Sqoop.
  • Involved in performing extensive back-end testing by writing SQL queries and PL/SQL stored procedures to extract the data from the SQL database.
  • Participated in code/design reviews and provided input into best practices for reports and universe development.
  • Involved in Netezza administration activities like backup/restore, performance tuning and security configuration.
  • Involved in the validation of the OLAP, unit testing and system testing of the OLAP report functionality and data displayed in the reports.
  • Created a high-level, industry-standard, generalized data model to convert into logical and physical models at later stages of the project using Erwin and Visio.
  • Participated in Performance Tuning using Explain Plan and TKPROF.
  • Involved in translating business needs into long-term architecture solutions and reviewing object models, data models and metadata.
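The bullets above cover designing star and snowflake dimensional models and generating the DDL for the corresponding tables. A minimal, hypothetical star-schema sketch with one fact table and two dimensions; it runs against SQLite purely so the snippet is self-contained, and the table and column names are illustrative:

```python
import sqlite3

DDL = """
CREATE TABLE dim_customer (
    customer_key  INTEGER PRIMARY KEY,   -- surrogate key
    customer_id   TEXT NOT NULL,         -- natural/business key
    customer_name TEXT,
    region        TEXT
);

CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,       -- e.g. 20140131
    full_date TEXT,
    month     INTEGER,
    year      INTEGER
);

CREATE TABLE fact_sales (
    sales_key    INTEGER PRIMARY KEY,
    customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
    date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
    quantity     INTEGER,
    amount       REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)   # create the star schema: two dimensions, one fact
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
print(tables)
conn.close()
```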

Environment: Erwin 9.5, HDFS, HBase, Hadoop, Metadata, MS Visio, SQL Server 2014, SDLC, PL/SQL, ODS, OLAP, OLTP, flat files.

Confidential, Plano, TX

Data Modeler/ Data Analyst

Responsibilities:

  • Created the Physical Data Model from the Logical Data Model using the Compare and Merge Utility in ER/Studio and worked with the naming standards utility.
  • Developed normalized Logical and Physical database models for designing an OLTP application.
  • Extensively used Star Schema methodologies in building and designing the logical data model into dimensional models.
  • Created database objects like tables, views, materialized views, procedures and packages using Oracle tools like PL/SQL and SQL*Loader, and handled exceptions.
  • Enforced referential integrity in the OLTP data model for consistent relationships between tables and efficient database design.
  • Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
  • Utilized SDLC and Agile methodologies such as SCRUM.
  • Involved in administrative tasks, including creation of database objects such as database, tables, and views, using SQL, DDL, and DML requests.
  • Worked on data analysis, data profiling, data modeling and data governance, identifying data sets, source data, source metadata, data definitions and data formats.
  • Loaded multi-format data from various sources like flat files, Excel and MS Access and performed file system operations (see the sketch after this list).
  • Used T-SQL stored procedures to transfer data from OLTP databases to the staging area and finally into data marts.
  • Worked on physical design for both SMP and MPP RDBMS, with an understanding of RDBMS scaling features.
  • Wrote SQL Queries, Dynamic-queries, sub-queries and complex joins for generating Complex Stored Procedures, Triggers, User-defined Functions, Views and Cursors.
  • Wrote simple and advanced SQL queries and scripts to create standard and ad hoc reports for senior managers.
  • Performed ETL SQL optimization, designed the OLTP system environment and maintained documentation of metadata.
  • Involved with data analysis, primarily identifying data sets, source data, source metadata, data definitions and data formats.
  • Worked with developers on data normalization and de-normalization and performance tuning issues, and aided in stored procedures as needed.
  • Used Teradata for OLTP systems by generating models to support Revenue Management applications that connect to SAS.
  • Created SSIS Packages for import and export of data between Oracle database and others like MS Excel and Flat Files.
  • Worked in the capacity of ETL Developer (Oracle Data Integrator (ODI) / PL/SQL) to migrate data from various sources into the target Oracle Data Warehouse.
  • Designed and Developed PL/SQL procedures, functions and packages to create Summary tables.
  • Involved in creating tasks to pull and push data from Salesforce to Oracle Staging/Data Mart.
  • Created VBA macros to convert the Excel input files into the correct format and loaded them to SQL Server.
  • Helped the BI and ETL developers in understanding the data model, data flow and the expected output for each model created.
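The bullets above describe loading multi-format flat-file and spreadsheet data into relational staging tables before it moves on to the data marts. A minimal, hypothetical Python sketch of the same load pattern using pandas; the column names are invented, the inline CSV stands in for the real Excel/Access/flat-file inputs, and SQLite is used only so the example is self-contained (the actual targets were SQL Server and Oracle via SSIS/ODI):

```python
import io
import sqlite3

import pandas as pd

# Stand-in for a flat-file extract; the real inputs were Excel, Access and flat files.
flat_file = io.StringIO(
    "customer_id,order_date,amount\n"
    "C001,2014-01-15,120.50\n"
    "C002,2014-01-16,75.00\n"
)

df = pd.read_csv(flat_file, parse_dates=["order_date"])

# Light cleanup comparable to the staging-layer transformations described above.
df["amount"] = df["amount"].round(2)
df = df.dropna(subset=["customer_id"])

# Load into a relational staging table.
conn = sqlite3.connect(":memory:")
df.to_sql("stg_orders", conn, index=False, if_exists="replace")
print(conn.execute("SELECT COUNT(*) FROM stg_orders").fetchone())
conn.close()
```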

Environment: ER/Studio 8.0, Oracle 10g Application Server, Oracle Developer Suite, PL/SQL, T-SQL, SQL*Plus, SSIS, Teradata 13, OLAP, OLTP, SAS, MS Excel.

Confidential, Denver, CO

Data Modeler/Data Architect

Responsibilities:

  • Led the design and modeling of tactical architectures for development, delivery and support of projects.
  • Developed full lifecycle software, including defining requirements, prototyping, designing, coding, testing and maintaining software.
  • Used an Agile methodology of Data Warehouse development using Kanbanize.
  • Interacted with business users to analyze the business process and requirements, transformed requirements into conceptual, logical and physical data models, designed the database, and documented and rolled out the deliverables.
  • Responsible for Master Data Management (MDM) and Data Lake design and architecture; the Data Lake is built using Cloudera Hadoop.
  • Involved in normalization and de-normalization of existing tables for faster query retrieval, and designed both 3NF data models for ODS/OLTP systems and dimensional data models using star and snowflake schemas.
  • Used forward engineering to create a physical data model with DDL that best suits the requirements from the logical data model.
  • Used Erwin for reverse engineering to connect to the existing database and ODS to create a graphical representation in the form of entity relationships and elicit more information.
  • Implemented the full lifecycle in data warehouses and business data marts with star schemas, snowflake schemas, SCDs and dimensional modeling.
  • Responsible for dimensional data modeling and modeling diagrams using Erwin.
  • Designed and documented use cases, activity diagrams, sequence diagrams and OOD (Object Oriented Design) using UML and Visio.
  • Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Exported data from the HDFS environment into RDBMS using Sqoop for report generation and visualization purposes.
  • Extracted files from Cassandra and MongoDB through Sqoop and placed them in HDFS for processing.
  • Implemented dynamic partitioning and bucketing in Hive as part of performance tuning, and automated the workflow and coordination files using the Oozie framework (see the partitioning sketch after this list).
  • Developed Pig Latin scripts to replace the existing legacy process in Hadoop, with the data fed to AWS S3.
  • Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Involved with data profiling for multiple sources and answered complex business questions by providing data to business users.
  • Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
  • Developed SQL and BTEQ (Teradata) queries for extracting data from the production database and built data structures and reports.
  • Wrote and executed unit, system, integration and UAT scripts in data warehouse projects.
  • Created queries using BI-Reporting variables, navigational attributes and filters; developed workbooks and InfoSet queries; defined reports as per reporting requirements.
  • Implemented slowly changing and rapidly changing dimension methodologies; created aggregate fact tables for the creation of ad-hoc reports.
  • Created and maintained surrogate keys on the master tables to handle SCD Type 2 changes effectively.
  • Worked with reverse-engineered data models from database instances and scripts.
  • Implemented the Slowly Changing Dimensions as per the requirement.
  • Ran quality checks using SQL queries and kept all databases in sync with the Erwin model across all environments.
  • Deployed naming standards to the data model and followed company standards for project documentation.
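The list above mentions dynamic partitioning and bucketing in Hive as a performance-tuning step. A minimal, hypothetical PySpark/Hive sketch of that technique; the table names, columns and bucket count are illustrative assumptions, and a `sales_staging` source table is presumed to exist:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partition-bucket-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Allow dynamic partitions so the partition value comes from the data itself.
spark.sql("SET hive.exec.dynamic.partition = true")
spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

# Partitioned target table (hypothetical schema).
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_part (
        order_id BIGINT,
        amount   DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    STORED AS ORC
""")

# Dynamic-partition insert: the last selected column feeds the partition key.
spark.sql("""
    INSERT OVERWRITE TABLE sales_part PARTITION (order_date)
    SELECT order_id, amount, order_date
    FROM sales_staging
""")

# Bucketed variant of the same table, declared in Hive DDL; loading bucketed
# tables is typically done from the Hive side, since Spark's support for
# writing Hive-bucketed tables is limited.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_bucketed (
        order_id BIGINT,
        amount   DOUBLE
    )
    CLUSTERED BY (order_id) INTO 16 BUCKETS
    STORED AS ORC
""")
```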

Environment: Erwin 9.6, Agile, MDM, Kanbanize, SQL, BTEQ, Teradata r14, DBA, ODS, OLTP, OOD, UML, ETL, Hadoop 3.0, Cassandra, MongoDB, Sqoop 1.4, HDFS, Oozie, Pig.

Confidential, Denver, CO

Data Analyst/Data Modeler

Responsibilities:

  • Worked on the team responsible for the analysis of business requirements and design and implementation of the business solution.
  • Developed logical and physical data models for central model consolidation.
  • Worked with DBAs to create a best-fit physical data model from the logical data model.
  • Conducted data modeling JAD sessions and communicated data-related standards.
  • Used Erwin r8 for effective model management, sharing, dividing and reusing model information and design for productivity improvement.
  • Used Star/Snowflake schemas in the data warehouse architecture.
  • Redefined many attributes and relationships in the reverse-engineered model and cleansed unwanted tables/columns as part of data analysis responsibilities.
  • Developed process methodology for the reverse engineering phase of the project.
  • Used reverse engineering to connect to the existing database and create a graphical representation (E-R diagram).
  • Utilized Erwin's reverse engineering and target database schema conversion process.
  • Involved in logical and physical designs and transformed logical models into physical implementations.
  • Created 3NF business area data models with de-normalized physical implementation, and performed data and information requirements analysis using the Erwin tool.
  • Involved in extensive data analysis on Teradata and Oracle systems, querying and writing in SQL and Toad.
  • Involved in using the ETL tool Informatica to populate the database and transform data from the old database to the new database using Oracle and SQL Server.
  • Created database objects like tables, views, materialized views, procedures and packages using Oracle tools like PL/SQL, SQL*Plus and SQL*Loader, and handled exceptions.
  • Used Informatica Designer, Workflow Manager and Repository Manager to create source and target definitions, design mappings, create repositories, and establish users, groups and their privileges.
  • Involved in data profiling to detect and correct inaccurate data and maintain data quality.
  • Developed Data Migration and Cleansing rules for the Integration Architecture (OLTP, ODS, DW).
  • Involved in the creation and maintenance of the Data Warehouse and repositories containing metadata.
  • Developed Star and Snowflake schema based dimensional models to develop the data warehouse.
  • Involved in the study of the business logic and understanding of the physical system and the terms and conditions for the database.
  • Worked closely with the ETL SQL Server Integration Services (SSIS) developers to explain the data transformation.
  • Created reports using SQL Server Reporting Services (SSRS) for customized and ad-hoc queries.
  • Created documentation and test cases, and worked with users for new module enhancements and testing.
  • Created simple and complex mappings using DataStage to load dimension and fact tables as per star schema techniques.
  • Designed and developed Oracle database tables, views and indexes with proper privileges, and maintained and updated the database by deleting and removing old data.
  • Generated ad-hoc reports using Crystal Reports.

Environment: Erwin r8, Informatica 7.0, Windows XP, Oracle 10g, SQL Server 2008, MS Excel, MS Visio, Microsoft Transaction Server, Crystal Reports, SQL*Loader.
