Sr. Data Engineer Resume Dallas, TX - Hire IT People

SUMMARY

Over 9 years of experience as a Sr. Data Engineer, including designing, developing, and implementing data models for enterprise-level applications and systems.
Designed data pipelines that capture data from streaming web sources as well as RDBMS sources.
Strong understanding of Software Development Lifecycle (SDLC) and various methodologies (Waterfall, Agile).
Extensive experience building on the Azure stack (including but not limited to Key Vault, ADW, Data Factory, Cosmos DB, Event Hubs, and Stream Analytics).
Experience in importing and exporting data using Sqoop, from HDFS to Relational Database Management Systems (RDBMS) and from RDBMS to HDFS.
Extensive experience with Hadoop, HDFS, Hive, Pig, MapReduce, and Spark.
Expertise in designing star and snowflake schemas for data warehouses using tools such as Erwin Data Modeler, PowerDesigner, and ER/Studio.
Responsible for data cleansing and transformation of time-series data with PySpark.
Extensive use of the open-source languages Perl, Python, and Scala.
Extensive experience in development of OLAP, OLTP, PL/SQL, Stored Procedures, Triggers, Functions, Packages, performance tuning and optimization for business logic implementation.
Experience in developing MapReduce programs using Apache Hadoop to analyze big data as per requirements.
Good understanding of AWS, big data concepts and Hadoop ecosystem.
Experience in developing customized UDFs in Python to extend Hive and Pig Latin functionality.
Good working knowledge of real-time data integration using Kafka, Spark Streaming, and HBase.
Designed data marts with dimensional data modeling using star and snowflake schemas.
Extensive experience working with XML data and XML schema design.
Excellent knowledge of and extensive experience with NoSQL databases (HBase).
Expertise in SQL Server Analysis Services (SSAS) and SQL Server Reporting Services (SSRS).
Experience in setting up connections to different RDBMS databases such as Oracle and Teradata according to user requirements.
Knowledge of job workflow scheduling and monitoring tools like Oozie.
Excellent Knowledge of Ralph Kimball and Bill Inmon's approaches to Data Warehousing.
Good knowledge of Data Marts and Dimensional Data Modeling with the Ralph Kimball methodology using Analysis Services.
Experience in Data Scrubbing/Cleansing, Data Quality, Data Mapping, Data Profiling, and Data Validation in ETL.
Experience with Tableau in analysis and creation of dashboards and user stories.
Experience with SQL Server and T-SQL in constructing temporary tables, table variables, triggers, user functions, views, and stored procedures.
Strong experience in Data Analysis, Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export.
Experience in writing and executing unit, system, integration, and UAT scripts in data warehouse projects.
Experienced in writing UNIX shell scripts, with hands-on experience scheduling shell scripts using Control-M.
Experience in Batch processes, Import, Export, Backup, Database Monitoring tools and Application support.

TECHNICAL SKILLS

Data Modeling Tools: Erwin 9.8/9.7, Sybase Power Designer, Oracle Designer, ER/Studio V17

Big Data tools: Hadoop 3.0, HDFS, Hive 2.3, Pig 0.17, Scala, HBase 1.2, Sqoop 1.4, Kafka 1.1, Oozie 4.3

Database Tools: Oracle 12c/11g, Teradata 15/14 and

ETL Tools: SSIS, Informatica v10.

Programming Languages: SQL, T-SQL, Python, UNIX shell scripting, PL/SQL.

Operating Systems: Microsoft Windows 10/8/7, UNIX, and Linux

Project Execution Methodologies: JAD, Agile, SDLC.

PROFESSIONAL EXPERIENCE

Confidential - Dallas, TX

Sr. Data Engineer

Responsibilities:

Implemented data pipeline infrastructure to support machine learning systems.
Responsibilities included full SDLC management: design, analysis, development, testing, implementation, and application support.
Initiated and conducted JAD sessions, inviting various teams to finalize the required data fields and their formats.
Created data pipelines in the cloud using Azure Data Factory.
Authored Python (PySpark) scripts with custom UDFs for row/column manipulations, merges, aggregations, stacking, data labeling, and all cleaning and conforming tasks.
Built and implemented a real-time streaming ETL pipeline using the Kafka Streams API.
Implemented dimensional modeling using star and snowflake schemas, identifying facts and dimensions, and performed physical and logical data modeling using Erwin.
Created Data Dictionary and Data Mapping from Sources to the Target in MDM Data Model.
Created SQL scripts to find data quality issues and to identify keys, data anomalies, and data validation issues.
Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data to and from sources such as Azure SQL, Data Lake, Data Factory, Blob Storage, Azure SQL Data Warehouse, and a write-back tool.
Performed Hive programming for applications that were migrated to big data using Hadoop.
Built and maintained SQL scripts, indexes, and complex queries for data analysis and extraction.
Assisted in overseeing compliance with enterprise data standards, data governance, and data quality.
Provided analytical reports to end users using Tableau integrated with Hadoop.
Converted SQL procedures in SQL Server to Spark SQL.
Replaced existing MapReduce programs and Hive queries with Spark applications written in Scala.
Defined key management controls using the Azure Key Vault API.
Implemented an Azure cloud solution using HDInsight, Event Hubs, Cosmos DB, Cognitive Services, and Key Vault.
Developed Scala applications on Hadoop and Spark SQL for high-volume and real-time data processing.
Designed and developed a Data Lake using Hadoop for processing raw and processed claims via Hive.
Built solutions for real-time data processing using Kafka mirroring, Kafka queues, Key Vault, Azure Data Warehouse, Power BI, Analysis Services, and Cosmos DB.
Loaded data into Hive Tables from Hadoop Distributed File System (HDFS) to provide SQL access on Hadoop data
Preprocessed data using Pig scripts so that it could be used for data analysis.
Used Oozie to automate the process of loading data into HDFS.
Developed and automated multiple departmental reports using Tableau and MS Excel.
Transformed structured, semi-structured, and unstructured data and loaded it into HBase.
Moved data from the local file system onto HDFS and vice versa.
Designed and developed T-SQL stored procedures to extract, aggregate, transform, and insert data.
Used SSRS to create standard, customized, on-demand, and ad hoc reports, and analyzed multi-dimensional reports in SSRS.

Environment: Erwin 9.8, Azure, ETL, Kafka 1.1, MDM, SQL, Hive 2.3, Python, PySpark, Big Data 3.0, Hadoop 3.0, Spark SQL, Tableau, Scala, Cosmos DB, HDInsight, MapReduce, HDFS, MS Excel, HBase 1.2, T-SQL, Oozie 4.3, Pig 0.17, SSRS.

Confidential - Nashville, TN

Data Engineer

Responsibilities:

Used the Agile Scrum methodology across the different phases of the software development lifecycle.
Implemented a proof of concept deploying the product on Amazon Web Services (AWS).
Transformed data moved onto HDFS into a single file using Pig scripts and Python.
Designed the logical data model using Erwin, with the entities and attributes for each subject area.
Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
Designed new SAS programs by analyzing requirements, constructing workflow charts and diagrams, studying system capabilities and writing specifications.
Exported and imported data from RDBMS in different countries to Hadoop using Sqoop.
Created complex SQL queries using views, indexes, triggers, roles, stored procedures, and user-defined functions.
Worked with different methods of logging in SSIS.
Performed Verification, Validation, and Transformations on the Input data (Text files, XML files) before loading into target database.
Extracted the source data from Oracle tables, sequential files, and Excel sheets.
Loaded data into Hive tables from Hadoop Distributed File System (HDFS) to provide SQL access on Hadoop data.
Extensively used Kafka along with HBase and Apache Hive.
Tested the ETL process both before and after data validation.
Created numerous Volatile, Global, Set, and Multiset tables on Teradata.
Wrote parameterized queries to generate tabular reports using global variables, expressions, functions, and stored procedures in SSRS.
Created Hive queries that helped analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
Developed and automated multiple departmental reports using Tableau.
Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from NoSQL and a variety of portfolios.
Developed and implemented data cleansing, data security, data profiling, and data monitoring processes.
Used MS Access to dump data and analyze it based on business needs.

Environment: Erwin 9.7, Agile, Python, Pig 0.17, AWS, MapReduce, SAS, SQL, SSIS, XML, Hadoop 3.0, Hive 2.3, Kafka 1.1, HBase 1.2, ETL, Teradata 15, SSRS, Access, Tableau.

Confidential - Houston, TX

Data Modeler/Data Engineer

Responsibilities:

Extensively performed Data analysis using Python Pandas.
Worked in a Scrum Agile process, writing stories with two-week iterations and delivering product in each iteration.
Extracted large volumes of data from Amazon Redshift using SQL queries to create reports.
Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
Worked on data integration and workflow applications on the SSIS platform and was responsible for testing all new and existing ETL data warehouse components.
Performed data analysis and data profiling using complex SQL on various source systems, including Oracle PL/SQL.
Enforced referential integrity in the OLAP and OLTP data models for consistent relationships between tables and efficient database design.
Created tables in Hive and loaded structured data resulting from MapReduce jobs.
Applied star schema methodologies in modeling and designing the logical data model as dimensional models.
Used SAP PowerDesigner as the data modeling tool for creating physical data models and for model management, including sharing and reusing model information.
Designed and implemented solutions using Teradata utilities such as FastExport and MultiLoad (MLOAD) for handling numerous tasks.
Moved data from Hive tables into HBase for real-time analytics on Hive tables.
Involved in Normalization/Denormalization techniques for optimum performance in relational and dimensional database environments.
Performed data validation by writing several complex SQL queries, took part in back-end testing, and worked on data quality issues.
Used SQL Profiler for monitoring and troubleshooting performance issues in T-SQL code and stored procedures.
Performed UAT before the production phase of the database components being built.
Manipulated, cleansed, and processed data using Excel and SQL. Responsible for loading, extracting, and validating client data.

Environment: SAP PowerDesigner, Agile, Python, Redshift, SQL, HDFS, Hive, MapReduce, SSIS, ETL, PL/SQL, Oracle, OLAP, OLTP, HBase, Excel, T-SQL, Teradata.

Confidential - Atlanta, GA

Data Analyst/Data Modeler

Responsibilities:

Performed Data Modeling, Database Design, and Data Analysis with the extensive use of ER/Studio.
Developed Python programs to read data from various Teradata sources and convert it into a single CSV file.
Developed normalized logical and physical database models for designing an OLTP application.
Performed extensive data validation by writing several complex SQL queries, took part in back-end testing, and worked on data quality issues.
Created a Data Mapping document after each assignment and wrote the transformation rules for each field as applicable.
Created data flow, process documents and ad-hoc reports to derive requirements for existing system enhancements.
Performed extensive data analysis to identify various data quality issues with the data coming from external systems
Synthesized and translated Business data needs into creative visualizations in Tableau
Introduced a Data Dictionary for the process, which simplified a lot of the work around the project.
Created Stored Procedures to transform the Data & worked extensively in T-SQL for various needs of the transformations while loading the data.
Created a number of standard and complex reports to analyze data using slice and dice, drill-down, and drill-through in SSRS.
Created Informatica mappings, wrote UNIX shell scripts, and modified PL/SQL scripts.
Developed complex Stored Procedures for SSRS (SQL Server Reporting Services) and created database objects like tables, indexes etc.
Designed different types of star schemas for detailed data marts and plan data marts in the OLAP environment.
Extensively worked on shell scripts for running SSIS programs in batch mode on UNIX.
Wrote multiple SQL queries to analyze the data and presented the results using Excel and Crystal Reports.
Worked on SQL stored procedures, functions and packages in Oracle.

Environment: ER/Studio, Python, Teradata, OLTP, SQL, Tableau, SSRS, T-SQL, OLAP, PL/SQL, UNIX, SSIS, Oracle, Excel
