- Over 9 years of experience as a Sr. Data Engineer, including designing, developing, and implementing data models for enterprise-level applications and systems.
- Designed data pipelines that capture data from streaming web sources as well as RDBMS sources.
- Strong understanding of Software Development Lifecycle (SDLC) and various methodologies (Waterfall, Agile).
- Extensive experience building on the Azure stack (including but not limited to Key Vault, ADW, Data Factory, Cosmos DB, Event Hubs, and Stream Analytics).
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Management Systems (RDBMS) and from RDBMS to HDFS.
- Extensive experience with Hadoop, HDFS, Hive, Pig, MapReduce, and Spark.
- Expertise in designing star and snowflake schemas for data warehouses using tools such as Erwin Data Modeler, PowerDesigner, and ER/Studio.
- Responsible for data cleansing and transformation of time-series data with PySpark.
- Extensive use of the open-source languages Perl, Python, and Scala.
- Extensive experience in development of OLAP, OLTP, PL/SQL, Stored Procedures, Triggers, Functions, Packages, performance tuning and optimization for business logic implementation.
- Experience in developing MapReduce programs using Apache Hadoop to analyze big data as required.
- Good understanding of AWS, big data concepts and Hadoop ecosystem.
- Experience in developing customized UDFs in Python to extend Hive and Pig Latin functionality.
- Good working knowledge of real-time data integration using Kafka, Spark Streaming, and HBase.
- Designed data marts with dimensional data modeling using star and snowflake schemas.
- Extensive experience working with XML data and XML schema design.
- Excellent knowledge and extensive use of NoSQL databases (HBase).
- Expertise in SQL Server Analysis Services (SSAS) and SQL Server Reporting Services (SSRS).
- Experience in setting up connections to different RDBMS databases such as Oracle and Teradata according to user requirements.
- Knowledge in job workflow scheduling and monitoring tools like Oozie.
- Excellent Knowledge of Ralph Kimball and Bill Inmon's approaches to Data Warehousing.
- Good knowledge of Data Marts and Dimensional Data Modeling with the Ralph Kimball methodology using Analysis Services.
- Experience in Data Scrubbing/Cleansing, Data Quality, Data Mapping, Data Profiling, and Data Validation in ETL.
- Experience with Tableau in analysis and creation of dashboards and user stories.
- Experience with SQL Server and T-SQL in constructing temporary tables, table variables, triggers, user-defined functions, views, and stored procedures.
- Strong experience in Data Analysis, Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export.
- Experience in writing and executing unit, system, integration, and UAT scripts in data warehouse projects.
- Experienced in writing UNIX shell scripts, with hands-on experience scheduling shell scripts using Control-M.
- Experience in Batch processes, Import, Export, Backup, Database Monitoring tools and Application support.
Data Modeling Tools: Erwin 9.8/9.7, Sybase Power Designer, Oracle Designer, ER/Studio V17
Big Data tools: Hadoop 3.0, HDFS, Hive 2.3, Pig 0.17, Scala, HBase 1.2, Sqoop 1.4, Kafka 1.1, Oozie 4.3
Database Tools: Oracle 12c/11g, Teradata 15/14 and
ETL Tools: SSIS, Informatica v10.
Programming Languages: SQL, T-SQL, Python, UNIX shell scripting, PL/SQL.
Operating Systems: Microsoft Windows 10/8/7, UNIX, and Linux
Project Execution Methodologies: JAD, Agile, SDLC.
Confidential - Dallas, TX
Sr. Data Engineer
- Implemented data pipeline infrastructure to support machine learning systems.
- Responsibilities included full SDLC management: design, analysis, development, testing, implementation, and application support.
- Initiated and conducted JAD sessions, inviting various teams to finalize the required data fields and their formats.
- Created data pipelines in the cloud using Azure Data Factory.
- Authored Python (PySpark) scripts with custom UDFs for row/column manipulations, merges, aggregations, stacking, data labeling, and all cleaning and conforming tasks.
- Built and implemented a real-time streaming ETL pipeline using the Kafka Streams API.
- Implemented dimensional modeling using star and snowflake schemas, identifying facts and dimensions, and performed physical and logical data modeling using Erwin.
- Created Data Dictionary and Data Mapping from Sources to the Target in MDM Data Model.
- Created SQL scripts to find data quality issues and to identify keys, data anomalies, and data validation issues.
- Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data to and from different sources such as Azure SQL, Data Lake, Data Factory, Blob Storage, Azure SQL Data Warehouse, and a write-back tool.
- Performed Hive programming for applications that were migrated to big data using Hadoop.
- Built and maintained SQL scripts, indexes, and complex queries for data analysis and extraction.
- Assisted in overseeing compliance with Enterprise Data Standards, data governance, and data quality.
- Provided analytical reports to end users using Tableau integrated with Hadoop.
- Converted SQL procedures in SQL Server to Spark SQL.
- Replaced existing MapReduce programs and Hive queries with Spark applications written in Scala.
- Defined key control management using the Azure Key Vault API.
- Implemented an Azure cloud solution using HDInsight, Event Hubs, Cosmos DB, Cognitive Services, and Key Vault.
- Developed Scala applications on Hadoop and Spark SQL for high-volume and real-time data processing.
- Designed and developed a Data Lake using Hadoop for processing raw and processed claims via Hive.
- Built solutions for real-time data processing using Kafka mirroring, Kafka queues, Key Vault, Azure Data Warehouse, Power BI, Analysis Services, and Cosmos DB.
- Loaded data into Hive Tables from Hadoop Distributed File System (HDFS) to provide SQL access on Hadoop data
- Preprocessed data using Pig scripts so that it could be used for data analysis.
- Used Oozie to automate the process of loading data into HDFS.
- Developed and automated multiple departmental reports using Tableau and MS Excel.
- Transformed structured, semi-structured, and unstructured data and loaded it into HBase.
- Moved data onto HDFS from local system and vice versa.
- Designed and developed T-SQL stored procedures to extract, aggregate, transform, and insert data.
- Used SSRS to create customized, on-demand, and ad-hoc reports, and analyzed multi-dimensional reports in SSRS.
Environment: Erwin 9.8, Azure, ETL, Kafka 1.1, MDM, SQL, Hive 2.3, Python, PySpark, Big Data 3.0, Hadoop 3.0, Spark SQL, Tableau, Scala, Cosmos DB, HDInsight, MapReduce, HDFS, MS Excel, HBase 1.2, T-SQL, Oozie 4.3, Pig 0.17, SSRS.
Confidential - Nashville, TN
- Used the Agile Scrum methodology across the different phases of the software development lifecycle.
- Implemented a proof of concept deploying this product in Amazon Web Services (AWS).
- Transformed data moved onto HDFS into a single file using Pig scripts and Python.
- Designed the logical data model using Erwin, with entities and attributes for each subject area.
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Designed new SAS programs by analyzing requirements, constructing workflow charts and diagrams, studying system capabilities and writing specifications.
- Exported and imported data from RDBMS in different countries to Hadoop using Sqoop.
- Created complex SQL queries using views, indexes, triggers, roles, stored procedures, and user-defined functions. Worked with different methods of logging in SSIS.
- Performed Verification, Validation, and Transformations on the Input data (Text files, XML files) before loading into target database.
- Extracted the source data from Oracle tables, sequential files, and Excel sheets.
- Loaded data into Hive tables from Hadoop Distributed File System (HDFS) to provide SQL access on Hadoop data.
- Extensively used Kafka along with HBase and Apache Hive.
- Tested the ETL process both before and after data validation.
- Created numerous Volatile, Global, Set, and Multi-Set tables on Teradata.
- Generated parameterized queries for tabular reports using global variables, expressions, functions, and stored procedures in SSRS.
- Created Hive queries that helped analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Developed and automated multiple departmental reports using Tableau Software.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from NoSQL and a variety of portfolios.
- Developed and implemented data cleansing, data security, data profiling, and data monitoring processes.
- Used MS Access to dump data and analyze it based on business needs.
Environment: Erwin 9.7, Agile, Python, Pig 0.17, AWS, MapReduce, SAS, SQL, SSIS, XML, Hadoop 3.0, Hive 2.3, Kafka 1.1, HBase 1.2, ETL, Teradata 15, SSRS, Access, Tableau.
Confidential - Houston, TX
Data Modeler/Data Engineer
- Extensively performed Data analysis using Python Pandas.
- Worked in a Scrum Agile process, writing stories with two-week iterations and delivering product for each iteration.
- Extracted Mega Data from Amazon Redshift using SQL queries to create reports.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
- Worked on data integration and workflow applications on the SSIS platform and was responsible for testing all new and existing ETL data warehouse components.
- Performed data analysis and data profiling using complex SQL on various source systems, including Oracle PL/SQL.
- Enforced referential integrity in the OLAP and OLTP data models for consistent relationships between tables and efficient database design.
- Created tables in Hive and loaded structured data resulting from MapReduce jobs.
- Implemented star schema methodologies in modeling and designing the logical data model into dimensional models.
- Used SAP PowerDesigner as the data modeling tool for creating physical data models and for effective model management through sharing and reusing model information.
- Designed and implemented solutions using Teradata utilities such as FastExport and MultiLoad (MLOAD) for handling numerous tasks.
- Moved data from Hive tables into HBase for real-time analytics on Hive tables.
- Involved in Normalization/Denormalization techniques for optimum performance in relational and dimensional database environments.
- Performed data validation by writing several complex SQL queries, was involved in back-end testing, and worked on data quality issues.
- Used SQL Profiler for monitoring and troubleshooting performance issues in T-SQL code and stored procedures.
- Performed UAT testing before Production Phase of the database components being built.
- Manipulated, cleansed, and processed data using Excel and SQL. Responsible for loading, extracting, and validating client data.
Environment: SAP PowerDesigner, Agile, Python, Redshift, SQL, HDFS, Hive, MapReduce, SSIS, ETL, PL/SQL, Oracle, OLAP, OLTP, HBase, Excel, T-SQL, Teradata.
Confidential - Atlanta, GA
Data Analyst/Data Modeler
- Performed Data Modeling, Database Design, and Data Analysis with the extensive use of ER/Studio.
- Developed Python programs to read data from various Teradata sources and convert it into a single CSV file.
- Developed normalized logical and physical database models for designing an OLTP application.
- Performed extensive data validation by writing several complex SQL queries, was involved in back-end testing, and worked on data quality issues.
- Created a Data Mapping document after each assignment and wrote the transformation rules for each field as applicable.
- Created data flow, process documents and ad-hoc reports to derive requirements for existing system enhancements.
- Performed extensive data analysis to identify various data quality issues with the data coming from external systems
- Synthesized and translated Business data needs into creative visualizations in Tableau
- Introduced a Data Dictionary for the process, which simplified a lot of the work around the project.
- Created Stored Procedures to transform the Data & worked extensively in T-SQL for various needs of the transformations while loading the data.
- Created a number of standard and complex reports to analyze data using slice and dice, drill-down, and drill-through in SSRS.
- Created Informatica mappings, wrote UNIX shell scripts, and modified PL/SQL scripts.
- Developed complex Stored Procedures for SSRS (SQL Server Reporting Services) and created database objects like tables, indexes etc.
- Designed different types of star schemas for detailed data marts and plan data marts in OLAP.
- Worked extensively on shell scripts for running SSIS programs in batch mode on UNIX.
- Wrote multiple SQL queries to analyze the data and presented the results using Excel and Crystal Reports.
- Worked on SQL stored procedures, functions and packages in Oracle.
Environment: ER/Studio, Python, Teradata, OLTP, SQL, Tableau, SSRS, T-SQL, OLAP, PL/SQL, UNIX, SSIS, Oracle, Excel