- Over 7 years of professional experience in IT, including Big Data, the Hadoop ecosystem for data processing, data warehousing, and data pipeline design and implementation.
- Expert knowledge of the Software Development Life Cycle (SDLC), with involvement in all phases of projects.
- Expertise in using cloud-based managed services for data warehousing in Azure (Azure Data Lake Storage, Azure Data Factory).
- Strong experience with big data processing using Hadoop technologies: MapReduce, Apache Spark, Apache Hive, and Apache Pig.
- Good understanding of cloud configuration in Amazon web services (AWS).
- Strong experience in dimensional modeling using Star and Snowflake schema methodologies for data warehouse and integration projects.
- Excellent proficiency in Agile/Scrum and waterfall methodologies.
- Extensive experience in using ER modeling tools such as Erwin and ER/Studio.
- Experience in integration of various data sources with multiple Relational Databases like SQL Server, Teradata, and Oracle.
- Experience in data ingestion projects, loading data into the data lake from multiple source systems using Talend Big Data.
- Excellent knowledge in Data Analysis, Data Validation, Data Cleansing, Data Verification and identifying data mismatch.
- Proficient in data governance, data quality, metadata management, master data management.
- Experience in creating ETL specification documents, flowcharts, process workflows, and data flow diagrams.
- Experience in executing batch jobs and processing data streams with Spark Streaming.
- Good knowledge in streaming applications using Apache Kafka.
- Hands on experience in working with Tableau Desktop, Tableau Server and Tableau Reader in various versions.
- Extended Hive and Pig core functionality using custom UDFs.
- Experience in designing both time driven and data driven automated workflows using Oozie.
- Expertise in SQL Server Analysis Services (SSAS), SQL Server Reporting Services (SSRS) and SQL Server Integration Services.
- In-depth knowledge of T-SQL, SSAS, SSRS, SSIS, OLAP, OLTP, the BI suite, reporting, and analytics.
- Strong experience in using MS Excel and MS Access to load and analyze data based on business needs.
- Good communication skills, a strong work ethic, and the ability to work efficiently in a team, with good leadership skills.
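One lightweight way to extend Hive, as in the custom-UDF bullet above, is a streaming script used with Hive's TRANSFORM clause, which pipes tab-separated rows through any executable. This is a minimal sketch with hypothetical column names (id, name, phone), not the actual UDFs from these projects:

```python
# Hive TRANSFORM-style streaming transform: Hive sends rows as
# tab-separated lines on stdin and reads transformed lines from stdout.
# In production this would loop over sys.stdin; here we show the row logic.
import re

def transform_row(line: str) -> str:
    """Normalize one tab-separated row (id, name, phone):
    upper-case the name and keep only digits in the phone number."""
    user_id, name, phone = line.rstrip("\n").split("\t")
    clean_phone = re.sub(r"\D", "", phone)  # strip everything but digits
    return "\t".join([user_id, name.strip().upper(), clean_phone])

rows_in = ["1\tjane doe\t(555) 123-4567"]
rows_out = [transform_row(r) for r in rows_in]
# rows_out == ["1\tJANE DOE\t5551234567"]
```

From HiveQL such a script would be invoked with `SELECT TRANSFORM(id, name, phone) USING 'python transform.py' AS (id, name, phone) FROM src;` after an `ADD FILE`.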
Big Data Tools: HBase 1.2, Hive 2.3, Pig 0.17, HDFS, Sqoop 1.4, Kafka 1.0.1, Scala, Oozie 4.3, Hadoop 3.0, MapReduce, Spark
BI Tools: Tableau 10, Tableau server 10, SAP Business Objects, MS BI
Methodologies: JAD, System Development Life Cycle (SDLC), Agile, Waterfall Model.
ETL Tools: Informatica 9.6/9.1 and Tableau, Pentaho
Data Modeling Tools: Erwin Data Modeler 9.8, ER Studio v17, and Power Designer 16.6.
Databases: Oracle 12c, Teradata R15, MS SQL Server 2016, DB2, Snowflake SaaS
Cloud Architecture: Amazon AWS, EC2, Basic MS Azure, Google Cloud (GCP)
Programming Languages: SQL, PL/SQL, Python, UNIX shell Scripting
Operating System: Windows, Unix
Confidential, Eden Prairie, MN
- Developed complete end-to-end big data processing in the Hadoop ecosystem.
- Provided application support during the build and test phases of the SDLC for their product.
- Used Oozie for automating end-to-end data pipelines and Oozie coordinators for scheduling the workflows.
- Recreated existing application logic and functionality in the Azure Data Lake, Data Factory, Databricks, SQL Database, and SQL Data Warehouse environment.
- Performed data profiling and transformation on the raw data using Pig, Python, and Oracle.
- Developed predictive analytics using Apache Spark Scala APIs.
- Created dimensional model for the reporting system by identifying required dimensions and facts using Erwin.
- Developed and implemented a data pipeline using Kafka and Storm to store data in HDFS.
- Created automated Python scripts to convert data from different sources and generate ETL pipelines.
- Worked with Snowflake SaaS for cost effective data warehouse implementation on cloud.
- Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL Database.
- Developed customer cleanse functions, cleanse lists, and mappings for the MDM Hub.
- Worked extensively on Oracle PL/SQL, and SQL Performance Tuning.
- Worked on Python OpenStack APIs.
- Involved in modeling (Star Schema methodologies) in building and designing the logical data model into Dimensional Models.
- Created shared dimension tables, measures, hierarchies, levels, cubes and aggregations on MS OLAP/ OLTP/Analysis Server (SSAS).
- Created both clustered and non-clustered indexes to maximize query performance in T-SQL.
- Created Hive external tables, loaded data into them, and queried the data using HQL.
- Generated multiple enterprise reports using SSRS and Crystal Reports; worked on Tableau.
- Managed Azure Data Lake Storage (ADLS) and Data Lake Analytics, with an understanding of how to integrate them with other Azure services.
- Wrote MapReduce jobs to generate daily activity-count reports on data dumped from multiple sources, writing the output back to HDFS.
- Used Sqoop to efficiently transfer data between databases and HDFS.
- Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
- Used Pig as ETL tool to do transformations, joins and some pre-aggregations before storing the data into HDFS.
- Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard
- Worked on creating ad-hoc reports, and database imports and exports, using SSIS.
Environment: Erwin 9.8, SQL, Oracle 12c, PL/SQL, Hadoop 3.0, Azure Data Lake, Spark, Scala, APIs, Pig 0.17, Python, Kafka 1.1, HDFS, ETL, MDM, OLAP, OLTP, SSAS, T-SQL, Hive 2.3, SSRS, Tableau, MapReduce, Sqoop 1.4, HBase 1.2, SSIS.
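The daily activity-count MapReduce job described in this section follows the classic map/shuffle/reduce pattern. This is a Hadoop-Streaming-style sketch in plain Python; the log layout (date in the first tab-separated field) is an assumption for illustration:

```python
# Hadoop-Streaming-style word-count variant: the mapper emits (date, 1)
# per activity record, and the reducer sums counts per date, mimicking
# what happens after the shuffle phase groups keys together.
from collections import defaultdict

def mapper(line: str):
    """Emit (date, 1) for each record shaped like 'date<TAB>activity_id<TAB>...'."""
    fields = line.rstrip("\n").split("\t")
    if fields and fields[0]:
        yield fields[0], 1

def reducer(pairs):
    """Sum the 1s per date, as the reduce phase would."""
    counts = defaultdict(int)
    for date, n in pairs:
        counts[date] += n
    return dict(counts)

records = ["2018-01-02\ta1", "2018-01-02\ta2", "2018-01-03\ta3"]
pairs = [kv for line in records for kv in mapper(line)]
daily = reducer(pairs)
# daily == {"2018-01-02": 2, "2018-01-03": 1}
```

In an actual Hadoop Streaming deployment the mapper and reducer would be separate scripts reading stdin, with the framework handling the sort and shuffle between them.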
Confidential, Ashburn, VA
Data Analyst/Data Engineer
- Followed test-driven development within the Agile methodology to produce high-quality software.
- Designed and developed horizontally scalable APIs using Python Flask.
- Conducted JAD sessions, wrote meeting minutes, and documented the requirements.
- Worked with cloud providers and APIs for Amazon AWS (EC2, S3, VPC) with GFS storage.
- Worked with Data ingestion, querying, processing and analysis of big data.
- Tuned and optimized various complex SQL queries.
- Developed normalized logical and physical database models to design the OLTP system.
- Extensively involved in creating PL/SQL objects i.e. Procedures, Functions, and Packages.
- Performed bug verification, release testing and provided support for Oracle based applications.
- Used Erwin's Model Mart for effective model management: sharing, dividing, and reusing model information and designs to improve productivity.
- Extensively used Hive optimization techniques like partitioning, bucketing, Map Join and parallel execution.
- Worked with Real-time Streaming using Kafka and HDFS.
- Worked with Alteryx, a data analytics tool, to develop workflows for the ETL jobs.
- Designed the data marts in dimensional data modeling using star and snowflake schemas.
- Wrote, tested and implemented Teradata Fastload, Multiload, DML and DDL.
- Used various OLAP operations such as slice/dice, drill-down, and roll-up as per business requirements.
- Wrote SQL queries, stored procedures, views, triggers, T-SQL and DTS/SSIS.
- Handled importing of data from various data sources, performed data control checks using Spark and loaded data into HDFS.
- Designed SSRS reports with sub reports, dynamic sorting, defining data source and subtotals for the report.
- Designed and implemented importing data to HDFS using Sqoop from different RDBMS servers.
- Worked with Sqoop commands to import the data from different databases.
- Gathered SSRS report requirements and created the reports in Tableau.
- Designed and developed MapReduce jobs to process data coming in different file formats, such as XML.
Environment: Erwin 9.8, SQL, PL/SQL, Kafka 1.1, AWS, APIs, Agile, ETL, HDFS, OLAP, T-SQL, SSIS, Teradata 15, Hive 2.3, SSRS, Sqoop 1.4, Tableau, MapReduce, XML.
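The data-control checks mentioned above (performed when importing from source systems into HDFS) typically compare source and loaded row counts and verify required fields. This is a minimal pure-Python sketch; the field names and check set are hypothetical, not the project's actual rules:

```python
# Data-control checks for an import: row-count reconciliation between
# source and target, plus required-field (null/empty) validation.
def control_checks(source_rows, loaded_rows, required_fields):
    """Return a list of human-readable failures; an empty list means pass."""
    failures = []
    if len(source_rows) != len(loaded_rows):
        failures.append(
            f"row count mismatch: source={len(source_rows)} loaded={len(loaded_rows)}"
        )
    for i, row in enumerate(loaded_rows):
        for field in required_fields:
            if row.get(field) in (None, ""):
                failures.append(f"row {i}: missing required field '{field}'")
    return failures

src = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
dst = [{"id": 1, "name": "a"}, {"id": 2, "name": ""}]
issues = control_checks(src, dst, ["id", "name"])
# issues == ["row 1: missing required field 'name'"]
```

In a Spark-based load the same idea scales up: counts and null checks become DataFrame aggregations run before the data is committed to HDFS.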
Confidential, Round Rock, TX
Data Analyst/Data Modeler
- As a Data Analyst/Data Modeler, I was responsible for all data-related aspects of the project.
- Created reports in a cloud-based environment using Amazon Redshift and published them on Tableau.
- Developed Python APIs to dump the array structures in the processor at the failure point for debugging.
- Worked extensively with ER/Studio on several projects in both OLAP and OLTP applications.
- Created SQL tables with referential integrity and developed queries using SQL, SQL*PLUS and PL/SQL.
- Performed Data Analysis and data profiling using complex SQL on various sources systems including Oracle.
- Developed the required data warehouse model using a Star schema for the generalized model.
- Implemented visualized BI reports with Tableau.
- Worked on stored procedures for processing business logic in the database.
- Extensively worked on Viewpoint for Teradata to look at performance Monitoring and performance tuning.
- Performed Extract, Transform and Load (ETL) solutions to move legacy and ERP data into Oracle data warehouse.
- Developed and maintained data dictionary to create metadata reports for technical and business purpose.
- Managed database design and implemented a comprehensive Snowflake schema with shared dimensions.
- Worked with normalization and de-normalization concepts and design methodologies.
- Worked on the reporting requirements for the data warehouse.
- Worked on SQL Server concepts SSIS (SQL Server Integration Services), SSAS (Analysis Services) and SSRS (Reporting Services).
- Developed complex T-SQL code such as stored procedures, functions, triggers, indexes, and views for the business application.
- Wrote complex SQL and PL/SQL procedures, functions, and packages to validate data and test processes.
Environment: ER/Studio, SQL, Python, APIs, OLAP, OLTP, PL/SQL, Oracle, Teradata, BI, Tableau, ETL, SSIS, SSAS, SSRS, T-SQL, Redshift.
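The Star-schema modeling described in this section pairs a central fact table with conformed dimension tables, queried by joining on surrogate keys. This toy sketch uses SQLite so it is self-contained; the table and column names are illustrative, not the project's actual model:

```python
# Minimal star schema: one fact table (fact_sales) joined to a date
# dimension (dim_date) via a surrogate key, then rolled up to month grain.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, cal_date TEXT, month TEXT);
    CREATE TABLE fact_sales (
        sale_id  INTEGER PRIMARY KEY,
        date_key INTEGER REFERENCES dim_date(date_key),
        amount   REAL
    );
    INSERT INTO dim_date VALUES (1, '2017-03-01', '2017-03'), (2, '2017-04-01', '2017-04');
    INSERT INTO fact_sales VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 25.0);
""")
# Typical dimensional query: aggregate the fact at a dimension attribute.
rows = conn.execute("""
    SELECT d.month, SUM(f.amount)
    FROM fact_sales f JOIN dim_date d ON f.date_key = d.date_key
    GROUP BY d.month ORDER BY d.month
""").fetchall()
# rows == [('2017-03', 150.0), ('2017-04', 25.0)]
```

A Snowflake schema differs only in that dimension attributes (e.g. month) would be normalized out into their own tables rather than stored on dim_date.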
- Worked with data analysts for requirements gathering, business analysis, and project coordination.
- Wrote Python normalization scripts to find duplicate data across different environments.
- Handled performance requirements for databases in OLTP and OLAP models.
- Performed Data Analysis and Data validation by writing SQL queries using SQL assistant.
- Worked with Informatica cloud for creating source and target objects, developed source to target mappings.
- Involved in the complete SSIS life cycle: creating SSIS packages, and building, deploying, and executing the packages in all environments.
- Developed reports for users in different departments in the organization using SQL Server Reporting Services (SSRS).
- Wrote T-SQL statements for data retrieval and was involved in performance tuning of T-SQL queries and stored procedures.
- Created new tables and wrote stored procedures, triggers, views, and functions.
- Compared data with original source documents and validated data accuracy.
- Developed the financial reporting requirements by analyzing the existing Business Objects reports.
- Worked on Data Mining and data validation to ensure the accuracy of the data between the warehouse and source systems.
- Used Excel sheets, flat files, and CSV files to generate Tableau ad-hoc reports.
Environment: SQL, OLAP, Python, OLTP, Informatica, SSIS, SSRS, T-SQL, Tableau, Excel.
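The Python normalization scripts for finding duplicates mentioned in this section usually canonicalize values (case, whitespace) before comparing, so near-identical rows collide on the same key. A minimal sketch, with the key fields chosen purely for illustration:

```python
# Normalize-then-group duplicate finder: rows whose normalized key fields
# match are reported together, so 'Jane Doe' and ' jane  doe ' collide.
from collections import defaultdict

def normalize(value: str) -> str:
    """Canonical form used for duplicate comparison: lower-case,
    collapse internal whitespace, trim the ends."""
    return " ".join(value.lower().split())

def find_duplicates(rows, key_fields):
    """Group row indexes by normalized key; keep only groups of 2+."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        key = tuple(normalize(row[f]) for f in key_fields)
        groups[key].append(i)
    return {k: idxs for k, idxs in groups.items() if len(idxs) > 1}

rows = [
    {"name": "Jane Doe",     "email": "JD@x.com"},
    {"name": " jane  doe ",  "email": "jd@x.com"},
    {"name": "Bob",          "email": "b@x.com"},
]
dupes = find_duplicates(rows, ["name", "email"])
# dupes == {("jane doe", "jd@x.com"): [0, 1]}
```

The same pattern extends to cross-environment comparison by tagging each row with its source environment before grouping.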