- 4 years of experience in data ingestion, transformation, data analysis, and data visualization on the Hortonworks big data platform (MapReduce, HDFS, Hive, Sqoop, HBase, YARN, Impyla, Spark, Scala, Oozie, and Kafka).
- 15+ years of experience in data modeling, ETL solutions, and application development (Windows and web), covering all stages of the SDLC, including analysis, estimation, requirement gathering, design, development, testing, certification, and deployment to target environments.
- Thorough understanding of the Kimball methodology. Good experience in dimensional modeling, including logical and physical model design using MS Visio and ERwin, and designing fact and dimension tables (star and snowflake schemas).
- Experience extracting encrypted data from Amazon Redshift with UNLOAD to an S3 bucket, then retrieving the files with boto/boto3 and loading them into Hive.
- Experience importing and exporting data between HDFS and relational database management systems using Sqoop.
- Experience developing MapReduce/YARN programs in Java and Scala for data modeling.
- Used Python with PyHive to connect to Hive and perform data analysis.
- Experience in object-oriented programming with Java, Scala, and Python. Developed a reusable framework for performing data quality (DQ) checks.
- Performance tuning and optimization of Hive and Spark SQL queries.
- Converted SQL Server queries to Hive and Spark SQL to reuse existing business logic. Involved in converting SQL queries into Spark transformations using Spark RDDs and Scala.
- Data visualization using Tableau 8.0 to 10.0; published reports as PDFs to end users.
- Imported real-time data from external systems into Hadoop using Kafka, and implemented Oozie jobs for daily imports.
- Developed Confidential scripts to load files into the Hive database using Impala, passing parameters for HiveQL and partitions. Used UC4 for workflow automation.
- Experience in the design, development, and implementation of large-scale ETL infrastructure using SQL Server Integration Services (SSIS).
- Experience designing and building data marts using SQL Server Analysis Services (SSAS) (2005/2008/2012/2014).
- Experience working with databases such as SQL Server (2005, 2008, 2008 R2, 2012, 2014), Oracle 12c, and MySQL.
- Experience in programming languages such as C, Java, Python, VB.NET, and C#, and in scripting languages such as JavaScript and VBScript. Experience building C APIs and importing them into Python, multithreading, data access, and splitting large files.
- Worked with the team managing multiple source systems comprising around 15,000 ETL packages in total (SSIS, Talend Open Studio).
- Loaded data from 30,000 complex source files into 100K tables, around 1 TB per month after data compression.
- Developed a tool to create SSIS packages programmatically (https://insight.codeplex.com/) and execute them using the SSIS runtime libraries; published it as open source.
- Developed a metadata-driven workflow automation tool in Java/Linux and C#/Windows that can be used as an alternative to UC4, Control-M, and AutoSys.
- Experience with reporting tools such as Tableau and MS SQL Server Reporting Services (SSRS) (2005/2008/2012/2014).
Lead Developer & Data Architect
Confidential, San Francisco, CA
- Insight is targeted at business analysts (BAs). BAs use Insight to create simple actions (such as SELECT, JOIN, and AGGREGATE), which are converted into SSIS ETL packages.
- SSIS packages are created automatically using the SSIS design-time library. Insight hides the complexity of building SSIS packages from the business analyst.
- Insight will be released as open source to user communities.
- Insight uses the SSIS design-time library for package creation. The supported SSIS components are listed below:
- Control Flow: Execute SQL Task and Script Task
- Data Flow: transformations such as Aggregate, Conditional Split, Derived Column, Sort, Union All, Multicast, Merge Join, Row Count, Script Component, and OLE DB Command, plus OLE DB Source and OLE DB Destination
Technical Architect / Lead
- Designed complex SSAS solutions using multiple dimensions, perspectives, hierarchies, and measure groups. Designed OLAP cubes with star schemas and multiple partitions using SSAS.
- Designed complex SSIS packages with error handling, using various data transformations such as Conditional Split, Fuzzy Lookup, Multicast, column conversion, and Fuzzy Grouping.
- Worked with tabular, matrix, gauge, chart, drill-down, and interactive reports according to business requirements.
- Designed and developed data warehouses, data marts, and business intelligence solutions using multidimensional models such as star and snowflake schemas for developing cubes using MDX.
- Built MDX queries and Data Mining Extensions (DMX) queries for Analysis Services and Reporting Services.
- Converted SQL Server 2000 DTS packages to SQL Server 2008 SSIS packages for development, testing, and production environments.
- Created SSIS packages with error handling, as well as complex SSIS packages using various data transformations such as Conditional Split, Lookup, Aggregate, expressions, Fuzzy Lookup, Foreach Loop, Multicast, column conversion, Fuzzy Grouping, and Script Components.
- Migrated SQL Server 2000 DTS packages to SQL Server 2008 using the built-in migration tool.
- Deployed SSRS reports in a SharePoint-integrated environment, allowing business users to interact with those reports, and implemented security for sensitive reports.
- Created reports using stored procedures. Involved in scheduling reports and creating snapshots and subscriptions, and created triggers to facilitate consistent data entry into the database.
- Created projected income statements and other financial reports. Developed yearly, quarterly, and monthly sales reports, benchmarking reports, and commission reports.
Environment: VS 2008, C#, VB.NET, Web Services, MS SQL Server 2008, MS Foundation, SSRS 2008, SSIS 2008, SSAS 2008, ADO.NET, XML, VBA, IIS, Agile Scrum, MS Visio