ETL Developer Resume
San Francisco, CA
SUMMARY:
- 4 years of experience in data ingestion, transformation, analysis, and visualization on the Hortonworks Big Data platform, using MapReduce, HDFS, Hive, Sqoop, HBase, YARN, Impala (Impyla), Spark, Scala, Oozie, and Kafka.
- 15+ years of experience in data modeling, ETL solutions, and application development (Windows and Web), covering all stages of the SDLC: analysis, estimation, requirement gathering, design, development, testing, and deployment.
- Thorough understanding of Kimball methodology and solid experience in dimensional modeling: logical and physical model design using MS Visio and ERwin, and design of fact and dimension tables (star schema and snowflake schema).
- Experience extracting encrypted data from Amazon Redshift via UNLOAD to an S3 bucket, retrieving the files with boto/boto3, and loading them into Hive (see the sketch below).
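A minimal sketch of that flow, assuming hypothetical cluster, bucket, table, and IAM role names (psycopg2 works here because Redshift speaks the PostgreSQL wire protocol):

```python
import boto3
import psycopg2  # Redshift is reachable over the PostgreSQL wire protocol

# Hypothetical connection details, table, bucket, and IAM role for illustration.
conn = psycopg2.connect(
    host="example-cluster.abc123.us-west-2.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="etl_user", password="secret",
)
with conn, conn.cursor() as cur:
    # UNLOAD writes the query result to S3 as pipe-delimited, gzipped part files.
    cur.execute("""
        UNLOAD ('SELECT * FROM sales_fact')
        TO 's3://example-bucket/exports/sales_fact_'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-unload'
        DELIMITER '|' GZIP
    """)

# Download the unloaded part files with boto3 before staging them into Hive.
s3 = boto3.client("s3")
for obj in s3.list_objects_v2(Bucket="example-bucket", Prefix="exports/").get("Contents", []):
    s3.download_file("example-bucket", obj["Key"], obj["Key"].rsplit("/", 1)[-1])
```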
- Experience importing and exporting data between HDFS and relational database management systems using Sqoop (example below).
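Sqoop is driven from the command line; a hedged example of the kind of import involved (JDBC URL, credentials file, and HDFS target directory are hypothetical), wrapped in Python:

```python
import subprocess

# Hypothetical JDBC URL, password file, and HDFS target used for illustration.
subprocess.run(
    [
        "sqoop", "import",
        "--connect", "jdbc:mysql://db.example.com/sales",
        "--username", "etl_user",
        "--password-file", "/user/etl/.sqoop_pw",
        "--table", "orders",
        "--target-dir", "/data/raw/orders",
        "--num-mappers", "4",
    ],
    check=True,  # raise if Sqoop exits non-zero
)
```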
- Experience developing MapReduce/YARN programs in Java and Scala for data modeling.
- Used Python with PyHive to connect to Hive and perform data analysis (see the sketch below).
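A short sketch of that pattern, with a hypothetical HiveServer2 host, database, and table:

```python
import pandas as pd
from pyhive import hive

# Hypothetical HiveServer2 endpoint and table names for illustration.
conn = hive.Connection(host="hive.example.com", port=10000, database="sales_dw")

# Pull an aggregate straight into a pandas DataFrame for analysis.
df = pd.read_sql(
    "SELECT region, SUM(amount) AS total_amount FROM orders GROUP BY region", conn
)
print(df.sort_values("total_amount", ascending=False).head())
```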
- Experience in OOP using Java, Scala, and Python; developed a reusable framework for performing data quality (DQ) checks.
- Performance tuning and optimization of Hive and Spark SQL queries.
- Converted SQL Server queries to Hive and Spark SQL to reuse business logic; involved in converting SQL queries into Spark transformations using Spark RDDs in Scala (a PySpark sketch of the same idea follows).
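The resume describes these conversions in Scala; the same idea sketched in PySpark (input path and column names are hypothetical):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-to-spark").getOrCreate()

# The SQL query
#   SELECT region, SUM(amount) FROM orders WHERE status = 'SHIPPED' GROUP BY region
# re-expressed as RDD transformations:
orders = spark.read.parquet("/data/raw/orders").rdd
totals = (
    orders.filter(lambda row: row["status"] == "SHIPPED")
          .map(lambda row: (row["region"], row["amount"]))
          .reduceByKey(lambda a, b: a + b)
)
print(totals.collect())
```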
- Data visualization using Tableau 8.0 through 10.0, publishing reports as PDFs to end users.
- Imported real-time data from external systems into Hadoop using Kafka and implemented Oozie jobs for daily imports (a consumer sketch follows).
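As a hedged sketch of the real-time import (topic, broker, and staging path are hypothetical), a kafka-python consumer that lands messages for a downstream Oozie-scheduled HDFS load might look like:

```python
from kafka import KafkaConsumer  # kafka-python client

# Hypothetical topic, broker, and staging path used only for illustration.
consumer = KafkaConsumer(
    "order-events",
    bootstrap_servers=["broker.example.com:9092"],
    auto_offset_reset="earliest",
    value_deserializer=lambda raw: raw.decode("utf-8"),
)

# Append messages to a staging file that the daily Oozie job moves into HDFS.
with open("/staging/order-events.txt", "a") as out:
    for message in consumer:
        out.write(message.value + "\n")
```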
- Developed shell scripts to load files into the Hive database via Impala, passing parameters for HiveQL and partitions; used UC4 for workflow automation (a parameterized load is sketched below).
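The resume describes shell scripts; an equivalent parameterized load through the Impyla DB-API client, with hypothetical host, table, and partition column, might look like:

```python
from impala.dbapi import connect  # Impyla DB-API client

# Hypothetical Impala daemon, table, and partition value for illustration.
conn = connect(host="impala.example.com", port=21050)
cur = conn.cursor()

load_date = "2017-03-01"
cur.execute(
    "LOAD DATA INPATH '/staging/orders/{d}' "
    "INTO TABLE sales_dw.orders PARTITION (load_dt='{d}')".format(d=load_date)
)
```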
- Experience in the design, development, and implementation of large-scale ETL infrastructure using SQL Server Integration Services (SSIS).
- Experience in designing and building data marts using SQL Server Analysis Services (SSAS).
- Experience working with databases such as SQL Server (2005, 2008, 2010, 2012, 2014), Oracle 12c, and MySQL.
- Experience in programming languages such as C, Java, Python, VB.NET, C#, shell scripting, JavaScript, and VBScript. Experience building and importing C APIs in Python, multi-threading, data access, and splitting large files.
- Worked with the team to manage multiple source systems totaling around 15,000 ETL packages (SSIS, Talend Open Studio).
- Loaded 30,000 complex source files into 100K tables, around 1 TB per month after data compression.
- Developed a tool to create and execute SSIS packages programmatically using the SSIS runtime libraries (https://insight.codeplex.com/) and published it as open source.
- Developed a metadata-driven workflow automation tool in Java (Linux) and C# (Windows) that can be used as an alternative to UC4, Control-M, and AutoSys.
- Experience in reporting tools such as Tableau and MS SQL Server Reporting Services (SSRS).
PROFESSIONAL EXPERIENCE:
Confidential
ETL Developer
Responsibilities:
- Loaded 30,000 complex source files into 100K tables, around 1 TB per month after data compression; provided ETL production support to meet SLA timelines.
- Hadoop, Hive, Spark
- Designed and built data ingestion, loading the RAWSINK and RAW layers in the PayPal/Argus data lake. Used UC4, AutoSys, Control-M, and a custom workflow automation framework (Java/Python) for Hive data ingestion. Maintained fact and dimension tables in the Hive database and developed the data model in Hive with Spark SQL and Hive SQL, loading slowly changing dimension tables and incremental fact tables (a Type 2 load is sketched below). Loaded incoming files into the RAW and RAWSINK layers of the data lake. Built Tableau reports that connect to Hive and generate financial reports from the data model. Developed complex transformations using HiveQL, Spark SQL, and Spark RDDs in Scala.
- Developed user-defined functions in Hive, written in Java, to cleanse sensitive data from JSON stored in Hive. Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports, and exported data to MySQL. Used Oozie Operational Services for batch processing and dynamic workflow scheduling.
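As a hedged illustration of the slowly changing dimension loads mentioned above (customer_dim, customer_stg, and the tracked city column are hypothetical), a Type 2 rebuild in Spark SQL might look like:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Rebuild the dimension into a new table: expire current rows whose tracked
# attribute changed, then add fresh current rows for changed or new customers.
# Writing to customer_dim_new avoids overwriting a table while reading from it.
spark.sql("""
    CREATE TABLE customer_dim_new AS
    SELECT d.customer_id, d.name, d.city, d.start_dt,
           CASE WHEN d.end_dt IS NULL AND s.city IS NOT NULL AND s.city <> d.city
                THEN current_date() ELSE d.end_dt END AS end_dt
    FROM customer_dim d
    LEFT JOIN customer_stg s ON d.customer_id = s.customer_id
    UNION ALL
    SELECT s.customer_id, s.name, s.city, current_date() AS start_dt,
           CAST(NULL AS DATE) AS end_dt
    FROM customer_stg s
    LEFT JOIN customer_dim d
      ON s.customer_id = d.customer_id AND d.end_dt IS NULL AND d.city = s.city
    WHERE d.customer_id IS NULL
""")
```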
- Connected to AWS Redshift, executed queries using Python and PostgreSQL libraries, and UNLOADed the data to S3; connected to S3 using boto/boto3 (as in the Redshift sketch in the summary above). Created worksheets and user-story dashboards in Tableau (KPI, trend, matrix, etc.).
- SQL Server, SSIS, SSRS and SSAS
- Worked with the team to manage multiple source systems totaling around 15,000 ETL packages (SSIS, Talend Open Studio). Customized SSIS data loaders with script components written in VB.NET and C#.
- Agile development and documentation (source-to-target), user stories, data analysis and design, and the dimension bus matrix.
- Designed and developed ETL infrastructure using SSIS 2008/2012/2014. Designed data models with fact and various dimension tables, developed complex data marts in SSAS, wrote MDX queries for reporting, and built SSIS transformations to load the fact and dimension tables.
- Built MS Excel dashboards for reporting. Designed various dimension and fact tables and built the Analysis Services cubes.
- Performance-tuned SQL queries, SSIS packages, stored procedures, SSRS reports, etc. Backed up and restored databases, managed permissions and high-availability database systems, and set up disaster recovery environments.
- Managed high-volume data with table partitions and cube partitions.
- Profiled data and debugged SQL using SSMS. Wrote C programs to split large files, also using V-Edit.
Confidential, San Francisco, CA
Lead Developer & Data Architect
Responsibilities:
- Insight is a tool developed to automate ETL package creation and execution (http://insight.codeplex.com).
- Insight is targeted at business analysts (BAs). A BA uses Insight to define simple actions such as SELECT, JOIN, and AGGREGATE, and these actions are converted into an SSIS ETL package. Automatic creation of the SSIS package is achieved using the SSIS design-time library, which lets Insight hide the complexity of building the SSIS package from the business analyst. Insight will be released as open source to the user community.
- Insight uses the SSIS design-time library for SSIS package creation. The supported SSIS components are listed below:
- Control Flow: Execute SQL Task and Script Component
- Data Flows: data flow transforms such as Aggregate, Conditional Split, Derived Column, Sort, Union All, Multicast, Merge Join, Row Count, Script Component, OLE DB Command, OLE DB Source, and OLE DB Destination
- C#, VB.NET, SSIS 2008
- SSIS Runtime Libraries
- SSIS Design Time Libraries
Confidential, San Francisco, CA
Technical Architect / Lead
Responsibilities:
- Designed complex SSAS solutions using multiple dimensions, perspectives, hierarchies, and measure groups. Designed OLAP cubes with star schemas and multiple partitions using SSAS.
- Designed complex SSIS packages with error handling, using various data transformations such as Conditional Split, Fuzzy Lookup, Multicast, column conversion, and Fuzzy Grouping.
- Worked with tabular, matrix, gauge, and chart reports, drill-down reports, and interactive reports according to business requirements.
- Designed and developed data warehouses, data marts, and business intelligence solutions using multidimensional models such as star and snowflake schemas, developing cubes with MDX.
- Built MDX queries and Data Mining Extensions (DMX) queries for Analysis Services and Reporting Services.
- Converted SQL Server 2000 DTS packages to SQL Server 2008 SSIS packages for the development, testing, and production environments.
- Created complex SSIS packages with error handling, using various data transformations such as Conditional Split, Lookup, Aggregate, expressions, Fuzzy Lookup, For Each Loop, Multicast, column conversion, Fuzzy Grouping, and Script Components.
- Migrated SQL Server 2000 DTS packages to SQL Server 2008 using the built-in migration tool.
- Deployed SSRS reports in a SharePoint-integrated environment, allowing business users to interact with the reports, and implemented security for sensitive reports.
- Created reports using stored procedures; involved in scheduling, creating snapshots and subscriptions for the reports, and writing triggers to facilitate consistent data entry into the database.
- Created projected income statements and other financial reports. Developed yearly, quarterly, and monthly sales reports, benchmarking reports, and commission reports.
Environment: VS 2008, C#, VB.NET, Web Service, MS SQL Server 2008, MS Foundation, SSRS 2008, SSIS 2008, SSAS 2008, ADO.NET, XML, VBA, IIS, Agile Scrum, MS Visio