Sr Data Engineer Resume
Elk Grove, CA
SUMMARY
- 9 years of IT experience building end-to-end (E2E) data platforms as a Sr Data Engineer.
- Expertise in using cloud-based managed services for data warehousing/analytics in Microsoft Azure (Azure Data Lake Analytics, Azure Data Lake Storage, Azure Data Factory v1/v2, Azure Table Storage, U-SQL, Stream Analytics, HDInsight, Spark, Databricks, etc.).
- Good knowledge of the Hadoop ecosystem, including Hive, HBase, Sqoop, MapReduce, ZooKeeper, and Spark, for data storage and analysis.
- Hands-on experience developing Logic Apps workflows for event-based data movement, file operations on ADLS, Blob Storage, and SFTP/FTP servers, and retrieving/manipulating data in Azure SQL Server/Azure DW.
- Proficient in developing enterprise data warehouse strategies using leading ETL tools such as SQL Server Integration Services (SSIS), Informatica, and Talend.
- Proven experience with Oracle, PostgreSQL, Snowflake, PL/SQL, and SQL Server 2019/2017 database design; developing T-SQL views, stored procedures, functions, cursors, triggers, and CTEs; creating SQL jobs; and performance tuning and scheduling.
- Excellent with relational and dimensional modeling techniques such as star and snowflake schemas, OLTP, OLAP, normalization, and fact and dimension tables.
- Skilled in designing and implementing ETL architecture for cost-effective, efficient environments.
- Extensive work experience implementing ETL packages using SSIS 2017 and Azure Data Factory.
- Expertise in Performance Tuning, Index Tuning Wizard, Maintenance, Troubleshooting, Query Optimization, Client/Server connectivity and Database Consistency Checks using DBCC Utilities, DMVs and DMFs.
- Hands-on experience in Python and Hive scripting.
- Proficient in performing exploratory, root-cause, and impact analysis on large volumes of data.
- Excellent knowledge of C# programming for developing Azure Event Hubs, Function Apps, business objects, and solutions per business requirements.
- Extensively worked on the back end, building APIs using .NET Core and Node.js.
- Excellent knowledge of Linux shell scripting and Autosys job scheduling.
- Expert in creating trends and charts using QlikView and Power BI.
- Proficient with version control tools Git, CVS, and TFS.
- Good verbal and written communication skills.
TECHNICAL SKILLS
Cloud Platform: Azure: Azure Data Factory v1/v2, Blob Storage, Azure Data Lake Store (ADLS), Key Vault, Azure SQL DB, SQL Server, Databricks, Azure Cosmos DB, Azure Stream Analytics, Azure Event Hub, App Services, Logic Apps, Event Grid, Azure DevOps, Git repository management. AWS: S3, EC2, Kafka-Lenses.
ETL Tools: Talend Data Studio, SSIS, Informatica Power Center, PDI
Languages: Apache Spark, Python, C#, Node.js, U-SQL, T-SQL, Linux shell scripting, Azure PowerShell, Scala/Java.
Hadoop Ecosystems: HDFS, HBase, Hive, Sqoop, Yarn, Spark, Spark SQL, Kafka-Lenses
Databases: Oracle, Microsoft SQL Server 2019/2017, PostgreSQL and MySQL.
Reporting Tools: QlikView, Power BI, SAP-BOBJ Reports, OBIEE, SSRS.
IDEs: VS Code, Microsoft Visual Studio
Operating Systems: Linux, Windows.
PROFESSIONAL EXPERIENCE
Sr Data Engineer
Confidential, Elk Grove, CA
Responsibilities:
- Worked closely with platform business analysts and architects to design the new system and gather system requirements.
- Developed a detailed project plan and helped manage the data conversion and migration from the legacy system to the target Azure EDW.
- Developed U-SQL scripts to perform extract, transform, load (ETL) on files in the Data Lake, creating views and custom transformations in C#.
- Created a data model that correlates all the metrics and produces meaningful output.
- Involved in ongoing monitoring, automation, and refinement of data engineering solutions; prepared complex SQL views and stored procedures in Azure SQL DW.
- Developed custom extractors and outputters for customized file processing, integrated with U-SQL scripts.
- Developed generic and parameterized Azure Data Factory pipelines, activities, datasets, and linked services.
- Developed webhook- and trigger-based Azure Function Apps that respond to Azure Storage, Event Hub, and Event Grid triggers (see the first sketch after this list).
- Developed SQL Server Integration Services (SSIS) packages to extract data from SQL databases, running PowerShell commands to interact with Azure Data Lake, Azure Data Lake Analytics, and Azure Data Factory.
- Developed code components using Azure C#, Python, and Node.js libraries.
- Migrated more than 100 Talend jobs to Azure Cloud using Azure Data Factory Pipelines.
- Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
- Primarily involved in data migration using SQL, Azure SQL, Azure Data Factory, SSIS, and PowerShell, with ingestion into one or more Azure services (Azure Data Lake, Azure Storage, Azure DW) and processing of the data in Azure Databricks.
- Implemented slowly changing dimension transformations in the ETL to maintain historical data in the data warehouse (see the PySpark sketch after this list).
- Worked on data mapping, high-level design, and detailed design documents.
- Ensured KPI deliverables (daily, weekly, MTD, and YTD) were prepared to satisfy project requirements and business needs.
- Extracted and enhanced QlikView application-level (L3) calculations and back-end jobs per requirements; set up automated QlikView job scheduling.
- Used Power BI DirectQuery to compare legacy data with current data and generated reports.
- Created action filters, parameters, and calculated sets for preparing visualizations, dashboards, and worksheets in Power BI.
- Scheduled Windows batch jobs and created jobs and alerts using SQL Server Agent.
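For illustration, a minimal sketch of a trigger-based Function App of the kind referenced above, written against the Azure Functions Python v2 programming model; the container name, connection-setting name, and downstream handling are assumptions for the example, not project values.

```python
import logging
import azure.functions as func

app = func.FunctionApp()

# Fires whenever a new file lands in the assumed "landing" container;
# "StorageConnection" names an app setting holding the storage connection string.
@app.blob_trigger(arg_name="inbound",
                  path="landing/{name}",
                  connection="StorageConnection")
def process_landing_file(inbound: func.InputStream):
    logging.info("Picked up blob %s (%s bytes)", inbound.name, inbound.length)
    payload = inbound.read()
    # Parse/validate here, then hand off to the next stage
    # (e.g. stage into Azure SQL DW or publish a downstream event).
```

Similar decorators exist in the same programming model for Event Hub and Event Grid triggers; only the blob variant is shown here.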
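And a hedged PySpark sketch of the SCD Type 2 pattern mentioned above for preserving history; the table names, the attr_hash change-detection column, and the effective_from/effective_to/is_current bookkeeping columns are illustrative assumptions.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("scd2_sketch").enableHiveSupport().getOrCreate()

# Assumed: dim carries attr_hash plus effective_from/effective_to/is_current,
# stg shares the same business columns (surrogate-key handling omitted).
dim = spark.table("edw.dim_customer")
stg = spark.table("staging.customer_extract")

current = dim.filter("is_current = 1")
history = dim.filter("is_current = 0")

# Keys whose tracked attributes changed in the latest extract.
changed_keys = (stg.alias("s")
                .join(current.alias("d"), "customer_id")
                .where(F.col("s.attr_hash") != F.col("d.attr_hash"))
                .select("customer_id"))

# Close out the superseded current rows.
expired = (current.join(changed_keys, "customer_id", "left_semi")
           .withColumn("effective_to", F.current_date())
           .withColumn("is_current", F.lit(0)))
unchanged = current.join(changed_keys, "customer_id", "left_anti")

# Open new current rows for changed and brand-new keys.
new_rows = (stg.join(unchanged.select("customer_id"), "customer_id", "left_anti")
            .withColumn("effective_from", F.current_date())
            .withColumn("effective_to", F.lit(None).cast("date"))
            .withColumn("is_current", F.lit(1)))

# Rebuild the dimension into a new table rather than overwriting in place.
(history.unionByName(expired)
 .unionByName(unchanged)
 .unionByName(new_rows)
 .write.mode("overwrite")
 .saveAsTable("edw.dim_customer_rebuilt"))
```

On Databricks with Delta Lake this row-versioning logic would typically be expressed as a MERGE; the DataFrame version above just makes the expire/insert steps explicit.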
Environment: Python, AWS, AWS EC2, AWS S3, ADF, ADLS, Azure SQL Data Warehouse (Synapse), SSIS, Spark, Hive/Hadoop, AWS Data Pipeline, SFTP, Batch Schedule, Oracle, PL/SQL, MS SQL Server 2017/2019, T-SQL, PostgreSQL, Neo4j (for POC), C#.NET, Node.js APIs, QlikView, Power BI, SQL Server Reporting Services (SSRS).
Data Engineer
Confidential, Sacramento, CA
Responsibilities:
- Worked closely with business analysts, users, and the project manager/architect to design the new system and gather system requirements.
- Implemented Python REST APIs to connect different file systems (HDFS, SFTP, and AWS S3), database systems (Oracle, PostgreSQL, MySQL, Cassandra, and MongoDB), and external systems (SAP, Salesforce, and Kinaxis) for the ETL process (see the transfer sketch after this list).
- Developed Python and Hive scripts to process massive files on the Hadoop platform.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data and analyzing them by running Hive queries.
- Involved in creating Hive tables, then applied Sqoop and HiveQL on those tables for data validation.
- Worked with Avro and Parquet file formats and used various compression techniques to optimize storage in HDFS.
- Implemented uploading and processing of large volumes of files in a short time using Spark RDD/DataFrame technologies (a PySpark sketch also follows this list).
- Widely used MoveIt/SFTP for file transfer purposes.
- Designed, modified, and improved Python and JavaScript code through refactoring.
- Responsible for designing and developing Scala- and Node.js-based components.
- Worked closely with Business and Technical Design architects to understand the flow.
- Automated data extraction and loading into different SCM systems with high security.
- Developed visualizations and dashboards using QlikView.
- Scheduled Windows batch jobs and created jobs and alerts using SQL Server Agent.
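For illustration, a minimal sketch of the file-transfer piece behind the Python REST APIs mentioned above, using paramiko and boto3; the host, credentials, bucket, and paths are placeholders, and the surrounding REST layer is omitted.

```python
import io

import boto3
import paramiko


def sftp_to_s3(host, username, password, remote_path, bucket, key):
    """Pull one file from an SFTP drop zone and land it in S3 (placeholder connection details)."""
    transport = paramiko.Transport((host, 22))
    transport.connect(username=username, password=password)
    sftp = paramiko.SFTPClient.from_transport(transport)
    try:
        buffer = io.BytesIO()
        sftp.getfo(remote_path, buffer)   # stream the remote file into memory
        buffer.seek(0)
        boto3.client("s3").upload_fileobj(buffer, bucket, key)
    finally:
        sftp.close()
        transport.close()
```

Adapters for HDFS and the database sources would wrap their respective clients behind the same kind of function and be exposed through the REST layer.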
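And a hedged PySpark sketch of the bulk file-processing pattern noted above; the paths, columns, and target Hive table are assumptions for the example.

```python
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("bulk_file_load")
         .enableHiveSupport()
         .getOrCreate())

# Read an entire day's worth of delimited files in a single pass (path is illustrative).
raw = (spark.read
       .option("header", "true")
       .option("inferSchema", "true")
       .csv("hdfs:///data/landing/shipments/2019-06-01/*.csv"))

# Light cleanup before the data lands in the warehouse layer.
clean = (raw.dropDuplicates(["shipment_id"])
         .filter(F.col("quantity").isNotNull())
         .withColumn("load_date", F.current_date()))

# Write partitioned Parquet so downstream Hive queries stay fast.
(clean.write.mode("append")
 .format("parquet")
 .partitionBy("load_date")
 .saveAsTable("scm.shipments_staged"))
```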
Environment: Python, Pandas, Sqoop, Hive/Hadoop, Spark, SFTP, Windows Scheduler, Batch Schedule, Oracle, Excel, PL/SQL, MS SQL Server 2017/2019, QlikView.
Lead ETL Engineer
Confidential, Charlotte, NC
Responsibilities:
- Analyzed the source systems and involved in designing the ETL data load.
- Developed/designed Informatica mappings by translating the business requirements.
- Worked with various transformations such as Lookup, Joiner, Sorter, Aggregator, Router, Rank, and Source Qualifier to create complex mappings.
- Worked on data mapping, high-level design, and detailed design documents.
- Involved in performance tuning of the mappings using parameter files and round-robin and key-range partitioning to ensure source and target bottlenecks were removed.
- Extensively worked on stored procedures, functions, triggers, and views using T-SQL for data analysis, profiling, and reporting purposes.
- Involved in production support and implementation, fixing issues within SLA, and provided L1- and L2-level support in production environments.
- Processed AML files using Calypso and generated daily reports for the compliance team.
- Extensively involved in performance tuning and query optimization.
- Worked with Teradata utilities such as FastLoad and MultiLoad.
- Tuned the mappings by removing the Source/Target bottlenecks and Expressions to improve the throughput of the loads.
- Implemented new mappings and transformations per business requirements.
- Gathered requirements from users across all departments to deliver the appropriate reports from the new channel.
- Prepared 20+ transactional databases for backup and restoration.
- Used Autosys for scheduling various data cleansing scripts and loading processes.
- Created jobs, alerts, and SQL Mail Agent notifications, and scheduled SSIS packages.
- Involved in fixing the security & vulnerability issues.
- Involved in promoting source code to different environments (like Dev, UAT, Pilot and Production).
- Prepared and reviewed unit test cases; involved in both unit-level and system-level testing.
Environment: Informatica, Calypso, ETL- SQL Server Integration Services (SSIS), Teradata, SQL Server, SFTP, MoveIT, Batch Schedule, Oracle, PL/SQL, MS SQL Server 2014, T-SQL, QlikView, Autosys, SQL Agent.
Sr ETL Developer
Confidential, SFO, CA
Responsibilities:
- Involved in building jobs based on the ETL specification documents.
- Requirements gathering, documentation, and application development.
- Responsible for developing jobs to implement address validation, cleansing, and standardization in Talend ETL using components such as tRecordMatching, tFuzzyMatch, and tMatchGroup, along with DI, DP, and DQ components and features such as context variables and database components.
- Maintained historical data and CDC using slowly changing dimensions (SCD Type 2).
- Responsible for running Talend jobs using the Talend Administration Center (TAC).
- Involved in performance tuning of SQL/PL-SQL queries.
- Involved in developing models using Talend MDM while also developing DI jobs to populate data into REF/XREF tables.
- Experienced in using Talend big data components for Hive, Sqoop, and HDFS.
- Implemented QlikView reports for high-level departments.
- Used SVN as version control for the Talend Jobs to maintain history.
Environment: Talend, Core Java, Hive, Talend Studio, Oracle, PL-SQL, SQL Server, SOAP/REST APIs, SVN
Software Engineer
Confidential
Responsibilities:
- Involved in designing and coding UI screens using ASP.NET Web Forms and custom classes.
- Requirements gathering, documentation, and application development.
- Implemented jQuery for client-side validations.
- Involved in writing Stored Procedures, Functions, Views, Triggers and Creating SQL Transactions.
- Supported the production incidents.
- Implemented LINQ for querying, sorting, and filtering complex objects.
- Implemented the data access layer with web services.
- Involved in fixing the security & vulnerability issues.
- Responsible for source code scanning for security vulnerabilities with FORTIFY tool and fixing the issues.
- Implemented workflow to upload documents in agent login portal.
- Involved in creating the roles based on the user requirement.
- Created multilayered Class Libraries to support SOA architecture.
- Involved in ETL processes to create packages, mappings, and scheduling.
- Involved in implementing HTML Reports, SSRS Reports.
- Involved in promoting source code to different environments (like Dev, UAT, Pilot and Production).
- Implemented SSIS packages to sync data from multiple environments (AS400, Oracle, SQL Server).
- Designed and developed complex stored procedures in SQL Server.
- Created portal reports using Crystal Reports.
- Prepared and reviewed unit test cases; involved in both unit-level and system-level testing.
Environment: C#.NET, .NET Framework 2.0/3.5, Visual Studio 2010, jQuery, Fortify, SSRS, Crystal Reports 12.1, AntiXSS Library, AJAX controls, cryptography, SSIS, and SQL Server 2008