Data Engineer Resume
Englewood, CO
SUMMARY
- Eight plus years of experience in Analysis, Design, Development and Implementation as a Data Engineer.
- Experienced in providing ETL solutions across a wide range of business models.
- Provided and constructed solutions for complex data issues.
- Experience in the design and development of scalable systems using Hadoop technologies across multiple environments; extensive experience analyzing data with Hadoop ecosystem components including HDFS, MapReduce, Hive, and Pig.
- Experience with Hadoop security requirements.
- Extensive experience working with Informatica PowerCenter.
- Implemented Integration solutions for cloud platforms with Informatica Cloud.
- Worked with Talend, a Java-based ETL tool.
- Proficient in SQL, PL/SQL and Python coding.
- Experience developing on-premises and real-time processes.
- Excellent understanding of enterprise data warehouse best practices; involved in full life-cycle data warehouse development.
- Expertise in DBMS concepts.
- Involved in building Data Models and Dimensional Modeling with 3NF, Star and Snowflake schemas for OLAP and Operational data store (ODS) applications.
- Skilled in designing and implementing ETL Architecture for cost effective and efficient environment.
- Optimized and tuned ETL processes & SQL Queries for better performance.
- Performed complex data analysis and provided critical reports to support various departments.
- Worked with business intelligence tools like Business Objects and data visualization tools like Tableau.
- Extensive Shell/Python scripting experience for Scheduling and Process Automation.
- Good exposure to Development, Testing, Implementation, Documentation and Production support.
- Develop effective working relationships with client teams to understand and support requirements, develop tactical and strategic plans to implement technology solutions, and effectively manage client expectations.
- An excellent team member with the ability to work independently, strong interpersonal and communication skills, a solid work ethic, and a high level of motivation.
TECHNICAL SKILLS
Big Data Ecosystems: Hadoop, HDFS, MapReduce, Hive
Programming: Python
Data Warehousing: Informatica PowerCenter 9.x/8.x/7.x, Informatica Cloud, Talend Open Studio & Integration Suite
Applications: Salesforce, RightNow, Eloqua
Databases: Oracle (9i/10g/11g), SQL Server 2005
BI Tools: Business Objects XI, Tableau 9.1
Query Languages: SQL, PL/SQL, T-SQL
Scripting Languages: Unix Shell, Python, Windows PowerShell
RDBMS Utilities: Toad, SQL*Plus, SQL*Loader
Scheduling Tools: ESP Job Scheduler, Autosys, Windows scheduler
PROFESSIONAL EXPERIENCE
Confidential, Englewood, CO
Data Engineer
Responsibilities:
- Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
- Extracted, transformed, and loaded data across heterogeneous sources and destinations using AWS Redshift (see the Redshift load sketch after this list).
- Created Tables, Stored Procedures, and extracted data using T-SQL for business users whenever required.
- Performed data analysis and design; created and maintained large, complex logical and physical data models and metadata repositories using ERWIN and MB MDR.
- Wrote shell scripts to trigger DataStage jobs.
- Assisted service developers in finding relevant content in the existing reference models.
- Worked with sources such as Access, Excel, CSV, Oracle, and flat files using the connectors, tasks, and transformations provided by AWS Data Pipeline.
- Utilized Spark SQL API in PySpark to extract and load data and perform SQL queries.
- Developed a PySpark script to protect raw data by applying hashing algorithms to client-specified columns (see the hashing sketch after this list).
- Responsible for design, development, and testing of the database; developed stored procedures, views, and triggers.
- Developed a Python-based API (RESTful web service) to track revenue and perform revenue analysis (see the endpoint sketch after this list).
- Compiled and validated data from all departments and presented it to the Director of Operations.
- Built a KPI calculator sheet and maintained that sheet within SharePoint.
- Created Tableau reports with complex calculations and worked on ad-hoc reporting using Power BI.
- Created a data model that correlates all the metrics and produces valuable output.
- Tuned SQL queries to reduce run time by working on indexes and execution plans.
- Performed ETL testing activities such as running jobs, extracting data from the databases with the necessary queries, transforming it, and loading it into the data warehouse servers.
- Pre-processed data using Hive and Pig.
- Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
- Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure SQL DW) and processed the data in Azure Databricks.
- Implemented Copy activities and custom Azure Data Factory pipeline activities.
- Primarily involved in data migration using SQL, Azure SQL, Azure Storage, Azure Data Factory, SSIS, and PowerShell.
- Architected and implemented medium- to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
- Migrated on-premises data (Oracle, SQL Server, DB2, MongoDB) to Azure Data Lake Storage (ADLS) using Azure Data Factory (ADF V1/V2).
- Developed a detailed project plan and helped manage the data conversion and migration from the legacy system to the target Snowflake database.
- Designed, developed, and tested dimensional data models using Star and Snowflake schema methodologies following the Kimball method.
- Implemented ad-hoc analysis solutions using Azure Data Lake Analytics/Store and HDInsight.
- Developed data pipelines using Spark, Hive, Pig, Python, Impala, and HBase to ingest customer data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Ensured deliverables (daily, weekly, and monthly MIS reports) were prepared to satisfy project requirements, cost, and schedule.
- Worked with DirectQuery in Power BI to compare legacy data with current data, and generated and stored reports and dashboards.
- Designed SSIS packages to extract, transform, and load (ETL) existing data into SQL Server from different environments for the SSAS (OLAP) cubes.
- Used SQL Server Reporting Services (SSRS) to create and format cross-tab, conditional, drill-down, top-N, summary, form, OLAP, sub-report, ad-hoc, parameterized, interactive, and custom reports.
- Created action filters, parameters, and calculated sets for preparing dashboards and worksheets using Power BI.
- Developed visualizations and dashboards using Power BI.
- Used ETL to implement Slowly Changing Dimension transformations to maintain historical data in the data warehouse.
- Created dashboards for analyzing POS data using Power BI
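The following is a minimal sketch of the S3-to-Redshift load pattern referenced above, written in Python with psycopg2; the table name, bucket path, IAM role, and cluster endpoint are illustrative placeholders, not details from the original project.

    # Minimal sketch: load CSV files from S3 into Redshift with a COPY command.
    # Table, bucket, role, and connection details are illustrative placeholders.
    import psycopg2

    COPY_SQL = """
        COPY sales_staging
        FROM 's3://example-bucket/landing/sales/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
        FORMAT AS CSV
        IGNOREHEADER 1;
    """

    conn = psycopg2.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439, dbname="analytics", user="etl_user", password="***",
    )
    with conn, conn.cursor() as cur:
        cur.execute(COPY_SQL)  # Redshift pulls the files directly from S3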
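Below is a minimal sketch of the column-hashing step referenced above, using PySpark's built-in sha2 function; the input path, output path, and column names are illustrative assumptions, since the actual algorithm and columns were client-specified.

    # Minimal sketch: hash client-specified columns of a raw dataset with SHA-256.
    # Paths and column names are illustrative placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, sha2

    spark = SparkSession.builder.appName("mask-raw-data").getOrCreate()

    raw_df = spark.read.option("header", True).csv("s3://example-bucket/raw/customers/")
    sensitive_cols = ["ssn", "email", "phone"]  # columns chosen by the client

    masked_df = raw_df
    for c in sensitive_cols:
        masked_df = masked_df.withColumn(c, sha2(col(c).cast("string"), 256))

    masked_df.write.mode("overwrite").parquet("s3://example-bucket/masked/customers/")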
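A minimal sketch of a revenue-tracking REST endpoint of the kind referenced above; Flask is assumed here for brevity, and the route and in-memory data are illustrative placeholders rather than the original service's design.

    # Minimal sketch: REST endpoint exposing revenue figures by region.
    # Framework (Flask), route, and data are illustrative placeholders.
    from flask import Flask, jsonify

    app = Flask(__name__)

    # In-memory stand-in for the real revenue store
    REVENUE = {"east": 125000.0, "west": 98000.0}

    @app.route("/revenue/<region>", methods=["GET"])
    def get_revenue(region):
        if region not in REVENUE:
            return jsonify({"error": "unknown region"}), 404
        return jsonify({"region": region, "revenue": REVENUE[region]})

    if __name__ == "__main__":
        app.run(port=5000)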
Environment: MS SQL Server 2016, T-SQL, SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), SQL Server Analysis Services (SSAS), SQL Server Management Studio (SSMS), Advanced Excel (formulas, pivot tables, HLOOKUP, VLOOKUP, macros), Spark, Python, ETL, Power BI, Tableau, Hive/Hadoop, Snowflake, AWS Data Pipeline, IBM Cognos 10.1, Cognos Report Studio 10.1, Cognos 8 & 10 BI, Cognos Connection, Cognos Office Connection, Cognos 8.2/3/4, DataStage and QualityStage 7.5
Confidential, Mountain view, CA
Data Engineer
Responsibilities:
- Built reporting data warehouse from ERP system using Order Management, Invoice & Service contracts modules.
- Extensive work in Informatica PowerCenter.
- Acted as SME for Data Warehouse related processes.
- Performed Data analysis for building Reporting Data Mart.
- Worked with Reporting developers to oversee the implementation of report/universe designs.
- Tuned performance of Informatica mappings and sessions for improving the process and making it efficient after eliminating bottlenecks.
- Worked on complex SQL queries and PL/SQL procedures and converted them into ETL tasks.
- Worked with PowerShell and UNIX scripts for file transfer, emailing and other file related tasks.
- Worked with deployments from Dev to UAT, and then to Prod.
- Worked with Informatica Cloud for data integration between Salesforce, RightNow, Eloqua, and web service applications.
- Expertise in Informatica Cloud apps: Data Synchronization (DS), Data Replication (DR), Task Flows, and Mapping Configurations.
- Worked on a migration project that included migrating webMethods code to Informatica Cloud.
- Implemented proofs of concept for SOAP and REST APIs (see the REST sketch after this list).
- Built web service mappings and exposed them as SOAP WSDLs.
- Worked with Reporting developers to oversee the implementation of reports/dashboard designs in Tableau.
- Assisted users in creating/modifying worksheets and data visualization dashboards in Tableau.
- Tuned and performed optimization techniques for improving report/dashboard performance.
- Assisted report developers with writing the required logic to achieve desired goals.
- Met End Users for gathering and analyzing the requirements.
- Worked with business users to identify root causes of data gaps and developed corrective actions accordingly.
- Created Ad hoc Oracle data reports for presenting and discussing the data issues with Business.
- Performed gap analysis after reviewing requirements.
- Identified data issues within DWH dimension and fact tables like missing keys, joins, etc.
- Wrote SQL queries to identify and validate data inconsistencies in data warehouse against source system.
- Validated reporting numbers between source and target systems.
- Found technical solutions and business logic to fix any missing or incorrect data identified.
- Coordinated with reporting developers and provided them with technical details.
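A minimal sketch of the kind of REST proof of concept referenced above, using Python's requests library against a hypothetical endpoint; the original POCs were built around Informatica Cloud and web services, so the URL, token, and payload here are illustrative assumptions.

    # Minimal sketch: REST proof of concept - post a record to a hypothetical endpoint
    # and verify the response. URL, token, and payload are illustrative placeholders.
    import requests

    payload = {"accountId": "0011N00001ABCDE", "status": "Active"}
    resp = requests.post(
        "https://api.example.com/v1/accounts",
        json=payload,
        headers={"Authorization": "Bearer <token>"},
        timeout=30,
    )
    resp.raise_for_status()  # fail fast if the service rejects the request
    print(resp.json())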
Environment: Informatica Power Center 9.5/9.1, Informatica Cloud, Oracle 10g/11g, SQL Server 2005, Tableau 9.1, Salesforce, RightNow, Eloqua, Web Methods, PowerShell, Unix
Confidential, Vernon Hills, IL
Data Engineer
Responsibilities:
- Built and architected multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP.
- Strong understanding of AWS components such as EC2 and S3
- Implemented a continuous delivery pipeline with Docker and GitHub.
- Worked with Google Cloud Functions (Python) to load data into BigQuery on arrival of CSV files in a GCS bucket (see the Cloud Function sketch after this list).
- Processed and loaded bounded and unbounded data from a Google Pub/Sub topic to BigQuery using Cloud Dataflow with Python.
- Devised simple and complex SQL scripts to check and validate Dataflow in various applications.
- Performed Data Analysis, Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export through Python.
- Developed and deployed data pipelines in clouds such as AWS and GCP.
- Performed data engineering functions: data extract, transformation, loading, and integration in support of enterprise data infrastructures - data warehouse, operational data stores and master data management
- Responsible for data services and data movement infrastructure; strong experience with ETL concepts, building ETL solutions, and data modeling.
- Architected several DAGs (Directed Acyclic Graphs) in Airflow for automating ETL pipelines (see the DAG sketch after this list).
- Hands-on experience architecting ETL transformation layers and writing Spark jobs to do the processing.
- Gathered and processed raw data at scale (including writing scripts, web scraping, calling APIs, writing SQL queries, and writing applications).
- Experience in fact and dimension modeling (Star schema, Snowflake schema), transactional modeling, and SCDs (slowly changing dimensions).
- Devised PL/SQL Stored Procedures, Functions, Triggers, Views and packages. Made use of Indexing, Aggregation and Materialized views to optimize query performance.
- Developed logistic regression models in Python to predict subscription response rates based on customer variables such as past transactions, responses to prior mailings, promotions, demographics, interests, and hobbies (see the model sketch after this list).
- Developed near-real-time data pipelines using Spark.
- Hands-on experience with GCP: BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, gsutil, the bq command-line utility, Dataproc, and Stackdriver.
- Implemented Apache Airflow for authoring, scheduling and monitoring Data Pipelines
- Proficient in Machine Learning techniques (Decision Trees, Linear/Logistic Regressors) and Statistical Modeling
- Worked with Confluence and Jira; skilled in data visualization libraries such as Matplotlib and Seaborn.
- Hands on experience with big data tools like Hadoop, Spark, Hive
- Experience implementing machine learning back-end pipelines with Pandas and NumPy.
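A minimal sketch of the GCS-to-BigQuery load referenced above: a Python Cloud Function triggered when a CSV file arrives in a bucket loads it into BigQuery. The dataset and table names are illustrative placeholders, not from the original project.

    # Minimal sketch: Cloud Function triggered by a GCS 'finalize' event that loads
    # the arriving CSV into BigQuery. Dataset/table names are illustrative placeholders.
    from google.cloud import bigquery

    def load_csv_to_bq(event, context):
        """Entry point for a background Cloud Function on object creation."""
        client = bigquery.Client()
        uri = f"gs://{event['bucket']}/{event['name']}"

        job_config = bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            skip_leading_rows=1,
            autodetect=True,
            write_disposition="WRITE_APPEND",
        )
        load_job = client.load_table_from_uri(
            uri, "analytics.landing_table", job_config=job_config
        )
        load_job.result()  # wait for the load job to complete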
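A minimal sketch of an Airflow DAG of the kind referenced above; the task names, callables, and daily schedule are illustrative assumptions rather than the original pipeline.

    # Minimal sketch: daily ETL DAG with extract -> transform -> load ordering.
    # Task logic and names are illustrative placeholders.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        print("pull data from the source system")

    def transform():
        print("apply transformations")

    def load():
        print("load into the warehouse")

    with DAG(
        dag_id="example_daily_etl",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)
        load_task = PythonOperator(task_id="load", python_callable=load)

        extract_task >> transform_task >> load_task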
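A minimal sketch of the logistic regression response model referenced above, using scikit-learn; the input file and feature columns are illustrative placeholders for the customer variables described in the bullet.

    # Minimal sketch: logistic regression for subscription response prediction.
    # Input file and feature columns are illustrative placeholders.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("customers.csv")
    features = ["past_transactions", "prior_mailing_responses", "promotions", "age"]

    X_train, X_test, y_train, y_test = train_test_split(
        df[features], df["responded"], test_size=0.2, random_state=42
    )

    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))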
Environment: GCP, BigQuery, GCS buckets, Cloud Functions, Apache Beam, Cloud Dataflow, Cloud Shell, gsutil, Docker, Kubernetes, AWS, Apache Airflow, Python, Pandas, Matplotlib, Seaborn, text mining, NumPy, scikit-learn, heat maps, bar charts, line charts, ETL workflows, linear regression, multivariate regression, Scala, Spark
Confidential
ETL Developer
Responsibilities:
- Gathered business requirements and prepared technical design documents, target-to-source mapping documents, and mapping specification documents.
- Extensively worked on Informatica PowerCenter.
- Parsed complex files with Informatica Data Transformation and loaded them into the database.
- Optimized query performance using Oracle hints, forcing indexes, constraint-based loading, and a few other approaches.
- Extensively worked on UNIX shell scripting to split groups of files into smaller files and to automate file transfers.
- Worked with Autosys scheduler for scheduling different processes.
- Performed basic and unit testing.
- Assisted in UAT Testing and provided necessary reports to the business users.
Environment: Informatica Power Center 8.6, Oracle 10g/11g, UNIX Shell Scripting, Autosys
Confidential
SQL Developer
Responsibilities:
- Gathered business requirements and converted them into new T-SQL stored procedures in Visual Studio for a database project.
- Performed unit tests on all code and packages.
- Analyzed requirements and impact by participating in online Joint Application Development sessions with business clients.
- Performed and automated SQL Server version upgrades, patch installs and maintained relational databases.
- Performed front line code reviews for other development teams.
- Modified and maintained SQL Server stored procedures, views, ad-hoc queries, and SSIS packages used in the search engine optimization process.
- Updated existing and created new reports using Microsoft SQL Server Reporting Services. Team consisted of 2 developers.
- Created files, views, tables and data sets to support Sales Operations and Analytics teams
- Monitored and tuned database resources and activities for SQL Server databases.