
Sr. Data Engineer Resume


Chicago, IL

SUMMARY

  • Eight-plus years of experience in Analysis, Design, Development, and Implementation as a Data Engineer.
  • Expert in providing ETL solutions for any type of business model.
  • Excellent understanding of Enterprise Data Warehouse best practices; involved in full life cycle development of Data Warehousing.
  • Involved in building Data Models and Dimensional Modeling with 3NF, Star, and Snowflake schemas for OLAP and Operational Data Store (ODS) applications.
  • Skilled in designing and implementing ETL Architecture for a cost-effective and efficient environment.
  • Optimized and tuned ETL processes & SQL Queries for better performance.
  • Performed complex data analysis and provided critical reports to support various departments.
  • Worked with Business Intelligence tools like Business Objects and data visualization tools like Tableau.
  • Extensive Shell/Python scripting experience for scheduling and process automation.
  • Extensive experience in IT data analytics projects; hands-on experience migrating on-premise ETLs to Google Cloud Platform (GCP) using cloud-native tools such as BigQuery, Cloud Dataproc, Google Cloud Storage, and Composer.
  • Built Glue crawlers and catalogs and integrated them with on-prem and AWS services.
  • Wrote PowerShell scripts to copy or move data from the local file system to HDFS/Blob storage.
  • Worked on scalable distributed data systems using the Hadoop ecosystem on AWS EMR and the MapR distribution.
  • Practical understanding of Data Modeling (Dimensional & Relational) concepts like Star Schema Modeling, Snowflake Schema Modeling, and Fact and Dimension tables.
  • Developed Python code to gather data from HBase and designed solutions implemented using Spark.
  • Migrated Hive queries to Spark SQL to reduce the overall batch time (a brief sketch follows this summary).
  • Good understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, Driver Node, Worker Node, Stages, Executors, and Tasks.
  • Good understanding of Big Data Hadoop and YARN architecture along with various Hadoop daemons such as JobTracker, TaskTracker, NameNode, DataNode, and Resource/Cluster Manager, as well as Kafka (distributed stream processing).
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Highly motivated to work on Python, Spark scripts and performance tuning to improve efficiency and Data Quality.
  • Good Experience in Linux Bash scripting and following PEP Guidelines in Python.
  • Worked on Kafka for streaming data and on data ingestion.
  • Developed UNIX Shell scripts to automate repetitive database processes.
  • Worked on Bootstrap scripts to upgrade/downgrade and install custom applications on EMR and building dependent data catalogs on EKS.
  • Provided and constructed solutions for complex data issues.
  • Experience in development and design of various scalable systems using Hadoop technologies in various environments. Extensive experience in analyzing data using Hadoop ecosystems including HDFS, MapReduce, Hive & Pig.
  • Experience in understanding the security requirements for Hadoop.
  • Extensive experience in working with Informatica PowerCenter.
  • Implemented integration solutions for cloud platforms with Informatica Cloud.
  • Worked with the Java-based ETL tool Talend.
  • Proficient in SQL, PL/SQL, and Python coding.
  • Experience developing on-premises and real-time processes.
  • Good exposure to Development, Testing, Implementation, Documentation and Production support.
  • Develop effective working relationships with client teams to understand and support requirements, develop tactical and strategic plans to implement technology solutions, and effectively manage client expectations.
  • An excellent team member with the ability to work independently, good interpersonal skills, strong communication skills, a strong work ethic, and a high level of motivation.
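
Below is a minimal sketch of the Hive-to-Spark SQL pattern referenced in this summary; the database, table, column names, and output path are hypothetical placeholders rather than details from an actual project.

    from pyspark.sql import SparkSession

    # Hive-enabled Spark session; assumes the Hive metastore is reachable.
    spark = (SparkSession.builder
             .appName("hive-to-spark-sql")
             .enableHiveSupport()
             .getOrCreate())

    # The same aggregation that previously ran as a Hive query, now executed by
    # Spark SQL; the partition filter on load_date keeps the scan small.
    daily_metrics = spark.sql("""
        SELECT load_date, region, COUNT(*) AS txn_count, SUM(amount) AS total_amount
        FROM sales.transactions
        WHERE load_date = '2020-01-01'
        GROUP BY load_date, region
    """)

    daily_metrics.write.mode("overwrite").parquet("/warehouse/reports/daily_metrics")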

TECHNICAL SKILLS

Big Data Eco Systems: Hadoop, HDFS, MapReduce, Hive

Programming: Python

Data Warehousing: Informatica PowerCenter 9.x/8.x/7.x, Informatica Cloud, Talend Open Studio & Integration Suite

Applications: Salesforce, RightNow, Eloqua

Databases: Oracle (9i/10g/11g), SQL Server 2005

BI Tools: Business Objects XI, Tableau 9.1

Query Languages: SQL, PL/SQL, T-SQL

Scripting Languages: Unix Shell, Python, Windows PowerShell

RDBMS Utilities: Toad, SQL*Plus, SQL*Loader

Scheduling Tools: ESP Job Scheduler, Autosys, Windows scheduler

PROFESSIONAL EXPERIENCE

Confidential

Sr. Data Engineer

Responsibilities:

  • Experience in EMR, ECS, Glue, Athena, Airflow and Jupyter notebooks for various data engineering use cases.
  • Work on Data Quality, Referential Integrity and Trend Analysis on Partner Card data and provide an application for the user to trigger/schedule these jobs.
  • Work on requirements, analysis, and design of the Airflow/Spark core engine architecture.
  • Use core Python and Django middleware to build web applications.
  • Use PySpark and Python to build core engines for data validation and analysis.
  • Use ReactJS and Victory charts for visualization of trend analysis.
  • Experience in GCP Dataproc, GCS, Cloud Functions, and BigQuery.
  • Used the Cloud Shell SDK in GCP to configure services such as Dataproc, Cloud Storage, and BigQuery.
  • Design and build GCP data driven solutions for enterprise data warehouse and data lakes.
  • Implemented metadata standards, data governance and stewardship, master data management, ETL, ODS, data warehouse, data marts, GCP, reporting, dashboard, analytics, segmentation, and predictive modelling.
  • Work on Financial data and tokenization of sensitive data
  • Work on Jenkins pipelines to trigger and schedule the jobs.
  • Building and migrating metadata catalogs to Production
  • Work on SSO and configuration for RStudio-Connect
  • Work on Integration and infrastructure build for Jupyter notebooks and serverless Jupyter lab architecture for RStudio-Connect.
  • Production support for Spark jobs.
  • Use Google Cloud Functions with Python to load data into BigQuery for on-arrival CSV files in a GCS bucket (see the first sketch after this list).
  • Worked on Google Cloud Platform (GCP) services like Compute Engine, Cloud Load Balancing, Cloud Storage, Cloud SQL, Stackdriver monitoring, and Cloud Deployment Manager.
  • Splunk RBAC design: configured roles and capabilities to restrict indexes and scheduled searches for specific users.
  • Splunk custom API data pull: developed a script to pull data from a cloud API and forward it to HEC.
  • Handle large datasets and manipulations, including partitioning and bucketing Parquet datasets.
  • Experience in creating accumulators and broadcast variables in Spark (see the second sketch after this list).
  • Work with large Snowflake and S3 datasets using SnowSQL and Snowflake worksheets.
  • Provide production support for RStudio-Connect infrastructure, using iframes to configure and display reports.
  • Work on generating reports based on financial requirements for credit and ability to pay.
  • Act as liaison between teams for managing infrastructure for RStudio-Connect and Python applications.
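
Below is a minimal sketch of a GCS-triggered Cloud Function of the kind described above, loading an arriving CSV file into BigQuery; the destination table name and schema settings are illustrative assumptions only.

    from google.cloud import bigquery

    def load_csv_to_bq(event, context):
        """Background Cloud Function triggered by a GCS object-finalize event."""
        client = bigquery.Client()
        uri = f"gs://{event['bucket']}/{event['name']}"

        job_config = bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            skip_leading_rows=1,          # assumes a header row in the file
            autodetect=True,              # let BigQuery infer the schema
            write_disposition="WRITE_APPEND",
        )

        # "partner_data.card_transactions" is a placeholder destination table.
        load_job = client.load_table_from_uri(
            uri, "partner_data.card_transactions", job_config=job_config
        )
        load_job.result()  # wait for the load job to finish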
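
And a short sketch of the Spark broadcast-variable and accumulator pattern mentioned above; the lookup map, counter, and sample records are illustrative only.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("broadcast-accumulator-demo").getOrCreate()
    sc = spark.sparkContext

    # Small reference map shipped once to every executor instead of with each task.
    country_codes = sc.broadcast({"US": "United States", "GB": "United Kingdom"})

    # Accumulator counting records with an unknown code, readable on the driver.
    unknown_count = sc.accumulator(0)

    def resolve(record):
        code, amount = record
        name = country_codes.value.get(code)
        if name is None:
            unknown_count.add(1)
            name = "UNKNOWN"
        return (name, amount)

    rdd = sc.parallelize([("US", 10.0), ("GB", 5.0), ("FR", 7.5)])
    totals = rdd.map(resolve).reduceByKey(lambda a, b: a + b).collect()

    print(totals, "unknown codes:", unknown_count.value)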

Environment: Apache Spark, Python, HDFS, Azure Data Lake, AWS, Azure Databricks, Azure Cosmos DB, Azure Service Bus, Azure SQL Database, Azure SQL Data Warehouse.

Confidential, Chicago, IL

Data/DevOps Engineer

Responsibilities:

  • Built reporting data warehouse from ERP system using Order Management, Invoice & Service contracts modules.
  • Extensive work in Informatica PowerCenter.
  • Acted as SME for Data Warehouse related processes.
  • Performed Data analysis for building Reporting Data Mart.
  • Worked with Reporting developers to oversee the implementation of report/universe designs.
  • Tuned the performance of Informatica mappings and sessions, eliminating bottlenecks to make the process more efficient.
  • Worked on complex SQL queries and PL/SQL procedures and converted them into ETL tasks.
  • Worked with PowerShell and UNIX scripts for file transfer, emailing and other file related tasks.
  • Worked with deployments from Dev to UAT, and then to Prod.
  • Worked with Informatica Cloud for data integration between Salesforce, RightNow, Eloqua, Web Services applications.
  • Expertise in Informatica Cloud apps: Data Synchronization (DS), Data Replication (DR), Task Flows, and Mapping Configurations.
  • Worked on a migration project that included migrating webMethods code to Informatica Cloud.
  • Implemented proofs of concept for SOAP & REST APIs.
  • Built web services mappings and exposed them as SOAP WSDLs.
  • Worked with Reporting developers to oversee the implementation of reports/dashboard designs in Tableau.
  • Assisted users in creating/modifying worksheets and data visualization dashboards in Tableau.
  • Tuned and performed optimization techniques for improving report/dashboard performance.
  • Assisted report developers with writing the required logic to achieve desired goals.
  • Met with end users to gather and analyze requirements.
  • Worked with business users to identify root causes of data gaps and develop corrective actions accordingly.
  • Created ad hoc Oracle data reports for presenting and discussing data issues with the business.
  • Performed gap analysis after reviewing requirements.
  • Identified data issues within DWH dimension and fact tables like missing keys, joins, etc.
  • Wrote SQL queries to identify and validate data inconsistencies in the data warehouse against the source system (a sample reconciliation query follows this list).
  • Validated reporting numbers between source and target systems.
  • Found technical solutions and business logic for fixing any missing or incorrect data issues identified.
  • Coordinated with and provided technical details to reporting developers.
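
A simplified sketch of the kind of source-versus-warehouse reconciliation described above, run from Python against Oracle; the schemas, tables, and connection details are hypothetical.

    import cx_Oracle  # Oracle driver; the databases below were Oracle 10g/11g

    RECON_SQL = """
        SELECT s.order_id
        FROM   stg.orders s
        LEFT JOIN dwh.fact_orders f ON f.order_id = s.order_id
        WHERE  f.order_id IS NULL
    """

    # Hypothetical credentials/DSN; in practice these came from a secured config file.
    conn = cx_Oracle.connect("etl_user", "password", "dwh-db:1521/ORCL")
    cursor = conn.cursor()
    cursor.execute(RECON_SQL)
    missing = [row[0] for row in cursor.fetchall()]
    print(f"{len(missing)} source orders are missing from the fact table")
    cursor.close()
    conn.close()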

Environment: Informatica Power Center 9.5/9.1, Informatica Cloud, Oracle 10g/11g, SQL Server 2005, Tableau 9.1, Salesforce, RightNow, Eloqua, Web Methods, PowerShell, Unix.

Confidential, Charlotte, NC

Hadoop Developer

Responsibilities:

  • Implemented reporting Data Warehouse with online transaction system data.
  • Developed and maintained data warehouse for PSN project.
  • Provided reports and publications to Third Parties for Royalty payments.
  • Managed user account, groups, and workspace creation for different users in PowerCenter.
  • Wrote complex UNIX/Windows scripts for file transfers and emailing tasks over FTP/SFTP (a Python equivalent is sketched after this list).
  • Worked with PL/SQL procedures and used them in Stored Procedure Transformations.
  • Extensively worked on Oracle and SQL Server. Wrote complex SQL queries against the ERP system for data analysis purposes.
  • Worked on the most critical Finance projects and was the go-to person for team members on any data-related issues.
  • Migrated ETL code from Talend to Informatica. Involved in development, testing and postproduction for the entire migration project.
  • Documented the code.
  • Tuned ETL jobs in the new environment after fully understanding the existing code.
  • Maintained Talend admin console and provided quick assistance on production jobs.
  • Involved in designing Business Objects universes and creating reports.
  • Built ad hoc reports using stand-alone tables.
  • Involved in creating and modifying new and existing Web Intelligence reports.
  • Created Publications that split into various reports based on the specific vendor.
  • Wrote Custom SQL for some complex reports.
  • Worked with internal and external business partners during requirement gathering.
  • Worked closely with Business Analyst and report developers in writing the source to target specifications for Data warehouse tables based on the business requirement needs.
  • Exported data into Excel for business meetings, which made discussions easier while looking at the data.
  • Performed analysis after requirements gathering and walked team through major impacts.
  • Provided and debugged crucial reports for finance teams during the month end period.
  • Addressed issues reported by business users in standard reports by identifying the root cause.
  • Resolved reporting issues by identifying whether they were report-related or source-related.
  • Created ad hoc reports per user needs.
  • Investigated and analyzed any discrepancies found in the data and resolved them.
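
The transfer scripts above were UNIX/Windows shell scripts; the following is a rough Python equivalent for an SFTP pull using paramiko, with placeholder host, credentials, and paths.

    import paramiko

    # Placeholder connection details; real jobs read these from a secured config.
    HOST, USER, KEY_FILE = "sftp.partner.example.com", "royalty_feed", "/home/etl/.ssh/id_rsa"

    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(HOST, username=USER, key_filename=KEY_FILE)

    sftp = ssh.open_sftp()
    try:
        # Pull the vendor royalty file into the local landing directory.
        sftp.get("/outbound/royalty_2020_01.csv", "/data/landing/royalty_2020_01.csv")
    finally:
        sftp.close()
        ssh.close()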

Environment: Informatica Power Center 9.1/9.0, Talend 4.x & Integration suite, Business Objects XI, Oracle 10g/11g, Oracle ERP, EDI, SQL Server 2005, UNIX, Windows Scripting, JIRA.

Confidential

ETL Developer

Responsibilities:

  • Gathered business requirements and prepared technical design documents, target-to-source mapping documents, and mapping specification documents.
  • Extensively worked on Informatica PowerCenter.
  • Parsed complex files through Informatica data transformations and loaded them into the database.
  • Optimized query performance using Oracle hints, forced indexes, constraint-based loading, and a few other approaches.
  • Extensively worked on UNIX shell scripting for splitting groups of files into smaller files and for file transfer automation (a Python sketch follows this list).
  • Worked with Autosys scheduler for scheduling different processes.
  • Performed basic and unit testing.
  • Assisted in UAT Testing and provided necessary reports to the business users.
  • Gathered requirements from Business and documented for project development.
  • Coordinated design reviews, ETL code reviews with teammates.
  • Developed mappings using Informatica to load data from sources such as Relational tables and Sequential files into the target system.
  • Extensively worked with Informatica transformations.
  • Created datamaps in Informatica to extract data from Sequential files.
  • Extensively worked on UNIX Shell Scripting for file transfer and error logging.
  • Scheduled processes in ESP Job Scheduler.
  • Performed Unit, Integration and System testing of various jobs.
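
The file splitting above was done in shell; a comparable Python sketch is shown below, with hypothetical paths and chunk size.

    from pathlib import Path

    def split_file(source: Path, out_dir: Path, lines_per_chunk: int = 100_000) -> None:
        """Split a large flat file into numbered chunks of at most lines_per_chunk lines."""
        out_dir.mkdir(parents=True, exist_ok=True)
        chunk, handle = 0, None
        with source.open() as src:
            for i, line in enumerate(src):
                if i % lines_per_chunk == 0:
                    if handle:
                        handle.close()
                    chunk += 1
                    handle = (out_dir / f"{source.stem}_part{chunk:03d}.dat").open("w")
                handle.write(line)
        if handle:
            handle.close()

    # Hypothetical paths mirroring a typical landing/split layout for the ETL jobs.
    split_file(Path("/data/inbound/transactions.dat"), Path("/data/inbound/split"))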

Environment: Informatica Power Center 8.6, Oracle 10g/11g, UNIX Shell Scripting, Autosys
