
Sr GCP Data Engineer Resume


Charlotte, NC

SUMMARY

  • A Google Certified Professional Data Engineer with 8+ years of experience in IT data analytics.
  • Deep understanding of big data and reporting-layer architectures and the surrounding tools, including Hadoop with SAS Visual Analytics, BigQuery with a Tableau reporting layer, and Azure Synapse Analytics with Power BI.
  • Knowledge of various other fully integrated BI tools such as Qlik, Mode, Qubole, and Superset.
  • Delivered migration projects from Qubole to on-premises Hadoop using the Presto and Spark engines for SQL, with reporting in Mode, saving millions in licensing fees.
  • Highly experienced in developing data marts per requirements and designing warehouses using distributed SQL in both Hadoop and Google Cloud environments.
  • Hands-on experience with various GCP components such as Dataflow with the Python SDK, Dataproc, BigQuery, Composer (Airflow), G Suite for service-account impersonation, Cloud IAM, Cloud Pub/Sub, Cloud Functions for functions-as-a-service workloads, Cloud Data Fusion, Cloud Storage (GCS), and Cloud Data Catalog.
  • Relied heavily on Google-native components for security and big data applications.
  • Hands-on experience with programming languages such as Python and Scala.
  • Keen on learning the newer services that Google Cloud Platform (GCP) adds to its stack.
  • Knowledge of Kubernetes platforms such as GKE and OpenShift for deploying applications.
  • Deep understanding of CI/CD processes, Git-flow branching architectures, and writing test cases for code reliability.
  • Converted a large amount of Hive SQL code into Spark SQL and PySpark code depending on the requirement (see the PySpark sketch after this list).
  • Converted PL/SQL-style code to a BigQuery-Python architecture as well as to Azure Databricks and PySpark on Dataproc.
  • Experience with Jira and Azure DevOps, and the ability to work in both Kanban and two-week sprint models.
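
The following is a minimal sketch of the Hive SQL to Spark SQL / PySpark conversion pattern referenced above; the table and column names (sales_raw, store_id, net_sales) are hypothetical placeholders, not names from an actual project.

# Minimal sketch: rewriting a Hive SQL aggregation as Spark SQL and PySpark.
# Table and column names (sales_raw, store_id, net_sales) are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive-to-spark-conversion")
         .enableHiveSupport()   # read existing Hive metastore tables
         .getOrCreate())

# The original Hive SQL, now executed on the Spark SQL engine.
spark.sql("""
    SELECT store_id, SUM(net_sales) AS total_sales
    FROM sales_raw
    GROUP BY store_id
""").show()

# Equivalent PySpark DataFrame API version of the same logic.
(spark.table("sales_raw")
      .groupBy("store_id")
      .agg(F.sum("net_sales").alias("total_sales"))
      .show())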

TECHNICAL SKILLS

RDBMS: MySQL, MS SQL Server, T-SQL, Oracle, PL/SQL.

Google Cloud Platform: Cloud Storage (GCS), BigQuery, Composer, Cloud Dataproc, Cloud SQL, Cloud Functions, Cloud Pub/Sub, Dataflow, etc.

Big Data: Apache Beam, Spark, Hadoop, Google big data stack, Azure big data stack

ETL/Reporting: Power BI, Data Studio, Tableau

Python Modules: Pandas, SciPy, Matplotlib.

Programming: Shell/Bash, C#, R, Go, Python.

PROFESSIONAL EXPERIENCE

Confidential

Sr GCP Data Engineer

Responsibilities:

  • Worked with product teams to create various store-level metrics and supported the underlying data pipelines written on GCP's big data stack.
  • Deep understanding of moving data into GCP using the Sqoop process, custom hooks for MySQL, and Cloud Data Fusion for moving data from Teradata to GCS.
  • Good knowledge of building data pipelines in Airflow as a service (Composer) using various operators (see the Composer DAG sketch after this list).
  • Built a program using Python and Apache Beam, executed on Cloud Dataflow, to run data validation jobs between raw source files and BigQuery tables (see the Beam validation sketch after this list).
  • Extensive use of the Cloud SDK in GCP's Cloud Shell to configure and deploy services such as Dataproc, Cloud Storage, and BigQuery.
  • Involved in loading and transforming large sets of structured and semi-structured data and analyzing them by running Hive queries.
  • Wrote Hive SQL scripts to create complex tables with performance features such as partitioning, clustering (bucketing), and skew handling.
  • Designed and coordinated with the data science team in implementing advanced analytical models over large datasets in the Hadoop cluster.
  • Built custom Python code for tagging tables and columns using Cloud Data Catalog, and built an application for user provisioning.
  • Hands-on experience coding in Python and calling GCP's REST APIs to integrate data.
  • Migrated an entire Oracle database and its OBIEE reports to BigQuery and Tableau.
  • Lift-and-shift experience moving on-premises Hadoop jobs to Google Dataproc.
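
A minimal sketch of a Composer (Airflow) DAG of the kind described above: it loads a daily file from GCS into BigQuery and then runs a transformation query. The bucket, project, dataset, and table names are hypothetical, and the import paths assume an Airflow 2.x Google provider.

# Minimal sketch of a Composer (Airflow) DAG: load a GCS file into BigQuery,
# then run a transformation query. Bucket/project/dataset/table names are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

with DAG(
    dag_id="store_metrics_daily",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:

    load_raw = GCSToBigQueryOperator(
        task_id="load_raw_sales",
        bucket="example-landing-bucket",
        source_objects=["sales/{{ ds }}/*.csv"],
        destination_project_dataset_table="example_project.raw.sales",
        source_format="CSV",
        write_disposition="WRITE_TRUNCATE",
    )

    build_metrics = BigQueryInsertJobOperator(
        task_id="build_store_metrics",
        configuration={
            "query": {
                "query": "SELECT store_id, SUM(net_sales) AS total_sales "
                         "FROM `example_project.raw.sales` GROUP BY store_id",
                "destinationTable": {
                    "projectId": "example_project",
                    "datasetId": "analytics",
                    "tableId": "store_metrics",
                },
                "writeDisposition": "WRITE_TRUNCATE",
                "useLegacySql": False,
            }
        },
    )

    load_raw >> build_metrics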
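
A minimal sketch of a Beam-style validation job like the one described above: it counts rows in a raw GCS file and in the corresponding BigQuery table and reports whether the two counts match. Resource names are hypothetical, and a real Dataflow run would also supply runner, project, region, and temp-location pipeline options.

# Minimal sketch: compare a row count from a raw GCS file against the row count
# of the loaded BigQuery table. All resource names are hypothetical.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# For a real Dataflow run, pass --runner=DataflowRunner, --project, --region, --temp_location.
options = PipelineOptions()

with beam.Pipeline(options=options) as p:
    source_count = (
        p
        | "ReadRawFile" >> beam.io.ReadFromText(
            "gs://example-landing-bucket/sales/2021-01-01.csv", skip_header_lines=1)
        | "CountSourceRows" >> beam.combiners.Count.Globally()
    )

    target_count = (
        p
        | "ReadBQTable" >> beam.io.ReadFromBigQuery(
            query="SELECT COUNT(*) AS cnt FROM `example-project.raw.sales`",
            use_standard_sql=True)
        | "ExtractCount" >> beam.Map(lambda row: row["cnt"])
    )

    (
        (source_count, target_count)
        | "MergeCounts" >> beam.Flatten()
        | "CollectCounts" >> beam.combiners.ToList()
        | "Compare" >> beam.Map(
            lambda counts: "MATCH" if len(set(counts)) == 1 else f"MISMATCH: {counts}")
        | "Print" >> beam.Map(print)
    )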

Confidential, Charlotte, NC

Sr. Hadoop Engineer

Responsibilities:

  • Built reporting in Power BI after building the ETL in the on-premises Hadoop cluster.
  • Built various data pipelines using Hive SQL and Spark RDDs, with Oozie for scheduling.
  • Converted MySQL queries to Hive SQL on the Tez engine, and migrated Hive jobs previously written on MapReduce to Tez or Spark based on the requirement.
  • Converted previously written SAS programs into Python to save on license fees, and moved SAS analytics reports to Power BI.
  • Built supply chain data marts with the product team and exposed data to external health systems through APIs written in Java.
  • Migrated Hadoop to Dataproc and moved reporting from Power BI to Tableau and Data Studio.
  • Analyzed Spark logical plans and improved data pipeline processing for both efficiency and cost control in Dataproc.
  • Developed a custom Python program, including CI/CD rules, for metadata management in Google Cloud Data Catalog.
  • Worked with Google Data Catalog and other Google Cloud APIs for monitoring, query, and billing analysis of BigQuery usage (see the sketch after this list).
  • Experience in moving data between GCP and Azure using Azure Data Factory.
  • Monitored BigQuery, Dataproc, and Cloud Dataflow jobs via Stackdriver across all environments.
  • Coordinated with the team and developed a framework to generate daily ad-hoc reports and extracts of enterprise data from BigQuery.
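
A minimal sketch of the kind of BigQuery usage and billing analysis mentioned above, using the google-cloud-bigquery Python client against the INFORMATION_SCHEMA jobs view. The project and region are hypothetical, and the on-demand price per TiB is an assumption to be replaced with the contracted rate.

# Minimal sketch: summarize BigQuery usage per user over the last 7 days
# from the INFORMATION_SCHEMA jobs view. Project/region are hypothetical,
# and the $/TiB figure is an assumed on-demand rate, not a quoted price.
from google.cloud import bigquery

client = bigquery.Client(project="example-project")

query = """
    SELECT
      user_email,
      COUNT(*) AS job_count,
      SUM(total_bytes_billed) / POW(1024, 4) AS tib_billed
    FROM `example-project.region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
    WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
      AND job_type = 'QUERY'
    GROUP BY user_email
    ORDER BY tib_billed DESC
"""

PRICE_PER_TIB = 6.25  # assumed on-demand rate; replace with the contracted rate

for row in client.query(query).result():
    tib = row.tib_billed or 0
    print(f"{row.user_email}: {row.job_count} jobs, "
          f"{tib:.3f} TiB billed, ~${tib * PRICE_PER_TIB:.2f}")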

Trisan Info Private Limited, India May 2012 - Nov 2015

Data Analyst

Responsibilities:

  • Prepared test plans and test cases according to the source-to-target mapping document.
  • Involved in logical and physical design and in transforming logical models into physical implementations.
  • Developed Python programs that run end-to-end data migration and transformation and load data into sinks such as Oracle and MySQL.
  • Experienced in using Python with SQLite for DML operations, including datetime conversions, while and for loops, and custom classes that take user-defined arguments (see the sketch after this list).
  • Gained extensive experience with Agile methodologies in software projects, participated in Scrum meetings, followed biweekly sprint schedules, and tracked progress in Jira.
  • Involved in requirements gathering and data analysis, and interacted with business users to understand reporting requirements.
  • Extensively used PL/SQL to build Oracle Reports 10g and views for processing data and enforcing referential integrity and required business rules.
  • Created DDL and DML scripts for the data models.
  • Created interactive dashboards and visualizations of claims reports, competitor analysis, and statistical data using Tableau.
  • Designed and built data marts using star and snowflake schemas.
  • Tested the database for field-size validation, check constraints, and stored procedures, and cross-verified field sizes defined within the application against the metadata.
  • Performed data management projects and fulfilled ad-hoc requests per user specifications using data management tools such as Toad, MS Access, Excel, and SQL.
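
A minimal sketch of the Python-with-SQLite pattern mentioned above: DML through the standard sqlite3 module, a datetime conversion, loops, and a small user-defined class. The table and field names are hypothetical.

# Minimal sketch: SQLite DML from Python with a datetime conversion,
# loops, and a small custom class. Table/column names are hypothetical.
import sqlite3
from datetime import datetime


class ClaimLoader:
    """Loads claim records into SQLite, converting date strings on the way in."""

    def __init__(self, db_path: str):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS claims (claim_id TEXT, filed_on TEXT, amount REAL)"
        )

    def load(self, rows):
        for claim_id, filed_on, amount in rows:
            # Convert the source date format (MM/DD/YYYY) to ISO-8601 before insert.
            iso_date = datetime.strptime(filed_on, "%m/%d/%Y").date().isoformat()
            self.conn.execute(
                "INSERT INTO claims (claim_id, filed_on, amount) VALUES (?, ?, ?)",
                (claim_id, iso_date, amount),
            )
        self.conn.commit()


if __name__ == "__main__":
    loader = ClaimLoader("claims.db")
    loader.load([("C-1001", "05/14/2014", 250.00), ("C-1002", "06/02/2014", 99.50)])
    for row in loader.conn.execute("SELECT * FROM claims"):
        print(row)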
