
GCP Data Engineer Resume


New York City, NY

SUMMARY

  • Over 8 years of experience as a Data Engineer analyzing, designing, developing, and implementing data architectures and frameworks.
  • Specialized in data warehousing and decision support systems, with extensive experience implementing full life cycle data warehousing projects and Hadoop/Big Data technologies for storing, querying, processing, and analyzing data.
  • Software development involving cloud computing platforms such as Amazon Web Services (AWS), Azure, and Google Cloud Platform (GCP).
  • Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Built a program with Python and Apache Beam, executed in Cloud Dataflow, to run data validation between raw source files and BigQuery tables (a sketch follows this list).
  • Knowledge of installing, configuring, and using Hadoop ecosystem components such as MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, ZooKeeper, and Flume.
  • Experience in analyzing data using HiveQL, HBase, and custom MapReduce programs.
  • Experience in importing and exporting data with Sqoop between HDFS and relational database systems such as Teradata, Oracle, and SQL Server.
  • Designed and implemented migration strategies for traditional systems to Azure (lift and shift, Azure Migrate, other third-party tools); worked on the Azure suite: Azure SQL Database, Azure Data Lake Storage (ADLS), Azure Data Factory (ADF) V2, Azure SQL Data Warehouse, Azure Service Bus, Azure Key Vault, Azure Analysis Services (AAS), Azure Blob Storage, Azure Search, Azure App Service, and Azure Data Platform services.
  • Hands-on experience in GCP: BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Dataproc, and Stackdriver.
  • Developed complex mappings to load data from various sources into the data warehouse, using transformations/stages such as Joiner, Transformer, Aggregator, Update Strategy, Rank, Lookup, Filter, Sorter, Source Qualifier, and Stored Procedure.
  • Implemented a POC to migrate MapReduce jobs to Spark transformations using Python.
  • Developed Apache Spark jobs using Python in a test environment for faster data processing and used Spark SQL for querying.
  • Experienced in Spark Core, Spark RDD, Pair RDD, Spark Deployment Architectures.
  • Experienced with performing real time analytics on NoSQL databases like HBase and Cassandra.
  • Worked on AWS EC2, EMR and S3 to create clusters and manage data using S3.
  • Experienced with Dimensional modelling, Data migration, Data cleansing, Data profiling, and ETL Processes features for data warehouses.
  • Strong understanding of the AWS product and service suite, primarily EC2, S3, VPC, Lambda, Redshift, Spectrum, Athena, EMR (Hadoop), and related monitoring services, including their applicable use cases, best practices, implementation, and support considerations.
  • Strong experience in UNIX and shell scripting. Experience with source control repositories such as SVN, CVS, and GitHub.
  • Experience with Hadoop distribution platforms (Hortonworks, IBM BigInsights, Cloudera) and with the GCP and AWS cloud platforms.
  • Actively participated in Scrum meetings, Sprint planning, Refinement and Retrospective ceremonies.
  • Good experience in Python software development (libraries: Beautiful Soup, NumPy, SciPy, matplotlib, python-twitter, pandas, network, urllib2, and MySQLdb for database connectivity).
  • Can work in parallel across both GCP and Azure clouds.
  • Synchronized both unstructured and structured data using Pig and Hive from a business perspective.
  • Worked in various programming languages using IDEs and tools such as Eclipse, NetBeans, IntelliJ, PuTTY, and Git.
  • Experienced in migrating HiveQL to Impala to minimize query response time.
  • Experience with the complete Software Development Life Cycle (SDLC), including requirement analysis, design, coding, testing, and implementation, using Agile and Waterfall.
  • Working experience building RESTful web services and RESTful APIs.
  • A self-motivated, enthusiastic learner, comfortable with challenging projects and ambiguity, able to solve complex problems independently or in a collaborative team.
  • Widely used Teradata features such as BTEQ, FastLoad, MultiLoad, SQL Assistant, and DDL/DML commands, with a very good understanding of Teradata UPI and NUPI, secondary indexes, and join indexes.
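
The Python/Apache Beam validation work noted above can be sketched as follows. This is a minimal illustration, not the original program: the project, bucket, table, and key column (order_id) are hypothetical, and the pipeline simply writes out keys that appear in only one of the two sources. With DirectRunner it can be tested locally before submitting to Dataflow.

```python
# Minimal sketch: flag keys that exist in the raw GCS file but not in the BigQuery
# table (or vice versa). Project, bucket, table, and the order_id key are hypothetical.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_csv_key(line):
    # Assume the first comma-separated field of the raw file is the record key.
    return line.split(",")[0]


def run():
    options = PipelineOptions(
        runner="DataflowRunner",          # use "DirectRunner" for a local test
        project="my-gcp-project",         # hypothetical project
        region="us-east1",
        temp_location="gs://my-bucket/tmp",
    )
    with beam.Pipeline(options=options) as p:
        raw_keys = (
            p
            | "ReadRawFile" >> beam.io.ReadFromText("gs://my-bucket/raw/orders.csv")
            | "RawKey" >> beam.Map(lambda line: (parse_csv_key(line), 1))
        )
        bq_keys = (
            p
            | "ReadBigQuery" >> beam.io.ReadFromBigQuery(
                table="my-gcp-project:analytics.orders")
            | "BqKey" >> beam.Map(lambda row: (str(row["order_id"]), 1))
        )
        (
            {"raw": raw_keys, "bq": bq_keys}
            | "JoinOnKey" >> beam.CoGroupByKey()
            # Keep keys that are missing from either side.
            | "FindMismatches" >> beam.Filter(
                lambda kv: not list(kv[1]["raw"]) or not list(kv[1]["bq"]))
            | "KeyOnly" >> beam.Map(lambda kv: kv[0])
            | "WriteReport" >> beam.io.WriteToText(
                "gs://my-bucket/validation/mismatched_keys")
        )


if __name__ == "__main__":
    run()
```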

TECHNICAL SKILLS

Hadoop/Big Data Technologies: Hadoop, HDFS, MapReduce, HBase, Apache Pig, Hive, Sqoop, Apache Impala, Oozie, Yarn, Apache Flume, Kafka, Zookeeper, Databricks

Cloud Platform: GCP, Amazon AWS, EC2, S3, MS Azure, Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, Azure Data Lake, Data Factory

Hadoop Distributions: Cloudera, Hortonworks, MapR

Programming Languages: Java, Scala, Python, SQL, PL/SQL, Shell Scripting, Storm, JSP, Servlets

Frameworks: Spring, Hibernate, Struts, JSF, EJB, JMS

Web Technologies: HTML, CSS, JavaScript, JQuery 3.3, Bootstrap 4.1, XML, JSON, AJAX

Databases: Oracle 12c/11g, SQL

Operating Systems: Linux, Unix, Windows 10/8/7

IDE and Tools: Eclipse, NetBeans, IntelliJ, Maven, Visual Basic Studio

NoSQL Databases: HBase, Cassandra, MongoDB, Accumulo

Web/Application Server: Apache Tomcat 9.0.7, JBoss, Web Logic, Web Sphere

SDLC Methodologies: Agile, Waterfall

Version Control: GIT, SVN, CVS, Codecommit

PROFESSIONAL EXPERIENCE

Confidential, New York City, NY

GCP Data Engineer

Responsibilities:

  • Developed Spark programs to parse the raw data, populate staging tables, and store the refined data in partitioned tables in the Enterprise Data warehouse.
  • Experience in building Power BI reports on Azure Analysis Services for better performance.
  • Developed streaming applications using PySpark to read from Kafka and persist the data to NoSQL databases such as HBase and Cassandra.
  • Created standard and ad-hoc daily/weekly/monthly reports with insights into emerging patterns, aligned with the business initiatives that may be driving the changes.
  • Worked on Big Data Hadoop cluster implementation and data integration in developing large-scale system software.
  • Migrated an entire Oracle database to BigQuery and used Power BI for reporting.
  • Built data pipelines in Airflow on GCP for ETL-related jobs using different Airflow operators (see the Airflow sketch after this list).
  • Developed streaming and batch processing applications using PySpark to ingest data from various sources into the HDFS data lake (a streaming sketch appears at the end of this section, after the environment list).
  • Developed DDL and DML scripts in SQL and HQL for analytics applications in RDBMS and Hive.
  • Developed and implemented HQL scripts to create partitioned and bucketed tables in Hive for optimized data access.
  • Used the Cloud Shell SDK in GCP to configure the Dataproc, Storage, and BigQuery services.
  • Wrote Hive UDFs to implement custom aggregation functions in Hive.
  • Worked extensively with Sqoop to import and export data between HDFS and relational database systems/mainframes.
  • Monitored YARN applications; troubleshot and resolved cluster-related system problems.
  • Created shell scripts to parameterize the Hive actions in Oozie workflow and for scheduling the jobs.
  • Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
  • Played a key role in a team developing an initial prototype of a NiFi big data pipeline, demonstrating an end-to-end scenario of data ingestion and processing.
  • Used NiFi to check whether a message reached the end system.
  • Developed the custom processor for NiFi.
  • Worked on NoSQL databases such as HBase and integrated them with PySpark to process and persist real-time streams.
  • Experience in GCP Dataproc, GCS, Cloud Functions, and BigQuery.
  • Used the Oozie scheduler to automate pipeline workflows and orchestrate the MapReduce extraction jobs, and ZooKeeper to provide coordination services to the cluster.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Built ETL for ingestion and processing of web transaction big data across Google Cloud Platform, Hadoop, and Vertica.
  • Designed, developed, and supported multiple data projects in traditional relational databases such as Oracle and PostgreSQL, as well as non-traditional databases such as Vertica and AWS Redshift.
  • Assessed existing EDW (enterprise data warehouse) technologies and methods to ensure the EDW/BI architecture meets the needs of the business and allows for growth.
  • Worked on Big Data integration and analytics based on Hadoop, Solr, Spark, Kafka, Storm, and webMethods technologies.
  • Designed and developed data pipelines for integrated data analytics using Hive, Spark, Sqoop, and MySQL.
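
As referenced above, a hedged sketch of the kind of Airflow pipeline used for ETL jobs on GCP: a daily file is loaded from GCS into a BigQuery staging table and then transformed with a SQL job. The bucket, dataset, and table names are hypothetical, and the operator import paths depend on the installed apache-airflow-providers-google version.

```python
# Hedged sketch of an Airflow DAG on GCP: stage a daily GCS file into BigQuery,
# then run a transformation query. Bucket, dataset, and table names are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

with DAG(
    dag_id="daily_orders_etl",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Stage the raw file from GCS into a BigQuery staging table.
    load_raw = GCSToBigQueryOperator(
        task_id="load_raw_to_staging",
        bucket="my-raw-bucket",                                  # hypothetical bucket
        source_objects=["orders/{{ ds }}/orders.csv"],
        destination_project_dataset_table="my-project.staging.orders",
        source_format="CSV",
        skip_leading_rows=1,
        write_disposition="WRITE_TRUNCATE",
    )

    # Transform the staged data into the reporting table with a SQL job.
    transform = BigQueryInsertJobOperator(
        task_id="transform_to_reporting",
        configuration={
            "query": {
                "query": """
                    INSERT INTO `my-project.reporting.orders_daily`
                    SELECT order_id, customer_id, SUM(amount) AS total_amount
                    FROM `my-project.staging.orders`
                    GROUP BY order_id, customer_id
                """,
                "useLegacySql": False,
            }
        },
    )

    load_raw >> transform
```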

Environment: CDH5, Hortonworks, Apache Hadoop 2.6.0, HDFS, Java 8, Hive 1.2.1000, Sqoop 1.4.6, HBase 1.1.2, Oozie 4.1.0, Storm 0.9.3, YARN, NiFi, Cassandra, Zookeeper, Spark, Kafka, Oracle 11g, MySQL, Shell Script, GCP, EC2, Tomcat 8, Spring 3.2.3, STS 3.6, Build Tool Gradle 2.2, Source Control GIT, Teradata SQL Assistant.
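
A hedged sketch of the PySpark streaming ingestion mentioned in this section: a Structured Streaming job reads JSON events from Kafka and lands them as partitioned Parquet files in the HDFS data lake. The broker, topic, schema, and paths are hypothetical, and the spark-sql-kafka connector package is assumed to be on the classpath.

```python
# Hedged sketch: PySpark Structured Streaming from Kafka to partitioned Parquet on HDFS.
# Broker, topic, schema, and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, to_date
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-to-hdfs-ingest").getOrCreate()

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")   # hypothetical broker
    .option("subscribe", "clickstream")                   # hypothetical topic
    .option("startingOffsets", "latest")
    .load()
)

events = (
    raw.selectExpr("CAST(value AS STRING) AS json")
    .select(from_json(col("json"), event_schema).alias("e"))
    .select("e.*")
    .withColumn("event_date", to_date(col("event_ts")))
)

query = (
    events.writeStream.format("parquet")
    .option("path", "hdfs:///data/lake/clickstream")            # hypothetical HDFS path
    .option("checkpointLocation", "hdfs:///checkpoints/clickstream")
    .partitionBy("event_date")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```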

Confidential, Foster City, CA

Sr. Data Engineer

Responsibilities:

  • Imported data using Sqoop from Teradata to HDFS on a regular basis.
  • Wrote Hive queries for ad-hoc reporting to the business.
  • Participated in weekly release meetings with Technology stakeholders to identify and mitigate potential risks associated with the releases.
  • Implemented AWS solutions using EC2, S3, RDS, EBS, Elastic Load Balancers, and Auto Scaling groups; optimized volumes and EC2 instances.
  • Wrote Terraform templates for AWS infrastructure as code to build staging and production environments, and set up build automation with Jenkins.
  • Configured Elastic Load Balancers (ELB) with EC2 Auto Scaling groups.
  • Created an Amazon VPC with a public-facing subnet for web servers with internet access, and backend databases and application servers in a private subnet with no internet access.
  • Created AWS launch configurations based on customized AMIs and used them to configure Auto Scaling groups.
  • Utilized Puppet for configuration management of hosted instances within AWS; configured and networked the Virtual Private Cloud (VPC).
  • Utilized S3 bucket and Glacier for storage and backup on AWS.
  • Used the AWS Identity and Access Management (IAM) tool to create groups and permissions so users could work collaboratively.
  • Implemented and set up the continuous build and deployment delivery process using Subversion, Git, Jenkins, IIS, and Tomcat.
  • Connected the continuous integration system to the Git version control repository to build continually as check-ins come in from developers.
  • Knowledge of the build tools Ant and Maven, writing build.xml and pom.xml respectively.
  • Knowledge of authoring pom.xml files, performing releases with the Maven release plug-in, and managing Maven repositories. Implemented Maven builds to automate the creation of JAR and WAR files.
  • Designed and built deployments using Ant/shell scripting and automated the overall process using Git and Maven.
  • Implemented a continuous delivery framework using Jenkins, Ansible/Puppet, Maven, and Nexus in a Linux environment.
  • Wrote Terraform and CloudFormation templates for AWS infrastructure as code to build staging and production environments, and set up build automation with Jenkins.
  • Designed, developed, and implemented performant ETL pipelines using the Python API of Apache Spark (PySpark) on AWS EMR (see the sketch after this list).
  • Performed Code Reviews and responsible for Design, Code, and Test signoff.
  • Assigning work to the team members and assisting them in development, clarifying on design issues, and fixing the issues.
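
A minimal sketch of the PySpark ETL pipelines on AWS EMR referenced above (submitted, for example, as an EMR step via spark-submit). Bucket names, paths, and columns are hypothetical.

```python
# Hedged sketch of a PySpark ETL step on AWS EMR: read raw CSV from S3, cleanse and
# aggregate it, and write curated Parquet back to S3. Buckets and columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("emr-orders-etl").getOrCreate()

# Read the raw daily extract from the landing bucket.
raw = (
    spark.read.option("header", True)
    .csv("s3://my-landing-bucket/orders/2023-01-01/")      # hypothetical path
)

# Basic cleansing and a daily aggregate per customer.
clean = (
    raw.dropDuplicates(["order_id"])
    .filter(F.col("amount").isNotNull())
    .withColumn("amount", F.col("amount").cast("double"))
)
daily = clean.groupBy("customer_id").agg(F.sum("amount").alias("daily_spend"))

# Write the curated output where downstream queries (Athena/Redshift Spectrum) can read it.
daily.write.mode("overwrite").parquet("s3://my-curated-bucket/orders_daily/dt=2023-01-01/")
```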

Environment: Scala, Hadoop, MapReduce, Spark, Yarn, Hive, Pig, Nifi, Kafka, Hortonworks, Cloudera, Sqoop, Flume, Elastic Search, Cloudera Manager, Java, J2EE, Web services, Hibernate, Struts, JSP, JDBC, XML, WebLogic Workshop, Jenkins, Maven.

Confidential, Washington DC

Sr. Data Engineer

Responsibilities:

  • Installed and configured Apache Hadoop to test the maintenance of log files in Hadoop cluster.
  • Installed and configured Hive, Pig, Sqoop, and Oozie on the Hadoop cluster.
  • Experienced in SQL programming and creation of relational database models
  • Installed Oozie Workflow engine to run multiple Hive and Pig Jobs.
  • Used SQL Azure extensively for database needs in various applications.
  • Developed multiple MapReduce jobs in Java for data cleansing and preprocessing (an illustrative Python streaming analogue follows this list).
  • Developed Simple to complex Map Reduce Jobs using Hive. Involved in loading data from the UNIX file system to HDFS.
  • Analyzed the metric dashboard reports, identified the formulas and functionality behind them, and digitized the metric dashboards into Tableau.
  • Published the dashboard reports to Tableau Server so the developed dashboards could be navigated on the web.
  • Scheduled the published dashboards from Tableau Server on a weekly basis.
  • Sent the dashboards to users by email via subscriptions, with the help of the admin team.
  • Performance-tuned reports by creating linked universes, joins, contexts, and aliases to resolve loops, and checked the integrity of the universes using the Business Objects Designer module during development.
  • Involved in integrating Tableau with AngularJS to enable self-service functionality on dashboards.
  • Gave training/demos to users on Tableau Desktop development.
  • Created Tableau worksheets involving schema import and implementing business logic through customization.
  • Created data connections and published them on Tableau Server for use with operational/monitoring dashboards.
  • Administered users, user groups, and scheduled instances for reports in Tableau.
  • Built complex formulas in Tableau for various business calculations.
  • Resolved various performance issues and analysed the best process distribution for different projects.
  • Provided customer support to Tableau users and wrote custom SQL to support business requirements.
  • Created Azure SQL databases and performed monitoring and restoring of Azure SQL databases.
  • Performed migration of Microsoft SQL Server to Azure SQL Database.
  • Worked with Azure Data Factory (ADF), Integration Runtime (IR), file system data ingestion, and relational data ingestion.
  • Worked in a mixed DevOps role spanning Azure architecture, systems engineering, network operations, and data engineering.
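
The data-cleansing MapReduce jobs above were written in Java; as an illustration only, here is a Python Hadoop Streaming analogue of that cleansing step, not the original code. The record layout (pipe-delimited user_id|event|timestamp) is hypothetical. It would be submitted with the hadoop-streaming jar, e.g. hadoop jar hadoop-streaming.jar -mapper cleanse_mapper.py -input /raw -output /clean (paths hypothetical).

```python
#!/usr/bin/env python
# Illustrative Hadoop Streaming mapper for data cleansing (a Python analogue of the
# Java MapReduce jobs cited above, not the original code).
# Assumes pipe-delimited input records: user_id|event|timestamp (hypothetical layout).
import sys

EXPECTED_FIELDS = 3

for line in sys.stdin:
    line = line.strip()
    if not line:
        continue                      # drop blank lines
    fields = line.split("|")
    if len(fields) != EXPECTED_FIELDS:
        continue                      # drop malformed records
    user_id, event, timestamp = (f.strip() for f in fields)
    if not user_id or not timestamp:
        continue                      # drop records missing required keys
    # Emit a normalized, tab-separated record for the reducer / downstream load.
    print("\t".join([user_id, event.lower(), timestamp]))
```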

Environment: Apache Hadoop, Azure,Cloudera Manager, CDH2, Python, CentOS, Java, MapReduce, Pig, Hive, Sqoop, Oozie and SQL.

Confidential

Data Engineer

Responsibilities:

  • Involved in requirement gathering/analysis, design, development, testing, and production rollout of reporting and analysis projects.
  • Analyzed, designed, and built modern data solutions using Azure PaaS services to support data visualization. Understood the current production state of the application and determined the impact of new implementations on existing business processes.
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics); ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks (see the sketch after this list).
  • Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL.
  • Architected and implemented medium- to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
  • Designed and implemented migration strategies for traditional systems to Azure (lift and shift, Azure Migrate, other third-party tools).
  • Engaged with business users to gather requirements, design visualizations, and provide training on self-service BI tools.
  • Pulled data into Power BI from various sources such as SQL Server, Excel, Oracle, Azure SQL, etc.
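
A hedged sketch of the Azure Databricks processing step in the ADF/Databricks pipeline described above: read a raw extract from ADLS Gen2, curate it, and write it to a curated zone. The storage account, containers, paths, and columns are hypothetical, and the cluster is assumed to already have access to the storage account.

```python
# Hedged sketch of an Azure Databricks (PySpark) curation step: raw ADLS Gen2 data in,
# curated Parquet out. Storage account, containers, and columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("adls-curation").getOrCreate()

raw_path = "abfss://raw@mydatalake.dfs.core.windows.net/sales/2023/01/"        # hypothetical
curated_path = "abfss://curated@mydatalake.dfs.core.windows.net/sales_daily/"  # hypothetical

raw = spark.read.option("header", True).csv(raw_path)

curated = (
    raw.filter(F.col("quantity").cast("int") > 0)
    .withColumn("revenue", F.col("quantity").cast("int") * F.col("unit_price").cast("double"))
    .groupBy("store_id", "sale_date")
    .agg(F.sum("revenue").alias("daily_revenue"))
)

# Parquet keeps the curated zone queryable from Synapse/Power BI; Delta is another option.
curated.write.mode("overwrite").partitionBy("sale_date").parquet(curated_path)
```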

Confidential

Software Developer

Responsibilities:

  • Proficient working experience with SQL, PL/SQL, and database objects such as stored procedures, functions, and triggers, using the latest features, inline views, and global temporary tables to optimize performance.
  • Performed data analysis and mapping, database normalization, performance tuning, query optimization, data extraction, transfer, and loading (ETL), and cleanup.
  • Created SSIS packages using SSIS Designer to export heterogeneous data from OLE DB sources (Oracle) and Excel spreadsheets to SQL Server.
  • Made extensive use of triggers to implement business logic and to audit changes to critical tables in the database.
  • Experience developing external tables, views, joins, clustered indexes, and cursors.
  • Defined the data warehouse (star and snowflake schemas), fact tables, cubes, dimensions, and measures using SQL Server Analysis Services.
  • Used execution plans, SQL Profiler, and Database Engine Tuning Advisor to optimize queries and enhance database performance.
  • Worked on the data warehouse design and analyzed various approaches for maintaining different dimensions and facts while building a data warehousing application.
  • Generated various reports using Reporting Services (SSRS).
  • Optimized query performance by creating indexes (see the sketch after this list).
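
A small, hedged illustration of the index-tuning work above, driven from Python via pyodbc against SQL Server. The server, database, table, and column names are hypothetical.

```python
# Hedged sketch: create a nonclustered index to speed up a frequent lookup, then run a
# quick check query. Server, database, table, and column names are hypothetical.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.example.com;DATABASE=SalesDB;"   # hypothetical server/database
    "UID=etl_user;PWD=secret",
    autocommit=True,
)
cursor = conn.cursor()

# Covering index for a frequent filter on OrderDate that also returns CustomerID/TotalAmount.
cursor.execute("""
    CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
    ON dbo.Orders (OrderDate)
    INCLUDE (CustomerID, TotalAmount);
""")

# Sanity check that the range query now has an index to seek on.
cursor.execute("SELECT COUNT(*) FROM dbo.Orders WHERE OrderDate >= '2023-01-01';")
print("rows in range:", cursor.fetchone()[0])

conn.close()
```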

Environment: Oracle, SSIS, MySQL, Microsoft Office Suite
