
Senior Cloud Data Engineer Resume


MI

SUMMARY

  • Over 10 years of professional experience in Software Systems Development and Business Systems, with experience in Big Data ecosystem technologies.
  • Experience in data management and implementation of Big Data applications using Spark and Hadoop frameworks.
  • Hands on experience building streaming applications using Spark Streaming and Kafka.
  • Expertise in cleansing data for analysis, performing data quality testing for gaps, and liaising with data origination teams.
  • Experience in redesigning and migrating other data warehouses to the Snowflake data warehouse.
  • Strong experience and knowledge of HDFS, MapReduce, and Hadoop ecosystem components like Hive, Pig, and Sqoop, as well as NoSQL databases such as MongoDB and Cassandra.
  • Hands-on development and implementation experience with machine learning algorithms in Apache Spark and Hadoop MapReduce.
  • Solid experience in data modeling using design tools such as Erwin, Power Designer, and ER Studio, along with database tools.
  • Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2, which provide fast and efficient processing of big data analytics.
  • Strong knowledge of Spark with Scala for handling large-scale data processing in streaming workloads.
  • Experience in working on CQL (Cassandra Query Language), for retrieving the data present in Cassandra cluster by running queries in CQL.
  • Experience with Stored Procedures, Stored Functions, Database Triggers, and Packages using PL/SQL.
  • Extensive experience in advanced SQL Queries and PL/SQL stored procedures.
  • Hands on experience in big data, data visualization, R and Python development, Unix, SQL, GIT/GitHub.
  • Excellent understanding and working experience of industry-standard methodologies like the System Development Life Cycle (SDLC).
  • Experience in designing star schema and snowflake schema for data warehouse and ODS architectures.
  • Highly skilled in integrating Kafka with Spark Streaming for high-speed data processing (see the sketch after this list).
  • Expert in building enterprise data warehouses and data warehouse appliances from scratch using both the Kimball and Inmon approaches.
  • Good understanding and hands on experience with AWS S3, EC2 and Redshift.
  • Strong background in various Data Modeling tools using Erwin, ER/Studio and Power Designer.
  • Expertise in normalization (1NF, 2NF, 3NF, and BCNF) and de-normalization techniques for effective and optimal performance in OLTP and OLAP environments.
  • Extensive knowledge in programming with Resilient Distributed Datasets (RDDs).
  • Strong experience in migrating data warehouses and databases into Hadoop/NoSQL platforms.
  • Experience in developing statistical machine learning, text analytics, and data mining solutions for various business problems, and in generating data visualizations using Python, R, and Tableau.
  • Extensive SQL experience in querying, data extraction and data transformations.
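
As referenced in the Kafka and Spark Streaming bullet above, a minimal PySpark sketch of that integration pattern is shown below. It is illustrative only: the broker address, topic name, schema, and checkpoint path are hypothetical placeholders rather than details from any engagement listed in this resume.

```python
# Minimal sketch: consuming a Kafka topic with Spark Structured Streaming.
# Broker, topic, schema, and checkpoint path are placeholder values.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_ts", LongType()),
    StructField("payload", StringType()),
])

# Subscribe to the Kafka topic; the value arrives as binary and is cast to string.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "events")
       .load())

parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

# Console sink used here only for illustration; a real job would write to a
# durable sink (HDFS, Hive, Snowflake, etc.) with the same checkpointing.
query = (parsed.writeStream
         .format("console")
         .option("checkpointLocation", "/tmp/checkpoints/events")
         .outputMode("append")
         .start())
query.awaitTermination()
```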

TECHNICAL SKILLS

Data Modeling Tools: Erwin Data Modeler, Erwin Model Manager, ER Studio v17, and Power Designer 16.6.

Big Data Tools: Hadoop Ecosystem (Hadoop 3.0, MapReduce), Spark 2.3, HBase 1.2, Hive 2.3, Pig 0.17, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Cloudera Manager, Neo4j, Apache NiFi 1.6, Cassandra 3.11

Data Warehousing: Snowflake Cloud Data Warehouse

Cloud Management: Amazon Web Services (AWS), Amazon Redshift

OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9

Cloud Platform: AWS, Azure, Google Cloud, Cloud Stack/Open Stack

Programming Languages: SQL, PL/SQL, UNIX shell Scripting, PERL, AWK, SED

Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2.

Testing and Defect Tracking Tools: HP/Mercury Quality Center, WinRunner, MS Visio 2016, and Visual SourceSafe

Operating System: Windows 7/8/10, Unix, Sun Solaris

ETL/Data warehouse Tools: Informatica v10, SAP Business Objects Business Intelligence 4.2 Service Pack 03, Talend, Tableau, and Pentaho.

Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Agile, Waterfall Model.

PROFESSIONAL EXPERIENCE

Confidential, MI

Senior Cloud Data Engineer

RESPONSIBILITIES:

  • As a Cloud Data Engineer, worked closely with engineering teams and participated in infrastructure development.
  • Met with business and engineering teams on a regular basis to keep requirements in sync and deliver on them.
  • Participated in a collaborative team designing software and developing a Snowflake data warehouse within Azure cloud.
  • Worked in installing cluster, commissioning & decommissioning of Data nodes, Name node recovery, capacity planning, and slots configuration.
  • Played an active role in high-performance cloud data warehouse architecture and design.
  • Created schemas, virtual warehouses, tables, views, stages, sequences, and database replication in the Snowflake cloud data warehouse.
  • Worked with Azure Blob storage to load data into Snowflake.
  • Used Azure Blob storage to store files, ingested the files into Snowflake tables using Snowpipe, and ran deltas using data pipelines.
  • Heavily involved in testing Snowflake to understand the best possible way to use cloud resources.
  • Responsible for cluster maintenance and managing cluster nodes.
  • Involved in managing and reviewing data backups and log files.
  • Used Jira as an agile tool to keep track of the stories that were worked on using the agile methodology.
  • Designed and implemented effective Analytics solutions and models with Snowflake.
  • Created Source to Target Mappings (STM) for the required tables by understanding the business requirements for the reports.
  • Designed a data workflow model to create a data lake in the Hadoop ecosystem so that reporting tools like Tableau can plug in to generate the necessary reports.
  • Developed data pipelines using Python.
  • Developed ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake's SnowSQL.
  • Extracted data from multiple databases using Sqoop import queries and ingested it into Hive tables.
  • Developed analytical components using Scala, Spark, and Spark Streaming.
  • Migrated data into the data pipeline using SnowSQL and Scala.
  • Developed Spark code using Scala and Spark SQL for faster testing and processing of data.
  • Created Hive tables on HDFS to store the data processed by Apache Spark on the Cloudera Hadoop Cluster in Parquet format.
  • Tuned and troubleshot Snowflake for performance and optimized utilization.
  • Responsible for storing processed data in MongoDB.
  • Created multiple multi-tier Java-based web services to read data from MongoDB.
  • Involved in Migrating Objects from MongoDB to Snowflake.
  • Created Snowpipe for continuous data load (see the sketch after this section).
  • Loaded log data directly into HDFS using Flume.
  • Responsible for designing and developing data ingestion using Kafka.
  • Scheduled different Snowflake jobs using NiFi.
  • Used NiFi to ping Snowflake to keep the client session alive.
  • Used Hadoop Resource Manager to monitor the jobs run on the Hadoop cluster.
  • Handled Prod Deployments and provided production support for fixing the defects.

ENVIRONMENT: Spark, PySpark, Hive, MongoDB, Snowflake, Flume, Scala, Sqoop, shell scripting, NiFi, SnowSQL
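
The Azure Blob and Snowpipe bullets above follow a pattern like the sketch below, shown here with Snowflake's Python connector. This is a hedged illustration under assumptions: the account, stage, pipe, table, notification integration, and SAS token are hypothetical placeholders, not objects from the actual project.

```python
# Sketch: define an external stage over Azure Blob storage and a Snowpipe that
# auto-ingests new files. All object names and credentials are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",
    warehouse="<warehouse>", database="<database>", schema="<schema>",
)
cur = conn.cursor()

# External stage pointing at the Azure Blob container that receives the files.
cur.execute("""
    CREATE OR REPLACE STAGE raw_azure_stage
      URL = 'azure://<storage_account>.blob.core.windows.net/<container>/landing/'
      CREDENTIALS = (AZURE_SAS_TOKEN = '<sas_token>')
      FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
""")

# Snowpipe with auto-ingest: Event Grid notifications on the container trigger
# the COPY as new files land, which provides the continuous load.
cur.execute("""
    CREATE OR REPLACE PIPE raw_load_pipe
      AUTO_INGEST = TRUE
      INTEGRATION = '<notification_integration>'
      AS COPY INTO landing_table FROM @raw_azure_stage
""")

cur.close()
conn.close()
```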

Confidential, NY

Data/ Cloud Engineer

RESPONSIBILITIES:

  • As a Cloud Data Engineer, involved in the requirements-gathering phase of the SDLC and helped the team by breaking up the complete project into modules with the help of my team lead.
  • The objective of this project was to build a data lake as a cloud-based solution in Azure using Apache Spark and to provide visualization of the ETL orchestration using the CDAP tool.
  • Designed and Configured Azure Cloud relational servers and databases analyzing current and future business requirements.
  • Led the architecture and design of data processing, warehousing, and analytics initiatives.
  • Analyzed, designed, and built modern data solutions using Azure PaaS services to support data visualization.
  • Worked with Azure Blob and Data Lake storage and loaded data into Azure Synapse Analytics (SQL DW).
  • Actively involved in design, new development, and SLA-based support tickets for Big Machines applications.
  • Responsible for managing data coming from different sources, with storage and processing in Hue covering all Hadoop ecosystem components.
  • Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
  • Worked end to end with the business team to gather requirements and integrate the process as per their requirements.
  • Used Hive query language to customize the data and provide quick updates to business people.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Analyzed the SQL scripts and designed the solution for implementation using PySpark.
  • Involved in creating Hive tables and loading and analyzing data using hive queries.
  • Created partitioned and bucketed Hive tables in Parquet file format with Snappy compression, then loaded data into the Parquet Hive tables from Avro Hive tables (see the sketch after this section).
  • Involved in the end-to-end process of Hadoop jobs that used various technologies such as Hive, MapReduce, Spark, and shell scripts.
  • Developed SQL scripts using Spark for handling different data sets and verifying performance against MapReduce jobs.
  • Used Azure Data Factory and Data Catalog to ingest and maintain data sources.
  • Developed analytics enablement layer using ingested data that facilitates faster reporting and dashboards.
  • Worked with production support team to provide necessary support for issues with CDH cluster and the data ingestion platform.
  • Created Hive external tables to stage data and then moved the data from staging to main tables.
  • Implemented the Big Data solution using Hadoop, Hive, and Informatica to pull/load the data into the HDFS system.
  • Pulled data from the data lake (HDFS) and massaged it with various RDD transformations.
  • Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames, Scala, and Python.
  • Developed and maintained batch data flows using HiveQL and Unix scripting.
  • Developed and executed data pipeline testing processes and validated business rules and policies.
  • Built code for real-time data ingestion using MapR Streams.
  • Implemented Spark using Python and Spark SQL for faster processing of data.
  • Automated unit testing using Python, applying different testing methodologies such as unit testing and integration testing.
  • Performed multiple MapReduce jobs in Hive for data cleaning and pre-processing.
  • Created Hive Tables, loaded claims data from Oracle using Spark and loaded the processed data into target database.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark context, Spark SQL, DataFrames, and pair RDDs.
  • Used GitHub for version control.
  • Used JIRA as the issue/ticket tracking tool for each individual sprint.
  • Reproduced production bugs and fixed them in a fast-paced environment.

ENVIRONMENT: Python 2.7, Spark 2, MapReduce, HDFS, Hadoop YARN, Hive 1.0, Linux/Unix, PuTTY, WinSCP, PyUnit, PyCharm, Oracle 12c, Aginity, Windows, Kafka 1.1, Azure cloud services, SDLC.
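
The partitioned Parquet/Snappy bullet above typically maps to a PySpark job like the sketch below. It is a minimal, hedged illustration: the database, table, key, and partition column names are hypothetical placeholders rather than the project's actual schema.

```python
# Sketch: reload an Avro-backed Hive table into a partitioned, Snappy-compressed
# Parquet Hive table using PySpark with Hive support. Names are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("avro-to-parquet-reload")
         .enableHiveSupport()
         .getOrCreate())

# Read the existing Avro Hive table.
src = spark.table("staging_db.claims_avro")

# Light cleanup before the reload (placeholder transformation).
clean = src.dropDuplicates(["claim_id"]).filter("claim_id IS NOT NULL")

# Write into the partitioned Parquet table with Snappy compression.
(clean.write
 .mode("overwrite")
 .format("parquet")
 .option("compression", "snappy")
 .partitionBy("load_date")
 .saveAsTable("warehouse_db.claims_parquet"))
```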

Confidential, NY

Cloud Data Engineer

RESPONSIBILITIES:

  • Worked with Business Analysts to understand the user requirements, layout, and look of the interactive dashboard to be developed in Tableau.
  • Used Python programs for data manipulation and to automate the process of generating reports from multiple data sources or dashboards.
  • Followed Snowflake best practices such as clustering views and tables, creating materialized views, enabling the result cache, and resizing and multi-clustering warehouses to improve performance in Snowflake.
  • Involved in a Platform Modernization project to move the data into GCP.
  • Designed and built data pipelines to load the data into the GCP platform.
  • Built and architected data pipelines for data ingestion and transformation in GCP.
  • Involved extensively in creating Tableau extracts, worksheets, actions, functions, and connectors (Live and Extract), including drill-down and drill-up capabilities, dashboard color coding, formatting, and report operations (sorting, filtering, Top-N analysis, hierarchies).
  • Blended patient information from different sources for research using Tableau and Python.
  • Wrote complex SQL statements to perform high-level and detailed validation tasks for new data and/or architecture changes within the model, comparing Teradata data against Netezza data.
  • Managed the Metadata associated with the ETL processes used to populate the Data Warehouse.
  • Created a sheet selector to accommodate multiple chart types (pie, bar, line, etc.) in a single dashboard by using parameters.
  • Used replication tasks to back up the data in Snowflake.
  • Worked on complex SnowSQL and Python queries in Snowflake.
  • Published workbooks by creating user filters so that only the appropriate teams can view them.
  • Resolved the data related issues such as: assessing data quality, testing dashboards, evaluating existing data sources.
  • Wrote a Python program to maintain raw file archival in a GCS bucket (see the sketch after this section).
  • Designed star schemas in BigQuery.
  • Created DDL scripts for implementing Data Modeling changes, reviewed SQL queries and involved in Database Design and implementing RDBMS specific features.
  • Wrote SQL scripts and PL/SQL scripts to extract data from the database to meet business requirements and for testing purposes.
  • Designed the ETL process using Informatica to populate the data mart in the Oracle database from flat files.
  • Involved in Data analysis, reporting using Tableau and SSRS.
  • Involved in all phases of the SDLC using Agile and participated in daily scrum meetings with cross-functional teams.

ENVIRONMENT: GCP, BigQuery, ETL, Tableau, Python, SnowSQL, PostgreSQL, Linux, Windows, PL/SQL
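
The GCS archival bullet above is the kind of task sketched below with the google-cloud-storage client. This is a hedged example under assumptions: the bucket name, local landing directory, and dated prefix layout are hypothetical placeholders.

```python
# Sketch: archive raw landing files into a GCS bucket under a dated prefix so
# that loads stay replayable. Bucket name and paths are placeholders.
import os
from datetime import date
from google.cloud import storage

LOCAL_DIR = "/data/raw/incoming"      # placeholder landing directory
BUCKET_NAME = "example-raw-archive"   # placeholder bucket

client = storage.Client()
bucket = client.bucket(BUCKET_NAME)
prefix = f"archive/{date.today():%Y/%m/%d}"

for name in os.listdir(LOCAL_DIR):
    local_path = os.path.join(LOCAL_DIR, name)
    if not os.path.isfile(local_path):
        continue
    # Upload the raw file, then remove the local copy once it is safely archived.
    bucket.blob(f"{prefix}/{name}").upload_from_filename(local_path)
    os.remove(local_path)
```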

Confidential, TX

Sr. Data Analyst/Data Engineer

RESPONSIBILITIES:

  • Worked with the analysis teams and management teams and supported them based on their requirements.
  • Involved in extraction, transformation and loading of data directly from different source systems (flat files/Excel/Oracle/SQL/Teradata) using SAS/SQL, SAS/macros.
  • Generated PL/SQL scripts for data manipulation, validation and materialized views for remote instances.
  • Created and modified several database objects such as Tables, Views, Indexes, Constraints, Stored procedures, Packages, Functions and Triggers using SQL and PL/SQL.
  • Created large datasets by combining individual datasets using various inner and outer joins in SAS/SQL and dataset sorting and merging techniques using SAS/Base.
  • Developed live reports in a drill-down mode to facilitate usability and enhance user interaction.
  • Extensively worked on Shell scripts for running SAS programs in batch mode on UNIX.
  • Wrote Python scripts to parse XML documents and load the data into the database (see the sketch after this section).
  • Used Python to extract weekly information from XML files.
  • Developed Python scripts to clean the raw data.
  • Worked with the AWS CLI to aggregate clean files in Amazon S3 and with Amazon EC2 clusters to deploy files into buckets.
  • Used the AWS CLI with IAM roles to load data into the Redshift cluster.
  • Responsible for in-depth data analysis and creation of data extract queries in both Netezza and Teradata databases.
  • Extensive development on the Netezza platform using PL/SQL and advanced SQL.
  • Validated regulatory finance data and created automated adjustments using advanced SAS Macros, PROC SQL, UNIX (Korn Shell) and various reporting procedures.
  • Designed reports in SSRS to create, execute, and deliver tabular reports using shared and specified data sources; also debugged and deployed reports in SSRS.
  • Optimized the performance of queries with modifications to T-SQL queries, established joins, and created clustered indexes.
  • Used Hive, Impala and Sqoop utilities and Oozie workflows for data extraction and data loading.
  • Development of routines to capture and report data quality issues and exceptional scenarios.
  • Creation of Data Mapping document and data flow diagrams.
  • Developed Linux Shell scripts by using Nzsql/Nzload utilities to load data from flat files to Netezza database.
  • Involved in generating dual-axis bar charts, pie charts, and bubble charts with multiple measures, and in data blending when merging different sources.
  • Developed dashboards in Tableau Desktop and published them onto Tableau Server, which allowed end users to understand the data on the fly with quick filters for on-demand information.
  • Created dashboard-style reports using QlikView components like list boxes, sliders, buttons, charts, and bookmarks.
  • Coordinated with Data Architects and Data Modelers to create new schemas and views in Netezza to improve report execution time; worked on creating optimized data mart reports.
  • Worked on QA of the data and on adding data sources, snapshots, and caching to the reports.
  • Involved in troubleshooting at database levels, error handling and performance tuning of queries and procedures.

ENVIRONMENT: SAS, SQL, Teradata, Oracle, PL/SQL, UNIX, XML, Python, AWS, SSRS, TSQL, Hive, Impala, Sqoop
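
The XML parsing bullets above correspond to a script along the lines of the sketch below, using only the Python standard library. It is illustrative: the file path, element names, and target table are hypothetical, and SQLite stands in for the project's actual database driver.

```python
# Sketch: parse a weekly XML extract, clean incomplete rows, and load the
# result into a staging table. All names are placeholders.
import sqlite3                      # stand-in for the real database driver
import xml.etree.ElementTree as ET

def parse_records(xml_path):
    """Yield cleaned (id, week, amount) tuples from a weekly XML extract."""
    tree = ET.parse(xml_path)
    for rec in tree.getroot().iter("record"):
        rec_id = (rec.findtext("id") or "").strip()
        week = (rec.findtext("week") or "").strip()
        amount = rec.findtext("amount")
        if not rec_id or amount is None:
            continue                # drop incomplete rows during cleaning
        yield rec_id, week, float(amount)

conn = sqlite3.connect("staging.db")
conn.execute("CREATE TABLE IF NOT EXISTS weekly_data (id TEXT, week TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO weekly_data (id, week, amount) VALUES (?, ?, ?)",
    parse_records("weekly_extract.xml"),
)
conn.commit()
conn.close()
```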

Confidential, TX

Data Analyst/Engineer

RESPONSIBILITIES:

  • Developed various mappings using Mapping Designer and worked with Aggregator, Lookup, Filter, Router, Joiner, Source Qualifier, Expression, Stored Procedure and Sequence Generator transformations.
  • Implemented Slowly Changing Dimensions of type 1 and type 2 to store history according to business requirements (see the sketch after this section).
  • Used Parameter files to pass mapping and session parameters to the session.
  • Tuned the Informatica mappings to reduce the session run time.
  • Developed PL/SQL procedures to update the database and to perform calculations.
  • Worked with SQL*Loader to load data into the warehouse.
  • Contributed to the design and development of Informatica framework model.
  • Wrote UNIX shell scripts to work with flat files, to define parameter files and to create pre and post session commands.
  • Used SAS PROC IMPORT, DATA step, and PROC DOWNLOAD procedures to extract fixed-format flat files and convert them into Teradata tables for business analysis.
  • Helped users by Extracting Mainframe Flat Files (Fixed or CSV) onto UNIX Server and then converting them into Teradata Tables using BASE SAS Programs.
  • Collected Multi-Column Statistics on all the non-indexed columns used during the join operations & all columns used in the residual conditions.
  • Generated and implemented MicroStrategy schema objects and application objects by creating facts, attributes, reports, dashboards, filters, metrics, and templates using MicroStrategy Desktop.
  • Developed BTEQ scripts to load data from Teradata Staging area to Teradata data mart.
  • Worked extensively on PL/SQL as part of the process to develop several scripts to handle different scenarios.
  • Performed Unit testing and System testing of Informatica mappings.
  • Involved in migrating the mappings and workflows from Development to Testing and then to Production environments.

ENVIRONMENT: Oracle 8i, SQL, PL/SQL, SQL*Plus, HP-UX 10.20, Informatica PowerCenter 7, DB2, Cognos ReportNet 1.1, Windows 2000
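
The Type 2 slowly changing dimension logic referenced above (expire the current row, insert a new version) is sketched below in pandas purely for illustration; the project itself implemented it with Informatica mappings. The dimension layout (customer_id, address, start_date, end_date) and the tracked attribute are hypothetical.

```python
# Sketch of Type 2 SCD handling: expire changed current rows and append new
# versions, keeping full history. Column names and data are placeholders.
import pandas as pd

HIGH_DATE = pd.Timestamp("9999-12-31")

def scd2_merge(dim: pd.DataFrame, incoming: pd.DataFrame, load_date) -> pd.DataFrame:
    """Apply a Type 2 merge of `incoming` rows into dimension table `dim`."""
    dim = dim.copy()
    current = dim[dim["end_date"] == HIGH_DATE]

    merged = incoming.merge(
        current[["customer_id", "address"]],
        on="customer_id", how="left", suffixes=("", "_dim"),
    )
    changed = merged[merged["address_dim"].notna() & (merged["address"] != merged["address_dim"])]
    new = merged[merged["address_dim"].isna()]

    # Expire the current version of every customer whose tracked attribute changed.
    expire_ids = set(changed["customer_id"])
    dim.loc[dim["customer_id"].isin(expire_ids) & (dim["end_date"] == HIGH_DATE),
            "end_date"] = load_date

    # Append new versions for changed customers and first versions for new ones.
    inserts = pd.concat([changed, new])[["customer_id", "address"]].assign(
        start_date=load_date, end_date=HIGH_DATE,
    )
    return pd.concat([dim, inserts], ignore_index=True)
```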
