
Sr. Data Engineer Resume


TECHNICAL SKILLS

Big Data Ecosystem: HDFS, MapReduce, Hive, YARN, Pig, Sqoop, Flume, HBase, Kafka Connect, Impala, StreamSets, Oozie, Spark, ZooKeeper, NiFi, Confidential Web Services.

Hadoop Distributions: Apache Hadoop 1.x/2.x, Cloudera CDP, Hortonworks HDP

Programming Languages: Python, Java, R, Pig Latin, HiveQL, Shell Scripting.

Software Methodologies: Agile, Waterfall (SDLC).

IDEs: Eclipse, NetBeans, IntelliJ IDEA, Spring Tool Suite.

Databases: MySQL, MS SQL Server, Confidential, PostgreSQL, DB2, DynamoDB

NoSQL: HBase, MongoDB, Cassandra.

ETL/BI: Power BI, Tableau, Talend, Snowflake, Informatica, SSIS, SSRS, SSAS.

Version Control: Git, SVN, Bitbucket.

Web Development: JavaScript, Node.js, HTML, CSS, Spring, J2EE, JDBC, Angular, Hibernate, Tomcat.

Operating Systems: Windows (XP/7/8/10), Linux (Ubuntu), Unix, macOS.

Cloud Technologies: Confidential Web Services (EC2, S3), Azure.

PROFESSIONAL EXPERIENCE

Sr. Data Engineer

Confidential

Responsibilities:

  • Implemented Workload Management (WLM) in Redshift to prioritize basic dashboard queries over more complex, longer-running ad hoc queries, which enabled a more reliable and faster reporting interface with sub-second response times for basic queries.
  • Analyzed, designed, and developed ETL strategies and processes, wrote ETL specifications, and handled Informatica development and administration.
  • Built multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP, and coordinated tasks among the team.
  • Worked with Google Cloud components, Google container builders, GCP client libraries, and the Cloud SDK.
  • Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
  • Implemented Spark jobs using Scala and Spark SQL for faster testing and processing of data, and managed data from different sources.
  • Implemented data ingestion and cluster handling for real-time processing using Kafka.
  • Implemented scripts that load Google BigQuery data and run queries to export data.
  • Analyzed clickstream data from Google Analytics with BigQuery.
  • Developed Spark programs using the Scala and Java APIs and performed transformations and actions on RDDs.
  • Developed a utility that transforms and exports data from GCP Cloud Storage to GCP Cloud Composer and sends alerts and notifications to downstream systems (AI and Data Analytics) once the data is ready for usage.
  • Launched a multi-node Kubernetes cluster in Google Kubernetes Engine (GKE) and migrated the Dockerized application from AWS to GCP.
  • Implemented a serverless architecture using API Gateway, Lambda, and DynamoDB, and deployed AWS Lambda code from Confidential S3 buckets. Created a Lambda deployment function and configured it to receive events from the S3 bucket.
  • Designed the data models used in data-intensive AWS Lambda applications aimed at complex analysis, creating analytical reports for end-to-end traceability, lineage, and definition of key business elements from Aurora.
  • Created customized Tableau dashboards, integrating custom SQL from Teradata, Confidential, and Hadoop, and performing data blending in reports.
  • Designed, developed, tested, and maintained Tableau reports and dashboards based on user requirements.
  • Performed regression analysis on datasets in GCP BigQuery using PySpark scripts, leveraging the Hadoop cluster in Dataproc.
  • Deployed Spark clusters and ran PySpark jobs using Dataproc, and scheduled batch jobs in the cloud.
  • Developed pipelines for auditing the metrics of all applications using GCP Cloud Functions and Dataflow.
  • Developed an end-to-end pipeline that exports data from Parquet files in Cloud Storage to GCP Cloud SQL.
  • Collaborated with architects to design Spark replacements for the existing MapReduce model and migrated them to Spark using Scala. Wrote Scala programs using Spark SQL to perform aggregations.
  • Developed web services in the Play Framework using Scala while building a streaming data platform.
  • Worked with Apache Spark, which provides a fast engine for large-scale data processing, integrated with Scala.
  • Worked with Spark SQL and created RDDs using PySpark; extensive experience performing ETL on large datasets using PySpark on HDFS.
  • Developed ETL programs to load data from Confidential to Snowflake using Informatica.
  • Developed ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake's SnowSQL. Designed physical and logical data models, both relational (OLTP) and dimensional on a snowflake schema, using the Erwin modeler to build an integrated enterprise data warehouse.
  • Created data pipelines integrating Kafka with Spark streaming applications; configured a Kafka broker for structured streaming to get structured data by schema (a minimal sketch follows this list).
  • Developed an Elasticsearch connector using the Kafka Connect API, with Kafka as the source and Elasticsearch as the sink.
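
A minimal sketch of the Kafka-to-Spark Structured Streaming pattern referenced above, written in PySpark. The broker address, topic, schema fields, and output paths are illustrative placeholders rather than the actual project configuration, and the spark-sql-kafka connector package is assumed to be on the Spark classpath:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    spark = SparkSession.builder.appName("kafka-structured-ingest").getOrCreate()

    # Fixed schema applied to the JSON payload coming off the Kafka topic
    event_schema = StructType([
        StructField("event_id", StringType()),
        StructField("user_id", StringType()),
        StructField("event_time", TimestampType()),
    ])

    # Subscribe to the topic (placeholder broker and topic names)
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")
           .option("subscribe", "events")
           .load())

    # Parse the binary Kafka value into typed columns using the schema
    parsed = (raw
              .select(from_json(col("value").cast("string"), event_schema).alias("e"))
              .select("e.*"))

    # Land the structured stream as Parquet (placeholder Cloud Storage paths)
    query = (parsed.writeStream
             .format("parquet")
             .option("path", "gs://example-bucket/events/")
             .option("checkpointLocation", "gs://example-bucket/checkpoints/events/")
             .start())
    query.awaitTermination()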

Data Engineer

Grand Rapids, MI

Responsibilities:

  • Responsible for validating target data in the data warehouse that was transformed and loaded using the Hadoop big data stack.
  • Designed the ETL process from various sources into Hadoop/HDFS for analysis and further processing of data modules.
  • Developed Informatica Cloud jobs to migrate data from the legacy Teradata data warehouse to Snowflake.
  • Designed, developed, tested, implemented, and supported data warehousing ETL using Talend and Hadoop technologies.
  • Extensively worked with MySQL for identifying required tables and views to export into HDFS.
  • Engaged directly with IT to understand their key challenges and to demonstrate and price PaaS- and IaaS-based solutions that fit their needs.
  • Worked with various Python IDEs, including PyCharm, PyScripter, Spyder, PyStudio, PyDev, IDLE, NetBeans, and Sublime Text.
  • Used the Requests, ReportLab, NumPy, SciPy, PyTables, cv2, imageio, Python-Twitter, Matplotlib, httplib2, urllib2, Beautiful Soup, and pandas (DataFrame) Python libraries during the development lifecycle.
  • Involved in designing HDFS storage to maintain an efficient number of block replicas of the data.
  • Responsible for creating Hive tables on top of HDFS and developed Hive Queries to analyze the data.
  • Staged data by persisting to Hive and connected Tableau with Spark cluster and developed dashboards.
  • Implemented UDFs, UDAFs, and UDTFs in Java for Hive to process data in ways that cannot be handled by Hive's built-in functions. Used Hive to analyze the partitioned, bucketed data and compute various metrics for reporting.
  • DevOps role converting existing AWS infrastructure to a serverless architecture (AWS Lambda, Kinesis) deployed via CloudFormation.
  • Designed and built a global persistent messaging platform to serve as a one-stop messaging solution for infrastructure (IaaS/PaaS) and applications; this is a vendor-agnostic solution leveraging RabbitMQ (AMQP).
  • Built and configured a virtual data center in the Confidential Web Services cloud to support Enterprise Data Warehouse hosting, including Virtual Private Cloud, Security Groups, and Elastic Load Balancer.
  • Strong experience developing SOAP and RESTful web services with the Python programming language.
  • Loaded data into S3 buckets using AWS Glue and PySpark. Involved in filtering data stored in S3 buckets using Elasticsearch and loaded data into Hive external tables.
  • Built complex calculations using advanced functions (ATTR, LOD, DATEDIFF, IF, nested IFs, RANK_DENSE, OR, NOT, AND) and quick table calculations in Tableau.
  • Transformed the data using AWS Glue dynamic frames with PySpark, cataloged the transformed data using crawlers, and scheduled the job and crawler using the workflow feature (see the sketch after this list).
  • Configured the Hive tables to load the profitability system in the Talend ETL repository and created the Hadoop connection for the HDFS cluster in the Talend ETL repository.
  • Designed, developed, and delivered the REST APIs necessary to support new feature development and enhancements in an agile environment.
  • Importing and exporting structured data from different relational databases into HDFS and Hive using Sqoop.
  • Designed and developed jobs that handle the initial load and the incremental load automatically using an Oozie workflow.
  • Created dashboards and reports in Tableau and maintained server activities, user activity, and customized views for server analysis. Used Tableau dashboards to communicate results with team members and with other data science, marketing, and engineering teams.
  • Provided expertise in Kafka brokers, ZooKeeper, Kafka Connect, Schema Registry, KSQL, REST Proxy, and Kafka Control Center.
  • Deployed instances, provisioned EC2 and S3 buckets, and configured security groups and the Hadoop ecosystem for Cloudera in AWS. Experience using distributed computing architectures such as AWS products (e.g., EC2, Redshift, and EMR), migrating raw data to the Confidential cloud into S3, and performing refined data processing.
  • Used Git for version control to commit developed code, which was then deployed using the build and release tool Jenkins. Developed a CI/CD system with Jenkins on a Kubernetes container environment, utilizing Kubernetes and Docker to build, test, and deploy.
  • Extensively used Databricks notebooks for interactive analytics using Spark APIs.
  • Worked on implementation and maintenance of the Cloudera Hadoop cluster. Experience working with Cloudera (CDH4 and CDH5), Hortonworks, Confidential EMR, and Azure HDInsight on multi-node clusters.
  • Integrated Oozie with Pig, Hive, and Sqoop, and developed Oozie workflows for scheduling and orchestrating the Extract, Transform, and Load (ETL) process within the Cloudera Hadoop cluster.
  • Pulled data from the data lake (HDFS) and massaged it with various RDD transformations.
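
A minimal sketch of the AWS Glue pattern referenced above (DynamicFrames with PySpark): it reads a crawler-cataloged table, applies a column mapping, and writes Parquet back to S3. It assumes the AWS Glue job runtime (which provides the awsglue module); the database, table, column, and bucket names are illustrative placeholders:

    import sys
    from awsglue.transforms import ApplyMapping
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read the raw table registered by a Glue crawler (placeholder names)
    raw = glue_context.create_dynamic_frame.from_catalog(
        database="raw_db", table_name="events")

    # Rename and cast columns with ApplyMapping
    mapped = ApplyMapping.apply(
        frame=raw,
        mappings=[
            ("event_id", "string", "event_id", "string"),
            ("ts", "string", "event_time", "timestamp"),
        ],
    )

    # Write the transformed data back to S3 as Parquet (placeholder bucket)
    glue_context.write_dynamic_frame.from_options(
        frame=mapped,
        connection_type="s3",
        connection_options={"path": "s3://example-bucket/curated/events/"},
        format="parquet",
    )
    job.commit()

In a Glue workflow, this job and the crawler that re-catalogs its output can be chained so the crawler runs automatically after each successful job run.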

Hadoop/ETL Developer

Responsibilities:

  • Used Informatica as an ETL tool to create source/target definitions, mappings, and sessions to extract, transform and load data into staging tables from various sources.
  • Designed and Developed Informatica processes to extract data from internal check issue systems.
  • Used Informatica PowerExchange to extract data from one of the EIC's operational systems, called Datacom.
  • Developed Tableau visualizations and dashboards using Tableau Desktop and published the same on Tableau Server.
  • Extensive experience building and publishing customized interactive reports and dashboards, and scheduling reports, using Tableau Desktop and Tableau Server.
  • Extensive experience with the Tableau administration tools, Tableau interactive dashboards, and the Tableau suite.
  • Involved in Installation and upgrade of Tableau server and server performance tuning for optimization.
  • Used Informatica PowerCenter for extraction, transformation, and loading (ETL) of data from heterogeneous source systems into the target database.
  • Involved in migrating data from an on-prem Cloudera cluster to AWS EC2 instances deployed on an EMR cluster, and developed an ETL pipeline to extract logs, store them in an AWS S3 data lake, and process them further using PySpark (see the sketch after this list).
  • Analyzed data stored in S3 buckets using SQL and PySpark, stored the processed data in Redshift, and validated data sets by implementing Spark components.
  • Used debugger in Informatica Designer to resolve the issues regarding data thus reducing project delay.
  • Designed high-level view of the current state of dealer operation, leads, and website activity using Tableau.
  • Performed various types of joins in Tableau for demonstrating integrated data purpose and validated data integrity to examine the feasibility of discussed visualization design.
  • Worked as an ETL developer and Tableau developer, widely involved in designing, developing, and debugging ETL mappings using the Informatica Designer tool, and created advanced chart types, visualizations, and complex calculations to manipulate the data using Tableau Desktop.
  • Used Informatica to parse the XML data into the data mart structures that are further utilized for reporting needs.
  • Utilized Informatica PowerCenter across full phases of the data flow, with source data (Confidential, SQL Server, flat files) analyzed before being extracted and transformed.
  • Used Custom SQL feature on Tableau Desktop to create very complex and performance optimized dashboards.
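
A minimal PySpark sketch of the log pipeline referenced above: raw logs are read from the S3 data lake, aggregated, and loaded into Redshift over JDBC. The bucket path, table, endpoint, and credentials are illustrative placeholders, and the Redshift JDBC driver is assumed to be on the Spark classpath (production jobs would typically use the Redshift connector and IAM credentials rather than a plain password):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, count

    spark = SparkSession.builder.appName("s3-log-etl").getOrCreate()

    # Raw application logs landed in the S3 data lake (placeholder path)
    logs = spark.read.json("s3://example-data-lake/raw/app-logs/")

    # Simple aggregation: request counts per status code and day
    daily_counts = (logs
                    .groupBy(col("status_code"), col("event_date"))
                    .agg(count("*").alias("requests")))

    # Load the processed data into Redshift via JDBC (placeholder connection details)
    (daily_counts.write
     .format("jdbc")
     .option("url", "jdbc:redshift://example-cluster.us-east-1.redshift.amazonaws.com:5439/analytics")
     .option("dbtable", "public.daily_request_counts")
     .option("user", "etl_user")
     .option("password", "********")
     .option("driver", "com.amazon.redshift.jdbc42.Driver")
     .mode("append")
     .save())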

Data Engineer

Confidential

Responsibilities:

  • Designed, created, and implemented database applications based on business requirements.
  • Created and executed various SQL queries to gather and analyze product data.
  • Developed SQL in the Snowflake database to implement complex business logic (a minimal sketch follows this list).
  • Configured AWS Identity and Access Management (IAM) groups and users for improved login authentication requirements, and efficiently handled periodic exporting of SQL data into Elasticsearch.
  • Extensively worked on the Extraction, Transformation, and Load (ETL) process using PL/SQL to populate the tables in the database.
  • Wrote SQL queries for data manipulation to meet data specifications for reports.
  • Designed and Developed ETL using PL/SQL packages, procedures, functions, and UNIX shell scripts for the massive data loads from OLTP systems to staging database system.
  • Hands-on experience optimizing SQL statements and PL/SQL blocks by analyzing SQL execution plans; created and modified triggers, SQL queries, and stored procedures for better performance.
  • Involved in extensive data validation and resolution of data quality issues by writing several complex SQL queries.
  • Designed new data sources using PostgreSQL and AWS Redshift, and installed missing drivers in Jaspersoft.
  • Developed efficient PySpark scripts for reading and writing data from NoSQL tables and then running SQL queries on the data.
  • Performed subscriptions and refresh extracts in Power BI as well as set up data alerts on business-critical data utilizing Power BI services.
  • Deployed the Power BI reports to Power BI services and pinned visuals to improve visualizations and presentations.
  • Responsible for SQL Server installation and database design, and created schema objects such as tables and indexes.
  • Responsible for maintaining and improving SQL Server database code bases as well as well-designed reports.
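
A minimal sketch of running Snowflake SQL from Python with the snowflake-connector-python package, in the spirit of the Snowflake SQL development described above; the account, warehouse, database, table, and credentials are illustrative placeholders (real jobs would pull credentials from a secrets store):

    import snowflake.connector

    conn = snowflake.connector.connect(
        account="xy12345.us-east-1",   # placeholder account locator
        user="etl_user",
        password="********",
        warehouse="ANALYTICS_WH",
        database="SALES_DB",
        schema="PUBLIC",
    )

    try:
        cur = conn.cursor()
        # Example business-logic query: monthly revenue per product line (placeholder table)
        cur.execute("""
            SELECT product_line,
                   DATE_TRUNC('month', order_date) AS order_month,
                   SUM(amount)                     AS revenue
            FROM orders
            GROUP BY product_line, order_month
            ORDER BY order_month
        """)
        for product_line, order_month, revenue in cur.fetchall():
            print(product_line, order_month, revenue)
    finally:
        conn.close()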
