
Sr Data Engineer Resume


Tampa, Florida

SUMMARY

  • Around 6 years of experience in Analysis, Design, Development, and Implementation as a Data Engineer
  • Experience in Cloud Azure, AWS, DevOps, Configuration management, Infrastructure automation, Continuous Integration and Delivery (CI/CD).
  • Implemented effective strategies for N-Tier application development in both cloud and on-premises environments.
  • Expertise in building CI/CD in AWS environments using AWS CodeCommit, CodeBuild, CodeDeploy, and CodePipeline, and experience using AWS CloudFormation, API Gateway, and AWS Lambda to automate and secure infrastructure on AWS.
  • Hands-on experience in bootstrapping nodes using knife and automating testing of Chef recipes and cookbooks with Test Kitchen and ChefSpec. Refactored Chef and OpsWorks in the AWS cloud environment.
  • Expert in setting up Continuous Integration (CI) by configuring build, code, deploy, and test automation jobs in Jenkins for different applications; used Codeship and Groovy-based Jenkinsfiles to automate branch and project creation in Git, with configuration automated using Ansible.
  • Professional in deploying and configuring Elasticsearch, Logstash, Kibana (ELK) and AWS Kinesis for log analytics, and skilled in monitoring servers using Nagios, Splunk, Amazon CloudWatch, Azure Monitor, and ELK.
  • Experience in converting existing AWS infrastructure to serverless architecture (AWS Lambda, AWS Kinesis) through the creation of a serverless architecture using AWS Lambda, API Gateway, Route 53, and S3 buckets.
  • Experience in creating the methodologies and technologies that depict the flow of data within and between application systems and business functions/operations & developed Data Flow Diagrams.
  • Extensive working experience with large scale process flows, relational/MPP databases, dimensional modelling, BI dashboards and reporting framework, to build efficient data integration strategy (data mapping, standardization and transformation) across multiple systems.
  • Performed the requirement gathering, analysis, design, development, testing, implementation, support, and maintenance phases of data integration projects.
  • Experienced in data integration in large-scale implementation environments.
  • Experience in working with other IT team members, business partners, data stewards, stakeholders, steering committee members, and executive sponsors on BI and data governance related activities.
  • Experience in data profiling methods, data mining & defining data quality rules.
  • Hands-on experience implementing and supporting the Informatica suite of products.
  • Hands-on experience writing scripts in Korn shell, batch scripts, SQL, PL/SQL, T-SQL, and Python to automate ETL loads, process large and unstructured data sets, and process real-time data.
  • Working knowledge of processing XML and JSON files.
  • Broad knowledge of technologies and best practices in business analytics, data management and data virtualization.
  • Hands-on experience in infrastructure management including database management and storage management.
  • Worked with Big Data systems on the Hadoop platform using Spark, HiveQL, Sqoop, Pig Latin, and Python scripting, as well as NoSQL databases.

TECHNICAL SKILLS

Hadoop/Spark Ecosystem: Hadoop, MapReduce, Pig, Hive/Impala, YARN, Kafka, Flume, Sqoop, Oozie, Zookeeper, Spark, Airflow, MongoDB, Cassandra, HBase, and Storm.

Hadoop Distribution: Cloudera and Hortonworks distributions.

Programming Languages and Frameworks: Scala, Spring, Hibernate, JDBC, JSON, HTML, CSS

Script Languages: JavaScript, jQuery, Python, Shell Script (bash, sh)

Databases: Oracle, MySQL, SQL Server, PostgreSQL, HBase, Snowflake, Cassandra, MongoDB

Operating Systems: Linux, Windows, Ubuntu, Unix

Web/Application Servers: Apache Tomcat, WebLogic, WebSphere

IDE: IntelliJ IDEA, Eclipse, and NetBeans

Version Control and Build Tools: Git, Maven, SBT, CBT

PROFESSIONAL EXPERIENCE

Sr Data Engineer

Confidential, Tampa, Florida

Responsibilities:

  • Responsible for the execution of big data analytics, predictive analytics and machine learning initiatives.
  • Implemented a proof of concept deploying the product to an AWS S3 bucket and Snowflake.
  • Utilized AWS services with a focus on big data architecture, analytics, enterprise data warehouse, and business intelligence solutions to ensure optimal architecture, scalability, flexibility, availability, and performance, and to provide meaningful and valuable information for better decision-making.
  • Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs in Spark for data aggregation and queries, writing results back into the S3 bucket.
  • Experience in data cleansing and data mining.
  • Wrote, compiled, and executed programs as necessary using Apache Spark in Scala to perform ETL jobs with ingested data.
  • Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
  • Wrote Spark applications for data validation, cleansing, transformation, and custom aggregation and used Spark engine, Spark SQL for data analysis and provided to the data scientists for further analysis.
  • Prepared scripts to automate the ingestion process using Python and Scala as needed from various sources such as APIs, AWS S3, Teradata, and Snowflake.
  • Designed and developed Spark workflows using Scala to pull data from AWS S3 buckets and Snowflake and apply transformations on it.
  • Implemented Spark RDD transformations to map business logic and applied actions on top of the transformations.
  • Automated resulting scripts and workflow using Apache Airflow and shell scripting to ensure daily execution in production.
  • Created Python scripts to read CSV, JSON, and Parquet files from S3 buckets and load them into AWS S3, DynamoDB, and Snowflake.
  • Implemented AWS Lambda functions to run scripts in response to events in Amazon DynamoDB tables or S3 buckets, or to HTTP requests via Amazon API Gateway.
  • Migrated data from AWS S3 bucket to Snowflake by writing custom read/write snowflake utility function using Scala.
  • Worked on Snowflake schemas and data warehousing, and processed batch and streaming data load pipelines using Snowpipe and Matillion from the Confidential data lake in the AWS S3 bucket.
  • Profiled structured, unstructured, and semi-structured data across various sources to identify patterns and implemented data quality metrics using the necessary queries or Python scripts depending on the source.
  • Installed and configured Apache Airflow for the S3 bucket and Snowflake data warehouse and created DAGs to run in Airflow.
  • Created DAGs using the EmailOperator, BashOperator, and Spark Livy operator to execute jobs on an EC2 instance (a sketch follows the environment list below).
  • Deployed the code to EMR via CI/CD using Jenkins.
  • Extensively used Codecloud for code check-ins and checkouts for version control.

Environment: Agile Scrum, MapReduce, Snowflake, Pig, Spark, Scala, Hive, Kafka, Python, Airflow, JSON, Parquet, CSV, Codecloud, AWS
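
A minimal sketch of the kind of Airflow DAG described above, assuming Airflow 2.x; the DAG id, S3 job path, and email address are hypothetical placeholders, and the EmailOperator assumes SMTP is configured in Airflow.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.email import EmailOperator

default_args = {
    "owner": "data-engineering",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="s3_to_snowflake_daily",     # hypothetical DAG name
    default_args=default_args,
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",         # daily production run
    catchup=False,
) as dag:
    # Submit the Spark load job via spark-submit on the EC2/EMR edge node.
    run_spark_load = BashOperator(
        task_id="run_spark_load",
        bash_command="spark-submit --deploy-mode cluster s3://example-bucket/jobs/load_to_snowflake.py",
    )

    # Email the team once the load has finished.
    notify = EmailOperator(
        task_id="notify_team",
        to="data-team@example.com",
        subject="s3_to_snowflake_daily succeeded",
        html_content="Daily S3 to Snowflake load completed.",
    )

    run_spark_load >> notify
```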

Data Engineer

Confidential, Petaluma, California

Responsibilities:

  • Designed and built multi-terabyte, full end-to-end data warehouse infrastructure from the ground up on Confidential Redshift for large-scale data, handling millions of records every day.
  • Worked on big data using AWS cloud services, i.e., EC2, S3, EMR, and DynamoDB.
  • Managed security groups on AWS, focusing on high availability, fault tolerance, and auto scaling using Terraform templates, along with continuous integration and continuous deployment with AWS Lambda and AWS CodePipeline.
  • Implementing and Managing ETL solutions and automating operational processes.
  • Optimized and tuned the Redshift environment, enabling queries to perform up to 100x faster for Tableau and SAS Visual Analytics.
  • Wrote various data normalization jobs for new data ingested into Redshift (a load sketch follows the environment list below).
  • Advanced knowledge on Confidential Redshift and MPP database concepts.
  • Migrated the on-premises database structure to the Confidential Redshift data warehouse.
  • Was responsible for ETL and data validation using SQL Server Integration Services.
  • Defined and deployed monitoring, metrics, and logging systems on AWS.
  • Implemented Workload Management (WLM) in Redshift to prioritize basic dashboard queries over more complex, longer-running ad hoc queries. This allowed for a more reliable and faster reporting interface, giving sub-second query response for basic queries.
  • Worked on publishing interactive data visualization dashboards, reports, and workbooks on Tableau and SAS Visual Analytics.
  • Expert knowledge of Hive SQL, Presto SQL, and Spark SQL for ETL jobs, using the right technology to get the job done.

Environment: Informatica Cloud, Python, Airflow, Spark, GitHub, SQL Server, Hadoop, Hive, and Presto.
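
A minimal sketch of one such Redshift load step for newly ingested data, assuming the psycopg2 driver; the cluster endpoint, credentials, table, bucket path, and IAM role ARN are placeholders.

```python
import psycopg2  # assumes the psycopg2-binary package is installed

# All connection details and object names below are placeholders.
conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="analytics",
    user="etl_user",
    password="********",
)

copy_stmt = """
    COPY staging.daily_events
    FROM 's3://example-bucket/ingest/daily_events/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    FORMAT AS PARQUET;
"""

# The connection context manager wraps the statements in a transaction.
with conn, conn.cursor() as cur:
    cur.execute(copy_stmt)                        # bulk-load the ingested files
    cur.execute("ANALYZE staging.daily_events;")  # refresh planner statistics after the load

conn.close()
```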

Data Engineer

Confidential, Columbia, SC

Responsibilities:

  • Contributed to building our cloud infrastructure in Azure; automated cloud deployments using Chef, Python, and Azure Resource Manager (ARM) templates.
  • Used Kafka stream processing to collect data from producers, configured topics in the Kafka cluster, and made the data available to consumers.
  • Involved in file movements between HDFS and Azure Blob storage and extensively worked with Blob Containers in Azure.
  • Maintained repository management tools such as Artifactory and Nexus to store the WAR and JAR files deployed by Chef through Jenkins.
  • Created data partitions on large data sets in Blob storage and DDL on the partitioned data.
  • Implemented rapid provisioning and lifecycle management for Azure Virtual Machines using custom Bash scripts.
  • Extensively wrote Hive queries for data analysis to meet business requirements.
  • Integrated Blob storage with Databricks for analysis by writing ETL jobs in PySpark to extract and transform the data (a sketch follows the environment list below).
  • Involved in handling large datasets using partitions, Spark in-memory capabilities, broadcasts, effective and efficient joins, and transformations such as reduceByKey and aggregateByKey during the ingestion process itself.
  • Involved in adding and decommissioning the data nodes.
  • Responsible for analyzing data by comparing Spark SQL query results with Hive query results.
  • Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster testing and processing of data and optimized performance.
  • Created HBase tables to store data depending on column families to achieve the goal of the project.
  • Used Maven for deployments and processed structured, semi-structured (such as XML), and unstructured data.
  • Used a connector to load data to and from Snowflake and analyzed it to discover business goals.
  • Experience in developing/consuming Web Services (REST, SOAP, JSON) and APIs (Service-oriented architectures).
  • Built Chef based CI/CD solutions to improve developer productivity and rapid deployments.
  • Troubleshot Linux network and security related issues and captured packets using tools such as iptables and firewalls.
  • Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive.
  • Used Tableau to create stories and interactive dashboards for detailed insights.

Environment: Hadoop (HDP 2.5.3), Informatica, HDFS, Spark, Spark SQL, Git, Kafka, Hive, Java, Scala, HBase, Maven, UNIX Shell Scripting, Azure, Python, SQL, Tableau, Pig
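
A minimal sketch of the Blob-to-Databricks PySpark ETL pattern mentioned above; the storage account, container names, and column names are hypothetical, and the account-key configuration assumes the built-in WASB connector.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("blob-etl").getOrCreate()

# Storage account, containers, and key are placeholders; on Databricks this
# could also be handled through a mounted path or a secret scope.
spark.conf.set(
    "fs.azure.account.key.examplestorage.blob.core.windows.net",
    "<storage-account-key>",
)

# Extract raw JSON events from the landing container.
raw = spark.read.json("wasbs://raw@examplestorage.blob.core.windows.net/events/")

# Basic cleansing plus a daily aggregation before writing the curated output.
daily_counts = (
    raw.dropna(subset=["event_id"])
       .withColumn("event_date", F.to_date("event_ts"))
       .groupBy("event_date", "event_type")
       .count()
)

daily_counts.write.mode("overwrite").partitionBy("event_date").parquet(
    "wasbs://curated@examplestorage.blob.core.windows.net/daily_counts/"
)
```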

Data Engineer

Confidential

Responsibilities:

  • Responsible for building the data lake in Amazon AWS, ingesting structured shipment and master data from Azure Service Bus through AWS API Gateway, Lambda, and Kinesis Firehose into S3 buckets (a sketch follows the environment list below).
  • Implemented data pipelines for big data processing using Spark transformations with the Python API on clusters in AWS.
  • Created complex SQL queries in the Teradata data warehouse environment to test the data flow across all stages.
  • Integrated data sources from Kafka (Producer and Consumer APIs) for data stream processing in Spark on the AWS network.
  • Designed the rules engine in Spark SQL to process millions of records on a Spark cluster on the Azure Data Lake.
  • Extensively involved in designing the SSIS packages to load data into Data Warehouse
  • Built customer insights on customer/service utilization, bookings & CRM data using Gainsight.
  • Executed process improvements in data workflows using Alteryx processing engine and SQL
  • Collaborated with business owners of products to understand business needs, and automated business processes and data storytelling in Tableau.
  • Implemented Agile Methodology for building the data applications and framework development
  • Implemented business process models using predictive and prescriptive analytics on transactional data with regression.
  • Implemented logistic regression and random forest ML models with Python packages to predict insurance purchases by Confidential members.

Environment: Python, SQL, Tableau, Big Data, Data Lake, Alteryx, Hive, CRM, OLAP, Excel, DataRobot
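
A minimal sketch of the API Gateway to Lambda to Kinesis Firehose ingestion path described above; the delivery stream name and the event shape (API Gateway proxy integration) are assumptions.

```python
import json

import boto3

firehose = boto3.client("firehose")
DELIVERY_STREAM = "shipment-events-to-s3"  # placeholder delivery stream name


def handler(event, context):
    """Receive a shipment event from API Gateway and queue it into Firehose."""
    payload = json.loads(event.get("body") or "{}")

    # Firehose buffers the records and delivers them to the S3 data lake.
    firehose.put_record(
        DeliveryStreamName=DELIVERY_STREAM,
        Record={"Data": (json.dumps(payload) + "\n").encode("utf-8")},
    )

    return {"statusCode": 202, "body": json.dumps({"status": "queued"})}
```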
