Sr. Data Engineer Resume
Charlotte, NC
SUMMARY
- Over 7 years of experience in data engineering, data pipeline design, development, and implementation, with full Software Development Life Cycle experience spanning system analysis, design, development, testing, deployment, maintenance, enhancement, re-engineering, migration, troubleshooting, and support of multi-tiered web applications in high-performing environments.
- Experienced in developing web-based applications using SaaS, Python, Django, and Kafka; employed serverless, microservices, and container patterns while implementing cloud applications on AWS and GCP, with the ability to tune big data solutions to improve performance and the end-user experience.
- Experienced with version control systems such as Git and GitHub to keep code versions and configurations organized.
- Extensively worked with PySpark/Spark SQL for data cleansing and for generating DataFrames and RDDs (a brief sketch follows this summary).
- Experienced in implementing cloud migrations and building analytical platforms using AWS services.
- Expertise in Snowflake data modeling and ETL using SnowSQL.
- Good understanding of data pipelines and the integration of various tools across phases, from source systems through ETL into the warehouse.
- Proficient working experience with big data tools such as Hadoop and AWS Redshift.
- Experience in Snowflake performance optimization.
- Experience in creating a shared VPC with different tags in a single GCP project and reusing it across projects.
- Knowledge of Kubernetes service deployments in GCP.
- Experience with different Hadoop distributions such as Cloudera and Hortonworks Data Platform (HDP).
- Excellent understanding and knowledge of NoSQL databases such as MongoDB and Cassandra.
- Hands-on experience designing and developing ETL transformations using PySpark, Spark SQL, S3, Lambda, and Java.
- Experienced in developing transformation scripts in Scala.
- Experience developing Kafka producers and consumers for streaming millions of events per second.
- Experience transferring streaming data from different data sources into HDFS and NoSQL databases using Apache Flume, with cluster coordination services through ZooKeeper.
- Expertise in developing complex data pipelines using AWS cloud services such as EC2, API Gateway, Glue, Lambda, and CloudWatch, along with Airflow and Prometheus.
- Experienced in executing MySQL database queries from Python using the MySQL Connector and MySQLdb packages.
- Experience working with Apache Hadoop, Pig, and Spark, and with ETL tools and processes across the Cassandra, HBase, Storm, and Kafka ecosystem.
- Experience with the Snowflake cloud data warehouse and AWS S3 buckets for integrating data from multiple source systems, including loading nested JSON-formatted data into Snowflake tables.
- Experience with the Snowflake cloud relational data warehouse and with data stewardship of statistics.
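A minimal PySpark/Spark SQL cleansing sketch, as referenced in the summary above. The S3 path and column names (user_id, event_ts) are hypothetical placeholders rather than details of any specific engagement.

```python
# Minimal sketch of a PySpark/Spark SQL cleansing step.
# The S3 path and column names are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cleansing-sketch").getOrCreate()

# Read a raw CSV extract into a DataFrame (path is a placeholder).
raw_df = spark.read.option("header", "true").csv("s3a://example-bucket/raw/events/")

# Basic cleansing: trim strings, cast the timestamp, drop duplicates,
# and filter out rows with a null key.
clean_df = (
    raw_df
    .withColumn("user_id", F.trim(F.col("user_id")))
    .withColumn("event_ts", F.to_timestamp("event_ts", "yyyy-MM-dd HH:mm:ss"))
    .dropDuplicates(["user_id", "event_ts"])
    .filter(F.col("user_id").isNotNull())
)

# Spark SQL view for downstream ad hoc queries.
clean_df.createOrReplaceTempView("clean_events")
daily_counts = spark.sql(
    "SELECT date(event_ts) AS event_date, count(*) AS events "
    "FROM clean_events GROUP BY date(event_ts)"
)
daily_counts.show()
```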
PROFESSIONAL EXPERIENCE
Sr. Data Engineer
Confidential, Charlotte, NC
Responsibilities:
- Functioned as a Data Engineer responsible for data modeling, data migration, design, and preparation of ETL pipelines for both cloud and on-premises Exadata.
- Designed the business requirement collection approach based on the project scope and SDLC methodology.
- Designed and developed complex data pipelines using AWS Glue, Athena, Lambda, S3, API Gateway, and Snowflake.
- Designed, developed, and implemented ETL pipelines using the Python API of Apache Spark (PySpark) on AWS EMR (a brief ETL sketch follows this role's environment list).
- Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, Spark MLlib, DataFrames, pair RDDs, and Spark on YARN.
- Involved in building a real-time pipeline using Kafka and Spark Streaming to deliver event messages from an external REST-based application to the downstream application team (see the streaming sketch after this role's environment list).
- Worked in an environment using Python, NoSQL, Docker, AWS, and Kubernetes.
- Developed a Spark Scala framework to read data from Hive and write to HBase for different aggregations.
- Developed automated regression scripts in Python to validate the ETL process across multiple databases, including AWS Redshift, Oracle, MongoDB, and SQL Server (T-SQL).
- Worked extensively on migrating on-premises workloads to the AWS cloud.
- Created and wrote aggregation logic on Snowflake data warehouse tables.
- Worked with NoSQL databases such as HBase, Cassandra, DynamoDB (AWS), and MongoDB.
- Developed Tableau reports and dashboards for quick reviews presented to business and IT users.
- Performed bulk loads of semi-structured data from S3 buckets to Snowflake.
- Utilized Kubernetes and Docker as the runtime environment for the CI/CD system to build, test, and deploy.
- Developed ETL pipelines into and out of the data warehouse using a combination of Python and SnowSQL.
- Created NiFi workflows for ingesting data into the Hadoop data lake from MySQL and PostgreSQL.
- Utilized AWS cloud services such as S3, EMR, Redshift, Athena, and the Glue metastore.
- Implemented the AWS cloud computing platform using EMR/EC2, Glue, Redshift, Athena, S3, RDS, DynamoDB, and Python.
- Used Apache Spark for streaming applications and wrote APIs in Scala, Python, and Java.
- Installed Hadoop (MapReduce, HDFS) on AWS and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Loaded data into S3 buckets using AWS Lambda functions, AWS Glue, and PySpark; filtered data stored in S3 using Elasticsearch and loaded it into Hive external tables.
- Recreated and maintained existing Access database artifacts in Snowflake.
- Worked on Google Cloud Platform (GCP) services such as Compute Engine, Cloud Load Balancing, Cloud Storage, Cloud SQL, Stackdriver Monitoring, and Cloud Deployment Manager.
Environment: HTML5, CSS3, C#, JavaScript, Python, Scala, Cosmos, Airflow, PySpark, Redshift, Advanced SQL, Tableau, Hive, MongoDB, jQuery, Apache Spark, Bootstrap, MySQL, PostgreSQL, Power BI, Snowflake, Ember.js, Tables, Formulas, Hadoop, Amazon Web Services (AWS) Lambda, Glue, EC2, S3, Kinesis, REST API, Mobile (Windows, iOS, Android), Git, ETL.
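A hedged sketch of the PySpark-on-EMR ETL pattern referenced above: read raw JSON landed in S3, apply light transformations, and write partitioned Parquet back to S3 for Glue/Athena to catalog and query. Bucket names, paths, and column names are illustrative assumptions only.

```python
# Sketch of an EMR-style PySpark ETL step; all names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("emr-etl-sketch").getOrCreate()

# Read raw JSON landed in S3.
orders = spark.read.json("s3a://example-raw-bucket/orders/")

# Light transformations: drop rows without a key, derive a date column,
# and normalize the amount column's type.
curated = (
    orders
    .filter(F.col("order_id").isNotNull())
    .withColumn("order_date", F.to_date("order_ts"))
    .withColumn("amount", F.col("amount").cast("double"))
)

# Write partitioned Parquet back to S3 for Athena/Glue to query.
(curated
    .write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3a://example-curated-bucket/orders/"))
```

Partitioning by the derived date column keeps Athena scans cheap, since queries that filter on the date only read the matching S3 prefixes.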
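A hedged sketch of the Kafka-to-Spark streaming read referenced above, shown here with Structured Streaming; the broker, topic, and checkpoint locations are placeholders, and the console sink stands in for the real downstream delivery.

```python
# Sketch of a Kafka -> Spark Structured Streaming read; names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Subscribe to a Kafka topic (broker and topic are illustrative).
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker-1:9092")
    .option("subscribe", "events-topic")
    .option("startingOffsets", "latest")
    .load()
)

# Kafka keys/values arrive as bytes; cast to strings before parsing downstream.
messages = events.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

# Console sink used here as a stand-in for the downstream application sink.
query = (
    messages.writeStream
    .format("console")
    .option("checkpointLocation", "s3a://example-bucket/checkpoints/events/")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```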
Data Engineer
Confidential - Miramar, FL
Responsibilities:
- Worked on several projects involving various stages of the seismic data processing flow; participated in all phases of data mining, data import/export, data cleaning, modeling, signal processing, validation, and visualization.
- Responsible for running Hadoop streaming jobs to process terabytes of XML data; utilized cluster coordination services through ZooKeeper.
- Designed and developed complex data pipelines using AWS Glue, Athena, Lambda, S3, API Gateway, and Snowflake.
- Designed and implemented Sqoop incremental jobs to read data from DB2 and load it into Hive tables, then connected Tableau via HiveServer2 to generate interactive reports.
- Worked within Databricks, developing largely in Python with Spark/PySpark, MLlib, scikit-learn, pandas, and NumPy.
- Worked on Sqoop to transfer data between relational databases and Hadoop.
- Designed and implemented a Spark Streaming data ingestion framework for sources such as REST APIs and Kafka, using the Spark Streaming Scala API.
- Created several types of data visualizations using Python and Tableau.
- Worked on building tasks and stored procedures in Snowflake to transform and write data into new tables.
- Worked in an environment using Python, NoSQL, Docker, AWS, and Kubernetes.
- Created and wrote aggregation logic on Snowflake data warehouse tables.
- Designed, developed, deployed, and maintained MongoDB.
- Worked on Python scripts to match training data against our database stored in AWS CloudSearch so that each document could be assigned a response label for further classification.
- Worked on Snowflake features (SnowSQL, Snowpipe, Time Travel) and ETL integrations using Snowflake (a brief load sketch follows this role's environment list).
- Proficient with configuration management tools such as Chef and CI/CD tools such as Jenkins and CARA.
- Prepared files and reports by performing ETL, data extraction, and data validation; managed metadata and prepared the data dictionary per project requirements.
Environment: Python, SnowSQL, HTML5, CSS3, Advanced SQL, MySQL, PostgreSQL, C#, LESS, JSON API, Airflow, Redshift, Tableau, Hive, jQuery, Apache Spark, Cosmos, Angular 2, ETL, Snowflake, NodeJS, Bootstrap, XML, CI/CD, Scala, Formulas, Tables, Hadoop, GIT, AWS Lambda, EC2, S3, Kinesis, CloudWatch, Glue, Power BI, JavaScript, RESTful API, Java, Spring, JIRA, Junit, Apache Tomcat, JSP, Agile Methodology.
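A hedged sketch of the Snowflake load pattern referenced above, using the snowflake-connector-python package to run a COPY INTO from an external stage. The account, stage, and table names are hypothetical, and the EVENTS table is assumed to have a single VARIANT column for the raw JSON.

```python
# Sketch of loading staged JSON into Snowflake via the Python connector.
# Connection details, stage, and table names are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",
    user="example_user",
    password="****",            # placeholder; use a secrets manager in practice
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="RAW",
)

# COPY INTO a table with a single VARIANT column from an external S3 stage.
copy_sql = """
    COPY INTO RAW.EVENTS
    FROM @RAW.S3_EVENTS_STAGE
    FILE_FORMAT = (TYPE = 'JSON')
"""

try:
    cur = conn.cursor()
    cur.execute(copy_sql)
    print(cur.fetchall())       # COPY INTO returns one status row per loaded file
finally:
    conn.close()
```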
Software Engineer
Confidential - Nashville, TN
Responsibilities:
- Involved in design, development, and testing of the system.
- Responsible for requirements analysis, application design, coding, testing, maintenance, and support, spanning strategy planning through development, debugging, implementation to production, and ongoing support.
- Involved in building a real-time pipeline using Kafka and Spark Streaming to deliver event messages from an external REST-based application to the downstream application team.
- Involved in creating Hive scripts for performing ad hoc data analysis required by the business teams.
- Used broadcast variables in Spark, effective and efficient joins, caching, and other capabilities for data processing (a broadcast-join sketch follows this role's environment list).
- Involved in continuous integration of the application using Jenkins.
Environment: Python, HTML, Bootstrap, AngularJS, SQL, MySQL, PostgreSQL, Hadoop, Airflow, Redshift, Tableau, jQuery, Apache Spark, Hive, Kafka, ETL, Webpack, JavaScript, Node.js, Cosmos, CSS, C#, Chart.js, Bitbucket, Jira.
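A hedged sketch of the Spark broadcast-join pattern referenced above: a small lookup DataFrame is broadcast to the executors so the join against the large side avoids a shuffle. Dataset and column names are illustrative only.

```python
# Sketch of a broadcast join in PySpark; data and columns are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("broadcast-join-sketch").getOrCreate()

# Large fact-like dataset and a small dimension-like lookup table.
events = spark.createDataFrame(
    [(1, "US"), (2, "DE"), (3, "US")], ["event_id", "country_code"]
)
countries = spark.createDataFrame(
    [("US", "United States"), ("DE", "Germany")], ["country_code", "country_name"]
)

# Broadcasting the small side lets each executor join locally without a shuffle.
enriched = events.join(broadcast(countries), on="country_code", how="left")

# Cache when the enriched data is reused by multiple downstream actions.
enriched.cache()
enriched.show()
```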