
Sr. Data Engineer Resume


Nashville, TN

SUMMARY

  • 8+ years of overall IT experience across a variety of industries, including hands-on experience in Big Data analytics and development.
  • Experience in collecting, processing, and aggregating large amounts of streaming data using Kafka and Spark Streaming.
  • Good knowledge of Apache NiFi for automating and managing data flow between systems.
  • Experience in designing Data Marts by following Star Schema and Snowflake Schema Methodology.
  • Highly skilled in Business Intelligence tools like Tableau, Power BI, Plotly, and Dataiku.
  • Experience in managing and analyzing massive datasets on multiple Hadoop distributions, including Cloudera and Hortonworks.
  • Experience in Spark/Scala programming with good knowledge of Spark architecture and its in-memory processing.
  • Experience in designing and developing applications in Spark using Python to compare the performance of Spark with Hive.
  • Hands-on experience with Service-Oriented Architecture (SOA), Event-Driven Architecture, Distributed Application Architecture, and Software as a Service (SaaS).
  • Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Elastic Load Balancing, Auto Scaling, CloudFront, CloudWatch, SNS, SES, SQS, and other services of the AWS family.
  • Good working experience with cutting-edge technologies like Kafka, Spark, and Spark Streaming.
  • Partnered with cross-functional teams across the organization to gather requirements, architect, and develop proofs of concept for enterprise Data Lake environments such as MapR, Cloudera, Hortonworks, AWS, and Azure.
  • Strong experience in analyzing data using Hive, Impala, Pig Latin, and Drill; experience in writing custom UDFs in Hive and Pig to extend their functionality.
  • Experience in writing MapReduce programs in Java for data cleansing and preprocessing.
  • Excellent understanding of Hadoop architecture and its components, such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Resource Manager, and Node Manager.
  • Hands-on experience with data analytics services such as Athena, Glue Data Catalog, and QuickSight.
  • Good working experience with Hive and HBase/MapRDB Integration.
  • Excellent understanding and knowledge of NoSQL databases like HBase and Cassandra.
  • Experienced in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Python (see the sketch after this list).
  • Experience setting up instances behind an Elastic Load Balancer in AWS for high availability, and cloud integration with AWS using Elastic MapReduce (EMR).
  • Experience working with the Hadoop ecosystem integrated with the AWS cloud platform, using services such as Amazon EC2 instances, S3 buckets, and Redshift.
  • Good experience working with Azure Cloud Platform services like Azure Data Factory (ADF), Azure Data Lake, Azure Blob Storage, Azure SQL Analytics, HDInsight/Databricks.
  • Exposed to various software development methodologies such as Agile and Waterfall.
  • Extensive experience working with the Spark distributed framework, involving Resilient Distributed Datasets (RDDs) and DataFrames, using Python, Scala, and Java 8.
  • Involved in developing applications on Windows, UNIX, and Linux platforms.
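
A minimal PySpark sketch of the Hive-to-DataFrame conversion mentioned above, assuming a hypothetical sales table and columns:

    # The Hive query
    #   SELECT customer_id, SUM(amount) AS total
    #   FROM sales WHERE region = 'TN' GROUP BY customer_id
    # expressed as DataFrame transformations. Table and column names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("hive-to-dataframe")
             .enableHiveSupport()          # read Hive tables through the metastore
             .getOrCreate())

    totals = (spark.table("sales")                     # Hive-managed table
              .filter(F.col("region") == "TN")         # WHERE region = 'TN'
              .groupBy("customer_id")                  # GROUP BY customer_id
              .agg(F.sum("amount").alias("total")))    # SUM(amount) AS total

    totals.show()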

PROFESSIONAL EXPERIENCE

Sr. Data Engineer

Confidential, Nashville, TN

Responsibilities:

  • Migrated existing data from Teradata/SQL Server to Hadoop and performed ETL operations on it.
  • Responsible for loading structured, unstructured, and semi-structured data into Hadoop by creating static and dynamic partitions.
  • Worked on different data formats such as JSON and applied machine learning algorithms in Python.
  • Performed statistical data analysis and data visualization using Python and R.
  • Implemented data ingestion and handling clusters in real time processing using Kafka.
  • Imported real-time weblogs using Kafka as a messaging system and ingested the data into Spark Streaming (see the sketch after this list).
  • Created a task scheduling application to run in an EC2 environment on multiple servers.
  • Strong knowledge of various Data warehousing methodologies and Data modeling concepts.
  • Developed Hadoop Streaming Map/Reduce jobs using Python.
  • Created Hive partitioned tables in Parquet and Avro formats to improve query performance and space utilization.
  • Responsibilities include Database Design and Creation of User Database.
  • Moved ETL pipelines from SQL Server to the Hadoop environment.
  • Developed Spark applications using PySpark and Spark-SQL for data extraction, transformation, and aggregation from multiple file formats for analyzing & transforming the data to uncover insights into the customer usage patterns.
  • Implemented a CI/CD pipeline using Jenkins and Airflow for containers running on Docker and Kubernetes.
  • Used SSIS, NiFi, Python scripts, and Spark applications for ETL operations to create data flow pipelines, and was involved in transforming data from legacy tables to Hive tables, HBase tables, and S3 buckets for handoff to business users and data scientists to build analytics over the data.
  • Supported current and new services that leverage the AWS cloud computing architecture, including EC2, S3, and other managed service offerings.
  • Implemented data quality checks using Spark Streaming and flagged records as bad or passable.
  • Used advanced SQL methods to code, test, debug, and document complex database queries.
  • Designed relational database models for small and large applications.
  • Designed and developed Scala workflows to pull data from cloud-based systems and apply transformations on it.
  • Developed reliable, maintainable, and efficient code in SQL, Linux shell, and Python.
  • Implemented Apache Spark code to read multiple tables from the real-time records and filter the data based on requirements.
  • Stored final computation results in Cassandra tables and used Spark SQL and Spark Datasets to perform data computations.
  • Used Spark for data analysis and stored final computation results in HBase tables.
  • Troubleshot and resolved complex production issues while providing data analysis and data validation.
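
A minimal sketch of the Kafka weblog ingestion and quality-flag step described above, shown with Spark Structured Streaming; the broker address, topic name, and event schema are hypothetical:

    # assumes the spark-sql-kafka connector package is available on the classpath
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    spark = SparkSession.builder.appName("weblog-ingest").getOrCreate()

    schema = StructType([
        StructField("user_id", StringType()),
        StructField("url", StringType()),
        StructField("status", IntegerType()),
    ])

    raw = (spark.readStream.format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")    # hypothetical broker
           .option("subscribe", "weblogs")                      # hypothetical topic
           .load())

    events = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(F.from_json("json", schema).alias("e"))
              .select("e.*")
              # crude quality check: records missing a user_id are flagged "bad"
              .withColumn("quality", F.when(F.col("user_id").isNull(), "bad")
                                      .otherwise("passable")))

    query = events.writeStream.format("console").outputMode("append").start()
    query.awaitTermination()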

Environment: Teradata, SQL Server, Hadoop, ETL operations, Data Warehousing, Data Modelling, Cassandra, AWS Cloud computing architecture, EC2, S3, Advanced SQL methods, NiFi, Python, Linux, Apache Spark, Scala, Spark-SQL, HBase.

Data Engineer

Confidential, Dania Beach, FL

Responsibilities:

  • Created and maintained reporting infrastructure to facilitate visual representation of manufacturing data for operations planning and execution.
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and Azure Data Lake Analytics.
  • Implemented Restful web service to interact with Redis Cache framework.
  • Handled data intake through Sqoop and ingestion through MapReduce and HBase.
  • Extensively worked on Spark Streaming and Apache Kafka to fetch live stream data.
  • Responsible for applying machine learning techniques (regression and classification) to predict outcomes.
  • Constructed product-usage SDK data and data aggregations using PySpark, Scala, Spark SQL, and Hive context in partitioned Hive external tables maintained in an AWS S3 location for reporting, data science dashboarding, and ad-hoc analyses (see the sketch after this list).
  • Involved in data processing using an ETL pipeline orchestrated by AWS Data Pipeline using Hive.
  • Installed Kafka Manager to track consumer lag and monitor Kafka metrics; it was also used for adding topics, partitions, etc.
  • Created configuration files to deploy SSIS packages across all environments.
  • Wrote queries in SQL and R to extract, transform, and load (ETL) data from large datasets using data staging.
  • Implemented CI/CD pipelines using Jenkins and built and deployed the applications.
  • Developed RESTful endpoints to cache application-specific data in in-memory data clusters like Redis.
  • Created Databricks notebooks using SQL and Python and automated them using jobs.
  • Interacted with other data scientists and architected custom solutions for data visualization using tools like Tableau and packages in R.
  • Developed predictive models using Python and R to predict customer churn and classify customers.
  • Documented the best practices and target approach for the CI/CD pipeline.
  • Coordinated with QA team in preparing for compatibility testing of Guidewire solution.
  • Familiar with data architecture including data ingestion pipeline design, Hadoop information architecture, data modelling and data mining, machine learning and advanced data processing.
  • Designed and implemented topics by configuring them in the new Kafka cluster across all environments.
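
A minimal sketch of writing the product-usage aggregations to a partitioned Hive external table backed by S3, as referenced above; the bucket, database, table, and column names are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("usage-aggregates")
             .enableHiveSupport()
             .getOrCreate())

    usage = spark.table("raw.sdk_events")            # hypothetical source table

    daily = (usage.groupBy("event_date", "product")
             .agg(F.countDistinct("device_id").alias("active_devices")))

    (daily.write
     .mode("overwrite")
     .partitionBy("event_date")                                # Hive partition column
     .format("parquet")
     .option("path", "s3a://analytics-bucket/usage/daily")     # external S3 location
     .saveAsTable("analytics.daily_usage"))                    # registered in the metastore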

Environment: Hadoop, ETL operations, Data Warehousing, Data Modelling, Cassandra, AWS Cloud computing architecture, EC2, S3, Advanced SQL methods, NiFi, Python, Linux, Apache Spark, Scala, Spark-SQL, HBase.

Data Engineer

Confidential, Boise, ID

Responsibilities:

  • Migrated existing data from Teradata/SQL Server to Hadoop and performed ETL operations on it.
  • Responsible for loading structured, unstructured, and semi-structured data into Hadoop by creating static and dynamic partitions.
  • Worked on different data formats such as JSON and applied machine learning algorithms in Python.
  • Performed statistical data analysis and data visualization using Python and R.
  • Implemented data ingestion and handling clusters in real time processing using Kafka.
  • Imported real-time weblogs using Kafka as a messaging system and ingested the data into Spark Streaming.
  • Created a task scheduling application to run in an EC2 environment on multiple servers.
  • Strong knowledge of various Data warehousing methodologies and Data modeling concepts.
  • Developed Hadoop Streaming Map/Reduce jobs using Python.
  • Created Hive partitioned tables in Parquet and Avro formats to improve query performance and space utilization (see the sketch after this list).
  • Responsibilities include Database Design and Creation of User Database.
  • Moved ETL pipelines from SQL Server to the Hadoop environment.
  • Developed Spark applications using PySpark and Spark-SQL for data extraction, transformation, and aggregation from multiple file formats for analyzing & transforming the data to uncover insights into the customer usage patterns.
  • Implemented a CI/CD pipeline using Jenkins and Airflow for containers running on Docker and Kubernetes.
  • Used SSIS, NiFi, Python scripts, and Spark applications for ETL operations to create data flow pipelines, and was involved in transforming data from legacy tables to Hive tables, HBase tables, and S3 buckets for handoff to business users and data scientists to build analytics over the data.
  • Supported current and new services that leverage the AWS cloud computing architecture, including EC2, S3, and other managed service offerings.
  • Implemented data quality checks using Spark Streaming and flagged records as bad or passable.
  • Used advanced SQL methods to code, test, debug, and document complex database queries.
  • Designed relational database models for small and large applications.
  • Designed and developed Scala workflows to pull data from cloud-based systems and apply transformations on it.
  • Developed reliable, maintainable, and efficient code in SQL, Linux shell, and Python.
  • Implemented Apache Spark code to read multiple tables from the real-time records and filter the data based on requirements.
  • Stored final computation results in Cassandra tables and used Spark SQL and Spark Datasets to perform data computations.
  • Used Spark for data analysis and stored final computation results in HBase tables.
  • Troubleshot and resolved complex production issues while providing data analysis and data validation.
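
A minimal sketch of the partitioned Parquet table and dynamic-partition load pattern mentioned above, expressed through Spark SQL; the database, table, and column names are hypothetical:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("partitioned-hive-tables")
             .enableHiveSupport()
             .getOrCreate())

    # partitioned Hive table stored as Parquet
    spark.sql("""
        CREATE TABLE IF NOT EXISTS staging.orders_parquet (
            order_id STRING,
            amount   DOUBLE
        )
        PARTITIONED BY (order_date STRING)
        STORED AS PARQUET
    """)

    # dynamic partitioning lets one INSERT populate many date partitions at once
    spark.conf.set("hive.exec.dynamic.partition", "true")
    spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")
    spark.sql("""
        INSERT OVERWRITE TABLE staging.orders_parquet PARTITION (order_date)
        SELECT order_id, amount, order_date
        FROM   staging.orders_raw
    """)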

Environment: Teradata, SQL Server, Hadoop, ETL operations, Data Warehousing, Data Modelling, Cassandra, AWS Cloud computing architecture, EC2, S3, Advanced SQL methods, NiFi, Python, Linux, Apache Spark, Scala, Spark-SQL, HBase.

Data Engineer

Confidential

Responsibilities:

  • Experience in building distributed high-performance systems using Spark and Scala.
  • Experience developing Scala applications for loading/streaming data into NoSQL databases (MongoDB) and HDFS.
  • Performed T-SQL tuning and query optimization for SSIS packages.
  • Designed distributed algorithms for identifying trends in data and processing them effectively.
  • Created an SSIS package to import data from SQL tables into different sheets in Excel.
  • Used Spark and Scala for developing machine learning algorithms that analyze clickstream data.
  • Used Spark SQL for pre-processing, cleaning, and joining very large data sets (see the sketch after this list).
  • Co-developed the SQL server database system to maximize performance benefits for clients.
  • Assisted senior-level data scientists in the design of ETL processes, including SSIS packages.
  • Performed database migrations from traditional data warehouses to Spark clusters.
  • Ensured the data warehouse was populated only with quality entries by performing regular cleaning and integrity checks.
  • Worked with Oracle relational tables and used them in process design.
  • Developed SQL queries to perform data extraction from existing sources to check format accuracy.
  • Developed automated tools and dashboards to capture and display dynamic data.
  • Installed a Linux-operated Cisco server, performed regular updates and backups, and used MS Excel functions for data validation.
  • Coordinated data security issues and instructed other departments about secure data transmission and encryption.
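
A minimal sketch of the Spark SQL cleaning-and-joining step referenced above, shown in PySpark for brevity (the work described used Scala); the paths and column names are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("clickstream-prep").getOrCreate()

    clicks = spark.read.parquet("hdfs:///data/clickstream/")   # hypothetical input path
    users = spark.read.parquet("hdfs:///data/users/")

    clean = (clicks
             .dropDuplicates(["session_id", "event_ts"])       # remove replayed events
             .filter(F.col("event_ts").isNotNull()))           # drop malformed rows

    joined = (clean.join(users, on="user_id", how="left")
              .select("user_id", "session_id", "event_ts", "page", "segment"))

    joined.write.mode("overwrite").parquet("hdfs:///data/clickstream_clean/")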

Environment: T-SQL, MongoDB, HDFS, Scala, Spark SQL, Relational Databases, Redshift, SSIS, SQL, Linux, Data Validation, MS Excel.

Data Analyst

Confidential

Responsibilities:

  • Involved in designing and developing the Logical Data Model and Physical Data Model using Erwin DM.
  • Worked with DB2 Enterprise, Oracle Enterprise, Teradata 13, mainframe sources, Netezza, flat files, and operational dataset sources.
  • Worked with various process improvements, normalization, de-normalization, data extraction, data cleansing, and data manipulation.
  • Performed data management projects and fulfilling ad-hoc requests according to user specifications by utilizing data management software programs and tools like TOAD, MS Access, Excel, XLS and SQL Server.
  • Worked with requirements management, workflow analysis, source data analysis, data mapping, Metadata management, data quality, testing strategy and maintenance of the model.
  • Used DVO to validate the data moving from source to target.
  • Created requests in Answers and viewed the results in various views such as title view, table view, compound layout, chart, pivot table, ticker, and static view.
  • Assisted with production OLAP cubes and wrote queries to produce reports using SQL Server Analysis Services (SSAS) and Reporting Services (SSRS); edited, upgraded, and maintained an ASP.NET website and IIS server.
  • Used SQL Profiler for troubleshooting, monitoring, and optimization of SQL Server and non-production database code as well as T-SQL code from developers and QA.
  • Involved in extracting data from various sources such as Oracle Database, XML, flat files, and CSV files, and loading it into the target warehouse (see the sketch after this list).
  • Created complex mappings in Informatica PowerCenter Designer using Aggregator, Expression, Filter, and Sequence Generator transformations.
  • Designed the ER diagrams, logical model (relationship, cardinality, attributes, and candidate keys) and physical database (capacity planning, object creation and aggregation strategies) for Oracle and Teradata as per business requirements using Erwin.
  • Designed Power View and Power Pivot reports and designed and developed the Reports using SSRS.
  • Designed and created MDX queries to retrieve data from cubes using SSIS.
  • Created SSIS Packages using SSIS Designer for exporting heterogeneous data from OLE DB Source, Excel Spreadsheets to SQL Server.
  • Extensively worked with SQL, PL/SQL, SQL*Plus, SQL*Loader, query performance tuning, DDL scripts, and database objects such as tables, views, indexes, synonyms, and sequences.
  • Developed and supported the extraction, transformation, and load (ETL) process for a data warehouse.
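
An illustrative sketch of the flat-file-to-warehouse load pattern referenced above, shown in Python with pandas and SQLAlchemy rather than the Informatica mappings actually used; the file path, connection string, and table name are hypothetical:

    import pandas as pd
    from sqlalchemy import create_engine

    # hypothetical Oracle warehouse connection (requires the cx_Oracle driver)
    engine = create_engine("oracle+cx_oracle://user:pwd@dwh-host:1521/?service_name=DWH")

    # basic cleansing before the load: trim text fields, drop rows missing the key
    df = pd.read_csv("customers.csv")
    df["customer_name"] = df["customer_name"].str.strip()
    df = df.dropna(subset=["customer_id"])

    # append into the target warehouse staging table
    df.to_sql("stg_customers", engine, if_exists="append", index=False)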

Environment: Erwin 9.1, Netezza, Oracle 8.x, SQL, PL/SQL, SQL*Plus, SQL*Loader, Informatica, CSV, Teradata 13, T-SQL, SQL Server, SharePoint, Pivot tables, Power View, DB2, SSIS, DVO, Linux, MDM, ETL, Excel, SAS, SSAS, SPSS, SSRS.
