
Sr. Data Engineer Resume


Nashville, TN

SUMMARY

  • 8 years of experience as a Data Engineer and Data Analyst using Python, NumPy, Pandas, AWS, PostgreSQL, Kafka, Cassandra, and MongoDB.
  • Hands-on experience using Pandas DataFrames, NumPy, Matplotlib, and Seaborn to create correlation, bar, and time series plots.
  • Skilled in Tableau Desktop for creating visualizations such as bar charts, line charts, scatter plots, and pie charts.
  • Well versed with big data on AWS cloud services, i.e., EC2, S3, Glue, Athena, DynamoDB, and Redshift.
  • Good knowledge of data modeling: creating schemas (snowflake/star) and tables (fact/dimension) in a data warehouse. Experienced with 3NF, normalization, and denormalization of tables depending on the use case.
  • Good experience in conducting exploratory data analysis, data mining, and working with statistical models.
  • Worked with Python and Bash scripting to automate tasks and create data pipelines.
  • Good experience in designing and creating data ingestion pipelines using Apache Kafka.
  • Applied data preprocessing techniques such as handling missing data, encoding categorical data (dummy variables), and feature scaling (see the sketch after this list).
  • Experienced in query optimization in MySQL and Teradata using EXPLAIN to improve query performance.
  • Proficient with SQL and NoSQL databases such as MySQL, PostgreSQL, MongoDB, and Cassandra.
  • Exposure to various AWS services such as Lambda, VPC, IAM, Elastic Load Balancing, CloudWatch, SNS, SQS, and Auto Scaling.
  • Good experience with data pipeline/workflow management tools such as AWS Kinesis and Apache Airflow.
  • Experienced in performing ETL operations using Apache Airflow DAGs and Informatica PowerCenter to load data into data warehouses.
  • Knowledge of container orchestration platforms such as Kubernetes, building images with Docker, and deploying them to a private registry.
  • Experienced with the full software development life cycle, architecting scalable platforms, object-oriented programming, database design, and Kanban methodology.
  • Experience in UNIX shell scripting for processing large volumes of data from varied sources and loading it into databases such as Teradata.
  • Wrote subqueries, stored procedures, triggers, cursors, and functions on MongoDB, MySQL, and PostgreSQL databases.
  • Thorough understanding of providing specifications using Waterfall and Agile methodologies to model systems and business processes.
  • Experience in the design and development of ETL methodology supporting data migration, transformation, and processing in a corporate-wide ETL solution using Teradata.
  • Worked on various applications using Python IDEs and editors such as Jupyter, Spyder, Eclipse, VS Code, IntelliJ, Atom, and PyCharm.
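
A minimal sketch of the preprocessing steps referenced above (missing-value imputation, dummy variables, feature scaling), assuming Pandas and scikit-learn; the DataFrame and column names are purely illustrative.

    import pandas as pd
    from sklearn.preprocessing import StandardScaler

    # Illustrative DataFrame; column names are hypothetical.
    df = pd.DataFrame({
        "age": [25, None, 41, 33],
        "income": [52000, 61000, None, 48000],
        "segment": ["A", "B", "A", "C"],
    })

    # Handle missing numeric data with median imputation.
    for col in ["age", "income"]:
        df[col] = df[col].fillna(df[col].median())

    # Encode the categorical column as dummy variables (drop one level to avoid collinearity).
    df = pd.get_dummies(df, columns=["segment"], drop_first=True)

    # Scale numeric features to zero mean and unit variance.
    scaler = StandardScaler()
    df[["age", "income"]] = scaler.fit_transform(df[["age", "income"]])

    print(df.head())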

PROFESSIONAL EXPERIENCE

Sr. Data Engineer

Confidential, Nashville, TN

Responsibilities:

  • Utilized AWS EMR with Spark to perform batch processing operations on various big data sources.
  • Installed applications on AWS EC2 instances and configured S3 buckets for storage.
  • Worked with Linux EC2 instances and relational databases.
  • Created data pipelines with Apache Airflow to extract data from various APIs and ingest it into S3 buckets.
  • Deployed Lambda functions triggered by S3 bucket events to transform data and load it into AWS Redshift (see the Lambda sketch after this list).
  • Experience with AWS EC2, configuring servers for Auto Scaling and Elastic Load Balancing.
  • Developed Python and shell scripts to run Spark jobs for batch processing of data from various sources.
  • Working knowledge of Amazon Elastic Compute Cloud (EC2) for computational tasks and Simple Storage Service (S3) for object storage.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them (see the PySpark sketch after this list).
  • Extracted data from AWS Redshift using Spark and performed data analysis.
  • Developed a POC performing ETL operations with AWS Glue to load Kinesis stream data into S3 buckets.
  • Performed data cleaning, data quality checks, and data governance for incremental loads.
  • Normalized data and created correlation and scatter plots to find underlying patterns.
  • Filtered and cleaned data by reviewing reports, printouts, and performance indicators to locate and correct code problems.
  • Tested dashboards to ensure data matches business requirements and to flag any discrepancies in the underlying data.
  • Created data pipelines using AWS Step Functions, implemented state machines, and ran pipelines on different schedules to process data.
  • Developed and implemented databases, data collection systems, data analytics, and other strategies that optimize statistical efficiency and quality.
  • Worked with management to prioritize business and information needs.
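
A minimal sketch of the S3-triggered Lambda flow described above, assuming the Redshift Data API is reachable from the function; the cluster, database, table, and IAM role names are hypothetical, not the actual project values.

    import json
    import urllib.parse
    import boto3

    # Hypothetical identifiers; replace with real cluster, database, user, and role values.
    REDSHIFT_CLUSTER = "analytics-cluster"
    REDSHIFT_DB = "warehouse"
    REDSHIFT_USER = "etl_user"
    COPY_ROLE_ARN = "arn:aws:iam::123456789012:role/redshift-copy-role"

    redshift_data = boto3.client("redshift-data")

    def lambda_handler(event, context):
        # The S3 PUT event carries the bucket and object key of the newly arrived file.
        record = event["Records"][0]["s3"]
        bucket = record["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["object"]["key"])

        # COPY the new object into a staging table; Redshift pulls it directly from S3.
        sql = (
            f"COPY staging.events FROM 's3://{bucket}/{key}' "
            f"IAM_ROLE '{COPY_ROLE_ARN}' FORMAT AS JSON 'auto';"
        )
        response = redshift_data.execute_statement(
            ClusterIdentifier=REDSHIFT_CLUSTER,
            Database=REDSHIFT_DB,
            DbUser=REDSHIFT_USER,
            Sql=sql,
        )
        return {"statusCode": 200, "body": json.dumps({"statementId": response["Id"]})}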
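
A short PySpark sketch of loading S3 data into an RDD and applying transformations and actions, as referenced above; the bucket path and record layout are illustrative assumptions.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("s3-batch-example").getOrCreate()

    # Read newline-delimited records from S3 into an RDD (path is hypothetical).
    rdd = spark.sparkContext.textFile("s3://example-bucket/raw/events/*.csv")

    # Transformation: parse each CSV line and keep rows with a numeric amount field.
    parsed = (
        rdd.map(lambda line: line.split(","))
           .filter(lambda fields: len(fields) == 3 and fields[2].replace(".", "", 1).isdigit())
    )

    # Action: total the amount column per customer id and bring the result to the driver.
    totals = (
        parsed.map(lambda fields: (fields[0], float(fields[2])))
              .reduceByKey(lambda a, b: a + b)
              .collect()
    )
    print(totals[:10])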

Environment: Python, AWS, Jira, Git, CI/CD, Docker, Kubernetes, Web Services, Spark Streaming API, Kafka, Cassandra, MongoDB, JSON, Bash scripting, Linux, REST API, SQL, Apache Airflow.

Python Developer / Data Engineer

Confidential, Negaunee, MI

Responsibilities:

  • Implemented Spark applications in Python, utilizing DataFrames and the Spark SQL API for processing and querying data.
  • Developed Spark applications with a Python driver and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
  • Worked with various data formats such as Parquet, JSON, and CSV using Spark.
  • Developed preprocessing jobs using Spark DataFrames to flatten nested JSON files (see the PySpark sketch after this list).
  • Experienced in real-time streaming with Kafka as the data pipeline, using the Spark Streaming module.
  • Consumed Kafka messages and loaded the data into a Cassandra cluster deployed in containers.
  • Ingested data into MySQL, performed transformations, and then exported the transformed data to Cassandra.
  • Assisted in keyspace, table, and secondary index creation in the Cassandra database.
  • Worked with various teams to deploy containers on-site using Kubernetes to run Cassandra clusters in a Linux environment.
  • Created various POCs in Python using PySpark and MLlib.
  • Good knowledge of Kubernetes architecture, including the scheduler, pods, nodes, kubelet, and etcd database.
  • Experience in creating tables in Cassandra clusters, building images using Docker, and deploying the images.
  • Assisted in designing schemas and tables for the Cassandra cluster to ensure good query performance for the front-end application.
  • Created Airflow jobs with the Snowflake operator to build fact and dimension tables in the Snowflake warehouse (see the Airflow sketch after this list).
  • Connected Snowflake to Tableau to create data reporting dashboards.
  • Worked in a UNIX environment, developing Bash shell scripts and running cron jobs.
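
A minimal PySpark sketch of flattening nested JSON with DataFrames, as mentioned above; the S3 path and field names are illustrative assumptions.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, explode

    spark = SparkSession.builder.appName("flatten-json-example").getOrCreate()

    # Read nested JSON records (path and schema are hypothetical).
    raw = spark.read.json("s3://example-bucket/raw/orders/*.json")

    # Flatten: pull nested struct fields to the top level and explode the items array.
    flat = (
        raw.select(
            col("order_id"),
            col("customer.id").alias("customer_id"),
            col("customer.name").alias("customer_name"),
            explode(col("items")).alias("item"),
        )
        .select("order_id", "customer_id", "customer_name",
                col("item.sku").alias("sku"), col("item.qty").alias("qty"))
    )

    flat.show(truncate=False)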
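
And a sketch of an Airflow DAG using the Snowflake operator to build a dimension and a fact table, assuming the apache-airflow-providers-snowflake package and a configured snowflake_default connection; the schema, table, and DAG names are hypothetical.

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

    with DAG(
        dag_id="snowflake_dim_fact_build",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        # Build the customer dimension from the staging schema.
        build_dim_customer = SnowflakeOperator(
            task_id="build_dim_customer",
            snowflake_conn_id="snowflake_default",
            sql="""
                CREATE OR REPLACE TABLE analytics.dim_customer AS
                SELECT DISTINCT customer_id, customer_name, segment
                FROM staging.orders;
            """,
        )

        # Build the fact table only after the dimension exists.
        build_fact_orders = SnowflakeOperator(
            task_id="build_fact_orders",
            snowflake_conn_id="snowflake_default",
            sql="""
                CREATE OR REPLACE TABLE analytics.fact_orders AS
                SELECT order_id, customer_id, order_date, amount
                FROM staging.orders;
            """,
        )

        build_dim_customer >> build_fact_orders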

Environment: Python, Spark, Kafka, JSON, GitHub, Linux, Flask, Nginx, REST, CI/CD, Kubernetes, Helm, MongoDB, Cassandra, Snowflake, Tableau.

Data Engineer

Confidential, St Louis, Missouri

Responsibilities:

  • Involved in developing Python scripts and using Informatica and other ETL tools for the extraction, transformation, and loading of data into Teradata.
  • Worked with a team of business analysts on requirements gathering, business analysis, and project coordination for creating reports.
  • Performed unit and integration testing of Informatica sessions, batches, and target data.
  • Responsible for using Autosys and Workflow Manager to schedule Informatica jobs.
  • Responsible for creating workflows and sessions in Informatica Workflow Manager and monitoring workflow runs and statistics in Informatica Workflow Monitor.
  • Developed complex Informatica mappings using transformations such as connected and unconnected Lookup, Router, Filter, Aggregator, Expression, Normalizer, and Update Strategy for large volumes of data.
  • Involved in testing, debugging, validation, and performance tuning of the data warehouse, helping develop optimal solutions for data warehouse deliverables.
  • Moved data from source systems to different schemas based on dimension and fact tables using Type 1 and Type 2 slowly changing dimensions (see the sketch after this list).
  • Interacted with key users, assisted them with various data issues, understood their data needs, and helped with data analysis.
  • Performed query optimization for various SQL tables using the Teradata EXPLAIN command.
  • Developed procedures to populate the customer data warehouse with transaction data and cycle and monthly summary data.
  • Experienced in the Linux environment, scheduling jobs and file transfers.
  • Very good understanding of database skew, PPI, join methods, aggregation, and hashing.
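
A minimal Pandas sketch of the Type 2 slowly changing dimension logic described above (expire the current row when a tracked attribute changes, then insert a new version); the table and column names are hypothetical, and the production work used Informatica mappings rather than this code.

    import pandas as pd

    TODAY = pd.Timestamp("2023-06-01")
    HIGH_DATE = pd.Timestamp("9999-12-31")

    # Current dimension rows (is_current marks the active version of each customer).
    dim = pd.DataFrame({
        "customer_id": [1, 2],
        "address": ["12 Oak St", "90 Pine Ave"],
        "start_date": [pd.Timestamp("2022-01-01")] * 2,
        "end_date": [HIGH_DATE] * 2,
        "is_current": [True, True],
    })

    # Incoming source rows; customer 2 changed address, customer 3 is new.
    stg = pd.DataFrame({
        "customer_id": [2, 3],
        "address": ["7 Elm Rd", "401 Lake Dr"],
    })

    merged = stg.merge(dim[dim["is_current"]], on="customer_id", how="left",
                       suffixes=("", "_dim"))
    changed = merged[merged["address_dim"].notna() & (merged["address"] != merged["address_dim"])]
    new = merged[merged["address_dim"].isna()]

    # Type 2: expire the old version of changed customers...
    dim.loc[dim["is_current"] & dim["customer_id"].isin(changed["customer_id"]),
            ["end_date", "is_current"]] = [TODAY, False]

    # ...then append a new current version for changed and brand-new customers.
    additions = pd.concat([changed, new])[["customer_id", "address"]].assign(
        start_date=TODAY, end_date=HIGH_DATE, is_current=True)
    dim = pd.concat([dim, additions], ignore_index=True)

    print(dim.sort_values(["customer_id", "start_date"]))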

Environment: Informatica PowerCenter, Teradata, XML, flat files, cron jobs, Linux, Bash shell, Python scripting, Power BI.

Spark Developer

Confidential

Responsibilities:

  • Imported required modules such as Keras and NumPy in the Spark session and created directories for data and output.
  • Read train and test data into the data directory and into Spark variables for easy access, then trained the model based on a sample submission.
  • Stored all images as NumPy arrays, since images are represented as NumPy arrays when displayed and the arrays make data manipulation easier.
  • Created a validation set using Keras2DML to test whether the trained model was working as intended (see the sketch after this list).
  • Defined multiple helper functions used while running the neural network in a session, along with placeholders and the number of neurons in each layer.
  • Created the neural network's computational graph after defining weights and biases.
  • Created a TensorFlow session used to run the neural network and validate the model's accuracy on the validation set.
  • After executing the program and achieving acceptable validation accuracy, created a submission that is stored in the submission directory.
  • Executed multiple Spark SQL queries after forming the database to gather specific data corresponding to an image.
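
A minimal Keras sketch of the training flow described above (images as NumPy arrays, a hold-out validation set, accuracy checked after training); it uses plain Keras rather than Keras2DML, and the array shapes, layer sizes, and random data are illustrative only.

    import numpy as np
    from tensorflow import keras

    # Illustrative data: 28x28 grayscale images flattened to 784 features, 10 classes.
    rng = np.random.default_rng(0)
    x_train = rng.random((1000, 784)).astype("float32")
    y_train = rng.integers(0, 10, size=1000)

    # Simple feed-forward network; layer sizes are arbitrary for the sketch.
    model = keras.Sequential([
        keras.Input(shape=(784,)),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # Hold out 20% of the data as a validation set and check accuracy after training.
    history = model.fit(x_train, y_train, epochs=5, batch_size=32, validation_split=0.2)
    print("final validation accuracy:", history.history["val_accuracy"][-1])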

Environment: Scala, Python, PySpark, Spark, Spark MLlib, Spark SQL, TensorFlow, NumPy, Keras, Power BI.

Data Analyst

Confidential 

Responsibilities:

  • Created data visualizations and developed dashboards and stories in Power BI.
  • Utilized various charts such as scatter plots, bar charts, pie charts, and heatmaps in Power BI for data analysis.
  • Analyzed KPI results frequently to assist in developing performance improvement concepts.
  • Experience using Excel pivot charts and VBA tools to create customized reports and analyses.
  • Assisted with sizing, query optimization, buffer tuning, backup and recovery, installations, upgrades, security, and other administration functions as part of the profiling plan.
  • Ensured production data was replicated into the data warehouse without anomalies from the processing databases.
  • Analyzed code to improve query performance and to verify that tables are using indexes (see the sketch after this list).
  • Created and tested MySQL programs, forms, reports, triggers, and procedures for the data warehouse.
  • Involved in troubleshooting and fine-tuning databases for performance and concurrency.
  • Automated the code release process, bringing the total time for code releases from 8 hours to 1 hour.
  • Developed a fully automated continuous integration system using Git, MySQL, and custom tools developed in Python and Bash.
  • Working experience with Power BI Desktop, generating reports by writing SQL queries.
  • Played a key role in a department-wide transition from Subversion to Git, which increased efficiency for the development community.
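
A minimal sketch of checking index usage with MySQL EXPLAIN from Python, as referenced above; the connection details, table, and query are hypothetical.

    import mysql.connector

    # Hypothetical connection details; replace with real credentials.
    conn = mysql.connector.connect(
        host="localhost", user="report_user", password="secret", database="warehouse"
    )
    cursor = conn.cursor()

    # EXPLAIN shows whether the query uses an index (the 'key' column) or a full table scan.
    cursor.execute("EXPLAIN SELECT order_id, amount FROM orders WHERE customer_id = %s", (42,))
    for row in cursor.fetchall():
        print(row)

    cursor.close()
    conn.close()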

Environment: Git, Tableau, MySQL, Python, Bash, VBA, Excel.
