Sr. Data Engineer Resume
Nashville, TN
SUMMARY
- 8 years of experience as a Data Engineer and Data Analyst using Python, NumPy, Pandas, AWS, PostgreSQL, Kafka, Cassandra, and MongoDB.
- Hands-on experience using pandas DataFrames, NumPy, Matplotlib, and seaborn to create correlation, bar, and time-series plots.
- Skilled in Tableau Desktop, creating visualizations such as bar charts, line charts, scatter plots, and pie charts.
- Well versed with big data on AWS cloud services, i.e., EC2, S3, Glue, Athena, DynamoDB, and Redshift.
- Good knowledge of data modeling, creating schemas (snowflake/star), and creating tables (fact/dimension) in a data warehouse. Experienced with 3NF, normalization, and denormalization of tables depending on the use-case scenario.
- Good experience in conducting exploratory data analysis, data mining, and working with statistical models.
- Worked with Python and Bash scripting to automate tasks and create data pipelines.
- Good experience in designing and creating data ingestion pipelines using Apache Kafka.
- Applied data preprocessing techniques such as handling missing data, encoding categorical data (dummy variables), and feature scaling.
- Experienced in query optimization in MySQL and Teradata using EXPLAIN commands to improve query performance.
- Proficient in working with SQL and NoSQL databases such as MySQL, PostgreSQL, MongoDB, and Cassandra.
- Exposure to various AWS services such as Lambda, VPC, IAM, Elastic Load Balancing, CloudWatch, SNS, SQS, and Auto Scaling.
- Good experience with data pipeline/workflow management tools such as AWS Kinesis and Apache Airflow.
- Experienced in performing ETL operations using Apache Airflow DAGs and Informatica PowerCenter to load data into data warehouses (a minimal DAG sketch follows this summary).
- Knowledge of container orchestration platforms such as Kubernetes, building images with Docker, and deploying them to a private registry.
- Experienced with the full software development life cycle, architecting scalable platforms, object-oriented programming, database design, and Kanban methodologies.
- Experience in UNIX shell scripting for processing large volumes of data from varied sources and loading it into databases such as Teradata.
- Wrote subqueries, stored procedures, triggers, cursors, and functions on MySQL, PostgreSQL, and MongoDB databases.
- Thorough understanding of providing specifications using Waterfall and Agile software methodologies to model systems and business processes.
- Experience in the design and development of ETL methodology supporting data migration, transformation, and processing in a corporate-wide ETL solution using Teradata.
- Worked on various applications using Python IDEs such as Jupyter, Spyder, Eclipse, VS Code, IntelliJ, Atom, and PyCharm.
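As a minimal sketch of the Airflow-based ingestion work summarized above, the DAG below pulls records from a REST API and lands them in S3. It assumes Airflow 2.x with boto3 and requests; the endpoint URL (api.example.com), bucket name (my-ingest-bucket), and key prefix are hypothetical placeholders, not values from these projects.

```python
# Minimal Airflow 2.x DAG sketch: pull JSON from a REST API and land it in S3.
# The endpoint URL, bucket name, and key prefix are hypothetical placeholders.
from datetime import datetime
import json

import boto3
import requests
from airflow import DAG
from airflow.operators.python import PythonOperator

API_URL = "https://api.example.com/v1/orders"   # hypothetical source API
BUCKET = "my-ingest-bucket"                     # hypothetical S3 bucket

def extract_to_s3(ds, **_):
    """Fetch one day of records and write them to S3 as JSON."""
    records = requests.get(API_URL, params={"date": ds}, timeout=30).json()
    boto3.client("s3").put_object(
        Bucket=BUCKET,
        Key=f"raw/orders/{ds}.json",
        Body=json.dumps(records),
    )

with DAG(
    dag_id="api_to_s3_ingest",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="extract_to_s3", python_callable=extract_to_s3)
```

Keeping extraction in a single PythonOperator keeps the sketch short; in practice each source API would typically get its own task or DAG.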
PROFESSIONAL EXPERIENCE
Sr. Data Engineer
Confidential, Nashville, TN
Responsibilities:
- Utilized AWS EMR with Spark to perform batch processing operations on various big data sources.
- Installed applications on AWS EC2 instances and configured S3 buckets for storage.
- Worked with Linux EC2 instances and RDBMS databases.
- Created data pipelines to extract data from various APIs and ingest it into S3 buckets using Apache Airflow.
- Deployed Lambda functions triggered on S3 bucket events to perform data transformations and load the data into AWS Redshift (see the sketch after this role).
- Experience with AWS EC2, configuring servers for Auto Scaling and Elastic Load Balancing.
- Developed Python and shell scripts to run Spark jobs for batch processing of data from various sources.
- Working knowledge of Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) for object storage.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
- Extracted data from AWS Redshift using Spark and performed data analysis.
- Developed a POC to perform ETL operations using AWS Glue to load Kinesis stream data into S3 buckets.
- Performed data cleaning, data quality checks, and data governance for incremental loads.
- Normalized data and created correlation and scatter plots to find underlying patterns.
- Filtered and cleaned data by reviewing computer reports, printouts, and performance indicators to locate and correct code problems.
- Tested dashboards to ensure data matched business requirements and to identify discrepancies in the underlying data.
- Created data pipelines using AWS Step Functions, implemented state machines, and ran pipelines on different schedules to process data.
- Developed and implemented databases, data collection systems, data analytics, and other strategies that optimize statistical efficiency and quality.
- Worked with management to prioritize business and information needs.
Environment: Python, AWS, Jira, Git, CI/CD, Docker, Kubernetes, Web Services, Spark Streaming API, Kafka, Cassandra, MongoDB, JSON, Bash scripting, Linux, REST API, SQL, Apache Airflow.
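The S3-triggered Lambda pattern mentioned above can be sketched as follows, assuming the Redshift Data API (boto3 "redshift-data" client). The cluster identifier, database, user, IAM role, and target table are hypothetical placeholders; the production function used the project's own transformation and load logic.

```python
# Sketch of a Lambda handler fired by an S3 ObjectCreated event that issues a
# Redshift COPY for the new object. Cluster, database, role, and table names
# are hypothetical placeholders.
import boto3

redshift_data = boto3.client("redshift-data")

CLUSTER_ID = "analytics-cluster"   # hypothetical
DATABASE = "analytics"             # hypothetical
DB_USER = "etl_user"               # hypothetical
IAM_ROLE = "arn:aws:iam::123456789012:role/redshift-copy-role"  # hypothetical

def lambda_handler(event, context):
    # Each record in the event describes one newly created S3 object.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        copy_sql = (
            f"COPY staging.orders FROM 's3://{bucket}/{key}' "
            f"IAM_ROLE '{IAM_ROLE}' FORMAT AS JSON 'auto';"
        )
        # Submit the COPY asynchronously through the Redshift Data API.
        redshift_data.execute_statement(
            ClusterIdentifier=CLUSTER_ID,
            Database=DATABASE,
            DbUser=DB_USER,
            Sql=copy_sql,
        )
    return {"status": "submitted", "records": len(event["Records"])}
```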
Python Developer / Data Engineer
Confidential, Negaunee, MI
Responsibilities:
- Implemented Spark using Python and utilized DataFrames and the Spark SQL API for processing and querying data.
- Developed Spark applications using a Python driver and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
- Worked with various data formats such as Parquet, JSON, and CSV using Spark.
- Developed preprocessing jobs using Spark DataFrames to flatten nested JSON files (see the sketch after this role).
- Experienced in real-time streaming with Kafka as the data pipeline using the Spark Streaming module.
- Consumed Kafka messages and loaded the data into a Cassandra cluster deployed in containers.
- Ingested data into a MySQL RDBMS, performed transformations, and then exported the transformed data to Cassandra.
- Assisted with keyspace, table, and secondary index creation in the Cassandra database.
- Worked with various teams to deploy containers on site using Kubernetes to run Cassandra clusters in a Linux environment.
- Created various POCs in Python using the PySpark module and MLlib.
- Good knowledge of Kubernetes architecture components such as the scheduler, pods, and nodes.
- Experience creating tables in Cassandra clusters, building images using Docker, and deploying the images.
- Assisted in designing and creating schemas and tables for the Cassandra cluster to ensure good query performance for the front-end application.
- Created Airflow jobs with the Snowflake operator to create fact and dimension tables in the Snowflake warehouse.
- Connected Snowflake to Tableau to create data reporting dashboards.
- Worked in the UNIX environment, developing Bash shell scripts and running cron jobs.
Environment: Python, Spark, Kafka, JSON, GitHub, Linux, Flask, Nginx, REST, CI/CD, Kubernetes, Helm, MongoDB, Cassandra, Snowflake, Tableau.
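The JSON-flattening preprocessing described above can be illustrated with a small PySpark sketch; the input path and field names (customer, orders, etc.) are hypothetical stand-ins for the project's data.

```python
# Sketch: flatten a nested JSON structure into a tabular Spark DataFrame.
# The input path and field names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode

spark = SparkSession.builder.appName("flatten_json").getOrCreate()

raw = spark.read.json("s3a://my-bucket/raw/events/")  # hypothetical path

# Explode the nested array of orders so each order becomes its own row,
# then pull nested struct fields up into flat, top-level columns.
flat = (
    raw.withColumn("order", explode(col("customer.orders")))
       .select(
           col("customer.id").alias("customer_id"),
           col("customer.name").alias("customer_name"),
           col("order.order_id").alias("order_id"),
           col("order.amount").alias("order_amount"),
       )
)

flat.write.mode("overwrite").parquet("s3a://my-bucket/curated/orders/")
```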
Data Engineer
Confidential, St. Louis, MO
Responsibilities:
- Involved in developing Python scripts and using Informatica and other ETL tools for the extraction, transformation, and loading of data into Teradata.
- Worked with a team of business analysts on requirements gathering, business analysis, and project coordination in creating reports.
- Performed unit and integration testing of Informatica sessions, batches, and target data.
- Responsible for using AutoSys and Workflow Manager tools to schedule Informatica jobs.
- Responsible for creating workflows and sessions using Informatica Workflow Manager and monitoring workflow runs and statistics in Informatica Workflow Monitor.
- Developed complex Informatica mappings using transformations such as connected and unconnected Lookup, Router, Filter, Aggregator, Expression, Normalizer, and Update Strategy for large volumes of data.
- Involved in testing, debugging, validation, and performance tuning of the data warehouse; helped develop optimal solutions for data warehouse deliverables.
- Moved data from source systems to different schemas based on dimension and fact tables using Type 1 and Type 2 slowly changing dimensions (see the sketch after this role).
- Interacted with key users, assisted them with various data issues, understood their data needs, and helped with data analysis.
- Performed query optimization for various SQL tables using the Teradata EXPLAIN command.
- Developed procedures to populate the customer data warehouse with transaction data and cycle and monthly summary data.
- Experienced in the Linux environment, scheduling jobs, and handling file transfers.
- Very good understanding of database skew, PPI, join methods, and aggregate and hash operations.
Environment: Informatica PowerCenter, Teradata, XML, flat files, cron jobs, Linux, Bash shell, Python scripting, Power BI.
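The Informatica mappings themselves cannot be reproduced here, but the Type 2 slowly changing dimension logic they implemented can be illustrated with a small pandas sketch. The column names (customer_id, address, effective_date, end_date) are hypothetical; the real implementation used Informatica Update Strategy transformations against Teradata.

```python
# Illustrative pandas sketch of Type 2 SCD logic: expire changed rows in the
# dimension and insert new current versions. Column names are hypothetical.
import pandas as pd

HIGH_DATE = pd.Timestamp("9999-12-31")

def apply_scd2(dim, incoming, load_date):
    """Return the dimension with changed customers expired and re-inserted."""
    dim = dim.copy()
    current = dim[dim["end_date"] == HIGH_DATE]
    merged = incoming.merge(current, on="customer_id",
                            suffixes=("", "_old"), how="left")

    # Rows whose tracked attribute changed (or that are brand new) get a new version.
    changed = merged[merged["address"] != merged["address_old"]]

    # Expire the old current rows for customers whose attribute changed.
    expire_ids = changed.loc[changed["address_old"].notna(), "customer_id"]
    dim.loc[
        dim["customer_id"].isin(expire_ids) & (dim["end_date"] == HIGH_DATE),
        "end_date",
    ] = load_date

    # Insert the new current versions.
    new_rows = changed[["customer_id", "address"]].assign(
        effective_date=load_date, end_date=HIGH_DATE
    )
    return pd.concat([dim, new_rows], ignore_index=True)
```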
Spark Developer
Confidential
Responsibilities:
- Imported required modules such as Keras and NumPy in the Spark session and created directories for data and output.
- Read train and test data into the data directory as well as into Spark variables for easy access, and proceeded to train the model based on a sample submission.
- The images are represented as NumPy arrays when displayed; for easier data manipulation, all images are stored as NumPy arrays.
- Created a validation set using Keras2DML to test whether the trained model was working as intended (a condensed sketch follows this role).
- Defined multiple helper functions used while running the neural network in the session, along with placeholders and the number of neurons in each layer.
- Created the neural network's computational graph after defining weights and biases.
- Created a TensorFlow session used to run the neural network and validate the model's accuracy on the validation set.
- After executing the program and achieving acceptable validation accuracy, created a submission that is stored in the submission directory.
- Executed multiple Spark SQL queries after forming the database to gather specific data corresponding to an image.
Environment: Scala, Python, PySpark, Spark, Spark MLlib, Spark SQL, TensorFlow, NumPy, Keras, Power BI.
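A condensed Keras sketch of the neural-network workflow described above, assuming image data already loaded as NumPy arrays; the array files, layer sizes, and submission path are illustrative assumptions rather than the project's actual values.

```python
# Sketch: train a small dense network on flattened image arrays and validate it.
# Shapes, layer sizes, and file paths are illustrative assumptions.
import numpy as np
from tensorflow import keras

# Images stored as NumPy arrays (e.g., 28x28 grayscale flattened to 784 features).
x_train = np.load("data/train_images.npy").reshape(-1, 784) / 255.0
y_train = np.load("data/train_labels.npy")
x_test = np.load("data/test_images.npy").reshape(-1, 784) / 255.0

model = keras.Sequential([
    keras.layers.Dense(500, activation="relu", input_shape=(784,)),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Hold out 20% of the training data as a validation set.
model.fit(x_train, y_train, epochs=10, batch_size=128, validation_split=0.2)

# Write predictions to the submission directory.
predictions = model.predict(x_test).argmax(axis=1)
np.savetxt("submission/predictions.csv", predictions, fmt="%d")
```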
Data Analyst
Confidential
Responsibilities:
- Created data visualizations and developed dashboards and stories in Power BI.
- Utilized various charts such as scatter plots, bar charts, pie charts, and heatmaps in Power BI for data analysis.
- Analyzed various KPI results frequently to assist in the development of performance improvement concepts.
- Experienced in using Excel pivot charts and VBA tools to create customized reports and analyses (see the sketch after this role).
- Assisted with sizing, query optimization, buffer tuning, backup and recovery, installations, upgrades, and security, along with other administration functions, as part of the profiling plan.
- Ensured production data was replicated into the data warehouse without data anomalies from the processing databases.
- Analyzed code to improve query optimization and to verify that tables were using indexes.
- Created and tested MySQL programs, forms, reports, triggers, and procedures for the data warehouse.
- Involved in troubleshooting and fine-tuning databases for performance and concurrency.
- Automated the code release process, reducing the total time for code releases from 8 hours to 1 hour.
- Developed a fully automated continuous integration system using Git, MySQL, and custom tools developed in Python and Bash.
- Working experience with Power BI Desktop, generating reports by writing SQL queries.
- Played a key role in a department-wide transition from Subversion to Git, which increased efficiency for the development community.
Environment: Git, Tableau, MySQL, Python, Bash, VBA, Excel.
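A small pandas sketch of the kind of KPI pivot summarized above in Excel and Power BI; the CSV path and column names (team, call_date, handle_time_sec) are hypothetical.

```python
# Sketch: build a monthly KPI pivot (average handle time by team) from raw records.
# The CSV path and column names are hypothetical.
import pandas as pd

df = pd.read_csv("data/kpi_records.csv", parse_dates=["call_date"])
df["month"] = df["call_date"].dt.to_period("M").astype(str)

kpi_pivot = pd.pivot_table(
    df,
    index="team",
    columns="month",
    values="handle_time_sec",
    aggfunc="mean",
)

# Export the pivot for use in an Excel or Power BI report.
kpi_pivot.to_excel("reports/kpi_summary.xlsx")
```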