Sr. Data Engineer Resume
Nashville, TN
SUMMARY
- 8 years of experience as a Data Engineer and Data Analyst using Python, NumPy, Pandas, AWS, PostgreSQL, Kafka, Cassandra, and MongoDB.
- Hands-on experience using pandas DataFrames, NumPy, Matplotlib, and seaborn to create correlation, bar, and time-series plots.
- Skilled in Tableau Desktop, creating visualizations such as bar charts, line charts, scatter plots, and pie charts.
- Well versed with big data on AWS cloud services, i.e., EC2, S3, Glue, Athena, DynamoDB, and Redshift.
- Good knowledge of data modeling, creating schemas (snowflake/star), and creating tables (fact/dimension) in a data warehouse. Experienced with 3NF, normalization, and denormalization of tables depending on the use-case scenario.
- Good experience in conducting exploratory data analysis, data mining, and working with statistical models.
- Worked with Python and Bash scripting to automate tasks and create data pipelines.
- Good experience in designing and creating data ingestion pipelines using Apache Kafka.
- Applied data preprocessing techniques such as handling missing data, encoding categorical data (dummy variables), and feature scaling.
- Experienced in query optimization in MySQL and Teradata using EXPLAIN commands to improve query performance.
- Proficient in working with SQL and NoSQL databases such as MySQL, PostgreSQL, MongoDB, and Cassandra.
- Exposure to various AWS services such as Lambda, VPC, IAM, Elastic Load Balancing, CloudWatch, SNS, SQS, and Auto Scaling.
- Good experience with data pipeline/workflow management tools such as AWS Kinesis and Apache Airflow.
- Experienced in performing ETL operations using Apache Airflow DAGs and Informatica PowerCenter to load data into data warehouses (a minimal DAG sketch follows this summary).
- Knowledge of container orchestration platforms such as Kubernetes, building images with Docker, and deploying them to a private registry.
- Experienced with the full software development life cycle, architecting scalable platforms, object-oriented programming, database design, and Kanban methodologies.
- Experience in UNIX shell scripting for processing large volumes of data from varied sources and loading it into databases such as Teradata.
- Wrote subqueries, stored procedures, triggers, cursors, and functions on MySQL, PostgreSQL, and MongoDB databases.
- Thorough understanding of providing specifications using Waterfall and Agile software methodologies to model systems and business processes.
- Experience in the design and development of ETL methodology supporting data migration, transformation, and processing in a corporate-wide ETL solution using Teradata.
- Worked on various applications using Python IDEs such as Jupyter, Spyder, Eclipse, VS Code, IntelliJ, Atom, and PyCharm.
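As a minimal sketch of the Airflow-based ingestion work summarized above, the DAG below pulls records from a REST API and lands them in S3. It assumes Airflow 2.x with boto3 and requests; the endpoint URL (api.example.com), bucket name (my-ingest-bucket), and key prefix are hypothetical placeholders, not values from these projects.

```python
# Minimal Airflow 2.x DAG sketch: pull JSON from a REST API and land it in S3.
# The endpoint URL, bucket name, and key prefix are hypothetical placeholders.
from datetime import datetime
import json

import boto3
import requests
from airflow import DAG
from airflow.operators.python import PythonOperator

API_URL = "https://api.example.com/v1/orders"   # hypothetical source API
BUCKET = "my-ingest-bucket"                     # hypothetical S3 bucket

def extract_to_s3(ds, **_):
    """Fetch one day of records and write them to S3 as JSON."""
    records = requests.get(API_URL, params={"date": ds}, timeout=30).json()
    boto3.client("s3").put_object(
        Bucket=BUCKET,
        Key=f"raw/orders/{ds}.json",
        Body=json.dumps(records),
    )

with DAG(
    dag_id="api_to_s3_ingest",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="extract_to_s3", python_callable=extract_to_s3)
```

Keeping extraction in a single PythonOperator keeps the sketch short; in practice each source API would typically get its own task or DAG.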
PROFESSIONAL EXPERIENCE
Sr. Data Engineer
Confidential, Nashville, TN
Responsibilities:
- Utilized AWS EMR with Spark to perform batch processing operations on various big data sources.
- Installed applications on AWS EC2 instances and configured S3 buckets for storage.
- Worked with Linux EC2 instances and RDBMS databases.
- Created data pipelines to extract data from various APIs and ingest it into S3 buckets using Apache Airflow.
- Deployed Lambda functions triggered on S3 bucket events to perform data transformations and load the data into AWS Redshift (see the sketch after this role).
- Experience with AWS EC2, configuring servers for Auto Scaling and Elastic Load Balancing.
- Developed Python and shell scripts to run Spark jobs for batch processing of data from various sources.
- Working knowledge of Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) for object storage.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
- Extracted data from AWS Redshift using Spark and performed data analysis.
- Developed a POC to perform ETL operations using AWS Glue to load Kinesis stream data into S3 buckets.
- Performed data cleaning, data quality checks, and data governance for incremental loads.
- Normalized data and created correlation and scatter plots to find underlying patterns.
- Filtered and cleaned data by reviewing computer reports, printouts, and performance indicators to locate and correct code problems.
- Tested dashboards to ensure data matched business requirements and to identify discrepancies in the underlying data.
- Created data pipelines using AWS Step Functions, implemented state machines, and ran pipelines on different schedules to process data.
- Developed and implemented databases, data collection systems, data analytics, and other strategies that optimize statistical efficiency and quality.
- Worked with management to prioritize business and information needs.
Environment: Python, AWS, Jira, Git, CI/CD, Docker, Kubernetes, Web Services, Spark Streaming API, Kafka, Cassandra, MongoDB, JSON, Bash scripting, Linux, REST API, SQL, Apache Airflow.
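The S3-triggered Lambda pattern mentioned above can be sketched as follows, assuming the Redshift Data API (boto3 "redshift-data" client). The cluster identifier, database, user, IAM role, and target table are hypothetical placeholders; the production function used the project's own transformation and load logic.

```python
# Sketch of a Lambda handler fired by an S3 ObjectCreated event that issues a
# Redshift COPY for the new object. Cluster, database, role, and table names
# are hypothetical placeholders.
import boto3

redshift_data = boto3.client("redshift-data")

CLUSTER_ID = "analytics-cluster"   # hypothetical
DATABASE = "analytics"             # hypothetical
DB_USER = "etl_user"               # hypothetical
IAM_ROLE = "arn:aws:iam::123456789012:role/redshift-copy-role"  # hypothetical

def lambda_handler(event, context):
    # Each record in the event describes one newly created S3 object.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        copy_sql = (
            f"COPY staging.orders FROM 's3://{bucket}/{key}' "
            f"IAM_ROLE '{IAM_ROLE}' FORMAT AS JSON 'auto';"
        )
        # Submit the COPY asynchronously through the Redshift Data API.
        redshift_data.execute_statement(
            ClusterIdentifier=CLUSTER_ID,
            Database=DATABASE,
            DbUser=DB_USER,
            Sql=copy_sql,
        )
    return {"status": "submitted", "records": len(event["Records"])}
```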
Python Developer / Data Engineer
Confidential, Negaunee, MI
Responsibilities:
- Implemented Spark using Python and utilized DataFrames and the Spark SQL API for processing and querying data.
- Developed Spark applications using a Python driver and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
- Worked with various data formats such as Parquet, JSON, and CSV using Spark.
- Developed preprocessing jobs using Spark DataFrames to flatten nested JSON files (see the sketch after this role).
- Experienced in real-time streaming with Kafka as the data pipeline using the Spark Streaming module.
- Consumed Kafka messages and loaded the data into a Cassandra cluster deployed in containers.
- Ingested data into a MySQL RDBMS, performed transformations, and then exported the transformed data to Cassandra.
- Assisted with keyspace, table, and secondary index creation in the Cassandra database.
- Worked with various teams to deploy containers on site using Kubernetes to run Cassandra clusters in a Linux environment.
- Created various POCs in Python using the PySpark module and MLlib.
- Good knowledge of Kubernetes architecture components such as the scheduler, pods, and nodes.
- Experience creating tables in Cassandra clusters, building images using Docker, and deploying the images.
- Assisted in designing and creating schemas and tables for the Cassandra cluster to ensure good query performance for the front-end application.
- Created Airflow jobs with the Snowflake operator to create fact and dimension tables in the Snowflake warehouse.
- Connected Snowflake to Tableau to create data reporting dashboards.
- Worked in the UNIX environment, developing Bash shell scripts and running cron jobs.
Environment: Python, Spark, Kafka, JSON, GitHub, Linux, Flask, Nginx, REST, CI/CD, Kubernetes, Helm, MongoDB, Cassandra, Snowflake, Tableau.
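The JSON-flattening preprocessing described above can be illustrated with a small PySpark sketch; the input path and field names (customer, orders, etc.) are hypothetical stand-ins for the project's data.

```python
# Sketch: flatten a nested JSON structure into a tabular Spark DataFrame.
# The input path and field names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode

spark = SparkSession.builder.appName("flatten_json").getOrCreate()

raw = spark.read.json("s3a://my-bucket/raw/events/")  # hypothetical path

# Explode the nested array of orders so each order becomes its own row,
# then pull nested struct fields up into flat, top-level columns.
flat = (
    raw.withColumn("order", explode(col("customer.orders")))
       .select(
           col("customer.id").alias("customer_id"),
           col("customer.name").alias("customer_name"),
           col("order.order_id").alias("order_id"),
           col("order.amount").alias("order_amount"),
       )
)

flat.write.mode("overwrite").parquet("s3a://my-bucket/curated/orders/")
```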
Data Engineer
Confidential, St. Louis, MO
Responsibilities:
- Involved in developing Python scripts and using Informatica and other ETL tools for the extraction, transformation, and loading of data into Teradata.
- Worked with a team of business analysts on requirements gathering, business analysis, and project coordination in creating reports.
- Performed unit and integration testing of Informatica sessions, batches, and target data.
- Responsible for using AutoSys and Workflow Manager tools to schedule Informatica jobs.
- Responsible for creating workflows and sessions using Informatica Workflow Manager and monitoring workflow runs and statistics in Informatica Workflow Monitor.
- Developed complex Informatica mappings using transformations such as connected and unconnected Lookup, Router, Filter, Aggregator, Expression, Normalizer, and Update Strategy for large volumes of data.
- Involved in testing, debugging, validation, and performance tuning of the data warehouse; helped develop optimal solutions for data warehouse deliverables.
- Moved data from source systems to different schemas based on dimension and fact tables using Type 1 and Type 2 slowly changing dimensions (see the sketch after this role).
- Interacted with key users, assisted them with various data issues, understood their data needs, and helped with data analysis.
- Performed query optimization for various SQL tables using the Teradata EXPLAIN command.
- Developed procedures to populate the customer data warehouse with transaction data and cycle and monthly summary data.
- Experienced in the Linux environment, scheduling jobs, and handling file transfers.
- Very good understanding of database skew, PPI, join methods, and aggregate and hash operations.
Environment: Informatica PowerCenter, Teradata, XML, flat files, cron jobs, Linux, Bash shell, Python scripting, Power BI.
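The Informatica mappings themselves cannot be reproduced here, but the Type 2 slowly changing dimension logic they implemented can be illustrated with a small pandas sketch. The column names (customer_id, address, effective_date, end_date) are hypothetical; the real implementation used Informatica Update Strategy transformations against Teradata.

```python
# Illustrative pandas sketch of Type 2 SCD logic: expire changed rows in the
# dimension and insert new current versions. Column names are hypothetical.
import pandas as pd

HIGH_DATE = pd.Timestamp("9999-12-31")

def apply_scd2(dim, incoming, load_date):
    """Return the dimension with changed customers expired and re-inserted."""
    dim = dim.copy()
    current = dim[dim["end_date"] == HIGH_DATE]
    merged = incoming.merge(current, on="customer_id",
                            suffixes=("", "_old"), how="left")

    # Rows whose tracked attribute changed (or that are brand new) get a new version.
    changed = merged[merged["address"] != merged["address_old"]]

    # Expire the old current rows for customers whose attribute changed.
    expire_ids = changed.loc[changed["address_old"].notna(), "customer_id"]
    dim.loc[
        dim["customer_id"].isin(expire_ids) & (dim["end_date"] == HIGH_DATE),
        "end_date",
    ] = load_date

    # Insert the new current versions.
    new_rows = changed[["customer_id", "address"]].assign(
        effective_date=load_date, end_date=HIGH_DATE
    )
    return pd.concat([dim, new_rows], ignore_index=True)
```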
Spark Developer
Confidential
Responsibilities:
- Imported required modules such as Keras and NumPy in the Spark session and created directories for data and output.
- Read train and test data into the data directory as well as into Spark variables for easy access, and proceeded to train the model based on a sample submission.
- The images are represented as NumPy arrays when displayed; for easier data manipulation, all images are stored as NumPy arrays.
- Created a validation set using Keras2DML to test whether the trained model was working as intended (a condensed sketch follows this role).
- Defined multiple helper functions used while running the neural network in the session, along with placeholders and the number of neurons in each layer.
- Created the neural network's computational graph after defining weights and biases.
- Created a TensorFlow session used to run the neural network and validate the model's accuracy on the validation set.
- After executing the program and achieving acceptable validation accuracy, created a submission that is stored in the submission directory.
- Executed multiple Spark SQL queries after forming the database to gather specific data corresponding to an image.
Environment: Scala, Python, PySpark, Spark, Spark MLlib, Spark SQL, TensorFlow, NumPy, Keras, Power BI.
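A condensed Keras sketch of the neural-network workflow described above, assuming image data already loaded as NumPy arrays; the array files, layer sizes, and submission path are illustrative assumptions rather than the project's actual values.

```python
# Sketch: train a small dense network on flattened image arrays and validate it.
# Shapes, layer sizes, and file paths are illustrative assumptions.
import numpy as np
from tensorflow import keras

# Images stored as NumPy arrays (e.g., 28x28 grayscale flattened to 784 features).
x_train = np.load("data/train_images.npy").reshape(-1, 784) / 255.0
y_train = np.load("data/train_labels.npy")
x_test = np.load("data/test_images.npy").reshape(-1, 784) / 255.0

model = keras.Sequential([
    keras.layers.Dense(500, activation="relu", input_shape=(784,)),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Hold out 20% of the training data as a validation set.
model.fit(x_train, y_train, epochs=10, batch_size=128, validation_split=0.2)

# Write predictions to the submission directory.
predictions = model.predict(x_test).argmax(axis=1)
np.savetxt("submission/predictions.csv", predictions, fmt="%d")
```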
Data Analyst
Confidential
Responsibilities:
- Created data visualizations and developed dashboards and stories in Power BI.
- Utilized various charts such as scatter plots, bar charts, pie charts, and heatmaps in Power BI for data analysis.
- Analyzed various KPI results frequently to assist in the development of performance improvement concepts.
- Experienced in using Excel pivot charts and VBA tools to create customized reports and analyses (see the sketch after this role).
- Assisted with sizing, query optimization, buffer tuning, backup and recovery, installations, upgrades, and security, along with other administration functions, as part of the profiling plan.
- Ensured production data was replicated into the data warehouse without data anomalies from the processing databases.
- Analyzed code to improve query optimization and to verify that tables were using indexes.
- Created and tested MySQL programs, forms, reports, triggers, and procedures for the data warehouse.
- Involved in troubleshooting and fine-tuning databases for performance and concurrency.
- Automated the code release process, reducing the total time for code releases from 8 hours to 1 hour.
- Developed a fully automated continuous integration system using Git, MySQL, and custom tools developed in Python and Bash.
- Working experience with Power BI Desktop, generating reports by writing SQL queries.
- Played a key role in a department-wide transition from Subversion to Git, which increased efficiency for the development community.
Environment: Git, Tableau, MySQL, Python, Bash, VBA, Excel.
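A small pandas sketch of the kind of KPI pivot summarized above in Excel and Power BI; the CSV path and column names (team, call_date, handle_time_sec) are hypothetical.

```python
# Sketch: build a monthly KPI pivot (average handle time by team) from raw records.
# The CSV path and column names are hypothetical.
import pandas as pd

df = pd.read_csv("data/kpi_records.csv", parse_dates=["call_date"])
df["month"] = df["call_date"].dt.to_period("M").astype(str)

kpi_pivot = pd.pivot_table(
    df,
    index="team",
    columns="month",
    values="handle_time_sec",
    aggfunc="mean",
)

# Export the pivot for use in an Excel or Power BI report.
kpi_pivot.to_excel("reports/kpi_summary.xlsx")
```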