
Data Engineer/Data Analyst Resume

New Jersey


  • 7 years of experience as a Data Engineer and Data Analyst using Python, NumPy, Pandas, AWS, Postgres, Kafka, Cassandra, and MongoDB.
  • Hands-on experience using Pandas DataFrames, NumPy, matplotlib, and seaborn to create correlation, bar, and time series plots.
  • Skilled in Tableau Desktop for creating visualizations using bar charts, line charts, scatter plots, pie charts, etc.
  • Good experience in conducting exploratory data analysis, data mining, and working with statistical models.
  • Worked with Python and Bash scripting to automate tasks and create data pipelines.
  • Good experience in designing and creating data ingestion pipelines using Apache Kafka.
  • Experienced with data preprocessing techniques such as handling missing data, encoding categorical data (dummy variable handling), and feature scaling.
  • Good knowledge of data modelling and creating schemas (Snowflake/Star) and tables (fact/dimension) in a data warehouse. Experienced with 3NF, normalization, and denormalization of tables depending on the use case.
  • Well versed in Big Data on AWS cloud services such as EC2, S3, Glue, Athena, DynamoDB, and Redshift.
  • Experienced in query optimization in MySQL and Teradata, using EXPLAIN commands to improve query performance.
  • Proficient in working with SQL and NoSQL databases such as MongoDB, Cassandra, MySQL, and PostgreSQL.
  • Exposure to various AWS services such as Lambda, VPC, IAM, Load Balancing, CloudWatch, SNS, SQS, and Auto Scaling.
  • Good experience with data pipeline/workflow management tools such as AWS Kinesis and Apache Airflow.
  • Experienced in performing ETL operations using Apache Airflow DAGs and Informatica PowerCenter to load data into a data warehouse.
  • Knowledge of container orchestration platforms such as Kubernetes, building images with Docker, and deploying them to a private registry.
  • Experienced with the full software development life cycle, architecting scalable platforms, object-oriented programming, database design, and Kanban methodologies.
  • Experience in UNIX shell scripting for processing large volumes of data from varied sources and loading it into databases such as Teradata.
  • Experienced with subqueries, stored procedures, triggers, cursors, and functions on MongoDB, MySQL, and PostgreSQL databases.
  • Thorough understanding of providing specifications using Waterfall and Agile software methodologies to model systems and business processes.
  • Experience in the design and development of ETL methodology supporting data migration, data transformation, and processing in a corporate-wide ETL solution using Teradata.
  • Worked on various applications using Python in IDEs such as Jupyter, Spyder, Eclipse, VSCode, IntelliJ, Atom, and PyCharm.
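
The preprocessing techniques named in the summary above (handling missing data, dummy variables, feature scaling) can be illustrated with a minimal sketch. It uses only the Python standard library rather than Pandas, and the function names and sample values are hypothetical, chosen purely for illustration.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def one_hot(values, drop_first=True):
    """Encode categories as 0/1 dummy columns.

    Dropping the first level avoids perfectly collinear columns
    (the so-called dummy variable trap).
    """
    levels = sorted(set(values))
    if drop_first:
        levels = levels[1:]
    return {level: [1 if v == level else 0 for v in values] for level in levels}


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample data.
ages = impute_mean([20, None, 40])
dummies = one_hot(["NJ", "SD", "VA", "NJ"])
scaled = min_max_scale(ages)
```

In practice the same steps are usually done with library routines; the sketch only shows what each step computes.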


Libraries: Keras, TensorFlow, Gym, scikit-learn, matplotlib, seaborn, NumPy, Pandas, Boto3, Beautiful Soup, PySpark, Gurobi.

SQL/NoSQL: PostgreSQL, MongoDB, Cassandra, MySQL, MS SQL, Kafka

Language: Python, R, C, C++.

Operating System: Windows, Red Hat Linux

Version Control: Git, GitHub, SVN

Architecture: Relational DBMS, OLAP, OLTP.

Reporting Tools: Power BI, Tableau, SSRS (SQL Server Reporting Services)

ETL Tools: Apache Airflow, Informatica, SSIS (SQL Server Integration Services)


Confidential - New Jersey

Data Engineer/Data Analyst


  • Utilized AWS EMR with Spark to perform batch processing operations on various Big Data sources.
  • Installed applications on AWS EC2 instances and configured S3 buckets for storage.
  • Created data pipelines to extract data from various APIs and ingest it into S3 buckets using Apache Airflow.
  • Deployed Lambda functions triggered by events in S3 buckets to transform data and load it into AWS Redshift.
  • Worked with Linux EC2 instances and relational databases.
  • Developed Python and shell scripts to run Spark jobs for batch processing of data from various sources.
  • Working knowledge of Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) for object storage.
  • Experience with AWS EC2, configuring servers for Auto Scaling and Elastic Load Balancing.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
  • Extracted data from AWS Redshift using Spark and performed data analysis.
  • Developed a POC to perform ETL operations using AWS Glue to load Kinesis stream data into S3 buckets.
  • Performed cleaning of data, data quality checks, data governance for incremental loads.
  • Normalized data and created correlation plots and scatter plots to find underlying patterns.
  • Filtered and cleaned data by reviewing computer reports, printouts, and performance indicators to locate and correct code problems.
  • Tested dashboards to ensure the data matched the business requirements and to identify any discrepancies in the underlying data.
  • Created data pipelines using AWS Step Functions, implemented state machines, and ran pipelines at different times to process data.
  • Developed and implemented databases, data collection systems, data analytics, and other strategies that optimize statistical efficiency and quality.
  • Worked with management to prioritize business and information needs.

Environment: Python, AWS, Jira, Git, CI/CD, Docker, Kubernetes, Web Services, Spark streaming API, Kafka, Cassandra, MongoDB, JSON, Bash scripting, Linux, SQL, Apache Airflow.
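
As an illustration of the S3-triggered Lambda work listed above, the following is a minimal sketch of a handler that reads the bucket and object key from an S3 event notification. The `transform` helper, the key layout, and the sample event are hypothetical, and the actual load into Redshift is deliberately left out.

```python
from urllib.parse import unquote_plus


def transform(key):
    """Hypothetical transformation: derive a target table name from the key.

    Assumes keys shaped like "raw/<table>/<file>"; anything else maps to "unknown".
    """
    parts = key.split("/")
    return parts[1] if len(parts) > 2 else "unknown"


def handler(event, context=None):
    """Collect bucket, key, and target table for each object in an S3 event."""
    results = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(s3["object"]["key"])
        results.append({"bucket": bucket, "key": key, "table": transform(key)})
    # A real function would load the transformed data into the warehouse here.
    return results


# Hypothetical event for a local dry run.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "raw/orders/2021-01-01+batch.json"},
            }
        }
    ]
}
loaded = handler(sample_event)
```

Keeping the parsing separate from the load step makes the handler easy to exercise locally with a hand-written event.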

Confidential - Sioux Falls, SD

Data Engineer


  • Implemented Spark using Python and utilized DataFrames and the Spark SQL API for processing and querying data.
  • Developed Spark applications using the Python driver and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
  • Worked with various data formats such as Parquet, JSON, and CSV using Spark.
  • Experienced in real-time streaming with Kafka as the data pipeline, using the Spark Streaming module.
  • Consumed Kafka messages and loaded data into Cassandra cluster deployed in containers.
  • Developed preprocessing jobs using Spark DataFrames to flatten JSON files.
  • Ingested data into a MySQL database, performed transformations, and exported the transformed data to Cassandra.
  • Assisted in keyspace creation, table creation, and secondary index creation in the Cassandra database.
  • Performed query optimization of the tables through load testing using the cassandra-stress tool.
  • Created various POCs with MLlib using the PySpark module in Python.
  • Worked with various teams in deploying containers on site using Kubernetes to run Cassandra clusters in Linux Environment.
  • Good knowledge of Kubernetes architecture, including the scheduler, pods, nodes, the kubectl API, and the etcd database.
  • Experience in creating tables in Cassandra clusters, building images using Docker, and deploying the images.
  • Assisted in developing schemas and tables for the Cassandra cluster to ensure good query performance for the front-end application.
  • Experience in managing MongoDB environments from availability, performance, and scalability perspectives.
  • Worked on ad hoc queries, indexing, replication, load balancing, and aggregation in MongoDB.
  • Used GitHub for version control.
  • Worked in a UNIX environment.

Environment: Python, Spark, Kafka, JSON, GitHub, Linux, Flask, Varnish, Nginx, REST, CI/CD, Kubernetes, Helm, MongoDB, Cassandra.
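
The JSON flattening mentioned above was done with Spark DataFrames; the sketch below shows the same idea in plain Python instead, so it runs without a cluster. The function name, separator, and sample record are hypothetical.

```python
import json


def flatten(obj, prefix="", sep="_"):
    """Flatten nested dicts and lists into a single-level dict.

    Nested keys are joined with `sep`; list items use their index as the key.
    """
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            name = f"{prefix}{sep}{key}" if prefix else str(key)
            flat.update(flatten(value, name, sep))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            name = f"{prefix}{sep}{index}" if prefix else str(index)
            flat.update(flatten(value, name, sep))
    else:
        # Scalar leaf: store it under the accumulated column name.
        flat[prefix] = obj
    return flat


# Hypothetical nested record.
record = json.loads('{"id": 1, "user": {"name": "a", "tags": ["x", "y"]}}')
row = flatten(record)
```

In Spark the analogous step is typically expressed by selecting struct fields as columns and applying `explode` to arrays, which produces one row per array element rather than one column per index.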

Confidential - Norfolk, VA

Data Engineer


  • Involved in developing Python scripts and using Informatica and other ETL tools for extraction, transformation, and loading of data into Teradata.
  • Worked with a Team of Business Analysts in requirements gathering, business analysis and project coordination in creating reports.
  • Performed Unit and Integration Testing of Informatica Sessions, Batches and Target Data.
  • Used Autosys and Workflow Manager tools to schedule Informatica jobs.
  • Created workflows and sessions using Informatica Workflow Manager and monitored workflow runs and statistics in Informatica Workflow Monitor.
  • Developed complex Informatica mappings using different types of transformations, such as Connected and Unconnected Lookup, Router, Filter, Aggregator, Expression, Normalizer, and Update Strategy transformations, for large volumes of data.
  • Involved in testing, debugging, validation, and performance tuning of the data warehouse; helped develop optimal solutions for data warehouse deliverables.
  • Moved data from source systems to different schemas based on the dimension and fact tables, using slowly changing dimensions Type 1 and Type 2.
  • Interacted with key users and assisted them with various data issues, understood data needs and assisted them with Data analysis.
  • Performed query optimization for various SQL tables using the Teradata EXPLAIN command.
  • Developed procedures to populate the customer data warehouse with transaction data and with cycle and monthly summary data.
  • Experienced in the Linux environment, including scheduling jobs and file transfers.
  • Very good understanding of database skew, PPI, join methods, aggregation, and hashing.

Environment: Informatica PowerCenter, Teradata, XML, Flat files, Cron Job, Linux, Bash shell, Python scripting.
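
The difference between the Type 1 and Type 2 slowly changing dimensions mentioned above can be sketched as follows. This is a plain-Python illustration, not the Informatica mapping itself; the row layout (`key`, `start_date`, `end_date`) and sample data are assumptions.

```python
from datetime import date


def apply_scd1(dimension, key, attrs):
    """Type 1: overwrite attributes in place, so no history is kept."""
    for row in dimension:
        if row["key"] == key:
            row.update(attrs)
            return
    dimension.append({"key": key, **attrs})


def apply_scd2(dimension, key, attrs, as_of):
    """Type 2: close the current row and append a new version, keeping history."""
    for row in dimension:
        if row["key"] == key and row["end_date"] is None:
            if all(row.get(name) == value for name, value in attrs.items()):
                return  # Nothing changed, so keep the current version open.
            row["end_date"] = as_of
            break
    dimension.append({"key": key, **attrs, "start_date": as_of, "end_date": None})


# Hypothetical customer dimension, maintained both ways.
customers_t1 = [{"key": 1, "city": "Norfolk"}]
apply_scd1(customers_t1, 1, {"city": "Newark"})

customers_t2 = [
    {"key": 1, "city": "Norfolk", "start_date": date(2020, 1, 1), "end_date": None}
]
apply_scd2(customers_t2, 1, {"city": "Newark"}, date(2021, 6, 1))
```

After the update, the Type 1 table holds only the new city, while the Type 2 table holds both versions, with the old row closed at the change date.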


Data Analyst


  • Created data visualizations and developed dashboards and stories in Tableau.
  • Utilized various charts such as scatter plots, bar charts, pie charts, and heatmaps in Tableau for data analysis.
  • Analyzed various KPI results frequently to assist in development of performance improvement concepts.
  • Experience in using Excel pivot charts and VBA tools to create customized reports and analysis.
  • Assisted with sizing, query optimization, buffer tuning, backup and recovery, installations, upgrades, and security, including other administration functions, as part of the profiling plan.
  • Ensured production data was replicated from the processing databases into the data warehouse without any data anomalies.
  • Designed databases for referential integrity and was involved in the logical design plan.
  • Analyzed code to improve query performance and to verify that tables were using indexes.
  • Created and tested MySQL programs, forms, reports, triggers, and procedures for the data warehouse.
  • Involved in troubleshooting and fine-tuning databases for performance and concurrency.
  • Automated the code release process, bringing the total time for code releases from 8 hours to 1 hour.
  • Developed a fully automated continuous integration system using Git, MySQL and custom tools developed in Python and Bash.
  • Used Tableau Desktop to generate reports by writing SQL queries.
  • Played a key role in a department-wide transition from Subversion to Git, which resulted in an increase in efficiency for the development community.

Environment: Python, MySQL, Tableau, JSON, GitHub, Linux.
