Data Engineer Resume

Philadelphia, PA


  • Enthusiastic professional with a passion for engineering and Big Data analytics, with experience in the media and health business domains
  • Professional IT experience with the Hadoop Big Data ecosystem and Java/J2EE-related technologies
  • Strong experience creating Tableau views
  • Strong experience using SQL and PostgreSQL
  • Experience with the Java, Python, and Scala programming languages
  • Experience working with Spark on EMR clusters
  • Configured Spark Streaming to receive real-time data from messaging systems such as Apache Kafka and store it in Amazon S3
  • Monitored Hadoop clusters using AWS CloudWatch
  • Good experience working with Linux-based operating systems
  • Experience with enterprise tools such as JIRA, Confluence, and Jenkins


Big Data Ecosystem: Spark, Apache Kafka, HDFS, Zookeeper

Languages: Java, Scala, Python

DB languages: MySQL, PL/SQL, Oracle

AWS Components: S3, EMR, RDS, Redshift

Operating Systems: Linux-based (CentOS, Red Hat)


Confidential, Philadelphia, PA

Data Engineer


  • As part of the Analytics and Dashboard team, implemented ETL pipelines and developed business-specific dashboards that helped executives and directors across teams make analytics-based decisions
  • Worked closely with the Release Management and Product teams to analyze releases and versions, using SQL and Postgres for analysis and Tableau dashboards to deliver insights to users
  • Analyzed complex data sets to find correlations between large data sets using Java, SQL, and PL/SQL. Worked with AWS Redshift, RDS, and S3. Improved Tableau performance by using data blending
  • Developed a key project that calculates the health index for the whole data set using Apache Spark. Improved system performance by designing the cluster and writing the Spark Submit jobs, reducing the compute time for the whole dataset from one hour to 40 minutes
  • Implemented the tasks in Scala with Spark, using Spark SQL and DataFrames to process data. Also experienced with Python pandas
  • Coordinated with product owners to perform cohort analysis, capture user engagement, and improve the dashboard views
  • Developed dashboards from Splunk data. Wrote Splunk queries to filter JSON data and export it to CSV. Improved performance by creating only aggregate tables in RDS
  • Implemented ETL processes as shell scripts by refactoring old Java programs, and improved performance through query optimization, which improved application run time by 50%
  • Owned the deployment of applications to production. Used Jenkins to build the applications and deploy the jobs to production VMs. Experienced with Git for version control

Confidential, San Francisco, CA

Machine Vision Intern


  • As part of the project at Confidential, researched cost-effective technologies for machine vision
  • Experimented with devices such as Kinect to capture images, collect data, and work with it
  • For the project, built an application for face recognition and paralysis detection using machine learning techniques, developed in Java and Python. Also took part in brainstorming sessions to improve the product


Hadoop Intern


  • As part of the project at Confidential, researched technologies related to cloud computing
  • Created ER diagrams for the relational database
  • Loaded data from the Unix file system into HDFS
  • Monitored Hadoop clusters using AWS CloudWatch
  • Managed and scheduled jobs on the Hadoop cluster
  • Extracted feeds from social media platforms such as Facebook and Twitter using Python scripts
  • Created reports for the BI team, using Sqoop to load data into HDFS and Hive
  • Built MapReduce programs in Java for data cleaning and preprocessing
