
Data Engineer Resume

Marlborough, MA


  • Professional with 6 years of experience in the IT industry, comprising build/release management, software configuration, design, development, and cloud implementation.
  • Excellent problem-solving, communication, and teamwork skills, including 2 years with Big Data technologies on the Hadoop platform, with expertise in key Hadoop technologies such as Hive, Sqoop, Spark, Python, and Kafka, and AWS cloud services such as EMR, S3, EBS, EFS, RDS, Redshift, Glue, Snowball, Lambda, SNS, and CloudWatch.
  • Hands-on experience with application migration from RDBMS to the Hadoop platform, the healthcare domain, and real-time streaming with Apache Kafka and Spark Streaming; strong knowledge of Python and Java development, including Spark RDD and DataFrame programming.
  • Progressive hands-on experience in analysis and ETL processes, including the design, development, coding, testing, and integration of enterprise-level data warehouse architectures.
  • Solid knowledge of the big data framework: Hadoop, HDFS, Apache Spark, Hive, MapReduce, and Sqoop.
  • Hands-on expertise in Python development, including Spark RDD and DataFrame programming.
  • Strong experience with application/database migration from RDBMS to Hadoop.
  • Sound grasp of relational database concepts; extensively worked with SQL Server and Oracle. Expert in writing complex SQL queries and stored procedures.
  • Used JSON and XML SerDe properties to load JSON and XML data into Hive tables.
  • Built Spark applications using IntelliJ and Maven.
  • Extensively worked with the Python programming language for data engineering using Spark.
  • Experience with real-time streaming involving Apache Kafka and Spark Streaming.
  • Strong knowledge of database architecture and data modeling, including Hive and Oracle.
  • Excellent interpersonal and communication skills; technically competent and results-oriented, with problem-solving and leadership skills.
  • Sound understanding of Agile development and Agile tools.


Big Data Ecosystem: Hadoop, Apache Spark, Hive, Kafka, HDFS, MapReduce, Sqoop, HBase, ZooKeeper, Python

Cloud: AWS (S3, IAM, EMR, Redshift, Lambda, Glue, SES, EBS, EFS, SNS, CloudWatch, CloudTrail, Snowball)

Databases: MySQL, HBase, MS SQL

Programming Languages: Java, Python, Scala, SQL, NoSQL, HiveQL, T-SQL

Tools: IntelliJ, Eclipse, PyCharm, Putty, Tableau

Operating Systems: Linux, Windows

Packages: VMware, Oracle VM VirtualBox, MS Office, SSIS (ETL)


Confidential, Marlborough, MA

Data Engineer


  • Developed data ingestion modules for moving data to and from various layers, such as the Inbound, Raw, and Curated S3 layers as well as Redshift, using AWS services such as Lambda, PySpark, Data Pipeline, and Glue.
  • Developed PySpark scripts for data transfer from S3 to Redshift.
  • Developed shell scripts with AWS CLI commands to copy data from the on-prem gateway server to AWS S3 as part of the daily data sync process.
  • Loaded the data of tests performed at the local RRL from SQL Server into Hive using Sqoop.
  • Set up a lifecycle policy to move files from the Inbound bucket to Glacier after 7 days.
  • Used different data formats (Text, Avro, Parquet, JSON, ORC) while loading files into HDFS.
  • Created Spark jobs to apply data cleansing/data validation rules to new source files in the Inbound bucket and route rejected records to the reject-data S3 bucket.
  • Generated Parquet files from the cleansed source files in the Raw S3 bucket.
  • Created Hive tables that stored the processed results in tabular format.
  • Created managed and external tables in Hive, loaded data from HDFS, and performed complex HiveQL queries on the tables for reporting, based on business needs.
  • Created partitioning and bucketing HiveQL queries based on STAT criteria, which helped optimize query performance.
  • Created and scheduled Sqoop jobs for automated batch data loads.
  • Created complex application logic using Spark SQL and Spark DataFrames to cleanse and integrate the imported test data per business requirements.
  • Built Spark applications using IntelliJ and Maven, and used the Python and Scala programming languages for data engineering in the Spark framework.
  • Monitored and maintained all Hive, Sqoop, and Spark jobs to ensure they stayed optimized at all stages; used the Resource Manager and YARN queues to manage and monitor Hadoop jobs.
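The cleansing/validation step described above can be sketched in plain Python. This is an illustration, not the actual pipeline code: field names such as `member_id` and `test_date` are hypothetical, and in the real job these rules would run inside a PySpark filter/UDF before clean rows are written as Parquet to the Raw bucket and rejects are routed to the reject-data bucket.

```python
from datetime import datetime

# Hypothetical required fields for an inbound test record.
REQUIRED_FIELDS = ("member_id", "test_date", "result")

def validate_record(record: dict) -> list:
    """Return a list of rule violations; an empty list means the record is clean."""
    errors = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            errors.append(f"missing:{field}")
    date = record.get("test_date")
    if date:
        try:
            datetime.strptime(date, "%Y-%m-%d")
        except ValueError:
            errors.append("bad_date_format")
    return errors

def split_records(records):
    """Partition records into (accepted, rejected), as the Spark job routes them."""
    accepted, rejected = [], []
    for rec in records:
        (rejected if validate_record(rec) else accepted).append(rec)
    return accepted, rejected
```

Keeping each rule as a small named check makes the reject-data bucket self-describing: every rejected record can carry the list of violations that sent it there.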


Test Automation Engineer


  • Installed and configured the AWS CLI on on-prem servers.
  • Created shell scripts to load data from the on-prem gateway server to S3 using the AWS CLI.
  • Created S3 buckets for the inbound, raw, curated, and reject data sets.
  • Set up a lifecycle policy to move files from the Inbound bucket to Glacier after 7 days.
  • Created Hive databases and external Hive tables in EMR pointing to files in the Raw S3 bucket.
  • Created Spark jobs to apply data cleansing/data validation rules to new source files in the Inbound bucket and route rejected records to the reject-data S3 bucket.
  • Generated Parquet files from the cleansed source files in the Raw S3 bucket.
  • Applied MSCK REPAIR daily to refresh Hive partitions after the data load.
  • Created tables, along with sort and distribution keys, in AWS Redshift.
  • Loaded Redshift tables using COPY commands and Spark jobs to perform various data loads, such as truncate-and-load, append to existing data sets, and update/insert using merge commands.
  • Developed an AWS Lambda function to invoke a Glue job as soon as a new file becomes available in the Inbound S3 bucket.
  • The Glue job reads the cleansed data in the Raw bucket and generates a Parquet file.
  • A successor Glue job reads the Parquet file from S3 and loads it into Redshift tables.
  • Implemented reject-record handling based on a threshold value.
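The threshold-based reject handling in the last bullet can be sketched as a small guard that fails a batch when too large a share of its records is rejected. This is a minimal sketch under assumptions: the 5% default is an illustrative value, not one taken from the original pipeline.

```python
# Assumed threshold: fail the load if more than 5% of records are rejected.
REJECT_THRESHOLD = 0.05

def should_fail_load(total: int, rejected: int,
                     threshold: float = REJECT_THRESHOLD) -> bool:
    """Return True when the reject ratio for a batch exceeds the threshold."""
    if total == 0:
        return False  # empty batch: nothing to load, treat as a no-op
    return (rejected / total) > threshold
```

In a pipeline like the one described, this check would run after the cleansing job and before the Redshift load, so a bad source file aborts the batch instead of silently loading a fraction of its rows.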

Confidential, Atlanta, GA

Test Automation Engineer


  • Participating in sprint grooming and planning calls to understand the application's user stories, and tracking tasks and hours in the Pivotal tool.
  • Developing Selenium automation scripts using Java and the Cucumber (BDD) framework.
  • Working on the regression automation test suite, which consolidates the automated regression test cases covering the user stories of each script.
  • Uploading the regression test cases to Git following the CI/CD model.
  • Running the automated regression test cases at the end of each sprint on the CI/CD model with the help of Jenkins jobs and Sauce Labs.
  • Defect tracking and test management in Pivotal.
  • Supporting the business teams during user acceptance testing.
  • Working on SQL queries and automation scripts to interact with the Teradata database.
  • Analyzing the functional requirements.
  • Understanding company standard protocols.
  • Handled tasks of improving system efficiency by implementing software security standards.
  • Involved in functional, re-testing, and regression testing.
  • Designed and executed test cases per functional specifications.
  • Prepared test data for test execution.
  • Held discussions with the designers/developers and prepared test cases; after execution, validated the behavior of the website.
  • Tracked defects to closure and verified defect fixes.
  • Analyzed existing business scenarios and business rules.
  • Performed negative testing to find out how functions performed when the system encountered invalid or unexpected values.
  • Performed manual testing prior to automating the tests on the application.
  • Involved in developing detailed test plans, test cases, and test scripts using Quality Center for functional, security, and regression testing.
  • Used Quality Center to track and report system defects and bug fixes.
  • Responsible for creating and filing bugs.
  • Created test input requirements and prepared the test data and test scripts using the HOMER framework.
  • Participated in bug review and bug triage meetings.
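The negative-testing idea mentioned above can be illustrated with a short sketch: feed invalid and unexpected values to a function and confirm every one of them is rejected. `parse_quantity` is a hypothetical example function for illustration, not code from the application under test, and the resume's actual suites used Java/Cucumber rather than Python.

```python
def parse_quantity(value) -> int:
    """Parse a positive integer quantity, rejecting invalid input."""
    if not isinstance(value, str) or not value.strip().isdigit():
        raise ValueError(f"invalid quantity: {value!r}")
    qty = int(value.strip())
    if qty == 0:
        raise ValueError("quantity must be positive")
    return qty

def run_negative_tests():
    """Each of these inputs must be rejected; return any that slipped through."""
    bad_inputs = ["", "abc", "-1", "1.5", None, "0"]
    not_rejected = []
    for value in bad_inputs:
        try:
            parse_quantity(value)
            not_rejected.append(value)  # a bad input was accepted: a defect
        except ValueError:
            pass
    return not_rejected
```

An empty result from `run_negative_tests()` means every invalid value was correctly rejected; any entry in the list is a defect to file.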

Confidential, Sandy, UT

Java Developer


  • Involved in the SDLC - requirement gathering, analysis, design, development, and testing of applications using the Agile model.
  • Developed web applications using Spring MVC.
  • Experience with Java multi-threading, collections, interfaces, synchronization, and exception handling.
  • Created test plans, JUnit test cases, and a test suite for testing the application.
  • Used Maven scripts for building and deploying the application.
  • Involved in bug fixes, enhancements, and code refactoring.
  • Involved in the preparation of test data.
  • Interacted with the team and resolved issues.
  • Involved in coding using Java, JSP, Servlets, and Spring.
  • Developed the application using the Eclipse IDE.
  • Used the Spring MVC design pattern while developing the application, with JSPs in the presentation tier.
  • Interacted with MySQL and wrote complex queries to fix data-related issues for the stores.
  • Used Hibernate to persist data to the MySQL database and wrote SQL queries.

Confidential, Holden, MA

Infrastructure Support Services Specialist


  • Responded to requests for technical assistance to diagnose and resolve user network (WAN, LAN), hardware, and software issues.
  • Managed and maintained daily checks, tasks, and changes to Windows, Linux, and storage systems.
  • Administered and implemented software, firmware, and driver updates to all supported systems in a timely manner.
  • Performed end-user device software patching as directed by department policies.
  • Conducted periodic sessions on employee desktop applications (e.g., Microsoft Office) and hardware to augment formal training.
  • Performed end-user account provisioning and maintenance within Active Directory.
  • Provided support for infrastructure and application development projects, and provided internal support to users.
  • SQL database administration, SQL scripting, MS Access databases, and BO report writing.
  • Provided support for MS Office Word, Outlook, PowerPoint, and Excel as needed.
