Data Engineer Resume

San Jose, California


Big Data Technologies: Kafka, Spark, Hive, HDFS, Linux, Pig, Impala, YARN, MapReduce, Git, Airflow, Oozie, Redshift, NiFi

Programming Languages: Java, Scala and Python

NoSQL: HBase, MongoDB

SQL: MySQL, Oracle, Spark SQL and HQL

Cloud: AWS and GCP


Confidential, San Jose, California

Data Engineer


  • Analyzed the Hadoop cluster and various big data analytics tools, including Pig, Hive, and Sqoop.
  • Designed and deployed infrastructure in AWS.
  • Handled front-end communication with clients for onboarding their new projects and deliverables.
  • Contributed to preparing High-Level Diagrams (HLD) and Low-Level Diagrams (LLD) for onboarding clients.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Managed and reviewed Hadoop log files.
  • Managed jobs using the Fair Scheduler.
  • Migrated services from on-premises to Azure cloud environments.
  • Collaborated with development and QA teams to maintain high-quality deployments.
  • Designed client/server telemetry, adopting the latest monitoring techniques.
  • Infrastructure migrations: drove operational efforts to migrate all legacy services to a fully virtualized infrastructure.
  • Tuned Hive and Pig scripts to improve performance and resolve performance-related issues, drawing on a good understanding of joins, grouping, and aggregation and how they translate into MapReduce jobs.
  • Scheduled Oozie workflows to run multiple Hive and Pig jobs.
  • Responsible for cluster maintenance: adding and removing cluster nodes, monitoring and troubleshooting the cluster, managing and reviewing data backups, and managing and reviewing Hadoop log files.
  • Managed datasets using pandas DataFrames and MySQL; queried the MySQL database from Python using the Python-MySQL connector and the MySQLdb package to retrieve information.
  • Developed various algorithms for generating several data patterns.
  • Used JIRA for bug tracking and issue tracking.
  • source using Scala, Spark by applying the transformations.
  • Experience with Python pandas and NumPy.
  • Optimized the performance of PySpark code and shell scripts.
  • Worked with Flask, Django, or other API frameworks.
  • Experience with data frameworks in Python/Scala.
  • Familiar with building libraries in Python/Scala.
  • Transformed the current model into PySpark or similar technologies.
  • Analyzed various results produced by the models and reconciled outputs with current model outputs.
  • Built scalable distributed data solutions on Cloudera's Hadoop distribution.
  • Used Spark Streaming and Spark jobs for ongoing customer transactions, and Spark SQL to handle structured data in Hive.
  • Migrated tables from RDBMS into Hive tables using Sqoop and later generated visualizations using Tableau.
  • Applied both supervised and unsupervised machine learning algorithms for churn prediction based on customer information.
  • Used the Spark MLlib library to implement machine learning algorithms in cluster and distributed mode, with data stored in HDFS.
  • Imported data from different sources such as HDFS and MySQL into Spark RDDs.
  • Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Worked extensively on AWS, building a data lake from scratch and migrating data from legacy systems (MySQL, Oracle, SQL Server, Teradata) into the cloud using PaaS, IaaS, SaaS, database-as-a-service, and networking-as-a-service offerings, with S3, EMR, Step Functions, EC2, RDS, Lambda, Glue, Athena, KMS, CloudWatch, CloudTrail, VPC, SageMaker, Kinesis, Macie, and QuickSight.

Environment: Hadoop, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Red Hat Linux, Spark, Scala, AWS, YARN


Hadoop Developer


  • Hands-on experience with Google Cloud Platform, BigQuery, Google Data Studio, and Flow.
  • Developed ETL pipelines for SQL Server using SSIS.
  • Built reporting and analysis using SSIS, SSRS, and SSAS cubes.
  • Extensive experience with big data frameworks and open-source technologies (Apache NiFi, Kafka, Spark, Cassandra, HDFS, Hive, Docker, PostgreSQL, Git, Bash scripts, Jenkins, MongoDB, Elasticsearch, Ignite, TiDB).
  • Managed data warehouse and big data cluster services and the development of data flows.
  • Wrote big data/Spark ETL applications for different sources (SQL, Oracle, CSV, JSON) to support analytics for different departments.
  • Worked extensively with Hive, Hadoop, Spark, Docker, and Apache NiFi.
  • Supported different departments with big data analytics.
  • Built multiple end-to-end, alert-based fraud monitoring systems.
  • Preferred languages are Scala and Python.

Confidential, Jersey City, NJ

Software Test Engineer


  • Prepared test plans and test cases.
  • Used Java with the TestNG framework to automate scripts.
  • Developed test scripts to automate testing with Selenium WebDriver.
  • Implemented data-driven frameworks to create parameterized test scripts and generate XSLT reports using Selenium WebDriver and the TestNG framework.
  • Wrote Gherkin scenarios and generated step definitions and methods using Cucumber for different functionalities.
  • Created automated test scripts using automation tools and ran them on various builds and instances.
  • Implemented robust logging mechanisms within new and existing automated tests using Log4j.
  • Maintained a repository for version control.
  • Performed cross-browser testing using Selenium Grid and Java on various browsers.
  • Attended daily scrums to discuss daily issues and testing activities.
  • Executed automated Selenium scripts and reproduced failures manually.
  • Developed and executed test cases for various web services using SoapUI.
  • Prepared a traceability matrix to show test coverage of requirements vs. test scripts.
  • Participated in walkthroughs and peer reviews with team members and other project teams.
  • Performed web service testing with SoapUI and validated various responses against annotations.
  • Performed database testing by running SQL queries to retrieve data.
  • Performed usability, GUI, functionality, and regression testing of new builds.
  • Performed browser (IE, Firefox, and Chrome) and platform (Windows 7) compatibility testing.
  • Identified and isolated software defects and reported them via JIRA.
  • Attended daily scrums and reported daily activities and issues to the scrum master.
  • Performed functional and compatibility testing on different browsers such as Firefox, Chrome, and IE.
  • Used Git as a version control tool to store the test scripts.
  • Tracked daily testing activities and provided daily testing updates to higher management.

Environment: Java, Cucumber, Selenium WebDriver, Data Driven, TestNG, Eclipse, Jira, SoapUI v4.5, Oracle v9i/8i, XML, SQL, Windows 7, MS Project, HTML, Firebug, FirePath, Git.