Data Engineer Resume

Durham, NC


  • Around 8 years of extensive hands-on experience in the IT industry, including development using Big Data/Hadoop ecosystem tools.
  • Experience with the Hadoop 2.0 YARN architecture and in developing YARN applications on it.
  • Good experience in processing Unstructured, Semi-structured and Structured data.
  • Thorough understanding of HDFS and the MapReduce framework, and extensive experience in developing MapReduce jobs.
  • Experienced in building highly scalable Big Data solutions using Hadoop and multiple distributions, i.e., Cloudera and Hortonworks, and NoSQL platforms.
  • Good exposure to MapReduce programming using Java, Pig Latin scripting, distributed applications and HDFS.
  • Good working experience in PySpark and SparkSQL.
  • Experience in installing, configuring, supporting and managing Hadoop clusters using Apache, Cloudera and Hortonworks distributions and Amazon Web Services (AWS) EMR and EC2.
  • Hands-on experience with major components of the Hadoop ecosystem, including Hive, HBase, HBase-Hive integration, Pig, Sqoop, Flume, Scala and Kafka, and knowledge of the MapReduce/HDFS framework.
  • Experience in loading tuple-shaped data into Pig and generating tuples from normal data. Ability to build user-defined functions (UDFs) not available in core Hadoop.
  • Ability to build deployments on AWS, write build scripts (Boto 3 and AWS CLI) and automate solutions using Shell and Python.
  • Ability to move data in and out of Hadoop from various systems (RDBMS, NoSQL, UNIX and mainframe) using Sqoop and other traditional data movement technologies.
  • Good experience with HBase schema design.
  • Expertise in Hadoop security.
  • Worked on migrating the old Java stack to the Typesafe stack, using Scala for backend programming.
  • Worked on HBase Shell, CQL, HBase API and Cassandra Hector API as part of the proof of concept.
  • Good knowledge of TDD and Jenkins.
  • Experience with Hadoop distributions such as Cloudera, Hortonworks, BigInsights, MapR, Windows Azure and Impala. Hands-on experience with Hadoop applications (such as administration, configuration management, monitoring, debugging, and performance tuning).
  • Experience in using NiFi processor groups, processors and process flow management concepts.
  • Worked with AWS cloud services (VPC, EC2, S3, Redshift, Data Pipeline, EMR, DynamoDB, Lambda, Kinesis, SNS, SQS).
  • Hands-on experience in data warehousing, including designing and loading tables with large volumes of data and developing data at the enterprise level.
  • Planned the Dev, SIT and QA environments and made ETL architecture decisions.
  • Maintained, audited and built new clusters for testing purposes using Cloudera Manager.
  • Good understanding of Scrum methodologies, test-driven development and continuous integration.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.


Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Sqoop, Spark, Scala, Storm, Kafka, Oozie, MongoDB, Cassandra

Languages: C, Core Java, Unix, SQL, Python, R, C#, Haskell, Scala

J2EE Technologies: Servlets, JSP, JDBC, Java Beans, Jenkins, Git.

Methodologies: Agile, UML, Design Patterns (Core Java and J2EE).

Monitoring and Reporting: Ganglia, Nagios, Custom Shell scripts.

NoSQL Technologies: Cassandra, MongoDB, Neo4j, HBase

Frameworks: MVC, Struts, Hibernate, and Spring.

Operating Systems: Windows XP/Vista/7, UNIX.

Web Servers: WebLogic, WebSphere, Apache Tomcat.



Data Engineer, Durham, NC


  • Knowledge of MDM tools and experience in using data profiling tools.
  • Strong knowledge of SQL, using the Zeppelin tool on the various systems within AAP to understand data and harmonization issues.
  • Strong knowledge of troubleshooting data harmonization issues in the Enterprise Service Bus (ESB), cloud solutions and on-premise data stores (example: Oracle RDBMS).
  • Strong problem solving and analytic skills.
  • Excellent verbal and written communication skills
  • Expert in data quality and data risk management, including an understanding of the standards, methods, processes, tools, and controls needed to manage them enterprise-wide.
  • Ability to travel as necessary to meet organizational and administrative demands (up to 30% of the time).
  • Experience in converting data quality issues into solutions, resolving data quality problems through the appropriate choice of error detection and correction, process control and improvement, or process design strategies, in collaboration with subject matter experts and data stewards.
  • Monitored scorecard process/execution and provided feedback to the Data Governance Office (DGO).
  • Very strong understanding of the modern cloud MDM solutions (preferably Reltio) available in the marketplace, and of how to measure data quality using the various APIs provided and build a repeatable process around it to govern the data in the production environment.
  • Expert at developing and maintaining databases, with a strong background of working on high-end servers; would like to establish a fulfilling, proactive career and assume ever-increasing responsibility over time.
  • Designed and provided DB tools to assist in database management, transactions and processing environments.
  • Provided technical support for the SQL database environment by overseeing database development and organization.
  • Monitored the response of the database system to user queries and made necessary changes to scripts.

Confidential, Basking Ridge, NJ

Spark/Scala Developer/Analyst


  • Worked on loading disparate data sets coming from different sources into the BDPaaS (Hadoop) environment using Spark.
  • Developed UNIX scripts to create batch loads for bringing huge amounts of data from relational databases to the Big Data platform.
  • Involved in analyzing data coming from various sources and creating meta-files and control files to ingest the data into the Data Lake.
  • Involved in configuring batch jobs to ingest the source files into the Data Lake.
  • Developed Pig queries to load data to HBase.
  • Leveraged Hive queries to create ORC tables.
  • Developed Hive scripts to meet analyst requirements.
  • Streamed AWS log groups into a Lambda function to create ServiceNow incidents.
  • Worked extensively with Hive to create, alter and drop tables, and was involved in writing Hive queries.
  • Created and altered HBase tables on top of data residing in the Data Lake.
  • Created Views from Hive Tables on top of data residing in Data Lake.
  • Involved in the requirements and design phases of implementing a streaming Lambda Architecture for real-time streaming using Spark and Kafka.
  • Created Reports with different Selection Criteria from Hive Tables on the data residing in Data Lake.
  • Involved in file movements between HDFS and AWS S3 and worked extensively with S3 buckets in AWS.
  • Worked closely with scrum master and team to gather information and perform daily activities.
  • Deployed Hadoop components on the cluster, such as Hive, HBase, Spark, Scala and others, as required.
  • Uploaded and processed terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop.
  • Worked closely with HDInsight production team for the optimization of the performance of Spark Jobs on the cluster.
  • Implemented the business rules in Spark/Scala to get the business logic in place to run the Rating Engine.
  • Used Spark UI to observe the running of a submitted Spark Job at the node level.
  • Used Spark to do Property Bag Parsing of the data to get the required fields of data.
  • Created external Hive tables on the Blobs to showcase the data to the Hive Metastore.
  • Used both Hive context as well as SQL context of Spark to do the initial testing of the Spark job.
  • Used Microsoft Visio to put the complex working structure in a diagrammatic representation.
  • Worked with platform and system teams as part of the AWS Data Lake migration efforts and successfully completed the POC end to end following the enterprise standards.
  • Used WinSCP and FTP 6 to view the data storage structure on the server and to upload the JARs used for spark-submit.
  • Developed code from scratch in Spark using Scala according to the technical requirements.

Environment: Hadoop, MapReduce, YARN, Hive, Pig, HBase, Sqoop, Spark, Scala, MapR, Core Java, R Language, SQL, Python, Eclipse, Linux, Unix.

Confidential, NYC, NY

Hadoop developer


  • Imported and exported data into HDFS and Hive using Sqoop and Kafka.
  • Involved in developing different components of the system, such as the Hadoop process involving MapReduce and Hive.
  • Defined and built job flows and data pipelines.
  • Developed an interface validation process that validates the arrival of incoming data in HDFS before kicking off the Hadoop process.
  • Responsible for cluster maintenance, including adding and decommissioning DataNodes.
  • Worked extensively with data migration, data cleansing, data profiling, and ETL processes for data warehouses. Experienced in using Slick to query and store data in the database in Scala fashion, using the strong Scala collection framework.
  • Used the Scala collection framework to store and process complex consumer information.
  • Implemented solutions on AWS, which provides a variety of computing and networking services to meet the needs of applications.
  • Experience working with Elasticsearch 2.x, which supports Spark features.
  • Wrote optimized Hive queries using window functions, customized Hadoop shuffle and sort parameters, and the ORC file format.
  • Worked with and learned a great deal from Amazon Web Services (AWS) cloud services such as EC2, S3 and EMR.
  • Tuned Hive and Pig to improve performance and solved performance issues in Hive and Pig scripts, with an understanding of joins, grouping and aggregation and how they translate to MapReduce jobs.
  • Developed MapReduce programs using combiners, sequence files, compression techniques, chained jobs, and the multiple input and output APIs.
  • Monitored and troubleshot the cluster, reviewed and managed data backups, and managed and reviewed Hadoop log files.
  • Experience working with nodetool, which offers a number of commands that return Cassandra metrics pertaining to disk usage.
  • Worked on Amazon EMR, which processes data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2).
  • Used the AWS Management Console, the graphical user interface (GUI) for Amazon Web Services (AWS).
  • Preprocessed the logs and semi-structured content stored on HDFS using Pig and imported the processed data into the Hive warehouse, which enabled business analysts to write Hive queries.
  • Worked on Unix shell scripts for business processes and for loading data from different interfaces to HDFS.
  • Developed a suite of unit test cases for Mapper, Reducer and Driver classes using the MR testing library.
  • Involved in creating Hive tables, loading them with data and writing Hive queries.
  • Hands-on experience with Eclipse, VPN, PuTTY, WinSCP, VNC Viewer, etc.

Environment: Red Hat Enterprise Linux 5, Jenkins, Hadoop 1.0.4, Map Reduce, Hive 0.10, PIG, Shell Script, SQOOP 1.4.3, Eclipse, Java SDK 1.6

Confidential, NJ

Hadoop Developer


  • Developed MapReduce program to convert mainframe fixed length data to delimited data.
  • Used Pig Latin to apply transactions to systems of record.
  • Experience on Hadoop Cluster monitoring tools like Nagios, Ganglia, and Cloudera Manager.
  • Worked extensively with Cloudera Distribution Hadoop, CDH 5.x and CDH 4.x.
  • Worked on harmonizing Elasticsearch with Spark RDDs and DataFrames across ES and HDFS with Hive.
  • Developed Pig scripts and UDFs extensively for Value Added Processing (VAPs).
  • Designed and developed custom Avro storage for use in Pig Latin to load and store data.
  • Worked on Cassandra, which has Hadoop integration with MapReduce support.
  • Actively involved in design analysis, coding and strategy development.
  • Developed Sqoop commands to pull data from Teradata and push it to HDFS.
  • Developed Hive scripts for implementing dynamic partitions and buckets for retail history data.
  • Streamlined Hadoop jobs and workflow operations using Oozie workflow and scheduled through AutoSys on a monthly basis.
  • Developed a MapReduce job to generate sequence IDs in Hadoop.
  • Experienced in HBase Shell, CQL, HBase API and Cassandra Hector API as part of the proof of concept.
  • Developed Pig scripts to convert the data from Avro to text file format.
  • Developed Pig scripts and UDFs as per the business rules.
  • Developed Oozie workflows scheduled through AutoSys on a monthly basis.
  • Designed and developed read lock capability in HDFS.
  • Developed Hive scripts for implementing control tables logic in HDFS.
  • Developed NDM scripts to pull data from the Mainframe.
  • End-to-end implementation with Avro and Snappy.
  • Initially provided production support for the already developed product.
  • Created POC for Flume implementation.
  • Helped other teams get started with the Hadoop ecosystem.

Confidential, NC

Java Developer


  • Understood the business requirements and developed code for a module of the application.
  • Developed the application based on MVC architecture and implemented Action classes.
  • Implemented Model Classes and Struts2 tags as views.
  • Implemented mapping files for corresponding tables using Hibernate 3.0 in developing the Project.
  • Involved in adding server-side validations.
  • Created unit test case documents.
  • Developed business components to process requests from the user and used Hibernate to retrieve and update patient information.
  • Worked with database objects using TOAD and SQL Navigator for development and administration of various relational databases.
  • Wrote and used Java Bean classes, JSP, Stored Procedures and JSP custom tags in the web tier to dynamically generate web pages.

Environment: Java 5, Struts 2.x, Hibernate 3.x, Oracle, JSP, JBoss, SVN, Jenkins, Eclipse, HTML


Java/J2EE Developer


  • Utilized Agile Methodologies to manage full life-cycle development of the project.
  • Implemented MVC design pattern using Struts Framework.
  • Used Form classes of the Struts Framework to write the routing logic and to call different services.
  • Created tile definitions, Struts-config files, validation files and resource bundles for all modules using the Struts framework.
  • Developed web application using JSP custom tag libraries, Struts Action classes and Action. Designed Java Servlets and Objects using J2EE standards.
  • Used JSP for the presentation layer and developed a high-performance object/relational persistence and query service for the entire application using Hibernate.
  • Developed the XML Schema and Web services for the data maintenance and structures.
  • Developed the application using Java Beans, Servlets and EJBs.
  • Created Stateless Session EJBs for retrieving data and Entity Beans for maintaining User Profile.
  • Used WebSphere Application Server and RAD to develop and deploy the application.
  • Worked with various style sheets such as Cascading Style Sheets (CSS).
  • Designed the database, created tables, and wrote complex SQL queries and stored procedures as per the requirements.
  • Involved in coding JUnit test cases and used ANT for building the application.

Environment: Java/J2EE, Oracle 10g, SQL, PL/SQL, JSP, EJB, Struts, Hibernate, WebLogic, HTML, AJAX, JavaScript, Jenkins, Git, JDBC, XML, JMS, XSLT, UML, JUnit.


Junior Java Developer


  • Primary responsibilities included the development of the code using core Java and web Development skills.
  • Used Struts and JavaScript for web page development and front-end validations.
  • Fetched and processed customer-related data using Mercator (IBM WTX) as the interface between Confidential workstation and mainframes.
  • Created Servlets, JSPs and used JUnit framework for unit testing.
  • Developed EJBs, DAOs, stored Procedures and SQL queries to support system functionality.
  • Designed and documented the application with UML use case, class and sequence diagrams developed using MS Visio.
  • Used Ant scripts to automate application build and deployment processes.
  • Supported production/stage application defects, tracking and documenting them using Quality Center.
  • Implemented various Unix Shell Scripts as per the internal standards.

Environment: Java 1.4.2, Struts 1.2, JavaScript, JDBC, CVS, Eclipse, WebLogic Server 9.1, Oracle 9i, Toad, Linux.