Sr. Big Data Engineer Resume

Alexandria, VA


  • Well-rounded professional with 9 years of experience in the design, development, maintenance, and support of Java/J2EE and Big Data/Hadoop applications.
  • Excellent understanding of Hadoop architecture and the underlying framework, including storage management.
  • 4+ years of experience as a developer in the Big Data Hadoop ecosystem (Spark, HBase, MapReduce, Hive, Pig, SOLR, Kafka, Flume, Sqoop).
  • Skilled in MapReduce programming in Java, XML implementation, and data summarization, querying, and analysis of large datasets using Hive.
  • Strong knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN, ResourceManager, NodeManager, and the MapReduce programming paradigm.
  • Extensive experience designing complex data flows using StreamSets.
  • Expert in Amazon EMR, Spark, Kinesis, S3, ECS, ElastiCache, DynamoDB, and Redshift.
  • Highly skilled in integrating Kafka with Spark Streaming for high-speed data processing.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark Streaming, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Sound knowledge of using Apache SOLR to search structured and unstructured data.
  • Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
  • Excellent technical and analytical skills, with a clear understanding of the design goals of ER modeling for OLTP and dimensional modeling for OLAP.
  • Assessed technology stacks by implementing various proofs of concept (POCs) for eventual adoption in the Big Data Hadoop initiative.
  • Strong experience working with databases such as Oracle, DB2, SQL Server, and MySQL, and proficiency in writing complex SQL queries.
  • Experienced in working with NoSQL databases (HBase, Cassandra, and MongoDB), database performance tuning, and data modeling.
  • Extensive experience in Java/J2EE programming: JDBC, Servlets, JSP, JSTL, JMS, EJB 2.0/3.0.
  • Expert knowledge of J2EE design patterns such as MVC, Front Controller, Session Facade, Business Delegate, and Data Access Object for building J2EE applications.
  • Experienced in web development using HTML, DHTML, XHTML, CSS, JavaScript, AJAX, and AngularJS.
  • Experienced in developing MVC framework-based websites using JSF, Struts, and Spring.
  • Experience in building web applications using Spring Framework features such as MVC (Model View Controller), AOP (Aspect Oriented Programming), IoC (Inversion of Control), DAO (Data Access Object), and template classes.
  • Experience in creating and consuming RESTful web services using JAX-RS (Jersey).
  • Working knowledge of multi-tiered distributed environments and OOAD concepts, and a good understanding of the Software Development Lifecycle (SDLC) and Service Oriented Architecture (SOA).
  • Experience working in environments using Agile (Scrum), RUP, and test-driven development methodologies.
  • Experience working on both Windows and Unix platforms, including programming and debugging skills in Unix shell scripting.
  • Extensive experience in developing use cases, activity diagrams, sequence diagrams, and class diagrams using Visio.
  • Good knowledge of IDE tools such as Eclipse, NetBeans, BlueJ, and Rational Application Developer (RAD) for Java/J2EE application development.
  • Experience in using Maven for build automation.
  • Familiar with the Neo4j graph database and writing Cypher queries.


Big Data Technologies: Hadoop 2.7/2.5, HDFS 1.2.4, MapReduce, HBase, Pig, Hive, Flume, Impala, Oozie, Spark, YARN

NoSQL Databases: HBase, Cassandra, MongoDB 3.2

Languages: Java, J2EE, PL/SQL, Pig Latin, HQL, R, Python, XPath

Java Tools & Web Technologies: EJB, JSF, Servlets, JSP, JSTL, CSS3/2, HTML5/4, XHTML, XML, XSL, XSLT

Databases: Oracle 12c/11g, MySQL, DB2, MS SQL Server 2016/2014

Frameworks: Struts, Spring, Hibernate, MVC

Web Services: SOAP, RESTful, JAX-WS, Apache Axis

Application Servers: Apache Tomcat, JBoss, IBM WebSphere, WebLogic

Scripting Languages: Shell Scripting, JavaScript

Tools and IDE: SVN, Maven, Gradle, Eclipse 4.6, NetBeans 8.2

Open Source: Hibernate, Spring IOC, Spring MVC, Spring Web Flow, Spring AOP

Methodologies: Agile, RAD, JAD, RUP, Waterfall & Scrum


Confidential - Alexandria, VA

Sr. Big Data Engineer


  • Responsible for design and development of Big Data applications using Cloudera Hadoop.
  • Coordinated with business customers to gather business requirements.
  • Imported and exported data between MySQL and HDFS using Sqoop, and managed data coming from different sources.
  • Analyzed the Hadoop cluster with different Big Data analytic tools, including Pig, Hive, Spark, and Sqoop.
  • Developed workflow in Oozie to manage and schedule jobs on Hadoop cluster for generating reports.
  • Designed AWS architecture, cloud migration, AWS EMR, DynamoDB, Redshift, and event processing using Lambda functions.
  • Implemented Flume to import streaming log data and aggregate it into HDFS.
  • Developed Apache Spark jobs using Scala in a test environment for faster data processing and used Spark SQL for querying.
  • Utilized AWS services with a focus on big data architecture, analytics, enterprise data warehouse, and business intelligence solutions.
  • Responsible for loading and managing unstructured and semi-structured data coming from different sources into the Hadoop cluster using Flume.
  • Wrote SQL queries to update MySQL whenever a file was uploaded to or deleted from HDFS.
  • Worked in an Agile development environment, organizing tasks and participating in daily scrum and other design-related meetings.
  • Installed and configured Talend and Sqoop on the Pivotal HD 2.2 Hadoop distribution.
  • Worked with the Oozie workflow engine to schedule time-based jobs that perform multiple actions.
  • Worked with Git for version control, JIRA for project tracking, and Jenkins for continuous integration.
  • Migrated HiveQL queries on structured data to Spark SQL to improve performance.
  • Analyzed data using the Hadoop components Hive and Pig and created tables in Hive for end users.
  • Wrote Hive queries and Pig scripts for data analysis to meet business requirements.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Developed customized Hive UDFs and UDAFs in Java, set up JDBC connectivity with Hive, and developed and executed Pig scripts and Pig UDFs.
  • Wrote Oozie workflows and shell scripts to automate the flow.
  • Optimized MapReduce and Hive jobs to use HDFS efficiently by using Gzip and LZO compression.
  • Used Pig as an ETL tool to perform transformations, event joins, and some pre-aggregations before storing the data in HDFS.
  • Tuned Hive tables and queries to improve performance.
  • Wrote algorithms to calculate the most valuable households based on data provided by external providers.
  • Involved in requirement analysis, design, development, and unit testing using MRUnit and JUnit.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
  • Wrote Spark applications that read data from Kafka topics and ingest it into HBase tables.
  • Wrote MapReduce procedures for extraction, transformation, and aggregation of data from various XML files generated during different phases of a transaction.
  • Indexed data in SOLR for faster query response and implemented custom logic on retail and digital data required by the business.
  • Worked on loading data from the Hadoop cluster to Amazon S3 storage for processing in Amazon EMR.
  • Composed the application classes as Spring beans using Spring IoC/Dependency Injection.
  • Designed and developed server-side components using Java, REST, and WSDL.
  • Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications, with a view to implementing the former in the project.
  • Coordinated with various support teams during many deployments.
  • Resolved data discrepancies, debugged issues, and performed data integrity checks between two data centers.

Environment: MapReduce, HDFS, Yarn, Hive, Pig, Sqoop, Flume, GIT, MySQL, Spark, Kafka, Impala, Oozie, Struts, Servlets, HTML, XML, SQL, J2EE and Java.

Confidential - Wayne, PA

Sr. Big Data/Hadoop Engineer


  • Configured, supported and maintained all network, firewall, storage, load balancers, operating systems, and software in AWS EC2.
  • Implemented Amazon EMR for Big Data processing across a Hadoop cluster of virtual servers on Amazon EC2 and S3.
  • Handled data imports and exports from various operational sources, performed transformations using Sqoop, Hive, Pig and MapReduce.
  • Implemented partitioning, bucketing in Hive for better organization of the data.
  • Deployed code to Git version control and supported code validation after check-in.
  • Integrated HBase with Spark to import data into HBase and performed CRUD operations on HBase.
  • Used various HBase commands, generated datasets as required, and controlled access to the data using grant and revoke.
  • Wrote multiple MapReduce programs in Java for data analysis.
  • Tuned and troubleshot MapReduce jobs by analyzing and reviewing Hadoop log files.
  • Worked on Oozie and Zookeeper for managing Hadoop jobs.
  • Implemented mappings, sessions, and workflows to perform extract, transform, and load using Informatica.
  • Designed an ETL control table to perform incremental and delta loads.
  • Migrated objects from lower environments to higher environments using deployment groups in Repository Manager.
  • Applied various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Designed and implemented batch jobs using Sqoop, MR2, Pig, Hive, and Impala.
  • Solved performance issues in Hive and Pig scripts through an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
  • Wrote Pig scripts to transform raw data from several data sources into baseline data.
  • Generated extracts in HDFS synchronized with existing system reports.
  • Implemented ETL jobs and applied suitable data modeling techniques.
  • Used reporting tools such as Tableau, connected to Hive through the ODBC connector, to generate daily data reports.
  • Implemented custom Hive UDFs to transform large volumes of data according to business requirements and achieve comprehensive data analysis.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Ingested data from Netezza into HDFS using automated Sqoop scripts.
  • Worked with data serialization formats for converting complex objects into sequences of bits, using Avro, Parquet, JSON, and CSV.
  • Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Developed Sqoop scripts to import and export data from RDBMS and handled incremental loading of customer and transaction data dynamically.
  • Developed MapReduce programs to cleanse data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into the Hive schema for analysis.

Environment: CDH4.X, CDH5.X, MapReduce, Pig, Hive, HDFS, HBase, Avro, Oozie, Java 1.7, JIRA, Crucible, GitHub, Maven

Confidential - Dallas, TX

Hadoop Developer


  • Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive, HBase and MapReduce.
  • Extracted customers' daily transaction data from DB2, exported it to Hive, and set up online analytical processing.
  • Installed and configured Hadoop, MapReduce, and HDFS clusters.
  • Created Hive tables, loaded the data, and performed data manipulations using Hive queries in MapReduce execution mode.
  • Developed MapReduce programs to cleanse data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into the Hive schema for analysis.
  • Loaded the structured data resulting from MapReduce jobs into Hive tables.
  • Analyzed user request patterns and implemented various performance optimization measures, including partitions and buckets in HiveQL.
  • Identified issues in behavioral patterns and analyzed the logs using Hive queries.
  • Analyzed and transformed stored data by writing MapReduce or Pig jobs based on business requirements.
  • Used Flume to collect, aggregate, and store web log data from different sources such as web servers, mobile devices, and network devices, and imported it into HDFS.
  • Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig scripts.
  • Worked on various compression techniques like GZIP and LZO.
  • Integrated MapReduce with HBase to import bulk data using MR programs.
  • Used Maven extensively to build JAR files of MapReduce programs and deployed them to the cluster.
  • Developed Pig scripts for change data capture and delta record processing between newly arrived data and existing data in HDFS.
  • Developed a data pipeline using Sqoop, Pig, and Java MapReduce to ingest behavioral data into HDFS for analysis.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs that run independently based on time and data availability.
  • Used Pig as an ETL tool to perform transformations, event joins, and some pre-aggregations before storing the data in HDFS.
  • Used SQL queries, stored procedures, user-defined functions (UDFs), and database triggers, with tools such as SQL Profiler and Database Tuning Advisor (DTA).
  • Installed a cluster, commissioned and decommissioned DataNodes, and performed NameNode recovery, capacity planning, and slots configuration in line with business requirements.

Environment: HDFS, Map Reduce, Pig, Hive, Oozie, Sqoop, Flume, HBase, HiveQL, Java, Maven, Cloudera, AWS EC2, Avro, Eclipse and Shell Scripting.

Confidential - Cleveland, OH

Sr. Java/J2EE Developer


  • Involved in SDLC requirements gathering, analysis, design, development, and testing of the application using Agile methodology (Scrum).
  • Created dynamic web pages using JSP and HTML, with form validation in JavaScript and design changes in CSS2.
  • Developed the business logic using Spring and the persistence logic using Hibernate.
  • Used Spring AOP to configure logging for the application.
  • Used XSL/XSLT for transforming and displaying reports.
  • Provided data persistence through Hibernate object-relational mapping for the application's save, update, and delete operations.
  • Developed client applications to consume SOAP- and REST-based web services using the JAX-RS API.
  • Used JAX-WS (SOAP) for producing web services and wrote programs to consume the web services using SOA with the Apache CXF framework.
  • Worked on developing complex SQL queries on Oracle 10g and SQL Server databases for implementing various database requirements and transactions.
  • Used XPATH to navigate through Attributes and Elements in an XML Document.
  • Good hands-on experience with UNIX commands used to view log files on the production server.
  • Used the JMS (Java Message Service) API to send detailed notifications on the success or failure of the backend process once it completed, and to notify the administrator of any system-related problems.
  • Worked on a Camel-based integration middleware solution for Provisioning Services, designing and implementing business logic and data processing routes using AngularJS.
  • Used the AngularJS framework to build a clean, maintainable single-page application.
  • Developed responsive web application pages. Used AngularJS services, controllers, and directives for the front-end UI and consumed RESTful web service APIs.
  • Configured and deployed the web application on WebLogic.
  • Used Node.js to create server-side JavaScript applications for building real-time web APIs.
  • Created various tables required for the project in the Oracle database and used SQL stored procedures in the application for frequent operations on tables.
  • Performed unit testing both manually and in automated form using JUnit.
  • Actively involved in deploying EJB service JARs and application WAR files to the WebLogic application server.
  • Implemented Dynamic form generation, auto-completion of forms and user-validation functionalities using AJAX.
  • Used Spring Security to provide authentication, authorization, and access-control features for this application.
  • Involved in building and deployment of application in Linux environment.
  • Involved in writing Maven scripts for building and deploying the code.
  • Used Log4j to capture logs, including runtime exceptions, and to log information useful for debugging.
  • Used Jenkins to build and deploy the code in Dev and SIT environments.
  • Worked on GitHub for configuration management.
  • Managed and led the monthly production release process, including code review, testing, creating release packages, and deployment.

Environment: Java/J2EE, Spring, Agile, SDLC, AJAX, MongoDB, Log4j, WebLogic, EJB, Node.js, AngularJS, JMS, Oracle, SQL, XML, JAX-WS, JAXB, XPath, CSS3, HTML, JSP, JavaScript


Java Developer


  • Involved in implementing the design through key phases of the software development life cycle (SDLC), including development, testing, implementation, and maintenance support, using the waterfall methodology.
  • Designed and developed dynamic web pages using JSP, HTML, CSS, JavaScript, and jQuery.
  • Implemented the Struts framework based on MVC design pattern and Session Façade Pattern using Session and Entity Beans.
  • Used Struts for web tier development and created Struts Action Controllers to handle the requests.
  • Involved in writing the struts-config files and implemented the Struts Tag library.
  • Responsible for designing, coding, and developing the application in J2EE using Struts MVC.
  • Implemented Struts framework (Action & Controller classes) for dispatching request to appropriate classes.
  • Used simple Struts Validation for validation of user input as per the business logic and initial data loading.
  • Developed RESTful services and SOAP-based web services.
  • Developed Web Service provider methods (bottom up approach) using WSDL and SOAP for transferring data between the applications.
  • Worked on XML technologies like XML Parsers, JAXB for binding data to java objects.
  • Used Java Message Service (JMS) for reliable and asynchronous communication.
  • Implemented the persistence layer using Hibernate and JDBC Template and developed the DAL (Data Access Layer) to store and retrieve data from the database.
  • Responsible for writing JDBC code to persist data in a MySQL database.
  • Wrote SQL queries and PL/SQL procedures to fetch data from the database.
  • Tested the service and data access tiers using JUnit.
  • Used WebLogic for application deployment and Log4j for logging and debugging.
  • Used CVS for version control and Ant as the project build tool.
  • Worked with production support team in debugging and fixing various production issues.

Environment: Java, J2EE, JSP, HTML, CSS, JavaScript, jQuery, Struts, RESTful Services, SOAP, WSDL, Hibernate, JDBC, JMS, MySQL, CVS, Ant, Log4j and WebLogic.