Big Data Engineer Resume



  • 7+ years of IT experience as a developer, designer, and quality reviewer, with cross-platform integration experience using Hadoop, Java, J2EE, and SOA.
  • Skilled in installing, configuring, and using Apache Hadoop ecosystem components such as MapReduce, Hive, Pig, Sqoop, Flume, YARN, Spark, Kafka, and Oozie.
  • Strong understanding of Hadoop daemons and MapReduce concepts.
  • Strong experience importing data into and exporting data from HDFS.
  • Experienced in developing UDFs for Hive using Java.
  • Worked with Apache Falcon, a data governance engine that defines, schedules, and monitors data management policies.
  • Hands-on experience with Hadoop, HDFS, MapReduce, and the Hadoop ecosystem (Pig, Hive, Oozie, Flume, and HBase).
  • Strong understanding of NoSQL databases such as HBase, MongoDB, and Cassandra.
  • Experience working with Angular 4, Node.js, Bookshelf, Knex, and MariaDB.
  • Understanding of data storage and retrieval techniques, ETL, and databases, including graph stores, relational databases, and tuple stores.
  • Skilled in developing reusable solutions that maintain consistent coding standards across Java projects.
  • Good exposure to Python programming.
  • Expertise in debugging and performance tuning of Oracle and Java applications, with strong knowledge of Oracle 11g and SQL.
  • Ability to work effectively in cross-functional team environments, with experience providing training to business users.
  • Good experience using Sqoop to pull data from traditional RDBMSs.
  • Good working knowledge of Flume.
  • Worked with the Apache Ranger console to create and manage policies for access to files, folders, databases, tables, or columns.
  • Worked with YARN Queue Manager to allocate queue capacities for different service accounts.
  • Hands-on experience with Hortonworks and Cloudera Hadoop environments.
  • Familiar with handling complex data processing jobs using Cascading.
  • Strong database skills in IBM DB2 and Oracle; proficient in database development, including constraints, indexes, views, stored procedures, triggers, and cursors.
  • Extensive experience in Shell scripting.
  • Experience in component design from requirements using UML use case, class, sequence, development, and component diagrams.
  • Excellent analytical and programming abilities in using technology to create flexible and maintainable solutions for complex development problems.
  • Good communication and presentation skills; willing to learn and adapt to new technologies and third-party products.


Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Spark, Kafka, Storm, and ZooKeeper.

NoSQL Databases: HBase, Cassandra, MongoDB

Languages: C, C++, Java, Python, Scala, J2EE, PL/SQL, Pig Latin, HiveQL, Unix shell scripts

Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL, RMI, JMS, JavaScript, JSP, Servlets, EJB, JSF, jQuery

Frameworks: MVC, Struts, Spring, Hibernate

Operating Systems: HP-UNIX, Red Hat Linux, Ubuntu Linux, and Windows XP/Vista/7/8

Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP

Web/Application servers: Apache Tomcat, WebLogic, JBoss

Databases: Oracle 9i/10g/11g, DB2, SQL Server, MySQL, Teradata

Tools and IDE: Eclipse, NetBeans, Toad, Maven, ANT, Hudson, Sonar, JDeveloper, Assent PMD, DB Visualizer

Version control: SVN, CVS, Git


Confidential, IL

Big Data Engineer


  • Part of the planning/migration team for application migration from the MapR distribution to the HDP environment.
  • Reviewed application architectures to understand dependencies, file formats, data types, tools, service accounts, and other factors important for migrating the apps to the HDP platform.
  • Coordinated with teams to resolve issues regarding workflows, schemas, scripts, and the Kerberized environment.
  • Used Apache Falcon for mirroring of HDFS and Hive data.
  • Used Apache Falcon to design data pipelines and trace them for dependencies, tagging, audits and lineage.
  • Worked with Apache Ranger console to manage policies for access to files, folders, databases, tables, or columns.
  • Used HBase snapshots to migrate HBase tables.
  • Worked in a Kerberos environment.
  • Worked with Oozie to design workflows and scheduled them with Falcon.
  • Ingested various types of data into Hive using ELakeIngestionFramework, which internally uses Pig, Hive, and Spark for data processing.
  • Worked with Hortonworks to resolve issues with tools such as Hive, HBase, and Falcon.
  • Worked with Avro schemas for Hive.
  • Created Hive tables on top of HBase using StorageHandler for effective OLAP analysis.
  • Worked with Flume to ingest data from MySQL to HDFS.
  • Worked with Node.js to extract Apache Ranger policies from several REST endpoints on different clusters and store them in MariaDB.
  • Used Knex as the query builder and Bookshelf as the ORM.

Environment: Hadoop, Hortonworks, MapReduce, HDFS, Hive, Pig, Sqoop, Oozie, Falcon, Linux, XML, MySQL, HBase.

Confidential, Charlotte, NC

Hadoop Developer


  • Prepared an ETL framework using Sqoop, Pig, and Hive to frequently bring in data from the source and make it available for consumption.
  • Processed HDFS data, created external tables using Hive, and developed scripts to ingest and repair tables that can be reused across the project.
  • Developed analytical components using Scala, Spark, and Spark Streaming.
  • Experienced with NoSQL databases such as HBase, MongoDB, and Cassandra.
  • Involved in Cassandra data modeling to create keyspaces and tables in the Amazon cloud environment.
  • Developed ETL jobs using Spark-Scala to migrate data from Oracle to new Cassandra tables.
  • Extensively used Spark-Scala (RDDs, DataFrames, Spark SQL) and the Spark-Cassandra-Connector APIs for various tasks (data migration, business report generation, etc.).
  • Developed a Spark Streaming application for real-time sales analytics.
  • Built a real-time pipeline for streaming data using Kafka and Spark Streaming.
  • Experience migrating data across cloud environments to Amazon EC2 clusters.
  • Performed EC2-to-S3 data synchronization.
  • Analyzed SQL scripts and designed the solution for implementation in PySpark.
  • Extracted data from other data sources into HDFS using Sqoop.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Expert in importing data into and exporting data from HDFS using Sqoop and Flume.

Environment: CDH5, Spark, Cassandra, Kafka, Scala, Hive, Sqoop, Pig, Linux, XML, MySQL, PL/SQL, SQL connector

Confidential, Wayne, PA

Hadoop/Spark Developer


  • Developed Spark code using Scala and Spark-SQL for faster testing and processing of data.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
  • Developed a code base to stream data from sample data files > Kafka > Kafka Spout > Storm Bolt > HDFS Bolt.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
  • Developed multiple POCs using PySpark and deployed them on the YARN cluster; compared the performance of Spark with Hive and SQL/Teradata.
  • Analyzed SQL scripts and designed the solution for implementation in PySpark.
  • Uploaded data to Hadoop Hive and combined new tables with existing databases.
  • Deployed the Cassandra cluster in a cloud (Amazon AWS) environment with scalable nodes as per the business requirement.
  • Generated data cubes using Hive, Pig, and Java MapReduce on a provisioned Hadoop cluster in AWS.
  • Implemented the ETL design to load the MapReduce data cubes into the Cassandra cluster.
  • Understanding of data storage and retrieval techniques, ETL, and databases, including graph stores, relational databases, and tuple stores.

Environment: Hadoop, MapReduce, HDFS, Hive, Apache Spark, Apache Kafka, Apache Cassandra, Apache Storm, Java (JDK 1.6), SQL, Cloudera Manager, Sqoop, Flume, Oozie, Eclipse.

Confidential, Madison, WI

Hadoop Developer


  • Responsible for understanding the scope of the project and requirement gathering.
  • Assessed business rules, collaborated with stakeholders, and performed source-to-target data mapping, design, and review.
  • Created & documented Test Strategy, scenarios and procedures.
  • Created scripts for importing data into HDFS/Hive using Sqoop from DB2.
  • Conducted POCs for ingesting data using Flume.
  • Created Hive queries and tables that helped the line of business identify trends by applying strategies on historical data before promoting them to production.
  • Created views to restrict data access by business area.
  • Developed Pig scripts to parse the raw data, populate staging tables, and store the refined data in partitioned DB2 tables for business analysis.
  • Performed structured application code reviews and walkthroughs.
  • Conducted and participated in project team meetings to gather status and discuss issues and action items.
  • Provided support for research and resolution of testing issues.
  • Coordinated with the business for UAT sign-off.
  • Created the implementation plan and detailed task schedules.

Environment: Hadoop, HDFS, Hive, Pig, Sqoop, DB2, SQL, Linux, Autosys, IBM Data Studio, WinSCP, UltraEdit, NDM, Quality Center 9.2, Windows & Microsoft Office.


Java Developer


  • Implemented J2EE standards, MVC2 architecture using Struts Framework
  • Implemented Servlets, JSP, and Ajax to design the user interface
  • Used JSP, JavaScript, HTML5, and CSS to manipulate, validate, and customize error messages in the user interface
  • Used JBoss for EJB and JTA, and for caching and clustering
  • Used EJBs (session beans) to implement the business logic, JMS to send updates to other applications, and MDBs to route priority requests
  • Wrote the business logic for all modules in core Java
  • Wrote web services using SOAP for sending data to and getting data from the external interface
  • Used XSL/XSLT for transforming and displaying reports; developed schemas for XML
  • Developed web-based reporting for the monitoring system with HTML and Tiles using the Struts framework
  • Used Design patterns such as Business delegate, Service locator, Model View Controller, Session, DAO
  • Implemented the presentation layer with HTML, XHTML, JavaScript, and CSS
  • Developed web components using JSP, Servlets and JDBC
  • Involved in fixing defects and unit testing with test cases using JUnit
  • Developed user and technical documentation
  • Made extensive use of the Java Naming and Directory Interface (JNDI) for looking up enterprise beans
  • Developed presentation layer using HTML, CSS and JavaScript
  • Developed stored procedures and triggers in PL/SQL

Environment: Java (multithreading, collections), J2EE, EJB, UML, SQL, PHP, Sybase, Eclipse, JavaScript, WebSphere, JBoss, HTML5, DHTML, CSS, XML, ANT, Struts 1.3.8, JUnit, JSP, Servlets, Rational Rose, Hibernate, JDBC, MySQL, Apache Tomcat.


Associate Java Developer


  • Involved in the complete software development life cycle (SDLC) of the application, from requirement gathering and analysis to testing and maintenance.
  • Developed the modules based on MVC Architecture.
  • Developed the UI using JavaScript, JSP, HTML, and CSS for interactive cross-browser functionality and a complex user interface.
  • Created business logic using servlets and session beans and deployed them on an Apache Tomcat server.
  • Created complex SQL Queries, PL/SQL Stored procedures and functions for back end.
  • Prepared the functional, design and test case specifications.
  • Performed unit testing, system testing and integration testing.
  • Developed unit test cases. Used JUnit for unit testing of the application.
  • Provided technical support for production environments: resolved issues, analyzed defects, and implemented fixes. Resolved higher-priority defects on schedule.

Environment: Java, JSP, Servlets, Apache Tomcat, Oracle, JUnit, SQL
