Big Data Engineer Resume

Kansas, MO


  • Over 8 years of IT experience as a developer, designer and quality reviewer, with cross-platform integration experience using Hadoop, Java, J2EE and SOA.
  • Skilled in installing, configuring and using Apache Hadoop ecosystem components such as MapReduce, Hive, Pig, Sqoop, Flume, YARN, Spark, Kafka and Oozie.
  • Strong understanding of Hadoop daemons and MapReduce concepts.
  • Strong experience in importing data into and exporting data from HDFS.
  • Expertise in Java, Python and Scala.
  • Experienced in developing UDFs for Hive using Java.
  • Worked with Apache Falcon, a data governance engine that defines, schedules and monitors data management policies.
  • Experience with Amazon AWS services such as EMR, EC2, S3, CloudFormation, Redshift and DynamoDB, which provide fast and efficient processing of big data.
  • Hands-on experience with Hadoop, HDFS, MapReduce and the Hadoop ecosystem (Pig, Hive, Oozie, Flume and HBase).
  • Good experience with data transformation and storage: HDFS, MapReduce, Spark.
  • Hands-on experience in developing Spark applications using RDD transformations, Spark Core, Spark Streaming and Spark SQL.
  • Strong understanding of NoSQL databases like HBase, MongoDB and Cassandra.
  • Experience in working with Angular 4, Node.js, Bookshelf, Knex and MariaDB.
  • Understanding of data storage and retrieval techniques, ETL, and databases, to include graph stores, relational databases, tuple stores
  • Skilled in developing reusable solutions to maintain proper coding standards across different Java projects.
  • Worked on audience segmentation for targeted advertising. Selected relevant demographic attributes from a Data Management Platform (DMP).
  • Good exposure to Python programming.
  • Good knowledge of Python collections, Python scripting and multi-threading.
  • Wrote multiple MapReduce programs in Python for data extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV and other compressed file formats.
  • Experience with EJB, Hibernate, Java Web Services, SOAP, REST services, Java threads, Java sockets, Java Servlets, JSP and JDBC.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn and NLTK in Python to develop machine learning models, including linear regression.
  • Expertise in debugging and performance tuning of Oracle and Java applications, with strong knowledge of Oracle 11g and SQL.
  • Ability to work effectively in cross-functional team environments and experience providing training to business users.
  • Good experience in using Sqoop for traditional RDBMS data pull.
  • Good working knowledge of Flume.
  • Worked with the Apache Ranger console to create and manage policies for access to files, folders, databases, tables or columns.
  • Worked with YARN Queue Manager to allocate queue capacities for different service accounts.
  • Hands-on experience with Hortonworks and Cloudera Hadoop environments.
  • Familiar with handling complex data processing jobs using Cascading.
  • Strong database skills in IBM DB2 and Oracle; proficient in database development, including constraints, indexes, views, stored procedures, triggers and cursors.
  • Extensive experience in Shell scripting.
  • Contributed to the Talend Development Standards document, which describes general guidelines for Talend developers, naming conventions for transformations, and the structure of the development and production environments.
  • Expertise with frequently used Talend Data Integration components: tOracleInput, tMSSqlInput, tMap, tConvertType, tFlowMeter, tLogCatcher, tRowGenerator, tContextLoad, tXMLMap, tJava, tJavaRow, tHashInput, tHashOutput, tDie and more.
  • Led testing efforts in support of projects and programs across a large landscape of technologies (Unix, AngularJS, AWS, Sauce Labs, Cucumber-JVM, MongoDB, GitHub, SQL, NoSQL databases, APIs, Java, Jenkins).
  • Automated testing using Cucumber-JVM to develop a world-class ATDD process.
  • Set up JDBC connections for database testing using the Cucumber framework.
  • Experience in component design using UML: use case, class, sequence, development and component diagrams for the requirements.
  • Expertise in installing, configuring, supporting and managing Hadoop clusters using Apache, Cloudera (CDH3, CDH4) and Hortonworks distributions, and on Amazon Web Services (AWS).
  • Excellent analytical and programming abilities in using technology to create flexible and maintainable solutions for complex development problems.
  • Good communication and presentation skills, willing to learn, adapt to new technologies and third party products.


Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Spark, Kafka, Storm and ZooKeeper.

NoSQL Databases: HBase, Cassandra, MongoDB

Languages: C, C++, Java, Python, Scala, J2EE, PL/SQL, Pig Latin, HiveQL, Unix shell scripts

Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL, RMI, JMS, JavaScript, JSP, Servlets, EJB, JSF, jQuery

Frameworks: MVC, Struts, Spring, Hibernate

Operating Systems: HP-UX, Red Hat Linux, Ubuntu Linux and Windows XP/Vista/7/8

Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP

Web/Application servers: Apache Tomcat, WebLogic, JBoss

Databases: Oracle 9i/10g/11g, DB2, SQL Server, MySQL, Teradata

Tools and IDE: Eclipse, NetBeans, Toad, Maven, ANT, Hudson, Sonar, JDeveloper, Assent PMD, DB Visualizer

Version control: SVN, CVS, Git


Confidential, Kansas, MO

Big Data Engineer


  • Part of planning/migration team for Application Migration from MapR distribution to HDP environment.
  • Reviewed application architectures to better understand the dependencies, file formats, types of data, tools, service accounts and other factors important for migrating the apps to the HDP platform.
  • Coordinated with teams to resolve issues regarding workflows, schemas, scripts and the Kerberized environment.
  • Good knowledge of using Apache NiFi to automate data movement between different Hadoop systems.
  • Experience in setting up the whole app stack, including setting up and debugging Logstash to send Apache logs to AWS Elasticsearch.
  • Well versed in Talend Big Data, Hadoop, Greenplum, HAWQ and Hive; used Talend Big Data components such as tHDFSOutput, tHDFSInput and tHiveLoad.
  • Created complex mappings in Talend 6.2.1 Big Data Edition using tMap, tParallelize, tJava, tAggregateRow, tFixedFlowInput, tFlowToIterate, tHash, tMSSqlInput, tMSSqlRow, etc.
  • Used Apache Falcon for mirroring of HDFS and Hive data.
  • Used Apache Falcon to design data pipelines and trace them for dependencies, tagging, audits and lineage.
  • Used Flume to handle streaming data and loaded the data into Hadoop cluster.
  • Developed and executed Hive queries for de-normalizing the data.
  • Developed the Apache Storm, Kafka and HDFS integration project to perform real-time data analysis.
  • Responsible for executing Hive queries using the Hive command line, the Hue web GUI and Impala to read, write and query data in HBase.
  • Moved data from HDFS to Cassandra using MapReduce and the BulkOutputFormat class.
  • Developed Bash scripts to fetch T-log files from an FTP server and process them for loading into Hive tables.
  • Developed Oozie workflows for daily incremental loads, which pull data from Teradata and import it into Hive tables.
  • Worked with the Apache Ranger console to manage policies for access to files, folders, databases, tables or columns.
  • Loaded the data into Spark RDDs and performed in-memory data computation to generate the output response.
  • Developed PySpark and Spark SQL/Streaming code for faster testing and processing of data.
  • Used HBase snapshots to migrate HBase tables.
  • Worked in a Kerberos environment.
  • Wrote Python scripts to parse XML documents and load the data into a database.
  • Worked with Oozie to design workflows and scheduled them with Falcon.
  • Ingested various types of data into Hive using ELakeIngestionFramework, which internally uses Pig, Hive and Spark for data processing.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Created a reusable Python script and added it to the distributed cache in Hive to generate fixed-width data files using an offset file.
  • Used Apache Kafka to aggregate web log data from multiple servers and make it available in downstream systems for data analysis and engineering roles.
  • Designed complex application database SQL statements for querying, updating and reporting using Python Database Connector.
  • Worked with Hortonworks to resolve issues with tools such as Hive, HBase and Falcon.
  • Worked with Avro schemas for Hive.
  • Created Hive tables on top of HBase using StorageHandler for effective OLAP analysis.
  • Worked with Flume to ingest data from MySQL to HDFS.
  • Worked with Node.js to extract Apache Ranger policies from several REST endpoints on different clusters and store them in MariaDB.
  • Planned, deployed, monitored and maintained Amazon AWS cloud infrastructure consisting of multiple EC2 nodes and VMware VMs as required in the environment.
  • Used Knex as the query builder and Bookshelf as the ORM.

Environment: Hadoop, Hortonworks, MapReduce, HDFS, Hive, Spark, Kafka, Pig, Sqoop, Oozie, Falcon, Linux, XML, MySQL, HBase, Apache NiFi, AWS, Talend Open Studio for Data Integration 6.2.1, Talend Administration Console.

Confidential, Texas

Hadoop Developer


  • Prepared an ETL framework using Sqoop, Pig and Hive to frequently bring in data from the source and make it available for consumption.
  • Processed HDFS data and created external tables using Hive, and developed scripts to ingest and repair tables that can be reused across the project.
  • Developed analytical components using Scala, Spark and Spark Streaming.
  • Experienced with NoSQL databases like HBase, MongoDB and Cassandra.
  • Wrote complex SQL queries to take data from various sources and integrated it with Talend.
  • Published events to Kafka from Talend and consumed events from Kafka.
  • Involved in Cassandra data modelling to create keyspaces and tables in the Amazon cloud environment.
  • Developed ETL jobs using Spark-Scala to migrate data from Oracle to new Cassandra tables.
  • Extensively used Spark-Scala (RDDs, DataFrames, Spark SQL) and Spark-Cassandra-Connector APIs for various tasks (data migration, business report generation, etc.).
  • Explored Spark for improving the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and YARN.
  • Developed a Spark Streaming application for real-time sales analytics.
  • Built a real-time pipeline for streaming data using Kafka and Spark Streaming.
  • Experienced in performing in-memory data processing for batch, real-time and advanced analytics using Apache Spark (Spark SQL and Spark shell).
  • Implemented Spark using Scala, Python and Spark SQL for faster testing and processing of data.
  • Implemented advanced procedures like text analytics and processing using the in-memory computing capabilities of Apache Spark with Scala and Python.
  • Extensive experience in creating real-time data streaming solutions using Apache Spark/Spark Streaming and Kafka.
  • Tuned Spark applications to improve performance. Worked collaboratively to manage build-outs of large data clusters and real-time streaming with Spark.
  • Experience in migrating data across cloud environments to Amazon EC2 clusters.
  • Extracted the data from other data sources into HDFS using Sqoop.
  • Wrote and executed ATDD (Acceptance Test Driven Development) tests using Selenium and Java/Cucumber.
  • Automated regression and functional tests by developing test scripts and test suites using UFT/QTP, Selenium WebDriver, Java, TestNG and BDD with Cucumber-JVM.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Expert in importing and exporting data into HDFS using Sqoop and Flume.

Environment: CDH5, Spark, Cassandra, Kafka, Scala, Python, Hive, Sqoop, Pig, Cucumber-JVM, Linux, XML, MySQL, PL/SQL, SQL connector, Talend Administration Console, Talend Open Studio Big Data 6.x, Java, Apache.

Confidential, Wayne, PA

Hadoop/Spark Developer


  • Developed Spark code using Scala and Spark-SQL for faster testing and processing of data.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Explored Spark for improving the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
  • Developed a code base to stream data from sample data files > Kafka > Kafka Spout > Storm Bolt > HDFS Bolt.
  • Developed PySpark code to mimic the transformations performed in the on-premise environment.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly for building the common learner data model, which gets its data from Kafka in near real time.
  • Responsible for loading data pipelines from web servers and Teradata using Sqoop with Kafka and the Spark Streaming API.
  • Developed Kafka producers and consumers, Cassandra clients and Spark applications, along with components on HDFS and Hive.
  • Set up and benchmarked Hadoop/HBase clusters for internal use.
  • Developed multiple POCs using PySpark and deployed them on the YARN cluster; compared the performance of Spark with Hive and SQL/Teradata.
  • Analyzed the SQL scripts and designed the solution to implement using PySpark.
  • Uploaded data to Hadoop Hive and combined new tables with existing databases.
  • Deployed the Cassandra cluster in a cloud (Amazon AWS) environment with scalable nodes as per the business requirement.
  • Generated data cubes using Hive, Pig and Java MapReduce on a provisioned Hadoop cluster in AWS.
  • Implemented the ETL design to dump the MapReduce data cubes to the Cassandra cluster.
  • Developed real-time data processing applications using Scala and Python, and implemented Apache Spark Streaming from various streaming sources like Kafka, Flume and JMS.
  • Wrote Behavior Driven Development (BDD) / Test Driven Development (TDD) tests using Cucumber-JVM.

Environment: Hadoop, MapReduce, HDFS, Hive, Apache Spark, Apache Kafka, Apache Cassandra, Apache Storm, Apache HBase, Cucumber-JVM, Java (JDK 1.6), SQL, Cloudera Manager, Sqoop, Flume, Oozie, Python, Eclipse.

Confidential, Madison, WI

Hadoop Developer


  • Responsible for understanding the scope of the project and requirement gathering.
  • Able to assess business rules, collaborate with stakeholders and perform source-to-target data mapping, design and review.
  • Created & documented Test Strategy, scenarios and procedures.
  • Created scripts for importing data into HDFS/Hive using Sqoop from DB2.
  • Conducted POCs for ingesting data using Flume.
  • Created Hive queries and tables that helped the line of business identify trends by applying strategies on historical data before promoting them to production.
  • Created views to restrict data access by business area.
  • Developed Pig scripts to parse the raw data, populate staging tables and store the refined data in partitioned DB2 tables for business analysis.
  • Performed structured application code reviews and walkthroughs.
  • Conducted and participated in project team meetings to gather status and discuss issues and action items.
  • Provided support for research and resolution of testing issues.
  • Coordinated with the business for UAT sign-off.
  • Created the implementation plan and detailed task schedules.

Environment: Hadoop, HDFS, Hive, Pig, Sqoop, DB2, SQL, Linux, Autosys, IBM Data Studio, WinSCP, UltraEdit, NDM, Quality Center 9.2, Windows & Microsoft Office.


Java Developer


  • Implemented J2EE standards and MVC2 architecture using the Struts framework
  • Implemented Servlets, JSP and Ajax to design the user interface
  • Used JSP, JavaScript, HTML5 and CSS for manipulating, validating and customizing error messages in the user interface
  • Used JBoss for EJB and JTA, and for caching and clustering
  • Used EJBs (session beans) to implement the business logic, JMS for sending updates to various other applications, and MDBs for routing priority requests
  • Wrote the business logic for all modules in core Java
  • Wrote web services using SOAP for sending data to and getting data from the external interface
  • Used XSL/XSLT for transforming and displaying reports; developed schemas for XML
  • Developed web-based reporting for the monitoring system with HTML and Tiles using the Struts framework
  • Used Design patterns such as Business delegate, Service locator, Model View Controller, Session, DAO
  • Implemented the presentation layer with HTML, XHTML, JavaScript, and CSS
  • Developed web components using JSP, Servlets and JDBC
  • Involved in fixing defects and unit testing with test cases using JUnit
  • Developed user and technical documentation
  • Made extensive use of the Java Naming and Directory Interface (JNDI) for looking up enterprise beans
  • Developed presentation layer using HTML, CSS and JavaScript
  • Developed stored procedures and triggers in PL/SQL

Environment: Java multithreading, collections, J2EE, EJB, UML, SQL, PHP, Sybase, Eclipse, JavaScript, WebSphere, JBoss, HTML5, DHTML, CSS, XML, Ant, Struts 1.3.8, JUnit, JSP, Servlets, Rational Rose, Hibernate, JDBC, MySQL, Apache Tomcat.


Associate Java Developer


  • Involved in the complete software development life cycle (SDLC) of the application, from requirement gathering and analysis to testing and maintenance.
  • Developed the modules based on MVC Architecture.
  • Developed UI using JavaScript, JSP, HTML and CSS for interactive cross browser functionality and complex user interface.
  • Created business logic using servlets and session beans and deployed them on an Apache Tomcat server.
  • Created complex SQL Queries, PL/SQL Stored procedures and functions for back end.
  • Prepared the functional, design and test case specifications.
  • Performed unit testing, system testing and integration testing.
  • Developed unit test cases. Used JUnit for unit testing of the application.
  • Provided technical support for production environments: resolved issues, analyzed defects, and provided and implemented solutions. Resolved higher-priority defects as per the schedule.

Environment: Java, JSP, Servlets, Apache Tomcat, Oracle, JUnit, SQL
