Big Data Engineer Resume

Mt Laurel, NJ


  • 8+ years of IT experience in various domains with Hadoop Ecosystems and Java J2EE technologies.
  • Solid foundation in mathematics, probability and statistics, with broad practical experience in statistical and data mining techniques developed through industry work and academic programs.
  • Involved in all Software Development Life Cycle (SDLC) phases, including analysis, design, implementation, testing and maintenance.
  • Experience integrating relational databases with graph databases (Neo4j), including importing data from relational stores.
  • Strong technical, administration, and mentoring knowledge in Linux and Big Data/Hadoop technologies.
  • Hands-on experience with major components of the Hadoop ecosystem such as Hadoop MapReduce, HDFS, Hive, Pig, Pentaho, HBase, ZooKeeper, Sqoop, Oozie, Cassandra, Flume and Avro.
  • Solid understanding of RDD operations in Apache Spark, i.e., transformations and actions, persistence (caching), accumulators, broadcast variables and optimizing broadcasts.
  • In-depth understanding of Apache Spark job execution components such as the DAG, lineage graph, DAG scheduler, task scheduler, stages and tasks.
  • Experience in exposing Apache Spark as web services.
  • Experience in real time processing using Apache Spark and Kafka.
  • With Cloudera Manager 5.0.1 or later and CDH 5.0.1 or later, the NFS gateway works on all operating systems
  • Migrated Python machine learning modules to scalable, high-performance and fault-tolerant distributed systems such as Apache Spark.
  • Strong experience in Spark SQL UDFs, Hive UDFs and Spark SQL performance tuning. Hands-on experience working with input file formats such as ORC, Parquet, JSON and Avro.
  • Extensively worked on Mainframe/Unix and Informatica environments to invoke Teradata utilities and handle files.
  • Good expertise in coding in Python, Scala and Java.
  • Wrote test scripts in Java and executed them with Selenium and Cucumber.
  • Designed and developed applications using Spring MVC, JavaScript and HTML.
  • Used Hibernate and Spring JDBC to connect to an Oracle database and retrieve data.
  • Good understanding of the MapReduce framework architectures (MRv1 and YARN).
  • Good knowledge and understanding of Hadoop architecture and various components of the Hadoop ecosystem: HDFS, MapReduce, Pig, Sqoop and Hive.
  • Handled importing of data from various data sources, performed transformations using Map Reduce, Spark and loaded data into HDFS.
  • Managed and reviewed Hadoop log files.
  • Troubleshot production support issues post-deployment and came up with solutions as required.
  • Performed data analysis and delivered reports on a daily basis.
  • Checked the logs registered in the database to verify that file status was updated properly.
  • Handled backups of input files in HDFS.
  • Worked with Avro Data Serialization system.
  • Experience in writing shell scripts to dump shared data from landing zones to HDFS.
  • Experience in performance tuning the Hadoop cluster by gathering and analyzing the existing infrastructure.
  • Expertise in client-side design and validation using HTML and JavaScript.
  • Excellent communication and interpersonal skills; detail-oriented, analytical, time-bound and responsible team player with the ability to coordinate in a team environment; highly self-motivated and a quick learner.


Big Data Frameworks: Hadoop, Spark, Scala, Hive, Kafka, AWS, Cassandra, HBase, Flume, Pig, Sqoop, MapReduce, Cloudera, MongoDB.

Big data distribution: Cloudera, Amazon EMR

Programming languages: Core Java, Scala, Python, SQL, Shell Scripting

Operating Systems: Windows, Linux (Ubuntu)

Databases: Oracle, SQL Server

Designing Tools: Eclipse

Java Technologies: JSP, Servlets, Junit, Spring, Hibernate

Web Technologies: XML, HTML, JavaScript, JVM, jQuery, JSON

Linux Experience: System Administration Tools, Puppet, Apache

Web Services: Web Service (RESTful and SOAP)

Frameworks: Jakarta Struts 1.x, Spring 2.x

Development methodologies: Agile, Waterfall

Logging Tools: Log4j

Application / Web Servers: CherryPy, Apache Tomcat, WebSphere

Messaging Services: ActiveMQ, Kafka, JMS

Version Control Tools: Git, SVN and CVS

Analytics: Tableau, SPSS, SAS EM and SAS JMP


Confidential, Mt Laurel, NJ

Big Data Engineer


  • Installed and configured HDFS and Hadoop MapReduce; developed various MapReduce jobs in Java for data cleaning and preprocessing.
  • Analyzed various RDDs using Scala 2.11 and Python with Spark.
  • Performed complex mathematical, statistical and machine learning analysis using Spark MLlib, Spark Streaming and GraphX.
  • Performed data ingestion from various data sources.
  • Helped oversee the Agile to SAFe transition.
  • Worked with various types of databases, including SQL, NoSQL and relational, for transferring data to and from HDFS.
  • Used Impala for data processing on top of Hive.
  • Experience in designing and developing applications leveraging MongoDB.
  • Ensured the continuous availability of mission-critical MongoDB clusters.
  • Facilitated meetings with integration partners.
  • Understanding of data storage and retrieval techniques, ETL and databases, including graph stores, relational databases, tuple stores, NoSQL, Hadoop, Pig, MySQL and Oracle databases.
  • Experience in using Avro, Parquet, RCFile and JSON file formats and developed UDFs using Hive and Pig.
  • Optimized MapReduce jobs to use HDFS efficiently by using Gzip, LZO, Snappy compression techniques.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Experience in loading and transforming huge sets of structured, semi structured and unstructured data.
  • Worked on different file formats like XML files, Sequence files, JSON, CSV and Map files using Map Reduce Programs.
  • Continuously monitored and managed Hadoop cluster using Cloudera Manager.
  • Performed POCs using the latest technologies such as Spark, Kafka and Scala.
  • Used Cassandra for persistent data storage.
  • Created Hive tables, loaded them with data and wrote hive queries.
  • Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
  • Experience in managing and reviewing Hadoop log files.
  • Executed test scripts to support test driven development and continuous integration.
  • Created Pig Latin scripts to sort, group, join and filter enterprise-wide data.
  • Tuned the performance of Pig queries.
  • Installed the Oozie workflow engine to run several MapReduce jobs.
  • Extensive working knowledge of partitioned tables, UDFs, performance tuning, compression-related properties and the Thrift server in Hive.

Environment: Cloudera 5.x Hadoop, Linux, IBM DB2, HDFS, YARN, Impala, Pig, Hive, Sqoop, Spark, Scala, HBase, MapReduce, Hadoop Data Lake, Informatica BDM 10


Hadoop Developer


  • Supported all business areas of ADAC, as a forecast expert and business analyst, with critical data analysis that helped team members make profitable decisions, using tools for business optimization and analytics.
  • Contributed to forward-looking analytics initiatives and took the lead in delivering analyses and ad-hoc reports, including data extraction and summarization, using a big data tool set.
  • Ensured technology roadmaps were incorporated into data and database designs.
  • Experience in extracting large data sets.
  • Extensively used ZooKeeper as a job scheduler for Spark jobs.
  • Worked on generating reports from the Neo4j graph database.
  • Worked on data processing using Spark and Python.
  • Experience in data management and analysis technologies like Hadoop, HDFS.
  • Created list and summary view reports.
  • Created a data pipeline on a NiFi cluster.
  • Worked on analyzing the Hadoop cluster and different big data analytic tools including Pig, Hive and Impala.
  • Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
  • Experience in managing and reviewing Hadoop log files.
  • Worked on data streaming architecture with Apache Flink, developing real-time streaming Flink applications.
  • Created feature, develop and release branches in Git for different applications to support releases and CI builds.
  • Communicated with business stakeholders and understood problems from a business perspective rather than a developer's perspective.
  • Implemented solutions using the Spark APIs and Spark SQL, written in Python.
  • Implemented cost- and resource-optimized solutions, considering SQL licenses and EC2 instance types, evaluating the available options and making decisions.
  • Experience with configuration of Hadoop Ecosystem components: Hive, Spark, Drill, Impala, HBase, Pig, Sqoop, Mahout, Zookeeper and Flume.
  • Installed applications on AWS EC2 instances and configured storage on S3 buckets.
  • Decommissioned DataNodes and commissioned new ones on the current Hadoop cluster.
  • Used AWS S3 and local hard disk as the underlying file system (HDFS) for Hadoop.
  • Expertise in using Teradata SQL Assistant, Teradata Manager and data load/export utilities such as BTEQ, FastLoad, MultiLoad and FastExport, with exposure to TPump in a UNIX environment.
  • Used Hive Query language (HQL) to further analyze the data to identify issues and behavioral patterns.
  • Used Apache Spark and Scala language to find patients with similar symptoms in the past and medications used for them to achieve best results.
  • Configured Apache Sentry server to provide authentication to various users and provide authorization for accessing services that they were configured to use.
  • Used Pig as ETL tool to do transformations, joins and pre-aggregations before loading data onto HDFS.
  • Worked on large sets of structured, semi structured and unstructured data
  • Prepared the unit test plan and system test plan documents.
  • Prepared and executed unit test cases; performed troubleshooting and debugging.

Environment: Cloudera Hadoop, Linux, HDFS, MapReduce, Hive, Pig, Sqoop, Oracle, SQL Server, Eclipse, Java and Oozie scheduler.

Confidential, Salt Lake City, UT

Hadoop Developer


  • Developed Big Data Solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.
  • Installed and configured Apache Hadoop, Hive, and HBase.
  • Used Cassandra for persistent data storage.
  • Worked on a Hortonworks cluster, an open source platform based on Apache Hadoop for analyzing, storing and managing big data.
  • Developed simple and complex MapReduce programs in Java for Data Analysis on different data formats.
  • Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Used Sqoop to move data from RDBMS into HDFS and vice versa.
  • Defined workflows using Oozie.
  • Developed visualizations and dashboards in Kibana that provide rich analytics on data of interest.
  • Used Hive to create partitions on Hive tables and analyzed the data to compute various metrics for reporting.
  • Created the data model for Hive tables.
  • Configured an NFS server and mounted exported NFS resources on the client side.
  • Created Kibana Visualizations and Dashboards for Software Engineering Metrics
  • Developed Linux shell scripts for creating reports from Hive data.
  • Good Experience in managing and reviewing Hadoop log files
  • Responsible for monitoring system logs for unauthorized root usage and access; handled NFS/CIFS filesystem mounting and support for developers.
  • Used Pig as ETL tool to do transformations, joins and pre-aggregations before loading data onto HDFS.
  • Worked on large sets of structured, semi structured and unstructured data
  • Responsible for managing data coming from different sources.
  • Installed and configured Hive and developed Hive UDFs to extend the core functionality of Hive.
  • Responsible for loading data from UNIX file systems to HDFS.

Environment: Apache Hadoop, Hortonworks, MapReduce, HDFS, Hive, HBase, Pig, Oozie, Linux, Java, Eclipse 3.0, Tomcat 4.1, MySQL.


Java Developer


  • Participated in the discussions with business experts to understand Business requirements and translate them into technical requirements towards development.
  • Designed concepts for frameworks using Spring and Hibernate and assisted with development environment configuration.
  • Created various Spring Boot and Spring Batch applications that connect to various databases, and wrote queries to retrieve data and modify tables in those databases.
  • Prepared the proof of concept by configuring the Spring MVC and Hibernate for various modules.
  • Designed and developed functionality with an excellent understanding of design patterns such as Singleton, List Iterator, Command, Factory, etc.
  • Used HTTP requests and SOAP-based web services to post XML data to the end client.
  • Exposed web services to the client applications by sharing the WSDL.
  • Used Spring Framework to develop beans from an already developed parent bean.
  • Used the Dependency Injection feature of Spring Framework and the O/R mapping tool Hibernate for rapid development and ease of maintenance.
  • Involved in designing, developing and deploying reports in MS SQL Server environment using SSRS-2008 and SSIS in Business Intelligence Development Studio (BIDS).
  • Used ETL (SSIS) to develop jobs for extracting, cleaning, transforming and loading data into data warehouse.
  • Worked with Cassandra Query Language (CQL) to execute queries on the data persisting in the Cassandra cluster.
  • Developed database objects in SQL Server 2005 and used SQL to interact with the database while troubleshooting issues.
  • Updated and saved the required data in the DB2 database using JDBC, corresponding to actions performed in the Struts class.
  • Involved in bug fixing and resolving issues with the QA team.
  • Developed SQL scripts to store data validation rules in Oracle database.
  • Configured Log4j for logging activity at various levels and wrote test cases using JUnit.
  • Involved in developing Ant build scripts for automating deployment on WebSphere test environment.
  • Addressed high-severity production issues on a regular basis by researching and proposing quick fixes or design changes as required.

Environment: JAVA 1.6, J2EE1.6, Servlets, JDBC, Spring, Hibernate3.0, JSTL, JSP2, JMS, Oracle10g, Web Services, SOAP, Restful, Maven, Apache AXIS, SOAP UI, XML1.0, JAXB2.1, JAXP, HTML, JavaScript, CSS3, AJAX, JUnit, Eclipse, WebLogic10.3, SVN, Shell Script