
Big Data Engineer Resume


Boston, Massachusetts


  • 8+ years of overall experience in architecture, analysis, design, development, testing, implementation, maintenance and enhancement of various IT projects, including 4+ years in Big Data implementing end-to-end Hadoop solutions.
  • Hands-on experience installing, configuring and using Apache Hadoop ecosystem components such as HDFS, MapReduce, Pig, Hive, Impala, HBase, Apache Crunch, ZooKeeper, Sqoop, Hue and Solr, along with related languages, tools and formats such as Scala, Git, Maven, Avro, JSON and Chef.
  • Expertise in writing Hadoop jobs to analyze data using MapReduce, Apache Crunch, Hive, Pig and Solr.
  • Experience importing and exporting data between HDFS and relational database systems (RDBMS) using Sqoop.
  • Good knowledge of Hadoop cluster architecture and cluster monitoring.
  • Experience with Storm and Kafka for real-time data processing.
  • In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MRv1 and MRv2 (YARN).
  • Experience analyzing data using HiveQL, Impala, Pig Latin and custom MapReduce programs in Java.
  • Experience with Oozie for job workflow scheduling and monitoring, ZooKeeper for coordination, and Chef for configuration management.
  • Hands-on experience with remote-access tools such as VPN, PuTTY, WinSCP and VNC Viewer.
  • Wrote scripts to deploy monitoring checks and to automate critical system administration functions.
  • Experience with the Extraction, Transformation and Loading (ETL) tools Informatica PowerCenter 6.2/7.1.1/8.6.1 and Oracle Data Integrator.
  • Experience developing Informatica mappings with transformations, designing and scheduling workflows, and merging repositories.
  • Proficient in using Cloudera Manager, an end-to-end tool to manage Hadoop operations.
  • Hands-on experience installing, configuring and using Hadoop ecosystem components such as MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper and Flume.
  • Experience optimizing MapReduce jobs using combiners and partitioners.
  • Good understanding of NoSQL databases such as MongoDB and Redis.
  • Experience managing Hadoop clusters using Cloudera Manager.
  • Very good experience across the complete project life cycle (design, development, testing and implementation) of client-server and web applications.
  • Experience administering Red Hat Linux, including installation, configuration, troubleshooting, security, backup, performance monitoring and fine-tuning.
  • Experience using Apache Spark for Big Data and “Fast Data” (streaming) platforms.
  • Expertise in core Java, J2EE, multithreading, JDBC, Hibernate, Spring and shell scripting; proficient in using Java APIs for application development.
  • Proficient with various IDEs, including Eclipse Galileo, IBM Rational Application Developer (RAD) and IntelliJ IDEA.
  • Hands-on experience in application development using Linux shell scripting.
  • Experience in database design, entity relationships, database analysis, SQL programming and PL/SQL stored procedures.
  • Hands-on experience using Ruby on Rails to develop widgets, slicers and time series for a UI dashboard.
  • Experience coding and testing crawlers, standardization, normalization, load and extract processes, and Avro models to filter, massage and validate data.
  • Excellent working experience with Scrum/Agile and Waterfall project execution methodologies.
  • Worked on various operating systems, including UNIX/Linux, Mac OS and Windows.
  • Excellent interpersonal, analytical, verbal and written communication skills.


Hadoop/Big Data: HDFS, MapReduce, Pig, Hive, Spark, HBase, Impala, Apache Crunch, Solr, Sqoop, Oozie, ZooKeeper, Scala, Hue, Kafka, Storm, Avro, JSON.

Java & J2EE technologies: Core Java, JSP, JDBC

IDE Tools: Eclipse, IntelliJ IDEA.

Programming languages: Java, Linux shell scripts, Scala, Python.

Web Frameworks: Struts 1.x, Struts 2.x, Spring 3.x, Hibernate.

Database: Oracle 11g/10g/9i, DB2, PL/SQL, SQL Developer, MongoDB, Cassandra DB.

ETL Tools: Informatica, Oracle Data Integrator.

Web Technologies: HTML, XML, JavaScript, and Ruby on Rails.

Operating Systems: Windows 95/98/2000/XP, MAC OS, UNIX, LINUX.

Testing Tools: JUnit, MRUnit.

Other: Git, Maven, Jenkins, Clover, VersionOne (Scrum), JIRA (Scrum), SharePoint, BMC Remedy, Clarity, WinSQL, Lotus Notes, MS Office, Visio and ClearQuest.


Confidential, Boston, MA

Big Data Engineer


  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Loaded data from the UNIX file system into HDFS using Flume, Kettle and the HDFS API.
  • Wrote MapReduce jobs to discover trends in users' data usage.
  • Involved in managing and reviewing Hadoop log files.
  • Involved in running Hadoop streaming jobs to process terabytes of text data.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data using MapReduce, Impala and Pig.
  • Wrote Pig UDFs.
  • Developed Hive queries for the analysts.
  • Configured and maintained different topologies in the Storm cluster and deployed them regularly for real-time data processing.
  • Documented routine tasks such as ETL process flow diagrams, mapping specs, source-target matrices and unit test documentation.
  • Implemented partitioning, dynamic partitions and bucketing in Hive.
  • Created databases, tables and views in HiveQL, Impala and Pig Latin.
  • Wrote functional and technical specifications for Solr, HBase, Hive and other components.
  • Exported result sets from Hive to MySQL using Kettle (Pentaho Data Integration).
  • Used ZooKeeper for various types of centralized configuration.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among MapReduce jobs.
  • Automated all jobs, from pulling data from sources such as MySQL and pushing the result dataset to HDFS through running MapReduce, Pig and Hive jobs, using Kettle and Oozie (workflow management).
  • Wrote Kafka API code to collect events from the front end.
  • Maintained the system integrity of all sub-components (primarily HDFS, MapReduce and Flume).
  • Wrote unit test cases using MRUnit.
  • Monitored system health and logs and responded to any warning or failure conditions.

Environment: Hadoop (Cloudera), HDFS, MapReduce, Hive, Pig, Kafka, Impala, Storm, Sqoop, Solr, WebSphere, Struts, Hibernate, Spring, Oozie, REST Web Services, Solaris, DB2, UNIX Shell Scripting, Kettle.

Confidential, Austin, TX

Big Data Engineer


  • Prepared Avro model designs for the required database fields to present the data on dashboards.
  • Coded and tested standardization and normalization processes to filter, massage and validate the data.
  • Wrote Kafka API code to collect events from the front end.
  • Coded and tested the load and extract processes for the HBase database.
  • Developed the technical strategy for integrating Spark for pure streaming and more general data-computation needs.
  • Implemented search on HDFS using operational metadata stored in Elasticsearch/Solr.
  • Wrote and tested the metrics MapReduce code for aggregations on identified, validated data.
  • Coded and tested Ruby on Rails widgets, slicers and time series for the UI dashboard.
  • Implemented real-time stream processing in Hadoop using Storm.
  • Worked with the ETL team to ensure that all ETL jobs ran on time so that refreshed data was available for OBIEE reporting.
  • Created databases, tables and views in HiveQL, Impala and Pig Latin.
  • Loaded and transformed large sets of structured and semi-structured data using Hive and Impala.
  • Developed a RESTful API to provide access to data in Solr, HBase and HDFS.
  • Wrote a Storm topology to consume events from the Kafka producer and write them to Cassandra.
  • Involved in managing and reviewing Hadoop log files.
  • Extracted feeds from social media using Python and shell scripting.
  • Automated the ETL packages using the Oracle Data Integrator built-in agent.
  • Involved in running Hadoop streaming jobs to process terabytes of text data.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Wrote Pig UDFs.
  • Wrote unit test cases using MRUnit.
  • Addressed code review comments, ran Jenkins builds and supported code deployment to production; fixed post-production defects so that the MapReduce code worked as expected.

Environment: Hadoop, HDFS, MapReduce, Spark, Apache Crunch, Kafka, Python, Impala, Storm, Informatica 8.6, HBase, Hive, Solr, Scala, Avro, JSON, Oozie, Hue, Git, Maven, Spring, Hibernate, Shell, REST Web Services and Java.

Confidential, Mountain View, CA

Hadoop Developer


  • Extracted files from DB2 through Kettle, placed them in HDFS and processed them.
  • Analyzed large data sets by running Hive queries and Pig scripts.
  • Worked with the Data Science team to gather requirements for various data mining projects.
  • Involved in creating Hive tables and in loading and analyzing data using Hive queries.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Involved in running Hadoop jobs for processing millions of records of text data.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Built Hadoop-based big data enterprise platforms, coding in Python.
  • Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Involved in unit testing MapReduce jobs using MRUnit.
  • Involved in loading data from the Linux file system into HDFS.
  • Responsible for managing data from multiple sources.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML data.
  • Loaded and transformed large sets of structured and semi-structured data.
  • Assisted in exporting analyzed data to relational databases using Sqoop.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.

Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Python, Linux, MRUnit and Big Data

Confidential, Seattle, WA

Java Developer


  • Enhanced the existing systems using Java, J2EE, JSP, Spring, Hibernate, RESTful web services and JavaBeans
  • Designed and developed DN2 cross-connection, DB2 batching, EKSOS and Q3Stack modules using Spring, Hibernate, JMS, JavaScript, Servlets, CSS and XML
  • Enhanced the cross-connection vs. path-finding process for DN2 and DB2 nodes using Java multithreading techniques
  • Enhanced Q3 stack values against EKSOS NM sync-up using threading concepts
  • Involved in requirements gathering and design, server-side coding using Spring and Hibernate (DAOs, actions, filters and handlers), and JSPs using HTML, CSS, JavaScript and Ajax
  • Worked on database performance tuning by tuning queries and creating indexes and stored procedures
  • Involved in design for major enhancements to the existing systems
  • Wrote approach-note documents for all NM upgrades and client-specific re-branding activities
  • Helped the manager with risk assessment and subsequently created contingency and mitigation plans
  • Ensured the resolution of queries, incidents and bugs within the agreed SLA time frame
  • Ensured client satisfaction by providing support during off-hours and holidays.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
  • Planned and implemented the project plan to define project scope, goals and deliverables, and defined project tasks
  • Tracked project deliverables at all milestones defined for the project.
  • Always set and met realistic deadlines; forecast changes and communicated current and projected issues

Environment: Java, JSP, Struts, Spring, Hibernate, Oracle 8i, WebLogic 9, Eclipse, Linux and Solaris.


Java Developer


  • Involved in analysis, design and development of POS (Point of Sale) system and developed specs that include Use Cases, Class Diagrams, Sequence Diagrams and Activity Diagrams.
  • Involved in designing the user interfaces using JSPs.
  • Developed custom tags and JSTL to support custom user interfaces.
  • Developed the application using the Struts framework, which leverages the classic Model View Controller (MVC) architecture.
  • Implemented business processes such as user authentication.
  • Implemented code for JSPs, Servlets and Struts.
  • Used the Spring Framework to support Hibernate and Struts.
  • Implemented the application following several design patterns, such as MVC, Business Delegate, Data Access Object and Singleton.
  • Deployed the applications on IBM WebSphere Application Server.
  • Developed JUnit test cases for all the developed modules.
  • Used CVS for version control of the common source code shared by developers.
  • Used JDBC for database connectivity to Oracle and to invoke stored procedures.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.

Environment: Java, J2EE, JSP 2.0, Struts, MVC, EJB, JMS, JNDI, Oracle, HTML, XML, ANT, IBM WebSphere Application Server 5.1, Hibernate 2.0, Log4j, CVS.


Java Developer


  • Designed User Interface using Java Server Pages (JSP) and XML.
  • Developed Enterprise JavaBeans (stateless and stateful session beans, entity beans) to handle transactions such as online funds transfers and bill payments to service providers.
  • Implemented a Service-Oriented Architecture (SOA) using JMS with message-driven beans (MDBs) for sending and receiving messages while creating web services.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
  • Worked on web services for data transfer between client and server using SOAP, WSDL and UDDI.
  • Tested the web services using soapUI.
  • Worked extensively with JMS, using point-to-point and publish/subscribe messaging domains to exchange information through messages.

Environment: Windows, Java 1.4, HTML, JavaScript 1.6, XML, JUnit, JMS, Web Services, SOAP 1.1, UDDI 2, Maven 2.0, Eclipse IDE, CVS, Oracle 10g.
