
Sr. Big Data/hadoop Engineer Resume


Boston, MA

SUMMARY

  • Over 9 years of working experience as a Big Data/Hadoop Engineer designing and developing various applications using big data, Hadoop, and Java/J2EE open-source technologies.
  • Strong development skills in Hadoop, HDFS, Map Reduce, Hive, Sqoop, HBase with solid understanding of Hadoop internals.
  • Experience in working with SDLC methodologies like Agile and Waterfall.
  • Experience in programming and development of Java modules for an existing Java-based web portal using technologies such as JSP, Servlets, JavaScript, and HTML, with SOA and MVC architecture.
  • Expertise in ingesting real-time/near-real-time data using Flume, Kafka, and Storm.
  • Good knowledge of NoSQL databases like MongoDB, Cassandra, and HBase.
  • Excellent knowledge of Hadoop architecture and its components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and MRv1 and MRv2 (YARN).
  • Expertise in writing Hadoop jobs to analyze data using MapReduce, Apache Crunch, Hive, Pig, SOLR, and Splunk (a brief MapReduce sketch follows this summary).
  • Hands-on experience in installing, configuring, and using Apache Hadoop ecosystem components like Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase, Apache Crunch, Zookeeper, Sqoop, Hue, Scala, and AVRO.
  • Strong programming skills in designing and implementing multi-tier applications using Java, J2EE, JDBC, JSP, JSTL, HTML, CSS, JSF, Struts, JavaScript, Servlets, POJO, EJB, XSLT, and JAXB.
  • Extensive experience in SOA-based solutions: Web Services, Web API, WCF, and SOAP, including RESTful API services.
  • Good knowledge of Amazon Web Services (AWS) concepts like EMR and EC2, which provide fast and efficient processing for Teradata Big Data Analytics.
  • Experienced in collecting log data and JSON data into HDFS using Flume and processing the data using Hive/Pig.
  • Good exposure to Service Oriented Architectures (SOA) built on Web services (WSDL) using SOAP protocol.
  • Expertise in developing simple web-based applications using J2EE technologies like JSP, Servlets, and JDBC.
  • Experience working on EC2 (Elastic Compute Cloud) cluster instances, setting up data buckets on S3 (Simple Storage Service), and setting up EMR (Elastic MapReduce).
  • Worked extensively in Core Java, Struts 2, JSF 2.2, Spring 3.1, Hibernate, Servlets, and JSP, with hands-on experience in PL/SQL, XML, and SOAP.
  • In-depth understanding of Hadoop architecture and its components such as HDFS, Job Tracker, Task Tracker, Name Node, and Data Node.
  • Well versed in working with Relational Database Management Systems such as Oracle 9i/12c, MS SQL Server, and MySQL.
  • Hands-on experience with the XML suite of technologies: XML, XSL, XSLT, DTD, XML Schema, SAX, DOM, and JAXB.
  • Hands-on experience in advanced big data technologies like the Spark ecosystem (Spark SQL, MLlib, SparkR, and Spark Streaming), Kafka, and predictive analytics.
  • Knowledge of the Software Development Life Cycle (SDLC) and of Agile and Waterfall methodologies.
  • Experienced in developing applications using Java, Python, and UNIX shell scripting.
  • Experience in consuming web services with Apache Axis and using JAX-RS (REST) APIs.
  • Experienced with the build tools Maven and ANT and the logging tool Log4j.
  • Experience in working with web servers like Apache Tomcat and application servers like IBM WebSphere and JBoss.
  • Good knowledge of NoSQL databases such as HBase, MongoDB and Cassandra.
  • Experience in working with Eclipse IDE, NetBeans, and Rational Application Developer.
  • Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
  • Experience in Data Ingestion, In-Stream data processing, Batch Analytics and Data Persistence Strategy.
  • Experience in working with Eclipse IDE, NetBeans and BlueJ.
  • Extensive experience in handling structured, semi-structured, and unstructured data.
  • Experience includes Requirements Gathering, Design, Development, Integration, Documentation, Testing and Build.
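
A minimal, illustrative Java MapReduce sketch of the kind of analysis job referenced above (not taken from any listed project); the tab-delimited input layout and the field position are assumptions, and the job driver/submission code is omitted:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class LogFieldCount {

        // Emits (fieldValue, 1) for each record; assumes tab-delimited input with the field of interest in column 3
        public static class FieldMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split("\t");
                if (fields.length > 2) {
                    context.write(new Text(fields[2]), ONE);
                }
            }
        }

        // Sums the counts per key
        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }
    }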

TECHNICAL SKILLS

Web Technologies: HTML5/4, DHTML, AJAX, JavaScript, jQuery, CSS3/2, JSP, and Bootstrap 3/3.5

Big data/Hadoop: Hadoop 3.0/2.7/2.5, HDFS 1.2.4, MapReduce, Hive, Pig, Sqoop, Oozie, Hue, Flume, Kafka

Database: Oracle 12c/11g, MySQL, SQL Server

NoSQL Databases: HBase, MongoDB 3.6.1/3.2, and Cassandra

Application Server: Apache Tomcat, JBoss, IBM WebSphere, WebLogic

Java/J2EE Technologies: Servlets, JSP, JDBC, JSTL, EJB, JAXB, JAXP, JMS, JAX-RPC, JAX-WS

Programming Languages: Java, Python, SQL, PL/SQL, AWS, HiveQL, UNIX Shell Scripting, Scala

IDE and Tools: Eclipse 4.7, NetBeans 8.2, BlueJ, Maven

Methodologies: Software Development Lifecycle (SDLC), Waterfall, Agile, STLC (Software Testing Life cycle), UML, Design Patterns (Core Java and J2EE)

Operating Systems: Windows 8/7, UNIX/Linux, and Mac OS.

Other Tools: Maven, ANT, WSDL, SOAP, REST.

PROFESSIONAL EXPERIENCE

Confidential - Boston, MA

Sr. Big Data/Hadoop Engineer

Responsibilities:

  • Architected, designed, and developed business applications and data marts for the Marketing and IT departments to facilitate departmental reporting.
  • Developed big data solutions focused on pattern matching and predictive modeling.
  • Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, HBase database and Sqoop.
  • Worked in AWS EC2, configuring the servers for Auto scaling and Elastic load balancing.
  • Upgraded the Hadoop Cluster from CDH3 to CDH4, setting up High Availability Cluster and integrating HIVE with existing applications.
  • Designed and developed a flattened view (merged and flattened dataset), de-normalizing several datasets in Hive/HDFS, consisting of key attributes consumed by the business and other downstream systems.
  • Worked on NoSQL (HBase) to support enterprise production, loading data into HBase using Impala and Sqoop.
  • Performed multiple MapReduce jobs in PIG and Hive for data cleaning and pre-processing.
  • Worked on AWS provisioning EC2 Infrastructure and deploying applications in Elastic load balancing.
  • Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
  • Handled importing of data from various data sources, performed transformations using Hive, PIG, and loaded data into HDFS.
  • Created tables in HBase to store variable data formats of PII data coming from different portfolios.
  • Involved in identifying job dependencies to design workflow for Oozie & YARN resource management.
  • Worked on moving data using Sqoop from HDFS to relational database systems and vice versa, including maintenance and troubleshooting.
  • Explored Spark to improve the performance and optimization of the existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs (see the sketch after this section).
  • Created Hive Tables, loaded claims data from Oracle using Sqoop and loaded the processed data into target database.
  • Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
  • Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
  • Worked on a proof of concept with Spark, Scala, and Kafka.
  • Worked on visualizing the aggregated datasets in Tableau.
  • Worked on importing data from HDFS to MYSQL database and vice-versa using SQOOP.
  • Implemented MapReduce jobs in Hive by querying the available data.
  • Configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
  • Performed data analytics in Hive and then exported those metrics back to Oracle Database using Sqoop.
  • Performance tuning of Hive queries, MapReduce programs for different applications.
  • Proactively involved in ongoing maintenance, support and improvements in Hadoop cluster.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Used Cloudera Manager for installation and management of Hadoop Cluster.
  • Developed data pipelines using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Collaborated with business users, product owners, and developers to contribute to the analysis of functional requirements.
  • Worked on MongoDB and HBase (NoSQL) databases, which differ from classic relational databases.
  • Involved in converting HiveQL into Spark transformations using Spark RDD and through Scala programming.
  • Integrated Kafka with Spark Streaming for high-throughput, reliable data processing.
  • Worked on Apache Flume for collecting and aggregating huge amounts of log data, and stored it in HDFS for further analysis.
  • Worked on tuning Hive and Pig scripts to improve performance and solved performance issues in both.

Environment: HDFS, MapReduce, Pig, Hive, Sqoop, Oracle 12c, Flume, Oozie, HBase, Impala, Spark Streaming, YARN, Eclipse, Spring, PL/SQL, UNIX Shell Scripting, Cloudera.
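
A minimal sketch, using the Spark Java API, of the kind of Spark SQL aggregation over Hive-managed claims data described above (the project code itself was written in Scala per the bullets above); the database, table, and column names are illustrative assumptions:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class ClaimsMetrics {
        public static void main(String[] args) {
            // Hive support lets Spark SQL read tables registered in the Hive metastore
            SparkSession spark = SparkSession.builder()
                    .appName("ClaimsMetrics")
                    .enableHiveSupport()
                    .getOrCreate();

            // Hypothetical claims table loaded earlier into Hive via Sqoop
            Dataset<Row> metrics = spark.sql(
                    "SELECT portfolio, COUNT(*) AS claim_count, SUM(claim_amount) AS total_amount "
                    + "FROM claims_db.claims GROUP BY portfolio");

            // Persist the aggregated metrics back to a Hive table for downstream reporting
            metrics.write().mode("overwrite").saveAsTable("claims_db.claims_portfolio_metrics");

            spark.stop();
        }
    }

The packaged job would then be submitted to the YARN cluster with spark-submit.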

Confidential - Seattle, WA

Sr. Big Data/ Hadoop Engineer

Responsibilities:

  • Involved in Agile methodologies, daily scrum meetings, and sprint planning.
  • Involved in the full life cycle of the project from design, analysis, logical and physical architecture modeling, development, implementation, and testing.
  • Scripts were written for distribution of query for performance test jobs in Amazon Data Lake.
  • Created Hive Tables, loaded transactional data from Teradata using Sqoop and worked with highly unstructured and semi structured data of 2 Petabytes in size.
  • Developed MapReduce (YARN) jobs for cleaning, accessing and validating the data.
  • Created and ran Sqoop jobs with incremental load to populate Hive external tables.
  • Developed optimal strategies for distributing the web log data over the cluster importing and exporting the stored web log data into HDFS and Hive using Sqoop.
  • Installed and configured Apache Hadoop on multiple nodes in an AWS EC2 environment.
  • Developed Pig Latin scripts to replace the existing legacy process with Hadoop; the resulting data is fed to AWS S3.
  • Responsible for building scalable distributed data solutions using Cloudera Hadoop.
  • Designed and developed automation test scripts using Python.
  • Integrated Apache Storm with Kafka to perform web analytics and to move clickstream data from Kafka to HDFS.
  • Wrote Pig scripts to transform raw data from several data sources into baseline data.
  • Analyzed the SQL scripts and designed the solution to implement them using PySpark.
  • Implemented Hive generic UDFs to incorporate business logic into Hive queries (a brief sketch follows this section).
  • Responsible for developing data pipeline with Amazon AWS to extract the data from weblogs and store in HDFS.
  • Uploaded streaming data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
  • Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most visited page on the website.
  • Supported data analysis projects using Elastic MapReduce on the Amazon Web Services (AWS) cloud and performed export and import of data with S3.
  • Worked on MongoDB using CRUD (Create, Read, Update, and Delete), indexing, replication, and sharding features.
  • Involved in designing the HBase row key to store text and JSON as key values, designing the key so that rows can be retrieved/scanned in sorted order.
  • Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts).
  • Worked on custom Talend jobs to ingest, enrich, and distribute data in the Cloudera Hadoop ecosystem.
  • Created Hive tables and worked on them using HiveQL.
  • Designed and implemented static and dynamic partitioning and bucketing in Hive.
  • Developed multiple POCs using PySpark and deployed them on the YARN cluster, compared the performance of Spark with Hive and SQL, and was involved in end-to-end implementation of ETL logic.
  • Developed syllabus/Curriculum data pipelines from Syllabus/Curriculum Web Services to HBASE and Hive tables.
  • Worked on Cluster co-ordination services through Zookeeper.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Involved in building applications using Maven and integrated with CI servers like Jenkins to build jobs.
  • Exported the analyzed data to the RDBMS using Sqoop to generate reports for the BI team.
  • Worked collaboratively with all levels of business stakeholders to architect, implement and test Big Data based analytical solution from disparate sources.
  • Created cubes in Talend to build different types of aggregations over the data and to visualize them.

Environment: Hive, Teradata, MapReduce, HDFS, Sqoop, AWS, Hadoop, Pig, Python, Kafka, Apache Storm, SQL scripts, data pipeline, HBase, JSON, Oozie, ETL, Zookeeper, Maven, Jenkins, RDBMS
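
A minimal sketch of a Hive generic UDF of the kind described above; the masking rule and the function name are illustrative assumptions, not the actual business logic:

    import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    import org.apache.hadoop.hive.ql.metadata.HiveException;
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

    public class MaskValueUDF extends GenericUDF {

        private transient ObjectInspectorConverters.Converter converter;

        @Override
        public ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
            if (args.length != 1) {
                throw new UDFArgumentException("mask_value() takes exactly one argument");
            }
            // Convert whatever string-like input Hive hands us into a plain Java String
            converter = ObjectInspectorConverters.getConverter(
                    args[0], PrimitiveObjectInspectorFactory.javaStringObjectInspector);
            return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
        }

        @Override
        public Object evaluate(DeferredObject[] args) throws HiveException {
            Object raw = args[0].get();
            if (raw == null) {
                return null;
            }
            String value = (String) converter.convert(raw);
            int keep = Math.min(4, value.length());
            // Replace everything except the last few characters with '*'
            return value.substring(0, value.length() - keep).replaceAll(".", "*")
                    + value.substring(value.length() - keep);
        }

        @Override
        public String getDisplayString(String[] children) {
            return "mask_value(" + children[0] + ")";
        }
    }

Such a UDF would be packaged into a JAR, added to the Hive session, registered along the lines of CREATE TEMPORARY FUNCTION mask_value AS 'MaskValueUDF', and then invoked from HiveQL like any built-in function.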

Confidential - Atlanta, GA

Sr. Java/Hadoop Developer

Responsibilities:

  • Involved in Installing Hadoop Ecosystem components.
  • Involved in HDFS maintenance and administering it through the Hadoop Java API.
  • Analyzed the data using Spark, Hive and produced summary results to downstream systems.
  • Created Shell scripts for scheduling data cleansing scripts and ETL loading process.
  • Installed and configured a multi-node, fully distributed Hadoop cluster.
  • Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, UDF, Pig, Sqoop, Zookeeper and Spark.
  • Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
  • Developed and delivered quality services on-time and on-budget. Solutions developed by the team use Java, XML, HTTP, SOAP, Hadoop, Pig and other web technologies.
  • Created the JDBC data sources in WebLogic.
  • Used the existing database reference tables for consumption via JDBC mapping.
  • Used HTML, CSS, JDBC drivers, JSP, AJAX, Google APIs, and web mashups.
  • Involved in end-to-end data processing: ingestion, processing, quality checks, and splitting.
  • Imported data into HDFS from various SQL databases and files using Sqoop and from streaming systems using Storm into Big Data Lake.
  • Involved in scripting (Python and shell) to provision and spin up virtualized Hadoop clusters.
  • Worked with NoSQL databases like HBase to create tables and store data; collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Developed custom aggregate functions using Spark SQL and performed interactive querying.
  • Wrote Pig scripts to store the data into HBase.
  • Created Hive tables, dynamic partitions, and buckets for sampling, and worked on them using HiveQL.
  • Exported the analyzed data to Teradata using Sqoop for visualization and to generate reports for the BI team; experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Developed code to create XML files and flat files with the data retrieved from databases and XML files.
  • Extracted files from RDBMS through Sqoop, placed them in HDFS, and processed them.
  • Used Spark Streaming to collect data from Kafka in near real time and perform the necessary transformations and aggregations on the fly to build the common learner data model, persisting the data in a NoSQL store (HBase).
  • Configured Fair Scheduler to provide service level agreements for multiple users of a cluster.
  • Loaded data into the cluster from dynamically generated files using FLUME and from RDBMS using Sqoop.
  • Involved in writing Java APIs for interacting with HBase (a brief sketch follows this section).
  • Involved in writing Flume and Hive scripts to extract, transform, and load data into the database.
  • Participated in development/implementation of Cloudera Hadoop environment.
  • Implemented Partitioning, Dynamic Partitions and Buckets in HIVE for efficient data access.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/big data concepts.
  • Ingested semi-structured data using Flume and transformed it using Pig.

Environment: Cloudera, HDFS, HBase, MapReduce, Hive, UDF, Pig, Sqoop, Zookeeper, Spark, RDBMS, Kafka, Teradata, Java, XML, HTTP, SOAP, Hadoop, Pig, and Flume
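
A minimal sketch of the kind of HBase Java client interaction described above; the table name, column family, and row-key scheme are illustrative assumptions:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class LearnerEventWriter {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();   // picks up hbase-site.xml from the classpath
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("learner_events"))) {
                // Row key: learner id plus a reversed timestamp so the most recent events scan first
                long reversedTs = Long.MAX_VALUE - System.currentTimeMillis();
                Put put = new Put(Bytes.toBytes("learner-123_" + reversedTs));
                put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("event"),
                        Bytes.toBytes("{\"page\":\"home\",\"durationMs\":5400}"));
                table.put(put);
            }
        }
    }

Prepending a reversed timestamp to the row key is one common design choice for making the most recent rows appear first in a sorted scan.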

Confidential

Java/J2EE Developer

Responsibilities:

  • Worked in an SDLC methodology following a Waterfall environment, including Acceptance Test-Driven Design and Continuous Integration/Delivery.
  • Responsible for analyzing, designing, developing, coordinating, and deploying a web-based application.
  • Developed the application using the Spring MVC Framework, which follows the Model View Controller (MVC) architecture with JSP as the view.
  • Used Spring MVC to manage application flow by developing configurable handler mappings and view resolution (a brief controller sketch follows this section).
  • Used the Spring Framework to inject the DAO and bean objects by autowiring the components.
  • Involved in coding, maintaining, and administering Servlets and JSP components to be deployed on JBoss and WebSphere Application servers in both UNIX and Windows environments.
  • Used Spring 3.6 Framework to integrate the application with Hibernate.
  • Implemented Hibernate in the data access object layer to access and update information in the Oracle Database.
  • Used Entity Beans for storing data into the database.
  • Developed Session Beans as clients of Entity Beans to maintain the client state.
  • Used various Core Java concepts such as Multithreading, Exception Handling, Collection APIs to implement various features and enhancements.
  • Used JMS messaging framework in the application to connect a variety of external systems that house member and provider data to a medical term translation application called Auto coder.
  • Developed UI components and faces-config.xml file using JSF MVC Framework.
  • Created POJOs in the business layer.
  • Developed Ant Scripts to build and deploy EAR files on to Tomcat Server.
  • Analyzed the EJB performance in terms of scalability under various load and stress tests using the Bean-test tool.
  • Extensively used Eclipse while writing code as IDE.
  • Wrote complex SQL queries, stored procedures, functions, and triggers in PL/SQL.
  • Worked on a variety of defects to stabilize Aerial application.
  • Worked on Session Facade design pattern to access domain objects.
  • Developed presentation layer using HTML and JSP's for user interaction.
  • Performed client side validations using JavaScript.
  • Used Maven to build, run and create Aerial-related JARs and WAR files among other uses.
  • Wrote test cases in JUnit for unit testing of classes.
  • Used AJAX to create interactive front-end GUI.
  • Produced and consumed Restful web services for transferring data between different applications.
  • Used integration tools like Hudson/Jenkins.
  • Used Eclipse IDE for developing code modules in the development environment.
  • Implemented the logging mechanism using Log4j framework.
  • Developed test cases and used JUnit for Unit Testing.
  • Used SVN version control to track and maintain the different version of the application.

Environment: MVC, Spring Framework, MVC Framework, DAO, UNIX, JSP, JBoss, WebSphere, Hibernate, Oracle 10g, HTML, JavaScript, Maven, JUnit, AJAX, Jenkins, Log4j, SVN
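
A minimal sketch of a Spring MVC controller with a configurable handler mapping of the kind described above; the request path, parameter, and view name are illustrative assumptions:

    import org.springframework.stereotype.Controller;
    import org.springframework.ui.Model;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RequestMethod;
    import org.springframework.web.bind.annotation.RequestParam;

    @Controller
    public class GreetingController {

        // Handler mapping: GET /greeting is routed here by the Spring MVC DispatcherServlet
        @RequestMapping(value = "/greeting", method = RequestMethod.GET)
        public String greeting(@RequestParam(value = "name", defaultValue = "member") String name,
                               Model model) {
            model.addAttribute("name", name);
            // Logical view name; the configured ViewResolver maps it to a JSP such as /WEB-INF/jsp/greeting.jsp
            return "greeting";
        }
    }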

Confidential

Java Developer

Responsibilities:

  • Involved in prototyping, proof of concept, design, Interface Implementation, testing and maintenance.
  • Involved in the development of the User Interfaces using HTML, JSP, JavaScript, Bootstrap and AJAX.
  • Created use case diagrams, sequence diagrams, and preliminary class diagrams for the system using UML/Rational Rose.
  • Designed and developed the persistence tier using Hibernate framework.
  • Designed and developed front view components using JSP and HTML.
  • Developed re-usable utility classes in core java for validation which are used across all modules.
  • Developed UI navigation using Spring MVC architecture. (JSP, JSF, tiles, JSTL, Custom Tags).
  • Created JSF components for presentation layer.
  • Used JNDI to support transparent access to distributed components, directories, and services.
  • Used Core Spring for Dependency Injection of various component layers.
  • Used SOA REST (JAX-RS) web services to provide/consume web services from/to downstream systems (a brief resource sketch follows this section).
  • Deployed and tested the application on the Tomcat web server and WebSphere.
  • Developed Interactive web pages using AJAX and JavaScript.
  • Worked on the report module, generating PDF/CSV output according to templates.
  • Configured and tested the application with database server MySQL.
  • Developed various objects using Java, HTML, and DHTML to maintain a well-structured GUI and to interact with controllers to get data from the MySQL database.
  • Involved in developing various reusable helper and utility classes using Core Java, which are used across all modules of the application.
  • Used the build tools Maven and ANT for packaging all source files and web content into WAR files.
  • Helped in database design and tuning.
  • Deployed WAR applications in WebLogic.
  • Created stored procedures, Views in the Database.

Environment: Core Java (Multithreading, Collections), JSP, JSTL, Servlets, Spring, MySQL, XML, HTML, JavaScript, AJAX, SOAP, JDBC, Shell Scripting, JUnit, Log4j, JMS, Apache Tomcat, WebSphere.
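
A minimal sketch of a JAX-RS (REST) resource of the kind provided/consumed above; the resource path and JSON payload are illustrative assumptions:

    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.PathParam;
    import javax.ws.rs.Produces;
    import javax.ws.rs.core.MediaType;
    import javax.ws.rs.core.Response;

    @Path("/reports")
    public class ReportResource {

        // GET /reports/{id} returns a small JSON payload
        @GET
        @Path("/{id}")
        @Produces(MediaType.APPLICATION_JSON)
        public Response getReport(@PathParam("id") String id) {
            // In the real application this would come from the persistence layer
            String json = "{\"id\":\"" + id + "\",\"status\":\"READY\"}";
            return Response.ok(json).build();
        }
    }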
