
Sr. Big Data Engineer Resume

Wayne, PA

SUMMARY

  • Over 8 years of IT experience as a Big Data/Hadoop Engineer across all phases of the Software Development Life Cycle, including hands-on experience with Java/J2EE technologies.
  • Experience in Apache Hadoop ecosystem components like HDFS, Map Reduce, Pig, Hive, Impala, HBase, SQOOP, Flume and Oozie.
  • Well versed in Amazon Web Services (AWS) cloud services such as EC2, S3, EBS, RDS and VPC.
  • Improved the performance of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Experienced with NoSQL databases (HBase, Cassandra and MongoDB), including database performance tuning and data modeling.
  • Proficient in Core Java, Enterprise technologies such as EJB, Hibernate, Java Web Service, SOAP, REST Services, Java Thread, Java Socket, Java Servlets, JSP, JDBC etc.
  • Good exposure to Service Oriented Architectures (SOA) built on Web services (WSDL) using SOAP protocol.
  • Wrote multiple MapReduce programs in Python for data extraction, transformation and aggregation from multiple file formats, including XML, JSON, CSV and other compressed formats.
  • Experience working with the Hadoop ecosystem, with some exposure to installing and configuring the Hortonworks and Cloudera (CDH3 and CDH4) distributions.
  • Experience with the NoSQL databases HBase, MongoDB and Cassandra.
  • Good understanding of Hadoop architecture and hands on experience with Hadoop components such as Job Tracker, Task Tracker, Name Node, Data Node and MapReduce programming.
  • Experience in importing and exporting data between HDFS and RDBMS using Sqoop.
  • Extracted & processed streaming log data from various sources and integrated in to HDFS using Flume.
  • Strong knowledge on implementation of SPARK core - SPARK SQL, MLlib, GraphX and Spark streaming.
  • Extensively worked with different data sources non-relational databases such as XML files, parses like SAX, DOM and other relational databases such as Oracle, MySQL.
  • Experience working on Application servers like IBM WebSphere, JBoss, BEA WebLogic and Apache Tomcat.
  • Extensive experience in internet and client/server technologies using Java, J2EE, Struts, Hibernate, Spring, HTML, HTML5, DHTML, CSS, JavaScript, XML and Perl.
  • Expert in deploying code through web application servers such as WebSphere, WebLogic and Apache Tomcat in the AWS cloud.
  • Expertise in core Java, J2EE, multithreading, JDBC, Hibernate, shell scripting, Servlets, JSP, Spring, Struts, EJBs, web services, XML, JPA, JMS and JNDI, and proficient in using Java APIs for application development.
  • Good working experience in Application and web Servers like JBoss and Apache Tomcat.
  • Experience in writing Pig and Hive scripts and extending core functionality by writing custom UDFs (see the sketch following this summary).
  • Extensive experience with Agile Development, Object Modeling using UML and Rational Unified Process (RUP).
  • Strong knowledge of Object Oriented Programming (OOP) concepts including the use of Polymorphism, Abstraction, Inheritance and Encapsulation.
  • Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
  • Experience includes Requirements Gathering, Design, Development, Integration, Documentation, Testing and Build.
  • In depth understanding/knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, NameNode, DataNode.
  • Hands-on experience with Hadoop/Big Data technologies for the storage, querying, processing and analysis of data.
  • Experienced with the build tools Maven and Ant and the logging tool Log4j.
  • Experience in working with Eclipse IDE, NetBeans and BlueJ.
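
Illustrative sketch (hypothetical, not tied to a specific engagement): a minimal Hive UDF in Java of the kind described above; the class name and the ZIP-code normalization use case are assumptions.

    // Hypothetical Hive UDF: normalizes free-form ZIP code strings to their first five digits.
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    public final class NormalizeZip extends UDF {
        public Text evaluate(final Text input) {
            if (input == null) {
                return null;
            }
            String digits = input.toString().replaceAll("[^0-9]", "");
            return new Text(digits.length() >= 5 ? digits.substring(0, 5) : digits);
        }
    }

Once packaged into a JAR, such a UDF is registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION and then invoked from HiveQL like a built-in function.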

TECHNICAL SKILLS

Big data/Hadoop: Hadoop 2.7/2.5, HDFS 1.2.4, MapReduce, Hive, Pig, Sqoop, Oozie, Hue, Flume, Kafka and Spark 2.0/2.0.2

NoSQL Databases: HBase, MongoDB 3.2 & Cassandra

Java/J2EE Technologies: Servlets, JSP, JDBC, JSTL, EJB, JAXB, JAXP, JMS, JAX-RPC, JAX-WS

Programming Languages: Java, Python, SQL, PL/SQL, AWS, Hive QL, Unix Shell Scripting, Scala

IDE and Tools: Eclipse 4.6, NetBeans 8.2, BlueJ

Databases: Oracle 12c/11g, MySQL, SQL Server 2016/2014

Web Technologies: HTML5/4, DHTML, AJAX, JavaScript, jQuery, CSS3/2, JSP, Bootstrap 3/3.5

Application Servers: Apache Tomcat, JBoss, IBM WebSphere, WebLogic

Operating Systems: Windows 8/7, UNIX/Linux and Mac OS.

Other Tools: Maven, ANT, WSDL, SOAP, REST.

Methodologies: Software Development Lifecycle (SDLC), Waterfall, Agile, STLC (Software Testing Life cycle), UML, Design Patterns (Core Java and J2EE)

PROFESSIONAL EXPERIENCE

Confidential - Wayne, PA

Sr. Big Data Engineer

Responsibilities:

  • As a Sr. Big Data Engineer, provided technical expertise in Hadoop technologies as they relate to the development of analytics.
  • Used Agile (SCRUM) methodologies for Software Development.
  • Responsible for the planning and execution of big data analytics, predictive analytics and machine learning initiatives.
  • Assisted in leading the plan, build and run phases within the Enterprise Analytics team.
  • Engaged in solving and supporting real business issues by applying knowledge of the Hadoop Distributed File System and open-source frameworks.
  • The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark.
  • Installed and configured a multi-node cluster in the cloud on Amazon Web Services (AWS) EC2.
  • Designed efficient and robust Hadoop solutions for performance improvement and end-user experiences.
  • Worked in a Hadoop ecosystem implementation/administration, installing software patches along with system upgrades and configuration.
  • Conducted performance tuning of Hadoop clusters while monitoring and managing Hadoop cluster job performance, capacity forecasting, and security.
  • Created Hive External tables to stage data and then move the data from Staging to main tables
  • Worked in exporting data from Hive 2.0.0 tables into Netezza 7.2.x database.
  • Implemented the Big Data solution using Hadoop, Hive and Informatica 9.5 to pull and load the data into HDFS.
  • Pulled data from the data lake (HDFS) and transformed it with various RDD transformations.
  • Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark 2.0.0 for data aggregation, queries and writing data back into the RDBMS through Sqoop (see the sketch at the end of this section).
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
  • Created Data Pipeline using Processor Groups and multiple processors using Apache NiFi for Flat File, RDBMS as part of a POC using Amazon EC2.
  • Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
  • Loaded data from sources such as HDFS and HBase into Spark RDDs and implemented in-memory computation to generate the output response.
  • Developed complete end to end Big-data processing in Hadoop eco system.
  • Used AWS Cloud with Infrastructure Provisioning / Configuration.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Active involvement in design, new development and SLA based support tickets of Big Machines applications.
  • Developed Oozie workflow jobs to execute hive, Sqoop and MapReduce actions.
  • Provided thought leadership for the architecture and design of Big Data analytics solutions for customers, and actively drove Proof of Concept (POC) and Proof of Technology (POT) evaluations to implement Big Data solutions.
  • Developed numerous MapReduce jobs in Scala for Data Cleansing and Analyzing Data in Impala.
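
A minimal sketch of the DataFrame-based aggregation and RDBMS write-back described above. The project code was written in Scala; this sketch uses Spark's Java API for consistency with the other examples here, the table, column and connection details are placeholders, and the actual write-back went through Sqoop rather than the JDBC writer shown.

    import java.util.Properties;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import static org.apache.spark.sql.functions.avg;
    import static org.apache.spark.sql.functions.count;

    public class ClaimSummaryJob {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("claim-summary").getOrCreate();

            // Read staged data from the data lake (path, format and schema are illustrative).
            Dataset<Row> claims = spark.read().parquet("hdfs:///data/lake/claims/staging");

            // Aggregate per member: number of claims and average claim amount.
            Dataset<Row> summary = claims.groupBy("member_id")
                    .agg(count("claim_id").alias("claim_count"),
                         avg("claim_amount").alias("avg_claim_amount"));

            // Persist the aggregate to a relational target over JDBC (placeholder connection details).
            Properties props = new Properties();
            props.setProperty("user", "etl_user");
            props.setProperty("password", "change-me"); // supply via a secrets mechanism in practice
            summary.write().mode("overwrite")
                   .jdbc("jdbc:oracle:thin:@db-host:1521/ORCL", "CLAIM_SUMMARY", props);

            spark.stop();
        }
    }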

Environment: Apache Spark 2.3, Hive 2.3, Informatica, HDFS, MapReduce, Scala, Apache Nifi 1.6, Yarn, HBase, PL/SQL, Mongo DB, Pig 0.16, Sqoop 1.2, Flume 1.8

Confidential - Washington, DC

Sr. Java/Hadoop Engineer

Responsibilities:

  • Developed Pig UDFs for manipulating data according to business requirements and worked on developing custom Pig loaders.
  • Developed Java MapReduce programs on log data to transform it into a structured form and derive user location, age group and time spent (see the sketch at the end of this section).
  • Implemented row-level updates and real-time analytics using CQL on Cassandra data.
  • Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms
  • Collected and aggregated large amounts of web log data from different sources such as web servers, mobile and network devices using Apache Flume and stored the data into HDFS for analysis.
  • Wrote shell scripts for key Hadoop services such as ZooKeeper and automated them to run via cron.
  • Developed Pig scripts for the analysis of semi-structured data.
  • Worked on the ingestion of files into HDFS from remote systems using MFT (Managed File Transfer).
  • Used Hibernate Transaction Management, Hibernate Batch Transactions, and cache concepts.
  • Analyzed the web log data using the HiveQL to extract number of unique visitors per day, page views, visit duration, most purchased product on website.
  • Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts).
  • Implemented the Capacity Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
  • Designed and implemented MapReduce based large-scale parallel processing.
  • Developed and updated the web tier modules using Struts 2.1 Framework.
  • Modified the existing JSP pages using JSTL.
  • Implemented Struts Validator for automated validation.
  • Utilized Hibernate for object/relational mapping to achieve transparent persistence onto SQL Server.
  • Performed building and deployment of EAR, WAR and JAR files on test and stage systems on WebLogic Application Server.
  • Developed Java and J2EE applications using Rapid Application Development (RAD), Eclipse.
  • Used Singleton, DAO, DTO, Session Facade, MVC design Patterns.
  • Continuous monitoring and managing the Hadoop cluster using Cloudera Manager
  • Writing complex SQL and PL/SQL queries for stored procedures.
  • Developed Reference Architecture for E-Commerce SOA Environment
  • Used UDF's to implement business logic in Hadoop
  • Custom table creation and population, custom and package index analysis and maintenance in relation to process performance.
  • Used CVS for version controlling and JUnit for unit testing.
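
A minimal sketch of the kind of Java MapReduce log processing described above; the tab-delimited layout, field positions and age buckets are assumptions. Paired with the stock IntSumReducer, this would yield a count of users per age group.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Emits (ageGroup, 1) for every valid log record.
    public class AgeGroupMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text ageGroup = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            if (fields.length < 4) {
                return; // skip malformed lines
            }
            try {
                int age = Integer.parseInt(fields[3].trim()); // assumed position of the age field
                String bucket = age < 18 ? "under-18" : age < 35 ? "18-34" : age < 55 ? "35-54" : "55+";
                ageGroup.set(bucket);
                context.write(ageGroup, ONE);
            } catch (NumberFormatException e) {
                // ignore records with a non-numeric age field
            }
        }
    }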

Environment: Eclipse, Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, MySQL, Cassandra, Java, Shell Scripting, SQL.

Confidential - Hartford, CT

Sr. Java Developer

Responsibilities:

  • Worked on designing and developing the Web Application User Interface and implemented its related functionality in Java/J2EE for the product.
  • Used JSF framework to implement MVC design pattern.
  • Developed and coordinated complex high quality solutions to clients using J2SE, J2EE, Servlets, JSP, HTML, Struts, Spring MVC, SOAP, JavaScript, JQuery, JSON and XML.
  • Wrote JSF managed beans, converters and validators following framework standards, and used explicit and implicit navigation for page flows (see the sketch at the end of this section).
  • Designed and developed Persistence layer components using Hibernate ORM tool.
  • Designed UI using JSF tags, Apache Tomahawk & Rich faces.
  • Used Oracle 10g as backend to store and fetch data.
  • Experienced in using IDEs like Eclipse and Net Beans, integration with Maven
  • Created Real-time Reporting systems and dashboards using XML, MySQL, and Perl
  • Worked on RESTful web services, which enforce a stateless client-server model and support JSON (migrating a few services from SOAP to REST).
  • Involved in detailed analysis based on the requirement documents.
  • Involved in Design, development and testing of web application and integration projects using Object Oriented technologies such as Core Java, J2EE, Struts, JSP, JDBC, Spring Framework, Hibernate, Java Beans, Web Services (REST/SOAP), XML, XSLT, XSL and Ant.
  • Designed and implemented SOA-compliant management and metrics infrastructure for Mule ESB using the SOA management components.
  • Used Node.js for server-side rendering and implemented Node.js modules to integrate with designs and requirements.
  • Used JAX-WS to interact in front-end module with backend module as they are running in two different servers.
  • Responsible for offshore deliverables; provided design and technical help to the team and reviewed work to meet quality standards and timelines.
  • Migrated existing Struts application to Spring MVC framework.
  • Provided and implemented numerous solution ideas to improve the performance and stabilize the application.
  • Extensively used LDAP with Microsoft Active Directory for user authentication at login.
  • Developed unit test cases using JUnit.
  • Created the project from scratch using AngularJS as the frontend and Node.js with Express as the backend.
  • Involved in developing Perl scripts and other scripts such as JavaScript.
  • Used Tomcat as the web server to deploy the OMS web application.
  • Used the SOAP::Lite module to communicate with different web services based on the given WSDL.
  • Prepared technical reports and documentation manuals during program development.
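
A minimal sketch of a JSF managed bean with a validator and an implicit-navigation action, as described above; the bean, field and outcome names are hypothetical.

    import java.io.Serializable;
    import javax.faces.application.FacesMessage;
    import javax.faces.bean.ManagedBean;
    import javax.faces.bean.ViewScoped;
    import javax.faces.component.UIComponent;
    import javax.faces.context.FacesContext;
    import javax.faces.validator.ValidatorException;

    @ManagedBean
    @ViewScoped
    public class PolicySearchBean implements Serializable {

        private String policyNumber;

        // Validator referenced from the page, e.g. <h:inputText validator="#{policySearchBean.validatePolicyNumber}"/>
        public void validatePolicyNumber(FacesContext ctx, UIComponent component, Object value) {
            String text = String.valueOf(value);
            if (!text.matches("[A-Z]{2}\\d{6}")) {
                throw new ValidatorException(
                        new FacesMessage(FacesMessage.SEVERITY_ERROR, "Invalid policy number", null));
            }
        }

        // Action method; the returned outcome resolves to policyDetails.xhtml via implicit navigation.
        public String search() {
            // lookup via the Hibernate-backed persistence layer would go here
            return "policyDetails";
        }

        public String getPolicyNumber() { return policyNumber; }
        public void setPolicyNumber(String policyNumber) { this.policyNumber = policyNumber; }
    }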

Environment: JDK 1.5, JSF, Hibernate 3.6, JIRA, NodeJs, Cruise control, Log4j, Tomcat, LDAP, JUNIT, NetBeans.

Confidential - Atlanta, GA

Sr. Big Data/Hadoop Engineer

Responsibilities:

  • As a Sr. Big Data Engineer worked on Big Data technologies like Apache Hadoop, MapReduce, Shell Scripting, Hive.
  • Involved in all phases of SDLC using Agile and participated in daily scrum meetings with cross teams.
  • Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, HBase database and Sqoop.
  • Involved in Agile methodologies, daily scrum meetings and sprint planning.
  • Primarily involved in the data migration process using Azure by integrating with the GitHub repository and Jenkins.
  • Used Kibana, an open-source browser-based analytics and search dashboard for Elasticsearch.
  • Upgraded the Hadoop Cluster from CDH3 to CDH4, setting up High Availability Cluster and integrating Hive with existing applications.
  • Designed & Developed a Flattened View (Merge and Flattened dataset) de-normalizing several Datasets in Hive/HDFS.
  • Worked on NoSQL support for enterprise production and on loading data into HBase using Impala and Sqoop.
  • Performed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
  • Handled importing of data from various data sources, performed transformations using Hive, Pig, and loaded data into HDFS.
  • Involved in identifying job dependencies to design workflow for Oozie & Yarn resource management.
  • Designed solution for various system components using Microsoft Azure.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames and pair RDDs.
  • Created Hive Tables, loaded claims data from Oracle using Sqoop and loaded the processed data into target database.
  • Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
  • Developed Nifi flows dealing with various kinds of data formats such as XML, JSON and Avro.
  • Developed and designed data integration and migration solutions in Azure.
  • Worked on a proof of concept with Spark, Scala and Kafka (see the sketch at the end of this section).
  • Used Sqoop import functionality to load historical data from the RDBMS into HDFS.
  • Designed workflows and coordinators in Oozie to automate and parallelize Hive jobs on Apache Hadoop environment by Hortonworks (HDP 2.2)
  • Configured Hive bolts and wrote data to Hive in Hortonworks as part of a POC.
  • Developed batch scripts to fetch data from AWS S3 storage and perform the required transformations in Scala using the Spark framework.
  • Responsible for importing real-time data from source systems into Kafka clusters.
  • Worked with Spark techniques such as refreshing tables, handling parallelism and modifying Spark defaults for performance tuning.
  • Implemented Spark RDD transformations to map business analysis logic and applied actions on top of the transformations.
  • Involved in migrating MapReduce jobs into Spark jobs and used Spark SQL and the DataFrames API to load structured data into Spark clusters.
  • Involved in using Spark API over Hadoop YARN as execution engine for data analytics using Hive and submitted the data to BI team for generating reports, after the processing and analyzing of data in Spark SQL.
  • Used version control tools like Github to share the code snippet among the team members.
  • Involved in daily Scrum meetings to discuss the development/progress and was active in making scrum meetings more productive.
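
A minimal sketch of the Kafka-to-data-lake ingestion POC described above. The project work was in Scala; this sketch uses Spark Structured Streaming's Java API for consistency with the other examples, assumes the spark-sql-kafka connector is on the classpath, and uses placeholder broker, topic and path values.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.streaming.StreamingQuery;

    public class KafkaIngestPoc {
        public static void main(String[] args) throws Exception {
            SparkSession spark = SparkSession.builder().appName("kafka-ingest-poc").getOrCreate();

            // Subscribe to a Kafka topic (broker list and topic name are placeholders).
            Dataset<Row> events = spark.readStream()
                    .format("kafka")
                    .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
                    .option("subscribe", "claims-events")
                    .load()
                    .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "timestamp");

            // Land the raw events in the data lake as Parquet, tracking progress with a checkpoint.
            StreamingQuery query = events.writeStream()
                    .format("parquet")
                    .option("path", "hdfs:///data/lake/claims/raw")
                    .option("checkpointLocation", "hdfs:///checkpoints/claims-events")
                    .outputMode("append")
                    .start();

            query.awaitTermination();
        }
    }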

Environment: Hadoop 3.0, Agile, Pig 0.17, HBase 1.4.3, Jenkins 2.12, NoSQL, Sqoop 1.4, Impala 3.0.0, Hive 2.3, MapReduce, YARN, Oozie, Microsoft Azure, Nifi, Avro, MYSQL, Kafka, Scala 2.12, Spark, Apache Flume 1.8

Confidential

Java Developer

Responsibilities:

  • Developed using new Java 1.5 features: annotations, generics, the enhanced for loop and enums.
  • Used Struts and Hibernate to implement IoC, AOP and ORM in the back-end tiers.
  • Designed the system per changing requirements using the Struts MVC architecture, JSP and DHTML.
  • Designed the application using J2EE patterns.
  • Developed Java Beans for business logic.
  • Design of REST APIs that allow sophisticated, effective and low cost application integrations.
  • Developed the presentation layer using Struts Framework.
  • Developed the persistence layer using Hibernate ORM to transparently store objects in the database.
  • Responsible for coding all of the JSPs and Servlets used for the User module.
  • Developed the JSPs, Servlets and various beans using the WebSphere server.
  • Wrote Java utility classes common for all of the applications.
  • Analyzed and fine-tuned RDBMS/SQL queries to improve application performance against the database.
  • Designed and implemented highly intuitive, user friendly GUI from scratch using Drag and Drop with Java Swing and CORBA.
  • Extensively used multithreading concepts.
  • Deployed the jar files in the Web Container on the IBM WebSphere Server 5.x.
  • Designed and developed the screens in HTML with client side validations in JavaScript.
  • Developed the server side scripts using JMS, JSP and Java Beans.
  • Adding and modifying Hibernate configuration code and Java/SQL statements depending upon the specific database access requirements.
  • Designed database tables, views and indexes, and created triggers for optimized data access.
  • Created XML-based configuration and property files for the application and developed parsers using JAXP, SAX and DOM (see the sketch at the end of this section).
  • Developed Web Services using Apache AXIS tool.
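
A minimal sketch of the SAX-based configuration parsing described above; the property element layout and the class name are hypothetical.

    import java.util.HashMap;
    import java.util.Map;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.Attributes;
    import org.xml.sax.helpers.DefaultHandler;

    // Collects <property name="..." value="..."/> entries from an application configuration file.
    public class ConfigPropertyHandler extends DefaultHandler {

        private final Map<String, String> properties = new HashMap<String, String>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("property".equals(qName)) {
                properties.put(attrs.getValue("name"), attrs.getValue("value"));
            }
        }

        public Map<String, String> getProperties() {
            return properties;
        }

        public static Map<String, String> parse(String path) throws Exception {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            ConfigPropertyHandler handler = new ConfigPropertyHandler();
            parser.parse(path, handler);
            return handler.getProperties();
        }
    }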

Environment: Java 1.5, Struts MVC, JSP, Hibernate 3.0, JUnit, UML, XML, CSS, HTML, Oracle 9i, Eclipse, JavaScript, WebSphere 5.x, Rational Rose, ANT.
