
Big Data Engineer Resume


Battle Creek, MI

PROFESSIONAL SUMMARY:

  • 9+ years of IT experience with extensive knowledge of the Software Development Life Cycle (SDLC), covering requirements gathering, architecture, design, analysis, development, maintenance, implementation and testing.
  • Experienced as a Hadoop Architect/Developer, with working knowledge of Hive, Sqoop, MapReduce, Storm, Pig, HBase, Flume and Spark.
  • Architected, designed and developed a Big Data solutions practice, including setting up the Big Data roadmap and building the supporting infrastructure and team.
  • Experienced in application development using Java, J2EE, JDBC, Spring and JUnit.
  • Architected and implemented a Portfolio Recommendation Analytics Engine using Hadoop MapReduce, Oozie, Spark SQL, Spark MLlib and Cassandra.
  • Experienced in the Big Data Hadoop ecosystem, including MapReduce, MapReduce 2, YARN, Flume, Sqoop, Hive, Apache Spark and Scala.
  • Excellent understanding of Hadoop architecture and underlying framework including storage management.
  • Expertise in Big Data tools such as MapReduce, Hive (HiveQL and HPL/SQL), Impala, Pig, Spark Core, YARN and Sqoop.
  • Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Elastic Load Balancing, Auto Scaling, CloudFront, CloudWatch, SNS, SES, SQS and other services in the AWS family.
  • Skilled in selecting appropriate AWS services to design and deploy an application based on given requirements.
  • Expertise in distributed processing frameworks such as MapReduce, Spark and Tez.
  • Expertise in architecting Big Data solutions covering data ingestion and data storage.
  • Experienced with NoSQL databases (HBase, Cassandra and MongoDB), including database performance tuning and data modeling.
  • Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
  • Excellent technical and analytical skills with a clear understanding of the design goals of ER modeling for OLTP and dimensional modeling for OLAP.
  • Architected, designed and modeled DI (Data Integrity) platforms using Sqoop, Flume, Kafka, Spark Streaming, Spark MLlib and Cassandra.
  • Expertise in NOSQL databases like HBase, MongoDB.
  • Strong expertise in Amazon AWS EC2, DynamoDB, S3, Kinesis and other services.
  • Expertise in data analysis, design and modeling using tools like ErWin.
  • Expertise in Big Data architectures such as Hadoop distributions (Azure, Hortonworks, Cloudera) and NoSQL stores such as MongoDB.
  • Hands-on experience with Hadoop/Big Data technologies for storage, querying, processing and analysis of data.
  • Experienced in using various Hadoop ecosystem components such as MapReduce, Hive, Sqoop and Oozie.
  • Expert in Amazon EMR, Spark, Kinesis, S3, Boto3, Elastic Beanstalk, ECS, CloudWatch, Lambda, ELB, VPC, ElastiCache, DynamoDB, Redshift, RDS, Athena, Zeppelin and Airflow.
  • Strong experience in front-end technologies such as JSP, HTML5, jQuery, JavaScript and CSS3.
  • Experienced in testing data in HDFS and Hive for each transaction of data.
  • Experienced in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
  • Strong Experience in working with Databases like Oracle 11g/10g/9i, DB2, SQL Server 2008 and MySQL and proficiency in writing complex SQL queries.
  • Experienced in using database tools like SQL Navigator, TOAD.
  • Experienced in using Spark to improve the performance and optimization of existing algorithms in Hadoop, using Spark Context, Spark SQL, DataFrames, pair RDDs and Spark on YARN (see the tuning sketch after this list).
  • Extensive knowledge in programming with Resilient Distributed Datasets (RDDs).
  • Experienced in using Flume to transfer log data files to the Hadoop Distributed File System (HDFS).
  • Experienced with Akka for building high-performance, reliable distributed applications in Java and Scala.
  • Knowledge and experience in job workflow scheduling and coordination tools such as Oozie and ZooKeeper.
  • Good experience in Shell programming.
  • Knowledge of configuring and managing Cloudera's Hadoop platform, including CDH3 and CDH4 clusters.
  • Experienced in developing web-based GUIs using JavaScript, JSP, HTML, jQuery, XML and CSS.
  • Experienced in developing enterprise applications with J2EE/MVC architecture on application and web servers such as JBoss and Apache Tomcat 6.0/7.0/8.0.
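Illustrative sketch (not tied to any specific engagement above): a minimal Spark 2.x Scala job showing the Spark SQL / DataFrame and pair-RDD tuning patterns referenced in the summary. The table name, column names and filter date are assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.storage.StorageLevel

object SparkTuningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("tuning-sketch")
      .config("spark.sql.shuffle.partitions", "200") // size to the cluster, not the default
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Hypothetical transactions table registered in the Hive metastore
    val txns = spark.table("staging.transactions")
      .filter($"event_date" >= "2017-01-01")          // prune early, before any shuffle
      .persist(StorageLevel.MEMORY_AND_DISK)          // reused twice below, so cache once

    // DataFrame / Spark SQL path: aggregate per customer
    val perCustomer = txns.groupBy($"customer_id")
      .agg(sum($"amount").as("total_amount"), count("*").as("txn_count"))

    // Equivalent pair-RDD path, shown only to contrast with the DataFrame API
    val perCustomerRdd = txns.select($"customer_id", $"amount")
      .rdd
      .map(r => (r.getString(0), r.getDouble(1)))
      .reduceByKey(_ + _)                             // map-side combine avoids groupByKey's full shuffle

    perCustomer.write.mode("overwrite").saveAsTable("marts.customer_totals")
    println(perCustomerRdd.count())
    spark.stop()
  }
}
```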

TECHNICAL SKILLS:

Hadoop/Big Data: MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Sqoop, Oozie, Flume, Scala, Akka, Kafka, Storm.

Java/J2EE Technologies: JDBC, JavaScript, JSP, Servlets, jQuery

NoSQL Databases: Cassandra, MongoDB, DynamoDB

Web Technologies: HTML, DHTML, XML, XHTML, JavaScript, CSS, XSLT

Web/Application Servers: Apache Tomcat 6.0/7.0/8.0, JBoss.

AWS: EC2, EMR, S3, ECS

Languages: Java, J2EE, PL/SQL, Pig Latin, HQL, R, Python, XPath, Spark

Databases: Oracle 12c/11g/10g, Microsoft Access, MS SQL, MongoDB.

Frameworks: Struts (MVC), Spring, Hibernate.

Operating Systems: UNIX, Ubuntu Linux, Windows, CentOS, Sun Solaris.

Network protocols: TCP/IP fundamentals, LAN and WAN.

PROFESSIONAL EXPERIENCE:

Confidential, Battle Creek, MI

Big Data Engineer

Responsibilities:

  • Architected, Designed and Developed Business applications and Data marts for Marketing and IT department to facilitate departmental reporting.
  • Worked on the implementation and maintenance of a Cloudera Hadoop cluster.
  • Created Hive external tables to stage data and then moved the data from staging to the main tables.
  • Worked in exporting data from Hive 2.0.0 tables into Netezza 7.2.x database.
  • Implemented the Big Data solution using Hadoop, Hive and Informatica 9.5.1 to pull/load the data into the HDFS system.
  • Pulled data from the data lake (HDFS) and massaged it with various RDD transformations.
  • Active involvement in design, new development and SLA based support tickets of Big Machines applications.
  • Experience in server infrastructure development on Gateway, ELB, Auto Scaling, DynamoDB, Elasticsearch and Virtual Private Cloud (VPC).
  • Involved in evaluating Kafka and building use cases relevant to our environment.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark 2.0.0 for data aggregation and queries, writing data back into the RDBMS through Sqoop (see the UDF/DataFrame sketch after this list).
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
  • Developed Oozie 3.1.0 workflow jobs to execute Hive 2.0.0, Sqoop 1.4.6 and MapReduce actions.
  • Provided thought leadership for the architecture and design of Big Data analytics solutions for customers, actively driving Proof of Concept (POC) and Proof of Technology (POT) evaluations to implement Big Data solutions.
  • Developed numerous MapReduce jobs in Scala 2.10.x for Data Cleansing and Analyzing Data in Impala 2.1.0.
  • Created a data pipeline using processor groups and multiple processors in Apache NiFi for flat-file and RDBMS sources as part of a POC on Amazon EC2.
  • Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
  • Loaded data from different sources such as HDFS or HBase into Spark RDDs and implemented in-memory data computation to generate the output response.
  • Developed complete end-to-end big data processing in the Hadoop ecosystem.
  • The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark and to provide visualization of the ETL orchestration using the CDAP tool.
  • Implemented installation and configuration of a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
  • Ran proofs of concept to determine feasibility and carried out product evaluations of Big Data products.
  • Wrote Hive join queries to fetch information from multiple tables and wrote multiple MapReduce jobs to collect output from Hive.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Used Kafka and Storm for real-time data ingestion and processing.
  • Hands-on experience in developing integrations with Elasticsearch in multiple programming languages; knowledge of advanced reporting using Elasticsearch and Node.js.
  • Worked across AWS Cloud and on-premise environments with infrastructure provisioning/configuration.
  • Wrote Perl scripts covering data feed handling, implementing MarkLogic, and communicating with web services through the SOAP::Lite module and WSDL.
  • Used Hive to analyze data ingested into HBase via the Hive-HBase integration and computed various metrics for reporting on the dashboard.
  • Involved in developing the MapReduce framework, writing queries and scheduling MapReduce jobs.
  • Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
  • Installed and configured Hadoop and responsible for maintaining cluster and managing and reviewing Hadoop log files.
  • Developed Shell, Perl and Python scripts to automate and provide Control flow to Pig scripts.
  • Designed the Redshift data model and performed Redshift performance improvements and analysis.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Worked on configuring and managing disaster recovery and backup for Cassandra data.
  • Performed File system management and monitoring on Hadoop log files.
  • Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
  • Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
  • Implemented partitioning, dynamic partitions and buckets in Hive (see the partitioning sketch after this list).
  • Developed customized classes for serialization and deserialization in Hadoop.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Implemented a proof of concept deploying this product in Amazon Web Services (AWS).
  • Involved in migration of data from existing RDBMS (Oracle and SQL Server) to Hadoop using Sqoop for processing data.
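Illustrative sketch for the Scala UDF / DataFrame bullet above: the UDF, the column names and the staging and reporting tables are assumptions, and the Sqoop export of the result back to the RDBMS would run as a separate step (not shown).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object UdfAggregationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("udf-aggregation-sketch")
      .enableHiveSupport().getOrCreate()
    import spark.implicits._

    // Hypothetical UDF: normalize free-text region codes before aggregating
    val normalizeRegion = udf((raw: String) =>
      Option(raw).map(_.trim.toUpperCase).getOrElse("UNKNOWN"))

    val orders = spark.table("staging.orders")            // assumed Hive staging table
      .withColumn("region", normalizeRegion($"region_raw"))

    val summary = orders.groupBy($"region").count()

    // Land the result in a Hive table on HDFS; a Sqoop export job (separate
    // shell/Oozie step) would then push this table back to the RDBMS.
    summary.write.mode("overwrite").saveAsTable("reporting.orders_by_region")
    spark.stop()
  }
}
```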
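And a sketch for the Hive partitioning and bucketing bullet above, using Spark's table writer as an analogue of the Hive DDL; database, table and column names are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitioningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("hive-partitioning-sketch")
      .enableHiveSupport().getOrCreate()

    val events = spark.table("staging.events_raw")   // assumed staging table

    // Spark writes one sub-directory per distinct event_date (dynamic partitioning)
    // and hashes rows into 32 buckets by user_id for later joins and sampling.
    // Inserting into a pre-existing Hive table via insertInto() would additionally
    // need hive.exec.dynamic.partition.mode=nonstrict on the Hive side.
    events.write
      .partitionBy("event_date")
      .bucketBy(32, "user_id")
      .sortBy("user_id")
      .format("orc")
      .mode("overwrite")
      .saveAsTable("analytics.events_part")

    spark.stop()
  }
}
```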

Environment: Pig, Sqoop, Kafka, Apache Cassandra, Elasticsearch, Oozie, Impala, Cloudera, AWS, AWS EMR, Redshift, Flume, Apache Hadoop, HDFS, Hive, MapReduce, ZooKeeper, MySQL, Eclipse, DynamoDB, PL/SQL and Python.

Confidential, Seattle, WA

Sr. Big Data Engineer

Responsibilities:

  • Developed Big Data solutions focused on pattern matching and predictive modeling.
  • Implemented the Big Data ecosystem (Hive, Impala, Sqoop, Flume, Spark, Lambda) with a cloud architecture.
  • Experience in BI reporting with AtScale OLAP for Big Data.
  • Implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies such as Hadoop, Map Reduce Frameworks, HBase, Hive.
  • Designed and developed a real-time stream processing application using Spark, Kafka, Scala and Hive to perform streaming ETL and apply machine learning (see the streaming sketch after this list).
  • Experience in AWS, implementing solutions using services such as EC2, S3, RDS, Redshift and VPC.
  • Worked as a Hadoop consultant on MapReduce, Pig, Hive and Sqoop.
  • Worked with Spark and Python.
  • Worked with Apache Hadoop ecosystem components such as HDFS, Hive, Sqoop, Pig and MapReduce.
  • Identified query duplication, complexity and dependencies to minimize migration efforts. Technology stack: Oracle, Hortonworks HDP cluster, Attunity Visibility, Cloudera Navigator Optimizer, AWS Cloud and DynamoDB.
  • Designed and developed weekly and monthly reports for the marketing and financial departments using Teradata SQL.
  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Led architecture and design of data processing, warehousing and analytics initiatives.
  • Involved in business requirement analysis and technical design sessions with business and technical staff to develop end-to-end ETL and big data analytical solutions.
  • Worked with AWS to implement client-side encryption, as DynamoDB did not support encryption at rest at the time.
  • Used the DataFrame API in Scala to convert distributed collections of data into named columns.
  • Configured auto-scaling groups for applications such as Elasticsearch and Kafka to scale automatically when needed.
  • Performed data profiling and transformation on the raw data using Pig, Python, and Java.
  • Experienced with batch processing of data sources using Apache Spark.
  • Developed predictive analytics using Apache Spark Scala APIs.
  • Created Hive External tables and loaded the data into tables and query data using HQL.
  • Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
  • Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra).
  • Responsible for importing log files from various sources into HDFS using Flume
  • Worked on tools Flume, Storm and Spark.
  • Expert in performing business analytical scripts using Hive SQL.
  • Implemented continuous integration and deployment (CI/CD) through Jenkins for Hadoop jobs.
  • Worked on writing Hadoop jobs for analyzing data using Hive and Pig, accessing text-format files, sequence files and Parquet files.
  • Experience with different Hadoop distributions: Cloudera (CDH3 and CDH4), Hortonworks (HDP) and MapR.
  • Experience in integrating Oozie logs into a Kibana dashboard.
  • Extracted data from MySQL and AWS Redshift into HDFS using Sqoop.
  • Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
  • Imported millions of rows of structured data from relational databases using Sqoop, processed them with Spark and stored the data in HDFS in CSV format.
  • Developed Spark streaming application to pull data from cloud to Hive table.
  • Used Spark SQL to process large amounts of structured data.
  • Assigned names to the columns using the case class option in Scala (see the case-class sketch after this list).
  • Managed Amazon Web Services such as EC2, S3 buckets, ELB, Auto Scaling, DynamoDB and Elasticsearch.
  • Implemented Spark GraphX application to analyze guest behavior for data science segments.
  • Enhanced the traditional data warehouse based on a star schema, updated data models, and performed data analytics and reporting using Tableau.
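Illustrative sketch for the streaming ETL bullet above, using Spark Structured Streaming with the Kafka connector (spark-sql-kafka) on the classpath; the broker addresses, topic name and JSON schema are assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object StreamingEtlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("streaming-etl-sketch").getOrCreate()
    import spark.implicits._

    // Assumed JSON payload schema for the Kafka topic
    val schema = new StructType()
      .add("user_id", StringType)
      .add("event_type", StringType)
      .add("amount", DoubleType)
      .add("event_time", TimestampType)

    // Read the raw stream from Kafka (broker list and topic are placeholders)
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
      .option("subscribe", "events")
      .load()

    // Streaming ETL: parse the JSON value, filter, keep only the needed columns
    val parsed = raw.selectExpr("CAST(value AS STRING) AS json")
      .select(from_json($"json", schema).as("e"))
      .select("e.*")
      .filter($"event_type" === "purchase")

    // Land micro-batches as Parquet on HDFS, where a Hive external table can point
    val query = parsed.writeStream
      .format("parquet")
      .option("path", "/data/streams/purchases")
      .option("checkpointLocation", "/data/checkpoints/purchases")
      .start()

    query.awaitTermination()
  }
}
```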
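And a small sketch of the case-class approach to naming columns noted above; the field names and input path are assumptions.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record layout; the case class fields become the column names
case class Transaction(txnId: String, customerId: String, amount: Double)

object CaseClassColumnsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("case-class-columns-sketch").getOrCreate()
    import spark.implicits._

    // Raw CSV lines pulled in as an RDD (e.g. data previously landed by Sqoop)
    val rows = spark.sparkContext.textFile("/data/landing/transactions/*.csv")

    // Map each line onto the case class so the resulting Dataset has named columns
    val txns = rows.map(_.split(","))
      .filter(_.length == 3)
      .map(f => Transaction(f(0), f(1), f(2).toDouble))
      .toDS()

    txns.printSchema()                        // txnId, customerId, amount
    txns.groupBy("customerId").count().show()
    spark.stop()
  }
}
```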

Environment: Big Data, Spark, YARN, Hive, Kafka, Pig, Scala, Python, Hadoop, AWS, DynamoDB, Kibana, Cloudera, EMR, JDBC, Redshift, NoSQL, Sqoop, MySQL.

Confidential, Edison, NJ

Sr. Java/ Hadoop Developer

Responsibilities:

  • Transferred structured data from RDBMS to HDFS with Sqoop to prepare it for computation.
  • Involved in extracting data features through Spark ML with Python (see the feature-extraction sketch after this list).
  • Designed and developed Enterprise Eligibility business objects and domain objects with Object Relational Mapping framework such as Hibernate.
  • Developed a web-based Rich Internet Application (RIA) using Java/J2EE (Spring Framework).
  • Used the lightweight container of the Spring Framework to provide architectural flexibility through inversion of control (IoC).
  • Involved in end to end implementation of Big data design.
  • Developed and implemented new UIs using AngularJS and HTML.
  • Developed Spring configuration for dependency injection using Spring IoC and Spring controllers.
  • Implemented Spring MVC and IoC methodologies.
  • Used JNDI for naming and directory services.
  • Involved in the coding and integration of several business critical modules of application using Java, Spring, Hibernate and REST web services on WebSphere application server.
  • Experience working with big data and real time/near real time analytics and big data platforms like Hadoop, Spark using programming languages like Scala and Java.
  • Worked closely with Business Analysts in understanding the technical requirements of each project and prepared the use cases for different functionalities and designs.
  • Analyzed Business Requirements and Identified mapping documents required for system and functional testing efforts for all test scenarios.
  • Deliver Big Data Products including re-platforming Legacy Global Risk Management System with Big Data Technologies such as Hadoop, Hive and HBase.
  • Worked with MongoDB (NoSQL) and worked heavily on Hive, HBase and HDFS.
  • Developed RESTful web services using JAX-RS and used the DELETE, PUT, POST and GET HTTP methods in a Spring 3.0 and OSGi integrated environment.
  • Created scalable, high-performance web services for data tracking and performed high-speed querying.
  • Developed optimal strategies for distributing the web log data over the cluster, importing and exporting the stored web log data into HDFS and Hive using Sqoop.
  • Collected and aggregated large amounts of web log data from different sources such as web servers, mobile and network devices using Apache Flume and stored the data into HDFS for analysis.
  • Used Java Messaging Services (JMS) for reliable and asynchronous exchange of important information such as payment status report on IBM WebSphere MQ messaging system.
  • Developed presentation layer using Java Server Faces (JSF) MVC framework.
  • Participated in JAD meetings to gather the requirements and understand the End Users System.
  • Developed user interfaces using JSP, HTML, XML and JavaScript.
  • Generated XML Schemas and used XML Beans to parse XML files.
  • Modified the existing JSP pages using JSTL.
  • Developed web components using JSP, Servlets, and JDBC.
  • Developed web pages using JSPs and JSTL to help end user make online submission of rebates. Also used XML Beans for data mapping of XML into Java Objects.
  • Used Spring JDBC Dao as a data access technology to interact with the database.
  • Developed unit and E2E test cases using Node.js.
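Illustrative sketch for the Spark ML feature-extraction bullet above, written here in Scala as an analogue of the original Python work; the input table and column names are assumptions.

```scala
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
import org.apache.spark.sql.SparkSession

object FeatureExtractionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("feature-extraction-sketch")
      .enableHiveSupport().getOrCreate()

    // Assumed Hive table with one row per customer and numeric profile columns
    val customers = spark.table("analytics.customer_profile")

    // Encode a categorical column as a numeric index
    val indexer = new StringIndexer()
      .setInputCol("segment")
      .setOutputCol("segment_idx")
      .fit(customers)

    // Assemble numeric columns into a single feature vector for downstream models
    val assembler = new VectorAssembler()
      .setInputCols(Array("age", "tenure_months", "avg_monthly_spend", "segment_idx"))
      .setOutputCol("features")

    val features = assembler.transform(indexer.transform(customers))
    features.select("customer_id", "features").show(5, truncate = false)
    spark.stop()
  }
}
```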

Environment: JSP 2.1, Hadoop 1.x, Hive, Pig, HBase, JSTL 1.2, Java, J2EE, Java SE 6, UML, Servlets 2.5, Spring MVC, Hibernate, JSON, Eclipse Kepler, Maven, Serena Dimensions, Unix, JUnit, DB2, Oracle, RESTful web services, Big Data, jQuery, AJAX, AngularJS, JAXB, IRAD (WebSphere Integration Developer), WebSphere 7.0.

Confidential, Wilmington, DE

Sr. Java/ J2EE Developer

Responsibilities:

  • Developed Intranet Web Application using J2EE architecture, using JSP to design the user interfaces, and JSP tag libraries to define custom tags and JDBC for database connectivity.
  • Used JPA (Java Persistence API) with Hibernate as Persistence provider for Object Relational mapping.
  • Extensively worked on OIM Connectors like Active Directory, ED, IBM RACF, RSA, OID, OIF, Database User Management, CA Top Secret Advanced and Flat File.
  • Expertise in designing and creating RESTful APIs using Apache Solr and Spring WS; developed and modified database objects as per the requirements.
  • Implemented Log4j for the project; used ANT and Maven to compile and package the application and to automate build and deployment scripts.
  • Created POJO classes, Java beans and EJB beans, and wrote JUnit test cases to test code against the acceptance criteria throughout the development and testing phases.
  • Used JDBC to connect to the Oracle database and JNDI to look up administered objects.
  • Used Multithreading to improve the performance for processing of data feeds.
  • Developed a multi-user web application using JSP, Servlets, JDBC, Spring and the Hibernate framework to provide the needed functionality.
  • Implemented the database layer using EJB and the Java Persistence API (JPA) in maintenance projects.
  • Used IBM MQ Series messaging services for users who want to become registered users.
  • Developed the front end for Account login and Activities using Struts framework, JSP, Servlets.
  • Used JSP, JavaScript, Bootstrap, jQuery, AJAX, CSS2 and HTML for the data and presentation layers.
  • Implemented form validations across the site using jQuery and JavaScript.
  • Developed and customized UI JavaScript plugins using jQuery, object-oriented JS and JSON.
  • Generated DAO's to map with database tables using Hibernate. Used HQL (Hibernate Query Language) and Criteria for database querying and retrieval of results.
  • Involved in J2EE Design Patterns such as Data Transfer Object (DTO), DAO, Value Object and Template.
  • Developed application using JMS for sending and receiving Point-to-Point JMS Queue messages.
  • Developed web services using SOAP, WSDL and Apache Axis, which helped communicating through different modules of the application.
  • Worked with JVM, application performance and Garbage Collection tuning with JDK 1.6.
  • Generated POJO classes with JPA Annotations using Reverse Engineering.
  • Worked on XML parsing using the SAX and DOM parsers for better connectivity of JVM with the server.
  • Developed SQL Queries for performing CRUD operations in Oracle for the application.
  • Used Maven for generating system builds and Jenkins for continuous integration.
  • JUnit was used for unit testing and implementing the Test Driven Development (TDD) methodology.
  • Wrote JUNIT Test cases for Spring Controllers and Web Service Clients in Service Layer using Mockito.
  • Designed and developed the application using Waterfall methodology.
  • Using JIRA to manage the issues/project work flow.
  • Used Git as version control tools to maintain the code repository.
  • Took log data from different sources and transferred it to Hive tables, storing it in JSON, XML and sequence file formats.

Environment: Java, JSF, Spring, Hibernate, Linux shell scripting, JAX-WS, SOAP, WSDL, CSS3, HTML, JBoss, Rally, Hudson, XML, ClearCase, ClearQuest, MyEclipse, ANT, Linux, Oracle 10g database.

Confidential

Java Developer

Responsibilities:

  • Responsible for design, development, test and maintenance of applications designed on Java technologies.
  • Developed the UI using HTML, JavaScript and JSP, and developed business logic and interfacing components using Business Objects, XML and JDBC.
  • Created rapid prototypes of interfaces to be used as blueprints for technical development.
  • Developed user stories using Core Java and Spring 3.1 and consumed rest web services exposed from the profit center.
  • Used UML diagrams (use case, object, class, state, sequence and collaboration) to design the application with object-oriented analysis and design.
  • Managed connectivity using JDBC for querying/inserting & data management including triggers and stored procedures.
  • Developed JavaScript behavior code for user interaction.
  • Created database program in SQL server to manipulate data accumulated by internet transactions.
  • Wrote Servlet classes to generate dynamic HTML pages.
  • Developed SQL queries and stored procedures using PL/SQL to retrieve data from and insert data into multiple database schemas.
  • Developed the XML Schema and web services for data maintenance and structures; wrote test cases in JUnit for unit testing of classes.
  • Used DOM and DOM functions with Firefox and the IE Developer Toolbar.
  • Debugged the application using Firebug to traverse the documents.
  • Involved in developing web pages using HTML and JSP.
  • Provided technical support for production environments, resolving issues, analyzing defects, and providing and implementing solutions for defects.
  • Involved in writing SQL queries and stored procedures, and used JDBC for database connectivity with MySQL Server.
  • Developed the presentation layer using CSS and HTML taken from Bootstrap to build browser-friendly pages.
  • Used Oracle 10g as the backend database using UNIX OS.
  • Used JSP, HTML, JavaScript, AngularJS and CSS3 for content layout and presentation.
  • Did core Java coding using JDK 1.3, the Eclipse Integrated Development Environment (IDE), ClearCase and ANT.
  • Used Spring Core and the Spring Web framework; created numerous backend classes.

Environment: Java, XML, HTML, JavaScript, JDBC, CSS, SQL, PL/SQL, Web MVC, Eclipse, Ajax, jQuery, Spring with Hibernate, ActiveMQ, Jasper Reports, Ant as build tool, MySQL, Apache Tomcat
