Big Data Engineer Resume
Newark, NJ
SUMMARY
- Over 7 years of IT experience, including 3 years in the Big Data ecosystem and 4 years in Java EE application development.
- Extensive working experience in the Finance, Banking, and Entertainment domains.
- Experienced in developing big data applications that process terabytes of data with the Hadoop ecosystem (HDFS, MapReduce, Spark, Sqoop, Kafka, Hive, Pig, Oozie); in-depth knowledge of the MR1 (classic) and MR2 (YARN) frameworks.
- Developed data warehouses, data lakes, and analytics solutions on Big Data platforms.
- Experienced with streaming components such as NiFi, Flume, Kafka, and Storm.
- Experienced in extract, transform, load (ETL) of data from multiple federated data sources using Spark DataFrames.
- Experience with NiFi, specifically developing custom processors and workflows.
- Experienced in Spark with Scala and Spark SQL for processing data files.
- Extensive experience in writing MapReduce jobs with the Java API to parse and analyze unstructured data.
- Applied partitioning and bucketing in Hive and designed both managed and external tables to optimize performance (a sketch follows this summary).
- Experienced in writing custom Hive UDFs to incorporate business logic into Hive queries.
- Extensive experience in writing Pig Latin scripts and HiveQL/Impala queries to process and analyze large volumes of data with varying levels of structure.
- Extensive knowledge of creating PL/SQL stored procedures, packages, functions, and cursors against Oracle (9i, 10g, 11g, 12c) and MySQL 5.0.
- Worked on NoSQL databases such as HBase 0.98 and Cassandra 3.2.
- Very good understanding of Cassandra cluster mechanics, including replication strategies, snitches, gossip, consistent hashing, and consistency levels.
- Experienced with the Oozie workflow engine to automate and parallelize Hadoop MapReduce, Hive, and Pig jobs.
- Worked with different file formats such as Text, SequenceFile, Avro, ORC, and Parquet.
- Strong in core Java, data structures, algorithm design, Object-Oriented Design (OOD), and Java facilities such as the Collections Framework, exception handling, the I/O system, and multithreading.
- Extensive knowledge of data mining algorithms such as decision trees, regression, classification, K-Means clustering, and ANOVA tests using libraries in R and Python.
- Experienced in SAS reporting procedures such as PROC REPORT, PROC FREQ, PROC TABULATE, PROC MEANS, PROC SUMMARY, PROC PRINT, and PROC SQL.
- Experienced with the Hortonworks HDP 2.2 distribution.
- Extensive knowledge of DevOps practices: continuous inspection, continuous integration, and continuous deployment (CI/CD).
- Experienced with the Docker platform for application development and testing.
- Extensive experience in unit testing with JUnit, ScalaTest, and Pytest in a Test-Driven Development (TDD) environment.
- Worked with development and DevOps tools such as Git, JIRA, and Jenkins.
- Experienced in Agile/Scrum and Waterfall methodologies.
- A good team player and self-motivated learner who works independently in a fast-paced, multitasking environment.
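The Spark DataFrame ETL and Hive partitioning points above are illustrated by the following minimal sketch. It assumes a Hive-enabled Spark session; the JDBC endpoint, HDFS paths, and table and column names are all hypothetical.

    import org.apache.spark.sql.SparkSession

    object EtlSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("federated-etl-sketch")
          .enableHiveSupport() // needed for the Hive DDL below
          .getOrCreate()

        // Extract: one federated source over JDBC, one already on HDFS.
        val orders = spark.read.format("jdbc")
          .option("url", "jdbc:mysql://db-host:3306/sales") // hypothetical
          .option("dbtable", "orders")
          .load()
        val events = spark.read.parquet("hdfs:///data/raw/events")

        // Transform: join and project the columns needed downstream.
        val curated = orders.join(events, "order_id")
          .select("order_id", "order_date", "amount", "region")

        // Load: Parquet partitioned by region, exposed to Hive as an
        // external table. (A managed, bucketed table would instead add
        // CLUSTERED BY (order_id) INTO 16 BUCKETS and load through Hive.)
        curated.write.partitionBy("region")
          .parquet("hdfs:///data/curated/orders")
        spark.sql("""
          CREATE EXTERNAL TABLE IF NOT EXISTS curated_orders (
            order_id BIGINT, order_date STRING, amount DOUBLE)
          PARTITIONED BY (region STRING)
          STORED AS PARQUET
          LOCATION 'hdfs:///data/curated/orders'
        """)
        spark.sql("MSCK REPAIR TABLE curated_orders") // discover partitions
        spark.stop()
      }
    }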
TECHNICAL SKILLS
Big Data Technologies: HDFS, Spark 2.1.0, MapReduce V1/V2, Sqoop 1.4.5, Flume 1.4.0, Zookeeper 3.4.6, Oozie 4.0.1, Kafka 0.8.0, Hive 1.2.4, Pig 0.14.0
Utilities/Tools: Eclipse, Tomcat, NetBeans, JUnit, SQL, GitHub, Log4j, Tiles, SOAP UI, ANT, Maven, QTP Automation, MR-Unit, JIRA
Programming Languages: Java, Scala, Python, R, SAS, C, C++
Operating Systems: Unix, Linux, Windows XP/7/8/10, Mac OS
Databases/RDBMS: Cassandra 3.2, HBase 0.98, Oracle 11g/10g/9i, MySQL 5.0
Scripting/Web Languages: JavaScript, HTML5, CSS3, XML, SQL, Shell, WSDL, XSL
Environment: Agile, Jenkins, Waterfall, Spiral
Office Tools: MS-Office, MS-Project, and Risk Analysis tools
PROFESSIONAL EXPERIENCE
Confidential, Newark, NJ
Big Data Engineer
Responsibilities:
- Created the high-level design for a Data Lake to ingest, store, and process data.
- Used a variety of ETL/ELT methods to transfer data between data warehouse and data lake sources.
- Developed multiple Kafka producers and consumers per software requirements.
- Configured Spark Streaming to receive real-time data and store it in a NoSQL database (Cassandra) and HDFS (see the streaming sketch after this list).
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Performed various Spark transformation and action operations in Scala.
- Implemented checkpoints that persist RDDs to disk for fault tolerance, and reviewed log files.
- Involved in analytics and visualization of log data, estimating the error rate and studying the probability of future errors using regression models.
- Created Hive tables over Parquet files using the Scala API and registered them in the Hive Metastore.
- Used Hive to analyze the partitioned and bucketed data and to compute various metrics for reporting.
- Imported and exported data into HDFS with Sqoop, including incremental loads, transferring data between relational databases and the Hadoop system.
- Performed unit testing using ScalaTest and JUnit with Test-Driven Development (TDD).
- Worked with the data science team on improving models with machine learning algorithms such as decision trees, linear regression, multivariate regression, and K-Means in Spark using the MLlib API (see the clustering sketch after this list).
- Worked with YARN, Zookeeper, and Oozie to manage job workflow and coordination in the cluster.
- Implemented an automated continuous integration (CI) pipeline with Jenkins to build, test, analyze, and deploy applications.
- Applied Jenkins workflows and continuous delivery (CD) to ship better software.
- Troubleshot issues, made recommendations, and delivered on those recommendations.
- Used Git for version control, JIRA for project tracking, and Jenkins for continuous integration.
- Followed Agile/Scrum practices such as daily status meetings.
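A minimal consumer-side sketch of the Kafka-to-Cassandra pipeline described above, with checkpointing for fault tolerance. It assumes the spark-streaming-kafka-0-8 integration and the DataStax spark-cassandra-connector; broker, topic, keyspace, and table names are hypothetical.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils
    import kafka.serializer.StringDecoder
    import com.datastax.spark.connector._
    import com.datastax.spark.connector.streaming._

    object StreamSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("kafka-to-cassandra-sketch")
          .set("spark.cassandra.connection.host", "cassandra-host") // hypothetical
        val ssc = new StreamingContext(conf, Seconds(5))
        ssc.checkpoint("hdfs:///checkpoints/stream-sketch") // fault tolerance

        // Direct stream against Kafka 0.8-era brokers.
        val kafkaParams = Map("metadata.broker.list" -> "kafka-host:9092")
        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, Set("events"))

        // Parse "id,value" messages, dropping anything malformed.
        val parsed = stream.map(_._2.split(",")).flatMap {
          case Array(id, value) => Some((id, value))
          case _                => None
        }
        parsed.saveToCassandra("analytics", "events", SomeColumns("id", "value"))
        parsed.saveAsTextFiles("hdfs:///data/stream/events") // raw copy on HDFS

        ssc.start()
        ssc.awaitTermination()
      }
    }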
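A second sketch covers the clustering work with the data science team: K-Means over log-derived features via the Spark ML API (part of MLlib). The input path, feature columns, and choice of k are hypothetical.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.ml.clustering.KMeans
    import org.apache.spark.ml.feature.VectorAssembler

    object KMeansSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("kmeans-sketch").getOrCreate()
        // Hypothetical error-log metrics; in practice these come from parsed logs.
        val logs = spark.read.parquet("hdfs:///data/curated/log_metrics")
        // Assemble numeric columns into the single features vector MLlib expects.
        val assembler = new VectorAssembler()
          .setInputCols(Array("error_rate", "latency_ms"))
          .setOutputCol("features")
        val features = assembler.transform(logs)
        // Cluster into k groups; k = 4 is chosen arbitrarily for illustration.
        val model = new KMeans().setK(4).setSeed(42L).fit(features)
        model.clusterCenters.foreach(println)
        spark.stop()
      }
    }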
Environment: Kafka 0.8.0, NiFi 1.1, Cassandra 3.2, Spark Streaming, Scala, Sqoop, HDFS, Hive 1.2.4, Oozie, Zookeeper, Git, JIRA, Jenkins, DevOps
Confidential, Jersey City, NJ
Hadoop Developer
Responsibilities:
- Developed data pipelines to consume data from the Enterprise Data Lake for analytics solutions.
- Used Flume to collect, aggregate, and store the log data from different web servers.
- Developed Sqoop Scripts to extract data from relational databases onto HDFS.
- Worked on developing MapReduce programs in Java for data cleaning and data processing.
- Involved in managing and reviewing Hadoop log files to identify issues when jobs fail.
- Used Sqoop to import and export data from HDFS and Hive.
- Created Hive tables, loaded data into them, wrote Hive queries, and developed customized User-Defined Functions (UDFs) in Java (a UDF sketch follows this list).
- Created partitions and buckets for further processing with Hive and ran the scripts in parallel to improve performance.
- Involved in data visualization: provided the files required by the team by analyzing the data in Hive, and developed Pig scripts for advanced analytics on the data.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as MapReduce, Hive, and Sqoop, as well as system-specific jobs.
- Used JUnit for debugging, testing, and maintaining the system state.
- Used Git for collaboration and version control.
- Followed Agile methodologies such as Scrum, including participating in daily stand-up meetings.
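A minimal sketch of the Hive UDF pattern mentioned above. The original UDFs were written in Java; the same org.apache.hadoop.hive.ql.exec.UDF contract is shown here in Scala, and the class, function, and column names are hypothetical.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Normalizes a code column: trim whitespace and upper-case it.
    class NormalizeCode extends UDF {
      def evaluate(input: Text): Text =
        if (input == null) null
        else new Text(input.toString.trim.toUpperCase)
    }

Once packaged, such a UDF would be registered in Hive with ADD JAR followed by CREATE TEMPORARY FUNCTION normalize_code AS 'NormalizeCode', then used like any built-in function in HiveQL.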
Environment: Hadoop 1.2.1, Java JDK 1.6, MapReduce V2, Pig 0.13.0, Hive 1.2.4, Oozie 4.0.1, Sqoop 1.4.5, Flume 1.4.0, DB2, Git, Jenkins
Confidential
Big Data Engineer
Responsibilities:
- Worked on Apache Hadoop tools like Hive, Pig, HBase and Sqoop for application development and unit testing.
- Wrote MapReduce jobs to discover trends in data usage by users.
- Involved in connecting to relational databases using Sqoop.
- Involved in creating Hive tables, loading data, and writing Hive queries in HiveQL.
- Involved in partitioning and joining Hive tables for Hive query optimization.
- Used NoSQL (HBase) for faster performance, maintaining the data in a de-normalized form for OLTP.
- Collected data from distributed sources into Avro models, applied transformations and standardizations, and loaded it into HBase for further processing (see the HBase sketch after this list).
- Used Oozie to orchestrate the workflow.
- Exported the analyzed data to relational databases using Hive for visualization and to generate reports for the BI team.
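A sketch of the denormalized HBase load described above, using the 0.98-era client API; the table name, column family, and row contents are hypothetical.

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.{HTable, Put}
    import org.apache.hadoop.hbase.util.Bytes

    object HBaseLoadSketch {
      def main(args: Array[String]): Unit = {
        val conf = HBaseConfiguration.create()
        val table = new HTable(conf, "customer_profile") // hypothetical table
        // Denormalized row: every field a read needs lives under one row
        // key, so an OLTP lookup is a single get by customer id.
        val put = new Put(Bytes.toBytes("cust#1001"))
        put.add(Bytes.toBytes("d"), Bytes.toBytes("name"), Bytes.toBytes("Alice"))
        put.add(Bytes.toBytes("d"), Bytes.toBytes("last_order"), Bytes.toBytes("2015-03-02"))
        table.put(put)
        table.close()
      }
    }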
Environment: Hadoop, Linux, MapReduce V1, HDFS, Hive 0.11.0, Pig 0.10.1, Sqoop 1.2.0, Shell Scripting
Confidential
Java Developer
Responsibilities:
- Designed and developed UI search and results screens for legal professionals and legal organizations using JSP, JavaScript, HTML, and CSS.
- Developed multiple formatting and validation utilities in Java, JavaScript functions, and CSS style sheets so they could be reused across the application.
- Worked with HTML/DHTML and JavaScript for GUI development, building rich user interfaces with client-side validation of user data.
- Designed and prepared unit test cases using JUnit and EasyMock; performed code reviews to check Sun Java coding standards and to identify duplicate code, object or component complexity, dependencies, etc.
- Involved in writing SQL, stored procedures, and PL/SQL for the back end; used views and functions on the Oracle database side (a JDBC sketch follows this list).
- Wrote SQL queries, stored procedures, and database triggers as required on the database objects.
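A sketch of calling one of the PL/SQL stored procedures described above from application code via JDBC, written in Scala for consistency with the other sketches; the connection string, credentials, and procedure name are hypothetical.

    import java.sql.DriverManager

    object StoredProcSketch {
      def main(args: Array[String]): Unit = {
        // Hypothetical Oracle connection and procedure.
        val conn = DriverManager.getConnection(
          "jdbc:oracle:thin:@db-host:1521:orcl", "app_user", "secret")
        try {
          // Call a PL/SQL procedure with one IN and one OUT parameter.
          val call = conn.prepareCall("{ call update_case_status(?, ?) }")
          call.setLong(1, 12345L)                       // IN: case id
          call.registerOutParameter(2, java.sql.Types.VARCHAR)
          call.execute()
          println(s"new status: ${call.getString(2)}")  // OUT: resulting status
          call.close()
        } finally conn.close()
      }
    }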
Environment: Java, XML, SQL Server, Maven 2, JUnit, J2EE (JSP, JavaBeans, DAO), Eclipse, Apache Tomcat Server, Spiral Methodology
Confidential
Jr. Java Developer
Responsibilities:
- Prepared the Requirements Specification Document (RSD) and high-level technical documents.
- Created use case diagrams, sequence diagrams, functional specifications, and user interface diagrams using StarUML.
- Participated in design, development and testing phases.
- Used various core Java concepts such as exception handling and the Collections API to implement various features and enhancements.
- Involved in the complete requirement analysis, design, coding, and testing phases of the project.
- Developed user interfaces using JSP, HTML, XML and JavaScript.
- Developed parts of the user interface using core Java, HTML/JSP, and client-side validations in JavaScript.
- Tested method-level and class-level functionality using JUnit (a test sketch follows this list).
- Used SVN as a repository for managing/deploying application code.
- Involved in database design and development; created SQL scripts and stored procedures for efficient data access.
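A sketch of the class-level JUnit testing mentioned above, written JUnit 4-style in Scala for consistency with the other sketches; the helper under test is hypothetical.

    import org.junit.Assert.assertEquals
    import org.junit.Test

    // Hypothetical helper under test: converts MM/dd/yyyy to ISO yyyy-MM-dd.
    object DateFormatter {
      def toIso(s: String): String = {
        val Array(m, d, y) = s.split("/")
        s"$y-$m-$d"
      }
    }

    class DateFormatterTest {
      @Test
      def formatsIsoDates(): Unit =
        assertEquals("2015-03-02", DateFormatter.toIso("03/02/2015"))
    }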
Environment: Java 1.3, UML, JSP, JavaMail API, JavaScript, HTML, MySQL 2.1, Swing, Java Web Server 2.0
