- Experienced Hadoop Developer with 8+ years of IT experience and a strong background in Big Data, file systems, data management and analysis, and Java-based enterprise applications built with Java/J2EE technologies.
- Expertise in installing, configuring, testing and using the Apache Hadoop 2.7.x framework and its ecosystem components, including HDFS, MapReduce, Sqoop 1.4.6, Flume 1.7.0, Hive 2.1.1, Pig 0.16.0, Spark 2.1.0, Scala 2.12.0, Kafka 0.10.1, YARN, Oozie 3.1.3 and ZooKeeper 3.4.10.
- Experience with and strong knowledge of implementing Spark Core, Spark Streaming and Spark SQL.
- Developed applications in Spark 2.1.0 using Scala 2.12.0 to compare the performance of Spark with Hive.
- Implemented POCs and developed pipelines using Kafka 0.10.1, Spark Streaming and Spark SQL.
- Extensive experience in Spark 2.1.0 transformations using Scala 2.12.0 and Spark SQL for faster testing and processing of data files.
- Hands-on experience with real-time processing using Spark modules, Spark RDDs and the Dataset API in Scala 2.12.0.
- Hands-on experience writing MapReduce programs to analyze unstructured data.
- Experience in importing and exporting data using Sqoop, from HDFS to relational database systems (RDBMS) and from RDBMS to HDFS.
- Expert in writing Pig Latin scripts and HiveQL queries to process and analyze data.
- Experience in Spark, with in-depth knowledge of Spark SQL, RDDs, lazy transformations and actions.
- Worked on Apache Kafka 0.10.1 and Flume for handling megabytes of streaming data.
- Transformed data in various formats, such as SequenceFile, RCFile, ORC, Parquet, JSON and Avro, and worked with compression codecs such as Gzip and Snappy.
- Deep knowledge of batch job scheduling workflows using Oozie 4.2.0 and ZooKeeper 3.4.10.
- Understanding of Amazon Web Services stack and hands-on experience in using S3, EC2 and EMR.
- Worked with different NoSQL databases such as Cassandra 3.10, HBase 1.3.0 and MongoDB.
- Experience with relational databases including Oracle, SQL Server and MySQL, and in writing complex SQL queries and PL/SQL stored procedures, triggers and sequences.
- Well versed in data visualization tools such as Tableau 9.3 and D3.js.
- Expertise in Core Java, Data Structures, Multithreading, JDBC, J2EE, Algorithms, Object Oriented Design (OOD) and Exception Handling and frameworks like Spring MVC and Hibernate.
- Produced and consumed SOAP and RESTful web services, and developed Hadoop applications on Spark using Scala as a functional and object-oriented programming language.
- Experienced in writing ANT and Maven scripts to build and deploy Java applications.
- Experienced in TDD (Test-Driven Development) and SDLC methodologies such as Agile (Scrum).
- Thorough knowledge of development environments and tools such as Maven, Git, JIRA 6.4, Jenkins and Confluence.
- Excellent understanding and knowledge of Hadoop architecture and its daemons, such as NameNode, DataNode, JobTracker, TaskTracker and ResourceManager, and of the MapReduce programming paradigm.
- Knowledge of Cyber Security concepts like cryptography, Access Control, Data Security in Linux, Unix.
- Expertise in creating Hive internal and external tables and views using a shared metastore, writing HiveQL scripts, and performing data transformation and file processing with Pig Latin scripts.
- Experience in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology; good knowledge of J2EE and Core Java design patterns.
- Excellent presentation, documentation and communication skills; detail-oriented and eager to learn new technologies, with a cooperative, team-focused attitude and strong analytical and problem-solving skills.
Hadoop Ecosystem: Hadoop 2.7.x, Spark 2.1.0, MapReduce, Hive 2.1.1, Sqoop 1.99.7, Kafka 2.1.0, Oozie 3.1.3, YARN 0.21.3, Pig 0.16.0, Flume 1.7.0, Ambari
Database: Spark SQL, Cassandra 3.10, HBase 1.3.0, MongoDB, Oracle 12c/11g/8i, MySQL 5.x, PL/SQL
Amazon Web Services: CloudWatch, CloudFormation, IAM, CloudTrail, Glacier, Storage Gateway
Testing: JUnit, MRUnit, Pytest, ScalaTest
Data Analysis & Visualization: Tableau 9.3, D3.js
Web Services: SOAP, RESTful
Environment: WinSCP, PuTTY, SQL Developer, Agile
Code/Build/Deployment: Git, SVN, Maven, Jenkins, Jira 6.4, Confluence
Operating Systems: Mac OS, Ubuntu, CentOS, Windows
Tools: Hive, MongoDB, Spark, Tableau, Cloudera, Hue
Confidential, Smithfield, RI
Big Data Hadoop Developer
- Created Hive tables to store processed results in a tabular format, optimized them with partitioning and bucketing to improve HiveQL query performance, and created custom user-defined functions.
- Designed and implemented Sqoop incremental and delta imports from Oracle on tables without primary keys or date columns, appending directly into the Hive warehouse.
- Imported data from different sources such as HDFS and HBase into Spark RDDs and performed computations using Scala to generate the output response.
- Wrote Spark shell code for an audit report that brings all unique fields under metadata.
- Implemented HQL scripts for daily data loading and further aggregations.
- Scheduled Oozie workflows and jobs to automate daily runs of Unix scripts per team requirements.
- Cleaned up old data and purged the database to keep the data in the system accurate.
- Involved in building the ETL architecture and Target mapping to load data into Data Lake.
- Developed Workflows with the help of Oozie to manage the flow of jobs and wrote Custom Expression Language (EL) functions for complex workflows.
- Worked on the monthly or biweekly install process for new sets of reports on new or existing tables, validating all data moving from the working directory to HDFS.
- Involved in the complete Software Development Life Cycle (SDLC) phases of the project as a part of Agile scrum methodology.
- Loaded datasets from MySQL into HDFS and Hive on a daily basis.
- Worked with Apache Hue as a central web admin user to query Hive and Impala and to schedule job workflows.
- Assisted in the installation and configuration of Apache Hadoop (CDH) clusters and Hadoop tools for application development, including HDFS, Hue, YARN, Sqoop, Impala and Hive.
- Used Cloudera Manager for continuous management and monitoring of the Hadoop cluster.
Environment: Spark 2.3.0, Scala 2.13.0, Sqoop 1.4.6, Impala 2.2.0, Oracle, Hive 2.3.3, HDFS, Spark SQL, CDH 5.16.1, HiveQL, Hue 4.2.0, YARN 1.12.3, Oozie 4.3.0, Agile.
Confidential, New York, NY
Big Data Analytics Solutions Developer
- Transformed, staged and stored data using Spark, including writing Spark applications in Scala.
- Developed Spark core and Spark SQL scripts using Scala for faster data processing.
- Developed a data pipeline using Kafka and Storm to store data into HDFS.
- Developed Spark code using Scala and Spark SQL for faster testing and data processing.
- Set up and managed Kafka and ZooKeeper for stream processing.
- Developed Kafka consumers in Scala using the consumer API to consume data from Kafka topics.
- Configured, deployed and maintained multi-node Dev and Test Kafka Clusters.
- Developed Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing.
- Stored real-time data from Spark in HBase for future analysis.
- Involved in implementing High Availability and automatic failover infrastructure to overcome the NameNode single point of failure, utilizing ZooKeeper services.
- Involved in analyzing data coming from various sources and creating meta-files and control files to ingest the data into the Data Lake.
- Generated reports for analytics development through Zeppelin.
- Developed Scala code with Spark Streaming for faster testing and processing of data.
- Worked with Spark RDDs and DataFrames for sessionization and other transformations.
- Used Apache Avro to transform data between different formats.
- Streamed log files into HDFS using Flume and loaded them into Hive tables to query the data.
- Created multiple Hive tables, implemented partitioning, dynamic partitioning and bucketing in Hive for efficient data access.
- Performed daily ad-hoc data analysis and pulled data from Hadoop using Hive (HiveQL).
- Worked closely with team members, managers and other teams in an Agile Scrum environment.
Environment: Kafka 0.10.1, Spark 2.1.0, Scala 2.12.0, Sqoop 1.4.6, Avro, MySQL 5.x, HBase 1.3.0, Zeppelin, Hive 2.1.1, HDFS, Spark SQL, Flume 1.7.0, HiveQL, ZooKeeper, Agile.
Confidential, New York, NY
- Worked on performance optimization of existing Hadoop algorithms using SparkContext, Spark SQL and Spark on YARN with Scala.
- Responsible for importing and exporting data between Oracle 11g/MySQL and HDFS/Hive using Sqoop.
- Developed and maintained several batch jobs that run automatically depending on business requirements.
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in loading data from edge node to HDFS using shell scripting.
- Implemented partitioning, dynamic partitioning and bucketing in Hive.
- Developed Pig scripts using Pig Latin and worked on tuning the performance of Pig queries.
- Involved in managing and reviewing Hadoop log files.
- Created HBase tables to store variable data formats of PII data coming from different portfolios.
- Implemented test scripts to support test driven development and continuous integration.
- Exported the analyzed data to relational databases such as MySQL using Sqoop, for visualization and to generate reports for the BI team.
- Created MapReduce programs using the Java API that filter unnecessary records and find unique records based on different criteria.
- Implemented POCs to migrate iterative MapReduce programs to Spark transformations and actions using Scala.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Worked in a Test-Driven Development environment and used Confluence for documentation.
- Performed unit testing using MRUnit.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Used Jira for project tracking, bug tracking and project management.
Environment: Spark 2.1.0, Apache Hadoop, MapReduce, HDFS, Hive 2.1.1, Java, Pig 0.16.0, Sqoop 1.4.6, MRUnit, Oozie 3.1.3, HBase 1.3.0, TDD, MySQL 5.x, Oracle 11g
- Used Spring Web Flow for the MVC pattern.
- Used Apache Tiles for JSP page fragments for various flows.
- Used SOAP web services to send data to newly published services for a given WSDL file.
- Developed the GUI using a custom JSTL tag library and used AJAX calls for client-side HTTP requests.
- Followed Agile methodology and attended regular Scrum meetings.
- Involved in creating Hibernate 3.0 POJOs and mapping them using Hibernate annotations.
- Used Hibernate, an object/relational mapping (ORM) solution, to map the data representation from the MVC model to the Oracle relational data model with a SQL-based schema.
- Provided support to users for all service components and helped them with production issues.
- Involved in designing test plans, test cases and overall Unit testing of the system using JUnit.
Environment: Java 1.7, JSP, Eclipse, JUnit, Hibernate 3.0, Oracle, Maven, RESTful, Git, Scrum, Spring Web Flow, SQL
- Implemented the application on an MVC architecture using the open-source Struts framework.
- Used UML to create class, sequence and state diagrams, drawn in Microsoft Visio.
- Developed server-side validations using the Struts Validation Framework.
- Created a Transaction History Web Service using SOAP that is used for internal communication in the workflow process.
- Used core Java concepts in the application, such as multithreaded programming and thread synchronization with the wait, notify and join methods.
- Developed Data Access Objects for accessing Relational Database.
- Designed test cases for unit testing with the help of JUnit.
- Created database connections using JDBC classes to interact with an Oracle 9i database.