- Over 7 years of experience as a Hadoop/Spark Developer in all phases of software application development, including requirement analysis, design, development, and maintenance of Hadoop/Big Data applications and web applications using Java/J2EE technologies, with domain experience in Finance, Healthcare, Insurance, Retail, and Telecom.
- Strong knowledge of the Software Development Life Cycle (SDLC) and the role of a Hadoop/Spark developer in development methodologies such as Agile and Waterfall.
- Expertise in all components of the Hadoop ecosystem: Hive, Pig, HBase, Impala, Sqoop, Hue, Flume, ZooKeeper, Oozie, and Apache Spark.
- In-depth understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, and DataNode, as well as MapReduce concepts; experienced in writing MapReduce programs with Apache Hadoop to analyze large data sets efficiently.
- Hands-on experience with the YARN (MapReduce 2.0) architecture and its components (ResourceManager, NodeManager, Container, ApplicationMaster) and with the execution of MapReduce jobs.
- Hands-on experience designing and developing Spark applications in Scala and comparing Spark's performance with Hive.
- Experienced in integrating Kafka with Spark Streaming for high-speed data processing.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data.
- Experience working with Spark DataFrames.
- Experience in collecting log data from different sources (web servers and social media) using Flume and Kafka, and storing it in HDFS for MapReduce processing.
- Worked with data serialization formats (Avro, Parquet, CSV) for converting complex objects into sequences of bytes.
- Strong knowledge of Pig and Hive analytical functions; extended Hive and Pig core functionality by writing custom UDFs.
- Expertise in developing Pig Latin scripts and Hive Query Language (HiveQL) for data analytics.
- Well-versed in Hive partitioning, dynamic partitioning, and bucketing, and applied these concepts to compute data metrics.
- Integrated BI tools such as Tableau with Impala and analyzed the data.
- Experience with NoSQL databases like HBase, MongoDB and Cassandra.
- Experience importing and exporting data between HDFS and relational/non-relational database systems using Sqoop.
- Used the Oozie job scheduler to schedule MapReduce jobs and automate job flows, and implemented cluster coordination services using ZooKeeper.
- Reviewed the HDFS usage and system design for future scalability and fault-tolerance.
- Experienced in working with Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
- Knowledge in designing and creating analytical reports and automated dashboards that help users identify critical KPIs and facilitate strategic planning in the organization.
- Experience in working with different relational databases like MySQL, MS SQL and Oracle.
- Strong experience in database design and in writing complex SQL queries and stored procedures.
- Expertise in all phases of software development, including analysis, design, development, and deployment of applications using Servlets, JSP, JavaBeans, Struts, the Spring Framework, and JDBC.
- Experienced with development environments such as Eclipse and NetBeans.
- Proficient in software documentation and technical report writing.
- Versatile team player with good communication, analytical, presentation, and interpersonal skills.
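The custom Hive UDF work above typically wraps a small pure function; a minimal sketch in Java, where the masking rule and names are hypothetical examples rather than details from an actual project:

```java
// Core logic a custom Hive UDF might wrap. On a real cluster this method
// would be called from a class extending org.apache.hadoop.hive.ql.exec.UDF;
// keeping the rule as a pure static method makes it unit-testable off-cluster.
// The masking rule itself is a hypothetical example.
public class MaskUdfCore {
    // Hide all but the last four characters of an account id.
    public static String maskAccount(String id) {
        if (id == null || id.length() <= 4) {
            return id;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < id.length() - 4; i++) {
            sb.append('*');
        }
        return sb.append(id.substring(id.length() - 4)).toString();
    }
}
```

Keeping the logic separate from the UDF wrapper lets it be tested with plain JUnit before it ever touches the cluster.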
Big Data Ecosystem: HDFS, MapReduce, Pig, Hive, Impala, YARN, Hue, Oozie, ZooKeeper, Apache Spark, Apache Storm, Apache Kafka, Sqoop, Flume
Operating Systems: Windows, Ubuntu, RedHat Linux, Unix
Programming Languages: C, C++, Java, Python, Scala
Scripting Languages: Shell scripting, JavaScript
Databases: Oracle 11g/10g/9i, MySQL, DB2, MS SQL Server, SQL, PL/SQL, Teradata
NoSQL Databases: HBase, Cassandra, MongoDB
Hadoop Distributions: Cloudera, Hortonworks
Build Tools: Ant, Maven, sbt
Development IDEs: NetBeans, Eclipse IDE
Web Servers: WebLogic, WebSphere, Apache Tomcat 6
Version Control Tools: SVN, Git, GitHub
Packages: Microsoft Office, PuTTY, MS Visual Studio
Sr. Hadoop/Spark Developer
- Developed a data pipeline using Kafka, Sqoop, Hive, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Developed Sqoop scripts for importing and exporting data between HDFS and Hive.
- Developed design documents considering all possible approaches and identifying the best one.
- Developed services to run MapReduce jobs as required.
- Responsible for managing data coming from different sources.
- Developed business logic using Scala.
- Responsible for loading data from UNIX file systems into HDFS; installed and configured Hive and wrote Pig/Hive UDFs.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Wrote MapReduce programs to convert text files into Avro and load them into Hive tables.
- Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
- Developed scripts and automated end-to-end data management and synchronization between all the clusters.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop.
- Imported data from different sources such as HDFS and HBase into Spark RDDs.
- Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Implemented the workflows using Apache Oozie framework to automate tasks.
- Imported results into the BI visualization tool Tableau to create dashboards.
- Worked in Agile methodology and used JIRA to maintain project stories.
- Involved in requirements gathering, design, development, and testing.
Environment: Hadoop, MapReduce, Hive, Java, Maven, Impala, Pig, Spark, Oozie, Oracle, YARN, GitHub, Tableau, Unix, Cloudera, Kafka, Sqoop, Scala, HBase.
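Converting a Hive GROUP BY into Spark transformations, as described above, follows a map-then-reduceByKey shape. This sketch runs the same key/sum logic on plain Java collections so it works without a cluster; the field names (account, amount) are illustrative assumptions, not the project's actual schema:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Shape of "SELECT account, SUM(amount) FROM txns GROUP BY account" after
// conversion to Spark: map each row to (key, value), then reduce by key.
// Here the reduceByKey step is mimicked with Map.merge.
public class TxnAgg {
    // rows: each element is {account, amount-as-string}, like a parsed CSV line.
    public static Map<String, Double> totalsByAccount(List<String[]> rows) {
        Map<String, Double> totals = new HashMap<>();
        for (String[] row : rows) {
            String account = row[0];
            double amount = Double.parseDouble(row[1]);
            // reduceByKey equivalent: merge values sharing the same key.
            totals.merge(account, amount, Double::sum);
        }
        return totals;
    }
}
```

On Spark itself this would be `rdd.map(r => (r.account, r.amount)).reduceByKey(_ + _)`; the merge-by-key logic is identical.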
Confidential, St. Louis, MO
Sr. Hadoop Developer
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data.
- Adept in the complete implementation lifecycle; specialized in writing custom MapReduce, Pig, and Hive programs.
- Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest claim data and financial histories into HDFS for analysis.
- Worked on importing data from HDFS to a MySQL database and vice versa using Sqoop.
- Extensive experience in writing HDFS commands and Pig Latin scripts.
- Developed UDFs to provide custom Hive and Pig capabilities and applied business logic to the data.
- Created Hive internal/external tables with proper static and dynamic partitions.
- Used Hive to analyze unified historical data in HDFS to identify issues and behavioral patterns.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Performed Hive performance tuning using partitioning and bucketing of tables.
- Experience with NoSQL databases such as HBase.
- Created HBase tables and loaded large data sets coming from Linux, NoSQL, and MySQL sources.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Capable of using AWS utilities such as EMR, S3, and CloudWatch to run and monitor Hadoop/Spark jobs on AWS.
- Installed the Oozie workflow engine to run multiple MapReduce, Hive, ZooKeeper, and Pig jobs that run independently based on time and data availability.
- Experienced in requirements gathering and test-plan creation; constructed and executed positive/negative test cases to catch all bugs promptly within the QA environment.
Environment: HDFS, MapReduce, CDH5, Hive, Pig, HBase, Sqoop, Flume, Oozie, ZooKeeper, AWS, MySQL, Java, Linux shell scripting, XML.
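The claim-ingestion pipeline above implies a cleaning step where malformed rows are dropped rather than failing the job. A minimal sketch of that filter in Java; the three-field claim layout is a hypothetical example, not the actual record format:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the record-cleaning step a Pig/MapReduce pipeline performs before
// claim data is loaded into Hive: rows with the wrong arity or a non-numeric
// amount are skipped instead of aborting the whole job.
public class ClaimCleaner {
    // Keeps only rows of the form "claimId,memberId,amount" with a numeric amount.
    public static List<String[]> cleanRows(List<String> lines) {
        List<String[]> good = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split(",", -1);
            if (fields.length != 3 || fields[0].isEmpty()) {
                continue; // wrong arity or missing id: drop the record
            }
            try {
                Double.parseDouble(fields[2].trim());
            } catch (NumberFormatException e) {
                continue; // non-numeric amount: drop the record
            }
            good.add(fields);
        }
        return good;
    }
}
```

In an actual MapReduce job this check would sit in the mapper, often incrementing a counter for rejected records so data quality stays visible.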
Confidential, Durham, NC
- Interacted with the business-requirements and design teams and prepared the low-level and high-level design documents.
- Provided in-depth technical and business knowledge to ensure efficient design, programming, implementation, and ongoing support for the application.
- Involved in identifying possible ways to improve the efficiency of the system.
- Developed multiple MapReduce jobs in Java for log data cleaning and preprocessing, and scheduled the jobs to collect and aggregate logs on an hourly basis.
- Implemented MapReduce programs using Java.
- Handled the logical implementation of, and interaction with, HBase.
- Efficiently put and fetched data to and from HBase by writing MapReduce jobs.
- Developed MapReduce jobs to automate data transfer to and from HBase.
- Assisted with the addition of Hadoop processing to the IT infrastructure.
- Used Flume and Kafka to collect all the web logs from the online ad servers and push them into HDFS.
- Implemented and executed MapReduce jobs to process the log data from the ad servers.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Worked on MongoDB and Cassandra.
- Prepared multi-cluster test harness to exercise the system for better performance.
Environment: Hadoop, HDFS, MapReduce, HBase, Hive, Kafka, Flume, Cassandra, Hadoop distributions of Hortonworks and Cloudera, Eclipse (Juno), Java Batch, SQL*Plus, and Oracle 10g.
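The hourly log aggregation described above maps to the classic word-count shape: the mapper emits (hour, 1) per line and the reducer sums the counts. A cluster-free sketch of that logic; the leading "yyyy-MM-dd HH:mm:ss" timestamp layout is an assumption, not a format taken from the original system:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of hourly log aggregation: extract the hour bucket from each line's
// leading timestamp and count lines per bucket, as the reducer would.
public class HourlyLogAgg {
    public static Map<String, Integer> countsByHour(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            if (line.length() < 13) {
                continue; // too short to carry a timestamp: skip the line
            }
            String hour = line.substring(0, 13); // "yyyy-MM-dd HH"
            counts.merge(hour, 1, Integer::sum);
        }
        return counts;
    }
}
```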
- Developed messaging components using the JMS API from the J2EE package.
- Made use of JavaScript for client-side validation.
- Used Struts Framework for implementing the MVC Architecture.
- Wrote various Struts action classes to implement the business logic.
- Involved in the design of the project using UML Use Case Diagrams, Sequence Diagrams, Object diagrams, and Class Diagrams.
- Understood concepts related to, and wrote code for, advanced topics such as Java I/O, serialization, and multithreading.
- Used display tags in the presentation layer for a better look and feel of the web pages.
- Developed packages to validate data from flat files and insert it into various tables in an Oracle database.
- Provided UNIX scripting to drive automatic generation of static web pages with dynamic news content.
- Participated in requirements analysis to figure out various inputs correlated with their scenarios in Asset Liability Management (ALM).
- Assisted design and development teams in identifying DB objects and their associated fields in creating forms for ALM modules.
- Also involved in developing PL/SQL Procedures, Functions, Triggers and Packages to provide backend security and data consistency.
- Responsible for performing Code Reviewing and Debugging.
Environment: Java, J2EE, UML, Struts, HTML, XML, CSS, JavaScript, Oracle 9i, SQL*Plus, PL/SQL, MS Access, UNIX Shell Scripting.
- Involved in understanding the functional specifications of the project.
- Assisted the development team in designing the complete application architecture.
- Developed connection components using JDBC.
- Designed screens using HTML and images.
- Used Cascading Style Sheets (CSS) to maintain a uniform look across different pages.
- Involved in creating Unit Test plans and executing the same.
- Performed document/code reviews and knowledge transfer for status updates on the ongoing project developments.
- Deployed web modules in Tomcat web server.