- Overall 7 years of experience as a Hadoop/Spark Developer across all phases of software application requirement analysis, design, development, and maintenance of Hadoop/Big Data applications and web applications using Java/J2EE technologies, specializing in the Finance, Healthcare, and Telecom domains.
- Strong knowledge of the Software Development Life Cycle (SDLC) and of the Hadoop/Spark developer's role in development methodologies such as Agile and Waterfall.
- Expertise in Hadoop ecosystem components, including Hive, Pig, HBase, Impala, Sqoop, HUE, Flume, ZooKeeper, Oozie, and Apache Spark.
- In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce, with experience writing MapReduce programs on Apache Hadoop to analyze large data sets efficiently.
- Hands-on experience with the YARN (MapReduce 2.0) architecture and components such as ResourceManager, NodeManager, Container, and ApplicationMaster, and with the execution of MapReduce jobs.
- Hands-on experience designing and developing Spark applications in Scala, including comparing Spark's performance with Hive.
- Experienced in integrating Kafka with Spark Streaming for high-speed data processing.
- Developed Spark code in Scala and Spark SQL/Streaming for faster testing and processing of data.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data.
- Exposure to working with DataFrames.
- Experience collecting log data from different sources (web servers and social media) using Flume and Kafka, storing it in HDFS, and running MapReduce jobs on it.
- Worked with data serialization formats, converting complex objects into serialized byte streams using Avro, Parquet, and CSV formats.
- Strong knowledge of Pig and Hive analytical functions; extended Hive and Pig core functionality by writing custom UDFs.
- Expertise in developing Pig Latin scripts and Hive Query Language (HiveQL) queries for data analytics.
- Implemented partitioning, dynamic partitioning, and bucketing in Hive to compute data metrics.
- Well-versed with Agile Development process tools like Jira.
- Integrated BI tools such as Tableau with Impala and analyzed the data.
- Experience with NoSQL databases like HBase, MongoDB and Cassandra.
- Experience importing and exporting data with Sqoop between HDFS and relational/non-relational database systems.
- Used the Oozie job scheduler to schedule MapReduce jobs and automate job flows, and implemented cluster coordination services using ZooKeeper.
- Reviewed the HDFS usage and system design for future scalability and fault-tolerance.
- Experienced in working with Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
- Knowledge of creating visualizations in Tableau using bar, line, and pie charts, maps, scatter plots, histograms, and highlight tables, and of applying local and global filters according to end-user requirements.
- Knowledge of designing and creating analytical reports and automated dashboards that help users identify critical KPIs and facilitate strategic planning in the organization.
- Experience in working with different relational databases like MySQL and Oracle.
- Strong experience in database design and in writing complex SQL queries and stored procedures.
- Expertise in various phases of software development, including analysis, design, development, and deployment of applications using Servlets, JSP, JavaBeans, Struts, the Spring Framework, and JDBC.
- Experience with development environments such as Eclipse and NetBeans.
- Proficient in software documentation and technical report writing.
- Versatile team player with good communication, analytical, presentation, and interpersonal skills.
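The Kafka and Spark Streaming integration listed above typically follows the direct-stream pattern; a minimal sketch in Scala, assuming Spark 2.x with the spark-streaming-kafka-0-10 connector (the broker address, consumer group, and topic name are hypothetical):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._
import org.apache.kafka.common.serialization.StringDeserializer

object KafkaStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-stream-sketch")
    val ssc  = new StreamingContext(conf, Seconds(5))

    // Hypothetical broker, group, and topic names, for illustration only.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "log-analytics"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("weblogs"), kafkaParams)
    )

    // Count events per 5-second micro-batch and print the result.
    stream.map(record => record.value)
          .count()
          .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

This sketch requires a running Spark cluster and Kafka broker; it is an illustration of the integration pattern, not a definitive implementation.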
Big Data Ecosystem: HDFS, MapReduce, Pig, Hive, Impala, YARN, HUE, Oozie, ZooKeeper, Apache Spark, Apache Storm, Apache Kafka, Sqoop, Flume.
Operating Systems: Windows, Ubuntu, RedHat Linux, Unix
Programming Languages: C, C++, Java, SCALA
Scripting Languages: Shell scripting, JavaScript
Databases: Oracle 11g/10g/9i, MySQL, DB2, MS-SQL Server, SQL, PL/SQL
NoSQL Databases: HBase, Cassandra, and MongoDB
Hadoop Distributions: Cloudera, Hortonworks
Build Tools: Maven, sbt
Development IDEs: NetBeans, Eclipse IDE
Web Servers: Web Logic, Web Sphere, Apache Tomcat 6
Version Control Tools: SVN, Git, GitHub
Packages: Microsoft Office, PuTTY, MS Visual Studio
Confidential, Alpharetta, GA
Sr. Hadoop/Spark Developer
- Developed a data pipeline using Kafka, Sqoop, Hive, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis; developed Sqoop scripts for importing and exporting data into HDFS and Hive.
- Developed design documents considering all possible approaches and identifying the best among them.
- Developed services to run MapReduce jobs on a per-requirement basis.
- Responsible for managing data coming from different sources.
- Developed business logic using Scala.
- Responsible for loading data from UNIX file systems into HDFS; installed and configured Hive and wrote Pig/Hive UDFs.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke MapReduce jobs in the backend.
- Wrote MapReduce programs to convert text files into Avro format and load them into Hive tables.
- Worked with NoSQL databases like HBase in creating HBase tables to load large sets of semi structured data coming from various sources.
- Developed scripts and automated end-to-end data management and synchronization across all clusters.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop.
- Imported data from different sources such as HDFS and HBase using Spark.
- Experienced with Spark Core, Spark SQL, and Spark Streaming.
- Used Storm to perform live analytics on streamed data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark and Scala.
- Implemented the workflows using Apache Oozie framework to automate tasks.
- Imported results into visualization BI tool Tableau to create dashboards.
- Experienced with Apache Oozie in production for scheduling jobs.
- Worked in an Agile methodology and used JIRA to maintain project stories.
- Worked as technology lead and managed offshore team members.
- Involved in gathering the requirements, designing, development and testing.
Environment: Hadoop, MapReduce, Hive, Java, Maven, Impala, Pig, Spark, Oozie, Oracle, YARN, GitHub, Tableau, Unix, Cloudera, Kafka, Sqoop, Scala, HBase.
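The Hive-to-Spark conversion work described above usually amounts to replacing a HiveQL aggregate with equivalent DataFrame transformations; a minimal sketch, assuming Spark 2.x with Hive support enabled (the table, columns, and output path are hypothetical):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Original HiveQL (hypothetical table and columns):
    //   SELECT customer_id, SUM(amount) AS total
    //   FROM transactions GROUP BY customer_id;

    // Equivalent Spark DataFrame transformations:
    val totals = spark.table("transactions")
      .groupBy("customer_id")
      .agg(sum("amount").as("total"))

    // Persist the result for downstream jobs (hypothetical path).
    totals.write.mode("overwrite").parquet("/data/out/customer_totals")
    spark.stop()
  }
}
```

The DataFrame form lets Spark's Catalyst optimizer plan the aggregation instead of compiling the query to MapReduce stages as Hive would.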
Confidential, New York, NY
- Primary responsibilities include building scalable distributed data solutions using Hadoop ecosystem.
- Used Sqoop to transfer data between databases (Oracle & Teradata) and HDFS and used Flume to stream the log data from servers.
- Developed MapReduce programs for pre-processing and cleansing the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
- Experienced in managing and reviewing Hadoop log files.
- Developed MapReduce programs in Java for parsing raw data and populating staging tables.
- Created MapReduce programs to handle semi-structured and unstructured data such as XML, JSON, and Avro data files, and sequence files for log files.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Extensively worked on combiners, partitioning, and the distributed cache to improve the performance of MapReduce jobs.
- Implemented different analytical algorithms as MapReduce programs applied on top of HDFS data.
- Implemented Hive generic UDFs to implement business logic around custom data types.
- Used Pig to perform data transformations, event joins, filter and some pre-aggregations before storing the data onto HDFS.
- Conducted data extraction, including analyzing, reviewing, and modeling based on requirements, using higher-level tools such as Hive and Pig.
- Implemented partitions and buckets in Hive for optimization.
- Involved in creating Hive tables, loading structured data, and writing Hive queries that run internally as MapReduce jobs.
- Created HBase tables to store data in various formats coming from different portfolios.
- Experience troubleshooting MapReduce jobs by reviewing log files.
Environment: Hadoop, MapReduce, HiveQL, MySQL, HBase, HDFS, Hive, Impala, Pig, Sqoop, Oozie, Flume, Cloudera, ZooKeeper, Hue Editor, Eclipse, Oracle 11g, PL/SQL, SQL*Plus, UNIX, Tableau.
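The pre-processing and cleansing described above boils down to per-record parsing and filtering on the mapper side; a minimal plain-Scala sketch of that logic (the tab-delimited field layout is hypothetical):

```scala
object LogCleanser {
  // Parse a raw tab-delimited log line into (userId, eventType, timestamp).
  // Malformed records return None and are dropped, mirroring a mapper that
  // emits nothing for bad input. The field layout is hypothetical.
  def parse(line: String): Option[(String, String, Long)] =
    line.split("\t") match {
      case Array(user, event, ts) if user.nonEmpty =>
        scala.util.Try(ts.toLong).toOption.map(t => (user, event, t))
      case _ => None
    }

  def main(args: Array[String]): Unit = {
    val raw = Seq("u1\tclick\t1700000000", "garbage-line", "u2\tview\tnot-a-number")
    val clean = raw.flatMap(parse)
    clean.foreach(println) // only the first, well-formed record survives
  }
}
```

In the actual jobs this filtering ran inside Java MapReduce mappers over HDFS input splits; the sketch isolates the record-level validation step.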
Confidential, Stamford, CT
- Designed the technical architecture and developed various Big Data workflows using custom MapReduce, Pig, Hive, Cassandra, and Sqoop.
- Deployed an on-premise cluster and tuned it for optimal performance in executing jobs and processing large data sets.
- Built re-usable Hive UDF libraries for business requirements, enabling business analysts to use these UDFs in Hive queries.
- Used Flume to dump the application server logs into HDFS.
- Analyzed the logs stored on HDFS and imported the cleaned data into the Hive warehouse, enabling business analysts to write Hive queries.
- Configured various big data workflows to run on top of Hadoop using Oozie; these workflows comprise heterogeneous jobs such as Pig, Hive, Sqoop, and MapReduce.
- Experience working with the NoSQL database HBase for real-time data analytics.
- Used Maven extensively to build JAR files of MapReduce programs and deploy them to the cluster.
- Assigned tasks of resolving defects found in testing both the new application and existing applications.
- Analyzed requirements and designed and developed solutions.
- Managed the project team in achieving project goals, including resource allocation, resolving technical issues, and mentoring team members.
- Used Linux (Ubuntu) machines for designing, developing, and deploying Java modules.
Environment: MapReduce, Pig, Hive, Sqoop, Kafka, Flume, HBase, JDK 1.6, Maven, Linux.
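The re-usable Hive UDF libraries mentioned above follow a standard shape: a class with an `evaluate` method, packaged as a JAR and registered from HiveQL. A minimal sketch, shown in Scala against the classic `org.apache.hadoop.hive.ql.exec.UDF` base class (the class name, masking rule, and column are hypothetical; such UDFs are more commonly written in Java):

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical UDF that masks all but the last four characters of an
// account number, so analysts can call it directly from HiveQL, e.g.:
//   ADD JAR mask-udf.jar;
//   CREATE TEMPORARY FUNCTION mask_acct AS 'MaskAccount';
//   SELECT mask_acct(account_no) FROM accounts;
class MaskAccount extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) return null
    val s = input.toString
    val masked =
      if (s.length <= 4) s
      else "*" * (s.length - 4) + s.takeRight(4)
    new Text(masked)
  }
}
```

Packaging the UDF in a shared library JAR is what lets multiple analyst teams reuse it without writing any Java or Scala themselves.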
Confidential, Cleveland, OH
- Architected and delivered projects for large customers on Big Data platforms.
- Designed and built Hadoop solutions for big data problems.
- Set up Pig, Hive, and HBase on multiple nodes and developed applications using Pig, Hive, HBase, and MapReduce.
- Developed MapReduce applications using Hadoop, MapReduce programming, and HBase.
- Involved in developing Pig scripts.
- Involved in developing Hive reports.
- Developed Sqoop scripts to enable interaction between Pig and the MySQL database.
Environment: Hadoop, Apache Pig, Hive, Oozie, Sqoop, UNIX, MySQL, Ubuntu
- Production support role for the application, developed in JSP and Servlets, to resolve sensitive issues.
- Designed and deployed the required Stateful Session Beans to achieve various functionalities.
- Used Java Beans to handle the form data and the data from the back-end database.
- Created and deployed Servlets, Session Beans on to WebLogic Server.
- Extensively used XML to save and retrieve the user preferences.
- Used DOM parser for manipulating XML document.
- Created dynamic web pages using JSP, static pages using HTML and developed business logic using EJB and XML.
- Created numerous XSL style sheets for highly complex graphical presentations.
- Developed JavaBeans to ease the implementation and deployment of application components.
- Developed dynamic templates and Servlets, providing excellent application management capabilities.
- Developed Swing Suite for look and feel as well as binding data to the GUI.
- Involved in coding Java, JDBC, and Servlets to interact with the client and the database.
- Involved in writing procedures and complex queries in PL/SQL to extract data from the database, delete data, and reload data into the Oracle database.
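The DOM-based XML handling of user preferences described above follows the standard JAXP pattern bundled with the JDK; a minimal sketch in Scala (the document structure and element names are hypothetical):

```scala
import javax.xml.parsers.DocumentBuilderFactory
import java.io.ByteArrayInputStream

object PrefsDomSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical user-preferences document.
    val xml = """<prefs><theme>dark</theme><pageSize>25</pageSize></prefs>"""

    // Parse into a DOM tree using the JDK's built-in parser.
    val doc = DocumentBuilderFactory.newInstance()
      .newDocumentBuilder()
      .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")))

    // Read and update a value, mirroring save/retrieve of preferences.
    val theme = doc.getElementsByTagName("theme").item(0)
    println(theme.getTextContent)  // prints "dark"
    theme.setTextContent("light")
    println(theme.getTextContent)  // prints "light"
  }
}
```

In the original application this logic lived in Java Servlets/Beans; the DOM API calls are identical in both languages.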