- Around 5 years of experience in the software industry, including 3 years in the Hadoop ecosystem; worked in Agile environments.
- Strong experience in the distinct phases of the software development life cycle (SDLC), including planning, design, development, and testing of software applications.
- In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce.
- Experience in deploying and managing multi-node clusters with different Hadoop components (HDFS, YARN, Hive, Sqoop, Oozie, Flume, ZooKeeper, Spark, Impala) using Cloudera Manager and Hortonworks Ambari.
- Highly skilled in integrating Kafka with Spark Streaming for high-speed data processing (see the sketch after this list).
- Proficient in loading data into Spark SchemaRDDs and querying them using Spark SQL.
- Experience in writing MapReduce programs from scratch according to requirements.
- Experience in writing joins and sorting algorithms in MapReduce using Java.
- Expertise in writing Hadoop jobs for analyzing data using MapReduce, Hive, and Pig.
- Familiar with importing and exporting data using Sqoop into HDFS and Hive.
- Experience using Flume, and knowledge of Kafka, to ingest data from web servers into HDFS.
- Good knowledge of Apache Storm.
- Hands-on experience extending Pig and Hive core functionality by writing custom UDFs.
- Experience in handling different file formats such as Parquet, Apache Avro, SequenceFile, JSON, spreadsheets, text files, XML, and flat files.
- Good knowledge of NoSQL databases such as HBase and Cassandra.
- Good knowledge of BI tools such as Tableau and ETL tools such as Talend and Informatica.
- Basic knowledge of machine learning and predictive analytics.
- Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
- Good experience using relational databases such as Oracle and SQL Server.
- Good working knowledge of Amazon Web Services components such as EC2, EMR, S3, and Elasticsearch.
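For illustration, a minimal sketch of the Kafka-with-Spark-Streaming integration noted above, assuming the spark-streaming-kafka-0-10 connector; the broker address, consumer group, topic name, and HDFS output path are hypothetical placeholders:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class KafkaSparkIngest {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("KafkaSparkIngest");
        // 10-second micro-batches.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");         // hypothetical broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "usage-ingest");                   // hypothetical group id

        // Direct stream: each Kafka partition maps to one Spark partition.
        JavaInputDStream<ConsumerRecord<String, String>> stream =
            KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(
                    Collections.singletonList("usage-events"), kafkaParams)); // hypothetical topic

        // Persist each micro-batch of raw messages to HDFS for downstream jobs.
        stream.foreachRDD((rdd, time) -> {
            JavaRDD<String> values = rdd.map(ConsumerRecord::value);
            values.saveAsTextFile("hdfs:///data/raw/usage-" + time.milliseconds());
        });

        jssc.start();
        jssc.awaitTermination();
    }
}
```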
Programming languages: C, Java, Scala
Web Languages: HTML, CSS
Frameworks: Hadoop, MapReduce, Hive, Pig, Spark, Kafka
J2EE technologies: JDBC, Servlets, JSP
Databases: Oracle, SQL Server, HBase, MongoDB, Cassandra
Operating Systems: Windows, Linux, CentOS, macOS
Tools/IDEs: Sqoop, Flume, Oozie, NetBeans, Eclipse
Hadoop / Spark Developer
- Worked on a cloud platform built as a scalable distributed data solution using Hadoop on a 40-node AWS cluster, running analyses on terabytes of customer usage data on a daily basis.
- Involved in creating end-to-end Spark applications for various data transformation activities.
- Performed a series of ingestion jobs using Sqoop, Kafka, and a custom input adapter to move data from various sources to HDFS.
- Developed a data pipeline using Kafka, Spark, and Hive to ingest, transform, and analyze customer behavioral data.
- Created Spark jobs to identify trends in data usage by users.
- Streamed data in real time using Spark with Kafka.
- Implemented Spark applications in Scala, utilizing DataFrames and the Spark SQL API for faster data processing.
- Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for high data volumes.
- Configured Kafka to read and write messages to and from external programs.
- Converted Hive queries into Spark transformations using Spark RDDs (see the sketch after this role's environment line).
- Explored Spark for improving the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Imported data from different sources into HDFS using Sqoop, applying the required transformations using Hive.
- Exported the analyzed data to relational databases using Sqoop for further visualization and report generation by the BI team.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Developed Hive scripts in HiveQL to de-normalize and aggregate the data.
- Scheduled and executed workflows in Oozie to run Hive and Spark jobs.
- Monitored and managed the Hadoop cluster using Cloudera Manager.
- Developed interactive shell scripts for scheduling various data cleansing and data loading processes.
Environment: Cloudera Hadoop distribution, AWS (clusters on cloud), HDFS, MapReduce, Sqoop, Kafka, Spark, Spark SQL, Hive, Cassandra, Linux, Java, Scala, Eclipse, Oracle, Tableau, UNIX shell scripting, PuTTY.
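For illustration, a minimal sketch of the Hive-query-to-Spark-SQL conversion mentioned above, assuming a Hive-enabled SparkSession; the table and column names (usage_events, user_id, bytes_used, usage_totals) are hypothetical:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.desc;
import static org.apache.spark.sql.functions.sum;

public class UsageTrends {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("UsageTrends")
            .enableHiveSupport()   // read existing Hive tables through the metastore
            .getOrCreate();

        // Original HiveQL being converted:
        //   SELECT user_id, SUM(bytes_used) AS total_bytes
        //   FROM usage_events GROUP BY user_id;
        Dataset<Row> totals = spark.table("usage_events")
            .groupBy("user_id")
            .agg(sum("bytes_used").alias("total_bytes"))
            .orderBy(desc("total_bytes"));

        totals.write().mode("overwrite").saveAsTable("usage_totals"); // hypothetical output table
        spark.stop();
    }
}
```

Here enableHiveSupport() points Spark at the existing Hive metastore, so the DataFrame version reads the same tables the original HiveQL did.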
- Gathered data requirements and identified sources for data acquisition.
- Performed development and ETL design in Hadoop.
- Developed a custom MapReduce input format to read a specific data format.
- Developed Hive queries and UDFs as per requirements (see the UDF sketch after this role's environment line).
- Involved in extracting customers' big data from various data sources into Hadoop; this included data from mainframes and databases, as well as log data from servers.
- Used Sqoop to efficiently transfer data between databases and HDFS, and used Flume to stream log data from servers.
- Developed MapReduce programs to cleanse the data in HDFS obtained from multiple sources, making it suitable for ingestion into the Hive schema for analysis.
- Implemented partitioning and bucketing in Hive for better organization of the data.
- Used the Oozie workflow engine to manage independent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
- Monitored and debugged Hadoop jobs/applications running in production.
- Used Solr for searching.
- Worked on the Cloudera upgrade from CDH to CDH.x.
- Provided user support and application support on the Hadoop infrastructure.
- Evaluated and compared different tools for test data management with Hadoop.
- Helped the testing team with Hadoop application testing.
Environment: Hadoop v1.2.1, HDFS, MapReduce, Hive, Sqoop, Pig, Oracle, XML, CDH4.x, ZooKeeper, Oozie
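For illustration, a minimal sketch of a custom Hive UDF like those referenced above, using the classic reflection-based org.apache.hadoop.hive.ql.exec.UDF API; the class name and normalization rule are hypothetical:

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class NormalizePhone extends UDF {
    private final Text result = new Text();

    // Hive resolves evaluate() by reflection; returning null keeps bad rows NULL.
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        // Strip everything except digits, e.g. "(555) 123-4567" -> "5551234567".
        result.set(input.toString().replaceAll("[^0-9]", ""));
        return result;
    }
}
```

Packaged into a JAR, such a UDF would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION normalize_phone AS 'NormalizePhone' before use in a query.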
Big Data Developer
- Worked on analyzing the Hadoop cluster and different big data analytics tools, including Pig, Hive, HBase, and Sqoop.
- Coordinated with business customers to gather business requirements, interacted with technical peers to derive technical requirements, and delivered the BRD and TDD documents.
- Extensively involved in the design phase and delivered design documents.
- Set up 3-node Hadoop clusters with IBM BigInsights.
- Worked with highly unstructured and semi-structured data.
- Extracted data from Oracle into HDFS using Sqoop (version 1.4.3) to store and generate reports for visualization purposes.
- Leveraged the Solr API to search user interaction data for relevant matches.
- Designed the Solr schema and used the SolrJ client API for storing, indexing, and querying the schema fields (see the SolrJ sketch after this role's environment line).
- Collected and aggregated large amounts of web log data from different sources, such as web servers and mobile devices, using Apache Flume, and stored the data in HDFS for analysis.
- Wrote extensive Pig (version 0.12.0) scripts to transform raw data into baseline data.
- Developed Hive (version 0.12.0) scripts to analyze data, categorize mobile numbers into different segments, and offer promotions to customers based on those segments.
- Developed UDFs in Java as needed for use in Pig and Hive queries.
- Worked on the Oozie workflow engine for job scheduling.
- Created Hive tables and partitions, and loaded the data for analysis using HiveQL queries.
- Loaded data into HBase using bulk loads and the HBase API.
Environment: IBM BigInsights 2.1.2, Java, Hive, Pig, HBase, Sqoop, Flume, Oozie, Solr, shell scripting.
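For illustration, a minimal sketch of indexing and querying with the SolrJ client API as described above; the Solr URL, collection, document, and field names are hypothetical:

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrInputDocument;

public class InteractionSearch {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr =
            new HttpSolrClient.Builder("http://localhost:8983/solr/interactions").build();

        // Index one user-interaction document against the schema fields.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "evt-1001");
        doc.addField("user_id", "u42");
        doc.addField("action", "page_view");
        solr.add(doc);
        solr.commit();

        // Query the indexed data for relevant matches.
        SolrQuery query = new SolrQuery("action:page_view");
        query.setRows(10);
        QueryResponse response = solr.query(query);
        response.getResults().forEach(d -> System.out.println(d.getFieldValue("id")));

        solr.close();
    }
}
```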
Jr. Software Engineer
- Involved in gathering business requirements, analyzing the project, and creating use cases.
- Coordinated with the design team, business analysts, and end users of the system.
- Programmed using the core Java language.
- Worked with Solr for indexing the data and used JSP for the web application.
- Used JAXP (DOM, XSLT) and XSD for XML data generation and presentation (see the DOM sketch after this role's environment line).
- Wrote JUnit test classes for the services and prepared documentation.
- Provided support and bug fixing.
Environment: Java, JDBC, JSP, Servlets, HTML, JUnit, Java APIs, Design Patterns, MySQL, Eclipse IDE.
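For illustration, a minimal sketch of JAXP DOM parsing of the kind listed above; the input file and element/attribute names are hypothetical:

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class OrderXmlReader {
    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse("orders.xml");   // hypothetical input file

        // Walk every <order> element and print its id attribute.
        NodeList orders = doc.getElementsByTagName("order");
        for (int i = 0; i < orders.getLength(); i++) {
            Element order = (Element) orders.item(i);
            System.out.println("order id: " + order.getAttribute("id"));
        }
    }
}
```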