- Passionate developer with 8 years of overall experience developing, configuring, and implementing Hadoop and Big Data ecosystems on various platforms, as well as developing and documenting applications in Java. Engaged in several projects at different phases of implementation, upgrade, and post-production support in highly customized environments.
- Proficient in analyzing and developing Hadoop and Big Data environments based on customers' technical requirements.
- Extensive working experience with Hadoop ecosystem components such as MapReduce (MRv1 and MRv2), Hive, Pig, Sqoop, Oozie, Kafka, and ZooKeeper.
- Expertise in writing MapReduce programs and using the Cloudera API, the Apache Hadoop API, and Hortonworks distributions to interpret structured and unstructured data.
- Hands-on expertise in ETL tools for big data integration, including importing and exporting data between relational database management systems (RDBMS) and HDFS using Sqoop.
- Worked on NoSQL databases such as MongoDB, HBase and Cassandra.
- Hands-on experience writing Pig Latin scripts.
- Experienced in developing UDFs for Pig and Hive in Java to extend their core functionality.
- Good experience in data cleansing and analysis using Hive, Pig, and the Hadoop Java API.
- Good knowledge of setting up job streaming and scheduling with Oozie, and of working with messaging systems such as Kafka integrated with ZooKeeper.
- Worked with the Producer API and created a custom partitioner to publish data to the Kafka topic.
- Worked on a POC for streaming data using Kafka and Spark Streaming.
- Implemented a Kafka consumer with Spark Streaming and Spark SQL using Scala.
- Analyzed the volume of the existing batch process and designed the Kafka topic and its partitions.
- Knowledge of installing, configuring, and performance-tuning Hadoop clusters, and of disaster recovery planning.
- Strong experience working with BI tools such as Tableau, QlikView, and Pentaho, and implementing them in Big Data ecosystems according to requirements.
- Comprehensive knowledge of software development using shell scripting, core Java, and web technologies.
- Highly skilled in Spark Streaming and Scala, with sound knowledge of Spark SQL and Spark GraphX.
- Sound knowledge of real-time data streaming solutions using Apache Spark Streaming, Kafka, and Flume.
- Good knowledge of cloud integration with Amazon Simple Storage Service (S3), Amazon Elastic MapReduce (EMR), Amazon Elastic Compute Cloud (EC2), and Microsoft Azure HDInsight.
- Hands-on experience writing Impala queries to fetch data from multiple tables and to eliminate duplicate records from tables.
- Experienced in developing simple to complex MapReduce jobs using Hadoop technologies to handle files in multiple formats (JSON, Parquet, XML, Avro, SequenceFile, etc.).
- Sound knowledge of object-oriented programming methodologies, with development experience in Java technologies.
- Hands-on experience verifying cleansed data in Talend; also used the Talend Administrative Console to add users and schedule jobs.
- Good knowledge of build management tools such as Maven, Ant, and SBT (Scala Build Tool).
- Experienced in developing pipelines that store data in HDFS using Kafka integration with Storm.
- Sound knowledge of search engines such as Apache Solr for searching structured and unstructured data.
- Good knowledge of developing applications in Red Hat and CentOS Linux environments.
- Able to work independently or in a team, with dedication and strong communication skills that help meet deadlines.
Languages: Java, C, C++, PHP, Python, Scala
IDEs: MyEclipse, Eclipse, IntelliJ IDEA, NetBeans, JavaScript, Visual Studio, Dreamweaver CS3 (9.0)
Big Data: Hadoop 1.0/2.0, MapReduce, HDFS, Log4j, Hive, HBase, Impala, Sqoop, Spark, Kafka, Ant 1.8, Maven 3.3.1/3.3.9, Pig, Cassandra
Operating Systems: Windows, Linux (Red Hat 5/6.5, CentOS 6/7, Ubuntu)
Databases: MySQL, SQL Server, WebLogic Server, DB2, Teradata, Druid, Oracle 10g/11g
Cloud Technologies: Amazon Web Services (AWS), Cloudera CDH3, CDH4, CDH5, Hortonworks, Mahout, Microsoft Azure HDInsight, Amazon Redshift
Confidential, Alpharetta - GA
Sr Hadoop Developer
- Played a key role in requirements discussions, analysis of the entire system, estimation, development, and testing, keeping the BI requirements in view.
- Involved in developing and installing Sqoop, Hive, and FTP integrations to downstream systems.
- Built a Sqoop job to import data in different formats, such as Avro and XML, obtained from different vendors.
- Played a key role in dynamic partitioning and bucketing of the data stored in Hive.
- Involved in writing MapReduce programs in Java to process data from HBase tables.
- Implemented functionality using Servlets, JSP, Lucene, Elasticsearch, JVM tuning, ETL, Tridian, HTML, the Struts framework, Hibernate, Spring, JavaScript, and WebLogic.
- Loaded data into Solr and performed search queries to retrieve the data.
- Responsible for indexing data in Solr.
- Responsible for developing search queries in Solr.
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables.
- Developed a MapReduce project in Hadoop using Python to compute the highest, lowest, and total sales and the total number of hits per server IP address.
- Developed UNIX shell scripts for the business process and assimilation of data from different interfaces.
- Established an Oozie component to schedule a job that runs daily.
- Involved in developing a pipeline to load data into tables using Spark Streaming and Kafka integrated with ZooKeeper.
- Responsible for configuring Kafka consumer and producer metrics to visualize and monitor Kafka system performance.
- Involved in developing Sqoop scripts that load the data from different interfaces to HDFS.
- Processed real-time data using Spark and stored it in Hive tables.
- Worked on debugging and performance tuning of MapReduce, Hive, and Sqoop jobs.
- Involved in identifying ways to optimize and improve the efficiency of the system.
- Gained experience in maintaining and interpreting Hadoop log files.
- Involved in creating and maintaining technical documentation for the MapReduce, Hive, Sqoop, and UNIX jobs and the Hadoop clusters, and in reviewing it to fix post-production issues.
Environment: Red Hat Enterprise Linux 5.0, Hadoop 1.0, Impala 2.9.0, HBase 1.0.3, Solr, MapReduce, Hive 2.1.0, Java SDK, IBM DB2 11.1, Sqoop 1.4.6, Spark 1.5, Scala 2.11, SBT, Maven 3.3.9, Jenkins 2.0, GitHub.
Confidential, Horsham - PA.
Sr Hadoop Developer
- Involved in setting up the Hadoop clusters, configuration, monitoring and performance tuning.
- Installed and configured various Hadoop components such as Hive, Pig, Sqoop, Oozie.
- Developed Kafka consumers to consume data from Kafka topics.
- Used Apache Kafka (Message Queues) for reliable and asynchronous exchange of important information between multiple business applications.
- Integrated a Kafka source to read the payment confirmation messages.
- Involved in importing the real time data to Hadoop using Kafka and implemented the Oozie job for daily
- Involved in developing Hive DDLs to create, alter, and drop Hive tables, and worked with Storm and Kafka.
- Experienced in transferring data from different data sources into HDFS using Kafka producers, consumers, and brokers.
- Played a key role in designing and developing a data management system using MySQL, and worked with CSV and JSON files while retrieving the data.
- Hands-on experience with cloud services such as Amazon Web Services (AWS).
- Developed and managed Oozie workflows that automate data loading into HDFS and pre-processing with Pig.
- Analyzed data by bucketing and partitioning in Hive and by writing Pig Latin scripts to understand customer behavioral patterns.
- Developed both frontend and backend modules using Python and the ASP.NET web framework.
- Provided mentoring on developing Hive queries and on connecting Hive tables through the ODBC connector to supply report data to the front end.
- Created a Scala job for a POC to migrate a MapReduce job to Spark RDDs.
- Implemented Storm integration with Kafka and ZooKeeper for the processing of real time data.
- Played a key role in exporting and analysing the data from the relational databases and generating a report for the BI team.
- Worked on importing some of the data from NoSQL databases, including HBase and Cassandra.
- Monitored system health, reports, and logs to act swiftly on failures, and alerted the team to them.
Environment: Cloudera Hadoop 5.0.5, MapReduce, Kafka, Cassandra 3.0, HDFS, Hive 1.2.1, Java 7.0, Pig 0.15.0, Sqoop 1.4.6, Tableau, XML, HBase 1.0.2, ASP.NET, Python 3.0, Pig Latin, AWS.
Confidential, San Francisco - CA
- Developed Hive scripts to select the delta (CDC) records and load them into HBase tables using Pig scripts.
- Transformed data using Pig scripts.
- Developed MapReduce programs to count large numbers of records in HBase tables.
- Worked on different Hive optimization and performance tuning techniques.
- Worked on ingestion of logs into Hadoop using Flume and Kafka.
- Processed logs using Spark Streaming and loaded them into Hive tables.
- Used Hive SerDe to read and write data in different formats.
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in loading data from edge node to HDFS.
- Hands-on experience with Cassandra.
- Monitored the production clusters and handled tickets related to disk issues and service- or OS-level issues on the clusters.
- Involved in upgrading clusters and in adding and removing nodes; also collected production metrics using Splunk.
- Involved in Design, Architecture and Installation of Big Data and Hadoop ecosystem components.
- Analyzed the SQL scripts and designed the solution to implement using Scala.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Created workflows using Oozie.
- Automated Hadoop jobs using Oozie scheduler.
- Gained very good business knowledge of health insurance, claims processing, fraud suspect identification, the appeals process, etc.
Environment: Cloudera Hadoop 5.0.5, MapReduce, Cassandra 3.0, HDFS, SQL, Scala, Hive 1.2.1, Java 7.0, Pig 0.15.0, Sqoop 1.4.6, XML, HBase 1.0.2, ASP.NET, Python 3.0, Pig Latin, AWS, Oozie.
- Played a key role in installing, managing, and supporting Linux operating systems such as CentOS and Ubuntu.
- Set up Amazon Web Services clusters and data buckets on S3, with Hive scripts to process Big Data.
- Installed and monitored Hadoop Cloudera CDH5 and set up the Cloudera Hadoop distribution.
- Used Core Java features such as Multi-Threading, Collections, and Exception handling to efficiently process high volume transactions.
- Played an important role in extracting the data from different data sources and transferring or loading them to Hadoop clusters.
- Played a major role in writing complex Hive queries to move data from the databases to the Hadoop Distributed File System (HDFS).
- Involved in creating an automation job using Oozie to process data from different data sources into the cluster.
- Set up and monitored the development and production environments.
Environment: Hadoop 1.0, Oracle 11g, Maven 3.3.9, Pig 0.11.1, JUnit 4.11, XML, Hive 1.4.2, HTML, Eclipse IDE, AWS, JDBC 4.0, SQL 11.0.
- Played an important role in gathering all the requirements for the project.
- Developed web services client interface with JAX-RPC from WSDL files for invoking the methods using SOAP.
- Developed RESTful web services for the downstream systems.
- Oracle 11g was used as the database for the application which was hosted in the cloud using Amazon web services.
- Developed the application using Eclipse and used it for editing, debugging, formatting, and build automation.
- Created stored procedures and other SQL scripts using PL/SQL.
- Implemented various cross-project functionalities using Spring AOP.
- Generated Hibernate XML files for the configured beans. The business logic was written in EJB DAO classes and the service layer classes were configured in Spring-service.xml.
- Implemented address normalization using AJAX calls.
- Used Spring Boot to create stand-alone Spring applications.
- Used JMS to pass messages from one database to another.
- Developed cross-browser compatible code using CSS and jQuery.
- Used Hibernate to create Configuration and Mapping files.
- Developed SOAP-based web services to provide services for the web front end.
- Developed Action classes & Servlets to route the submittals to the EJB components.
- Used JUnit framework for Unit testing of application.
- Wrote JSPs for user interfaces; the JSPs use JavaBeans objects to produce responses.
- Involved in testing the XML files and checked whether data was parsed and loaded into staging tables.
- Stored persistent JMS messages or temporarily stored messages sent using the store-and-forward feature.
- Used Hibernate to persist and retrieve data from database.
- Developed JSP custom tags to display data.
- Maintained versions using SVN and Git.
- Implemented UNIX shell scripting.
- Coordinated with the testing team to find and fix the bugs before production.