- Passionate Hadoop developer with over 8 years of experience developing, configuring, and implementing Hadoop and Big Data ecosystems on various platforms, as well as developing and documenting applications in Java; engaged in several projects across the implementation, upgrade, and post-production support phases in highly customized environments.
- Proficient in analysing and developing Hadoop and Big Data environments based on customers' technical requirements.
- Extensive working experience with Hadoop ecosystem components such as MapReduce (MRv1 and MRv2), Hive, Pig, Sqoop, Oozie, Kafka, and ZooKeeper.
- Expertise in writing MapReduce programs using the Apache Hadoop API on Cloudera and Hortonworks distributions to process structured and unstructured data.
- Hands-on expertise in ETL tools for big data integration, including importing and exporting data between relational database management systems (RDBMS) and HDFS using Sqoop.
- Worked on NoSQL databases such as MongoDB, HBase and Cassandra.
- Hands-on experience writing Pig Latin scripts.
- Experienced in developing UDFs for Pig and Hive in Java to extend their core functionality.
- Good experience in data cleansing and analysis using Hive, Pig, and the Hadoop Java API.
- Good knowledge of setting up job streaming and scheduling with Oozie, and of working with messaging systems such as Kafka integrated with ZooKeeper.
- Knowledge of installing Hadoop clusters, disaster recovery planning, configuration, and performance tuning.
- Strong experience working with BI tools such as Tableau, QlikView, and Pentaho, and integrating them with Big Data ecosystems according to requirements.
- Comprehensive knowledge on software development using Shell scripting, core Java and web technologies.
- Highly skilled in Spark Streaming and Scala, with sound knowledge of Spark SQL and Spark GraphX.
- Sound knowledge on real time data streaming solutions using Apache Spark Streaming, Kafka and Flume.
- Good knowledge of cloud integration with Amazon Simple Storage Service (S3), Amazon Elastic MapReduce (EMR), Amazon Elastic Compute Cloud (EC2), and Microsoft Azure HDInsight.
- Hands-on experience writing Impala queries to fetch data from multiple tables and eliminate duplicate records.
- Experienced in developing simple to complex MapReduce jobs using Hadoop technologies to handle files in multiple formats (JSON, Parquet, XML, Avro, SequenceFile, etc.).
- Sound knowledge of object-oriented programming methodologies with development experience in Java technologies.
- Hands-on experience verifying cleansed data in Talend; worked with the Talend Administration Center to add users and schedule jobs.
- Good knowledge of build management tools such as Maven, Ant, and sbt (the Scala build tool).
- Experienced in developing pipelines that load data into HDFS using Kafka integrated with Storm.
- Sound knowledge of search platforms such as Apache Solr, which supports searching across structured and unstructured data.
- Good knowledge of developing applications in RedHat and CentOS Linux environments.
- Able to work independently or in a team; dedication combined with strong communication skills helps meet deadlines.
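The Kafka-to-HDFS pipelines mentioned above can be sketched at the level of logic only: below, an in-memory queue stands in for a Kafka topic and the consumer batches messages as it would before an HDFS write (no Kafka or HDFS client is used; all names and events are illustrative):

```python
from queue import Queue

def produce(events, topic_queue):
    """Stand-in for a Kafka producer: push raw events onto a topic."""
    for event in events:
        topic_queue.put(event)

def consume_and_batch(topic_queue, batch_size):
    """Stand-in for a consumer that groups messages into fixed-size
    batches, as a pipeline would before flushing each batch to HDFS."""
    batches, current = [], []
    while not topic_queue.empty():
        current.append(topic_queue.get())
        if len(current) == batch_size:
            batches.append(current)
            current = []
    if current:  # flush the final, possibly short, batch
        batches.append(current)
    return batches

if __name__ == "__main__":
    topic = Queue()
    produce(["e1", "e2", "e3", "e4", "e5"], topic)
    print(consume_and_batch(topic, 2))  # [['e1', 'e2'], ['e3', 'e4'], ['e5']]
```

In a real deployment the producer and consumer would run as separate processes against a Kafka broker; the batching step is what bounds the number of small files landing in HDFS.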
Languages: Java, C, C++, PHP, Python, Scala, JavaScript
IDEs: MyEclipse, Eclipse, IntelliJ IDEA, NetBeans, Visual Studio, Dreamweaver CS3 (9.0)
Big Data: Hadoop 1.0/2.0, MapReduce, HDFS, Log4j, Hive, HBase, Impala, Sqoop, Spark, Kafka, Ant 1.8, Maven 3.3.1/3.3.9, Pig, Cassandra
Operating Systems: Windows, Linux (Red Hat 5/6.5, CentOS 6/7, Ubuntu)
Databases: MySQL, SQL Server, DB2, Teradata, Druid, Oracle 10g/11g
Cloud Technologies: Amazon Web Services (AWS), Cloudera CDH3/CDH4/CDH5, Hortonworks, Mahout, Microsoft Azure HDInsight, Amazon Redshift
Confidential, Sunnyvale, CA
Sr Hadoop Developer
- Played a key role in discussing requirements and analysing the entire system, along with estimation, development, and testing, keeping BI requirements in view.
- Involved in developing and installing Sqoop, Hive, and FTP integrations to downstream systems.
- Built Sqoop jobs to import data in different formats, such as Avro and XML, obtained from different vendors.
- Played a key role in dynamic partitioning and bucketing of the data stored in Hive.
- Involved in writing MapReduce programs in Java to process data from HBase tables.
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables.
- Developed a MapReduce project in Hadoop using Python to compute the highest, lowest, and total sale amounts and the total number of hits per server IP address.
- Developed UNIX shell scripts for business processes and for assimilating data from different interfaces.
- Set up an Oozie component to implement a job scheduler that runs on a daily basis.
- Developed a pipeline to load data into tables using Spark Streaming and Kafka integrated with ZooKeeper.
- Involved in developing Sqoop scripts that load the data from different interfaces to HDFS.
- Processed real-time data using Spark and connected it to Hive tables to store the results.
- Worked on debugging and performance tuning of MapReduce, Hive, and Sqoop jobs.
- Involved in diagnosing different possible ways to optimize and improve the efficiency of the system.
- Gained experience maintaining and interpreting Hadoop log files.
- Involved in creating and maintaining technical documentation for the MapReduce, Hive, Sqoop, and UNIX jobs and the Hadoop clusters, and in reviewing it to fix post-production issues.
Environment: Red Hat Enterprise Linux 5.0, Hadoop 1.0, Impala 2.9.0, HBase 1.0.3, MapReduce, Hive 2.1.0, Java SDK, IBM-DB2 11.1, SQOOP 1.4.6, Spark 1.5, Scala 2.11, SBT, Maven 3.3.9, Jenkins 2.0, GitHub.
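The Python MapReduce job described above (highest, lowest, and total sale plus hit count per server IP) can be sketched as a Hadoop Streaming-style map and reduce pair; the tab-separated `ip<TAB>amount` record layout is an assumption for illustration, and in production each function would read stdin and write stdout:

```python
from collections import defaultdict

def mapper(line):
    """Map step: emit an (ip, sale_amount) pair from one log line."""
    ip, amount = line.strip().split("\t")
    return ip, float(amount)

def reducer(pairs):
    """Reduce step: aggregate highest, lowest, and total sale amounts
    plus hit count, keyed by server IP address."""
    stats = defaultdict(lambda: {"high": float("-inf"), "low": float("inf"),
                                 "total": 0.0, "hits": 0})
    for ip, amount in pairs:
        s = stats[ip]
        s["high"] = max(s["high"], amount)
        s["low"] = min(s["low"], amount)
        s["total"] += amount
        s["hits"] += 1
    return dict(stats)

if __name__ == "__main__":
    lines = ["10.0.0.1\t5.00", "10.0.0.1\t7.50", "10.0.0.2\t3.25"]
    print(reducer(mapper(l) for l in lines))
```

Under Hadoop Streaming the framework performs the shuffle between the two steps, so the reducer only ever sees pairs grouped by key; the local generator expression above simulates that hand-off.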
Confidential, Pleasanton, CA
Sr Hadoop Developer
- Involved in setting up the Hadoop clusters, configuration, monitoring and performance tuning.
- Installed and configured various Hadoop components such as Hive, Pig, Sqoop, Oozie.
- Played a key role in designing and developing a data management system using MySQL, and worked with CSV and JSON files while retrieving data.
- Hands-on experience with cloud services such as Amazon Web Services (AWS).
- Managed and developed workflow in Oozie which automates the data loading into HDFS and pre-processing with Pig.
- Analysed data by performing bucketing and partitioning in Hive and by writing Pig Latin scripts to understand customer behavioural patterns.
- Developed both frontend and backend modules using Python and the ASP.NET web framework.
- Mentored the team in developing Hive queries and connected Hive tables via an ODBC connector to supply report data to the front end.
- Created a Scala job as a POC to migrate a MapReduce job to Spark RDDs.
- Implemented Storm integration with Kafka and ZooKeeper for the processing of real time data.
- Played a key role in exporting and analysing the data from the relational databases and generating a report for the BI team.
- Worked on importing some of the data from NoSQL databases including HBASE, Cassandra.
- Monitored system health, reports, and logs to act swiftly on failures and alert the team when they occurred.
Environment: Cloudera Hadoop 5.0.5, MapReduce, Cassandra 3.0, HDFS, Hive 1.2.1, Java 7.0, Pig 0.15.0, Sqoop 1.4.6, Tableau, XML, HBase 1.0.2, ASP.NET, Python 3.0, Pig Latin, AWS.
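The MapReduce-to-Spark-RDD migration POC mentioned above can be illustrated without a Spark runtime: the same word-count job is expressed first as explicit map and reduce phases, then as the single chained pipeline an RDD version would use (plain Python stand-ins; no `pyspark` is imported, and the sample lines are illustrative):

```python
from collections import Counter
from functools import reduce

lines = ["spark jobs chain transformations", "spark replaces mapreduce jobs"]

# MapReduce style: an explicit map phase emitting (word, 1) pairs,
# then a reduce phase folding the pairs into per-key counts.
mapped = [(word, 1) for line in lines for word in line.split()]

def reduce_phase(acc, pair):
    word, count = pair
    acc[word] = acc.get(word, 0) + count
    return acc

mr_counts = reduce(reduce_phase, mapped, {})

# RDD style: the same job collapses into one chained pipeline, e.g.
#   sc.textFile(path).flatMap(str.split).map(lambda w: (w, 1)).reduceByKey(add)
# Counter reproduces that result locally so the two styles can be compared.
rdd_counts = dict(Counter(word for line in lines for word in line.split()))

assert mr_counts == rdd_counts
```

The point of the migration is that the shuffle, grouping, and fault tolerance move into `reduceByKey`, leaving only the per-record logic to port.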
Confidential, Englewood, CO
- Played a key role in installing, managing, and supporting Linux operating systems such as CentOS and Ubuntu.
- Set up Amazon Web Services clusters and S3 data buckets, with Hive scripts to process Big Data.
- Installed and monitored Cloudera Hadoop CDH5 and set up the Cloudera distribution system.
- Used Core Java features such as Multi-Threading, Collections, and Exception handling to efficiently process high volume transactions.
- Played an important role in extracting data from different data sources and loading it into Hadoop clusters.
- Played a major role in writing complex Hive queries to move data from the databases into the Hadoop Distributed File System (HDFS).
- Involved in creating automation jobs in Oozie to process data from different data sources into the cluster.
- Set up and monitored the development and production environments.
Environment: Hadoop 1.0, Oracle 11g, Maven 3.3.9, Pig 0.11.1, Junit 4.11, XML, Hive 1.4.2, HTML, Eclipse IDE, AWS, JDBC 4.0, SQL 11.0.
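The extract-and-load automation described above can be sketched in plain Python, with a local directory standing in for HDFS and a hard-coded record list standing in for the source databases (all paths, field names, and the `load_date=` partition layout are illustrative assumptions):

```python
import csv
import os
from collections import defaultdict

def load_partitioned(records, base_dir):
    """Group records by load date and write one CSV file per Hive-style
    partition directory, e.g. base_dir/load_date=2017-01-01/part-00000.csv."""
    by_date = defaultdict(list)
    for rec in records:
        by_date[rec["load_date"]].append(rec)
    written = []
    for date, rows in sorted(by_date.items()):
        part_dir = os.path.join(base_dir, "load_date=" + date)
        os.makedirs(part_dir, exist_ok=True)
        path = os.path.join(part_dir, "part-00000.csv")
        with open(path, "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=["load_date", "value"])
            writer.writeheader()
            writer.writerows(rows)
        written.append(path)
    return written
```

Laying files out as `load_date=...` directories is what lets Hive discover them as partitions; in the real pipeline an Oozie coordinator would invoke a job like this on a schedule and the writes would target HDFS rather than a local path.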
- Played an important role in gathering all the requirements for the project.
- Developed UI pages using Struts tags in JSP.
- Extended the standard action classes provided by the Struts framework to handle client requests.
- Developed web service client interfaces with JAX-RPC from WSDL files to invoke methods using SOAP.
- Developed RESTful web services for the downstream systems.
- Oracle 11g was used as the application database, hosted in the cloud on Amazon Web Services.
- Developed the application in Eclipse, using it for editing, debugging, formatting, and build automation.
- Used Ajax to improve the user experience by supplying data while users fill in forms in the application.
- Developed Ant scripts for the build process and deployed the application to WebLogic Server.
- Involved in the development and implementation of the tool.
- Developed JSP custom tags to display data.
- Maintained versions using CVS.
- Implemented UNIX shell scripting.
- Coordinated with the testing team to find and fix the bugs before production.
- Implemented test cases and tested the changed application.
Environment: Java 6.0, J2EE 1.4, Servlets 2.4, Struts 1.3, Junit 4.8, EJB 3.1, BEA WebLogic 10.3, JDBC 4.0, SQL 10.0.