- 8 years of overall experience in the IT industry with proven expertise in Big Data analytics and development.
- Over 4 years of work experience in ingestion, storage, querying, processing and analysis of Big Data, with hands-on experience in Big Data technologies such as Spark, MapReduce, HDFS, Hive, Pig, HBase, Sqoop, Flume, Zookeeper, Kafka, Oozie, Storm and AWS.
- Executed all phases of the Big Data project life cycle, including scoping study, requirements gathering, design, development, implementation, quality assurance and application support for end-to-end IT solution offerings.
- Experience in installation, configuration, management, support and monitoring of Hadoop clusters using various distributions such as Apache Spark, Cloudera and the AWS service console.
- Experienced in writing complex MapReduce programs that work with different file formats like Text, Sequence, XML, JSON and Avro.
- Expertise in developing solutions around NoSQL databases like HBase, MongoDB and Cassandra.
- In-depth understanding of data structures and algorithms and hands-on experience handling multi-terabyte datasets.
- Experience in installation, configuration, support and management of Hadoop clusters using Cloudera (CDH3, CDH4) distributions and on Amazon Web Services (AWS).
- In-depth understanding of Hadoop architecture and its components such as HDFS, MapReduce and YARN, and good understanding of workload management and scalable, distributed platform architectures.
- Developed multiple MapReduce programs to process large volumes of semi/unstructured data files using different MapReduce design patterns.
- Implemented a batch processing solution for large volumes of unstructured data using the Hadoop MapReduce framework.
- Extended Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs) and User Defined Aggregating Functions (UDAFs).
- Experience in creating custom UDFs and UDAFs for Hive and Pig by extending the required classes and implementing methods to evaluate expressions.
- Proficient in Big Data ingestion tools like Flume, Kafka, Spark Streaming and Sqoop for streaming and batch data ingestion.
- Experience in importing and exporting data between HDFS and Relational Database Management systems using Sqoop.
- Expertise in implementing Spark and Scala applications using higher-order functions for both batch and interactive analysis requirements.
- Extensively experienced in working with Spark tools like RDD transformations, Spark MLlib and Spark SQL.
- Good knowledge of executing Spark SQL queries against data in Hive using HiveContext in Spark.
- Experienced in moving data from different sources using Kafka producers and consumers, and preprocessing data using Storm topologies.
- Experienced in migrating ETL transformations using Pig Latin scripts, transformations and join operations.
- Good knowledge of AWS services like EMR and EC2, which provide fast and efficient processing of Big Data.
- Experienced in working with monitoring tools to check cluster status using Cloudera Manager, Ambari and Ganglia.
- Experienced in developing BI reports and dashboards using Pentaho Reports and Pentaho Dashboards.
- Experience in testing MapReduce programs using MRUnit and JUnit.
- Extensive experience in middle-tier development using J2EE technologies like JDBC, JNDI, JSP, Servlets, JSF, Struts, Spring, Hibernate and EJB.
- Extensive experience in working with SOA-based architectures using REST-based web services with JAX-RS and SOAP-based web services with JAX-WS.
- Good knowledge of creational, structural and behavioral design patterns like Singleton, Builder, Abstract Factory, Adapter, Bridge, Façade, Decorator, Template, Visitor, Iterator and Chain of Responsibility.
- Highly proficient in SQL, PL/SQL including writing queries, stored procedures, functions and database performance tuning.
Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper, Spark, Spark SQL, Talend, Ambari, Mahout, Avro, Parquet and Snappy
Hadoop Distributions: Cloudera, Hortonworks, MapR and Apache
NoSQL Databases: Cassandra, MongoDB and HBase
RDBMS: Oracle 9i, 10g, 11g, MS SQL Server, MySQL, DB2 and Teradata
Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB and Struts
XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM) and JAXB
Development / Build Tools: Eclipse, Ant, Maven, Jenkins, IntelliJ, JUnit, Log4j and ETL
Version Control: Subversion, Git, WinCVS
Frameworks: Struts 2.x, Spring 3.x/4.x, Hibernate and Akka
App/Web servers: WebSphere, WebLogic, JBoss and Tomcat
DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle
Operating systems: UNIX, Linux, Mac and Windows variants
Data analytical tools: R and MATLAB
ETL Tools: Informatica, Talend and Pentaho
- Developed simple and complex MapReduce programs in Java for data analysis on different data formats.
- Extensively used Sqoop to import/export data between RDBMS and Hive tables, performed incremental imports, and created Sqoop jobs that track the last saved value.
- Experienced in implementing static and dynamic partitioning in Hive.
- Experience in customizing the MapReduce framework at different levels like input formats, data types and partitioners.
- Experience in developing Kafka producers and consumers in Java and ingesting data into HDFS.
- Involved in migrating Hive queries and Hive UDFs to Spark SQL, implemented using PySpark and Scala.
- Created Hive generic UDFs to process business logic that varies based on policy.
- Used the Avro data serialization system to work with JSON data formats.
- Worked on custom Pig loaders and storage classes to work with a variety of data formats such as JSON and XML.
- Implemented an API tool to handle streaming data using Flume.
- Experience in integrating Apache Kafka with Apache Storm and created Storm data pipelines for real-time processing.
- Created Oozie workflows to run multiple Hive jobs.
- Used Zookeeper to provide coordination services to the cluster.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Configured build scripts for multi-module projects with Maven and Jenkins CI.
- Prepared dashboards of plans and various metrics using the Tableau reporting tool.
Environment: Hadoop, Cloudera (CDH 4.2), HDFS, Hive, Spark, Scala, Flume, Storm, Kafka, Sqoop, Pig, Java, Eclipse, Oracle, Jenkins, Ubuntu, UNIX, Maven and Tableau.
Confidential, Pataskala, Ohio
- Involved in the complete implementation lifecycle, specializing in writing custom MapReduce, Pig and Hive programs.
- Developed Java MapReduce programs for mileage calibration, car status summarization and data filtering.
- Designed and implemented Spark-based large-scale parallel relation-learning system.
- Worked on the proof-of-concept for Apache Spark framework initiation.
- Responsible for implementing Machine learning algorithms like K-Means clustering and collaborative filtering in Spark.
- Responsible for implementing POCs to migrate iterative MapReduce programs into Spark transformations using Spark and Scala.
- Created Hive dynamic partitions to load time series data.
- Created tables, partitions and buckets and performed analytics using Hive ad-hoc queries.
- Developed Pig program for loading and filtering the streaming data into HDFS using Flume.
- Collected and aggregated large amounts of log data from multiple sources and integrated it into HDFS using Flume.
- Handled data from different data sets, joining and preprocessing them using Pig join operations.
- Responsible for bulk loading data into HBase using MapReduce by directly creating HFiles and loading them.
- Used AWS services to manage the application in the cloud, including creating and modifying instances.
- Checked the health and utilization of AWS resources using AWS CloudWatch.
- Provisioned AWS S3 buckets for application backups and synced their contents with the remaining S3 backups by adding an AWS S3 SYNC entry to crontab.
- Experienced working on Pentaho suite (Pentaho Data Integration, Pentaho BI Server, Pentaho Meta Data and Pentaho Analysis Tool).
- Used Pentaho Reports and Pentaho Dashboard in developing Data Warehouse architecture, ETL framework and BI Integration.
- Used Hive as the core database for the data warehouse, where it tracks and analyzes all data usage across the network.
- Used Solr for indexing and search operations, and configured Solr by modifying the schema.xml file as per requirements.
- Used Oozie to coordinate and automate the flow of jobs in the cluster accordingly.
- Worked on different file formats like Text files, Sequence Files, Avro and Record Columnar (RC) files.
Environment: HDFS, MapReduce, Pig, Hive, Flume, Sqoop, Oozie, Kafka, Spark, Scala, Akka, HBase, MongoDB, Elasticsearch, Pentaho, Linux (Ubuntu), Git, Jenkins, Solr, Python.
Confidential - Houston
Java/Hadoop Developer with ETL
- Collected and aggregated large amounts of web log data from different sources such as web servers, mobile and network devices using Apache Flume and stored the data into HDFS for analysis.
- Developed Pig UDFs for manipulating the data according to business requirements and also worked on developing custom Pig loaders.
- Developed Java MapReduce programs to transform log data into a structured form and find user location, age group and time spent.
- Worked on the ingestion of files into HDFS from remote systems using MFT (Managed File Transfer).
- Used Solr to search, sort and retrieve customer details.
- Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration and the most purchased product on the website.
- Created data flow diagrams, data mapping from Source to stage and Stage to Target mapping documents indicating the source tables, columns, data types, transformations required and business rules to be applied.
- Developed and updated the web tier modules using the Struts 2.1 framework.
- Validated ETL mappings and tuned them for better performance using various performance tuning techniques.
- Used JBoss Application server as the JMS provider to manage the sessions and queues.
- Implemented Struts Validator for automated validation.
- Utilized Hibernate for object/relational mapping for transparent persistence to SQL Server.
- Performed data integrity and quality testing, custom table creation and population, and custom and package index analysis and maintenance in relation to process performance.
- Used CVS for version control and JUnit for unit testing.
Environment: Hadoop, HDFS, MapReduce, Flume, Hive, Pig, Sqoop, Solr, Oozie, Oracle, Java, Shell Scripting, SQL, JMS.
Confidential, Fremont, CA
- Involved in the complete software development life cycle (SDLC) to develop the application.
- Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, HBase database and Sqoop.
- Involved in loading data from the Linux file system to HDFS.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Imported and exported data into HDFS and Hive using Sqoop.
- Implemented test scripts to support test driven development and continuous integration.
- Installed and configured Hadoop MapReduce and HDFS (Hadoop Distributed File System), and developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Created Pig Latin scripts to sort, group, join and filter the enterprise-wide data.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Supported MapReduce programs running on the cluster.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Worked on tuning the performance of Pig queries.
- Mentored the analyst and test teams in writing Hive queries.
- Installed Oozie workflow engine to run multiple MapReduce jobs.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Linux, Java, Oozie, HBase.
- Responsible for requirement gathering, requirement analysis, defining scope, and design.
- Developed the use cases, class diagrams and sequence diagrams using Rational Rose.
- Developed the user interface using JSP and HTML.
- Wrote server-side programs using Servlets.
- Used JavaScript for client-side validation.
- Used HTML, AWT with Java Applets to create web pages.
- Responsible for database design and developed stored procedures and triggers to improve performance.
- Used Eclipse IDE for all coding in Java, Servlets and JSPs.
- Coordinated with the QA lead on development of the test plan, test cases, test code and actual testing; responsible for defect allocation and ensuring that defects were resolved.
- Used Flex Styles and CSS to manage the look and feel of the application.
- Deployed the application on WebSphere Application Server.