- Over 8 years of professional IT experience developing, implementing, and configuring Java, J2EE, and Big Data technologies, with working knowledge of the Hadoop ecosystem and big data analytics, and expertise in application design and development across various domains, with an emphasis on data warehousing tools and industry-accepted methodologies.
- 3+ years of experience with the Hadoop framework and its ecosystem.
- Experienced Hadoop developer with a strong background in distributed file systems in the big data arena. Understands the complex processing needs of big data and has experience developing code and modules to address those needs.
- Extensive work experience in the areas of Banking, Finance, Insurance and Marketing Industries.
- Familiar with data architecture including data ingestion pipeline design, Hadoop information architecture, data modelling and data mining, machine learning and advanced data processing.
- Well versed in installing, configuring, supporting, and managing Big Data and the underlying infrastructure of Hadoop clusters, including CDH3 and CDH4 clusters.
- Processed and analyzed log data stored in HBase and imported it into the Hive warehouse, enabling business analysts to write HQL queries.
- Hands-on experience with Hadoop/Big Data technologies for storage, querying, processing, and analysis of data.
- Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, Hive, Pig, Sqoop, JobTracker, TaskTracker, NameNode, and DataNode.
- Expertise in writing Hadoop jobs for analyzing data using MapReduce, Hive, and Pig.
- Knowledge of installing, configuring, and using Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, Spark, Kafka, Storm, Zookeeper, and Flume.
- Experience in managing and reviewing Hadoop log files.
- Experience in analyzing data using HiveQL, Pig Latin, HBase and custom Map Reduce programs in Java.
- Experience in importing and exporting data using Sqoop from HDFS to RDBMS and vice-versa.
- Experienced in extending Hive and Pig core functionality by writing custom UDFs using Java.
- Experience in building and maintaining multiple Hadoop clusters of different sizes and configurations, and in setting up the rack topology for large clusters.
- Experience in installing, configuring, supporting, and managing Cloudera's Hadoop platform, including CDH3 and CDH4 clusters.
- Experience in NoSQL databases such as HBase and Cassandra.
- Experienced with job workflow scheduling tools such as Oozie and with managing Hadoop clusters using Cloudera Manager.
- Implemented a secured distributed systems network using algorithm programming.
- Experience in performance tuning by identifying bottlenecks in sources, mappings, targets, and partitioning.
- Wrote content explaining the installation, configuration, and administration of Hortonworks Data Platform (HDP) core Hadoop components (YARN, HDFS) and other Hadoop components.
- Experience in Object Oriented Analysis, Design and development of software using UML Methodology.
- Excellent Java development skills using J2EE, Spring, J2SE, Servlets, JUnit, MRUnit, JSP, and JDBC.
- Experience in application development using Java, RDBMS, Talend, Linux shell scripting, and DB2.
- Ability to adapt to evolving technology, strong sense of responsibility and accomplishment.
- Excellent interpersonal and communication skills, creative, research-minded, technically competent and result-oriented with problem solving and leadership skills.
Big Data Ecosystem: MapReduce, HDFS, HBase, Spark, Scala, Zookeeper, Hive, Pig, Sqoop, Cassandra, Oozie, MongoDB, Flume.
ETL Tools: Informatica, Talend.
Java Technologies: Core Java, Servlets, JSP, JDBC, Java 6, Java Help API.
Frameworks: MVC, Struts, Hibernate and Spring.
Programming Languages: C, C++, Java, Python, Linux shell scripts.
Methodologies: Agile, Waterfall, UML, Design Patterns.
Database: Oracle 10g, 11g, MySQL, NoSQL, SQL Server 2008 R2, HBase.
Application Server: Apache Tomcat 5.x, 6.0.
Tools: SQL developer, Toad, Maven, SQL Loader.
Operating System: Windows 7, Linux Ubuntu.
Testing API: JUnit
Confidential, San Bruno, CA
Big Data Developer
- Analyzed the Hadoop stack and various big data analytics tools, including Pig, Hive, HBase, and Sqoop.
- Wrote multiple MapReduce programs for data extraction, transformation, and aggregation from more than 20 sources in multiple file formats, including XML, JSON, CSV, and compressed formats.
- Implemented Spark Core in Scala to process data in memory.
- Performed job functions using Spark APIs in Scala for real-time analysis and fast querying.
- Created Spark applications in Scala using functions such as cache, map, and reduceByKey to process data.
- Created Oozie workflows for Hadoop based jobs including Sqoop, Hive and Pig.
- Created Hive external tables, loaded data into them, and queried the data using HQL.
- Performed data validation on the data ingested using MapReduce by building a custom model to filter all the invalid data and cleanse the data.
- Handled the import of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Wrote HiveQL queries, configuring the number of mappers and reducers as needed for the output.
- Transferred data between Pig scripts and Hive using HCatalog, and transferred relational database data using Sqoop.
- Configured and maintained different topologies in the Storm cluster and deployed them on a regular basis.
- Responsible for building scalable distributed data solutions using Hadoop. Installed and configured Hive, Pig, Oozie, and Sqoop on Hadoop cluster.
- Developed simple to complex MapReduce jobs in Java, along with jobs implemented using Hive and Pig.
- Ran many performance tests using the cassandra-stress tool to measure and improve the read and write performance of the cluster.
- Configured Kafka, Storm, and Hive to receive and load real-time messages.
- Supported MapReduce programs running on the cluster. Performed cluster monitoring, maintenance, and troubleshooting.
- Analyzed the data by running Hive queries (HiveQL) and Pig scripts (Pig Latin).
- Managed cluster coordination services through Zookeeper. Installed and configured Hive and wrote Hive UDFs.
- Worked on the Analytics Infrastructure team to develop a stream filtering system on top of Apache Kafka and Storm.
- Worked on a POC for parallel processing with Spark and Scala, streaming data in real time using Spark with Kafka.
Environment: Hadoop, Spark, HDFS, Hive, Pig, HBase, Big Data, Apache Storm, Oozie, Sqoop, Kafka, Flume, Zookeeper, MapReduce, Cassandra, Scala, Linux, NoSQL, MySQL Workbench, Java, Eclipse, Oracle 10g, SQL.
Confidential, Plymouth, MN
Big Data Developer
- Performed all phases of software engineering, including requirements analysis, design, code development, and testing.
- Designed and implemented product features in collaboration with business and IT stakeholders.
- Worked very closely with the Architecture group to drive solutions.
- Designed and developed innovative solutions to meet business needs, interacting with business partners and key contacts.
- Implemented the data management framework for building a data lake for Optum.
- Supported the implementation and drove it to a stable state in production.
- Provided alternative design solutions along with project estimates.
- Reviewed code and provided feedback on best practices, performance improvements, and related concerns.
- Troubleshot production support issues post-deployment and came up with solutions as required.
- Demonstrated substantial depth of knowledge and experience in a specific area of Big Data and development.
- Implemented Spark using Scala and SparkSQL for faster testing and processing of data.
- Built reusable Hive UDF libraries that enabled business analysts to use these UDFs in Hive queries.
- Implemented several aggregations on the backend using Scala and Spark.
- Implemented Hive-HBase integration by creating Hive external tables and using the HBase storage handler.
- Developed Java APIs for retrieval and analysis on NoSQL databases such as HBase.
- Drove the team and collaborated to meet project timelines.
- Applied expertise in Big Data technologies (HBase, Hive, MapR, Pig, and Talend).
- Hadoop, Cloudera CDH 4.5, HDFS, Pig scripting, Hive, MapReduce, Sqoop, Flume, Oozie, Spark, Autosys, Unix scripting, Tableau, Talend Big Data ETL.
- Designed and implemented Spark test bench application to evaluate quality of recommendations made by the engine.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics.
- Created and implemented a highly scalable and reliable distributed data design using NoSQL HBase.
- Demonstrated expertise in Java programming and frameworks within an Agile/Scrum methodology.
- Intake happens through Sqoop, and ingestion happens through MapReduce and HBase.
- Data is then registered in Hive and exposed for querying by business users and analysts.
- The cluster runs on MapR. All functions and transformations are written in Pig.
- The complete process is orchestrated by Talend; the individual stages are called from a Talend workflow.
- After enrichment, the final copy is exposed through Spark SQL for end users to query.
- The business needs data in near real time; after previously trying CDC, the team is exploring Kafka to pull data as frequently as possible.
Environment: Hadoop, MapR, Spark, HDFS, Hive, Pig, HBase, Big Data, Oozie, Sqoop, Scala, Kafka, Flume, Zookeeper, MapReduce, Spark SQL, Tableau, Unix and Java.
Confidential, Woodland Hills, CA
- Worked with Flume to load log data from multiple sources directly into HDFS.
- Used Flume to collect, aggregate, and store web log data from sources such as web servers and push it to HDFS.
- Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
- Ensured performance and security across all RESTful APIs.
- Implemented data ingestion and cluster handling for real-time processing using Apache Storm and Kafka.
- Prepared the required RESTful API guide for user interface developers; the HTML front end uses the RESTful web services.
- Used Solr search capabilities such as faceted search, collapsing/grouping, and function queries.
- Experience with core distributed computing and data mining libraries using Apache Spark.
- Used Hive for data processing and batch data filtering. Used Spark for other value-centric data filtering.
- Worked extensively with Flume for importing data from various webservers to HDFS.
- Worked on a large-scale Hadoop YARN cluster for distributed data processing and analysis using Sqoop, Pig, Hive, Impala, and NoSQL databases. Developed Hadoop data processes using Hive and/or Impala.
- Worked with the Zookeeper and Accumulo stack, aiding in the development of specialized indexes for performant queries on big data implementations.
- Deployed a multi-node Hadoop cluster with various big data analytics tools, including Pig, HBase, and Sqoop. Gained solid experience with NoSQL databases.
- Responsible for building scalable distributed data solutions using DataStax Cassandra.
- Stored Wi-Fi data arriving through EMS/JMS in the Hadoop ecosystem and processed it with Oryx and Spark.
Environment: Hadoop, HDFS, Hive, Pig, HBase, Big Data, Oozie, Sqoop, Zookeeper, MapReduce, Cassandra, Scala, Linux, NoSQL, MySQL Workbench, Java, Eclipse, Oracle 10g, SQL.
Confidential, Tampa, FL
Big Data/Hadoop Developer
- Handled the import of data from various data sources and performed data transformations using HAWQ and MapReduce.
- Analyzed web log data using HiveQL.
- Developed Hive queries on data logs to perform a trend analysis of user behavior on various online modules.
- Developed Pig UDFs to pre-process the data for analysis.
- Involved in the setup and deployment of Hadoop cluster.
- Developed Map Reduce programs for some refined queries on big data.
- Involved in loading data from UNIX file system to HDFS.
- Loaded data into HDFS and extracted data from MySQL into HDFS using Sqoop.
- Exported the analyzed data to the relational databases using Sqoop and generated reports for the BI team.
- Managed and scheduled jobs on a Hadoop cluster using Oozie.
- Used a test-driven approach for developing the application and implemented unit tests using the Python unittest framework.
- Developed a Storm monitoring bolt for validating pump tag values against high/low and high-high/low-low values from preloaded metadata.
- Designed and configured a Kafka cluster to accommodate heavy throughput of 1 million messages per second. Used the Kafka producer 0.8.3 APIs to produce messages.
- Installed and configured Talend ETL in single- and multi-server environments.
- Troubleshot, debugged, and fixed Talend-specific issues while maintaining the performance of the ETL environment.
- Developed merge jobs in Python to extract and load data into a MySQL database.
- Created and modified several UNIX shell Scripts according to the changing needs of the project and client requirements. Developed UNIX shell scripts to call Oracle PL/SQL packages and contributed to standard framework.
- Developed simple to complex MapReduce jobs using Hive.
- Implemented partitioning and bucketing in Hive. Mentored the analyst and test teams in writing Hive queries.
- Involved in setting up HBase to use HDFS.
- Installed patches and packages using RPM and YUM on Red Hat and SUSE Linux, and using patchadd and pkgadd on the Solaris 10 operating system.
- Extensively used Pig for data cleansing. Loaded streaming log data from various webservers into HDFS using Flume.
- Developed entire frontend and backend modules using Python on Django, including the Tastypie web framework, with Git for version control.
- Created Solr XML schemas for all entities. Used the SolrJ API to write indexes.
- Together with the Infrastructure team, designed and developed a Kafka- and Storm-based data pipeline that also uses Amazon Web Services EMR, S3, and RDS.
- Planned and implemented UNIX shell scripts to automate cross-domain file flow, moving time-sensitive files from the high-side network down and out to other sites.
- Involved in creating data-models for customer data using Cassandra Query Language.
- Performed benchmarking of the NoSQL databases Cassandra and HBase.
- Knowledgeable in Spark and Scala, mainly through framework exploration for the transition from Hadoop/MapReduce to Spark.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Configured Flume to extract the data from the web server output files to load into HDFS.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
Environment: Unix Shell Scripting, Python, Oracle 11g, DB2, HDFS, Kafka, Storm, Spark, ETL, Java (jdk1.7), Pig, Linux, HiveQL, AWS EMR, Cassandra, MapReduce, MS Access, Toad, SQL, Scala, MySQL Workbench, XML, NoSQL, SOLR, HBase, Hive, Sqoop, Flume, Talend, Oozie.
Confidential, Cincinnati, OH
- Gathered business requirements, analyzed the project, and created UML diagrams such as use cases, class diagrams, sequence diagrams, and flowcharts for the Optimization module using Microsoft Visio.
- Configured faces-config.xml for the page navigation rules and created managed and backing beans for the Optimization module.
- Developed JSP web pages for Rate Structure and Operating Cost using the JSF HTML and JSF Core tag libraries.
- Designed and developed the framework for the IMAT application, implementing all six phases of the JSF life cycle, and wrote Ant build and deployment scripts to package and deploy on the JBoss application server.
- Designed and developed Simulated annealing algorithm to generate random Optimization schedules and developed neural networks for the CHP system using Session Beans.
- Integrated EJB 3.0 with JSF and handled application state management and business process management (BPM) using JBoss Seam.
- Wrote AngularJS controllers, views, and services for new website features.
- Developed Cost function to calculate the total cost for each CHP Optimization schedule generated by the Simulated Annealing algorithm using EJBs.
- Implemented Spring Web Flow for the Diagnostics module to define page flows with actions and views, and created POJOs mapped to the SQL Server database with EJB annotations.
- Wrote DAO classes, EJB 3.0 QL queries for Optimization schedule and CHP data retrievals from SQL Server database.
- Used Eclipse as the IDE to develop the application and JIRA for bug and issue tracking.
- Created combined deployment descriptors using XML for all the session and entity beans.
- Wrote a message-driven bean to implement the Diagnostic Engine, configured the JMS queue details, and took part in performance tuning of the application using JProbe and JProfiler.
- Designed and coded application components in an Agile environment utilizing a test driven development approach.
- Skilled in test driven development and Agile development.
- Wrote JUnit test cases to test the Optimization Module and created functions, sub queries and stored procedures using PL/SQL.
- Tested the simulated annealing algorithm with different input schedules (always-on, always-off, a human-optimized schedule, and five random input schedules) and stored the test results in a spreadsheet.
- Created technical design document for the Diagnostics Module and Optimization module covering Cost function and Simulated Annealing approach.
- Participated in code reviews and followed version control guidelines.
Confidential, New York, NY
- Analysis of system requirements and development of design documents.
- Involved in various client implementations.
- Development of Spring Services
- Development of persistence classes using Hibernate framework.
- Development of SOA services using Apache Axis web service framework.
- Development of the user interface using Apache Struts 2.0, JSPs, Servlets, jQuery, and JavaScript.
- Developed client functionality using ExtJS.
- Development of JUnit test cases to test business components.
- Extensively used Java Collection API to improve application quality and performance.
- Made extensive use of Java 5 features such as generics, the enhanced for loop, and type safety.
- Provided production support and designed enhancements for the existing product.
Confidential, Boston, MA
- Wrote stored procedures using PL/SQL for data retrieval from different tables.
- Worked extensively on bug fixes on the server side and made cosmetic changes on the UI side.
- Served on the performance tuning team and implemented a caching mechanism and other changes.
- Recreated the system architecture diagram and created numerous new class and sequence diagrams.
- Designed and developed the UI using HTML, JSP, and Struts, where users see all the items listed for auction.
- Developed authentication and authorization modules so that only authorized persons can access inventory-related operations.
- Developed controller Servlets and Action and Form objects for interacting with the Oracle database and retrieving dynamic data.
- Responsible for coding SQL Statements and Stored procedures for back end communication using JDBC.
- Developed the login screen so that only authorized and authenticated administrators can access the application.
- Developed various features, such as transaction history and product search, that help users work with the system efficiently.
- Prepared project documentation to help readers understand the system efficiently.