- Over 9 years of experience in Information Technology, including Big Data, the Hadoop ecosystem and Core Java/J2EE, with strong skills in design, software processes, requirements gathering, analysis and development of software applications.
- Excellent hands-on experience developing Hadoop architecture on Windows and Linux platforms.
- Experience building big data solutions using the Lambda Architecture with the Cloudera distribution of Hadoop, Twitter Storm, Trident, MapReduce, Cascading, Hive, Pig and Sqoop.
- Expertise in various components of the Hadoop ecosystem and related technologies: MapReduce, Hive, Pig, Sqoop, Impala, Flume, Oozie, HBase, MongoDB, Cassandra, Scala, Spark, Kafka and YARN.
- Experienced in J2EE design patterns such as MVC, Business Delegate, Service Locator, Singleton, Transfer Object, Session Façade and Data Access Object.
- Worked with Hadoop, Hive, Java, Python, Scala and the Struts web framework.
- Excellent working experience in Big Data integration and analytics based on Hadoop, Solr, Spark, Kafka, Storm and webMethods technologies.
- Experienced in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Hands-on experience working with NoSQL databases including HBase, MongoDB and Cassandra, and their integration with Hadoop clusters.
- Strong knowledge of and experience implementing Big Data solutions on Amazon Elastic MapReduce (Amazon EMR), which processes data with the Hadoop framework on dynamically scalable Amazon EC2 instances.
- Hands-on experience writing ad hoc queries to move data from HDFS into Hive and analyzing the data using HiveQL.
- Good knowledge of RDBMS concepts (Oracle 11g, MS SQL Server 2000) and strong SQL and PL/SQL skills (using TOAD and SQL Developer) for queries, stored procedures and triggers.
- Expertise in developing jobs with Spark framework modules such as Spark Core, Spark SQL and Spark Streaming using Java, Scala and Python.
- Expertise in Amazon Web Services, including Elastic Compute Cloud (EC2) and DynamoDB, and in automating deployment of large Cassandra clusters on EC2 using the EC2 APIs.
- Experienced in developing and deploying enterprise applications on IBM WebSphere, BEA WebLogic, Oracle Application Server, JBoss, Tomcat and Jetty.
- Experienced in developing with and using Apache Solr for data computations and transformations consumed by downstream online applications.
- Excellent knowledge of databases such as Oracle 8i/9i/10g/11g/12c, Microsoft SQL Server, DB2 and Netezza.
- Good understanding of, and experience with, software development methodologies such as Agile and Waterfall.
- Experienced in importing and exporting data with Sqoop between HDFS (Hive and HBase) and relational database systems (Oracle and Teradata).
- Expertise in IDEs such as WebSphere Studio Application Developer (WSAD), Eclipse, NetBeans, MyEclipse and WebLogic Workshop.
- Experienced in designing and developing web services (SOAP and RESTful).
- Highly proficient in writing complex SQL queries, stored procedures and triggers, and very experienced in PL/SQL and T-SQL.
- Experienced in developing web interfaces using Servlets, JSP and custom tag libraries.
- Thorough knowledge of the software development life cycle (SDLC), database design, RDBMS and data warehousing.
- Experience writing complex SQL queries involving inner and outer joins across multiple tables.
- Expertise in various Java/J2EE technologies such as JSP, Servlets, Hibernate, Struts and Spring.
Big Data: Hadoop, Storm, HBase, Hive, Flume, Cassandra, Kafka, Sqoop, Oozie, Pig, Spark, MapReduce, ZooKeeper, YARN, MongoDB, Cloudera
Operating Systems: UNIX, Mac, Linux, Windows 2000 / NT / XP / Vista, Android
Programming Languages: Java (JDK 5/6/7), R, HTML, SQL, PL/SQL
Frameworks: Hibernate 2.x/3.x, Spring 2.x/3.x, Struts 1.x/2.x and JPA
Web Services: WSDL, SOAP, Apache CXF/XFire, Apache Axis, REST, Jersey
Databases: Oracle 8i/9i/10g, Microsoft SQL Server, DB2 & MySQL 4.x/5.x
Middleware Technologies: WebSphere MQ, WebSphere Message Broker, XML Gateway, JMS
Testing Frameworks: Mockito, PowerMock, EasyMock
Web/Application Servers: IBM WebSphere Application Server, JBoss, Apache Tomcat
Others: Borland StarTeam, ClearCase, JUnit, Ant, Maven, Android Platform, Microsoft Office, SQL Developer, DB2 Control Center, Microsoft Visio, Hudson, Subversion, Git, Nexus, Artifactory
Development Strategies: Agile, Lean Agile, Pair Programming, Waterfall and Test-Driven Development
Confidential, Chicago, IL
Sr. Big Data Architect
- Gathered business requirements from business partners and subject matter experts, and worked with the Hadoop administrator on installing and configuring Hadoop ecosystem components.
- Architected solutions that process massive amounts of data on corporate and AWS cloud-based servers.
- Supported MapReduce programs running on the cluster and wrote MapReduce jobs using the Java API.
- Configured a multi-node Hadoop cluster (Amazon EC2 Spot Instances) to transfer data between Amazon S3 and HDFS in both directions and to direct input and output to the Hadoop MapReduce framework.
- Involved in HDFS maintenance and in loading structured and unstructured data; imported data from mainframe datasets into HDFS using Sqoop.
- Developed automated scripts for data ingestion (DI) and data loading (DL) using Python and Java MapReduce.
- Handled importing data from various sources (Oracle, DB2, HBase, Cassandra and MongoDB) into Hadoop and performed transformations using Hive and MapReduce.
- Developed prototype Spark applications using Spark Core, Spark SQL and the DataFrame API, and developed several custom user-defined functions (UDFs) for Hive and Pig in Java and Python.
- Developed simple and complex MapReduce programs in Java for Data Analysis on different data formats.
- Imported data into Spark from a Kafka consumer group using the Spark Streaming APIs.
- Wrote Hive queries for data analysis to meet business requirements. Implemented custom Kafka encoders for a custom input format to load data into Kafka partitions, and streamed data in real time using Spark with Kafka for faster processing.
- Configured Spark Streaming in Scala to receive real-time data from Kafka and store the streamed data in HDFS.
- Wrote Python scripts for internal testing that read data from a file and push it into a Kafka queue, which in turn is consumed by the Storm application.
- Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
- Participated in building a CDH4 test cluster to implement Kerberos authentication. Upgraded the Hadoop cluster from CDH4 to CDH5 and set up a high-availability cluster to integrate Hive with existing applications.
- Wrote Spark applications in Scala that interact with the MySQL database through SQLContext and access Hive tables through HiveContext.
- Extensively used the Spring and Hibernate frameworks, implemented an MVC architecture, and worked on Spring RESTful services and Spring dependency injection.
- Implemented AWS EC2, key pairs, security groups, Auto Scaling, ELB, SQS and SNS functionality using the AWS API, exposed it as RESTful web services, and implemented reporting and notification services using the AWS API.
- Used AWS compute servers extensively, created snapshots of EBS volumes and monitored EC2 instances using CloudWatch.
- Worked on AWS security groups and their rules.
- Worked on Kafka and Kafka mirroring to ensure that data is replicated without loss.
- Loaded and transformed large sets of structured, semi-structured and unstructured data using Hadoop/Big Data tools.
- Involved in migrating Hive queries into Spark transformations using DataFrames, Spark SQL, SQLContext and Scala.
- Designed the solution and developed the data ingestion program using Sqoop, MapReduce, shell scripts and Python.
- Responsible for migrating tables from traditional RDBMS into Hive tables using Sqoop and later generating the required visualizations and dashboards using Tableau.
- Documented all Extract, Transform and Load (ETL) processes, and designed, developed, validated and deployed Talend ETL processes for the data warehouse team using Pig and Hive.
- Developed Spark code using Scala and Spark SQL for faster testing and processing of data. Implemented scripts for loading data from the UNIX file system to HDFS.
- Generated final reporting data in Tableau for testing by connecting to the corresponding Hive tables through the Hive ODBC connector.
- Implemented Storm builder topologies to perform cleansing operations before moving data into Cassandra.
- Developed a fully customized framework using Python, shell scripts, Sqoop and Hive, and an export framework using Python, Sqoop, Oracle and MySQL.
- Worked with teams to set up AWS EC2 instances using AWS services such as S3, EBS, Elastic Load Balancing, Auto Scaling groups, IAM roles, VPC subnets and CloudWatch.
- Implemented daily Oozie coordinator jobs that automate the parallel tasks of loading data into HDFS and pre-processing it with Pig.
- Involved in importing and exporting data between HDFS and Relational Database Systems like Oracle, MySQL and SQL Server using Sqoop.
- Built a prototype clickstream application with HDP, Kafka and Storm.
- Updated mappings, sessions and workflows as part of ETL changes, modified existing ETL code and documented the changes.
Confidential, NYC, NY
Sr. Big Data Developer/Architect
- Worked on analyzing Hadoop cluster using different big data analytic tools including Flume, Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, Spark and Kafka.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- As a Big Data developer, implemented solutions for ingesting data from various sources and processing data at rest using Big Data technologies such as Hadoop, MapReduce, MongoDB, Hive, Oozie, Flume, Sqoop and Talend.
- Developed a job server (REST API, Spring Boot, Oracle DB) and a job shell for job submission, job profile storage, and querying/monitoring job data in HDFS.
- Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Deployed the application to AWS and monitored load balancing across EC2 instances.
- Handled importing data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from SQL databases into HDFS using Sqoop.
- Developed analytical components using Scala, Spark, Apache Mesos and Spark Streaming.
- Installed Hadoop, MapReduce and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Worked on Big Data integration and analytics based on Hadoop, Solr, Spark, Kafka, Storm and webMethods.
- Managed Hadoop jobs using the Oozie workflow scheduler for MapReduce, Hive, Pig and Spark transformation actions.
- Worked extensively in Python, building the custom ingest framework and a REST API.
- Developed Kafka producers and consumers, Spark jobs and Hadoop MapReduce jobs.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig and Sqoop.
- Imported data from sources such as HDFS and HBase into Spark RDDs.
- Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
- Recommended bringing in Elasticsearch and was responsible for its installation, configuration and administration.
- Created Elastic MapReduce (EMR) clusters, configured AWS Data Pipeline with the EMR clusters for scheduling the Task Runner, and provisioned EC2 instances on both Windows and Linux.
- Worked on Amazon Relational Database Service (RDS) and AWS security groups and their rules, and implemented reporting and notification services using the AWS API.
- Implemented AWS EC2, key pairs, security groups, Auto Scaling, ELB, SQS and SNS functionality using the AWS API and exposed it as RESTful web services.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
- Developed Spark scripts using Scala shell commands as required.
- Performed transformations, cleaning and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Implemented solutions in Scala and SQL for faster testing and processing of data, and streamed data in real time using Kafka.
- Developed and designed automation framework using Python and Shell scripting.
- Involved in writing Java code for AWS Lambda to manage some of the AWS services.
- Designed and implemented ETL processes using Talend to load data into HDFS.
- Worked extensively with Sqoop to import and export data between HDFS and relational database systems/mainframes.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Worked on major components of the Hadoop ecosystem, including Hive, Pig, HBase, HBase-Hive integration, Scala, Sqoop and Flume.
- Developed Hive scripts, Pig scripts and UNIX shell scripts for all ETL loading processes and for converting files to Parquet in HDFS.
- Worked with Oozie and Zookeeper to manage job workflow and job coordination in the cluster.
- Developed Apache Pig scripts and Hive scripts to process HDFS data.
- Used Hive to find correlations between customers' browser logs across different sites and analyzed them to build risk profiles for those sites.
- Utilized Agile Scrum Methodology to help manage and organize a team of 4 developers with regular code review sessions.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
Confidential, NYC, NY
Sr. Big Data/Hadoop Developer
- Worked on analyzing Hadoop cluster using different big data analytic tools including Kafka, Pig, Hive and MapReduce.
- Proactively monitored systems and services, and worked on architecture design and implementation of Hadoop deployment, configuration management, backup and disaster recovery systems and procedures.
- Configured Spark Streaming in Scala to receive real-time data from Kafka and store the streamed data in HDFS.
- Installed and configured Hadoop, MapReduce and HDFS (Hadoop Distributed File System), and developed multiple MapReduce jobs in Java for data cleaning and processing.
- Designed and configured Flume servers to collect data from the network proxy servers and store it in HDFS and HBase.
- Implemented Spark jobs using Scala and Spark SQL for faster analysis and processing of data.
- Used Java and MySQL day to day to debug and fix issues with client processes.
- Applied Java/J2EE application development skills with object-oriented analysis and was extensively involved throughout the software development life cycle (SDLC).
- Implemented AWS EC2, key pairs, security groups, Auto Scaling, ELB, SQS and SNS functionality using the AWS API and exposed it as RESTful web services.
- Involved in launching and setting up the Hadoop/HBase cluster, including configuring the different components of the Hadoop and HBase clusters.
- Hands-on experience with WebLogic Application Server, WebSphere Application Server, WebSphere Portal Server and J2EE application deployment technology.
- Handled importing and exporting data into HDFS and Hive using Sqoop and Kafka.
- Involved in creating Hive tables, loading data and writing Hive queries, which run internally as MapReduce jobs.
- Wrote MapReduce framework jobs in Java for data processing after installing and configuring Hadoop and HDFS.
- Involved in developing Pig scripts for change data capture and delta record processing between newly arrived data and existing data in HDFS.
- Involved in HDFS maintenance and in accessing HDFS through the web UI and the Hadoop Java API.
- Extensively used Sqoop to get data from RDBMS sources like Teradata and Netezza.
- Implemented reporting and notification services using the AWS API and used AWS compute servers extensively.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data.
- Designed and developed ETL workflows in Java for processing data in HDFS/HBase using Oozie.
- Worked on importing the unstructured data into the HDFS using Flume.
- Wrote complex Hive queries and UDFs.
- Created snapshots of EBS volumes, monitored EC2 instances using CloudWatch and worked on AWS security groups and their rules.
- Involved in developing shell scripts to ease execution of all other scripts (Pig, Hive and MapReduce) and to move data files within and outside of HDFS.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Worked with NoSQL databases such as HBase, creating tables to load large sets of semi-structured data.
- Developed Java APIs for retrieval and analysis on NoSQL databases such as HBase.
- Created ETL jobs to generate and distribute reports from a MySQL database using Pentaho Data Integration.
- Worked on loading data from UNIX file system to HDFS
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
Environment: Hadoop, Java/J2EE, HDFS, MapReduce, AWS, EC2, RDS, S3, CloudWatch, Hive, Sqoop, Pig, HBase, Apache Spark, Oozie Scheduler, UNIX Shell Scripts, Kafka, Git, Maven, PL/SQL, MongoDB, Cassandra, Python, Scala, Teradata, Netezza, Oracle.
Confidential, Cincinnati, OH
Sr. Java/Hadoop Developer
- Installed, configured and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, Flume, Oozie, ZooKeeper and Sqoop.
- Involved in writing client-side scripts in JavaScript and server-side code using JavaBeans, and used servlets to handle business logic.
- Developed applications using Hadoop Big Data technologies: Pig, Hive, MapReduce, HBase and Oozie.
- Created Elastic MapReduce (EMR) clusters and configured AWS Data Pipeline with the EMR clusters for scheduling the Task Runner.
- Developed Scala programs with Spark for data in Hadoop ecosystem.
- Extensively involved in installing and configuring the Cloudera distribution of Hadoop (CDH2 and CDH3), including the NameNode, Secondary NameNode, JobTracker, TaskTrackers and DataNodes.
- Developed user-based SOAP web services through WSDL on WebLogic Application Server, using JAXB as the binding framework to interact with other components.
- Managed and reviewed Hadoop log files as part of administration for troubleshooting, and communicated and escalated issues appropriately.
- Provisioned EC2 instances on both Windows and Linux and worked on Amazon RDS and AWS security groups and their rules.
- Implemented reporting and notification services using the AWS API.
- Developed MapReduce jobs using Apache Commons components.
- Used Service Oriented Architecture (SOA) based SOAP and REST Web Services (JAX-RS) for integration with other systems.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Involved in designing and developing the application using JSTL, JSP, JavaScript, AJAX, HTML, CSS and Java Collections.
- Implemented AWS EC2, key pairs, security groups, Auto Scaling, ELB, SQS and SNS functionality using the AWS API and exposed it as RESTful web services.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Solved performance issues in Hive and Pig scripts by understanding how joins, grouping and aggregation translate into MapReduce jobs.
- Developed UDFs in Java as needed for use in Pig and Hive queries.
- Coordinated with various stakeholders such as the End Client, DBA Teams, Testing Team and Business Analysts.
- Developed Java web applications using JSP, Servlets, Struts, Hibernate, Spring, RESTful web services and SOAP.
- Involved in gathering requirements and developing a project plan.
- Involved in understanding requirements, functional specifications, designing documentations and testing strategies.
- Involved in UI design, coding and database handling.
- Involved in unit testing and bug fixing.
- Worked over the entire Software Development Life Cycle (SDLC) as a part of a team as well as independently.
- Wrote SQL queries against the database and provided data extracts to users on request.
Environment: Java 1.5, JSP, Servlet, Spring, AWS EC2, RDS, S3, Hibernate 3.0, TDD, Struts framework, Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Sqoop, Flume, Kafka, Spark, Scala, ETL, Cloudera CDH, Apache Hadoop, HTML, XML, Log4j, Eclipse, Unix, Windows XP
Confidential, Des Moines, IA
- Designed and implemented the strategic modules like Underwriting, Requirements, Create Case, User Management, Team Management and Material Data Changes.
- Provided support in all phases of the software development life cycle (SDLC), quality management systems and project life cycle processes. Used databases such as MySQL and followed HTTP and WSDL standards to design REST/SOAP-based web APIs using XML, JSON, HTML and DOM technologies.
- Involved in the installation and configuration of Tomcat, SpringSource Tool Suite and Eclipse, and in unit testing.
- Performed back-end, server-side coding and development using Java Collections data structures (Set, List, Map), exception handling, Vaadin, Spring dependency injection, the Struts framework, Hibernate, Servlets, Actions, ActionForms and JavaBeans.
- Involved in migrating the existing distributed JSP framework to the Struts framework, and in research and design using the Struts MVC framework.
- Developed an Ajax framework on the service layer for a module, used as a benchmark.
- Implemented Service and DAO layers in between Struts and Hibernate.
- Used Agile practices and Test Driven Development (TDD) techniques to provide reliable, working software.
- Applied the MVC pattern of the Ajax framework, which involves creating controllers for implementing classes.
- Developed Spring REST web services for the open-locker-door and close-locker-door operations.
- Responsible for enhancing the UI using HTML, JavaScript, XML, JSP and CSS per requirements, and for client-side validation using jQuery.
- Involved in writing application-level code to interact with APIs and web services using AJAX, JSON and XML.
- Wrote many JSPs for maintenance and enhancement of the application. Worked on the front end using Servlets and JSP, and on the back end using Hibernate.
- Implemented business processes, database retrievals, access of information and the user interface using Java, Struts and the Planet Interact framework.
- Implemented the application using many design patterns and object-oriented processes with a view to future requirements of the insurance domain.
- Used JAXB to convert Java objects into XML files and XML content into Java objects.
- Web services were built using Spring and CXF operating within Mule ESB; offering both REST and SOAP interfaces.
- Used Maven as the build tool for the application.
- Used JIRA for bug/task tracking and time tracking.
- Used Agile methodology to develop the application.