- Over 8 years of experience in Information Technology, including Kafka, Big Data, the Hadoop ecosystem, and Core Java/J2EE, with strong skills in design, software processes, requirements gathering, analysis, and development of software applications.
- Excellent hands-on experience developing Hadoop architecture on Windows and Linux platforms.
- Experience building big data solutions with the Lambda Architecture using the Cloudera distribution of Hadoop, Twitter Storm, Trident, MapReduce, Cascading, Hive, Pig, and Sqoop.
- Expertise in various components of the Hadoop ecosystem and related technologies: MapReduce, Hive, Pig, Sqoop, Impala, Flume, Oozie, HBase, MongoDB, Cassandra, Scala, Spark, Kafka, and YARN.
- Experienced in J2EE design patterns such as MVC, Business Delegate, Service Locator, Singleton, Transfer Object, Session Façade, and Data Access Object.
- Worked with Hadoop, Hive, Java, Python, Scala, and the Struts web framework.
- Excellent working experience in Big Data integration and analytics based on Hadoop, Solr, Spark, Kafka, Storm, and webMethods technologies.
- Experienced in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Hands-on experience with NoSQL databases, including HBase, MongoDB, and Cassandra, and their integration with Hadoop clusters.
- Strong knowledge of and experience implementing Big Data solutions on Confidential Elastic MapReduce (Confidential EMR) for processing and for managing the Hadoop framework on dynamically scalable Confidential EC2 instances.
- Hands-on experience writing ad hoc queries to move data from HDFS to Hive and analyzing the data using HiveQL.
- Good experience with the major Hadoop distributions (Cloudera, Hortonworks, MapR, etc.).
- Good knowledge of RDBMS concepts (Oracle 11g, MS SQL Server 2000) and strong SQL and PL/SQL skills (using TOAD and SQL Developer), including stored procedures and triggers.
- Expertise in developing jobs with Spark framework modules such as Spark Core, Spark SQL, and Spark Streaming using Java, Scala, and Python.
- Expertise in Confidential Web Services, including Elastic Compute Cloud (EC2) and DynamoDB, and in automating the deployment of large Cassandra clusters on EC2 using EC2 APIs.
- Experienced in developing and deploying enterprise applications on IBM WebSphere, BEA WebLogic, Oracle Application Server, JBoss, Tomcat, and Jetty.
- Experienced in developing and using Apache Solr with data computations and transformations for use by downstream online applications.
- Excellent knowledge of databases such as Oracle 8i/9i/10g/11g/12c, Microsoft SQL Server, DB2, and Netezza.
- Good understanding of and experience with software development methodologies such as Agile and Waterfall.
- Experienced in importing and exporting data with Sqoop between HDFS (Hive and HBase) and relational database systems (Oracle and Teradata).
- Expertise in using IDEs such as WebSphere Studio Application Developer (WSAD), Eclipse, NetBeans, MyEclipse, and WebLogic Workshop.
- Experienced in designing and developing web services (SOAP and RESTful).
- Highly proficient in writing complex SQL queries, stored procedures, and triggers, and well experienced in PL/SQL and T-SQL.
- Experienced in developing web interfaces using Servlets, JSP, and custom tag libraries.
- Thorough knowledge of the software development life cycle (SDLC), database design, RDBMS, and data warehousing.
- Experience writing complex SQL queries involving inner and outer joins across multiple tables.
- Expertise in Java/J2EE technologies such as JSP, Servlets, Hibernate, Struts, and Spring.
Big Data: Hadoop, Storm, HBase, Hive, Flume, Cassandra, Kafka, Sqoop, Oozie, Pig, Spark, MapReduce, ZooKeeper, YARN, MongoDB, Cloudera
Operating Systems: UNIX, Mac, Linux, Windows 2000 / NT / XP / Vista, Android
Programming Languages: Java (JDK 5/6/7), R, HTML, SQL, PL/SQL
Frameworks: Hibernate 2.x/3.x, Spring 2.x/3.x, Struts 1.x/2.x, and JPA
Web Services: WSDL, SOAP, Apache CXF/XFire, Apache Axis, REST, Jersey
Databases: Oracle 8i/9i/10g, Microsoft SQL Server, DB2 & MySQL 4.x/5.x
Middleware Technologies: WebSphere Message Queue, WebSphere Message Broker, XML Gateway, JMS
Testing Frameworks: Mockito, PowerMock, EasyMock
Web/Application Servers: IBM WebSphere Application Server, JBoss, Apache Tomcat
Others: Borland StarTeam, ClearCase, JUnit, Ant, Maven, Android Platform, Microsoft Office, SQL Developer, DB2 Control Center, Microsoft Visio, Hudson, Subversion, Git, Nexus, Artifactory
Development Strategies: Agile, Lean Agile, Pair Programming, Waterfall, and Test-Driven Development
Confidential, Seattle, WA
Sr. Big Data Engineer/Developer
- Gathered business requirements from business partners and subject matter experts, and worked with the Hadoop administrator on installing and configuring Hadoop ecosystem components.
- Wrote Hive queries for data analysis to meet business requirements. Implemented custom Kafka encoders for a custom input format to load data into Kafka partitions, and streamed the data in real time using Spark with Kafka for faster processing.
- Participated in building a CDH4 test cluster to implement Kerberos authentication. Upgraded the Hadoop cluster from CDH4 to CDH5 and set up a high-availability cluster to integrate Hive with existing applications.
- Worked on architecting solutions that process massive amounts of data on corporate and AWS cloud-based servers.
- Configured a multi-node Hadoop cluster (Confidential EC2 Spot Instances) to transfer data from Confidential S3 to HDFS and from HDFS to Amazon S3, and to direct input and output to the Hadoop MapReduce framework.
- Involved in HDFS maintenance and in loading structured and unstructured data; imported data from mainframe datasets to HDFS using Sqoop and wrote PySpark scripts to process the HDFS data.
- Handled importing data from various data sources (Oracle, DB2, HBase, Cassandra, and MongoDB) into Hadoop, and performed transformations using Hive and MapReduce.
- Developed prototype Spark applications using Spark Core, Spark SQL, and the DataFrame API; developed several custom user-defined functions (UDFs) in Hive and Pig using Java and Python; and imported data into Spark from a Kafka consumer group using the Spark Streaming APIs.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Scala.
- Developed an ETL framework using Spark and Hive (including daily runs, error handling, and logging) to produce useful data.
- Wrote Python scripts for internal testing that read data from a file and push it into a Kafka queue, which in turn is consumed by the Storm application.
- Designed, implemented, and maintained database schemas, entity-relationship diagrams, data models, tables, stored procedures, functions, triggers, constraints, clustered and non-clustered indexes, partitioned tables, views, rules, defaults, and complex SQL statements to meet business requirements and improve performance.
- Designed AWS architecture, cloud migration, AWS EMR, DynamoDB, Redshift, and event processing using AWS Lambda functions.
- Wrote Spark applications in Scala that interact with the MySQL database using SQLContext and access Hive tables using HiveContext.
- Extensively used the Spring and Hibernate frameworks, implemented an MVC architecture, and worked on Spring RESTful services and Spring dependency injection.
- Implemented AWS EC2, key pairs, security groups, Auto Scaling, ELB, SQS, and SNS using the AWS API and exposed them as RESTful web services; implemented reporting and notification services using the AWS API.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts, and designed the solution and developed the data ingestion programs using Sqoop, MapReduce, shell scripts, and Python.
- Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC, Parquet, and text files) into AWS Redshift, and created on-demand tables over S3 files using Lambda functions and AWS Glue with Python and PySpark.
- Responsible for migrating tables from traditional RDBMS into Hive tables using Sqoop and later generating the required visualizations and dashboards using Tableau.
- Documented all extract, transform, and load (ETL) processes, and designed, developed, validated, and deployed the Talend ETL processes for the data warehouse team using Pig and Hive.
- Developed Spark code using Scala and Spark SQL for faster testing and processing of data. Implemented scripts for loading data from the UNIX file system to HDFS and was involved in migrating Hive queries into Spark transformations using DataFrames, Spark SQL, SQLContext, and Scala.
- Developed a data pipeline in which multiple producers ingest data asynchronously into Kafka.
- Implemented Spark jobs in Scala and used PySpark for faster testing and processing of data; developed Python code to gather data from HBase and designed the solution to implement it using PySpark.
- Updated mappings, sessions, and workflows as part of ETL changes, modified existing ETL code and documented the changes, and developed complex SQL scripts for the Teradata database to create a BI layer on the data warehouse for Tableau reporting.
Environment: Hadoop, Kafka, Python, MapReduce, HDFS, AWS Glue, HBase, Hive, Pig, Linux, XML, Eclipse, Storm, Spark, Cloudera CDH4/5 Distribution, DB2, Scala, Redshift, Kinesis, SQL Server, Oracle 12c, MySQL, Talend, MongoDB, Cassandra, PySpark, Tableau, Oozie.
Confidential, Chicago, IL
Sr. Big Data Engineer
- As a Big Data Developer, implemented solutions for ingesting data from various sources and processing data at rest using Big Data technologies such as Hadoop, MapReduce frameworks, MongoDB, Hive, Oozie, Flume, Sqoop, and Talend.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data, and performed extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark.
- Implemented custom Kafka encoders for a custom input format to load data into Kafka partitions, and streamed the data in real time using Spark with Kafka for faster processing.
- Involved in relational and dimensional data modeling to create the logical and physical database design and ER diagrams with all related entities and their relationships, based on rules provided by the business manager, using ER Studio.
- Developed data pipeline programs with the Spark Python API, performed data aggregations with Hive, and formatted data (JSON) for visualization and for generating Highcharts charts (e.g., outliers, data distribution, correlation/comparison).
- Analyzed the SQL scripts and designed the solution to implement them using PySpark, and created new custom columns depending on the use case while ingesting data into the Hadoop data lake using PySpark.
- Explored Spark to improve the performance and optimize existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from SQL databases into HDFS using Sqoop.
- Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and preprocessing.
- Worked on Big Data integration and analytics based on Hadoop, Solr, Spark, Kafka, Storm, and webMethods.
- Worked extensively in Python, built a custom ingestion framework, and worked on REST APIs using Python.
- Created DDLs for tables and executed them to create tables in the warehouse for ETL data loads, and used Pig as an ETL tool for transformations, event joins, filters, and some pre-aggregations before storing the data in HDFS.
- Primarily responsible for Tableau customization of a statistical dashboard to monitor sales effectiveness, and also used Tableau for customer marketing data visualization.
- Imported data from sources such as HDFS and HBase into Spark RDDs, and configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
- Created Elastic MapReduce (EMR) clusters, configured the data pipeline with EMR clusters for scheduling the task runner, and provisioned EC2 instances on both Windows and Linux.
- Worked on AWS Relational Database Service (RDS) and AWS security groups and their rules, and implemented reporting and notification services using the AWS API.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala, and developed Spark scripts using the Scala shell as required.
- Implemented solutions using Scala and SQL for faster testing and processing of data, and streamed data in real time using Kafka.
- Designed and implemented ETL processes using Talend to load data. Worked extensively with Sqoop to import and export data between HDFS and relational database systems/mainframes, and loaded data into HDFS.
- Loaded data into Spark RDDs and performed in-memory computations to generate the output response.
- Worked on major components of the Hadoop ecosystem, including Hive, Pig, HBase, HBase-Hive integration, Scala, Sqoop, and Flume.
- Developed Hive scripts, Pig scripts, and UNIX shell scripts for all ETL loading processes, and converted the files into Parquet in the Hadoop Distributed File System.
- Utilized Agile Scrum Methodology to help manage and organize a team of 4 developers with regular code review sessions.
Environment: Hadoop, Python, HDFS, Spark, AWS Redshift, AWS Glue, MapReduce, Pig, Hive, Sqoop, Kafka, HBase, Oozie, Flume, Scala, Java, SQL scripting, Talend, PySpark, Linux shell scripting, Kinesis, Cassandra, ZooKeeper, MongoDB, Cloudera, Cloudera Manager, EC2, EMR, S3, Oracle, MySQL.
Confidential, NYC, NY
Sr. Big Data/Hadoop Developer
- Analyzed the Hadoop cluster using various big data analytic tools, including Kafka, Pig, Hive, and MapReduce.
- Proactively monitored systems and services, and worked on architecture design and implementation of the Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Scala.
- Installed and configured Hadoop, MapReduce, and HDFS (Hadoop Distributed File System), and developed multiple MapReduce jobs in Java for data cleaning and processing.
- Designed and configured Flume servers to collect data from the network proxy servers and store it in HDFS and HBase.
- Worked on implementing Spark using Scala and Spark SQL for faster analysis and processing of data.
- Used Java and MySQL day to day to debug and fix issues with client processes.
- Applied Java/J2EE application development skills with object-oriented analysis and was extensively involved throughout the software development life cycle (SDLC).
- Implemented AWS EC2, key pairs, security groups, Auto Scaling, ELB, SQS, and SNS using the AWS API and exposed them as RESTful web services.
- Involved in launching and setting up the Hadoop/HBase cluster, including configuring the various components of the Hadoop and HBase cluster.
- Hands-on experience with WebLogic Application Server, WebSphere Application Server, WebSphere Portal Server, and J2EE application deployment technology.
- Involved in creating Hive tables, loading data, and writing Hive queries, which run internally as MapReduce jobs.
- Involved in developing Pig Scripts for change data capture and delta record processing between newly arrived data and already existing data in HDFS.
- Extensively used Sqoop to get data from RDBMS sources such as Teradata and Netezza, and handled importing and exporting data into HDFS and Hive using Sqoop and Kafka.
- Implemented reporting and notification services using the AWS API and used AWS (Confidential Web Services) compute servers extensively.
- Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data, and scheduled the Oozie workflow engine to run multiple Hive and Pig jobs.
- Worked on designing and developing ETL workflows in Java for processing data in HDFS/HBase using Oozie.
- Created snapshots of EBS volumes, monitored AWS EC2 instances using CloudWatch, and worked on AWS security groups and their rules.
- Involved in developing shell scripts to simplify execution of all other scripts (Pig, Hive, and MapReduce) and to move data files within and outside of HDFS.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Developed Java APIs for retrieval and analysis on NoSQL databases such as HBase, and worked with HBase to create tables for loading large sets of semi-structured data.
- Created ETL jobs to generate and distribute reports from MySQL database using Pentaho Data Integration.
- Loaded data from the UNIX file system to HDFS and analyzed large data sets to determine the optimal way to aggregate and report on them.
Environment: Hadoop, Java/J2EE, HDFS, MapReduce, AWS, EC2, RDS, S3, CloudWatch, Hive, Sqoop, Pig, HBase, Apache Spark, Oozie Scheduler, UNIX Shell Scripts, Kafka, Git, Maven, PL/SQL, MongoDB, Cassandra, Python, Scala, Teradata, Netezza, Oracle.
Confidential, Cincinnati, OH
Sr. Java/Hadoop Developer
- Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, Flume, Oozie, ZooKeeper, and Sqoop.
- Involved in writing client-side scripts using JavaScript and server-side code using JavaBeans, and used Servlets to handle the business logic.
- Developed applications with Hadoop Big Data technologies: Pig, Hive, MapReduce, HBase, and Oozie.
- Created Elastic Map Reduce (EMR) clusters and Configured the Data pipeline with EMR clusters for scheduling the task runner.
- Developed Scala programs with Spark for data in Hadoop ecosystem.
- Extensively involved in the installation and configuration of Cloudera's Distribution of Hadoop (CDH) 2 and 3, including the NameNode, Secondary NameNode, JobTracker, TaskTrackers, and DataNodes.
- Developed user-based SOAP web services through WSDL using the WebLogic application server, with JAXB as the binding framework, to interact with other components.
- Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes, and communicated and escalated issues appropriately.
- Provisioned EC2 instances on both Windows and Linux, and worked on AWS Relational Database Service and AWS security groups and their rules.
- Implemented reporting and notification services using the AWS API, and developed MapReduce jobs using Apache Commons components.
- Used Service Oriented Architecture (SOA) based SOAP and REST Web Services (JAX-RS) for integration with other systems.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Involved in designing and developing the application using JSTL, JSP, JavaScript, AJAX, HTML, CSS, and Java Collections.
- Implemented AWS EC2, key pairs, security groups, Auto Scaling, ELB, SQS, and SNS using the AWS API and exposed them as RESTful web services.
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
- Solved performance issues in Hive and Pig scripts through an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
- Developed UDFs in Java as needed for use in Pig and Hive queries.
- Developed Java web applications using JSP, Servlets, Struts, Hibernate, Spring, REST web services, and SOAP.
- Worked over the entire Software Development Life Cycle (SDLC) as a part of a team as well as independently.
- Wrote SQL queries against the database and provided data extracts to users on request.
Environment: Java 1.5, JSP, Servlet, Spring, AWS EC2, RDS, S3, Hibernate 3.0, TDD, Struts framework, Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Sqoop, Flume, Kafka, Spark, Scala, ETL, Cloudera CDH, Apache Hadoop, HTML, XML, Log4j, Eclipse, Unix, Windows XP
- Provided support in all phases of the software development life cycle (SDLC), quality management systems, and project life cycle processes. Used databases such as MySQL and followed HTTP and WSDL standards to design REST/SOAP-based web APIs using XML, JSON, HTML, and DOM technologies.
- Involved in the installation and configuration of Tomcat, SpringSource Tool Suite, and Eclipse, and in unit testing.
- Back-end server-side coding and development using Java Collections (Set, List, Map), exception handling, Vaadin, Spring with dependency injection, the Struts framework, Hibernate, Servlets, Actions, ActionForms, and JavaBeans.
- Involved in migrating the existing distributed JSP framework to the Struts framework, and designed and researched the Struts MVC framework.
- Developed an Ajax framework on the service layer for the module as a benchmark, and implemented Service and DAO layers between Struts and Hibernate.
- Used agile practices and Test Driven Development (TDD) techniques to provide reliable, working software.
- Applied the MVC pattern of the Ajax framework, which involved creating controllers for implementing classes.
- Responsible for enhancing the UI using HTML, JavaScript, XML, JSP, and CSS as per requirements, and for providing client-side validations using jQuery.
- Involved in writing application-level code to interact with APIs and web services using AJAX, JSON, and XML.
- Wrote many JSPs for maintenance and enhancement of the application. Worked on the front end using Servlets and JSP and on the back end using Hibernate.
- Implemented the application using many design patterns and object-oriented principles, with future Insurance-domain requirements in view.
- Used JAXB to convert Java objects into XML files and XML content into Java objects.
- Built web services using Spring and CXF running within Mule ESB, offering both REST and SOAP interfaces.
- Used Maven as the build tool for the application and used JIRA for bug/task tracking and time tracking.