- Around 9 years of experience in Information Technology, with a major concentration on Big Data tools and technologies, relational and NoSQL databases, the Java programming language and J2EE technologies, across sectors such as banking, healthcare and pharmaceuticals.
- Over 4 years of experience in software development, building enterprise and web-based applications using Java and J2EE technologies.
- 4 years of experience as a Big Data Engineer, with a good understanding of the Hadoop framework, Big Data tools such as MapReduce, HDFS, YARN/MRv2, Pig, Hive, Sqoop, Kafka, Flume, HBase, Spark and Oozie, and technologies for implementing data analytics.
- Hadoop developer: excellent hands-on experience with Hadoop tools such as HDFS, Hive, Pig, Apache Spark, Apache Sqoop, Flume, Oozie, Apache Kafka, Apache Storm, YARN, Impala, ZooKeeper and Hue. Experience in analyzing data using HiveQL, Pig Latin and MapReduce programs.
- Experienced in ingesting data into HDFS from relational databases such as MySQL, Oracle, DB2, Teradata and PostgreSQL using Sqoop.
- Experienced in importing real-time streaming logs and aggregating the data into HDFS using Kafka and Flume.
- Excellent knowledge of building real-time data streaming solutions using Apache Storm and Spark Streaming, and of building Spark applications in Scala.
- Well versed in Hadoop distributions, including Cloudera (CDH) and Hortonworks (HDP), with knowledge of the MapR distribution.
- Experienced in creating Hive tables, including managed and external tables, and loading data into Hive from HDFS.
- Extended Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs) and User Defined Aggregate Functions (UDAFs).
- Implemented Pig scripts to analyze large data sets in HDFS by performing various transformations.
- Experience in analyzing data using HiveQL, Pig Latin and HBase.
- Capable of processing large sets of structured, semi-structured and unstructured data and supporting system application architecture.
- Experience working with NoSQL databases such as HBase, Cassandra and MongoDB.
- Experience in Python, Scala, shell scripting and SparkR.
- Experience in creating Oozie workflows to manage processing, with actions that run Hadoop MapReduce and Pig jobs.
- Experience using AWS cloud components: S3, EC2, EMR, IAM, RDS, Elastic Beanstalk and DynamoDB.
- Knowledge of the Kerberos network authentication protocol.
- Experience with file formats including XML, JSON and CSV, as well as text, sequence files, Avro, ORC and Parquet, using compression codecs such as Snappy, gzip and LZO.
- Experience testing MapReduce programs using MRUnit, JUnit and EasyMock.
- Knowledge of machine learning algorithms and predictive analysis using Spark MLlib and Mahout, and of applying them through SparkR.
- Experience with ETL methodology supporting data extraction, transformation and loading using Hadoop.
- Worked with data visualization tools such as Tableau and integrated data using the ETL tool Talend.
- Worked with relational databases such as Teradata, PostgreSQL, MySQL, Oracle 10g and DB2.
- Hands-on development experience with Java, shell scripting and RDBMS, including writing complex SQL queries, PL/SQL, views, stored procedures and triggers.
- Diverse experience using Java tools in business, web and client-server environments, including the Java platform, J2EE, EJB, JSP, Java Servlets, JUnit and Java Database Connectivity (JDBC), and application servers such as WebSphere and WebLogic.
- Experience with build tools such as Ant, Maven, Gradle and SBT.
- Knowledge of creating dashboards and reports using reporting tools such as Tableau and QlikView.
- Development experience with the Eclipse, NetBeans and IntelliJ IDEs and the SVN, Git and CVS version control systems.
- Good experience with software development methodologies such as Waterfall and Agile.
- Knowledge of writing YARN applications.
- Familiar with popular frameworks such as Struts, Hibernate, Spring MVC and AJAX, and with web services using XML, HTML and SOAP.
- Passionate about working on the most cutting-edge Big Data technologies.
- Ability to adapt to evolving technology, strong sense of responsibility and accomplishment.
- Willing to update my knowledge and learn new skills according to business requirements.
Hadoop Technologies: HDFS, MapReduce, Hive, Impala, Pig, Sqoop, Flume, Oozie, ZooKeeper, Ambari, Hue, Spark, Storm, Kafka, YARN, NiFi, Ganglia, Tez
Operating System: Windows, Unix, Linux
Languages: Java, J2EE, SQL, PL/SQL, Shell Script, Python, Scala, R
Testing tools: JUnit, MRUnit, EasyMock
SQL Databases: MySQL, Oracle 11g/10g/9i, SQL Server, Teradata, PostgreSQL
NoSQL Databases: HBase, Cassandra, MongoDB, Neo4j, Redshift
File System: HDFS
Reporting Tools: Tableau, QlikView
IDE Tools: Eclipse, NetBeans, Spring Tool Suite, IntelliJ
Application Server: IBM WebSphere, WebLogic, JBoss
Version control: SVN, Git, CVS
Build Tools: Maven, Gradle, Ant, SBT
ETL Tools: Talend, DataStage, Informatica
Messaging & Web Services Technology: SOAP, WSDL, REST, UDDI, XML, SOA, JAX-RPC, IBM WebSphere MQ v5.3, JMS.
Big Data Developer / Spark Developer
- Responsible for building scalable distributed data solutions using Hadoop.
- Imported log files into HDFS using Apache Kafka and performed data analytics using Spark.
- Involved in importing data from various sources into HDFS using Sqoop, applying transformations using Hive and Spark, and loading the data into Hive tables.
- Used Spark Streaming APIs to perform transformations and actions on the fly to build a common learner data model, which consumes data from Kafka in near real time and persists it to Cassandra.
- Collected logs from the physical machines and the OpenStack controller and integrated them into HDFS using Kafka.
- Experience in developing Kafka consumers and producers by extending the low-level and high-level consumer and producer APIs.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python (PySpark).
- Involved in running analytics workloads and long-running services on the Apache Mesos cluster manager.
- Developed Spark applications using Scala and Java, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
- Experience in developing Spark Streaming applications using Python (PySpark).
- Developed Spark code using PySpark, applying transformations and actions for faster data processing.
- Working knowledge of the Spark Streaming API, which enables scalable, high-throughput, fault-tolerant processing of live data streams.
- Used Spark Streaming to bring data into memory, implemented RDD transformations and performed actions.
- Developed Kafka producers and consumers for importing transaction logs.
- Used ZooKeeper to store the offsets of messages consumed for a specific topic and partition by a specific consumer group in Kafka.
- Involved in integrating HBase with PySpark to import data into HBase, and performed CRUD operations on HBase.
- Used HBase commands to generate datasets as per requirements and controlled access to the data using grant and revoke.
- Tuned and troubleshot MapReduce jobs by analyzing and reviewing Hadoop log files.
- Developed Spark applications using PySpark and NumPy.
- Created Sqoop jobs and Pig and Hive scripts to ingest data from relational databases for comparison with historical data.
- Experience working with Amazon Elastic MapReduce (EMR) and setting up environments on Amazon EC2 instances.
- Experienced in migrating HiveQL queries to Impala to minimize query response time.
- Knowledge of running Hive queries through Spark SQL, which integrates with the Spark environment.
- Executed Hadoop/Spark jobs on AWS EMR using programs and data stored in S3 buckets.
- Loaded large sets of structured data into the Hadoop cluster and transformed them using Talend Big Data Studio.
- Worked with file formats such as text, Avro and ORC for Hive querying and processing based on business logic.
- Pulled data from Amazon S3 buckets into the data lake, built Hive tables on top of it and created Spark DataFrames for further analysis.
- Involved in writing custom Talend jobs to ingest, enrich and distribute data in the Hadoop ecosystem.
- Worked with sequence files, RC files, map-side joins, bucketing and partitioning to improve Hive performance and storage.
- Developed efficient ETL processes, including workflows and job scheduling, to move data from source to target according to requirements using Talend.
- Implemented Hive and Pig UDFs for business logic and was responsible for extensive data validation using Hive.
- Implemented daily cron-style jobs that automate parallel loading of data into HDFS using Oozie coordinator jobs.
- Involved in loading structured and semi-structured data into Spark clusters using Spark SQL and the DataFrames API.
- Used Pig as an ETL tool for transformations, event joins, filtering and pre-aggregation.
- Used visualization tools such as Power View for Excel and Tableau to visualize data and generate reports.
- Knowledge of machine learning algorithms such as clustering, classification and regression.
- Wrote multiple MapReduce programs for data extraction, transformation and aggregation from multiple file formats, including XML, JSON, CSV and other compressed formats.
- Implemented machine learning algorithms based on business logic using Spark MLlib.
- Used Talend Open Studio for data integration and for migrating data from various locations across the business.
- Integrated data quality plans as part of ETL processes using Talend.
- Wrote build scripts using Maven and set up continuous integration with Jenkins.
- Used JIRA for bug tracking and Git for version control.
- Involved in story-driven Agile development and actively participated in daily scrum meetings.
Environment: Cloudera, MapReduce, HDFS, Pig, Scala, Hive, Sqoop, Spark, Kafka, Oozie, Java, Linux, Maven, HBase, ZooKeeper, Kerberos, Tableau, Python, Talend Open Studio, AWS.
Confidential, Philadelphia, PA
Big Data Engineer
- Developed data loading strategies and performed transformations to analyze datasets using the Hortonworks distribution of the Hadoop ecosystem.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Involved in collecting, aggregating and moving data from servers to HDFS using Flume.
- Collected data from Flume agents deployed on various servers using multi-hop flows.
- Knowledge of Flume sources, channels and sinks through which data is ingested into HDFS.
- Responsible for performing transformations such as sort, join, aggregation and filter to retrieve datasets using Spark.
- Experience in extracting appropriate features from datasets and handling bad, null and partial records using Spark SQL.
- Stored DataFrames as Hive tables using Python (PySpark).
- Ingested data into HDFS from relational databases such as Teradata using Sqoop and exported data back to Teradata for storage.
- Hands-on experience developing Spark applications using RDD transformations, Spark Core, Spark MLlib, Spark Streaming and Spark SQL.
- Experience in developing Spark applications using spark-shell (Scala).
- Involved in creating Hive tables, loading them with data and writing Hive queries that run as MapReduce jobs in the backend.
- Designed and implemented incremental imports into Hive tables and wrote Hive queries to run on Tez.
- Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data.
- Migrated ETL jobs to Pig scripts to perform transformations, event joins and pre-aggregations before storing the data in HDFS.
- Involved in writing optimized Pig scripts, along with developing and testing Pig Latin scripts.
- Implemented workflows using the Apache Oozie framework to automate tasks.
- Worked with file formats such as sequence files, XML files and map files using MapReduce programs. Used the Avro data serialization system to work with JSON data formats.
- Exported data from HDFS to Cassandra (NoSQL) using Sqoop and ran CQL commands on Cassandra to obtain datasets as required.
- After performing all transformations, stored the data in MongoDB (NoSQL) using Sqoop.
- Created and imported collections and documents into MongoDB and performed operations such as query, projection, aggregation, sort and limit.
- Involved in unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
- Created Hive UDFs and UDAFs using Python scripts and Java code based on the given requirements.
- Automated all jobs that pull data and load it into Hive tables using Oozie workflows.
- Analyzed data by running Hive queries and Pig scripts to study customer behavior.
- Knowledge of microservices architecture in Spring Boot integrating with various RESTful web services.
- Created and maintained technical documentation for launching Hadoop clusters and executing Pig scripts.
Environment: Hadoop, HDFS, MapReduce, Spark, Sqoop, Oozie, Pig, Kerberos, Hive, Flume, Tez, Linux, Java, Eclipse, Cassandra, Python, MongoDB.
- Responsible for loading customer data and event logs from MSMQ into HBase using the Java API.
- Created HBase tables to store variable data formats of input data coming from different portfolios.
- Involved in loading large volumes of data into HBase columns.
- Used Sqoop to transfer data from HBase to HDFS and vice versa.
- Responsible for architecting Hadoop clusters with CDH4 on CentOS, managed with Cloudera Manager.
- Initiated and successfully completed a proof of concept on Flume for pre-processing, increased reliability and ease of scalability over traditional MSMQ.
- Used Flume to collect log data from different sources and load it into Hive tables, using different SerDes to store it in JSON, XML and sequence file formats.
- Managed and scheduled jobs to remove duplicate log data files in HDFS using Oozie workflows.
- Involved in creating Hive tables, loading data and running Hive queries on that data.
- Extensive working knowledge of partitioned tables, UDFs, performance tuning, compression-related properties and the Thrift server in Hive.
- Used Hive to find correlations between customers' browser logs across different sites and analyzed them to build risk profiles for those sites.
- End-to-end performance tuning of Hadoop clusters and MapReduce routines against very large data sets.
- Developed Pig UDFs to pre-process data for analysis.
- Monitored Hadoop cluster job performance, performed capacity planning and managed nodes on the Hadoop cluster.
- Proficient in using Cloudera Manager, an end-to-end tool for managing Hadoop operations.
Environment: Hadoop (CDH4), Big Data, HDFS, Pig, Hive, MapReduce, Sqoop, Cloudera Manager, Linux, Flume, HBase
Confidential, Atlanta, GA
- Used the Hibernate ORM tool as the persistence layer, using the database and configuration data to provide persistence services (and persistent objects) to the application.
- Implemented Oracle Advanced Queuing using JMS and message-driven beans.
- Responsible for developing the DAO layer using Spring MVC and Hibernate XML configuration, and for managing CRUD operations (insert, update and delete).
- Implemented dependency injection using the Spring Framework.
- Developed and implemented the DAO and service classes.
- Developed reusable services using BPEL to transfer data.
- Participated in analysis, interface design and development of JSPs.
- Configured log4j to enable/disable logging in the application.
- Developed data mappings to create a communication bridge between application interfaces using XML and XSL.
- Responsible for deploying the application on WebSphere Server and worked with SOAP and XML messaging.
- Implemented PL/SQL queries and procedures to perform database operations.
- Wrote UNIX shell scripts and used the UNIX environment to deploy the EAR and read the logs.
- Used JUnit to develop test cases for unit testing.
- Used build tools such as Maven to build, package, test and deploy the application on the application server.
- Actively involved in code review and bug fixing to improve performance.
- Involved in code deployment activities for different environments.
- Followed the Agile Scrum methodology for the development process.
Java/J2EE Developer
- Responsible and active in the analysis, design, implementation and deployment phases of the full software development lifecycle (SDLC) of the project.
- Developed Struts action classes and action forms, performed action mapping using the Struts framework, and performed data validation in form beans and action classes.
- Extensively used the Struts framework as the controller to handle client requests and invoke the model based on user requests.
- Defined the search criteria, retrieved customer records from the database, made the required changes and saved the updated records back to the database.
- Developed build and deployment scripts using Apache Ant to customize WAR and EAR files.
- Used DAO and JDBC for database access.
- Developed stored procedures and triggers in PL/SQL to calculate and update tables implementing business logic.
- Designed and developed XML processing components for dynamic menus in the application.
- Involved in post-production support and maintenance of the application.
Environment: Oracle, Java, Struts, Servlets, HTML, XML, SQL, J2EE, JUnit, Tomcat.
Jr. Java/Web Developer
- Implemented the project according to the Software Development Life Cycle (SDLC).
- Analyzed requirements and prepared the requirements analysis document.
- Involved in developing SOAP web services for sending data to and receiving data from external interfaces.
- Involved in requirement gathering, requirement analysis, scope definition and design.
- Worked with J2EE components such as Servlets, JSPs, JNDI and JDBC on the WebLogic application server.
- Involved in developing and coding the interfaces and classes required for the application, and created appropriate relationships between the system classes and the provided interfaces.
- Assisted project managers in drafting use case scenarios during the planning stages.
- Developed use cases, class diagrams and sequence diagrams.
- Used JavaScript for client-side validation.
- Involved in database design and developed SQL queries and stored procedures on MySQL.