- Over 7 years of professional IT experience, including 4+ years of strong experience with Apache Spark and the Big Data / Apache Hadoop ecosystem, and 3 years of end-to-end experience in Java programming covering the design, development and implementation of web-based applications using Java and J2EE technologies.
- Hands-on experience in developing and deploying enterprise applications using major components of the Hadoop ecosystem such as Hadoop 2.x, YARN, Hive, Pig, Spark, MapReduce, Impala, Kafka, Storm, Oozie, HBase, Flume, Sqoop and ZooKeeper.
- Experience in working with Amazon Web Services components such as EC2, IAM, SQS, SNS, Elastic Beanstalk, DynamoDB, CloudWatch, EMR and S3.
- Configured and installed a Hortonworks Hadoop cluster on 10 nodes in a test environment using Amazon EC2 instances and Amazon Elastic Block Store volumes.
- Good experience in developing applications using Scala, Spark, Java and Linux shell scripting.
- In-depth understanding of Spark Architecture including Spark Core, Spark SQL, Spark Streaming.
- Experience in using Scala to convert Hive/SQL queries into RDD transformations in Spark.
- Experience in deploying and managing Solr from development to production.
- Strong knowledge of real time data analytics using Spark Streaming, Kafka & Flume.
- Proficient with Kafka and with Spark in YARN, local and standalone modes.
- Expertise in writing Spark RDD transformations and actions, defining case classes for input data and performing data transformations using Spark Core.
- Implemented job scheduling using crontab, Airflow and Oozie.
- Experience in using DStreams, Broadcast Variables, RDD caching for Spark Streaming.
- Improved the performance of and optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Hands-on experience with ORC, Avro, SequenceFile and Parquet file formats.
- Experience in analyzing data using Pig Latin, HiveQL and Spark SQL.
- Experience with Hadoop Distributions like Cloudera and Hortonworks.
- Extensive knowledge of designing Hive managed/external tables, views and Hive analytical functions.
- Experience in tuning the performance of Hive queries using partitioning and bucketing.
- Experience working with Flume to handle large volumes of streaming data ingestion.
- Experience in developing customized UDFs and UDAFs to extend the core functionality of Pig and Hive.
- Experience in various Big Data application phases like Data Ingestion, Data analytics and Data visualization.
- Proficient in working with NoSQL databases such as HBase, MongoDB, DynamoDB and Cassandra.
- Expertise in writing Pig and Hive queries for analyzing data to meet business requirements.
- Experience in designing pipeline flows with Jenkins, Tonomi and Azkaban.
- Exposure to build tools like Maven and SBT and the bug tracking tool JIRA in the work environment.
- Experience with SVN and Git for code management and version control.
- Good knowledge of job/workflow scheduling and monitoring tools like Oozie and ZooKeeper.
- Hands-on experience in importing/exporting data between RDBMS and HDFS using Sqoop.
- Experience in building and supporting large-scale Hadoop environments, including design, configuration, installation, performance tuning and monitoring.
- Excellent programming skills at a high level of abstraction using Java, Scala, Python and SQL.
- Experienced in developing enterprise applications with various open source frameworks like Java/J2EE, JSP, JSTL, Servlets, EJB, Struts.
- Experience with the Java Spring framework; implemented Spring Core, Spring Boot, Spring MVC, Spring Data JPA and AOP.
- Hands-on experience on implementation projects using Agile and Waterfall methodologies.
Big Data Ecosystem: HDFS, YARN, Spark, Hive, Pig, Spark Streaming, Spark SQL, Oozie, ZooKeeper, Impala, Ambari.
Big Data Ingestion: Apache Sqoop, Apache Kafka, Apache Flume, Apache Spark, Storm.
Amazon Technologies: EC2, S3, EMR, SNS, SQS, Route 53, CloudWatch, Kinesis, SWF, Elastic Beanstalk, Redshift, DynamoDB, IAM.
Big Data Distribution: Hortonworks, Cloudera.
Build Tools/IDE: Maven, Jenkins, SBT, Ant, Eclipse, IntelliJ, Sublime, MS Visual Studio, NetBeans.
Programming Languages: Scala, Core Java, JavaScript, Python, SQL, PL/SQL, HTML, CSS, jQuery, Bootstrap, JSON, XML, Linux Shell Scripting.
Databases: MongoDB, MySQL, Oracle 11g, HBase, Netezza.
Database Tools: Oracle NetCA, Data Pump, DBCA, Data Guard.
Version Control: Git, SVN.
Operating Systems: Linux, Windows, Kali Linux.
Data Visualisation: Tableau.
Cluster Management Tools: Ambari, Hue, Cloudera Manager, Zookeeper, Oozie.
Java/J2EE Technologies: Struts, JSON, JSP, JUnit, JDBC, AJAX.
Methodologies: Agile/Scrum, Waterfall.
- Involved in analyzing, designing, coding, implementing and delivering an e-commerce project to DEV, UAT and production.
- Hands-on experience in building data pipelines that load data from various web servers and Teradata using Sqoop with Kafka.
- Experience in creating tables and altering them at run time without blocking updates and queries using Hive and HBase.
- Implemented ETL for the Raw and Gold layers using Hive and Pig scripts.
- Worked closely with the DevOps team to fix project-related configuration issues for Sqoop, Hive and Spark.
- Wrote Hive UDFs to extract data from staging tables and analyzed the web log data using HiveQL.
- Implemented streaming using Scala and Kafka to ingest Kohl's e-commerce order data three times a day.
- Involved in developing new Spark jobs in Scala to analyze customer data and sales history.
- Developed Pig scripts for the analysis of semi-structured data.
- Experienced in designing and deploying Hadoop clusters and various Big Data components including HDFS, MapReduce, Hive, Pig, Oozie, ZooKeeper and Sqoop.
- Experienced in using the Oozie operational service to create workflows and automate MapReduce, Hive and Pig jobs.
- Implemented design and pipeline flows with Jenkins, Tonomi and Azkaban for the e-commerce project.
- Worked extensively with Text, ORC, Avro and Parquet file formats.
- Worked with Sequence files, RC files, map-side joins, bucketing and partitioning to improve Hive performance and storage, and used Hive SerDes such as Avro and JSON.
- Designed internal and external tables to optimize performance. Experience in writing Pig scripts for sorting, joining and grouping data.
- Experience in developing Spark programs to parse raw data, populate staging tables and store the refined data in partitioned tables in the enterprise data warehouse.
- Used Spark API over Hadoop YARN to perform analytics on data in Hive.
- Involved in implementing the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
- Used Spark SQL to load JSON data, create SchemaRDDs and load them into Hive tables, and handled structured data using Spark SQL.
- Experience in writing Hive queries in Spark SQL for analyzing and processing data. Used Scala programs for performing transformations and applying business logic.
- Implemented Apache NiFi flow topologies to perform cleansing operations.
- Proficient in using XML and JSON SerDes for serialization and deserialization to load JSON and XML data into Hive tables.
- Hands-on experience with Amazon EC2, Amazon S3 for the computing and storage of data.
- Developed and maintained the continuous integration and deployment systems using Jenkins, Ant, Akka and Maven.
- Expert in testing raw data and executing performance scripts.
- Hands-on experience in extracting files from MongoDB using Sqoop, ingesting them into HDFS and processing them.
- Collaborated with infrastructure, network, database and BI teams to ensure data quality and availability.
- Experience in working in an Agile environment with two-week sprint cycles; participated in daily Scrum and other stand-up meetings.
Environment: HDFS, Pig, JSON, Spark, Scala, Hive, Sqoop, Oozie, Kafka, ZooKeeper, SQL, Impala, YARN, Jenkins, JIRA, Amazon AWS, Cassandra, Git, Tonomi, Azkaban, NiFi, Kubernetes, Docker, Kibana.
- Analyzed business requirements and prepared detailed specifications that follow the project guidelines required for project development.
- Responsible for building scalable distributed data solutions using Hadoop.
- Developed Scala scripts and UDFs using DataFrames in Spark for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
- Experience in analysing and manipulating large datasets and finding patterns and insights within structured and unstructured data.
- Experience in handling large datasets using partitions and Spark's in-memory capabilities.
- Experience in using broadcast variables in Spark and performing efficient joins, transformations and other operations during data ingestion.
- Experience in operating the cluster on AWS using EC2, EMR, S3 and Elasticsearch.
- Used AWS services like EC2 and S3 for small data sets.
- Experience in creating Hive tables, loading data into them using Sqoop and working on them using Hive.
- Hands-on experience with Amazon EC2, S3, EMR, CloudWatch and IAM for computing and storage of data.
- Experience in performing transformations and actions on RDDs by importing data from Amazon S3 into Spark RDDs.
- Worked on various file formats like JSON, Avro, ORC and Parquet.
- Experience in migrating HiveQL queries to Impala to minimize query response time.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Experience in identifying issues reported by QA with Hadoop jobs by configuring the jobs to run against the local file system.
- Experience in job management using the Fair Scheduler; developed job processing scripts using Oozie workflows to run multiple Spark jobs in sequence for processing data.
- Involved in setting up and working with Kerberos authentication principals to establish secure network communication on the cluster, and in testing HDFS, Hive, Pig and MapReduce access to the cluster for new users.
- Used Spark SQL to connect to Hive, read the data and distribute the processing to make it highly scalable.
- Involved in optimizing Hive queries using map-side joins, partitioning, bucketing and indexing.
- Involved in the iteration planning process under the Agile Scrum methodology.
Environment: Hadoop, Pig, Hive, Scala, Spark, Oozie, ZooKeeper, YARN, Impala, Sqoop, MapReduce, Kafka, Amazon AWS, Kerberos, Git, Amazon EMR, MongoDB, Flume.
- Involved in Design and Development phase of SDLC using Scrum methodology.
- Performed various transformations for analyzing the datasets by using Cloudera Distribution for Hadoop Ecosystem.
- Experience with Pig, Hive, HBase and Sqoop for analyzing the Hadoop cluster as well as Big Data.
- Experience in writing and implementing Apache Pig scripts to load and store data into Hive.
- Worked with HBase in creating tables to load large datasets of semi-structured data coming from various sources.
- Created Hive UDFs and UDAFs in Java based on business requirements.
- Used Pig to perform data validation on the data ingested using Sqoop and Flume; the cleansed data set was pushed into HBase.
- Involved in using HCatalog to access Hive table metadata from Pig or MapReduce.
- Involved in installing the Oozie workflow engine to run multiple MapReduce, Hive and Pig jobs that run independently based on time and data availability.
- Involved in creating Hive tables, loading the data and writing Hive queries that invoke and run MapReduce jobs in the backend; applied partitioning and bucketing when required.
- Responsible for submitting and tracking MapReduce jobs using JobTracker.
- Worked with different file formats like Text, Sequence, Avro and RC files, and with bucketing and partitioning for Hive performance enhancement.
- Rendered and delivered reports in the desired formats using reporting tools such as Tableau.
- Handled importing of data from various data sources, performed transformations using Hive, loaded the data into HDFS and extracted data from MySQL into HDFS using Sqoop.
- Used multithreading and clustering concepts for data processing.
- Collaborated with Java teams to create MapReduce programs that parse the data for claim report generation, and ran the JARs in Hadoop.
- Wrote Java programs to retrieve data from HDFS and provide REST services.
- Implemented Struts tag libraries for HTML, Bean and Tiles for developing the UI.
- Used JUnit for unit testing and integration testing.
Environment: Hadoop, HBase, HDFS, Pig, Hive, MapReduce, Java, Sqoop, MySQL, SQL, Oozie, HiveQL, HCatalog, Eclipse, Tableau, JUnit, Cassandra.
- Designed and developed a Java enterprise web application based on the MVC architecture.
- Involved in the analysis, design, development and testing phases of application development using the Agile Scrum methodology.
- Involved in producing and consuming SOAP-based web services using JAX-WS.
- Used the Java Spring framework to implement the Spring MVC architecture for the web application.
- Implemented Struts custom exceptions, which helped reduce the development time needed to handle exceptions.
- Used Hibernate annotations to map POJO classes to the tables in the database and to define the relationships between them.
- Used Spring Data JPA for querying the database and implemented Spring Security for applying filters to the incoming requests.
- Developed server-side utilities using J2EE technologies like Servlets, JSP and JSTL.
- Wrote SQL queries for inserting, updating and removing data from the MySQL database.
- Implemented unit test cases using JUnit and Mockito; used Log4j for logging and debugging.
- Used Maven for building the application into the respective WAR/JAR files and pushed them to the server.
- Implemented Continuous Integration / Continuous Deployment using Jenkins.
- Used SVN as the repository and for version control.
- Deployed, tested and logged the application using IBM WebSphere Application Server.
- Involved in Requirements Analysis, Design and Development phase of the project with major emphasis on development of the modules.
- Involved in various phases of the Software Development Life Cycle of the application like Requirement gathering, Design, Analysis and Code Development.
- Followed Test Driven Development and Agile methodologies in the development process.
- Developed and documented the detailed design of the new requirements, assessing the technical feasibility of implementing them.
- Created a detailed Construction plan with all the modules to be developed, their sub tasks and the time needed for their completion.
- Worked on controller servlets that dispatch requests to the appropriate class.
- Developed web services using a RESTful API, with Restlet as the RESTful framework.
- Used AJAX for a responsive design in the User interface.
- Involved in writing SQL queries, stored procedures and triggers.
- Created Connection pool Java Classes using JDBC drivers to communicate with the database.
- Implemented unit test cases and integration test cases using JUnit.
- Consumed SOAP web services through WSDL files whenever information was needed, such as for verification/validation and continuous process flow.
- Developed and Debugged the application using Eclipse IDE.
- Implemented Maven as a Build and Configuration tool.
- Used Apache Tomcat Application Server for application deployment.
- Used CVS as repository and version control tool.
Environment: Agile, Java, J2EE, Spring, Spring MVC, JSP, HTML, CSS, Servlets, REST, AJAX, SQL, JDBC, JUnit, WSDL, SOAP, Eclipse, Maven, Apache Tomcat, CVS.
- Involved in the complete software development life cycle (SDLC) using Agile, including effort estimations.
- Designed business flows and implemented Business Process Implementation Human Services for different teams.
- Involved in Project Design Documentation, Design Reviews and Code Reviews.
- Developed Use Case, Class and Sequence UML diagrams using Rational Rose based on the requirements.
- Developed the application on Java 1.3 and implemented Core Java concepts like multithreading, collections and generics.
- Worked extensively on developing the application using the J2EE platform and implemented the Model View Controller (MVC) architecture using Struts.
- Developed Servlets for communication between the backend server and the frontend.
- Used XML formatted data for sending data from/to the server.
- Responsible for exposing different capabilities of web services by means of SOAP/WSDL.
- Used JDBC drivers and created a connection pool to connect to the Oracle database.
- Worked on designing the architecture of the schemas in the Oracle database.
- Wrote stored procedures in PL/SQL for data entry and data retrieval.
- Used Log4J for logging and Debugging the applications.
- Involved in unit testing support using JUnit.
- Used Ant for Building the application to JAR/WAR files.