- 8 years of industry experience, including 4 years developing, implementing, and configuring the Hadoop ecosystem and 4 years developing applications using Java and J2EE
- Experience with Hadoop ecosystem components such as HDFS, Spark (with Scala and Python), ZooKeeper, YARN, MapReduce, Pig, Sqoop, HBase, Hive, Flume, Cassandra, MongoDB, Oozie, Kafka, and Tez
- Hands-on experience developing Spark applications using Spark APIs such as Spark Core, Spark Streaming, Spark MLlib, and Spark SQL, and working with file formats such as Text, SequenceFile, Avro, ORC, JSON, and Parquet.
- Hands-on experience with Spark Core, Spark SQL, and the DataFrame/Dataset/RDD APIs.
- Migrated MapReduce jobs and Hive scripts to Spark DataFrame transformations and actions.
- Excellent knowledge of Spark architecture and Hadoop architecture and its ecosystem, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Experience with the Hive data warehouse tool: creating tables, distributing data through static and dynamic partitioning and bucketing, and applying Hive optimization techniques.
- Hands-on experience with NoSQL databases such as HBase, MongoDB, and Cassandra. Used HBase alongside Pig/Hive when real-time, low-latency queries were required.
- Good experience creating and designing data ingestion pipelines using technologies such as Apache Storm and Kafka.
- Experience tuning and debugging Spark applications and applying Spark optimization techniques.
- Experience importing and exporting data with Sqoop between HDFS and relational database systems (RDBMS).
- Expertise in writing Hadoop jobs for analyzing data using HiveQL, Pig Latin (a data-flow language), and custom MapReduce programs in Java.
- Experience in configuring and working with Flume and Kafka to load the data from multiple web sources directly into HDFS.
- Developed various shell scripts and Python scripts to automate Spark jobs and Hive scripts.
- Scheduled job workflows for FTP, Sqoop, and Hive scripts using Oozie coordinators.
- Involved in ingesting data into HDFS using Apache NiFi. Developed and deployed Apache NiFi flows across various environments, optimized NiFi data flows, and wrote QA scripts in Python for tracking missing files.
- Imported data from sources such as AWS S3 and the local file system into Spark RDDs, and worked with Amazon Web Services (EMR, S3, EC2, Lambda).
- Experience developing and maintaining applications written for Amazon Simple Storage Service (S3), AWS Elastic Beanstalk, and AWS CloudFormation.
- Dealt with high transaction volumes while interfacing with the front-end application written in Java, JSP, Hibernate, and Spring.
- Involved in daily SCRUM meetings to discuss the development/progress and was active in making scrum meetings more productive.
- Experience understanding existing systems and providing maintenance and production support on technologies such as Java, J2EE, and various databases (Oracle, SQL Server).
- Hands-on experience with core Java concepts such as exceptions, collections, data structures, I/O, multithreading, and serialization/deserialization in streaming applications.
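As a small illustration of the Python QA scripting mentioned above (tracking missing files in an ingest pipeline), here is a minimal sketch; the file names and manifest contents are hypothetical examples, not from any actual project:

```python
# Minimal sketch of a missing-file QA check for an ingest pipeline.
# The expected manifest and landed file names below are hypothetical.
def find_missing_files(expected, landed):
    """Return the expected file names that never landed, sorted."""
    return sorted(set(expected) - set(landed))

expected = ["sales_2020-01-01.csv", "sales_2020-01-02.csv", "sales_2020-01-03.csv"]
landed = ["sales_2020-01-01.csv", "sales_2020-01-03.csv"]

missing = find_missing_files(expected, landed)
print(missing)  # ['sales_2020-01-02.csv']
```

In practice a script like this would read the expected list from a manifest and the landed list from an HDFS directory listing, then alert on any non-empty result.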
Big Data: HDFS, ZooKeeper, YARN, MapReduce, Pig, Sqoop, HBase, Hive, Flume, Cassandra, MongoDB, Oozie, Kafka, Spark, Scala, Tez, AWS, S3, EC2, EMR, Lambda, Shell Scripting, Cloudera, Hortonworks.
Big Data Clusters: CDH, Hortonworks
Operating System: Windows, Linux, Unix.
Languages: Java, Scala, Python
Frameworks: Spring, Hibernate
Databases: Oracle, SQL Server, MySQL, MS Access.
Web Technologies: JSP, Servlets, HTML, CSS, JavaScript, JDBC, SOAP, Ajax
Cloud Platform: Amazon Web services (AWS)
IDE/Build Tools: Eclipse, STS, IntelliJ, Maven, ANT
Web/App Server: Apache Tomcat, Glassfish
Confidential, Los Angeles
Sr. Spark/Scala/Python Developer
- Worked on Big Data infrastructure for batch processing and real time processing. Built scalable distributed data solutions using Hadoop.
- Imported and exported terabytes of data using Sqoop, and ingested real-time data using Flume and Kafka.
- Wrote programs in Spark using Scala and Python for data quality checks.
- Created various Hive external tables and staging tables and joined them as per requirements. Implemented static partitioning, dynamic partitioning, and bucketing in Hive using internal and external tables.
- Wrote transformations and actions on DataFrames, and used Spark SQL on DataFrames to access Hive tables in Spark for faster data processing.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
- Used Hive to do transformations, joins, filter and some pre-aggregations after storing the data to HDFS .
- Used the Spark Streaming API to perform on-the-fly transformations and actions for building the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra.
- Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark.
- Implemented workflows using the Apache Oozie framework to automate tasks. Used ZooKeeper to coordinate cluster services.
- Used Enterprise Data Warehouse (EDW) architecture and data modeling concepts such as star and snowflake schemas in the project.
- Configured, deployed, and maintained multi-node Dev and Test Kafka clusters, and implemented real-time data ingestion and handling using Kafka.
- Performed various benchmarking steps to optimize the performance of spark jobs and thus improve the overall processing.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive; involved in creating Hive tables, loading them with data, and writing Hive queries that invoke MapReduce jobs in the backend.
- Used Amazon EMR for processing Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
- Designed ETL workflows, loaded data from various sources into HDFS, and generated reports using Tableau.
- Worked with SCRUM team in delivering agreed user stories on time for every Sprint.
Environment: Hadoop, MapReduce, HDFS, Yarn, Hive, Sqoop, Cassandra, Oozie, Spark, Scala, Python, AWS, Flume, Kafka, Tableau, Linux, Shell Scripting.
Environment: Hadoop, HDFS, Hive, Sqoop, Flume, Kafka, Spark, Shell Scripting, Cassandra, Scala, Python scripting, Agile, ZooKeeper, Maven, AWS EMR, MySQL, Pig, Cloudera, Tableau.
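The Hive partitioning and bucketing work described above can be sketched with a short HiveQL fragment; the table, column names, and paths are hypothetical examples:

```sql
-- Illustrative only: table, column names, and paths are hypothetical.
CREATE EXTERNAL TABLE staging_events (
  user_id BIGINT,
  event_type STRING,
  payload STRING
)
PARTITIONED BY (event_date STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS
STORED AS ORC
LOCATION '/data/staging/events';

-- Dynamic partitioning: Hive derives event_date from the last SELECT column.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE staging_events PARTITION (event_date)
SELECT user_id, event_type, payload, event_date FROM raw_events;
```

Partitioning prunes whole directories at query time, while bucketing on a join key such as `user_id` can enable bucketed map joins.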
- Worked on Big Data infrastructure for batch and real-time processing using Apache Spark. Built a scalable distributed Hadoop cluster running the Hortonworks Data Platform.
- Responsible for design and development of Spark SQL Scripts based on Functional Specifications.
- Worked on a large-scale Hadoop YARN cluster for distributed data processing and analysis using Spark and Hive.
- Involved in creating a data lake by extracting customer data from various sources into HDFS, including Excel files, databases, and server log data.
- Developed Apache Spark applications using Spark for data processing from various streaming sources.
- Processed web server logs by developing multi-hop Flume agents using an Avro sink and loaded them into MongoDB for further analysis. Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them using Hive.
- Wrote MapReduce programs to convert text files into Avro format and load them into Hive tables.
- Designed and developed functionality to get JSON documents from the MongoDB document store and send them to the client using a RESTful web service.
- Optimized HiveQL/Pig scripts using the Tez execution engine. Tested Apache Tez, an extensible framework for building high-performance batch and interactive data-processing applications, on Pig and Hive jobs.
- Developed a NiFi workflow to pick up data from an SFTP server and send it to a Kafka broker. Loaded DStream data into Spark RDDs and performed in-memory computation to generate the output response.
- Used Spark Streaming to collect data from Kafka in near real time and perform the transformations and aggregations needed to build the common learner data model, storing the data in a NoSQL store (MongoDB).
- Generated various kinds of reports using Pentaho and Tableau based on Client specification.
Environment: Hadoop, Sqoop, Hive, Pig, Oracle, Java, NiFi, Spark, Scala, MongoDB, Eclipse IDE, Hortonworks.
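The aggregation step behind the "common learner data model" described above can be illustrated in plain Python; in the actual pipeline this would run in Spark Streaming with output stored in MongoDB, and the record fields here are hypothetical:

```python
# Plain-Python illustration of the per-learner aggregation described above.
# In production this logic would run as Spark Streaming transformations;
# the learner IDs and record fields are hypothetical.
from collections import defaultdict

def build_learner_model(events):
    """Aggregate (learner_id, score) events into per-learner summaries."""
    totals = defaultdict(lambda: {"attempts": 0, "total_score": 0})
    for learner_id, score in events:
        rec = totals[learner_id]
        rec["attempts"] += 1
        rec["total_score"] += score
    return dict(totals)

events = [("u1", 80), ("u2", 55), ("u1", 90)]
model = build_learner_model(events)
print(model["u1"])  # {'attempts': 2, 'total_score': 170}
```

Each aggregated record would then be upserted into the NoSQL store keyed by learner ID.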
Confidential, New York
- Experience installing Hadoop clusters using the Cloudera distribution.
- Wrote MapReduce code to parse data from various sources and store the parsed data in HBase and Hive.
- Created HBase tables to store different formats of data as a backend for user portals.
- Developed Kafka producer and consumers, HBase and Hadoop MapReduce jobs along with components on HDFS, Hive.
- Migrated large volumes of data from different databases (Oracle, SQL Server) to Hadoop.
- Wrote Hive join queries to fetch information from multiple tables and multiple MapReduce jobs to collect output from Hive; ingested data into Hadoop/Hive/HDFS from different data sources.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data, and analyzed them by running Hive queries and Pig scripts.
- Experienced in managing and reviewing Hadoop log files; used Pig as an ETL tool for transformations, joins, and pre-aggregations before storing the data in HDFS.
- Wrote helper classes using the Java Collections Framework and JUnit test cases for the classes developed.
- Utilized Flume to filter input data so that only the data needed for analytics was retrieved, by implementing Flume interceptors.
- Worked on a Pig script to count the number of times a particular URL was opened during a given period; comparing counts across URLs showed the relative popularity of each website among employees.
- Wrote unit tests for newly developed components using JUnit; involved in automation environment setup using Eclipse, Java, Selenium WebDriver Java bindings, and TestNG.
Environment: Hadoop, HDFS, MapReduce, Pig, Hive, Sqoop, Flume, Unix, Java/J2EE, JDBC, Junit, JSON, MAVEN, Cloudera, HBASE
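The URL-popularity Pig script mentioned above can be sketched in Pig Latin; the log schema, date window, and paths are hypothetical examples:

```pig
-- Illustrative only: log schema, date window, and paths are hypothetical.
logs    = LOAD '/logs/web' USING PigStorage('\t')
          AS (ts:chararray, user:chararray, url:chararray);
window  = FILTER logs BY ts >= '2015-01-01' AND ts < '2015-02-01';
by_url  = GROUP window BY url;
counts  = FOREACH by_url GENERATE group AS url, COUNT(window) AS hits;
ranked  = ORDER counts BY hits DESC;
STORE ranked INTO '/reports/url_popularity';
```

The GROUP/COUNT pair compiles down to a MapReduce job, with the ORDER stage producing a globally sorted popularity ranking.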
Hadoop/Java Developer
- Created MapReduce programs to parse data for claim report generation and ran the JARs on Hadoop. Coordinated with the Java team in creating MapReduce programs.
- Created reports for the BI team, using Sqoop to move data into HDFS and Hive.
- Imported data using Sqoop to load data from MySQL into HDFS on a regular basis.
- Worked on Partitions, bucketing concepts in Hive and designed both Managed and External tables in Hive for optimized performance
- Collaborated with BI teams to ensure data quality and availability with live visualization
- Worked on creating Pig scripts for most modules to provide comparative effort estimates for code development.
- Wrote Java programs to retrieve data from HDFS and provide REST services.
- Shared responsibility for and assisted with the administration of Hadoop, Hive, Sqoop, HBase, and Pig within the team.
- Implemented various MapReduce jobs in custom environments and loaded their output into HBase tables by generating Hive queries.
- Worked on tuning the performance of Pig queries and was involved in loading data from the Linux file system into HDFS. Imported and exported data into HDFS using Sqoop (version 1.4.3) and Kafka.
- Experience processing unstructured data using Pig; implemented partitioning, dynamic partitions, and buckets in Hive.
- Used JUnit for unit testing and Continuum for integration testing
Environment: Cloudera, Hadoop, Pig, Sqoop, Hive, HBase, Java, Eclipse, MySQL, MapReduce
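The regular MySQL-to-HDFS loads mentioned above would typically use a Sqoop invocation along these lines; the connection string, table, column, and target paths are hypothetical:

```shell
# Illustrative only: host, database, table, and paths are hypothetical.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/claims \
  --username etl_user -P \
  --table claim_records \
  --target-dir /data/claims/raw \
  --num-mappers 4 \
  --incremental append \
  --check-column claim_id
```

The `--incremental append` / `--check-column` pair makes repeated runs pick up only rows with IDs above the last imported value, which suits a recurring scheduled load.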
- Involved in Design and Development of Search, Executive Connections and Compensation modules.
- Used the Hibernate ORM framework with the Spring framework for data persistence and transaction management.
- Designed the technical specifications, components based on the client requirement
- Identified the stories from the backlog relating to the release plan prior to starting the sprint.
- Involved in planning the stories and tasks to be taken up on a sprint-by-sprint basis.
- Designed and documented REST/HTTP APIs, including JSON data formats and API versioning strategy.
- Developed Web Services for sending and getting data from different applications using SOAP messages.
- Monitored and guided the team to resolve technical issues on a daily basis.
- Developed dynamic web pages using JSP and Servlets
- Wrote Criteria queries and HQL effectively using Hibernate.
- Worked on implementing the REST Web services using RESTLET
- Involved in developing the POJO classes and mapped them using Hibernate XML mapping files.
Environment: Java, J2EE, Servlets, REST, Oracle 10g, Hibernate, JavaScript, HTML
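The Hibernate HQL work noted above can be illustrated with a short query; the entity and property names are hypothetical examples, not from any actual codebase:

```sql
-- HQL (not SQL): entity and property names are hypothetical.
-- Fetch executives whose total compensation exceeds a threshold,
-- newest hires first; :minTotal is a named query parameter.
SELECT e FROM Executive e
WHERE e.compensation.total > :minTotal
ORDER BY e.hireDate DESC
```

Unlike raw SQL, HQL queries reference mapped entity classes and their properties, and Hibernate translates the dotted paths into the necessary joins.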
- Involved in Designing the Database
- Involved in writing Criteria and HQL queries
- Involved in Design and Development of User module
- Individually worked on all the stages of a Software Development Life Cycle (SDLC)
- Implemented the application using the Spring MVC framework, which is based on the MVC design pattern.
- Developed JSPs and Servlets to dynamically generate HTML and display the data on the client side.
- Developed the application on Struts MVC architecture, utilizing Action classes, ActionForms, and validations.
- Developed application service components and configured beans using Spring IoC (applicationContext.xml).
- Designed User Interface and the business logic for customer registration and maintenance
- Integrating Web services and working with data in different servers
- Involved in designing and Development of SOA services using Web Services
Environment: Java, HTML, CSS, JSP, Servlets, Hibernate, MySQL, jQuery