- Collected streaming tweets using the Tweepy API and stored them in JSON format.
- Used Apache Spark to analyze the collected tweets, developed analytical queries over them in Spark SQL, and performed graph analytics using GraphX.
- Developed visualizations for the query results using HTML5, CSS3, and D3.js.
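As a minimal sketch of the kind of query described above, the stored tweet JSON can be aggregated much like a Spark SQL `GROUP BY`; this pure-Python stand-in uses only the standard library, and the field layout (`hashtags`) and sample records are illustrative assumptions, not taken from the project:

```python
import json
from collections import Counter

def top_hashtags(tweet_lines, n=3):
    """Count hashtag usage across newline-delimited tweet JSON records.

    A stand-in for a Spark SQL aggregation such as
    SELECT tag, COUNT(*) ... GROUP BY tag over the stored tweets.
    """
    counts = Counter()
    for line in tweet_lines:
        tweet = json.loads(line)
        for tag in tweet.get("hashtags", []):   # field name is an assumption
            counts[tag] += 1
    return counts.most_common(n)

# Hypothetical sample records in the assumed stored-JSON format.
sample = [
    '{"text": "spark rocks", "hashtags": ["spark", "bigdata"]}',
    '{"text": "hive too", "hashtags": ["hive", "bigdata"]}',
]
print(top_hashtags(sample))  # [('bigdata', 2), ('spark', 1), ('hive', 1)]
```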
- Paseo - A carpooling Android Application
- Paseo is a carpooling Android application through which users can share journeys.
- Users can post their travel details so that other users heading to the same destination can request a ride, making the journey less expensive for everyone.
Developer Tools: Visual Studio, Android Studio, IntelliJ IDEA, PyCharm
Databases: MySQL, SQL, HBase, Cassandra, Redshift, S3
Operating Systems: Windows 10/8/7 & Linux (Ubuntu)
Big data technologies: Hadoop, Hive, HBase, Kafka, Cassandra, Pig, Sqoop, Oozie
- Implemented the project as per the Software Development Life Cycle (SDLC).
- Developed the web layer using Spring MVC framework.
- Implemented JDBC for mapping an object-oriented domain model to a traditional relational database.
- Created Stored Procedures to manipulate the database and to apply the business logic according to the user's specifications.
- Involved in analyzing, designing, implementing, and testing the project.
- Developed UML diagrams like Use cases and Sequence diagrams as per requirement.
- Developed the Generic Classes, which includes the frequently used functionality, for reusability.
- Implemented an exception-management mechanism using Exception Handling Application Blocks to handle exceptions.
- Involved in Database design and developing SQL Queries, stored procedures on MySQL.
- Developed Action Forms and Action Classes in the Struts framework.
- Programmed session and entity EJBs to track user information and handle profile-based transactions.
- Involved in writing JUnit test cases, unit and integration testing of the application.
- Developed user and technical documentation.
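The object-relational mapping described above can be sketched in miniature. The project itself used Java/JDBC; this is an illustrative stand-in using Python's built-in sqlite3 module, with a hypothetical `User` domain object that is not from the original work:

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class User:          # hypothetical domain object for illustration
    id: int
    name: str

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

def save(user):
    """Object -> row: the mapping a JDBC DAO performs on insert."""
    conn.execute("INSERT INTO users VALUES (?, ?)", (user.id, user.name))

def find(user_id):
    """Row -> object: rehydrate the domain model from a result set."""
    row = conn.execute(
        "SELECT id, name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return User(*row) if row else None

save(User(1, "alice"))
print(find(1))  # User(id=1, name='alice')
```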
Confidential, Kansas City, MO
- Worked on the Spark Core, Spark Streaming, and Spark SQL modules of Spark.
- Developed scripts to perform business transformations on the data using Hive and Pig.
- Developed UDFs in Java for Hive and Pig, and worked on reading multiple data formats on HDFS using Scala.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs, Spark SQL, and Scala.
- Developed multiple POCs using Scala and deployed them on the YARN cluster; compared the performance of Spark against Hive and SQL/Teradata.
- Loaded data into Spark Confidential and performed in-memory computation to generate the output response.
- Loaded data into HBase using bulk loads and the HBase API.
- Used Scala to write several Spark jobs for real-time applications.
- Analyzed SQL scripts and designed solutions implemented in Scala.
- Performed data analysis through Pig, MapReduce, and Hive.
- Imported data from Oracle into HDFS using Sqoop.
- Developed analytical components using Scala, Spark, and Spark Streaming.
- Worked on Spark MLlib, predicting conditions using classification and regression algorithms.
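The Hive-to-Spark conversion pattern mentioned above can be sketched with a small, hedged stand-in: a Hive `GROUP BY`/`COUNT(*)` expressed as a map followed by a reduce-by-key. The real jobs used Spark RDDs in Scala; the helper below merely mimics `RDD.reduceByKey` on an in-memory list of pairs, and the sample data is invented:

```python
def reduce_by_key(pairs, fn):
    """In-memory analogue of Spark's RDD.reduceByKey: fold values per key."""
    acc = {}
    for key, value in pairs:
        acc[key] = fn(acc[key], value) if key in acc else value
    return sorted(acc.items())

# Hive: SELECT word, COUNT(*) FROM words GROUP BY word
words = ["spark", "hive", "spark", "scala"]
pairs = [(w, 1) for w in words]                    # rdd.map(w => (w, 1))
counts = reduce_by_key(pairs, lambda a, b: a + b)  # .reduceByKey(_ + _)
print(counts)  # [('hive', 1), ('scala', 1), ('spark', 2)]
```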
Data Engineer (Spark/Scala)
Confidential, Los Angeles
- Loaded data from traditional databases into Hadoop storage (HDFS, HBase) using Sqoop and wrote analytical Hive queries to process the data.
- Designed and developed Confidential seeds using Scala and Cascading.
- Created parser programs in Scala and Java to extract data from incoming feeds into local storage.
- Performed data analysis through Pig scripts, MapReduce, and Hive, storing and managing the data in NoSQL databases such as HBase and Cassandra.
- Collected streaming data using Spark Streaming and stored it in the column-oriented database HBase; implemented Spark SQL and Hive queries to convert the data into Spark Confidential ’s, and implemented Spark jobs using Scala.
- Loaded source data arriving in the DSC MySQL database into Amazon Redshift using data pipelines.
- Parsed the data while loading it from Redshift into Amazon S3 after processing with Spark.
- Used Scala to write several Spark jobs for real-time applications processing customer data stored in the local data mart used as the target storage.
- Implemented machine learning algorithms available in Spark MLlib, such as ALS and decision-tree-based algorithms, for predictions.
- Developed scripts to perform business transformations on the data using Hive and Scala.
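The parser programs mentioned above can be illustrated with a small sketch. The originals were written in Scala and Java; this pure-Python stand-in uses the standard library's csv module, and the field names (`id`, `amount`) and pipe delimiter are illustrative assumptions rather than details from the project:

```python
import csv
import io

def parse_records(raw_text, wanted=("id", "amount")):
    """Extract selected fields from incoming delimited data before loading
    it into local storage - the role the parser programs played."""
    reader = csv.DictReader(io.StringIO(raw_text), delimiter="|")
    return [{k: row[k] for k in wanted} for row in reader]

# Hypothetical incoming feed in the assumed pipe-delimited format.
incoming = "id|name|amount\n1|alice|10.5\n2|bob|3.25\n"
print(parse_records(incoming))
# [{'id': '1', 'amount': '10.5'}, {'id': '2', 'amount': '3.25'}]
```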