Programming Languages: Java, C, C++, R, Python, Scala
Big Data tools: Hadoop MapReduce, Pig, Hive, Apache Spark, Spark MLlib
Databases & Querying: SQL, T-SQL, Teradata, Oracle PL/SQL, Cassandra, HBase, MongoDB
ETL & Reporting Tools: SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), Tableau, QlikView, Matplotlib, MS Excel
Data Scientist Intern
Confidential, Austin, TX
- Applied Natural Language Processing techniques such as LDA, LSA, and K-means clustering on Word2vec vectors to extract meaningful topics from a large document set, and performed sentiment analysis to produce multi-aspect review ratings
- Used per-document topic probabilities as features and trained a random forest model to detect bad actors.
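A minimal sketch of the topic-features-plus-random-forest approach, assuming scikit-learn; the documents, labels, and variable names here are invented for illustration, not the original data:

```python
# Hypothetical sketch: LDA topic probabilities as features for a
# random-forest classifier (assumes scikit-learn is installed).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import RandomForestClassifier

# Invented toy corpus and labels (1 = "bad actor") purely for illustration.
docs = [
    "late delivery refund complaint",
    "great service fast shipping",
    "refund scam complaint fraud",
    "fast friendly great support",
]
labels = [1, 0, 1, 0]

# Bag-of-words counts -> per-document topic probability vectors.
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_probs = lda.fit_transform(counts)  # shape (n_docs, n_topics), rows sum to 1

# Topic probabilities become the feature matrix for the random forest.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(topic_probs, labels)
preds = clf.predict(topic_probs)
print(preds.tolist())
```

In a real pipeline the classifier would be evaluated on held-out documents rather than the training set shown here.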
- Performed data analysis on crew salary data using Teradata and Python, and visualized the results in Tableau.
- Built an SSIS package to load train schedules and paths into the database, scheduled it to run daily, and used Python to optimize train blocking sequences based on shortest-path algorithms
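The shortest-path component can be sketched with Dijkstra's algorithm; the `blocks` graph, node names, and edge weights below are invented for illustration:

```python
import heapq

def shortest_path(graph, src, dst):
    """Dijkstra's algorithm; graph maps node -> [(neighbor, weight), ...]."""
    dist = {src: 0}
    prev = {}
    pq = [(0, src)]  # min-heap of (distance, node)
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(pq, (nd, v))
    # Reconstruct the path by walking predecessors back from dst.
    path, node = [], dst
    while node != src:
        path.append(node)
        node = prev[node]
    path.append(src)
    return list(reversed(path)), dist[dst]

# Hypothetical track-segment graph with traversal costs.
blocks = {
    "A": [("B", 4), ("C", 2)],
    "C": [("B", 1), ("D", 5)],
    "B": [("D", 1)],
    "D": [],
}
print(shortest_path(blocks, "A", "D"))  # (['A', 'C', 'B', 'D'], 4)
```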
- Developed various SSIS (Integration Services) and SSRS (Reporting Services) packages and jobs to handle large volumes of data received from different sources and to generate the corresponding reports.
- Created tables, views, triggers, stored procedures, cursors, and other complex T-SQL statements for various applications; involved in query optimization and performance tuning of SQL queries and procedures.
- Installed a standalone Hadoop cluster and performed sentiment analysis on Twitter data extracted via the Streaming API
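A sentiment pass like the one above can be sketched with a simple lexicon-based scorer; the word lists and example tweets are illustrative placeholders, not the actual pipeline:

```python
# Minimal lexicon-based sentiment scorer (invented word lists for illustration).
POSITIVE = {"love", "great", "awesome", "good"}
NEGATIVE = {"hate", "awful", "bad", "terrible"}

def sentiment(tweet):
    """Score a tweet as positive/negative/neutral by counting lexicon hits."""
    words = tweet.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this awesome product"))  # positive
print(sentiment("terrible service I hate it"))   # negative
```

A production version would use a trained model or a richer lexicon with negation handling, but the feature of interest (per-tweet polarity) is the same.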