- 2 years 6 months of experience in Data Science, Data Analytics, Big Data, Python, Scala, Java and SQL.
- Proficient at building robust Machine Learning models, Deep Neural Networks, and Convolutional Neural Network (CNN) models using the Keras API.
- Adept in analysing large datasets using Apache Spark, PySpark, and Spark ML.
Big Data Technologies: Spark SQL, Hadoop 2.0, MapReduce 2.0, Hive, Pig, Zeppelin, Sqoop, Kafka, Avro.
Programming Languages: Python, Java, SQL, R, Scala
Operating Systems: Linux, Unix, Windows Systems.
Tools: Tableau, Microsoft Office, Microsoft PowerPoint, Microsoft Excel, JIRA, SAS Enterprise Guide, Hortonworks, RStudio.
Databases: MySQL, HBase, MongoDB
- Constructed machine learning models using Linear Regression, Logistic Regression, k-NN, K-Means Clustering, SVM, Decision Tree, and Random Forest algorithms.
- Performed statistical analysis, data analysis, data aggregation, data modelling, data wrangling, and data cleaning on large datasets.
- Derived useful business insights from HMHCO ecommerce website events datasets, and analyzed data using the NumPy, Pandas, SciPy, and scikit-learn Python modules.
- Carried out data pre-processing, data visualization, feature scaling, feature extraction, feature engineering, hyperparameter tuning, confusion-matrix evaluation, and dimensionality reduction, including PCA.
- Performed exploratory data analysis and statistical analysis using Python and R.
- Experience working with SQL transactions, triggers, stored procedures, and RDBMS.
- Worked on data pipelines, collecting and ingesting data into HDFS or HBase storage and then transforming it using Spark and Hive to run analytical queries and derive insights.
- Built robust classification and regression models using Spark MLlib.
- Familiar with supervised and unsupervised learning methods, Natural Language Processing, NLTK, data structures, IPython Notebook, advanced analytics, predictive analytics, web analytics, Adobe Analytics (Omniture SiteCatalyst), Git, and GitHub.
- Worked with RDDs, DataFrames, the Spark SQL API, and Spark Streaming for analysing data streams in near-real time.
- Created Spark clusters on AWS EMR, using S3 for data storage.
- Created tables in MySQL, performed filtering, grouping, and aggregation, and queried multiple tables using joins.
- Worked on a proof of concept (POC) for analytics using Spark with the NoSQL database Cassandra.
- Worked on a product recommendation system using item-based collaborative filtering.
- Integrated Tableau with Hive and MySQL for data analysis and visualization, and created visualizations using the Matplotlib and Seaborn packages.
- Followed Agile methodologies, reported daily status through scrum meetings and the JIRA dashboard, and documented work in Confluence.
- Advanced in working with Microsoft Excel and familiar with Excel macros.
- Experience working with SAS Enterprise Guide for data mining.
- Performed ETL using Hive scripts and Bash scripting, and used Flume to transfer unstructured web log data to HDFS.
- Implemented MapReduce jobs for processing large datasets.
- Created business reports using pivot tables.