
Data Scientist Resume

SUMMARY

  • 2 years 6 months of experience in Data Science, Data Analytics, Big Data, Python, Scala, Java and SQL.
  • Proficient at building robust Machine Learning models, Deep Neural Networks, and Convolutional Neural Network (CNN) models using the Keras API.
  • Adept in analysing large datasets using Apache Spark, PySpark, and Spark ML.

TECHNICAL SKILLS

Big Data Technologies: Spark SQL, Hadoop 2.0, MapReduce 2.0, Hive, Pig, Zeppelin, Sqoop, Kafka, Avro.

Programming Languages: Python, Java, SQL, R, Scala

Web Technologies: HTML5, CSS, JavaScript, D3.js.

Operating Systems: Linux, Unix, Windows Systems.

Tools: Tableau, Microsoft Office, Microsoft PowerPoint, Microsoft Excel, JIRA, SAS Enterprise Guide, Hortonworks, RStudio.

Databases: MySQL, HBase, MongoDB

PROFESSIONAL EXPERIENCE

Confidential

Data Scientist

Responsibilities:

  • Constructed machine learning models using Linear Regression, Logistic Regression, k-NN, K-Means Clustering, SVM, Decision Tree, and Random Forest algorithms.
  • Performed statistical analysis, data analysis, data aggregation, data modelling, data wrangling, and data cleaning on large datasets.
  • Derived useful business insights from HMHCO ecommerce website event datasets, and analyzed data using the NumPy, Pandas, SciPy, and scikit-learn Python modules.
  • Carried out data pre-processing, data visualization, feature scaling, feature extraction, feature engineering, hyperparameter tuning, confusion-matrix evaluation, and dimensionality reduction with PCA.
  • Performed exploratory data analysis and statistical analysis using Python and R.
  • Experience working with SQL transactions, Triggers, Stored Procedures, RDBMS.
  • Worked on data pipelines, collecting and ingesting data into HDFS or HBase storage and then transforming it using Spark and Hive to process analytical queries and get insights.
  • Built robust classification and regression models using Spark MLlib.
  • Familiar with supervised and unsupervised learning methods, Natural Language Processing, NLTK, Data Structures, IPython Notebook, advanced analytics and predictive analytics, web analytics, Adobe Analytics (Omniture SiteCatalyst), Git, and GitHub.
  • Worked with RDDs, DataFrames, the Spark SQL API, and Spark Streaming for analysing data streams in near-real time.
  • Created Spark clusters using AWS EMR and S3 for data storage.
  • Created tables in MySQL, wrote queries with filtering, grouping, and aggregation, and queried multiple tables using joins.
  • Worked on a proof of concept (POC) for analytics using Spark with the NoSQL database Cassandra.
  • Worked on Product Recommendation System using item-based collaborative filtering.
  • Integrated Tableau with Hive and MySQL for visual data analysis, and created visualizations using the Matplotlib and Seaborn packages.
  • Followed Agile methodologies on the project, reported daily status through scrum meetings and the JIRA dashboard, and documented work in Confluence.
  • Familiar with Excel macros and advanced Microsoft Excel features.
  • Experience working with SAS Enterprise Guide for data mining.
  • Performed ETL using Hive scripts and Bash scripting, and used Flume to transfer unstructured web-log data to HDFS.
  • Implemented MapReduce jobs for processing large datasets.
  • Created business reports using pivot tables.
