
Data Analyst/ Data Scientist Resume


Hartford, CT

SUMMARY:

  • Over 8 years of experience in data mining, predictive modeling, statistical analysis, econometric modeling, and data visualization with large structured and unstructured data sets.
  • Strong experience in Data Analysis, Data Cleaning, Data Migration, Data Conversion, Data Export and Import, Data Integration.
  • Experience in using Python libraries such as NumPy, SciPy, Pandas, Matplotlib, and scikit-learn.
  • Proficient in Statistical Modeling and Machine Learning techniques (Linear and Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian methods, XGBoost) in Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis testing, Factor analysis/PCA, and Ensembles.
  • Experienced in integration of various relational and non-relational sources such as Teradata, Oracle, SQL Server, NoSQL, COBOL, XML and Flat Files.
  • Performed extensive data profiling and analysis to detect and correct inaccurate data in databases and to track data quality.
  • Experience in performance tuning and query optimization techniques in transactional and data warehouse environments.
  • Experience working on BI visualization tools (Tableau and QlikView).
  • Extensive experience in Text Analytics, developing different Statistical Machine Learning, Data Mining solutions to various business problems and generating data visualizations using R, Python and Tableau.
  • Created reports for the users using Tableau by connecting to multiple data sources like Flat Files, MS Excel, CSV files, SQL Server, and Oracle.
  • Hands on experience in writing queries in SQL and R to extract, transform and load (ETL) data from large datasets using Data Staging.
  • Experienced in evaluating data sources, with a strong understanding of data warehouse/data mart design, ETL, BI, OLAP, and client/server applications.
  • Experience in creating partitions, indexes, indexed views to improve the performance, reduce contention and increase the availability of data.

TECHNICAL SKILLS:

Programming Languages: Python (NumPy, SciPy, Pandas, Matplotlib, scikit-learn, seaborn), R (ggplot2, Weka, dplyr, knitr, caret)

BI and Visualization: Tableau (Desktop and Server), QlikView, SAP Business Objects, OBIEE, Crystal Reports XI, Power BI, SSRS and SPSS

Databases: MS SQL Server, Oracle, MySQL, and NoSQL (Hive and Oracle NoSQL)

Machine Learning: Naïve Bayes, Decision Trees, Regression models, Random Forests, K-means clustering, Market Basket Analysis, Time-series analysis, and Support Vector Machines

Tools: HQL, Pig, Apache Spark, Microsoft Visual Studio, SPSS, SQL Server Integration Services, SQL Server Analysis Services, SQL Server Reporting Services, Oracle Data Integrator.

ETL tools: Informatica PowerCenter, SSIS, SSAS.

Data Modelling Tools: Erwin 8.0, ER/Studio, SAP PowerDesigner

PROFESSIONAL EXPERIENCE:

Data Analyst/ Data Scientist

Confidential, Hartford, CT

Responsibilities:

  • Performed exploratory data analysis using descriptive statistics and data visualization to determine baseline machine learning algorithms (MLAs).
  • Built and automated a robust, high-accuracy model for the given customer base.
  • Analyzed customer consumption behavior and measured customer value with RFM (recency, frequency, monetary) analysis; segmented customers with clustering algorithms such as K-Means and Hierarchical Clustering.
  • Coordinated the execution of A/B tests to measure the effectiveness of a personalized recommendation system.
  • Applied the Wilcoxon signed-rank test in R to pre- and post-acquisition stock performance data across different sectors to test for statistical significance.
  • Recommended and evaluated marketing approaches based on analysis of customer consumption behavior.
  • Utilized SQL and HiveQL to query and manipulate data from various data sources, including Oracle and HDFS, while maintaining data integrity.
  • Worked on data cleaning, data preparation and feature engineering with Python, including NumPy, SciPy, Pandas, Matplotlib, Seaborn and scikit-learn.
  • Predicted the claim severity to understand future loss and ranked importance of features.
  • Used Python and Spark to implement different machine learning algorithms including Generalized Linear Model, SVM, Random Forest, Boosting.
  • Identified internal and external information sources and built effective working relationships with subject matter experts across research groups within the firm and the external marketplace.
  • Developed logistic regression models to predict subscription response rate based on customer variables such as past transactions, response to prior mailings, promotions, demographics, and interests.
  • Involved in data preparation using tasks such as:
  • Data reduction - obtain a reduced representation of the data volume that produces the same or similar analytical results.
  • Data discretization - transform quantitative data into qualitative data.
  • Data cleaning - Fill in missing values, handle the noisy data, identify or remove outliers and resolve inconsistencies.
  • Data integration - Integration of multiple databases, data cubes, or files.
  • Data transformation - Normalization, standardization and aggregation.
  • Designed dashboards with Tableau and D3.js and provided complex reports, including summaries, charts, and graphs to interpret findings to team and stakeholders.
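The preparation steps above (cleaning, discretization, transformation) can be sketched in pandas; the column names, bin edges, and values here are hypothetical, not the actual project data.

```python
import numpy as np
import pandas as pd

# Hypothetical customer table with a missing value
df = pd.DataFrame({
    "age": [25.0, 41.0, np.nan, 58.0],
    "spend": [120.0, 300.0, 90.0, 1500.0],
})

# Data cleaning: fill the missing value with the column median
df["age"] = df["age"].fillna(df["age"].median())

# Data discretization: quantitative -> qualitative buckets
df["age_band"] = pd.cut(df["age"], bins=[0, 30, 50, 100],
                        labels=["young", "middle", "senior"])

# Data transformation: z-score standardization of spend
df["spend_z"] = (df["spend"] - df["spend"].mean()) / df["spend"].std()

print(df)
```

Normalization to a 0-1 range or aggregation by customer segment would follow the same per-column pattern.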

Environment: Tableau, Oracle, Teradata, R, Python, Spark, SQL, HiveQL, machine learning algorithms, HDFS.

Sr. Data Scientist

Confidential, Atlanta, GA

Responsibilities:

  • Applied various machine learning algorithms and statistical models (decision trees, regression models, neural networks, SVM, clustering) to identify volume, using the scikit-learn package in Python and MATLAB.
  • Utilized Apache Spark with Python to develop and execute Big Data analytics and machine learning applications; executed machine learning use cases under Spark ML and MLlib.
  • Led technical implementation of advanced analytics projects: defined mathematical approaches, developed new and effective analytics algorithms, and wrote key pieces of mission-critical source code implementing advanced machine learning algorithms using Caffe, TensorFlow, Scala, Spark, MLlib, R, and other tools and languages as needed.
  • Built analytical data pipelines to port data in and out of Hadoop/HDFS from structured and unstructured sources and designed and implemented system architecture for Amazon EC2 based cloud-hosted solution for client.
  • Performed K-means clustering, Multivariate analysis and Support Vector Machines in Python and R.
  • Professional Tableau user (Desktop, Online, and Server); experienced with Keras and TensorFlow.
  • Created MapReduce jobs running over HDFS for data mining and analysis using R; loaded and stored data with Pig scripts and R for MapReduce operations; created various data visualizations using R and Tableau.
  • Worked on machine learning on large-size data using Spark and MapReduce.
  • Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database.
  • Developed Spark/Scala and Python code for a regular expression (regex) project in the Hadoop/Hive environment on Linux/Windows for big data resources.
  • Performed Multinomial Logistic Regression, Random Forest, Decision Tree, and SVM to classify whether a package would be delivered on time on a new route.
  • Stored and retrieved data from data-warehouses using Amazon Redshift.
  • Responsible for planning & scheduling new product releases and promotional offers.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python for developing various machine learning algorithms.
  • Worked on NoSQL databases like MongoDB and HBase.
  • Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data. Created various types of data visualizations using Python and Tableau.
  • Worked on data pre-processing and cleaning the data to perform feature engineering and performed data imputation techniques for the missing values in the dataset using Python.
  • Extracted data from HDFS and prepared data for exploratory analysis using data munging.
  • Worked on Text Analytics, Naive Bayes, Sentiment analysis, creating word clouds and retrieving data from Twitter and other social networking platforms.
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
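The on-time-delivery classification and imputation work described above can be sketched with scikit-learn; the route features, labels, and synthetic data here are hypothetical stand-ins for the real data set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Hypothetical route features: distance, stops, traffic index
X = rng.normal(size=(500, 3))
# On-time label derived from a noisy linear route score (synthetic)
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=500) < 0).astype(int)
# Introduce some missing values so the imputation step has work to do
X[rng.integers(0, 500, size=25), 0] = np.nan

# Impute missing values with the median, then fit a random forest
model = make_pipeline(
    SimpleImputer(strategy="median"),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model.fit(X_tr, y_tr)
print("holdout accuracy:", model.score(X_te, y_te))
```

Swapping in `LogisticRegression` or an SVM classifier from scikit-learn against the same pipeline would mirror the model comparison the bullet describes.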

Environment: Python, MongoDB, JavaScript, SQL Server, HDFS, Pig, Hive, Oracle, DB2, Tableau, ETL (Informatica), SQL, T-SQL, EC2, EMR, Teradata, Hadoop Framework, AWS, Spark SQL, Scala, Spark MLlib, NLP, MATLAB, HBase, Cassandra, R, PySpark, Tableau Desktop, Excel, Linux, CDH5
