Data Scientist Resume

Oklahoma City, Oklahoma

SUMMARY:

  • Overall 7 years of IT experience, including 4+ years in statistics, data analysis and machine learning using Python.
  • Ability to analyse complex projects at various levels. Experience in building data-intensive Big Data applications and products using open-source frameworks such as Hadoop, Pig, Hive, Apache Spark, Apache Kafka, Storm, Apache Mahout and Revolution R.
  • Versatile, intuitive and results-oriented data scientist, experienced in applying machine learning algorithms to statistical data.
  • Deep understanding of statistical modelling, multivariate analysis, Big Data analytics and standard procedures. Highly efficient in dimensionality reduction methods such as PCA (Principal Component Analysis) and Factor Analysis. Implemented methods such as Random Forests (classification), K-Means clustering, KNN, Naïve Bayes, SVM, Decision Trees, BFS, and Linear and Logistic Regression.
  • Experience working in text understanding, classification, pattern recognition, recommendation systems, targeting systems and ranking systems using Python.
  • Strong skills in statistical methodologies such as A/B testing, experiment design, hypothesis testing, ANOVA, cross-tabs, t-tests and correlation techniques.
  • Worked with applications like R, SPSS and Python to develop predictive models
  • Experience with Natural Language Processing (NLP)
  • Extensively worked on Python 3.5/2.7 (NumPy, Pandas, Matplotlib, NLTK and scikit-learn).
  • Experience in implementing data analysis with various analytic tools such as Anaconda 4.0, Jupyter Notebook 4.x, R 3.0 (ggplot2) and Excel.
  • Worked on Tableau and QlikView to create dashboards and visualizations.
  • Knowledge of agile development techniques
  • Solid ability to write and optimize diverse SQL queries; working knowledge of RDBMS such as SQL Server 2008.
  • Good knowledge of cloud computing platforms such as AWS, Google Cloud and Microsoft Azure.
  • Used version control tools such as Git 2.x.
  • Developed and performed text classification using methods such as logistic regression, decision trees, support vector machines and maximum entropy classifiers (a brief sketch follows this list).
  • Good knowledge of exploratory data analysis, descriptive statistics and predictive modelling.
  • Configured a big data processing platform using Apache Spark, created predictive models using MLlib and deployed the models for large-scale use.
  • Created tables, sequences, synonyms, join functions and operators in a Netezza database.
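
As a minimal illustration of the text-classification work noted above, the sketch below pairs TF-IDF features with a logistic regression classifier in scikit-learn. The corpus, labels and query string are toy placeholders, not project data.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Toy corpus standing in for the real (proprietary) text data.
    texts = ["refund my order", "reset my password",
             "charge on my card", "cannot log in"]
    labels = ["billing", "account", "billing", "account"]

    # TF-IDF features feeding a logistic regression classifier.
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
    clf.fit(texts, labels)
    print(clf.predict(["question about a card charge"]))  # expected: ['billing']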

TECHNICAL SKILLS:

Supervised Learning: Decision trees, Naive Bayes classification, ordinary least squares regression, logistic regression, neural networks, support vector machines

Unsupervised Learning: Clustering algorithms and reinforcement learning

Programming Languages: Python, Scala, SQL, R

Analytics and Visualization: Python (NumPy, Pandas, SciPy, scikit-learn, statsmodels), Matplotlib, seaborn, scikit-image, Big Data (HDFS, Pig, Hive, HBase, Sqoop, Spark), Excel, TensorFlow

Big Data Technologies: Spark, Hadoop, Hive, HDFS, MapReduce

Statistical Methods: t-test, Chi-squared and ANOVA testing, A/B Testing, Descriptive and Inferential Statistics, Hypothesis testing

Databases: Hadoop, Spark, Postgres, Access, Oracle, SQL Server, NoSQL (MongoDB), Cassandra, HBase, Teradata, Netezza, DynamoDB

Reporting Tools: Tableau, Spotfire, IBM Watson, QlikView

Version Control: Git, GitHub

Cloud Computing: Amazon AWS, Microsoft Azure, Google Cloud, OpenShift

PROFESSIONAL EXPERIENCE:

Confidential, Oklahoma City, Oklahoma

Data Scientist

Responsibilities:

  • Used classification machine learning algorithms such as Naïve Bayes, logistic regression, SVM and neural networks, used linear regression for prediction, and used the K-Means clustering algorithm.
  • Analysed business requirements, developed applications and models, and used appropriate algorithms to arrive at the required insights.
  • Used Spark Streaming APIs to perform transformations and actions on the fly, building a common learner data model that receives data from Kafka in near real time (see the sketch following this list).
  • Worked on data cleaning, data preparation and feature engineering with Python 3.X.
  • Developed User Defined Functions in Python for rapid analysis.
  • Performed text classification tasks using the NLTK package and implemented various natural language processing techniques.
  • Extensively used Python data science packages such as Pandas, NumPy, Matplotlib, SciPy, scikit-learn and NLTK.
  • Worked on Spark Python modules for machine learning & predictive analytics in Spark on AWS.
  • Worked on an end-to-end pipeline in Spark.
  • Created dashboards and reports in Tableau to visualize the data in the required format.
  • Worked on Apache Spark for analysing live streaming data.
  • Created Hive scripts to create external and internal data tables in Hive. Worked on creating datasets to load data into Hive.
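
A minimal sketch of the Spark Streaming consumption described above, assuming the Kafka direct-stream API available in Spark 1.3-2.4 for Python. The topic name, broker address and CSV record format are illustrative assumptions, not details from the project.

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="LearnerDataModel")
    ssc = StreamingContext(sc, 10)  # 10-second micro-batches

    # Hypothetical topic and broker; the real names are not in the resume.
    stream = KafkaUtils.createDirectStream(
        ssc, ["learner-events"], {"metadata.broker.list": "localhost:9092"})

    # Transformation: parse each Kafka message value as a CSV record.
    records = stream.map(lambda kv: kv[1].split(","))

    # Action: report per-batch record counts in near real time.
    records.count().pprint()

    ssc.start()
    ssc.awaitTermination()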

Environment: Apache Spark, Hive, machine learning, Python, NumPy, NLTK, Pandas, SciPy, SQL, Tableau, Sqoop, HBase, HDFS, DynamoDB, MongoDB, SQL Server, ETL.

Confidential, Columbus, Ohio.

Data Science Analyst

Responsibilities:

  • Handled importing data from various data sources, performed transformations using Hive, Map Reduce and loaded data into HDFS.
  • Communicated and coordinated with other departments to collect business requirements.
  • Used Spark MLlib for developing various machine learning algorithms.
  • Used Spark Streaming APIs to perform transformations and actions in building a common learner data model that receives data from Kafka in near real time.
  • Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the data lake.
  • Used statistical testing to evaluate model performance.
  • Performed data cleansing, synced data sources and handled date/time conversions across time zones.
  • Designed rich data visualizations with Tableau and Python.
  • Developed MapReduce pipeline for feature extraction using Hive.
  • Implemented machine learning models (Random Forest with cross-validation) with Spark MLlib (see the sketch following this list).
  • Improved fraud prediction performance by using Random Forest and gradient boosting for feature selection with Spark MLlib.
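
A brief sketch of the Random Forest with cross-validation mentioned above, using the Spark ML Pipelines API. The input path, column names and parameter grid are assumptions for illustration, not project details.

    from pyspark.sql import SparkSession
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

    spark = SparkSession.builder.appName("FraudModel").getOrCreate()

    # Assumes a prepared DataFrame with "features" and "label" columns.
    train = spark.read.parquet("fraud_features.parquet")  # hypothetical path

    rf = RandomForestClassifier(labelCol="label", featuresCol="features")
    grid = (ParamGridBuilder()
            .addGrid(rf.numTrees, [50, 100])
            .addGrid(rf.maxDepth, [5, 10])
            .build())

    # 3-fold cross-validation over the parameter grid, scored by AUC.
    cv = CrossValidator(estimator=rf, estimatorParamMaps=grid,
                        evaluator=BinaryClassificationEvaluator(), numFolds=3)
    model = cv.fit(train)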

Environment: machine learning, data lake, Kafka, Cassandra, NLTK, Spark, HDFS, Hive, Pig, Linux, Python (Matplotlib), SAS, SPSS, MySQL, PL/SQL, Tableau.

Confidential, Pittsburgh, PA.

Data Science Analyst

Responsibilities:

  • Used ETL tools such as Talend, Pentaho and Jedox.
  • Extracted, transformed and loaded data from the given sources for analysis.
  • Hands-on use of R, Python, Hadoop, Tableau and SAS to extract and import data.
  • Worked on Spark (Spark Streaming, Spark SQL), Scala and Kafka, converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Used Kafka to load data into HDFS and move data into NoSQL databases.
  • Worked on Spark SQL and DataFrames for faster execution of Hive queries using the Spark SQLContext.
  • Gained deep knowledge of MapReduce using Python, Sqoop queries, Pig scripts and Hive queries orchestrated with Oozie workflows.
  • Gained hands-on experience with the Amazon Redshift platform.
  • Performed data cleaning, applying backward- and forward-filling methods to handle missing values in the dataset (first sketch following this list).
  • Designed, built and deployed a set of Python modelling APIs for customer analytics that integrate multiple machine learning techniques for user behaviour prediction and support multiple marketing segmentation programs.
  • Segmented customers based on demographics using K-Means clustering (second sketch following this list).
  • Explored different regression and ensemble models in machine learning to perform forecasting.
  • Presented dashboards to higher management for more insights using Power BI.
  • Used classification techniques including Random Forest and Logistic Regression to quantify the likelihood of each user referring.
  • Applied boosting to the predictive model to improve its efficiency.
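
First, a tiny sketch of the backward/forward filling mentioned above, using pandas; the file and column names are placeholders.

    import pandas as pd

    # Hypothetical time-indexed dataset; the real schema is not in the resume.
    df = pd.read_csv("readings.csv", parse_dates=["date"], index_col="date")

    # Forward-fill propagates the last known value; backward-fill
    # then covers any leading gaps at the start of the series.
    df = df.ffill().bfill()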
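
Second, a minimal sketch of the demographic segmentation, assuming scikit-learn K-Means on standardized features; the demographic columns and cluster count are illustrative assumptions.

    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    # Hypothetical demographic features; actual columns are not in the resume.
    customers = pd.read_csv("customers.csv")
    features = customers[["age", "income", "household_size"]]

    # Standardize so no single demographic dominates the distance metric.
    scaled = StandardScaler().fit_transform(features)

    kmeans = KMeans(n_clusters=4, random_state=42).fit(scaled)
    customers["segment"] = kmeans.labels_  # cluster id per customer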

Environment: R/RStudio, Informatica, SQL, PL/SQL, Oracle 10g, MS Office, Tableau, Teradata.

Confidential

Data Analyst/Python Developer

Responsibilities:

  • Brought in and implemented updated analytical methods such as regression modelling, classification trees, statistical tests and data visualization techniques with Python.
  • Analysed customer Help data, contact volumes, and other operational data in MySQL to provide insights that enable improvements to Help content and customer experience.
  • Maintained and updated existing automated solutions.
  • Deployed machine learning models built using Mahout on a Hadoop cluster.
  • Improved data collection and distribution processes using the pandas and NumPy packages in Python, while enhancing reporting capabilities to provide a clear line of sight into key performance trends and metrics.
  • Analysed historical demand, filtered out outliers and exceptions, identified the most appropriate statistical forecasting algorithm, developed the base plan, analysed variance, proposed improvement opportunities, incorporated the demand signal into the forecast and executed data visualization using the Plotly package in Python (see the sketch following this list).
  • Used Sqoop to load existing data from relational databases into HDFS.
  • Interacted with QA to develop test plans from high-level design documentation
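
A short sketch of the outlier filtering and Plotly visualization described above; the demand file, column names and three-sigma threshold are assumptions for illustration.

    import pandas as pd
    import plotly.graph_objs as go
    from plotly.offline import plot

    # Hypothetical demand history; column names are illustrative only.
    demand = pd.read_csv("demand_history.csv", parse_dates=["month"])

    # Drop observations more than three standard deviations from the mean.
    mean, std = demand["units"].mean(), demand["units"].std()
    clean = demand[(demand["units"] - mean).abs() <= 3 * std]

    fig = go.Figure(data=[go.Scatter(x=clean["month"], y=clean["units"],
                                     mode="lines")])
    plot(fig, filename="demand_trend.html")  # writes a standalone HTML chart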

Environment: MySQL, predictive modelling, Python (pandas, NumPy), Hadoop, MapReduce, HDFS, Hive.

Confidential

Associate Software Engineer

Responsibilities:

  • Assisted in development and testing of various interior installations and instrumentation.
  • Documented the technical specification for the reports and tested the generated reports.
  • Gathered user requirements and created the business requirements documents.
  • Used the technical document to design tables.
  • Prepared user manual and technical support manuals.
  • Prepared test plans for various modules.
  • Created and managed databases.
  • Optimized the SQL queries for improved performance.
  • Created Database triggers to maintain the audit data in the tables.

Environment: Oracle 9i, SQL*Loader, PL/SQL, SQL.
