Data Scientist Resume

Rocky Hill, CT

SUMMARY:

  • Over 5 years of experience designing and building machine learning models using Python, R, and Scala.
  • Proficient in managing the entire project life cycle and actively involved in all phases of the project.
  • Experienced with machine learning algorithms such as logistic regression, random forest, KNN, SVM, neural networks, linear regression, and k-means.
  • Implemented Bagging and Boosting to enhance the model performance.
  • Experience using various packages in R and Python, including ggplot2, caret, dplyr, RWeka, rjson, plyr, SciPy, scikit-learn, Beautiful Soup, NLTK, NumPy, Pandas, and Matplotlib.
  • Comprehensive knowledge and experience in normalization/de-normalization, data extraction, data cleansing, and data manipulation.
  • Working knowledge of Extract, Transform, Load (ETL) components and process flow using Talend.
  • Extensive experience in Text Analytics, generating data visualizations using R, Python and creating dashboards using tools like Tableau.
  • Experience implementing ML back-end pipelines using Spark MLlib, scikit-learn, Pandas, and NumPy.
  • Experience in writing code in R and Python to manipulate data for data loads, extracts, statistical analysis, modeling, and data munging.
  • Extensively worked with cloud services such as Microsoft Azure and Amazon Web Services (AWS).
  • Experience in AWS services including EC2, VPC, IAM, S3, CloudFront, CloudWatch, CloudFormation, Glacier, RDS, Config, Route 53, SNS, SQS, and ElastiCache.
  • Experience in Big Data Technologies using Hadoop, Sqoop, Pig, Hive, Spark, HDFS.
  • Hands-on experience using Spark MLlib utilities, including classification, regression, clustering, collaborative filtering, and dimensionality reduction.
  • Working experience implementing machine learning algorithms with MLlib and Mahout in the Hadoop ecosystem (HDFS, MapReduce, HiveQL) and the Apache Spark framework (Spark SQL).
  • Hands-on experience with Hadoop, Deep Learning, Text Analytics, and IBM Data Science Workbench tools.
  • Hands-on experience in Data Governance, Data Mining, Data Analysis, Data Validation, Predictive Modeling, Data Lineage, and Data Visualization across all phases of the Data Science life cycle.
  • Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, PivotTables and OLAP reporting.
  • Extensively worked on data analysis using R Studio, SQL, Tableau, and other BI tools.
  • Knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB.
  • Experience working with web technologies such as HTML, CSS, and R Shiny.
  • Expertise in using global variables, expressions and functions for the reports with immense experience in handling sub reports in SSRS.
  • Experience importing/exporting data between different sources such as Oracle, Access, and Excel using SSIS/DTS utilities.
  • Defined data warehouses (star and snowflake schemas), fact tables, cubes, dimensions, and measures using SQL Server Analysis Services.
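
As a brief illustration of the bagging and boosting techniques noted above, a minimal sketch using scikit-learn on a synthetic dataset (generated here for illustration; not any client data):

```python
# A minimal sketch of bagging (random forest) vs. boosting (gradient
# boosting) on a synthetic classification task. The dataset is generated
# here for illustration and is not any client data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

scores = {}
for model in (RandomForestClassifier(n_estimators=100, random_state=42),
              GradientBoostingClassifier(n_estimators=100, random_state=42)):
    model.fit(X_train, y_train)          # bagging / boosting ensemble fit
    scores[type(model).__name__] = accuracy_score(
        y_test, model.predict(X_test))
print(scores)
```

Both ensembles reduce variance or bias relative to a single tree, which is the "enhance the model performance" effect the summary refers to.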

TECHNICAL SKILLS:

Programming Languages: R, Python, SQL, Scala, C, Java, and UNIX shell scripting.

Operating Systems: Windows, Unix, Linux

Machine Learning: Linear regression, Logistic regression, SVM, Decision tree, Random Forest, K-nearest neighbors, K-means, Avro, MLbase.

Data Science Tools: R, Python, MATLAB, R Shiny, Flask, Docker, Jupyter Notebook, and Azure Notebooks.

Databases: MySQL, SQL Server, T-SQL, MS Access, Hive, Cassandra, MongoDB, Hadoop (HDFS)

Data Modeling Tools: Erwin 9.7, ER/Studio, Star Schema and Snowflake Schema Modeling, FACT and Dimension Tables, Pivot Tables.

Big Data Frameworks: HDFS, MapReduce, Pig, Hive, Sqoop, Flume, HBase, Spark, Storm, Kafka, Scala.

BI Tools: Tableau, SAP, Crystal Reports, Amazon Redshift, Azure Data Warehouse, Splunk.

Cloud Technologies: AWS (EC2, S3, RDS, EBS, VPC, IAM, Security Groups), Microsoft Azure, Openstack.

Hands-on R Packages: tidyr, reshape2, stringr, lubridate, validate, neuralnet, ctree, rpart, C50, tseries, randomForest, forecast, quantmod, tm.

Hands-on Python Packages: Pandas, NumPy, Matplotlib, SciPy, scikit-learn, Beautiful Soup, urllib2, NLTK.

Other Tools and Technologies: PL/SQL, ASP, Visual Basic, XML, C, C++, C#, Java, HTML5, UNIX shell scripting, Perl, Ruby.

PROFESSIONAL EXPERIENCE:

Confidential - Rocky Hill, CT

DATA SCIENTIST

Responsibilities:

  • Working independently and collaboratively throughout the complete analytics project life cycle, including data acquisition, data wrangling, data transformation, model selection, implementation of scalable machine learning models, hyperparameter tuning, and documentation of results.
  • Performed Statistical Analysis to determine peak and off-peak time periods for ratemaking purposes.
  • Participated in all phases of data mining, data collection, data cleaning, developing models, validation and visualization.
  • Conducted analysis of customer data for the purposes of designing rates.
  • Identified root causes of problems and facilitated the implementation of cost-effective solutions with all levels of management.
  • Developed Regression Models based on data provided by the client.
  • Applied various machine learning algorithms and statistical modeling techniques such as decision trees, regression models, clustering, and SVM to identify volume, using the scikit-learn package in Python.
  • Worked on different data formats such as JSON, XML and performed ML algorithms in Python.
  • Hands-on experience implementing Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, and Principal Component Analysis.
  • Performed K-means clustering, Regression and Decision Trees in R.
  • Worked on Naïve Bayes algorithms for Agent Fraud Detection using R.
  • Used packages like dplyr, tidyr, and ggplot2 in R Studio for data visualization, generating scatter plots and high-low graphs to identify relations between different variables.
  • Developed NLP models for Topic extraction, Sentiment Analysis.
  • Worked on Text Analytics, Naïve Bayes, sentiment analysis, creating word clouds, and retrieving data from Twitter and other social networking platforms.
  • Knowledge of A/B Testing, ANOVA, Multivariate Analysis, and Association Rules using R.
  • Wrangled data and worked on large datasets (acquired and cleaned the data); analyzed trends with visualizations built in Matplotlib and Python as well as Tableau 9.0 and Microsoft Power BI.
  • Implemented predictive analytics and machine learning algorithms to forecast key metrics, delivered as dashboards on AWS (S3/EC2) and the Django platform for the company's core business.
  • Performed data analysis by using Hive to retrieve the data from Hadoop cluster and SQL to retrieve data from Oracle database and then used ETL for data transformation.
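
The Naïve Bayes sentiment-analysis work above can be sketched as follows, assuming scikit-learn; the tiny corpus below is invented for illustration and is not the project's data:

```python
# A minimal sketch of Naive Bayes sentiment classification, assuming
# scikit-learn. The toy corpus below is invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = ["great service, very happy", "excellent and friendly support",
               "terrible experience, very slow", "awful service, not happy"]
train_labels = ["pos", "pos", "neg", "neg"]

# Bag-of-words features feed a multinomial Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

print(model.predict(["happy with the excellent service"])[0])  # pos
print(model.predict(["awful and slow experience"])[0])         # neg
```

In practice the corpus would come from Twitter or other social platforms, as the bullets above describe, with NLTK handling tokenization and cleanup.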

Environment: R Studio, Python, NLP, AWS, MySQL, DB2, Metadata, MS Excel, Scala, Spark, Cassandra, dplyr, tidyr, ggplot2, scikit-learn, Matplotlib, JSON, Machine Learning Algorithms, Mainframes, MS Visio.

Confidential - Reston, VA.

DATA SCIENTIST

Responsibilities:

  • Led the full machine learning system implementation process: collecting data, model design, feature selection, system implementation, and evaluation.
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and R for a broad variety of machine learning methods, including classification, regression, and dimensionality reduction.
  • Created various B2B predictive and descriptive analytics using R and Tableau.
  • Used text mining and NLP techniques to find the sentiment about the organization.
  • Developed unsupervised machine learning models in the Hadoop/Hive environment on AWS EC2 instance.
  • Worked with datasets of varying degrees of size and complexity including both structured and unstructured data.
  • Participated in all phases of data mining: data cleaning, data collection, developing models, validation, and visualization, and performed gap analysis.
  • Performed data wrangling to clean, transform, and reshape the data using the NumPy and Pandas libraries.
  • Data storytelling: mined data from different sources such as SQL Server, Oracle, cube databases, web analytics, Business Objects, and Hadoop; provided ad-hoc analysis and reports to the executive-level management team.
  • Contributed to data mining architectures, modeling standards, reporting, and data analysis methodologies.
  • Worked with different sources such as Oracle, Teradata, SQL Server, Excel, flat files, complex flat files, Cassandra, MongoDB, and HBase.
  • Conducted research and made recommendations on data mining products, services, protocols, and standards in support of procurement and development efforts.
  • Used Python, R and SQL to create Statistical algorithms involving Linear Regression, Logistic Regression, Random forest, Decision trees for estimating the risks.
  • Developed statistical models to forecast inventory and procurement cycles.
  • Created Data Maps/Extraction Groups in PowerExchange Navigator for legacy IMS parent sources.
  • Assisted in building the ETL Source to Target specification documents by understanding the business requirements.
  • Developed mappings that perform extraction, transformation, and loading of source data into the Derived Masters schema, using various PowerCenter transformations such as Source Qualifier, Aggregator, Filter, Router, Sequence Generator, Lookup, Rank, Joiner, Expression, Stored Procedure, SQL, Normalizer, and Update Strategy to meet the business logic in the mappings.
  • Used Teradata utilities such as BTEQ, FastLoad, FastExport, and MultiLoad for data conversion.
  • Created post-session UNIX scripts to perform operations such as gunzip, file removal, and touch.
  • Managed the OpenShift cluster, including scaling the AWS app nodes up and down.
  • Gained strong exposure to Ansible automation for replacing different OpenShift components such as ETCD, MASTER, APP, INFRA, and Gluster.
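
The NumPy/Pandas cleaning-and-reshaping step described above can be sketched as follows; the records are invented for illustration:

```python
# A minimal sketch of cleaning, transforming, and reshaping tabular data
# with Pandas and NumPy. The records below are invented for illustration.
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "month":  ["Jan", "Feb", "Jan", "Feb", "Feb"],
    "sales":  [100.0, np.nan, 80.0, 120.0, 40.0],
})

# Clean: fill a missing value with its region's mean.
raw["sales"] = raw.groupby("region")["sales"].transform(
    lambda s: s.fillna(s.mean()))

# Reshape: pivot to one row per region, one column per month.
wide = raw.pivot_table(index="region", columns="month",
                       values="sales", aggfunc="sum")
print(wide)
```

The same groupby/pivot idioms scale from small frames like this one to the larger structured datasets the bullets describe.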

Environment: R Studio 3.5.1, AWS S3, NLP, EC2, Neural Networks, SVM, Decision Trees, MLbase, ad-hoc, Mahout, NoSQL, MDM, MLlib, Git, Informatica, OpenShift, Teradata 14, Hadoop MapReduce, PySpark, Spark, R, Spark MLlib, Tableau, Cassandra, Oracle, MongoDB, Flat Files, and XML.

Confidential - Farmington Hills, MI.

DATA ANALYST

Roles & Responsibilities:

  • Collaborated with database engineers to implement the ETL process; wrote and optimized SQL queries to perform data extraction and merging from the SQL Server database.
  • Used Python libraries such as Beautiful Soup and NumPy.
  • Created various types of data visualizations using Python and Tableau.
  • Responsible for data cleaning, feature scaling, and feature engineering using NumPy and Pandas in Python; generated periodic reports based on statistical analysis of the data using SQL Server.
  • Worked on CSV files while trying to get input from the MySQL database.
  • Created functions, triggers, views and stored procedures using MySQL.
  • Worked on DB testing, wrote complex SQL queries to verify the transactions and business logic.
  • Worked on data profiling and data validation to ensure the accuracy of the data between the warehouse and source systems.
  • Developed SQL scripts for creating tables, Sequences, Triggers, views and materialized views.
  • Compiled data from various sources, including public and private databases, to perform complex analysis and data manipulation for actionable results.
  • Worked on Python OpenStack APIs and used NumPy for Numerical analysis.
  • Interacted with the Client and documented the Business Reporting needs to analyze the data.
  • Developed data mapping documentation to establish relationships between source and target tables including transformation processes using SQL.
  • Used the Waterfall methodology through the different phases of the software development life cycle.
  • Participated in all phases of data mining: data collection, data cleaning, developing models, validation, and visualization, and performed gap analysis.
  • Demonstrated experience in design and implementation of Statistical models, Predictive models, enterprise data model, metadata solution and data life cycle management.
  • Performed extensive data cleansing and analysis using pivot tables, formulas (VLOOKUP and others), data validation, conditional formatting, and graph and chart manipulation in Excel.
  • Created pivot tables and charts using worksheet data and external resources, modified pivot tables, sorted items and group data, and refreshed and formatted pivot tables.
  • Developed SQL Queries to fetch complex data from different tables in remote databases using joins, database links and Bulk collects.
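
The extract-and-merge pattern above can be sketched with Python's built-in sqlite3 module standing in for the SQL Server source; the schema and names are invented for illustration:

```python
# A minimal sketch of extracting joined data with SQL and aggregating it
# in Pandas. sqlite3 stands in for the SQL Server source; the schema and
# names are invented for illustration.
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);
""")

# Extract with a join, then merge/aggregate on the Python side.
df = pd.read_sql_query("""
    SELECT c.name, o.amount
    FROM orders o JOIN customers c ON c.id = o.customer_id
""", conn)
totals = df.groupby("name")["amount"].sum()
print(totals.to_dict())  # {'Acme': 350.0, 'Globex': 75.0}
```

Pushing the join into SQL and the aggregation into Pandas mirrors the split between the database-side extraction and the Python-side analysis described in the bullets.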

Environment: AWS, Python 2.7, Tableau, R, Beautiful Soup, Windows XP, UNIX, HTML, SQL Server.

Confidential - San Francisco, CA.

SQL DEVELOPER

Roles & Responsibilities:

  • Used DDL and DML for writing triggers, stored procedures, and data manipulation.
  • Interacted with the team on analysis, design, and development of the database using ER diagrams; involved in the design, development, and testing of the system.
  • Developed SQL Server stored procedures and tuned SQL queries (using indexes).
  • Created Views to facilitate easy user interface implementation and Triggers on them to facilitate consistent data entry into the database.
  • Implemented exception handling.
  • Worked on client requirement and wrote Complex SQL Queries to generate Crystal Reports.
  • Created different Data sources and Datasets for the reports.
  • Tuned and Optimized SQL Queries using Execution Plan and Profiler.
  • Rebuilt indexes and tables as part of a performance tuning exercise.
  • Involved in performing database Backup and Recovery.
  • Documented end user requirements for SSRS Reports and database design.
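
The views-and-triggers pattern above can be sketched in SQLite via Python for portability (SQL Server's T-SQL syntax differs slightly; the schema here is invented):

```python
# A minimal sketch of a view and a trigger enforcing consistent data
# entry/auditing. SQLite stands in for SQL Server; schema is invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL NOT NULL);
CREATE TABLE audit_log (account_id INTEGER, old_balance REAL, new_balance REAL);

-- Trigger: record every balance change for auditing.
CREATE TRIGGER log_balance AFTER UPDATE OF balance ON accounts
BEGIN
    INSERT INTO audit_log VALUES (OLD.id, OLD.balance, NEW.balance);
END;

-- View: a simplified interface for reporting queries.
CREATE VIEW positive_accounts AS
    SELECT id, balance FROM accounts WHERE balance > 0;

INSERT INTO accounts VALUES (1, 100.0);
UPDATE accounts SET balance = 150.0 WHERE id = 1;
""")

print(conn.execute("SELECT * FROM audit_log").fetchall())       # change logged
print(conn.execute("SELECT * FROM positive_accounts").fetchall())
```

Views give report writers a stable interface while triggers keep the audit trail consistent, which is the combination the bullets above describe.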

Environment: SQL Server, SSIS, SSRS, Windows Server 2008, XML.
