
Data Scientist Resume


Charlotte, NC

SUMMARY:

  • 8+ years of experience in Data Science and Analytics, including Machine Learning, Data Mining and Statistical Analysis.
  • Involved in the entire data science project life cycle, including data extraction, data cleaning, statistical modeling and data visualization with large sets of structured and unstructured data.
  • Experienced with machine learning algorithms such as logistic regression, random forest, XGBoost, KNN, SVM, neural networks, linear regression, lasso regression and k-means.
  • Implemented bagging and boosting to enhance model performance (a brief sketch follows this list).
  • Strong skills in statistical methodologies such as A/B testing, experiment design, hypothesis testing and ANOVA.
  • Extensively worked on Python 3.5/2.7 (NumPy, pandas, Matplotlib, NLTK and scikit-learn).
  • Experience in implementing data analysis with various analytic tools, such as Anaconda 4.0, Jupyter Notebook 4.x, R 3.0 (ggplot2, caret, dplyr) and Excel.
  • Solid ability to write and optimize diverse SQL queries; working knowledge of RDBMS such as SQL Server 2008 and NoSQL databases such as MongoDB 3.2.
  • Strong experience in big data technologies such as Spark 1.6, Spark SQL, PySpark, Hadoop 2.x, HDFS and Hive 1.x.
  • Experience in visualization tools such as Tableau 9.x/10.x for creating dashboards.
  • Excellent understanding of Agile and Scrum development methodologies.
  • Used version control tools such as Git 2.x.
  • Passionate about gleaning insightful information from massive data assets and developing a culture of sound, data-driven decision making.
  • Ability to maintain a fun, casual, professional and productive team atmosphere
  • Experienced in the full software development life cycle (SDLC) under Agile and Scrum methodologies.
  • Skilled in Advanced Regression Modeling, Correlation, Multivariate Analysis, Model Building, Business Intelligence tools and application of Statistical Concepts.
  • Proficient in Predictive Modeling, Data Mining Methods, Factor Analysis, ANOVA, hypothesis testing, normal distribution and other advanced statistical and econometric techniques.
  • Developed predictive models using Decision Tree, Random Forest, Naïve Bayes, Logistic Regression, Cluster Analysis, and Neural Networks.
  • Experienced in Machine Learning and Statistical Analysis with Python Scikit-Learn.
  • Experienced in using Python to manipulate data for loading and extraction; worked with Python libraries such as Matplotlib, NumPy, SciPy and pandas for data analysis.
  • Worked with statistical applications such as R, SAS, MATLAB and SPSS to develop neural network and cluster analyses.
  • Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Skilled in performing data parsing, manipulation and preparation with methods including describing data contents, computing descriptive statistics, regex, split and combine, remap, merge, subset, reindex, melt and reshape.
  • Strong SQL programming skills, with experience in working with functions, packages and triggers.
  • Experienced in Visual Basic for Applications (VBA) and VB for application development.
  • Worked with NoSQL Database including HBase, Cassandra and MongoDB.
  • Experienced in Big Data with Hadoop, HDFS, MapReduce, and Spark.
  • Experienced in Data Integration Validation and Data Quality controls for ETL process and Data Warehousing using MS Visual Studio SSIS, SSAS, SSRS.
  • Proficient in Tableau and R-Shiny data visualization tools to analyze and obtain insights into large datasets, create visually powerful and actionable interactive reports and dashboards.
  • Automated recurring reports using SQL and Python and visualized them on BI platforms such as Tableau.
  • Worked in development environments using Git and VMs.
  • Excellent communication skills; work successfully in fast-paced, multitasking environments both independently and in collaborative teams; a self-motivated, enthusiastic learner.
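
As a representative example of the bagging and boosting work noted above, here is a minimal sketch using scikit-learn on synthetic data; the dataset and hyperparameters are illustrative assumptions, not taken from any specific project:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
    from sklearn.model_selection import cross_val_score

    # Synthetic stand-in for real project data
    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

    # Bagging averages many high-variance trees to reduce variance
    bagging = BaggingClassifier(n_estimators=100, random_state=42)
    # Boosting fits shallow trees sequentially to the remaining errors
    boosting = GradientBoostingClassifier(n_estimators=100, random_state=42)

    for name, model in [("bagging", bagging), ("boosting", boosting)]:
        scores = cross_val_score(model, X, y, cv=5)
        print("{}: mean CV accuracy {:.3f}".format(name, scores.mean()))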

TECHNICAL SKILLS:

Languages: R, Python

Big Data: MapReduce, HDFS, Spark

NoSQL: MongoDB

Analysis: Feature Selection Methods, Principal Component Analysis, Supervised and Unsupervised Learning, Classification Techniques, Topic modeling, Model building, Time Series

Relational Databases: Oracle, MySQL, SQL Server, PostgreSQL

Tools: Alteryx, Tableau, QlikView, MS Excel, MS Access, PyTorch

R Packages: ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, twitteR, NLP, reshape2, rjson, plyr

Python Libraries: NumPy, pandas, Matplotlib, scikit-learn, XGBoost, LightGBM, Seaborn, Beautiful Soup, OS, statsmodels, NLTK and skimage

PROFESSIONAL EXPERIENCE:

Data Scientist

Confidential, Charlotte, NC

Responsibilities:

  • Provided Configuration Management and Build support for more than 5 different applications, built and deployed to the production and lower environments.
  • Evaluated the performance of various classification and regression algorithms using R for power forecasting.
  • Worked with several R packages including knitr, dplyr, SparkR, causal inference tools and spacetime.
  • Involved in detecting patterns with unsupervised learning such as K-Means clustering.
  • Implemented end-to-end systems for data analytics and data automation, integrated with custom visualization tools using R, Mahout, Hadoop and MongoDB.
  • Gathered required data from multiple data sources and created the datasets used in analysis.
  • Performed exploratory data analysis and data visualizations using R and Tableau.
  • Performed thorough EDA, including univariate and bivariate analysis, to understand intrinsic and combined effects.
  • Worked with data governance, data quality, data lineage and data architects to design various models and processes.
  • Independently coded new programs and designed tables to load and test the programs effectively for the given POCs using Big Data/Hadoop.
  • Designed data models and data flow diagrams using Erwin and MS Visio.
  • Developed triggers, stored procedures, functions and packages using cursor and REF CURSOR concepts associated with the project in PL/SQL.
  • Created various types of data visualizations using R, python and Tableau.
  • Used Python, R and SQL to create statistical algorithms involving multivariate regression, linear regression, logistic regression, PCA, random forest models, decision trees and support vector machines for estimating the risks of welfare dependency (a minimal sketch follows this list).
  • Identified and targeted welfare high-risk groups with machine learning algorithms.
  • Conducted campaigns and ran real-time trials to quickly determine what works and to track the impact of different initiatives.
  • Developed Tableau visualizations and dashboards using Tableau Desktop.
  • Used graphical entity-relationship diagramming to create new database designs via an easy-to-use graphical interface.
  • Created multiple custom SQL queries in Teradata SQL Workbench to prepare the right data sets for Tableau dashboards.
  • Performed analyses such as regression analysis, logistic regression, discriminant analysis and cluster analysis using SAS programming.
  • Used a metadata tool for importing metadata from the repository, creating new job categories and creating new data elements.
  • Scheduled tasks for weekly updates and ran the model in a workflow; automated the entire process flow for generating analyses and reports.
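
A minimal sketch of the risk-estimation modeling referenced in this list, assuming a cleaned pandas DataFrame with a binary target; the file name and the "dependent" column are hypothetical placeholders:

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    # Hypothetical extract of cleaned features plus a binary outcome flag
    df = pd.read_csv("welfare_features.csv")
    X = df.drop(columns=["dependent"])
    y = df["dependent"]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0, stratify=y)

    for model in (LogisticRegression(max_iter=1000),
                  RandomForestClassifier(n_estimators=200, random_state=0)):
        model.fit(X_train, y_train)
        risk = model.predict_proba(X_test)[:, 1]  # estimated risk scores
        print(type(model).__name__, round(roc_auc_score(y_test, risk), 3))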

Environment: Erwin 8, Teradata 13, SQL Server 2008, Oracle 9i, SQL*Loader, PL/SQL, ODS, OLAP, OLTP, SSAS, Informatica PowerCenter 8.1.

Data Analyst

Confidential - Van Nuys, CA

Responsibilities:

  • Involved in Design, Development and Support phases of Software Development Life Cycle (SDLC)
  • Performed ETL by collecting, exporting, merging and massaging data from multiple sources and platforms, including SSIS (SQL Server Integration Services) in SQL Server.
  • Worked with cross-functional teams (including the data engineering team) to rapidly extract data from MongoDB through the MongoDB Connector for Hadoop.
  • Performed data cleaning and feature selection using the MLlib package in PySpark.
  • Partitioned hotels into 100 clusters with k-means clustering using the scikit-learn package in Python, grouping similar hotels for a given search (see the sketch after this list).
  • Used Python to perform ANOVA tests to analyze the differences among hotel clusters.
  • Applied various machine learning algorithms and statistical models such as decision tree, naive Bayes, logistic regression and linear regression in Python to determine the accuracy rate of each model.
  • Selected the most accurate prediction model based on the accuracy rate.
  • Used text mining of reviews to determine what customers focus on.
  • Delivered analysis supporting hotel recommendations and provided an online A/B test.
  • Designed Tableau bar graphs, scatter plots and geographical maps to create detailed summary reports and dashboards.
  • Developed a hybrid model to improve the accuracy rate.
  • Delivered results to the operations team for better decisions and feedback.
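
A minimal sketch of the k-means hotel clustering described above, using scikit-learn; the random feature matrix stands in for the real per-hotel features (price, rating, location and so on), which are not shown in the original:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    # Hypothetical stand-in for the per-hotel feature matrix
    rng = np.random.RandomState(7)
    hotel_features = rng.normal(size=(5000, 12))

    # Scale features so no single column dominates the distance metric
    X = StandardScaler().fit_transform(hotel_features)

    # Partition hotels into 100 clusters, as in the bullet above
    km = KMeans(n_clusters=100, n_init=10, random_state=7)
    labels = km.fit_predict(X)

    # Hotels sharing a label are treated as similar for a given search
    print("first ten cluster sizes:", np.bincount(labels)[:10])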

Environment: Python, PySpark, Tableau, MongoDB, Hadoop, SQL Server, SDLC, ETL, SSIS, recommendation systems, machine learning algorithms, text mining, A/B testing.

Data Scientist

Confidential, Miami FL

Responsibilities:

  • Performed data profiling to learn about user behavior, examining features such as traffic pattern, location, and date and time.
  • Applied various machine learning algorithms and statistical models such as decision trees, regression models, neural networks, SVM and clustering to identify volume, using the scikit-learn package in Python and MATLAB.
  • Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib and Python with a broad variety of machine learning methods including classification, regression and dimensionality reduction; utilized the engine to increase user lifetime by 45% and triple user conversions for target categories.
  • Worked with data architects and IT architects to understand the movement and storage of data, using ER Studio 9.7.
  • Participated in all phases of data mining: data collection, data cleaning, developing models, validation and visualization, and performed gap analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI and Smart View.
  • Implemented Agile Methodology for building an internal application.
  • Focused on integration overlap and Informatica's newer commitment to MDM with the acquisition of Identity Systems.
  • Good knowledge of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode and MapReduce concepts.
  • As architect, delivered various complex OLAP databases/cubes, scorecards, dashboards and reports.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, pandas).
  • Implemented classification using supervised algorithms such as logistic regression, decision trees, KNN and naive Bayes (a brief sketch follows this list).
  • Used Teradata 15 utilities such as FastExport and MultiLoad for handling various data migration/ETL tasks from OLTP source systems to OLAP target systems.
  • Experience with Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig and Flume, including their installation and configuration.
  • Updated Python scripts to match training data with our database stored in AWS CloudSearch, so that we could assign each document a response label for further classification.
  • Transformed data from various sources, organized data, and extracted features from raw and stored data.
  • Used MLlib, Spark's Machine learning library to build and evaluate different models.
  • Implemented a rule-based expert system from the results of exploratory analysis and information gathered from people in different departments.
  • Performed data cleaning, feature scaling and feature engineering using the pandas and NumPy packages in Python.
  • Developed MapReduce pipeline for feature extraction using Hive.
  • Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data. Created various types of data visualizations using Python and Tableau.
  • Communicated results to the operations team to support better decisions.
  • Collected data needs and requirements by interacting with other departments.
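
A minimal sketch of the supervised classification comparison mentioned in this list (logistic regression, decision tree, KNN, naive Bayes), with feature scaling kept inside a pipeline so cross-validation folds are not leaked; the synthetic data is an illustrative assumption:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.tree import DecisionTreeClassifier

    # Synthetic stand-in for the profiled traffic data
    X, y = make_classification(n_samples=2000, n_features=15, random_state=1)

    models = [("logistic regression", LogisticRegression(max_iter=1000)),
              ("decision tree", DecisionTreeClassifier(random_state=1)),
              ("knn", KNeighborsClassifier()),
              ("naive bayes", GaussianNB())]

    for name, clf in models:
        pipe = make_pipeline(StandardScaler(), clf)  # scale inside each CV fold
        scores = cross_val_score(pipe, X, y, cv=5)
        print("{}: {:.3f}".format(name, scores.mean()))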

Environment: Python 2.x, CDH5, HDFS, Hadoop 2.3, Hive, Impala, Linux, Spark, Tableau Desktop, SQL Server 2012, Microsoft Excel, MATLAB, Spark SQL, PySpark.

Data Analyst

Confidential, Coral Gables, FL.

Responsibilities:

  • Performed data profiling in the source systems that are required for New Customer Engagement (NCE) Data mart.
  • Manipulated, cleansed and processed data using Excel, Access and SQL.
  • Responsible for loading, extracting and validating client data.
  • Liaised with end users and third-party suppliers; analyzed raw data, drew conclusions and developed recommendations; wrote SQL scripts to manipulate data for loads and extracts.
  • Developed analytical databases from complex financial source data; performed daily system checks; handled data entry, data auditing, report creation and accuracy monitoring; designed, developed and implemented new functionality.
  • Monitored automated loading processes; advised on the suitability of methodologies and suggested improvements; involved in defining source-to-target data mappings, business rules, and business and data definitions; responsible for defining the key identifiers for each mapping/interface.
  • Responsible for defining the functional requirement documents for each source-to-target interface.
  • Documented, clarified and communicated change requests with the requestor and coordinated with the development and testing teams; reverse engineered all source databases using Embarcadero.
  • Coordinated with business users to design new reporting in an appropriate, effective and efficient way, based on user needs and existing functionality.
  • Documented data quality and traceability for each source interface.
  • Designed and implemented data integration modules for Extract/Transform/Load (ETL) functions (a minimal sketch follows this list).
  • Involved in data warehouse and data mart design; experienced with various ETL and data warehousing tools and concepts.
  • Documented the complete process flow to describe program development, logic, testing, implementation, application integration and coding.
  • Worked with internal architects, assisting in the development of current and target state data architectures.
  • Worked with project team representatives to ensure that logical and physical ER/Studio data models were developed in line with corporate standards and guidelines.
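
A minimal pandas sketch of the extract/transform/load pattern this list describes; the file name, table name and connection string are hypothetical placeholders, not details from the original:

    import pandas as pd
    from sqlalchemy import create_engine

    # Extract: hypothetical client data file
    raw = pd.read_csv("client_extract.csv")

    # Transform: standardize column names, drop duplicates, stamp the load
    raw.columns = [c.strip().lower().replace(" ", "_") for c in raw.columns]
    raw = raw.drop_duplicates()
    raw["load_date"] = pd.Timestamp.today().normalize()

    # Load: append into a hypothetical reporting table
    engine = create_engine("postgresql://user:password@host/warehouse")
    raw.to_sql("customer_engagement", engine, if_exists="append", index=False)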

Environment: SQL Server, Oracle 10g & 11g, MS Office, Netezza, Teradata, Enterprise Architect, Informatica Data Quality, ER Studio, TOAD, Business Objects, Greenplum Database, PL/SQL

Data Scientist

Confidential

Responsibilities:

  • Developed analytics solutions based on Machine Learning platform and demonstrated creative problem-solving approach and strong analytical skills.
  • Participated in all phases of data mining, data collection, data cleaning, developing models, validation, and visualization and performed Gap analysis.
  • Created various B2B predictive and descriptive analytics using R and Tableau.
  • Told the story in the data, mining data from different sources such as SQL Server, Oracle, cube databases, web analytics, Business Objects and Hadoop; provided ad hoc analysis and reports to the executive management team.
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib and R with a broad variety of machine learning methods including classification, regression and dimensionality reduction.
  • Used MLlib, Spark's machine learning library, to build and evaluate different models.
  • Used Spark DataFrames, Spark SQL and Spark MLlib extensively, developing and designing POCs using Scala, Spark SQL and the MLlib libraries.
  • Created high level ETL design document and assisted ETL developers in the detail design and development of ETL maps using Informatica.
  • Created MDM, OLAP data architecture, analytical data marts, and cubes optimized for reporting.
  • Worked with different sources such as Oracle, Teradata, SQL Server 2012, Excel, flat files, complex flat files, Cassandra, MongoDB, HBase and COBOL files.
  • Performed k-means clustering, multivariate analysis and support vector machines in R.
  • Analyzed data and predicted end-customer behaviors and product performance by applying machine learning algorithms with Spark MLlib (a brief PySpark sketch follows this list).
  • Used external loaders such as MultiLoad, TPump and FastLoad to load data into the Teradata database; involved in analysis, development, testing, implementation and deployment.
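
The MLlib modeling above was done with Scala and R; the sketch below shows the same behavior-prediction pattern in PySpark (also listed in the environment) for consistency with the other examples. The input path, feature columns and label column are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.feature import VectorAssembler

    spark = SparkSession.builder.appName("behavior-poc").getOrCreate()

    # Hypothetical customer-behavior table with a binary 'label' column
    df = spark.read.parquet("/data/customer_behavior")

    features = ["visits", "spend", "tenure"]  # hypothetical columns
    assembled = VectorAssembler(inputCols=features,
                                outputCol="features").transform(df)

    train, test = assembled.randomSplit([0.8, 0.2], seed=42)
    model = LogisticRegression(featuresCol="features",
                               labelCol="label").fit(train)

    # The evaluator's default metric is area under the ROC curve
    auc = BinaryClassificationEvaluator().evaluate(model.transform(test))
    print("test AUC:", auc)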

Environment: R, Machine Learning, Teradata 14, Hadoop MapReduce, PySpark, Spark, Spark MLlib, Tableau, Informatica, SQL, Excel, Erwin, SAS, ScalaNLP, Cassandra, Oracle, MongoDB, Cognos, SQL Server 2012, DB2, T-SQL, PL/SQL, Flat Files and XML
