Data Scientist/Engineer Resume

Jersey City, NJ

SUMMARY

  • Over 7 years of professional experience as a Data Scientist/Data Analyst working across Healthcare, Supply Chain Management, and Insurance, with a master's degree in Business Analytics.
  • Proficient in Predictive Modeling and Machine Learning techniques (Linear and Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian methods, XGBoost) for Forecasting/Predictive analysis.
  • Knowledge of the SDLC, including Development, Component Integration, Performance Testing, Deployment, and Support/Maintenance.
  • Built various machine learning models such as Linear and Logistic Regression, Decision Trees, Random Forest, Extra Trees, KNN, K-means, and Naïve Bayes for structured, semi-structured, and unstructured data.
  • Worked with Data Visualization tools like Tableau and Google Analytics. Performed descriptive and predictive statistical analysis using machine learning algorithms.
  • Hands-on experience implementing dimensionality reduction techniques like Truncated SVD, Principal Component Analysis (PCA), and t-distributed Stochastic Neighbor Embedding (t-SNE).
  • Good knowledge of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts, and of ecosystem tools including Hive and Pig.
  • Experienced in SQL programming and creation of relational database models. Experienced in creating cutting-edge data processing algorithms to meet project demands.
  • Wrote complex structured queries using views, triggers, and joins. Worked with packages like Matplotlib, Seaborn, and pandas in Python.
  • Well versed in Machine Learning algorithms such as Linear and Logistic Regression, Decision Trees, Random Forest, and K-Nearest Neighbors; experienced in Predictive and Descriptive analytics of data sets using R and Python.
  • Strong understanding of Statistical Analysis and Modeling, Algorithms, and Multivariate Analysis; familiar with model selection, testing, comparison, and validation.
  • Worked with Python libraries like matplotlib, NumPy, SciPy, and pandas for data analysis. Connected Python to Hadoop to run Hive and Spark jobs for data analysis.
  • Able to work independently and to solve problems as part of a team. Excellent skills in using pandas in Python and dplyr in R for exploratory analysis.
  • Experience using R, Python, and SQL for data cleaning, data visualization, risk analysis, and predictive analysis.
  • Strong analytical and problem-solving skills; able to work within a team and as an individual.
  • Extensive experience in Data Visualization using tables, lists, and tools like Tableau. Experience with Business Intelligence tools like SSIS and SSRS, and with ETL.
  • Worked with AWS to pull data from EC2. Worked with user groups in Redshift on AWS, including memory for concurrent queries.
  • Proficient in designing and developing Dashboards and Reports using Tableau visualizations like bar graphs, scatter plots, pie charts, and geographic maps, making use of actions, local and global filters, cascading filters, context filters, quick filters, and parameters according to end-user requirements.
  • Experience in text analytics and data visualization using R, Python, and Tableau, and with large transactional data stores such as Teradata, Oracle, and HDFS.
  • Experience in Apache Spark and Kafka for Big Data processing, and in Scala functional programming. In-depth knowledge of and hands-on experience with the Big Data/Hadoop ecosystem (MapReduce, HDFS, Hive, Pig, and Sqoop).
  • Experience in migration from heterogeneous sources, including Oracle, to MS SQL Server. Experience in writing SQL queries and working with various databases (MS Access, MySQL, Oracle DB).
  • Worked with Jupyter Notebook and PySpark on an EC2 instance accessed through PuTTY; evaluated models using cross-validation, log loss, and ROC curves, and used AUC for feature selection.
  • Experience in data analytics and predictive analysis such as Classification, Regression, and Recommender Systems. Experience in developing Custom Reports and various Tabular, Matrix, Ad hoc, and distributed reports in multiple formats using SQL Server Reporting Services (SSRS) and Statistical Package for the Social Sciences (SPSS).
  • Determined which model predicts the classes best in classification analysis using ROC and AUC, plotting true positive rates against false positive rates.
  • Experience in applying predictive models, Machine Learning Algorithms for analytical projects.
  • Proficient in Predictive Modeling, factor analysis, ANOVA, hypothesis testing, the normal distribution, and other advanced statistics.
  • Proficient in Machine Learning techniques (Decision Trees, Linear and Logistic Regression, Random Forest, SVM, Bayesian methods, XGBoost, K-Nearest Neighbors) and Statistical Modeling for Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis testing, Factor analysis/PCA, and Ensembles.
  • Used a Snowflake schema to extract data from S3 buckets, load it, and transform it to meet business requirements.
  • Applied machine learning to large data sets using Spark and MapReduce.
  • Knowledge of Information Extraction and NLP algorithms coupled with Deep Learning.
  • Analyzed implementing Spark using Scala and wrote sample Spark programs using PySpark.
  • Experience in foundational machine learning models and concepts (Regression, boosting, GBM, NNs, HMMs, CRFs, MRFs, deep learning).
  • Well experienced in Normalization and De-Normalization techniques for optimum performance in relational and dimensional database environments.
  • Extracted data from various database sources like Oracle, SQL Server, and Teradata.
  • Worked on Proofs of Concept and gap analysis; gathered data for analysis from different sources and prepared it for exploration using data munging.
  • Understanding of data mining techniques like classification, clustering, regression, and Random Forest.
  • Experience in Big data technologies like Hadoop, Sqoop, Pig and Hive.
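
To illustrate the ROC/AUC model comparison mentioned in the summary (true positive rate plotted against false positive rate), here is a minimal sketch in pure Python. It assumes binary labels (0/1) and one score per example. The function names roc_points and auc are hypothetical and are not taken from any project described here.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) points obtained by sweeping a threshold over the scores."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("ROC needs at least one positive and one negative label")
    # Sort by descending score so each step lowers the decision threshold.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Handle tied scores together so a tie yields a single segment.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def auc(points):
    """Area under the ROC curve, computed with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

Under these assumptions, two models can be compared by computing auc(roc_points(labels, model_scores)) for each on the same held-out labels; a higher area suggests better ranking of positives over negatives.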

PROFESSIONAL EXPERIENCE

Data Scientist/Engineer

Confidential, Jersey City, NJ

Responsibilities:

  • Launched Amazon EC2 cloud instances using Amazon Web Services (Linux/Ubuntu) and configured the launched instances for specific applications.
  • Implemented and maintained monitoring and alerting of production and corporate servers/storage using AWS CloudWatch.
  • Implemented Machine Learning, Computer Vision, Deep Learning, and Neural Network algorithms using TensorFlow and Keras, and designed a Prediction Model using Data Mining techniques with the help of Python and libraries like NumPy, SciPy, Matplotlib, pandas, and scikit-learn.
  • Used pandas, NumPy, Seaborn, SciPy, matplotlib, scikit-learn, and NLTK in Python to develop various machine learning algorithms.
  • Applied Support Vector Machines (SVM) and their kernels, such as polynomial and RBF kernels, to machine learning problems.
  • Worked on imbalanced datasets using appropriate metrics. Worked with deep neural networks, Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs).
  • Developed low-latency applications and interpretable models using machine learning algorithms.
  • Responsible for installation and configuration of Hive, Pig, HBase, and Sqoop on the Hadoop cluster; created Hive tables to store the processed results in a tabular format.
  • Developed Spark Streaming applications in Scala to receive real-time data from Apache Kafka and store the stream data in HDFS.
  • Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
  • Analyzed and built a prediction model using four classification methods in Jupyter Notebook. Performed Data Profiling to learn about user behavior.
  • Merged user data from multiple data sources.
  • Performed Data Cleaning, feature scaling, and feature engineering. Imported sales records from Excel into R.
  • Presented results graphically using ggplot functions.
  • Recovered databases from corrupted/lost data files in real time on production systems.

Data Scientist

Confidential, St. Louis, MO

Responsibilities:

  • Addressed overfitting by implementing regularization methods like L1 and L2, and dropout in neural networks.
  • Implemented statistical modeling with the XGBoost machine learning package in Python to determine the predicted probabilities of each model.
  • Worked with different performance metrics like log loss, AUC, confusion matrix, and F1 score for classification, and mean squared error and mean absolute error for regression problems.
  • Utilized Boosting algorithms to build a model for predictive analysis of the behavior of students who took the USMLE exam and applied for residency.
  • Used NumPy, SciPy, pandas, NLTK (Natural Language Toolkit), and matplotlib to build the model.
  • Extracted data from HDFS using Hive and Presto; performed data analysis and feature selection using Spark with Scala, PySpark, and Redshift; and created nonparametric models in Spark.
  • Applied various Artificial Intelligence (AI)/machine learning algorithms and statistical modeling, such as decision trees, text analytics, image and text recognition using OCR tools like ABBYY, natural language processing (NLP), supervised and unsupervised learning, and regression models.
  • Performed Data Cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python, and built models using deep learning frameworks.
  • Developed and implemented databases, data collection systems, data analytics, and other strategies that optimize statistical efficiency and quality.
  • Utilized SAS and SQL to extract data from statewide databases for analysis.
  • Acquired data from primary or secondary data sources and maintained databases/data systems.
  • Performed Data Analysis using visualization tools such as Tableau, Spotfire, and SharePoint to provide insights into the data.
  • Contributed to major development initiatives with codebases utilizing Python, Django, R, MySQL, MongoDB, jQuery, and React.
  • Wrote and executed various MySQL database queries from Python using the Python-MySQL connector and the MySQLdb package.
  • Used python graphics APIs for creating graphics and serialization libraries for encoding data in XML/JSON formats.
  • Performed database design, coding, testing, debugging, and installation.
  • Tuned databases and application queries, performed server access and login management and performed database backups and restores.
  • Cloned schemas, objects, and data onto a new server by exporting from a 10g database and importing into 11g using Oracle Data Pump.
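
The classification metrics listed for this role (log loss, confusion matrix, F1 score) can be sketched in a few lines of pure Python. This is a minimal illustration assuming binary 0/1 labels; the helper names are hypothetical, and in practice a library such as scikit-learn would normally be used instead.

```python
import math


def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, probs):
        # Clip probabilities away from 0 and 1 to avoid log(0).
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary 0/1 labels and predictions."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall; 0.0 when it is undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denominator = 2 * tp + fp + fn
    return 0.0 if denominator == 0 else 2 * tp / denominator
```

Log loss scores the predicted probabilities directly, while the confusion matrix and F1 score depend on the thresholded predictions, which is one reason to report both.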

Data Scientist

Confidential, San Francisco, CA

Responsibilities:

  • Retrieved data from the database using SQL/Hive queries and performed analysis on it.
  • Used R/SQL to manipulate data and develop and validate quantitative methods.
  • Worked with SQL and PL/SQL, including Procedures, Functions, Stored Procedures, and Packages, with mapping.
  • Used advanced Microsoft Excel functions such as Pivot Tables and VLOOKUP to analyze the data and prepare programs.
  • Skilled in Advanced Regression Modelling, Correlation, Multivariate Analysis, Model Building, Business Intelligence tools, and application of Statistical Concepts.
  • Imported and exported data into HDFS and Hive using Sqoop. Worked on NoSQL databases like MongoDB, Cassandra, and HBase.
  • Cleaned the data by analyzing and removing the duplicate and inaccurate data using R.
  • Worked with Data Frames and other data interfaces in R for storing and retrieving the data.
  • Coded, Tested, Debugged, implemented and documented data using R.
  • Worked with the Quality Control team to develop test plans and test cases.
  • Participated in feature engineering such as feature generation, PCA, feature normalization, and label encoding with scikit-learn preprocessing. Performed data imputation using various methods in the scikit-learn package in Python. Worked with Tableau reports to test and validate the data integrity of the reports. Evaluated the performance of various models on real datasets.
  • Completed many tasks, from collecting and exploring the data to interpreting the statistical information.
  • Identified data needs and requirements and worked with other members of the IT organization to deliver Data Visualization and reporting solutions for those needs.
  • Used pruning algorithms to cut away connections and perceptrons, significantly improving the performance of the back-propagation algorithm.
  • Conducted studies and rapid plots, and used advanced data mining and statistical modeling techniques to build a solution that optimizes the quality and performance of data.
  • Worked with several outlier detection algorithms like Z-score, PCA, LMS, and DBSCAN to better process the data for higher accuracy.
  • Implemented Pearson's Correlation and Maximum Variance techniques to find the key predictors for the Regression models.
  • Worked with ETL source-to-specification documents, understood the business requirements, and performed extraction, transformation, and loading of the data into the applications.

Environment: ER Studio 9.7, Tableau 9.03, AWS, Teradata 15, MDM, Git, Unix, Python 3.5.2, MLlib, SAS, regression, Spark, logistic regression, Hadoop, NoSQL, OLTP, random forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML, MapReduce.
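
Of the outlier methods named for this role, the Z-score approach is the simplest to show. The sketch below uses only the Python standard library and assumes a one-dimensional list of numeric values; the function name zscore_outliers and the default threshold of 3.0 are illustrative choices, not details from the original work.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        # All values are identical, so nothing can be an outlier.
        return []
    return [v for v in values if abs((v - mean) / std) > threshold]
```

One caveat with this method: with the population standard deviation, a single point in a sample of n values can never have a z-score above sqrt(n - 1), so small samples may need a lower threshold than 3.0.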

Data Analyst/Data Scientist

Confidential

Responsibilities:

  • Performed supervised, unsupervised, and semi-supervised classification and clustering of documents.
  • Used key indicators in Python and machine learning concepts like regression, Bootstrap Aggregation, and Random Forest.
  • Performed sampling using predictive models, enabling improved error detection.
  • Involved in the Sqoop implementation, which loads data from various RDBMS sources into Hadoop systems and vice versa.
  • Created intermediate staging tables identical to the source tables.
  • Used Pig as an ETL tool to perform transformations, joins, and some pre-aggregations before storing data in HDFS.
  • Documented data requirements and performed data analysis, with a clear understanding of the differences between business data and metadata.
  • Worked with SAS as an ETL tool to extract data from Excel data sources and load it into Teradata tables.
  • Led the Data Correction and validation process, using data utilities to fix mismatches between different shared business operating systems.
  • Performed extensive data mining of different attributes in business tables and provided consolidated analysis reports and resolutions on a regular basis.
  • Interpreted complex data mapping and data integration between two or more applications in a producer/consumer construct.
  • Generated Python Django Forms to record data of online users.
  • Implemented a module to connect to an Apache Cassandra instance and view its status using Python.
  • Utilized Python libraries NumPy and matplotlib.
  • Wrote Python scripts to parse XML documents and load the data into a database.
  • Troubleshot and resolved complex issues for all supported services in a non-hosted environment. Created tickets and prioritized and escalated issues using Remedy Connector based on SLAs.
  • Read Alert Logs and User trace files and diagnosed problems.
  • Performed daily data queries and prepared reports on a daily, weekly, monthly, and quarterly basis.
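
As an illustration of the XML-to-database scripts mentioned above, here is a minimal sketch using only the Python standard library. The target database is not named in this resume, so SQLite is assumed here; the XML layout, the users table, and the load_users function are likewise hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document; real input would be read from a file.
SAMPLE_XML = """<users>
  <user id="1"><name>Ada</name><email>ada@example.com</email></user>
  <user id="2"><name>Grace</name><email>grace@example.com</email></user>
</users>"""


def load_users(xml_text, conn):
    """Parse <user> elements and insert them into a users table; return the row count."""
    root = ET.fromstring(xml_text)
    rows = [
        (int(user.get("id")), user.findtext("name"), user.findtext("email"))
        for user in root.iter("user")
    ]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users "
        "(id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
    )
    # Parameterized inserts keep the XML content out of the SQL text itself.
    conn.executemany("INSERT INTO users (id, name, email) VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

A typical call would be load_users(SAMPLE_XML, sqlite3.connect(":memory:")); pointing the connection at a file path instead persists the rows.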

PL/SQL Developer

Confidential

Responsibilities:

  • Utilized the Informatica toolset (Informatica Data Explorer and Informatica Data Quality) to analyze legacy data for data profiling.
  • Evaluated data profiling, cleansing, integration and extraction tools (e.g. Informatica).
  • Wrote Python scripts to parse XML documents and load the data into a database.
  • Used GitHub for version control.
  • Created Views and developed Stored Procedures, Functions and Triggers.
  • Wrote a SQL script file to delete all rows from tables and cleaned the schema of objects such as procedures, views, synonyms, types, functions, indexes, database links, etc.
  • Fixed errors in the import pump script file. Granted privileges to DBA users, roles, and SYSDBA. Performed tuning of Oracle servers.
