
Data Scientist / Machine Learning Engineer Resume


Parsippany, New Jersey

SUMMARY

  • Expertise in dimensionality reduction techniques such as PCA, LDA and Singular Value Decomposition.
  • Expertise in k-Fold Cross Validation and Grid Search for model selection (see the sketch after this list).
  • Skilled in writing SQL queries for RDBMS such as Microsoft SQL Server, MySQL, PostgreSQL, Teradata and Oracle, and for NoSQL databases such as MongoDB, HBase and Cassandra to handle unstructured data.
  • Experienced in writing HQL queries in the Hue editor to access data from the Hive data warehouse.
  • In-depth understanding of Snowflake cloud technology, including multi-cluster size and credit usage.
  • Played a key role in migrating Teradata objects into the Snowflake environment.
  • Strong experience using Excel and MS Access to load data and analyse it based on business needs.
  • Knowledge in Cloud services such as Amazon AWS and GCP.
  • Experience working with Weka and Meka (Multi-label classification).
  • Expert in Python libraries such as NumPy and SciPy for mathematical calculations, Pandas for data preprocessing/wrangling, Matplotlib and Seaborn for data visualization, scikit-learn for machine learning, Theano, TensorFlow and Keras for deep learning, and NLTK for NLP.
  • Excellent knowledge in Data Analysis, Data Validation, Data Cleansing, Data Verification and identifying data mismatch.
  • Proficient in applying Statistical Modelling and Machine Learning techniques (Linear Regression, Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian methods, XGBoost) in forecasting/predictive analytics, segmentation methodologies, regression-based models, factor analysis, PCA and ensembles, with good knowledge of recommendation systems.
  • Expertise in the Scrapy and BeautifulSoup libraries for designing web crawlers.
  • Excellent understanding of the Software Development Life Cycle (SDLC), with good working knowledge of testing methodologies, disciplines, tasks, resources and scheduling.
  • Hands-on experience building recommendation engines and Natural Language Processing applications.
  • Experience with visualization tools such as Tableau for creating dashboards.
  • Expertise in designing web crawlers for data gathering and applying LDA.
  • Good knowledge on Data Warehousing and Data Ingestion.
  • Good knowledge of version control systems such as Git and SVN, and hosting platforms such as GitHub and Bitbucket.
  • Comfortable functioning in fast-paced, multi-tasking environments both independently and in collaborative teams; takes on challenging projects and works through ambiguity to solve complex problems. A self-motivated, enthusiastic learner.
  • Hands-on experience evaluating model performance using A/B testing, k-fold cross validation, R-squared, CAP curves, confusion matrices, ROC plots, Gini coefficients and Grid Search.
  • Use Agile methodology to develop projects when working on a team.
  • Strong SQL Server programming skills, with experience in working with functions, packages and triggers.
  • Good Knowledge on Big Data Ecosystems like Apache Hadoop, MapReduce, Spark, HDFS Architecture, Cassandra, HBase, Sqoop, Hive, Pig, MLlib, ELT.
  • Expertise in ETL tools like Talend, Apache Beam.
  • Expert in using model pipelines to automate tasks and put models into production quickly.
  • Expertise on relational databases like Oracle SQL and SQLite.
  • Good knowledge on the five stages of Design Thinking Methodology.
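
Below is a minimal sketch of the k-Fold Cross Validation and Grid Search workflow referenced in this summary, assuming a scikit-learn setup; the dataset, estimator and parameter grid are illustrative placeholders.

    # Minimal sketch: k-fold cross validation with grid search for model
    # selection. Dataset, estimator and grid are illustrative placeholders.
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    X, y = load_iris(return_X_y=True)

    param_grid = {"n_estimators": [100, 300], "max_depth": [None, 5, 10]}

    # 5-fold CV over every parameter combination; the best model is refit
    # on the full dataset automatically.
    search = GridSearchCV(RandomForestClassifier(random_state=42),
                          param_grid, cv=5, scoring="accuracy")
    search.fit(X, y)
    print(search.best_params_, search.best_score_)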

TECHNICAL SKILLS

  • Scikit-learn, Keras, TensorFlow, NumPy, Pandas, NLTK, Gensim, Matplotlib, ggplot2, Scrapy, BeautifulSoup, Seaborn, Bokeh, NetworkX, Statsmodels, Theano.
  • Python, R, SQL, Scala, Pig
  • SQL, NoSQL, PostgreSQL, MySQL, Microsoft SQL Server.
  • Data Preprocessing, Weighted Least Squares, PCR, PLS, Piecewise, Spline, Quadratic Discriminant Analysis, Logistic Regression, Naive Bayes, Decision Tree, Random Forest, KNN, Linear Regression, Lasso, Ridge, SVM, Regression Tree, K-means, Polynomial Regression, Azure, Perceptron, Back Propagation, PCA, LDA
  • UML, RDF, SPARQL
  • Tableau, Python - Matplotlib, Seaborn
  • MySQL, SQLite
  • PyCharm, Spyder, Eclipse, Visual Studio, NetBeans, Amazon SageMaker.
  • JIRA, SharePoint
  • Agile, Scrum, Waterfall
  • Snowflake, Anaconda Enterprise, R-Studio, Azure Machine Learning Studio, Oozie, AWS Lambda.

PROFESSIONAL EXPERIENCE

Confidential, Parsippany, New Jersey

Data Scientist / Machine Learning Engineer

Responsibilities:

  • Applying techniques such as multivariate regression, Bayesian probability, clustering algorithms, machine learning, dynamic programming, stochastic processes, queuing theory and algorithmic knowledge to efficiently research and solve complex development problems.
  • Worked with Machine learning algorithms like Regressions (linear, logistic), SVMs and Decision trees.
  • Worked with text feature engineering techniques such as n-grams, TF-IDF and word2vec (see the TF-IDF sketch after this list).
  • Involved in Migrating Objects from Teradata to Snowflake.
  • Developed a text classification algorithm using classical machine learning algorithms and applied state-of-the-art methods such as deep neural networks and RNNs.
  • Reduced log-loss to below 1.0 on the text classification problem using machine learning and deep learning algorithms.
  • Built automated data pipeline infrastructure to transfer Sprinklr data from Google Drive to Cloud SQL using serverless cloud functions in Google Cloud Platform.
  • Built RDBMS tables in Cloud SQL for large datasets pulled from various sources using APIs to develop NLP machine learning models.
  • Used Python to customize scripts for unstructured data migration via APIs to Google BigQuery, Cloud Storage and Google Sheets.
  • Performed data wrangling to clean, transform and reshape data using the NumPy and Pandas libraries.
  • Used Data Quality validation techniques to validate Critical Data elements (CDE) and identified many anomalies. Extensively worked on statistical analysis tools and adept at writing code in Advanced Excel and Python.
  • Involved in defining the Source to Target data mappings, Business rules, and data definitions.
  • Worked with different data science teams and provided respective data as required on an ad-hoc request basis.
  • Guided and advised both application engineering and data scientist teams in mutual agreements/provisions of data.
  • Leading, training and working with other data scientists to design effective analytical approaches, taking into consideration performance and scalability on large datasets.
  • Applied knowledge of data wrangling techniques and scripting languages across projects.
  • Performing unit and system testing to validate the output of the analytic procedures against expected results.
  • Compiling and presenting complex information using Tableau.
  • Designed and developed NLP models for sentiment analysis.
  • Led discussions with users to gather business processes requirements and data requirements to develop a variety of Conceptual, Logical and Physical Data Models. Expert in Business Intelligence and Data Visualization tools: Tableau, Microstrategy.
  • Selecting features, building and optimizing classifiers using Machine Learning Techniques.
  • Built earned media metrics model using NLP, Regression algorithms and Principal Component Analysis (PCA) for Fortune 500 clients, helped to choose marketing and PR efforts by ranking the media outlets.
  • Used NLP methods for information extraction, topic modelling, parsing, and relationship extraction for Twitter users.
  • Built an auto text summarization NLP model with TF-IDF for social media campaigning and crisis monitoring projects, improving analyst work efficiency by 30% over previous years.
  • Performed data wrangling to clean, transform and reshape the data utilizing Pandas library. Analysed data using SQL, R, Scala, Python, and presented analytical reports to management and technical teams.
  • Developed a sentiment analysis model to find out the user sentiment about the product using machine learning algorithms and deep learning RNNs.
  • Modernized data streaming process by using serverless cloud functions in Google Cloud Platform to perform predictions for various projects.
  • Fetched data from the data lake, filtered it, and repartitioned the data subset in PySpark (ETL).
  • Worked with the DataFrame API, built-in functions, user-defined functions and RDDs in PySpark (see the PySpark sketch after this list).
  • Worked on handling CSV tables and performing aggregations, filters and joins in Hive, evaluating results with metrics such as AUC.
  • Through thorough, systematic search, demonstrated performance surpassing the state-of-the-art deep learning approach.
  • Produced forecasts based on exponential smoothing, ARIMA modeling, transfer function models and other statistical algorithms and analyses.
  • Created meaningful data visualizations to communicate findings and relate them back to how they create business impact.
  • Contribute to data mining architectures, modelling standards, reporting, and data analysis methodologies.
  • Conduct research and make recommendations on data mining products, services, protocols, and standards in support of procurement and development efforts.
  • Utilized a diverse array of technologies and tools as needed, to deliver insights such as R, Tableau and more.
  • Worked on Natural Language Processing with the NLTK module of Python for application development and automated customer response.
  • Used text mining and NLP techniques to find the sentiment about the organization.
  • Designed a web crawler to gather the data.
  • Studied and analysed the HTML and CSS of the web pages.
  • Obtained the required data from the crawled pages and stored it as a JSON file.
  • Applied reinforcement learning algorithms such as Upper Confidence Bound to the applicable data.
  • Developed low-latency applications and interpretable models using machine learning algorithms.
  • Worked with different performance metrics such as F1 score, precision, recall, log-loss and accuracy.
  • Used the K-Means clustering technique to identify outliers and classify unlabelled data (see the K-Means outlier sketch after this list).
  • Worked with data-sets of varying degrees of size and complexity including both structured and unstructured data.
  • Practiced extensive data cleaning, ensuring data quality, consistency and integrity using Pandas and NumPy.
  • Used patterns and variations in the characteristics of the data to support predictive analysis.
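
A minimal sketch of the n-gram / TF-IDF feature engineering noted above, assuming a scikit-learn workflow; the corpus is an illustrative placeholder.

    # Sketch: unigram + bigram TF-IDF features for text classification.
    # The corpus below is an illustrative placeholder.
    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = [
        "the product works great",
        "terrible product, does not work",
        "great support and a great product",
    ]

    # ngram_range=(1, 2) extracts unigrams and bigrams, weighted by TF-IDF.
    vectorizer = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
    X = vectorizer.fit_transform(docs)
    print(X.shape)
    print(vectorizer.get_feature_names_out()[:5])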
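
Next, a sketch of the PySpark DataFrame work mentioned above (a built-in function plus a user-defined function); the dataframe contents are illustrative.

    # Sketch: PySpark DataFrame API with a built-in function and a UDF.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("sketch").getOrCreate()
    df = spark.createDataFrame([("alice", 34), ("bob", 41)], ["name", "age"])

    # Built-in function: uppercase a column.
    df = df.withColumn("name_upper", F.upper(F.col("name")))

    # User-defined function: bucket ages into decades.
    bucket = F.udf(lambda a: "30s" if a < 40 else "40s", StringType())
    df = df.withColumn("age_bucket", bucket(F.col("age")))
    df.show()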
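
Finally, a sketch of the K-Means-based outlier identification referenced above: points far from their assigned centroid are flagged. The data and threshold are illustrative placeholders.

    # Sketch: flag points whose distance to their K-Means centroid is large.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(8, 1, (100, 2))])

    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    # Distance of each point to its assigned cluster centre.
    dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
    outliers = dist > np.percentile(dist, 97.5)  # top 2.5% flagged
    print(outliers.sum(), "points flagged as outliers")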

Confidential, Charlotte, NC

Data Scientist / Machine Learning Engineer

Responsibilities:

  • Researching and developing Predictive Analytic solutions and creating solutions for business needs.
  • Mining large data sets using sophisticated analytical techniques to generate insights and inform business decisions.
  • Demonstrated experience in design and implementation of Statistical models, Predictive models, enterprise data model, metadata solution and data life cycle management in both RDBMS, Big Data environments.
  • Analyzed and interpreted the impact of weather on historical and current sales information from multiple data sources, using APIs and applying regression techniques in Python.
  • Prepared Business Intelligence reports and dashboards for business managers' requirements using the Oracle OBIEE tool.
  • Used Kibana to visualize the data collected from Twitter using Twitter REST APIs.
  • Analysed end-user requirements and communicated and modelled them for the development team.
  • Acted as a bridge between technologists and business stakeholders to drive innovation from conception to production.
  • Validated the models developed by applying appropriate measures such as k-fold cross validation, AUC and ROC to identify the best-performing model.
  • Created machine learning and statistical models (SVM, CRF, HMM, sequential tagging).
  • Initially the data was stored in MongoDB. Later the data was moved to Elasticsearch.
  • Developed machine learning algorithms with standalone Spark MLlib and Python.
  • Design and develop analytics, machine learning models, and visualizations that drive performance and provide insights, from prototyping to production deployment and product recommendation and allocation planning.
  • Implemented various machine learning models such as regression, classification, Tree based and Ensemble models.
  • Performed model tuning by adjusting hyperparameters, raising model accuracy.
  • Building data platforms for analytics, advanced analytics in Azure.
  • Managing Tickets using basic SQL queries.
  • Segmented the customers based on demographics using K-means Clustering.
  • Conceptualized and created a knowledge graph database of news events extracted from tweets using Python, Stanford CoreNLP, Apache Jena, RDF.
  • Extracted data from Azure Data Lake into an HDInsight cluster (INTELLIGENCE + ANALYTICS), applied Spark transformations and actions, and loaded the results into HDFS.
  • Implemented Predictive analytics and machine learning algorithms to forecast key metrics in the form of designed dashboards on to AWS (S3/EC2) and Django platform for the company's core business.
  • Worked on data processing for very large datasets, handling missing values, creating dummy variables and addressing various kinds of noise in the data.
  • Used classification techniques including Random Forest and Logistic Regression to quantify the likelihood of each user referring (see the scoring sketch after this list).
  • Used AWS SageMaker to quickly build, train and deploy machine learning models.
  • Creating stories with data that a non-technical team could also understand.
  • Managed database design and implemented a comprehensive Star-Schema with shared dimensions.
  • Implemented normalization techniques and built the tables as per the requirements given by the business users.
  • Developed and maintained stored procedures, implemented changes to database design including tables and views and Documented Source to Target mappings as per the business rules.
  • Gathering, retrieving and organising data and using it to reach meaningful conclusions.
  • Developed a system for collecting data and generating reports of findings that improved company operations.
  • Setting up the analytics system to provide insights.
  • Built time-series predictive models with fbProphet in R to forecast daily sales demand; improved forecast accuracy by 47%, optimized inventory to reduce cost by 1%, and increased sales by 7% for a store (see the forecasting sketch after this list).
  • Used Pandas, NumPy, seaborn, SciPy, Matplotlib, Scikit-learn, NLTK in Python for developing various machine learning algorithms and utilized machine learning algorithms such as linear regression, multivariate regression, naive Bayes, Random Forests, K-means, & KNN for data analysis.
  • Conducted studies and rapid plotting, using advanced data mining and statistical modelling techniques to build solutions that optimize the quality and performance of data.
  • Performed data pre-processing tasks such as merging, sorting, outlier detection, missing value imputation and data normalization to make data ready for statistical analysis.
  • Built automated ETL functionality using Python to load structured data into the Teradata database.
  • Built complex SQL queries to pull data from various database tables for reporting and analytical purposes.
  • Used Python and SQL to assist the team with data mining techniques to identify sales trends, product recalls and stock-outs, and to analyze sales patterns from retail store data.
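
A minimal sketch of scoring per-user referral likelihood with logistic regression, as in the classification bullet above; the features and labels are synthetic placeholders.

    # Sketch: estimate each user's probability of referring with
    # logistic regression. Features and labels are synthetic placeholders.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 4))            # e.g. tenure, purchases, visits, age
    y = (X[:, 0] + X[:, 1] > 0).astype(int)  # placeholder referral label

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
    clf = LogisticRegression().fit(X_tr, y_tr)

    # predict_proba gives the per-user likelihood of referring.
    proba = clf.predict_proba(X_te)[:, 1]
    print("AUC:", roc_auc_score(y_te, proba))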
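
And a sketch of the daily-demand forecasting described above. The original work used fbProphet in R; this shows the same idea via Prophet's Python API with a placeholder series.

    # Sketch: daily-demand forecast with Prophet. The history below is a
    # placeholder; Prophet expects columns 'ds' (date) and 'y' (value).
    import pandas as pd
    from prophet import Prophet

    history = pd.DataFrame({
        "ds": pd.date_range("2023-01-01", periods=365, freq="D"),
        "y": [100 + (i % 7) * 5 for i in range(365)],  # toy weekly pattern
    })

    model = Prophet(weekly_seasonality=True)
    model.fit(history)

    future = model.make_future_dataframe(periods=30)  # 30 days ahead
    forecast = model.predict(future)
    print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())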

Confidential

Data Scientist

Responsibilities:

  • Implemented character recognition using Support Vector Machines, with performance optimization (see the SVM sketch after this list).
  • Monitored data quality and maintained data integrity to ensure the effective functioning of the department.
  • Implemented various machine learning algorithms in Spark using MLlib.
  • Accomplished multiple tasks from collecting data to organizing and interpreting statistical information.
  • Designed and implemented end-to-end systems for Data Analytics and Automation, integrating custom visualization tools using R, Tableau, and Power BI.
  • Created statistics from the data through analysis and generated reports.
  • Cleaned the database by removing stale data files and unnecessary information.
  • Running SQL queries to serve solutions to customer generated tickets.
  • Performing specific data queries and writing scripts.
  • Collecting data from multiple sources and adding it to the database.
  • Research and reconcile data discrepancies occurring among various information systems and reports.
  • Identifying new sources of data and methods to improve data collection, analysis and reporting.
  • Tested prototype software and participated in the approval of new software.
  • Identified areas with data inaccuracies as well as trends of growing inaccuracy.
  • Contributed to methods for working with large data sets and complex processes.
  • Finding trends and patterns to make recommendations to the clients.
  • Recorded patterns on a weekly, monthly and quarterly basis.
  • Collaborating with marketers, salespeople, data architects and database developers.
  • Working with web developers to collect data and streamlining the data reuse.
  • Imported and cleansed high volumes of data from various sources such as Teradata, Oracle and flat files.
  • Wrote and executed unit, system, integration and UAT scripts in data warehouse projects.
  • Performed segmentation on customer data to identify target groups using clustering techniques such as K-Means, with further processing using Support Vector Regression.
  • Designing, developing and implementing new functionality.
  • Utilized Python to cluster users and implemented predictive analysis.
  • Built and tested hypotheses, ensuring statistical significance, and built statistical models for business applications.
  • Monitoring the automated loading processes.
  • Advising on the suitability of methodologies and suggesting improvements.
  • Carrying out specified data processing and statistical techniques.
  • Worked on Descriptive, Diagnostic, Predictive and Prescriptive analytics.
  • Developed Informatica mappings using various transformations and PL/SQL packages for the extraction, transformation and loading of data.
  • Wrote a Python program to parse and upload CSV files into a PostgreSQL database; the Requests HTTP library was used for Web API calls (see the loading sketch after this list).
  • Developed a thorough understanding of the Hadoop architecture and drove the related meetings.
  • Wrote SQL for data profiling and developed data quality reports.
  • Used Informatica to extract, transform and load source data from transaction systems.
  • Developing data analytical databases from complex financial source data.
  • Data entry, data auditing, creating data reports & monitoring all data for accuracy.
  • Utilized techniques such as histograms, bar plots, pie charts, scatter plots and box plots to assess the condition of the data.
  • Used data warehousing concepts such as the Ralph Kimball methodology, the Bill Inmon methodology, OLAP, OLTP, Star Schema, Snowflake Schema, fact tables and dimension tables.
  • Streamlining information by integrating data from multiple data sets into one database system.
  • Creating database triggers and designing tables.
  • Built a machine learning model that automatically scores user assignments based on a few manually scored examples.
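
A minimal sketch of SVM-based character recognition, as in the first bullet of this section, using scikit-learn's bundled digits dataset as an illustrative stand-in; the kernel settings are placeholders.

    # Sketch: character (digit) recognition with an RBF-kernel SVM.
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = load_digits(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # gamma and C are illustrative; in practice they would be tuned.
    clf = SVC(kernel="rbf", gamma=0.001, C=10).fit(X_tr, y_tr)
    print("test accuracy:", clf.score(X_te, y_te))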
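
And a sketch of the CSV-to-PostgreSQL loading mentioned above, assuming the psycopg2 driver; the connection details, table and file path are hypothetical placeholders.

    # Sketch: parse a CSV and insert rows into PostgreSQL via psycopg2.
    # Connection details, table and file path are hypothetical placeholders.
    import csv
    import psycopg2

    conn = psycopg2.connect(host="localhost", dbname="analytics",
                            user="etl", password="secret")
    # The connection context manager commits the transaction on success.
    with conn, conn.cursor() as cur, open("daily_sales.csv", newline="") as f:
        for row in csv.DictReader(f):
            cur.execute(
                "INSERT INTO sales (sale_date, amount) VALUES (%s, %s)",
                (row["sale_date"], row["amount"]),
            )
    conn.close()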
