
Sr. Data Scientist Resume


St Louis, MO

PROFESSIONAL SUMMARY:

  • Over 8 years of IT industry experience encompassing Machine Learning, Data Mining with large datasets of structured and unstructured data, Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization.
  • Extensive experience in Text Analytics, developing statistical machine learning and data mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau.
  • Over 5 years of experience with machine learning techniques and algorithms (such as k-NN, Naive Bayes, etc.).
  • Experience with object-oriented programming (OOP) concepts using Python, C++, and PHP.
  • Knowledge of advanced SAS programming techniques, such as PROC SQL (JOIN/UNION), PROC APPEND, PROC DATASETS, and PROC TRANSPOSE.
  • Integration Architect & Data Scientist experience in Analytics, Big data, BPM, SOA, ETL and Cloud technologies.
  • Highly skilled in using visualization tools like Tableau, ggplot2, and d3.js for creating dashboards.
  • Experience with foundational machine learning models and concepts: regression, random forest, boosting, GBM, NNs, HMMs, CRFs, MRFs, and deep learning.
  • Proficient with statistical and other tools/languages - R, Python, C, C++, Java, SQL, UNIX, the QlikView data visualization tool, and the Anaplan forecasting tool.
  • Proficient in the integration of various data sources with multiple relational databases like Oracle, MS SQL Server, DB2, Teradata, and flat files into the staging area, ODS, Data Warehouse, and Data Mart.
  • Familiar with deep learning projects: CNNs for image identification, RNNs for stock price prediction, autoencoders for a movie recommender system (PyTorch), and image captioning (CNN-RNN autoencoder architecture).
  • Exposure to AI and deep learning platforms/methodologies such as TensorFlow, RNNs, and LSTMs.
  • Built LSTM neural networks for text fields such as item descriptions and comments.
  • Experience in training artificial intelligence chatbots.
  • Built deep neural networks combining LSTM outputs with other features.
  • Experience extracting data and creating value-added datasets using Python, R, Azure, and SQL to analyze customer behavior, target specific customer segments, and uncover hidden insights that support project objectives.
  • Worked with NoSQL databases including HBase, Cassandra, and MongoDB.
  • Extensively worked on statistical analysis tools and adept at writing code in Advanced Excel, R, MATLAB, and Python.
  • Implemented deep learning models and numerical computation with data flow graphs using TensorFlow.
  • Worked with applications such as R, Stata, Scala, Perl, Linear, and SPSS to develop neural network and cluster analysis models.
  • Experienced in the full software lifecycle under SDLC, Agile, and Scrum methodologies.
  • Experience in designing stunning visualizations using Tableau software and publishing and presenting dashboards, Storyline on web and desktop platforms.
  • Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, neural networks, and Principal Component Analysis, with good knowledge of Recommender Systems.
  • Experienced with machine learning algorithms such as logistic regression, random forest, XGBoost, KNN, SVM, neural networks, linear regression, lasso regression, and k-means (see the sketch after this list).
  • Strong experience in Software Development Life Cycle (SDLC) including Requirements Analysis, Design Specification and Testing as per Cycle in both Waterfall and Agile methodologies.
  • Skilled in using dplyr in R and pandas in Python for performing exploratory data analysis.
  • Experience working with data modeling tools like Erwin, Power Designer and ER Studio.
  • Experience with data analytics, data reporting, Ad-hoc reporting, Graphs, Scales, PivotTables and OLAP reporting.
  • Worked and extracted data from various database sources like Oracle, SQL Server, DB2, and Teradata.
  • Proficient knowledge of statistics, mathematics, machine learning, recommendation algorithms and analytics with an excellent understanding of business operations and analytics tools for effective analysis of data.
  • Knowledge of NLP and MDM in the processing and consumption of non-traditional data sources.
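
As a concrete illustration of the classification work listed above, the following is a minimal scikit-learn sketch comparing logistic regression and random forest with cross-validation; the file name, feature columns, and "churned" target are hypothetical placeholders rather than details from any actual engagement.

    import pandas as pd
    from sklearn.model_selection import train_test_split, cross_val_score
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier

    # Hypothetical customer dataset with numeric features and a binary "churned" target.
    df = pd.read_csv("customers.csv")
    X, y = df.drop(columns=["churned"]), df["churned"]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )

    # Compare two of the algorithms listed above with 5-fold cross-validation.
    for model in (LogisticRegression(max_iter=1000),
                  RandomForestClassifier(n_estimators=200, random_state=42)):
        scores = cross_val_score(model, X_train, y_train, cv=5, scoring="roc_auc")
        print(type(model).__name__, round(scores.mean(), 3))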

TECHNICAL SKILLS:

Languages: Python, R

Machine Learning: Regression, Polynomial Regression, Random Forest, Logistic Regression, Decision Trees, Classification, Clustering, Association, Simple/Multiple Linear Regression, Kernel SVM, CRM, K-Nearest Neighbors (K-NN)

OLAP/BI/ETL Tools: Business Objects 6.1/XI, MS SQL Server 2008/2005 Analysis Services (MS OLAP, SSAS), Integration Services (SSIS), Reporting Services (SSRS), Performance Point Server (PPS), Oracle 9i OLAP, Power BI, MS Office Web Components (OWC11), DTS, MDX, Crystal Reports 10, Crystal Enterprise 10 (CMC)

Big Data Technologies: Spark, Hive, HDFS, MapReduce, Pig, Spark SQL, and Kafka.

Web Technologies: JDBC, HTML5, DHTML, XML, CSS3, Web Services, WSDL

Tools: Erwin r9.6/9.5/9.1/8.x, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner.

Databases: SQL, Hive, Impala, Pig, Spark SQL, SQL Server, MySQL, MS Access, HDFS, HBase, Teradata, Netezza, MongoDB, Cassandra, SAP HANA.

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio), Tableau, Crystal Reports XI, Business Intelligence, SSRS, VBA, Business Objects 5.x/6.x, Cognos 7.0/6.0.

Version Control Tools: SVN, GitHub.

Project Execution Methodologies: Ralph Kimball and Bill Inmon data warehousing methodology, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD).

BI Tools: Tableau, Tableau Server, Tableau Reader, SAP Business Objects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure Data Warehouse.

Operating Systems: Windows, Linux, UNIX, macOS, Red Hat.

PROFESSIONAL EXPERIENCE:

Confidential, St. Louis, MO

Sr. Data Scientist

Responsibilities:

  • Utilized domain knowledge and application portfolio knowledge to play a key role in defining the future state of large, business technology programs.
  • Participated in all phases of data mining, data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
  • Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS, and implemented a Python-based distributed random forest via Python streaming (see the PySpark sketch after this list).
  • Experience developing concepts of operations for new processing capabilities, developing image processing algorithms, and assessing the quality of algorithm outputs.
  • Created ecosystem models (e.g. conceptual, logical, physical, canonical) that are required for supporting services within the enterprise data architecture (conceptual data model for defining the major subject areas used, ecosystem logical model for defining standard business meaning for entities and fields, and an ecosystem canonical model for defining the standard messages and formats to be used in data integration services throughout the ecosystem).
  • Used Pandas, NumPy, seaborn, SciPy, Matplotlib, Scikit-learn, NLTK in Python for developing various machine learning algorithms and utilized machine learning algorithms such as linear regression, multivariate regression, naive Bayes, Random Forests, K-means, & KNN for data analysis.
  • Spearheaded Chatbots development initiative to improve customer interaction with application.
  • Developed the Chatbots using api.ai.
  • Automated CSV to chatbot-friendly JSON transformation by writing NLP scripts, reducing development time by 20% (see the sketch after this list).
  • Conducted studies and rapid plotting, using advanced data mining and statistical modeling techniques to build a solution that optimizes the quality and performance of data.
  • Demonstrated experience in design and implementation of Statistical models, Predictive models, enterprise data model, metadata solution and data life cycle management in both RDBMS, Big Data environments.
  • Analyzed large datasets, applied machine learning techniques, and developed predictive and statistical models, enhancing them by leveraging best-in-class modeling techniques.
  • Worked on database design, relational integrity constraints, OLAP, OLTP, Cubes and Normalization (3NF) and De-normalization of the database.
  • Worked on customer segmentation using an unsupervised learning technique - clustering.
  • Worked with various Teradata 15 tools and utilities like Teradata Viewpoint, MultiLoad, ARC, Teradata Administrator, BTEQ, and other Teradata utilities.
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods including classification, regression, and dimensionality reduction.
  • Developed LINUX Shell scripts by using NZSQL/NZLOAD utilities to load data from flat files to Netezza database.
  • Designed and implemented system architecture for Amazon EC2 based cloud-hosted solution for the client.
  • Tested Complex ETL Mappings and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
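
The distributed random forest work referenced above could look roughly like the following PySpark sketch, written against the DataFrame-based spark.ml API as an assumption; the S3 path, feature columns, and label column are hypothetical placeholders.

    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import RandomForestClassifier

    spark = SparkSession.builder.appName("rf-sketch").getOrCreate()

    # Hypothetical Parquet source with numeric feature columns and a 0/1 "label".
    df = spark.read.parquet("s3://example-bucket/training-data/")

    assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
    train, test = assembler.transform(df).randomSplit([0.8, 0.2], seed=42)

    rf = RandomForestClassifier(labelCol="label", featuresCol="features", numTrees=100)
    model = rf.fit(train)
    model.transform(test).select("label", "prediction").show(10)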
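
The CSV to chatbot-friendly JSON transformation could be sketched as below; the input columns and the output layout are illustrative assumptions, not the actual api.ai intent schema.

    import csv
    import json

    # Hypothetical CSV with "question" and "answer" columns.
    with open("faq.csv", newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))

    # Reshape each row into a simple intent-like record.
    intents = [
        {
            "name": row["question"][:30],
            "userSays": [row["question"].strip()],
            "responses": [row["answer"].strip()],
        }
        for row in rows
    ]

    with open("intents.json", "w", encoding="utf-8") as out:
        json.dump(intents, out, indent=2)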

Environment: Erwin r9.6, Python, SQL, Oracle 12c, Netezza, SQL Server, SSRS, PL/SQL, T-SQL, Tableau, MLlib, regression, cluster analysis, Scala, NLP, Spark, Kafka, MongoDB, logistic regression, Hadoop, PySpark, Teradata, random forest, OLAP, Azure, MariaDB, SAP CRM, HDFS, ODS, NLTK, SVM, JSON, XML, Macros, Cassandra, MapReduce, AWS.

Confidential, Los Angeles, California.

Data Scientist

Responsibilities:

  • Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
  • Worked with several R packages including knitr, dplyr, SparkR, CausalInfer, and SpaceTime.
  • Coded R functions to interface with Caffe Deep Learning Framework.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python for developing various machine learning algorithms.
  • Installed and used Caffe NLP Framework.
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
  • Set up storage and data analysis tools in the Amazon Web Services cloud computing infrastructure.
  • Implemented end-to-end systems for data analytics and data automation, integrated with custom visualization tools, using R, Mahout, Hadoop, and MongoDB.
  • Worked with Data Architects and IT Architects to understand the movement of data and its storage, using ER/Studio 9.7.
  • Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods including classification, regression, and dimensionality reduction, and utilized the engine to increase user lifetime by 45% and triple user conversations for target categories.
  • Used Spark DataFrames, Spark SQL, and Spark MLlib extensively, developing and designing POCs using Scala, Spark SQL, and MLlib libraries (see the sketch after this list).
  • Used Data Quality Validation techniques to validate Critical Data Elements (CDE) and identified various anomalies.
  • Extensively worked on Data Modeling tools Erwin Data Modeler to design the Data Models.
  • Developed various QlikView data models by extracting and using data from various source files, DB2, Excel, flat files, and big data.
  • Participated in all phases of data mining, data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and Smart View.
  • Designed a static pipeline in MS Azure for data ingestion and dashboarding; used MS ML Studio for modeling and MS Power BI for dashboards.
  • Implemented Agile Methodology for building an internal application.
  • Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Secondary Name Node, and Map Reduce concepts.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, Pandas).
  • Implemented Classification using supervised algorithms like Logistic Regression, Decision Trees, KNN, and Naive Bayes.
  • Designed both 3NF data models for ODS, OLTP systems and Dimensional Data Models using Star and Snowflake Schemas.
  • Updated Python scripts to match training data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
  • Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL.
  • Designed and developed Use Case, Activity Diagrams, Sequence Diagrams, OOD (Object oriented Design) using UML and Visio.
  • Identified and executed process improvements, working hands-on with various technologies such as Oracle and Business Objects.
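
A minimal sketch of the Spark DataFrame and Spark SQL usage referenced above, shown in Python rather than Scala for consistency with the other sketches here; the source path and field names are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("spark-sql-sketch").getOrCreate()

    # Hypothetical JSON event source with a "user_id" field.
    events = spark.read.json("hdfs:///data/events/")
    events.createOrReplaceTempView("events")

    # The same aggregation expressed via the DataFrame API and via Spark SQL.
    by_user_df = events.groupBy("user_id").agg(F.count("*").alias("n_events"))
    by_user_sql = spark.sql(
        "SELECT user_id, COUNT(*) AS n_events FROM events GROUP BY user_id"
    )
    by_user_sql.show(10)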

Environment: R, Python, HDFS, ODS, OLTP, Oracle 10g, Hive, Power BI, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, MapReduce, Rational Rose, SQL, and MongoDB.

Confidential, St. Louis, MO

Data Scientist

Responsibilities:

  • Extensively involved in all phases of data acquisition, data collection, data cleaning, model development, model validation and visualization to deliver data science solutions.
  • Built machine learning models to identify whether a user is legitimate using real-time data analysis and prevent fraudulent transactions using the history of customer transactions with supervised learning.
  • Extracted data from a SQL Server database into the HDFS file system and used Hadoop tools such as Hive and Pig Latin to retrieve the data required for building models.
  • Performed data cleaning, including transforming variables and dealing with missing values, and ensured data quality, consistency, and integrity using Pandas and NumPy.
  • Tackled a highly imbalanced fraud dataset using sampling techniques such as undersampling and oversampling with SMOTE (Synthetic Minority Over-sampling Technique) in Python scikit-learn (see the sketch after this list).
  • Utilized PCA, t-SNE, and other feature engineering techniques to reduce high-dimensional data, applied feature scaling, and handled categorical attributes using the one-hot encoder of the scikit-learn library.
  • Developed various machine learning models such as Logistic regression, KNN, and Gradient Boosting with Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn in Python.
  • Worked on Amazon Web Services (AWS) cloud services to do machine learning on big data.
  • Developed Spark Python modules for machine learning & predictive analytics in Hadoop.
  • Implemented a Python-based distributed random forest via PySpark and MLlib.
  • Used cross-validation to test the model with different batches of data and to find the best parameters for the model, which eventually boosted performance.
  • Experimented with Ensemble methods to increase the accuracy of the training model with different Bagging and Boosting methods and deployed the model on AWS.
  • Created and maintained Tableau reports to display the status and performance of deployed models and algorithms.
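
A minimal sketch of the imbalanced-classification approach described above, assuming the SMOTE implementation from the imbalanced-learn package that extends scikit-learn; the file and column names are hypothetical placeholders.

    import pandas as pd
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import cross_val_score

    # Hypothetical fraud dataset with numeric features and a 0/1 "is_fraud" target.
    df = pd.read_csv("transactions.csv")
    X, y = df.drop(columns=["is_fraud"]), df["is_fraud"]

    # Wrapping SMOTE in an imblearn Pipeline keeps oversampling inside the training
    # folds, so cross-validation scores are not inflated by synthetic test samples.
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("smote", SMOTE(random_state=42)),
        ("gb", GradientBoostingClassifier(random_state=42)),
    ])
    print(cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean())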

Environment: Machine Learning, AWS, Python (Scikit-learn, SciPy, NumPy, Pandas, Matplotlib, Seaborn), SQL Server, Hadoop, HDFS, Hive, Pig Latin, Apache Spark/PySpark/MLlib, GitHub, Linux, Tableau.

Confidential, Denver, CO

Data Analyst

Responsibilities:

  • Performed thorough metadata and data analysis to ensure that all sensitive data had been identified.
  • Conducted workflow, process diagram and gap analyses to derive requirements for existing systems enhancements.
  • Worked with the reference data to analyze the sensitivity of the data and secured it to protect the privacy of the client. Worked on transformation logic for data.
  • Created and maintained required Data marts and OLAP Cubes using SAS DI Studio and SAS OLAP Cube Studio to fulfil reporting requirements.
  • Wrote automation scripts in Python based on existing manual scripts.
  • Worked with the Risk Matrix and identified the different tiers of the risks in the data warehouse environment.
  • Used Heat Maps to identify different Risk Mediums across different internal departments.
  • Performed Data Analysis and Data validation by writing SQL queries and Regular expressions.
  • Worked on several mainframe and distributed-environment databases such as Sybase, SQL, ADABAS, and DB2.
  • Worked on developing a tool to extract data from the DB2 database and conducted metadata and data analysis.
  • Wrote SQL queries against multiple databases.
  • Used regular expressions, text parsing, and data mining techniques to search for defined data patterns (see the sketch after this list).
  • Worked with several database management tools such as DB Artisan and other business intelligence tools.
  • Created SQL*Loader scripts to load legacy data into Oracle staging tables and wrote SQL queries to perform data validation and data integrity testing.
  • Maintained and updated system data flow charts, heat maps, tree maps, Visio documents, and system documentation.
  • Led meetings with clients and business owners to discuss the sensitivity of the data.
  • Created and maintained several custom reports using Business Objects.
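
The regular-expression-based pattern searching mentioned above might be sketched as follows; the field names and validation patterns are hypothetical examples, not the project's actual rules.

    import re

    # Hypothetical validation patterns for sensitive fields; the real rules would
    # come from the project's data dictionary.
    PATTERNS = {
        "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
        "email": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$"),
        "dob": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
    }

    def invalid_fields(record):
        """Return the names of fields whose values fail their pattern."""
        return [field for field, pattern in PATTERNS.items()
                if field in record and not pattern.match(str(record[field]))]

    print(invalid_fields({"ssn": "123-45-6789", "email": "not-an-email", "dob": "2015-01-31"}))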

Environment: UNIX, SAS, Rational Suite (Requisite Pro, ClearQuest, ClearCase), Enterprise Architect, Mainframe, COBOL, Informatica, MS Excel, MS Project, Oracle 9i, Heat Maps, MS SQL Server 2008, Teradata, Visual Basic, Python.

Confidential

Data Analyst

Responsibilities:

  • Gathered, analyzed, and translated business requirements into functional specification documents and shared them for peer review.
  • Analyzed the customer data and business rules to maintain the data quality and integrity.
  • Developed SQL queries to create tables, extract/merge to bring data together from various sources.
  • Involved in the data cleaning procedures, data alignment and data profiling.
  • Conducted exploratory data analysis to find the potential fraud customers based on the frequency of their claims and distinct reasons mentioned for each claim.
  • Responsible for maintaining Metadata repository and documenting the complete process flow to describe program development, testing and implementation, application integration and coding.
  • Explored the historical customer claims data and performed data manipulations to find the trends in the claims.
  • Participated in the Agile planning process and daily scrums, provided details to create use cases based on technical solutions and estimates, and worked with internal architects to assist in the development of current and target state data architectures.
  • Involved in Data Migration from Teradata to MS SQL server.
  • Analyzed data from a variety of sources like MS Access, SQL, MS Excel, CSV and flat files.
  • Performed the ETL process to extract, transform, and load data from OLTP tables into staging tables and the data warehouse (see the sketch after this list).
  • Used MS Excel and Tableau for visualizations and reports.
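
A minimal pandas/SQLAlchemy sketch of the extract-transform-load flow described above; the connection strings, table names, and columns are placeholders rather than details from the actual project.

    import pandas as pd
    from sqlalchemy import create_engine

    # Placeholder connection strings and table names, for illustration only.
    oltp = create_engine("mssql+pyodbc://user:password@oltp_dsn")
    warehouse = create_engine("mssql+pyodbc://user:password@dwh_dsn")

    # Extract from a hypothetical OLTP claims table.
    claims = pd.read_sql("SELECT * FROM dbo.claims", oltp)

    # Transform: basic cleanup and a derived column.
    claims["claim_date"] = pd.to_datetime(claims["claim_date"])
    claims["claim_year"] = claims["claim_date"].dt.year
    claims = claims.drop_duplicates(subset=["claim_id"])

    # Load into a staging table that feeds the warehouse.
    claims.to_sql("stg_claims", warehouse, if_exists="replace", index=False)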

Environment: MS Excel, MS Visio, MS Project, MS Office, MS PowerPoint, MS Word, Macros, Teradata, Tableau, SQL, Oracle 10g.
