Data Scientist/ Machine Learning Resume

Waukesha, WI

SUMMARY

  • 8+ years of IT industry experience encompassing Machine Learning, Data Mining with large datasets of structured and unstructured data, Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization.
  • Extensive experience in Text Analytics, developing statistical machine learning and data mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau.
  • 5+ years of experience with machine learning techniques and algorithms (such as k-NN, Naive Bayes, etc.).
  • Experience with object-oriented programming (OOP) concepts using Python, C++, and PHP.
  • Knowledge of advanced SAS programming techniques, such as PROC SQL (JOIN/UNION), PROC APPEND, PROC DATASETS, and PROC TRANSPOSE.
  • Integration Architect and Data Scientist experience in Analytics, Big Data, BPM, SOA, ETL, and Cloud technologies.
  • Highly skilled in using visualization tools like Tableau, ggplot2, and d3.js for creating dashboards.
  • Experience in foundational machine learning models and concepts: regression, random forests, boosting, GBM, NNs, HMMs, CRFs, MRFs, and deep learning.
  • Proficiency in statistical and other tools/languages: R, Python, C, C++, Java, SQL, UNIX, the QlikView data visualization tool, and the Anaplan forecasting tool.
  • Proficient in the integration of various data sources with multiple relational databases like Oracle, MS SQL Server, DB2, Teradata, and flat files into the staging area, ODS, Data Warehouse, and Data Mart.
  • Familiar with deep learning projects such as image identification with CNNs, stock price prediction with RNNs, a movie recommender system using autoencoders (PyTorch), and image captioning (CNN-RNN encoder-decoder architecture).
  • Exposure to AI and deep learning platforms/methodologies such as TensorFlow, RNNs, and LSTMs.
  • Experience in extracting data and creating value-added datasets using Python, R, Azure, and SQL to analyze customer behavior, target specific sets of customers, and surface hidden insights that support project objectives.
  • Worked with NoSQL Database including HBase, Cassandra and MongoDB.
  • Extensively worked on statistical analysis tools and adept at writing code in Advanced Excel, R, MATLAB, Python.
  • Implemented deep learning models and numerical computation with data flow graphs using TensorFlow.
  • Worked with applications such as R, Stata, Scala, Perl, Linear, and SPSS to develop neural network and cluster analysis models.
  • Experienced in the full software development lifecycle (SDLC) using Agile and Scrum methodologies.
  • Experience in designing visualizations using Tableau and in publishing and presenting dashboards and storylines on web and desktop platforms.
  • Hands-on experience in implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis, with good knowledge of Recommender Systems (a brief classification sketch follows this list).
  • Strong experience in Software Development Life Cycle (SDLC) including Requirements Analysis, Design Specification and Testing as per Cycle in both Waterfall and Agile methodologies.
  • Skilled in using dplyr in R and pandas in Python for exploratory data analysis.
  • Experience working with data modeling tools like Erwin, Power Designer, and ER Studio.
  • Experience with data analytics, data reporting, Ad-hoc reporting, Graphs, Scales, PivotTables and OLAP reporting.
  • Worked and extracted data from various database sources like Oracle, SQL Server, DB2, and Teradata.
  • Proficient knowledge of statistics, mathematics, machine learning, recommendation algorithms and analytics with an excellent understanding of business operations and analytics tools for effective analysis of data.
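
Illustrative only: a minimal scikit-learn sketch of the kind of supervised classification work summarized above (Logistic Regression and Random Forest on a train/test split). The file name, feature columns, and "churned" label are hypothetical placeholders, not details from any engagement listed here.

    # Minimal classification sketch (hypothetical data; all-numeric features assumed).
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score

    df = pd.read_csv("customers.csv")          # hypothetical dataset with a binary "churned" label
    X = df.drop(columns=["churned"])
    y = df["churned"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    for name, model in [("logistic_regression", LogisticRegression(max_iter=1000)),
                        ("random_forest", RandomForestClassifier(n_estimators=200, random_state=42))]:
        model.fit(X_train, y_train)
        print(name, "accuracy:", accuracy_score(y_test, model.predict(X_test)))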

TECHNICAL SKILLS

Data Modeling Tools: Erwin r9.6/9.5, ER/Studio 9.7, Star-Schema Modeling, Snowflake-Schema Modeling, FACT and dimension tables, Pivot Tables.

Databases: Oracle, MS Access, SQL Server, Sybase, DB2, Teradata, Hive, MySQL, SQLite, HBase, MongoDB.

Machine Learning Tools: OpenCV, Theano, TensorFlow, Pygame, OpenGL, NumPy, SymPy, SciPy, Pandas

Big Data Tools: Hadoop, Hive, Spark, Pig, HBase, Sqoop, Flume.

Web Technologies: Django, HTML/5, CSS/3, XHTML, JavaScript, React.js, XML, SOAP, REST, Bootstrap, JSON, AJAX.

R Packages: dplyr, sqldf, data.table, randomForest, gbm, caret, elastic net, and various other machine learning packages.

BI Tools: Tableau 7.0/8.2, Tableau Server 8.2, Tableau Reader 8.1, SAP Business Objects, Crystal Reports

Operating Systems: Microsoft Windows 8/7/XP, Linux and UNIX

Languages: SAS/STAT, SAS/ETS, SAS E-Miner, SPSS, SQL, PL/SQL, T-SQL, ASP, Visual Basic, XML, Python, C, C++, Java, HTML, UNIX shell scripting, Perl, R, Scala, MATLAB, Spark, Power BI.

Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Waterfall Model.

PROFESSIONAL EXPERIENCE

Data Scientist/ Machine Learning

Confidential

Responsibilities:

  • Built models using statistical techniques like Bayesian HMM and machine learning classification models like XGBoost, SVM, and Random Forest.
  • Participated in a highly immersive Data Science program involving data manipulation and visualization, web scraping, machine learning, Python programming, SQL, Git, Unix commands, NoSQL, MongoDB, and Hadoop.
  • Setup storage and data analysis tools in Confidential Web Services cloud computing infrastructure.
  • Used pandas, numpy, seaborn, scipy, matplotlib, scikit-learn, NLTK in Python for developing various machine learning algorithms.
  • Installed and used Caffe Deep Learning Framework
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
  • Worked with Data Architects and IT Architects to understand the movement and storage of data, using ER Studio 9.7.
  • Participated in all phases of data mining; data collection, data cleaning, developing models, validation, visualization and performed Gap analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and Smart View.
  • Worked on customer segmentation using an unsupervised learning technique (clustering). Worked with various Teradata 15 tools and utilities like Teradata Viewpoint, MultiLoad, ARC, Teradata Administrator, BTEQ, and other Teradata utilities.
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods including classification, regression, dimensionality reduction, etc.
  • Developed LINUX Shell scripts by using NZSQL/NZLOAD utilities to load data from flat files to Netezza database.
  • Good knowledge of Hadoop Architecture and various components such as HDFS, JobTracker, Task Tracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
  • As Architect delivered various complex OLAP databases/cubes, scorecards, dashboards and reports.
  • Programmed a utility in Python that used multiple packages (scipy, numpy, pandas)
  • Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, KNN, Naive Bayes.
  • Used Teradata 15 utilities such as FastExport and MultiLoad (MLOAD) for various data migration/ETL tasks from OLTP source systems to OLAP target systems.
  • Experience in Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, Flume including their installation and configuration.
  • Updated Python scripts to match training data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
  • Performed data transformation from various sources, data organization, and feature extraction from raw and stored data.
  • Validated the machine learning classifiers using ROC curves and lift charts (see the validation sketch after this list).
  • Extracted data from HDFS and prepared data for exploratory analysis using data munging.
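
Illustrative only: a minimal sketch, assuming a scikit-learn-style classifier already fitted elsewhere, of validating it with an ROC curve (AUC) and a simple decile lift table; the function and variable names are hypothetical and this is not code from the project above.

    # Sketch: ROC/AUC plus a 10-decile lift table for a fitted binary classifier.
    import numpy as np
    import pandas as pd
    from sklearn.metrics import roc_curve, roc_auc_score

    def validate_classifier(model, X_test, y_test):
        """Print AUC and a decile lift table for a fitted scikit-learn classifier."""
        scores = model.predict_proba(X_test)[:, 1]          # probability of the positive class
        fpr, tpr, _ = roc_curve(y_test, scores)
        print("AUC:", roc_auc_score(y_test, scores))

        # Lift: response rate in each score decile relative to the overall response rate.
        frame = pd.DataFrame({"score": scores, "actual": np.asarray(y_test)})
        frame["decile"] = pd.qcut(frame["score"].rank(method="first"), 10, labels=False)
        lift = frame.groupby("decile")["actual"].mean() / frame["actual"].mean()
        print(lift.sort_index(ascending=False))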

Environment: ER Studio 9.7, Tableau 9.03, AWS, Teradata 15, MDM, GIT, Unix, Python 3.5.2, MLlib, SAS, Regression, Logistic Regression, Hadoop, NoSQL, Teradata, OLTP, Random Forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML, MapReduce.

Data Scientist/ Machine Learning

Confidential

Responsibilities:

  • Worked with several R packages including knitr, dplyr, SparkR, CausalInfer, and SpaceTime.
  • Coded R functions to interface with Caffe Deep Learning Framework.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python for developing various machine learning algorithms.
  • Installed and used Caffe NLP Framework.
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
  • Setup storage and data analysis tools in Amazon Web Services cloud computing infrastructure.
  • Implemented end-to-end systems for Data Analytics, Data Automation and integrated with custom visualization tools using R, Mahout, Hadoop and MongoDB.
  • Worked with Data Architects and IT Architects to understand the movement and storage of data, using ER Studio 9.7.
  • Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods including classification, regression, and dimensionality reduction; used the resulting engine to increase user lifetime by 45% and triple user conversions for target categories.
  • Used Spark DataFrames, Spark SQL, and Spark MLlib extensively, developing and designing POCs using Scala, Spark SQL, and the MLlib libraries (a brief PySpark sketch follows this list).
  • Used Data Quality Validation techniques to validate Critical Data Elements (CDE) and identified various anomalies.
  • Extensively worked on Data Modeling tools Erwin Data Modeler to design the Data Models.
  • Developed various QlikView data models by extracting and using data from various source files, DB2, Excel, flat files, and Big Data sources.
  • Participated in all phases of data mining: data collection, data cleaning, developing models, validation, and visualization, and performed gap analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, BusinessObjects, Power BI, and SmartView.
  • Implemented Agile Methodology for building an internal application.
  • Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Secondary Name Node, and MapReduce concepts.
  • As Architect delivered various complex OLAP Databases/Cubes, Scorecards, Dashboards and Reports.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, Pandas).
  • Implemented Classification using supervised algorithms like Logistic Regression, Decision Trees, KNN, Naive Bayes.
  • Updated Python scripts to match training data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
  • Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL.
  • Designed and developed Use Case, Activity, and Sequence Diagrams and OOD (Object-Oriented Design) using UML and Visio.
  • Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
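
Illustrative only: a minimal PySpark sketch of the Spark DataFrame / Spark SQL / MLlib POC pattern referenced above; the parquet path, feature columns, and label column are hypothetical placeholders, not taken from the project itself.

    # Sketch: logistic regression POC with Spark DataFrames and the DataFrame-based MLlib API.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("mllib-poc").getOrCreate()

    df = spark.read.parquet("hdfs:///data/events.parquet")       # hypothetical source
    assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
    train, test = assembler.transform(df).randomSplit([0.8, 0.2], seed=42)

    lr = LogisticRegression(featuresCol="features", labelCol="label", maxIter=20)
    model = lr.fit(train)
    model.transform(test).select("label", "prediction", "probability").show(5)

    spark.stop()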

Environment: R, Python, HDFS, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, MapReduce, Rational Rose, SQL, and MongoDB.

Data Scientist

Confidential - Waukesha, WI

Responsibilities:

  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python for developing various machine learning algorithms.
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
  • Setup storage and data analysis tools in Amazon Web Services cloud computing infrastructure.
  • Implemented end-to-end systems for Data Analytics, Data Automation and integrated with custom visualization tools using R, Mahout, Hadoop and MongoDB.
  • Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods including classification and dimensionality reduction; used the resulting engine to increase user lifetime by 45% and triple user conversions for target categories.
  • Used Spark DataFrames, Spark SQL, and Spark MLlib extensively, developing and designing POCs using Scala, Spark SQL, and the MLlib libraries.
  • Used Data Quality Validation techniques to validate Critical Data Elements (CDE) and identified various anomalies.
  • Participated in all phases of data mining: data collection, data cleaning, developing models, validation, and visualization, and performed gap analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, and SmartView.
  • Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name node, Data node, Secondary Name node, and MapReduce concepts.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, Pandas)
  • Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, KNN, Naive Bayes.
  • Updated Python scripts to match training data with our database stored in AWS CloudSearch, so that each document could be assigned a response label for further classification (see the label-matching sketch after this list).
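
Illustrative only: a minimal pandas sketch of matching training documents to response labels by a shared id, as referenced above; the file names and columns are hypothetical, and the AWS CloudSearch integration itself is not reproduced here.

    # Sketch: attach response labels to training documents by id for supervised classification.
    import pandas as pd

    docs = pd.read_json("documents.json", lines=True)      # hypothetical columns: doc_id, text, ...
    labels = pd.read_csv("label_lookup.csv")                # hypothetical columns: doc_id, response_label

    train = docs.merge(labels, on="doc_id", how="left")
    unmatched = train["response_label"].isna().sum()
    print(f"{unmatched} documents had no matching label")

    train = train.dropna(subset=["response_label"])
    train.to_csv("labeled_training_data.csv", index=False)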

Environment: AWS, Python, HDFS, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, MapReduce, Rational Rose, SQL, and MongoDB.

Data Modeler/ Data Analyst

Confidential - Atlanta, GA

Responsibilities:

  • Involved in requirement gathering and data analysis; interacted with business users to understand the reporting requirements and analyze BI needs for the user community.
  • Created Entity/Relationship Diagrams, grouped and created the tables, validated the data, identified PKs for lookup tables.
  • Created and maintained logical, dimensional data models for different Claim types and HIPAA Standards.
  • Implemented one-to-many and many-to-many entity relationships in the data modeling of the data warehouse.
  • Experience working with the MDM team on various business operations within the organization.
  • Identified the Primary Key and Foreign Key relationships across entities and across subject areas.
  • Developed ETL routines using SSIS packages, to plan an effective package development process and design the control flow within the packages.
  • Took an active role in the design, architecture, and development of user interface objects in QlikView applications. Connected to various data sources like SQL Server, Oracle, and flat files.
  • Presented the Dashboard to Business users and cross-functional teams, define KPIs (Key Performance Indicators), and identify data sources.
  • Designed ETL data flows that extract, transform, and load data while optimizing SSIS performance.
  • Delivered end-to-end mapping from source (Guidewire application) to target (CDW), and mapped legacy system coverages to the Landing Zone and the Guidewire Reporting Pack.
  • Involved in loading the data from Source Tables to Operational Data Source tables using Transformation and Cleansing Logic.
  • Performed data accuracy, data analysis, and data quality checks before and after loading the data (a brief validation sketch follows this list).
  • Resolved the data type inconsistencies between the source systems and the target system using the Mapping Documents.
  • Created ad-hoc reports to users in Tableau by connecting various data sources.
  • Worked on the reporting requirements for the data warehouse.
  • Created support documentation and worked closely with production support and testing team.
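
Illustrative only: a minimal pandas sketch of the kind of pre-/post-load data quality checks referenced above; the extract files, key column, and amount column are hypothetical placeholders rather than details from this engagement.

    # Sketch: basic data-quality checks comparing a source extract against the loaded target.
    import pandas as pd

    source = pd.read_csv("source_extract.csv")
    target = pd.read_csv("target_extract.csv")

    checks = {
        "row_count_matches": len(source) == len(target),
        "source_duplicate_keys": int(source["claim_id"].duplicated().sum()),
        "target_null_keys": int(target["claim_id"].isna().sum()),
        "amount_total_diff": float(source["claim_amount"].sum() - target["claim_amount"].sum()),
    }
    for name, result in checks.items():
        print(f"{name}: {result}")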

Environment: Erwin8.2, Oracle 11g, OBIEE, Crystal Reports, Toad, Sybase Power Designer, Datahub, MS Visio, DB2, QlikView 11.6, Informatica.

Data Analyst

Confidential

Responsibilities:

  • Designed and built the dimensions and cubes with Star Schema and Snowflake Schema using SQL Server Analysis Services (SSAS).
  • Participated in JAD sessions with business users and sponsors to understand and document the business requirements in alignment with the financial goals of the company.
  • Involved in the analysis of business requirements, design and development of the high-level and low-level designs, and unit and integration testing.
  • Performed data analysis and data profiling using complex SQL on various source systems including Teradata and SQL Server.
  • Developed the logical and physical data models that capture existing and potential data elements and data flows using ER Studio.
  • Performed second and third normal form normalization for the ER data model of the OLTP system.
  • Worked with data compliance and data governance teams to maintain data models, metadata, and data dictionaries; defined source fields and their definitions.
  • Translated business and data requirements into logical data models in support of Enterprise Data Models, ODS, OLAP, OLTP, Operational Data Structures, and analytical systems.
  • Designed and modeled the reporting data warehouse considering current and future reporting requirements.
  • Involved in the daily maintenance of the database that involved monitoring the daily run of the scripts as well as troubleshooting in the event of any errors in the entire process.
  • Worked with Data Scientists to create data marts for data science-specific functions.
  • Determined data rules and conducted Logical and Physical design reviews with business analysts, developers, and DBAs.
  • Used external loaders like MultiLoad, TPump, and FastLoad to load data into Oracle, and performed database analysis, development, testing, implementation, and deployment.
  • Reviewed the logical model with application developers,ETL Team,DBAs, and testing team to provide information about the data model and business requirements.

Environment: Erwin r7.0, Informatica 6.2, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes MS Visio, Rational Rose, Requisite Pro, Hadoop, PL/SQL.

Machine learning intern

Confidential

Responsibilities:

  • Developed a GUI application using Qt on Linux to classify navigation system test data using Gaussian anomaly models (a brief anomaly-detection sketch follows this list).
  • Worked on the development of data warehouse, Data Lake, and ETL systems using relational and non-relational tools like SQL and NoSQL.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and Smart View.
  • Implemented Agile Methodology for building an internal application.
  • Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Secondary Name Node, and Map Reduce concepts.
  • As Architect delivered various complex OLAP databases/cubes, scorecards, dashboards and reports.
  • Programmed a utility in Python that used multiple packages (scipy, numpy, pandas).
  • Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, KNN, Naïve Bayes.
  • Used Teradata 15 utilities such as FastExport and MultiLoad (MLOAD) for various data migration/ETL tasks from OLTP source systems to OLAP target systems.
  • Provided maintenance support within the testing team for System Testing, Integration Testing, and UAT.
  • Involved in preparation & design of technical documents like Bus Matrix Document, PPDM Model, and LDM & PDM.
  • Understood client business problems and analyzed the data using appropriate statistical models to generate insights.
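
Illustrative only: a minimal NumPy/SciPy sketch of flagging test records with a multivariate Gaussian anomaly model, as referenced in the first bullet above; the data file, delimiter, and 2% threshold are hypothetical placeholders, and the original Qt GUI is omitted.

    # Sketch: multivariate Gaussian anomaly detection over numeric test records.
    import numpy as np
    from scipy.stats import multivariate_normal

    X = np.loadtxt("nav_test_data.csv", delimiter=",")      # rows = test records (hypothetical file)

    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    model = multivariate_normal(mean=mu, cov=cov, allow_singular=True)

    densities = model.pdf(X)
    threshold = np.percentile(densities, 2)                  # lowest 2% of densities flagged as anomalies
    anomalies = np.where(densities < threshold)[0]
    print("anomalous record indices:", anomalies)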
