
Data Scientist/Machine Learning Resume

SUMMARY

  • 6+ years of IT industry experience encompassing Machine Learning, data mining with large datasets of structured and unstructured data, data acquisition, data validation, predictive modeling, and data visualization.
  • Extensive experience in text analytics, developing statistical machine learning and data mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau.
  • 5+ years of experience with machine learning techniques and algorithms such as k-NN and Naive Bayes.
  • Experience with object-oriented programming (OOP) concepts using Python, C++, and PHP.
  • Knowledge of advanced SAS programming techniques, such as PROC SQL (JOIN/UNION), PROC APPEND, PROC DATASETS, and PROC TRANSPOSE.
  • Integration Architect & Data Scientist experience in Analytics, Big data, BPM, SOA, ETL and Cloud technologies.
  • Highly skilled in using visualization tools like Tableau, ggplot2, and d3.js for creating dashboards.
  • Experience with foundational machine learning models and concepts: regression, random forests, boosting, GBMs, neural networks, HMMs, CRFs, MRFs, and deep learning.
  • Proficiency in statistical and general-purpose tools/languages: R, Python, C, C++, Java, SQL, UNIX, the QlikView data visualization tool, and the Anaplan forecasting tool.
  • Proficient in integrating various data sources with multiple relational databases (Oracle, MS SQL Server, DB2, Teradata) and flat files into the staging area, ODS, data warehouse, and data marts.
  • Familiar with deep learning projects: CNNs for image identification, RNNs for stock price prediction, autoencoders for a movie recommender system (PyTorch), and image captioning (CNN-RNN encoder-decoder architecture).
  • Exposure to AI and deep learning platforms/methodologies such as TensorFlow, RNNs, and LSTMs.
  • Experience extracting data to create value-added datasets using Python, R, Azure, and SQL, analyzing customer behavior to target specific customer segments and uncover hidden insights that serve project objectives.
  • Worked with NoSQL databases including HBase, Cassandra, and MongoDB.
  • Extensively worked with statistical analysis tools and adept at writing code in advanced Excel, R, MATLAB, and Python.
  • Implemented deep learning models and numerical computation with dataflow graphs using TensorFlow.
  • Worked with applications such as R, Stata, Scala, Perl, and SPSS to develop neural networks and cluster analyses.
  • Experienced in the full software lifecycle under SDLC, Agile, and Scrum methodologies.
  • Experience designing visualizations using Tableau and publishing and presenting dashboards and storylines on web and desktop platforms.
  • Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, clustering, neural networks, and Principal Component Analysis, with good knowledge of recommender systems (a brief sketch follows this list).
  • Strong experience in the Software Development Life Cycle (SDLC), including requirements analysis, design specification, and testing, in both Waterfall and Agile methodologies.
  • Skilled in using dplyr (R) and pandas (Python) for exploratory data analysis.
  • Experience working with data modeling tools like Erwin, Power Designer and ER Studio.
  • Experience with data analytics, data reporting, Ad-hoc reporting, Graphs, Scales, PivotTables and OLAP reporting.
  • Worked and extracted data from various database sources like Oracle, SQL Server, DB2, and Teradata.
  • Proficient knowledge of statistics, mathematics, machine learning, recommendation algorithms and analytics with an excellent understanding of business operations and analytics tools for effective analysis of data.
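
As a brief illustration of the classification techniques listed above, the sketch below compares a few of the named classifiers with scikit-learn. It is a minimal sketch on a synthetic dataset, not code from any project described in this resume; all data and hyperparameters are illustrative assumptions.

    # Minimal sketch: comparing classifiers named above on synthetic data.
    # Dataset and hyperparameters are hypothetical placeholders.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.naive_bayes import GaussianNB

    # Synthetic stand-in for a real business dataset
    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=42)

    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
        "naive_bayes": GaussianNB(),
    }
    for name, model in models.items():
        model.fit(X_train, y_train)                         # train on the split
        print(name, round(model.score(X_test, y_test), 3))  # held-out accuracy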

TECHNICAL SKILLS

Data Modeling Tools: Erwin r9.6/9.5, ER/Studio 9.7, Star-Schema Modeling, Snowflake-Schema Modeling, fact and dimension tables, Pivot Tables.

Databases: Oracle, MS Access, MS SQL Server, Sybase, DB2, Teradata, Hive, MySQL, SQLite, HBase, MongoDB.

Machine Learning Tools: OpenCV, Theano, TensorFlow, Pygame, OpenGL, NumPy, SymPy, SciPy, Pandas.

Big Data Tools: Hadoop, Hive, Spark, Pig, HBase, Sqoop, Flume.

Web Technologies: Django, HTML5, CSS3, XHTML, JavaScript, React JS, XML, SOAP, REST, Bootstrap, JSON, AJAX.

R Packages: dplyr, sqldf, data.table, randomForest, gbm, caret, elastic net (glmnet), and assorted machine learning packages.

BI Tools: Tableau 7.0/8.2, Tableau Server 8.2, Tableau Reader 8.1, SAP BusinessObjects, Crystal Reports.

Operating Systems: Microsoft Windows 8/7/XP, Linux and UNIX

Languages: SAS/STAT, SAS/ETS, SAS E-Miner, SPSS, SQL, PL/SQL, T-SQL, ASP, Visual Basic, XML, Python, C, C++, Java, HTML, UNIX shell scripting, Perl, R, Scala, MATLAB, Spark, Power BI.

Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Waterfall Model.

PROFESSIONAL EXPERIENCE

Data Scientist/Machine Learning

Confidential

Responsibilities:

  • Built models using statistical techniques like Bayesian HMMs and machine learning classification models like XGBoost, SVM, and Random Forest.
  • Completed a highly immersive data science program involving data manipulation and visualization, web scraping, machine learning, Python programming, SQL, Git, Unix commands, NoSQL, MongoDB, and Hadoop.
  • Set up storage and data analysis tools in the Confidential Web Services cloud computing infrastructure.
  • Used pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, and NLTK in Python for developing various machine learning algorithms.
  • Installed and used the Caffe deep learning framework.
  • Worked on different data formats such as JSON and XML and performed machine learning algorithms in Python.
  • Worked with Data Architects and IT Architects to understand the movement and storage of data, using ER Studio 9.7.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, BusinessObjects, Power BI, and Smart View.
  • Worked on customer segmentation using an unsupervised learning technique: clustering.
  • Worked with various Teradata 15 tools and utilities like Teradata Viewpoint, MultiLoad, ARC, Teradata Administrator, BTEQ, and other Teradata utilities.
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python for a broad variety of machine learning methods, including classification, regression, and dimensionality reduction.
  • Developed Linux shell scripts using the NZSQL/NZLOAD utilities to load data from flat files into a Netezza database.
  • Good knowledge of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
  • As an architect, delivered various complex OLAP databases/cubes, scorecards, dashboards, and reports.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, pandas).
  • Implemented classification using supervised algorithms like Logistic Regression, Decision Trees, KNN, and Naive Bayes.
  • Used Teradata 15 utilities such as FastExport and MultiLoad (MLOAD) for data migration/ETL tasks from OLTP source systems to OLAP target systems.
  • Experience with Hadoop ecosystem components like MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, and Flume, including their installation and configuration.
  • Updated Python scripts to match training data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
  • Performed data transformation from various sources, data organization, and feature extraction from raw and stored data.
  • Validated the machine learning classifiers using ROC curves and lift charts (see the sketch after this list).
  • Extracted data from HDFS and prepared data for exploratory analysis using data munging.
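
A minimal sketch of the ROC-based validation mentioned above, using scikit-learn on synthetic data; the model and dataset are placeholder assumptions, not the production classifiers.

    # Minimal sketch: validating a classifier with a ROC curve and AUC.
    # Synthetic data stands in for the real training set.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_curve, roc_auc_score

    X, y = make_classification(n_samples=2000, n_features=15, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    scores = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

    fpr, tpr, thresholds = roc_curve(y_test, scores)  # points on the ROC curve
    print("AUC:", round(roc_auc_score(y_test, scores), 3))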

Environment: ER Studio 9.7, Tableau 9.03, AWS, Teradata 15, MDM, GIT, Unix, Python 3.5.2, MLlib, SAS, Regression, Logistic Regression, Hadoop, NoSQL, OLTP, Random Forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML, MapReduce.

Data Scientist/Machine Learning

Confidential

Responsibilities:

  • Worked with several R packages including knitr, dplyr, SparkR, CausalInfer, and SpaceTime.
  • Coded R functions to interface with Caffe Deep Learning Framework.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python for developing various machine learning algorithms.
  • Installed and used Caffe NLP Framework.
  • Worked on different data formats such as JSON and XML and performed machine learning algorithms in Python.
  • Setup storage and data analysis tools in Amazon Web Services cloud computing infrastructure.
  • Implemented end-to-end systems for data analytics and data automation, integrated with custom visualization tools, using R, Mahout, Hadoop, and MongoDB.
  • Worked with Data Architects and IT Architects to understand the movement and storage of data, using ER Studio 9.7.
  • Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, and Python for a broad variety of machine learning methods, including classification, regression, and dimensionality reduction; used the resulting engine to increase user lifetime by 45% and triple user conversions for target categories.
  • Used Spark DataFrames, Spark SQL, and Spark MLlib extensively, designing and developing POCs using Scala, Spark SQL, and MLlib (a sketch follows this list).
  • Used Data Quality Validation techniques to validate Critical Data Elements (CDE) and identified various anomalies.
  • Extensively worked with the Erwin Data Modeler tool to design data models.
  • Developed various QlikView data models by extracting and using data from various sources, including DB2, Excel, flat files, and big data platforms.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, BusinessObjects, Power BI, and SmartView.
  • Implemented Agile Methodology for building an internal application.
  • Good knowledge of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
  • As an architect, delivered various complex OLAP databases/cubes, scorecards, dashboards, and reports.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, Pandas).
  • Implemented classification using supervised algorithms like Logistic Regression, Decision Trees, KNN, and Naive Bayes.
  • Updated Python scripts to match training data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
  • Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL.
  • Designed and developed use case diagrams, activity diagrams, and sequence diagrams for object-oriented design (OOD) using UML and Visio.
  • Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
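
The POCs above were developed in Scala; the sketch below shows the equivalent Spark DataFrame + MLlib workflow in PySpark for consistency with the other Python examples. The toy rows and column names are hypothetical assumptions, not taken from the project.

    # Minimal PySpark sketch of the DataFrame + MLlib workflow described above.
    # In-memory toy rows stand in for data that would come from Hive/HBase/Kafka.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

    df = spark.createDataFrame(
        [(0.0, 1.2, 0.7, 0.0), (1.5, 0.3, 2.1, 1.0),
         (0.2, 1.8, 0.9, 0.0), (2.2, 0.1, 1.7, 1.0)],
        ["f1", "f2", "f3", "label"],  # hypothetical feature/label columns
    )

    # Assemble raw columns into the single vector column MLlib expects
    assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
    train = assembler.transform(df)

    model = LogisticRegression(featuresCol="features", labelCol="label").fit(train)
    model.transform(train).select("label", "prediction").show()

    spark.stop()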

Environment: R, Python, HDFS, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, MapReduce, Rational Rose, SQL, and MongoDB.
