Data Scientist Resume

Denver, CO

PROFESSIONAL SUMMARY:

  • 6+ years of IT industry experience encompassing Machine Learning, Data Mining with large datasets of structured and unstructured data, Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization.
  • Extensive experience in Text Analytics, developing statistical machine learning and data mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau.
  • 2+ years of experience with machine learning techniques and algorithms (such as k-NN, Naive Bayes, etc.).
  • Experience with object-oriented programming (OOP) concepts using Python, C++, and PHP.
  • Highly skilled in using visualization tools like Tableau, ggplot2, and d3.js for creating dashboards.
  • Experience with foundational machine learning models and concepts: regression, random forest, boosting, deep learning.
  • Proficiency in statistical and other tools/languages: R, Python, C, C++, SQL, and the QlikView data visualization tool.
  • Familiar with deep learning projects: CNNs for image identification, RNNs for stock price prediction, autoencoders for a movie recommender system (PyTorch), and image captioning (CNN-RNN autoencoder architecture).
  • Exposure to AI and deep learning platforms/methodologies such as TensorFlow, RNNs, and LSTMs.
  • Built LSTM neural networks for text such as item descriptions and comments.
  • Experience training artificial intelligence chatbots.
  • Built deep neural networks combining LSTM outputs with other features (see the sketch after this list).
  • Experience extracting data and creating value-added datasets using Python, R, Azure, and SQL to analyze customer behavior, uncover hidden insights within the data, and target specific customer segments in support of project objectives.
  • Worked with NoSQL databases including HBase, Cassandra, and MongoDB.
  • Extensively worked on statistical analysis tools and adept at writing code in Advanced Excel, R, MATLAB, Python.
  • Implemented deep learning models and numerical computation with data flow graphs using TensorFlow.
  • Experienced with the full software development lifecycle (SDLC) under Agile and Scrum methodologies.
  • Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, neural networks, and Principal Component Analysis, with good knowledge of Recommender Systems.
  • Experienced with machine learning algorithms such as logistic regression, random forest, XGBoost, KNN, SVM, neural networks, linear regression, lasso regression, and k-means.
  • Skilled in using dplyr in R and pandas in Python for exploratory data analysis.
  • Experience working with data modeling tools like Erwin, Power Designer, and ER/Studio.
  • Experience with data analytics, data reporting, Ad-hoc reporting, Graphs, Scales, PivotTables and OLAP reporting.
  • Worked and extracted data from various database sources like Oracle, SQL Server, DB2, and Teradata.
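
A minimal sketch of the kind of network described above (an LSTM over text combined with other engineered features), using the Keras functional API; the vocabulary size, sequence length, and feature count are illustrative placeholders:

    from tensorflow.keras import Model, layers

    # Text branch: token ids -> embedding -> LSTM summary vector.
    # Vocabulary size and sequence length are illustrative.
    text_in = layers.Input(shape=(100,), name="text_tokens")
    x = layers.Embedding(input_dim=10000, output_dim=64)(text_in)
    x = layers.LSTM(32)(x)

    # Second branch: other engineered (non-text) features.
    num_in = layers.Input(shape=(8,), name="numeric_features")

    # Concatenate the LSTM output with the extra features, as in the bullet above.
    merged = layers.Concatenate()([x, num_in])
    out = layers.Dense(1, activation="sigmoid")(merged)

    model = Model(inputs=[text_in, num_in], outputs=out)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.summary()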

TECHNICAL SKILLS:

Languages & Machine Learning Algorithms: Python, R, Polynomial Regression, Random Forest, Logistic Regression, Decision Trees, Classification, Clustering, Association, Simple/Multiple Linear Regression, Kernel SVM, K-Nearest Neighbors (K-NN)

OLAP/BI/ETL Tools: Business Objects 6.1/XI, MS SQL Server 2008/2005 Analysis Services (MS OLAP, SSAS), Integration Services (SSIS), Reporting Services (SSRS), Performance Point Server (PPS), Oracle 9i OLAP, MS Office Web Components (OWC11), DTS, MDX, Crystal Reports 10, Crystal Enterprise 10 (Confidential)

Web Technologies: JDBC, HTML5, DHTML and XML, CSS3, Web Services, WSDL

Tools: Erwin r9.6/9.5/9.1/8.x, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner.

Big Data Technologies: Spark, Hive, HDFS, MapReduce, Pig, Kafka.

Databases: SQL, Hive, Impala, Pig, Spark SQL, SQL Server, MySQL, MS Access, HDFS, HBase, Teradata, Netezza, MongoDB, Cassandra, SAP HANA.

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio), Tableau, Crystal Reports XI, Business Intelligence, SSRS, Business Objects 5.x/6.x, Cognos 7.0/6.0.

Version Control Tools: SVN, GitHub.

Project Execution Methodologies: Ralph Kimball and Bill Inmon data warehousing methodology, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD).

BI Tools: Tableau, Tableau Server, Tableau Reader, SAP BusinessObjects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure Data Warehouse, Amazon Web Services (AWS): SageMaker

Operating Systems: Windows, Linux, Unix, macOS, Red Hat.

PROFESSIONAL EXPERIENCE:

Confidential, Denver, CO

Data Scientist

Responsibilities:

  • Studied hemoglobin level changes in patients using deep learning concepts.
  • The project broadly involved data collection, statistical analysis, and advanced data mining.
  • Streamlined a huge dataset from a Netezza database (220 million rows) and stored it on a GPU server, managed by Jenkins, for further analysis.
  • Performed EDA on this dataset to arrive at a more efficient subset that could yield good training examples.
  • Cleaned the collected data, including imputation for categorical and numerical variables and dropping columns with more than 50% null values.
  • Applied standard preprocessing methods such as normalization, regularization, scaling, and one-hot encoding (see the preprocessing sketch after this list).
  • Optimized model building after identifying important features using methods such as RFE, random forest, LightGBM, and SHAP.
  • Built models for time-series evaluation of the data using deep neural networks (LSTM, RNN, ANN), arriving at a stacked autoencoder feedforward neural network and probabilistic models (HMM, GMM).
  • Developed a classification system for finding clickbait in media content using scikit-learn and TensorFlow.
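
A minimal sketch of the cleaning and preprocessing steps described above (dropping mostly-null columns, imputation, scaling, one-hot encoding) using pandas and scikit-learn; the threshold and strategies are illustrative:

    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.impute import SimpleImputer
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    def preprocess(df: pd.DataFrame):
        # Drop columns that are more than 50% null, as in the bullet above.
        df = df.loc[:, df.isnull().mean() <= 0.5]

        num_cols = df.select_dtypes(include="number").columns
        cat_cols = df.select_dtypes(exclude="number").columns

        # Impute and scale numeric features; impute and one-hot encode categoricals.
        transformer = ColumnTransformer([
            ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                              ("scale", StandardScaler())]), num_cols),
            ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                              ("onehot", OneHotEncoder(handle_unknown="ignore"))]), cat_cols),
        ])
        return transformer.fit_transform(df)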

Environment: Python, SQL, Netezza, SQL Server, PL/SQL, T-SQL, MLlib, regression, logistic regression, Hadoop, PySpark, Teradata, random forest, Google Cloud, OLAP, Deep Learning (Keras, TensorFlow), PyTorch, Azure, GPU, Jenkins, Kibana, SVM, JSON.

Confidential, Minneapolis, MN

Data Scientist/ Machine Learning Engineer

Responsibilities:

  • Provided the architectural leadership in shaping strategic, business technology projects, with an emphasis on application architecture.
  • Utilized domain knowledge and application portfolio knowledge to play a key role in defining the future state of large, business technology programs.
  • Participated in all phases of data mining, data collection, data cleaning, developing models, validation, and visualization and performed Gap analysis.
  • Used Pandas, NumPy, seaborn, SciPy, Matplotlib, Scikit-learn, NLTK in Python for developing various machine learning algorithms and utilized machine learning algorithms such as linear regression, multivariate regression, naive Bayes, Random Forests, K-means, & KNN for data analysis.
  • Developed a classification system for finding clickbait in media content using scikit-learn and TensorFlow (see the sketch after this list).
  • Spearheaded chatbot development initiative to improve customer interaction with application.
  • Developed the chatbot using api.ai.
  • Automated CSV to chatbot-friendly JSON transformation by writing NLP scripts, reducing development time by 20%.
  • Deployed machine learning models for item-item similarity on Amazon SageMaker (AWS).
  • Conducted studies and rapid plotting, using advanced data mining and statistical modeling techniques to build a solution that optimized data quality and performance.
  • Demonstrated experience in the design and implementation of statistical models, predictive models, enterprise data models, metadata solutions, and data lifecycle management in both RDBMS and Big Data environments.
  • Analyzed large datasets, applied machine learning techniques, and developed predictive and statistical models, enhancing them by leveraging best-in-class modeling techniques.
  • Designed a model to monitor illegal waste dumping activity using the Caffe deep learning framework; implemented TensorRT on the Caffe model to increase memory efficiency.
  • Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS.
  • Worked on customer segmentation using an unsupervised learning technique - clustering.
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods including classification, regression, dimensionality reduction, etc.
  • Developed LINUX Shell scripts by using NZSQL/NZLOAD utilities to load data from flat files to Netezza database.
  • Designed and implemented system architecture for Amazon EC2 based cloud-hosted solution for the client.
  • Tested Complex ETL Mappings and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
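
A minimal sketch of the scikit-learn side of the clickbait classifier mentioned above, using a TF-IDF plus logistic regression pipeline; the headlines and labels are made-up placeholders:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline

    # Hypothetical labeled examples: 1 = clickbait, 0 = not clickbait.
    headlines = [
        "You won't believe what happened next",
        "Fed raises interest rates by 0.25%",
    ]
    labels = [1, 0]

    # TF-IDF features over unigrams and bigrams, fed to a logistic regression.
    clf = Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
        ("model", LogisticRegression(max_iter=1000)),
    ])
    clf.fit(headlines, labels)
    print(clf.predict(["This one trick will change your life"]))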

Environment: Erwin r9.6, Python, SQL, Oracle 12c, Netezza, SQL Server, PL/SQL, T-SQL, MLlib, regression, cluster analysis, Spark, logistic regression, Hadoop, PySpark, Teradata, random forest, OLAP, Deep Learning (Keras, TensorFlow), Azure, ODS, JSON, Tableau, XML, Cassandra, MapReduce, AWS SageMaker.

Confidential, Birmingham, AL

Python Developer/ Machine Learning Engineer

Responsibilities:

  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python for developing various machine learning algorithms.
  • Installed and used Caffe NLP Framework.
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
  • Setup storage and data analysis tools in Amazon Web Services cloud computing infrastructure.
  • Implemented end-to-end systems for Data Analytics, Data Automation and integrated with custom visualization tools using R, Mahout, Hadoop and MongoDB.
  • Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods including classification, regression, dimensionality reduction, etc., and utilized the engine to increase user lifetime by 45% and triple user conversions for target categories.
  • Used Data Quality Validation techniques to validate Critical Data Elements (CDE) and identified various anomalies.
  • Extensively worked on Data Modeling tools Erwin Data Modeler to design the Data Models.
  • Developed various QlikView data models by extracting and using data from various source files, DB2, Excel, flat files, and Big Data.
  • Participated in all phases of data mining: data collection, data cleaning, developing models, validation, and visualization, and performed gap analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, BusinessObjects, Power BI, and SmartView.
  • Implemented Agile Methodology for building an internal application.
  • Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Secondary Name Node, and MapReduce concepts.
  • As an architect, delivered various complex OLAP databases/cubes, scorecards, dashboards, and reports.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, Pandas).
  • Implemented classification using supervised algorithms like Logistic Regression, Decision Trees, KNN, and Naive Bayes (compared in the sketch after this list).
  • Designed both 3NF data models for ODS, OLTP systems and Dimensional Data Models using Star and Snowflake Schemas.
  • Updated Python scripts to match training data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
  • Created SQL tables with referential integrity and developed queries using SQL, SQLPLUS and PL/SQL.
  • Designed and developed Use Case, Activity Diagrams, Sequence Diagrams, OOD (Object oriented Design) using UML and Visio.
  • Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
  • Identified and executed process improvements; hands-on in various technologies such as Oracle and Business Objects.
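
A minimal sketch comparing the supervised classifiers named above with cross-validation in scikit-learn; the built-in breast cancer dataset stands in for the real project data:

    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    # Placeholder dataset; the real project used domain data instead.
    X, y = load_breast_cancer(return_X_y=True)

    models = {
        "logistic_regression": LogisticRegression(max_iter=5000),
        "decision_tree": DecisionTreeClassifier(max_depth=5),
        "knn": KNeighborsClassifier(n_neighbors=5),
        "naive_bayes": GaussianNB(),
    }
    for name, model in models.items():
        # 5-fold cross-validated accuracy for each candidate classifier.
        scores = cross_val_score(model, X, y, cv=5)
        print(f"{name}: mean accuracy {scores.mean():.3f}")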

Environment: R, Python, HDFS, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, MapReduce, Rational Rose, SQL, and MongoDB.

Confidential

Python Developer

Responsibilities:

  • Worked on the project from gathering requirements to developing the entire application.
  • Created, activated, and programmed in Anaconda Python environments.
  • Wrote programs for performance calculations using NumPy and SQLAlchemy.
  • Wrote Python routines to log into websites and fetch data for selected options.
  • Used the Python modules urllib, urllib2, and Requests for web crawling (see the crawling sketch after this list).
  • Extensive experience in text analytics, developing statistical machine learning and data mining solutions for various business problems, and generating data visualizations using R, Python, and Tableau.
  • Involved in developing Web Services using SOAP to send and receive data from the external interface in XML format, with packages such as Beautiful Soup used for data parsing.
  • Worked on development of SQL and stored procedures on MySQL.
  • Analyzed the codebase and reduced code redundancy to an optimal level.
  • Designed and built a text classification application using different text classification models.
  • Used Jira for defect tracking and project management.
  • Worked on writing and reading data in CSV and Excel file formats.
  • Involved in Sprint planning sessions and participated in the daily Agile SCRUM meetings.
  • Conducted daily scrums as part of the Scrum Master role.
  • Developed the project in Linux environment.
  • Worked on resulting reports of the application.
  • Performed QA testing on the application.
  • Held meetings with the client and delivered the entire project with limited help from the client.
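
A minimal sketch of the web-crawling routines described above, using Requests and Beautiful Soup (the modern equivalents of the urllib/urllib2 calls); the URL is a placeholder:

    import requests
    from bs4 import BeautifulSoup

    def fetch_link_texts(url):
        # Fetch the page and parse out the text of each link.
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        return [a.get_text(strip=True) for a in soup.find_all("a")]

    if __name__ == "__main__":
        # Placeholder URL; the real scripts logged in and fetched selected options.
        for text in fetch_link_texts("https://example.com"):
            print(text)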

Environment: Python, Anaconda, Spyder (IDE), Windows 7, Teradata, Requests, urllib, urllib2, Beautiful Soup, Tableau, Python libraries such as NumPy and SQLAlchemy, MySQL

Confidential

Software Engineer

Responsibilities:

  • Designed and developed the user interface of the project with HTML, CSS, and JavaScript.
  • Developed the entire front-end and back-end modules of the project using Python with the Django framework.
  • Designed and developed the data management systems using MySQL databases.
  • Wrote Python scripts to parse XML documents and load the data into the database (see the sketch after this list).
  • Utilized existing Python and Django modules, rewriting them to deliver data in the required formats.
  • Performed client-side validations and manipulations using JavaScript and jQuery.
  • Experienced in writing indexes, views, constraints, stored procedures, triggers, cursors, and user-defined functions and subroutines in MySQL.
  • Responsible for Debugging and troubleshooting the application.
  • Utilized the Subversion version control tool to coordinate team work.
  • Used Selenium libraries to write fully functioning test automation processes.
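
A minimal sketch of the XML-parsing scripts described above, loading parsed records into MySQL; the element names, table schema, and connection parameters are all hypothetical:

    import xml.etree.ElementTree as ET
    import mysql.connector  # assumes the mysql-connector-python package

    def load_users(xml_path):
        # Parse hypothetical <user name="..." email="..."/> records from the document.
        tree = ET.parse(xml_path)
        conn = mysql.connector.connect(
            host="localhost", user="app", password="secret", database="appdb"  # placeholders
        )
        cur = conn.cursor()
        for user in tree.getroot().iter("user"):
            cur.execute(
                "INSERT INTO users (name, email) VALUES (%s, %s)",
                (user.get("name"), user.get("email")),
            )
        conn.commit()
        conn.close()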

Environment: Python 2.6, Django, UNIX, HTML, XML, CSS, JavaScript, MySQL and Bugzilla.
