Data Scientist/Machine Learning Engineer Resume
Minneapolis, MN
SUMMARY:
- 8+ years of IT industry experience encompassing Machine Learning, Data Mining with large datasets of structured and unstructured data, Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization.
- Extensive experience in Text Analytics, developing statistical machine learning and data mining solutions to various business problems and generating data visualizations using R, Python, and Tableau.
- 5+ years of experience with machine learning techniques and algorithms (such as k-NN, Naive Bayes, etc.).
- Experience with object-oriented programming (OOP) concepts using Python, C++, and PHP.
- Knowledge of advanced SAS programming techniques, such as PROC SQL (JOIN/UNION), PROC APPEND, PROC DATASETS, and PROC TRANSPOSE.
- Integration Architect and Data Scientist experience in Analytics, Big Data, BPM, SOA, ETL, and Cloud technologies.
- Highly skilled in using visualization tools like Tableau, ggplot2, and d3.js for creating dashboards.
- Experience with foundational machine learning models and concepts: regression, random forests, boosting, GBMs, neural networks, HMMs, CRFs, MRFs, and deep learning.
- Proficient in statistical and other tools/languages: R, Python, C, C++, Java, SQL, UNIX, the QlikView data visualization tool, and the Anaplan forecasting tool.
- Proficient in integrating various data sources with multiple relational databases such as Oracle, MS SQL Server, DB2, Teradata, and flat files into the staging area, ODS, Data Warehouse, and Data Mart.
- Familiar with deep learning projects: CNNs for image identification, RNNs for stock price prediction, autoencoders for a movie recommender system (PyTorch), and image captioning (CNN-RNN encoder-decoder architecture).
- Exposure to AI and deep learning platforms/methodologies such as TensorFlow, RNNs, and LSTMs.
- Built LSTM neural networks for text fields such as item descriptions and comments.
- Experience in training artificial intelligence chatbots.
- Built deep neural networks combining LSTM outputs with other features (a minimal sketch follows this list).
- Experience in extracting data and creating value-added datasets using Python, R, Azure, and SQL to analyze customer behavior, target specific customer segments, and surface hidden insights that support project objectives.
- Worked with NoSQL Database including HBase, Cassandra and MongoDB.
- Extensively worked on statistical analysis tools and adept at writing code in Advanced Excel, R, MATLAB, Python.
- Implemented deep learning models and numerical computation with data flow graphs using TensorFlow.
- Worked with tools such as R, Stata, Scala, Perl, and SPSS to develop neural network and cluster analysis models.
- Experienced in the full software lifecycle under SDLC, Agile, and Scrum methodologies.
- Experience in designing stunning visualizations using Tableau software and publishing and presenting dashboards and storylines on web and desktop platforms.
- Hands-on experience in implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, neural networks, and Principal Component Analysis, with good knowledge of Recommender Systems.
- Experienced with machine learning algorithms such as logistic regression, random forest, XGBoost, KNN, SVM, neural networks, linear regression, lasso regression, and k-means.
- Strong experience in the Software Development Life Cycle (SDLC), including requirements analysis, design specification, and testing per cycle, in both Waterfall and Agile methodologies.
- Skilled in using dplyr in R and pandas in Python for performing exploratory data analysis.
- Experience working with data modeling tools like Erwin, PowerDesigner, and ER/Studio.
- Experience with data analytics, data reporting, Ad-hoc reporting, Graphs, Scales, PivotTables and OLAP reporting.
- Worked and extracted data from various database sources like Oracle, SQL Server, DB2, and Teradata.
- Proficient knowledge of statistics, mathematics, machine learning, recommendation algorithms and analytics with an excellent understanding of business operations and analytics tools for effective analysis of data.
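A minimal illustrative sketch of the LSTM-plus-features architecture mentioned above (all layer sizes, the feature count, and the binary target are assumptions for demonstration, not details of any project listed here):

```python
# Hypothetical sketch: embed tokenized text (e.g., item descriptions),
# run it through an LSTM, and concatenate the LSTM output with other
# tabular features before a final prediction layer.
from tensorflow.keras import layers, Model

text_in = layers.Input(shape=(100,), dtype="int32", name="tokens")  # padded token ids
other_in = layers.Input(shape=(8,), name="other_feats")             # assumed numeric features

x = layers.Embedding(input_dim=20000, output_dim=64)(text_in)
x = layers.LSTM(64)(x)                          # sequence -> fixed-size vector
x = layers.concatenate([x, other_in])           # LSTM output + other features
out = layers.Dense(1, activation="sigmoid")(x)  # assumed binary target

model = Model([text_in, other_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```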
TECHNICAL SKILLS:
Languages: Python, R
Machine Learning: Regression, Polynomial Regression, Random Forest, Logistic Regression, Decision Trees, Classification, Clustering, Association, Simple/Multiple Linear Regression, Kernel SVM, K-Nearest Neighbors (K-NN)
OLAP/BI/ETL Tools: Business Objects 6.1/XI, MS SQL Server 2008/2005 Analysis Services (MS OLAP, SSAS), Integration Services (SSIS), Reporting Services (SSRS), Performance Point Server (PPS), Oracle 9i OLAP, MS Office Web Components (OWC11), DTS, MDX, Crystal Reports 10, Crystal Enterprise 10 (CMC)
Web Technologies: JDBC, HTML5, DHTML, XML, CSS3, Web Services, WSDL
Tools: Erwin r9.6/9.5/9.1/8.x, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner
Big Data Technologies: Spark, Hive, HDFS, MapReduce, Pig, Kafka
Databases: SQL, Hive, Impala, Pig, Spark SQL, SQL Server, MySQL, MS Access, HDFS, HBase, Teradata, Netezza, MongoDB, Cassandra, SAP HANA
Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio), Tableau, Crystal Reports XI, Business Intelligence, SSRS, Business Objects 5.x/6.x, Cognos 7.0/6.0
Version Control Tools: SVN, GitHub
Project Execution Methodologies: Ralph Kimball and Bill Inmon data warehousing methodologies, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD)
BI Tools: Tableau, Tableau Server, Tableau Reader, SAP Business Objects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure SQL Data Warehouse
Operating Systems: Windows, Linux, UNIX, macOS, Red Hat
PROFESSIONAL EXPERIENCE:
Data Scientist/Machine Learning Engineer
Confidential, Minneapolis, MN
Responsibilities:
- Provided the architectural leadership in shaping strategic, business technology projects, with an emphasis on application architecture.
- Utilized domain knowledge and application portfolio knowledge to play a key role in defining the future state of large, business technology programs.
- Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
- Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
- Created ecosystem models (e.g. conceptual, logical, physical, canonical) that are required for supporting services within the enterprise data architecture (conceptual data model for defining the major subject areas used, ecosystem logical model for defining standard business meaning for entities and fields, and an ecosystem canonical model for defining the standard messages and formats to be used in data integration services throughout the ecosystem).
- Used Pandas, NumPy, seaborn, SciPy, Matplotlib, Scikit-learn, NLTK in Python for developing various machine learning algorithms and utilized machine learning algorithms such as linear regression, multivariate regression, naive Bayes, Random Forests, K-means, & KNN for data analysis.
- Spearheaded chatbot development initiative to improve customer interaction with application.
- Developed the chatbot using api.ai.
- Automated CSV-to-chatbot-friendly JSON transformation by writing NLP scripts, reducing development time by 20%.
- Conducted studies and rapid plotting, using advanced data mining and statistical modeling techniques to build solutions that optimize data quality and performance.
- Demonstrated experience in the design and implementation of statistical models, predictive models, enterprise data models, metadata solutions, and data life cycle management in both RDBMS and Big Data environments.
- Analyzed large datasets, applied machine learning techniques, and developed and enhanced predictive and statistical models by leveraging best-in-class modeling techniques.
- Worked on database design, relational integrity constraints, OLAP, OLTP, Cubes and Normalization (3NF) and De-normalization of the database.
- Worked on customer segmentation using an unsupervised learning technique, clustering (see the K-Means sketch after this list).
- Worked with various Teradata 15 tools and utilities like Teradata Viewpoint, MultiLoad, ARC, Teradata Administrator, BTEQ, and other Teradata utilities.
- Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods including classification, regression, dimensionality reduction, etc.
- Developed LINUX Shell scripts by using NZSQL/NZLOAD utilities to load data from flat files to Netezza database.
- Designed and implemented system architecture for Amazon EC2 based cloud-hosted solution for the client.
- Tested Complex ETL Mappings and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
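A minimal, hypothetical sketch of the segmentation step described above, using K-Means from scikit-learn; the synthetic RFM-style features stand in for the real behavioral data:

```python
# Hypothetical customer segmentation via K-Means clustering.
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Stand-in behavioral features; a real run would pull them from the warehouse.
rng = np.random.default_rng(0)
customers = pd.DataFrame({
    "recency": rng.integers(1, 365, 500),    # days since last purchase (assumed)
    "frequency": rng.integers(1, 50, 500),   # purchases per year (assumed)
    "monetary": rng.gamma(2.0, 100.0, 500),  # yearly spend (assumed)
})

scaled = StandardScaler().fit_transform(customers)  # normalize feature scales
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
customers["segment"] = kmeans.fit_predict(scaled)   # one segment label per customer

print(customers.groupby("segment").mean())          # profile each segment
```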
Environment: Erwin r9.6, Python, SQL, Oracle 12c, Netezza, SQL Server, SSRS, PL/SQL, T-SQL, Tableau, MLlib, regression, cluster analysis, Scala, NLP, Spark, Kafka, MongoDB, logistic regression, Hadoop, PySpark, Teradata, random forest, OLAP, Azure, MariaDB, SAP CRM, HDFS, ODS, NLTK, SVM, JSON, XML, Cassandra, MapReduce, AWS.
Confidential, Birmingham, AL
Machine Learning Engineer/Data Scientist
Responsibilities:
- Worked with several R packages including knitr, dplyr, and SparkR, along with causal inference and space-time packages.
- Coded R functions to interface with Caffe Deep Learning Framework.
- Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python for developing various machine learning algorithms.
- Installed and used the Caffe NLP framework.
- Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
- Setup storage and data analysis tools in Amazon Web Services cloud computing infrastructure.
- Implemented end-to-end systems for Data Analytics, Data Automation and integrated with custom visualization tools using R, Mahout, Hadoop and MongoDB.
- Worked with Data Architects and IT Architects to understand the movement of data and its storage, using ER/Studio 9.7.
- Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods including classification, regression, and dimensionality reduction, and used the engine to increase user lifetime by 45% and triple user conversions for target categories.
- Used Spark DataFrames, Spark SQL, and Spark MLlib extensively, designing and developing POCs using Scala, Spark SQL, and the MLlib library.
- Used Data Quality Validation techniques to validate Critical Data Elements (CDE) and identified various anomalies.
- Extensively worked with the Erwin Data Modeler tool to design data models.
- Developed various QlikView data models by extracting and using data from various source files, DB2, Excel, flat files, and Big Data sources.
- Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
- Performed data manipulation and aggregation from different sources using Nexus, Toad, BusinessObjects, Power BI, and SmartView.
- Implemented Agile Methodology for building an internal application.
- Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Secondary Name Node, and MapReduce concepts.
- As Architect delivered various complex OLAP Databases/Cubes, Scorecards, Dashboards and Reports.
- Programmed a utility in Python that used multiple packages (SciPy, NumPy, Pandas).
- Implemented classification using supervised algorithms like Logistic Regression, Decision Trees, KNN, and Naive Bayes (see the comparison sketch after this list).
- Designed both 3NF data models for ODS, OLTP systems and Dimensional Data Models using Star and Snowflake Schemas.
- Updated Python scripts to match training data with our database stored in AWS CloudSearch, so that we could assign each document a response label for further classification.
- Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL.
- Designed and developed Use Case, Activity Diagrams, Sequence Diagrams, OOD (Object oriented Design) using UML and Visio.
- Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
- Identified and executed process improvements, working hands-on in various technologies such as Oracle and Business Objects.
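A hedged sketch of the supervised-classification comparison described above; the bundled breast-cancer dataset is a stand-in for the project data:

```python
# Fit Logistic Regression, a Decision Tree, KNN, and Naive Bayes on the
# same split and compare holdout accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=5000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "naive_bayes": GaussianNB(),
}
for name, clf in models.items():
    clf.fit(X_tr, y_tr)
    print(f"{name}: {clf.score(X_te, y_te):.3f}")  # holdout accuracy
```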
Environment: R, Python, HDFS, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, MapReduce, Rational Rose, SQL, and MongoDB.
Confidential, Dallas, TX
Data Scientist/Machine Learning Engineer
Responsibilities:
- Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python for developing various machine learning algorithms.
- Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
- Setup storage and data analysis tools in Amazon Web Services cloud computing infrastructure.
- Implemented end-to-end systems for Data Analytics, Data Automation and integrated with custom visualization tools using R, Mahout, Hadoop and MongoDB.
- Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods including classification and dimensionality reduction, and used the engine to increase user lifetime by 45% and triple user conversions for target categories.
- Used Spark DataFrames, Spark SQL, and Spark MLlib extensively, designing and developing POCs using Scala, Spark SQL, and the MLlib library (see the Spark SQL sketch after this list).
- Used Data Quality Validation techniques to validate Critical Data Elements (CDE) and identified various anomalies.
- Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
- Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, and SmartView.
- Implemented Agile Methodology for building an internal application.
- Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name node, Data node, Secondary Name node, and MapReduce concepts.
- Programmed a utility in Python that used multiple packages (SciPy, NumPy, Pandas).
- Implemented classification using supervised algorithms like Logistic Regression, Decision Trees, KNN, and Naive Bayes.
- Updated Python scripts to match training data with our database stored in AWS CloudSearch, so that we could assign each document a response label for further classification.
- Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL.
- Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
- Identified and executed process improvements, working hands-on in various technologies such as Oracle and Business Objects.
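An illustrative sketch of the Spark DataFrame / Spark SQL pattern referenced above; the inline rows and schema are assumptions standing in for the real data feeds:

```python
# Register a DataFrame as a temp view and aggregate it with Spark SQL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("poc-sketch").getOrCreate()

events = spark.createDataFrame(
    [("2017-01-01", "click"), ("2017-01-01", "view"), ("2017-01-02", "click")],
    ["date", "action"],  # assumed schema
)
events.createOrReplaceTempView("events")  # expose the frame to Spark SQL

daily = spark.sql("""
    SELECT date, COUNT(*) AS n_events
    FROM events
    GROUP BY date
    ORDER BY date
""")
daily.show()
```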
Environment: AWS, Python, HDFS, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, MapReduce, Rational Rose, SQL, and MongoDB.
Confidential, San Jose, CA
Sr. Python developer
Responsibilities:
- Analyzed the requirements, designed the flow of tasks using flowcharts, and designed the flow between pages of the UI.
- Documented the design solutions and created stories for client requirements.
- Wrote REST services using Python and Apollo (internal to Cisco).
- Wrote Python scripts to establish continuous workflows from different teams providing data.
- Wrote unit and integration tests in Python to test the code.
- Implemented LDAP authentication to authenticate and authorize customers using Python REST services.
- Generated client certificates in both PEM and PFX formats using the M2Crypto Python module.
- Used a SQLite3 database for caching client requests (a minimal sketch follows this list).
- Wrote LDAP search filters for both single-level and multi-level searches.
- Completed UI development using AngularJS, CSS, and HTML5.
- Built dashboards with quick filters, parameters, and sets to handle views more efficiently.
- Performed user validations on client side as well as server side.
- Improved code reuse and performance by making effective use of various design patterns.
- Delivered code efficiently based on principles of Test-Driven Development (TDD) and continuous integration, in line with Agile software methodology principles.
- Participated with QA to develop test plans from high-level design documentation.
- Used Rally for Agile software management.
- Primary contact for all issues in both development and production environments.
- Implemented long-term fixes for incidents that occurred in the production environment by finding the root cause.
- Completed the Cisco White Belt in security, which helps in developing applications in a secure manner to protect against threats.
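A minimal sketch of the SQLite3 request cache mentioned above; the table layout, key scheme, and TTL are assumptions, not the original implementation:

```python
# Cache expensive client requests in SQLite with a simple TTL.
import json
import sqlite3
import time

conn = sqlite3.connect("cache.db")
conn.execute("""CREATE TABLE IF NOT EXISTS cache (
    key TEXT PRIMARY KEY, payload TEXT, created REAL)""")

def get_cached(key, fetch, ttl=300):
    """Return the cached payload for key, calling fetch() on a miss or expiry."""
    row = conn.execute("SELECT payload, created FROM cache WHERE key = ?",
                       (key,)).fetchone()
    if row and time.time() - row[1] < ttl:
        return json.loads(row[0])  # fresh hit
    payload = fetch()              # miss or stale: refetch
    conn.execute("INSERT OR REPLACE INTO cache VALUES (?, ?, ?)",
                 (key, json.dumps(payload), time.time()))
    conn.commit()
    return payload

print(get_cached("demo", lambda: {"hello": "world"}))
```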
Environment: Python 2.7, Cassandra, MySQL, LDAP, Git, Linux, Windows, JSON, jQuery, HTML, XML, CSS, REST, Rally, Bootstrap, JavaScript, AngularJS, Agile, Bitbucket, PyUnit, PyCharm, Microsoft SQL Server Management Studio, DataStax DevCenter, Apache Directory Studio, Ansible, Jenkins, Matplotlib, Mock, Beautiful Soup, PyTest
Confidential
Python Developer
Responsibilities:
- Worked on the project from gathering requirements to developing the entire application.
- Worked in the Anaconda Python environment.
- Created, activated, and programmed in Anaconda environments.
- Wrote programs for performance calculations using NumPy and SQLAlchemy.
- Wrote Python routines to log into websites and fetch data for selected options.
- Used the Python modules urllib, urllib2, and Requests for web crawling (see the sketch after this list).
- Developed text analytics and statistical machine learning solutions for various business problems, generating data visualizations using R, Python, and Tableau, and used packages such as Beautiful Soup for data parsing.
- Involved in the development of Web Services using SOAP for sending and receiving data from the external interface in XML format.
- Worked on the development of SQL and stored procedures on MySQL.
- Analyzed the code thoroughly and reduced code redundancy to an optimal level.
- Designed and built a text classification application using different text classification models.
- Used Jira for defect tracking and project management.
- Worked on writing and reading data in CSV and Excel file formats.
- Involved in Sprint planning sessions and participated in the daily Agile SCRUM meetings.
- Conducted daily scrums as part of the Scrum Master role.
- Developed the project in Linux environment.
- Worked on the application's resulting reports.
- Performed QA testing on the application.
- Held meetings with the client and delivered the entire project with limited help from the client.
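A hedged sketch of the crawling-and-parsing routine described above, using Requests and Beautiful Soup; the URL is a placeholder and the login step is omitted:

```python
# Fetch a page and extract its title and hyperlinks.
import requests
from bs4 import BeautifulSoup

def fetch_links(url):
    """Return all hyperlinks found on the page at url."""
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    print("title:", soup.title.string if soup.title else "n/a")
    return [a["href"] for a in soup.find_all("a", href=True)]

for link in fetch_links("https://example.com"):
    print(link)
```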
Environment: Python, Anaconda, Spyder (IDE), Windows 7, Teradata, Requests, urllib, urllib2, Beautiful Soup, Tableau, NumPy, SQLAlchemy, MySQL