Sr. Data Scientist / Machine Learning Engineer / Python Resume
Seattle, WA
SUMMARY:
- 5+ years of IT industry experience encompassing Machine Learning, Data Mining with large datasets of structured and unstructured data, Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization.
- Extensive experience in Text Analytics, developing statistical machine learning and data mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau.
- Over 5 years of experience with machine learning techniques and algorithms such as k-NN, Naive Bayes, etc.
- Experience with object-oriented programming (OOP) concepts using Python, C++, and PHP.
- Integration Architect and Data Scientist experience in Analytics, Big Data, BPM, SOA, ETL, and Cloud technologies.
- Highly skilled in using visualization tools like Tableau, ggplot2, and d3.js for creating dashboards
- Experience in Integrating multiple applications through REST API
- Experience with foundational machine learning models and concepts: regression, random forests, boosting, GBM, NNs, HMMs, CRFs, MRFs, and deep learning.
- Proficient with statistical and other tools/languages: R, Python, C, C++, Java, SQL, UNIX, the QlikView data visualization tool, and the Anaplan forecasting tool.
- Proficient in integrating various data sources with multiple relational databases like Oracle, MS SQL Server, DB2, Teradata, and flat files into the staging area, ODS, Data Warehouse, and Data Mart.
- Familiar with deep learning projects: CNNs for image identification, RNNs for stock price prediction, autoencoders for a movie recommender system (PyTorch), and image captioning (a CNN-RNN autoencoder architecture).
- Exposure to AI and deep learning platforms/methodologies such as TensorFlow, RNNs, and LSTMs.
- Built LSTM neural networks for text fields such as item descriptions and comments (see the sketch following this summary).
- Experience with Artificial Intelligence chatbots.
- Built deep neural networks combining LSTM outputs with other features.
- Experience extracting data to create value-added datasets using Python, R, Azure, and SQL, analyzing customer behaviour to target specific customer segments and uncover hidden insights that support project objectives.
- Worked with NoSQL Database including HBase, Cassandra and MongoDB.
- Extensively worked on statistical analysis tools and adept at writing code in Advanced Excel, R, MATLAB, and Python.
- Implemented deep learning models and numerical computation with data flow graphs using TensorFlow.
- Worked with complex applications such as R, Stata, Scala, Perl, Linear, and SPSS to develop neural networks and cluster analyses.
- Good experience in Python scripting, Shell Scripting (Bash, ZSH, KSH, etc.), SQL Server, UNIX, and Linux.
- Experienced in the full software development lifecycle (SDLC) using Agile and Scrum methodologies.
- Experience in designing stunning visualizations using Tableau software and publishing and presenting dashboards, Storyline on web and desktop platforms.
- Hands-on experience in implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, neural networks, and Principal Component Analysis, with good knowledge of Recommender Systems.
- Experienced with machine learning algorithms such as logistic regression, random forest, XGBoost, KNN, SVM, neural networks, linear regression, lasso regression, and k-means.
- Experience with Python testing frameworks such as Zope, pytest, nose, and Robot Framework.
- Strong experience in Software Development Life Cycle (SDLC) including Requirements Analysis, Design Specification and Testing as per Cycle in both Waterfall and Agile methodologies.
- Skilled in using dplyr in R and pandas in Python for performing exploratory data analysis.
- Experience working with data modeling tools like Erwin, PowerDesigner, and ER/Studio.
- Experience with data analytics, data reporting, Ad-hoc reporting, Graphs, Scales, PivotTables and OLAP reporting.
- Worked and extracted data from various database sources like Oracle, SQL Server, DB2, and Teradata.
- Proficient knowledge of statistics, mathematics, machine learning, recommendation algorithms and analytics with an excellent understanding of business operations and analytics tools for effective analysis of data.
- Proficient in Tableau, Adobe Analytics and R-Shiny data visualization tools to analyze and obtain insights into large datasets, create visually powerful and actionable interactive reports and dashboards.
- Automated recurring reports using SQL and Python and visualized them on BI platform like Tableau.
- Worked with development tools and environments such as Git and VMs.
- Experience in working with different operating systems Windows, Linux and UNIX, experienced in Shell scripting and UNIX commands.
- Ability to maintain a fun, casual, professional and productive team atmosphere.
- Excellent communication skills; work successfully in fast-paced, multitasking environments both independently and in collaborative teams; a self-motivated, enthusiastic learner.
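The LSTM bullets above reference the following minimal sketch. It assumes a Keras/TensorFlow stack; the example texts, labels, and hyperparameters are hypothetical placeholders, not the production model.

    # Minimal LSTM text-classification sketch (assumed Keras/TensorFlow stack;
    # the example texts and labels are hypothetical).
    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Embedding, LSTM, Dense
    from tensorflow.keras.preprocessing.text import Tokenizer
    from tensorflow.keras.preprocessing.sequence import pad_sequences

    texts = ["great item, fast shipping", "poor quality, broke quickly"]  # e.g. item descriptions/comments
    labels = np.array([1, 0])                                             # e.g. positive/negative

    tokenizer = Tokenizer(num_words=10000, oov_token="<unk>")
    tokenizer.fit_on_texts(texts)
    X = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=50)

    model = Sequential([
        Embedding(input_dim=10000, output_dim=64),   # learn word embeddings
        LSTM(64),                                    # encode the token sequence
        Dense(1, activation="sigmoid"),              # binary output
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X, labels, epochs=2, verbose=0)

In practice the LSTM output can also be concatenated with other engineered features before the final dense layers, as described in the deep-neural-network bullet.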
TECHNICAL SKILLS:
Libraries: Scikit-learn, Keras, TensorFlow, NumPy, Pandas, NLTK, Gensim, Matplotlib, ggplot2, Scrapy, BeautifulSoup, Seaborn, Bokeh, NetworkX, Statsmodels, Theano.
Programming/Scripting Languages: Python, R, SQL, Scala, Pig, C, MATLAB, Java.
Querying Languages: SQL, NoSQL, PostgreSQL, MySQL, Microsoft SQL Server, Python scripting, Shell scripting.
Machine Learning: Data Preprocessing, Weighted Least Squares, PCR, PLS, Piecewise/Spline Regression, Quadratic Discriminant Analysis, Logistic Regression, Naive Bayes, Decision Trees, Random Forest, KNN, Linear Regression, Lasso, Ridge, SVM, Regression Trees, K-means, Polynomial Regression, Perceptron, Backpropagation, PCA, LDA, Azure, UML, RDF, SPARQL.
ML packages: NumPy, SciPy, Scikit-learn, Pandas, PySpark, TensorFlow, Matplotlib, NLTK
Cloud Technologies: AWS (S3, Athena, EMR, Machine Learning Services, Amazon QuickSight)
Visualization Tools: Tableau, Python - Matplotlib, Seaborn
Databases: MySQL, SQLite
IDE Tools: PyCharm, Spyder, Eclipse, Visual Studio, NetBeans, Amazon SageMaker.
Project Management: JIRA, Share Point
SDLC Methodologies: Agile, Scrum, Waterfall
Deployment Tools: Anaconda Enterprise, R-Studio, Azure Machine Learning Studio, Oozie 4.2, and AWS Lambda.
PROFESSIONAL EXPERIENCE:
Confidential, Seattle, WA
Sr. Data Scientist/Machine Learning Engineer /Python
Responsibilities:
- Provided the architectural leadership in shaping strategic, business technology projects, with an emphasis on application architecture.
- Utilized domain knowledge and application portfolio knowledge to play a key role in defining the future state of large, business technology programs.
- Automated workflows that were previously initiated manually, using Python scripts and UNIX shell scripting.
- Participated in all phases of data mining, data collection, data cleaning, developing models, validation, and visualization and performed Gap analysis.
- Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
- Prepared shell scripts to test the application and ran automated scripts to analyze the performance of the created database objects.
- Applied REST API development via Django REST Framework model to develop an information management system.
- Skilled in using collections in Python scripting for manipulating and looping through different user defined objects.
- Developed RESTful APIs using Python Flask and SQLAlchemy data models, and ensured code quality by writing unit tests with pytest (a minimal sketch follows this list).
- Handled job scheduling, batch-job scheduling, process control, forking and cloning of jobs, and checking job status using shell scripting.
- Created ecosystem models (e.g. conceptual, logical, physical, canonical) that are required for supporting services within the enterprise data architecture (conceptual data model for defining the major subject areas used, ecosystem logical model for defining standard business meaning for entities and fields, and an ecosystem canonical model for defining the standard messages and formats to be used in data integration services throughout the ecosystem).
- Used Pandas, NumPy, seaborn, SciPy, Matplotlib, Scikit-learn, NLTK in Python for developing various machine learning algorithms and utilized machine learning algorithms such as linear regression, multivariate regression, naive Bayes, Random Forests, K-means, & KNN for data analysis.
- Monitoring Python scripts run as daemons in the UNIX/Linux system background to collect trigger and feed arrival information.
- Spearheaded chatbot development initiative to improve customer interaction with application.
- Wrote and developed scripts for automating tasks using Jenkins and UNIX shell scripting.
- Automated CSV-to-chatbot-friendly JSON transformation by writing NLP scripts, reducing development time by 20%.
- Conducted studies and rapid plotting, using advanced data mining and statistical modelling techniques to build solutions that optimize data quality and performance.
- Demonstrated experience in design and implementation of statistical models, predictive models, enterprise data models, metadata solutions, and data life cycle management in both RDBMS and Big Data environments.
- Developed tools using Python, shell scripting, and XML to automate some of the menial tasks; interfaced with supervisors, artists, systems administrators, and production to ensure production deadlines were met.
- Analyzed large data sets, applied machine learning techniques, and developed and enhanced predictive and statistical models leveraging best-in-class modeling techniques.
- Worked on database design, relational integrity constraints, OLAP, OLTP, Cubes and Normalization (3NF) and De-normalization of the database.
- Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS.
- Worked on customer segmentation using an unsupervised learning technique - clustering.
- Worked with various Teradata 15 tools and utilities like Teradata Viewpoint, MultiLoad, ARC, Teradata Administrator, BTEQ, and other Teradata utilities.
- Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, Python, and a broad variety of machine learning methods including classification, regression, and dimensionality reduction.
- Developed LINUX Shell scripts by using NZSQL/NZLOAD utilities to load data from flat files to Netezza database.
- Designed and implemented system architecture for Amazon EC2 based cloud-hosted solution for the client.
- Tested Complex ETL Mappings and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables
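As referenced in the Flask/SQLAlchemy bullet above, here is a minimal sketch of a RESTful endpoint with a pytest unit test; the Item model and /items route are hypothetical examples, not the actual application schema.

    # Minimal Flask + SQLAlchemy REST endpoint with a pytest test
    # (the Item model and /items route are hypothetical).
    from flask import Flask, jsonify, request
    from flask_sqlalchemy import SQLAlchemy

    app = Flask(__name__)
    app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite:///:memory:"
    db = SQLAlchemy(app)

    class Item(db.Model):
        id = db.Column(db.Integer, primary_key=True)
        name = db.Column(db.String(80), nullable=False)

    @app.route("/items", methods=["POST"])
    def create_item():
        item = Item(name=request.get_json()["name"])
        db.session.add(item)
        db.session.commit()
        return jsonify({"id": item.id, "name": item.name}), 201

    # pytest unit test (would normally live in a separate test module)
    def test_create_item():
        with app.app_context():
            db.create_all()
        client = app.test_client()
        resp = client.post("/items", json={"name": "widget"})
        assert resp.status_code == 201
        assert resp.get_json()["name"] == "widget"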
Environment: Erwin r9.6, Python, Python Scripting, REST API, SQL, Oracle 12c, Netezza, SQL Server, SSRS, PL/SQL, T-SQL, Tableau, MLlib, regression, cluster analysis, Scala NLP, Spark, Kafka, MongoDB, logistic regression, Shell Scripting, Hadoop, PySpark, Teradata, random forest, OLAP, Azure, MariaDB, SAP CRM, HDFS, ODS, NLTK, SVM, JSON, XML, Cassandra, MapReduce, AWS.
Confidential, Boston, MA
Data Scientist/ Machine Learning Engineer/Python
Responsibilities:
- Worked with several R packages including knitr, dplyr, SparkR, CausalInfer, and SpaceTime.
- Coded R functions to interface with Caffe Deep Learning Framework.
- Built database Model, Views and API's using Python for interactive web-based solutions.
- Used Python scripts to update the content in database and manipulate files.
- Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python for developing various machine learning algorithms.
- Installed and used Caffe NLP Framework.
- Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
- Setup storage and data analysis tools in Amazon Web Services cloud computing infrastructure.
- Designed and maintained databases using Python and developed Python-based APIs (RESTful web services) using Flask, SQLAlchemy, and PostgreSQL.
- Integrated two payroll systems through REST API calls, corrected data on records, and maintained the application as employee information changed.
- Implemented end-to-end systems for Data Analytics, Data Automation and integrated with custom visualization tools using R, Mahout, Hadoop and MongoDB.
- Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, Python, and a broad variety of machine learning methods including classification, regression, and dimensionality reduction, and utilized the engine to increase user lifetime by 45% and triple user conversations for target categories.
- Wrote REST APIs using the Django REST Framework as part of developing web-based applications for insurance premium calculations.
- Used Spark DataFrames, Spark SQL, and Spark MLlib extensively, developing and designing POCs with Scala, Spark SQL, and MLlib (a minimal sketch follows this list).
- Used Data Quality Validation techniques to validate Critical Data Elements (CDE) and identified various anomalies.
- Worked extensively with the Erwin Data Modeler data modeling tool to design data models.
- Developed various QlikView data models by extracting and using data from various sources: files, DB2, Excel, flat files, and Big Data.
- Participated in all phases of Data-Mining, Data-collection, Data-Cleaning, Developing-Models, Validation, Visualization, and Performed Gap Analysis.
- Performed data manipulation and aggregation from different sources using Nexus, Toad, BusinessObjects, Power BI, and SmartView.
- Implemented Agile Methodology for building an internal application.
- Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Secondary Name Node, and MapReduce concepts.
- Programmed a utility in Python that used multiple packages (SciPy, NumPy, Pandas)
- Implemented Classification using supervised algorithms like Logistic Regression, Decision Trees, KNN, Naive Bayes.
- Designed both 3NF data models for ODS, OLTP systems and Dimensional Data Models using Star and Snowflake Schemas.
- Updated Python scripts to match data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
- Created SQL tables with referential integrity and developed queries using SQL, SQLPLUS and PL/SQL.
- Designed and developed Use Case, Activity Diagrams, Sequence Diagrams, OOD (Object-oriented Design) using UML and Visio.
- Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
- Identified and executed process improvements, hands-on in various technologies such as Oracle and BusinessObjects.
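As referenced in the Spark MLlib bullet above, here is a minimal PySpark pipeline sketch; the feature and label column names (f1, f2, label) are hypothetical placeholders rather than the actual project schema.

    # Minimal Spark MLlib pipeline sketch (hypothetical columns f1, f2, label).
    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("mllib-poc").getOrCreate()
    df = spark.createDataFrame(
        [(1.0, 0.5, 1.0), (0.2, 1.3, 0.0), (0.9, 0.1, 1.0)],
        ["f1", "f2", "label"],
    )

    assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")  # build feature vector
    lr = LogisticRegression(featuresCol="features", labelCol="label")          # classifier stage
    model = Pipeline(stages=[assembler, lr]).fit(df)
    model.transform(df).select("label", "prediction").show()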
Environment: R, Python, HDFS, ODS, OLTP, Oracle 10g, Shell Scripting, Python Scripting, Hive, OLAP, DB2, Metadata, REST API, MS Excel, Mainframes, MS Visio, MapReduce, Rational Rose, SQL, and MongoDB.
Confidential, Raleigh, NC
Data Scientist/Machine Learning Engineer/Python
Responsibilities:
- Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python for developing various machine learning algorithms.
- Worked on design and development of Unix Shell Scripting as a part of the ETL process to automate the process of loading.
- Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
- Setup storage and data analysis tools in Amazon Web Services cloud computing infrastructure
- Implemented end-to-end systems for Data Analytics, Data Automation and integrated with custom visualization tools using R, Mahout, Hadoop and MongoDB.
- Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, Python, and a broad variety of machine learning methods including classification and dimensionality reduction, and utilized the engine to increase user lifetime by 45% and triple user conversations for target categories.
- Used Spark DataFrames, Spark SQL, and Spark MLlib extensively, developing and designing POCs with Scala, Spark SQL, and MLlib.
- Used Data Quality Validation techniques to validate Critical Data Elements (CDE) and identified various anomalies.
- Participated in all phases of Data-Mining, Data-collection, Data-Cleaning, Developing-Models, Validation, Visualization and Performed Gap Analysis.
- Also involved in writing REST APIs using the Django framework for data exchange and business logic implementation.
- Performed data manipulation and aggregation from different sources using Nexus, Toad, BusinessObjects, and SmartView.
- Implemented Agile Methodology for building an internal application.
- Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name node, Data node, Secondary Name node, and MapReduce concepts
- Programmed a utility in Python that used multiple packages (SciPy, NumPy, Pandas)
- Implemented classification using supervised algorithms like Logistic Regression, Decision Trees, KNN, and Naive Bayes (a minimal sketch follows this list).
- Updated Python scripts to match data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
- Created SQL tables with referential integrity and developed queries using SQL, SQL PLUS and PL/SQL.
- Developed tools using Python, Shell scripting, XML to automate some of the menial tasks.
- Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
- Identified and executed process improvements, hands-on in various technologies such as Oracle and BusinessObjects.
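As referenced in the classification bullet above, here is a minimal scikit-learn sketch comparing the listed supervised algorithms on a toy dataset; it illustrates the approach only, not the project data.

    # Minimal scikit-learn comparison of the listed classifiers on a toy dataset.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.naive_bayes import GaussianNB

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "decision_tree": DecisionTreeClassifier(),
        "knn": KNeighborsClassifier(),
        "naive_bayes": GaussianNB(),
    }
    for name, clf in models.items():
        clf.fit(X_train, y_train)
        print(name, round(clf.score(X_test, y_test), 3))  # held-out accuracy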
Environment: AWS, Python, HDFS, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, MapReduce, Rational Rose, SQL, and MongoDB.
Confidential
Data Engineer
Responsibilities:
- Worked with data analysts to support model building, scoring, monitoring, and reporting.
- Co-ordinated with business users to gather business requirements and prepared the documentation for analysis.
- Was a liaison between project teams, data architecture, data management, data stewardship, lines of business & the delivery/development group to align business needs with enterprise data management strategy & solutions.
- Assisted in supporting the enterprise conceptual and logical data models for analytics, operational and data mart structures using an industry-standard model, where possible
- Acquired data from primary or secondary data sources and maintained databases/data systems and developed data collection system & other strategies that optimize statistical efficiency and data quality.
- Used SQL for creating and using Views, User Defined Functions, Triggers, Indexes and Stored procedures involving joins and sub-queries from multiple tables.
- Established relationships between the tables using primary and foreign key constraints using SQL triggers.
- Strong ability to merge datasets, clean constructed datasets, produce summary statistics, conduct difference-in-means tests, and store all accompanying files in an organized manner (a minimal sketch follows this list).
- Load and transform large sets of structured, semi-structured and unstructured data.
- Prepared and analyzed data, including locating, profiling, cleansing, extracting, mapping, importing, transforming, validating, and modeling.
- Performed data mining, working with complex data sets, conducting multiple regression analysis and leveraging statistical tools.
- Created filters, parameters, and calculated sets for preparing dashboards and worksheets in Tableau.
- Acquired strong experience in all areas of SQL server development including tables, user functions, views, indexes, stored procedures, functions, joins.
- Explored datasets using diagrams such as histograms and boxplots and examined skewness in RStudio.
- Analyzed the customer data and business rules to maintain data quality and integrity.
- Extensively created excel charts, pivot tables, functions in Microsoft Excel to analyze the data.
- Cleaned datasets by removing missing values and outliers using RStudio.
- Identified patterns, data quality issues, and opportunities and leveraged insights by communicating opportunities with business partners.
- Applied linear regression in R to understand the relationships between different attributes of the dataset and the causal relationships among them.
- Performed statistical analysis to understand the data & produced forecast trends for various categories.
- Closely monitored the operating and financial results against plans and budgets.
- Made tables, charts, and graphs to visualize analyses in client reports using Excel and Tableau.
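A minimal pandas/SciPy sketch of the merge, summary-statistics, and difference-in-means workflow mentioned above; the tables, column names, and values are hypothetical.

    # Minimal merge / summary statistics / difference-in-means sketch
    # (hypothetical customer and order tables).
    import pandas as pd
    from scipy import stats

    customers = pd.DataFrame({"customer_id": [1, 2, 3, 4],
                              "segment": ["A", "A", "B", "B"]})
    orders = pd.DataFrame({"customer_id": [1, 2, 3, 4],
                           "spend": [120.0, 95.0, 60.0, 70.0]})

    merged = customers.merge(orders, on="customer_id", how="inner")
    print(merged.groupby("segment")["spend"].describe())   # summary statistics

    # Difference-in-means test between the two segments (Welch's t-test)
    a = merged.loc[merged["segment"] == "A", "spend"]
    b = merged.loc[merged["segment"] == "B", "spend"]
    t_stat, p_value = stats.ttest_ind(a, b, equal_var=False)
    print(t_stat, p_value)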
Environment: Python, Django 1.4, Tableau, RStudio, SQL Server, Jenkins, MySQL, Linux, HTML, CSS, Apache, Git
Confidential
Data Engineer
Responsibilities:
- Experience in analyzing data using Python, R, SQL, Microsoft Excel, Hive, PySpark, Spark SQL for Data Mining, Data Cleansing, Data Munging and Machine Learning.
- Sound knowledge in Data Quality & Data Governance practices & processes.
- Good experience in developing web applications implementing Model View Control (MVC) architecture using Django, Flask and Python web application frameworks.
- Proficient in SQLite, MySQL and SQL databases with Python.
- Experienced in working with various Python IDEs, including PyCharm, PyScripter, Jupyter Notebook, Spyder, Visual Studio Code, IDLE, NetBeans, and Sublime Text.
- Hands-on experience in handling database issues and connections with SQL and NoSQL databases like MongoDB and DynamoDB by installing and configuring various packages in Python.
- Strong ability to conduct qualitative and quantitative analysis for effective data-driven decision making.
- Conducted ad-hoc data analysis on large datasets from multiple data sources to provide data insights and actionable advice to support business leaders according to self-service BI goals.
- Experience in data preprocessing, data analysis, machine learning to get insights into structured and unstructured data.
- Good knowledge of writing different kinds of tests, such as unit tests with pytest, and building them.
- Good experience in Linux Bash scripting and following PEP 8 guidelines in Python.
- Built data pipelines in Python for image pre-processing and testing.
- Cleansed data toward a normal distribution by applying techniques such as missing-value treatment, outlier treatment, and hypothesis testing.
- Loaded the dataset into Hive for ETL Operation.
- Collected data needs and requirements by Interacting with the other departments.
- Performed preliminary data analysis using descriptive statistics and handled anomalies such as removing duplicates and imputing missing values (a minimal sketch follows this list).
- Involved in development of web services using REST APIs for sending and getting data from the external interface in JSON format.
- Implemented Agile Methodology for building an internal application.
- Experienced with version control systems like Git, GitHub to keep the versions and configurations of the code organized.
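As referenced in the preliminary-analysis bullet above, here is a minimal pandas sketch of removing duplicates, imputing missing values, and treating outliers; the DataFrame and column names are hypothetical.

    # Minimal preprocessing sketch: de-duplication, imputation, outlier treatment
    # (hypothetical DataFrame with "id" and "amount" columns).
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 2],
                       "amount": [10.0, 12.0, np.nan, 11.0, 500.0, 12.0]})

    df = df.drop_duplicates(subset="id")                       # remove duplicate records
    df["amount"] = df["amount"].fillna(df["amount"].median())  # impute missing values

    # Cap outliers at 1.5 * IQR beyond the quartiles (a common rule of thumb)
    q1, q3 = df["amount"].quantile([0.25, 0.75])
    iqr = q3 - q1
    df["amount"] = df["amount"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    print(df.describe())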
Environment: Python, Django 1.4, Jenkins, MySQL, Linux, HTML, CSS, Apache, Git