Data Scientist Resume
Phoenix, AZ
SUMMARY
- Over six years of professional experience as a Data Scientist, spanning Machine Learning, Data Mining, and Statistical Analysis.
- Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
- Experienced with machine learning algorithms in Python 3.5 (NumPy, Pandas, Matplotlib, and scikit-learn), including Decision Trees, Random Forests, Naïve Bayes, Logistic Regression, Linear and Multiple Regression, Cluster Analysis, Neural Networks, KNN, SVM, and k-means, understood from a mathematical perspective.
- Experience in implementing data analysis with various analytic tools, such as Anaconda 4.0, Jupyter Notebook 4.x, R (ggplot2, caret, dplyr), and Excel.
- Involved in the entire data science project life cycle, including data extraction, data cleaning, statistical modelling, and data visualization with large structured and unstructured data sets.
- Skilled in Advanced Regression Modelling, Correlation, Multivariate Analysis, Model Building, Business Intelligence tools and application of Statistical Concepts.
- Ability to write and optimize diverse SQL queries; working knowledge of RDBMSs such as SQL Server 2008 and NoSQL databases such as MongoDB 3.2.
- Experience with the ELK (Elasticsearch, Logstash, Kibana) stack.
- Extensive experience in text analytics, developing statistical machine learning and data mining solutions to various business problems, and generating data visualizations using R and Python.
- Performed information extraction using NLP algorithms coupled with deep learning (ANN, CNN, RNN, LSTM, encoders, embeddings), Keras, and TensorFlow.
- Experience with Computer vision and ROS development.
- Optimized code for performance and footprint, taking advantage of the hardware platform.
- Read papers on multi-person pose estimation and built a prototype model using OpenPose.
- Trained and tested various object detection models.
- Excellent understanding of SDLC and the Agile and Scrum development methodologies.
- Experienced in working with ARIMA parametric time series models (a sketch follows this summary).
- Assisted in creating forecasting tools and models, using mathematical and statistical techniques to identify continuous improvement opportunities.
- Experienced in Big Data with Hadoop, MapReduce, Spark 1.6, PySpark, Spark SQL, HDFS, and Hive 1.x.
- Extracted data from HDFS and prepared it for exploratory analysis using data munging.
- Proficient in predictive modelling, data mining methods, factor analysis, ANOVA, hypothesis testing, the normal distribution, and other advanced statistical and econometric techniques.
- Performed data manipulation, data preparation, normalization, and predictive modelling, improving efficiency and accuracy by evaluating models in Python.
- Worked on the experience optimization team to design creative variants for A/B testing, focused on developing, improving, and testing the subscriptions landing pages and the crosswords page. Updated Python scripts to match training data with our database stored in AWS CloudSearch so that each document could be assigned a response label for further classification.
- Hands-on experience provisioning virtual clusters on Amazon Web Services (AWS), including Elastic Compute Cloud (EC2), S3, and EMR.
- Performed MapReduce jobs in Hadoop and implemented Spark analyses in Python for machine learning and predictive analytics on the AWS platform.
- Experienced with data modelling tools such as Erwin, PowerDesigner, and ER/Studio.
- Experienced in data integration validation and data quality controls for ETL processes and data warehousing using MS Visual Studio SSIS, SSAS, and SSRS.
- Worked with applications such as MATLAB and SPSS to develop neural networks and cluster analyses.
- Strong experience and knowledge in data visualization with Tableau, creating line and scatter plots, bar charts, histograms, pie charts, dot charts, box plots, time series, error bars, multiple chart types, multiple axes, subplots, etc.
- Automated recurring reports using SQL and Python and visualized them on BI platforms such as Tableau.
- Experience with visualization tools such as Tableau 9.x/10.x for creating dashboards.
- Used version control tools such as Git 2.x and SVN.
- Passionate about gleaning insightful information from massive data assets and developing a culture of sound, data-driven decision making.
- Experienced in the Visual Basic for Applications (VBA) and VB programming languages for developing applications.
- Took responsibility for technical problem solving, creatively meeting product objectives, and developing best practices.
- Excellent communication skills (verbal and written) for communicating with clients and teams and for preparing and delivering effective presentations.
- Ability to maintain a fun, casual, professional and productive team atmosphere.
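A minimal sketch of the ARIMA workflow referenced above, using statsmodels in Python; the input file, column names, model order, and forecast horizon are illustrative assumptions, not taken from any actual engagement:

import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical daily demand series; any date-indexed numeric series works.
demand = pd.read_csv("demand.csv", index_col="date", parse_dates=True)["units"]

# Fit an ARIMA(p, d, q) model; the (1, 1, 1) order is a starting point,
# normally chosen from ACF/PACF plots or by comparing AIC across candidates.
fitted = ARIMA(demand, order=(1, 1, 1)).fit()

# Forecast the next 30 periods with confidence intervals.
forecast = fitted.get_forecast(steps=30)
print(forecast.predicted_mean)
print(forecast.conf_int())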
TECHNICAL SKILLS
Languages: Python (NumPy, SciPy, Pandas), R, TensorFlow, NLP, C, C++, Java, MATLAB, SQL, PL/SQL.
Application Servers: WebLogic, WebSphere, JBoss, Tomcat.
Databases: Microsoft SQL Server 2008 … MySQL 4.x/5.x, PostgreSQL, Oracle 10g, 11g, 12c, DB2, Teradata, Netezza.
Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Hive, Spark, Kafka.
Cloud technologies: AWS, GCP, VMware.
NoSQL Databases: Cassandra, MongoDB.
Development Tools: Microsoft SQL Studio, NetBeans, IntelliJ
Development Methodologies: Agile/Scrum, Waterfall, UML, Design Patterns
Version Control Tools and Testing: Git, SVN, GitHub.
ETL Tools: Informatica PowerCenter, SSIS
Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Cognos 7.0/6.0.
Operating Systems: UNIX, Linux, Windows, macOS, Sun Solaris
PROFESSIONAL EXPERIENCE
Confidential, Parsippany, NJ
Data scientist
Responsibilities:
- Used pandas, NumPy, Seaborn, SciPy, Matplotlib, and scikit-learn in Python to develop various machine learning algorithms.
- Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
- Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and Smart View.
- Extracted company relations from unstructured news to form a relation map surrounding a company.
- Applied distant supervision, a CNN model, and an attention mechanism to obtain relation labels for each company, achieving 91% accuracy in relation classification.
- Performed Information Extraction using NLP algorithms coupled with Deep learning (ANN and CNN), Keras and TensorFlow.
- Implemented Agile Methodology for building an internal application.
- Programmed a utility in Python that used multiple packages (SciPy, NumPy, pandas).
- Implemented classification using supervised algorithms such as Logistic Regression, Decision Trees, KNN, and Naive Bayes (a sketch follows this list).
- Worked on predictive and what-if analyses using R on data from HDFS; successfully loaded files into HDFS from Teradata and from HDFS into Hive.
- Prepared data visualization reports for the management using R.
- Developed MapReduce/Spark and R modules for machine learning and predictive analytics in Hadoop on AWS; implemented an R-based distributed random forest.
- Knowledge of time series analysis using AR, MA, ARIMA, GARCH, and ARCH models.
- Transformed data from various sources, organized it, and extracted features from raw and stored data.
- Worked with the capacity planning director to build and maintain highly advanced capacity plans for the Individual Client Solutions team.
- Handled importing data from various data sources and performed transformations using Hive and MapReduce.
- Recommended and developed in-sensor and embedded image-processing hardware and the overall system design and architecture for a multi-camera computer vision solution.
- Project experience in data mining, segmentation analysis, business forecasting, and association rule mining on large data sets with machine learning.
- Set up storage and data analysis tools in the AWS cloud computing infrastructure.
- Created a financial package supporting the three-year financial plan for all AWS cloud services infrastructure expenses.
- Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
- Developed normalized Logical and Physical database models for designing an OLTP application.
- Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms for the clients.
- Identified and executed process improvements; hands-on in various technologies such as Oracle, Informatica, and Business Objects.
- Updated Python scripts to match training data with our database stored in AWS CloudSearch so that each document could be assigned a response label for further classification.
- Used Data quality Validation techniques to validate Critical Data Elements (CDE) and identified various anomalies.
- Designed both 3NF data models for ODS and OLTP systems and dimensional data models using Star and Snowflake schemas. Performed text analytics on historical email subject lines to retrieve effective keywords and suggested them to the creative team for new subject lines that would increase open and delivery rates.
- Developed sophisticated data models to support automated reporting and analytics.
- Analyzed & processed complex data sets using the advanced query, visualization, and analytics tools.
- Designed dashboards with Tableau and provided complex reports, including summaries, charts, and graphs, to interpret findings for the team and stakeholders.
- Identified process improvements that significantly reduce workloads or improve quality.
- Analyzed the email user click history and third-party data for pattern recognition and to support and change targeting algorithms.
- Supported the client by developing machine learning algorithms on Big Data using PySpark to analyze transaction fraud, perform cluster analysis, etc.
- Used Python for XML and JSON processing, data exchange, and business logic implementation.
- Created a revenue optimization algorithm to divert click traffic to different advertisers throughout the day to maximize revenue.
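A minimal sketch of the supervised classification comparison listed above (Logistic Regression, Decision Trees, KNN, Naive Bayes) in scikit-learn; the bundled data set and hyperparameters are illustrative stand-ins for the proprietary business data:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

# Public stand-in data set; the real project used client data.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(max_depth=5),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "naive bayes": GaussianNB(),
}

# Train each classifier and report held-out accuracy.
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: {model.score(X_test, y_test):.3f}")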
Environment: Python, R, TensorFlow, machine learning algorithms, deep learning, ODS, OLTP, OLAP, Oracle 12c, Hive, Hadoop, Spark, Tableau, MapReduce, metadata, AWS, JSON, ER/Studio, Informatica 9.0, MS Excel, mainframes, MS Visio, Rational Rose.
Confidential - Phoenix, AZ
Data scientist
Responsibilities:
- Performed data profiling to learn about behavior across various features such as traffic pattern, location, date, and time.
- Partnered and collaborated with the Sales and Marketing teams and a cross-functional team to frame and answer important data questions.
- Applied various machine learning algorithms and statistical modelling techniques, such as decision trees, regression models, neural networks, SVM, and clustering, to identify volume using the scikit-learn package in Python.
- Performed data visualization with Tableau and generated dashboards to present the findings.
- Recommended and evaluated marketing approaches based on quality analytics of customer consumption behavior.
- Evaluated models using cross-validation, the log loss function, and ROC curves, and used AUC for feature selection (a sketch follows this list).
- Prototyped and experimented with ML/DL algorithms and integrated them into the production system for different business needs.
- Analyzed traffic patterns by calculating autocorrelation at different time lags.
- Ensured that the model had a low false positive rate.
- Addressed overfitting by applying regularization methods such as L2 and L1.
- Used Principal Component Analysis in feature engineering to analyze high-dimensional data.
- Created and designed reports that use gathered metrics to infer and draw logical conclusions about past and future behavior.
- Performed Logistic Regression, Random Forest, Decision Tree, and SVM to classify whether a package would be delivered on time on a new route.
- Used Python and Spark to implement different machine learning algorithms, including Generalized Linear Models, SVM, Random Forest, and Neural Networks.
- Used MLlib, Spark's machine learning library, to build and evaluate different models (a PySpark sketch also follows this list).
- Implemented a rule-based expert system from the results of exploratory analysis and information gathered from people in different departments.
- Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
- Communicated the results to the operations team to support better decision making.
- Collected data needs and requirements by interacting with other departments.
- Developed MapReduce pipeline for feature extraction using Hive.
- Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data. Created various types of data visualizations using Python and Tableau.
- Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
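A minimal sketch of the evaluation and regularization steps described above (cross-validated ROC AUC with L1/L2-penalized logistic regression); the synthetic data and the grid of penalty strengths are illustrative assumptions:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real feature matrix.
X, y = make_classification(n_samples=2000, n_features=30, random_state=0)

# Compare L1 and L2 penalties at a few regularization strengths,
# scoring each candidate by 5-fold cross-validated ROC AUC.
for penalty, solver in [("l1", "liblinear"), ("l2", "lbfgs")]:
    for C in [0.01, 0.1, 1.0]:
        clf = make_pipeline(
            StandardScaler(),
            LogisticRegression(penalty=penalty, C=C, solver=solver, max_iter=1000),
        )
        scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
        print(f"penalty={penalty}, C={C}: AUC={scores.mean():.3f} +/- {scores.std():.3f}")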
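And a minimal PySpark MLlib sketch along the lines of the model building mentioned above; the input path, feature columns, and label column are hypothetical:

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

# Hypothetical input: numeric feature columns plus a binary "label" column.
df = spark.read.parquet("s3://bucket/transactions.parquet")
assembler = VectorAssembler(inputCols=["amount", "hour", "distance"], outputCol="features")
data = assembler.transform(df)

# Train a random forest and score it by area under the ROC curve.
train, test = data.randomSplit([0.8, 0.2], seed=42)
model = RandomForestClassifier(labelCol="label", featuresCol="features").fit(train)
evaluator = BinaryClassificationEvaluator(labelCol="label")
print(evaluator.evaluate(model.transform(test)))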
Environment: Python, R, Machine learning, CDH5, HDFS, Hive, AWS, Impala, Linux, Spark, Tableau Desktop, SQL Server 2012, Microsoft Excel, MATLAB, Spark SQL, PySpark.
Confidential
Machine Learning developer
Responsibilities:
- Enhanced data collection procedures to include information relevant for building analytic systems, and created value from data by applying advanced analytics and statistical techniques to deepen insights and to shape an optimal solution architecture for efficiency, maintainability, and scalability, making predictions and generating recommendations.
- Maintained and developed complex SQL queries, stored procedures, views, functions, and reports that meet customer requirements using Microsoft SQL Server 2008 R2.
- Supported Sales and Engagement management planning and decision making on sales incentives and production by developing and maintaining financial models, reporting, and sensitivity analyses by customer segment.
- Worked with the ETL team to document the transformation rules for Data migration from OLTP to Warehouse environment for reporting purposes.
- Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, and scikit-learn.
- Worked on data modelling and produced data mapping and data definition documentation.
- Created and developed test plans to ensure successful delivery of projects. Employed performance analytics predicated on high-quality data to develop reports and dashboards with actionable insights.
- Developed and implemented several types of sub-reports, drill-down reports, summary reports, parameterized reports, and ad-hoc reports using SSRS, delivered through mailing server subscriptions and SharePoint.
- Generated comprehensive analytical reports by running SQL queries against current databases to conduct data analysis (a sketch follows this list).
- Resolved data-related issues such as assessing data quality, consolidating data, and evaluating existing data sources.
- Generated ad-hoc reports using Crystal Reports 9 and SQL Server Reporting Services (SSRS).
- Generated reports and visualizations based on the insights, mainly using Tableau, and developed dashboards for the company's insight teams.
- Worked closely with the data architect to review all conceptual, logical, and physical database design models with respect to function, definition, maintenance review, and support for data analysis, data quality, and the ETL design that feeds the logical data models.
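A minimal sketch of the SQL-to-report pattern described above, using pandas with SQLAlchemy; the connection string, table, and columns are hypothetical:

import pandas as pd
from sqlalchemy import create_engine

# Hypothetical SQL Server connection; driver and credentials vary by site.
engine = create_engine(
    "mssql+pyodbc://user:pass@server/sales_db?driver=ODBC+Driver+17+for+SQL+Server"
)

# Pull a summary data set with a plain SQL query.
query = """
    SELECT segment, SUM(revenue) AS total_revenue, COUNT(*) AS orders
    FROM dbo.orders
    GROUP BY segment
"""
report = pd.read_sql(query, engine)

# Write the result out for distribution, e.g. to Excel for stakeholders.
report.to_excel("segment_revenue_report.xlsx", index=False)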
Environment: SQL Server 2008 R2/2005 Enterprise, SSRS, SSIS, Crystal Reports, Windows Enterprise Server 2000, DTS, SQL Profiler, Tableau, QlikView, Django, ad-hoc reporting, SharePoint, and Query Analyzer.
Confidential
Python developer
Responsibilities:
- Involved in developing the business components using Java and JDBC.
- Managed, trained, and coordinated the team.
- Developed a new product for unsecured loans featuring term business and personal loans.
- Developed and implemented the user registration and login features for the application process from scratch by extending the Django user model (a sketch follows this list).
- Used RESTful web service calls for validation.
- Developed a fully automated continuous integration system using Git, Gerrit, Jenkins, MySQL, and custom tools developed in Python and Bash.
- Implemented schema and data migrations for the PostgreSQL database using the South migration tool.
- Developed user-friendly modals for form submissions using simplemodal.js, jQuery, Ajax, and JavaScript.
- Built WAR files with the help of PuTTY, deployed them into the cloud environment using the cloud controller, and resolved cloud issues.
- Worked closely with Client managers/Business Analysts of the bank to drive technical solutions, design and provide development estimates for schedule and effort.
- Worked very closely with product owners, project managers and vendors to satisfy all the business needs.
- Used Django framework for database application development.
- Dynamic and hard-working, with the ability to work in groups as well as independently, the initiative to learn new technologies and tools quickly, and an emphasis on delivering quality services.
- Have strong ability to build productive relationships with peers, management, and clients using strong communication, interpersonal, organizational, and planning skills.
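A minimal sketch of extending the Django user model for registration and login, as mentioned above; the app label and extra fields are illustrative assumptions:

# models.py -- extend Django's built-in user with extra profile fields.
from django.contrib.auth.models import AbstractUser
from django.db import models

class Customer(AbstractUser):
    # Hypothetical extra fields for the loan application flow.
    phone = models.CharField(max_length=20, blank=True)
    date_of_birth = models.DateField(null=True, blank=True)

# settings.py -- point Django at the custom model (app label assumed).
# AUTH_USER_MODEL = "accounts.Customer"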
Environment: Python, Django, JSP, Oracle, Java, MySQL, Linux, HTML, CSS.
Confidential
Python developer
Responsibilities:
- Involved in building database models, APIs, and views utilizing Python to build an interactive web-based solution.
- Used data types such as dictionaries and tuples, and object-oriented inheritance features, to implement complex network algorithms.
- Expertise in client-side scripting technologies such as JavaScript, jQuery, JSON, Dojo, Bootstrap, and Angular.js.
- Designed and managed API system deployment using a fast HTTP server and Amazon AWS architecture.
- Worked on Python OpenStack APIs.
- Carried out various mathematical operations for calculation purposes using Python libraries.
- Managed large datasets using pandas DataFrames and MySQL.
- Worked with JSON based REST Web services.
- Performed testing using Django's test module (a sketch follows this list).
- Involved in Agile Methodologies and SCRUM Process.
- Created unit test and regression test frameworks for working and new code.
- Used the Subversion version control tool to coordinate team development.
- Developed SQL queries, stored procedures, and triggers using Oracle SQL and PL/SQL.
- Responsible for debugging and troubleshooting the web application.
- Supported user groups by handling target-related software issues/service requests, identifying/fixing bugs.
- Developed Views and Templates with Django view, controller and template language to create a user-friendly website interface.
- Configured the Django admin site, dashboard and created a custom Django dashboard for end users with custom look and feel.
- Used Django APIs for database access.
- Used Python for XML and JSON processing, data exchange, and business logic implementation.
- Used Python scripts to update content in the database and manipulate files.
- Created UI using JavaScript and HTML5. Designed and developed data management system using MongoDB.
- Proficient in software design and development, with a solid background in application development.
- Worked through the entire lifecycle of the projects, including design, development, deployment, testing, implementation, and support.
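A minimal sketch of view testing with Django's test module, as mentioned above; the URL name and template are hypothetical:

from django.test import TestCase
from django.urls import reverse

class HomePageTests(TestCase):
    def test_home_page_returns_200(self):
        # Hypothetical URL name; assumes a matching path() entry in urls.py.
        response = self.client.get(reverse("home"))
        self.assertEqual(response.status_code, 200)

    def test_home_page_uses_expected_template(self):
        response = self.client.get(reverse("home"))
        self.assertTemplateUsed(response, "home.html")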
Environment: Python, Django, JSP, Oracle, Java, MySQL, Linux, HTML, CSS.