
Machine Learning/Data Scientist Resume


Valley Forge, PA

PROFESSIONAL SUMMARY:

  • Over 8 years of experience in Data Analysis/Business Analysis, ETL Development, and Project Management.
  • Experience in all phases of diverse technology projects, specializing in Data Science and Machine Learning.
  • Proven expertise in Supervised and Unsupervised Learning techniques (Clustering, Classification, PCA, Decision Trees, KNN, SVM), Predictive Analytics, Optimization Methods, Natural Language Processing (NLP), and Time Series Analysis.
  • Proficient in managing the entire Data Science project life cycle, actively involved in all phases including Data Acquisition, Data Cleaning, Statistical Modeling, and Data Visualization.
  • Good experience in Machine Learning, Data Mining with large datasets of Structured and Unstructured data.
  • Extensive experience in using Tableau functionalities for creating different filters, charts and interactive dashboards.
  • Experienced in Machine Learning Regression Algorithms like Simple, Multiple, Polynomial, SVR (Support Vector Regression), Decision Tree Regression, Random Forest Regression.
  • Experienced in advanced statistical analysis and predictive Modeling in the structured and unstructured data environment.
  • Experience in creating Data Visualizations for KPI's as per the business requirements for various departments.
  • Experience in using Statistical procedures and Machine Learning algorithms such as ANOVA, Clustering and Regression and Time Series Analysis to analyze data for further Model Building.
  • Extensive working experience with Python, including Scikit-learn, SciPy, Pandas, and NumPy, for developing machine learning models and manipulating and handling data.
  • Proficient in Machine Learning techniques (Decision Trees, Linear/Logistic Regression, Random Forest, SVM, Bayesian methods, XGBoost, K-Nearest Neighbors) and Statistical Modeling in Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis Testing, Factor Analysis/PCA, and Ensembles.
  • Solid understanding of AWS (Amazon Web Services) S3, EC2, RDS and IAM, Azure ML, Apache Spark, Scala process, and concepts.
  • Good understanding of Artificial Neural Networks and Deep Learning models using the Theano and TensorFlow packages in Python.
  • Experienced in Machine Learning Classification Algorithms like Logistic Regression, K-NN, SVM, Kernel SVM, Naive Bayes, Decision Tree & Random Forest classification.
  • Experience in various phases of Software Development life cycle (Analysis, Requirements gathering, Designing) with expertise in writing/documenting Technical Design Document (TDD), Functional Specification Document (FSD), Test Plans, GAP Analysis and Source to Target mapping documents.
  • Excellent understanding of Hadoop architecture and Map Reduce concepts and HDFS Framework.
  • Strong understanding of project life cycle and SDLC methodologies including RUP, RAD, Waterfall, and Agile.
  • Equipped with experience in statistical techniques including Correlation, Hypothesis Modeling, and Inferential Statistics, as well as data mining and modeling techniques using Linear and Logistic Regression, Decision Trees, and k-means clustering.
  • Expertise in building Supervised and Unsupervised Machine Learning experiments using Microsoft Azure utilizing multiple algorithms to perform detailed predictive analytics and building Web Services models for all types of data: continuous, nominal, and ordinal.
  • Expertise in using Linear & Logistic Regression and Classification Modeling, Decision-trees, Principal Component Analysis (PCA), Cluster and Segmentation analyses, and have authored and co-authored several scholarly articles applying these techniques.
  • Assisted in determining the full domain of the MVP, created and implemented its relevant data model for the app, and worked with app developers to integrate the MVP into the app and any backend domains.
  • Integrated these services with each other, securing user access to data, data storage, and communication between the various services.
  • Excellent team player and self-starter with good communication skills.
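The clustering and segmentation work highlighted above can be illustrated with a minimal k-means sketch in plain Python. The data points and the deterministic initialization are illustrative assumptions, not taken from any project described here.

```python
def kmeans(points, k=2, iters=10):
    """Minimal k-means: assign each point to its nearest centroid,
    then recompute each centroid as the mean of its cluster."""
    # Naive deterministic init: spread initial centroids across the data
    centroids = [points[i * (len(points) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        # Recompute centroids; keep the old one if a cluster is empty
        centroids = [tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    return centroids, clusters

# Two well-separated illustrative groups of 2-D points
points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.2), (7.9, 8.1), (8.3, 7.8)]
centroids, clusters = kmeans(points, k=2)
```

In practice this is what `sklearn.cluster.KMeans` does with smarter initialization (k-means++) and a convergence check instead of a fixed iteration count.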

TECHNICAL SKILLS:

Languages: HTML5, DHTML, XML, R/R Studio, Java, Scala, Python (NumPy, SciPy, Pandas).

Cloud Computing Tools: Amazon AWS

ETL Tools: Informatica PowerCenter, SSIS, Ab Initio

Data Modeling: Sybase Power Designer / IBM Data Architect

Databases: Microsoft SQL Server 2008 … MySQL 4.x/5.x, Oracle 10g, 11g, 12c, DB2, Teradata, Netezza

Development Tools: Microsoft SQL Studio, Eclipse, NetBeans, IntelliJ

Database Tools: SQL Server Data Tools, Visual Studio, Spotlight, SQL Server Management Studio, Query Analyzer, Enterprise Manager, JIRA, Profiler

Operating Systems: UNIX, Windows, Linux, macOS, Sun Solaris

PROFESSIONAL EXPERIENCE:

Confidential, Valley Forge, PA

Machine Learning/Data Scientist

Responsibilities:

  • Worked with several R packages including knitr, dplyr, SparkR, Causal Infer, Space-Time.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python for developing various machine learning algorithms.
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
  • Performed data manipulation and aggregation from different sources using Nexus, Business Objects, Toad, Power BI, and Smart View.
  • Focused on integration overlap and Informatica's newer commitment to MDM with its acquisition of Identity Systems.
  • Set up storage and data analysis tools in the Amazon Web Services cloud computing infrastructure.
  • Implemented end-to-end systems for Data Analytics, Data Automation and integrated with custom visualization tools using R, Mahout, Hadoop, and MongoDB.
  • Implemented Natural Language Processing (NLP) tools such as NLTK and Stanford's CoreNLP suite.
  • Built an Artificial Neural Network using TensorFlow in Python to identify a customer's probability of cancelling their connection (churn-rate prediction).
  • Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods including classification, regression, and dimensionality reduction; used the engine to increase user lifetime by 45% and triple user conversions for target categories.
  • Used Spark DataFrames, Spark SQL, and Spark MLlib extensively, designing and developing POCs using Scala, Spark SQL, and the MLlib libraries.
  • Used Data Quality Validation techniques to validate Critical Data Elements (CDE) and identified various anomalies.
  • Extensively worked on the data modeling tool Erwin Data Modeler to design data models.
  • Participated in all phases of Data-Mining, Data-collection, Data-Cleaning, Developing-Models, Validation, Visualization, and Performed Gap Analysis.
  • Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Secondary Name Node, and MapReduce concepts.
  • As Architect delivered various complex OLAP Databases/Cubes, Scorecards, Dashboards, and Reports.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, Pandas).
  • Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, KNN, Naive Bayes.
  • Designed both 3NF data models for ODS, OLTP systems and Dimensional Data Models using Star and Snowflake Schemas.
  • Updated Python scripts to match data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
  • Created SQL tables with referential integrity and developed queries using SQL, SQL Plus, and PL/SQL.
  • Identifying and executing process improvements, hands-on in various technologies such as Oracle, Informatica, and BusinessObjects.
  • Handled importing data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.

Environment: AWS, R, Informatica, HDFS, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, MapReduce, Rational Rose, SQL, and MongoDB.
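The churn-prediction model above was built as a TensorFlow neural network; as an illustrative stand-in, the core idea of learning a cancellation probability from usage features can be sketched with a hand-rolled logistic regression. The feature names and toy data here are hypothetical, not from the actual project.

```python
import math

def train_logreg(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by stochastic gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # Sigmoid of the linear score gives the churn probability
            p = 1.0 / (1.0 + math.exp(-(sum(wj * xj for wj, xj in zip(w, xi)) + b)))
            err = p - yi  # gradient of log loss w.r.t. the score
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict_proba(w, b, xi):
    """Predicted probability that this customer churns."""
    return 1.0 / (1.0 + math.exp(-(sum(wj * xj for wj, xj in zip(w, xi)) + b)))

# Hypothetical scaled features: [support_calls, months_since_signup]; label 1 = churned
X = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.1], [0.1, 0.9], [0.2, 0.8], [0.1, 0.7]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logreg(X, y)
```

A real TensorFlow version would stack one or more hidden layers before this same sigmoid output, trading interpretability for the capacity to learn non-linear churn patterns.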

Confidential, San Jose

Machine Learning/Data Scientist

Responsibilities:

  • Work independently and collaboratively throughout the complete analytics project lifecycle including data extraction/preparation, design, and implementation of scalable machine learning analysis and solutions, and documentation of results.
  • Performed statistical analysis to determine peak and off-peak time periods for rate-making purposes
  • Conducted analysis of customer data for the purposes of designing rates.
  • Identified root causes of problems and facilitated the implementation of cost-effective solutions with all levels of management.
  • Applied various machine learning algorithms and statistical modeling such as decision trees, regression models, clustering, and SVM to identify volume, using the Scikit-learn package in Python.
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
  • Involved in transforming data from legacy tables to HDFS, and HBase tables using Sqoop.
  • Research on Reinforcement Learning and control (TensorFlow, Torch), and machine learning model (Scikit-learn).
  • Hands-on experience implementing Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, and Principal Component Analysis.
  • Performed K-means clustering, Regression and Decision Trees in R.
  • Partner with technical and non-technical resources across the business to leverage their support and integrate our efforts.
  • Partner with infrastructure and platform teams to configure, tune tools, automate tasks and guide the evolution of internal big data ecosystem; serve as a bridge between data scientists and infrastructure/platform teams.
  • Worked on Text Analytics and Naive Bayes creating word clouds and retrieving data from social networking platforms.
  • Pro-actively analyze data to uncover insights that increase business value and impact.
  • Support various business partners on a wide range of analytics projects from ad-hoc requests to large-scale cross-functional engagements
  • Prepared data visualization reports for management using R.
  • Approach analytical problems with an appropriate blend of statistical/mathematical rigor with practical business intuition.
  • Hold a point-of-view on the strengths and limitations of statistical models and analyses in various business contexts and can evaluate and effectively communicate the uncertainty in the results.
  • Approach analysis in multiple ways to evaluate approaches and compare results.

Environment: Machine Learning, R Language, Hadoop, Oracle, PL/SQL, MongoDB, Apache CXF, REST, Eclipse, WebLogic, Subversion (SVN), JUnit, Agile, UML, JSP, Jasper Reports, ILOG, Web 2.0, SOA, Tomcat.
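The text analytics and Naive Bayes work mentioned above can be sketched as a minimal multinomial Naive Bayes classifier over word counts. The mini-corpus of social posts and the labels are invented for illustration.

```python
import math
from collections import Counter

def train_nb(docs):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing.
    `docs` is a list of (text, label) pairs."""
    word_counts = {}        # label -> Counter of word occurrences
    label_counts = Counter()
    vocab = set()
    for text, label in docs:
        words = text.lower().split()
        word_counts.setdefault(label, Counter()).update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab

def classify(model, text):
    """Pick the label maximizing log P(label) + sum of log P(word | label)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best, best_score = None, float("-inf")
    for label in label_counts:
        counts = word_counts[label]
        total_words = sum(counts.values())
        score = math.log(label_counts[label] / total_docs)
        for w in text.lower().split():
            score += math.log((counts[w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

# Hypothetical mini-corpus of labeled social posts
docs = [("great service love it", "pos"), ("awesome product great", "pos"),
        ("terrible support hate it", "neg"), ("awful terrible experience", "neg")]
model = train_nb(docs)
```

A production version would add tokenization, stop-word removal, and TF-IDF weighting (e.g. via NLTK and Scikit-learn), but the scoring rule is the same.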

Confidential, Broomfield, CO

Machine Learning Engineer

Responsibilities:

  • Deployed order cancellation estimation model as a real-time production service using JD's big data platform, pipelines, and machine learning platform.
  • Implemented machine learning methods, optimization, and visualization, applying statistical methods such as Regression Models, Decision Trees, Naïve Bayes, Ensemble Classifiers, Hierarchical Clustering, and Semi-Supervised Learning on different datasets using Python.
  • Researched and implemented various Machine Learning Algorithms using the R language.
  • Devised a machine learning algorithm using Python for facial recognition.
  • Used R for a prototype on sample data exploration to identify the best algorithmic approach, then wrote Scala scripts using Spark's machine learning module.
  • Used Scala scripts to execute Spark machine learning library APIs for decision trees, ALS, and logistic and linear regression algorithms.
  • Worked on migrating an on-premises virtual machine to an Azure Resource Manager subscription with Azure Site Recovery.
  • Provided consulting and cloud architecture for premier customers and internal projects running on the MS Azure platform for high availability of services and low operational costs.
  • Developed structured, efficient, and error-free code for Big Data requirements using knowledge of Hadoop and its ecosystem.
  • Developed a web service using Windows Communication Foundation and .NET to receive and process XML files, deployed as a Cloud Service on Microsoft Azure.
  • Worked on various methods including data fusion and machine learning, improving the accuracy of distinguishing correct rules from candidate rules.
  • Developed Merge jobs in Python to extract and load data into a MySQL database.
  • Used a test-driven approach for developing the application and implemented unit tests using the Python unittest framework.
  • Wrote unit test cases in Python and Objective-C for other API calls in the customer frameworks.
  • Tested various machine learning algorithms including Support Vector Machines, Random Forests, and boosted trees with XGBoost, and concluded that Decision Trees were the champion model.
  • Used machine learning to design a classifier that matched the performance of subjective pathologist interpretations.
  • Built models using Statistical techniques like Bayesian HMM and Machine Learning classification models like XGBoost, SVM, and Random Forest.
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.

Environment: Machine Learning, R Language, Hadoop, Big Data, Python, Java, J2EE, Spring, Struts, JSF, Dojo, JavaScript, DB2, CRUD, PL/SQL, JDBC, Coherence, MongoDB, Apache CXF, SOAP, Web Services, Eclipse.
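Decision Trees were named the champion model above; the simplest member of that family, a one-level tree (decision stump), can be sketched in a few lines. The toy dataset below is hypothetical.

```python
def fit_stump(X, y):
    """One-level decision tree: exhaustively pick the (feature, threshold,
    left-label, right-label) split minimizing training misclassifications."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in range(len(X[0])):
        for t in sorted(set(x[f] for x in X)):
            for left, right in ((0, 1), (1, 0)):
                preds = [left if x[f] <= t else right for x in X]
                errors = sum(p != yi for p, yi in zip(preds, y))
                if best is None or errors < best[0]:
                    best = (errors, f, t, left, right)
    return best[1:]

def predict_stump(stump, x):
    f, t, left, right = stump
    return left if x[f] <= t else right

# Illustrative data: label is 1 exactly when the second feature exceeds 5
X = [[2, 1], [3, 2], [1, 8], [4, 9], [2, 7], [3, 3]]
y = [0, 0, 1, 1, 1, 0]
stump = fit_stump(X, y)
```

Full CART-style trees apply this same split search recursively to each side, and Random Forests and XGBoost ensemble many such trees; that is why a stump is the standard weak learner in boosting.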

Confidential, Dallas, TX

Data Analyst

Responsibilities:

  • Collaborated with project managers and business owners to clearly define specific business needs, expectations and timelines for Business Intelligence projects.
  • Working on data migration projects involving Cloud and Big Data technologies such as Microsoft Azure Data Lake, Azure SQL Data Warehouse, AWS, Apache Spark, Azure Data Factory, Azure Service Bus, and Azure Functions.
  • Building complex ETL pipelines for On-Premise systems using Microsoft SQL Server Integration Services (SSIS).
  • Working on Kanban board in JIRA and following agile methodology in solution development.
  • Developing report dashboards using Microsoft PowerBI, Microsoft SQL Server Reporting Services (SSRS), SQL Server Analysis Services (SSAS), and Microsoft Excel.
  • Maintaining and reporting operational KPIs on items such as performance, service incidents and tickets set forth by the business team.
  • Participated in requirement gathering and worked closely with the architect in designing and Modeling.
  • Worked on development of SQL and stored procedures on MySQL.
  • Developed a shopping cart for Library and integrated web services to access the payment (E-commerce).

Environment: Microsoft Azure, Microsoft BI tools, SQL.
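The SSIS pipelines described above are proprietary, but the extract-transform-load pattern they implement can be sketched with the Python standard library. The feed format, table name, and column names here are hypothetical stand-ins.

```python
import csv
import io
import sqlite3

# Extract: parse source rows (an in-memory CSV stands in for a real feed)
source = io.StringIO("order_id,amount,region\n1,100.5,east\n2,80.0,west\n3,,east\n")
rows = list(csv.DictReader(source))

# Transform: drop rows with missing amounts, normalize types and casing
clean = [(int(r["order_id"]), float(r["amount"]), r["region"].upper())
         for r in rows if r["amount"]]

# Load: write the cleaned rows into a staging table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL, region TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", clean)

total_east = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE region = 'EAST'").fetchone()[0]
```

An SSIS package expresses the same three stages as a source component, a transformation, and an OLE DB destination, with the reject-bad-rows rule configured as an error output instead of a filter.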

Confidential

Data Analyst

Responsibilities:

  • Extracted data using T-SQL from the SQL Server database. Interpreted and analyzed data to identify key metrics and transform raw data into meaningful, actionable information.
  • Used Microsoft Excel (Pivot Tables, VLOOKUP, Pivot Chart, Power Pivot, and VBA) to perform a descriptive analysis of customer's data and share the results with business managers to help them in marketing communication.
  • Responsible for preparing metrics and reports for weekly and monthly metrics reviews; created visually impactful dashboards using Microsoft Power BI with descriptive analyses to deliver insights on trends to business managers.
  • Performed ETL testing to ensure that data was correctly extracted and loaded to the target.
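The pivot-table analysis described above can be mimicked in plain Python: group rows by a key column and aggregate a value column, exactly what an Excel one-dimensional pivot with a SUM field does. The customer data below is invented for illustration.

```python
from collections import defaultdict

def pivot_sum(rows, index, values):
    """Group `rows` (dicts) by the `index` column and sum the `values`
    column, mimicking a one-dimensional Excel pivot table with SUM."""
    out = defaultdict(float)
    for row in rows:
        out[row[index]] += row[values]
    return dict(out)

# Hypothetical customer transactions
rows = [
    {"segment": "retail", "spend": 120.0},
    {"segment": "retail", "spend": 80.0},
    {"segment": "wholesale", "spend": 300.0},
]
summary = pivot_sum(rows, index="segment", values="spend")
```

With pandas this collapses to `df.groupby("segment")["spend"].sum()` or `df.pivot_table(index="segment", values="spend", aggfunc="sum")`.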

Confidential

Data Analyst

Responsibilities:

  • Worked with leadership teams to implement tracking and reporting of operations metrics across global programs
  • Worked with large data sets, automate data extraction, built monitoring/reporting dashboards and high-value, automated Business Intelligence solutions (data warehousing and visualization)
  • Gathered Business Requirements, interacted with Users and SMEs to get a better understanding of the data
  • Performed Data entry, data auditing, creating data reports & monitoring all data for accuracy
  • Designed, developed and modified various Reports
  • Created and presented dashboards to provide analytical insights into data to the client
  • Translated requirement changes, analyzing, providing data driven insights into their impact on existing database structure as well as existing user data.
  • Worked primarily on SQL Server, creating Stored Procedures, Functions, Triggers, Indexes, and Views using T-SQL.
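The views and triggers mentioned above were written in T-SQL on SQL Server; the same concepts can be demonstrated with SQLite via Python's standard library. Note the syntax differences: SQLite triggers use the `NEW` row reference where T-SQL uses the `inserted` pseudo-table, and the table and column names here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, status TEXT);
CREATE TABLE audit (order_id INTEGER, note TEXT);

-- View: one aggregate row per status (CREATE VIEW works much as in T-SQL)
CREATE VIEW order_totals AS
    SELECT status, SUM(amount) AS total FROM orders GROUP BY status;

-- Trigger: log every insert (T-SQL would read from the inserted pseudo-table)
CREATE TRIGGER trg_order_insert AFTER INSERT ON orders
BEGIN
    INSERT INTO audit VALUES (NEW.id, 'created');
END;
""")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 50.0, 'open'), (2, 75.0, 'open'), (3, 20.0, 'closed')])

open_total = conn.execute(
    "SELECT total FROM order_totals WHERE status = 'open'").fetchone()[0]
```

Querying the view returns live aggregates, and the audit table fills automatically as the trigger fires on each insert.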
