
Data Scientist/Machine Learning Engineer Resume


Philadelphia

SUMMARY

  • 6+ years of IT experience across various stages of software development.
  • Experience in reinforcement learning, graph-based models, GANs, semi-supervised learning, and multi-task learning.
  • Experience in converting DL/AI papers into code.
  • Hands-on experience delivering products and solutions that use deep learning (DL) for computer vision, NLP, IoT, and recommender systems.
  • Experience in data integration, profiling, validation, cleansing, transformation, and visualization using R and Python.
  • Theoretical foundations and practical hands-on projects related to (i) supervised learning (linear and logistic regression, boosted decision trees, Support Vector Machines, neural networks, NLP), (ii) unsupervised learning (clustering, dimensionality reduction, recommender systems), (iii) probability & statistics, experiment analysis, confidence intervals, A/B testing, and (iv) algorithms and data structures.
  • Extensive knowledge of Azure Data Lake and Azure Storage.
  • Built a complex model involving machine learning pipelines available through the Databricks API.
  • Experience in migrating data from heterogeneous sources, including Oracle, to MS SQL Server.
  • Experience in writing SQL queries and working with various databases (MS Access, MySQL, Oracle DB).
  • Hands on experience in design, management and visualization of databases using Oracle, MySQL and SQL Server.
  • In-depth knowledge and hands-on experience with the Big Data/Hadoop ecosystem (MapReduce, HDFS, Hive, Pig, and Sqoop).
  • Experience in Apache Spark and Kafka for big data processing, and in Scala functional programming.
  • Experience in manipulating large datasets with R packages such as tidyr, tidyverse, dplyr, reshape, lubridate, and caret, and in visualizing data using the lattice and ggplot2 packages.
  • Experience in dimensionality reduction using techniques like PCA and LDA (see the PCA sketch at the end of this summary).
  • Experience in data analytics and predictive analysis, including classification, regression, and recommender systems.
  • Good exposure to factor analysis and to bagging and boosting algorithms.
  • Experience in descriptive analysis problems such as frequent pattern mining, clustering, and outlier detection.
  • Worked on Machine Learning algorithms like Classification and Regression with KNN Model, Decision Tree Model, Naïve Bayes Model, Logistic Regression, SVM Model and Latent Factor Model.
  • Hands-on experience with Python and libraries such as NumPy, Pandas, Matplotlib, Seaborn, NLTK, scikit-learn, and SciPy.
  • Expertise in TensorFlow for machine learning and deep learning in Python.
  • Good knowledge on Microsoft Azure SQL, Machine Learning and HDInsight.
  • Good exposure to SAS analytics.
  • Good exposure to deep learning with TensorFlow in Python.
  • Good knowledge of Natural Language Processing (NLP) and time series analysis and forecasting using the ARIMA model in Python and R.
  • Good knowledge of Tableau and Power BI for interactive data visualizations.
  • In-depth understanding of NoSQL databases such as MongoDB and HBase.
  • Experience provisioning virtual clusters on AWS using services such as EC2, S3, and EMR.
  • Experience with business intelligence and ETL tools such as SSIS and SSRS.
  • Proficient in designing and developing dashboards and reports in Tableau using visualizations such as bar graphs, scatter plots, pie charts, and geographic maps, making use of actions, local and global filters, cascading filters, context filters, quick filters, and parameters according to end-user requirements.
  • Good exposure to creating pivot tables and charts in Excel.
  • Experience in developing custom reports and different types of tabular, matrix, ad hoc, and distributed reports in multiple formats using SQL Server Reporting Services (SSRS).
  • Excellent database administration (DBA) skills, including user authorization, database creation, tables, indexes, and backups.
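
The following is a minimal, self-contained sketch of the dimensionality-reduction work referenced above, assuming scikit-learn and its bundled Iris toy dataset (the dataset and the two-component setting are illustrative choices, not from any specific project):

  # Minimal sketch: PCA for dimensionality reduction feeding a simple classifier.
  # The Iris dataset and the 2-component setting are illustrative choices.
  from sklearn.datasets import load_iris
  from sklearn.decomposition import PCA
  from sklearn.linear_model import LogisticRegression
  from sklearn.model_selection import train_test_split
  from sklearn.pipeline import Pipeline

  X, y = load_iris(return_X_y=True)
  X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

  # Project onto 2 principal components, then fit a classifier on the projection.
  model = Pipeline([
      ("pca", PCA(n_components=2)),
      ("clf", LogisticRegression(max_iter=1000)),
  ])
  model.fit(X_train, y_train)
  print("held-out accuracy:", model.score(X_test, y_test))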

TECHNICAL SKILLS

Programming Languages: Python, SQL, R

Scripting Languages: Python, Shell Scripting, Bash.

ETL Tool: Talend

Data Sources: SQL Server, Excel

Data Visualization: Tableau, Power BI, SSRS

Predictive and Machine Learning: Linear Regression, Logistic Regression, Principal Component Analysis (PCA), K-means, Random Forest, Decision Trees, Natural Language Processing (NLP), SVM, K-NN, Deep Learning, Autoencoders, CNN, R-CNN, Reinforcement Learning, Time Series Analysis, and Ensemble Methods

ML Frameworks, Libraries: TensorFlow, Keras, Caffe, scikit-learn, SciPy, Pandas, Matplotlib, Seaborn, MLlib, Spark, PyTorch, Theano, OpenCV

Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Hive, Spark

Operating System: Linux, Windows, Unix.

PROFESSIONAL EXPERIENCE

Confidential, Philadelphia

Data Scientist/Machine Learning Engineer

Responsibilities:

  • Analyzed the trading mechanism for real-time transactions and built collateral management tools.
  • Compiled data from various sources to perform complex analysis for actionable results.
  • Utilized machine learning algorithms such as linear regression, multivariate regression, naive Bayes, random forests, K-means, and KNN for data analysis.
  • Implemented Azure Data Factory pipelines and datasets to copy and transform data in bulk via the Data Factory UI and PowerShell, including scheduling and exporting data.
  • Designed and developed standalone data migration applications to retrieve and populate data from Blob storage into Python, HDInsight, and Power BI.
  • Measured efficiency of the Hadoop/Hive environment, ensuring SLAs were met.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
  • Prepared a Spark build from source code and ran Pig scripts on Spark rather than MapReduce jobs for better performance.
  • Performed extraction, loading, and transformation into Scala DataFrames on a Databricks cluster.
  • Analyzed the system for new enhancements/functionality and performed impact analysis of the application for implementing ETL changes.
  • Created and managed Azure AD tenants and configured application integration with Azure AD.
  • Integrated on-premises Windows AD identities with Azure Active Directory.
  • Worked on deployment tools such as Azure Machine Learning Studio, Oozie, and AWS Lambda.
  • Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
  • Developed scripts and batch jobs to schedule various Hadoop programs. Used TensorFlow to train the model on thousands of examples of insightful data.
  • Designed, developed, and optimized SQL code (DDL/DML).
  • Expertise in data archival, data migration, ad hoc reporting, and coding using SAS on UNIX and Windows environments.
  • Tested and debugged SAS programs against the test data.
  • Processed the data in SAS for the given requirements using SAS programming concepts.
  • Imported and exported data files to and from SAS using PROC IMPORT and PROC EXPORT, reading Excel and delimited text files such as .TXT (tab-delimited) and .CSV (comma-delimited) into SAS datasets for analysis.
  • Expertise in producing RTF, PDF, and HTML files using the SAS ODS facility.
  • Provided support for data processes, including monitoring data, profiling database usage, troubleshooting, tuning, and ensuring data integrity.
  • Participated in the full software development life cycle, including requirements, solution design, development, QA, implementation, and product support, using Scrum and other agile methodologies.
  • Collaborated with team members and stakeholders on the design and development of the data environment.
  • Learned new tools and skill sets as needs arose.
  • Prepared associated documentation for specifications, requirements, and testing.
  • Optimized the TensorFlow model for efficiency.
  • Used TensorFlow for text summarization.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed MapReduce jobs in Python for data cleaning and data processing (see the mapper sketch after this list).
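
As a rough illustration of the Python MapReduce work above, here is a minimal Hadoop Streaming-style mapper for record cleaning; the comma-delimited layout and the three-field minimum are hypothetical placeholders, not taken from the actual jobs:

  #!/usr/bin/env python3
  # Minimal sketch of a Hadoop Streaming mapper that cleans delimited records.
  # Field layout (comma-delimited, at least 3 fields) is a hypothetical example.
  import sys

  for line in sys.stdin:
      fields = line.rstrip("\n").split(",")
      # Drop malformed rows and normalize whitespace/case before downstream steps.
      if len(fields) < 3 or not fields[0]:
          continue
      cleaned = [f.strip().lower() for f in fields]
      print(",".join(cleaned))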

Environment: AWS, R, Informatica, Python, HDFS, ODS, OLTP, Oracle 10g, Azure, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, MapReduce, Rational Rose, SQL, and MongoDB.

Confidential, Plantation, Florida

Data Scientist/Machine Learning

Responsibilities:

  • Worked in a highly immersive data science program involving data manipulation and visualization, web scraping, machine learning, Python programming, SQL, Git, Unix commands, NoSQL, MongoDB, and Hadoop.
  • Installed and used the Caffe deep learning framework.
  • Worked with different data formats such as JSON and XML, and applied machine learning algorithms in Python.
  • Developed a machine learning pipeline on an HDInsight R Server cluster for training the model; scoring of production data was performed on an on-premises SQL Server (Azure Stack) via complex T-SQL stored procedures.
  • Tested Python/SAS on AWS cloud services and CNTK modeling on the MS Azure cloud service.
  • Created UI using JavaScript and HTML5/CSS.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization; also performed gap analysis.
  • Developed a voice bot using AI (IVR), improving the interaction between humans and the virtual assistant.
  • Implemented event tasks to execute applications automatically.
  • Involved in developing the Patches & Updates module.
  • Set up storage and data analysis tools on Amazon Web Services cloud computing infrastructure.
  • Used pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, and NLTK in Python to develop various machine learning algorithms.
  • Performed development and deployment using Google Dialogflow Enterprise.
  • Worked with Data Architects and IT Architects to understand the movement and storage of data, using ER Studio 9.7.
  • Used Kibana, an open-source plugin for Elasticsearch, for analytics and data visualization.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and Smart View.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python, and built models using deep learning frameworks (see the sketch after this list).
  • Migrated Informatica mappings from SQL Server to Netezza.
  • Fostered a culture of continuous engineering improvement through mentoring, feedback, and metrics.
  • Broad knowledge of programming and scripting (especially in R, Java, and Python).
  • Developed and maintained a data dictionary to create metadata reports for technical and business purposes.
  • Good knowledge of Azure cloud services, Azure Storage, Azure Active Directory, and Azure Service Bus.
  • Performed predictive modeling using state-of-the-art methods.
  • Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS.
  • Parsed and manipulated raw, complex data streams to prepare them for loading into an analytical tool.
  • Built and maintained dashboards and reporting based on the statistical models to identify and track key metrics and risk indicators.
  • Performed data analysis using regressions, data cleaning, Excel VLOOKUP, histograms, and the TOAD client; presented the analysis and suggested solutions for investors.
  • Extracted data from HDFS and prepared it for exploratory analysis using data munging.
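
A minimal sketch of the pandas/NumPy cleaning and feature-scaling step mentioned above; the column names and values are hypothetical placeholders, not project data:

  # Minimal sketch: impute missing values and standardize features with pandas/NumPy.
  # The "age" and "income" columns are hypothetical placeholders.
  import numpy as np
  import pandas as pd

  df = pd.DataFrame({
      "age": [25, 40, np.nan, 31],
      "income": [48000, 88000, 52000, np.nan],
  })

  # Fill missing values with each column's median, then z-score standardize.
  df = df.fillna(df.median(numeric_only=True))
  scaled = (df - df.mean()) / df.std(ddof=0)
  print(scaled)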

Environment: ER Studio 9.7, Tableau 9.03, AWS, Teradata 15, MDM, GIT, Azure, Unix, Python 3.5.2, MLlib, SAS, regression, logistic regression, Hadoop, NoSQL, OLTP, random forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML, MapReduce, Google Dialogflow.

Confidential, Houston, Texas

Data Analyst/Data Scientist

Responsibilities:

  • Wrote the conversion rules in the data mapping document and updated them according to business needs and the impacts identified during the conversion.
  • Actively participated in gathering requirements from various SMEs, Financial Analysts, Risk Analysts, and Portfolio Management via JAD/JAR sessions.
  • Developed machine learning algorithms such as logistic regression and decision trees to predict customer insight, target marketing, and potential lapsed customers.
  • Worked on data modeling and produced data mapping and data definition documentation.
  • Performed data mapping and logical data modeling, created class diagrams and ER diagrams, and used SQL queries to filter data.
  • Gathered data from the legacy system (DB2) into the present updated system using data warehouse tools such as Informatica.
  • Project experience in data mining, segmentation analysis, business forecasting, and association rule mining on large data sets with machine learning.
  • Worked with the BI team in gathering report requirements, and used Sqoop to export data into HDFS and Hive.
  • Involved in the following phases of analytics using R, Python, and Jupyter notebooks.
  • Identified business rules for data migration and performed data administration through data models and metadata.
  • Designed, developed, and tested software to implement the models using SAS statistics procedures.
  • Functioned as the primary liaison between the business line, operations, and the technical areas throughout the project cycle. Worked with ODS and OLAP system.
  • Worked with Market Risk and Operational Risk team which were part of a different initiative.
  • Worked with the internal modeling team on data analysis and profiling.
  • Performed ETL Informatica development tasks such as creating jobs using different stages and debugging.
  • Performed data mapping and logical data modeling, created class diagrams and ER diagrams, and used SQL queries to filter data within the Oracle database.
  • Worked closely with the Data Architect to review all conceptual, logical, and physical database design models with respect to functions, definitions, and maintenance, and reviewed and supported data analysis, data quality, and ETL design feeding the logical data models.
  • Created financial package that supports 3-year financial plan for all AWS cloud services infrastructure expenses.
  • Worked on ETL design for various source systems.
  • Developed test cases for system, integration, functional, progression, and regression testing in the PyUnit framework, and executed test cases using Python scripts to validate products and software applications (see the unittest sketch after this list).
  • Designed reports using Tableau, and tabular forms, pivot tables, and charts in Excel.
  • Wrote complex SQL queries for checking the counts and for validating the data at the field level.
  • Designed and developed Use Cases, Activity Diagrams, Sequence Diagrams, OOD using UML.
  • Identified, researched, investigated, analyzed, defined, and documented business processes and use case scenarios.
  • Worked with project managers supporting Amazon Web Services (AWS) migration.
  • Wrote complex queries to extract data from and insert data into Teradata databases.
  • Wrote packages to fetch complex data from different tables in remote databases using joins and subqueries.
  • Validated data to check for proper conversion; performed data cleansing to identify and clean bad data, and data profiling for accuracy, completeness, and consistency.
  • Reviewed all the systems design by assuring adherence to defined requirements.
  • Met with user groups to analyze requirements and proposed changes in design and specifications.
  • Performed flat file conversion from the data warehouse scenario.
  • Created Static and Dynamic Parameters at the report level.
  • Performed migration of Reports (Crystal Reports, and Excel) from one domain to another domain using Import/Export Wizard.
  • Performed a wide range of QA testing to ensure data integrity and coordinated UAT to meet or exceed specified standards and end-user requirements.
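
A minimal sketch of the kind of PyUnit (unittest) test case referenced above; validate_record and its non-negative-amount rule are hypothetical stand-ins for the actual validation logic:

  # Minimal sketch of PyUnit (unittest) tests for a data-validation rule.
  # validate_record and the non-negative-amount rule are hypothetical examples.
  import unittest


  def validate_record(record):
      """Toy validation rule: the 'amount' field must be present and non-negative."""
      return record.get("amount", -1) >= 0


  class ValidationTests(unittest.TestCase):
      def test_valid_record_passes(self):
          self.assertTrue(validate_record({"amount": 100}))

      def test_negative_amount_fails(self):
          self.assertFalse(validate_record({"amount": -5}))


  if __name__ == "__main__":
      unittest.main()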

Environment: UNIX, Mainframe, DB2, SAS, MS SQL Server, MS Visio, Oracle, WebLogic, MS Project, IIS, XML, Rational RequisitePro, UML, Business Objects, RUP, SSIS, Teradata, AWS, Python, Informatica 8.6.1, Cognos, WinSQL, QTP 9.2, Quality Center 9.2, TOAD, Oracle 9i/10g, PL/SQL, IBM DB2, VBA, MS Excel

Confidential

Data Analyst

Responsibilities:

  • Gathered requirements, analyzed and wrote the design documents.
  • Provided End-user Training and documentation for customer reporting services.
  • Attended JAD sessions and created reports for Budgeting, Finance and Project Management departments of Pima County.
  • Performed data profiling in the source systems required for the Dual Medicare Medicaid data mart.
  • Documented the complete process flow to describe program development, logic, testing, implementation, application integration, and coding.
  • Involved in defining the trumping rules applied by the Master Data Repository.
  • Worked with project team representatives to ensure that logical and physical ER/Studio data models were developed in line with corporate standards and guidelines.
  • Reverse engineered all the source databases using Embarcadero.
  • Designed and implemented data integration modules for Extract/Transform/Load (ETL) functions.
  • Involved in data warehouse and data mart design.
  • Experience with various ETL and data warehousing tools and concepts.
  • Good experience with mainframe enterprise billing systems.
  • Involved in defining the business/transformation rules applied to Dual Medicare Medicaid data.
  • Used data analysis techniques to validate business rules and identify low-quality and missing data in the existing Humana enterprise data warehouse (EDW).
  • Performed daily data queries and prepared reports on a daily, weekly, monthly, and quarterly basis.
  • Used advanced Excel functions to generate spreadsheets and pivot tables (a pandas equivalent is sketched after this list).
  • Worked with the users to do the User Acceptance Testing (UAT).
  • Created technical specification documents based on the functional design document for the ETL coding to build the data mart.
  • Extensively involved in data extraction, transformation, and loading (the ETL process) from source to target systems.
  • Wrote SQL queries for custom reports.
  • Worked on a daily basis with lead data warehouse developers to evaluate the impact on the current implementation and the redesign of all ETL logic.
  • Worked on exporting reports in multiple formats including MS Word, Excel, CSV and PDF.
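
As a rough pandas analogue of the pivot-table reporting described above (the department/month/spend columns and values are hypothetical placeholders):

  # Minimal sketch: a pivot-table style summary with pandas.
  # The department/month/spend columns and values are hypothetical placeholders.
  import pandas as pd

  df = pd.DataFrame({
      "department": ["Finance", "Finance", "Projects", "Projects"],
      "month": ["Jan", "Feb", "Jan", "Feb"],
      "spend": [1200, 950, 700, 1100],
  })

  report = pd.pivot_table(df, index="department", columns="month",
                          values="spend", aggfunc="sum", margins=True)
  print(report)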

Environment: SQL Server, Oracle 10g & 11g, MS Office, Embarcadero, Teradata, Enterprise Architect, ETL (Extract, Transform, Load), ER Studio, XML, PowerPoint, MS Visio and MS Outlook.
