Machine Learning/Data Scientist Resume
Philadelphia, PA
PROFESSIONAL SUMMARY:
- 8+ years of experience in NLP/NLU/NLG/AI/Machine Learning/Computer Vision/Probabilistic Graphical Models/Inferential Statistics/Graph Theory/System Design.
- 4 years in progress, Approved Patent in Statistical Modeling and traffic pattern analysis.
- Part of an R&D team building new analytics POCs using Apache Spark, Scala, R and machine learning.
- I can help you build futuristic AI bots to assist/replace human in various business domains.
- Proficient understanding of Spark Core, Spark SQL, Spark Streaming and Spark MLlib.
- Expert-level understanding of Application Design, Development and Testing in Mainframe environments using PL/1, COBOL, EGL, Easytrieve, DB2, JCL, QC & VAG.
- Regression analysis, Statistical test analysis, Report and Dashboard generation, Data management.
- Git, Java, MySQL, MongoDB, Neo4J, AngularJS, SPSS, Tableau.
- Python, NumPy, scikit-learn, gensim, NLTK, TensorFlow, Keras.
- Experience in Machine Learning, Statistics, Regression: Linear, Logistic, Poisson, Binomial.
- Single-handedly built a model to replace the job of doer in the pension sector. This model (patent in progress) generates experience from structured data and learns new experience from unseen data through a bootstrapping mechanism.
- Single-handedly designed and built a complete Information Extraction bot POC for KYC extraction. This bot uses adaptive learning techniques and custom supervised classifiers for entity and relation extraction.
- Experience building solutions for enterprises, context-awareness, pervasive computing, and/or application of machine learning
- Research and development of machine learning pipeline design for Optical Character Recognition (handwritten), an anomaly detection system using a multivariate Gaussian model, and healthcare diagnostics systems using PGM (BN).
- Comfortable presenting to senior management, business stakeholders, and external partners.
- KBC, Chatbots, Adaptive Supervised Learning (deterministic classification), Unsupervised Learning methods for IE, ANN and Deep NN for NLP and Chatbots, Probabilistic models for NLG and inference, Decision science.
- Hands-on experience with data mining algorithms and approaches.
- Good at algorithm and design techniques.
- Fluency in one or more modern programming languages such as Java, C# or C++.
- BS or MS degree in Computer Science or related quantitative field that relies heavily on statistics and ML software.
- Architecture and Design of reusable server components for the web as well as Mobile applications.
- Strong programming expertise (preferably in Python) and strong in Database SQL.
- Be a valued contributor in shaping the future of our products and services
- Solid coding and engineering skills preferably in Machine Learning
- Proficient in Python, with experience building and productionizing end-to-end systems
- Knowledge of Information Extraction, NLP algorithms coupled with Deep Learning
- Exposure to Python and Python packages.
- Experience with file systems, server architectures, databases, SQL, and data movement (ETL).
- Experience with Hadoop systems
- Experience with Supervised or Unsupervised machine learning algorithms.
TECHNOLOGIES:
Client-side Technologies: HTML5, Perl, Processing, Python, R, Hive, C/C++, C#, Java, Bash.
Machine Learning Models: Basic Statistics, Supervised and Unsupervised learning.
Programming Languages: C#, VB.NET (VB6), VBScript, OOPS, Data structures, Algorithms
Frameworks: Shogun, Accord Framework/AForge.net, Scala, Spark, Cassandra, DL4J, ND4J, Scikit-learn
Development Tools: Cassandra, DL4J, ND4J, Scikit-learn, Shogun, Accord Framework/AForge.net, Mahout, MLlib, H2O, Cloudera Oryx, GoLearn, Apache Singa.
BI Tools: C x 4, HBase x 4, Bash x 3, Spark x 3, ElasticSearch x 2
Version Controller: TFS, Microsoft Visual SourceSafe, GIT, NUNIT, MSUNIT
Software Packages: MS-Office 2003/07/10/13, MS Access, Messaging Architectures.
Microsoft Technologies: PHP, Scala2, Shark2, Awk, Cascading, Cassandra, Clojure, Fortran, JavaScript, JMP, Mahout, Objective-C, QlikView, Redis, Redshift
Web Technologies: Windows API, Web Services, Web API (RESTful), HTML5, XHTML, CSS3, AJAX, XML, XAML, MSMQ, Silverlight, Kendo UI.
Web Servers: IIS 5.0, IIS 6.0, IIS 7.5, IIS ADMIN.
Operating Systems: Windows 8/XP/NT/95/98/2000/2008/2012, Android SDK.
Databases: SQL Server 2014/2012/2008/2005/2000, MS-Access, Oracle 11g/10g/9i and Teradata, big data, Hadoop, Mahout, MLlib, H2O, Cloudera Oryx, GoLearn.
PROFESSIONAL EXPERIENCE:
Confidential, Philadelphia, PA
Machine Learning/Data Scientist
Responsibilities:
- Design and develop state-of-the-art deep-learning / machine-learning algorithms for analyzing the image and video data among others.
- Develop and implement innovative AI and machine learning tools that will be used in the Risk.
- Performed automatic categorization of drug efficacy and side effect extraction. Identified counterintuitive predictors using machine learning methods.
- Liaise with functional lead to understand and clarify meaning and impact of key data variables.
- Effective software development processes to customize and extend the computer vision and image processing techniques to solve new problems for Automation Anywhere.
- Develop and implement innovative data quality improvement tools.
- Experimented with and applied various data science algorithms such as regression, classification, KNN and clustering to decide on and create models for solving various business requirements.
- Demonstrated cross-functional resource interaction to accomplish project goals.
- Develop project requirements and deliverable timelines; execute efficiently to meet the plan timelines.
- Creating and supporting a data management workflow from data collection and storage to analysis and validation.
- Develop necessary connectors to plug ML software into wider data pipeline architectures.
- Identify and assess available machine learning and statistical analysis libraries (including regressors, classifiers, statistical tests, and clustering algorithms).
- Design and build scalable software architecture to enable real-time / big-data processing.
- Acquire business knowledge in the Firm’s risk management processes.
- Be very passionate about quality and have a strong sense of ownership of the work accomplished.
- Be quick to learn new technologies as well as deliver on them in short order.
- Taking responsibility for technical problem solving, creatively meeting product objectives and developing best practices.
- Have a high sense of urgency to deliver projects as well as troubleshoot and fix data queries/ issues.
- Work independently with R&D partners to understand requirements.
Environment: R 9.0, R Studio, Machine learning, Informatica 9.0, Scala, Spark, Cassandra, DL4J, ND4J, Scikit-learn, Shogun, Accord Framework/AForge.net, Mahout, MLlib, H2O, Cloudera Oryx, GoLearn, Apache.
Confidential, NJ
Machine Learning/Data Scientist
Responsibilities:
- Built an IE bot for automating KYC extraction for Institutional entities and Risk & Compliance domain.
- Implementation and design of the patented algorithm for signature-less intrusion detection and prevention for SIP traffic in Siberia's first IDS/IPS product line.
- Anomaly detection using a multivariate Gaussian model for traffic optimization.
- ACL implementation using TACACS Authentication adapter, OpenSSH customization, and PAM module implementation.
- Implementation of Character Recognition using a support vector machine for performance optimization.
- Image compression and reconstruction using principal component analysis.
- Recommender system using low-rank matrix factorization to increase CTR for an ad network.
- Collaborate with distributed cross-functional teams on common goals.
- Innovate and leverage machine learning, data mining and statistical techniques to create new, scalable solutions for business problems
- Analyze and classify large amounts of data.
- Visualize data using D3.js.
- Create a model to forecast revenue.
- Built a machine learning model that automatically scores user assignments based on a few manually scored assignments.
- Developed Online Slot booking system for Assessment test.
- Developing new products in the site with client's specification and solving the issues in the Freshersworld.com site.
- Adding Google analytics code in site.
- Testing the compatibility and functionality of websites in different browsers and on mobile.
- DB design and Maintenance.
- Extracted statistics for performance metrics analysis for various components during load testing.
- Applied association rule mining & chain model to identify hidden patterns and rules in remedy ticket analysis which aid in decision making.
- Segmenting ABO population and developing demographic profile against each fragment.
- Isolating customer behavioral patterns by analyzing millions of customer data records over a period of time and correlating multiple customer attributes.
- Worked on various strategic projects (Customer engagement, SA closures, Locker surrender etc.) and ad hoc proactive analysis with product teams to provide insights and trends using SAS & Excel.
Environment: Apache, Spark MLlib, TensorFlow, Oryx 2, Accord.NET, Amazon Machine Learning (AML), Python, Django, Flask, ORM, Jinja 2, Mako, Naive Bayes, SVM, K-means, ANN, Regression.
Confidential, Bentonville, Arkansas
Machine Learning/Data Scientist
Responsibilities:
- Developed and scaled REST APIs; good understanding of and working experience with JSON, JSON Schema, and REST APIs.
- Object-Oriented Design using common design patterns / Software Debugging, Test-driven development, UML, SVN.
- Applied machine learning and statistical methods (SVM, CRF, HMM, sequential tagging).
- Used data mining algorithms and approaches.
- Developed tests using the Python unit test framework and other unit test frameworks.
- Used Python to build and productionize end-to-end systems.
- Worked on file systems, server architectures, databases, SQL, and data movement (ETL).
- Exposure to Python and Python packages.
- Collaborate with Risk Analytics teams, the Stress Testing team, Middle Office, IT and other departments.
- Be responsible for creation and execution of test plans, protocols, and documentation.
- Develop and implement innovative AI and machine learning tools that will be used in the Risk.
- Be very passionate about quality and have a strong sense of ownership of the work accomplished.
- Acquire business knowledge in the Firm’s risk management processes.
- Formulate and test hypotheses, extract signals from petabyte scale, unstructured data sets, and ensure that our display advertising business delivers the highest standards of performance.
- Understanding the trade-offs between competing approaches, and identifying the ones that are likely to have a real impact on the product.
- Lead a project team of systems engineers (HW, FW & SW) and internal and outsourced development partners to develop reliable, cost effective and high-quality solutions.
- Identify and assess available machine learning and statistical analysis libraries (including regressors, classifiers, statistical tests, and clustering algorithms).
- NLP engineer with a profound interest in research and development for cutting edge machine learning techniques.
- Architecture and Design of reusable server components for the web as well as Mobile applications.
Environment: Erwin r9.0, Informatica 9.0, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose, Requisite Pro, Hadoop, PL/SQL, etc.
Confidential, Fairfield, California
Machine Learning/Data Scientist
Responsibilities:
- Developing propensity models for Retail liability products to drive proactive campaigns.
- Extraction and tabulation of data from multiple data sources using R and SAS.
- Data cleansing, transformation and creating new variables using R.
- Built predictive scorecards for Cross-selling Car loan, Life Insurance, TD, and RD.
- Scoring predictive models as per regulatory requirements & ensuring deliverables with PSI.
- Using Statistical techniques on a need basis ranging from simple significance test to Regression Segmentation techniques for delivering analyses and building strategies.
- Data modeling and formulation of statistical equations using advanced statistical forecasting techniques.
- Provide guidance and mentoring to team members.
- Establish scalable, efficient, automated processes for large scale data analyses, model development, model validation and model implementation.
- Formulate and test hypotheses, extract signals from petabyte scale, unstructured data sets, and ensure that our display advertising business delivers the highest standards of performance.
- Lead a multi-functional project team.
- Develop necessary connectors to plug ML software into wider data pipeline architectures.
- Applied association rule mining & chain model to identify hidden patterns and rules in remedy ticket analysis which aid in decision making.
- Understanding the client's business problems and analyzing the data using appropriate statistical models to generate insights.
- Integrated Teradata with R for BI platform and also implemented corporate business rules.
Environment: R Studio, Machine learning, Informatica 9.0, Scala, Amazon Machine Learning (AML), Python, Django, SAAS.
Client: DELTA Technologies & Management Services, Hyderabad, India
Role: Data Modeller
Description: Delta Technology's vision is to be an organization of value, respect and transparency for its people to continuously innovate, improve and deliver efficient and effective business solutions.
Responsibilities:
- Worked as Data Expert on a data mining ETL development project using SAS Enterprise Guide.
- Created test plan documents for all back-end database modules.
- Worked with large amounts of structured and unstructured data.
- Responsible for data collection, cleansing, and ANOVA. Designed technical solution roadmap to deal with noise in sales data.
- Worked on loading data from MySQL to HBase where necessary using Sqoop.
- Handled end-to-end project from data discovery to model deployment
- Knowledge in Business Intelligence tools and visualization tools such as Business Objects, Tableau, ChartIO, etc.
- Knowledge in Machine Learning concepts (Generalized Linear models, Regularization, Random Forest, Time Series models, etc.).
- Deployed GUI pages by using JSP, JSTL, HTML, DHTML, XHTML, CSS, JavaScript, and AJAX.
- Configured the project on WebSphere 6.1 application servers.
- Implemented the online application by using Core Java, JDBC, JSP, Servlets and EJB 1.1, Web Services, SOAP, WSDL.
- Monitoring the automated loading processes.
- Communicated with other health care information systems using Web Services with the help of SOAP, WSDL and JAX-RPC.
- Used Singleton, Factory and DAO design patterns based on the application requirements.
- Used SAX and DOM parsers to parse the raw XML documents
- Used RAD as Development IDE for web applications.
- Used Log4J logging framework to write Log messages with various levels.
- Involved in fixing bugs and minor enhancements for the front-end modules.
- Used Microsoft Visio and Rational Rose to design the Use Case diagrams, Class model, Sequence diagrams, and Activity diagrams for the SDLC process of the application.
- Supported the testing team during System testing, Integration testing and UAT.
- Guaranteeing quality in the deliverables.
- Conducted Design reviews and Technical reviews with other project stakeholders.
- Took part in the complete life cycle of the project, from requirements to production support.
- Implemented the project in Linux environment.
Environment: R 3.0, Erwin 9.5, Tableau 8.0, MDM, QlikView, MLLib, PL/SQL, HDFS, Teradata 14.1, JSON, HADOOP (HDFS), MapReduce, PIG, Spark, R Studio, MAHOUT, JAVA, HIVE, AWS.
Confidential
Data Analyst
Responsibilities:
- Implementation of Metadata Repository, Maintaining Data Quality, Data Cleanup procedures, Transformations, Data Standards, Data Governance program, Scripts, Stored Procedures, triggers and execution of test plans.
- Developed Internet traffic scoring platform for ad networks, advertisers, and publishers (rule engine, site scoring, keyword scoring, lift measurement, linkage analysis).
- Responsible for communication and negotiation on project-related aspects: project loading, construction budget, design alterations, and unexpected events on the project.
- Responsible for defining the key identifiers for each mapping/interface.
- Clients include eBay, Click Forensics, Cars.com, Turn.com, Microsoft, and Looksmart.
- Designed the architecture for one of the first Analytics 3.0 online platforms: all-purpose scoring, with on-demand, SaaS, and API services.
- Web crawling and text mining techniques to score referral domains, generate keyword taxonomies and assess commercial value of bid keywords.
- Used RAD as Development IDE for web applications.
- Developed new hybrid statistical and data mining technique known as hidden decision trees and hidden forests.
- Reverse engineering of keyword pricing algorithms in the context of pay-per-click arbitrage.
- Performed data quality checks in Talend Open Studio.
- Coordinated meetings with vendors to define requirements and system interaction agreement documentation between client and vendor system.
- Automated bidding for advertiser campaigns based either on keyword or category (run-of-site) bidding.
- Creation of multimillion bid keyword lists using extensive web crawling. Identification of metrics to measure the quality of each list (yield or coverage, volume, and keyword average financial value).
- Updated the Enterprise Metadata Library with any changes or updates.
- Document data quality and traceability documents for each source interface.
- Establish standards of procedures.
- Generate weekly and monthly asset inventory reports.
