Data Scientist/Machine Learning Engineer Resume
Columbus, OH
PROFESSIONAL SUMMARY:
- Data scientist with 7+ years of experience in Machine Learning, Statistical Modeling, Predictive Analytics, Text Mining, Natural Language Processing, and AI algorithms.
- Involved in the entire data science project life cycle, from data extraction to machine learning model evaluation.
- Wrangled structured and unstructured data to help senior management make faster, better-informed decisions with the right data.
- Leveraged packages such as scikit-learn, Keras 2.0, TensorFlow, NLTK, and SciPy in Python to develop and evaluate machine learning, deep learning, and NLP models for problem solving.
- Hands-on experience with data manipulation packages such as NumPy, Pandas, and SQLAlchemy, and data visualization packages such as Matplotlib, Seaborn, and Bokeh.
- Experienced in SQL programming and in creating relational and non-relational databases.
- Worked on statistical models to develop new hypotheses and products; employed Spotfire and Tableau to create dashboards and visualizations.
- Experience with statistical and regression analysis and multi-objective optimization.
- Designed and implemented supervised algorithms such as Logistic Regression, Decision Trees, XGBoost, SVMs, and Polynomial Regression, and unsupervised machine learning algorithms such as K-Means clustering, mixture models, hierarchical clustering, and anomaly detection.
- Worked with clients to identify analytical needs and documented them for further use; identified and solved business problems using data processing, data visualization, and graphical data analysis.
- Solid knowledge of mathematics and experience applying it to technical and research fields, identifying areas where optimization can be effective.
- Applied statistical methodologies, including hacker-statistics techniques such as A/B testing, hypothesis testing, statistical inference, and parameter estimation, on historical data to support important decisions while solving business problems.
- In-depth knowledge of Big Data technologies such as Spark SQL, PySpark, Hive, Sqoop, Flume, and the Ambari Console.
- Developed predictive models in Python for customer churn prediction and customer classification (see the sketch after this list).
- Performed query optimization, execution-plan analysis, and performance tuning of SQL queries.
- Worked on a Shiny/R application showcasing machine learning for improved business forecasting.
- Hands-on experience with version control systems such as Git and GitHub.
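For illustration, a minimal scikit-learn sketch of the churn-classification work referenced above; the file name, column names, and model choice are hypothetical, not taken from a specific engagement.

```python
# Hypothetical churn-classification sketch with scikit-learn; the file name and
# column names are illustrative, not taken from a specific project.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Assume a CSV with numeric usage features and a binary "churn" label.
df = pd.read_csv("churn.csv")
X = df.drop(columns=["churn"])
y = df["churn"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scale features, then fit a regularized logistic regression classifier.
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

# Evaluate on the held-out split.
print(classification_report(y_test, model.predict(X_test)))
```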
TECHNICAL SKILLS:
Programming Languages: Python, Java, SAS Base, SAS Enterprise Miner, Bash Scripting, Regular Expressions and SQL (Oracle & SQL Server).
Python Libraries: Pandas, NumPy, SciPy, Scikit-Learn, NLTK, spaCy, Matplotlib, Seaborn, Beautiful Soup, Logging, PySpark, Keras and TensorFlow.
Machine Learning: Linear Regression, Logistic Regression, Multinomial Logistic Regression, Regularization (Lasso & Ridge), Decision Trees, Support Vector Machines, Ensembles - Random Forest, Gradient Boosting, Extreme Gradient Boosting (XGBoost), Deep Learning - Neural Networks, Deep Neural Networks (CNN, RNN & LSTM) with Keras and TensorFlow, Dimensionality Reduction - Principal Component Analysis (PCA), Weight of Evidence (WOE) and Information Value, Hierarchical & K-Means Clustering, K-Nearest Neighbors.
Data Visualization: Tableau, Google Analytics, Advanced Microsoft Excel and Power BI.
NLP: Text Pre-Processing, Information Retrieval, Classification, Topic Modeling, Text Clustering, Sentiment Analysis and Word2Vec.
Cloud Technologies: Google Cloud Platform Big Data & Machine Learning modules - Cloud Storage, Cloud Dataflow, Cloud ML, BigQuery, Cloud Dataproc, Cloud Datastore, Bigtable. Familiarity with AWS - EMR, EC2, S3.
Version Control: Git
Big Data Tools: Spark/PySpark, Hive, Impala, Hue, MapReduce, HDFS, Sqoop, Flume and Oozie
PROFESSIONAL EXPERIENCE:
Confidential, Columbus, OH
Data Scientist/Machine Learning Engineer
Responsibilities:
- Worked with several R packages including knitr, dplyr, SparkR, CausalInfer, Space-Time.
- Coded R functions to interface with the Caffe deep learning framework.
- Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python to develop various machine learning algorithms.
- Installed and used the Caffe deep learning framework.
- Worked with different data formats such as JSON and XML and applied machine learning algorithms in Python.
- Set up storage and data analysis tools in the Amazon Web Services (AWS) cloud computing infrastructure.
- Implemented end-to-end systems for data analytics and data automation, integrated with custom visualization tools, using R, Mahout, Hadoop, and MongoDB.
- Worked with Data Architects and IT Architects to understand the movement and storage of data, using ER Studio 9.7.
- Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction; used the resulting engine to increase user lifetime by 45% and triple user conversions for target categories.
- Used Spark DataFrames, Spark SQL, and Spark MLlib extensively, developing and designing POCs using Scala, Spark SQL, and the MLlib library (see the sketch after this list).
- Used Data Quality Validation techniques to validate Critical Data Elements (CDE) and identified various anomalies.
- Worked extensively with the Erwin Data Modeler tool to design data models.
- Developed various QlikView data models by extracting and using data from various source files, including DB2, Excel, flat files, and big data sources.
- Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
- Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and Smart View.
- Implemented Agile Methodology for building an internal application.
- Focused on integration overlap and Informatica's newer commitment to MDM following the acquisition of Identity Systems.
- Good knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
- As architect, delivered various complex OLAP databases/cubes, scorecards, dashboards, and reports.
- Programmed a utility in Python that used multiple packages (SciPy, NumPy, Pandas).
- Implemented classification using supervised algorithms such as Logistic Regression, Decision Trees, KNN, and Naive Bayes.
- Designed both 3NF data models for ODS and OLTP systems and dimensional data models using Star and Snowflake schemas.
- Updated Python scripts to match data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
- Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL.
- Designed and developed Use Case diagrams, Activity Diagrams, Sequence Diagrams, and OOD (Object-Oriented Design) using UML and Visio.
- Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
- Identified and executed process improvements, hands-on in various technologies such as Oracle, Informatica, and Business Objects.
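For illustration, a minimal PySpark sketch of the DataFrame / Spark SQL / MLlib workflow referenced in the POC bullet above; the JSON source, column names, and label are hypothetical.

```python
# Hypothetical PySpark sketch: load a JSON source, query it with Spark SQL,
# and fit an MLlib logistic regression; path and column names are illustrative.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("poc-classification").getOrCreate()

# Read the raw JSON and expose it to Spark SQL.
events = spark.read.json("events.json")
events.createOrReplaceTempView("events")
labeled = spark.sql(
    "SELECT amount, visits, label FROM events WHERE label IS NOT NULL"
)

# Assemble a feature vector and split into train/test DataFrames.
assembler = VectorAssembler(inputCols=["amount", "visits"], outputCol="features")
train, test = assembler.transform(labeled).randomSplit([0.8, 0.2], seed=42)

# Fit and evaluate the MLlib model.
model = LogisticRegression(featuresCol="features", labelCol="label").fit(train)
print("test AUC:", model.evaluate(test).areaUnderROC)
```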
Environment: AWS, R, Informatica, Python, HDFS, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, MapReduce, Rational Rose, SQL, and MongoDB.
Confidential, Philadelphia, PA
Data Scientist/Data Modeler
Responsibilities:
- Worked on machine learning projects based on Python, SQL, Spark, and advanced SAS programming; performed exploratory data analysis, data visualization, and feature selection.
- Applied machine learning algorithms including random forests, boosted trees, SVM, SGD, neural networks, and deep learning using CNTK and TensorFlow.
- Performed big data analytics with Hadoop, HiveQL, Spark RDDs, and Spark SQL.
- Tested Python/SAS on the AWS cloud service and CNTK modeling on the Microsoft Azure cloud service.
- Created UI using JavaScript and HTML5/CSS.
- Developed and tested many features for dashboard using Python, Bootstrap, CSS, and JavaScript.
- Interacted with the ETL and BI teams to understand and support various ongoing projects.
- Used MS Excel extensively for data validation.
- Wrote DAX (Data Analysis Expressions) expressions to create customized calculations and hierarchies in Power BI Desktop.
- Implemented several DAX functions for various fact calculations for efficient data visualization in Power BI.
- Involved in creating a Data Lake by extracting customers' big data from various data sources into Hadoop HDFS, including data from Excel, flat files, Oracle, SQL Server, MongoDB, Cassandra, HBase, Teradata, and Netezza, as well as log data from servers.
- Used Spark DataFrames, Spark SQL, and Spark MLlib extensively, developing and designing POCs using Scala, Spark SQL, and the MLlib library.
- Performed data cleaning and feature selection using the MLlib package in PySpark, and worked with deep learning frameworks such as Caffe and Neon.
- Used R and SQL to create statistical models involving multivariate regression, linear regression, logistic regression, PCA, random forests, decision trees, and support vector machines for estimating the risks of welfare dependency.
- Tracked operations using Airflow sensors until specified criteria were met (see the sketch after this list).
- Responsible for data mapping activities from source systems to Teradata using utilities such as TPump, FEXP, BTEQ, MLOAD, and FLOAD.
- Developed PL/SQL procedures and functions to automate billing operations, customer barring, and number generation.
- Generated ad-hoc SQL queries using joins, database connections and transformation rules to fetch data from legacy Oracle and SQL Server database systems
- Executed ad-hoc data analysis for customer insights using SQL on an Amazon AWS Hadoop cluster.
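For illustration, a minimal Airflow sketch of the sensor-gated tracking described above; the DAG ID, file path, and task names are hypothetical (Airflow 2.x imports).

```python
# Hypothetical Airflow sketch (Airflow 2.x imports): a sensor polls for a file
# until the criterion is met, then a downstream task runs. IDs and the path
# are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.sensors.filesystem import FileSensor

def process_file():
    # Placeholder for the downstream processing step.
    print("sensor criterion met; processing file")

with DAG(
    dag_id="sensor_gated_pipeline",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # Poke every 60 seconds until the expected file lands.
    wait_for_file = FileSensor(
        task_id="wait_for_file",
        filepath="/data/incoming/operations.csv",
        poke_interval=60,
    )
    process = PythonOperator(task_id="process", python_callable=process_file)
    wait_for_file >> process
```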
Environment: Data Governance, SQL Server, Python, ETL, MS Office Suite - Excel (Pivot, VLOOKUP), DB2, R, Visio, HP ALM, Agile, Azure, Data Quality, Excel, AWS Redshift, ScalaNLP, Cassandra, Oracle, Power BI, MongoDB, Informatica MDM, Cognos, SQL Server 2012, Teradata, SPSS, T-SQL, PL/SQL.
Confidential, Laguna Hills, CA
Sr. Data Scientist
Responsibilities:
- Setup storage and data analysis tools in Amazon Web Services cloud computing infrastructure.
- Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python to develop various machine learning algorithms.
- Installed and used the Caffe deep learning framework.
- Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
- Worked with Data Architects and IT Architects to understand the movement and storage of data, using ER Studio 9.7.
- Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
- Data Manipulation and Aggregation from a different source using Nexus, Toad, Business Objects, Power BI and Smart View.
- Implemented Agile Methodology for building an internal application.
- Focused on integration overlap and Informatica's newer commitment to MDM following the acquisition of Identity Systems.
- Good knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
- As Architect delivered various complex OLAP databases/cubes, scorecards, dashboards and reports.
- Programmed a utility in Python that used multiple packages (SciPy, NumPy, Pandas).
- Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, KNN, Naïve Bayes.
- Updated Python scripts to match data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
- Performed data transformation from various sources, data organization, and feature extraction from raw and stored data.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS (see the sketch after this list).
- Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
- Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms for the clients
- Identifying and executing process improvements, hands-on in various technologies such as Oracle, Informatica, Business Objects.
- Designed both 3NF data models for ODS and OLTP systems and dimensional data models using Star and Snowflake schemas.
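For illustration, a minimal PySpark/Hive sketch of the import-transform-load flow described above; the paths, view, and column names are hypothetical.

```python
# Hypothetical sketch of importing a source extract, transforming it with
# Hive/Spark SQL, and loading the result into HDFS; paths, view, and column
# names are illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("ingest-transform-load")
    .enableHiveSupport()
    .getOrCreate()
)

# Import: read a raw CSV extract from the landing zone.
raw = spark.read.csv("hdfs:///landing/orders.csv", header=True, inferSchema=True)
raw.createOrReplaceTempView("orders_raw")

# Transform: aggregate with a Hive/Spark SQL query.
daily = spark.sql("""
    SELECT order_date, SUM(amount) AS total_amount
    FROM orders_raw
    GROUP BY order_date
""")

# Load: write the transformed data back to HDFS as Parquet.
daily.write.mode("overwrite").parquet("hdfs:///warehouse/orders_daily")
```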
Environment: R 9.0, Informatica 9.0, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose.
Confidential, New York, NY
Data Analyst
Responsibilities:
- Performed data analysis and reporting using MySQL, MS PowerPoint, MS Access, and SQL Assistant.
- Involved in MySQL, MS PowerPoint, and MS Access database design, and designed a new database on Netezza with an optimized outcome.
- Involved in writing T-SQL and working on SSIS, SSRS, SSAS, data cleansing, data scrubbing, and data migration.
- Involved in writing scripts for loading data to the target data warehouse using BTEQ, FastLoad, and MultiLoad.
- Created ETL scripts using regular expressions and custom tools (Informatica, Pentaho, and SyncSort) to ETL data.
- Developed SQL Service Broker to flow and sync data from MS-I to Microsoft's master data management (MDM) system.
- Involved in loading data between Netezza tables using NZSQL utility.
- Worked on data modeling using dimensional data modeling, Star/Snowflake schemas, fact and dimension tables, and physical and logical data modeling (see the sketch after this list).
- Generated Statspack/AWR reports from the Oracle database and analyzed them for Oracle wait events, time-consuming SQL queries, tablespace growth, and database growth.
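For illustration, a minimal Python sketch of a star-schema report query of the kind described above; the driver, server, and fact/dimension names are hypothetical.

```python
# Hypothetical sketch of a star-schema report query run from Python; the DSN,
# fact/dimension tables, and columns are illustrative.
import pandas as pd
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=reporting-db;DATABASE=sales_dw;Trusted_Connection=yes;"
)

# Join the fact table to its date and product dimensions, as in a typical
# dimensional-model report.
query = """
    SELECT d.calendar_month, p.product_category, SUM(f.sales_amount) AS total_sales
    FROM fact_sales AS f
    JOIN dim_date AS d ON f.date_key = d.date_key
    JOIN dim_product AS p ON f.product_key = p.product_key
    GROUP BY d.calendar_month, p.product_category
"""
report = pd.read_sql(query, conn)
print(report.head())
```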
Environment: MySQL, MS PowerPoint, MS Access, Netezza, DB2, T-SQL, DTS, SSIS, SSRS, SSAS, ETL, Oracle, Star Schema and Snowflake Schema.
Confidential, Weehawken, NJ
Engineer- Data Analytics
Responsibilities:
- Communicated with other health care systems using Web Services with the help of SOAP, WSDL, and JAX-RPC.
- Used the Singleton, Factory, and DAO design patterns based on the application requirements.
- Used SAX and DOM parsers to parse the raw XML documents (see the parsing sketch after this list).
- Used RAD as the development IDE for web applications.
- Prepared and executed unit test cases.
- Used the Log4J logging framework to write log messages at various levels.
- Involved in fixing bugs and minor enhancements for the front-end modules.
- Deployed GUI pages using JSP, JSTL, HTML, DHTML, XHTML, CSS, JavaScript, and AJAX.
- Configured the project on WebSphere 6.1 application servers.
- Implemented the online application using core JDBC, JSP, Servlets, EJB 1.1, Web Services, SOAP, and WSDL.
- Used Microsoft Visio and Rational Rose to design the Use Case diagrams, class model, sequence diagrams, and activity diagrams for the SDLC process of the application.
- Handled maintenance in the testing team for system testing, integration testing, and UAT.
- Ensured quality in the deliverables.
- Conducted Design reviews and Technical reviews with other project stakeholders.
- Was a part of the complete life cycle of the project from the requirements to the production support
- Created test plan documents for all back-end database modules
- Implemented the project in Linux environment.
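The XML parsing described above was done with Java SAX and DOM parsers; for illustration only, here is a minimal Python analogue of the two approaches, with a hypothetical XML snippet and tag names.

```python
# The parsing above used Java SAX/DOM parsers; this is a minimal Python analogue
# for illustration only. The XML snippet and tag names are hypothetical.
import xml.dom.minidom
import xml.sax

RAW_XML = "<claims><claim id='1' status='open'/><claim id='2' status='closed'/></claims>"

# DOM approach: load the whole document into memory and navigate it.
dom = xml.dom.minidom.parseString(RAW_XML)
for claim in dom.getElementsByTagName("claim"):
    print("DOM:", claim.getAttribute("id"), claim.getAttribute("status"))

# SAX approach: stream the document and react to element events.
class ClaimHandler(xml.sax.ContentHandler):
    def startElement(self, name, attrs):
        if name == "claim":
            print("SAX:", attrs.getValue("id"), attrs.getValue("status"))

xml.sax.parseString(RAW_XML.encode("utf-8"), ClaimHandler())
```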
Environment: R 3.0, Erwin 9.5, Tableau 8.0, MDM, QlikView, MLlib, PL/SQL, HDFS, Teradata 14.1, JSON, HADOOP (HDFS), MapReduce, PIG, Spark, R Studio, MAHOUT, HIVE, AWS.
Confidential
Data Analyst
Responsibilities:
- Worked with internal architects, assisting in the development of current and target state data architectures.
- Worked with project team representatives to ensure that logical and physical ER/Studio data models were developed in line with corporate standards and guidelines.
- Involved in defining the business/transformation rules applied for sales and service data.
- Implemented the metadata repository, transformations, data quality maintenance, data standards, the data governance program, scripts, stored procedures, and triggers, and executed test plans.
- Defined the list codes and code conversions between the source systems and the data mart.
- Involved in defining the source-to-target data mappings, business rules, and data definitions.
- Responsible for defining the key identifiers for each mapping/interface.
- Remain knowledgeable in all areas of business operations in order to identify systems needs and requirements.
- Performed data quality checks in Talend Open Studio (see the sketch after this list).
- Updated the Enterprise Metadata Library with any changes or updates.
- Documented data quality and traceability for each source interface.
- Established standards of procedures.
- Coordinated meetings with vendors to define requirements and system interaction agreement documentation between client and vendor system.
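For illustration, a minimal pandas sketch of data-quality rules of the kind described above (the actual checks ran in Talend Open Studio); the file, key column, and code list are hypothetical.

```python
# Hypothetical data-quality rules of the kind described above (the actual checks
# ran in Talend Open Studio); the file, key column, and code list are illustrative.
import pandas as pd

source = pd.read_csv("service_accounts.csv")

# Key-identifier rule: the mapping key must be present and unique.
missing_keys = int(source["account_id"].isna().sum())
duplicate_keys = int(source["account_id"].duplicated().sum())

# Code-conversion rule: status codes must come from the agreed code list.
valid_status = {"A", "I", "P"}
invalid_status = int((~source["status_code"].isin(valid_status)).sum())

print(f"missing keys: {missing_keys}")
print(f"duplicate keys: {duplicate_keys}")
print(f"invalid status codes: {invalid_status}")
```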
Environment: Windows Enterprise Server 2000, SSRS, SSIS, Crystal Reports, DTS, SQL Profiler, and Query Analyzer.