
Data Scientist/Machine Learning Engineer Resume


Columbus, OH

PROFESSIONAL SUMMARY:

  • Data science professional with 7+ years of experience in Machine Learning, Statistical Modeling, Predictive Analytics, Text Mining, Natural Language Processing and AI algorithms.
  • Involved in the entire data science project life cycle, from data extraction through machine learning model evaluation.
  • Wrangled unstructured and structured data to show senior management that faster, better-informed decisions can be made with the right data.
  • Leveraged packages such as scikit-learn, Keras 2.0, TensorFlow, NLTK and SciPy in Python to develop and evaluate machine learning, deep learning and NLP models for problem solving.
  • Hands-on experience with data manipulation packages such as NumPy, Pandas and SQLAlchemy, and data visualization packages such as Matplotlib, Seaborn and Bokeh.
  • Experienced in SQL programming and in creating relational and non-relational databases.
  • Worked on statistical models to create new theories and products; employed Spotfire and Tableau to create dashboards and visualizations.
  • Experience with statistical and regression analysis and multi-objective optimization.
  • Designed and implemented supervised algorithms such as Logistic Regression, Decision Trees, XGBoost, SVMs and Polynomial Regression, and unsupervised machine learning algorithms such as clustering (K-means, mixture models, hierarchical clustering) and anomaly detection.
  • Worked with clients to identify analytical needs and documented them for further use; identified problems and provided solutions to business problems using data processing, data visualization and graphical data analysis.
  • Solid knowledge of mathematics and experience applying it to technical and research fields, identifying areas where optimization can be effective.
  • Applied statistical and hacker-statistics methodologies such as A/B testing, hypothesis testing, statistical inference and parameter estimation to historical data to make important decisions while solving business problems.
  • Strong knowledge of big data technologies such as Spark SQL, PySpark, Hive, Sqoop, Flume and Ambari Console.
  • Developed predictive models in Python to predict customer churn and classify customers (a minimal illustrative sketch follows this summary).
  • Performed query optimization, execution-plan analysis and performance tuning of SQL queries.
  • Worked on a Shiny/R application showcasing machine learning for improving business forecasts.
  • Hands-on experience with version control systems such as Git and GitHub.
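
A minimal, hedged sketch of the customer-churn classification mentioned above, using Python and scikit-learn. The file name and column names (customers.csv, tenure, monthly_charges, num_products, churned) are hypothetical placeholders, not the original project's data.

    # Hypothetical churn-classification sketch with scikit-learn.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report

    df = pd.read_csv("customers.csv")                      # hypothetical input file
    X = df[["tenure", "monthly_charges", "num_products"]]  # hypothetical features
    y = df["churned"]                                      # hypothetical binary target

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)

    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    print(classification_report(y_test, model.predict(X_test)))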

TECHNICAL SKILLS:

Programming Languages: Python, Java, SAS Base, SAS Enterprise Miner, Bash Scripting, Regular Expressions and SQL (Oracle & SQL Server).

Python Packages: Pandas, NumPy, SciPy, Scikit-Learn, NLTK, SpaCy, Matplotlib, Seaborn, Beautiful Soup, Logging, PySpark, Keras and TensorFlow.

Machine Learning: Linear Regression, Logistic Regression, Multinomial Logistic Regression, Regularization (Lasso & Ridge), Decision Trees, Support Vector Machines, Ensembles - Random Forest, Gradient Boosting, Extreme Gradient Boosting (XGBoost), Deep Learning - Neural Networks, Deep Neural Networks (CNN, RNN & LSTM) with Keras and TensorFlow, Dimensionality Reduction - Principal Component Analysis (PCA), Weight of Evidence (WOE) and Information Value, Hierarchical & K-means Clustering, K-Nearest Neighbors.

Data Visualization: Tableau, Google Analytics, Advanced Microsoft Excel and Power BI.

Text Analytics: Text Pre-processing, Information Retrieval, Classification, Topic Modeling, Text Clustering, Sentiment Analysis and Word2Vec.

Cloud Technologies: Google Cloud Platform Big Data & Machine Learning modules - Cloud Storage, Cloud Dataflow, Cloud ML, BigQuery, Cloud Dataproc, Cloud Datastore and Bigtable. Familiarity with AWS - EMR, EC2 and S3.

Version Control: Git

Big Data Tools: Spark/PySpark, Hive, Impala, Hue, MapReduce, HDFS, Sqoop, Flume and Oozie

PROFESSIONAL EXPERIENCE:

Confidential, Columbus, OH

Data Scientist/Machine Learning Engineer

Responsibilities:

  • Worked with several R packages, including knitr, dplyr, SparkR, CausalInfer and Space-Time.
  • Coded R functions to interface with the Caffe deep learning framework.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn and NLTK in Python to develop various machine learning algorithms.
  • Installed and used the Caffe deep learning framework.
  • Worked with different data formats such as JSON and XML and applied machine learning algorithms in Python.
  • Set up storage and data analysis tools in the Amazon Web Services (AWS) cloud computing infrastructure.
  • Implemented end-to-end systems for data analytics and data automation, integrated with custom visualization tools, using R, Mahout, Hadoop and MongoDB.
  • Worked with Data Architects and IT Architects to understand the movement and storage of data, using ER Studio 9.7.
  • Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib and Python with a broad variety of machine learning methods, including classification, regression and dimensionality reduction; used the resulting engine to increase user lifetime by 45% and triple user conversions for target categories.
  • Used Spark DataFrames, Spark SQL and Spark MLlib extensively, and developed and designed POCs using Scala, Spark SQL and MLlib libraries (a brief PySpark sketch of this pattern follows this list).
  • Used Data Quality Validation techniques to validate Critical Data Elements (CDE) and identified various anomalies.
  • Worked extensively with the Erwin Data Modeler tool to design data models.
  • Developed various QlikView data models by extracting and using data from various source files, DB2, Excel, flat files and big data sources.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation and visualization; also performed gap analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI and Smart View.
  • Implemented Agile Methodology for building an internal application.
  • Focused on integration overlap and Informatica's newer commitment to MDM following the acquisition of Identity Systems.
  • Good knowledge of Hadoop architecture and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
  • As an architect, delivered various complex OLAP databases/cubes, scorecards, dashboards and reports.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, Pandas).
  • Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, KNN, Naive Bayes.
  • Designed both 3NF data models for ODS and OLTP systems and dimensional data models using Star and Snowflake schemas.
  • Updated Python scripts to match data with our database stored in AWS CloudSearch, so that each document could be assigned a response label for further classification.
  • Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus and PL/SQL.
  • Designed and developed use case diagrams, activity diagrams, sequence diagrams and OOD (object-oriented design) using UML and Visio.
  • Interacted with Business Analysts, SMEs and other Data Architects to understand business needs and functionality for various project solutions.
  • Identified and executed process improvements; hands-on in various technologies such as Oracle, Informatica and Business Objects.
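
As referenced above, a brief PySpark sketch of the DataFrame/Spark SQL/MLlib pattern described in this role; the HDFS path and the table and column names (events, f1, f2, label) are illustrative assumptions, not the project's actual code.

    # Hypothetical PySpark sketch: DataFrames + Spark SQL + MLlib classification.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("poc").getOrCreate()

    df = spark.read.parquet("hdfs:///data/events")        # hypothetical HDFS path
    df.createOrReplaceTempView("events")
    train = spark.sql("SELECT f1, f2, label FROM events WHERE label IS NOT NULL")

    assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
    lr = LogisticRegression(featuresCol="features", labelCol="label")
    model = lr.fit(assembler.transform(train))
    model.transform(assembler.transform(train)).select("label", "prediction").show(5)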

Environment: AWS, R, Informatica, Python, HDFS, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, MapReduce, Rational Rose, SQL, and MongoDB.

Confidential, Philadelphia, PA

Data Scientist/Data Modeler

Responsibilities:

  • Worked on machine learning projects based on Python, SQL, Spark and SAS advanced programming; performed exploratory data analysis, data visualization and feature selection.
  • Applied machine learning algorithms, including random forest, boosted trees, SVM, SGD, neural networks and deep learning, using CNTK and TensorFlow.
  • Performed big data analytics with Hadoop, HiveQL, Spark RDDs and Spark SQL.
  • Tested Python/SAS on AWS cloud service and CNTK modeling on MS-Azure cloud service.
  • Created UI using JavaScript and HTML5/CSS.
  • Developed and tested many features for dashboard using Python, Bootstrap, CSS, and JavaScript.
  • Interacted with the ETL and BI teams to understand and support various ongoing projects.
  • Used MS Excel extensively for data validation.
  • Wrote DAX (Data Analysis Expressions) expressions to create customized calculations and hierarchies in Power BI Desktop.
  • Implemented several DAX functions for various fact calculations for efficient data visualization in Power BI.
  • Involved in creating a Data Lake by extracting the customer's big data from various data sources into Hadoop HDFS; this included data from Excel, flat files, Oracle, SQL Server, MongoDB, Cassandra, HBase, Teradata and Netezza, as well as log data from servers.
  • Used Spark DataFrames, Spark SQL and Spark MLlib extensively, and developed and designed POCs using Scala, Spark SQL and MLlib libraries.
  • Performed data cleaning and feature selection using the MLlib package in PySpark and worked with deep learning frameworks such as Caffe and Neon.
  • Used R and SQL to create statistical algorithms involving Multivariate Regression, Linear Regression, Logistic Regression, PCA, Random Forest models, Decision Trees and Support Vector Machines for estimating the risks of welfare dependency.
  • Tracked operations using sensors until certain criteria were met, using Apache Airflow (see the sketch after this list).
  • Responsible for different data mapping activities from source systems to Teradata using utilities like TPump, FEXP, BTEQ, MLOAD, FLOAD, etc.
  • Developed PL/SQL procedures and functions to automate billing operations, customer barring and number generation.
  • Generated ad-hoc SQL queries using joins, database connections and transformation rules to fetch data from legacy Oracle and SQL Server database systems.
  • Executed ad-hoc data analysis for customer insights with SQL on an Amazon AWS Hadoop cluster.
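
A minimal Airflow sketch of the sensor pattern mentioned above: a sensor task waits until a criterion (here, a file landing) is met before a downstream task runs. It assumes Airflow 2.x; the DAG id, file path and task ids are hypothetical, not the project's actual pipeline.

    # Hypothetical Airflow 2.x DAG: wait on a file sensor, then process.
    from datetime import datetime
    from airflow import DAG
    from airflow.sensors.filesystem import FileSensor
    from airflow.operators.python import PythonOperator

    def process_extract():
        print("Criterion met - processing the extract")   # placeholder for real logic

    with DAG(
        dag_id="sensor_tracking_example",                  # hypothetical DAG id
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        wait_for_extract = FileSensor(
            task_id="wait_for_extract",
            filepath="/data/incoming/extract.csv",         # hypothetical landing path
            poke_interval=300,                             # re-check every 5 minutes
        )
        process = PythonOperator(task_id="process_extract",
                                 python_callable=process_extract)
        wait_for_extract >> process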

Environment: Data Governance, SQL Server, Python, ETL, MS Office Suite - Excel (Pivot, VLOOKUP), DB2, R, Visio, HP ALM, Agile, Azure, Data Quality, AWS Redshift, ScalaNLP, Cassandra, Oracle, Power BI, MongoDB, Informatica MDM, Cognos, SQL Server 2012, Teradata, DB2, SPSS, T-SQL, PL/SQL.

Confidential, Laguna Hills, CA

Sr. Data Scientist

Responsibilities:

  • Set up storage and data analysis tools in the Amazon Web Services cloud computing infrastructure.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn and NLTK in Python to develop various machine learning algorithms.
  • Installed and used the Caffe Deep Learning Framework.
  • Worked with different data formats such as JSON and XML and applied machine learning algorithms in Python.
  • Worked with Data Architects and IT Architects to understand the movement and storage of data, using ER Studio 9.7.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation and visualization; also performed gap analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI and Smart View.
  • Implemented Agile Methodology for building an internal application.
  • Focused on integration overlap and Informatica's newer commitment to MDM following the acquisition of Identity Systems.
  • Good knowledge of Hadoop architecture and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
  • As Architect delivered various complex OLAP databases/cubes, scorecards, dashboards and reports.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, Pandas).
  • Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, KNN, Naïve Bayes.
  • Updated Python scripts to match data with our database stored in AWS CloudSearch, so that each document could be assigned a response label for further classification (a short text-classification sketch follows this list).
  • Performed data transformation from various resources, data organization, and feature extraction from raw and stored data.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
  • Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms for the clients
  • Identified and executed process improvements; hands-on in various technologies such as Oracle, Informatica and Business Objects.
  • Designed both 3NF data models for ODS and OLTP systems and dimensional data models using Star and Snowflake schemas.
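
As noted above, a short hedged sketch of assigning a response label to each document, using TF-IDF features and a Naive Bayes classifier in scikit-learn; the sample texts and labels are made-up placeholders, not project data.

    # Hypothetical document-labeling sketch: TF-IDF features + Naive Bayes classifier.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    docs = ["refund requested for order",        # hypothetical training documents
            "password reset not working",
            "invoice copy needed"]
    labels = ["billing", "account", "billing"]   # hypothetical response labels

    clf = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
    clf.fit(docs, labels)
    print(clf.predict(["need a copy of my invoice"]))   # likely ['billing'] given term overlap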

Environment: R 9.0, Informatica 9.0, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose.

Confidential, New York, NY

Data Analyst

Responsibilities:

  • Performed data analysis and reporting using MySQL, MS PowerPoint, MS Access and SQL Assistant.
  • Involved in MySQL, MS PowerPoint and MS Access database design, and designed a new database on Netezza to produce optimized outcomes.
  • Involved in writing T-SQL, working on SSIS, SSRS, SSAS, Data Cleansing, Data Scrubbing and Data Migration.
  • Involved in writing scripts for loading data to the target data warehouse using BTEQ, FastLoad and MultiLoad.
  • Created ETL scripts using regular expressions and custom tools (Informatica, Pentaho, and SyncSort) to ETL data.
  • Developed SQL Service Broker to flow and sync data from MS-I to Microsoft's master data management (MDM) system.
  • Involved in loading data between Netezza tables using NZSQL utility.
  • Worked on data modeling using dimensional data modeling, Star Schema/Snowflake Schema, fact and dimension tables, and physical and logical data modeling.
  • Generated Statspack/AWR reports from the Oracle database and analyzed the reports for Oracle wait events, time-consuming SQL queries, tablespace growth, and database growth.

Environment: MySQL, MS PowerPoint, MS Access, Netezza, DB2, T-SQL, DTS, SSIS, SSRS, SSAS, ETL, Oracle, Star Schema and Snowflake Schema.

Confidential, Weehawken, NJ

Engineer- Data Analytics

Responsibilities:

  • Communicated with other healthcare information systems using web services built with SOAP, WSDL and JAX-RPC.
  • Used the Singleton, Factory and DAO design patterns based on the application requirements.
  • Used SAX and DOM parsers to parse the raw XML documents.
  • Used RAD as Development IDE for web applications.
  • Prepared and executed unit test cases.
  • Used Log4J logging framework to write Log messages with various levels.
  • Involved in fixing bugs and minor enhancements for the front-end modules.
  • Deployed GUI pages using JSP, JSTL, HTML, DHTML, XHTML, CSS, JavaScript and AJAX.
  • Configured the project on WebSphere 6.1 application servers.
  • Implemented the online application using core JDBC, JSP, Servlets, EJB 1.1, Web Services, SOAP and WSDL.
  • Used Microsoft Visio and Rational Rose to design the use case diagrams, class model, sequence diagrams, and activity diagrams for the SDLC process of the application.
  • Performed maintenance in the testing team for system testing, integration testing and UAT.
  • Ensured quality in the deliverables.
  • Conducted Design reviews and Technical reviews with other project stakeholders.
  • Was part of the complete life cycle of the project, from requirements through production support.
  • Created test plan documents for all back-end database modules
  • Implemented the project in Linux environment.

Environment: R 3.0, Erwin 9.5, Tableau 8.0, MDM, QlikView, MLlib, PL/SQL, HDFS, Teradata 14.1, JSON, Hadoop, MapReduce, PIG, Spark, R Studio, Mahout, Hive, AWS.

Confidential

Data Analyst

Responsibilities:

  • Worked with internal architects, assisting in the development of current and target state data architectures.
  • Worked with project team representatives to ensure that logical and physical ER/Studio data models were developed in line with corporate standards and guidelines.
  • Involved in defining the business/transformation rules applied for sales and service data.
  • Implemented the Metadata Repository, transformations, data quality maintenance, data standards, the data governance program, scripts, stored procedures, triggers and the execution of test plans.
  • Defined the list codes and code conversions between the source systems and the data mart.
  • Involved in defining source-to-target business rules, data mappings and data definitions.
  • Responsible for defining the key identifiers for each mapping/interface.
  • Remain knowledgeable in all areas of business operations in order to identify systems needs and requirements.
  • Performed data quality checks in Talend Open Studio.
  • Updated the Enterprise Metadata Library with any changes or updates.
  • Documented data quality and traceability for each source interface.
  • Established standard operating procedures.
  • Coordinated meetings with vendors to define requirements and system interaction agreement documentation between client and vendor system.

Environment: Windows Enterprise Server 2000, SSRS, SSIS, Crystal Reports, DTS, SQL Profiler, and Query Analyzer.
