Data Scientist/Machine Learning Engineer Resume
Columbus, OH
PROFESSIONAL SUMMARY:
- Data scientist with 7+ years of experience in Machine Learning, Statistical Modeling, Predictive Analytics, Text Mining, Natural Language Processing, and AI algorithms.
- Involved in the entire data science project life cycle, from data extraction to machine learning model evaluation.
- Wrangled structured and unstructured data to help senior management make faster, better-informed decisions with the right data.
- Leveraged packages such as scikit-learn, Keras 2.0, TensorFlow, NLTK, and SciPy in Python to develop and evaluate machine learning, deep learning, and NLP models for problem solving.
- Hands-on experience with data manipulation packages such as NumPy, Pandas, and SQLAlchemy, and data visualization packages such as Matplotlib, Seaborn, and Bokeh.
- Experienced in SQL programming and in creating relational and non-relational databases.
- Worked on statistical models to develop new hypotheses and products; employed Spotfire and Tableau to create dashboards and visualizations.
- Experience with statistical and regression analysis and multi-objective optimization.
- Designed and implemented supervised algorithms such as Logistic Regression, Decision Trees, XGBoost, SVMs, and Polynomial Regression, and unsupervised machine learning algorithms such as K-Means clustering, mixture models, hierarchical clustering, and anomaly detection.
- Worked with clients to identify analytical needs and documented them for further use; identified and solved business problems using data processing, data visualization, and graphical data analysis.
- Solid knowledge of mathematics and experience applying it to technical and research fields, identifying areas where optimization can be effective.
- Applied statistical methodologies, including hacker-statistics techniques such as A/B testing, hypothesis testing, statistical inference, and parameter estimation, on historical data to support important decisions while solving business problems.
- In-depth knowledge of Big Data technologies such as Spark SQL, PySpark, Hive, Sqoop, Flume, and the Ambari Console.
- Developed predictive models in Python for customer churn prediction and customer classification (see the sketch after this list).
- Performed query optimization, execution-plan analysis, and performance tuning of SQL queries.
- Worked on a Shiny/R application showcasing machine learning for improved business forecasting.
- Hands-on experience with version control systems such as Git and GitHub.
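For illustration, a minimal scikit-learn sketch of the churn-classification work referenced above; the file name, column names, and model choice are hypothetical, not taken from a specific engagement.

```python
# Hypothetical churn-classification sketch with scikit-learn; the file name and
# column names are illustrative, not taken from a specific project.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Assume a CSV with numeric usage features and a binary "churn" label.
df = pd.read_csv("churn.csv")
X = df.drop(columns=["churn"])
y = df["churn"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scale features, then fit a regularized logistic regression classifier.
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

# Evaluate on the held-out split.
print(classification_report(y_test, model.predict(X_test)))
```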
TECHNICAL SKILLS:
Programming Languages: Python, Java, SAS Base, SAS Enterprise Miner, Bash Scripting, Regular Expressions and SQL (Oracle & SQL Server).
Python Libraries: Pandas, NumPy, SciPy, Scikit-Learn, NLTK, spaCy, Matplotlib, Seaborn, Beautiful Soup, Logging, PySpark, Keras and TensorFlow.
Machine Learning: Linear Regression, Logistic Regression, Multinomial Logistic Regression, Regularization (Lasso & Ridge), Decision Trees, Support Vector Machines, Ensembles - Random Forest, Gradient Boosting, Extreme Gradient Boosting (XGBoost), Deep Learning - Neural Networks, Deep Neural Networks (CNN, RNN & LSTM) with Keras and TensorFlow, Dimensionality Reduction - Principal Component Analysis (PCA), Weight of Evidence (WOE) and Information Value, Hierarchical & K-Means Clustering, K-Nearest Neighbors.
Data Visualization: Tableau, Google Analytics, Advanced Microsoft Excel and Power BI.
NLP: Text Pre-Processing, Information Retrieval, Classification, Topic Modeling, Text Clustering, Sentiment Analysis and Word2Vec.
Cloud Technologies: Google Cloud Platform Big Data & Machine Learning modules - Cloud Storage, Cloud Dataflow, Cloud ML, BigQuery, Cloud Dataproc, Cloud Datastore, Bigtable. Familiarity with AWS - EMR, EC2, S3.
Version Control: Git
Big Data Tools: Spark/PySpark, Hive, Impala, Hue, MapReduce, HDFS, Sqoop, Flume and Oozie
PROFESSIONAL EXPERIENCE:
Confidential, Columbus, OH
Data Scientist/Machine Learning Engineer
Responsibilities:
- Worked with several R packages including knitr, dplyr, SparkR, CausalInfer, Space-Time.
- Coded R functions to interface with the Caffe deep learning framework.
- Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python to develop various machine learning algorithms.
- Installed and used the Caffe deep learning framework.
- Worked with different data formats such as JSON and XML and applied machine learning algorithms in Python.
- Set up storage and data analysis tools in the Amazon Web Services (AWS) cloud computing infrastructure.
- Implemented end-to-end systems for data analytics and data automation, integrated with custom visualization tools, using R, Mahout, Hadoop, and MongoDB.
- Worked with Data Architects and IT Architects to understand the movement and storage of data, using ER Studio 9.7.
- Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction; used the resulting engine to increase user lifetime by 45% and triple user conversions for target categories.
- Used Spark DataFrames, Spark SQL, and Spark MLlib extensively, developing and designing POCs using Scala, Spark SQL, and the MLlib library (see the sketch after this list).
- Used Data Quality Validation techniques to validate Critical Data Elements (CDE) and identified various anomalies.
- Worked extensively with the Erwin Data Modeler tool to design data models.
- Developed various QlikView data models by extracting and using data from various source files, including DB2, Excel, flat files, and big data sources.
- Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
- Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and Smart View.
- Implemented Agile Methodology for building an internal application.
- Focused on integration overlap and Informatica's newer commitment to MDM following the acquisition of Identity Systems.
- Good knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
- As architect, delivered various complex OLAP databases/cubes, scorecards, dashboards, and reports.
- Programmed a utility in Python that used multiple packages (SciPy, NumPy, Pandas).
- Implemented classification using supervised algorithms such as Logistic Regression, Decision Trees, KNN, and Naive Bayes.
- Designed both 3NF data models for ODS and OLTP systems and dimensional data models using Star and Snowflake schemas.
- Updated Python scripts to match data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
- Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL.
- Designed and developed Use Case diagrams, Activity Diagrams, Sequence Diagrams, and OOD (Object-Oriented Design) using UML and Visio.
- Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
- Identified and executed process improvements, hands-on in various technologies such as Oracle, Informatica, and Business Objects.
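For illustration, a minimal PySpark sketch of the DataFrame / Spark SQL / MLlib workflow referenced in the POC bullet above; the JSON source, column names, and label are hypothetical.

```python
# Hypothetical PySpark sketch: load a JSON source, query it with Spark SQL,
# and fit an MLlib logistic regression; path and column names are illustrative.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("poc-classification").getOrCreate()

# Read the raw JSON and expose it to Spark SQL.
events = spark.read.json("events.json")
events.createOrReplaceTempView("events")
labeled = spark.sql(
    "SELECT amount, visits, label FROM events WHERE label IS NOT NULL"
)

# Assemble a feature vector and split into train/test DataFrames.
assembler = VectorAssembler(inputCols=["amount", "visits"], outputCol="features")
train, test = assembler.transform(labeled).randomSplit([0.8, 0.2], seed=42)

# Fit and evaluate the MLlib model.
model = LogisticRegression(featuresCol="features", labelCol="label").fit(train)
print("test AUC:", model.evaluate(test).areaUnderROC)
```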
Environment: AWS, R, Informatica, Python, HDFS, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, MapReduce, Rational Rose, SQL, and MongoDB.
Confidential, Philadelphia, PA
Data Scientist/Data Modeler
Responsibilities:
- Worked on machine learning projects based on Python, SQL, Spark, and advanced SAS programming; performed exploratory data analysis, data visualization, and feature selection.
- Applied machine learning algorithms including random forests, boosted trees, SVM, SGD, neural networks, and deep learning using CNTK and TensorFlow.
- Performed big data analytics with Hadoop, HiveQL, Spark RDDs, and Spark SQL.
- Tested Python/SAS on the AWS cloud service and CNTK modeling on the Microsoft Azure cloud service.
- Created UI using JavaScript and HTML5/CSS.
- Developed and tested many features for dashboard using Python, Bootstrap, CSS, and JavaScript.
- Interacted with the ETL and BI teams to understand and support various ongoing projects.
- Used MS Excel extensively for data validation.
- Wrote DAX (Data Analysis Expressions) expressions to create customized calculations and hierarchies in Power BI Desktop.
- Implemented several DAX functions for various fact calculations for efficient data visualization in Power BI.
- Involved in creating a Data Lake by extracting customers' big data from various data sources into Hadoop HDFS, including data from Excel, flat files, Oracle, SQL Server, MongoDB, Cassandra, HBase, Teradata, and Netezza, as well as log data from servers.
- Used Spark DataFrames, Spark SQL, and Spark MLlib extensively, developing and designing POCs using Scala, Spark SQL, and the MLlib library.
- Performed data cleaning and feature selection using the MLlib package in PySpark, and worked with deep learning frameworks such as Caffe and Neon.
- Used R and SQL to create statistical models involving multivariate regression, linear regression, logistic regression, PCA, random forests, decision trees, and support vector machines for estimating the risks of welfare dependency.
- Tracked operations using Airflow sensors until specified criteria were met (see the sketch after this list).
- Responsible for data mapping activities from source systems to Teradata using utilities such as TPump, FEXP, BTEQ, MLOAD, and FLOAD.
- Developed PL/SQL procedures and functions to automate billing operations, customer barring, and number generation.
- Generated ad-hoc SQL queries using joins, database connections and transformation rules to fetch data from legacy Oracle and SQL Server database systems
- Executed ad-hoc data analysis for customer insights using SQL on an Amazon AWS Hadoop cluster.
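For illustration, a minimal Airflow sketch of the sensor-gated tracking described above; the DAG ID, file path, and task names are hypothetical (Airflow 2.x imports).

```python
# Hypothetical Airflow sketch (Airflow 2.x imports): a sensor polls for a file
# until the criterion is met, then a downstream task runs. IDs and the path
# are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.sensors.filesystem import FileSensor

def process_file():
    # Placeholder for the downstream processing step.
    print("sensor criterion met; processing file")

with DAG(
    dag_id="sensor_gated_pipeline",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # Poke every 60 seconds until the expected file lands.
    wait_for_file = FileSensor(
        task_id="wait_for_file",
        filepath="/data/incoming/operations.csv",
        poke_interval=60,
    )
    process = PythonOperator(task_id="process", python_callable=process_file)
    wait_for_file >> process
```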
Environment: Data Governance, SQL Server, Python, ETL, MS Office Suite - Excel (Pivot, VLOOKUP), DB2, R, Visio, HP ALM, Agile, Azure, Data Quality, Excel, AWS Redshift, ScalaNLP, Cassandra, Oracle, Power BI, MongoDB, Informatica MDM, Cognos, SQL Server 2012, Teradata, SPSS, T-SQL, PL/SQL.
Confidential, Laguna Hills, CA
Sr. Data Scientist
Responsibilities:
- Setup storage and data analysis tools in Amazon Web Services cloud computing infrastructure.
- Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python to develop various machine learning algorithms.
- Installed and used the Caffe deep learning framework.
- Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
- Worked with Data Architects and IT Architects to understand the movement and storage of data, using ER Studio 9.7.
- Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
- Data Manipulation and Aggregation from a different source using Nexus, Toad, Business Objects, Power BI and Smart View.
- Implemented Agile Methodology for building an internal application.
- Focused on integration overlap and Informatica's newer commitment to MDM following the acquisition of Identity Systems.
- Good knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
- As Architect delivered various complex OLAP databases/cubes, scorecards, dashboards and reports.
- Programmed a utility in Python that used multiple packages (SciPy, NumPy, Pandas).
- Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, KNN, Naïve Bayes.
- Updated Python scripts to match data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
- Performed data transformation from various sources, data organization, and feature extraction from raw and stored data.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS (see the sketch after this list).
- Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
- Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms for the clients
- Identifying and executing process improvements, hands-on in various technologies such as Oracle, Informatica, Business Objects.
- Designed both 3NF data models for ODS and OLTP systems and dimensional data models using Star and Snowflake schemas.
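For illustration, a minimal PySpark/Hive sketch of the import-transform-load flow described above; the paths, view, and column names are hypothetical.

```python
# Hypothetical sketch of importing a source extract, transforming it with
# Hive/Spark SQL, and loading the result into HDFS; paths, view, and column
# names are illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("ingest-transform-load")
    .enableHiveSupport()
    .getOrCreate()
)

# Import: read a raw CSV extract from the landing zone.
raw = spark.read.csv("hdfs:///landing/orders.csv", header=True, inferSchema=True)
raw.createOrReplaceTempView("orders_raw")

# Transform: aggregate with a Hive/Spark SQL query.
daily = spark.sql("""
    SELECT order_date, SUM(amount) AS total_amount
    FROM orders_raw
    GROUP BY order_date
""")

# Load: write the transformed data back to HDFS as Parquet.
daily.write.mode("overwrite").parquet("hdfs:///warehouse/orders_daily")
```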
Environment: R 9.0, Informatica 9.0, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose.
Confidential, New York, NY
Data Analyst
Responsibilities:
- Performed data analysis and reporting using MySQL, MS PowerPoint, MS Access, and SQL Assistant.
- Involved in MySQL, MS PowerPoint, and MS Access database design, and designed a new database on Netezza with an optimized outcome.
- Involved in writing T-SQL and working on SSIS, SSRS, SSAS, data cleansing, data scrubbing, and data migration.
- Involved in writing scripts for loading data to the target data warehouse using BTEQ, FastLoad, and MultiLoad.
- Created ETL scripts using regular expressions and custom tools (Informatica, Pentaho, and SyncSort) to ETL data.
- Developed SQL Service Broker to flow and sync data from MS-I to Microsoft's master data management (MDM) system.
- Involved in loading data between Netezza tables using NZSQL utility.
- Worked on data modeling using dimensional data modeling, Star/Snowflake schemas, fact and dimension tables, and physical and logical data modeling (see the sketch after this list).
- Generated Statspack/AWR reports from the Oracle database and analyzed them for Oracle wait events, time-consuming SQL queries, tablespace growth, and database growth.
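For illustration, a minimal Python sketch of a star-schema report query of the kind described above; the driver, server, and fact/dimension names are hypothetical.

```python
# Hypothetical sketch of a star-schema report query run from Python; the DSN,
# fact/dimension tables, and columns are illustrative.
import pandas as pd
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=reporting-db;DATABASE=sales_dw;Trusted_Connection=yes;"
)

# Join the fact table to its date and product dimensions, as in a typical
# dimensional-model report.
query = """
    SELECT d.calendar_month, p.product_category, SUM(f.sales_amount) AS total_sales
    FROM fact_sales AS f
    JOIN dim_date AS d ON f.date_key = d.date_key
    JOIN dim_product AS p ON f.product_key = p.product_key
    GROUP BY d.calendar_month, p.product_category
"""
report = pd.read_sql(query, conn)
print(report.head())
```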
Environment: MySQL, MS PowerPoint, MS Access, Netezza, DB2, T-SQL, DTS, SSIS, SSRS, SSAS, ETL, Oracle, Star Schema and Snowflake Schema.
Confidential, Weehawken, NJ
Engineer- Data Analytics
Responsibilities:
- Communicated with other health care systems using Web Services with the help of SOAP, WSDL, and JAX-RPC.
- Used the Singleton, Factory, and DAO design patterns based on the application requirements.
- Used SAX and DOM parsers to parse the raw XML documents (see the parsing sketch after this list).
- Used RAD as the development IDE for web applications.
- Prepared and executed unit test cases.
- Used the Log4J logging framework to write log messages at various levels.
- Involved in fixing bugs and minor enhancements for the front-end modules.
- Deployed GUI pages using JSP, JSTL, HTML, DHTML, XHTML, CSS, JavaScript, and AJAX.
- Configured the project on WebSphere 6.1 application servers.
- Implemented the online application using core JDBC, JSP, Servlets, EJB 1.1, Web Services, SOAP, and WSDL.
- Used Microsoft Visio and Rational Rose to design the Use Case diagrams, class model, sequence diagrams, and activity diagrams for the SDLC process of the application.
- Handled maintenance in the testing team for system testing, integration testing, and UAT.
- Ensured quality in the deliverables.
- Conducted Design reviews and Technical reviews with other project stakeholders.
- Was a part of the complete life cycle of the project from the requirements to the production support
- Created test plan documents for all back-end database modules
- Implemented the project in Linux environment.
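The XML parsing described above was done with Java SAX and DOM parsers; for illustration only, here is a minimal Python analogue of the two approaches, with a hypothetical XML snippet and tag names.

```python
# The parsing above used Java SAX/DOM parsers; this is a minimal Python analogue
# for illustration only. The XML snippet and tag names are hypothetical.
import xml.dom.minidom
import xml.sax

RAW_XML = "<claims><claim id='1' status='open'/><claim id='2' status='closed'/></claims>"

# DOM approach: load the whole document into memory and navigate it.
dom = xml.dom.minidom.parseString(RAW_XML)
for claim in dom.getElementsByTagName("claim"):
    print("DOM:", claim.getAttribute("id"), claim.getAttribute("status"))

# SAX approach: stream the document and react to element events.
class ClaimHandler(xml.sax.ContentHandler):
    def startElement(self, name, attrs):
        if name == "claim":
            print("SAX:", attrs.getValue("id"), attrs.getValue("status"))

xml.sax.parseString(RAW_XML.encode("utf-8"), ClaimHandler())
```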
Environment: R 3.0, Erwin 9.5, Tableau 8.0, MDM, QlikView, MLlib, PL/SQL, HDFS, Teradata 14.1, JSON, HADOOP (HDFS), MapReduce, PIG, Spark, R Studio, MAHOUT, HIVE, AWS.
Confidential
Data Analyst
Responsibilities:
- Worked with internal architects, assisting in the development of current and target state data architectures.
- Worked with project team representatives to ensure that logical and physical ER/Studio data models were developed in line with corporate standards and guidelines.
- Involved in defining the business/transformation rules applied for sales and service data.
- Implemented the metadata repository, transformations, data quality maintenance, data standards, the data governance program, scripts, stored procedures, and triggers, and executed test plans.
- Defined the list codes and code conversions between the source systems and the data mart.
- Involved in defining the source-to-target data mappings, business rules, and data definitions.
- Responsible for defining the key identifiers for each mapping/interface.
- Remain knowledgeable in all areas of business operations in order to identify systems needs and requirements.
- Performed data quality checks in Talend Open Studio (see the sketch after this list).
- Updated the Enterprise Metadata Library with any changes or updates.
- Documented data quality and traceability for each source interface.
- Established standards of procedures.
- Coordinated meetings with vendors to define requirements and system interaction agreement documentation between client and vendor system.
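For illustration, a minimal pandas sketch of data-quality rules of the kind described above (the actual checks ran in Talend Open Studio); the file, key column, and code list are hypothetical.

```python
# Hypothetical data-quality rules of the kind described above (the actual checks
# ran in Talend Open Studio); the file, key column, and code list are illustrative.
import pandas as pd

source = pd.read_csv("service_accounts.csv")

# Key-identifier rule: the mapping key must be present and unique.
missing_keys = int(source["account_id"].isna().sum())
duplicate_keys = int(source["account_id"].duplicated().sum())

# Code-conversion rule: status codes must come from the agreed code list.
valid_status = {"A", "I", "P"}
invalid_status = int((~source["status_code"].isin(valid_status)).sum())

print(f"missing keys: {missing_keys}")
print(f"duplicate keys: {duplicate_keys}")
print(f"invalid status codes: {invalid_status}")
```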
Environment: Windows Enterprise Server 2000, SSRS, SSIS, Crystal Reports, DTS, SQL Profiler, and Query Analyzer.