
Data Scientist / Machine Learning Consultant Resume


Houston, TX

PROFESSIONAL SUMMARY:

  • Over 6 years of experience in Machine Learning and Data Mining with large datasets of structured and unstructured data, Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization.
  • Experience in coding SQL/PL SQL using Procedures, Triggers, and Packages.
  • Extensive experience in Text Analytics, developing different Statistical Machine Learning, Data Mining solutions to various business problems and generating data visualizations using R, Python.
  • Excellent Knowledge of Relational Database Design, Data Warehouse/OLAP concepts, and methodologies.
  • Data-driven and highly analytical, with working knowledge of statistical modeling approaches and methodologies (clustering, regression analysis, hypothesis testing, decision trees, machine learning), business rules, and the ever-evolving regulatory environment.
  • Professional working experience in Machine Learning algorithms such as Linear Regression, Logistic Regression, Naive Bayes, Decision Trees, K-Means Clustering and Association Rules.
  • Expertise in transforming business requirements into analytical models, designing algorithms, building models, developing data mining and reporting solutions that scale across a massive volume of structured and unstructured data.
  • Experience with data visualization using tools such as ggplot, Matplotlib, Seaborn, and Tableau, and using Tableau to publish and present dashboards and storylines on web and desktop platforms.
  • Experienced in Python data manipulation for loading and extraction, as well as with Python libraries such as NumPy, SciPy, and pandas for data analysis and numerical computations.
  • Well experienced in Normalization, De-Normalization and Standardization techniques for optimal performance in relational and dimensional database environments.
  • Experience in multiple software tools and languages to provide data-driven analytical solutions to decision makers or research teams.
  • Hands-on experience in Machine Learning algorithms such as Linear Regression, GLM, CART, SVM, KNN, LDA/QDA, Naive Bayes, Random Forest, Boosting, K-Means Clustering, Hierarchical Clustering, PCA, Feature Selection, Collaborative Filtering, Neural Networks, and NLP.
  • Familiar with predictive models using numeric and classification prediction algorithms like support vector machines and neural networks, and ensemble methods like bagging, boosting and random forest to improve the efficiency of the predictive model.
  • Worked on Text Mining and Sentiment Analysis for extracting unstructured data from various social media platforms like Facebook, Twitter, and Reddit.
  • Good knowledge of NoSQL databases like MongoDB and HBase.
  • Develop, maintain and teach new tools and methodologies related to data science and high-performance computing.
  • Extensive hands-on experience and high proficiency with structured, semi-structured, and unstructured data, using a broad range of data science programming languages and big data tools including R, Python, Spark, SQL, scikit-learn, and Hadoop MapReduce.
  • Technical proficiency in design and data modeling for online applications; solution lead for architecting Data Warehouse/Business Intelligence applications.
  • Cluster Analysis, Principal Component Analysis (PCA), Association Rules, Recommender Systems.
  • Strong experience in Software Development Life Cycle (SDLC) including Requirements Analysis, Design Specification and Testing as per Cycle in both Waterfall and Agile methodologies.
  • Adept in statistical programming languages like R and Python including Big Data technologies like Hadoop, Hive.
  • Hands on experience with RStudio for doing data pre-processing and building machine learning algorithms on different datasets.
  • Collaborated with the lead Data Architect to model the Data warehouse in accordance with FSLDM subject areas, 3NF format, and Snowflake schema.
  • Worked and extracted data from various database sources like Oracle, SQL Server, and DB2.
  • Implemented machine learning algorithms on large datasets to understand hidden patterns and capture insights.
  • Predictive Modelling Algorithms: Logistic Regression, Linear Regression, Decision Trees, K-Nearest Neighbors, Bootstrap Aggregation (Bagging), Naive Bayes Classifier, Random Forests, Boosting, Support Vector Machines.
  • Flexible with Unix/Linux and Windows environments, working with operating systems like CentOS 5/6, Ubuntu 13/14, and Cosmos.

TECHNICAL SKILLS:

Languages: Python, R

Machine Learning: Regression, Polynomial Regression, Random Forest, Logistic Regression, Decision Trees, Classification, Clustering, Association, Simple/Multiple Linear Regression, Kernel SVM, K-Nearest Neighbors (K-NN), NLP

OLAP/ BI / ETL Tool: Business Objects 6.1/XI, MS SQL Server 2008/2005 Analysis Services (MS OLAP, SSAS), Integration Services (SSIS), Reporting Services (SSRS), Performance Point Server (PPS), Oracle 9i OLAP, MS Office Web Components (OWC11), DTS, MDX, Crystal Reports 10, Crystal Enterprise 10(CMC)

Web Technologies: JDBC, HTML5, DHTML, XML, CSS3, Web Services, WSDL

Tools: Erwin r9.6/9.5/9.1/8.x, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner

Big Data Technologies: Spark, Hive, HDFS, MapReduce, Pig, Kafka

Databases: SQL, Hive, Impala, Pig, Spark SQL, SQL Server, MySQL, MS Access, HDFS, HBase, Teradata, Netezza, MongoDB, Cassandra, SAP HANA.

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio), Tableau, Crystal Reports XI, Business Intelligence, SSRS, Business Objects 5.x/6.x, Cognos 7.0/6.0.

Version Control Tools: SVN, GitHub.

Project Execution Methodologies: Ralph Kimball and Bill Inmon data warehousing methodology, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD).

BI Tools: Tableau, Tableau Server, Tableau Reader, SAP Business Objects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure Data Warehouse

Operating System: Windows, Linux, Unix, macOS, Red Hat.

PROFESSIONAL EXPERIENCE:

Confidential, HOUSTON, TX

Data Scientist /Machine Learning Consultant

Responsibilities:

  • Implemented Machine Learning, Computer Vision, Deep Learning, and Neural Network algorithms using TensorFlow, and designed prediction models using data mining techniques with the help of Python and libraries such as NumPy, SciPy, Matplotlib, pandas, and scikit-learn.
  • Used pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python for developing various machine learning algorithms.
  • Installed and used the Caffe Deep Learning Framework.
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
  • Worked with Data Architects and IT Architects to understand the movement and storage of data, using ER Studio 9.7.
  • Participated in all phases of data mining; data collection, data cleaning, developing models, validation, visualization and performed Gap analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and Smart View.
  • Determined customer satisfaction and helped enhance customer experience using NLP.
  • Used IBM Netcool Operations Insight, powered with AI and machine learning capabilities, to help reduce event noise and automatically group related events.
  • Implemented Agile methodology for building an internal application.
  • Good knowledge of Hadoop architecture and various components such as HDFS, Job Tracker, Task Tracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, pandas).
  • Implemented classification using supervised algorithms such as Logistic Regression, Decision Trees, KNN, and Naive Bayes (see the sketch after this list).
  • Responsible for design and development of advanced R/Python programs to prepare, transform, and harmonize data sets in preparation for modeling.
  • Updated Python scripts to match training data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
  • Performed data transformation from various sources, data organization, and feature extraction from raw and stored data.
  • Handled importing data from various data sources, performed transformations using Hive, Map Reduce, and loaded data into HDFS.
  • Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
  • Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms for the clients.
  • Identifying and executing process improvements, hands-on in various technologies such as Oracle, Informatica, and Business Objects.
  • Designed both 3NF data models for ODS, OLTP systems and dimensional data models using Star and Snowflake Schemas.
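Below is a minimal, illustrative sketch (not the original project code) of the kind of supervised classification comparison described in the list above, using scikit-learn; the file name, column names, and model settings are hypothetical placeholders.

```python
# Hedged sketch: compare the supervised classifiers named above with 5-fold CV.
# Assumes a prepared, numeric feature table with a binary "label" column.
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

df = pd.read_csv("training_data.csv")               # placeholder file name
X, y = df.drop(columns=["label"]), df["label"]      # placeholder label column

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "decision_tree": DecisionTreeClassifier(max_depth=5),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "naive_bayes": GaussianNB(),
}

for name, model in models.items():
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")
```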

Environment: Erwin r9.6, Python, SQL, Oracle 12c, Netezza, SQL Server, SSRS, PL/SQL, T-SQL, Tableau, MLlib, regression, cluster analysis, Scala, NLP, Spark, Kafka, MongoDB, logistic regression, Hadoop, PySpark, Teradata, random forest, OLAP, Azure, MariaDB, SAP CRM, HDFS, ODS, NLTK, SVM, JSON, XML, Cassandra, MapReduce, AWS.

Confidential, MALVERN, PA

Data Scientist /Machine Learning Consultant

Responsibilities:

  • Utilized Spark, Scala, Hadoop, HQL, VQL, Oozie, PySpark, Data Lake, TensorFlow, HBase, Cassandra, Redshift, MongoDB, Kafka, Kinesis, Spark Streaming, Edward, CUDA, MLlib, AWS, and Python, along with a broad variety of machine learning methods including classification, regression, and dimensionality reduction.
  • Utilized the engine to increase user lifetime by 45% and triple user conversations for target categories.
  • Applied various machine learning algorithms and statistical modeling techniques such as decision trees, text analytics, natural language processing (NLP), supervised and unsupervised learning, regression models, social network analysis, neural networks, deep learning, SVM, and clustering to identify volume, using the scikit-learn package in Python and MATLAB.
  • Used version control tools like Git 2.x and build tools like Apache Maven/Ant.
  • Worked on analyzing data from Google Analytics, AdWords, Facebook, etc.
  • Evaluated models using cross-validation, the log loss function, and ROC curves, and used AUC for feature selection (see the first sketch after this list); worked with Elastic technologies like Elasticsearch and Kibana.
  • Performed Data Profiling to learn about behavior with various features such as traffic pattern, location, Date and Time etc.
  • Categorized comments into positive and negative clusters from different social networking sites using Sentiment Analysis and Text Analytics.
  • Configured AWS built-in SNS/SQS for job scheduling.
  • Used Python scripts to update content in the database and manipulate files
  • Skilled in using dplyr (R) and pandas (Python) for performing exploratory data analysis.
  • Performed Multinomial Logistic Regression, Decision Tree, Random Forest, and SVM to classify whether a package would be delivered on time for a new route.
  • Used Jenkins for Continuous Integration Builds and deployments (CI/CD).
  • Performed data analysis by using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database, and used ETL for data transformation.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
  • Explored DAGs, their dependencies, and logs using Airflow pipelines for automation.
  • Performed data cleaning and feature selection using the MLlib package in PySpark, and worked with deep learning frameworks such as Caffe and Neon.
  • Developed Spark/Scala and R/Python code for a regular expression (regex) project in the Hadoop/Hive environment on Linux/Windows for big data resources.
  • Used the K-Means clustering technique to identify outliers and to classify unlabeled data (see the second sketch after this list).
  • Tracked operations using sensors until certain criteria were met, using Airflow.
  • Responsible for different Data mapping activities from Source systems to Teradata using utilities like TPump, FEXP, BTEQ, MLOAD, FLOAD etc.
  • Implemented CI/CD pipelines for Java applications.
  • Implemented CI/CD on the Azure cloud platform.
  • Analyzed traffic patterns by calculating autocorrelation with different time lags.
  • Ensured that the model had a low false positive rate; performed text classification and sentiment analysis on unstructured and semi-structured data.
  • Utilized a broad variety of statistical packages and tools such as SAS, R, MLlib, Graphs, Hadoop, Spark, MapReduce, Pig, and others.
  • Addressed overfitting by implementing regularization methods such as L1 and L2.
  • Used Principal Component Analysis in feature engineering to analyze high dimensional data.
  • Used MLlib, Spark's Machine learning library to build and evaluate different models.
  • Implemented a rule-based expert system from the results of exploratory analysis and information gathered from people in different departments.
  • Created and designed reports that used gathered metrics to infer and draw logical conclusions about past and future behavior.
  • Developed a MapReduce pipeline for feature extraction using Hive and Pig.
  • Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data. Created various types of data visualizations using Python and Tableau.
  • Communicated results to the operations team to support decision-making.
  • Collected data needs and requirements by interacting with other departments.
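The two sketches below are minimal, illustrative examples (on synthetic stand-in data, not the project's data) of techniques referenced in the list above. The first shows evaluation of a binary classifier with cross-validation, log loss, and ROC/AUC:

```python
# Hedged sketch: out-of-fold probabilities so every metric is computed on held-out data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.metrics import log_loss, roc_auc_score, roc_curve

# Synthetic stand-in for the project data (imbalanced binary target)
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.8, 0.2], random_state=0)

clf = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Held-out predicted probabilities for every sample
proba = cross_val_predict(clf, X, y, cv=cv, method="predict_proba")[:, 1]

print("log loss:", log_loss(y, proba))
print("ROC AUC :", roc_auc_score(y, proba))
fpr, tpr, thresholds = roc_curve(y, proba)   # points for plotting the ROC curve
```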
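The second sketch shows one way K-Means can be used to flag outliers: points far from their assigned cluster centroid are treated as anomalous (the cluster count and distance threshold are assumptions, not values from the project):

```python
# Hedged sketch: K-Means based outlier flagging on synthetic, scaled features.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                 # synthetic stand-in for the real features

X_scaled = StandardScaler().fit_transform(X)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X_scaled)

# Distance of each point to its own cluster centroid
dist = np.linalg.norm(X_scaled - kmeans.cluster_centers_[kmeans.labels_], axis=1)

# Flag the most distant 1% of points as potential outliers (tunable assumption)
threshold = np.quantile(dist, 0.99)
outliers = np.where(dist > threshold)[0]
print(f"flagged {len(outliers)} potential outliers")
```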

Environment: R, Python, HDFS, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, NLP, Metadata, AWS, MS Excel, Mainframes, MS Visio, MapReduce, Rational Rose, SQL, and MongoDB.

Confidential, Ashburn, VA

Data Analyst/Data Scientist

Responsibilities:

  • Worked closely with data mapping SME and QA team to understand the business rules for acceptable data quality standards.
  • Created tables and other database objects such as views, procedures, triggers, and functions in AWS Redshift using SQL Workbench.
  • Created and provided support on various monitoring and control reports which includes, Customer verification report that accepted offer in sales engine, AMF waiver report, Credit fulfillment report, qualification and offer load volume reconciliation report and upgrade performance monitoring report.
  • Wrote complex SQL queries to identify granularity issues and relationships between data sets and created recommended solutions based on analysis of the query results.
  • Worked with several R packages including knitr, dplyr, SparkR, CausalInfer, Space-Time.
  • Built an NLP (LDA) topic model to analyze users' comments, which helped improve the mobile bank app design (see the sketch after this list).
  • Wrote the SQL queries on data staging tables and data warehouse tables to validate the data results.
  • Performed data profiling on datasets with millions of rows on Teradata environment, validating key gen elements, ensuring correctness of codes and identifiers, and recommending mapping changes.
  • Performed unit testing on transformation rules to ensure data moved correctly.
  • Developed scalable machine learning solutions within a distributed computation framework (e.g. Hadoop, Spark, Storm etc.).
  • Created Python scripts to take client content documents and images as input and create web pages, including home page, table of contents and links.
  • Developed models in Scala and Spark for user prediction models and sequential algorithms.
  • Worked on the GM Dealer Survey report. The project required a survey of GM customers who had applied for a credit card through the dealer channel, and also captured the experience of GM Card customers offered the product through the dealer channel, to ascertain program compliance and effectiveness.
  • Involved in the full life cycle of the Business Objects reporting application.
  • Worked directly with Cloud System Administrators and project managers supporting Amazon Web Services (AWS) migration.
  • Delivered Enterprise Data Governance, Data Quality, Metadata, and ETL Informatica solution.
  • Maintained Excel workbooks, such as development of pivot tables, exporting data from external SQL databases, producing reports and updating spreadsheet information.
  • Used Python's pandas library in the process of analyzing the data.
  • Worked on GM Year End Summary Statement (YESS) project.
  • Involved in maintaining and modifying monthly and weekly processes for GM Whirl/Chargeback, Capital One Balance Transfer report, GM Dealer fulfillment, GM ACXIOM, GM Bank Account Management History (BAMH), GM BT campaigns and DB Load GM/UP/IBT.
  • Involved in extracting and transforming of Data from Enterprise data warehousing.
  • Actively involved in communication with Business Analyst, user acceptance testers and BPM for requirements gathering.
  • Engaged in performing various ad-hoc queries.
  • Responsible for creating monthly and weekly MIS reports.
  • Implemented various data transformations such as SQL extract, split, and data validation.
  • Analyzing and supporting day to day marketing strategy solutions.
  • Researched and fixed data issues pointed out by QA team during regression tests.
  • Interfaced with business users to verify business rules and communicated changes to ETL development team.
  • Created Tableau views with complex calculations and hierarchies making it possible to analyze and obtain insights into large data sets.
  • Created and executed SQL queries to perform Data Integrity testing on a Teradata Database to validate and test data using TOAD.
  • Worked with data architects team to make appropriate changes to the data models.
  • Worked on the ETL Informatica mappings and other ETL Processes (Data Warehouse).
  • Worked with the data governance team to ensure the data quality of compliance reports for EDI transactions.
  • Utilized Tableau server to publish and share the reports with the business users.
  • Experienced in designing complex Drill-Down & Drill-Through reports using Business Objects.
  • Experienced in creating UNIX scripts for file transfer and file manipulation.
  • Generated ad-hoc or management specific reports using Tableau and Excel.
  • Analyzed the subscriber, provider, members and claim data to continuously scan and create authoritative master data.
  • Prepared the data rules spreadsheet using MS Excel, which was used to update allowed values, findings, and profiling results.
  • Performed auditing in the development phase in order to assure data quality and integrity.
  • Validated and tested SAS code for new and existing reports.
  • Provided insight and ideas in support of customer management processes and reporting.
  • Involved in data cleansing mechanism in order to eliminate duplicate and inaccurate data.
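Below is a minimal, illustrative sketch of the LDA topic-modeling approach mentioned in the list above for analyzing users' comments; the comments, vectorizer settings, and topic count are placeholder assumptions, not the project's actual data or parameters.

```python
# Hedged sketch: LDA topic model over a tiny placeholder corpus of app comments.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

comments = [
    "login screen keeps freezing after the update",
    "love the new transfer feature, very fast",
    "cannot find the statement download button",
]  # placeholder comments standing in for the real mobile-app feedback

vectorizer = CountVectorizer(stop_words="english", max_df=0.95, min_df=1)
dtm = vectorizer.fit_transform(comments)          # document-term matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(dtm)

# Print the top words per topic to interpret what users are talking about
terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"topic {k}: {', '.join(top)}")
```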

Environment: Windows 7, Linux, Tableau desktop, Tableau Server, NLP, Business Objects, AWS, R, SQL Developer, MySQL, MS-Access, MS Excel and SQL.
