
Data Scientist/ Machine Learning Engineer/ Python Developer Resume


Houston, Texas

SUMMARY:

  • Over 6 years of experience in Machine Learning and Data Mining with large datasets of structured and unstructured data, including Data Acquisition, Data Validation, Predictive Modelling, and Data Visualization.
  • Experienced in designing the Physical Data Architecture of new system engines.
  • Extensive experience in Text Analytics, developing Statistical and Machine Learning / Data Mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau.
  • Good experience in NLP with Apache Hadoop and Python.
  • Hands-on experience with Spark MLlib utilities, including Classification, Regression, Clustering, Collaborative Filtering, and Dimensionality Reduction.
  • Proficient in Statistical Modelling and Machine Learning techniques (Linear, Logistic, Decision Trees, Random Forest, SVM, K-Nearest Neighbours, Bayesian, XGBoost) for Forecasting / Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis testing, Factor analysis / PCA, and Ensembles.
  • Hands-on experience in implementing LDA and Naïve Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis, with good knowledge of Recommender Systems.
  • Experienced in developing Logical Data Architecture with adherence to Enterprise Architecture.
  • Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Adept in statistical programming languages like R and Python, as well as Big Data technologies like Hadoop and Hive.
  • Strong experience in the Software Development Life Cycle (SDLC), including Requirements Analysis, Design Specification, and Testing, in both Waterfall and Agile methodologies.
  • Experience working with data modelling tools like Erwin, PowerDesigner, and ER/Studio.
  • Skilled in using dplyr (R) and pandas (Python) for exploratory data analysis.
  • Experience in designing stunning visualizations using Tableau and in publishing and presenting dashboards and storylines on web and desktop platforms.
  • Experience in designing star schema and snowflake schema for Data Warehouse and ODS architectures.
  • Experience in designing and developing Tableau dashboards, updating existing desktop workbooks, developing ad-hoc reports, scheduling processes, and administering Tableau Server activities.
  • Experienced in designing customized interactive dashboards in Tableau using Marks, Actions, Filters, Parameters, and Calculations.
  • Good understanding of Teradata SQL Assistant, Teradata Administrator, and data load/export utilities like BTEQ, FastLoad, MultiLoad, and FastExport.
  • Experience and technical proficiency in designing and data modelling online applications; Solution Lead for architecting Data Warehouse / Business Intelligence applications.
  • Experience in maintaining database architecture and metadata that support the Enterprise Data Warehouse.
  • Experience in prediction of numerical values using Regression or CART.
  • Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, Pivot Tables, and OLAP reporting.
  • Highly skilled in using visualization tools like Tableau, ggplot2, and d3.js for creating dashboards.
  • Highly skilled in using Hadoop (Pig and Hive) for basic analysis and extraction of data in the infrastructure to provide data summarization.

TECHNICAL SKILLS:

Machine Learning: Regression, Simple/Multiple Linear Regression, Polynomial Regression, Logistic Regression, Random Forest, Decision Trees, Classification, Clustering, Association, Kernel SVM, K-Nearest Neighbours (K-NN).

Data Modeling: Erwin, ER/Studio, Star-Schema Modeling, Snowflake-Schema Modeling, FACT and dimension tables, Pivot Tables.

Big Data Technologies: Hadoop, Hive, HDFS, Map Reduce, Pig, Kafka.

OLAP/BI/ETL Tools: Business Objects 6.1/XI, MS SQL Server 2008/2005 Analysis Services (MS OLAP, SSAS), Integration Services (SSIS), Reporting Services (SSRS), Performance Point Server (PPS), Oracle 9i OLAP, MS Office Web Components (OWC11), DTS, MDX, Crystal Reports 10, Crystal Enterprise 10 (CMC).

BI Tools: Tableau, Tableau Server, Tableau Reader, SAP BusinessObjects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure Data Warehouse.

Packages: ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, twitteR, NLP, reshape2, rjson, plyr, pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, Beautiful Soup, rpy2, SQLAlchemy.

Web Technologies: JDBC, HTML5, DHTML and XML, CSS3, Web Services, WSDL

Tools: Erwin r 9.6/9.5/9.1/8.x, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner.

Languages: SQL, PL/SQL, T-SQL, ASP, Visual Basic, XML, SAS, Python, C, C++, Java, HTML, Shell Scripting, Perl, R, MATLAB, Scala.

Databases: Hive, Impala, Pig, Spark SQL, SQL Server, MySQL, MS Access, HDFS, HBase, Teradata, Netezza, MongoDB, Cassandra, SAP HANA.

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio), Tableau, Crystal Reports XI, Business Intelligence, SSRS, Business Objects 5.x/6.x, Cognos 7.0/6.0.

ETL Tools: Informatica PowerCenter, SSIS.

Version Control Tools: SVN, GitHub.

Methodologies: Ralph Kimball and Bill Inmon data warehousing methodology, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD).

Operating Systems: Windows, Linux, Unix, macOS, Red Hat.

PROFESSIONAL EXPERIENCE:

Confidential, Houston, Texas

Data Scientist/ Machine Learning Engineer/ Python Developer

Responsibilities:

  • Built models using Statistical techniques like Bayesian HMM and Machine Learning classification models like XGBoost, SVM, and Random Forest (a rough HMM sketch follows the Environment line for this project).
  • Completed a highly immersive Data Science program involving Data Manipulation & Visualization, Web Scraping, Machine Learning, Python programming, SQL, GIT, Unix commands, MySQL, MongoDB, and Hadoop.
  • Designed and developed Tableau reports, documents, and dashboards for specified requirements and timelines.
  • Set up storage and data analysis tools in the Amazon Web Services cloud computing infrastructure.
  • Used pandas, numpy, seaborn, scipy, matplotlib, scikit-learn, and NLTK in Python for developing various machine learning algorithms.
  • Installed and used the Caffe deep learning framework.
  • Worked with different data formats such as JSON and XML and performed machine learning algorithms in Python.
  • Worked with Data Architects and IT Architects to understand the movement and storage of data, using ER Studio 9.7.
  • Purchased, set up, and configured a Tableau Server and an MS SQL Server 2008 R2 server for data warehouse purposes.
  • Prepared dashboards using calculations and parameters in Tableau.
  • Designed, developed, and implemented Tableau Business Intelligence reports.
  • Participated in all phases of data mining: data collection, data cleaning, developing models, validation, and visualization, and performed gap analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and Smart View.
  • Implemented Agile Methodology for building an internal application.
  • Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Secondary Name Node, and Map Reduce concepts.
  • As an architect, delivered various complex OLAP databases/cubes, scorecards, dashboards, and reports.
  • Programmed a utility in Python that used multiple packages (scipy, numpy, pandas).
  • Implemented classification using supervised algorithms like Logistic Regression, Decision Trees, KNN, and Naive Bayes.
  • Used Teradata 15 utilities such as FastExport and MultiLoad (MLOAD) for data migration/ETL tasks from OLTP source systems to OLAP target systems.
  • Experience in Hadoop ecosystem components like Hadoop Map Reduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, Flume including their installation and configuration.
  • Updated Python scripts to match training data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
  • Performed data transformation from various sources, data organization, and feature extraction from raw and stored data.
  • Validated the machine learning classifiers using ROC curves and Lift charts (a brief validation sketch follows this list).
  • Extracted data from HDFS and prepared the data for exploratory analysis using data munging.
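
Below is a minimal, hypothetical sketch of the classifier training and ROC validation workflow referenced in the bullets above; the CSV path, the "label" column, and the model settings are illustrative assumptions, not artifacts of the actual project.

```python
# Hypothetical sketch: train two classifiers and validate them with ROC/AUC.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve

df = pd.read_csv("training_extract.csv")              # placeholder data pull
X, y = df.drop(columns=["label"]), df["label"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]        # positive-class probabilities
    fpr, tpr, _ = roc_curve(y_test, scores)           # points for the ROC curve
    print(f"{name}: AUC = {roc_auc_score(y_test, scores):.3f}")
```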

Environment: ER Studio 9.7, Tableau 9.03, AWS, Teradata 15, MDM, GIT, Unix, Python 3.5.2, Machine Learning, MLlib, SAS, regression, logistic regression, Hadoop, NoSQL, Teradata, OLTP, random forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML, MapReduce.
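
The first bullet of this project mentions a Bayesian HMM. As a rough, non-Bayesian stand-in, the sketch below fits a plain Gaussian HMM with the hmmlearn package on synthetic data, just to show the shape of such a workflow; it is not the project's actual model.

```python
# Non-Bayesian stand-in: Gaussian HMM fitted by EM on synthetic observations.
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))                  # placeholder observation matrix

model = hmm.GaussianHMM(n_components=3, covariance_type="diag",
                        n_iter=200, random_state=7)
model.fit(X)                                   # EM training, not Bayesian inference

states = model.predict(X)                      # most likely hidden-state sequence
print("log-likelihood:", model.score(X))
print("state occupancy:", np.bincount(states))
```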

Confidential, Birmingham, Alabama.

Data Scientist/ Machine Learning Engineer/ Python Developer

Responsibilities:

  • Extracted data from HDFS and prepared the data for exploratory analysis using data munging (a brief munging sketch follows this list).
  • Built models using Statistical techniques like Bayesian HMM and Machine Learning classification models like XGBoost, SVM, and Random Forest.
  • Participated in all phases of data mining: data cleaning, data collection, developing models, validation, and visualization, and performed gap analysis.
  • Completed a highly immersive Data Science program involving Data Manipulation & Visualization, Web Scraping, Machine Learning, Python programming, SQL, GIT, MongoDB, and Hadoop.
  • Set up storage and data analysis tools in the AWS cloud computing infrastructure.
  • Installed and used the Caffe deep learning framework.
  • Worked with different data formats such as JSON and XML and performed machine learning algorithms in Python.
  • Worked with Data Architects and IT Architects to understand the movement and storage of data, using ER Studio 9.7.
  • Used pandas, numpy, seaborn, matplotlib, scikit-learn, scipy, and NLTK in Python for developing various machine learning algorithms.
  • Performed data manipulation and aggregation from different sources using Nexus, Business Objects, Toad, Power BI, and Smart View.
  • Implemented Agile Methodology for building an internal application.
  • Focused on integration overlap and Informatica's newer commitment to MDM following the acquisition of Identity Systems.
  • Coded proprietary packages to analyse and visualize SPC file data, identifying bad spectra and samples to reduce unnecessary procedures and costs.
  • Programmed a utility in Python that used multiple packages (numpy, scipy, pandas).
  • Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, Naive Bayes, KNN.
  • As Architect delivered various complex OLAP databases/cubes, scorecards, dashboards and reports .
  • Updated Python scripts to match training data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
  • Used Teradata utilities such as FastExport and MultiLoad (MLOAD) for data migration/ETL tasks from OLTP source systems to OLAP target systems.
  • Performed data transformation from various sources, data organization, and feature extraction from raw and stored data.
  • Validated the machine learning classifiers using ROC Curves and Lift Charts.
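
As referenced in the first bullet above, here is a small hypothetical example of the pandas data-munging step; the file name, column names, and imputation choices are placeholders, not project details.

```python
# Hypothetical data-munging step with pandas before exploratory analysis.
import pandas as pd

raw = pd.read_csv("hdfs_export.csv")                      # placeholder extract

clean = (
    raw.drop_duplicates()
       .rename(columns=str.lower)
       .assign(event_date=lambda d: pd.to_datetime(d["event_date"], errors="coerce"))
)

# Fill numeric gaps with the median, then drop rows missing the key identifier.
numeric_cols = clean.select_dtypes("number").columns
clean[numeric_cols] = clean[numeric_cols].fillna(clean[numeric_cols].median())
clean = clean.dropna(subset=["customer_id"])

print(clean.describe(include="all").T.head(15))           # quick exploratory summary
```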

Environment: SQL Server, Oracle 10g/11g, MS Office, Teradata, Informatica, ER Studio, XML, R connector, Python, R, Tableau 9.2.

Confidential, Columbus, Ohio.

Data Scientist/ Python Developer

Responsibilities:

  • Utilized Spark, Scala, Hadoop, HQL, VQL, Oozie, PySpark, Data Lake, TensorFlow, HBase, Cassandra, Redshift, MongoDB, Kafka, Kinesis, Spark Streaming, Edward, CUDA, MLlib, AWS, and Python, along with a broad variety of machine learning methods including classification, regression, and dimensionality reduction.
  • Utilized the engine to increase user lifetime by 45% and triple user conversions for target categories.
  • Applied various machine learning algorithms and statistical modelling techniques such as decision trees, text analytics, natural language processing (NLP), supervised and unsupervised learning, regression models, social network analysis, neural networks, deep learning, SVM, and clustering to identify volume, using the scikit-learn package in Python and MATLAB.
  • Worked on analysing data from Google Analytics, AdWords, Facebook, etc.
  • Evaluated models using Cross Validation, Log Loss, and ROC curves, used AUC for feature selection, and worked with Elastic technologies like Elasticsearch and Kibana.
  • Performed Data Profiling to learn about user behaviour across various features such as traffic pattern, location, and date and time.
  • Categorized comments from different social networking sites into positive and negative clusters using Sentiment Analysis and Text Analytics.
  • Performed Multinomial Logistic Regression, Decision Tree, Random Forest, and SVM to classify whether a package is going to be delivered on time for the new route.
  • Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database, and used ETL for data transformation.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and numpy packages in Python.
  • Explored DAGs, their dependencies, and logs using Airflow pipelines for automation.
  • Performed data cleaning and feature selection using the MLlib package in PySpark and worked with deep learning frameworks such as Caffe and Neon.
  • Developed Spark/Scala, R, and Python code for a regular expression (regex) project in the Hadoop/Hive environment on Linux/Windows for big data resources.
  • Used the K-Means clustering technique to identify outliers and to classify unlabelled data (a brief outlier-detection sketch follows this list).
  • Tracked operations using sensors until certain criteria were met, using Airflow.
  • Responsible for different data mapping activities from source systems to Teradata using utilities like TPump, FastExport (FEXP), BTEQ, MLOAD, FLOAD, etc.
  • Analysed traffic patterns by calculating autocorrelation with different time lags.
  • Ensured that the model had a low false positive rate, and performed text classification and sentiment analysis on unstructured and semi-structured data.
  • Addressed overfitting by implementing algorithm regularization methods like L1 and L2.
  • Used Principal Component Analysis in feature engineering to analyse high-dimensional data.
  • Used MLlib, Spark's machine learning library, to build and evaluate different models (a PySpark sketch follows the Environment line for this project).
  • Implemented a rule-based expert system from the results of exploratory analysis and information gathered from people in different departments.
  • Created and designed reports that will use gathered metrics to infer and draw logical conclusions of past and future behaviour.
  • Developed Map Reduce pipeline for feature extraction using Hive and Pig .
  • Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data. Created various types of data visualizations using Python and Tableau .
  • Communicated the results to the operations team to help make the best decisions.
  • Collected data needs and requirements by Interacting with the other departments.
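
A minimal, hypothetical sketch of the K-Means outlier idea referenced above: cluster the scaled points, then flag the ones farthest from their assigned centroid. The synthetic data, cluster count, and 99th-percentile cutoff are illustrative assumptions.

```python
# K-Means outlier flagging: distance to own centroid, top 1% flagged.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))                       # placeholder feature matrix

X_scaled = StandardScaler().fit_transform(X)
km = KMeans(n_clusters=5, n_init=10, random_state=1).fit(X_scaled)

# Distance from each point to the centroid of its own cluster.
dist = np.linalg.norm(X_scaled - km.cluster_centers_[km.labels_], axis=1)
cutoff = np.percentile(dist, 99)                     # top 1% treated as outliers
outliers = np.where(dist > cutoff)[0]
print(f"{len(outliers)} candidate outliers out of {len(X_scaled)} points")
```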

Environment: Python 2.x, CDH5, HDFS, Hadoop 2.3, Hive, Impala, AWS, Linux, Spark, Tableau Desktop, SQL Server 2014, Microsoft Excel, MATLAB, Spark SQL, PySpark.
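
As referenced above, a condensed sketch of building and evaluating a model with Spark's ML DataFrame API (commonly grouped under MLlib); the Hive table name, feature columns, and evaluator choice are placeholders rather than project specifics.

```python
# Build and evaluate a classifier with Spark's ML DataFrame API.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("delivery-model-sketch").getOrCreate()
df = spark.table("routes_features")                       # placeholder Hive table

assembler = VectorAssembler(
    inputCols=["distance_km", "stops", "traffic_index"],  # assumed numeric features
    outputCol="features",
)
data = assembler.transform(df).select("features", "label")
train, test = data.randomSplit([0.8, 0.2], seed=42)

model = LogisticRegression(featuresCol="features", labelCol="label").fit(train)
predictions = model.transform(test)

auc = BinaryClassificationEvaluator(metricName="areaUnderROC").evaluate(predictions)
print("Test AUC:", auc)
```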

Confidential, Phoenix, Arizona.

Data Analyst/Data Architect

Responsibilities:

  • Worked with the BI team in gathering report requirements and used Sqoop to export data into HDFS and Hive.
  • Involved in the following phases of analytics using R, Python, and Jupyter Notebook: (a) data collection and treatment: analysed existing internal and external data, worked on entry errors and classification errors, and defined criteria for missing values; (b) data mining: used cluster analysis to identify customer segments, decision trees for profitable vs. non-profitable customers, and Market Basket Analysis for customer purchasing behaviour and part/product association.
  • Developed multiple MapReduce jobs in Java for data cleaning and pre-processing (a comparable Hadoop Streaming sketch in Python follows this list).
  • Assisted with data capacity planning and node forecasting.
  • Installed, Configured and managed Flume Infrastructure.
  • Administered Pig, Hive, and HBase, installing updates, patches, and upgrades.
  • Worked closely with the claims processing team to obtain patterns in filing of fraudulent claims.
  • Worked on performing a major upgrade of the cluster from CDH3u6 to CDH4.4.0.
  • Developed Map Reduce programs to extract and transform the data sets and results were exported back to RDBMS using Sqoop .
  • Patterns were observed in fraudulent claims using text mining in R and Hive .
  • Exported the required information to an RDBMS using Sqoop to make the data available to the claims processing team to assist in processing claims based on the data.
  • Developed Map Reduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW .
  • Created tables in Hive and loaded the structured data (resulting from MapReduce jobs).
  • Developed many queries using HiveQL and extracted the required information.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Was responsible for importing the data (mostly log files) from various sources into HDFS using Flume
  • Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and PIG to pre-process the data.
  • Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
  • Managed and reviewed Hadoop log files.
  • Tested raw data and executed performance scripts.
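
The MapReduce cleaning jobs above were written in Java; as a comparable illustration in Python, the Hadoop Streaming sketch below drops malformed log records and counts them per day. The tab-delimited layout and field positions are assumptions, not details of the actual jobs.

```python
# ---- mapper.py ----
# Drops malformed tab-delimited log records and emits a per-day key.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3 or not fields[0]:          # skip malformed records
        continue
    day = fields[0][:10]                          # assume ISO timestamp; keep YYYY-MM-DD
    print(f"{day}\t1")

# ---- reducer.py ----
# Sums the counts emitted by the mapper for each day.
import sys

current_day, total = None, 0
for line in sys.stdin:
    day, count = line.rstrip("\n").split("\t")
    if day != current_day:
        if current_day is not None:
            print(f"{current_day}\t{total}")
        current_day, total = day, 0
    total += int(count)
if current_day is not None:
    print(f"{current_day}\t{total}")
```

The two scripts would be submitted together through the standard Hadoop Streaming jar (using its -mapper, -reducer, -input, and -output options).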

Environment: HDFS, Pig, Hive, MapReduce, Linux, HBase, Flume, Sqoop, R, VMware, Eclipse, Cloudera, and Python.
