
Data Scientist/Machine Learning Engineer Resume


Norfolk, VA

SUMMARY

  • 8+ years of experience in Data Science and Analytics including Machine Learning, Data Mining, Data Blending & Statistical Analysis.
  • 4+ years of solid experience with MongoDB / NoSQL and JavaScript.
  • 5+ years of experience with Machine Learning techniques and algorithms (such as k-NN, Naive Bayes, etc.).
  • Experience in AWS (Amazon Web Services): EC2, VPC, IAM, S3, CloudFront, CloudWatch, CloudFormation, Glacier, RDS, Config, Route 53, SNS, SQS, ElastiCache.
  • Good hands-on experience in Amazon Web Services (AWS).
  • An articulate, certified Informatica Data Quality expert with 8+ years of extensive experience in Informatica Data Quality (IDQ) and Informatica PowerCenter, with strong business understanding and knowledge of extracting, transforming and loading data from heterogeneous source systems such as Oracle, SQL Server, flat files, Excel, XML, UDB and Sybase.
  • Extensively worked on Informatica Data Quality (IDQ): defined rules, built scorecards, applied metrics and grouped them into various dimensions in IDQ.
  • Extensive full-cycle Azure Cloud experience with Big Data, Elasticsearch and SOLR, and Machine Learning and Deep Learning development and deployment: HDInsight, Data Lake, Data Factory, Data Gateway, Machine Learning Studio, Power BI, Azure Cosmos DB and the Cortana Intelligence Suite.
  • Experienced with machine learning algorithms such as logistic regression, random forest, XGBoost, KNN, SVM, neural networks, linear regression, lasso regression and k-means.
  • Expertise in synthesizing Machine Learning, Predictive Analytics and Big Data technologies like Hadoop, Hive, Pig.
  • Strong skills in statistical methodologies such as A/B testing, experiment design, hypothesis testing and ANOVA.
  • Extensively worked on Python 3.5/2.7 (NumPy, Pandas, Matplotlib, NLTK and Scikit-learn).
  • Experience in implementing data analysis with various analytic tools, such as Anaconda 4.0, Jupyter Notebook 4.x, R 3.0 (ggplot2, caret, dplyr) and Excel.
  • Solid ability to write and optimize diverse SQL queries, with working knowledge of RDBMS like SQL Server 2008 and NoSQL databases like MongoDB 3.2.
  • Strong experience in Big Data technologies like Spark 1.6, Spark SQL, PySpark, Hadoop 2.x, HDFS and Hive 1.x.
  • Experience in visualization tools like Tableau 9.x/10.x and Data Blending for creating dashboards.
  • Created a recommendation system using k-means clustering, NLP and Flask to generate a list of potential users, and worked on an NLP pipeline consisting of TF-IDF and LSI over the user reviews (see the sketch after this list).
  • Proficient in Predictive Modeling, Data Mining methods, Factor Analysis, ANOVA, hypothesis testing, normal distribution and other advanced statistical and econometric techniques.
  • Developed predictive models using Decision Trees, Random Forest, Naive Bayes, Logistic Regression, Cluster Analysis and Neural Networks.
  • Excellent knowledge of Machine Learning, Mathematical Modeling and Operations Research. Comfortable with R, Python, SAS, Weka, MATLAB and relational databases. Deep understanding of and exposure to the Big Data ecosystem.
  • Expert in creating PL/SQL Schema objects like Packages, Procedures, Functions, Subprograms, Triggers, Views, Materialized Views, Indexes, Constraints, Sequences, Exception Handling, Dynamic SQL/Cursors, Native Compilation, Collection Types, Record Type, Object Type using SQL Developer.
  • Experienced in Python for data loading, extraction and manipulation, and worked with Python libraries such as Matplotlib, NumPy, SciPy and Pandas for data analysis.
  • Hands-on experience in implementing Model View Controller (MVC) architecture using Spring, JDK, Core Java (Collections, OOP concepts), JSP, Servlets, Struts, Hibernate and JDBC, and provided server administrator duties at Logical Position.
  • Strong experience in application development using Java/J2EE technologies, including implementing Model View Controller (MVC) architecture using Spring, JDK 1.6, Core Java (Collections, OOP concepts), JSP, Servlets, Struts, Hibernate, Web Services, AJAX, JDBC, HTML and JavaScript.
  • Worked with complex applications such as R, SAS, MATLAB and SPSS to develop neural network and cluster analysis models.
  • Experienced in Big Data with Hadoop, HDFS, MapReduce and Spark.
  • Experienced in Data Integration Validation and Data Quality controls for ETL process and Data Warehousing using MS Visual Studio SSIS, SSAS, SSRS.
  • Proficient in Tableau and R Shiny data visualization tools to analyze large datasets, obtain insights and create visually powerful, actionable interactive reports and dashboards.
  • Automated recurring reports using SQL and Python and visualized them on BI platforms such as Tableau.
  • Worked in development environments using Git and VMs.
  • Excellent communication skills; works successfully in fast-paced, multitasking environments both independently and in collaborative teams; a self-motivated, enthusiastic learner.
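
A minimal, illustrative sketch of the TF-IDF/LSI plus k-means approach referenced in the recommendation-system bullet above, using scikit-learn. The review texts, component count and cluster count are hypothetical placeholders, not project data.

```python
# Illustrative sketch only: the reviews, component count and cluster count are
# placeholders, not data from the engagement described above.
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

reviews = [
    "great battery life and fast shipping",
    "battery died after a week, poor quality",
    "excellent camera, love the screen",
    "screen cracked easily, camera is average",
]

# TF-IDF turns raw reviews into weighted term vectors; TruncatedSVD applied to
# a TF-IDF matrix is the classic LSI (latent semantic indexing) step.
lsi = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    TruncatedSVD(n_components=2, random_state=42),
)
topic_vectors = lsi.fit_transform(reviews)

# k-means groups reviewers with similar latent topics; each cluster becomes a
# candidate list of users for the recommendation step.
labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(topic_vectors)
for label, review in zip(labels, reviews):
    print(label, review)
```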

TECHNICAL SKILLS

Data Analytics Tools: Python (numpy, scipy, pandas, Gensim, Keras), R (Caret, Weka, ggplot), MATLAB.

Analysis & Modelling Tools: Erwin, Sybase PowerDesigner, Oracle Designer, TOAD, Rational Rose, ER/Studio, MS Visio, SAS, Django, Flask, pip, NPM, Node.js, Spring MVC.

Data Visualization: Tableau, Visualization packages, Microsoft Office.

Machine Learning: K-NN, K-Means, Kernel SVM, Clustering, Softmax, Gradient Descent, Decision Trees, Random Forest, Simple Linear Regression, Multivariate Linear Regression, Logistic Regression, Polynomial Regression, Backpropagation, Feed-Forward ANN, CNN, RNN, and Word2Vec.

Machine Learning Frameworks: Spark ML, Spark MLlib, Kafka, Scikit-Learn & NLTK.

Big Data Tools: Hadoop, MapReduce, Sqoop, Pig, Hive, NoSQL, Spark, Apache Kafka, Shiny, YARN, Data Frames, pandas, ggplot2, Sklearn, Theano, CUDA, Azure, HDInsight, etc.

ETL Tools: Informatica PowerCenter, DataStage 7.5, Ab Initio, Talend.

OLAP Tools: MS SQL Analysis Manager, DB2 OLAP, Cognos Power-play.

Programming Languages: C, C++, SQL, PL/SQL, T-SQL, XML, HTML, UNIX Shell Scripting, Microsoft SQL Server, Oracle PLSQL, Python, Scala, AWK, JavaScript.

R Packages: dplyr, sqldf, data.table, randomForest, gbm, caret, elasticnet and all sorts of Machine Learning packages.

Databases: SQL Server, Linked Servers, DTS Packages, SSIS, PL/SQL, MS SQL Server, DB2 UDB, Teradata, Netezza, Hyperion Essbase, Sybase ASE, Informix, AWS RDS, Cassandra, MongoDB, PostgreSQL.

Database Tools: SQL Profiler, SQL Query Analyzer, SQL Agents, SQL Alerts, DTS Import/Export, SSRS, SSIS, SSAS, Informatica PowerCenter, OLAP Services, Data Visualization.

Project Execution Methodologies: Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD), Ralph Kimball and Bill Inmon data warehousing methodology.

Tools & Software: SAS/STAT, SAS/ETS, SAS E-Miner, SPSS, R, Advanced R, TOAD, MS Office, BTEQ, Teradata SQL Assistant.

Methodologies: Ralph Kimball, COBOL.

Version Control: Git, SVN.

Reporting Tools: Business Objects XI R2/6.5/5.0/5.1, Cognos Impromptu 7.0/6.0/5.0, Informatica Analytics Delivery Platform, MicroStrategy, SSRS, Tableau.

Operating Systems: Windows 2007/8, UNIX (Sun-Solaris, HP-UX), Windows NT/XP/Vista, MSDOS.

PROFESSIONAL EXPERIENCE

Confidential, Norfolk, VA

Data Scientist/Machine Learning Engineer

Responsibilities:

  • Provided architectural leadership in shaping strategic business technology projects, with an emphasis on application architecture.
  • Utilized domain knowledge and application portfolio knowledge to play a key role in defining the future state of large, business technology programs.
  • Participated in all phases of data mining, data collection, data cleaning, developing models, validation, and visualization and performed Gap analysis.
  • Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming (a minimal sketch follows this list).
  • Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib and Python with a broad variety of machine learning methods including classification, regression, dimensionality reduction, etc., and used the resulting engine to increase user lifetime by 45% and triple user conversions for target categories.
  • Well versed with Cloud IaaS and PaaS implementations in both private and public clouds such as VMware, OpenStack, Amazon AWS and Cloud Foundry (Pivotal and HP Stackato).
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn and NLTK in Python for developing various machine learning algorithms, and applied algorithms such as linear regression, multivariate regression, naive Bayes, random forests, k-means and KNN for data analysis.
  • Worked on database design, relational integrity constraints, OLAP, OLTP, cubes, and normalization (3NF) and de-normalization of the database.
  • Experience in machine learning using NLP for text classification and churn prediction using Python.
  • Worked on customer segmentation using an unsupervised learning technique (clustering).
  • Worked with various Teradata 15 tools and utilities such as Teradata Viewpoint, MultiLoad, ARC, Teradata Administrator, BTEQ and other Teradata utilities.
  • Performed data analysis and reporting using Tableau and handled numerous data-pull requests using SQL Server 2012.
  • Developed Linux shell scripts using NZSQL/NZLOAD utilities to load data from flat files into the Netezza database.
  • Designed and implemented system architecture for Amazon EC2 based cloud-hosted solution for the client.
  • Tested Complex ETL Mappings and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
  • Hands-on experience in the Hadoop ecosystem with components Hadoop MapReduce, HDFS, Oozie, HiveQL, Sqoop, HBase, MongoDB, Zookeeper, Pig, and Flume with M5, CDH3 & 4 clusters and EMR cloud computing with Amazon Web Services (AWS).
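
A minimal sketch of the kind of Spark ML random-forest classifier described above, assuming a tabular dataset with numeric feature columns and a binary label. The toy inline data and column names are hypothetical placeholders; real work would score a held-out split.

```python
# Minimal sketch of a Spark ML random-forest classifier; toy data and column
# names ("f1", "f2", "label") are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.feature import VectorAssembler

spark = SparkSession.builder.appName("rf-sketch").getOrCreate()

# Toy training data: two numeric features and a binary label.
df = spark.createDataFrame(
    [(1.0, 0.2, 0), (0.9, 0.3, 0), (0.1, 0.9, 1), (0.2, 0.8, 1),
     (0.8, 0.1, 0), (0.3, 0.7, 1), (0.7, 0.2, 0), (0.2, 0.9, 1)],
    ["f1", "f2", "label"],
)

# Assemble the feature columns into one vector, then fit the random forest.
pipeline = Pipeline(stages=[
    VectorAssembler(inputCols=["f1", "f2"], outputCol="features"),
    RandomForestClassifier(labelCol="label", featuresCol="features", numTrees=50),
])
model = pipeline.fit(df)
predictions = model.transform(df)

# For brevity this scores the toy data itself; a real pipeline would use a
# held-out test split.
auc = BinaryClassificationEvaluator(labelCol="label").evaluate(predictions)
print(f"AUC: {auc:.3f}")
```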

Environment: Erwin r9.6, Python, SQL, Oracle 12c, Netezza, SQL Server, Informatica, Java, SSRS, PL/SQL, T-SQL, Tableau, MLlib, regression, cluster analysis, Scala, NLP, Spark, Kafka, MongoDB, logistic regression, Hadoop, Hive, Teradata, random forest, OLAP, Azure, MariaDB, SAP CRM, HDFS, ODS, NLTK, SVM, JSON, XML, Cassandra, MapReduce, AWS.

Confidential, Kansas City, Missouri

Data Scientist

Responsibilities:

  • Involved in extensive ad hoc reporting, routine operational reporting, and data manipulation to produce routine metrics and dashboards for management.
  • Created parameters, action filters and calculated sets for preparing dashboards and worksheets in Tableau.
  • Interacted with other data scientists and architects on custom solutions for data visualization using tools such as Tableau and packages in Python.
  • Involved in running MapReduce jobs for processing millions of records.
  • Wrote complex SQL queries using joins and OLAP functions such as COUNT, CSUM and RANK.
  • Built and published customized interactive reports and dashboards and scheduled reports using Tableau Server.
  • Developed Python programs to manipulate data read from various Teradata tables and consolidate it into single CSV files (see the sketch after this list).
  • Performed statistical data analysis and data visualization using Python.
  • Worked on creating filters and calculated sets for preparing dashboards and worksheets in Tableau.
  • Created data models in Splunk using pivot tables by analyzing the vast amount of data and extracting key information to suit various business requirements.
  • Created new scripts for Splunk scripted input for the system, collecting CPU and OS data.
  • Implemented data refreshes on Tableau Server for biweekly and monthly increments based on business changes to ensure that the views and dashboards displayed the changed data accurately.
  • Developed normalized Logical and Physical database models for designing an OLTP application.
  • Knowledgeable in the AWS environment for loading data files from on-premises systems to a Redshift cluster.
  • Performed SQL Testing on AWS Redshift databases.
  • Developed Teradata SQL scripts using OLAP functions like rank and rank over to improve the query performance while pulling the data from large tables.
  • Developed and implemented SSIS, SSRS and SSAS application solutions for various business units across the organization.
  • Designed the Data Marts in dimensional data modelling using star and snowflake schemas.
  • Analyzed datasets with SAS programming, R and Excel.
  • Published interactive dashboards and scheduled automatic data refreshes.
  • Maintained large data sets, combining data from various sources using Excel, SAS Enterprise, SAS Grid, Access and SQL queries.
  • Created Hive queries that helped market analysts spot emerging trends by comparing incremental data with Teradata reference tables and historical metrics.
  • Design and development of ETL processes using Informatica ETL tools for dimension and fact file creation.
  • Develop and automate solutions for a new billing and membership Enterprise data Warehouse including ETL routines, tables, maps, materialized views, and stored procedures incorporating Informatica and Oracle PL/SQL toolsets.
  • Performed analysis of implementing Spark using Scala and wrote Spark sample programs using PySpark.
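
A minimal sketch of the Teradata-to-CSV consolidation described above. The table names, connection details and output file are hypothetical placeholders, and the teradatasql driver plus a reachable Teradata instance are assumed.

```python
# Pull several Teradata tables into pandas and write one consolidated CSV.
# Table names, host, credentials and output path are hypothetical placeholders.
import pandas as pd
import teradatasql  # Teradata's Python DB API driver (assumed installed)

TABLES = ["sales_daily", "returns_daily"]  # placeholder table names

frames = []
with teradatasql.connect(host="td-host", user="analyst", password="***") as conn:
    cur = conn.cursor()
    for table in TABLES:
        # Standard DB API pattern: execute, read column names, fetch rows.
        cur.execute(f"SELECT * FROM {table}")
        columns = [desc[0] for desc in cur.description]
        frames.append(
            pd.DataFrame(cur.fetchall(), columns=columns).assign(source_table=table)
        )
    cur.close()

# Stack the per-table extracts and write a single consolidated CSV.
pd.concat(frames, ignore_index=True).to_csv("consolidated_extract.csv", index=False)
```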

Environment: SQL Server, Oracle 10g/11g, MS Office, Teradata, Informatica, ER Studio, XML, R connector, Python, R, Tableau 9.2.

Confidential, Englewood, Colorado.

Data Scientist/R Developer

Responsibilities:

  • Conducted analysis assessing customer consumption behaviors and discovered customer value with RFM analysis; applied customer segmentation with clustering algorithms such as K-Means Clustering and Hierarchical Clustering.
  • Collaborated wif data engineers to implement the ETL process, wrote and optimized SQL queries to perform data extraction and merging from Oracle.
  • Involved in managing backup and restoring data in the live Cassandra Cluster.
  • Used R, Python, and Spark to develop a variety of models and algorithms for analytic purposes.
  • Performed data integrity checks, data cleaning, exploratory analysis and feature engineering using R and Python.
  • Developed personalized product recommendations with machine learning algorithms, including Gradient Boosting Trees and collaborative filtering, to better meet the needs of existing customers and acquire new customers.
  • Used Python and Spark to implement different machine learning algorithms, including Generalized Linear Models, Random Forest, SVM, Boosting and Neural Networks.
  • Evaluated parameters with K-Fold Cross Validation and optimized model performance (see the sketch after this list).
  • Worked on benchmarking Cassandra Cluster using the Cassandra stress tool.
  • Completed a highly immersive Data Science program involving data manipulation and visualization, web scraping, machine learning, Git, SQL, UNIX commands, Python programming and NoSQL.
  • Worked on data cleaning, data preparation and feature engineering with Python, including NumPy, SciPy, Matplotlib, Seaborn, Pandas and Scikit-learn.
  • Identified the risk level and eligibility of new insurance applicants with machine learning algorithms.
  • Determined customer satisfaction and helped enhance the customer experience using NLP.
  • Utilized SQL and HiveQL to query and manipulate data from a variety of data sources, including Oracle and HDFS, while maintaining data integrity.
  • Performed data visualization and designed dashboards with Tableau and D3.js, and provided complex reports, including charts, summaries and graphs, to interpret the findings for the team and stakeholders.
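
A minimal sketch of the k-fold cross-validation step mentioned above, using scikit-learn to tune a gradient-boosting model. The synthetic dataset and parameter grid are placeholders, not project values.

```python
# k-fold cross-validation for parameter evaluation; synthetic data and the
# small parameter grid are placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, KFold

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# 5-fold CV: each candidate setting is scored on five held-out folds.
cv = KFold(n_splits=5, shuffle=True, random_state=42)
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid={"n_estimators": [100, 200], "max_depth": [2, 3]},
    scoring="roc_auc",
    cv=cv,
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```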

Environment: R, MATLAB, MongoDB, exploratory analysis, feature engineering, K-Means Clustering, Hierarchical Clustering, Machine Learning, Python, Spark (MLlib, PySpark), Tableau, MicroStrategy, SAS, TensorFlow, regression, logistic regression, Hadoop 2.7, OLTP, random forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML and MapReduce.

Confidential, New York, New York.

Data Analyst

Responsibilities:

  • Participated in JAD sessions, gathered information from Business Analysts, end users and other stakeholders to determine the requirements.
  • Worked in Data warehousing methodologies/Dimensional Data modeling techniques such as Star/Snowflake schema using ERWIN 9.1.
  • Hands-on experience in cloud computing with Azure Storage, Compute, SQL databases, Document DB (Cosmos DB), Data Lake Store & Analytics, Data Factory, HDInsight and Stream Analytics.
  • Extensively used Aginity Netezza workbench to perform various DDL, DML etc. operations on Netezza database.
  • Designed the Data Warehouse and MDM hub Conceptual, Logical and Physical data models.
  • Performed daily monitoring of Oracle instances using Oracle Enterprise Manager, ADDM and TOAD; monitored users, tablespaces, memory structures, rollback segments, logs and alerts.
  • Used normalization methods up to 3NF and de-normalization techniques for effective performance in OLTP and OLAP systems.
  • Generated DDL scripts using Forward Engineering technique to create objects and deploy them into the databases.
  • Worked on database testing, writing complex SQL queries to verify transactions and business logic, such as identifying duplicate rows, using SQL Developer and PL/SQL Developer (see the sketch after this list).
  • Used Teradata SQL Assistant, Teradata Administrator, PMON and data load/export utilities like BTEQ, FastLoad, MultiLoad, Fast Export, TPump on UNIX/Windows environments and running the batch process for Teradata.
  • Developed SQL Queries to fetch complex data from different tables in remote databases using joins, database links and Bulk collects.
  • Migrated databases from legacy systems and SQL Server to Oracle and Netezza.
  • Used SSIS to create ETL packages to validate, extract, transform and load data to pull data from Source servers to staging database and then to Netezza Database and DB2 Databases.
  • Worked on SQL Server concepts SSIS (SQL Server Integration Services), SSAS (SQL Server Analysis Services) and SSRS (SQL Server Reporting Services).
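
A minimal sketch of the duplicate-row check mentioned above. For illustration it runs the standard GROUP BY / HAVING pattern against an in-memory SQLite table rather than the original Oracle schema; the table and column names are hypothetical.

```python
# Duplicate-row detection via GROUP BY / HAVING, illustrated on an in-memory
# SQLite table; "transactions" and its columns are hypothetical placeholders.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (txn_id INTEGER, account_id INTEGER, amount REAL);
    INSERT INTO transactions VALUES (1, 100, 25.0), (2, 101, 40.0), (3, 100, 25.0);
""")

# Business keys that occur more than once are flagged as duplicates.
dupes = conn.execute("""
    SELECT account_id, amount, COUNT(*) AS occurrences
    FROM transactions
    GROUP BY account_id, amount
    HAVING COUNT(*) > 1
""").fetchall()
print(dupes)  # -> [(100, 25.0, 2)]
```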

Environment: ER Studio, Teradata 13.1, SQL, PL/SQL, BTEQ, DB2, Oracle, MDM, Netezza, ETL, RTF, UNIX, SQL Server 2010, Informatica, SSRS, SSIS, SSAS, SAS, Aginity.

Confidential

Data Modeler

Responsibilities:

  • Analyze business information requirements and model class diagrams and/or conceptual domain models.
  • Managed the project requirements, documents and use cases by IBM Rational Requisite Pro.
  • Assisted in building an integrated logical data design and proposed a physical database design for building the data mart.
  • Gathered and reviewed customer information requirements for OLAP and for building the data mart.
  • Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Calculated and analyzed claims data for provider incentive and supplemental benefit analysis using Microsoft Access and Oracle SQL.
  • Designed email marketing campaigns and created responsive web forms that saved data into a database using the Python/Django framework.
  • Worked on Hadoop single-node, Apache Spark and Hive installations.
  • Performed installation, configuration, integration, tuning, backup, crash recovery, upgrades, patching, monitoring of system performance, system and network security, and troubleshooting of Linux/Unix servers.
  • Implemented public segmentation using unsupervised machine learning by implementing the k-means algorithm with PySpark (a minimal sketch follows).
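
A minimal sketch of k-means segmentation with PySpark, as described in the bullet above. The toy customer features and the number of clusters are placeholders.

```python
# k-means segmentation with Spark ML; the toy data and k value are placeholders.
from pyspark.sql import SparkSession
from pyspark.ml.clustering import KMeans
from pyspark.ml.feature import VectorAssembler

spark = SparkSession.builder.appName("kmeans-sketch").getOrCreate()

# Toy customer features: (visits_per_month, avg_spend).
df = spark.createDataFrame(
    [(2, 15.0), (3, 20.0), (40, 300.0), (35, 280.0)],
    ["visits_per_month", "avg_spend"],
)

# Assemble the numeric columns into a feature vector for clustering.
features = VectorAssembler(
    inputCols=["visits_per_month", "avg_spend"], outputCol="features"
).transform(df)

# Fit k-means with two segments and attach a cluster label to each customer.
model = KMeans(k=2, seed=42, featuresCol="features").fit(features)
model.transform(features).select("visits_per_month", "avg_spend", "prediction").show()
```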

Environment: SQL Server 2008 R2/2005 Enterprise, SSRS, SSIS, Crystal Reports, Windows Enterprise Server 2000, DTS, SQL Profiler, and Query Analyzer.

Confidential

Data Analyst/Data Architect

Responsibilities:

  • Worked with internal architects, assisting in the development of current- and target-state data architectures.
  • Implementation of Metadata Repository, Transformations, Maintaining Data Quality, Data Standards, Data Governance program, Scripts, Stored Procedures, triggers and execution of test plans.
  • Defined the list of codes and code conversions between the source systems and the data mart.
  • Involved in defining the source of business rules, target data mappings, data definitions.
  • Responsible for defining the key identifiers for each mapping/interface.
  • Performed data quality in Talend Open Studio.
  • Maintained the Enterprise Metadata Library with any changes or updates.
  • Documented data quality and traceability documents for each source interface.
  • Established standards and procedures.
  • Coordinated meetings with vendors to define requirements and system interaction agreement documentation between the client and vendor systems.

Environment: Windows Enterprise Server 2000, SSRS, SSIS, Crystal Reports, DTS, SQL Profiler, and Query Analyzer.
