Data Scientist/Machine Learning Engineer Resume

Boston, MA

SUMMARY

  • 7+ years of experience in machine learning and data mining with large datasets of structured and unstructured data, data acquisition, data validation, predictive modelling, and data visualization.
  • Extensive experience in text analytics, developing statistical machine learning and data mining solutions for various business problems, and generating data visualizations using R and Python.
  • Data-driven and highly analytical, with working knowledge of statistical modeling approaches and methodologies (clustering, regression analysis, hypothesis testing, decision trees, machine learning), business rules, and an ever-evolving regulatory environment.
  • Experience in building models with deep learning frameworks like TensorFlow, PyTorch, and Keras.
  • Professional working experience in Machine Learning algorithms such as Linear Regression, Logistic Regression, Naive Bayes, Decision Trees, K-Means Clustering and Association Rules.
  • Expertise in transforming business requirements into analytical models, designing algorithms, building models, developing data mining and reporting solutions that scale across a massive volume of structured and unstructured data.
  • Experience with data visualization using tools such as ggplot2, Matplotlib, Seaborn, and Tableau, and using Tableau to publish and present dashboards and storylines on multiple platforms.
  • Extensive hands-on experience and high proficiency with structured, semi-structured, and unstructured data, using a broad range of data science programming languages and big data tools including R, Python, Spark, SQL, Scikit Learn, and Hadoop MapReduce.

TECHNICAL SKILLS

Languages: Python, R

Machine Learning: Regression, Polynomial Regression, Random Forest, Logistic Regression, Decision Trees, Classification, Clustering, Association, Simple/Multiple Linear Regression, Kernel SVM, K-Nearest Neighbors (K-NN), NLP

Databases: SQL Server, MySQL, MS Access, HDFS, HBase, Teradata, Netezza, MongoDB, Cassandra, Oracle, SAP HANA, SQL, Impala, Pig, Spark

Reporting Tools: Tableau, Crystal Reports XI, SSRS, Business Objects 5.x/6.x, Cognos 7.0/6.0

BI Tools: Tableau, Tableau server, Tableau Reader, SAP Business Objects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure Data Warehouse

PROFESSIONAL EXPERIENCE

Confidential, Boston, MA

Data Scientist/Machine Learning Engineer

Responsibilities:

  • Implemented machine learning, computer vision, deep learning, and neural network algorithms using TensorFlow, and designed prediction models using data mining techniques in Python with libraries such as NumPy, SciPy, Matplotlib, Pandas, and scikit-learn (a pipeline sketch follows this list).
  • Used pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python to develop various machine learning algorithms.
  • Implemented an automated, cost-effective service to load data from AWS into Snowflake.
  • Integrated Snowflake with AWS Glue to flexibly manage data transformation and ingestion pipelines (see the Snowflake loading sketch after this list).
  • Determined customer satisfaction and helped enhance the customer experience using NLP (see the sentiment-analysis sketch after this list).
  • Worked with different data formats such as JSON and XML and applied machine learning algorithms in Python.
  • Worked with Data Architects and IT Architects to understand the movement of data and its storage, using ER Studio 9.7.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
  • Used Terraform to automate and manage the infrastructure and services running on the organizational platform.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and Smart View.
  • Worked with IBM Netcool Operations Insight, whose AI and machine learning capabilities help reduce event noise and automatically group related events.
  • Utilized the Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, pandas).
  • Installed and used the Caffe deep learning framework.
  • Updated Python scripts to match training data with the database stored in AWS CloudSearch so that each document could be assigned a response label for further classification.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Designed 3NF data models for ODS and OLTP systems as well as dimensional data models using star and snowflake schemas.
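
A minimal sketch of the prediction-model workflow referenced above, using scikit-learn; the input file name and column names (customer_data.csv, churned) are hypothetical placeholders, not the project data:

    # Minimal sketch of a scikit-learn prediction pipeline (hypothetical data).
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report

    df = pd.read_csv("customer_data.csv")            # hypothetical input file
    X = df.drop(columns=["churned"])                 # hypothetical numeric features
    y = df["churned"]                                # hypothetical binary target

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)

    model = Pipeline([
        ("scale", StandardScaler()),                 # scale numeric features
        ("clf", LogisticRegression(max_iter=1000)),  # baseline classifier
    ])
    model.fit(X_train, y_train)
    print(classification_report(y_test, model.predict(X_test)))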
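
A sketch of the AWS-to-Snowflake loading step, assuming an existing external stage over the S3 bucket; the stage, table, and connection parameters are placeholders (the actual pipeline also used AWS Glue for orchestration):

    # Sketch: load staged S3 files into Snowflake with COPY INTO (placeholder names).
    import snowflake.connector

    conn = snowflake.connector.connect(
        user="USER", password="PASSWORD", account="ACCOUNT",   # placeholders
        warehouse="LOAD_WH", database="ANALYTICS", schema="RAW")
    cur = conn.cursor()
    try:
        # @s3_stage is assumed to be a pre-created external stage over the S3 bucket.
        cur.execute("""
            COPY INTO raw_events
            FROM @s3_stage/events/
            FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
        """)
    finally:
        cur.close()
        conn.close()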
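
As an illustration of the NLP-based customer-satisfaction work, a minimal sentiment pass with NLTK's VADER analyzer; the sample comments are made up:

    # Sketch: score customer comments with NLTK's VADER sentiment analyzer.
    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)        # one-time lexicon download
    sia = SentimentIntensityAnalyzer()

    comments = ["Great support, issue fixed fast.",   # made-up examples
                "The app keeps crashing on login."]
    for text in comments:
        score = sia.polarity_scores(text)["compound"]  # -1 (negative) to +1 (positive)
        label = "positive" if score >= 0.05 else "negative" if score <= -0.05 else "neutral"
        print(f"{label:8s} {score:+.2f}  {text}")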

Environment: Erwin r9.6, Python, SQL, Oracle 12c, Netezza, SQL Server, SSRS, PL/SQL, T-SQL, Tableau, MLlib, regression, cluster analysis, Scala, NLP, Spark, Kafka, MongoDB, logistic regression, Hadoop, PySpark, Teradata, random forest, OLAP, Azure, MariaDB, SAP CRM, HDFS, ODS, NLTK, SVM, JSON, XML, Cassandra, MapReduce, AWS

Confidential

Data Scientist/Machine Learning Engineer

Responsibilities:

  • Utilized Spark, Scala, Hadoop, HQL, VQL, Oozie, PySpark, Data Lake, TensorFlow, HBase, Cassandra, Redshift, MongoDB, Kafka, Kinesis, Spark Streaming, Edward, CUDA, MLlib, AWS, Azure, and Python, along with a broad variety of machine learning methods including classification, regression, and dimensionality reduction.
  • Utilized the engine to increase user lifetime by 45% and triple user conversions for target categories.
  • Applied various machine learning algorithms and statistical modeling techniques, such as decision trees, text analytics, natural language processing (NLP), supervised and unsupervised learning, regression models, social network analysis, neural networks, deep learning, SVM, and clustering, to identify volume using the scikit-learn package in Python and MATLAB.
  • Evaluated models using cross-validation, log loss, and ROC curves, used AUC for feature selection, and worked with Elastic technologies such as Elasticsearch and Kibana (see the evaluation sketch after this list).
  • Ensured that the model had a low false positive rate, and performed text classification and sentiment analysis on unstructured and semi-structured data.
  • Performed data profiling to learn about user behavior across features such as traffic pattern, location, date, and time.
  • Categorized comments into positive and negative clusters from different social networking sites using Sentiment Analysis and Text Analytics.
  • Configured AWS built-in SNS/SQS for job scheduling.
  • Used Python scripts to update content in the database and manipulate files.
  • Used dplyr in R and pandas in Python for exploratory data analysis.
  • Applied multinomial logistic regression, decision tree, random forest, and SVM models to classify whether a package would be delivered on time for a new route.
  • Used Git 2.x for version control, Ant/Maven for builds, and Jenkins for CI/CD.
  • Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database, and used ETL for data transformation.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
  • Explored DAGs, their dependencies, and logs using Airflow pipelines for automation (a minimal DAG sketch follows this list).
  • Performed data cleaning and feature selection using the MLlib package in PySpark, and worked with deep learning frameworks such as Caffe and Neon.
  • Developed Spark/Scala, R, and Python code for a regular expression (regex) project in the Hadoop/Hive environment on Linux/Windows big data resources.
  • Used the K-Means clustering technique to identify outliers and classify unlabeled data (see the outlier-flagging sketch after this list).
  • Tracked operations using sensors until certain criteria were met, using Airflow.
  • Responsible for data mapping activities from source systems to Teradata using utilities such as TPump, FEXP, BTEQ, MLOAD, and FLOAD.
  • Implemented CI/CD pipelines for Java applications on the Azure cloud platform.
  • Analyzed traffic patterns by calculating autocorrelation with different time lags.
  • Addressed overfitting by implementing regularization methods such as L1 and L2 (a regularized-model sketch follows this list).
  • Used Principal Component Analysis in feature engineering to analyze high dimensional data.
  • Used MLlib, Spark's Machine learning library to build and evaluate different models.
  • Implemented a rule-based expert system based on the results of exploratory analysis and information gathered from people in different departments.
  • Designed and created reports that use gathered metrics to infer and draw logical conclusions about past and future behavior.
  • Developed a MapReduce pipeline for feature extraction using Hive and Pig.
  • Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data. Created various types of data visualizations using Python and Tableau.
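
A minimal sketch of the evaluation approach described above (cross-validation, log loss, ROC/AUC), using a synthetic dataset in place of the project data:

    # Sketch: evaluate a classifier with cross-validated log loss and hold-out ROC AUC.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score, train_test_split
    from sklearn.metrics import roc_auc_score

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)  # synthetic stand-in
    model = LogisticRegression(max_iter=1000)

    log_loss_cv = -cross_val_score(model, X, y, cv=5, scoring="neg_log_loss").mean()

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"CV log loss: {log_loss_cv:.3f}  hold-out AUC: {auc:.3f}")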
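
A minimal Airflow DAG sketch in the spirit of the pipelines mentioned above; the DAG id, schedule, and the extract/clean/train callables are hypothetical placeholders:

    # Sketch: a three-step Airflow DAG (extract -> clean -> train), placeholder tasks.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():  print("pull raw data")         # placeholder callables
    def clean():    print("clean and join data")
    def train():    print("train and score model")

    with DAG(dag_id="feature_pipeline",
             start_date=datetime(2023, 1, 1),
             schedule_interval="@daily",
             catchup=False) as dag:
        t1 = PythonOperator(task_id="extract", python_callable=extract)
        t2 = PythonOperator(task_id="clean", python_callable=clean)
        t3 = PythonOperator(task_id="train", python_callable=train)
        t1 >> t2 >> t3                             # dependencies define the DAG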
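
A sketch of the K-Means outlier flagging: points far from their nearest centroid (here, beyond the 95th percentile of distance) are treated as outliers; the data, cluster count, and threshold are illustrative:

    # Sketch: flag outliers as points far from their nearest K-Means centroid.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 4))                  # stand-in for unlabeled data

    kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
    dist_to_centroid = kmeans.transform(X).min(axis=1)   # distance to nearest centroid
    threshold = np.percentile(dist_to_centroid, 95)      # illustrative cutoff
    outliers = np.where(dist_to_centroid > threshold)[0]
    print(f"{len(outliers)} points flagged as outliers; cluster sizes:",
          np.bincount(kmeans.labels_))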
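
To illustrate the L1/L2 regularization used against overfitting, a sketch that tunes the penalty type and strength with grid search; the data and parameter grid are illustrative:

    # Sketch: control overfitting with L1/L2-regularized logistic regression.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV

    X, y = make_classification(n_samples=1000, n_features=50,
                               n_informative=10, random_state=0)  # synthetic stand-in
    grid = GridSearchCV(
        LogisticRegression(solver="liblinear", max_iter=1000),
        param_grid={"penalty": ["l1", "l2"],       # L1 also drives sparse coefficients
                    "C": [0.01, 0.1, 1.0, 10.0]},  # smaller C = stronger regularization
        cv=5, scoring="roc_auc")
    grid.fit(X, y)
    print("best params:", grid.best_params_, "CV AUC:", round(grid.best_score_, 3))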

Environment: R, Python, HDFS, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, NLP, Metadata, AWS, MS Excel, MS Visio, MapReduce, Rational Rose, SQL, and MongoDB

Confidential, Livonia, MI

Data Analyst/Data Scientist

Responsibilities:

  • Worked closely with data mapping SME and QA team to understand the business rules for acceptable data quality standards.
  • Created database objects such as tables, views, procedures, triggers, and functions in AWS Redshift using SQL Workbench.
  • Created and provided support on various monitoring and control reports, including a customer verification report for customers that accepted offers in the sales engine, an AMF waiver report, a credit fulfillment report, a qualification and offer load volume reconciliation report, and an upgrade performance monitoring report.
  • Wrote complex SQL queries to identify granularity issues and relationships between data sets and created recommended solutions based on analysis of the query results.
  • Built an NLP topic model (LDA) to analyze users' comments and help improve the mobile banking app design (a minimal sketch follows this list).
  • Performed data profiling on datasets with millions of rows in a Teradata environment, validating key gen elements, ensuring correctness of codes and identifiers, and recommending mapping changes.
  • Performed unit testing on transformation rules to ensure data moved correctly.
  • Created Python scripts that take client content documents and images as input and create web pages, including a home page, table of contents, and links.
  • Involved in the full life cycle of the Business Objects reporting application.
  • Worked directly with Cloud System Administrators and project managers supporting Amazon Web Services (AWS) migration.
  • Delivered an enterprise data governance, data quality, metadata, and ETL Informatica solution.
  • Maintained Excel workbooks, including developing pivot tables, exporting data from external SQL databases, producing reports, and updating spreadsheet information.
  • Used Python's pandas library to analyze the data (see the pivot-table sketch after this list).
  • Involved in extracting and transforming data from the enterprise data warehouse.
  • Implemented various data transformations such as SQL extract, split, and data validation.
  • Created Tableau views with complex calculations and hierarchies making it possible to analyze and obtain insights into large data sets.
  • Created and executed SQL queries to perform Data Integrity testing on a Teradata Database to validate and test data using TOAD.
  • Worked on the ETL Informatica mappings and other ETL Processes (Data Warehouse).
  • Utilized Tableau server to publish and share the reports with the business users.
  • Experienced in designing complex Drill-Down and Drill-Through reports using Business Objects.
  • Experienced in creating UNIX scripts for file transfer and file manipulation.
  • Generated ad-hoc or management specific reports using Tableau and Excel.
  • Analyzed subscriber, provider, member, and claim data to continuously scan and create authoritative master data.
  • Prepared the data rules spreadsheet in MS Excel used to update allowed values, findings, and profiling results.
  • Validated and tested SAS code for new and existing reports.
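
A minimal sketch of the LDA comment analysis, using scikit-learn's CountVectorizer and LatentDirichletAllocation; the sample comments and topic count are illustrative placeholders:

    # Sketch: LDA topic model over app-store style comments (made-up examples).
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    comments = ["login screen keeps freezing after update",
                "love the new transfer feature, very fast",
                "cannot reset my password from the app",
                "fingerprint login is convenient and quick"]

    vec = CountVectorizer(stop_words="english")
    dtm = vec.fit_transform(comments)                        # document-term matrix
    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(dtm)

    terms = vec.get_feature_names_out()
    for i, topic in enumerate(lda.components_):
        top = [terms[j] for j in topic.argsort()[-5:][::-1]]  # top words per topic
        print(f"topic {i}: {', '.join(top)}")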
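
As an example of the pandas-based analysis and pivot-table reporting mentioned above, a small sketch over made-up offer data; the column names and output file are placeholders:

    # Sketch: pandas pivot table over made-up offer data, exported for reporting.
    import pandas as pd

    df = pd.DataFrame({                                      # made-up sample rows
        "region":  ["East", "East", "West", "West"],
        "product": ["Card", "Loan", "Card", "Loan"],
        "offers":  [120, 80, 95, 60],
    })
    pivot = pd.pivot_table(df, index="region", columns="product",
                           values="offers", aggfunc="sum", margins=True)
    print(pivot)
    pivot.to_excel("offer_volume.xlsx")   # requires openpyxl; file name is a placeholder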

Environment: Windows 7, Linux, Tableau desktop, Tableau Server, NLP, Business Objects, AWS, R, SQL Developer, MySQL, MS-Access, MS Excel and SQL

Confidential, San Diego, California

Data Analyst

Responsibilities:

  • Worked with data governance, data quality, data lineage, and data architects to design various models and processes.
  • Implemented end-to-end systems for data analytics and data automation, integrated with custom visualization tools, using Informatica, Tableau, and Business Objects.
  • Designed, developed, tested, and maintained Tableau functional reports based on user requirements.
  • Designed and deployed rich graphic visualizations using Tableau and converted existing Business Objects reports into Tableau dashboards.
  • Created scripts for system administration and AWS using languages such as Bash and Python (a boto3 sketch follows this list).
  • Used Informatica Power Center for ETL: extracting, transforming, and loading data from heterogeneous source systems into the target database.
  • Created mappings using the Designer, extracted data from various sources, and transformed it according to requirements.
  • Involved in extracting data from flat files and relational databases into the staging area.
  • Developed Informatica mappings and reusable transformations to facilitate timely loading of data into a star schema.
  • Developed Informatica mappings using Aggregator transformations, SQL overrides in Lookups, source filters in Source Qualifiers, and Router transformations to manage data flow into multiple targets.
  • Created sessions, extracted data from various sources, transformed it according to requirements, and loaded it into the data warehouse.
  • Used various transformations, including Filter, Expression, Sequence Generator, Update Strategy, Joiner, Router and Aggregator to create robust mappings in Informatica Power Center Designer.
  • Imported various heterogeneous files using Informatica Power Center 8.x Source Analyzer.
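
A small sketch of the kind of Python/AWS administration scripting referenced above, listing running EC2 instances with boto3; the region and tag handling are illustrative, and credentials are assumed to come from the environment:

    # Sketch: list running EC2 instances with boto3 (credentials from the environment).
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")       # illustrative region
    resp = ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}])

    for reservation in resp["Reservations"]:
        for inst in reservation["Instances"]:
            tags = {t["Key"]: t["Value"] for t in inst.get("Tags", [])}
            print(inst["InstanceId"], inst["InstanceType"], tags.get("Name", "-"))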

Environment: SAS/Base, SAS/Connect, SAS/UNIX, SAS/ODS, SAS/Macros, SQL, Tableau, MS Excel, Power Point, DB2, Teradata, SAS Enterprise guide.

Confidential

SQL Developer

Responsibilities:

  • Implemented a side-by-side migration of MS SQL Server 2000 on Windows Server 2008.
  • Installed, configured, and maintained SQL Server 2008 R2 on an Active/Active cluster with the latest service packs and hotfixes.
  • Used the SQL Server 2008 DMVs to monitor index fragmentation, blocking, page splits, and log space (a monitoring sketch follows this list).
  • Installed and configured Microsoft SQL Server two-node clustering (Active/Passive).
  • Created documentation for installing SQL Server, applying service packs and hotfixes, and clustering installation.
  • Tuned slow-running queries using Profiler and statistics, evaluating joins and indexes, updating statistics, and modifying code.
  • Experienced in the CDC (Change Data Capture) process.
  • Experienced in configuring PowerShell environment for SQL support.
  • Implemented new T-SQL features added in SQL Server 2005, including data partitioning, error handling through TRY...CATCH statements, and common table expressions (CTEs).
  • Worked in Active/Passive and Active/Active cluster environments and installed and configured more than 50 clustered SQL instances.
  • Installed service packs and builds, and performed both in-place and side-by-side upgrades from SQL Server 2005 to SQL Server 2008.
  • Managed users, including creation/alteration, granting of system/database roles, and permissions on various database objects.
  • Set up and troubleshot transactional replication (push and pull) and merge replication.
  • Checked performance issues on servers using Profiler, Perfmon, DBCC, and DMVs.
  • Installed and configured SSRS and deployed SSIS packages.
  • Managed schema objects such as triggers, cursors, indexes, and procedures.
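
A sketch of the DMV-based fragmentation monitoring, run from Python via pyodbc; the connection string is a placeholder and the 30% threshold is illustrative:

    # Sketch: check index fragmentation via sys.dm_db_index_physical_stats (placeholder connection).
    import pyodbc

    conn = pyodbc.connect("DRIVER={SQL Server};SERVER=myserver;"
                          "DATABASE=mydb;Trusted_Connection=yes")   # placeholder
    sql = """
        SELECT OBJECT_NAME(ips.object_id) AS table_name,
               i.name                     AS index_name,
               ips.avg_fragmentation_in_percent
        FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
        JOIN sys.indexes AS i
          ON i.object_id = ips.object_id AND i.index_id = ips.index_id
        WHERE ips.avg_fragmentation_in_percent > 30          -- illustrative threshold
        ORDER BY ips.avg_fragmentation_in_percent DESC;
    """
    for table_name, index_name, frag in conn.cursor().execute(sql):
        print(f"{table_name}.{index_name}: {frag:.1f}% fragmented")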

Environment: SQL Server 2008 R2/2008/2005/2000, LiteSpeed, MS Visio, Microsoft SQL Server Management Studio 2005/2008, MS Business Intelligence Development Studio 2008
