Data Scientist Resume
Cedar Rapids, IA
SUMMARY
- 8+ years of experience in IT and comprehensive industry knowledge of Machine Learning, Statistical Modeling, Data Analysis, Predictive Analytics, Data Manipulation, Data Mining, Data Visualization, and Business Intelligence.
- Proficient at building robust Machine Learning and Deep Learning models, including LSTMs, using TensorFlow and Keras. Adept at analyzing large datasets using Apache Spark, PySpark, Spark ML, and Amazon Web Services (AWS).
- Experience in performing Feature Selection and applying Linear Regression, Logistic Regression, k-Means Clustering, Classification, Decision Tree, Support Vector Machine (SVM), Naive Bayes, K-Nearest Neighbors (KNN), Random Forest, Gradient Descent, and Neural Network algorithms to train and test models on large datasets (a representative sketch follows this summary).
- Experienced with Machine Learning, Regression Analysis, Clustering, Boosting, Classification, Principal Component Analysis and Data Visualization Tools.
- Adept in statistical programming languages like Python, R, and SAS, as well as Big Data technologies like Hadoop, Hive, HDFS, MapReduce, and NoSQL databases.
- Expertise in data extraction and manipulation in Python, using widely adopted libraries like NumPy, Pandas, and Matplotlib for data analysis.
- Proficient in designing and creating Data Visualization dashboards, worksheets, and analytical reports in Tableau, built to end-user requirements, that help users identify critical KPIs and facilitate strategic planning in the organization.
- Extensively worked with other machine learning libraries such as Seaborn, scikit-learn, and SciPy, and familiar with TensorFlow for deep learning and NLTK for natural language processing.
- Experience in Text Analytics, developing Statistical Machine Learning and Data Mining solutions to various business problems, and generating data visualizations using R and Python.
- Exposed to manipulating large datasets using R packages like tidyr, tidyverse, dplyr, reshape2, lubridate, and caret, and to data visualization using the ggplot2 package.
- Experienced in Data Integration, Validation, and Data Quality controls for ETL processes and Data Warehousing using MS Visual Studio, SSAS, SSIS, and SSRS.
- Quick learner with strong business domain knowledge who can communicate business data insights easily to both technical and nontechnical clients.
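A minimal sketch of the supervised train/test workflow referenced in the summary, using scikit-learn; the synthetic dataset, model choice, and hyperparameters are illustrative assumptions, not details of any project listed below.

```python
# Hypothetical illustration of the train/test workflow named in the summary;
# the synthetic data and hyperparameters are assumptions for demonstration.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real feature matrix and labels.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Hold out a test split to estimate generalization.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```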
TECHNICAL SKILLS
PROGRAMMING LANGUAGES: Python 2.x/3.x, R, Scala, SQL, PL/SQL, T-SQL, Spark, Hive, SQLite3, Java, PHP, XML
SCRIPTING: JavaScript, AngularJS, NodeJS, Shell Scripting (Linux)
FRAMEWORKS, PACKAGES AND OTHER: OpenCV, Django web framework, scikit-learn, pandas, json, NumPy, SciPy, mechanize, BeautifulSoup4, MNE, Caffe, NLP, Google ML, ggplot2
DATABASE AND CLOUD PLATFORMS: Oracle (10g, 11g, 12c), Hadoop/MapReduce, Spark, Big Data, PDI, Azure, AWS (S3/EC2)
REPORTING TOOLS: Tableau, Pentaho, Google Cloud Prediction API, MLBase, R-Shiny
MACHINE LEARNING: Linear Regression, SVM, KNN, Naive Bayes, Logistic regression, CART, Random Forest, K-means clustering, Hierarchical clustering, TensorFlow, Caffe, Neon.
DATA ANALYSIS / STATISTICAL ANALYSIS: Hypothesis Testing, ANOVA, Survival Analysis, Longitudinal Analysis, Experimental Design and Sample Size Determination, A/B Testing, Z-test, T-test.
PROFESSIONAL EXPERIENCE
DATA SCIENTIST
Confidential, CEDAR RAPIDS, IA
Responsibilities:
- Worked collaboratively with other engineers, data scientists, analytics teams, and business product owners in an agile environment.
- Designed, planned, and implemented the migration of existing on-premises applications to Azure Cloud (ARM); configured and deployed Azure Automation scripts utilizing the Azure stack (Compute, Web and Mobile, Blobs, Resource Groups, Azure Data Lake, HDInsight clusters, Azure Data Factory, Azure SQL, Cloud Services, and ARM), with a focus on automation.
- Launched and migrated SQL Server databases to Azure SQL Database using the SQL Azure Migration Wizard; worked with Azure Cosmos DB and Azure Database for MariaDB for high-availability scenarios, and used Azure Monitor for Application Insights, log analysis, and health monitoring.
- Developed Machine Learning, Statistical Analysis, and Data Visualization applications for challenging data processing problems in the sustainability and biomedical domains.
- Designed robust, reusable, and scalable data-driven solutions and data pipeline frameworks to automate the ingestion, processing, and delivery of structured and unstructured data, both batch and real-time streaming.
- Built data APIs and data delivery services to support critical operational processes, analytical models and machine learning applications.
- Assisted in selection and integration of data related tools, frameworks, and applications required to expand platform capabilities.
- Used T-SQL extensively to construct triggers and tables and to implement stored procedures, functions, views, user profiles, data dictionaries, and data integrity checks.
- Used advanced T-SQL features to design and tune queries that interface with the database and other applications efficiently, and created stored procedures for the business logic.
- Compiled data from various sources, both public and private databases, to perform complex analysis and data manipulation for actionable results.
- Built and analyzed datasets using R, SAS, MATLAB, and Python (in decreasing order of usage).
- Designed and developed Natural Language Processing models for sentiment analysis (see the first sketch after this section).
- Worked on Natural Language Processing with Python's NLTK module to develop applications for automated customer response.
- Applied concepts of probability, distributions, and statistical inference to the given dataset to unearth interesting findings through the use of comparisons, t-tests, F-tests, R-squared, p-values, etc. (see the second sketch after this section).
- Applied linear regression, multiple regression, the ordinary least squares method, mean-variance analysis, the law of large numbers, logistic regression, dummy variables, residuals, the Poisson distribution, Bayes' theorem, Naive Bayes, function fitting, etc. to data with the help of the scikit-learn, SciPy, NumPy, and Pandas modules in Python.
- Applied clustering algorithms, i.e., hierarchical and K-means, with the help of scikit-learn and SciPy.
- Developed visualizations and dashboards using ggplot2 and Tableau.
- Applied linear regression in Python and SAS to understand the relationships between different attributes of the dataset and the causal relationships between them.
- Pipelined (ingest/clean/munge/transform) data for feature extraction toward downstream classification.
- Wrote Hive queries for data analysis to meet the business requirements.
- Provided Business Intelligence and data visualization expertise using R and Tableau.
- Identified patterns, data quality issues, and opportunities, and communicated these insights and opportunities to business partners.
Environment: Machine learning, AWS, MS Azure, Cassandra, Spark, HDFS, Hive, Pig, Linux, Python (Scikit-Learn/SciPy/NumPy/Pandas), R, SAS, SPSS, MySQL, Eclipse, PL/SQL, SQL connector, Tableau.
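As a companion to the NLTK sentiment bullets above, a minimal sketch using NLTK's VADER analyzer; the sample texts are invented, and the choice of VADER is an assumption about the approach rather than a documented project detail.

```python
# Hedged sketch of sentiment scoring with NLTK's VADER analyzer;
# the example texts are invented and the choice of VADER is an assumption.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-time lexicon download

sia = SentimentIntensityAnalyzer()
for text in ["The response time was excellent.",
             "I am still waiting on my refund."]:
    # polarity_scores returns neg/neu/pos plus a compound score in [-1, 1]
    print(text, "->", sia.polarity_scores(text)["compound"])
```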
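And a hedged sketch of the inference, regression, and clustering steps listed above, using SciPy and scikit-learn; every dataset and parameter here is a synthetic assumption for illustration.

```python
# Hedged sketch combining the inference, regression, and clustering steps
# listed above; all data here is synthetic and parameters are assumptions.
import numpy as np
from scipy import stats
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Two-sample t-test on synthetic groups (cf. the hypothesis-testing bullet).
a, b = rng.normal(0.0, 1.0, 200), rng.normal(0.3, 1.0, 200)
t_stat, p_value = stats.ttest_ind(a, b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# Ordinary least squares fit (cf. the regression bullet).
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(0.0, 0.1, 200)
ols = LinearRegression().fit(X, y)
print("R^2:", ols.score(X, y))

# K-means and hierarchical clustering (cf. the clustering bullet).
labels_km = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
labels_h = fcluster(linkage(X, method="ward"), t=3, criterion="maxclust")
print("cluster sizes:", np.bincount(labels_km), np.bincount(labels_h - 1))
```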
DATA ENGINEER
Confidential, PHILADELPHIA, PA
Responsibilities:
- Worked with large, complex data sets. Solved difficult, non-routine analysis problems, applying advanced analytical methods as needed. Conducted end-to-end analysis that included data gathering and requirements specification, processing, analysis, ongoing deliverables, and presentations.
- Built and ran prototype analysis pipelines iteratively to provide insights at scale. Developed a comprehensive understanding of Confidential data structures and metrics, advocating for changes where needed for both product development and business/sales activity.
- Interacted cross-functionally with a wide variety of people and teams. Worked closely with engineers to identify opportunities for, design, and assess improvements to Confidential products.
- Created custom Azure objects (VNets, Subnets, VMs, Storage Accounts) using JSON templates and PowerShell for the automation process, and implemented migration from on-premises to Windows Azure using Azure Site Recovery and Azure Backup.
- Involved in segregating the Azure services as part of sprint planning and preparing the hardening checklist for each Azure service; created new Azure ARM templates and artifacts to update the existing PaaS services per the security standards.
- Initiated build and release pipelines using Azure DevOps, orchestrating the deployment of applications; configured Azure networks with Azure Network Watcher and implemented Azure Site Recovery, Azure Stack, Azure Backup, and Azure Automation.
- Deployed Azure IaaS virtual machines (VMs) and cloud services (PaaS role instances) into secure VNets and subnets; involved in migrating on-premises cloud storage to Windows Azure using Azure Site Recovery and Azure Backup.
- Experience using Azure Media and Content Delivery, Azure Networking, Azure Hybrid Integration, Azure Identity and Access Management, Azure Data Factory and Storage, Azure compute services, and Azure Web Apps. Used MLlib, Spark's machine learning library, to build and evaluate different models (see the first sketch after this section).
- Involved in creating a Data Lake by extracting the customer's Big Data from various data sources into Hadoop HDFS, including data from Excel, flat files, Oracle, SQL Server, MongoDB, Cassandra, HBase, Teradata, and Netezza, plus log data from servers (see the second sketch after this section).
- Used Spark DataFrames, Spark SQL, and Spark MLlib extensively, developing and designing POCs using Scala, Spark SQL, and the MLlib libraries.
- Wrote complex T-SQL queries involving multiple tables, and developed and maintained stored procedures, triggers, and user-defined functions.
- Developed database triggers and stored procedures using T-SQL cursors and tables.
- Used R and SQL to create statistical algorithms involving Multivariate Regression, Linear Regression, Logistic Regression, PCA, Random Forest models, Decision Trees, and Support Vector Machines for estimating the risks of welfare dependency.
- Generated ad-hoc SQL queries using joins, database connections, and transformation rules to fetch data from legacy Oracle and SQL Server database systems.
- Worked on predictive and what-if analysis using R on data from HDFS; successfully loaded files to HDFS from Teradata and loaded from HDFS into Hive.
- Designed the schema, configured and deployed AWS Redshift for optimal storage and fast retrieval of data.
- Analyzed data and predicted end customer behaviors and product performance by applying machine learning algorithms using Spark MLlib.
- Performed data mining on the data using very complex SQL queries and discovered patterns; used extensive SQL for data profiling/analysis to provide guidance in building the data model.
Environment: R, Machine Learning, Teradata 14, Hadoop MapReduce, PySpark, Spark, Spark MLlib, Tableau, Informatica, SQL, Excel, AWS Redshift, Scala, NLP, Cassandra, Oracle, MongoDB, Informatica MDM, Cognos, SQL Server 2012, DB2, SPSS, T-SQL, PL/SQL, Flat Files, XML.
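A minimal sketch of model building and evaluation with Spark MLlib's DataFrame-based API (pyspark.ml), as referenced above; the column names, toy rows, and evaluation on the training rows are simplifying assumptions.

```python
# Hedged sketch of model building with Spark MLlib's DataFrame-based API
# (pyspark.ml); column names, toy rows, and parameters are assumptions.
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

# Tiny inline stand-in for real customer data (hypothetical columns).
df = spark.createDataFrame(
    [(34.0, 2.0, 1.0), (51.0, 8.0, 0.0), (29.0, 1.0, 1.0),
     (62.0, 12.0, 0.0), (45.0, 5.0, 1.0), (58.0, 10.0, 0.0)],
    ["age", "tenure", "label"])

# Assemble raw columns into the single vector column MLlib expects.
data = VectorAssembler(inputCols=["age", "tenure"],
                       outputCol="features").transform(df)

model = LogisticRegression(featuresCol="features", labelCol="label").fit(data)

# Toy evaluation on the training rows; a real pipeline would hold out a split.
auc = BinaryClassificationEvaluator().evaluate(model.transform(data))
print("AUC on toy data:", auc)
```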
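And a sketch of one ingestion path into the HDFS data lake via Spark's JDBC reader; the connection URL, credentials, table, partition column, and HDFS path are all placeholders, not actual project values, and the matching JDBC driver is assumed to be on the classpath.

```python
# Hedged sketch of one ingestion path into an HDFS data lake via Spark's
# JDBC reader; URL, table, credentials, and paths are placeholder assumptions,
# and the Oracle JDBC driver is assumed to be available to the cluster.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ingest-sketch").getOrCreate()

# Pull a table from a relational source (hypothetical connection details).
df = (spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
      .option("dbtable", "sales.transactions")
      .option("user", "etl_user")
      .option("password", "***")
      .load())

# Land it in the lake as Parquet, partitioned for downstream queries
# (assumes the source table has a "region" column).
df.write.mode("append").partitionBy("region").parquet(
    "hdfs://namenode:8020/datalake/raw/transactions")
```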
DATA ENGINEER
Confidential - SANTA MONICA, CA
Responsibilities:
- Conducted analysis assessing customer consumption behaviors and discovered the value of customers with RFM analysis; applied customer segmentation with clustering algorithms such as K-Means Clustering and Hierarchical Clustering.
- Collaborated with data engineers to implement the ETL process, wrote and optimized SQL queries to perform data extraction and merging from Oracle.
- Involved in managing backup and restoring data in the live Cassandra Cluster.
- Used R, Python, and Spark to develop a variety of models and algorithms for analytic purposes.
- Performed data integrity checks, data cleaning, exploratory analysis, and feature engineering using R and Python.
- Worked on different types of storage accounts, such as Blob, File, and Disk, to store the data for high availability and durability.
- Wrote ARM templates to roll out infrastructure as code and provide automation for provisioning infrastructure such as VMs and databases.
- Used Python and Spark to implement different machine learning algorithms, including Generalized Linear Model, Random Forest, SVM, Boosting and Neural Network.
- Evaluated parameters with K-Fold Cross-Validation and optimized performance of models (a sketch follows this section).
- Worked on benchmarking Cassandra Cluster using the Cassandra stress tool.
- Used DTS/SSIS and T-SQL stored procedures to transfer data from OLTP databases to the staging area and finally into data marts, and performed processing on XML.
- Completed a highly immersive Data Science program involving Data Manipulation and Visualization, Web Scraping, Machine Learning, Git, SQL, UNIX commands, Python programming, and NoSQL.
- Worked on data cleaning, data preparation, and feature engineering with Python, including NumPy, SciPy, Matplotlib, Seaborn, Pandas, and Scikit-learn.
- Identified risk level and eligibility of new insurance applicants with Machine Learning algorithms.
- Determined customer satisfaction and helped enhance the customer experience using NLP.
- Utilized SQL and HiveQL to query and manipulate data from a variety of data sources, including Oracle and HDFS, while maintaining data integrity.
- Performed data visualization and designed dashboards with Tableau and D3.js, and provided complex reports, including charts, summaries, and graphs, to interpret the findings for the team and stakeholders.
Environment: R, MATLAB, MongoDB, exploratory analysis, feature engineering, K-Means Clustering, Hierarchical Clustering, Machine Learning, Python, Spark (MLlib, PySpark), Tableau, MicroStrategy, SAS, TensorFlow, regression, logistic regression, OLTP, random forest, OLAP, HDFS, ODS, NLTK, SVM, JSON and XML.
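A minimal sketch of the K-Fold cross-validation step mentioned above, using scikit-learn's GridSearchCV; the synthetic data, model, and grid values are assumptions for illustration.

```python
# Hedged sketch of K-Fold cross-validation for parameter evaluation;
# the grid values and synthetic data are assumptions for illustration.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold

X, y = make_classification(n_samples=500, n_features=10, random_state=1)

param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
cv = KFold(n_splits=5, shuffle=True, random_state=1)

# Each parameter combination is scored as the mean accuracy over 5 folds.
search = GridSearchCV(RandomForestClassifier(random_state=1),
                      param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)
print("best params:", search.best_params_,
      "best CV score:", search.best_score_)
```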
DATA ENGINEER
Confidential
Responsibilities:
- Communicated with other health care information systems using Web Services with the help of SOAP, WSDL, and JAX-RPC
- Used the Singleton, Factory, and DAO design patterns based on the application requirements
- Used SAX and DOM parsers to parse the raw XML documents (an illustrative sketch follows this section)
- Used RAD as the development IDE for web applications.
- Prepared and executed unit test cases
- Used the Log4J logging framework to write log messages at various levels.
- Involved in fixing bugs and minor enhancements for the front-end modules.
- Deployed GUI pages using JSP, JSTL, HTML, DHTML, XHTML, CSS, JavaScript, and AJAX
- Configured the project on WebSphere 6.1 application servers
- Implemented the online application using Core JDBC, JSP, Servlets, EJB 1.1, Web Services, SOAP, and WSDL
- Used Microsoft Visio and Rational Rose to design the Use Case diagrams, Class model, Sequence diagrams, and Activity diagrams for the SDLC process of the application
- Worked in the testing team on system testing, integration, and UAT
- Ensured quality in the deliverables.
- Conducted Design reviews and Technical reviews with other project stakeholders.
- Was a part of the complete life cycle of the project from the requirements to the production support
- Created test plan documents for all back-end database modules
- Implemented the project in Linux environment.
Environment: R 3.0, Erwin 9.5, Tableau 8.0, MDM, QlikView, MLlib, PL/SQL, Teradata 14.1, JSON, Hadoop (HDFS), MapReduce, Pig, Spark, R Studio, Mahout, Hive, AWS.
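The XML parsing above was done with Java SAX and DOM parsers; purely for illustration, the same streaming-versus-in-memory distinction is sketched below with Python's standard library on a made-up document.

```python
# Illustrative only: the project used Java SAX/DOM parsers, but the same
# event-driven vs. in-memory distinction is shown here with Python's stdlib.
import xml.sax
from xml.dom.minidom import parseString

RAW = b"<claims><claim id='1'>flu</claim><claim id='2'>x-ray</claim></claims>"

# DOM: load the whole document into memory, then navigate it.
doc = parseString(RAW)
for node in doc.getElementsByTagName("claim"):
    print("DOM:", node.getAttribute("id"), node.firstChild.data)

# SAX: stream the document and react to events as they arrive.
class ClaimHandler(xml.sax.ContentHandler):
    def startElement(self, name, attrs):
        if name == "claim":
            print("SAX: start claim", attrs.getValue("id"))
    def characters(self, content):
        if content.strip():
            print("SAX: text", content.strip())

xml.sax.parseString(RAW, ClaimHandler())
```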
DATA ANALYST
Confidential
Responsibilities:
- Worked with project team representatives to ensure that logical and physical ER/Studio data models were developed in line with corporate standards and guidelines.
- Involved in defining the source to target data mappings, business rules, data definitions.
- Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
- Responsible for defining the key identifiers for each mapping/interface.
- Responsible for defining the functional requirement documents for each source to target interface.
- Documented, clarified, and communicated change requests with the requestor and coordinated with the development and testing teams.
- Worked with users to identify the most appropriate source of record and to profile the data required for sales and service.
- Documented the complete process flow to describe program development, logic, testing, implementation, application integration, and coding.
- Involved in defining the business/transformation rules applied for sales and service data.
- Defined the list codes and code conversions between the source systems and the data mart.
- Worked with internal architects, assisting in the development of current- and target-state data architectures.
- Coordinated with business users to design new reporting capabilities in an appropriate, effective, and efficient way, building on the existing functionality.
- Remained knowledgeable in all areas of business operations in order to identify systems needs and requirements.
Environment: Python, R Studio, Erwin, Tableau, MDM, QlikView, MLlib, PL/SQL, HDFS, JSON, MapReduce, Spark, HIVE, AWS.