Data Scientist Resume

Chicago, IL


  • Data science professional with 6+ years of experience and broad, in-depth knowledge of statistics and Python programming, with a strong math background
  • Identified areas of business improvement by analyzing large volumes of data with various machine learning techniques to uncover insights
  • Effectively utilized Python and R to identify trends and relationships across data sets, draw meaningful inferences, and translate analytical conclusions into risk management and marketing strategies that drive value
  • Strong programming skills in languages like Python, R, SAS and SQL
  • Experience with emerging technologies such as Big Data, Hadoop, and NoSQL
  • Performed in-depth analysis and predictive modeling to unearth hidden opportunities; drew and presented insights to the product, sales and marketing teams
  • Proficient in data parsing, data manipulation and data preparation using methods such as describing data contents, descriptive statistics, regex, split and combine, merge, subset, remap, melt and reshape
  • Skilled in integrating various data sources from multiple relational databases like Oracle, MS SQL Server, DB2, Teradata and flat files into the staging area, ODS, Data Warehouse and Data Marts
  • Experienced in data acquisition, data validation, predictive modeling and data visualization; proficient in statistical programming languages like R and Python
  • Expertise in using Python libraries and R packages
  • Experience in Data Extraction/Transformation/Loading, Data Conversion and Data Migration using PL/SQL scripts and SQL Server Integration Services (SSIS)
  • Hands-on experience with Machine learning algorithms such as Regression Analysis, Clustering, Classification, Principal Component Analysis and Data Visualization Tools
  • Integration Architect & Data Scientist experience in Analytics, Big Data, BPM, SOA, ETL and Cloud technologies
  • Experience with NLP, mining of structured, semi-structured, and unstructured data
  • Proficient in Machine Learning techniques (Decision Trees, Linear/Logistic Regression, Random Forest, SVM, Bayesian, XGBoost, K-Nearest Neighbors) and Statistical Modeling in Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis testing, Factor analysis/PCA, Ensembles
  • Experienced in developing data models and processing data through big data frameworks like HDFS, Hive and Spark to access streaming data and implementing data pipelines to process real-time data and recommendations
  • Experience in data profiling and analysis, implementing appropriate database standards and processes, and defining and designing enterprise business data hierarchies
  • Experience developing SQL procedures on complex datasets for data cleaning and report automation
  • Familiarity in using Teradata tools like SQL Assistant and Microsoft SQL server for accessing and manipulating data on ODBC-compliant database servers
  • Familiar with development and deployment of various cloud-based systems like AWS and Azure
  • Good understanding of column-oriented NoSQL databases such as HBase
  • Good knowledge and experience on AWS, Redshift, S3, and EMR
  • Experience developing a new Airflow DAG to find popular items in Redshift and ingest them into the main PostgreSQL DB via a web service call
  • Documented and managed migration and development process of Airflow Data Pipelines using Airflow DAGs
  • Familiarity with AI and deep learning platforms/methodologies like TensorFlow, RNN, LSTM
  • Excellent communication and interpersonal skills in understanding the flow of business processes, and ability to interact at all levels of the Software Development Life Cycle (SDLC)
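The data preparation skills listed above (melt, merge, reshape) can be sketched with pandas; the table and column names below are purely illustrative, not from any actual project:

```python
import pandas as pd

# Hypothetical wide-format sales table (illustrative data only).
wide = pd.DataFrame({
    "store": ["A", "B"],
    "q1_sales": [100, 150],
    "q2_sales": [120, 130],
})

# Melt from wide to long: one row per (store, quarter).
long = wide.melt(id_vars="store", var_name="quarter", value_name="sales")

# Merge with a lookup table, then pivot back to wide.
regions = pd.DataFrame({"store": ["A", "B"], "region": ["East", "West"]})
merged = long.merge(regions, on="store", how="left")
wide_again = merged.pivot_table(index="store", columns="quarter", values="sales")
```

The same melt/merge/pivot pattern generalizes to larger extracts pulled from the relational sources listed above.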


Languages: Python, SQL, R, NoSQL (MongoDB), C

Databases: Oracle 10g/11g, PostgreSQL, MySQL, Azure SQL, MS-Access, SSIS, SSRS

Operating Systems: Windows, Linux

Machine Learning Algorithms: Linear Regression, Logistic Regression, Decision Trees, Random Forests, K-Nearest Neighbors, Support Vector Machines, Gradient Boosted Decision Trees, Naive Bayes, K-Means Clustering, Stacking Classifiers, Cascading Models, Hierarchical Clustering and Density-Based Clustering

Machine Learning Techniques: Principal Component Analysis, Truncated SVD, Data Standardization, L1 and L2 Regularization, Loss Minimization, Hyperparameter Tuning, Performance Measurement of Models, Feature Engineering, Content-Based and Collaborative Filtering, Matrix Factorization, Model Calibration and Validation, Productionizing and Deploying Models, A/B Testing, Point and Interval Estimation, Hypothesis Testing, Cross-Validation, Decision Surface Analysis, Retraining Models Periodically, t-Distributed Stochastic Neighbor Embedding

SQL Tools: SQL Server Management Studio (SSMS), Erwin Data Modeler, SAS Data Warehouse Tools, SQL Server 2005/2008/2012/2014 Integration Services (SSIS), Business Intelligence Development Studio, SQL Server 2005/2008/2012/2014 Reporting Services (SSRS)

Deep Learning: Artificial Neural Networks, Convolutional Neural Networks, Multi-Layer Perceptrons, Recurrent Neural Networks, LSTM, GRU, Softmax Classifier, Backpropagation, Chain Rule, Choosing Activation Functions, Dropout, Optimization Algorithms, Vanishing and Exploding Gradients, Striding, Padding, Optimized Weight Initializations, Gradient Monitoring and Clipping, Batch Normalization, Max Pooling

BI & other Tools: Tableau, Plotly, Power BI, QlikView, SAS Visual Studio, Jenkins, Toad, Erwin, AWS, Azure, D3, MuleSoft, Shiny, Anaconda

DataCamp Courses: Introduction to Python, Intermediate Python for Data Science, Introduction to R, Introduction to SQL, Joining Data in SQL


Confidential, Chicago, IL

Data Scientist


  • Familiarity with Git for project management and versioning
  • Acquire, clean and structure data from multiple sources and maintain databases/data systems
  • Identify, analyze, and interpret trends or patterns in complex data sets
  • Filter and “clean” data, and review computer reports, printouts and performance indicators to detect and correct code problems
  • Worked on massive structured, unstructured, transactional and real-time data sets from a variety of sources to analyze customer usage patterns and provide actionable, impactful, intuitive insights using statistics, metrics and algorithms
  • Strong programming skills in languages like Python, R, SAS and SQL
  • Good knowledge of Machine Learning techniques (Decision Trees, Linear/Logistic Regression, Random Forest, SVM, Bayesian, XG Boost, K-Nearest Neighbors)
  • Made use of Statistical Modeling in Forecasting/ Predictive Analytics, Segmentation methodologies, Regression based models, Hypothesis testing, Factor analysis/ PCA, Ensembles
  • Experience developing SAS macros for Ad-hoc reporting in SAS Enterprise guide using query builder and SQL
  • Involved from start to end with data science tools and techniques, including data manipulation (SQL, Hadoop, etc.) and programming (R, Python, XML, ETL) frameworks
  • Took ownership of analytical projects end to end, from extracting and exploring data to tracking product feature usage with Google Analytics and presenting findings to product managers
  • Troubleshoot ETL failures and performed manual loads using SQL stored procedures
  • Provide analytical and business insights for decision-making and provide support for key growth metrics
  • Worked closely with various business functions to identify opportunities, analyze, and interpret trends or patterns in data sets using different techniques and tools such as PostgreSQL, Mixpanel, Google Analytics, Excel and R
  • Worked with large volumes of data; extracted and manipulated large datasets using standard tools such as Python, Hadoop, R, SQL and SAS
  • Extracted set of new features to help better understand the interplay between geography and audience features to improve model performance
  • Data Mining experience in Python, R, H2O and/or SAS. Familiar with various Machine Learning algorithms and Statistical methods
  • Involved in planning, roadmap, and architecture discussions to help evolve AI processes to improve revenue-generating products
  • Collaborated cross functionally with data science team and other teams including back-end developers, product managers etc. to help define problems, collect data and build analytical models
  • Designed simple yet robust algorithms using rule-based or optimization-based (e.g. linear programming) approaches to optimize data
  • Performed visualizations using Tableau and Matplotlib
  • Worked with AWS to implement client-side encryption, as DynamoDB did not support encryption at rest at the time
  • Familiar with development and deployment of various cloud-based systems like AWS and Azure
  • Handled AWS management tools such as CloudWatch and CloudTrail
  • Knowledgeable about designing, deploying and operating highly available, scalable and fault-tolerant systems using Amazon Web Services (AWS)
  • Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) on EC2
  • Experience working in a technical environment with the following technologies: AWS data services (Redshift, Athena, EMR) or similar
  • Involved in designing and deploying a multitude of applications utilizing almost all of the AWS stack (including EC2, Route53, S3, RDS, DynamoDB, SNS, SQS, IAM), focusing on high availability, fault tolerance and auto-scaling with AWS CloudFormation
  • Implemented a Continuous Delivery pipeline with Docker, GitHub and AWS
  • Automated Regular AWS tasks like snapshots creation using Python scripts
  • Built servers using AWS: importing volumes, launching EC2 and RDS instances, creating security groups, and configuring auto-scaling and load balancers (ELBs) in the defined virtual private cloud
  • Used Python 2.x/3.x (NumPy, SciPy, Pandas, Scikit-learn, Seaborn) to develop a variety of models and algorithms for analytic purposes
  • Expertise and Vast knowledge of Enterprise Data Warehousing including Data Modeling, Data Architecture, Data Integration (ETL/ELT) and Business Intelligence
  • Enforced model validation using test and validation sets via K-fold cross-validation and statistical significance testing
  • Predominantly used the open-source tools Spyder (Python) and RStudio (R) for statistical analysis and building machine learning models; involved in defining source-to-target data mappings, business rules and data definitions
  • Deployed and monitored scalable infrastructure on Amazon Web Services (AWS), with configuration management using Chef
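The K-fold validation workflow mentioned above can be sketched with scikit-learn; the dataset here is synthetic and stands in for the project data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real project data (illustrative only).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: each fold serves once as the validation set,
# giving five held-out accuracy estimates instead of one.
scores = cross_val_score(model, X, y, cv=5)
mean_accuracy = scores.mean()
```

Averaging the per-fold scores gives a more stable performance estimate than a single train/test split, which is the point of validating this way.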

Environment: Python, R, SQL, SAS, Scikit-learn, NumPy, Seaborn, Pandas, Apache Airflow, Apache HTTP Server, Hadoop, Decision Trees, SVM, Linear Regression, Logistic Regression, Random Forest, Bayesian, XGBoost, K-Nearest Neighbors, PostgreSQL, Machine Learning, AWS


Python Data Engineer


  • Worked on projects from gathering requirements to developing the entire application; hands-on with the Anaconda Python environment: developed, activated and programmed within it
  • Performed Exploratory Data analysis (EDA) to find and understand interactions between different fields in the dataset, for dimensionality reduction, to detect outliers, summarize main characteristics and extract important variables graphically
  • Responsible for Data Cleaning, feature scaling and feature engineering using NumPy and Pandas in Python
  • Extract data and actionable insights from a variety of client sources and systems, find probabilistic and deterministic matches across second- and third-party data sources, and complete exploratory data analysis
  • Worked on writing and reading data in CSV and Excel file formats
  • Proficient in object-oriented programming, design patterns, algorithms, and data structures
  • Wrote python routines to log into the websites and fetch data for selected options
  • Implemented Python scripts to update content in databases and manipulate files
  • Experience using Python machine learning libraries (pandas, NumPy, Matplotlib, scikit-learn, SciPy) to load, summarize and visualize datasets, evaluate algorithms and make predictions
  • Worked with python modules like urllib, urllib2, requests for web crawling. Experience using ML techniques: clustering, regression, classification, graphical models
  • Carried out Regression, K-means clustering and Decision Trees along with Data Visualization reports for the management using R
  • Implemented classification algorithms such as Logistic Regression, KNN and Random Forests to predict the Customer churn and Customer interface
  • Implemented algorithms like Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE) for dimensionality reduction and normalization of large datasets
  • Performed data visualization and designed dashboards using Tableau, generated reports, including charts, summaries, and graphs to interpret the findings to the team and stakeholders
  • Developed numerous MapReduce jobs in Scala for Data Cleansing and Analyzing Data
  • A keen eye for code re-usability and maintainability
  • Involved in debugging and troubleshooting issues and fixed many bugs in two of the main applications
  • Developed, tested and debugged software tools utilized by clients and internal customers
  • Coded test programs and evaluated existing engineering processes
  • Familiarity with cloud platforms such as Google Cloud Platform, Microsoft Azure, AWS, IBM Cloud to use/deploy analytics resources
  • Designed data warehouses on platforms such as AWS Redshift, Azure SQL Data Warehouse, Snowflake, and other high-performance platforms
  • Knowledge in extracting and synthesizing of data from Azure: data lake storage (ADLS), blob storage, SQL DW, SQL Server; and legacy systems: Oracle and its companion data lake storage
  • Leveraged cloud and GPU computing technologies for automated machine learning and analytics pipelines, such as Azure, AWS, GCP, etc.
  • Administered regular user and application support for highly complex issues involving multiple components such as Hive, Spark, Kafka, MapReduce
  • Developed Full life cycle of Data Lake, Data Warehouse with Bigdata technologies like Spark and Hadoop
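The scaling, PCA and classification steps described above can be chained in one scikit-learn pipeline; the churn-like dataset below is synthetic and all parameter choices are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic churn-style dataset (column meanings are assumed, not real).
X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardize features, project onto the top 5 principal components,
# then classify with K-Nearest Neighbors.
pipeline = make_pipeline(StandardScaler(),
                         PCA(n_components=5),
                         KNeighborsClassifier(n_neighbors=5))
pipeline.fit(X_train, y_train)
accuracy = pipeline.score(X_test, y_test)
```

Bundling preprocessing and the model in a pipeline ensures the scaler and PCA are fit only on training data, avoiding leakage into the test set.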

Environment: Anaconda, Python, R, Hive, Spark, Kafka, MapReduce, Apache Airflow, AWS, AWS Fargate, Nginx, Tomcat, NumPy, Pandas, Matplotlib, scikit-learn, SciPy, PCA, K-Means Clustering, Decision Trees, KNN, Random Forest, t-SNE, Tableau


ETL Developer


  • Worked closely with Business Users to gather requirements
  • Created and monitored sessions using workflow manager and workflow monitor
  • Responsible for transforming functional requirements into technical requirements
  • Involved in tuning of targets, Informatica mappings and sessions for optimum performance requirements
  • Developed and implemented an efficient migration process to move ETL objects from development to test and production environments
  • Wrote complex SQL scripts to avoid Informatica Look-ups to improve the performance as the data was heavy in volume
  • Created PL/SQL stored procedures and called them from Informatica power center
  • Design and develop innovative solutions for managing data movement in/out of and between various data systems, platforms and file types
  • Coordinated monthly roadmap releases to push enhanced/new Informatica code to production
  • Developed and maintained ETL (Data Extraction, Transformation and Loading) mappings using Informatica Designer 8.6 to extract the data from multiple source systems that comprise databases like Oracle 10g, SQL Server 7.2, flat files to the Staging area, EDW and then to the Data Marts
  • Participated in providing the project estimates for development team efforts for the offshore as well as on-site
  • Coordinated and monitored the project progress to ensure the timely flow and complete delivery of the project
  • Created mappings using reusable components like worklets and mapplets, along with other reusable transformations
  • Wrote various technical documents, including Business Requirements, Functional and Technical Specifications, Data Flow and Process diagrams using MS Visio tools
  • Prioritized and handled multiple tasks in high-pressure environment
  • Showcased excellent customer service to the internal functional team by pro-actively following up on issues
  • Consistently met deadlines for all production work orders
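The extract-transform-load flow described above can be sketched in miniature; this uses in-memory SQLite as a lightweight stand-in for the Oracle/SQL Server sources and the staging area, with invented table names and rules:

```python
import sqlite3

# Hypothetical source system and staging database (stand-ins only).
source = sqlite3.connect(":memory:")
staging = sqlite3.connect(":memory:")

source.execute("CREATE TABLE orders (id INTEGER, amount REAL, status TEXT)")
source.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                   [(1, 250.0, "OPEN"), (2, -10.0, "OPEN"), (3, 90.0, "CLOSED")])

# Extract + transform: keep only valid (positive-amount) open orders,
# pushing the filter into SQL rather than a lookup, as the bullets describe.
rows = source.execute(
    "SELECT id, amount FROM orders WHERE status = 'OPEN' AND amount > 0"
).fetchall()

# Load the cleaned rows into the staging table.
staging.execute("CREATE TABLE stg_orders (id INTEGER, amount REAL)")
staging.executemany("INSERT INTO stg_orders VALUES (?, ?)", rows)
staging.commit()
loaded = staging.execute("SELECT COUNT(*) FROM stg_orders").fetchone()[0]
```

In Informatica the same filter would live in a mapping transformation; expressing it as SQL against the source mirrors the performance tactic of avoiding lookups on high-volume data.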

Environment: Informatica, ETL, SQL, PL/SQL, EDW, MS Visio tools, Oracle 10g, Data Marts


SQL Developer


  • Designed, developed and maintained relational databases
  • Performed queries to extract data from the central data repository
  • Aggregated data from multiple data sources
  • Involved in creating, maintaining, and modifying the reporting system
  • Created links between the data sources and the reporting tools. Developed and distributed reports
  • Created reports utilizing SQL Server Reporting Services (SSRS)
  • Maintained active relationships with internal customers to determine business requirements
  • Analyzed and evaluated detailed business and technical requirements
  • Identified database requirements by analyzing usage, development needs and system capacities
  • Involved in maintaining database performance, availability and capacity planning to ensure maximum efficiency
  • Managed database backups in accordance with policies and best practices
  • Built and executed data queries for business partners
  • Implemented system changes, upgrades, and modifications in accordance to business needs
  • Proven analytical problem solving and debugging skills
  • Designed and developed database objects, tables, stored procedures, views, triggers and SSIS packages
  • Participated in all phases of Data Mining, Data-collection, Data-Cleaning, Developing-Models, Validation, Visualization and Performed Gap Analysis
  • Performed complex database administration assignments essential to the production and development of DBMS applications, including database analysis, design and development, security support, and performance monitoring and tuning
  • Monitored performance and managed parameters to provide fast responses to front-end users, considering the back-end organization of data
  • Worked with other SQL developers to develop and extend data warehouse modeling, providing flexible and efficient access to information
  • Performed tests and conducted code reviews with other team members to ensure the work was resilient, elegantly coded and effectively tuned for performance
  • Managed and contributed to all aspects of application development, including functional and technical specifications, design, development and production support
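The database objects listed above (tables, views, triggers) can be illustrated with SQLite as a lightweight stand-in for SQL Server; the schema and names are invented for the sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL NOT NULL);
CREATE TABLE audit_log (account_id INTEGER, old_balance REAL, new_balance REAL);

-- Reporting view over the base table.
CREATE VIEW positive_accounts AS
    SELECT id, balance FROM accounts WHERE balance > 0;

-- Trigger records every balance change for auditing.
CREATE TRIGGER trg_balance_audit AFTER UPDATE OF balance ON accounts
BEGIN
    INSERT INTO audit_log VALUES (OLD.id, OLD.balance, NEW.balance);
END;
""")

conn.execute("INSERT INTO accounts VALUES (1, 100.0)")
conn.execute("UPDATE accounts SET balance = 80.0 WHERE id = 1")

audit_rows = conn.execute("SELECT * FROM audit_log").fetchall()
view_rows = conn.execute("SELECT * FROM positive_accounts").fetchall()
```

The view gives reporting tools like SSRS a stable query surface, while the trigger keeps an audit trail without changes to application code.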

Environment: SQL, SSRS, Data Mining, Data Collection, Data Cleaning, Validation, Visualization, DBMS


Jr. Data Analyst


  • Aided in accumulating requirements and technical documentation
  • Data procurement from various sources for customer data analysis
  • Analyzed the credibility of customer data to check loan eligibility
  • Collaborated with clients for requirement gathering, use-case development, business process flow and modeling
  • Interpreted complex patterns and trends in datasets using SPSS and Excel
  • Involved in mapping data elements from user interface to database
  • Assisted in modifying the data for visualization depending on business requirement
  • Documented logical, physical, relational and dimensional data models
  • Normalized and de-normalized tables and maintained referential integrity using triggers and primary and foreign keys
  • Involved in the design and analysis of underlying database schema, altering, and creation of the table structure
  • Responsible for report generation using SQL Server Reporting Services (SSRS) and Crystal Reports based on business requirements
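The referential-integrity work described above (primary and foreign keys between customer and loan data) can be sketched with SQLite foreign keys; the schema is illustrative, not the production one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE loans (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    amount REAL
);
""")
conn.execute("INSERT INTO customers VALUES (1, 'Alice')")
conn.execute("INSERT INTO loans VALUES (10, 1, 5000.0)")

# A loan pointing at a nonexistent customer violates the foreign key.
try:
    conn.execute("INSERT INTO loans VALUES (11, 99, 1000.0)")
    violated = False
except sqlite3.IntegrityError:
    violated = True
```

Enforcing the constraint in the schema keeps orphaned loan rows out of the database regardless of which application writes to it.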

Environment: SPSS, Microsoft Excel, Normalization, De-normalization, SQL, SSRS
