
Data Scientist Resume


Chicago, IL

SUMMARY:

  • Data science enthusiast with 6+ years of experience and broad, in-depth knowledge of statistics and Python programming, backed by a strong math background
  • Identified areas of business improvement by analyzing large volumes of data with a variety of machine learning techniques and surfacing actionable insights
  • Used analytical tools such as Python and R to identify trends and relationships across data sets, draw meaningful inferences, and translate analytical conclusions into risk management and marketing strategies that drive value
  • Strong programming skills in languages like Python, R, SAS and SQL
  • Experience with emerging technologies such as Big Data, Hadoop, and NoSQL
  • Performed in-depth analysis and predictive modelling to unearth hidden opportunities; draw and present insights to the product, sales and marketing teams
  • Proficient in data parsing, manipulation, and preparation using methods such as describing data contents, descriptive statistics, regex, split-and-combine, merge, subset, remap, melt, and reshape (see the pandas sketch after this list)
  • Skilled in the integration of various data sources with multiple relational databases like Oracle, MS SQL Server, DB2, Teradata and Flat Files into the staging area, ODS, Data Warehouse and DataMart
  • Experienced in data acquisition, data validation, predictive modeling, and data visualization; proficient in statistical programming languages such as R and Python
  • Expertise in using Python libraries and R packages
  • Experience in data extraction/transformation/loading (ETL), data conversion, and data migration using PL/SQL scripts and SQL Server Integration Services (SSIS)
  • Hands-on experience with Machine learning algorithms such as Regression Analysis, Clustering, Classification, Principal Component Analysis and Data Visualization Tools
  • Integration Architect & Data Scientist experience in Analytics, Bigdata, BPM, SOA, ETL and Cloud technologies
  • Experience with NLP, mining of structured, semi-structured, and unstructured data
  • Proficient in Machine Learning techniques (Decision Trees, Linear/Logistic Regression, Random Forest, SVM, Bayesian, XGBoost, K-Nearest Neighbors) and Statistical Modeling in Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis testing, Factor analysis/PCA, Ensembles
  • Experienced in developing data models and processing data through big data frameworks like HDFS, Hive and Spark to access streaming data and implementing data pipelines to process real-time data and recommendations
  • Experience in data profiling and analysis, implementing appropriate database standards and processes, and defining and designing enterprise business data hierarchies
  • Experience developing SQL procedures on complex datasets for data cleaning and report automation
  • Familiarity in using Teradata tools like SQL Assistant and Microsoft SQL server for accessing and manipulating data on ODBC-compliant database servers
  • Familiar with development and deployment of various cloud-based systems like AWS and Azure
  • Good understanding of column-oriented NoSQL databases such as HBase
  • Good knowledge and experience on AWS, Redshift, S3, and EMR
  • Experience developing a new Airflow DAG to find popular items in Redshift and ingest them into the main PostgreSQL database via a web service call (see the DAG sketch after this list)
  • Documented and managed the migration and development process of Airflow data pipelines using Airflow DAGs
  • Familiarity with AI and deep learning platforms/methodologies such as TensorFlow, RNNs, and LSTMs
  • Excellent communication and interpersonal skills in understanding business process flows, with the ability to interact at all levels of the Software Development Life Cycle (SDLC)
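
As an illustration of the data preparation work listed above, here is a minimal pandas sketch; the tables, column names, and values are hypothetical placeholders rather than data from any actual engagement.

```python
# Minimal pandas data-preparation sketch (describe, merge, subset, melt);
# the tables and column names are hypothetical placeholders.
import pandas as pd

sales = pd.DataFrame({
    "store_id": [1, 2, 3],
    "q1_sales": [100, 90, 120],
    "q2_sales": [110, 85, 130],
})
stores = pd.DataFrame({"store_id": [1, 2, 3], "region": ["North", "South", "North"]})

print(sales.describe())                          # descriptive statistics
merged = sales.merge(stores, on="store_id")      # combine two sources
north = merged[merged["region"] == "North"]      # subset rows of interest
tidy = merged.melt(                              # wide-to-long reshape
    id_vars=["store_id", "region"],
    value_vars=["q1_sales", "q2_sales"],
    var_name="quarter",
    value_name="sales",
)
print(tidy.head())
```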
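
The Airflow bullet above could look roughly like the following minimal DAG sketch; the task logic, connection details, and web-service endpoint are hypothetical placeholders (Airflow 2.x import paths assumed).

```python
# Minimal Airflow DAG sketch for a Redshift-to-PostgreSQL-service pipeline;
# queries, connections, and the endpoint are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator  # Airflow 2.x import path


def find_popular_items(**context):
    # Placeholder: query Redshift (e.g. via a Postgres/Redshift hook) for popular items
    return ["item_1", "item_2"]


def ingest_items(**context):
    items = context["ti"].xcom_pull(task_ids="find_popular_items")
    # Placeholder: POST the items to the service backed by the main PostgreSQL DB,
    # e.g. requests.post("https://example.internal/api/popular-items", json=items)
    print(f"would ingest {items}")


with DAG(
    dag_id="popular_items_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="find_popular_items", python_callable=find_popular_items)
    load = PythonOperator(task_id="ingest_items", python_callable=ingest_items)
    extract >> load
```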

TECHNICAL SKILLS:

Languages: Python, SQL, R, NoSQL (MongoDB), C

Databases: Oracle 10g/11g, PostgreSQL, MySQL, Azure SQL, MS-Access, SSIS, SSRS

Operating Systems: Windows, Linux

Machine Learning Algorithms: Linear Regression, Logistic Regression, Decision Trees, Random Forests, K-Nearest Neighbors, Support Vector Machines, Gradient Boosted Decision Trees, Naive Bayes, K-Means Clustering, Stacking Classifiers, Cascading Models, Hierarchical Clustering, Density-Based Clustering

Machine Learning Techniques: Principal Component Analysis, Truncated SVD, Data Standardization, L1 and L2 Regularization, Loss Minimization, Hyperparameter Tuning, Performance Measurement of Models, Feature Engineering, Content-Based and Collaborative Filtering, Matrix Factorization, Model Calibration and Validation, Productionizing and Deploying Models, A/B Testing, Point and Interval Estimation, Hypothesis Testing, Cross-Validation, Decision Surface Analysis, Retraining Models Periodically, t-distributed Stochastic Neighbor Embedding

SQL Server Tools: SQL Server Management Studio (SSMS), Erwin Data Modeler, SAS

Data Warehouse Tools: MS SQL Server 2005/2008/2012 Integration Services (SSIS)

Business Intelligence Tools: SQL Server 2005/2008/2012/2014 Business Intelligence Development Studio

Reporting Tools: SQL Server 2005/2008/2012/2014 Reporting Services

Deep Learning: Artificial Neural Networks, Convolutional Neural Networks, Multi-Layer Perceptrons, Recurrent Neural Networks, LSTM, GRU, SoftMax Classifier, Back Propagation, Chain Rule, Choosing Activation Functions, Dropout, Optimization Algorithms, Vanishing and Exploding Gradient, Striding, Padding, Optimized Weight Initializations, Gradient Monitoring and Clipping, Batch Normalization, Max Pooling

BI & other Tools: Tableau, Plotly, Power BI, QlikView, SAS Visual Studio, Jenkins, Toad, Erwin, AWS, Azure, D3, MuleSoft, Shiny, Anaconda

DataCamp Certifications: Introduction to Python, Intermediate Python for Data Science, Introduction to R, Introduction to SQL, Joining Data in SQL

PROFESSIONAL EXPERIENCE:

Confidential, Chicago, IL

Data Scientist

  • Familiarity with Git for project management and versioning
  • Acquire, clean and structure data from multiple sources and maintain databases/data systems
  • Identify, analyze, and interpret trends or patterns in complex data sets
  • Filter and “clean” data, and review computer reports, prints, and performance indicators to detect and correct code problems
  • Worked on massive structured, unstructured, transactional and real-time data sets from a variety of sources to analyze customer usage patterns and provide actionable, impactful, intuitive insights using statistics, metrics and algorithms
  • Strong programming skills in languages like Python, R, SAS and SQL
  • Good knowledge of Machine Learning techniques (Decision Trees, Linear/Logistic Regression, Random Forest, SVM, Bayesian, XG Boost, K-Nearest Neighbors)
  • Made use of Statistical Modeling in Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis testing, Factor analysis/PCA, Ensembles
  • Experience developing SAS macros for ad-hoc reporting in SAS Enterprise Guide using the query builder and SQL
  • Involved end to end with data science tools and techniques, including data manipulation (SQL, Hadoop, etc.) and programming (R, Python, XML, ETL) frameworks
  • Took ownership of analytical projects end to end, from extracting and exploring data and tracking product feature usage with Google Analytics to presenting findings to product managers
  • Troubleshot ETL failures and performed manual loads using SQL stored procedures
  • Provided analytical and business insights for decision-making and supported key growth metrics
  • Worked closely with various business functions to identify opportunities, analyze, and interpret trends or patterns in data sets using different techniques and tools such as PostgreSQL, Mixpanel, Google Analytics, Excel and R
  • Worked with large volumes of data; extracted and manipulated large datasets using standard tools such as Python, Hadoop, R, SQL and SAS
  • Extracted set of new features to help better understand the interplay between geography and audience features to improve model performance
  • Data Mining experience in Python, R, H2O and/or SAS. Familiar with various Machine Learning algorithms and Statistical methods
  • Involved in planning, roadmap, and architecture discussions to help evolve AI processes to improve revenue-generating products
  • Collaborated cross functionally with data science team and other teams including back-end developers, product managers etc. to help define problems, collect data and build analytical models
  • Designed simple yet robust algorithms using rule-based or optimization-based (e.g. linear programming) approaches to optimize data
  • Performed visualizations using Tableau and Matplotlib
  • Worked with AWS to implement client-side encryption, as DynamoDB did not support encryption at rest at the time
  • Familiar with development and deployment of various cloud-based systems like AWS and Azure
  • Handled AWS management tools such as CloudWatch and CloudTrail
  • Knowledgeable about designing, deploying, and operating highly available, scalable, and fault-tolerant systems using Amazon Web Services (AWS)
  • Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) on EC2
  • Experience working in a technical environment with the following technologies: AWS data services (Redshift, Athena, EMR) or similar
  • Involved in designing and deploying a multitude of applications utilizing almost all of the AWS stack (including EC2, Route53, S3, RDS, DynamoDB, SNS, SQS, IAM), focusing on high availability, fault tolerance, and auto-scaling with AWS CloudFormation
  • Implemented a Continuous Delivery pipeline with Docker, GitHub, and AWS
  • Automated regular AWS tasks such as snapshot creation using Python scripts (see the boto3 sketch after this list)
  • Built servers using AWS: importing volumes, launching EC2 and RDS instances, creating security groups, auto-scaling, and load balancers (ELBs) within the defined virtual private cloud (VPC)
  • Used Python 2.x/3.x (NumPy, SciPy, Pandas, scikit-learn, Seaborn) to develop a variety of models and algorithms for analytic purposes
  • Expertise and Vast knowledge of Enterprise Data Warehousing including Data Modeling, Data Architecture, Data Integration (ETL/ELT) and Business Intelligence
  • Enforced model validation using test and validation sets via K-fold cross-validation and statistical significance testing (see the cross-validation sketch after this list)
  • Primarily used open-source tools Spyder (Python) and RStudio (R) for statistical analysis and building machine learning models; involved in defining source-to-target data mappings, business rules, and data definitions
  • Deployed and monitored scalable infrastructure on Amazon Web Services (AWS) & configuration management using AWS Chef
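
As a minimal illustration of the K-fold validation mentioned above, the sketch below uses scikit-learn on a synthetic dataset; the model and parameters are placeholders, not project specifics.

```python
# Minimal K-fold cross-validation sketch with scikit-learn on synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
cv = KFold(n_splits=5, shuffle=True, random_state=42)

scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print("fold accuracies:", np.round(scores, 3))
print(f"mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```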
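
The snapshot-automation bullet above might be scripted roughly as follows with boto3; the region and the Backup tag filter are hypothetical placeholders.

```python
# Minimal boto3 sketch for automating EBS snapshot creation;
# region and tag filter are hypothetical placeholders.
from datetime import datetime, timezone

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Find volumes tagged for backup (hypothetical tagging convention)
volumes = ec2.describe_volumes(
    Filters=[{"Name": "tag:Backup", "Values": ["true"]}]
)["Volumes"]

for vol in volumes:
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    snap = ec2.create_snapshot(
        VolumeId=vol["VolumeId"],
        Description=f"automated snapshot {stamp}",
    )
    print(f"created {snap['SnapshotId']} for {vol['VolumeId']}")
```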

Environment: Python, R, SQL, SAS, Scikit-learn, Numpy, Seaborn, Pandas, Apache Airflow, Apache HTTP, Hadoop, Decision Trees, SVM, Linear Regression, Logistic Regression, Random Forest, Bayesian, XG Boost, K-Nearest Neighbors, PostgreSQL, Machine Learning, AWS

Confidential

Python Data Engineer

  • Worked on projects end to end, from gathering requirements to developing the entire application; hands-on with the Anaconda Python environment, including creating, activating, and programming in Anaconda environments
  • Performed Exploratory Data analysis (EDA) to find and understand interactions between different fields in the dataset, for dimensionality reduction, to detect outliers, summarize main characteristics and extract important variables graphically
  • Responsible for Data Cleaning, feature scaling and feature engineering using NumPy and Pandas in Python
  • Extract data and actionable insights from a variety of client sources and systems, find probabilistic and deterministic matches across second- and third-party data sources, and complete exploratory data analysis
  • Worked on writing data to, as well as reading data from, CSV and Excel file formats
  • Proficient in object-oriented programming, design patterns, algorithms, and data structures
  • Wrote Python routines to log into websites and fetch data for selected options
  • Implemented Python scripts to update content in databases and manipulate files
  • Experience using Python machine learning libraries such as pandas, NumPy, Matplotlib, scikit-learn, and SciPy to load, summarize, and visualize datasets, evaluate algorithms, and make predictions
  • Worked with Python modules such as urllib, urllib2, and requests for web crawling; experience using ML techniques: clustering, regression, classification, graphical models
  • Carried out Regression, K-means clustering and Decision Trees along with Data Visualization reports for the management using R
  • Implemented classification algorithms such as Logistic Regression, KNN, and Random Forests to predict customer churn and customer interface (see the classification sketch after this list)
  • Implemented algorithms such as Principal Component Analysis (PCA) and t-distributed Stochastic Neighbor Embedding (t-SNE) to reduce dimensionality and normalize large datasets (see the PCA/t-SNE sketch after this list)
  • Performed data visualization and designed dashboards using Tableau, generated reports, including charts, summaries, and graphs to interpret the findings to the team and stakeholders
  • Developed numerous MapReduce jobs in Scala for data cleansing and data analysis
  • A keen eye for code re-usability and maintainability
  • Involved in debugging and troubleshooting issues and fixed many bugs in two of the main applications
  • Developed, tested and debugged software tools utilized by clients and internal customers
  • Coded test programs and evaluated existing engineering processes
  • Familiarity with cloud platforms such as Google Cloud Platform, Microsoft Azure, AWS, Confidential Cloud to use/deploy analytics resources
  • Designed data warehouses on platforms such as AWS Redshift, Azure SQL Data Warehouse, Snowflake, and other high-performance platforms
  • Knowledge of extracting and synthesizing data from Azure (Data Lake Storage (ADLS), Blob Storage, SQL DW, SQL Server) and legacy systems (Oracle and its companion data lake storage)
  • Leveraged cloud and GPU computing technologies for automated machine learning and analytics pipelines, such as Azure, AWS, GCP, etc.
  • Administered regular user and application support for highly complex issues involving multiple components such as Hive, Spark, Kafka, MapReduce
  • Developed the full life cycle of a data lake and data warehouse with big data technologies such as Spark and Hadoop
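
To illustrate the churn-classification bullet above, here is a minimal scikit-learn sketch on a synthetic, imbalanced dataset; the features and class balance are placeholders standing in for real customer data.

```python
# Minimal churn-style classification sketch (Logistic Regression, KNN, Random Forest)
# on a synthetic, imbalanced dataset standing in for real customer data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=15, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=7)),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name)
    print(classification_report(y_test, model.predict(X_test)))
```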
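
Similarly, the PCA/t-SNE bullet could be sketched as below, using the scikit-learn digits dataset as a stand-in for project data.

```python
# Minimal dimensionality-reduction sketch: standardize, PCA, then t-SNE to 2-D.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)          # normalize features

X_pca = PCA(n_components=30, random_state=0).fit_transform(X_scaled)   # coarse reduction
X_2d = TSNE(n_components=2, random_state=0).fit_transform(X_pca)       # 2-D embedding

print(X.shape, "->", X_pca.shape, "->", X_2d.shape)
```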

Environment: Anaconda, Python, R, Hive, Spark, Kafka, MapReduce, Apache Airflow, AWS, AWS Fargate, Nginx, Tomcat, NumPy, Pandas, Matplotlib, scikit-learn, SciPy, PCA, K-Means Clustering, Decision Trees, KNN, Random Forest, t-SNE, Tableau

Confidential

ETL Developer

  • Worked closely with Business Users to gather requirements
  • Created and monitored sessions using workflow manager and workflow monitor
  • Responsible for transforming functional requirements into technical requirements
  • Involved in tuning of targets, Informatica mappings and sessions for optimum performance requirements
  • Developed and implemented an efficient migration process to move ETL objects from development to test and production environments
  • Wrote complex SQL scripts to avoid Informatica Look-ups to improve the performance as the data was heavy in volume
  • Created PL/SQL stored procedures and called them from Informatica power center
  • Design and develop innovative solutions for managing data movement in/out of and between various data systems, platforms and file types
  • Coordinated monthly roadmap releases to push enhanced/new Informatica code to production
  • Developed and maintained ETL (Data Extraction, Transformation and Loading) mappings using Informatica Designer 8.6 to extract the data from multiple source systems that comprise databases like Oracle 10g, SQL Server 7.2, flat files to the Staging area, EDW and then to the Data Marts
  • Participated in providing the project estimates for development team efforts for the offshore as well as on-site
  • Coordinated and monitored the project progress to ensure the timely flow and complete delivery of the project
  • Created mappings using reusable components such as worklets and mapplets built from other reusable transformations
  • Wrote various technical documents, including Business Requirements, Functional and Technical Specifications, Data Flow and Process diagrams using MS Visio tools
  • Prioritized and handled multiple tasks in a high-pressure environment
  • Showcased excellent customer service to the internal functional team by pro-actively following up on issues
  • Consistently met deadlines for all production work orders

Environment: Informatica, ETL, SQL, PL/SQL, EDW, MS Visio tools, Oracle 10g, Data Marts

Confidential

SQL Developer

  • Designed, developed and maintained relational databases
  • Performed queries to extract data from the central data repository
  • Aggregated data from multiple data sources
  • Involved in creating, maintaining, and modifying the reporting system
  • Created links between the data sources and the reporting tools. Developed and distributed reports
  • Created reports utilizing SQL Server Reporting Services (SSRS)
  • Maintained active relationships with internal customers to determine business requirements
  • Analyzed and evaluated detailed business and technical requirements
  • Identified database requirements by analyzing usage, development needs and system capacities
  • Involved in maintaining database performance, availability and capacity planning to ensure maximum efficiency
  • Managed database backups in accordance with policies and best practices
  • Built and executed data queries for business partners
  • Implemented system changes, upgrades, and modifications in accordance to business needs
  • Proven analytical problem solving and debugging skills
  • Designed and developed database objects, tables, stored procedures, views, triggers and SSIS packages
  • Participated in all phases of Data Mining, Data-collection, Data-Cleaning, Developing-Models, Validation, Visualization and Performed Gap Analysis
  • Performed complex database administration assignments essential to the production and development of DBMS applications, including database analysis, design and development, security support, and performance monitoring and tuning
  • Monitored performance and managed parameters to provide fast responses to front-end users, taking into account the back-end organization of data
  • Worked with other SQL developers to develop and extend data warehouse models, providing flexible and efficient access to information
  • Performed tests and conducted code reviews with other team members to ensure work was resilient, elegantly coded, and effectively tuned for performance
  • Managed and contributed to all aspects of application development, including functional and technical specifications, design, development, and production support

Environment: SQL, SSRS, Data Mining, Data Collection, Data Cleaning, Validation, Visualization, DBMS

Confidential

Jr. Data Analyst

  • Aided in accumulating requirements and technical documentation
  • Procured data from various sources for customer data analysis
  • Analyzed the credibility of customer data to check for loan eligibility
  • Collaborated with clients for requirement gathering, use-case development, business process flow and modeling
  • Interpreted complex patterns and trends in datasets using SPSS and Excel
  • Involved in mapping data elements from user interface to database
  • Assisted in modifying the data for visualization depending on business requirement
  • Documented logical, physical, relational and dimensional data models
  • Normalized and de-normalized tables and maintained referential integrity using triggers and primary and foreign keys
  • Involved in the design and analysis of underlying database schema, altering, and creation of the table structure
  • Responsible for report generation using SQL Server Reporting Services (SSRS) and Crystal Reports based on business requirements

Environment: SPSS, Microsoft Excel, Normalization, De-normalization, SQL, SSRS
