
Data Scientist Resume


Chicago, IL

PROFESSIONAL SUMMARY:

  • Around 9+ years of hands-on experience and comprehensive industry knowledge of Python development, Machine Learning, Artificial Intelligence algorithms, Data Analytics, Data Modeling, Statistical Modeling, Data Architecture, Data Mining, Text Mining & Natural Language Processing (NLP), analytics models (such as Decision Trees and Linear & Logistic Regression), Business Intelligence, Hadoop (Hive, Pig), R, Spark, Scala, MS Excel, SQL and PostgreSQL
  • Experienced in Data Modeling techniques employing Data Warehousing concepts such as star/snowflake schema, Fact & Dimension tables and Extended Star schema
  • Expertise in applying Data mining and optimization techniques in B2B and B2C models
  • Experienced in building data models using machine learning techniques for Classification, Regression, Clustering and Association mining
  • Strong knowledge of all phases of the SDLC (Software Development Life Cycle), including analysis, design, development, testing and maintenance
  • Expertise in writing functional specifications, translating business requirements into technical specifications, and creating/maintaining/modifying database design documents with detailed descriptions of logical entities and physical tables
  • Expertise in Data Analysis, Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import and Data Export through the use of multiple ETL tools such as Informatica PowerCenter
  • Proficient in Statistical Analysis & Predictive Modeling
  • Expertise in data acquisition, storage, analysis, integration, predictive modeling, logistic regression, decision trees, data mining methods, forecasting, factor analysis, cluster analysis, ANOVA and other advanced statistical techniques
  • Excellent knowledge and experience in OLTP/OLAP system study with focus on Oracle Hyperion Suite of technology, developing Database Schemas like Star schema and Snowflake schema (Fact Tables, Dimension Tables) used in relational, dimensional and multidimensional modeling, physical and logical Data modeling using Erwin tool
  • Expert in creating PL/SQL Schema objects like Packages, Procedures, Functions, Subprograms, Triggers, Views, Materialized Views, Indexes, Constraints, Sequences, Exception Handling, Dynamic SQL/Cursors, Native Compilation, Collection Types, Record Type, Object Type using SQL Developer
  • Working experience with the Hadoop ecosystem and the Apache Spark framework, including HDFS, MapReduce, HiveQL, Spark SQL and PySpark
  • Very good experience and knowledge in provisioning virtual clusters under AWS cloud which includes services like EC2, S3, and EMR
  • Proficient in data visualization tools such as Tableau, Python Matplotlib, R Shiny to create visually powerful and actionable interactive reports and dashboards
  • Experienced in Agile methodology and SCRUM process
  • Strong business sense and ability to communicate data insights to both technical and non-technical clients

TECHNICAL SKILLS:

Databases: MySQL, PostgreSQL, Oracle, HBase, Amazon Redshift, MS SQL Server 2016/2014/2012/2008 R2/2008, Teradata

Statistical Methods: Hypothesis Testing, ANOVA, Time Series, Confidence Intervals, Bayes' Law, Principal Component Analysis (PCA), Dimensionality Reduction, Cross-Validation, Auto-correlation

Machine Learning: Regression analysis, Bayesian Method, Decision Tree, Random Forests, Support Vector Machine, Neural Network, Sentiment Analysis, K-Means Clustering, KNN and Ensemble Method

Hadoop Ecosystem: Hadoop 2.x, Spark 2.x, Map Reduce, Hive, HDFS, Sqoop, Flume

Languages: Python (2.x/3.x), R, SAS, SQL, T-SQL, C, C++

Data Visualization: Tableau, Matplotlib, Seaborn, ggplot2

Operating Systems & Scripting: PowerShell, UNIX/UNIX Shell Scripting (via PuTTY client), Linux and Windows

Reporting Tools: Tableau Suite of Tools (10.x/9.x/8.x), including Desktop, Server and Online; SQL Server Reporting Services (SSRS)

PROFESSIONAL EXPERIENCE:

Confidential, Chicago, IL

Data Scientist

Responsibilities:

  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, R and Celenois modules to apply a broad variety of machine learning methods, including classification, regression and dimensionality reduction, and used the resulting engine to increase user lifetime by 45% and triple user conversions for target categories
  • Used Python 3.X (numpy, scipy, pandas, scikit-learn, seaborn) and Spark2.0 (PySpark, MLlib) to develop variety of models and algorithms for analytic purposes
  • Developed and implemented predictive models using machine learning algorithms such as linear regression, classification, multivariate regression, Naive Bayes, Random Forests, K-means clustering, KNN, PCA and regularization for data analysis
  • Participated in feature engineering such as feature interaction generation, feature normalization and label encoding with Scikit-learn preprocessing (a preprocessing sketch follows this list)
  • Ensured solution and technical architectures were well documented and maintained, set standards and offered consultative advice to technical and management teams, and recommended the roadmap and approach for implementing the data integration architecture (with cost, schedule and effort estimates)
  • Led discussions with users to gather business process requirements and data requirements to develop a variety of Conceptual, Logical and Physical Data Models; expert in Business Intelligence and Data Visualization tools such as Tableau and MicroStrategy
  • Worked on machine learning on large size data using Spark and Map Reduce
  • Developed and evangelized best practices for statistical analysis of Big Data
  • Designed and implemented system architecture for Amazon EC2 based cloud-hosted solution for client
  • Designed and developed NLP models for sentiment analysis
  • Collaborated with data engineers and operation team to implement ETL process, wrote and optimized SQL queries to perform data extraction to fit the analytical requirements
  • Designed the Enterprise Conceptual, Logical and Physical Data Model for the 'Bulk Data Storage System' using Embarcadero ER Studio; the data models were designed in 3NF
  • Performed data analysis by using Hive to retrieve the data from Hadoop cluster, SQL to retrieve data from RedShift
  • Explored and analyzed the customer specific features by using SparkSQL
  • Performed data imputation using Scikit-learn package in Python
  • Led the implementation of new statistical algorithms and operators on Hadoop and SQL platforms, utilizing optimization techniques, linear regression, K-means clustering, Naive Bayes and other approaches
  • Developed Spark/Scala, SAS and R programs for regular expression (regex) project in the Hadoop/Hive environment with Linux/Windows for big data resources
  • Conducted analysis assessing customer consumption behaviors and discovered customer value with RFM analysis; applied customer segmentation with clustering algorithms such as K-Means Clustering and Hierarchical Clustering
  • Built regression models including Lasso, Ridge, SVR and XGBoost to predict Customer Lifetime Value
  • Built classification models including Logistic Regression, SVM, Decision Tree and Random Forest to predict customer churn rate
  • Used F-Score, AUC/ROC, Confusion Matrix, MAE and RMSE to evaluate the performance of each model (an evaluation sketch follows this list)
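
A minimal, illustrative preprocessing sketch for the feature-engineering and imputation steps mentioned above (label encoding, normalization and imputation with Scikit-learn); the input file and column names are assumptions, not the original pipeline:

```python
# Illustrative sketch only: impute, normalize and label-encode features with
# Scikit-learn, as described in the bullets above. The input file and the
# column names ("plan", "usage", "tenure") are hypothetical.
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

df = pd.read_csv("features.csv")  # hypothetical input

# Impute missing numeric values with the column mean.
num_cols = ["usage", "tenure"]
df[num_cols] = SimpleImputer(strategy="mean").fit_transform(df[num_cols])

# Normalize numeric features to the [0, 1] range.
df[num_cols] = MinMaxScaler().fit_transform(df[num_cols])

# Label-encode a categorical column.
df["plan"] = LabelEncoder().fit_transform(df["plan"])
```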
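
A minimal sketch of the churn-classification and evaluation workflow described above, assuming a hypothetical DataFrame with engineered numeric features and a binary churn label:

```python
# Sketch only (not the original production code): train a churn classifier and
# evaluate it with the metrics named above. "customers.csv" and the "churned"
# column are hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

customers = pd.read_csv("customers.csv")  # hypothetical input
X = customers.drop(columns=["churned"])
y = customers["churned"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

pred = model.predict(X_test)
proba = model.predict_proba(X_test)[:, 1]

print("F1:", f1_score(y_test, pred))
print("AUC:", roc_auc_score(y_test, proba))
print("Confusion matrix:\n", confusion_matrix(y_test, pred))
```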

Environment: AWS Redshift, EMR, EC2, Hadoop Framework, S3, Tableau Desktop (9.x/10.x), Tableau Server (9.x/10.x), HDFS, Spark (PySpark, MLlib, Spark SQL), Python 3.x (Scikit-Learn/SciPy/NumPy/Pandas/Matplotlib/Seaborn), Machine Learning (Regressions, KNN, SVM, Decision Tree, Random Forest, XGBoost, LightGBM, Collaborative Filtering, Ensemble), Teradata, Git 2.x, Agile/SCRUM

Confidential, Dallas, TX

Data Scientist

Responsibilities:

  • Tackled a highly imbalanced fraud dataset using undersampling, oversampling with SMOTE and cost-sensitive algorithms with Python Scikit-learn (a SMOTE sketch follows this list)
  • Wrote complex Spark SQL queries for data analysis to meet business requirement
  • Developed Map Reduce/Spark Python modules for predictive analytics & machine learning in Hadoop on AWS
  • Worked on the Celenois module and on data cleaning, ensuring data quality, consistency and integrity using Pandas and NumPy
  • Participated in feature engineering such as feature interaction generation, feature normalization and label encoding with Scikit-learn preprocessing
  • Improved fraud prediction performance by using random forest and gradient boosting for feature selection with Python Scikit-learn
  • Applied Naïve Bayes, KNN, Logistic Regression, Random Forest, SVM and XGBoost to identify whether a loan would default
  • Implemented an Ensemble of Ridge, Lasso Regression and XGBoost to predict the potential loan default loss
  • Used various metrics (RMSE, MAE, F-Score, ROC and AUC) to evaluate the performance of each model
  • Used big data tools in Spark (PySpark, Spark SQL, MLlib) to conduct real-time analysis of loan defaults on AWS
  • Conducted Data blending, Data preparation using Alteryx and SQL for tableau consumption and publishing data sources to Tableau server
  • Created multiple custom SQL queries in Teradata SQL Workbench to prepare the right data sets for Tableau dashboards
  • Queries involved retrieving data from multiple tables using various join conditions, enabling efficiently optimized data extracts for Tableau workbooks
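
A hedged sketch of the SMOTE oversampling and tree-based feature-selection steps mentioned above, assuming the imbalanced-learn package and hypothetical feature/label arrays:

```python
# Illustrative sketch only: rebalance a fraud dataset with SMOTE, then rank
# features with a random forest. The .npy files are hypothetical placeholders
# for the engineered features and the binary fraud label.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier

X = np.load("fraud_features.npy")  # hypothetical inputs
y = np.load("fraud_labels.npy")

# Oversample the minority (fraud) class so the classes are balanced.
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)

# Fit a random forest and keep the indices of the most informative features.
forest = RandomForestClassifier(n_estimators=300, random_state=42)
forest.fit(X_res, y_res)
top_features = np.argsort(forest.feature_importances_)[::-1][:20]
print("Top feature indices:", top_features)
```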

Environment: MS SQL Server 2014, Teradata, ETL, SSIS, Alteryx, Tableau (Desktop 9.x/Server 9.x), Machine Learning (Random Forest, SVM, XGBoost, Ensemble), AWS Redshift, Spark (PySpark, MLlib, Spark SQL), Hadoop 2.x, MapReduce, HDFS, SharePoint

Confidential, Edison, NJ

Data Engineer

Responsibilities:

  • Gathered, analyzed, documented and translated application requirements into data models, and supported standardization of documentation and the adoption of standards and practices related to data and applications.
  • Participated in Data Acquisition with Data Engineer team to extract historical and real-time data by using Sqoop, Pig, Flume, Hive, Map Reduce and HDFS
  • Wrote user defined functions (UDFs) in Hive to manipulate strings, dates and other data
  • Performed data cleaning, feature scaling and feature engineering using the pandas and NumPy packages in Python.
  • Applied clustering algorithms such as Hierarchical and K-Means using Scikit-learn and SciPy (see the clustering sketch after this list).
  • Created the logical data model from the conceptual model and converted it into the physical database design using ERwin.
  • Mapped business needs/requirements to subject area model and to logical enterprise model.
  • Worked with DBAs to create a best-fit physical data model from the logical data model.
  • Redefined many attributes and relationships in the reverse-engineered model and cleansed unwanted tables/columns as part of data analysis responsibilities.
  • Enforced referential integrity in the OLTP data model for consistent relationship between tables and efficient database design.
  • Developed the data warehouse model (star schema) for the proposed central model for the project
  • Created 3NF business-area data models with de-normalized physical implementations, and performed data and information requirements analysis using the ERwin tool.
  • Worked on snowflaking the dimensions to remove redundancy
  • Worked with Teradata 14 tools such as FastLoad, MultiLoad, TPump, FastExport, Teradata Parallel Transporter (TPT) and BTEQ.
  • Helped in migration and conversion of data from the Sybase database into Oracle database, preparing mapping documents and developing partial SQL scripts as required
  • Generated ad-hoc SQL queries using joins, database connections and transformation rules to fetch data from legacy Oracle and SQL Server database systems.
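
A minimal sketch of the cleaning, scaling and K-Means clustering steps named above; the input file and the choice of five clusters are assumptions for illustration:

```python
# Sketch only: clean and scale numeric attributes with pandas, then cluster
# them with K-Means, as described in the bullets above. "customer_attrs.csv"
# and n_clusters=5 are illustrative assumptions.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("customer_attrs.csv")  # hypothetical input
df = df.dropna()  # basic cleaning
scaled = StandardScaler().fit_transform(df.select_dtypes("number"))

kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
df["cluster"] = kmeans.fit_predict(scaled)
print(df["cluster"].value_counts())
```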

Environment: Machine Learning, AWS, MS Azure, Cassandra, Spark, HDFS, Hive, Pig, Linux, Python (Scikit-Learn/SciPy/NumPy/Pandas), R, SAS, SPSS, MySQL, Eclipse, PL/SQL, SQL connector, Tableau.

Confidential

BI Developer/ Data Analyst

Responsibilities:

  • Used SSIS to create ETL packages to Validate, Extract, Transform and Load data into Data Warehouse and Data Mart
  • Maintained and developed complex SQL queries, Celenois components, stored procedures, views, functions and reports that met customer requirements using Microsoft SQL Server 2008 R2
  • Created Views, Table-valued Functions, Common Table Expressions (CTEs), joins and complex subqueries to provide the reporting solutions
  • Optimized query performance by modifying T-SQL queries, removing unnecessary columns and redundant data, normalizing tables, establishing joins and creating indexes
  • Created SSIS packages using Pivot Transformation, Fuzzy Lookup, Derived Columns, Conditional Split, Aggregate, Execute SQL Task, Data Flow Task and Execute Package Task
  • Migrated data from the SAS environment to SQL Server 2008 via SQL Server Integration Services (SSIS)
  • Developed and implemented several types of Financial Reports (Income Statement, Profit & Loss Statement, EBIT, ROIC Reports) using SSRS
  • Developed parameterized dynamic performance reports (Gross Margin, Revenue based on geographic regions, Profitability based on web sales and smartphone app sales), ran the reports every month and distributed them to the respective departments through mail server subscriptions and the SharePoint server
  • Designed and developed new reports and maintained existing reports using Microsoft SQL Reporting Services (SSRS) and Microsoft Excel to support the firm's strategy and management
  • Created sub-reports, drill down reports, summary reports, parameterized reports, and ad-hoc reports using SSRS
  • Used SAS/SQL to pull data from databases and aggregate it to provide detailed reporting based on user requirements
  • Used SAS for pre-processing data, SQL queries, data analysis, generating reports, graphics, and statistical analyses
  • Provided statistical research analyses and data modeling support for mortgage product
  • Performed analyses such as regression analysis, logistic regression, discriminant analysis and cluster analysis using SAS programming (a rough Python analogue follows this list)
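
The regression and classification analyses above were done in SAS; the following is a rough Python analogue of one such logistic-regression model, shown only for illustration, with a hypothetical mortgage sample and assumed predictor columns:

```python
# Rough Python analogue (the original analyses used SAS) of a logistic
# regression on mortgage data. The input file and the columns "ltv", "fico",
# "dti" and "defaulted" are hypothetical.
import pandas as pd
import statsmodels.api as sm

loans = pd.read_csv("mortgage_sample.csv")  # hypothetical input
X = sm.add_constant(loans[["ltv", "fico", "dti"]])  # assumed predictors
model = sm.Logit(loans["defaulted"], X).fit()
print(model.summary())
```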

Environment: SQL Server 2008 R2, DB2, Oracle, SQL Server Management Studio, SAS/BASE, SAS/SQL, SAS/Enterprise Guide, MS BI Suite (SSIS/SSRS), T-SQL, SharePoint 2010, Visual Studio 2010, Agile/SCRUM

Confidential

Data Analyst

Responsibilities:

  • Involved in the design of the overall database using Entity Relationship diagrams.
  • Wrote triggers, menus and stored procedures in PL/SQL.
  • Involved in developing interactive forms and customization of screens using Forms 4.5.
  • Involved in building, debugging and running forms.
  • Involved in Data loading and Extracting functions using SQL*Loader.
  • Designed and developed all the tables, views for the system in Oracle.
  • Designed and developed form validation procedures for query and update of data.
  • Participated in extensive learning and development activities.
  • Performed back-end testing for the Application.
  • Performed manual testing for integration and acceptance.
  • Performed negative and positive testing for the application.
  • Modified existing Test Plans and Test Scripts for regression testing.
  • Conducted data integrity testing by extensive use of SQL.

Environment: Oracle 8.0, SQL*Plus, SQL*Loader, T-SQL, PL/SQL, Forms 4.5, Reports 2.5, Business Objects, WinRunner
