
Data Scientist Resume


SUMMARY

  • 9 years of IT experience including Data Analysis, ETL pipelines, Data Visualization, Model Evaluation, Predictive Modeling, Data Warehousing, and BI Reporting.
  • Worked extensively with very large structured and unstructured datasets, translating business questions into machine learning models and developing data mining and reporting solutions.
  • Skilled in data preparation, exploratory analysis, feature engineering, and parameter fine-tuning using supervised and unsupervised machine learning models.
  • Expertise in data cleansing, transformation, profiling data contents, computing descriptive statistics, and ETL using Python and R.
  • Hands-on experience implementing linear & logistic regression, classification modeling, decision trees, clustering, time series analysis, NLP, dimensionality reduction, CNN, ANN, Random Forest, XGBoost, Naive Bayes, SVM, association rule mining, and reinforcement learning using Python and R.
  • Comprehensive knowledge of the math behind machine learning algorithms such as gradient descent, bagging, and boosting.
  • Extensive experience in Natural Language Processing, including text analytics and sentiment analysis using CBOW, TF-IDF, Word2Vec, GloVe, TextBlob, lemmatization, stop words, and n-grams.
  • Familiar with recommendation system design, implementing content-based filtering, collaborative filtering, hybrid filtering, matrix factorization, and clustering methods.
  • Very strong knowledge of deep learning neural networks, including ANN, CNN, RNN, transfer learning, and time series models.
  • Skilled in CNN regularization techniques such as early stopping, dropout, and lasso/ridge penalties, and in hyperparameter tuning such as learning rate selection and layer freezing.
  • Demonstrated knowledge and solid ability to write Kusto Query Language (KQL) queries.
  • Solid ability to write and optimize diverse SQL queries, with working knowledge of RDBMSs such as SQL Server and NoSQL databases such as MongoDB.
  • Strong knowledge and skills in statistical methodologies such as A/B testing, experiment design, hypothesis testing, Z-tests, T-tests, Chi-square tests of independence, and ANOVA.
  • Experienced in using various packages in Python 3.5/2.7 and R, including ggplot2, caret, dplyr, Pandas, NumPy, SciPy, Scikit-learn, Keras, TensorFlow, OpenCV, and PyTorch.
  • Experience implementing data analysis with various analytic tools such as Anaconda 4.0, Jupyter Notebook 4.x, JupyterLab, R 3.0, RStudio, Visual Studio, Spyder, and Excel.
  • Worked with complex applications such as R, SAS, Matlab, and SPSS to develop neural networks and cluster analyses.
  • Skilled in business intelligence and analytics, with the ability to extract insights and identify risk factors through careful analysis of statistical data.
  • Extensive experience in Data Visualization including producing tables, graphs, listings using various procedures and tools such as Tableau, Power BI.
  • Effective team player with strong communication and interpersonal skills, possessing a strong ability to adapt and learn new technologies and business lines quickly.
  • Experienced in working with both technical and non-technical team members.
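The supervised-learning workflow summarized above (data preparation, regularized models, parameter fine-tuning, cross-validation) can be sketched briefly with scikit-learn; the synthetic dataset and parameter grid below are illustrative assumptions, not project specifics:

```python
# Illustrative sketch: ridge (L2-regularized) regression with
# cross-validated hyperparameter tuning. The dataset and the alpha
# grid are hypothetical stand-ins for real project data.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_regression(n_samples=500, n_features=20, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Tune the L2 penalty strength with 5-fold cross-validation.
search = GridSearchCV(Ridge(), {"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

print(search.best_params_)           # best-scoring alpha
print(search.score(X_test, y_test))  # held-out R^2
```

The same `GridSearchCV` pattern extends to the classifiers listed above (Random Forest, XGBoost, SVM) by swapping the estimator and grid.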

TECHNICAL SKILLS

Programming Languages: Python (Pandas, NumPy, Scikit-learn, NLTK, Beautiful Soup, Scrapy, Matplotlib, Seaborn, Plotly, Dash, TensorFlow, Keras), R (ggplot2, dplyr, caret, Shiny), Spark.

Databases: SQL Server, MySQL, MS Access, Azure SQL, Teradata, Netezza, MongoDB, Cassandra, HBase, HDFS, Hive, Pig, SQL, KQL.

Statistical Modeling: Descriptive statistics, Excel (VLOOKUP, RAND, pivot tables, Analysis ToolPak), hypothesis testing, regression (linear, Random Forest, lasso, ridge), classification methods (logistic, multinomial, Random Forest, XGBoost, decision trees, Naïve Bayes, KNN, SVM), parameter tuning, cross-validation, model evaluation (ROC, AUC, sensitivity, specificity), NLP (text mining), word embeddings (CBOW, Word2Vec, TF-IDF), deep learning neural networks, AI computer vision, A/B testing.

Database Design Tools and Data Modeling: ERWIN 4.5/4.0, Azure Data Factory, Azure Data Lake, Azure Blob Storage, star schema/snowflake schema modeling, AWS Redshift, AWS Data Pipeline, AWS data lake, S3, HDFS, fact & dimension tables, physical & logical data modeling, normalization and de-normalization techniques.

Tools and Techniques: Python (Jupyter Notebook, PyCharm), R (RStudio, Shiny), Microsoft Office Suite, Azure (Microsoft Remote Desktop, CLI), Microsoft SQL Server, Hadoop MapReduce, Docker, Azure Databricks, JIRA, ETL DataStage 8.1, Tableau, Power BI.

PROFESSIONAL EXPERIENCE

Confidential

Data Scientist

Responsibilities:

  • Performed Exploratory Data Analysis and involved in generating various graphs and charts for analyzing the data using Python Libraries.
  • Developed complex SQL queries to pull data from S3, check data quality, and resolve data issues.
  • Developed MapReduce/Spark modules for machine learning & predictive analytics in Hadoop on AWS. Implemented a distributed random forest via Python streaming.
  • Employed Sqoop to import data from SQL Server into Cassandra and used File System Check (fsck) to verify the health of files in HDFS.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization; also performed gap analysis.
  • Implemented forward selection, backward elimination, and stepwise approaches to select the most significant independent variables.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, and Scikit-learn in Python to develop various machine learning algorithms, including linear regression, multivariate regression, Naive Bayes, Random Forest, K-means, and KNN, for data analysis.
  • Worked on Spark, using PySpark and Spark SQL to process large volumes of data.
  • Extensively used Text analytics and NLP for decision support including lemmatization, stop words and word embedding techniques.
  • Demonstrated experience in the design and implementation of Statistical models, Predictive models, enterprise data model, metadata solution and data life cycle management in both RDBMS, Big Data environments.
  • Worked on very large text datasets to classify industry type and evaluated models using confusion matrix, ROC, and AUC methods.
  • Used different classification techniques, including SVM, Naive Bayes, and Gradient Boosted Trees.
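The text-classification and evaluation steps above (NLP features, Naive Bayes classification, confusion matrix and AUC evaluation) might look like this in scikit-learn; the tiny corpus and its industry labels are invented stand-ins for the real data:

```python
# Hedged sketch: TF-IDF + Naive Bayes industry classification with
# confusion-matrix and AUC evaluation (corpus and labels are invented).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.naive_bayes import MultinomialNB

docs = ["steel plant output rose", "bank interest rates fell",
        "new alloy furnace installed", "loan approvals increased"]
labels = [0, 1, 0, 1]  # 0 = manufacturing, 1 = finance (toy labels)

# Stop-word removal and n-grams, as described in the NLP work above.
vec = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
X = vec.fit_transform(docs)

clf = MultinomialNB().fit(X, labels)
probs = clf.predict_proba(X)[:, 1]

print(confusion_matrix(labels, clf.predict(X)))
print(roc_auc_score(labels, probs))
```

In practice the evaluation would of course run on a held-out split rather than the training documents shown here.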

Environment: Python, SQL, Oracle 12c, SQL Server, PL/SQL, T-SQL, Tableau, MLlib, regression, cluster analysis, NLP, Spark, DynamoDB, logistic regression, Hadoop, Hive, OLAP, NLTK, SVM, JSON, AWS EC2, Data Lake, Data Pipeline, S3, Redshift, EMR.

Confidential

Data Engineer / Data Scientist

Responsibilities:

  • Performed day-to-day project activities, including meeting business stakeholders and understanding business requirements.
  • Extracted data from data sources and analyzed data to identify emerging trends and patterns through highly scalable and efficient analytical approaches.
  • Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
  • Participated in the end-to-end data mining life cycle, used advanced data mining techniques to extract data from different sources, conducted studies, and generated rapid plots with different visualization tools.
  • Performed feature engineering and statistical modeling using machine learning and deep learning techniques, optimized model performance, and deployed the models.
  • Performed data cleaning, feature scaling, and feature engineering using the Pandas and NumPy packages in Python, and built models using deep learning frameworks.
  • Designed and developed analytics, machine learning models, and visualizations that drive performance and provide insights, from prototyping to production deployment, for product recommendation and allocation planning.
  • Led customer clustering based on ML and statistical modeling, including building predictive models and generating data products to support customer classification and segmentation.
  • Prototyped and experimented with ML algorithms and integrated them into production systems for different business needs.
  • Designed, built, and deployed a set of Python modeling APIs for customer analytics, integrating multiple machine learning techniques for user behavior prediction and supporting multiple marketing segmentation programs.
  • Used classification techniques including Random Forest and Logistic Regression to quantify the likelihood of each user making a referral.
  • Deployed Azure IaaS virtual machines (VMs) and Cloud services (PaaS role instances) into secure VNets and subnets.
  • Developed a fully automated continuous integration system using Git, MySQL and custom tools developed in Python and Bash.
  • Used Erwin Data Modeler for effective model management of sharing, dividing and reusing model information and design for productivity improvement.
  • Extensively used Python libraries for data analysis, and used efficient methods for handling null values, missing values, and outliers.
  • Employed different clustering techniques, such as K-means and hierarchical clustering, to bucket customers into groups and predicted sales across the groups.
  • Applied different regression techniques to predict the sales of different stores and e-commerce channels for various customer groups.
  • Evaluated models with adjusted R² values and RMSE scores, and boosted model performance with hyperparameter tuning and cross-validation using the best parameters.
  • Built ETL data pipelines using Azure Data Factory (ADF) to ingest data from Blob storage into Azure Data Lake Gen2.
  • Developed and managed the code with version control tool Gitlab repository.
  • Moved data from HDFS to the Azure SQL data warehouse by building ETL pipelines.
  • Extensively used Python libraries including Scikit-learn, NumPy, Pandas, Regex, OpenCV, Seaborn, Dash, and Flask for data munging, data transformation, ML techniques, and data visualization.
  • Designed and developed user interfaces and customization of Reports using Tableau and designed cubes for data visualization, mobile/web presentation with parameterization and cascading.
  • Worked on database design, relational integrity constraints, OLAP, OLTP, Cubes and Normalization (3NF) and De-normalization of the database.
  • Built mappings and sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables into target tables.
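The customer-segmentation step described above (K-means bucketing followed by per-group sales prediction and RMSE evaluation) can be illustrated with a small sketch; the synthetic customer features and sales figures are hypothetical:

```python
# Hedged sketch: K-means customer bucketing with a per-group regression
# evaluated by RMSE (features and sales values are synthetic stand-ins).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
spend = rng.uniform(10, 500, size=(200, 2))   # e.g. monthly spend, visits
sales = spend @ np.array([2.0, 0.5]) + rng.normal(0, 5, 200)

# Bucket customers into 3 groups, then fit a sales model per group.
groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(spend)
for g in range(3):
    mask = groups == g
    model = LinearRegression().fit(spend[mask], sales[mask])
    rmse = mean_squared_error(sales[mask], model.predict(spend[mask])) ** 0.5
    print(f"group {g}: n={mask.sum()}, RMSE={rmse:.2f}")
```

Hierarchical clustering (e.g. `sklearn.cluster.AgglomerativeClustering`) slots into the same pattern in place of K-means.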

Environment: MS SQL, Hadoop, HDFS, Pig, Hive, MapReduce, Python libraries (NumPy, Pandas, Scikit-learn, SciPy, Matplotlib), PL/SQL, MDM, SQL Server, DB2, Azure Data Factory, Azure Data Lake, Azure SQL, Azure Blob, Azure Databricks, Git.

Confidential

Data Analyst

Responsibilities:

  • Interacted regularly with the Business Analysts & development team to gain a better understanding of the Business Process, Requirements & Design.
  • Prepared Technical specification documents based on the functional requirements.
  • Used DataStage Designer to develop various jobs to extract, cleanse, transform, integrate, and load data into the ODS.
  • Performed data quality analysis and cleansing data in parallel jobs by using the data quality stages.
  • Performed data manipulation such as null handling and type conversion using the Transformer stage, and used the CDC stage to implement SCD1 & SCD2 as per business requirements.
  • Solved complex coding problems using DataStage shared containers.
  • Performed root cause analysis when ETL issues arose and proposed solutions.
  • Designed, developed, tested, and maintained DataStage jobs.
  • Ensured that all project activities were thoroughly documented and current, and that appropriate change management and version control practices and procedures were followed.

Environment: DataStage 8.1, Oracle 10g.

Confidential

Business systems Analyst

Responsibilities:

  • Collaborated with the business to define requirements and recommend optimized solutions. Ability to quickly understand complex business processes and associated data sets.
  • Consulted with internal and external stakeholders to identify specific needs within customer application modules and document requirements for data, reports, analysis, metadata, training, service levels, data quality, performance and troubleshooting.
  • Performed system impact analysis, including upstream and downstream impacts, proposed solutions, and built functional specifications.
  • Responsible for writing complicated SQL queries with a good understanding of transactional databases and Data Warehouses.
  • Determined and validated potential data sources, built validation plans, assisted with or built scripts, executed the scripts, and communicated results in non-production environments.
  • Planned, facilitated, and participated in working sessions with cross-functional resources, collaborating with data architects and BI developers to translate requirements into actionable insights.
  • Conducted peer reviews to solicit feedback on materials created.
  • Learned, understood, and used new technologies/tools, with willingness to document as well as mentor others on the team.
  • Assisted reporting teams in developing Tableau visualizations and dashboards using Tableau Desktop.
  • Good understanding of Tableau features like calculated fields, parameters, joins, filters and data blending.
  • Involved in business troubleshooting and agile analysis, which included communicating with stakeholders, working with data architects and development teams, and providing direct feedback to teams.
  • Participated in daily scrums, sprint reviews, sprint planning, sprint retrospectives, and portfolio/backlog grooming.
  • Created user stories, tasks, and sprint backlogs in Rally.
  • Worked actively with the Product Owner and stakeholders to monitor and prioritize the product backlog on an ongoing basis to meet release timelines and deliver value to the business.
  • Guided development teams in breaking down large and complex user stories into simplified versions for execution.

Environment: Tableau, MySQL.
