
Data Scientist - Python Resume

Richardson, TX

PROFESSIONAL SUMMARY:

  • Data Scientist/Data Analyst with around 7 years of experience in Data Science and Analytics, including Data Mining and Statistical Analysis, with domain knowledge in the Retail, Healthcare, and Banking industries.
  • Involved in the full Data Science project life cycle, including data cleaning, data extraction, and visualization with large data sets of structured and unstructured data; created ER diagrams and schemas.
  • Experience with Machine Learning algorithms such as logistic regression, KNN, SVM, random forest, neural networks, linear regression, lasso regression, and k-means.
  • Good experience in Text Analytics, developing Statistical Machine Learning and Data Mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau.
  • Experience in implementing data analysis with various analytic tools, such as Anaconda 4.0, Jupyter Notebook 4.X, R 3.0 (ggplot2, dplyr, caret), and Excel.
  • Experienced in the full software development life cycle (SDLC) under Agile, DevOps, and Scrum methodologies, including creating requirements and test plans.
  • Strong skills in statistical methodologies such as A/B testing, experimental design, hypothesis testing, and ANOVA (see the sketch after this list).
  • Working experience with Python 3.5/2.7 libraries such as NumPy, SQLAlchemy, Beautiful Soup, pickle, PySide, PyMongo, SciPy, and PyTables.
  • Ability to write and optimize diverse SQL queries; working knowledge of RDBMSs like SQL Server 2008 and NoSQL databases like MongoDB 3.2.
  • Experience in Big Data technologies like Spark 1.6, Spark SQL, PySpark, Hadoop 2.X, HDFS, Hive 1.X.
  • Experience in Data Warehousing including Data Modeling, Data Architecture, Data Integration (ETL/ELT) and Business Intelligence.
  • Good knowledge and experience in deep learning algorithms such as Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN), including LSTM- and RNN-based speech recognition using TensorFlow.
  • Good experience in using various Python libraries (Beautiful Soup, NumPy, SciPy, Matplotlib, python-twitter, pandas, and MySQLdb for database connectivity).
  • Experienced in Big Data technologies including Apache Spark, HDFS, Hive, and MongoDB.
  • Used version control tools like Git 2.X and build tools like Apache Maven/Ant.
  • Worked on Machine Learning algorithms like classification and regression with KNN, Decision Tree, Naïve Bayes, Logistic Regression, SVM, and Latent Factor models.
  • Experience and knowledge in provisioning virtual clusters on the AWS cloud, using services like EC2, S3, and EMR.
  • Good knowledge of Microsoft Azure.
  • Knowledge and understanding of DevOps tooling (Docker).
  • Experience in writing subqueries, stored procedures, triggers, cursors, and functions on MySQL and PostgreSQL databases.
  • Extensive experience in data visualization tools like Tableau 9.X/10.X for creating dashboards.
  • Experience in developing and designing ETL methodology to support data transformations and processing in a corporate-wide environment using Teradata, Mainframes, and UNIX Shell Scripting.
  • Used SQL queries and stored procedures extensively to retrieve content from MySQL.
  • Good at implementing SQL tuning techniques such as Join Indexes (JI), Aggregate Join Indexes (AJI), statistics, and table changes including indexes.
  • Used SQL*Loader for direct and parallel loads of data from raw files into database tables.
  • Experience in development of T-SQL, OLAP, PL/SQL, stored procedures, triggers, functions, and packages, plus performance tuning and optimization for business logic implementation.
  • Strong SQL Server programming skills, with experience in working with functions, packages, and triggers.
  • Good industry knowledge and analytical and problem-solving skills; a strong team player able to work collaboratively and independently as required.
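
A minimal sketch of the A/B testing and hypothesis testing listed above, using SciPy; the conversion rates, sample sizes, and variable names are invented for illustration:

    import numpy as np
    from scipy import stats

    # Hypothetical A/B test: conversion outcomes (1 = converted) per visitor.
    rng = np.random.default_rng(42)
    control = rng.binomial(1, 0.10, size=5000)  # baseline variant
    variant = rng.binomial(1, 0.12, size=5000)  # treatment variant

    # Welch's two-sample t-test (does not assume equal variances).
    t_stat, p_value = stats.ttest_ind(control, variant, equal_var=False)
    print(f"t = {t_stat:.3f}, p = {p_value:.4f}")

    # Reject the null hypothesis of equal conversion rates at the 5% level.
    if p_value < 0.05:
        print("Statistically significant difference between variants.")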

SKILLS MATRIX:

Languages: C, C++, Java, Python 2.x/3.x, R/RStudio, SAS/SAS Enterprise Guide, SQL, XML, Shell Scripting

NoSQL Databases: Cassandra, HBase, MongoDB, MariaDB

Statistics: Hypothesis Testing, ANOVA, Confidence Intervals, Bayes' Theorem, MLE, Fisher Information, Principal Component Analysis (PCA), Cross-Validation, Correlation.

BI Tools: Tableau, Tableau Server, Tableau Reader, Splunk, SAP BusinessObjects, OBIEE, SAP Business Intelligence, QlikView, Amazon Redshift, Azure Data Warehouse

Algorithms: Logistic regression, random forest, XGBoost, KNN, SVM, neural networks, linear regression, lasso regression, k-means.

Big Data: Hadoop, HDFS, Hive, PuTTY, Spark, Scala, Sqoop

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Cognos 7.0/6.0.

Database Design Tools and Data Modeling: MS Visio, ERWIN 4.5/4.0, Star Schema/Snowflake Schema modeling, Fact & Dimension tables, physical & logical data modeling, Normalization and De-normalization techniques, Kimball & Inmon methodologies

WORK EXPERIENCE:

Confidential, Richardson, TX

Data Scientist - Python

Responsibilities:

  • Involved in data profiling to learn about user behavior and merged data from multiple data sources.
  • Participated in big data processing applications to collect, clean, and normalize large volumes of open data using Hadoop ecosystem tools such as Pig, Hive, and HBase.
  • Designed the prototype of the Data Mart and documented its possible outcomes for end users.
  • Worked as an analyst to generate data models using Erwin and developed a relational database system.
  • Designed and developed various machine learning frameworks using Python, R, and MATLAB.
  • Processed huge datasets (over a billion data points, over 1 TB of data) for data association pairing and provided insights into meaningful associations and trends.
  • Participated in all phases of data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
  • Good knowledge of Hadoop architecture and its components, such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Secondary Name Node, and MapReduce concepts.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Collaborated with data engineers to implement the ETL process; wrote and optimized SQL queries to extract data from the cloud and merge it from Oracle 12c.
  • Collected unstructured data from MongoDB 3.3 and completed data aggregation.
  • Conducted analysis of customer purchasing behavior and assessed customer value with RFM (recency, frequency, monetary) analysis; applied customer segmentation with clustering algorithms such as K-Means Clustering and Hierarchical Clustering (see the first sketch after this list).
  • Participated in feature engineering such as feature-intersection generation, feature normalization, and label encoding with scikit-learn preprocessing (see the second sketch after this list).
  • Used pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK (Natural Language Toolkit) in Python to develop various machine learning algorithms.
  • Utilized machine learning algorithms such as decision trees, linear regression, multivariate regression, Naive Bayes, random forests, k-means, and KNN.
  • Parsed data and produced concise conclusions from raw data in a clean, well-structured, and easily maintainable format.
  • Determined customer satisfaction and helped enhance the customer experience using NLP.
  • Developed various QlikView data models by extracting and using data from various sources (DB2, Excel, flat files, and Big Data).
  • Performed data integrity checks, data cleaning, exploratory analysis, and feature engineering using R 3.4.0.
  • Worked on different data formats such as JSON and XML and applied machine learning algorithms in R.
  • Worked on MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop.
  • Performed data visualizations with Tableau 10 and generated dashboards to present the findings.
  • Worked on text analytics, Naïve Bayes, sentiment analysis, creating word clouds, and retrieving data from Twitter and other social networking platforms (see the sentiment-analysis sketch after this list).
  • Used Git 2.6 for version control; tracked changes in files and coordinated work on them among multiple team members.
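
A minimal sketch (first sketch) of the RFM-based K-Means segmentation described above, assuming a hypothetical pandas DataFrame of transactions; the column names, snapshot date, and cluster count are illustrative, not the project's actual values:

    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    # Hypothetical transaction data: one row per purchase.
    tx = pd.DataFrame({
        "customer_id": [1, 1, 2, 3, 3, 3],
        "order_date": pd.to_datetime(
            ["2017-01-05", "2017-03-10", "2017-02-20",
             "2017-01-15", "2017-02-01", "2017-03-25"]),
        "amount": [50.0, 30.0, 200.0, 20.0, 35.0, 25.0],
    })

    # Recency, frequency, and monetary value per customer.
    snapshot = tx["order_date"].max() + pd.Timedelta(days=1)
    rfm = tx.groupby("customer_id").agg(
        recency=("order_date", lambda d: (snapshot - d.max()).days),
        frequency=("order_date", "count"),
        monetary=("amount", "sum"),
    )

    # Standardize the features, then cluster customers into segments.
    X = StandardScaler().fit_transform(rfm)
    rfm["segment"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    print(rfm)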
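
A second sketch covering the scikit-learn preprocessing named in the feature-engineering bullet (label encoding, normalization, and a simple feature intersection); the toy columns are hypothetical:

    import pandas as pd
    from sklearn.preprocessing import LabelEncoder, MinMaxScaler

    df = pd.DataFrame({
        "channel": ["web", "store", "web", "mobile"],
        "visits": [3, 1, 7, 2],
        "spend": [120.0, 40.0, 300.0, 75.0],
    })

    # Label-encode the categorical column.
    df["channel_enc"] = LabelEncoder().fit_transform(df["channel"])

    # Normalize the numeric columns to the [0, 1] range.
    df[["visits", "spend"]] = MinMaxScaler().fit_transform(df[["visits", "spend"]])

    # A simple feature intersection: interaction of two normalized features.
    df["visits_x_spend"] = df["visits"] * df["spend"]
    print(df)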
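
And a sentiment-analysis sketch with NLTK's VADER analyzer, in the spirit of the text-analytics bullet; the sample tweets are invented, and VADER stands in for whatever lexicon the project actually used:

    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
    sia = SentimentIntensityAnalyzer()

    # Invented sample tweets standing in for data pulled from the Twitter API.
    tweets = [
        "Loving the new checkout flow, so fast!",
        "Worst support experience I've had all year.",
    ]
    for text in tweets:
        scores = sia.polarity_scores(text)  # neg/neu/pos/compound scores
        print(f"{scores['compound']:+.2f}  {text}")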

Environment: Python 3.2/2.7, Hive, Tableau, R, QlikView, MySQL, MS SQL Server 2008/2012, AWS (S3, EC2), Linux, Jupyter Notebook, RNN, ANN, Spark, Hadoop.

Confidential, SFO, CA

Data Scientist - Python

Responsibilities:

  • Communicated and coordinated with other departments to gather business requirements.
  • Gathered all required data from multiple data sources and created datasets to be used in analysis.
  • Participated in the installation of SAS/EBI on the Linux platform; worked with Erwin Data Modeler to design data models.
  • Designed tables and implemented naming conventions for logical and physical data models in Erwin 7.0.
  • Worked on development of data warehouse, data lake, and ETL systems using relational and non-relational (SQL and NoSQL) tools.
  • Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL.
  • Designed, coded, and unit-tested ETL packages for source marts and subject marts using Informatica ETL processes against an Oracle database.
  • Developed various QlikView data models by extracting and using data from various sources (DB2, Excel, flat files, and Big Data).
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
  • Identified and executed process improvements; hands-on with various technologies such as Oracle, Informatica, and Business Objects.
  • Worked on data cleaning and ensured data quality, consistency, and integrity using pandas and NumPy.
  • Participated in feature engineering such as feature-intersection generation, feature normalization, and label encoding with scikit-learn preprocessing.
  • Improved fraud prediction performance by using random forest and gradient boosting for feature selection with Python's scikit-learn (see the feature-selection sketch after this list).
  • Used Python (NumPy, SciPy, pandas, scikit-learn, Seaborn) and Spark 2.0 (PySpark, MLlib) to develop a variety of models and algorithms for analytic purposes (see the MLlib sketch after this list).
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction.
  • Implemented, tuned, and tested models on AWS EC2 to find the best algorithm and parameters.
  • Set up storage and data analysis tools in the Amazon Web Services cloud computing infrastructure.
  • Designed and developed machine learning models in Apache Spark (MLlib).
  • Used NLTK in Python for developing various machine learning algorithms.
  • Implemented deep learning algorithms such as Artificial Neural Networks (ANN) and Recurrent Neural Networks (RNN); tuned hyperparameters and improved models with the TensorFlow Python package (see the TensorFlow sketch after this list).
  • Installed and used the Caffe deep learning framework.
  • Modified selected machine learning models with real-time data in Spark (PySpark).
  • Worked with the architect to improve the cloud Hadoop architecture as needed for research.
  • Worked on different formats such as JSON and XML and applied machine learning algorithms in Python.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
  • Worked very closely with the Data Architects and DBA team to implement data model changes in the database across all environments.
  • Used the pandas library for statistical analysis.
  • Communicated results to the operations team to support decision-making.
  • Collected data needs and requirements by interacting with other departments.
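
A sketch of the feature-selection approach named in the fraud-prediction bullet: rank features by random forest and gradient boosting importances and keep those above the mean importance. The synthetic dataset stands in for the actual fraud features, which are not shown here:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
    from sklearn.feature_selection import SelectFromModel

    # Synthetic stand-in for the fraud dataset (20 features, 5 informative).
    X, y = make_classification(n_samples=2000, n_features=20,
                               n_informative=5, random_state=0)

    for model in (RandomForestClassifier(n_estimators=200, random_state=0),
                  GradientBoostingClassifier(random_state=0)):
        # Keep features whose importance exceeds the mean importance.
        selector = SelectFromModel(model).fit(X, y)
        kept = selector.get_support().sum()
        print(f"{type(model).__name__}: kept {kept} of {X.shape[1]} features")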
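
A minimal Spark MLlib sketch in the spirit of the PySpark model-development bullets; the schema, column names, and toy rows are assumptions for the example:

    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

    # Hypothetical labeled data: two numeric features and a binary label.
    df = spark.createDataFrame(
        [(0.5, 1.2, 0), (1.5, 0.3, 1), (0.2, 2.2, 0), (2.1, 0.1, 1)],
        ["f1", "f2", "label"],
    )

    # Assemble feature columns into the single vector column MLlib expects.
    assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
    model = LogisticRegression(maxIter=10).fit(assembler.transform(df))
    print(model.coefficients)
    spark.stop()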
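
And a small TensorFlow sketch of the ANN work described above, using the Keras API; the layer sizes, synthetic data, and training settings are placeholder assumptions, not the production model:

    import numpy as np
    import tensorflow as tf

    # Synthetic binary-classification data standing in for the real inputs.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 8)).astype("float32")
    y = (X[:, 0] + X[:, 1] > 0).astype("float32")

    # A small feed-forward ANN; hyperparameters here are the knobs to tune.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(8,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    model.fit(X, y, epochs=5, batch_size=32, verbose=0)
    print(model.evaluate(X, y, verbose=0))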

Environment: Python 3.2/2.7, Hive, Oozie, Tableau, Informatica 9.0, HTML5, CSS, XML, MySQL, MS SQL Server 2008/2012, JavaScript, AWS (S3, EC2), Linux, Jupyter Notebook, RNN, ANN, Spark, Hadoop.

Confidential, Boston, MA

Data Analyst

Responsibilities:

  • Investigated market sizing, competitive analysis and positioning for product feasibility.
  • Conducted research on the development and design of sampling methodologies and analyzed data for pricing of clients' products.
  • Collaborated with database engineers to implement the ETL process; wrote and optimized SQL queries to perform data extraction and merging from the SQL Server database.
  • Worked on business forecasting, segmentation analysis, and data mining.
  • Developed a machine learning algorithm to diagnose blood loss.
  • Generated graphs and reports using the ggplot2 package in RStudio for analytical models.
  • Developed and implemented an R Shiny application that showcases machine learning for business forecasting.
  • Developed predictive models using Decision Tree, Random Forest, and Naïve Bayes (see the sketch after this list).
  • Performed time series analysis using Tableau.
  • Developed various workbooks in Tableau from multiple data sources.
  • Created dashboards and visualizations using Tableau desktop.
  • Later used Alteryx to blend the data.
  • Performed analysis using JMP.
  • Performed validation on machine learning output from R.
  • Wrote connectors to extract data from databases.
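
A sketch of the predictive-model comparison from the modeling bullet above. The resume cites R for this work; the equivalent below uses Python's scikit-learn (to keep the examples in one language) with a public dataset standing in for the client data:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)

    # Compare the three model families with 5-fold cross-validation.
    models = {
        "Decision Tree": DecisionTreeClassifier(random_state=0),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
        "Naive Bayes": GaussianNB(),
    }
    for name, clf in models.items():
        scores = cross_val_score(clf, X, y, cv=5)
        print(f"{name}: mean accuracy {scores.mean():.3f}")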

Environment: R, Python 2.x, Excel 2010, Machine Learning, Tableau, QlikView, JMP, Segmentation Analysis

Confidential

Data Analyst

Responsibilities:

  • Used DDL and DML for writing triggers, stored procedures, and data manipulation.
  • Interacted with the team on analysis, design, and development of the database using ER diagrams; involved in design, development, and testing of the system.
  • Developed SQL Server stored procedures and tuned SQL queries (using indexes).
  • Created views to facilitate easy user-interface implementation, and triggers on them to facilitate consistent data entry into the database.
  • Implemented exception handling.
  • Worked on client requirements and wrote complex SQL queries to generate Crystal Reports.
  • Created different Data sources and Datasets for the reports.
  • Tuned and Optimized SQL Queries using Execution Plan and Profiler.
  • Rebuilt indexes and tables as part of a performance tuning exercise.
  • Involved in performing database Backup and Recovery.
  • Documented end user requirements for SSRS Reports and database design.

Environment: Python 2.7, Tableau, R, Windows XP, UNIX, HTML, SQL Server 2005
