Data Scientist Resume
Plano, TX
SUMMARY:
- Highly efficient Data Scientist with 11+ years of experience in IT and 5+ years in Machine Learning, Data Mining with large sets of structured and unstructured data, Data Acquisition, Data Validation, Statistical Modeling, NLP, Text Mining, Predictive Modeling, and Data Visualization.
- Adept in statistical programming languages such as R and Python, including their use for Machine Learning.
- Excellent hands-on experience with statistical procedures and Machine Learning algorithms such as ANOVA, Clustering, and Regression Analysis to analyze data for further model building.
- Experience in Data Science QA.
- Excellent hands-on experience with the machine learning platforms H2O and Driverless AI.
- Extensive experience using ETL and reporting tools such as SQL Server.
- Experienced in data mining, and in loading unstructured data (XML, JSON, and flat-file formats) into Hadoop and analyzing it.
- Expertise in Excel macros, pivot tables, VLOOKUPs, and other advanced functions; expert R user with knowledge of the statistical programming language SAS.
- Excellent hands-on experience with the Hadoop ecosystem, including HDFS, PySpark, and Hive.
- Hands-on experience in Natural Language Processing (NLP) tasks such as Sentiment Analysis.
- Hands-on experience with Random Forests, Decision Trees, Linear and Logistic Regression, SVM, and Clustering.
- Extensive experience in Data Visualization, including producing tables, graphs, and listings using various procedures and tools such as Tableau.
TECHNICAL SKILLS:
Data Analytics Tools/Programming: Python (NumPy, SciPy, pandas, scikit-learn, NLTK), R, Microsoft SQL Server, SQL, Oracle PL/SQL, T-SQL, UNIX shell scripting, Java, H2O, Driverless AI, SAS, and Tableau.
Big Data Tools: Hadoop, Sqoop, Pig, Hive, PySpark, and PuTTY.
Machine Learning Algorithms: Classification, Regression, Clustering, and NLP (Sentiment Analysis).
Databases: DB2, Oracle, and SQL Server.
Other Tools: Agile, JIRA, Confluence, MS Office suite (Word, Excel, MS Project, and Outlook), Spark, NLP, and Azure.
PROFESSIONAL EXPERIENCE:
Confidential, Plano, TX
Data Scientist
Responsibilities:
- Developed analytics solutions on a Machine Learning/Hadoop platform, applying a creative problem-solving approach and strong analytical skills.
- Participated in all phases of data mining (data collection, data cleaning, model development, validation, and visualization) and performed gap analysis.
- Applied machine learning techniques (regression and classification) to predict outcomes.
- Responsible for data preparation using data manipulation and wrangling techniques.
Environment: Python, R, Machine Learning, Hadoop, PySpark, Spark, Hive, Tableau, Excel, SAS, Oracle, SQL, Flat Files, Parquet files, XML, H2O, and Driverless AI.
Confidential, Columbus, IN
Data Scientist
Responsibilities:
- Developed analytics solutions on a Machine Learning platform, applying a creative problem-solving approach and strong analytical skills.
- Participated in all phases of data mining (data collection, data cleaning, model development, validation, and visualization) and performed gap analysis.
- Created various B2B predictive and descriptive analytics using Python and Tableau.
- Acted as a data storyteller, mining data from sources such as SQL Server, Oracle, web analytics, Business Objects, and Hadoop. Provided ad hoc analysis and reports to the executive management team.
- Utilized a variety of machine learning methods, including classification and regression.
- Analyzed data and predicted engine performance by applying machine learning algorithms in Python.
- Used Spark (accessed through PuTTY) for test data analytics and analyzed performance to identify bottlenecks.
- Involved in analysis, development, testing, implementation, and deployment.
Environment: Python, R, Machine Learning, Hadoop, PySpark, Spark MLlib, Tableau, SQL, Excel, Oracle, SQL Server, PL/SQL, Flat Files, and XML.
Confidential, San Jose, CA
Data Scientist
Responsibilities:
- Implemented public segmentation with unsupervised machine learning, using the k-means algorithm in R.
- Explored and extracted data from source XML files in HDFS and prepared it for exploratory analysis through data munging.
- Used R and Python for exploratory data analysis, unit and functional testing, ANOVA, and hypothesis testing to compare and identify the effectiveness of creative campaigns.
- Worked with different data formats such as JSON and XML, and applied machine learning algorithms in R.
- Wrote Linux shell scripts for business processes and for loading data from different interfaces into HDFS.
- Created dashboards, reports, and analytical data matrices using data visualization tools such as Tableau.
- Worked with different sources such as Oracle, SQL Server 2016, and Excel, flat, Parquet, and pickle files.
- Performed k-means clustering, multivariate analysis, and Support Vector Machine modeling in R.
- Used Python, R, and SQL to build statistical models, including multivariate regression, linear regression, logistic regression, PCA, random forests, and decision trees, to estimate the risk of welfare dependency.
Environment: Python, PL/SQL, Tableau, JSON, Hadoop (HDFS), MapReduce, SQL Server, NLP, Spark, Azure, RStudio, and Hive.
Confidential
Data Scientist
Responsibilities:
- Gathered requirements by working closely with business users and multiple technical teams to design and develop workflows for new functionality.
- Collaborated with various business stakeholders to create a Business Requirements Document (BRD), then translated the gathered high-level requirements into a Functional Requirements Document (FRD), along with data flow diagrams, user stories, and use cases, to assist implementation-side SMEs and developers.
- Collaborated with provider practices on innovative programs that improve affordability and health.
- Translated business needs into practical applications and solutions.
- Performed data mining and text mining in Python using NLP techniques.
- Worked on customer churn prediction using predictive modeling (logistic regression).
- Managed the design, development, and continuous improvement of complex healthcare data systems, tools, and reports.
Environment: Python (pandas, NumPy, NLTK, scikit-learn), SQL, Excel, R, NLP, Tableau, and Machine Learning.
Confidential, Plano, TX
Data Analyst/Scientist
Responsibilities:
- Collaborated with various business stakeholders to create a Business Requirements Document (BRD), then translated the gathered high-level requirements into a Functional Requirements Document (FRD), along with data flow diagrams, user stories, and use cases, to assist implementation-side SMEs and developers.
- Applied machine learning techniques (regression and classification) to predict outcomes.
- Responsible for data preparation using data manipulation and wrangling techniques.
- Performed in-depth data analysis and prepared weekly, biweekly, and monthly reports using Tableau.
- Documented the complete process flow, describing program development, logic, coding, testing, implementation, and application integration.
- Good understanding of advanced statistical modeling and logical modeling using SAS.
- Worked as part of an Agile Scrum team.
- Used SQL tracing and performance tuning to make queries run faster.
- Extensively used joins and subqueries to write complex queries across multiple tables from different databases.
Environment: R, SAS, Machine Learning, DB2, COBOL, SQL, and Tableau.
Confidential, Plano, TX
Sr. Application Development Analyst
Responsibilities:
- Performed data analysis and reporting using MySQL, MS PowerPoint, MS Access, and SQL Assistant.
- Performed data cleansing, data scrubbing, and data migration using MySQL and MS Access databases.
- Retrieved data from databases using SQL according to business requirements.
- Manipulated data using Base SAS programming.
Environment: DB2, SQL, MS PowerPoint, and MS Access.