Data Scientist / Data Analyst Resume
Minneapolis, MN
SUMMARY
- 6 years of experience as a Data Scientist and Data Analyst, with domain experience in Banking, Finance, Insurance and Retail.
- Experience in Statistical Analysis, Data Mining and Machine Learning using R, Python and SQL
- Professional experience in Machine Learning algorithms such as Linear Regression, Logistic Regression, Random Forests, Decision Trees, K-Means Clustering and Association Rules
- Expertise in manipulating large data sets from multiple sources (SQL, Hadoop)
- Experience in analyzing data using HiveQL, Spark and Spark SQL.
- Hands on experience in importing, cleaning, transforming, and validating data and making conclusions from the data for decision-making purposes
- Experience categorizing content in Python with scikit-learn, using classification algorithms such as Support Vector Machines (SVM), K-Nearest Neighbours (KNN) and logistic regression (see the sketch after this list).
- In-depth knowledge of Machine Learning and NLP algorithms and techniques
- Working knowledge of Spark and Spark SQL
- Good understanding and knowledge of NoSQL databases like HBase and Cassandra.
- Extensive hands-on experience and high proficiency with structured, semi-structured and unstructured data, using a broad range of data science programming languages and big data tools.
- Actively used Agile methodology to develop and deliver projects within sprint timelines.
- Strong experience in Data Migration and conversion projects.
- Strong experience in RDBMS like DB2
- Hands on Experience in building applications using build tools (Maven and Gradle).
- Experience using HDFS, Ambari, Spark and Hortonworks Data Platform (HDP) to migrate code bases from R to Python using Spark ML
- Expertise in working with the Agile methodology using Rally Dev and the SCRUM methodology using the ScrumWorks Pro tool.
- Well versed in system analysis, ER/Dimensional Modeling, Database design and implementing RDBMS specific features.
- Experience with different RDBMS like Oracle 9i/10g/11g, SQL Server 2005/2008, MySQL
- Extensive involvement in developing T-SQL and Oracle PL/SQL scripts, stored procedures and triggers to implement business logic
- Experience in designing visualizations using Tableau, and publishing and presenting dashboards and storylines on web and desktop platforms
- Developed dashboards adhering to visualization best practices that communicate a clear story
- Competent communicator and confident presenter in reporting analytical findings to members of senior management
- Responsible for the design and development of Java/J2EE and Big Data applications, and involved in application testing.
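Below is a minimal sketch of the kind of scikit-learn classification workflow referenced in the summary (SVM, KNN, logistic regression); the dataset, hyperparameters and metric are illustrative placeholders, not drawn from any client engagement.

```python
# Illustrative sketch only: sample dataset and default-style hyperparameters.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "SVM": SVC(kernel="rbf", C=1.0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

for name, model in models.items():
    # Scale features before fitting, since SVM and KNN are distance-based.
    pipeline = make_pipeline(StandardScaler(), model)
    pipeline.fit(X_train, y_train)
    print(f"{name}: {accuracy_score(y_test, pipeline.predict(X_test)):.3f}")
```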
TECHNICAL SKILLS
Scripts: Python, UNIX Shell, WINDOWS batch, awk, sed, HTML, XML.
Database: PostgreSQL, DB2 (SQL), HIVE, IBM-IMS DB/DC, UNIX-DB, Cassandra
Operating Systems: WINDOWS 95/98/NT/XP, UNIX, LINUX, MS-DOS, MVS-ES.
Software Languages: Java, C, C++, Fortran, COBOL, PL-I, Assembly Language, MIPS, Lisp.
Platforms: Spark, Sun Solaris, Sparc, IBM PowerPC, INTEL 80X86, Pentium-X, IBM Mainframes.
Developer Environment: Visual Studio, Eclipse.
Big data technologies: Hadoop, Spark, Sqoop, Hive, Pig, HBase
PROFESSIONAL EXPERIENCE
Confidential, Minneapolis, MN
Data Scientist/Data Analyst
Responsibilities:
- Gathered and analyzed business requirements, interacted with various business users, project leaders, developers and took part in identifying different data sources
- Compiled data from multiple data sources, used SQL and Python packages for data extraction, loading and transformation
- Wrote code in Python, Scala, Spark and Hive to implement the product
- Performed data cleaning, feature scaling, feature engineering and feature prioritization using the pandas and NumPy packages in Python
- Built predictive models and machine learning algorithms
- Developed a mixed effects linear regression model to predict student MAP test scores from historical data, using the R lme4 package
- Performed exploratory data analysis (EDA), summarized descriptive statistics
- Handled anomalies in the data by removing duplicates and imputing missing and null values using Python and scikit-learn (see the sketch after this list)
- Participated in the implementation of the Hadoop platform and Big Data technologies (Spark, Hive, Pig and HBase) for use cases on Amazon Web Services S3, Redshift and Athena.
- Helped in configuring Cassandra database
- Visualized the data with box plots and scatter plots to understand its distribution, using Tableau and Python libraries
- Developed a binary classification model to predict the risk of school dropouts with an accuracy of 89% using two-class boosted decision trees
- Collaborated with data scientists to prototype predictive models for converting data to insights
- Involved in creating charts and graphs of the data from different data sources using the Matplotlib and SciPy libraries in Python
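A minimal sketch of the cleaning and imputation steps described above, using pandas, NumPy and scikit-learn; the column names and values are hypothetical placeholders rather than actual project data.

```python
# Illustrative sketch: hypothetical raw extract with duplicates and missing values.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

raw = pd.DataFrame({
    "student_id": [1, 1, 2, 3, 4],
    "prior_score": [210.0, 210.0, np.nan, 195.0, 230.0],
    "attendance_rate": [0.95, 0.95, 0.88, np.nan, 0.91],
})

# Remove duplicate rows before any modelling.
clean = raw.drop_duplicates(subset="student_id").copy()

# Impute missing numeric values with the column median.
features = ["prior_score", "attendance_rate"]
clean[features] = SimpleImputer(strategy="median").fit_transform(clean[features])

# Scale features so they are comparable across units.
clean[features] = StandardScaler().fit_transform(clean[features])
print(clean)
```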
Environment: Python 2.x/3.x, R Programming, SQL (Structured Query Language), Anaconda 3.x, Jupyter Notebooks, R Studio, Tableau Desktop, SQL Server 2012, Azure ML Studio, Jira, Git, Microsoft Excel
Confidential, Dallas, TX
Data Scientist / Data Analyst
Responsibilities:
- Converted data into insights by predicting and modelling future outcomes
- Utilized MS SQL, Tableau and other dashboard tools for data intelligence and analysis
- Involved with data analysis, primarily identifying data sets, source data, source metadata, data definitions and data formats.
- Developed Python scripts for cleaning and pre-processing client data for consumption by the data pipeline, reducing data-cleaning effort from 5 days to less than 5 hours.
- Analyzed customer data in Python using Matplotlib and Seaborn to track correlations in customer behaviour and define user segments for process and product improvements (see the sketch after this list).
- Involved in coding and debugging to solve business problems.
- Implemented custom built machine learning application in Scala
- Prepared reports that interpret customer behaviour, market opportunities and conditions, market results, trends, and investment level.
- Solved clients' analytics problems and effectively communicated results and methodologies
- Installed and configured Hive and wrote Hive UDFs.
- Experience with NoSQL databases (HBase, Accumulo, Cassandra), in-memory databases (Redis, GridGain, Ignite), batch and streaming data processing (Spark, MapReduce, Kafka, Kinesis) and cloud services (AWS)
- Extracted data by writing complex SQL queries and created meaningful data visualizations and dashboards using Tableau to improve user engagement
- Created a heat map in Tableau showing current customers by colour, broken into regions, allowing business users to see where we have the most versus the fewest users
- Used Log4J for logging, tracing application flow and debugging the application.
- Extensively followed Agile principles like Continuous Integration, Pair Programming and Test Driven Development.
- Blended data from multiple databases into one report by selecting a primary key from each database for data validation
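A minimal sketch of the Matplotlib/Seaborn correlation analysis described above; the customer fields and distributions are hypothetical placeholders.

```python
# Illustrative sketch: synthetic customer data, not real client records.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
customers = pd.DataFrame({
    "monthly_spend": rng.gamma(2.0, 50.0, size=500),
    "visits_per_month": rng.poisson(4, size=500),
    "tenure_months": rng.integers(1, 60, size=500),
})

# Heatmap of pairwise correlations to spot related behaviours.
sns.heatmap(customers.corr(), annot=True, cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Customer behaviour correlations (illustrative data)")
plt.tight_layout()
plt.show()
```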
Environment: Python 2.x/3.x, SQL (Structured Query Language), JavaScript, Tableau Desktop, SQL Server, MS PowerPoint, MS Excel, Anaconda, Jupyter Notebooks
Confidential
Data Analyst
Responsibilities:
- Performed ETL data extraction, transformation, cleansing and preparation of analytical datasets from structured formats such as RDBMS tables, Excel, CSV files
- Analysed customer demographics, credit card transaction data to reveal factors influencing customer churn rate
- Performed extensive prototyping using mock-up screens and Tableau
- Performed financial analysis to determine the credit worthiness of clients; shortened the pre-approval process and accelerated the application submission to banking institutions
- Developed the database model and wrote the application from scratch
- Developed segmentation, propensity, and look-alike modelling of individuals across known, identified and anonymous audiences
- Evaluated provider-level data to assess provider data quality, profiled the data and built data quality reports using DQ Analyzer, SQL Server and MS Excel.
- Reviewed test plans containing test scripts, test cases, test data and expected results for User Acceptance Testing.
- Investigated and conducted studies on product forecasts, demand and capital.
- Strong knowledge of statistical methods (regression, hypothesis testing, randomized experiments), data structures and data infrastructure.
- Performed A/B test analysis and simplified navigation on the bank website, yielding a 70% improvement in conversions (see the sketch after this list)
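A minimal sketch of how such an A/B test result could be checked with a two-proportion z-test (using statsmodels, assumed here purely for illustration); the visitor and conversion counts are made-up placeholders chosen only to show a roughly 70% relative lift.

```python
# Illustrative sketch: placeholder counts, not actual experiment data.
from statsmodels.stats.proportion import proportions_ztest

# Conversions and visitors for the control and simplified-navigation variants.
conversions = [120, 204]
visitors = [10_000, 10_000]

stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
lift = (conversions[1] / visitors[1]) / (conversions[0] / visitors[0]) - 1

print(f"z = {stat:.2f}, p-value = {p_value:.4f}, relative lift = {lift:.0%}")
```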
Environment: Java, Microsoft Excel, TFS, Eclipse IDE, Optimizer for A/B Testing, UI Prototype, Proto Fluid for UI Testing, SQL Server, HTML/CSS
Confidential
Data Analyst
Responsibilities:
- Involved in technical design, development and end-to-end testing (Regression, System, UAT) for 5 different projects adopting Agile standards, achieving 90% quality compliance and on-time product delivery
- Analysed customer data and created reports using Excel features such as pivot tables, VLOOKUP and charts
- Performed database testing, validating database tables, stored procedures and triggers
- Heavily used JSPs and HTML for designing the screens.
- Involved with data analysis, primarily identifying data sets, source data, source metadata, data definitions and data formats
- Created and modified database triggers, stored procedures and complex analytical queries, including multi-table joins, nested queries and correlated subqueries, and optimized their performance
- Deployed the application on JBoss Application Server for efficient performance.
- Extensively used SQL functions such as SUM, COUNT and CASE statements to create transformation logic for data mapping
- Managed the planning and development of design and procedures for metric reports.
- Optimized data collection and procedures to generate weekly, monthly and quarterly reports
- Developed the application using Agile methodology and planned the scrum meetings.
- Involved in exhaustive documentation for technical phase of the project and training materials for all data management functions
- Led offshore efforts for code debugging and testing of 100+ critical/major functional tickets in an enhancement project
- Conducted major stakeholder interviews involving SMEs, Business Analysts and other stakeholders
- Used SQL commands such as CREATE, DELETE and UPDATE, and inner, outer, left and right joins to update the database and retrieve data for analysis and validation (see the sketch after this list).
- Created action plans to track identified open issues and action items related to the project; prepared analytical and status reports and updated the project plan as required
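A minimal sketch of the join-based retrieval and validation described above, using Python's built-in sqlite3 module as a stand-in for the project databases; the tables and rows are hypothetical placeholders.

```python
# Illustrative sketch: in-memory database with placeholder tables and rows.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create and populate two small tables to validate against each other.
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
cur.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob"), (3, "Carol")])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(10, 1, 99.5), (11, 1, 15.0), (12, 4, 42.0)])

# LEFT JOIN to count orders per customer, including customers with none.
cur.execute("""
    SELECT c.id, c.name, COUNT(o.order_id) AS order_count
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id, c.name
""")
print("orders per customer:", cur.fetchall())

# LEFT JOIN to flag orders with no matching customer (referential check).
cur.execute("""
    SELECT o.order_id, o.customer_id
    FROM orders o
    LEFT JOIN customers c ON c.id = o.customer_id
    WHERE c.id IS NULL
""")
print("orphaned orders:", cur.fetchall())
conn.close()
```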
Environment: PowerPoint, MS Visio, MS Excel, MySQL, T-SQL, Tableau, MS Access, VBA, PL-SQL, TOAD for Data Analysis