Data Scientist Resume
San Jose, CA
PROFESSIONAL SUMMARY:
- 7+ years of IT experience in Big Data, Machine Learning, Statistical Modeling, Predictive Modeling, Data Analytics, Data Modeling, Data Mining, Text Mining and Business Intelligence.
- 5 years of experience as a Data Scientist and Data Analyst, with extensive work in Data Mining, Statistical Data Analysis, Exploratory Data Analysis, Predictive Modeling and Machine Learning.
- Expert knowledge in Big Data Ecosystem: Apache Hadoop, Spark, HDFS Architecture, HBase, Hive, Pig, MLlib, ETL, Sqoop.
- Developed scripts in Apache Zeppelin and Jupyter notebooks to debug code and improve its functionality, readability and reliability.
- Good experience implementing and handling end-to-end data science products.
- Hands-on experience with Python data science libraries such as Pandas, NumPy, SciPy, scikit-learn, Matplotlib and memory profiler.
- Proficient in Hadoop, HDFS, MapReduce, Spark and NoSQL databases such as Cassandra; experienced in applying data mining and optimization techniques in B2B and B2C industries.
- Experience with machine learning algorithms such as K-Means Clustering, Linear Regression, Logistic Regression and KNN (K-Nearest Neighbors).
- Expert knowledge across a breadth of machine learning algorithms, with the ability to find the best approach to a specific problem.
- Implemented several supervised and unsupervised learning algorithms.
- Collaborated with engineers to deploy successful models and algorithms into production environments.
- Very good experience working with large datasets and deep learning algorithms using Apache Spark.
- Strong in ETL, data mining, data warehousing and data store concepts.
- Strong experience in the analysis, design, development, testing and implementation of Business Intelligence solutions using Data Warehouse/Data Mart design, ETL, BI and client/server applications.
- Developed and optimized dashboards in Tableau to identify trends and opportunities, surface actionable insights and help teams set goals, forecast and prioritize initiatives.
- Strong expertise in using Tableau software as applied to BI data analytics, reporting and dashboard projects.
- Experience in Data Migrations, Data Cleaning, Transformation, Integration, Data Imports and Data Exports.
- Implemented projects under various SDLC methodologies, including Agile and Waterfall.
- Experience in various phases of the software development life cycle (analysis, requirements gathering, design), with expertise in writing Technical Design Documents.
- Proficient in version control tools such as Git and GitHub.
- Excellent team player and self-starter with strong communication skills.
TECHNICAL SKILLS:
Big Data Ecosystem: Apache Hadoop ecosystem, Apache Spark, HDFS, MapReduce, Apache Kafka, Hive, Pig, ETL, Storm, Sqoop
Libraries: Python (NumPy, Pandas, scikit-learn, SciPy, Matplotlib), Spark ML, Spark MLlib
Databases: MySQL, PostgreSQL, SQL Server, Teradata
NoSQL: MongoDB
BI and Visualization: Tableau, Power BI, RShiny
IDE: Jupyter, Zeppelin, PyCharm, IntelliJ, Eclipse, Atom.
Tools and Utilities: DbVisualizer, MS Office (Excel, Access, Word, PowerPoint, Mail Project), JIRA, SQL Server Management Studio, SQL Server Enterprise Manager, SQL Server Profiler, Import & Export Wizard, Microsoft Management Console, Excel Data Explorer, Anaconda
PROFESSIONAL EXPERIENCE:
Confidential, San Jose, CA
Data Scientist
Responsibilities:
- Performed data analysis, primarily identifying data sets, source data, source metadata, data definitions and data formats.
- Analyzed and found patterns and insights within structured and unstructured data using Machine Learning algorithms.
- Responsible for design and development of Python programs/scripts to prepare, transform and harmonize data sets in preparation for modeling.
- Applied machine learning techniques (regression and classification) to predict outcomes.
- Validated existing data models and cross-verified results using statistical metrics such as accuracy, RMSE and ROC.
- Made extensive use of Python libraries including Pandas, NumPy, scikit-learn, Matplotlib and MLlib.
- Used PySpark for data transformations and feature engineering.
- Built predictive models including Support Vector Machine, Decision Tree, Naive Bayes Classifier and Neural Network, plus ensembles of these models, to evaluate the likelihood of customers being open to additional purchase opportunities.
- Developed time series algorithms to forecast buyer-supplier costs using new sources of insight and previous cost history.
- Identified key processes within Confidential's supply chain that could be improved significantly using advanced analytics and data science, in support of continuous improvement.
- Implemented the training process using cross-validation and test sets, evaluated the results against different performance metrics, collected feedback and retrained the model to improve performance.
- Productionized prototype machine learning models written in Python on servers by building the end-to-end cycle.
- Built visualization dashboards in Tableau for executives to gauge the performance of their verticals and take the actions needed to drive change in their respective logistics units.
Confidential, Deerfield, IL
Data Scientist
Responsibilities:
- Responsible for quantitative analysis of structured, semi-structured, and unstructured data, working in small teams to develop, test, and harden advanced analytical models as required.
- Responsible for design and development of Python programs to prepare, transform and harmonize data sets in preparation for modeling.
- Led hypothesis development, drawing on domain knowledge from statistics experience and consultation with engineers.
- Worked with Business Analysts, SMEs and other Data Architects to understand business needs and functionality for various project solutions.
- Developed and maintained a data dictionary to create metadata reports for technical and business purposes.
- Wrote simple and advanced scripts to create standard and ad hoc reports for senior managers.
- Collaborated on the source-to-target data mapping document and the data quality assessments for the source data.
- Value of Confidential Loyalty Program - Measured the value of the loyalty program, presented findings to senior leadership, and offered strategic recommendations for driving greater incremental value.
- Uplift Modeling via Propensity Score Matching - Measured sales uplift and customer value, in situations with no control group, by developing a propensity score algorithm for large-scale data processing.
- Customer Segmentation - Built segments from customer data used for strategic analysis and marketing activations by internal business partners. Worked cross-functionally across IT and business teams to adopt new customer segmentations.
- Participated in business meetings to understand business needs and requirements.
- Collaborated with business leaders cross-functionally to understand company needs and develop possible solutions using techniques like mind-mapping and brainstorming.
Confidential, Memphis, TN
Data Scientist
Responsibilities:
- Performed data profiling to learn about behavior across features such as caller ID, traffic pattern, location and number validity.
- Applied machine learning algorithms such as decision trees, regression models, neural networks, SVM and clustering to identify fraudulent profiles using the scikit-learn package in Python.
- Evaluated models using cross-validation, log loss and ROC curves, and used AUC for feature selection.
- Used Principal Component Analysis in feature engineering to analyze high dimensional data.
- Designed and created reports that use gathered metrics to draw logical conclusions about past behavior and infer future behavior.
- Used Multinomial Logistic Regression, Random Forest, Decision Tree and SVM models to classify profiles into categories such as Scammer and Telemarketer.
- Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database.
- Implemented a rule-based expert system from the results of exploratory analysis and information gathered from people in different departments.
- Performed data cleaning, feature scaling and feature engineering using the pandas and NumPy packages in Python.
- Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data. Created several types of data visualizations using Python and Tableau.
- Extracted meaning from huge volumes of data to help improve decision making and to provide business intelligence through data-driven solutions.
- Worked closely with other analysts and data engineers to develop data infrastructure (data pipelines, reports, dashboards, etc.) and other tools to make analytics more effective.
Confidential
Data Analyst/Data Modeler
Responsibilities:
- Cleaned and reshaped data and generated segmented subsets using NumPy and Pandas in Python.
- Produced a cost-benefit analysis quantifying the model implementation against the former situation.
- Conducted model optimization and comparison using a stepwise function.
- Wrote and optimized SQL queries involving multiple joins and advanced analytical functions to extract and merge data from large volumes of historical data stored in Oracle, and validated the ETL-processed data in the target database.
- Developed Python scripts to automate the data sampling process. Ensured data integrity by checking for completeness, duplication, accuracy, and consistency.
- Identified the variables that significantly affect the target.
- Continuously collected business requirements during the whole project life cycle.
- Generated data analysis reports using Matplotlib and Tableau, and successfully delivered and presented the results to C-level decision makers.
Confidential
Data Analyst
Responsibilities:
- Performed data analysis and reporting using MySQL, MS PowerPoint, MS Access and SQL Assistant.
- Involved in database design using MySQL, MS PowerPoint and MS Access, and designed a new database on Netezza for optimized results.
- Wrote complex SQL scripts and worked on data cleansing, data scrubbing and data migration.
- Wrote scripts for loading data to the target data warehouse using FastLoad and MultiLoad.
- Created ETL scripts using regular expressions and custom tools (Informatica, Pentaho, and Syncsort).
- Worked on data modeling using dimensional data modeling.
- Generated reports from the Oracle database and analyzed them for Oracle wait events, time-consuming SQL queries, tablespace growth, and database growth.
