Data Scientist Resume
Plano, TX
SUMMARY
- A career-minded professional with 8+ years of IT experience in Data Science (Machine Learning, Deep Learning, Text Mining), Data/Business Analytics, Data Visualization, Data Warehousing, and Data Governance & Operations.
- Extensive experience in Text Analytics, developing Statistical Machine Learning and Data Mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau.
- Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
- Equipped with experience in statistical techniques including correlation, hypothesis testing, and inferential statistics, as well as data mining and modelling techniques using linear and logistic regression, decision trees, and k-means clustering.
- Documented new data to support source-to-target mapping, and updated the documentation for existing data, assisting with data profiling to maintain data sanitization and validation.
- Expertise in implementing scalable Statistical & Predictive Decision Science Models using Machine Learning platforms such as R & Python data science packages (Scikit-Learn, Pandas, NumPy, SparkR & Spark MLlib); a representative workflow sketch follows this summary.
- Identified what data is available and relevant, including internal and external data sources, leveraging new data collection processes such as geo-location information.
- Experienced in architecting, managing, and deploying Big Data clusters ranging from 10 to 100 nodes in clouds such as Azure & Amazon, as well as in production environments using Cloudera Manager.
- Proficient in researching current processes and emerging technologies that require analytic models, data inputs and outputs, analytic metrics, and user interface needs.
- Expertise in building Supervised and Unsupervised Machine Learning experiments on Azure & Amazon, utilizing multiple algorithms to perform detailed predictive analytics and building web service models for all types of data: continuous, nominal, and ordinal.
- Understanding of Hadoop MapReduce, Amazon EMR, and other big data frameworks.
- Mitigated risk factors through careful analysis of financial and statistical data. Transformed and processed raw data for further analysis, visualization, and modelling.
- Team builder with excellent communication, time & resource management, and continuous client relationship development skills.
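
The following is a minimal sketch of the supervised-learning workflow referenced above, using the scikit-learn, pandas, and NumPy stack; the dataset, feature names, and parameters are illustrative placeholders, not from any specific engagement.

```python
# Minimal supervised-learning sketch with scikit-learn and pandas.
# All data and column names here are synthetic placeholders.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
df = pd.DataFrame({
    "feature_a": rng.normal(size=500),
    "feature_b": rng.normal(size=500),
})
# Hypothetical binary target driven by the two features plus noise.
df["target"] = ((df["feature_a"] + df["feature_b"]
                 + rng.normal(scale=0.5, size=500)) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    df[["feature_a", "feature_b"]], df["target"],
    test_size=0.25, random_state=42,
)

# Scale features, then fit a logistic regression classifier.
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
])
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```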
TECHNICAL SKILLS
- R Programming & R Studio
- Python
- SQL, PL/SQL
- Bash
- Azure & Amazon Machine Learning
- Talend, Tableau, Gretl, MATLAB
- Notepad++
- Jupyter, Spyder
- Google Analytics, Elasticsearch
- HDFS, Hive, Spark
- Microsoft Visual Studio .NET 2010, SQL Server 2008
- Machine Learning
- Regression
- K-Means Clustering
- Data Mining & Cleaning
- Statistics
- Decision Trees
- Random Forest
- Oracle
- SQL/MySQL
- Cassandra
- MongoDB
- Data Analysis
- Business Analysis & Monitoring
- Project Scheduling
- Deployment
- Operationalization
PROFESSIONAL EXPERIENCE
Data Scientist
Confidential, Plano, TX
Responsibilities:
- Designed applications of Machine Learning, Statistical Analysis, and Data Visualization for challenging large-scale data processing problems.
- Involved in writing the mapping specifications for converting the legacy building and warehouse datasets.
- Worked with various databases such as Oracle and SQL Server, and performed computations, log transformations, feature engineering, and data exploration to identify insights and conclusions from complex data using R programming in RStudio.
- Implemented predictive models using linear regression and boosting algorithms, performed in-depth analysis of the structure of the models, compared the performance of all the models, and found tree boosting to be the best for prediction (see the comparison sketch after this list).
- Applied concepts of R-squared, RMSE, and p-values in the evaluation stage to extract interesting findings through comparisons.
- Performed in-depth statistical analysis and data mining methods using R, including Cluster analysis, Logistic Regression, and boosting models.
- Proficient in the entire CRISP-DM life cycle and actively involved in all phases of the project life cycle, including data acquisition, data cleaning, and data engineering.
- Extensively used Azure Machine Learning to set up experiments and create web services for predictive analytics.
- Integrated SAS datasets into Excel using Dynamic Data Exchange, using SAS to analyze data and produce statistical tables, listings, and graphs for reports.
- Performed feature scaling, feature engineering and statistical modeling.
- Wrote complex SQL queries for data analysis using window functions and joins, improving performance by creating partitioned tables.
- Created indexes for various statistical parameters on Elasticsearch and generated visualizations using Kibana.
- Developed SQL procedures to synchronize the dynamic data generated from GTID systems with the Azure SQL Server.
- Prepared multiple dashboards using Tableau to reflect data behavior over a period of time; analyzed and worked with all aspects of regression models (OLS, etc.).
- Responsible for working with stakeholders to troubleshoot issues and for communicating findings to team members, leadership, and stakeholders to ensure models are well understood and optimized.
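
A sketch of the model comparison described above (linear regression vs. tree boosting, evaluated with RMSE and R-squared); the original analysis was done in R/RStudio, so this Python version on synthetic data is illustrative only.

```python
# Compare a linear baseline against tree boosting on synthetic data,
# scoring both with RMSE and R-squared as in the evaluation stage above.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, model in [
    ("linear regression", LinearRegression()),
    ("tree boosting", GradientBoostingRegressor(random_state=0)),
]:
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = mean_squared_error(y_test, pred) ** 0.5
    print(f"{name}: RMSE={rmse:.2f}, R^2={r2_score(y_test, pred):.3f}")
```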
Sr Data Analyst
Confidential, Union, NJ
Responsibilities:
- Setup storage and data analysis tools in Amazon Web Services cloud computing infrastructure.
- Used pandas, NumPy, Seaborn, SciPy, matplotlib, and scikit-learn in Python for developing various machine learning algorithms.
- Experience with NoSQL databases such as MongoDB, Cassandra, and HBase; utilized SQL and NoSQL databases, Python programming, and API interaction.
- Experience using ETL and data visualization tools.
- Implemented classification using supervised algorithms such as Logistic Regression, Decision Trees, and Naive Bayes (a comparison sketch follows this list).
- Updated Python scripts to match training data with our database stored in AWS CloudSearch, so that we could assign each document a response label for further classification.
- Performed data transformation from various sources, data organization, and feature extraction from raw and stored data.
- Handled importing data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
- Deployed configurations of the Cloudera Distribution of Hadoop from both the command line and Cloudera Manager.
- Cluster management and maintenance using a variety of tools including Cloudera Manager.
- Applied quality assurance best practices for predictive modeling/analytics services.
- Involved in defining the source-to-target data mappings, business rules, and data definitions.
- Performed automation engineering tasks and implemented the ELK stack (Elasticsearch, Logstash, Kibana) for AWS EC2 hosts.
- Defined the list codes and code conversions between the source systems and the data mart using Reference Data Management (RDM).
- Designed Cloud Architecture documents and developed cloud infrastructure for a compute environment utilizing Amazon Web Services (hands-on experience designing and deploying AWS services: VPCs, EC2, S3, Redshift, MySQL DBs, Snowball, ELB, Auto Scaling, IAM policies/roles, and Security and Networking services on Linux and Windows OS).
- Performed end-to-end Informatica ETL testing for custom tables by writing complex SQL queries on the source database and comparing the results against the target database.
- Extracted the source data from Oracle tables, MS SQL Server, sequential files, and Excel sheets.
- Developed and maintained a Data Dictionary to create metadata reports for technical and business purposes.
- Performed predictive modeling using state-of-the-art methods.
- Built and maintained dashboards and reporting based on the statistical models to identify and track key metrics and risk indicators.
- Parsed and manipulated raw, complex data streams to prepare them for loading into an analytical tool.
- Broad knowledge of programming and scripting (especially in R/Python).
- Troubleshot and resolved bugs in .NET applications to ensure optimal development environment.
- Involved in developing Patches & Updates Module.
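
A minimal sketch of the classifier comparison mentioned above (Logistic Regression, Decision Trees, Naive Bayes); the synthetic dataset and the 5-fold cross-validation setup are assumptions for illustration.

```python
# Compare three supervised classifiers on a synthetic dataset using
# 5-fold cross-validated accuracy.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

for name, clf in [
    ("logistic regression", LogisticRegression(max_iter=1000)),
    ("decision tree", DecisionTreeClassifier(random_state=0)),
    ("naive bayes", GaussianNB()),
]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```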
Data Analyst
Confidential
Responsibilities:
- Responsible for the design and development of R/Python programs to prepare, transform, and harmonize datasets in preparation for modeling.
- Developed large datasets from structured and unstructured data; performed data mining.
- Partnered with modelers to develop data frame requirements for projects.
- Performed ad-hoc reporting, customer profiling, and segmentation using R/Python.
- Tracked various campaigns, generating customer profiling analysis and data manipulation.
- Provided R/SQL programming, with detailed direction, in the execution of data analysis that contributed to the final project deliverables; responsible for data mining.
- Analyzed large datasets to answer business questions by generating reports and outcome.
- Worked in a team of programmers and data analysts to develop insightful deliverables that support data-driven marketing strategies.
- Involved in fixing bugs and minor enhancements for the front-end modules.
- Used Microsoft Visio and Rational Rose to design the Use Case, Class, Sequence, and Activity diagrams for the SDLC process of the application.
- Performed maintenance in the testing team for system testing, integration testing, and UAT.
- Guaranteed quality in the deliverables.
- Conducted Design reviews and Technical reviews with other project stakeholders.
- Was a part of the complete life cycle of the project, from requirements to production support.
- Implemented the project in a Linux environment.
- Involved in loading data from RDBMS and web logs into HDFS.
- Worked on loading the data from MySQL to HBase where necessary using Sqoop.
- Developed Hive queries for Analysis across different banners.
- Extracted data from Twitter using Java and the Twitter API; parsed JSON-formatted Twitter data and uploaded it to a database (a parsing sketch follows this list).
- Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured launched instances with respect to specific applications.
- Exported the result set from Hive to MySQL using Sqoop after processing the data.
- Performed performance improvements on the existing data warehouse applications to increase the efficiency of the existing system.
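
A sketch of the JSON tweet-parsing step described above; the original pipeline used Java and the Twitter API, so this Python version, the sample records, and the field names (id, text, user.screen_name) are illustrative assumptions.

```python
# Parse JSON-formatted tweet records and flatten them into rows
# suitable for loading into a database.
import json

# Hypothetical raw records, one JSON document per line.
raw_lines = [
    '{"id": 1, "text": "hello world", "user": {"screen_name": "alice"}}',
    '{"id": 2, "text": "data science", "user": {"screen_name": "bob"}}',
]

rows = []
for line in raw_lines:
    tweet = json.loads(line)
    # Flatten the nested user object into one row per tweet.
    rows.append((tweet["id"], tweet["user"]["screen_name"], tweet["text"]))

# `rows` is now ready for a parameterized bulk INSERT into MySQL/HBase.
print(rows)
```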
Jr Data Analyst
Confidential
Responsibilities:
- Performed data analysis and data profiling using complex SQL on various source systems, including Teradata and SQL Server (a profiling sketch follows this list).
- Developed the logical and physical data models that capture current-state/future-state data elements and data flows using ER Studio.
- Involved in the analysis of business requirements, design and development of high-level and low-level designs, and unit and integration testing.
- Implementation of the Metadata Repository, maintaining Data Quality, data cleanup procedures, transformations, Data Standards, the Data Governance program, scripts, stored procedures, triggers, and execution of test plans.
- Reviewed and implemented the naming standards for the entities, attributes, alternate keys, and primary keys in the logical model.
- Applied web crawling and text mining techniques to score referral domains, generate keyword taxonomies, and assess the commercial value of bid keywords.
- Coordinated meetings with vendors to define requirements and system interaction agreement documentation between client and vendor system.
- Performed data analysis encompassing a range of analytical capabilities, including aggregations, statistical analysis, regression, time-series analysis, clustering, network analysis, and other statistical modeling techniques.
- Created multimillion-keyword bid lists using extensive web crawling, and identified metrics to measure the quality of each list (yield or coverage, volume, and average keyword financial value).
- Processed, cleansed, and verified the integrity of data used for analysis.
- Updated the Enterprise Metadata Library with any changes or updates.
- Documented data quality and traceability documents for each source interface.
- Worked on DTS packages and DTS Import/Export for transferring data between SQL Server databases.
- Generated weekly and monthly asset inventory reports.
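
A sketch of the kind of column-level data profiling described above; the original work used complex SQL on Teradata and SQL Server, so this pandas analogue on a made-up table is illustrative only.

```python
# Profile each column of a table: dtype, null count, and distinct count,
# the same checks a SQL profiling query would perform per column.
import pandas as pd

# Hypothetical source table with missing and duplicate values.
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 3, None],
    "state": ["TX", "NJ", "TX", None, "NY"],
})

profile = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "nulls": df.isna().sum(),
    "distinct": df.nunique(),
})
print(profile)
```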