Data Scientist Resume

Omaha, NE

SUMMARY:

  • Over 8 years of strong experience in data science, machine learning, and data mining with large sets of structured and unstructured data, including data acquisition, data validation, predictive modeling, statistical modeling, data modeling, data visualization, web crawling, and web scraping.
  • Adept in statistical programming languages and tools such as R, Python, SAS, Apache Spark, and Matlab, as well as Big Data technologies such as Hadoop, Hive, and Pig.
  • Extensive experience in text analytics, generating data visualizations using R and Python, and creating dashboards using tools like Tableau.
  • Skilled in data parsing, data manipulation, and data preparation, with methods including describing data contents, computing descriptive statistics, regex, split and combine, remap, merge, subset, reindex, melt, and reshape.
  • Proficient in managing the entire data science project life cycle and actively involved in all of its phases, including data acquisition, data cleaning, data engineering, feature scaling, feature engineering, statistical modeling (decision trees, regression models, neural networks, SVM, clustering), dimensionality reduction using Principal Component Analysis and Factor Analysis, testing and validation using ROC plots and k-fold cross-validation, and data visualization.
  • Deep understanding of statistical modeling, multivariate analysis, model testing, problem analysis, model comparison, and validation.
  • Good industry knowledge, analytical and problem-solving skills, and the ability to work well both within a team and as an individual.
  • Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Hands-on experience implementing LDA and Naive Bayes, and skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, clustering, neural networks, and Principal Component Analysis.
  • Hands-on experience with big data tools like Hadoop, Spark, Hive, Pig, Impala, PySpark, and Spark SQL.
  • Good knowledge of proofs of concept (PoCs) and gap analysis; gathered necessary data for analysis from different sources and prepared data for exploration using data munging.
  • Highly creative, innovative, committed, intellectually curious, business savvy with good communication and interpersonal skills.
  • Experience in using various packages in R and Python, such as ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, twitteR, NLP, reshape2, rjson, plyr, pandas, numpy, seaborn, scipy, matplotlib, scikit-learn, Beautiful Soup, and rpy2.
  • Extensive experience in Data Visualization including producing tables, graphs, listings using various procedures and tools such as Tableau.

TECHNICAL SKILLS:

Programming & Scripting Languages: R, C, C++, Java, JCL, COBOL, HTML, CSS, JSP, JavaScript

Databases: SQL Server 2014/2012/2008/2005/2000, MS-Access, Oracle 12c/11g/10g/9i and Teradata, big data, Hadoop

Statistical Software: SPSS, R, SAS.

Web Packages: Google Analytics, Adobe Test & Target, Web Trends

Big Data Ecosystem: HDFS, Pig, MapReduce, Hive, Sqoop, Flume, HBase, Storm, Kafka, Elasticsearch, Redis.

Statistical Methods: Time Series, regression models, splines, confidence intervals, principal component analysis and Dimensionality Reduction, bootstrapping

BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Business Intelligence Development Studio (BIDS), Visual Studio, Crystal Reports, Informatica 6.1.

Database Design Tools and Data Modeling: MS Visio, ERWIN 4.5/4.0, Star Schema/Snowflake Schema modeling, Fact & Dimension tables, physical & logical data modeling, Normalization and De-normalization techniques, Kimball & Inmon Methodologies

Cloud: AWS, S3, EC2.

Big Data / Grid Technologies: Cassandra, Coherence, MongoDB, ZooKeeper, Titan, Elasticsearch, Storm, Kafka, Hadoop

Tools and Utilities: SQL Server Management Studio, SQL Server Enterprise Manager, SQL Server Profiler, Import & Export Wizard, Visual Studio.Net, Microsoft Management Console, Visual Source Safe 6.0, DTS, Crystal Reports, Power Pivot, ProClarity, Microsoft Office, Excel Power Pivot, Excel Data Explorer, Tableau, JIRA, Spark MLlib.

PROFESSIONAL EXPERIENCE:

Confidential, Omaha, NE

Data Scientist

Responsibilities:

  • Performed data profiling to learn about behavior across features such as traffic pattern, location, date, and time.
  • Evaluated models using cross-validation, the log loss function, and ROC curves, and used AUC for feature selection.
  • Collected data needs and requirements by interacting with other departments.
  • Used Principal Component Analysis in feature engineering to analyze high dimensional data.
  • Used the K-Means clustering technique to identify outliers and to classify unlabeled data.
  • Ensured that the model had a low false positive rate.
  • Created and designed reports that use gathered metrics to draw logical conclusions about past and future behavior.
  • Used MLlib, Spark's machine learning library, to build and evaluate different models.
  • Created data quality scripts using SQL and Hive to validate successful data loads and the quality of the data. Created various types of data visualizations using Python and Tableau.
  • Developed a MapReduce pipeline for feature extraction using Hive.
  • Applied various machine learning algorithms and statistical models, such as decision trees, regression models, neural networks, SVM, and clustering, to identify volume, using the scikit-learn package in Python and Matlab.
  • Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and numpy packages in Python.
  • Implemented a rule-based expert system from the results of exploratory analysis and information gathered from people in different departments.
  • Analyzed traffic patterns by calculating autocorrelation with different time lags.
  • Addressed overfitting by implementing regularization methods such as L1 and L2.
  • Performed Multinomial Logistic Regression, Random Forest, Decision Tree, and SVM to classify whether a package will be delivered on time on a new route.
  • Communicated the results to the operations team to support decision-making.

Environment: Impala, Linux, Spark, Tableau Desktop, Python 2.x, CDH5, HDFS, Hadoop 2.3, Hive, SQL Server 2012, Microsoft Excel.

Confidential, Chicago, IL

Data Scientist

Responsibilities:

  • Conducted campaigns and ran real-time trials to quickly determine what works and to track the impact of different initiatives.
  • Implemented public segmentation with unsupervised machine learning by implementing the k-means algorithm using PySpark.
  • Used Python, R, and SQL to create statistical algorithms involving Multivariate Regression, Linear Regression, Logistic Regression, PCA, Random Forest models, Decision Trees, and Support Vector Machines for estimating the risks of welfare dependency.
  • Scheduled the task for weekly updates and ran the model in a workflow. Automated the entire process flow for generating the analysis and reports.
  • Used R and Python for exploratory data analysis, A/B testing, ANOVA tests, and hypothesis tests to compare and identify the effectiveness of creative campaigns.
  • Created various types of data visualizations using R, python and Tableau.
  • Created clusters to classify control and test groups and conducted group campaigns.
  • Explored and extracted data from source XML in HDFS, preparing data for exploratory analysis using data munging.
  • Identified and targeted high-risk welfare groups with machine learning algorithms.

Environment: Pig, Hive, Linux, R 3.x, HDFS, Hadoop 2.3, SQL Server, PySpark, RStudio, Tableau 10, MS Excel.

Confidential, Santa Ana, CA

Data Scientist/Data Analyst

Responsibilities:

  • Developed Python scripts to automate the data sampling process. Ensured data integrity by checking for completeness, duplication, accuracy, and consistency.
  • Worked on model selection based on confusion matrices and minimized the Type II error.
  • Generated data analysis reports using Matplotlib and Tableau, and successfully delivered and presented the results to C-level decision makers.
  • Worked on data cleaning and reshaping, and generated segmented subsets using Numpy and Pandas in Python.
  • Continuously collected business requirements during the whole project life cycle.
  • Generated a cost-benefit analysis to quantify the impact of the model implementation compared with the former situation.
  • Identified the variables that significantly affect the target.
  • Applied various machine learning algorithms and statistical models, such as decision trees, logistic regression, and Gradient Boosting Machine, to build a predictive model using the scikit-learn package in Python.
  • Conducted model optimization and comparison using a stepwise function based on AIC value.
  • Wrote and optimized complex SQL queries involving multiple joins and advanced analytical functions to extract and merge data from large volumes of historical data stored in Oracle 11g, validating the ETL-processed data in the target database.

Environment: Numpy, Pandas, Tableau 7, Python 2.6.8, Matplotlib, Oracle 10g, SQL, scikit-learn, MongoDB

Confidential, Stamford, CT

Data Architect/Data Modeler

Responsibilities:

  • Developed integration jobs to transfer data from the source system to Hadoop.
  • Installed Talend Studio.
  • Wrote technical design documents for transformation processes.
  • Applied business rules to the data being transferred.
  • Allocated tasks for the ETL and reporting team.
  • Communicated effectively with the client and their internal development team to deliver product functionality requirements.
  • Architected and designed data warehouse ETL processes.
  • Demonstrated the POC built for the prospective customer, provided guidance, and gathered feedback; performed backend ETL testing on SQL Server 2008 using SSIS.
  • Created integration jobs to back up a copy of the data to a network file system.
  • Designed and implemented the ETL data model and created staging, source, and target tables in the SQL Server database.
  • Gathered and analyzed requirements in definition meetings with business users and documented meeting outcomes.

Environment: Hadoop, MS Office, Talend Studio, ETL, ODS, OLAP, SQL Server 2008.

Confidential

Data Analyst/Data Modeler

Responsibilities:

  • Implemented a job that reads an electronic medical record, extracts data into an Oracle database, and generates an output. Analyzed the data and provided insights about the customers using Tableau.
  • Designed, implemented and automated modeling and analysis procedures on existing and experimentally created data.
  • Created dynamic linear models to perform trend analysis on customer transactional data in Python.
  • Increased the speed and confidence of the learning algorithm by combining state-of-the-art technology with statistical methods.
  • Parsed data, producing concise conclusions from raw data in a clean, well-structured, and easily maintainable format. Developed clustering models for customer segmentation using Python.
  • Developed entire frontend and backend modules using Python on Django Web Framework.
  • Implemented the presentation layer with HTML, CSS and JavaScript.
  • Involved in writing stored procedures using Oracle.
  • Optimized the database queries to improve the performance.
  • Designed and developed a data management system using Oracle.

Environment: Python 2.x, Tableau, Oracle, MySQL 5.x, HTML5, CSS3, JavaScript, Shell, Linux & Windows, Django.

Confidential

Data Analyst

Responsibilities:

  • Applied Business Objects best practices during development with a strong focus on reusability and better performance.
  • Developed and executed load scripts using Teradata client utilities MULTILOAD, FASTLOAD and BTEQ.
  • Responsible for development and testing of conversion programs for importing data from text files into an Oracle database using Perl, shell scripts, and SQL*Loader.
  • Developed Tableau visualizations and dashboards using Tableau Desktop.
  • Used graphical Entity-Relationship diagramming to create new database designs via an easy-to-use graphical interface.
  • Formatted the data sets read into SAS using the FORMAT statement in the DATA step as well as PROC FORMAT.
  • Worked with the ETL team to document the transformation rules for data migration from the OLTP to the warehouse environment for reporting purposes.
  • Coordinated with various business users, stakeholders, and SMEs to obtain functional expertise, review design and business test scenarios, participate in UAT, and validate financial data.

Environment: Business Objects, Oracle SQL Developer, PL/SQL, MS SQL Server, TOAD, Tableau, Informatica, SQL*PLUS, SQL*LOADER, XML.
