Data Scientist Resume
Memphis, TN
SUMMARY:
- Over 10 years of experience in Data Science, Machine Learning, and Data Mining with large sets of structured and unstructured data, including Data Acquisition, Data Validation, Predictive Modeling, Statistical Modeling, Data Modeling, and Data Visualization. Adept in statistical programming tools such as R, Python, SAS, Apache Spark, PySpark, and MATLAB, and in Big Data technologies such as Hadoop, Hive, Pig, and NoSQL.
- Excellent knowledge of Machine Learning, Mathematical Modeling, and Operations Research. Comfortable with R, Python, SAS, SAS Enterprise Miner, SPSS, MATLAB, and relational databases. Deep understanding of and exposure to the Big Data ecosystem.
- Experienced in data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, data mining, machine learning, and advanced data processing.
- Deep analytics and understanding of Big Data and algorithms using Hadoop, MapReduce, NoSQL and distributed computing tools.
- Expertise in synthesizing Machine learning, Predictive Analytics and Big data technologies into integrated solutions.
- Identified stock momentum using non-parametric statistical tests for randomness and different types of moving averages.
- Hands-on experience implementing LDA and Naive Bayes, and skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis in the Insurance and Banking domains.
- Expertise in managing the entire data science project life cycle, with active involvement in all phases including data acquisition, data cleaning, data engineering, feature scaling, feature engineering, statistical modeling (decision trees, regression models, neural networks, SVM, clustering), dimensionality reduction using Principal Component Analysis and Factor Analysis, testing and validation using ROC plots and K-fold cross-validation, and data visualization.
- Extensively used SQL, NumPy, Pandas, Scikit-learn, Spark, and Hive for data analysis and model building.
- Experience using various packages in R and Python, such as pandas, numpy, seaborn, scipy, matplotlib, scikit-learn, Beautiful Soup, ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, twitteR, NLP, reshape2, rjson, plyr, and rpy2.
- Experience in SAS/BASE, SAS/MACRO, SAS/STAT, SAS/GRAPH, SAS/ACCESS, SAS/ETS, SAS/ODS, SAS/SQL, SAS/DI Studio, SAS Enterprise Guide, SAS/MC, SAS/WRS, and SAS/EMINER on Windows, UNIX, and Linux environments.
- Experience in performing ad hoc queries for various applications and reports on a daily basis using complex SQL queries. Strong command of Structured Query Language (SQL).
- Worked extensively with Hadoop, Hive, Spark, and Cassandra to build ETL and data processing systems with various data sources, data targets, and data formats.
- Experience in developing SAS macros and applications for data updates, data cleansing, and reporting. Successfully increased the portability of existing SAS programs and created new programs using SAS macro variables to improve the efficiency and consistency of results.
- Expertise in database programming (SQL, PL/SQL) with MS Access, Oracle 12c/11g/10g/9i, XML, DB2, Informix, and Teradata, as well as database tuning and query optimization.
- Expertise in performing data analysis and data profiling using complex SQL on various source systems, including Oracle and Teradata.
- Expertise in loading data using the Teradata loader connection, writing Teradata utility scripts (FastLoad, MultiLoad), and working with loader logs.
- Certifications in Base SAS and Currency Derivatives.
TECHNICAL SKILLS:
Data Analytics Tools/Programming: Python (numpy, scipy, pandas, Gensim, Keras), R (Caret, Weka, ggplot), MATLAB, SAS, SAS BI, SAS Enterprise Miner, Microsoft SQL Server, Oracle PL/SQL, Spark/Scala, PySpark.
Big Data Tools: Hadoop, HDFS, MapReduce, Sqoop, Pig, Hive, NoSQL, MongoDB, Spark, Scala, HBase.
ETL Tools: SAS D.I. Studio, Informatica.
Programming Languages: Java, Base SAS and SAS/SQL, SQL, T-SQL, HTML, Python, UNIX shell scripting, PL/SQL, R.
Database Tools: Microsoft SQL Server 2000/2008, Teradata, Oracle 12c/10g/9i, and MS Access.
Reporting Tools: SAS, SAS BI, and SPSS.
Operating Systems: Microsoft Windows 9x/NT/2000/XP/Vista/7 and UNIX.
Big Data: Hadoop, HDFS 2, Hive, Pig, HBase, Sqoop, Spark/Scala.
Other Tools: MS Office suite (Word, Excel, Excel VBA, Project, and Outlook), BTEQ, Teradata SQL Assistant, Scala NLP, Spark MLlib, SAS, SPSS, Cognos.
PROFESSIONAL EXPERIENCE:
Confidential, Memphis, TN
Data Scientist
Responsibilities:
- Retrieved customer data from the Teradata database into the SAS environment using SAS/Access and the SQL Pass-Through Facility.
- Followed the business requirements document and communicated with business users to determine the types of reports to develop and the data to show in them.
- Performed ETL processes using SAS and Spark to extract data from warehouses and databases.
- Used Base SAS to manipulate and modify data and to create datasets showing customer data.
- Experienced in Base procedures, Graph, Macros, DATA step, SQL, Data Mining, and SAS array processing for extracting internal and external data, including cleaning and validation and the use of formats and informats. Involved extensively in data extraction, transformation, loading, and analysis.
- Used Python and SQL to build statistical models involving Multivariate Regression, Linear Regression, Logistic Regression, PCA, Random Forest, Decision Trees, and Support Vector Machines.
- Manipulated data and modified code wherever necessary using SAS functions, PROC SQL statements, formats, and informats in Base SAS.
- Analyzed data statistically and prepared statistical reports in SAS and Python.
- Developed SAS macros that reduced programming time and increased productivity across the data processing steps.
- Extracted source data from Oracle into SAS data sets using PROC SQL.
- Extracted data, created new datasets from raw data files using import techniques, and modified existing datasets using SET, MERGE, SORT, UPDATE, and conditional statements.
- Solid foundation in the use of SAS macros, SAS/Connect, SAS/Access, SQL, Teradata relational table structures, and FTP interfaces between platforms and the UNIX environment.
- Documented and validated reports in compliance with specified regulations.
- Performed time series analysis, created regression models, and generated reports using PROC REPORT and PROC TABULATE for consumer profiling, segmentation, and targeting.
- Created reports in Excel spreadsheets using SAS/ODS and incorporated them into PowerPoint presentations.
- Accessed flat files using SAS/ACCESS.
- Created reports in PDF and RTF format using SAS/ODS.
Environment: Base SAS 9.4, SAS EG 6.1/7.1, SAS/Access, SAS/AF, SAS/Connect, SAS/Stat, SAS/Graph, SAS/SQL, SAS/ODS, SAS/Macros, SAS Enterprise Guide, SAS Data Integration Studio, Teradata, DB2, MS Excel, MS Access.
Confidential, New York
Data Scientist/SAS Developer
Responsibilities:
- Conducted research in the field of bioinformatics, utilizing data science concepts and methodology. Immersed in an intensive data science program covering statistical analysis, machine learning, model design, and working with data at scale. Applied advanced data science techniques to solve real-world problems.
- Completed advanced coursework in machine learning, statistics, data engineering (Scala, Spark, MapReduce), and Python for data science (SQL, NLP, MongoDB, Pandas, Scikit-Learn, Tableau, Matplotlib).
- Developed and improved existing reports, and converted existing SAS code into higher-quality SAS code to improve efficiency.
- Worked with SAS Administration Team for performance tuning and scheduling programs.
- Developed new jobs, stored processes, and web reports in the Grid server.
- Monitored and identified SAS job failures following existing standard structures and provided resolution plans for the job recovery process.
- Wrote programs in SAS to generate reports, created RTF, HTML listings, tables and reports using SAS/ODS for Ad-Hoc report generation.
- Developed SQL Queries to fetch complex data from different tables from remote databases.
- Effectively prepared and published various performance reports and presentations.
- Developed routine SAS macros to create target tables, graphs and listings.
- Extensively used DI transformations such as Append, Lookup, Sort, Splitter, Transpose, User Written Code, and SQL Join for creating, updating, and merging various SAS datasets.
- Effectively worked with users to define business processes and information systems for supporting those processes.
- Developed and executed SAS SQL queries for merging, concatenating and updating large volumes of data.
- Used the SAS Macro facility to produce weekly and monthly reports.
- Successfully handled multiple projects and tasks at a time.
Environment: SAS/BASE, SAS/SQL, SAS/MACROS, SAS/GRAPH, SAS/DI Studio, SAS/ACCESS, SAS/CONNECT, UNIX, PL/SQL, Teradata, SQL Server 2012, DB2, PuTTY, Python, PySpark, Spark MLlib, SQL, Excel, Excel VBA, SPSS.
Confidential
Data Scientist/SAS Developer
Responsibilities:
- Exported and modified all SAS jobs from the legacy system to the Grid environment based on business requirements.
- Exported metadata from the legacy system to the Grid environment, then modified the code based on the new libraries, datasets, and metadata in the new environment.
- Executed and validated Code and data in the Grid Server.
- Developed new jobs, stored processes, and web reports in the Grid server.
- Utilized Apache Spark with Python to develop and execute Big Data analytics and machine learning applications, and executed machine learning use cases under Spark ML and MLlib.
- Conducted research in the field of bioinformatics, utilizing data science concepts and methodology. Immersed in an intensive data science program covering statistical analysis, machine learning, model design, and working with data at scale. Applied advanced data science techniques to solve real-world problems.
- Completed advanced coursework in machine learning, statistics, data engineering (Scala, Spark, MapReduce), and Python for data science (SQL, NLP, MongoDB, Pandas, Scikit-Learn, Tableau, Matplotlib).
- Fraud detection: combined anomaly detection (to detect new types of fraud) with classification algorithms (learning from already available fraud data), using lsanomaly, OneClassSVM, K-means (how far the new data is from the nearest centroid), Logistic Regression, Decision Tree, Random Forest, and GBM with Scikit-Learn, Python, Spark SQL, and Spark ML.
- Identified areas of improvement in the existing business by analyzing vast amounts of data using machine learning techniques.
- Interpreted business problems and provided solutions using data analysis, data mining, optimization tools, machine learning techniques, and statistics.
- Applied Multinomial Logistic Regression, Random Forest, Decision Tree, and SVM to classify whether a package would be delivered on time on a new route. Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database.
- Extracted, transformed, and loaded data sources to generate CSV data files using Python and SQL queries.
Confidential
SAS Developer/Data Analyst
Responsibilities:
- Tuned the performance of central SAS resources and programs to reduce processing times and improve efficiency.
- Maintained metadata (data definitions of table structures) and version control for the data model.
- Worked on query optimization and performance tuning using SQL Profiler and performance monitoring.
- Utilized Erwin's forward/reverse engineering tools and target database schema conversion process.
- Created an enterprise-wide model (EDM) for products and services in the Teradata environment based on data from the PDM. Conceived, designed, developed, and implemented this model from scratch.
- Developed models for pairs trading and statistical arbitrage (correlation-based and co-integration-based), unilateral pairs trading, momentum pairs trading, basket trading hedged with a market index, risk-neutral strategies, MA, ARIMA, etc.
- Contributed to a market-making project using Bayesian networks, the Ornstein-Uhlenbeck (O.U.) process, and Markov chain models.
- Wrote SQL queries to test the mappings and developed a traceability matrix mapping business requirements to test scripts, ensuring that any change control in requirements leads to a test case update.
- Performed extensive data validation by writing several complex SQL queries; involved in back-end testing and worked on data quality issues.
- Developed and executed load scripts using the Teradata client utilities MultiLoad, FastLoad, and BTEQ.
- Exported and imported data between different platforms such as SAS and MS Excel.
- Generated periodic reports based on the statistical analysis of the data using SQL Server Reporting Services (SSRS).
- Worked with the ETL team to document the transformation rules for data migration from OLTP to the warehouse environment for reporting purposes.
- Created new jobs for ETL processing for Extraction, Transformation and Loading the data into target tables.
- Created SQL queries to find data quality issues and to identify keys, data anomalies, and data validation issues.
- Formatted the data sets read into SAS using the FORMAT statement in the DATA step as well as PROC FORMAT.
- Applied Business Objects best practices during development with a strong focus on reusability and better performance.
- Designed different types of star schemas for detailed data marts and plan data marts in the SAS OLAP environment.
Environment: SAS/Base, SAS Macros, SAS ODS, SAS/Graph, SAS Enterprise Guide, UNIX, SQL, MS Access, Excel, and Windows.
Confidential
Sr. Statistical Analyst
Responsibilities:
- Developed a multiple regression model to identify and quantify losses for an insurance company and to predict potential future defaulting customers based on historical loan data.
- Developed predictive models using techniques such as Decision Tree and Logistic Regression, then applied the best model to the production database containing loss amounts and policy-related information.
- Extracted data from data sources, explored the data, identified business objectives, and performed variable selection.
- Prepared the data (identified missing values and imputed values, such as the mean for interval variables and the mode for categorical variables).
- Prepared the presentation and documentation for the project report.
- Worked with the Credit Card division handling user requests, modifying existing reports, and creating new reports based on business criteria. Also created ad hoc reports.
- Worked closely with the analysis team and the risk management team, supporting them with reports based on their requirements.
- Performed data manipulations on data stored in Oracle using SQL Assistant and PROC SQL.
- Extracted data, created new datasets from raw data files using import techniques, and modified existing datasets using SET, MERGE, SORT, UPDATE, and conditional statements.
- Wrote SQL, PL/SQL, and UNIX shell scripts to meet Oracle requirements for various applications.
- Worked with Teradata relational table structures and FTP interfaces between platforms and the UNIX environment.
Environment: Base SAS 9.1.3, SAS/Access, SAS/AF, SAS/Connect, SAS/Stat, SAS/Graph, SAS/SQL, SAS/ODS, SAS/Macros, SAS Enterprise Guide, SAS Enterprise Miner, SAS/ETL, Teradata, DB2, Oracle 9i, PL/SQL, MS Excel, MS Access, R, SPSS, Python.
Confidential
Statistical Analyst
Responsibilities:
- Developed risk-neutral strategies such as spreads, deriving long/short spread indicators from a statistical arbitrage model using PE, EPS, and ROCE on highly correlated or highly co-integrated stocks with short half-lives.
- Developed the spread between two stocks, calculated the long-term fixed standard deviation limits, and identified the long spread and short spread.
- Developed technical analysis charts and historical simulations on stock, commodity, forex, index, futures, and options data.
- Involved in high-frequency market making and a mean reversion model using the Ornstein-Uhlenbeck process and Markov chains.
- Performed time series analysis, created regression models, and generated reports using PROC REPORT and PROC TABULATE for consumer profiling, segmentation, and targeting.
- Created reports in Excel spreadsheets using SAS/ODS and incorporated them into PowerPoint presentations.
- Executed the SAS jobs in batch mode through UNIX shell scripts.
- Created remote SAS sessions to run jobs in parallel, cutting extraction time because the datasets were generated simultaneously.
- Involved in code changes for SAS programs and UNIX shell scripts.
- Reviewed and modified SAS programs to create customized ad hoc reports and processed data for publishing business reports.
- Created flow charts to exhibit the flow of data from source datasets to the final reports.
- Automated SAS jobs running on a daily, weekly and monthly basis using SAS/BI & Unix Shell Scripting.
- Created final reports in Excel sheets, which were then accessed by business users.
- Created Informatica mappings with PL/SQL procedures to build business rules to load data. Extensively used PROC SQL and indexing for highly advanced matching and merging between large datasets.
- Analyzed business requirements, developed new code, and changed existing code to implement enhancements.
- Worked closely with the database management team and the SAS business analysis team to understand the requirements and developed SAS code to meet them.
- Extensively used macros to increase the efficiency and accuracy of the application.
- Implemented data cleaning techniques using PROC PRINT, DATA _NULL_, PROC TRANSPOSE, NODUPKEY, and PROC SQL.
- Extracted data from the database using SAS/Access and SAS SQL procedures and created SAS data sets.
- Created SAS data sets from database tables using SAS/Access. Retrieved sales data from flat files and the Oracle database and converted it to SAS data sets for analysis using SAS/STAT procedures.
- Defined and performed UAT test cases for complex business scenarios and executed them successfully to generate a management report comparing success rates.
- Involved in preparing the documentation for the previous version of the application along with design diagrams and code samples for the requirements.
Environment: SAS 8.2, SAS/MACROS, SAS/STAT, SAS/GRAPH, SAS/SQL, Oracle 8, PL/SQL, UNIX, and Windows 2000.
Confidential
Statistical Analyst
Responsibilities:
- Worked with the CDM group and statisticians to clean data, categorize data, code free-text data, and process data for analysis.
- Used SAS/Access to import and export data to and from MS applications.
- Prepared data using data management methods such as IF/ELSE statements, DO groups, SELECT and WHERE statements, arrays, and SAS functions.
- Imported ASCII text and RDBMS data using SAS PROC IMPORT and LIBNAME.
- Employed techniques like sorting and merging on the raw datasets and coded them using PROC SQL and SAS MACRO facility to get the required output.
- Used SAS macros and procedures such as PROC SQL, TRANSPOSE, TABULATE, UNIVARIATE, MEANS, and FREQ to create summarized tables for reporting.
- Generated listings and PDF reports with SAS/ODS to present summary findings of various statistical analyses.
- Used DATA step programming to assign permanent labels to data.
- Wrote Excel macros using VBA.
- Documented changes and modifications.
- Used SAS/GRAPH to produce various types of graphs for analysis and submission.
Environment: Base SAS, SAS/REPORT, SAS/STAT, Oracle, MS Excel, VBA, SAS/SQL, SAS/ODS, Windows XP, UNIX.
