Data Scientist Resume

Tennessee

PROFESSIONAL SUMMARY:

  • 8 years of hands-on experience and comprehensive industry knowledge of Machine Learning, Statistical Modeling, Data Analytics, Data Modeling, Data Architecture, Data Analysis, Data Mining, Text Mining & Natural Language Processing (NLP), Artificial Intelligence algorithms, Business Intelligence, Analytics Models (like Decision Trees, Linear & Logistic Regression), Hadoop (Hive, Pig), R, Python, Spark, Scala, MS Excel, SQL, PostgreSQL and Erwin.
  • Strong knowledge in all phases of the SDLC (Software Development Life Cycle) from analysis, design, development, testing, implementation and maintenance.
  • Experienced in Data Modeling techniques employing Data warehousing concepts like star/snowflake schema and Extended Star.
  • Expertise in applying data mining techniques and optimization techniques in B2B and B2C industries.
  • Expertise in writing functional specifications, translating business requirements to technical specifications, and creating, maintaining and modifying database design documents with detailed descriptions of logical entities and physical tables.
  • Excellent knowledge of Machine Learning, Mathematical Modeling and Operations Research. Comfortable with R, Python, SAS, Weka, MATLAB and relational databases. Deep understanding of and exposure to the Big Data ecosystem.
  • Expertise in Data Analysis, Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import, and Data Export through the use of ETL tools such as Informatica Power Center.
  • Proficient in Machine Learning, Data/Text Mining, Statistical Analysis & Predictive Modeling.
  • Expertise in data acquisition, storage, analysis, integration, predictive modeling, logistic regression, decision trees, data mining methods, forecasting, factor analysis, cluster analysis, ANOVA and other advanced statistical techniques.
  • Excellent knowledge and experience in OLTP/OLAP System Study with focus on the Oracle Hyperion Suite of technology, developing Database Schemas like Star schema and Snowflake schema (Fact Tables, Dimension Tables) used in relational, dimensional and multidimensional modeling, and physical and logical Data modeling using the Erwin tool.
  • Experienced in building data models using machine learning techniques for Classification, Regression, Clustering and Associative mining.
  • Expert in creating PL/SQL Schema objects like Packages, Procedures, Functions, Subprograms, Triggers, Views, Materialized Views, Indexes, Constraints, Sequences, Exception Handling, Dynamic SQL/Cursors, Native Compilation, Collection Types, Record Type, Object Type using SQL Developer.
  • Working experience with the Hadoop ecosystem and the Apache Spark framework, including HDFS, MapReduce, HiveQL, Spark SQL and PySpark.
  • Very good experience and knowledge in provisioning virtual clusters under AWS cloud which includes services like EC2, S3, and EMR.
  • Proficient in data visualization tools such as Tableau, Python Matplotlib, R Shiny to create visually powerful and actionable interactive reports and dashboards.
  • Excellent Tableau Developer with expertise in building and publishing customized interactive reports and dashboards with customized parameters and user filters using Tableau (9.x/10.x).
  • Experienced in Agile methodology and SCRUM process.
  • Strong business sense and ability to communicate data insights to both technical and nontechnical clients.

TECHNICAL SKILLS:

Programming & Scripting Languages: R, C, C++, Java, JCL, Python, HTML, CSS, JSP, JavaScript

Databases: MS Access, Oracle 12c/11g/10g/9i, Teradata, Big Data, Hadoop, PostgreSQL.

Statistical Software: SPSS, R, SAS.

ETL/BI Tools: Informatica Power Center 9.x, Tableau, Cognos BI 10, MS Excel, SAS, SAS/Macro, SAS/SQL

Data Modelling: Erwin r9.6, 9.5, 9.1, 8.x, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner.

Web Packages: Google Analytics, Adobe Test & Target, Webtrends

Big Data Ecosystem: HDFS, Pig, MapReduce, Hive, Sqoop, Flume, HBase, Storm, Kafka, Elasticsearch, Redis.

Statistical Methods: Time Series, regression models, splines, confidence intervals, principal component analysis and Dimensionality Reduction, bootstrapping

BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Business Intelligence Development Studio (BIDS), Visual Studio, Crystal Reports, Informatica 6.1.

Database Design Tools and Data Modeling: MS Visio, ERWIN 4.5/4.0, Star Schema/Snowflake Schema modeling, Fact & Dimension tables, physical & logical data modeling, Normalization and De-normalization techniques, Kimball & Inmon Methodologies

Cloud: AWS, S3, EC2.

Big Data / Grid Technologies: Cassandra, Coherence, MongoDB, Zookeeper, Titan, Elasticsearch, Storm, Kafka, Hadoop

PROFESSIONAL EXPERIENCE:

Confidential, Tennessee

Data Scientist

Responsibilities:

  • Built data pipelines for reporting, alerting, and data mining. Experienced with table design and data management using HDFS, Hive, Impala, Sqoop, MySQL, MemSQL, Grafana/InfluxDB, and Kafka.
  • Worked with statistical models for data analysis, predictive modelling, machine learning approaches, recommendation and optimization algorithms.
  • Worked in Business and Data Analysis, Data Profiling, Data Migration, Data Integration and Metadata Management Services.
  • Built models using statistical techniques like Bayesian HMM and machine learning classification models like XGBoost, SVM, and Random Forest using R and Python packages.
  • Worked with data compliance and data governance teams to maintain data models, metadata and data dictionaries, and to define source fields and their definitions.
  • Worked with Big Data technologies such as Hadoop, Hive and MapReduce.
  • Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
  • Set up storage and data analysis tools in Amazon Web Services cloud computing infrastructure.
  • Participated in a highly immersive Data Science program involving Data Manipulation & Visualization, Web Scraping, Machine Learning, Python programming, SQL, Git, Unix commands, NoSQL, MongoDB and Hadoop.
  • Performed scoring and financial forecasting for collection priorities using Python, R and SAS machine learning algorithms.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Managed existing team members and led the recruiting and onboarding of a larger Data Science team that addresses analytical knowledge requirements.
  • Worked directly with upper executives to define requirements of scoring models.
  • Created SQL scripts, analyzed the data in MS Access/Excel and worked on SQL and SAS script mapping.
  • Rapid model creation in Python using pandas, numpy, sklearn, and plot.ly for data visualization. These models are then implemented in SAS where they are interfaced with MSSQL databases and scheduled to update on a timely basis.
  • Developed a model for predicting repayment of debt owed to small and medium enterprise (SME) businesses.

Environment: R, SQL, Python 2.7.x, SQL Server 2014, NLTK, XML, Hive, Hadoop, GraphLab, NoSQL, SAS, SPSS, Spark, Kafka, HBase, MLlib.

Confidential, Pleasanton, CA

Data Scientist/R Developer

Responsibilities:

  • Designed an industry-standard data model specific to the company and its group insurance offerings, and translated the business requirements into production-level detail using workflow diagrams, sequence diagrams, activity diagrams and use case modeling.
  • Involved in the design and development of the data warehouse environment; served as liaison between business users and technical teams, gathering requirement specification documents and identifying and presenting data sources, targets and report generation.
  • Recommended and evaluated marketing approaches based on quality analytics of customer consumption behavior.
  • Determined customer satisfaction and helped enhance customer experience using NLP.
  • Worked on text analytics, Naive Bayes, sentiment analysis, creating word clouds and retrieving data from Twitter and other social networking platforms.
  • Conceptualized the most-used product module (Research Center) after building a business case for approval, gathering requirements and designing the User Interface
  • Served as a member of the Analytical Group and assisted in the design and development of statistical models for end clients. Coordinated with end users on the design and implementation of e-commerce analytics solutions as per project proposals.
  • Conducted market research for client; developed and designed sampling methodologies, and analyzed the survey data for pricing and availability of clients' products. Investigated product feasibility by performing analyses that include market sizing, competitive analysis and positioning.
  • Optimized Python code for a variety of data mining and machine learning tasks.
  • Facilitated stakeholder meetings and sprint reviews to drive project completion.
  • Successfully managed projects using Agile development methodology
  • Project experience in Data mining, segmentation analysis, business forecasting and association rule mining using Large Data Sets with Machine Learning.
  • Automated the diagnosis of blood loss during accidents by applying machine learning algorithms to vital signs (ECG, HF, GSR, etc.). Demonstrated performance of 94.6%, on par with state-of-the-art models used in industry.

Environment: R, MATLAB, MongoDB, exploratory analysis, feature engineering, K-Means Clustering, Hierarchical Clustering, Machine Learning, Python, Spark (MLlib, PySpark), Tableau, MicroStrategy, SAS, TensorFlow, regression, logistic regression, Hadoop 2.7, OLTP, random forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML and MapReduce

Confidential, Dublin, CA

Data Architect/Data Modeler

Responsibilities:

  • Worked with the BI team to gather report requirements and used Sqoop to load data into HDFS and Hive.
  • Worked on MapReduce jobs in Java for data cleaning and pre-processing.
  • Assisted with data capacity planning and node forecasting.
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
  • Developed MapReduce programs to extract and transform the data sets and results were exported back to RDBMS using Sqoop.
  • Was responsible for importing the data (mostly log files) from various sources into HDFS using Flume.
  • Created tables in Hive and loaded the structured data produced by MapReduce jobs.
  • Developed many HiveQL queries to extract the required information.
  • Exported the required information to RDBMS using Sqoop to make the data available to the claims processing team for use in processing claims.
  • Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
  • Worked on MongoDB database concepts such as locking, transactions, indexes, sharding, replication, schema design, etc.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and PIG to pre-process the data.
  • Involved in defining the source to target data mappings, business rules, business and data definitions.
  • Responsible for defining the key identifiers for each mapping/interface.
  • Responsible for defining the functional requirement documents for each source to target interface.
  • Implemented a Metadata Repository and maintained data quality and data cleanup procedures.
  • Involved in the Database Designing (Relational and Dimensional models) using Erwin.
  • Followed agile methodology for the entire project.
  • Worked with Hadoop clusters using Cloudera distributions.
  • Involved in Hadoop cluster tasks like adding and removing nodes without affecting running jobs and data.
  • Developed workflows using Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Converted the existing relational database model to the Hadoop ecosystem.
  • Generated ad-hoc SQL queries using joins, database connections and transformation rules to fetch data from legacy Oracle and SQL Server database systems.

Environment: SQL Server, Oracle 9i, MS Office, Teradata, Informatica, ER Studio, XML, Business Objects, Help-Point Claims Services.

Confidential

SQL Developer

Responsibilities:

  • Defined and designed the data sourcing and data flows, data quality analysis.
  • Proactively created a framework to analyze system performance and ensure high availability of processing using a variety of tools.
  • Identified the data source and defined them to build the data source views.
  • Created stored procedures in MS SQL Server 2008.
  • Wrote complex SQL queries involving multi-table inner and outer joins, temporary tables and table variables.
  • Strong SQL programming skills and some stored procedure development experience.
  • Strong technical knowledge in troubleshooting and performance tuning.
  • Wrote complex stored procedures, triggers, views and queries. Created indexes, constraints and rules on database objects.
  • Developed and deployed SSIS packages to load data from OLTP to Data Warehouse system.
  • Designed the flow of data from daily tables to weekly tables and monthly aggregates in the billing data marts.
  • Defined best practices for Tableau report development.
  • Identified and modified the KPIs and measures in accordance with the requirements and created calculated members in the cube using MDX queries in Business Intelligence Development Studio with SSAS.
  • Designed ETL packages dealing with different data sources (SQL Server, CSV, Flat Files etc.) and loaded the data into target data destination by performing different kinds of transformations using SQL Server Integration Services (SSIS).
  • Created and modified SSRS reports based on T-SQL and stored procedures with input parameters.
  • Worked with RDBMS objects such as tables, views, indexes, stored procedures, functions, and triggers in MS SQL Server Management Studio (SSMS) to make data transformation services (DTS) effective.
  • Created stored procedures and functions to support efficient data storage and manipulation. Created check constraints to maintain data integrity.
  • Performed basic troubleshooting of conversion logic.
  • Transferred data from various sources like MS Excel, MS Access, and SQL Server using SSIS 2005 and then created reports from this data using SSRS.
  • Configured and maintained Report Manager and Report Server for SSRS.
  • Identified and added the report parameters and created the reports based on the requirements using SSRS 2008.
  • Tested and managed the SSIS 2005/2008 packages and was responsible for their security.

Environment: MS SQL Server, MS SQL Server Reporting Services, SQL, MS Excel, MS Word, Visual Studio 2005.

Confidential

SQL Developer

Responsibilities:

  • Responsible for the study of SAS Code, SQL Queries, Analysis enhancements and documentation of the system.
  • Used R, SAS, and SQL to manipulate data, and develop and validate quantitative models.
  • Took part in brainstorming sessions and proposed hypotheses, approaches, and techniques.
  • Analyzed data collected in stores (JCL jobs, stored procedures, and queries) and provided reports to the Business team by storing the data in Excel/SPSS/SAS files.
  • Performed Analysis and Interpretation of the reports on various findings.
  • Responsible for production support, including abend resolution and other production support activities, and for comparing seasonal trends in the data using Excel.
  • Used advanced Microsoft Excel functions such as pivot tables and VLOOKUP to analyze the data.
  • Successfully implemented migration of client's requirement application from Test/DSS/Model regions to production.
  • Prepared SQL scripts for ODBC and Teradata servers for analysis and modeling.
  • Provided complete assistance in analyzing trends in the financial time series data.
  • Performed various statistical tests to give the client a clear understanding of the data.
  • Implemented procedures for extracting Excel sheet data into the mainframe environment by connecting to the database using SQL.
  • Provided complete support to all regions (Test/Model/System/Regression/Production).
  • Actively involved in Analysis, Development, and Unit testing of the data.

Environment: R/RStudio, SQL Enterprise Manager, SAS, Microsoft Excel, Microsoft Access, Outlook.
