Data Scientist Resume

Chicago, IL

SUMMARY

  • 8+ years of hands-on experience and comprehensive industry knowledge of Machine Learning, Statistical Modeling, Data Analytics, Data Modeling, Data Architecture, Data Analysis, Data Mining, Text Mining & Natural Language Processing (NLP), Artificial Intelligence algorithms, Business Intelligence, Analytics Models (like Decision Trees, Linear & Logistic Regression), Hadoop (Hive, Pig), R, Python, Spark, Scala, MS Excel, SQL, PostgreSQL and Erwin.
  • Strong knowledge of all phases of the SDLC (Software Development Life Cycle), from analysis and design through development, testing, implementation and maintenance.
  • Experienced in Data Modeling techniques employing Data warehousing concepts like star/snowflake schema and Extended Star.
  • Expertise in applying data mining techniques and optimization techniques in B2B and B2C industries.
  • Expertise in writing functional specifications, translating business requirements to technical specifications, and creating, maintaining and modifying database design documents with detailed descriptions of logical entities and physical tables.
  • Excellent knowledge of Machine Learning, Mathematical Modeling and Operations Research. Comfortable with R, Python, SAS, Weka, MATLAB and relational databases. Deep understanding of and exposure to the Big Data ecosystem.
  • Expertise in Data Analysis, Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import, and Data Export through the use of multiple ETL tools such as Informatica Power Center.
  • Proficient in Machine Learning, Data/Text Mining, Statistical Analysis & Predictive Modeling.
  • Expertise in data acquisition, storage, analysis, integration, predictive modeling, logistic regression, decision trees, data mining methods, forecasting, factor analysis, cluster analysis, ANOVA and other advanced statistical techniques.
  • Excellent knowledge and experience in OLTP/OLAP System Study with focus on Oracle Hyperion Suite of technology, developing Database Schemas like Star schema and Snowflake schema (Fact Tables, Dimension Tables) used in relational, dimensional and multidimensional modeling, and physical and logical data modeling using the Erwin tool.
  • Experienced in building data models using machine learning techniques for Classification, Regression, Clustering and Associative mining.
  • Expert in creating PL/SQL Schema objects like Packages, Procedures, Functions, Subprograms, Triggers, Views, Materialized Views, Indexes, Constraints, Sequences, Exception Handling, Dynamic SQL/Cursors, Native Compilation, Collection Types, Record Type, Object Type using SQL Developer.
  • Working experience with the Hadoop ecosystem and the Apache Spark framework, including HDFS, MapReduce, HiveQL, SparkSQL and PySpark.
  • Very good experience and knowledge in provisioning virtual clusters on the AWS cloud, including services like EC2, S3, and EMR.
  • Proficient in data visualization tools such as Tableau, Python Matplotlib, R Shiny to create visually powerful and actionable interactive reports and dashboards.
  • Excellent Tableau developer with expertise in building and publishing customized interactive reports and dashboards with customized parameters and user filters using Tableau (9.x/10.x).
  • Experienced in Agile methodology and SCRUM process.
  • Strong business sense and ability to communicate data insights to both technical and nontechnical clients.

TECHNICAL SKILLS

Databases: MySQL, PostgreSQL, Oracle, HBase, Amazon Redshift, MS SQL Server 2016/2014/2012/2008 R2/2008, Teradata

Statistical Methods: Hypothesis Testing, ANOVA, Time Series, Confidence Intervals, Bayes' Law, Principal Component Analysis (PCA), Dimensionality Reduction, Cross-Validation, Auto-correlation

Machine Learning: Regression analysis, Bayesian Method, Decision Tree, Random Forests, Support Vector Machine, Neural Network, Sentiment Analysis, K-Means Clustering, KNN and Ensemble Method

Hadoop Ecosystem: Hadoop 2.x, Spark 2.x, MapReduce, Hive, HDFS, Sqoop, Flume

Reporting Tools: Tableau Suite of Tools 10.x, 9.x, 8.x, which includes Desktop, Server and Online; SQL Server Reporting Services (SSRS)

Data Visualization: Tableau, Matplotlib, Seaborn, ggplot2

Languages: Python (2.x/3.x), R, SAS, SQL, T-SQL

Operating Systems: PowerShell, UNIX/UNIX Shell Scripting (via PuTTY client), Linux and Windows

PROFESSIONAL EXPERIENCE

Confidential, Chicago IL

Data Scientist

Responsibilities:

  • Provided architectural leadership in shaping strategic business technology projects, with an emphasis on application architecture.
  • Utilized domain knowledge and application portfolio knowledge to play a key role in defining the future state of large, business technology programs.
  • Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
  • Created ecosystem models (e.g. conceptual, logical, physical, canonical) that are required for supporting services within the enterprise data architecture (conceptual data model for defining the major subject areas used, ecosystem logical model for defining standard business meaning for entities and fields, and an ecosystem canonical model for defining the standard messages and formats to be used in data integration services throughout the ecosystem).
  • Used Pandas, NumPy, seaborn, SciPy, Matplotlib, Scikit-learn, NLTK in Python for developing various machine learning algorithms and utilized machine learning algorithms such as linear regression, multivariate regression, naive Bayes, Random Forests, K-means, & KNN for data analysis.
  • Demonstrated experience in the design and implementation of statistical models, predictive models, enterprise data models, metadata solutions and data life cycle management in both RDBMS and Big Data environments.
  • Analyzed large data sets, applied machine learning techniques, and developed and enhanced predictive and statistical models by leveraging best-in-class modeling techniques.
  • Worked on database design, relational integrity constraints, OLAP, OLTP, Cubes and Normalization (3NF) and De-normalization of database.
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib and Python with a broad variety of machine learning methods, including classification, regression and dimensionality reduction.
  • Tested Complex ETL Mappings and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.

Environment: AWS, R, Python, HDFS, OLTP, Oracle 12c, Hive, OLAP, DB2, MS Excel, MapReduce, SQL, XML, MLlib, Regression, Cluster analysis, Random forest, Data Mining, Seaborn, Jupyter, TensorFlow, K-means.

Confidential, MN

Data Scientist

Responsibilities:

  • Documented logical, physical, relational and dimensional data models. Designed the Data Marts in dimensional data modeling using star and snowflake schemas.
  • Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the Big Data Hadoop Distributed File System and PIG to pre-process the data.
  • Transformed the Logical Data Model into a Physical Data Model in Erwin, ensuring Primary Key and Foreign Key relationships in the PDM, consistency of definitions of data attributes and Primary Index considerations.
  • Predicted store sales at store and SKU level using a linear regression model with an error of 1% in 95% of stores, using statistical analytical tools and algorithms, and helped the retailer integrate the results into their sales and operations tools.
  • Built Tableau dashboards that tracked changes in customer behavior before and after campaign launch; the ROI measurements helped the retailer strategically extend the campaigns to other potential markets.
  • Built forecasting models by applying ARIMA and produced statistical analysis on the Big Data.
  • Applied modeling and exponential smoothing to multivariate time series data.
  • Developed 11 customer segments using K-means and Gaussian mixture techniques; the clusters helped the retailer understand lifetime values and design strategies to boost per-household values.
  • Developed a machine learning system that predicted purchase probability at a particular offer based on the customer's real-time location data and past purchase behavior; these predictions are being used for mobile coupon pushes.
  • Used Pandas, NumPy, seaborn, SciPy, Matplotlib, Scikit-learn, NLTK in Python for developing various machine learning algorithms and utilized machine learning algorithms such as linear regression, multivariate regression, naive Bayes, Random Forests, K-means, & KNN for data analysis.
  • Checked back-end database connectivity using JavaScript and JDBC connections to the databases.
  • Worked with data compliance and data governance teams to maintain data models, metadata and data dictionaries, and to define source fields and their definitions.
  • Performed Source System Analysis, database design, data modeling for the warehouse layer using MLDM concepts and package layer using Dimensional modeling.
  • Handled importing data from various data sources, performed transformations using Hive, Map Reduce, and loaded data into HDFS.
  • Created Hive queries that helped analysts spot emerging trends by comparing fresh data with EDW tables and historical metrics, and processed the data using HQL (similar to SQL) on top of MapReduce.
  • Created tables, sequences, synonyms, joins, functions and operators in Netezza database.
  • Analyzed web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration and the most purchased product on the website, and managed and reviewed Hadoop log files.
  • Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
  • Created SSIS Packages using Pivot Transformation, Execute SQL Task, Data Flow Task, etc to import data into the data warehouse.
  • Performed administrative tasks, including creation of database objects such as database, tables, and views, using SQL DCL, DDL, and DML requests.
  • Coded new tables, views and modifications as well as PL/pgSQL stored procedures, data types, triggers and constraints in PostgreSQL databases.
  • Built and published customized interactive reports and dashboards, report scheduling using Tableau server.
  • Used SQL*Loader extensively with control files to load data from legacy systems into Oracle databases.
  • Used Oracle External Tables feature to read the data from flat files into Oracle staging tables.

Environment: Teradata, PostgreSQL, Big Data Hadoop, HDFS, Pig, Hive, Python, MapReduce, Time series analysis, ARIMA models, MDM, SQL Server, Netezza, DB2, DFAST, Tableau, Architecture, SAS/Graph, SAS/SQL, SAS/Connect and SAS/Access.

Confidential, Austin, CA

Data Analyst/Data Scientist

Responsibilities:

  • Gathered, analyzed, documented and translated application requirements into data models, and supported standardization of documentation and the adoption of standards and practices related to data and applications.
  • Participated in Data Acquisition with the Data Engineer team to extract historical and real-time data by using Sqoop, Pig, Flume, Hive, MapReduce and HDFS.
  • Wrote user defined functions (UDFs) in Hive to manipulate strings, dates and other data.
  • Performed data cleaning, feature scaling and feature engineering using the pandas and NumPy packages in Python.
  • Applied clustering algorithms (hierarchical, K-means) using scikit-learn and SciPy.
  • Created the logical data model from the conceptual model and converted it into the physical database design using ERWIN.
  • Mapped business needs/requirements to subject area model and to logical enterprise model.
  • Worked with DBAs to create a best-fit physical data model from the logical data model.
  • Redefined many attributes and relationships in the reverse engineered model and cleansed unwanted tables/ columns as part of data analysis responsibilities.
  • Enforced referential integrity in the OLTP data model for consistent relationships between tables and efficient database design.
  • Developed the data warehouse model (star schema) for the proposed central model for the project.
  • Created 3NF business area data models with de-normalized physical implementation, along with data and information requirements analysis, using the ERWIN tool.
  • Worked on snowflaking the dimensions to remove redundancy.
  • Worked with Teradata 14 tools like FastLoad, MultiLoad, TPump, FastExport, Teradata Parallel Transporter (TPT) and BTEQ.
  • Helped in migration and conversion of data from the Sybase database into Oracle database, preparing mapping documents and developing partial SQL scripts as required.
  • Generated ad-hoc SQL queries using joins, database connections and transformation rules to fetch data from legacy Oracle and SQL Server database systems.

Environment: Machine learning (KNN, Clustering, Regressions, Random Forest, SVM, Ensemble), Linux, Python 2.x (Scikit-Learn/SciPy/NumPy/Pandas), R, Tableau (Desktop 8.x/Server 8.x), Hadoop, MapReduce, HDFS, Hive, Pig, HBase, Sqoop, Flume, Oracle 11g, SQL Server 2012

Confidential, Iowa

BI Developer/Data Analyst

Responsibilities:

  • Used SSIS to create ETL packages to Validate, Extract, Transform and Load data into Data Warehouse and Data Mart.
  • Maintained and developed complex SQL queries, stored procedures, views, functions and reports that meet customer requirements using Microsoft SQL Server 2008 R2.
  • Created Views and Table-valued Functions, Common Table Expression (CTE), joins, complex subqueries to provide the reporting solutions.
  • Optimized query performance by modifying T-SQL queries, removing unnecessary columns and redundant data, normalizing tables, establishing joins and creating indexes.
  • Created SSIS packages using Pivot Transformation, Fuzzy Lookup, Derived Columns, Conditional Split, Aggregate, Execute SQL Task, Data Flow Task and Execute Package Task.
  • Migrated data from SAS environment to SQL Server 2008 via SQL Server Integration Services (SSIS).
  • Developed and implemented several types of Financial Reports (Income Statement, Profit & Loss Statement, EBIT, ROIC Reports) by using SSRS.
  • Developed parameterized dynamic performance reports (Gross Margin, Revenue based on geographic regions, Profitability based on web sales and smartphone app sales), ran the reports every month and distributed them to the respective departments through mailing server subscriptions and SharePoint server.
  • Designed and developed new reports and maintained existing reports using Microsoft SQL Server Reporting Services (SSRS) and Microsoft Excel to support the firm's strategy and management.
  • Created sub-reports, drill down reports, summary reports, parameterized reports, and ad-hoc reports using SSRS.
  • Used SAS/SQL to pull data out from databases and aggregate to provide detailed reporting based on the user requirements.
  • Used SAS for pre-processing data, SQL queries, data analysis, generating reports, graphics, and statistical analyses.
  • Provided statistical research analyses and data modeling support for mortgage product.
  • Performed analyses such as regression analysis, logistic regression, discriminant analysis and cluster analysis using SAS programming.

Environment: SQL Server 2008 R2, DB2, Oracle, SQL Server Management Studio, SAS/BASE, SAS/SQL, SAS/Enterprise Guide, MS BI Suite (SSIS/SSRS), T-SQL, SharePoint 2010, Visual Studio 2010, Agile/SCRUM

Confidential

Data Analyst

Responsibilities:

  • Wrote SQL queries for data validation on the backend systems and used various tools like TOAD & DbVisualizer for DBMS (Oracle).
  • Performed data analysis, back-end database testing, data modeling and development of SQL queries to solve problems and meet users' needs for database management in the Data Warehouse.
  • Utilized object-oriented languages, concepts, database design, star schemas and databases.
  • Created algorithms as needed to manage and implement proposed solutions.
  • Participated in test planning and test execution for functional, system, integration, regression, UAT (User Acceptance Testing), load and performance testing.
  • Worked with test automation tools for recording/coding in Database, and executed them in regression testing cycles.
  • Transferred data from various OLTP data sources, such as Oracle, MS Access, MS Excel, Flat files, CSV files into SQL Server.
  • Worked with databases (DB2, Oracle DM, SQL Server) for database testing and maintenance.
  • Involved in writing and executing User Acceptance Testing (UAT) wif end users.
  • Involved in post-implementation validations after changes had been made to the Data Marts.
  • Charted graphs and reports in QC to show the percentage of test cases passed, and thereby the percentage of quality achieved, and uploaded the status daily to ART reports, an in-house tool.
  • Performed extensive Data Validation and Data Verification against the Data Warehouse.
  • Used UNIX to check the Data marts, Tables and Updates made to the tables.
  • Wrote advanced SQL queries to query the data from Data marts and Landings to verify the changes that had been made.
  • Involved in client requirement gathering, participated in discussion & brainstorming sessions and documented requirements.
  • Validated and profiled flat file data into Teradata tables using UNIX shell scripts.
  • Actively participated in functional, system and user acceptance testing on all builds and supervised releases to ensure system/functionality integrity.
  • Closely interacted with designers and software developers to understand application functionality and navigational flow and kept them updated about business user sentiments.
  • Interacted with developers to resolve different quality-related issues.
  • Wrote and executed manual test cases for functional, GUI, and regression testing of the application to make sure that new enhancements do not break working features.
  • Wrote and executed manual test cases in HP Quality Center.
  • Wrote test plans for positive and negative scenarios for GUI and functional testing.
  • Involved in writing SQL queries and stored procedures using Query Analyzer and matched the results retrieved from the batch log files.
  • Created Project Charter documents & Detailed Requirement document and reviewed them with Development & other stakeholders.

Environment: Subversion, Tortoise SVN, Jira, Agile-Scrum, Web Services, Mainframe, Oracle, Perl, UNIX, LINUX, Shell Scripts, UML, Quality Center, RequisitePro, SQL, MS Visio, MS Project, Excel, PowerPoint, Word, SharePoint, Win XP/7 Enterprise.

Confidential

Data Analyst

Responsibilities:

  • Gathered business requirements, interacted with the Users, Designers, Developers, Project Manager, and SMEs to get a better understanding of the business processes, and analyzed and optimized the process.
  • Created reports containing all the information regarding the sales, traits and analytics data using Tableau.
  • Tuned much of the code and rewrote it as needed to utilize newer features such as bulk collects/DML and function-based indexes, converting from dynamic to static SQL when possible.
  • Responsible for ensuring data integrity using SQL and coordinating efforts with testing and implementation.
  • Responsible for the development and execution of test plans.
  • Tracked the extensive database of customers across the country, analyzed it using SQL queries and Python, and visualized the data by creating crystalized reports using Tableau, IBM Cognos and MySQL.
  • Gathered business data requirements for the new data warehouse as per the compliance standards.
  • Worked with the marketing team on customer data analysis for different geographic regions using Siebel CRM.
  • Expertise in the concepts of Data Warehousing, Data Marts, Dimensional Modeling, Fact and Dimensional Tables.
  • Worked with Data Architects, DBA and Development team and assisted in building data marts.
  • Performed Gap Analysis, statistical analysis and facilitated data migration from legacy systems.

Environment: HTML/CSS, JS, Bootstrap, Excel/Tableau 8.1, Oracle SQL developer 4.1.5, MS Office Suite
