
Sr. Data Scientist/Data Architect Resume

NYC, NY


  • Over 8+ years of IT experience as a Data Scientist, Data Architect, Data Modeler, and Data Analyst in architecture, design, and development.
  • Experienced in developing machine learning models for real-world problems using R and Python.
  • Experience with statistical analysis environments such as MATLAB, SPSS, and SAS; comfortable with relational and non-relational databases.
  • Excellent understanding of machine learning techniques and algorithms, such as k-NN, Naive Bayes, SVM, Decision Forests, etc.
  • Experience with command-line scripting, data structures and algorithms.
  • Solid understanding of big data technologies like Hadoop, Spark, MapReduce, Pig, and Hive.
  • Experience with machine learning tools and libraries such as Scikit-learn, R, Spark, and Weka.
  • Experience working with large, real-world data: big, messy, incomplete, and full of errors.
  • Hands-on experience with NLP and mining of structured, semi-structured, and unstructured data.
  • Expertise in database performance tuning using Oracle hints, Explain Plan, TKPROF, partitioning, and indexes.
  • Good understanding of machine learning algorithms, supervised and unsupervised modeling techniques
  • Experienced with Object-Oriented Analysis and Design (OOAD) using UML, Rational Unified Process (RUP), Rational Rose, and Rational ClearCase.
  • Experience working with the business to determine needs and explain concepts in order to gain support for implementing solutions.
  • Experience with tools such as R, SAS, visualization packages, and open-source libraries.
  • Strong experience writing stored procedures, functions, triggers, and ad hoc queries using PL/SQL.
  • Experience working with business intelligence and data warehouse software, including SSAS, Pentaho, Cognos, OBIEE, QlikView, Greenplum Database, Amazon Redshift, and Azure Data Warehouse.
  • Experience with data visualization and dashboard tools, including Tableau, D3.js, and Power BI.
  • Experience with one or more reporting/dashboarding platforms (e.g., QlikView, MicroStrategy, Business Objects).
  • Experienced in generating and documenting metadata while designing OLTP and OLAP system environments.
  • Experienced in developing conceptual, logical, and physical models using Erwin, Sybase PowerDesigner, and Embarcadero ER/Studio.
  • Excellent understanding of, and working experience with, industry-standard Software Development Life Cycle (SDLC) methodologies, including Rational Unified Process (RUP), Agile, and Waterfall.
  • Experienced in integrating various relational and non-relational sources, such as DB2, Oracle, Netezza, SQL Server, NoSQL, COBOL, XML, and flat files, into a Netezza database.
  • Extensive experience in normalization (1NF, 2NF, 3NF, and BCNF) and de-normalization techniques for improved database performance in Data Warehouse/Data Mart environments.
  • Experience using Teradata ETL tools and utilities such as BTEQ, MultiLoad, FastLoad, TPT, and FastExport.
  • Proficient in Oracle tools and utilities such as TOAD, SQL*Plus, and SQL Developer.
  • Strong expertise in Metadata, Data Quality, Master Data Management (MDM), and Data Governance.
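As a concrete illustration of the classical algorithms listed above (k-NN, Naive Bayes, decision forests), here is a minimal plain-Python k-NN classifier; the toy data and function names are illustrative only, not from any engagement:

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbors.

    `train` is a list of (feature_tuple, label) pairs; distance is Euclidean.
    """
    dists = sorted((math.dist(x, query), label) for x, label in train)
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy data: two well-separated clusters (illustrative only).
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((5.0, 5.0), "B"), ((5.2, 4.8), "B"), ((4.9, 5.1), "B")]

print(knn_predict(train, (1.1, 1.0)))  # A
print(knn_predict(train, (5.1, 5.0)))  # B
```

In practice a library implementation (e.g., Scikit-learn, as listed above) would be used; this sketch only shows the voting logic.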


Data Modeling Tools: Erwin 9.x, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner, Embarcadero.

Databases: MS SQL Server 2016/2014/2012, Oracle 12c/11g/10g/9i, Teradata 16.0/15.1/14.x, MS Access 2010/2007, Sybase, Hive

Programming Languages: SQL, PL/SQL, UNIX shell scripting, Perl, R, Python

Operating Systems: Windows 10/8/7/XP, UNIX, MS-DOS, Sun Solaris.

Reporting Tools: Crystal Reports XI, Business Intelligence, SSRS, Business Objects 5.x/6.x, Cognos 7.0/6.0, Tableau, Informatica PowerCenter v10/7.1/6.2, Ab Initio, DataStage

Project Execution Methodologies: Ralph Kimball and Bill Inmon data warehousing methodology, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD)

BI Tools: Tableau, Tableau Server, Tableau Reader, SAP Business Objects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure Data Warehouse

Other Tools: MS Office suite (Word, Excel, MS Project, and Outlook), TOAD, BTEQ, FastLoad, MultiLoad, FastExport.


Confidential, NYC, NY

Sr. Data Scientist/Data Architect


  • Designed predictive models using the H2O machine learning platform and its Flow UI.
  • Worked on integrating H2O with Cisco's native data management and analytics platform, PNDA, for the SP-Optimization team.
  • Developed Spark Streaming jobs using MLlib and deployed them as a PNDA app, working alongside the DevOps team.
  • Worked on AWS S3 buckets and secure intra-cluster file transfer between PNDA and S3.
  • Analyzed auction mechanisms for a real-time bidding advertisement marketplace and built yield management tools.
  • Compiled data from various sources to perform complex analysis for actionable results.
  • Utilized machine learning algorithms such as linear regression, multivariate regression, Naive Bayes, Random Forests, k-means, and k-NN for data analysis.
  • Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS.
  • Implemented a Python-based distributed random forest via Python streaming.
  • Pipelined (ingest/clean/munge/transform) data for feature extraction toward downstream classification.
  • Measured efficiency of the Hadoop/Hive environment, ensuring SLA compliance.
  • Worked with Tableau report writers to test and validate the data integrity of reports.
  • Captured modeling requirements from senior stakeholders to benchmark functional requirements for SAS, R, and Python.
  • Analyzed source systems for data acquisition and architected data-gathering logic across different source systems.
  • Designed and tested predictive algorithms using historical data; worked with middleware tools such as TIBCO BusinessWorks and TIBCO iProcess.
  • Designed logical and physical data models of medium to high complexity, according to company standards.
  • Communicated strategies and process around data modeling and data architecture to cross functional groups.
  • Specified the overall data architecture for all areas and domains of the enterprise, including data acquisition, ODS, MDM, data warehouse, data provisioning, ETL, and BI.
  • Developed a Python-driven Data Lake (SQL Server, Archer, Oracle, etc.).
  • Created reports with Crystal Reports and scheduled them to run on a daily basis.
  • Performed data migration from an RDBMS to a NoSQL database, providing a complete picture of data deployed across various data systems.
  • Coordinated with Data Architects and Data Modelers to create new schemas and views in Netezza to improve report execution time; worked on creating optimized Data Mart reports.
  • Used the Agile Scrum methodology across the phases of the software development life cycle.
  • Gathered and analyzed existing physical data models for in scope applications and proposed the changes to the data models according to the requirements.
  • Involved in database development by creating Oracle PL/SQL Functions, Procedures and Collections.

Environment: R, SAS, Python, Hadoop, Hive, Pig, Netezza, MapReduce, Spark, MongoDB, MDM, PL/SQL, Oracle 12c, AWS, SQL Server 2015, NoSQL
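The linear regression work listed in this role can be sketched in a dependency-free form as closed-form simple least squares; the numbers below are a toy example, not project data:

```python
def fit_simple_ols(xs, ys):
    """Ordinary least squares for y ≈ intercept + slope * x.

    Uses the closed-form solution: slope = Sxy / Sxx,
    intercept = mean(y) - slope * mean(x).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Toy data lying exactly on y = 1 + 2x (illustrative only).
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
intercept, slope = fit_simple_ols(xs, ys)
print(intercept, slope)  # 1.0 2.0
```

A production pipeline (Spark MLlib, as mentioned above) handles the distributed, multivariate case; this shows only the underlying estimate.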

Confidential, Baltimore, MD

Sr. Data Scientist/Data Architect


  • Worked on enhancements to Pig scripts to include more topics, and created Pig UDFs to process user cookie data.
  • Developed a POC in Spark to query the dataset.
  • Performed data cleaning and data preparation tasks to convert data into a meaningful data set using R.
  • Built an analytics engine in R with machine learning for classification and regression analyses.
  • Gathered and translated business requirements into technical specifications for an in-house business intelligence (BI) tool.
  • Applied statistical modeling techniques, such as regression analysis and curve fitting, to the queried datasets.
  • Identified behavior and patterns in the data and visualized them through trend lines, box plots, and bar charts.
  • Identified explanatory and response variables, computed collinearity and variance of the datasets.
  • Worked on Data Ingestion using Sqoop from Oracle to HDFS.
  • Worked with datasets collected from educational institutions; performed cluster analysis and semantic grouping using the k-means algorithm.
  • Implemented supervised learning algorithms such as neural networks, SVM, decision trees, and Naive Bayes for advanced text analytics.
  • Performed advanced text analytics using deep learning techniques such as convolutional neural networks to determine the sentiment of texts.
  • Interacted with ETL architects, developers, DBDs, and DBAs on how the model works, and made enhancements based on new and future requirements.
  • Communicated strategies and processes around data modeling and data architecture to cross-functional groups.
  • Developed ETL processes, including stored procedures, SSIS packages, and SQL Server jobs.
  • Performed SAP Data Migration by using Business Objects Data Services as the ETL tool.
  • Developed PL/SQL triggers on master tables for automatic creation of primary keys and foreign keys.
  • Determined the tracking necessary to enable analytics of products and features by working closely with product and engineering partners.
  • Created SQL scripts, analyzed the data in MS Access/Excel, and worked on SQL and SAS script mapping.
  • Defined the overall EDW data architecture, including the EDW platform, ETL tools and processes, ODS, data marts, and data modeling tools.
  • Collaborated with Data Management, the Business Intelligence team, stakeholders, and team members to define, design, build, test, deploy, and support a full data warehouse environment.
  • Responsible for leading the architectural development of the metadata and reporting framework.
  • Performed database testing (Data Warehouse/ETL) on ETL applications, validating the data in sources (SAP, Oracle) and targets (EDW).
  • Involved in designing the DB2 process to extract, translate, and load data from the OLTP Oracle database system to the Teradata data warehouse.

Environment: Python, R, SAP, Oracle, OLAP, OLTP, PL/SQL, Hadoop, Hive, Pig, SQL Server 2015, Sqoop, Spark
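The k-means cluster analysis mentioned in this role can be sketched as a minimal plain-Python implementation of Lloyd's algorithm; the toy points and initial centers are illustrative assumptions:

```python
import math

def kmeans(points, centers, iters=20):
    """Lloyd's algorithm: assign each point to its nearest center,
    then recompute each center as the mean of its assigned points."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = []
        for old, members in zip(centers, clusters):
            if members:
                new_centers.append(tuple(sum(dim) / len(members)
                                         for dim in zip(*members)))
            else:
                new_centers.append(old)  # keep an empty cluster's center
        centers = new_centers
    return centers, clusters

# Two obvious toy clusters (illustrative only).
points = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
centers, clusters = kmeans(points, centers=[(0, 0), (9, 9)])
print(centers)
```

Real workloads would use Spark MLlib or R (as listed in the Environment line); this only shows the assign/recompute loop.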

Confidential, San Francisco, CA

Sr. Data Scientist/Data Architect


  • Worked with Apache Hadoop components such as HDFS, MapReduce, HiveQL, HBase, Pig, Sqoop, Oozie, Mahout, Cassandra, and MongoDB for big data and big data analytics.
  • Worked on the development of a Data Warehouse and Business Intelligence architecture involving data integration and the conversion of data from multiple sources and platforms.
  • Developed statistical models using software such as R and SPSS.
  • Sorted and identified patterns and clusters of facts using data analysis software such as SPSS and R.
  • Followed the Type 2 dimension methodology to design and maintain historical data.
  • Implemented Apache Spark, Hadoop, and SparkR packages to provision big data analytics.
  • Used Maven extensively for building jar files of Map Reduce programs and deployed to Cluster.
  • Used Spark for test data analytics using MLlib and analyzed the performance to identify bottlenecks.
  • Used supervised learning techniques such as classifiers and neural networks to identify patterns in these data sets.
  • Developed core Data Science models
  • Worked with and merged data in multiple formats/sources (CSV, fetched from SQL Server/SAP HANA, etc.).
  • Identified risk features and combined them into groups with Bayesian predictors based on the patterns in which they indicate fraudulent activity.
  • Worked on Linux shell scripts for business processes and loading data from different interfaces to HDFS.
  • Built an in-depth understanding of the business domains and available data assets, and performed exploratory statistics on diverse datasets.
  • Collaborated with the engineering team to transfer business understanding, productize the model, and validate output with business users.
  • Maintained metadata (data definitions of table structures) and version control for the data model.
  • Designed and developed Oracle PL/SQL and shell scripts, data import/export, data conversions, and data cleansing.
  • Worked with software engineers to put predictive models into production.
  • Built Cloud architecture on Amazon AWS.
  • Responsible for the development of target data architecture, design principles, quality control, and data standards for the organization
  • Worked with DBA to create Best-Fit Physical Data Model from the logical Data Model using Forward Engineering in Erwin.
  • Determine data reduction methodologies for dealing with noisy data (correlation matrix, principal components analysis, clustering), missing values, and outliers
  • Set up a Hadoop cluster on a managed Hadoop framework using Amazon EMR (Elastic MapReduce) on EC2.
  • Used S3 buckets to store the JARs and input datasets, and used DynamoDB to store the processed output from the input data set.

Environment: R, HDFS, MapReduce, HiveQL, HBase, Pig, Sqoop, Oozie, Mahout, Cassandra, MongoDB, Linux shell scripts, DynamoDB, SAP HANA, Amazon AWS, MLlib
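The supervised text analytics described in this role (Naive Bayes for sentiment) can be sketched as a minimal multinomial Naive Bayes classifier with add-one smoothing; the tiny corpus and labels are illustrative only:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Multinomial Naive Bayes training.
    `docs` is a list of (token_list, label) pairs."""
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Pick the label maximizing log P(label) + sum of log P(token|label),
    with add-one (Laplace) smoothing over the vocabulary."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best, best_lp = None, -math.inf
    for label, n_docs in label_counts.items():
        lp = math.log(n_docs / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for t in tokens:
            lp += math.log((word_counts[label][t] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Toy sentiment corpus (illustrative only).
docs = [("great product love it".split(), "pos"),
        ("excellent quality great value".split(), "pos"),
        ("terrible waste of money".split(), "bad"),
        ("awful quality terrible support".split(), "bad")]
model = train_nb(docs)
print(predict_nb(model, "great value".split()))       # pos
print(predict_nb(model, "terrible quality".split()))  # bad
```

The deep-learning sentiment work mentioned above replaces these bag-of-words estimates with learned representations; this sketch only shows the probabilistic baseline.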

Confidential, Tampa, FL

Sr. Data Analyst/Data Modeler


  • Responsible for determining the fundamental organization of the system, the various components that constitute it, and the principles guiding its design and evolution.
  • Involved in requirements gathering procedures along with the business analysts.
  • Translated requirements into sustainable and scalable conceptual and logical data models.
  • Developed, delivered, and validated the logical and physical data models through collaboration with team leads, SMEs, business users, and other team members using ERwin 4.0.
  • Used star/snowflake schemas in creating dimensional models.
  • Designed, built, and maintained enterprise conceptual, logical, and physical multidimensional data models and data marts that adhere to department standards, policies, procedures, and industry best practices.
  • Created documentation as needed to support data architecture.
  • Developed SQL Queries to fetch complex data from different tables in remote databases using joins, database links and Bulk collects.
  • Maintained warehouse metadata, naming standards and warehouse standards.
  • Used Ab Initio to design and develop jobs for extracting, cleansing, transforming, integrating, and loading data into different Data Marts.
  • Assisted the Lead Data Management Architect in developing the best option for entering business terms and definitions, and maintained the metadata appropriately for the business.
  • Responsible for loading and distributing data sets into database tables across various servers; automated data loads, transfers, and data validation procedures.
  • Assisted business analysts with Teradata utilities.
  • Used Teradata utilities (BTEQ, MultiLoad, and FastLoad) to maintain the database.
  • Performed ad-hoc queries using SQL, PL/SQL, MS Access, MS Excel, and UNIX to meet business analysts' needs.
  • Developed test plans and assisted in System Integration Testing (SIT).
  • Documentation included design documents, SOPs, mapping specifications, process flowcharts, etc.
  • Involved in quality assurance to ensure quality, validity and accuracy of data across the servers.

Environment: ERwin 4.0, SQL, PL/SQL, MS Office (Word, PowerPoint, Access, Excel, Outlook), Lotus Notes, Oracle 9i, SQL Server 2005, Business Objects, Informatica.
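The ad-hoc SQL and join-based reporting described in this role can be illustrated with an aggregate query over a join; this sketch uses an in-memory SQLite database as a stand-in for the production sources, and all table and column names are assumptions:

```python
import sqlite3

# In-memory database standing in for the remote Oracle/SQL Server sources
# (schema is illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id),
                         amount REAL);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);
""")

# Ad-hoc aggregate across a join, of the kind described in the bullets above.
rows = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY total DESC
""").fetchall()
print(rows)  # [('Acme', 350.0), ('Globex', 75.0)]
```

Against Oracle or Teradata the same query shape applies, with database links or utilities (BTEQ) replacing the SQLite connection.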
