- Over 8+ years of IT experience as a Data Scientist, Data Architect, Data Modeler, and Data Analyst in architecture, design, and development.
- Experienced in developing machine learning models for real-world problems using R and Python.
- Experience in the use of statistical analysis environments such as MATLAB, SPSS, or SAS, and comfortable with relational and non-relational databases.
- Excellent understanding of machine learning techniques and algorithms, such as k-NN, Naive Bayes, SVM, and Decision Forests (a brief k-NN sketch follows this summary).
- Experience with command-line scripting, data structures and algorithms.
- Solid understanding of big data technologies such as Hadoop, Spark, MapReduce, Pig, and Hive.
- Experience with machine learning tools and libraries such as scikit-learn, R, Spark, and Weka.
- Experience working with large, real-world data: big, messy, incomplete, and full of errors.
- Hands-on experience with NLP and mining of structured, semi-structured, and unstructured data.
- Expertise in database performance tuning using Oracle hints, Explain Plan, TKPROF, partitioning, and indexes.
- Good understanding of machine learning algorithms, supervised and unsupervised modeling techniques
- Experienced with Object Oriented Analysis and Design (OOAD) using UML, Rational Unified Process (RUP), Rational Rose, and Rational ClearCase.
- Experience working with business stakeholders to determine needs and explain concepts, gaining support for implementing solutions.
- Experience with tools such as R, SAS, visualization packages, and open-source tools.
- Strong experience writing stored procedures, functions, triggers, and ad-hoc queries using PL/SQL.
- Experience working with business intelligence and data warehouse software, including SSAS, Pentaho, Cognos, OBIEE, QlikView, Greenplum Database, Amazon Redshift, and Azure Data Warehouse.
- Experience with data visualization tools or dashboard tools, including Tableau, D3.js, PowerBI
- Experience with one or more reporting / dashboarding platforms (e.g. Qlikview, MicroStrategy, Business Objects)
- Experienced in generating and documenting metadata while designing OLTP and OLAP systems environments.
- Experienced in developing conceptual, logical, and physical models using Erwin, Sybase PowerDesigner, and Embarcadero ER/Studio.
- Excellent understanding of and working experience with industry-standard Software Development Life Cycle (SDLC) methodologies, including Rational Unified Process (RUP), Agile, and Waterfall.
- Experienced in integration of various relational and non-relational sources, such as DB2, Oracle, Netezza, SQL Server, NoSQL, COBOL, XML, and flat files, into a Netezza database.
- Extensive experience in normalization (1NF, 2NF, 3NF, and BCNF) and de-normalization techniques for improved database performance in Data Warehouse/Data Mart environments.
- Experience in using Teradata ETL tools and utilities such as BTEQ, MultiLoad, FastLoad, TPT, and FastExport.
- Proficient in Oracle tools and utilities such as TOAD, SQL*Plus, and SQL Developer.
- Strong expertise in Metadata, Data Quality, Master Data Management (MDM), and Data Governance.
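A minimal sketch of the kind of scikit-learn classification referenced above, using k-NN; the Iris dataset, split ratio, and k value are illustrative stand-ins, not project data.

```python
# Hedged k-NN classification sketch: train/test split plus accuracy check.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in dataset, not project data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print(f"Test accuracy: {clf.score(X_test, y_test):.3f}")
```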
Data Modeling Tools: Erwin 9.x, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner, Embarcadero.
Databases: MS SQL Server 2016/2014/2012, Oracle 12c/11g/10g/9i, Teradata 16.0/15.1/14.x, MS Access 2010/2007, Sybase, Hive
Programming Languages: SQL, PL/SQL, UNIX shell scripting, Perl, R, Python
Operating Systems: Windows 10/8/7/XP, UNIX, MS-DOS, Sun Solaris.
Reporting Tools: Crystal Reports XI, Business Intelligence, SSRS, Business Objects 5.x/6.x, Cognos 7.0/6.0, Tableau, Informatica PowerCenter 10/7.1/6.2, Ab Initio, DataStage
Project Execution Methodologies: Ralph Kimball and Bill Inmon data warehousing methodologies, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD)
BI Tools: Tableau, Tableau Server, Tableau Reader, SAP Business Objects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure Data Warehouse
Other Tools: MS Office suite (Word, Excel, MS Project, and Outlook), TOAD, BTEQ, FastLoad, MultiLoad, FastExport.
Confidential, NYC, NY
Sr. Data Scientist/Data Architect
- Designed predictive models using the H2O machine learning platform and its Flow UI.
- Worked on integrating H2O with PNDA, Cisco's data management and analytics platform, for the SP-Optimization team.
- Developed Spark Streaming jobs using MLlib and deployed them as a PNDA app, working alongside the DevOps team.
- Worked on AWS S3 buckets and secure intra-cluster file transfer between PNDA and S3.
- Analyzed auction mechanisms for a real-time bidding advertisement marketplace and built yield management tools.
- Compiled data from various sources to perform complex analysis for actionable results.
- Utilized machine learning algorithms such as linear regression, multivariate regression, Naive Bayes, Random Forests, k-means, and k-NN for data analysis.
- Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS (a brief Spark sketch follows this section).
- Implemented a Python-based distributed random forest via Python streaming.
- Pipelined (ingest/clean/munge/transform) data for feature extraction toward downstream classification.
- Measured efficiency of the Hadoop/Hive environment, ensuring SLAs were met.
- Worked with Tableau report writers to test and validate data integrity of reports.
- Captured modeling requirements from senior stakeholders to benchmark functional requirements for SAS, R, and Python.
- Analyzed source systems for data acquisition and architected data-gathering logic across different source systems.
- Designed and tested predictive algorithms using historical data; worked with middleware tools such as TIBCO BusinessWorks and TIBCO iProcess.
- Designed logical and physical data models of medium to high complexity, according to company standards.
- Communicated strategies and processes around data modeling and data architecture to cross-functional groups.
- Specified the overall data architecture for all areas and domains of the enterprise, including data acquisition, ODS, MDM, Data Warehouse, data provisioning, ETL, and BI.
- Developed a Python-driven data lake (SQL Server, Archer, Oracle, etc.).
- Created reports with Crystal Reports and scheduled them to run on a daily basis.
- Performed data migration from an RDBMS to a NoSQL database, providing a complete picture of data deployed across various data systems.
- Coordinated with Data Architects and Data Modelers to create new schemas and views in Netezza to improve report execution time; worked on creating optimized Data Mart reports.
- Used the Agile Scrum methodology to build out the different phases of the software development life cycle.
- Gathered and analyzed existing physical data models for in-scope applications and proposed changes to the data models according to the requirements.
- Involved in database development by creating Oracle PL/SQL Functions, Procedures and Collections.
Environment: R, SAS, Python, Hadoop, Hive, Pig, Netezza, MapReduce, Spark, MongoDB, MDM, PL/SQL, Oracle 12c, AWS, SQL Server 2015, NoSQL
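A minimal PySpark sketch of the kind of Spark-based random forest modeling described in this role; the S3 path, column names (f1, f2, f3, label), and hyperparameters are hypothetical placeholders, not project specifics.

```python
# Hedged sketch: assemble features, train a random forest, score on holdout.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("rf-example").getOrCreate()

# Hypothetical dataset with numeric feature columns and a binary label.
df = spark.read.csv("s3://example-bucket/training.csv",
                    header=True, inferSchema=True)

assembler = VectorAssembler(inputCols=["f1", "f2", "f3"],
                            outputCol="features")
train, test = assembler.transform(df).randomSplit([0.8, 0.2], seed=42)

rf = RandomForestClassifier(labelCol="label", featuresCol="features",
                            numTrees=100)
model = rf.fit(train)

auc = BinaryClassificationEvaluator(labelCol="label").evaluate(
    model.transform(test))
print(f"Test AUC: {auc:.3f}")
```

This uses Spark's DataFrame-based pyspark.ml API; the RDD-based MLlib API mentioned in the bullets follows the same train/evaluate pattern.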
Confidential, Baltimore, MD
Sr. Data Scientist/Data Architect
- Worked on enhancements to Pig scripts to include more topics, and created Pig UDFs to process user cookie data.
- Developed a POC in Spark to query the dataset.
- Performed data cleaning and data preparation tasks to convert data into a meaningful data set using R.
- Built an analytics engine in R with machine learning for classification and regression analyses.
- Gathered and translated business requirements into technical specifications for an in-house business intelligence (BI) tool.
- Applied statistical modeling techniques, such as regression analysis and curve fitting, to the queried datasets.
- Identified behavior and patterns in the data and visualized them through trend lines, box plots, and bar charts.
- Identified explanatory and response variables, computed collinearity and variance of the datasets.
- Worked on Data Ingestion using Sqoop from Oracle to HDFS.
- Worked with datasets collected from educational institutions and performed cluster analysis and semantic grouping using the k-means algorithm (see the clustering sketch after this section).
- Implemented supervised learning algorithms such as neural networks, SVM, decision trees, and Naive Bayes for advanced text analytics.
- Performed advanced text analytics using deep learning techniques such as convolutional neural networks to determine the sentiment of texts.
- Interacted with ETL Architects, Developers, DBDs, and DBAs on how the model works, and made enhancements based on new and future requirements.
- Communicated strategies and processes around data modeling and data architecture to cross-functional groups.
- Developed ETL processes, including stored procedures, SSIS packages, and SQL Server jobs.
- Performed SAP Data Migration by using Business Objects Data Services as the ETL tool.
- Developed PL/SQL triggers on master tables for automatic creation of primary keys and foreign keys.
- Determined the tracking necessary to enable analytics of products and features by working closely with product and engineering partners.
- Created SQL scripts, analyzed the data in MS Access/Excel, and worked on SQL and SAS script mapping.
- Defined the overall EDW data architecture, including the EDW platform, ETL tools and processes, ODS, data marts, and data modeling tools.
- Collaborated with Data Management, the Business Intelligence team, stakeholders, and team members to define, design, build, test, deploy, and support a full data warehouse environment.
- Responsible for leading the architectural development of the metadata and reporting framework.
- Performed database testing (Data Warehouse/ETL) on ETL applications, validating the data in sources (SAP, Oracle) and targets (EDW).
- Involved in designing the DB2 process to extract, transform, and load data from the OLTP Oracle database system to the Teradata data warehouse.
Environment: Python, R, SAP, Oracle, OLAP, OLTP, PL/SQL, Hadoop, Hive, Pig, SQL Server 2015, Sqoop, Spark
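A minimal scikit-learn sketch of the k-means cluster analysis mentioned above; synthetic two-feature data stands in for the institutional datasets, and the cluster count is an assumption.

```python
# Hedged k-means sketch: scale features, fit clusters, inspect results.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical two-feature dataset (e.g., scores vs. attendance).
X = rng.normal(size=(300, 2))

X_scaled = StandardScaler().fit_transform(X)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)

print(kmeans.cluster_centers_)        # centroid of each cluster
print(np.bincount(kmeans.labels_))    # cluster sizes
```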
Confidential, San Francisco, CA
Sr. Data Scientist/Data Architect
- Worked with Apache Hadoop ecosystem components such as HDFS, MapReduce, HiveQL, HBase, Pig, Sqoop, Oozie, Mahout, Cassandra, and MongoDB for big data and big data analytics.
- Worked on the development of Data Warehouse, Business Intelligence architecture that involves data integration and the conversion of data from multiple sources and platforms.
- Developed statistical models using software such as R and SPSS.
- Sorted and identified patterns and clusters of facts with the use of data analysis software such as SPSS and R.
- Followed the Type 2 dimension (SCD2) methodology to accommodate designing and maintaining history data.
- Implemented Apache Spark, Hadoop, and SparkR packages to provision big data analytics.
- Used Maven extensively for building JAR files of MapReduce programs and deploying them to the cluster.
- Used Spark for test data analytics using MLlib and analyzed the performance to identify bottlenecks.
- Used supervised learning techniques such as classifiers and neural networks to identify patterns in these data sets.
- Developed core Data Science models
- Worked with and merged data in multiple formats/sources (CSV, fetched from SQL Server/SAP HANA, etc.).
- Identified risk features and combined them into groups with Bayesian predictors, depending on the patterns in which they indicate fraudulent activity.
- Worked on Linux shell scripts for business processes and loading data from different interfaces to HDFS.
- Built an in-depth understanding of the business domains and available data assets, and performed exploratory statistics on diverse datasets.
- Collaborated with the engineering team to transfer the business understanding, productize the model, and validate output with business users.
- Maintained metadata (data definitions of table structures) and version control for the data model.
- Designed and Developed Oracle PL/SQL and Shell Scripts, Data Import/Export, Data Conversions and Data Cleansing.
- Worked with software engineers to put predictive models into production.
- Built Cloud architecture on Amazon AWS.
- Responsible for the development of target data architecture, design principles, quality control, and data standards for the organization
- Worked with DBAs to create a best-fit physical data model from the logical data model using forward engineering in Erwin.
- Determined data reduction methodologies for dealing with noisy data (correlation matrix, principal component analysis, clustering), missing values, and outliers (see the PCA sketch after this section).
- Set up a Hadoop cluster using Amazon EMR (Elastic MapReduce) on the managed Hadoop framework.
- Used an S3 bucket to store the JARs and input datasets, and used DynamoDB to store the processed output from the input data set.
Environment: R, HDFS, MapReduce, HiveQL, HBase, Pig, Sqoop, Oozie, Mahout, Cassandra, MongoDB, Linux shell scripting, DynamoDB, SAP HANA, Amazon AWS, MLlib
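A minimal scikit-learn sketch of PCA-based data reduction as listed above; random data stands in for the actual noisy datasets, and the 95% variance threshold is an illustrative choice.

```python
# Hedged PCA sketch: standardize, then keep components covering 95% variance.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 10))          # hypothetical 10-feature dataset

X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=0.95)            # retain 95% of explained variance
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)                  # reduced dimensionality
print(pca.explained_variance_ratio_)    # variance carried by each component
```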
Confidential, Tampa, FL
Sr. Data Analyst/Data Modeler
- Responsible for determining the fundamental organization of the system, the various components that constitute it, and the principles guiding its design and evolution.
- Involved in requirements gathering procedures along with the business analysts.
- Translated requirements into sustainable and scalable conceptual and logical data models.
- Developed, delivered and validated the logical and physical data models through collaboration with team leads, SMEs, business users, and other team members using ERwin 4.0
- Used star/snowflake schemas in creating dimensional models.
- Designed, built, and maintained enterprise conceptual, logical, and physical multidimensional data models and data marts that adhere to department standards, policies, procedures, and industry best practices.
- Created documentation as needed to support data architecture.
- Developed SQL Queries to fetch complex data from different tables in remote databases using joins, database links and Bulk collects.
- Maintained warehouse metadata, naming standards and warehouse standards.
- Used Ab Initio to design and develop jobs for extracting, cleansing, transforming, integrating, and loading data into different Data Marts.
- Assisted the Lead Data Management Architect in developing the best option for entry of business terms and definitions, and maintained the metadata appropriately for the business.
- Responsible for loading and distributing data sets into database tables across various servers; automated data loads, transfers, and data validation procedures (see the validation sketch after this section).
- Assisted business analysts with Teradata utilities.
- Used Teradata utilities (BTEQ, MultiLoad, and FastLoad) to maintain the database.
- Performed ad-hoc queries using SQL, PL/SQL, MS Access, MS Excel, and UNIX to meet business analysts' needs.
- Developed test plans and assisted in System Integration Testing (SIT).
- Documentation included design docs, SOPs, mapping specifications, process flowcharts, etc.
- Involved in quality assurance to ensure quality, validity and accuracy of data across the servers.
Environment: ERwin 4.0, SQL, PL/SQL, MS Office (Word, PowerPoint, Access, Excel, Outlook), Lotus Notes, Oracle 9i, SQL Server 2005, Business Objects, Informatica.
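A minimal pandas sketch of the kind of automated data-validation procedure described above; the file path, expected columns, and specific checks are illustrative assumptions, not the actual procedures.

```python
# Hedged validation sketch: schema, emptiness, key, and null checks on a load.
import pandas as pd

EXPECTED_COLUMNS = ["id", "name", "amount"]   # hypothetical schema

def validate_load(path):
    """Return a list of validation problems found in a loaded extract."""
    issues = []
    df = pd.read_csv(path)
    missing = set(EXPECTED_COLUMNS) - set(df.columns)
    if missing:
        issues.append(f"missing columns: {sorted(missing)}")
    if df.empty:
        issues.append("no rows loaded")
    if "id" in df.columns and df["id"].duplicated().any():
        issues.append("duplicate keys in 'id'")
    null_counts = df[df.columns.intersection(EXPECTED_COLUMNS)].isna().sum()
    for col, n in null_counts[null_counts > 0].items():
        issues.append(f"{n} null values in '{col}'")
    return issues

if __name__ == "__main__":
    print(validate_load("extract.csv") or "all checks passed")
```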