Data Scientist Resume


New York City, NY

PROFESSIONAL SUMMARY:

  • 8+ years of experience in Machine Learning, Statistical Modelling, Predictive Modelling, Data Analytics, Data Modelling, Data Architecture, Data Analysis, Data Mining, Text Mining and Natural Language Processing (NLP), Artificial Intelligence algorithms, and Business Intelligence, using analytics models (such as Decision Trees, Linear and Logistic Regression) and tools including Hadoop (Hive, Pig), R, Python, Spark, Scala, MS Excel, SQL and PostgreSQL.
  • Ability to collaborate with peers in business and technical areas to deliver optimal business process solutions in line with corporate priorities.
  • Assisted in various testing tasks such as system integration testing, UAT, sanity testing, and smoke testing.
  • Collaborated with the lead Data Architect to model the data warehouse in accordance with FSLDM subject areas, 3NF format, and Snowflake schema.
  • Extensive experience in text analytics, generating data visualizations using R and Python, and creating dashboards using tools like Tableau.
  • Deep understanding of statistical modelling, multivariate analysis, model testing, problem analysis, model comparison, and validation.
  • Well versed in normalization/denormalization techniques for optimum performance in relational and dimensional database environments.
  • Expertise in applying data mining and optimization techniques in B2B and B2C industries; proficient in Machine Learning, Data/Text Mining, Statistical Analysis and Predictive Modelling.
  • Hands-on experience with big data tools like Hadoop, Spark, Hive, Pig, Impala, PySpark, and Spark SQL.
  • Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis.
  • Well versed with Agile and Waterfall Methodologies.
  • Experience with advanced SAS programming techniques, such as PROC SQL (JOIN/UNION), PROC APPEND, PROC DATASETS, and PROC TRANSPOSE.
  • Expertise in the complete software development life cycle, including analysis, design, development, testing and implementation, in the Hadoop ecosystem, the Documentum 6.5 SP2 suite of products, and Java technologies.
  • Extensive experience in text mining.
  • Extensive working experience with Python 3.x, including scikit-learn, Pandas, NumPy, Matplotlib, and Seaborn.
  • Extensive working experience with Hadoop core concepts like HDFS and MapReduce.
  • Good knowledge of NoSQL solutions like HBase.
  • Extensive experience in Hive, Sqoop, Flume, Hue and Oozie.
  • Created practical predictive models with machine learning methods, both supervised (linear regression, logistic regression, decision trees, random forests, SVM, etc.) and unsupervised (K-means, PCA, etc.); a brief sketch follows this list.
  • Expertise in Installation, Configuration and Customization of Documentum Suite of Products.
  • Extensive Experience in developing applications in DFC.
  • Extensive experience in developing workflows.
  • Strong analytical skills with proficiency in debugging and problem solving.
  • Self-motivated and quick learner of new concepts and technologies.
  • Experienced in configuration of CARA 2.
  • Integration Architect & Data Scientist experience in Analytics, Big Data, BPM, SOA, ETL and Cloud technologies.
  • Expertise in Machine Learning, Enterprise Integration, Cloud Integration, API Management, Microservices, Internet of Things (IoT), Web Technologies, Java, Security and Big Data.
  • A successful product owner, leading agile teams in developing innovative software products.
  • Functional experience in Banking, Healthcare, Manufacturing, Life Science, Retail, Oil & Gas and BPO domains. Authored multiple whitepapers and articles describing enterprise integration best practices.
  • Built CoE (Center of Excellence) competencies in Analytics, SOA/EAI, ETL and BPM.
  • Experience in foundational machine learning models and concepts: regression, random forest, boosting, GBM, NNs, HMMs, CRFs, MRFs, deep learning.
  • Experience in machine learning techniques and algorithms, such as k-NN, Naive Bayes, SVM, Decision Forests, etc.
  • Technical expertise in developing data-driven solutions to drive business results; enjoy solving complex problems by applying machine learning and data science techniques.
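
Below is a minimal sketch of the kind of supervised and unsupervised modeling listed above, using scikit-learn on synthetic data. The dataset and every parameter are illustrative assumptions, not details of any specific engagement.

    # Minimal sketch: supervised (logistic regression, random forest) and
    # unsupervised (PCA + K-means) modeling on synthetic data.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans
    from sklearn.metrics import accuracy_score

    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=42)

    # Supervised: fit and score two classifiers.
    for model in (LogisticRegression(max_iter=1000),
                  RandomForestClassifier(n_estimators=200, random_state=42)):
        model.fit(X_tr, y_tr)
        print(type(model).__name__, accuracy_score(y_te, model.predict(X_te)))

    # Unsupervised: project to two principal components, then cluster.
    X_2d = PCA(n_components=2).fit_transform(X)
    clusters = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X_2d)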

TECHNICAL SKILLS

Data Modelling Tools: Erwin r9.6/9.5/9.1/8.x, ER/Studio 9.7, Rational Rose, MS Visio, SAP PowerDesigner; Star-Schema and Snowflake-Schema Modelling, FACT and dimension tables, Pivot Tables.

Programming Languages: Oracle PL/SQL, Python, SQL, UNIX shell scripting, Java.

Scripting Languages: Python (NumPy, SciPy, Pandas, Gensim, Keras), R (caret, Weka, ggplot2)

Big Data Technologies: Hadoop, Hive, HDFS, MapReduce, Pig, Kafka.

ETL: Informatica PowerCenter, SSIS.

Project Execution Methodologies: Ralph Kimball and Bill Inmon data warehousing methodologies, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD)

BI Tools: Tableau, Tableau Server, Tableau Reader, SAP Business Objects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure Data Warehouse

Tools: MS-Office suite (Word, Excel, MS Project and Outlook), Spark MLlib, Scala NLP, MariaDB, Azure, SAS.

Databases: Oracle, Teradata, Netezza, Microsoft SQL Server, MongoDB, HBase, Cassandra.

Operating Systems: Windows 8/XP/NT/95/98/2000/2008/2012, UNIX, MS-DOS, Sun Solaris, Android SDK.


Version Control and Testing: TFS, Microsoft Visual SourceSafe, Git, NUnit, MSUnit


Development Tools: R, SQL, Python, Hadoop, Hive, Matlab, RStudio, SAS, MS Office, Visual Studio 2010

Reporting Tools: Crystal Reports XI, Business Intelligence, SSRS, Business Objects 5.x/6.x, Cognos 7.0/6.0, Tableau.

PROFESSIONAL EXPERIENCE:

Confidential, New York City, NY

Data Scientist

Responsibilities:

  • Built models using statistical techniques like Bayesian HMM and machine learning classification models like XGBoost, SVM, and Random Forest; a brief sketch follows this list.
  • Completed a highly immersive data science program involving data manipulation and visualization, web scraping, machine learning, Python programming, SQL, Git, Unix commands, NoSQL, MongoDB, and Hadoop.
  • Set up storage and data analysis tools in the Amazon Web Services cloud computing infrastructure.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python to develop various machine learning algorithms.
  • Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS; implemented a Python-based distributed random forest via Python streaming (see the second sketch after this list).
  • Installed and used the CUDA and cuDNN toolkits with the TensorFlow deep learning framework.
  • Worked with different data formats such as JSON and XML, and applied machine learning algorithms in Python.
  • Developed logical and physical data models using Erwin to design OLTP systems for different applications.
  • Worked with Data Architects and IT Architects to understand the movement and storage of data, using ER/Studio 9.7.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization; also performed gap analysis.
  • Transformed the logical data model into a physical data model in Erwin, ensuring primary-key and foreign-key relationships in the PDM, consistency of data attribute definitions, and primary index considerations.
  • Designed and developed user interfaces and customized reports using Tableau and OBIEE, and designed cubes for data visualization and mobile/web presentation with parameterization and cascading.
  • Performed data analysis and data profiling, and worked on data transformations and data quality rules.
  • Created SSIS packages using Pivot Transformation, Execute SQL Task, Data Flow Task, etc., to import data into the data warehouse.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and SmartView.
  • Implemented Agile Methodology for building an internal application.
  • As an architect, delivered various complex OLAP databases/cubes, scorecards, dashboards, and reports.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, Pandas).
  • Implemented classification using supervised algorithms like Logistic Regression, Decision Trees, KNN, and Naive Bayes.
  • Experience in Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, Flume including their installation and configuration.
  • Updated Python scripts to match training data with our database stored in AWS CloudSearch, so that each document could be assigned a response label for further classification.
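
A brief, hedged sketch of the classification modeling described in the first bullet: comparing XGBoost, SVM, and Random Forest by cross-validation on synthetic data. The dataset and model parameters are illustrative assumptions rather than project specifics.

    # Illustrative comparison of XGBoost, SVM, and Random Forest.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.svm import SVC
    from xgboost import XGBClassifier  # assumes the xgboost package is installed

    X, y = make_classification(n_samples=2000, n_features=30, random_state=0)
    models = {
        "XGBoost": XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1),
        "SVM": SVC(kernel="rbf", C=1.0),
        "RandomForest": RandomForestClassifier(n_estimators=300, random_state=0),
    }
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
        print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")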
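The second sketch shows one plausible shape for the Python-streaming distributed random forest: each mapper trains one randomized tree on its input split and emits the pickled tree, and a driver then decodes the emitted trees and averages their predictions. The CSV layout (label in the last column) and the one-tree-per-split design are assumptions, not the resume's documented implementation.

    # mapper.py (illustrative): train one randomized tree per input split.
    # Run under Hadoop Streaming, or locally: cat split.csv | python mapper.py
    import sys, base64, pickle
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rows = np.loadtxt(sys.stdin, delimiter=",")  # assumed: label in last column
    X, y = rows[:, :-1], rows[:, -1]

    # max_features="sqrt" mimics random-forest-style feature subsampling.
    tree = DecisionTreeClassifier(max_features="sqrt")
    tree.fit(X, y)

    # Emit the serialized tree as one key/value record for the driver to collect.
    print("tree\t" + base64.b64encode(pickle.dumps(tree)).decode("ascii"))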

Environment: ER Studio 9.7, Tableau 9.03, AWS, Teradata 15, MDM, Git, Unix, Python 3.5.2, MLlib, SAS, regression, logistic regression, Hadoop, NoSQL, OLTP, random forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML, MapReduce

Confidential, CA

Data Scientist

Responsibilities:

  • Enhancing data collection procedures to include information that is relevant for building analytic systems.
  • Processing, cleansing, and verifying the integrity of data used for analysis
  • Performing ad-hoc analysis and presenting results in a clear manner.
  • Creating automated anomaly detection systems and constantly tracking their performance; a brief sketch follows this list.
  • Creating data pipelines using big data technologies like Hadoop and Spark.
  • Creating statistical models, both distributed and standalone, to build diagnostic, predictive, and prescriptive solutions.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python for developing various machine learning algorithms, and utilized algorithms such as linear regression, multivariate regression, Naive Bayes, Random Forests, K-means, and KNN for data analysis.
  • Utilized a broad variety of statistical packages such as SAS, R, MLlib, Graphs, Hadoop, Spark, MapReduce, Pig, and others.
  • Refine and train models based on domain knowledge and customer business objectives
  • Deliver or collaborate on delivering effective visualizations to support the client business objectives
  • Extensive understanding of the BI and analytics space with special focus on the consumer and customer space
  • Demonstrated experience in design and implementation of statistical models, predictive models, enterprise data models, metadata solutions, and data life cycle management in both RDBMS and Big Data environments.
  • Strong command of data architecture and data modelling techniques.
  • Hands-on experience with commercial data mining tools such as Splunk, R, MapReduce, YARN, Pig, Hive, Flume, Oozie, Scala, HBase, HDFS, Sqoop, and Spark.
  • Strong foundational statistical skills.
  • Strong communication skills, adept at communicating with business and technical stakeholders; able to use data and robust analytics to model business scenarios and analyse the resulting cost, impact, risks, and benefits to inform and recommend business solutions.
  • Researched machine learning algorithms, implemented them by tailoring to particular business needs, and tested them on large datasets.
  • Manipulating/mining data from database tables (Redshift, Oracle, Data Warehouse)
  • Create automated metrics using complex databases
  • Providing analytical network support to improve quality and standard work results.
  • Experience with Version Control (Git)
  • Interface with other technology teams to extract, transform, and load (ETL) data from a wide variety of data sources
  • Owned the functional and non-functional scaling of software systems in my area of ownership.
  • Provided input and recommendations on technical issues to BI Engineers, Business and Data Analysts, and Data Scientists.
  • Outstanding analytical and problem-solving skills.
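
As a brief sketch of the automated anomaly detection mentioned above, one common pattern is a rolling z-score over a metric stream; the window size and threshold below are illustrative, untuned assumptions.

    # Rolling z-score anomaly detection over a metric series.
    import numpy as np
    import pandas as pd

    def flag_anomalies(series, window=48, z_thresh=3.0):
        """Mark points more than z_thresh rolling std devs from the rolling mean."""
        mean = series.rolling(window, min_periods=window).mean()
        std = series.rolling(window, min_periods=window).std()
        return (series - mean).abs() > z_thresh * std

    rng = np.random.default_rng(0)
    values = pd.Series(rng.normal(0, 1, 500))
    values.iloc[400] += 10  # injected spike for demonstration
    print(values[flag_anomalies(values)])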

Environment: Erwin r9.6, Python, SQL, Oracle 12c, Netezza, SQL Server, Informatica, Java, SSRS, PL/SQL, T-SQL, Tableau, MLlib, regression, cluster analysis, Scala NLP, Spark, Kafka, MongoDB, logistic regression, Hadoop, Hive, Teradata, random forest, OLAP, Azure, MariaDB, SAP CRM, HDFS, ODS, NLTK, SVM, JSON, XML, Cassandra, MapReduce, AWS.

Confidential

Data Analyst

Responsibilities:

  • Strong command of data architecture and data modelling techniques.
  • Hands-on experience with commercial data mining tools such as R, MapReduce, YARN, Pig, Hive, Flume, Oozie, Scala, HBase, HDFS, Sqoop, and Spark.
  • Strong foundational statistical skills.
  • Strong communication skills, adept at communicating with business and technical stakeholders; able to use data and robust analytics to model business scenarios and analyse the resulting cost, impact, risks, and benefits to inform and recommend business solutions.
  • Researched machine learning algorithms, implemented them by tailoring to particular business needs, and tested them on large datasets.
  • Set up storage and data analysis tools in the Amazon Web Services cloud computing infrastructure.
  • Manipulating/mining data from database tables (Redshift, Oracle, Data Warehouse)
  • Create automated metrics using complex databases.
  • Provided analytical network support to improve quality and standard work results.
  • Performed root-cause research to identify process breakdowns within departments and provided data-driven solutions to those breakdowns.
  • Fostered a culture of continuous engineering improvement through mentoring, feedback, and metrics.
  • Broad knowledge of programming and scripting (especially in R, Java, and Python).
  • Implemented an event task to execute the application automatically.
  • Troubleshot and resolved bugs in .NET applications to ensure optimal development environment.
  • Created views, queries, and data warehouse reports using SSRS providing management with financial information from SQL Server production databases.
  • Involved in developing Patches & Updates Module.
  • Proven experience building sustainable and trustful relationships with senior leaders.
  • Developed logical and Physical data models using Erwin to design OLTP system for different applications.
  • Experience with Version Control (Git)
  • Created data pipelines using big data technologies like Hadoop and Spark; a brief sketch follows this list.
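
A brief sketch of a Spark pipeline of the kind referenced above: read raw CSV, clean, aggregate, write Parquet. The paths and column names ("events.csv", "user_id", "event_time", "amount") are hypothetical.

    # Minimal PySpark pipeline: ingest CSV, clean, aggregate, write Parquet.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("example-pipeline").getOrCreate()

    raw = spark.read.option("header", True).csv("hdfs:///data/raw/events.csv")

    cleaned = (raw
               .dropna(subset=["user_id", "amount"])
               .withColumn("amount", F.col("amount").cast("double")))

    daily = (cleaned
             .groupBy("user_id", F.to_date("event_time").alias("day"))
             .agg(F.sum("amount").alias("total_amount")))

    daily.write.mode("overwrite").parquet("hdfs:///data/curated/daily_totals")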

Environment: Scala 2, Awk, Cascading, Cassandra, Clojure, Fortran, JavaScript, JMP, Mahout, Objective-C, QlikView, Redis, Redshift, HTML, XML, XSLT, SQL Server 2008 R2, SSRS, CSS, MS Office.

Confidential

Data Analyst

Responsibilities:

  • Performed requirement analysis by gathering both functional and non-functional requirements based on interactions with process owners and stakeholders.
  • Interacted with department heads to finalize business and functional requirements.
  • Gathered data from multiple sources using analytical techniques and presented it in ways that visually communicate the important aspects of the data to users, optimizing the flow of information.
  • Pulled data from different databases and migrated data back and forth using SQL.
  • Conducted data analysis, including cross-functional use cases applying advanced techniques such as ANOVA, linear/logistic regression, and decision trees over huge datasets in R, and provided the results of the analysis to offsite teams to support key business functions; a brief sketch follows this list.
  • Worked towards establishing best practices to evaluate and improve data quality for analytical use; built advanced analytics models to identify significant features from the finalized datasets; liaised with other client-facing teams to help the client understand nuances of the model details.
  • Utilized Data Mining and Statistical Analysis skills for proposing strategy to acquire future
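
A brief sketch of the ANOVA and logistic-regression analysis described above. The original work used R; this is a Python analogue with scipy and statsmodels, and the data, columns, and effect are invented for illustration.

    # One-way ANOVA and logistic regression on synthetic data.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from scipy import stats

    rng = np.random.default_rng(1)
    df = pd.DataFrame({"group": rng.choice(["A", "B", "C"], size=300),
                       "x": rng.normal(0, 1, 300)})
    df["y"] = (df["x"] + rng.normal(0, 1, 300) > 0).astype(int)

    # One-way ANOVA: does x differ across groups?
    f_stat, p_val = stats.f_oneway(*(g["x"] for _, g in df.groupby("group")))
    print(f"ANOVA: F={f_stat:.2f}, p={p_val:.3f}")

    # Logistic regression of y on x.
    model = smf.logit("y ~ x", data=df).fit(disp=False)
    print(model.summary())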

Environment: MySQL 5.5, SAS 9.4, R 3, MS Excel 2013, MS PowerPoint 2013, MS Visio 2013

Confidential

Data Analyst

Responsibilities:

  • Worked with users to identify the most appropriate source of record and to profile the data required for sales and service.
  • Documented the complete process flow, describing program development, logic, testing, implementation, application integration, and coding.
  • Involved in defining the business/transformation rules applied for sales and service data.
  • Defined the list codes and code conversions between the source systems and the data mart.
  • Worked with internal architects, assisting in the development of current- and target-state data architectures.
  • Worked with project team representatives to ensure that logical and physical ER/Studio data models were developed in line with corporate standards and guidelines.
  • Involved in defining the source to target data mappings, business rules, data definitions.
  • Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Responsible for defining the key identifiers for each mapping/interface; a brief sketch follows this list.
  • Responsible for defining the functional requirement documents for each source to target interface.
  • Documented, clarified, and communicated change requests with the requestor, and coordinated with the development and testing teams.
  • Coordinated meetings with vendors to define requirements and system interaction agreement documentation between client and vendor system.
  • Updated the Enterprise Metadata Library with any changes or updates.
  • Prepared data quality and traceability documents for each source interface.
  • Established standard operating procedures.
  • Generated weekly and monthly asset inventory reports.
  • Coordinated with business users to design new reporting in an appropriate, effective, and efficient way, based on existing functionality.
  • Remained knowledgeable in all areas of business operations in order to identify system needs and requirements.
  • Implemented a metadata repository; maintained data quality through data cleanup procedures, transformations, data standards, a data governance program, scripts, stored procedures, triggers, and execution of test plans.
  • Performed data quality checks in Talend Open Studio.
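
A brief, hypothetical sketch of the key-identifier checks implied above: verifying that the declared key columns uniquely identify rows in a source extract before mapping it to the target. The file and column names are illustrative.

    # Verify that declared key identifiers uniquely identify source rows.
    import pandas as pd

    def check_key(df, key_cols):
        """Return rows whose key columns are duplicated (empty if the key holds)."""
        return df[df.duplicated(subset=key_cols, keep=False)]

    source = pd.read_csv("sales_extract.csv")  # illustrative file name
    violations = check_key(source, ["customer_id", "order_date"])  # illustrative key
    if violations.empty:
        print("Key identifiers hold; mapping can proceed.")
    else:
        print(f"{len(violations)} rows violate the declared key:")
        print(violations.head())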

Environment: SQL Server 2008 R2/2005 Enterprise, SSRS, SSIS, Crystal Reports, Windows Enterprise Server 2000, DTS, SQL Profiler, and Query Analyzer
