- Around 4 years of IT experience as a Data Scientist/Data Analyst, spanning data analysis, visualization, and machine learning.
- Strong experience in Business and Data Analysis, Data Profiling, Data Migration, Data Integration and Metadata Management Services.
- Knowledge of the CRISP-DM methodology for predictive modeling
- Experienced with machine learning algorithms such as logistic regression, linear regression, lasso regression, random forest, KNN, SVM, neural networks, and k-means
- Implemented bagging and boosting to enhance model performance.
- Experience working with Python 3.5/2.7 (NumPy, Pandas, Matplotlib, NLTK, and scikit-learn)
- Experience in implementing data analysis with various analytics tools, such as Anaconda 4.0, Jupyter Notebook 4.x, and Alteryx
- Comprehensive knowledge and experience in normalization/de-normalization, data extraction, data cleansing and data manipulation
- Solid ability to write and optimize diverse SQL queries; working knowledge of RDBMSs such as SQL Server 2008, Oracle, Redshift, and Netezza
- Expert in Informatica PowerCenter 9.x/8.x (Designer, Workflow Manager, Workflow Monitor), PowerConnect, and PowerExchange.
- Experienced in the Analysis, Design, Development, Testing, and Implementation of Data Warehouse solutions for Financial and Retail Sectors.
- Excellent at analyzing and documenting business requirements in functional and technical terminology.
- Experience working with Agile and Waterfall methodologies.
- Excellent interpersonal and communication skills.
- Worked on Data Cleaning and Statistical techniques like Regression Estimates, Time Series Analysis and Cohort Analysis.
- Extensive experience in project management best practices, processes, and methodologies, including the Rational Unified Process (RUP) and the SDLC
- Ability to understand current business processes and implement more efficient ones.
- Strong knowledge of open source search technologies - Elasticsearch, Solr, and Lucene
- Excellent analytical, logical, programming, and problem-solving skills
- Experience in designing and developing SOAP- and REST-based web services.
- In-depth understanding of the Hadoop architecture and its components, including HDFS, MapReduce, and YARN.
- Experience in implementation of machine learning programs in Python.
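The bagging and boosting techniques noted above can be sketched with scikit-learn; the synthetic dataset, base estimator, and hyperparameters below are illustrative assumptions, not taken from any project listed here.

```python
# Illustrative sketch: bagging and boosting ensembles with scikit-learn.
# The synthetic dataset and hyperparameters are assumptions for demonstration.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: average many high-variance trees fit on bootstrap samples.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0)
# Boosting: fit shallow trees sequentially, each correcting its predecessor.
boosting = GradientBoostingClassifier(n_estimators=100, random_state=0)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    model.fit(X_train, y_train)
    print(name, round(accuracy_score(y_test, model.predict(X_test)), 3))
```

Both ensembles typically beat a single decision tree on held-out data, which is the performance enhancement the bullet refers to.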
Machine Learning: Prediction, Classification, Clustering and Time series algorithms
Programming Language: Python, SQL, Visual Basic, C#, C++
Tools: Microsoft Office Pro (Word, PowerPoint, Excel, Access), QuickBooks, Crystal Reports, SQL Server
Platforms: Windows 95/98/NT/2000/XP/2003/2007, Linux
Database: MySQL, Oracle, MongoDB
Reporting: Tableau, QlikView, D3.js and Excel
Confidential, Kent, OH
Data Scientist/ Analyst
- Worked on fixed income trading, structured fixed income portfolios, econometrics, and financial time series analysis using advanced methods such as PCA, autocorrelation, GARCH, and Kalman filtering, with critical use of software such as MATLAB and Python.
- Managed and coded application development projects using C++ and Python for clinical trials, market research, and capital markets trading risk management systems.
- Managed a global large-scale data analysis project on a multicore system, using Hadoop MapReduce ("divide and conquer") to produce deliverables in brief timeframes.
- Served on speaker panels as both a moderator and speaker on topics such as data science, quantitative finance, and information systems. Delivered customized in-depth training on financial concepts and risk management practices.
- Performed data analysis and data profiling using complex SQL on various source systems, including Oracle and Teradata.
- Wrote several UNIX Korn shell scripts for file transfers, error logging, data archiving, checking log files, and cleanup processes.
- Conducted gap analysis to assess the variance between system capabilities and business requirements.
- Interacted with teams in AFS, ACBS, and InfoLease to extract information for the reports.
- Involved in defining the source to target data mappings, business rules, business and data definitions
- Produced metrics reporting, data mining, and trend analysis in a helpdesk environment using Access
- Interacted closely with business users, analysts and developers. Wrote software for quantitative analysis of capital markets in statistical languages: MATLAB and Python.
- Performed Bayesian time series and econometric analysis of exogenous market variables, modeled in open source software.
- Collected historical data and third-party data from different data sources
- Worked on data cleaning and ensured data quality, consistency, and integrity using Pandas and NumPy
- Worked on outlier identification with box plots and K-means clustering using Pandas and NumPy
- Participated in feature engineering such as feature-interaction generation, feature normalization, and label encoding with scikit-learn preprocessing
- Modeled customers to discover untapped business opportunities.
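The cleaning and preprocessing steps in the bullets above can be sketched in a few lines; the toy data, column names, and threshold rule below are illustrative assumptions rather than details of the actual work.

```python
# Illustrative sketch: box-plot (IQR) outlier flagging with Pandas/NumPy and
# label encoding with scikit-learn. The toy data is an assumption for demo only.
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({
    "amount": [12.0, 14.5, 13.2, 11.8, 250.0, 12.9],  # 250.0 is an outlier
    "region": ["east", "west", "east", "south", "west", "east"],
})

# Box-plot rule: values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] are outliers.
q1, q3 = df["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
df["is_outlier"] = ~df["amount"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# Label encoding: map each category to an integer code.
df["region_code"] = LabelEncoder().fit_transform(df["region"])
print(df[["amount", "is_outlier", "region_code"]])
```

The same IQR bounds that draw a box plot's whiskers serve directly as an outlier filter, and the encoded column can feed scikit-learn models that require numeric input.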
Jr. Data Scientist/ Analyst
- Served on the design team and developed an application to load data coming from different sales systems, validating the data and loading it into targets.
- Gathered requirements and performed data modeling for new requirements
- Created logical and physical data models using Erwin
- Designed 3NF schemas for the OLTP applications
- Identified and evaluated various distributed machine learning libraries such as Mahout, MLlib (Apache Spark), and R.
- Evaluated the performance of various classification and regression algorithms using the R language to predict future power.
- Worked closely with business, data governance, SMEs and vendors to define data requirements.
- Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
- Designed the prototype of the Data mart and documented possible outcome from it for end-user.
- Developed extraction, transformation, and loading of data from different source systems using Informatica PowerCenter tools - Mapping Designer, Repository Manager, Workflow Manager, and Workflow Monitor
- Created complex mappings using transformations such as Source Qualifier, Sequence Generator, Lookup, Joiner, Filter, Update Strategy, Rank, and Aggregator.
- Implemented Slowly Changing Dimension Type 1 and Type 2 for maintaining Targets.
- Created workflows and worklets taking into consideration the interdependencies between sessions and mappings, using various tasks such as Command, Assignment, Control, and Session.
- Enhanced the performance of mappings and data access
- Involved in preparing test plan and testing for ETL development.
- Proactively engaged with product and development teams to define next-generation product features, specifications, and requirements, and researched existing web technologies to design and implement those requirements
- Performed data formatting, which involved cleaning up the data.
- Designed and prepared technical specifications and guidelines.
- Developed and maintained high-performance, highly available, scalable data processing software frameworks and data models.
- Validated data integrity by running different APIs in Elasticsearch
- Adopted engineering best practices and developed high-quality, maintainable code.
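The Slowly Changing Dimension Type 2 pattern mentioned above (expire the current row, insert a new versioned row) can be sketched in pandas; the table layout, column names, and sample data are assumptions for illustration, not the actual warehouse design.

```python
# Illustrative sketch of SCD Type 2 logic in pandas: when a tracked attribute
# changes, close out the current dimension row and append a new row with fresh
# effective dates. Column names and sample data are assumptions.
import pandas as pd

HIGH_DATE = pd.Timestamp("9999-12-31")  # sentinel "still current" end date

def apply_scd2(dim: pd.DataFrame, updates: pd.DataFrame,
               load_date: pd.Timestamp) -> pd.DataFrame:
    """Expire changed current rows and append new versions (SCD Type 2)."""
    dim = dim.copy()
    current = dim["end_date"] == HIGH_DATE
    merged = updates.merge(
        dim.loc[current, ["cust_id", "city"]],
        on="cust_id", how="left", suffixes=("", "_old"),
    )
    changed = merged[merged["city"] != merged["city_old"]]

    # Step 1: expire the current row for each changed key.
    dim.loc[current & dim["cust_id"].isin(changed["cust_id"]), "end_date"] = load_date

    # Step 2: insert a new current row per changed key.
    new_rows = changed[["cust_id", "city"]].assign(start_date=load_date,
                                                  end_date=HIGH_DATE)
    return pd.concat([dim, new_rows], ignore_index=True)

dim = pd.DataFrame({
    "cust_id": [1, 2],
    "city": ["Kent", "Akron"],
    "start_date": [pd.Timestamp("2015-01-01")] * 2,
    "end_date": [HIGH_DATE] * 2,
})
updates = pd.DataFrame({"cust_id": [1], "city": ["Columbus"]})
dim = apply_scd2(dim, updates, pd.Timestamp("2016-06-01"))
print(dim)
```

In Informatica the same two steps are typically implemented with an Update Strategy transformation (flagging expired rows for update and new versions for insert); the pandas version just makes the date-bracketing logic explicit.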