Data Scientist / Data Engineer Resume
New York, NY
SUMMARY:
- Over 10 years of IT experience developing business intelligence applications, including analytical tools and ETL, to drive decision making.
- 4+ years of experience in Machine Learning and Data Mining with large datasets of structured and unstructured data, Data Validation, Data Visualization, and Predictive Modeling.
- Strong mathematical knowledge and hands-on experience implementing Machine Learning algorithms such as K-Nearest Neighbors, Confidential Regression, Linear Regression, Support Vector Machines, Decision Trees, and Confidential; unsupervised algorithms such as Confidential; and good knowledge of Recommender Systems (a brief scikit-learn sketch follows this list).
- Experience using optimization techniques such as Gradient Descent and Stochastic Gradient Descent.
- Hands-on experience with Spark MLlib utilities for classification, regression, clustering, collaborative filtering, etc.
- Expertise in Exploratory Data Analysis with Confidential, plotting relevant visualizations to support feature engineering and assess feature importance.
- Experienced in Python data manipulation for loading and extraction, and in Python libraries such as NumPy, SciPy, Scikit-Learn, and Pandas for data analysis and Confidential.
- Involved in building Recommender Systems leveraging Confidential big data technology.
- Well versed in Normalization, De-Normalization, and Standardization techniques for optimal performance in relational and dimensional database environments.
- Experience working on low-latency applications, including performance tuning and regularization.
- Extensive experience working in Test-Driven Development and Agile/Scrum environments.
- Experience working on both Windows and Linux platforms, including CentOS 5/6 and Ubuntu 13/14.
- Experience using the Git version control system.
- Experience with real-time streaming using Spark Streaming from different sources in Python/Scala, working with Spark RDDs, DStreams, Spark SQL, DataFrames, etc.
- Working knowledge of the big data ecosystem, including MapReduce, HDFS, Pig, Confidential, Kafka, Sqoop, HBase, Ambari, Hortonworks, etc.
- Experience with the software development life cycle (SDLC) and project methodologies, including agile, as well as the tools and techniques within each phase (e.g., requirement elicitation, data dictionaries, data diagrams, ERDs, programming languages, technical platforms, standards, and procedures).
- Report/dashboard visualization with SSRS, Power BI, MicroStrategy, Tableau, QlikSense, Matplotlib, and Seaborn.
- Installation and configuration of web servers; maintenance of metadata, updates, security, and troubleshooting.
- Created reports/dashboards and released them to end users or published them to SharePoint/web portals.
- Experience leading projects and providing recommendations on key business decisions.
- Strong troubleshooting and problem-solving skills.
- Strong ability to prioritize multiple tasks.
- Excellent communication skills; proactive, self-managing, and team oriented.
- Authorized to work in the United States for any employer.
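A minimal sketch of the kind of supervised-model comparison listed above, using scikit-learn on a synthetic dataset; the dataset, features, and hyperparameters are illustrative assumptions, not from any engagement described in this resume:

```python
# Minimal sketch: comparing a few of the classifiers listed above on
# synthetic data. Dataset and hyperparameters are illustrative only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scale inputs for the distance- and margin-based models.
models = {
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "logistic": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "tree": DecisionTreeClassifier(max_depth=8, random_state=42),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))
```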
TECHNICAL SKILLS:
Database: SQL, PL/SQL, NoSQL, MongoDB, MySQL, MS Access, SSIS, SSRS, SSAS, HiveQL
Operating Systems: Windows Server 2008 Enterprise/Advanced Server, Windows 7, Windows NT 4.0 Server, Azure, CentOS 5/6, Ubuntu 13/14
Applications: Visual Studio 6.0/.NET, PerformancePoint Server 2007, SharePoint 2013, Git, Jupyter Notebook/Pandas/SciPy, Beautiful Soup, Confidential, HBase
Reporting Tools: Tableau, SSRS, SAP Crystal Reports, MS Excel, Power BI, QlikSense, Matplotlib, Seaborn, etc.
Languages: Python, .NET, C#, HTML, XML, ASP, JSP, JavaScript, Spark, Scala
Protocols: HTTP, TCP/IP, FTP
PROFESSIONAL EXPERIENCE:
Confidential, New York, NY
Data Scientist/ Data Engineer
Responsibilities:
- Working in a team of 5 data scientists to develop a risk model for credit limit increase eligibility, segmentation, and other policies.
- The input variable pool includes over 2,000 bureau attributes and internal performance metrics.
- Developed a revenue model for Partnerships credit card targeting using a zero-inflated regression modeling methodology.
- The model enables the business to target low-risk, high-revenue prospects.
- Rationalized models for onboarding a new partner to the Partnerships credit card business.
- Built 3 valuation models for credit card customer acquisition and customer management decisions.
- Two of the three models were the first account-level valuation models for the Partnerships credit card business.
- Conducted analysis on billions of customer transactions to identify sales and promotion opportunities using Hadoop, Spark, Confidential, and the AWS suite.
- Developed a model for predicting repayment of debt owed to Confidential businesses.
- Performed scoring and financial forecasting for collection priorities and fraud prediction.
- Created and validated dashboards, KPIs, and visualization reports in Tableau, especially focusing on measuring how loyalty programs improve total sales (year over year, month over month) across customer segments.
- Worked with AWS components such as EC2 and S3.
- Collected and aggregated large amounts of streaming data into HDFS using Confidential.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data and analyzed them by running Confidential queries.
- Configured Spark Streaming to receive real-time data from Apache Kafka and store the stream data in HDFS (see the sketch after this list).
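A minimal sketch of the Kafka-to-HDFS pipeline in the last bullet. The resume names Spark Streaming; this sketch uses the equivalent Structured Streaming API, and the broker address, topic, and paths are hypothetical placeholders (it also assumes the spark-sql-kafka connector is on the classpath):

```python
# Minimal sketch: stream records from a Kafka topic into HDFS as Parquet.
# Broker, topic, and paths below are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder broker
       .option("subscribe", "transactions")                # placeholder topic
       .load())

# Kafka delivers key/value as binary; cast the payload to string here and
# apply a real schema downstream.
events = raw.selectExpr("CAST(value AS STRING) AS value", "timestamp")

query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/streams/transactions")     # placeholder
         .option("checkpointLocation", "hdfs:///checkpoints/tx")  # placeholder
         .outputMode("append")
         .start())

query.awaitTermination()
```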
Confidential, NJ
Data Scientist/ Data Engineer
Responsibilities:
- Characterized false positives and false negatives to improve a model for predicting customer churn rate.
- Conducted consumer segmentation and characterization to predict behavior.
- Analyzed promoters and detractors (defined using Net Promoter Score).
- Performed applied research, development, and prototyping activities.
- Acquired, cleaned, and structured data from multiple sources and maintained databases/data systems.
- Identified, analyzed, and interpreted trends or patterns in complex data sets.
- Developed, prototyped and tested predictive algorithms.
- Filtered and cleaned data and reviewed computer reports, printouts, and performance indicators to locate and correct code problems.
- Developed and implemented data collection systems and other strategies that optimize statistical efficiency and data quality.
- Used statistical models such as regression and classification to create contact scoring models for all Cisco contacts.
- Performed data cleansing and data mining at 100+ TB scale using Apache Spark.
- Developed Spark code in Python with Spark SQL/Streaming for faster testing and processing of data.
- Interpreted data, analyzed results using statistical techniques, and provided ongoing reports.
- Built a recommender system based on clients' past renewal history to upsell and cross-sell similar products or services (a hedged ALS sketch follows this list).
- Created a recommendation engine that finds related customers and products.
- Collaborated with project managers to set up data management conventions to ensure quality control and promote best practices and development/testing methodologies.
- Developed production code in Confidential using a random forest model for customer discount optimization across millions of customers on a Spark cluster, improving total sales by 5%.
- Stayed abreast of industry changes related to BI, analytics, and big data, and sought and capitalized on opportunities to improve productivity, time-to-market, solution quality, and cost effectiveness.
- Performed in-depth analysis of market trends, applying quantitative research to understand demand and industry trends and maximize the company's position in the marketplace.
- Conducted data analysis and visualization using Python libraries such as Beautiful Soup, NumPy, SciPy, Pandas, Matplotlib, tkinter, and Scrapy.
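One plausible shape for the renewal-history recommender described above, sketched with Spark MLlib's ALS collaborative filtering; the column names and the tiny in-memory data are illustrative assumptions standing in for real renewal history:

```python
# Minimal sketch: collaborative-filtering recommender with Spark MLlib ALS.
# Column names and toy data are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("renewal-recommender").getOrCreate()

# In practice this would come from clients' past renewal history;
# here, a tiny in-memory stand-in.
ratings = spark.createDataFrame(
    [(0, 10, 1.0), (0, 11, 4.0), (1, 10, 5.0), (1, 12, 2.0), (2, 11, 3.0)],
    ["client_id", "product_id", "renewal_score"],
)

als = ALS(
    userCol="client_id",
    itemCol="product_id",
    ratingCol="renewal_score",
    rank=10,
    coldStartStrategy="drop",  # skip clients/products unseen at training time
)
model = als.fit(ratings)

# Top-3 cross-sell candidates per client.
model.recommendForAllUsers(3).show(truncate=False)
```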
Confidential, NJ
Data Engineer/Scientist
Responsibilities:
- Worked with project managers and financial analysts to gather requirements and business rules from different departments and organizations.
- Developed a measurement approach to identify new sales and margin opportunities through customer behavior insights.
- Built predictive models targeting Net Sales across many different Loyalty & Personalization market segments using Light Gradient Boosting (LightGBM) and decision trees with Python packages (a hedged sketch follows this list).
- Built models to predict downward migration of sales trips across channels, including direct mail and email.
- Designed a recommender system to suggest cross-sell categories for the Watches channel.
- Analyzed features that lead to customers turning dormant and communicated results to business partners.
- Created and validated dashboards, KPIs, and visualization reports in Tableau, especially focusing on measuring how loyalty programs improve total sales (year over year, month over month) across customer segments.
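A hedged sketch of the Light Gradient Boosting approach named above, using the lightgbm Python package on synthetic data; the features, target, and parameters are illustrative, not the production configuration:

```python
# Minimal sketch: LightGBM regressor for a Net Sales-style target.
# Synthetic data and parameters are illustrative assumptions.
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(42)
X = rng.normal(size=(10_000, 15))  # stand-in for segment/customer features
y = X[:, 0] * 3.0 + X[:, 1] ** 2 + rng.normal(scale=0.5, size=10_000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = lgb.LGBMRegressor(
    n_estimators=500,
    learning_rate=0.05,
    num_leaves=31,
)
model.fit(X_train, y_train, eval_set=[(X_test, y_test)])

print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))
# Feature importances support the dormancy/feature analysis described above.
print(model.feature_importances_[:5])
```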
Confidential, Montvale, NJ
Lead BI/Data Analytics Developer
Responsibilities:
- Collected requirements directly from different groups across KTech, including internal employees and external data servicers.
- Documented the business requirements and technical specifications, and designed the business/functional requirements.
- Designed and created a snowflake-schema Data Warehouse in SQL Server 2008 R2 with ERwin r8.
- Created SSIS packages to load the staging tables, Data Warehouse, and to run jobs.
- Data sources included flat files, spreadsheets, Access, SAP, Oracle databases, SQL databases, SharePoint lists, HTML files, and other third-party sources.
- Created reports to validate the data before and after the migration/integration.
- Created views and table-valued functions to provide reporting solutions; created stored procedures to load data and to log ETL errors.
- Collected business rules and requirements; identified multiple source systems.
- Performed data cleansing; data mapping; data dictionary creation; data modeling (conceptual, logical, physical) and dimensional modeling; staging, ODS, NDS, DW, DM, and ETL; cube creation; data analysis; and data mining. Sourced data from Excel, flat files, SQL Server, Oracle, Teradata, Sybase, MySQL, PostgreSQL, Hadoop (Kafka, Confidential, Pig, Impala, Spark), NoSQL, MongoDB, Azure, and AWS.
- Stayed abreast of industry changes related to BI, analytics, and big data, and sought and capitalized on opportunities to improve productivity, time-to-market, solution quality, and cost effectiveness.
Confidential, New York, NY
BI Developer
Responsibilities:
- Collected requirements directly from risk group members, internal employees, and external data servicers. Documented the business requirements and technical specifications.
- Designed and created a snowflake-schema Data Mart in SQL Server 2008 with ERwin r7.
- Created SSIS packages to load the staging tables, data marts, and to run jobs.
- Wrote macros in Excel to parse raw data in Excel files so that they could be used in ETL.
- Created views and table-valued functions to provide reporting solutions; created stored procedures to load data and to log ETL errors.
- Created SSAS cubes and SSRS Reports to demonstrate how they work and what reports they can provide.
- Assisted the DBA in deploying all data mart objects and SSIS packages to the production server, and scheduled jobs to run the packages.
- Helped business users reconcile reports generated from the data mart with in-house reports.
- Scheduled and monitored all maintenance activities, including database consistency checks, data backup & restore strategy, index defragmentation, and statistics updates.
Confidential, Montclair, NJ
SQL Server BI Developer
Responsibilities:
- Designed and implemented a star-schema Data Mart in SQL Server 2008.
- Created SSIS packages to load the staging tables, data marts, to refresh cubes, and to run jobs.
- Deployed the packages to the server and scheduled jobs to run them.
- Created OLAP cubes for the traders to analyze the performance of the algorithm from different perspectives.
- Scheduled and monitored all maintenance activities to meet business requirements, including database consistency checks, data backup & restore strategy, and index defragmentation.
- Modeled and forecasted database and platform resource utilization to meet user needs and respond to anticipated technological innovations.