
Data Scientist Resume


San Francisco, CA

SUMMARY:

  • 15+ years of IT experience spanning machine learning, AI, NLP, statistics, data analysis and visualization, Big Data ecosystem technologies, Business Intelligence development, business analysis, database management, and software development.
  • Experience in predictive analytics techniques for supervised learning (regression, neural networks, decision trees) and unsupervised learning (clustering with k-Means, PCA) using the scikit-learn, Keras, and statsmodels packages.
  • Experience developing and planning data mining and analytics projects in response to strategic business needs. Involved in diagnosing and resolving performance issues in predictive and analytical models, monitoring analytical system performance, and implementing efficiency improvements.
  • Working knowledge of Big Data analytics and Hadoop ecosystem components such as Hive, Pig, MapReduce, Sqoop, Oozie, and Spark, with Python and R integration.
  • Good working knowledge of Microsoft Business Intelligence tools and extensive experience with SQL, PL/SQL, and database concepts.
  • Work with business domain experts and application developers to identify data relevant for analysis/mining and develop new predictive/analytical modeling methods and tools in marketing and retail.
  • A self-motivated and business-savvy data scientist with broad experience collaborating with focused groups across the business and IT ecosystem to avoid redundancy and establish best practices and guidelines for selecting, developing, and implementing BI and analytics projects.
  • A self-starter, team player, and excellent communicator with experience managing and coordinating teams. Proactive and well organized, with effective time management and problem-solving skills.

TECHNICAL SKILLS:

Hadoop Ecosystem: HDFS, MapReduce, Pig, Hive, HBase, Sqoop, Flume, Oozie, ZooKeeper and Spark

Programming Languages: C#, C++, VBA, Java, R and Python

Databases: MS-SQL, MySQL, Access and Oracle.

MSBI: SSIS, SSAS, SSRS

IDEs: RStudio, Anaconda (Spyder, Jupyter), Visual Studio 2008/2010/2012, Eclipse and NetBeans

Operating Systems: Windows, Linux, UNIX.

Web UI: HTML, JavaScript, jQuery

Version Control &amp; Enterprise Tools: Git, GitHub, SVN (Tortoise SVN), Microsoft Team Foundation Server, SAP, Salesforce, Dynamics, SharePoint and WorkflowGen (Business Process Management).

PROFESSIONAL EXPERIENCE:

Confidential, San Francisco, CA

Data Scientist

Responsibilities:

  • Partner with stakeholders across the organization to identify high-impact opportunities to leverage extensive data to better serve our customers.
  • Drive the creation of comprehensive datasets encompassing user profiles and behaviors, and incorporating a wide variety of signals and data types.
  • Assess the potential usefulness and validity of new statistical approaches and data sources.
  • Build complex predictive models to substantially improve and continuously optimize user engagement and revenue generation.
  • Rapidly develop proof-of-concept prototypes to prove out hypotheses.
  • Reach across multiple functions, such as Product Management and Data Engineering, to implement the models into production and to monitor their performance.
  • Motivate and mentor other data scientists to grow their skills and careers.
  • Directly interact with internal clients to understand and help solve their business problems.
  • Interact and collaborate with engineers and product managers to develop our products.
  • Work on multiple research projects and help the bank improve the customer banking experience.
  • Create and prepare data samples for machine learning applications.
  • Execute data science libraries with the Spark analytics engine.
  • Utilize statistical natural language processing to mine unstructured data and create insights; analyze and model structured data using advanced statistical methods, and implement the algorithms and software needed to perform analyses.
  • Develop methods to support and drive client engagements focused on Big Data and advanced business analytics in diverse domains such as product development, marketing research, public policy, optimization, and risk management; communicate results and educate others through reports and presentations.
  • Build and support visualization and exploration capabilities around the Data Sets
  • Work with the Data Extraction and Data engineers on normalization and analytical processes
  • Work in an Agile manner with business users and data engineers to understand and discover the potential business value of new and existing Data Sets and help productize those discoveries
  • Analyze requirements and architecture specifications to create detailed designs.
  • Design machine learning solutions using techniques such as linear regression, time series analysis, decision trees, and clustering; research areas of interest to the team and help facilitate solutions.
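The clustering technique named above can be illustrated with a minimal, pure-Python k-means sketch. This is a toy example on made-up 2-D points, not code from any of the engagements described here:

```python
import random
import math

def kmeans(points, k, iters=20, seed=0):
    """Toy k-means: assign points to the nearest centroid, then recompute centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[i].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (keep the old centroid if the cluster is empty).
        for i, cl in enumerate(clusters):
            if cl:
                centroids[i] = tuple(sum(d) / len(cl) for d in zip(*cl))
    return centroids, clusters

# Two well-separated blobs of hypothetical data points.
pts = [(0.1, 0.2), (0.0, 0.0), (0.2, 0.1), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
cents, cls = kmeans(pts, k=2)
```

In practice this would be done with scikit-learn's `KMeans` (listed in the summary above); the sketch only shows the assign/update loop the algorithm performs.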

Environment: Big Data: Hadoop eco system, HDFS, Pig, Hive, MapReduce, Sqoop, PySpark.

Database: Oracle (SQL Developer)

Programming: R, Python 3.5 (pandas, statsmodels, Keras, Flask-RESTful and TensorFlow).

Confidential, Portland, OR

Data Scientist

Responsibilities:

  • Support marketing services and drive customer analytics engagements focusing on assessing high risk customers for wealth management group and reducing business costs.
  • Collaborate and work closely with various business stakeholders to understand business imperatives/strategies and formulate analytical problem statements.
  • Research data to improve customer engagement, finding and fixing customer pain points.
  • Optimized customer classification by modeling customer segmentation using demographic attributes, geographic attributes and purchasing patterns.
  • Worked closely with the CRM and marketing department to fetch relevant data and strategies in order to optimize the model performance.
  • Guided data analysts in designing the data model and the extract, transform, and load (ETL) process.
  • Documented and submitted reports on descriptive statistics and graphs of predictor variables.
  • Performed data balancing to balance the ratio of churned versus active subscribers.
  • Traced and analyzed customer churn patterns over two years of historical data.
  • Carried out logistic regression and analyzed coefficient estimates, predicted versus observed response probabilities, and concordant and discordant pairs.
  • Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Imported and migrated data from warehouse servers to Hadoop using Sqoop.
  • Used Flume to collect, aggregate, and store web log data from different sources (web servers, mobile and network devices) and pushed it to HDFS.
  • Prepare various sources of data using data wrangling methods in R, Python, SQL in infrastructure including AWS, Spark, Hadoop and relational database environments
  • Carried out forward, backward, subset, and stepwise variable selection to obtain the best model with a high C-statistic/concordance percentage.
  • Used R's glm (generalized linear model) function for best generalized linear model selection.
  • Performed k-Means clustering in order to understand customer attitudes, behaviors, actions and segment the customers based on different levels of churn risk
  • Interpreted the propensity scores assigned to customers who are likely to churn.
  • Translated analytical model findings to business insights and presented them to non-technical audiences.
  • Identified key KPIs that improve customer retention, reduce marketing costs, enable effectively targeted services, and increase ROI.
  • Worked on interactive dashboards for building stories and presenting to the business (Tableau and SSRS).
  • Supported the process of scope definition and solution architecture modeling.
  • Designed the big data analytics architecture and suggested best practices for analytics solution delivery.
  • Worked on data modeling with the DBAs and developed detailed design specifications.
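The class-balancing step described above (churned versus active subscribers) typically means downsampling the majority class before model training. A minimal sketch on hypothetical subscriber records (the `churned` field and record layout are made up for illustration):

```python
import random
from collections import Counter

def downsample_majority(records, label_key="churned", seed=42):
    """Balance a binary-labeled dataset by randomly downsampling the majority class."""
    rng = random.Random(seed)
    pos = [r for r in records if r[label_key]]
    neg = [r for r in records if not r[label_key]]
    majority, minority = (pos, neg) if len(pos) > len(neg) else (neg, pos)
    # Keep every minority record; sample an equal number from the majority.
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced

# Hypothetical subscriber records: 2 churners versus 8 active subscribers.
data = [{"id": i, "churned": i < 2} for i in range(10)]
balanced = downsample_majority(data)
counts = Counter(r["churned"] for r in balanced)
```

Oversampling the minority class (or techniques like SMOTE) is the common alternative when discarding majority records would lose too much signal.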

Environment: Big Data: Hadoop eco system, HDFS, Pig, Hive, MapReduce, Sqoop, SparkSQL.

Database: MySQL, MS SQL.

Programming: R, VBA and Java, C#.

Confidential, NYC, NY

Data analyst - Hadoop Developer

Responsibilities:

  • Worked with the Data Science team to gather requirements for various data mining projects.
  • Developed multiple MapReduce jobs in java for data cleaning and pre-processing.
  • Developed Sqoop scripts to enable interaction between Hive and the MySQL database.
  • Involved in creating Hive tables, and loading and analyzing data using hive queries.
  • Developed Simple to complex MapReduce Jobs using Hive and Pig.
  • Involved in running Hadoop jobs for processing millions of records of text data.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Developed a data pipeline using Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Used Pig for transformations, event joins, bot-traffic filtering, and pre-aggregations before storing the data in HDFS.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala; good experience using the Spark shell.
  • Developed Hive UDFs for functionality not available out of the box in Hive.
  • Developed a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS; designed and implemented various metrics that statistically signify the success of the experiment.
  • Responsible for processing ingested raw data using MapReduce, Apache Pig and Hive.
  • Developing Pig Scripts for change data capture and delta record processing between newly arrived data and already existing data in HDFS.
  • Responsible for migrating tables from traditional RDBMS into Hive tables using Sqoop and later generate required visualizations and dashboards using Tableau.
  • Involved in developing Shell scripts to orchestrate execution of all other scripts (Pig, Hive, and MapReduce) and move the data files within and outside of HDFS.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
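The MapReduce cleaning jobs described above follow a map / shuffle / reduce pattern. A plain-Python sketch of that pattern on a hypothetical weblog format (`user page status`; the field layout and filter rules are invented for illustration):

```python
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    """Map: emit (page, 1) for each well-formed weblog line, dropping malformed records."""
    for line in lines:
        parts = line.strip().split()
        if len(parts) == 3:           # expect exactly: user page status
            _, page, status = parts
            if status == "200":       # keep only successful hits
                yield (page, 1)

def reduce_phase(pairs):
    """Shuffle + reduce: group pairs by key, then sum the counts per page."""
    for page, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield (page, sum(count for _, count in group))

logs = [
    "u1 /home 200",
    "u2 /home 200",
    "u3 /cart 200",
    "u4 /home 500",   # dropped: error status
    "corrupt-line",   # dropped: malformed record
]
counts = dict(reduce_phase(map_phase(logs)))
# counts == {"/cart": 1, "/home": 2}
```

In a real Hadoop job the framework performs the shuffle and distributes the map and reduce phases across the cluster; the logic per phase is the same.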

Environment: Big Data: Hadoop, MapReduce, Hive, Pig, HBase, Oozie, Sqoop, Flume, Cloudera CDH3.

Database: MySQL, MS-SQL, MS Access, Ingres, SSIS, SSRS, SSAS, Data warehouse.

ERP, CRM &amp; EPM: SharePoint, Opale, Salesforce, SAP, Remedy, MS-Project.

Programming: Core Java, Eclipse, VBA.

Methodologies: Agile

Confidential

Analyst ETL-BI Team Lead

Responsibilities:

  • Developing solutions to meet business and technical requirements;
  • Review and provide feedback on the business and technical requirements;
  • Test and document systems and applications developed;
  • Produce updates on the progress of development;
  • Develop and support BI reports and internal applications;
  • Develop stored procedures and optimize SQL queries;
  • Participate in unit, functional, and integration testing of developed patches;
  • Lead a team of four developers.
  • Work with business users to refine the requirements;
  • Interact with managers for estimates of effort;
  • Participate in team meetings, technical exchange, design and code reviews

Environment: Database: MS-SQL, MS Access, SSRS, Data warehouse, ETL.

Programming: C# (Visual Studio 2010), .NET, MVC, jQuery, JavaScript, VBA (Excel &amp; MS Access macros).

Methodologies: Agile (SCRUM)

Confidential

Business Analyst Team Lead

Responsibilities:

  • Gather requirements from business units and translate those to programmers and developers.
  • Confer with clients regarding the nature of the information processing or computation needs.
  • Prepare cost-benefit and return-on-investment analyses to aid in decisions on system implementation.
  • Determine what information is processed and how it is processed.
  • Lead a team of two business analysts and three developers.
  • Coordinate and link the computer systems within an organization to increase compatibility.
  • Consult with management to ensure agreement on system principles.
  • Expand or modify system to serve new purposes or improve workflow.
  • Assist in the preparation of reports and manuals.
  • Assist in the development of logical and physical specifications.

Environment: SharePoint 2010, Visual Studio Team Foundation Server, SAP.

Database: MS SQL, MySQL, Oracle, SSRS, SSIS, SSAS, ETL.

Programming: C#, Visual Basic.

Confidential

Analyst Programmer (.NET)

Responsibilities:

  • Work with the contact person to maintain and develop the software architecture;
  • Write, modify, integrate and test software code;
  • Ensure consistent code quality and make the necessary corrections;
  • Assist in the preparation of reports and manuals;
  • Assist in the collection and documentation of user requirements;
  • Assist in the development of logical and physical specifications;
  • Meet delivery deadlines by balancing time against quality;
  • Keep the company current with the latest technology and propose hardware and software improvements.

Environment: SharePoint 2010, SharePoint Designer, InfoPath, Visual Studio Team Foundation Server.

Database: MS-SQL, SSRS, Crystal Reports

Programming: C# (Visual Studio 2010), .NET, jQuery, JavaScript, Python.

Confidential

Analyst Programmer

Responsibilities:

  • Participate in unit, functional, and integration testing of developed patches;
  • Work with business users to refine the requirements;
  • Ensure consistent code quality and make the necessary corrections;
  • Assist in the preparation of reports and manuals;
  • Assist in the collection and documentation of user requirements;
  • Assist in the development of logical and physical specifications;
  • Meet delivery deadlines by balancing time against quality;
  • Keep the company current with the latest technology and propose hardware and software improvements.

Environment: Database: MS SQL, MySQL. Programming: C++, PHP, C# (Visual Studio 2010), .NET, jQuery, JavaScript.
