Data Scientist Resume
Reston, Virginia
SUMMARY:
- Over 15 years of extensive experience in machine learning, data mining, structured and unstructured data analysis, and image data analysis, including feature extraction, pattern recognition, algorithm development, text mining, computer simulation, data modeling, database design, and model evaluation and deployment.
- In-depth data analysis experience, including extracting and transforming datasets from heterogeneous sources; applying machine learning, data mining, and statistical techniques to uncover hidden patterns in large datasets; and interpreting findings to derive actionable insights. Techniques used include Decision Trees, Random Forest, Neural Networks, Singular Value Decomposition (SVD), Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), Multi-fractal Structural Function, etc.
- Research and development experience in investigating and developing innovative approaches to solve complex problems; developed a novel approach that provided a new capability to extract signals through clouds, which considerably improved performance compared to conventional approaches.
- Strong experience in user requirement analysis: collaborating with clients to understand and analyze their unique needs, capturing clients' business objectives, and creating user requirement documents; collaborated with clients throughout projects to identify and promptly incorporate further user needs or modifications.
- Experienced in R, Python, Matlab, C, C++, Java, JavaScript, IDL, etc. Expertise in object-oriented analysis and design and in multidimensional data model design. Consulting experience with both commercial clients and government agencies.
- Strong communication skills; team-oriented and results-oriented, with a can-do attitude and attention to detail.
TECHNICAL SKILLS:
- Machine learning, data mining, and statistical analysis skills, with data analysis tools including R, Python, and Matlab; image analysis tools such as IDL.
- Multidimensional data modeling and data warehouse design; data visualization and dynamic dashboard development.
- Programming languages including Python, R, SAS, C, C++, Java, JavaScript, XML, Perl, etc.
- Microsoft Windows, UNIX, Linux; Object-Oriented Analysis and Design (OOA, OOD), Rational Rose, UML.
- Distributed computing and Big Data technologies, such as Apache Spark, Hadoop, HDFS, and SAP HANA high-performance analytical databases.
- Databases including SQL Server, Oracle, and MySQL; version control tools such as Git, GitHub, and SVN.
PROFESSIONAL EXPERIENCE:
Confidential
Data Scientist
- In-depth data exploration experience: identified important attributes and data elements; extracted characteristics and trends of interest from diverse data sources; created data transformation algorithms for data from different domains; conducted analysis on billions of clinical interaction records.
- Experienced in integrating data from heterogeneous sources in varied forms, including structured, semi-structured, and unstructured data; experienced with “real-world” data cleansing, including handling missing data, inconsistent data elements, inaccuracies, outliers, etc.
- Experienced in solving complex data analytics problems, including time-series analysis of complex events and extracting insights from event interactions across multiple domains. Built machine learning models from algorithm development through testing and model validation. Applied machine learning, data mining, and statistics to understand data and derive actionable insights.
- Experience with natural language processing (NLP): utilized machine learning and text mining algorithms, including rule-based algorithms, to extract domain-specific concepts and transform them into a proper data format for further analysis. Recommended effective data-processing approaches for various domain-specific situations and implemented them accordingly.
- Collaborated with team members to design the common data model (CDM) in order to model the clinical processes with multifaceted interactions and complex medical events. Utilized multiple ontologies and dictionaries for data codification to improve data analysis accuracy and efficiency.
- Experienced with large-scale, high-performance analytics systems such as SAP HANA, an in-memory, high-performance analytics database system. Experienced in integrating with external data sources and external APIs. Developed technical documents and reports, and designed data visualizations to communicate complex analysis results.
Confidential
Data Scientist
- Developed machine learning algorithms to extract material DHR features, and created range-optimizing algorithms to predict and extend the range of coverage for certain materials. This work improved spectral analysis and material identification accuracy in remote sensing applications.
- Responsible for the development of an innovative approach, the Compound k-Space Invariant Moment Approach, for feature identification from remote sensing datasets. This approach provided a novel capability for through-cloud feature extraction under certain types of clouds; discovered a new trace feature in the marine surface boundary layer that considerably extended the feature detection range. These results were highly recognized by top management.
- Developed machine learning algorithms to extract BRDF optical properties from large remote sensing datasets, a challenging task because of the low signal level from water body reflectivity. Developed a custom correction method that reduces noise in the data, combined with atmospheric correction, to further improve accuracy.
- Performed big-data analysis of large, multi-source datasets, including extracting and transforming large binary datasets; identified hidden patterns in data with various machine learning algorithms such as Singular Value Decomposition (SVD), Support Vector Machine (SVM), Multi-fractal Structural Function, etc.
- Developed data analytics solutions across the full life cycle in a collaborative environment, including communicating with clients, capturing business objectives, data exploration, machine learning algorithm development, performance testing and optimization, and results presentation.
Confidential
Sr. Data Analytics Consultant
- Responsible for the Social Trending project, which used a web data mining approach. Conducted user requirement analysis and collaborated with clients to understand and analyze their unique needs.
- Carried out research to investigate optimum approaches for the web mining project, and identified key factors that could characterize the forces influencing Internet populations in China, in order to characterize and discover related social trends.
- Identified proper data sources on the Internet for the project that contained adequate data over long enough time periods, and provided broad enough geographical coverage. Developed retrieval tools for relevant datasets from the Internet.
- Conducted initial investigation of datasets and data preparation, including data cleaning and filtering out noisy datasets. Identified correlations between variables and performed statistical analysis with statistical analysis tools. Verified and tested the resulting model by cross-verification against different datasets.
- Discovered certain important social trends among the Internet population in China that provided fresh insights into the short-term and long-term impacts of the changing economic and social environment in the country.
Environment: MS Windows system, SQL Server database; Python, R, Matlab, Statistical ToolBox, Internet data sources.
Confidential
Visiting Professor
- Responsible for international exchange programs; collaborated with overseas institutions to establish collaborative educational programs for graduate students at the Confidential.
- Taught courses in object-oriented analysis and design for graduate students at the Confidential.
- Organized international conferences on advanced Internet technologies, including planning conference programs and inviting speakers, and served as a panel speaker at the conferences.
Confidential, Reston, Virginia
Sr. Architect and Project Lead
- Collaborated with users from different agencies to understand and analyze their unique needs; created requirement documents to capture all user requirements.
- In charge of the architectural design of the patent review and collaboration system to incorporate the unique needs of the US Patent and Trademark Office and DTRA clients.
- Led the team for the development, testing and deployment of the patent review and collaboration system.
- Served as team lead in the design and development of the project.
- Collaborated with users from different departments of the county government to capture and analyze their unique needs; created requirement documents to capture all business objectives.
- Conducted full lifecycle development, including user requirement analysis, project architecture design, application development, testing, and deployment of the final application; followed up closely with clients at each phase to ensure timely delivery of applications and customer satisfaction.
- Utilized Visual C++ to implement the application's core functionality for high performance and scalability, and used VB for the front end for fast prototyping and UI implementation.
Environment: MS Windows system, SQL Server database, Microsoft Visual Studio, C++, VB, Citrix.
Confidential, Alexandria, Virginia
Sr. Consultant
- Responsible for the project to build a new capability in the Digital Asset Management Systems to incorporate a search function for video clips, with advanced search that returns results down to specific video frames.
- Designed the project architecture using object-oriented design, with Rational Rose and UML as design tools, to provide better reusability and scalability for the Digital Asset Management Systems.
- Developed the application with Microsoft Visual C++, using the Microsoft Foundation Class (MFC) object model for implementation; Microsoft SQL Server and Oracle were used to manage digital asset data.
Environment: MS Windows system, Microsoft SQL Server and Oracle database, Microsoft Foundation Class (MFC), Microsoft Visual Studio, Visual C++, Windows Web Server, XML, DHTML.
Confidential
IT Consultant
- Serving as an IT consultant for this business strategy computer simulation project, I utilized my analytical and computer skills and expanded my experience into IT consulting.
- Responsible for the development and upgrade of the business strategy and operations simulation system, in collaboration with Wharton School professors; the system simulated the dynamic competitive environment of multiple companies in an industry.
- The business strategy simulation system was used to train top managers of Fortune 100 companies in the US.
- Used Visual C++ and MFC to implement the designed functionality of this project.
Environment: MS Windows system, Microsoft Foundation Class (MFC), Microsoft Visual Studio, Visual C++.
Confidential, Philadelphia, PA
Medical Imaging Physicist
- Conducted research in digital image processing and medical image pattern recognition, resulting in improved imaging quality for better diagnostic accuracy. Experienced with image analysis software such as Matlab.
- Conducted research on the optimization of medical imaging devices for various imaging systems, improving the dynamic range and spatial resolution of the devices.
- Responsible for the design and development of an online radiation safety compliance training web application to accommodate the different shifts and schedules of radiology technicians and physicians; the application was presented and well received at a professional annual conference.
Environment: MS Windows system, Microsoft Active Server Pages, Windows Web Server; Matlab Statistics and Image Processing Tools.
Confidential, Long Island, New York
Post-doctoral Fellow
- Participated in a research project to probe the fundamental quark structure of the proton and neutron, in which I was responsible for developing computer simulations and data analysis. Several research papers were published as a result in top professional journals.
- Experienced in analyzing and formulating solutions to challenging research problems; expertise in developing computer algorithms to process and analyze data, identify and filter out various kinds of noise, and extract hidden patterns and correlations from complex data.
- Responsible for developing a Monte Carlo simulation system for the whole-body measurement facility at BNL, which achieved the desired improvement in the facility's measurement accuracy; experienced with Matlab statistical analysis tools.
Environment: MS Windows System, UNIX System, Matlab, Statistics Tools, C and C++ Language.