Data Scientist / Analyst Resume

PROFESSIONAL SUMMARY:

  • Senior Data Scientist, Analyst & Modeler with 10+ years of experience in architecture, identifying innovation opportunities, and interpreting and analyzing data in a fast-paced environment, with advanced proficiency in all aspects of business needs
  • Experienced in the full machine learning system implementation process: generating training data, model design, feature selection, system implementation, and evaluation
  • Designed and developed a scalable recommendation platform that can be used by various systems/applications
  • Designed and evaluated novel approaches for handling large-scale data analyses and extracting relevant information
  • Applied data-mining, machine learning and/or graph analysis techniques for a variety of modeling and relevance problems involving users and their interests in various content types
  • Provided technical leadership and mentoring to other team members; participated in cutting-edge research in machine learning applications
  • Expert in developing and debugging in Java on Unix and/or Linux
  • Strong background in machine learning and data mining with a broad understanding of supervised and unsupervised learning methods such as Regression, Decision Tree, Collaborative Filtering, PCA, and Clustering (a brief illustrative sketch follows this list)
  • Strong mathematical skills with knowledge of statistical methods
  • Extensive experience in working with large data stores
  • Understanding of content recommendation, personalization, and real-time data-mining
  • Strong collaborative skills and the ability to work with team members from multiple roles to inform, influence, support, and execute our product decisions and launches
  • Ability to mentor other team members and contribute to a collaborative team environment
  • Passion for solving the world’s toughest problems, and the smarts to actually solve them
  • Extensive experience building large-scale server applications
  • Experienced with open-source frameworks such as Hadoop, Hive, Spark, and Oryx
  • Led projects and provided technical execution for the design of machine learning algorithms/solutions to solve healthcare and banking industry problems
  • Applied methodologies (neural nets, CART, Bayesian methods) to monitoring, classification, prediction, controls, and related research domains
  • Evaluated data quality; performed and critiqued appropriate statistical analyses using software such as R, Python, MATLAB, and/or S-PLUS; explored, determined, and developed technical approaches
  • Effectively communicated technical analysis and results with business users
  • Strong foundation in the theories underlying machine learning techniques
  • Experienced in developing machine learning packages with modern programming languages
  • Experienced in designing and prototyping algorithms on data
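
As a brief illustration of the supervised and unsupervised methods named above, here is a minimal sketch using scikit-learn on synthetic data; the dataset, features, and parameters are hypothetical and for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score

# Synthetic dataset standing in for real business data
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Supervised: decision tree classifier
tree = DecisionTreeClassifier(max_depth=5, random_state=42).fit(X_train, y_train)
print("Decision tree accuracy:", accuracy_score(y_test, tree.predict(X_test)))

# Unsupervised: PCA for dimensionality reduction, then k-means clustering
X_reduced = PCA(n_components=2).fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X_reduced)
print("Cluster sizes:", np.bincount(labels))
```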

TECHNICAL SKILLS:

Programming Languages: C, C++, Java & Assembly Language

Scripting: Perl, Python & Shell

Databases: SQL, NoSQL, Oracle, PostgreSQL & MongoDB

GUI Development: Tcl, Tk & Java

Markup Languages: HTML, XML

Operating Systems: Windows, Linux

Code Maintenance Tools: CVS, TortoiseSVN & Git

Development Methodologies: Waterfall, Agile & Iterative

Debugging Tools: GDB, PDB

Desktop Tools: MS Office Suite & LaTeX

ETL Tools: DataStage & SSIS

PROFESSIONAL EXPERIENCE:

Confidential

Data Scientist / Analyst

Responsibilities:

  • Led the full machine learning system implementation process: collecting data, model design, feature selection, system implementation, and evaluation
  • Fulfilled all data science duties for a high-end healthcare client, working with Data Scientists and Product Managers to frame problems both mathematically and within the business context
  • Applied Natural Language Processing to understand text content, including reviews, descriptions, and interactions between users on our marketplace
  • Expert at developing and debugging in C, C++, Java, Python, R, and Perl
  • Knowledge of one or more open-source machine learning frameworks
  • Developed prototypes and validated the results
  • Applied data-mining, machine learning, and/or graph analysis techniques for a variety of modeling and relevance problems involving users and their interests in various content types
  • Utilized various new supervised and unsupervised machine learning algorithms/software to perform NLP tasks and compared their performance
  • Enhanced the DMV tool using Core Java and JDBC
  • Created scripts to create new tables and queries for new enhancements in the application using TOAD
  • Involved in loading data into HDFS and using Pig to preprocess it
  • Used Spark SQL to load JSON data, create schema RDDs, and load them into Hive tables, and handled structured data with Spark SQL (a brief sketch of this flow follows the list)
  • Developed Spark jobs to parse JSON and XML data
  • Loaded the data into Spark RDDs and performed in-memory computation to generate the output response
  • Developed Spark scripts using Scala shell commands as per the requirements
  • Reduced redundancy among incoming incidents by proposing rules to recognize patterns
  • Worked on clustering and classification of data using machine learning algorithms
  • Analyzed data and recommended new strategies for root-cause analysis and the quickest ways to process large data sets
  • Hands-on experience writing Pig Latin scripts and using the Pig interpreter to run MapReduce jobs
  • Worked on Hive and Pig queries and UDFs on different datasets, including joining them
  • Developed Pig Latin scripts for the analysis of semi-structured data
  • Worked on a robust automated metadata-management framework in the data lake that integrates various metadata sources, consolidates them, and updates Podium with the latest, high-quality metadata using big data technologies such as Hive and Impala
  • Ingested data from a variety of sources (Teradata, DB2, Oracle, SQL Server, and PostgreSQL) into the data lake using Podium, and resolved various data transformation and interpretation issues along the way using Sqoop, Git, and uDeploy
  • Designed data models for the Metadata Semantic Layer in the ERwin data modeler tool
  • Contributed to production solutions through development, testing, and deployment
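
A minimal PySpark sketch of the JSON-to-Hive flow described above (what older Spark versions called a "schema RDD" is today's DataFrame); the paths, table names, and columns are hypothetical placeholders, not actual project values:

```python
from pyspark.sql import SparkSession

# Hive support is needed to persist managed Hive tables
spark = (SparkSession.builder
         .appName("json-to-hive-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Spark SQL infers the schema directly from the JSON records
events = spark.read.json("hdfs:///data/incoming/events/*.json")

# Register the data as a temporary view and query it with Spark SQL
events.createOrReplaceTempView("events")
daily = spark.sql("""
    SELECT event_date, COUNT(*) AS event_count
    FROM events
    GROUP BY event_date
""")

# Persist the structured result into a Hive table
daily.write.mode("overwrite").saveAsTable("analytics.daily_event_counts")
```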

Environment: R, RStudio, Podium Data, Data Lake, HDFS, Hue, Hive, Impala, Spark, Scala, Pig, Looker, Python, Perl, ERwin 9.64, HTML, JavaScript, Core Java, PostgreSQL, SSIS, Teradata, Classification Models, Rally, Kanban.

Confidential

Data Analyst / Data Modeler

Responsibilities:

  • Understood the data ingestion mechanism in Big Data and developed an internal tool for the ingestion process using Python and Sqoop; developed a framework to ingest multiple sources in parallel
  • Developed the application in Python in a Windows environment
  • Added several options to the application to support one-shot ingestion and continuous ingestion, mimicking real-world scenarios
  • Verified the hardware correction algorithm by running it with different memory patterns
  • Developed a Python wrapper to run this application along with other applications
  • Performed analysis in different stages of the system development life cycle to support development and testing efforts, identify positive and negative trends, and formulate recommendations for process improvements and development standards
  • Created SSIS packages to load data from XML files, an FTP server, and SQL Server into SQL Server using Lookup, Derived Column, Conditional Split, OLE DB Command, Term Extraction, Aggregate, and Pivot transformations, the Execute SQL Task, and Slowly Changing Dimension
  • Recreated SQL stored procedures to accommodate modified logic for duplicate detection
  • Created DataStage & SSIS packages to read X9 files and load data into the database
  • Performed data analysis and data profiling using complex SQL on various source systems
  • Involved in identifying the source data from different systems and mapping the data into the warehouse
  • Created SQL scripts to find data quality issues and to identify keys, data anomalies, and data validation issues
  • Involved in defining the source to target data mappings, business rules, and data definitions
  • Created HBase tables to store variable data formats of input data coming from different portfolios
  • Created technical design documentation for the data models, data flow control process, and metadata management
  • Reverse-engineered and generated data models by connecting to their respective databases
  • Imported metadata from various applications and built end-to-end data lineage using ERwin
  • Designed conceptual data models based on the requirements and interacted with non-technical end users to understand the business logic
  • Modeled the logical and physical diagrams of the future state; delivered the BRD and the low-level design document
  • Discussed the data models, data flow, and data mapping with the application development team
  • Developed conceptual models using ERwin based on requirements analysis
  • Developed normalized logical and physical database models to design an OLTP system for insurance applications
  • Used ERwin and TOAD to review the physical data model of Oracle sources, constraints, foreign keys, and table indexing for data lineage
  • Designed, developed, and deployed a website for clients to check the status of the Gates Application in IEX using Python
  • Developed entire frontend and backend modules in Python on the Django framework
  • Used Python to extract information from customer-supplied XML files
  • Enhanced horizontally scalable APIs using Python Flask (a brief sketch combining the XML extraction and Flask work follows this list)
  • Integrated the work tasks with relevant teams for a smooth transition from testing to implementation
  • Made extensive use of Git as a version control tool
  • Worked with Agile methodologies and used CA Agile Central (Rally) and a Kanban dashboard
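
As a brief illustration of the Flask and XML work above, here is a minimal sketch of a Flask endpoint that extracts fields from a posted XML document and returns JSON; the endpoint name, XML layout, and field names are hypothetical placeholders:

```python
import xml.etree.ElementTree as ET

from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/customers", methods=["POST"])
def ingest_customers():
    # Parse the XML payload posted by the client
    root = ET.fromstring(request.data)
    # Extract a hypothetical <customer> list into plain dicts
    customers = [
        {"id": c.findtext("id"), "name": c.findtext("name")}
        for c in root.findall("customer")
    ]
    return jsonify(count=len(customers), customers=customers)

if __name__ == "__main__":
    # Development server only; horizontal scaling would come from running
    # multiple stateless instances behind a load balancer.
    app.run(host="0.0.0.0", port=5000)
```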

Environment: Big Data technologies (Hadoop, Pig, Hive, Sqoop, HBase, Hadoop MapReduce), MS SQL Server 2014, DataStage & SSIS packages, SQL BI Suite (SSMS, SSIS, SSRS, SSAS), XML, MS Excel, MS Access 2013, Windows Server 2012, SQL Profiler, ERwin r8, TFS 2013.

Confidential

Data Analyst

Responsibilities:

  • Carried out generating training data, model design, feature selection, system implementation, and evaluation
  • Assisted in designing and developing a scalable recommendation platform that can be used by various systems/applications
  • Applied machine learning and/or graph analysis techniques for a variety of modeling and relevance problems
  • Participated in cutting-edge research in machine learning applications
  • Relevant knowledge of machine learning and data mining concepts, with an understanding of supervised and unsupervised learning methods (such as Regression and Decision Tree)
  • Experienced in developing and debugging in Java on Unix and/or Linux
  • Experienced in building large-scale server applications
  • Experienced with open-source frameworks such as Hadoop
  • Experienced in working with large data stores
