Jnr DataScientist Resume Kansas City - Hire IT People

PROFESSIONAL SUMMARY:

Data Scientist with 8+ years of IT industry experience, including 2+ years specializing in implementing advanced Machine Learning, Deep Learning and Natural Language Processing algorithms on data from diverse domains and building highly efficient models to derive actionable insights for business environments, leveraging exploratory data analysis, feature engineering, statistical modeling and predictive analytics.
Experience in machine learning, data mining, structured and unstructured data analysis, and image data analysis, including feature extraction, pattern recognition, algorithm development, text mining, computer simulation, data modeling, database design, model evaluation and deployment.
Experience with Statistical Analysis, Data Mining and Machine Learning Skills using R, Python and SQL.
Data-driven and highly analytical, with working knowledge of statistical modeling approaches and methodologies (Clustering, Regression analysis, Hypothesis testing, Decision trees, Machine learning), business rules and the ever-evolving regulatory environment.
Strong practical understanding of statistical modeling and supervised/unsupervised/reinforcement machine learning techniques, with a keen interest in applying these techniques to predictive analytics.
Familiar with the entire Data Science project life cycle, including Data Acquisition, Data Cleansing, Data Manipulation, Feature Engineering, Modeling, Evaluation, Optimization, Testing and Deployment.
Experience in problem solving, data science, machine learning, statistical inference, predictive analytics, descriptive analytics, prescriptive analytics, graph analysis, natural language processing and computational linguistics, with extensive experience in predictive analytics and recommendation.
Extensive experience in various phases of software development like analyzing, gathering and designing the data with expertise in documenting.
Hands-on experience with clustering algorithms such as K-means and K-medoids, and with predictive and descriptive algorithms.
Expertise in Model Development, Data Mining, Predictive Modeling, Descriptive Modeling, Data Visualization, Data Cleaning and Management, and Database Management.
Good knowledge of Apache Hadoop technologies such as Pig, Hive, Sqoop, Spark, Flume and HBase.
Experience using machine learning models such as random forest, KNN, SVM and logistic regression, and packages such as ggplot, dplyr, lm, rpart, randomForest, nnet, PROC-(pca, dtree, corr, princomp, gplot, logistic, cluster), NumPy, scikit-learn, pandas, etc., in R, SAS and Python.
Strong knowledge of statistical methods (regression, time series, hypothesis testing, randomized experiments), machine learning techniques, algorithms, data structures and data infrastructure.
Extensive hands-on experience and high proficiency with structured, semi-structured and unstructured data, using a broad range of data science programming languages and big data tools.
Experienced in designing star schema (identification of facts, measures and dimensions), Snowflake schema for Data Warehouse, ODS Architecture by using tools like Erwin Data Modeler, Power Designer, E-R Studio and Microsoft Visio.
Continuously learning new tools, including Elasticsearch (Lucene/index-based search) and Kibana.
Expertise in Excel macros, pivot tables, VLOOKUPs and other advanced functions; expert R user with knowledge of the statistical programming language SAS.
Excellent experience with Teradata SQL queries, Teradata indexes, and utilities such as MultiLoad (MLOAD), TPump, FastLoad and FastExport.
In-depth experience in R, Python, Spark, PySpark, SQL, MongoDB, Scikit Learn, Hadoop, Amazon AWS, Microsoft Azure, REST APIs, Unix, LINUX, GIT, R Shiny & ShinyDashboard.
Strong skills in statistical methodologies such as Hypothesis Testing, Principal Component Analysis (PCA) and Correspondence Analysis.
Professional working experience in Machine Learning algorithms such as Linear Regression, Logistic Regression, Random Forests, Decision Trees, K-Means Clustering and Association Rules.
Experience with Amazon Web Services (AWS) in planning, designing, implementing and maintaining system applications in AWS Cloud in Windows and Linux Environments.
Experienced in developing complex database objects like Stored Procedures, Functions, Packages and Triggers using SQL and PL/SQL.
Proficient in Big Data, Hadoop, Hive, MapReduce, Pig and NoSQL databases like MongoDB, HBase, Cassandra.
Experienced in SQL Queries and optimizing the queries in Oracle, SQL Server, DB2, PostgreSQL, Netezza and Teradata.
Strong experience in maintaining PostgreSQL, Oracle and Big Data databases and upgrading their versions.
Experience in installing, configuring and maintaining databases such as PostgreSQL, Oracle and Big Data HDFS systems.
Experienced in statistical analysis using R, SPSS, MATLAB and Excel.
Highly motivated team player with excellent Interpersonal and Customer Relational Skills, Proven Communication, Organizational, Analytical, Presentation Skills, and Leadership Quality.

TECHNICAL SKILLS:

Programming & Scripting languages: R (Packages: Stats, Zoo, Matrix, data.table, OpenSSL), Python, SQL, C, C++, Java, JCL, COBOL, HTML, CSS, JSP, JavaScript, Scala

Database: SQL, MySQL, TSQL, MS Access, Oracle, Hive, MongoDB, Cassandra, PostgreSQL

Statistical Software: SPSS, R, SAS

Algorithms Skills: Machine Learning, Neural Networks, Deep Learning, NLP, Bayesian Learning, Optimization, Prediction, Pattern Identification, Data / Text mining, Regression, Logistic Regression, Bayesian Belief, Clustering, Classification, Statistical modeling

Data Science/Data Analysis Tools & Techniques: Generalized Linear Models, Logistic Regressions, Boxplots, K-Means, Clustering, SVN, PuTTY, WinSCP, Redmine (Bug Tracking, Documentation, Scrum), Neural networks, AI, Teradata, Tableau

Development Tools: RStudio, Notepad++, Python, Jupyter, Spyder IDE

Python Packages: NumPy, SciPy, Pandas, scikit-learn, Matplotlib, seaborn, statsmodels, Keras, TensorFlow, Theano, NLTK, Scrapy

Techniques: Machine learning, Regression, Clustering, Data mining

Machine Learning: Naïve Bayes, Decision trees, Regression models, Random Forests, Time-series, K-means

Cloud Technologies: AWS (EC2, S3, RDS, EBS, VPC, IAM, Security Groups), Microsoft Azure, Rackspace

Operating Systems: Microsoft Windows, Linux (Ubuntu)

Big Data: Hadoop, MapReduce, Apache Spark, Hive, Pig

PROFESSIONAL EXPERIENCE:

Confidential, Kansas City

Jnr Data Scientist

Responsibilities:

Implemented end-to-end systems for Data Analytics, Data Automation and Integration.
Responsible for data identification, collection, exploration & cleaning for modeling, participated in model development.
Used Python and Spark to implement different machine learning algorithms including Generalized Linear Model, SVM, Random Forest, Boosting and Neural Network.
Implemented various statistical techniques to manipulate data (missing data imputation, principal component analysis and sampling) and build predictive models.
Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop. Implemented a Python-based distributed random forest via Python streaming.
Used R, Python and Spark to develop a variety of models and algorithms for analytic purposes.
Identified energy consumption parameters and built a model to determine what drives higher consumption.
Performed data cleaning, feature scaling and feature engineering using the pandas and NumPy packages in Python.
Responsible for the design and development of advanced R/Python programs to prepare, transform and harmonize data sets in preparation for modeling.
Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data. Created various types of data visualizations using Python and Tableau.
Participated in all phases of data acquisition, data cleaning, developing models, validation, and visualization to deliver data science solutions.
Worked on fraud detection analysis of payment transactions, applying supervised learning methods to transaction history.
Worked with the Enterprise Fraud Management team to develop new rules for detecting internal fraud activity and to enhance existing fraud detection rules.
Collected data in Hadoop and retrieved the data required for building models using Hive.
Used Pandas, Numpy, Seaborn, Matplotlib, Scikit-learn in Python for developing various machine learning models and utilized algorithms such as Decision Trees, Logistic regression, Gradient Boosting, SVM and KNN.
Used cross-validation to test the models with different batches of data to optimize the models and prevent overfitting.
Used PCA and other feature engineering techniques on high-dimensional datasets while preserving the variance of the most important features.
Created transformation pipelines for preprocessing large amounts of data with methods such as imputing, scaling and feature selection.
Used ensemble methods, including various bagging and boosting techniques, to increase model accuracy.

Environment: Hadoop 2.x, HDFS, Hive, Pig Latin, Python 3.x (Numpy, Pandas, Scikit-learn, Matplotlib), Jupyter, GitHub, Linux

Confidential, Austin, TX

Jnr Data Scientist / Data Analyst

Responsibilities:

Gathered data from different sources and applied resampling strategies to handle imbalanced data.
Worked with the ETL team and doctors to understand the data and define a uniform standard format.
Worked with various data pools and DBAs to gain access to data. Knowledgeable in NLP, NLTK and text mining.
Trained and supervised up to 8 team members in SQL, Scala and Spark programming, and assisted in the installation and upgrading of Python, Scala, Java and Spark.
Programming knowledge in Java, Scala, Spark, SQL and Python.
Used K-means clustering to group similar data and documented the results.
Extracted, transformed and loaded data into a PostgreSQL database using Python scripts.
Split the data into several smaller datasets based on diagnosis; responsible for conducting exploratory data analysis on three of the diagnosis datasets (diabetes, cold/flu, allergy).
Created solutions in RStudio to detect useful patterns in 2 TB of log data, surfacing crucial application errors and proactively helping all business units involved make error-handling decisions, especially around prioritization.
Built the entire data preprocessing pipeline (imputing, scaling, label encoding) with Python pandas to prepare data for the modeling stage.
Built predictive models with Python scikit-learn, including Support Vector Machine, Decision Tree, Naive Bayes Classifier and Neural Network, to predict potential readmission cases.
Applied ensemble methods, including Gradient Boosting, Random Forest and a customized ensemble method, to produce more accurate solutions.
Designed and implemented cross-validation and statistical tests, including hypothesis testing, ANOVA and the Chi-square test, to check the models' significance.
Created an API using Flask and communicated the design to the application team to help them define the requirements of the new application.
Used Agile methodology and the Scrum process for project development.

Environment: SQL Server 2012, SQL Server Integration Services, Python 2.7, Jupyter notebook, Flask 0.10, SharePoint 2013

Confidential, Atlanta, GA

Data Analyst

Responsibilities:

Worked with R and Python to develop neural network algorithms and perform cluster analysis.
Conducted a range of statistical analyses to provide valuable data-driven insights for business decision making.
Worked with R packages such as ggplot2 and shiny to understand data and develop applications.
Involved in data analysis, primarily identifying data sets, source data, source metadata, data definitions and data formats.
Prepared ETL technical mapping documents, along with test cases for each mapping, for future development and to maintain the SDLC and migration process. Used Talend for extraction and reporting purposes.
Actively took part in Data Profiling, Data Cleansing, Data Migration and Data Mapping, and helped ETL developers compare data with original source documents and validate data accuracy.
Worked on Tableau, to create dashboards and visualizations.
Analyzed customer data in Python and R to track correlations in customer behavior and defined user segments to implement process and product improvements.
Cleaned, explored and manipulated source data and transformed it for the target system using Python and tools such as Pandas, NumPy, Matplotlib and PostgreSQL.
Gathered and analyzed business requirements, interacted with various business users, project leaders, developers and also took part in identifying different data sources.
Used Python for analyzing, designing, developing and implementing statistical/data models, and integrated Python with databases.
Prepared several analytical reports using different data modeling techniques such as time series analysis, financial modeling and trend mapping.
Worked on different data model designs, Data Extraction, Transformations, Mappings, Loading and generating Customized Analytical Reports.
Analyzed the business requirements and designed Conceptual and Logical Data models using Erwin and generated database schemas and DDL (Data Definition Language) by using Forward and Reverse Engineering.
Worked as data modeler/analyst by creating and developing relational and dimensional data models by using Erwin.
Identified and designed business Entities and attributes and relationships between the Entities to develop a Conceptual model and Logical model and then translated the model into Physical model.
Implemented Normalization and De-Normalization Techniques to build the tables, indexes, views and maintained and implemented stored procedures as per requirements.
Conducted design reviews with developers and business analysts. Extensively worked on Performance Tuning and understanding Joins and Data distribution.
Performed data analysis in Python using the NumPy and Pandas libraries on data from CSV, XML and Excel files.
Created charts and graphs of data from different data sources using the Matplotlib and SciPy libraries in Python.
Used ad hoc queries for querying and analyzing the data, participated in performing data profiling, data analyzing, data validation and data mining.
Developed complex ETL mappings for Stage, Dimensions, Facts and Data marts load.
Involved in Data Extraction for various Databases & Files using Talend.
Used Tableau for data analysis, digging into source-system data and diving deep into it for predictive findings using dashboards and visualizations.

Environment: Python, R, NumPy, Pandas, SciPy, Erwin 9.6.1, ER/Studio, Matplotlib, PostgreSQL, Oracle 11g, TOAD, MS Excel, JIRA, Teradata, Tableau, MS Office

Confidential

SQL Developer

Responsibilities:

Converted Data Transformation Services (DTS) applications to SQL Server Integration Services (SSIS) as assigned.
Designed dynamic SSIS packages to transfer data across different platforms, validate data during transfer, and archive data files for various DBMSs.
Involved in creating highly interactive scorecards and dashboards using PerformancePoint Server.
Designed SSIS packages to extract, transform and load (ETL) existing data into SQL Server 2008 from different environments for the SSAS cubes.
Generated reports using SQL Server Reporting Services (SSRS) for customized and ad hoc queries.
Involved in designing, developing, debugging and testing reports in SQL Server 2008 Reporting Services (SSRS).
Expert in creating star schema cubes using SSAS.
Designed and created the table schema and stored procedures used in Data QC.
Designed and developed the Data QC Maintenance site used to maintain the application tables, execute the main Data QC stored procedures, and generate reports exported to Microsoft Excel.
Provided daily support for system-wide replication tasks, including monitoring, alerting and problem resolution.
Responsible for documentation of system-related activities.
Provided technical documentation of the system.
Performed code reviews of developers' code before moving it to production.

Environment: MS SQL Server, DTS, SSIS, SSRS, SSAS, T-SQL, Star Schema, QC, XML, ETL

Confidential

Responsibilities:

Facilitated Joint Requirement Planning (JRP) sessions with SMEs to understand the requirements pertaining to Loan Origination through Loan Processing.
Involved in the analysis of the existing Visa processing system, the mapping phase according to functionality, and the data conversion procedure.
Gathered business requirements from end users, keeping in mind their needs for the application, and prepared Business Requirement Documents (BRD) using Rational RequisitePro.
Performed GAP analysis to meet the requirements.
Maintained requirements in JIRA for tracking and testing purposes.
Used Agile/Scrum and PMI methodologies to monitor, steer and develop project objectives.
Created and managed Project Templates, Use Case Project Templates, Requirement Types and Traceability Relationships in RequisitePro.
Translated business requirements into data mining requirements using data analysis tools such as Tableau.
Coordinated with the DBA and Business Objects designer group in resolving various technical issues.
Led the setup of the client's business process workflow repository using various process modeling tools, based on my recommendations.
Involved in creating automated test scripts representing various transactions and documenting the load testing process and methodology. Created relevant reports for analysis and integrated performance testing into the SDLC.
Conducted functional walkthroughs and User Acceptance Testing (UAT), and supervised the development of user manuals for customers.
Used SQL to extract and collect data to generate reports with BI tools such as Tableau. Involved in understanding the reporting requirements and providing Tableau reporting solutions.
Created and developed global business and user procedures for various BI tools such as Tableau, MicroStrategy and QlikView. Also built complex formulas in Tableau for various business calculations.
Maintained and managed the different versions of documents generated during the project using Rational ClearCase, and performed defect tracking using Rational ClearQuest.
Designed manual test cases in HP ALM for various user stories based on the Release and Sprint Plan.

Environment: MS Visio, MS Access, MS Excel, UML, JIRA, HP ALM, Tableau, Rational Rose, RequisitePro, ClearCase, Rational ClearQuest, GAP Analysis, Business Objects.
