Data Scientist Resume
Indianapolis, IN
SUMMARY
- Highly efficient Data Scientist with over 7 years of experience in areas including Data Analysis, Statistical Analysis, Machine Learning and Data Mining.
- Expertise in the complete software development life cycle, including Analysis, Design, Development, Testing and Implementation, across the Hadoop ecosystem, the Documentum SP2 suite of products and Java technologies.
- Extensive working experience with Python, including Scikit-learn, Pandas and NumPy.
- Integration Architect & Data Scientist experience in Analytics, Big Data, AWS, BPM, SOA, ETL and Cloud AutoML technologies.
- Proficient in Predictive Modeling, Data Mining methods, Factor Analysis, ANOVA, hypothesis testing, normal distribution and other advanced statistical and econometric techniques.
- Experienced with the full software development life cycle (SDLC) using Agile and Scrum methodologies.
- Experience in Data Analysis, Data Profiling, Data Integration, Migration, Data governance and Metadata Management, Master Data Management and Configuration Management.
- Experienced in Big Data with Hadoop 2, HDFS, MapReduce, and Spark.
- Experienced in Spark 2.1, Spark SQL and PySpark.
- Skilled in using dplyr in R and pandas in Python for performing exploratory data analysis (see the short pandas sketch at the end of this summary).
- Experience working with data modeling tools like Erwin, Power Designer and ERStudio.
- Experience in designing star and snowflake schemas for Data Warehouse and ODS architectures.
- Experience in designing visualizations using Tableau, and in publishing and presenting dashboards and storylines on web and desktop platforms.
- Experience and technical proficiency in designing and data modeling online applications, and as a Solution Lead architecting Data Warehouse/Business Intelligence applications.
- Good understanding of Teradata SQL Assistant, Teradata Administrator and data load utilities; experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, Pivot Tables and OLAP reporting.
- Good experience in NLP with Apache Hadoop and Python.
- Extensive experience in Text Analytics, developing Statistical Machine Learning, AutoML and Data Mining solutions to various business problems, and generating data visualizations using R, Python and Tableau.
- Expert in creating PL/SQL Schema objects like Packages, Procedures, Functions, Subprograms, Triggers, Views, Materialized Views, Indexes, Constraints, Sequences, Exception Handling, Dynamic SQL/Cursors, Native Compilation, Collection Types, Record Type, Object Type using SQL Developer.
- Working experience in the Hadoop ecosystem and the Apache Spark framework, including HDFS, MapReduce, HiveQL, Spark SQL and PySpark.
- Very good experience and knowledge in provisioning virtual clusters under AWS cloud which includes services like EC2, S3, and EMR.
- Experience with AI/ML-related AWS services, particularly EMR, AWS Lambda, SageMaker, Kinesis, Machine Learning, Lex, Polly, Rekognition, DynamoDB, S3 and ECS.
- Experienced in Big Data with Hadoop, MapReduce, Spark, PySpark, Spark SQL, HDFS, Hive and AWS SageMaker.
- Work with large sets of complex structured, semi-structured and unstructured data to discover meaningful business insights.
- Knowledge of NoSQL databases such as HBase, Cassandra and MongoDB.
- Highly skilled in statistical analysis using R, MATLAB and Excel.
- Experience working with SAS Language for validating data and generating reports.
- Experience working with web languages and frameworks such as HTML, CSS and R Shiny.
- Strong Data Analysis skills using business intelligence, SQL and MS Office Tools.
- Strong SQL Server programming skills, with experience in working with functions, packages and triggers.
- Experienced with ticketing and collaboration systems such as Jira and Confluence, and version control tools such as GitHub.
- Worked on deployment tools such as Azure Machine Learning Studio, Oozie, AWS Lambda, Kinesis.
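As a brief illustration of the exploratory data analysis skills referenced in this summary, the following is a minimal pandas sketch; the input file and column names are hypothetical and stand in for real project data.

```python
# Minimal EDA sketch in pandas; file name and columns are hypothetical.
import pandas as pd

df = pd.read_csv("claims.csv")            # hypothetical input file
print(df.shape)                           # rows x columns
print(df.dtypes)                          # column types
print(df.describe(include="all"))         # summary statistics
print(df.isna().sum())                    # missing values per column

# Simple group-wise aggregation, analogous to dplyr's group_by + summarise
summary = (
    df.groupby("region")["amount"]        # hypothetical columns
      .agg(["count", "mean", "median"])
      .sort_values("mean", ascending=False)
)
print(summary.head())
```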
PROFESSIONAL EXPERIENCE
Data Scientist
Confidential - Indianapolis, IN
Responsibilities:
- Worked on deployment, maintenance and troubleshooting of applications on Microsoft Azure cloud infrastructure, and on automating, configuring and deploying instances in Azure environments and in data centers.
- Took part in the entire project lifecycle, including design, development, deployment, testing and implementation, and managed security through a ticketing mechanism.
- Developed a recommendation engine for Wizards of the Coast gaming products by contributing to training, testing and developing AI models on AWS SageMaker.
- Analyzed data using Python, PySpark and Spark SQL for real-time stream analytics.
- Used Pandas, NumPy, Matplotlib and Scikit-learn in Python to develop various machine learning algorithms (see the first sketch after this section).
- Developed internal auxiliary web apps using Python Flask framework with Angular.js and Twitter Bootstrap CSS / HTML framework.
- Wrote Python code and actively participated in automating processes.
- Designed and developed a Python-based RESTful web service (API) to interact with the company's website (see the Flask sketch after this section).
- Implemented Python code to fix bugs and provide upgrades to existing functionality.
- Created Business Logic using Python to create Planning and Tracking functions.
- Created a Git repository and added the project to GitHub.
- Utilized Agile process and JIRA issue management to track sprint cycles.
- Performed API Level testing for web services, enhanced the Test harness and developed many Test suites using XML and Python.
- Prepared data for normal-distribution-based analysis by applying techniques such as missing value treatment, outlier treatment and hypothesis testing.
- Performed preliminary data analysis using descriptive statistics and handled anomalies such as removing duplicates and imputing missing values.
- Involved in the development of web services using REST APIs for sending and receiving data from the external interface in JSON format.
- Configured AWS EC2 instances and created S3 data pipes using the Boto API to load data from internal data sources (see the boto3 sketch after this section).
- Extensive experience with AWS services like S3, ELB, EBS, Auto Scaling, Route 53, CloudFront, IAM, CloudWatch, RDS, etc.
- Implemented Agile Methodology for building an internal application.
- Worked closely with senior Data Scientists to understand data requirements for the experiments and build domain knowledge.
Environment: R, Python, Flask, AWS (EC2, S3, Lambda, Kinesis), Machine Learning (Scikit-learn), AutoML, PySpark, Spark SQL, REST APIs, Hadoop, Big Data, Pandas, NumPy, Matplotlib.
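The following is a minimal sketch of the kind of Scikit-learn workflow referenced in the responsibilities above, not the project's actual model; the input file and label column are hypothetical.

```python
# Hedged sketch of a typical Scikit-learn training workflow; data is hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

df = pd.read_csv("training_data.csv")      # hypothetical file with numeric features
X = df.drop(columns=["label"])             # hypothetical target column
y = df["label"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scale features, then fit a simple classifier
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```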
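A minimal sketch of a Python/Flask RESTful endpoint of the kind described above; the routes, resource names and payload fields are hypothetical.

```python
# Minimal Flask REST API sketch; routes and fields are hypothetical.
from flask import Flask, jsonify, request

app = Flask(__name__)
plans = {}  # in-memory store standing in for a real database

@app.route("/api/plans", methods=["POST"])
def create_plan():
    payload = request.get_json(force=True)
    plan_id = len(plans) + 1
    plans[plan_id] = {"name": payload.get("name"), "status": "open"}
    return jsonify({"id": plan_id, **plans[plan_id]}), 201

@app.route("/api/plans/<int:plan_id>", methods=["GET"])
def get_plan(plan_id):
    plan = plans.get(plan_id)
    if plan is None:
        return jsonify({"error": "not found"}), 404
    return jsonify({"id": plan_id, **plan})

if __name__ == "__main__":
    app.run(debug=True)
```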
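A short sketch of loading a local extract into S3, shown here with boto3 (the current Python AWS SDK) rather than the original Boto API; the bucket, key and file names are hypothetical.

```python
# Hedged sketch of an S3 load with boto3; names are hypothetical.
import boto3

s3 = boto3.client("s3")                                   # uses credentials from the environment
s3.upload_file(
    Filename="exports/daily_extract.csv",                 # hypothetical local file
    Bucket="internal-data-lake",                          # hypothetical bucket
    Key="raw/daily_extract.csv",
)

# Verify the object landed where expected
response = s3.list_objects_v2(Bucket="internal-data-lake", Prefix="raw/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```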
Data Scientist
Confidential - NY
Responsibilities:
- Used R, Python and Spark to develop a variety of models and algorithms for analytic purposes.
- Worked with Spark to improve performance and optimize existing algorithms in Hadoop using Spark Context, Spark SQL and DataFrames (see the PySpark sketch after this section).
- Created SQL tables with referential integrity and developed queries using SQL, SQL*PLUS and PL/SQL.
- Worked on Data Warehouse and Data Migration applications with extensive hands-on experience using ETL tools such as Talend and Informatica.
- Performed data integrity checks, data cleaning, exploratory analysis and feature engineering using R and Python.
- Worked with Data Architects and IT Architects to understand the movement and storage of data, using ER Studio 9.7.
- Completed a highly immersive Data Science program involving Data Manipulation and Visualization, Web Scraping, Machine Learning, AutoML, Git, SQL, Unix commands, Python programming, NoSQL, MongoDB and Hadoop.
- Extensively worked on Data Modeling tools Erwin Data Modeler to design the data models.
- Set up storage and data analysis tools in the Amazon Web Services cloud computing infrastructure.
- Performed K-means clustering, Multivariate Analysis and Support Vector Machines in Python and R (see the clustering/SVM sketch after this section).
- Created MapReduce jobs running over HDFS for data mining and analysis using R; loaded and stored data with Pig scripts and R for MapReduce operations; and created various types of data visualizations using R and Tableau.
- Worked on machine learning on large size data using Spark and MapReduce.
- Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database.
- Developed Spark/Scala and Python code for a regular expression (RegEx) project in the Hadoop/Hive environment with Linux/Windows for big data resources.
- Designed tables and implemented naming conventions for Logical and Physical Data Models in Erwin 7.0.
- Designed logical and physical data models for multiple OLTP and Analytic applications.
- Wrote simple and advanced SQL queries and scripts to create standard and ad-hoc reports for senior managers.
- Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data. Created various types of data visualizations using Python and Tableau.
- Used Amazon Simple Workflow Service (SWF) for data migration in data centers, which automates the process and tracks every step, with logs maintained in an S3 bucket.
- Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
- Programmed a utility in Python that used multiple packages (NumPy, SciPy, Pandas).
- Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, Naive Bayes, KNN.
- Updated Python scripts to match training data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
- Used Teradata utilities such as FastExport and MultiLoad (MLOAD) for data migration/ETL tasks from OLTP source systems to OLAP target systems.
- Created SSIS Packages using Pivot Transformation, Execute SQL Task, Data Flow Task, etc. to import data into the data warehouse.
- Developed and implemented SSIS, SSRS and SSAS application solutions for various business units across the organization.
Environment: R/Python, SQL, Git, HDFS, Pig, Hive, Oracle, DB2, Hadoop, Tableau, Unix commands, AutoML, NoSQL, MongoDB, SSIS, SSRS, SSAS, AWS (S3, EC2, RDS, SWF, Polly, DynamoDB, Glacier), CSV, Big Data, Erwin, OBIEE.
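A hedged PySpark sketch of the DataFrame/Spark SQL pattern described above, expressing the same aggregation both ways; the input path, table and column names are hypothetical.

```python
# Hedged PySpark sketch; paths and columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-analysis").getOrCreate()

orders = spark.read.parquet("hdfs:///data/orders")        # hypothetical HDFS path
orders.createOrReplaceTempView("orders")

# Equivalent aggregation expressed two ways: Spark SQL and the DataFrame API
by_sql = spark.sql(
    "SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id"
)
by_api = orders.groupBy("customer_id").agg(F.sum("amount").alias("total"))

by_api.orderBy(F.desc("total")).show(10)
spark.stop()
```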
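An illustrative K-means and Support Vector Machine example in Scikit-learn, run on synthetic data standing in for the proprietary datasets referenced above.

```python
# Illustrative clustering and SVM sketch on synthetic data (not project data).
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = make_blobs(n_samples=500, centers=3, random_state=7)

# Unsupervised: group observations into 3 clusters
kmeans = KMeans(n_clusters=3, n_init=10, random_state=7).fit(X)
print("cluster sizes:", [int((kmeans.labels_ == k).sum()) for k in range(3)])

# Supervised: classify with a support vector machine
svm_scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=5)
print("SVM cross-validated accuracy:", svm_scores.mean())
```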
Jr. Data Scientist
Confidential - Flint, MI
Responsibilities:
- Performed data wrangling to clean, transform and reshape the data utilizing the pandas library. Analyzed data using SQL, R, Java, Scala, Python and Apache Spark, and presented analytical reports to management and technical teams.
- Worked with different datasets including both structured and unstructured data, and participated in all phases of data mining: data collection, data cleaning, variable selection, feature engineering, model development, validation and visualization.
- Implemented population segmentation with unsupervised machine learning by applying the K-means algorithm in PySpark after data munging (see the PySpark K-means sketch after this section).
- Worked on different Machine Learning models such as Logistic Regression, Multi-layer Perceptron classifier and K-means clustering.
- Used R and Python for Exploratory Data Analysis to compare and identify the effectiveness of the data.
- Created clusters to classify control and test groups.
- Analyzed and calculated the lifetime cost of each individual in a welfare system using 20 years of historical data.
- Used Python, R, SQL to create statistical algorithms involving Multivariate Regression, Linear Regression, Logistic Regression, PCA, Random forest models, Decision trees, SVM for estimating and identifying the risks of welfare dependency.
- Designed and implemented a recommendation system that leveraged Google Analytics data and machine learning models, utilizing collaborative filtering techniques to recommend policies to different customers (see the collaborative-filtering sketch after this section).
- Performed analyses such as Regression Analysis, Logistic Regression, Discriminant Analysis and Cluster Analysis using SAS programming.
- Worked on NoSQL databases including Cassandra, MongoDB and HBase to assess their advantages and disadvantages for the goals of the project.
Environment: R, Machine Learning, QlikView, AWS, Python (Scikit-learn, SciPy, NumPy, Pandas, Matplotlib, Seaborn), SQL Server, Hadoop, Big Data, HDFS, Hive, Pig Latin, Apache Spark/PySpark/MLlib, GitHub, Tableau.
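A sketch of K-means segmentation in PySpark's ML library, as described in the responsibilities above; the input path and feature columns are hypothetical.

```python
# Hedged PySpark K-means segmentation sketch; path and features are hypothetical.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("segmentation").getOrCreate()

df = spark.read.csv("hdfs:///data/population.csv", header=True, inferSchema=True)

assembler = VectorAssembler(
    inputCols=["age", "income", "benefit_years"],   # hypothetical features
    outputCol="features",
)
features = assembler.transform(df)

model = KMeans(k=4, seed=42, featuresCol="features").fit(features)
segmented = model.transform(features)               # adds a 'prediction' column
segmented.groupBy("prediction").count().show()
spark.stop()
```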
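A minimal collaborative-filtering sketch using Spark's ALS recommender, standing in for the recommendation system described above (the production system is not shown here); the customer/policy IDs and interest scores are hypothetical.

```python
# Hedged collaborative-filtering sketch with Spark ALS; data is hypothetical.
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("policy-recs").getOrCreate()

ratings = spark.createDataFrame(
    [(1, 10, 4.0), (1, 11, 3.0), (2, 10, 5.0), (2, 12, 2.0), (3, 11, 4.0)],
    ["customer_id", "policy_id", "interest_score"],   # hypothetical feedback scores
)

als = ALS(
    userCol="customer_id",
    itemCol="policy_id",
    ratingCol="interest_score",
    rank=8,
    coldStartStrategy="drop",
)
model = als.fit(ratings)
model.recommendForAllUsers(3).show(truncate=False)   # top 3 policies per customer
spark.stop()
```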