Data Scientist/Machine Learning Resume

Chicago, IL

PROFESSIONAL SUMMARY:

  • Close to eight years of professional IT experience, including 3+ years in Data Mining, Machine Learning, and Spark development with large structured and unstructured datasets.
  • Experienced in data acquisition, data validation, predictive modeling, and data visualization. Proficient in statistical programming languages such as R and Python.
  • Proficient in managing the entire data science project life cycle and actively involved in all of its phases.
  • Extensive experience in text analytics, developing statistical machine learning and data mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau.
  • Deep understanding of statistical modeling, multivariate analysis, model testing, problem analysis, model comparison, and validation.
  • Skilled in data parsing, manipulation, and preparation: describing data contents, computing descriptive statistics, regex matching, splitting and combining, remapping, merging, subsetting, re-indexing, melting, and reshaping (see the pandas sketch after this list).
  • Experience with a wide range of R packages and Python libraries (e.g., pandas, NumPy, SciPy, scikit-learn, NLTK).
  • Working knowledge of Hadoop, Hive, and NoSQL databases such as Cassandra and HBase.
  • Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis, with good knowledge of Recommender Systems.
  • Good industry knowledge, analytical and problem-solving skills, and the ability to work well in a team as well as independently.
  • Highly creative, innovative, committed, intellectually curious, and business-savvy, with effective communication and interpersonal skills.
  • Able to adapt quickly to a new work pace and to new learning.
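
For illustration, a minimal pandas sketch of the preparation steps listed above; the file and column names are assumptions for the example, not details from any engagement described below.

```python
import pandas as pd

df = pd.read_csv("raw_data.csv")                           # hypothetical input file
print(df.describe())                                       # descriptive statistics
df["state"] = df["city_state"].str.extract(
    r",\s*([A-Z]{2})$", expand=False)                      # regex parsing
df["tier"] = df["segment"].map({"a": "low", "b": "high"})  # remap categorical values
df = df.merge(pd.read_csv("lookup.csv"), on="id")          # merge/combine sources
active = df[df["amount"] > 0].reset_index(drop=True)       # subset and re-index
long_form = active.melt(id_vars=["id"], var_name="field")  # melt/reshape wide to long
```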

TECHNICAL SKILLS:

  • Programming Languages: Python, SQL, R
  • Scripting Languages: Python
  • ETL Tools: Talend
  • Data Sources: SQL Server, Excel
  • Data Visualization: Tableau, Power BI, SSRS
  • Predictive and Machine Learning: Linear Regression, Logistic Regression, Principal Component Analysis (PCA), K-means, Random Forest, Decision Trees, SVM, K-NN, Deep Learning, Time Series Analysis, and Ensemble Methods
  • Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Hive, Spark
  • Operating Systems: Linux, Windows, Unix

PROFESSIONAL EXPERIENCE:

Confidential, Chicago, IL

Data Scientist/Machine Learning

Responsibilities:

  • Utilized domain knowledge and application portfolio knowledge to play a key role in defining the future state of large business technology programs.
  • Provided architectural leadership in shaping strategic business technology projects, with an emphasis on application architecture.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization; also performed gap analysis.
  • Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS, and implemented a Python-based distributed random forest via Python streaming (a Hadoop Streaming mapper sketch follows this list).
  • Used pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python to develop machine learning algorithms such as linear regression, multivariate regression, Naive Bayes, Random Forests, K-means, and KNN for data analysis.
  • Installed and configured PostgreSQL databases and tuned postgresql.conf for performance.
  • Produced forecasts using exponential smoothing, ARIMA modeling, transfer function models, and other statistical algorithms and analyses (an ARIMA sketch follows this list).
  • Conducted studies and rapid plotting, using advanced data mining and statistical modeling techniques to build solutions that optimize data quality and performance.
  • Demonstrated experience in the design and implementation of statistical models, predictive models, enterprise data models, metadata solutions, and data life cycle management in both RDBMS and Big Data environments.
  • Created ecosystem models (e.g., conceptual, logical, physical, canonical) required to support services within the enterprise data architecture: a conceptual data model defining the major subject areas used, an ecosystem logical model defining standard business meaning for entities and fields, and an ecosystem canonical model defining the standard messages and formats used in data integration services throughout the ecosystem.
  • Analyzed large data sets, applied machine learning techniques, and developed and enhanced predictive and statistical models by leveraging best-in-class modeling techniques.
  • Worked on database design, relational integrity constraints, OLAP, OLTP, cubes, normalization (3NF), and de-normalization of databases.
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python, along with a broad variety of machine learning methods including classification, regression, and dimensionality reduction.
  • Developed Linux shell scripts using NZSQL/NZLOAD utilities to load data from flat files into a Netezza database.
  • Coded new tables, views, and modifications, as well as PL/pgSQL stored procedures, data types, triggers, and constraints.
  • Worked on customer segmentation using an unsupervised learning technique, clustering (a K-means sketch follows this list).
  • Worked with various Teradata 15 tools and utilities such as Teradata Viewpoint, MultiLoad, ARC, Teradata Administrator, BTEQ, and other Teradata utilities.
  • Designed and implemented system architecture for Amazon EC2 based cloud-hosted solution for client.
  • Tested Complex ETL Mappings and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
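
A hedged sketch of the "distributed random forest via Python streaming" pattern: each Hadoop Streaming map task trains a scikit-learn forest on its own input split and emits the pickled model. The CSV layout, label position, and forest size are assumptions, not the original code.

```python
#!/usr/bin/env python
# mapper.py -- hypothetical Hadoop Streaming mapper: trains a random
# forest on this task's input split and emits the pickled model.
import sys, base64, pickle
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rows = [list(map(float, line.split(","))) for line in sys.stdin if line.strip()]
data = np.array(rows)
X, y = data[:, :-1], data[:, -1]        # label assumed to be the last column

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print("forest\t" + base64.b64encode(pickle.dumps(forest)).decode())
```

A single reducer can then unpickle each value and pool the trees into one ensemble, which is sound because a random forest is a bagged model.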
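
A minimal statsmodels sketch of the forecasting bullet, assuming a monthly univariate series; the file name, column, and (1, 1, 1) order are illustrative only.

```python
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.holtwinters import ExponentialSmoothing

series = pd.read_csv("monthly_demand.csv", index_col="month",
                     parse_dates=True)["units"]         # hypothetical series

arima = ARIMA(series, order=(1, 1, 1)).fit()            # AR(1), d=1, MA(1)
print(arima.forecast(steps=12))                         # 12 months ahead

holt = ExponentialSmoothing(series, trend="add").fit()  # exponential smoothing
print(holt.forecast(12))
```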
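
And a hedged sketch of the customer-segmentation bullet using K-means; the RFM-style feature names are assumed for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

customers = pd.read_csv("customers.csv")                        # hypothetical input
X = StandardScaler().fit_transform(
    customers[["recency", "frequency", "monetary"]])            # assumed features

model = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)  # 4 segments
customers["segment"] = model.labels_
print(customers.groupby("segment").mean(numeric_only=True))     # profile segments
```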

Environment: Big Data, Hadoop, MapReduce, Hive, Pig, Python, Scala, NZSQL, Teradata, PostgreSQL, Tableau, EC2, Netezza, Architecture, SAS/Graph, SAS/SQL, SAS/Access, time-series analysis, ARIMA.

Confidential, Norcross, GA

Data Scientist/Machine Learning Engineer

Responsibilities:

  • Built models using statistical techniques like Bayesian HMM and machine learning classification models like XGBoost, SVM, and Random Forest (a classifier-comparison sketch follows this list).
  • Completed a highly immersive data science program involving data manipulation and visualization, web scraping, machine learning, Python programming, SQL, Git, Unix commands, NoSQL, MongoDB, and Hadoop.
  • Set up storage and data analysis tools on Amazon Web Services cloud computing infrastructure.
  • Used pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python to develop various machine learning algorithms.
  • Installed and used the Caffe deep learning framework.
  • Worked with different data formats such as JSON and XML, and applied machine learning algorithms in Python.
  • Worked with Data Architects and IT Architects to understand data movement and storage, using ER Studio 9.7.
  • Participated in all phases of data mining: data collection, data cleaning, model development, and validation; also performed gap analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and Smart View.
  • Implemented Agile Methodology for building an internal application.
  • Focused on integration overlap and Informatica's newer commitment to MDM following its acquisition of Identity Systems.
  • Good knowledge of Hadoop architecture and its components, such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Secondary Name Node, and MapReduce concepts.
  • As an architect, delivered various complex OLAP databases/cubes, scorecards, dashboards, and reports.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, pandas).
  • Implemented classification using supervised algorithms such as Logistic Regression, Decision Trees, KNN, and Naive Bayes.
  • Used Teradata 15 utilities such as FastExport and MultiLoad to handle data migration/ETL tasks from OLTP source systems to OLAP target systems.
  • Experience with Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, and Flume, including their installation and configuration.
  • Updated Python scripts to match data against our database stored in AWS CloudSearch, so that each document could be assigned a response label for further classification.
  • Transformed data from various sources, organized it, and extracted features from raw and stored data.
  • Validated the machine learning classifiers using ROC curves and lift charts (a ROC sketch follows this list).
  • Extracted data from HDFS and prepared it for exploratory analysis using data munging.
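
A hedged, self-contained sketch of the classification work named above (XGBoost, SVM, Random Forest), evaluated on synthetic stand-in data rather than project data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

for clf in (XGBClassifier(n_estimators=200),
            SVC(),
            RandomForestClassifier(n_estimators=200, random_state=0)):
    acc = clf.fit(X_tr, y_tr).score(X_te, y_te)   # held-out accuracy
    print(type(clf).__name__, round(acc, 3))
```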
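
And a minimal sketch of the ROC-curve validation bullet with scikit-learn; the logistic model and synthetic data are stand-ins for the project's classifiers.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

probs = clf.predict_proba(X_te)[:, 1]   # positive-class probabilities
fpr, tpr, _ = roc_curve(y_te, probs)    # points tracing the ROC curve
print("AUC:", round(roc_auc_score(y_te, probs), 3))
```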

Environment: ER Studio 9.7, Tableau 9.03, AWS, Teradata 15, MDM, Git, Unix, Python 3.5.2, MLlib, SAS, regression, logistic regression, Hadoop, NoSQL, Teradata, OLTP, random forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML, MapReduce.

Confidential, Des Moines, IA

Data Analyst

Responsibilities:

  • Identified and documented data sources and the transformation rules required to populate and maintain the data processing and data warehousing systems.
  • Prepared use cases and identified important alternative and exception flows that developers are prone to miss during requirements gathering.
  • Ensured financial master data was maintained accurately, promptly, and consistently in the data warehouse by reviewing and monitoring key financial data fields and updating them as needed. Prepared and analyzed data to ensure it was structured for accurate financial reporting.
  • Presented the data model to business and technical teams for review, feedback, and approval.
  • Analyzed test results, including user interface data presentation, output documents, and database field values, for accuracy and consistency.
  • Worked with the ETL team to document the transformation rules for data migration from source systems to data processing systems and then to data marts for reporting purposes.
  • Worked with business stakeholders, developers, and production teams across functional units to identify business needs and discuss solution options. Involved in requirements gathering meetings, brainstorming sessions, and detailed design meetings.

Environment: SQL Server 2012/2014, MS Excel, HP ALM 11.52.
