Data Scientist/Engineer Resume

Irving, TX

SUMMARY:

  • 4+ years of experience in Data Analysis, Machine Learning, Deep Learning, Artificial Intelligence, and Data Mining with large data sets of Structured and Unstructured data, including Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization.
  • Leverage a wide range of data analysis, machine learning/ deep learning and statistical modeling algorithms and methods to solve business problems.
  • Experience with NLP techniques such as LSTM, GRU, and RNN, and other ML techniques such as SVM and Bayesian Networks.
  • Experience working in Agile Scrum Software Development.
  • Knowledge in Cloud services such as Microsoft Azure and Amazon AWS.
  • Knowledge about Big Data toolkits like Mahout, Spark ML, H2O.
  • Professional working experience in Machine Learning/Deep Learning algorithms such as Linear Regression, Logistic Regression, Random Forests, Decision Trees, K-Means Clustering, Association Rules, Artificial Neural Networks, Convolutional Neural Networks, and Recurrent Neural Networks.
  • Expertise with Statistical Analysis, Data mining and Machine Learning Skills using R, Python and SQL.
  • Experience in Machine Learning for NLP text classification and churn prediction using Python (a minimal sketch follows this list).
  • Proficient in managing the entire data science project life cycle and actively involved in all phases of the project.
  • Hands-on experience with Spark MLlib/TensorFlow utilities, including classification, regression, clustering, collaborative filtering, and dimensionality reduction.
  • Working experience implementing Machine Learning algorithms using MLlib and Mahout across the Hadoop ecosystem and the Apache Spark framework (HDFS, MapReduce, HiveQL, Spark SQL, and PySpark).
  • Hands-on experience with Hadoop, Deep Learning text analytics, and IBM Data Science Workbench tools.
  • Hands-on experience in Data Governance, Data Mining, Data Analysis, Data Validation, Predictive Modeling, Data Lineage, and Data Visualization in all phases of the Data Science life cycle.
  • Extensively performed data analysis using R Studio, SQL, Tableau, and other BI tools.
  • Experience with visualization tools such as Tableau 9.x/10.x for creating dashboards.
  • Used version control tools such as Git 2.x and VM.
  • Passionate about gleaning insightful information from massive data assets and developing a culture of sound, data-driven decision making.
  • Skilled in Advanced Regression Modelling, Correlation, Multivariate Analysis, Model Building, Business Intelligence tools and application of Statistical Concepts.
  • Experienced in Visual Basic for Applications (VBA) and VB programming for application development.
  • Worked with large, complex datasets that include structured, semi-structured, and unstructured data to discover meaningful business insights.
  • Knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB.
  • Skilled in statistical analysis using R, MATLAB, and Excel.
  • Strong Data Analysis skills using business intelligence, SQL and MS Office Tools.
  • Strong SQL Server programming skills, with experience in working with functions, packages and triggers.
  • Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis, with good knowledge of Recommender Systems.
  • Skilled in using Hadoop (Pig and Hive) for basic analysis and data extraction in the infrastructure to provide data summarization.
  • Skilled in using visualization tools like Tableau, ggplot2 and d3.js for creating dashboards.
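
As a rough illustration of the NLP text-classification work listed above, a minimal sketch using scikit-learn; the toy corpus and churn labels are hypothetical, not drawn from any actual project.

```python
# Minimal NLP text-classification sketch (TF-IDF + Naive Bayes).
# The documents and churn labels below are hypothetical examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

docs = ["great service, will renew", "terrible support, cancelling soon",
        "happy with the product", "switching providers next month"]
labels = [0, 1, 0, 1]  # 0 = retained, 1 = churned (hypothetical)

X_train, X_test, y_train, y_test = train_test_split(
    docs, labels, test_size=0.5, random_state=42, stratify=labels)

clf = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),  # text -> TF-IDF features
    ("nb", MultinomialNB()),                           # Naive Bayes classifier
])
clf.fit(X_train, y_train)
print(clf.predict(X_test))
```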

TECHNICAL SKILLS:

Languages: SQL, PL/SQL, Unix Shell Scripting (Korn/C/Bourne), Pro*C, Java, J2EE, C, C++

Documentation: MS Office Suite (PowerPoint, Word, Excel, Access, Project and Outlook), SharePoint, Rational Requisite Pro

Front end tools: Microsoft FrontPage, Microsoft Office, ColdFusion, HTML, XML

Version Tools: Visual SourceSafe, SVN, JIRA, Team Foundation Service (TFS), SharePoint

Reporting Tools: Business Objects Developer Suite 5.1, Business Objects XI/XIR2/XIR3.1, COGNOS Suite, COGNOS Report Net, Crystal Reports, Oracle Reports 10g/9i/6i

Databases: Oracle 11g/10g/9i/8i, SQL Server 2008/2005/2000, Sybase, DB2, MS Access

Operating Systems: UNIX, LINUX, Windows NT/XP/Vista/98/95, Red Hat (RH5 to RHEL 6), MS DOS, IBM AIX, Sun Solaris

Other Tools: Informatica 9.1.6/8.6.1/8.5.1 (PowerCenter), Erwin 8.1/7.3, Visio 2007, Oracle Forms and Reports, APEX, Golden Gate, Autosys, Subversion, Revision Control System, BMC Control-M

PROFESSIONAL EXPERIENCE:

Confidential, Irving, TX

Data Scientist/Engineer

Responsibilities:

  • Analyzed data using SQL, R, Python, Apache Spark and presented analytical reports to management and technical teams.
  • Worked with diverse datasets including both structured and unstructured data, and participated in all phases of data mining: data collection, data cleaning, variable selection, feature engineering, model development, validation, and visualization.
  • Led discussions with users to gather business process and data requirements to develop a variety of conceptual, logical, and physical data models.
  • Expertise in Business intelligence and Data Visualization tools like Tableau.
  • Handled importing data from various data sources, performed transformations using Hive, MapReduce and loaded data into HDFS.
  • Validated unsupervised data using NLP techniques such as LSTM and GRU, along with other variant analyses.
  • Designed and implemented a recommendation system that leveraged statistical analytics data and machine learning models, utilizing collaborative filtering techniques to recommend policies to different customers (a collaborative-filtering sketch follows the Environment line below).
  • Created Data Quality Scripts using SQL and Hive (HQL) to validate successful data load and quality of the data.
  • Participated in feature engineering such as feature generation, PCA, feature normalization, and label encoding with Scikit-learn preprocessing; performed data imputation using various methods in the Scikit-learn package in Python (see the preprocessing sketch after this list).
  • Utilized Informatica toolset (Informatica Data Explorer and Data Quality) to inspect legacy data for data profiling.
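
A minimal sketch of the Scikit-learn preprocessing steps described above (imputation, normalization, PCA, and label encoding); the data and values are hypothetical stand-ins.

```python
# Illustrative scikit-learn preprocessing sketch: imputation, normalization,
# PCA, and label encoding. The values below are hypothetical stand-ins.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, StandardScaler

X = np.array([[25.0, 50000.0],
              [np.nan, 64000.0],
              [47.0, np.nan],
              [35.0, 58000.0]])
y = ["bronze", "gold", "silver", "gold"]  # hypothetical categorical labels

X_imputed = SimpleImputer(strategy="mean").fit_transform(X)  # fill missing values
X_scaled = StandardScaler().fit_transform(X_imputed)         # feature normalization
X_pca = PCA(n_components=2).fit_transform(X_scaled)          # dimensionality reduction
y_encoded = LabelEncoder().fit_transform(y)                  # label encoding

print(X_pca.shape, y_encoded)
```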

Environment: SQL Server, Hive, Hadoop Cluster, ETL, Tableau, Teradata, Machine Learning (Logistic regression/ Random Forests/ KNN/ K-Means Clustering / Hierarchical Clustering/ Ensemble methods/ Collaborative filtering), GitHub, MS Office suite, Agile Scrum, JIRA.
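
As a rough sketch of the collaborative-filtering approach behind the recommendation system above: a minimal item-based example using cosine similarity, with a hypothetical customer-by-policy ratings matrix.

```python
# Minimal item-based collaborative-filtering sketch using cosine similarity.
# The customer-by-policy ratings matrix is a hypothetical example.
import numpy as np

# Rows = customers, columns = policies; 0 means no interaction yet.
ratings = np.array([[5, 3, 0, 1],
                    [4, 0, 0, 1],
                    [1, 1, 0, 5],
                    [0, 0, 5, 4]], dtype=float)

# Cosine similarity between policy (column) vectors.
norms = np.linalg.norm(ratings, axis=0)
sim = (ratings.T @ ratings) / (np.outer(norms, norms) + 1e-9)

# Score unseen policies for customer 0 as a similarity-weighted sum.
user = ratings[0]
scores = sim @ user
scores[user > 0] = -np.inf  # do not re-recommend policies already held
print("recommend policy index:", int(np.argmax(scores)))
```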

Confidential, San Jose, CA

Intern

Responsibilities:

  • Analyzed end-user requirements and communicated and modeled them to the development team.
  • Used a Convolutional Neural Network to predict images from the dataset, with inputs processed as 32-bit floating point values (a minimal sketch follows this list).
  • Retrieved data from a Hadoop cluster by developing a pipeline using Hive (HQL) and SQL to pull data from an Oracle database, and used ETL for data transformation.
  • Performed data wrangling to clean, transform, and reshape the data utilizing the pandas library; analyzed data using SQL, R, Java, Scala, Python, and Apache Spark and presented analytical reports to management and technical teams.
  • Worked with datasets of varying complexity, including both structured and unstructured data, and participated in all phases of data mining: data collection, data cleaning, variable selection, feature engineering, model development, validation, and visualization.
  • Conducted exploratory data analysis using Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn, SciPy, NLTK in Python for developing various machine learning algorithms.
  • Conducted unsupervised learning analysis using deep learning, especially RNNs, for NLP work (an LSTM sketch follows the Environment line below).
  • Implemented data quality validation techniques to validate data and identified many anomalies; extensively worked with statistical analysis tools and wrote code in advanced Excel, R, and Python.
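
A minimal sketch of a convolutional image classifier in the spirit of the CNN work above, using TensorFlow/Keras; the random 32-bit float tensors stand in for the real image dataset.

```python
# Minimal CNN image-classification sketch with TensorFlow/Keras.
# Random float32 tensors stand in for the real image dataset.
import numpy as np
import tensorflow as tf

x = np.random.rand(32, 28, 28, 1).astype("float32")  # 32 fake 28x28 grayscale images
y = np.random.randint(0, 10, size=(32,))             # fake class labels

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x, y, epochs=1, verbose=0)
print(int(model.predict(x[:1]).argmax()))
```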

Environment: Python 3.6.4, R Studio, MLlib, Regression, NoSQL, SQL Server, Hive, Hadoop Cluster, ETL, Spyder 3.6, Agile, Tableau, Java, NumPy, Pandas, Matplotlib, Power BI, Scikit-Learn, Seaborn, e1071, ggplot2, Shiny, TensorFlow, AWS, Azure, HTML, XML, Informatica PowerCenter, Teradata.
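
A minimal sketch of RNN-based NLP sequence classification, assuming a Keras LSTM; the token sequences and labels are random stand-ins for real text data.

```python
# Minimal RNN (LSTM) sketch for NLP-style sequence classification with Keras.
# The token sequences and labels are random stand-ins for real text data.
import numpy as np
import tensorflow as tf

x = np.random.randint(1, 1000, size=(64, 20))  # 64 fake sequences of 20 token ids
y = np.random.randint(0, 2, size=(64,))        # fake binary labels

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=1000, output_dim=32),  # token ids -> vectors
    tf.keras.layers.LSTM(32),                                  # recurrent encoder
    tf.keras.layers.Dense(1, activation="sigmoid"),            # binary output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x, y, epochs=1, verbose=0)
```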

Confidential

Associate Software Engineer

Responsibilities:

  • Coached the stakeholders regarding the agile practices.
  • Mentored the team and helped in forging a strong relationship.
  • Conducted requirement analysis by carrying out GAP Analysis.
  • Evangelized the benefits of Scrum to ensure its smooth adoption, and tracked velocity and sprint progress.
  • Managed stakeholder expectations, status, risks, and issues, and created the product release plan.
  • Handled issues/risks proactively and worked with senior management teams to deliver the project.
  • Created the Project Initiation Document (PID) and established a Requirements Traceability Matrix (RTM) using Rational Requisite Pro to trace the completeness of requirements.
  • Created UML diagrams to convey the requirements to the team members.
  • Effectively conveyed weekly project status and burn-down charts to the stakeholders.
  • Conducted Sprint planning and retrospective activities through Adobe Connect.
  • Conducted coaching sessions regarding Agile development methods.
  • Participated in and hosted meetings, technical discussions, reviews, and release planning.
  • Resolved impediments by coordinating and collaborating with the cross-functional teams.
  • Created functional and non-functional requirement documents and technical specification documents for the Hadoop environment.
  • Understood the Hadoop architecture end to end and drove the related meetings.
  • Conducted safety checks to make sure the team felt safe during retrospectives.
  • Aided in data profiling by examining the source data (a minimal sketch follows this list).
  • Performed data mapping from the source data to the destination data.
  • Developed Use Case Diagrams to identify the users involved; created Activity Diagrams and Sequence Diagrams to depict the process flows.
  • Worked with data analysts during ETL operations to identify source files/databases from OLTP systems, and implemented transformation logic to achieve data uniformity.
  • Developed SQL stored procedures for complex queries.
  • Provided knowledge transfer to the team and coached team members one-on-one when they were stuck.
  • Displayed information using information radiators.
  • Guided the stakeholders in performing UAT (User Acceptance Testing).
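
As a rough illustration of the source-data profiling mentioned above, a minimal pandas sketch; the table and columns are hypothetical.

```python
# Minimal data-profiling sketch with pandas; the source table is a
# hypothetical stand-in for the real legacy data.
import pandas as pd

src = pd.DataFrame({
    "customer_id": [1, 2, 2, None],
    "state": ["TX", "CA", "CA", None],
})

profile = pd.DataFrame({
    "dtype": src.dtypes.astype(str),
    "nulls": src.isna().sum(),   # missing values per column
    "distinct": src.nunique(),   # cardinality per column
})
print(profile)
```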

Environment: Scrum, VersionOne, Oracle, HTML5, Tableau, MS Excel, IdeaBoardz, Server Services, Informatica PowerCenter v9.1, SQL, Microsoft Test Manager, Adobe Connect, MS Office Suite, LDAP, Kerberos, Knox, Ranger, Atlas.
