
Data Scientist Resume

Boston, MA

SUMMARY

  • 8+ years of experience in IT as a Data Scientist, with strong technical expertise, business experience, and communication skills to drive high-impact business outcomes through data-driven innovations and decisions.
  • Extensive experience in Text Analytics, developing Statistical Machine Learning and Data Mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau.
  • Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Experience in designing visualizations using Tableau and in publishing and presenting dashboards and Storylines on web and desktop platforms.
  • Designed the Physical Data Architecture of new system engines.
  • Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis, with good knowledge of Recommender Systems.
  • Proficient in Statistical Modeling and Machine Learning techniques (Linear, Logistic, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian, XGBoost) in Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis testing, Factor analysis/PCA, and Ensembles.
  • Worked with and extracted data from various database sources such as Oracle, SQL Server, DB2, and Teradata.
  • Well experienced in Normalization & De-Normalization techniques for optimum performance in relational and dimensional database environments.
  • Regularly used JIRA and other internal issue trackers for project development.
  • Skilled in System Analysis, E-R/Dimensional Data Modeling, Database Design, and implementing RDBMS-specific features.
  • Expertise in all aspects of the Software Development Lifecycle (SDLC), from requirement analysis through design, development, coding, testing, implementation, and maintenance.
  • Hands-on working experience applying machine learning and statistics to draw meaningful insights from data; strong at communication and storytelling with data.
  • Utilized analytical applications/libraries such as Polly, D3.js, and Tableau to identify trends and relationships between different pieces of data, draw appropriate conclusions, and translate analytical findings into marketing strategies that drive value.
  • Experienced in working with enterprise search platforms like Apache Solr and distributed real-time processing systems like Storm.
  • Hands-on experience with Spark MLlib utilities such as classification, regression, clustering, collaborative filtering, and dimensionality reduction.
  • Strong knowledge of statistical methods (regression, time series, hypothesis testing, randomized experiment), machine learning, algorithms, data structures and data infrastructure.
  • Solid team player, team builder, and an excellent communicator.
  • Extensive hands-on experience and high proficiency with structured, semi-structured, and unstructured data, using a broad range of data science programming languages and big data tools including R, Python, Spark, SQL, scikit-learn, and Hadoop MapReduce.
  • Technical proficiency in designing and data modeling of online applications; Solution Lead for architecting Data Warehouse/Business Intelligence applications.
  • Expertise in the implementation of core concepts of Java and JEE technologies: JSP, Servlets, JSTL, EJB, JMS, Struts, Spring, Hibernate, JDBC, XML, Web Services, and JNDI.
  • Extensive experience working in Test-Driven Development and Agile-Scrum environments.
  • Experience working on Windows, Linux, and UNIX platforms, including programming and debugging skills in UNIX Shell Scripting.
  • Flexible with Unix/Linux and Windows environments, working with operating systems like CentOS 5/6, Ubuntu 13/14, and Cosmos.
  • Defined job flows in the Hadoop environment using tools like Oozie for data scrubbing and processing.
  • Experience in Data migration from existing data stores to Hadoop.
  • Developed MapReduce programs to perform Data Transformation and analysis.
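As an illustration of the supervised-learning skills listed above, here is a minimal scikit-learn sketch on synthetic data; the dataset, model settings, and names below are illustrative only, not code from any specific engagement.

```python
# Illustrative sketch: fit and compare two of the classifiers named above
# (Logistic Regression, Random Forest) on a synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Generate a small synthetic binary-classification problem.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Train each model and report held-out accuracy.
for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(random_state=42)):
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    print(type(model).__name__, round(accuracy_score(y_test, preds), 3))
```

The same fit/predict/score pattern extends to the other listed estimators (SVM, KNN, Naive Bayes, XGBoost) with only the model class swapped.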

TECHNICAL SKILLS

Languages: C, C++, Python, R, Java, SQL, PL/SQL, XML, HTML, DHTML, HTTP, MATLAB, DAX.

Databases: SQL Server, MS Access, Oracle 11g/10g/9i, Teradata, Hadoop (big data)

DWH / BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Business Intelligence Development Studio (BIDS), Visual Studio, Crystal Reports, Informatica 6.1.

Database Design Tools and Data Modeling: ERWIN 4.5/4.0, MS Visio, Star Schema/Snowflake Schema modeling, Fact & Dimension tables, physical & logical data modeling, Normalization and De-normalization techniques, Kimball & Inmon Methodologies.

Tools and Utilities: SQL Server Management Studio, SQL Server Enterprise Manager, SQL Server Profiler, Import & Export Wizard, Visual Studio .NET, Microsoft Management Console, Visual SourceSafe 6.0, DTS, Crystal Reports, Power Pivot, ProClarity, Microsoft Office, Excel Power Pivot, Excel Data Explorer, Tableau, JIRA

PROFESSIONAL EXPERIENCE

Confidential, Boston, MA

Data Scientist

Responsibilities:

  • Set up storage and data analysis tools in the Amazon Web Services (AWS) cloud computing infrastructure.
  • Used pandas, NumPy, Seaborn, SciPy, matplotlib, scikit-learn, and NLTK in Python to develop various machine learning algorithms.
  • Installed and used the Caffe deep learning framework.
  • Worked on different data formats such as JSON and XML, and applied machine learning algorithms in Python.
  • Worked with Data Architects and IT Architects to understand the movement and storage of data, using ER Studio 9.7.
  • Participated in all phases of data mining: data collection, data cleaning, developing models, validation, and visualization; also performed Gap analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and Smart View.
  • Implemented Agile Methodology for building an internal application.
  • Focused on integration overlap and Informatica's newer commitment to MDM following the acquisition of Identity Systems.
  • Good knowledge of Hadoop architecture and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
  • As Architect, delivered various complex OLAP databases/cubes, scorecards, dashboards, and reports.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, pandas).
  • Implemented classification using supervised algorithms like Logistic Regression, Decision Trees, KNN, and Naive Bayes.
  • Updated Python scripts to match training data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
  • Performed data transformation from various sources, data organization, and feature extraction from raw and stored data.
  • Handled importing data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
  • Interaction with Business Analyst, SMEs, and other Data Architects to understand Business needs and functionality for various project solutions
  • Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms for the clients.
  • Identified and executed process improvements; hands-on in various technologies such as Oracle, Informatica, and Business Objects.
  • Designed both 3NF data models for ODS, OLTP systems and dimensional data models using Star and Snowflake Schemas.
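The data manipulation and aggregation work described above can be sketched in pandas; the records, column names, and regions below are hypothetical, used only to illustrate the clean-then-aggregate pattern.

```python
# Illustrative sketch: clean missing values, then aggregate by group.
import pandas as pd

# Hypothetical raw records standing in for data pulled from one of
# several sources (Nexus, Toad, etc. in the work described above).
raw = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "West"],
    "revenue": [120.0, 80.0, 200.0, None, 150.0],
})

# Clean: fill the missing value with the column mean.
raw["revenue"] = raw["revenue"].fillna(raw["revenue"].mean())

# Aggregate: total and average revenue per region.
summary = raw.groupby("region")["revenue"].agg(["sum", "mean"])
print(summary)
```

The same pattern (load, impute, group, aggregate) applies regardless of whether the source is a flat file, a Hive table, or a relational database.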

Environment: R 9.0, Informatica 9.0, ODS, OLTP, Big Data, Oracle 10g, Hive, OLAP, DB2, Metadata, Python, MS Excel, Mainframes, MS Visio, Rational Rose.

Confidential, Salt Lake City, UT

Data Scientist

Responsibilities:

  • Analyzed the business requirements of the project by studying the Business Requirements Specification document.
  • Extensively worked with the Erwin Data Modeler tool to design the data models.
  • Designed a mapping to process the incremental changes that exist in the source table. Whenever source data elements were missing in source tables, these were modified/added in consistency with the third-normal-form-based OLTP source database.
  • Worked in Statistical Modeling and Machine Learning techniques (Linear, Logistic, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian, XGBoost) in Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis testing, Factor analysis/PCA, and Ensembles.
  • Designed tables and implemented the naming conventions for Logical and Physical Data Models in Erwin 7.0.
  • Participated in the conversion of the ITS (Immigration Tracking System) Visual Basic client-server application into a C#, ASP.NET 3-tier intranet application.
  • Performed Exploratory Data Analysis and Data Visualizations using R and Tableau.
  • Performed proper EDA, with uni-variate and bi-variate analysis, to understand intrinsic and combined effects.
  • Worked with Data Governance, Data Quality, Data Lineage, and Data Architects to design various models and processes.
  • Independently coded new programs and designed tables to load and test the programs effectively for the given POCs using Big Data/Hadoop.
  • Designed data models and data flow diagrams using Erwin and MS Visio.
  • Utilized the ADO.NET object model to implement middle-tier components that interacted with an MS SQL Server 2000 database.
  • Participated in the AMS (Alert Management System) Java and Sybase project. Designed the Sybase database utilizing Erwin. Customized error messages utilizing sp_addmessage and sp_bindmsg. Created indexes and optimized queries. Wrote stored procedures and triggers utilizing T-SQL.
  • Explained the data model to the other members of the development team. Wrote an XML parsing module that populates alerts from the XML file into the database tables utilizing Java, JDBC, BEA WebLogic IDE, and the Document Object Model.
  • As an Architect, implemented an MDM hub to provide clean, consistent data for an SOA implementation.
  • Developed, implemented, and maintained the Conceptual, Logical, and Physical Data Models using Erwin for forward/reverse-engineered databases.
  • Explored and extracted data from source XML in HDFS, preparing data for exploratory analysis using data mining.
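The uni-variate and bi-variate EDA mentioned above can be sketched in pandas (the actual work also used R and Tableau); the small dataset below is invented purely for illustration.

```python
# Illustrative sketch: basic uni-variate and bi-variate EDA in pandas.
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "age":    [23, 35, 31, 52, 46, 28],
    "income": [30000, 58000, 52000, 90000, 81000, 41000],
})

# Uni-variate: per-column summary statistics (count, mean, std, quartiles).
print(df.describe())

# Bi-variate: Pearson correlation between the two variables.
print(df["age"].corr(df["income"]))
```

In practice this step precedes modeling: the summary statistics flag outliers and missing data, and the correlation hints at which features may carry predictive signal.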

Environment: SQL Server 2008 R2/2005 Enterprise, SSRS, SSIS, Crystal Reports, Hadoop, Windows Enterprise Server 2000, DTS, Big Data, Python, machine learning, SQL Profiler, and Query Analyzer.
