Data Scientist Resume
Atlanta, GA
SUMMARY:
- 8+ years of experience in IT, including around 1 year as a Data Scientist, with strong technical expertise, business experience, and communication skills to drive high-impact business outcomes through data-driven innovations and decisions. Hands-on working experience in machine learning and statistics to draw meaningful insights from data, with strong communication and data-storytelling skills. Expertise in all aspects of the Software Development Life Cycle (SDLC), from requirement analysis and design through development/coding, testing, implementation, and maintenance.
- Extensive experience in text analytics, developing statistical machine learning and data mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau. Experienced in designing compelling visualizations with Tableau and publishing and presenting dashboards and storylines on web and desktop platforms. Use analytical applications/libraries such as Plotly, D3.js, and Tableau to identify trends and relationships between different pieces of data, draw appropriate conclusions, and translate analytical findings into marketing strategies that drive value.
- Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data. Proficient in statistical modeling and machine learning techniques (linear and logistic regression, decision trees, random forest, SVM, k-nearest neighbors, Bayesian methods, XGBoost) for forecasting/predictive analytics, segmentation methodologies, regression-based models, hypothesis testing, factor analysis/PCA, and ensembles.
- Designed the physical data architecture of new system engines. Hands-on experience implementing LDA and Naive Bayes; skilled in random forests, decision trees, linear and logistic regression, SVM, clustering, neural networks, and Principal Component Analysis, with good knowledge of recommender systems. Technical proficiency in designing and data modeling online applications; solution lead for architecting data warehouse/business intelligence applications.
- Well experienced in normalization and de-normalization techniques for optimum performance in relational and dimensional database environments. Regularly use JIRA and other internal issue trackers for project development. Experienced with the Apache Solr enterprise search platform and the Storm distributed real-time processing system; hands-on experience with Spark MLlib utilities such as classification, regression, clustering, collaborative filtering, and dimensionality reduction.
- Defined job flows in the Hadoop environment using tools like Oozie for data scrubbing and processing. Experienced in data migration from existing data stores to Hadoop; developed MapReduce programs to perform data transformation and analysis. Extensive hands-on experience and high proficiency with structured, semi-structured, and unstructured data, using a broad range of data science programming languages and big data tools including R, Python, Spark, SQL, scikit-learn, and Hadoop MapReduce.
- Expertise in implementing core concepts of Java and JEE technologies: JSP, Servlets, JSTL, EJB, JMS, Struts, Spring, Hibernate, JDBC, XML, Web Services, and JNDI.
- Extensive experience working in Test-Driven Development and Agile-Scrum development environments.
- Flexible with Unix/Linux and Windows environments; worked with operating systems including CentOS 5/6, Ubuntu 13/14, and Cosmos.
- Skilled in system analysis, E-R/dimensional data modeling, database design, and implementing RDBMS-specific features. Worked with and extracted data from various database sources such as Oracle, SQL Server, DB2, and Teradata.
TECHNICAL SKILLS:
Languages: C, C++, Python, T-SQL, PL/SQL, SQL, XML, HTML, DHTML, HTTP, Matlab, DAX.
Databases: SQL Server, MS-Access, Oracle 11g/10g/9i and Teradata, Big data, Hadoop, Cassandra.
DWH / BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Business Intelligence Development Studio (BIDS), Visual Studio, Crystal Reports, Informatica 6.1.
Database Design Tools & Data Modeling: ERWIN 4.5/4.0, MS Visio, Star Schema/Snowflake Schema modeling, Fact & Dimensions tables, physical & logical data modeling, Normalization and De-normalization techniques, Kimball & Inmon Methodologies.
Tools and Utilities: SQL Server Management Studio, SQL Server Enterprise Manager, SQL Server Profiler, Import & Export Wizard, Visual Studio .NET, Microsoft Management Console, Visual SourceSafe 6.0, DTS, Crystal Reports, Power Pivot, ProClarity, Microsoft Office, Excel Power Pivot, Excel Data Explorer, Tableau, JIRA.
PROFESSIONAL EXPERIENCE:
Confidential, Atlanta, GA
Data Scientist
Responsibilities:
- Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms for clients; set up storage and data analysis tools in Amazon Web Services (AWS) cloud computing infrastructure.
- Worked with different data formats such as JSON and XML and implemented machine learning algorithms in Python; installed and used the Caffe deep learning framework. Implemented classification using supervised algorithms such as logistic regression, decision trees, KNN, and Naive Bayes. Designed 3NF data models for ODS/OLTP systems and dimensional data models using Star and Snowflake schemas.
- Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis. Performed data manipulation and aggregation from different sources using Nexus, Toad, BusinessObjects, Power BI, and Smart View. Worked with Data Architects and IT Architects (using ER/Studio 9.7) to understand the movement and storage of data, covering data transformation from various resources, data organization, and feature extraction from raw and stored data.
- Used pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python for developing various machine learning algorithms. Focused on integration overlap and Informatica's newer commitment to MDM following the acquisition of Identity Systems. Good knowledge of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, SecondaryNameNode, and MapReduce concepts.
- Implemented Agile methodology for building an internal application; as architect, delivered various complex OLAP databases/cubes, scorecards, dashboards, and reports.
- Programmed a utility in Python using multiple packages (SciPy, NumPy, pandas); updated Python scripts to match training data with our database stored in AWS CloudSearch so that each document could be assigned a response label for further classification.
- Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions; identified and executed process improvements. Hands-on with technologies such as Oracle, Informatica, and BusinessObjects.
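The supervised classification work described above can be sketched as follows. This is a minimal illustrative example on synthetic data; the dataset, model settings, and names are invented, not from the actual engagement.

```python
# Compare the supervised classifiers mentioned above on a synthetic
# dataset (all data here is invented; the real work used client data).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=42),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "naive_bayes": GaussianNB(),
}

# Fit each model and collect held-out accuracy for comparison.
scores = {name: m.fit(X_train, y_train).score(X_test, y_test)
          for name, m in models.items()}
```

In practice the comparison would be driven by cross-validation and business-relevant metrics rather than a single accuracy score.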
Environment: R 9.0, Informatica 9.0, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose.
Confidential, Melbourne, FL
Data Scientist
Responsibilities:
- Analyzed the business requirements of the project by studying the BusinessRequirementSpecification document.
- Worked extensively with the Erwin Data Modeler data modeling tool to design the data models.
- Designed a mapping to process incremental changes in the source table; whenever source data elements were missing in source tables, they were modified/added in consistency with the third-normal-form-based OLTP source database.
- Designed tables and implemented naming conventions for logical and physical data models in Erwin 7.0.
- Participated in the conversion of ITS (Immigration Tracking System), a Visual Basic client-server application, into a C#/ASP.NET 3-tier intranet application.
- Performed exploratory data analysis and data visualizations using R and Tableau.
- Performed thorough EDA, including univariate and bivariate analysis, to understand intrinsic and combined effects.
- Worked with Data Governance, Data Quality, data lineage, and Data Architecture teams to design various models and processes.
- Designed data models and data flow diagrams using Erwin and MS Visio.
- Participated in the AMS (Alert Management System) Java and Sybase project: designed the Sybase database using Erwin, customized error messages using sp_addmessage and sp_bindmsg, created indexes and optimized queries, and wrote stored procedures and triggers in T-SQL.
- Explained the data model to the other members of the development team; wrote an XML parsing module that populates alerts from the XML file into the database tables using Java, JDBC, the BEA WebLogic IDE, and the Document Object Model.
- As an architect, implemented an MDM hub to provide clean, consistent data for an SOA implementation.
- Developed, implemented, and maintained conceptual, logical, and physical data models using Erwin for forward-/reverse-engineered databases.
- Explored and extracted data from source XML in HDFS, preparing the data for exploratory analysis via data munging.
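A minimal sketch of the univariate/bivariate EDA described above, here using pandas on an invented DataFrame (the column names and values are hypothetical, not project data):

```python
# Univariate and bivariate EDA on a small illustrative DataFrame.
import pandas as pd

df = pd.DataFrame({
    "age": [25, 32, 47, 51, 62, 38],
    "income": [40_000, 52_000, 71_000, 88_000, 95_000, 60_000],
})

# Univariate: per-column distribution summaries (count, mean, quartiles).
univariate = df.describe()

# Bivariate: pairwise correlation to gauge combined effects.
bivariate = df.corr()
```

The same pattern extends to histograms and scatter plots per variable pair once the summary statistics flag anything interesting.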
Environment: SQL Server 2008 R2/2005 Enterprise, SSRS, SSIS, Crystal Reports, Hadoop, Windows Enterprise Server 2000, DTS, SQL Profiler, and Query Analyzer.
Confidential, Chicago, Illinois.
Data Scientist
Responsibilities:
- Coded R functions to interface with the Caffe deep learning framework.
- Worked in an Amazon Web Services cloud computing environment.
- Worked with several R packages, including knitr, dplyr, SparkR, CausalInfer, and space-time.
- Implemented end-to-end systems for data analytics and data automation, integrated with custom visualization tools, using R, Mahout, Hadoop, and MongoDB.
- Gathered all required data from multiple data sources and created the datasets used in the analysis.
- Performed exploratory data analysis and data visualizations using R and Tableau.
- Performed thorough EDA, including univariate and bivariate analysis, to understand intrinsic and combined effects.
- Worked with Data Governance, Data Quality, data lineage, and Data Architecture teams to design various models and processes.
- Designed data models and data flow diagrams using Erwin and MS Visio.
- As an architect, implemented an MDM hub to provide clean, consistent data for an SOA implementation.
- Developed, implemented, and maintained conceptual, logical, and physical data models using Erwin for forward-/reverse-engineered databases.
- Established data architecture strategy, best practices, standards, and roadmaps.
- Performed data cleaning and imputation of missing values using R.
- Worked with the Hadoop ecosystem, covering HDFS, HBase, YARN, and MapReduce.
- Took up ad-hoc requests from different departments and locations.
- Used Hive to store data and performed data cleaning steps for huge datasets.
- Created dashboards and visualizations on a regular basis using ggplot2 and Tableau.
- Created customized business reports and shared insights with management.
- Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
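The missing-value imputation described above was done in R; the same idea can be sketched in Python with pandas (the column name and values are invented for illustration):

```python
# Missing-value imputation, sketched with pandas; the original
# workflow used R. Values and column names are illustrative only.
import pandas as pd

df = pd.DataFrame({"sales": [100.0, None, 140.0, None, 120.0]})

# Impute missing values with the column mean -- one common strategy;
# median or model-based imputation may suit skewed data better.
df["sales"] = df["sales"].fillna(df["sales"].mean())
```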
Environment: Erwin r, Informatica, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose, Requisite Pro, Hadoop, PL/SQL, etc.
Confidential, New York.
Data Scientist
Responsibilities:
- Performed statistical modeling with ML to bring insights from data, under the guidance of the Principal Data Scientist.
- Data modeling with Pig, Hive, Impala.
- Ingestion with Sqoop, Flume.
- Used SVN to commit changes into the main EMM application trunk.
- Worked with Ajax API calls to communicate with Hadoop through an Impala connection, using SQL to render the required data.
- These API calls are similar to Microsoft Cognitive API calls.
- Strong command of Cloudera and HDP ecosystem components.
- Used Elasticsearch (Big Data) to retrieve data into the application as required.
- Developed MapReduce programs that run on the cluster.
- Involved in loading data from RDBMS and weblogs into HDFS using Sqoop and Flume.
- Worked on loading the data from MySQL to HBase where necessary using Sqoop.
- Developed Hive queries for Analysis across different banners.
- Launching Amazon EC2 Cloud Instances using Amazon Images (Linux/ Ubuntu) and Configuring launched instances with respect to specific applications.
- Exported the result set from Hive to MySQL using Sqoop after processing the data.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Hands-on experience working with SequenceFile, Avro, and HAR file formats and compression.
- Used Hive to partition and bucket data.
- Experienced in writing MapReduce programs with the Java API to cleanse structured and unstructured data.
- Wrote Pig Scripts to perform ETL procedures on the data in HDFS.
- Worked on improving the performance of existing Pig and Hive queries.
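The MapReduce cleansing jobs above used the Java API; the same cleanse-and-aggregate pattern can be sketched as Hadoop Streaming-style mapper/reducer functions in plain Python (the records and cleansing rule are invented stand-ins):

```python
# Hadoop Streaming-style mapper/reducer, sketched in Python (the
# actual jobs used the Java MapReduce API). Cleanses records and
# counts occurrences per key, a stand-in for the real cleansing logic.
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    """Normalize each record and emit (key, 1) pairs."""
    for line in lines:
        key = line.strip().lower()
        if key:                      # drop blank/whitespace-only records
            yield key, 1

def reducer(pairs):
    """Sum counts per key; input must be grouped, hence the sort."""
    for key, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield key, sum(count for _, count in group)

records = ["Alpha", "  ", "beta", "ALPHA", "beta"]
counts = dict(reducer(mapper(records)))
```

In a real Hadoop Streaming job the mapper and reducer would read stdin and write tab-separated lines to stdout; the framework handles the shuffle/sort between them.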
Environment: SQL/Server, Oracle, MS-Office, Teradata, Informatica, ER Studio, XML, Business Objects, HDFS, Teradata 14.1, JSON, HADOOP (HDFS), MapReduce, PIG, Spark, R Studio, MAHOUT, JAVA, HIVE, AWS.
Confidential
Data Architect/Data Modeler
Responsibilities:
- Worked with large amounts of structured and unstructured data.
- Knowledge of machine learning concepts (Confidential, regularization, random forest, time series models, etc.).
- Worked with business intelligence and visualization tools such as BusinessObjects, Tableau, Chartio, etc.
- Deployed GUI pages by using JSP, JSTL, HTML, DHTML, XHTML, CSS, JavaScript, and AJAX.
- Configured the project on WebSphere 6.1 application servers.
- Implemented the online application using Core Java, JDBC, JSP, Servlets, EJB 1.1, Web Services, SOAP, and WSDL.
- Communicated with other healthcare information systems using Web Services with the help of SOAP, WSDL, and JAX-RPC.
- Used Singleton, Factory, and DAO design patterns based on the application requirements.
- Used SAX and DOM parsers to parse the raw XML documents
- Used RAD as Development IDE for web applications.
- Prepared and executed unit test cases.
- Used the Log4j logging framework to write log messages with various levels.
- Involved in fixing bugs and minor enhancements to the front-end modules.
- Used Microsoft Visio and Rational Rose to design the use case diagrams, class model, sequence diagrams, and activity diagrams for the SDLC process of the application.
- Performed functional and technical reviews.
- Supported the testing team for system testing, integration, and UAT.
- Ensured quality in the deliverables.
- Implemented the project in Linux environment.
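The XML parsing described above used Java SAX/DOM parsers; a comparable DOM-parsing sketch using Python's standard library is shown below (the XML snippet and field names are invented):

```python
# DOM-style XML parsing, sketched with Python's stdlib; the original
# application used Java SAX/DOM parsers. The XML content is invented.
from xml.dom.minidom import parseString

doc = parseString("<alerts><alert id='1'>disk full</alert>"
                  "<alert id='2'>login failed</alert></alerts>")

# Walk the DOM tree and collect alert id -> message text.
alerts = {node.getAttribute("id"): node.firstChild.data
          for node in doc.getElementsByTagName("alert")}
```

For large documents a SAX (event-driven) parser avoids loading the whole tree into memory, which is the usual reason to prefer it over DOM.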
Environment: R, Erwin, Tableau, MDM, QlikView, MLLib, PL/SQL, HDFS, Teradata, JSON, HADOOP (HDFS), MapReduce, PIG, Spark, R Studio, MAHOUT, JAVA, HIVE, AWS.