- Highly efficient Data Scientist with 9+ years of experience in Machine Learning, Data Mining with large sets of structured and unstructured data, Data Acquisition, Data Validation, Statistical Modeling, NLP, Text Mining, Predictive Modeling, Data Visualization, Web Crawling, and Web Scraping.
- Adept in statistical programming languages such as R and Python, and in Big Data technologies such as Hadoop, Hive, Pig, Spark, and Scala.
- Excellent hands-on experience with Big Data tools like Hadoop, Spark, Hive, Pig, Impala, PySpark, and Spark SQL.
- Extensive experience using ETL and reporting tools such as SQL Server Integration Services (SSIS) and SQL Server Reporting Services (SSRS).
- Experienced in data mining, in loading unstructured data (XML, JSON, flat file formats) into Hadoop, and in analyzing it.
- Expertise in Excel macros, pivot tables, VLOOKUPs, and other advanced functions; expert R user with knowledge of the statistical programming language SAS.
- Excellent working experience with and knowledge of the Hadoop ecosystem, including HDFS, MapReduce, Hive, Pig, MongoDB, Cassandra, and HBase.
- Hands-on experience implementing LDA and Naive Bayes, and skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis.
- Extensive experience in data visualization, including producing tables, graphs, and listings using various procedures and tools such as Tableau.
Data Analytics Tools/Programming: Python (NumPy, SciPy, pandas, Gensim, Keras), R (Caret, Weka, ggplot), MATLAB, Microsoft SQL Server, Oracle PL/SQL, SQL, T-SQL, UNIX shell scripting, Java, SAS.
Data Modeling: Erwin 9.x, ER Studio, MS Visio, Star Schema, Snowflake Schema.
Big Data Tools: Hadoop, MapReduce, Sqoop, Pig, Hive, NoSQL, Cassandra, MongoDB, Spark, Scala.
Machine Learning Algorithms: Classification, Regression, Clustering, Feature Engineering.
Databases: Oracle, Teradata, Netezza, SQL Server, MongoDB, HBase, Cassandra.
Other Tools: MS Office suite (Word, Excel, MS Project, and Outlook), Spark MLlib, Scala NLP, MariaDB, Azure.
Confidential, Nashville, TN
Sr. Data Scientist
- Developed analytics solutions on a machine learning platform, demonstrating a creative problem-solving approach and strong analytical skills.
- Participated in all phases of data mining, data collection, data cleaning, developing models, validation, and visualization and performed Gap analysis.
- Created various B2B predictive and descriptive analytics using R and Tableau.
- Data storyteller, mining data from different data sources such as SQL Server, Oracle, Cube Database, Web Analytics, Business Objects, and Hadoop. Provided ad hoc analysis and reports to the executive-level management team.
- Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and R, along with a broad variety of machine learning methods including classification, regression, and dimensionality reduction.
- Used MLlib, Spark's machine learning library, to build and evaluate different models.
- Used Spark DataFrames, Spark SQL, and Spark MLlib extensively, and designed and developed POCs using Scala, Spark SQL, and MLlib libraries.
- Created a high-level ETL design document and assisted ETL developers in the detailed design and development of ETL maps using Informatica.
- Analyzed data and predicted end-customer behaviors and product performance by applying machine learning algorithms using Spark MLlib.
- Used external loaders such as MultiLoad, TPump, and FastLoad to load data into Teradata Database. Involved in analysis, development, testing, implementation, and deployment.
Environment: R, Machine Learning, Teradata 14, Hadoop MapReduce, PySpark, Spark, Spark MLlib, Tableau, Informatica, SQL, Excel, Erwin, SAS, Scala NLP, Cassandra, Oracle, MongoDB, Cognos, SQL Server 2012, DB2, T-SQL, PL/SQL, Flat Files, XML
Confidential, Oldsmar, FL
- Implemented public segmentation with unsupervised machine learning, applying the k-means algorithm in PySpark.
- Explored and extracted data from source XML in HDFS, and prepared it for exploratory analysis using data munging.
- Used R and Python for exploratory data analysis, A/B testing, ANOVA, and hypothesis testing to compare and identify the effectiveness of creative campaigns.
- Used Spark for test data analytics using MLlib and analyzed the performance to identify bottlenecks.
- Worked with different data formats such as JSON and XML, and applied machine learning algorithms in R.
- Worked on Linux shell scripts for business processes and for loading data from different interfaces to HDFS.
- Created MDM, OLAP data architecture, analytical data marts, and cubes optimized for reporting.
- Worked with different sources such as Oracle, Teradata, SQL Server 2012, Excel, flat files, complex flat files, Cassandra, MongoDB, HBase, and COBOL files.
- Performed K-means clustering, Multivariate analysis and Support Vector Machines in R.
- Used Python, R, and SQL to create statistical algorithms involving Multivariate Regression, Linear Regression, Logistic Regression, PCA, Random Forest models, Decision Trees, and Support Vector Machines for estimating the risks of welfare dependency.
- Identified and targeted welfare high-risk groups with Machine learning algorithms.
Environment: Python, MDM, MLlib, PL/SQL, Tableau, Teradata 14.1, JSON, Hadoop (HDFS), MapReduce, SQL Server, Scala NLP, SSMS, ERP, CRM, Netezza, Cassandra, SQL, SSRS, Informatica, Pig, Spark, Azure, RStudio, MongoDB, Java, Hive
Confidential, New York, NY
- Gathered requirements by interacting heavily with business users and multiple technical teams to design and develop the workflows for the new functional piece
- Collaborated with various business stakeholders to create the Business Requirement Document (BRD), and translated the gathered high-level requirements into a Functional Requirement Document (FRD) to assist implementation-side SMEs and developers, along with data flow diagrams, user stories, and use cases
- Part of a Scrum Agile team
- Experience in SQL joins, subqueries, tracing, and performance tuning to make queries run better
- Extensively used joins and subqueries for complex queries involving multiple tables from different databases
- Performance tuning of stored procedures and functions to optimize the query for better performance
- Successfully implemented indexes on tables for optimum performance
- Developed complex stored procedures using T-SQL to generate ad hoc reports within SQL Server Reporting Services
- Strong analytical and problem-solving skills coupled with interpersonal and leadership skills
- Interacted with clients to gather requirements and assisted them with immediate workarounds for application issues
- Developed detailed test scenarios as documented in business requirements documents, and assisted the test team with UAT
- Used Tableau in real time for analytical purposes
Environment: MySQL, MS PowerPoint, MS Access, T-SQL, DTS, SSIS, SSRS, SSAS, ETL, MDM, Teradata
Confidential, Westlake, TX
- Created various Data Mapping Repository documents as part of Metadata services (EMR).
- Collaborated with data modelers and ETL developers in creating the Data Functional Design documents.
- Provided inputs to the development team in performing extraction, transformation, and load for data marts and data warehouses.
- Performed in-depth data analysis and prepared weekly, biweekly, and monthly reports using SQL, MS Excel, MS Access, and UNIX.
- Documented the complete process flow to describe program development, logic, coding, testing, implementation, and application integration.
- Good understanding of advanced statistical modeling and logical modeling using SAS.
- Worked with the business and the ETL developers on the analysis and resolution of data-related problem tickets.
- Performed data analysis and data profiling using complex SQL on various source systems, including Oracle.
- Performed data analysis and reporting using MySQL, MS PowerPoint, MS Access, and SQL Assistant.
- Involved in MySQL, MS PowerPoint, and MS Access database design, and designed a new database on Netezza for optimized outcomes.
- Involved in writing T-SQL, working on SSIS, SSRS, SSAS, Data Cleansing, Data Scrubbing and Data Migration.
- Involved in writing scripts for loading data to the target data warehouse using BTEQ, FastLoad, and MultiLoad.
- Created ETL scripts using regular expressions and custom tools (Informatica, Pentaho, and Syncsort) to extract, transform, and load data.
- Developed SQL Service Broker to flow and sync data from MS-I to Microsoft's master data management (MDM).
- Involved in loading data between Netezza tables using the NZSQL utility.
- Worked on data modeling using dimensional data modeling, star schema/snowflake schema, fact and dimension tables, and physical and logical data modeling.
- Generated Statspack/AWR reports from the Oracle database and analyzed the reports for Oracle 8.x wait events, time-consuming SQL queries, tablespace growth, and database growth.
Environment: ER Studio, MySQL, MS PowerPoint, MS Access, Netezza, DB2, T-SQL, DTS, SSIS, SSRS, SSAS, ETL, MDM, Teradata, Oracle 8.x (Star Schema and Snowflake Schema), etc.