Scientist Resume
PROFESSIONAL SUMMARY:
- I have 8+ years of experience in the development of technologies and algorithms for internet applications. Over the years, I have worked on large-scale social recommendation systems, link prediction in social networks, search technologies, categorization of documents/queries/tweets, information retrieval, search relevance, information extraction, query expansion, search spam detection, web mining, machine learning algorithms, data clustering, and classification algorithms.
- Extensive experience in Text Analytics, developing different Statistical Machine Learning and Data Mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau.
- Design of the Physical Data Architecture of new system engines.
- Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis; good knowledge of Recommender Systems.
- Proficient in Statistical Modeling and Machine Learning techniques (Linear and Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian methods, XGBoost) in Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis testing, Factor Analysis/PCA, and Ensembles (see the illustrative sketch after this summary).
- Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
- Developing Logical Data Architecture with adherence to Enterprise Architecture.
- Strong experience in the Software Development Life Cycle (SDLC), including Requirements Analysis, Design Specification, and Testing, in both Waterfall and Agile methodologies.
- I have also developed and published articles on novel algorithms for distributed caching, network monitoring, multi-processor scheduling, web mining, etc.
- Result-oriented, hands-on freelance professional with a successful record in ICT project management as well as in academic research.
- Experience and technical proficiency in designing and data modeling of online applications; Solution Lead for architecting Data Warehouse/Business Intelligence applications.
- Good understanding of Teradata SQL Assistant, Teradata Administrator, and data load/export utilities like BTEQ, FastLoad, MultiLoad, and FastExport.
- Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, Pivot Tables, and OLAP reporting.
- Highly skilled in using Hadoop (Pig and Hive) for basic analysis and extraction of data in the infrastructure to provide data summarization.
- Work experience with a cloud-based service and software for managing connected products and machines and implementing Machine-to-Machine (M2M) and Internet of Things (IoT) applications, such as Axeda iSupport.
- Working knowledge of a platform for the rapid development of applications designed for smart, connected sensors, devices, and products, or the Internet of Things (IoT), such as ThingWorx.
- Highly skilled in using visualization tools like Tableau, ggplot2 and d3.js for creating dashboards.
- Implemented and programmed against the Google AdWords API to automatically find millions of new high-value/high-volume keywords for advertising campaigns (Perl, SOAP, XML); taxonomy improvement.
- Creation of multimillion bid keyword lists using extensive web crawling.
- Identification of metrics to measure the quality of each list (yield or coverage, volume, and keyword average financial value).
- Understanding and implementation of text mining concepts, graph processing, and semi-structured and unstructured data processing.
- Statistical Modelling with ML to derive insights from data under the guidance of a Principal Data Scientist.
- Technical coordination of the Big Data Hub with applications to visualize the insights.
- Design, build, deploy Machine Learning applications to solve real-world problems empirically.
- Experience with varied forms of practical data, including image, speech, text, video, motion-capture, and other high-dimensional data.
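Below is a minimal, hypothetical Python (scikit-learn) sketch of the kind of predictive-modeling workflow referenced in this summary; the synthetic dataset, model choices, and hyperparameters are illustrative assumptions, not artifacts of any project listed above.

```python
# Illustrative only: fit and compare a logistic regression and a random forest
# on a synthetic binary-classification dataset, scoring by AUC.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real business dataset (assumption).
X, y = make_classification(n_samples=5_000, n_features=20, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

for name, model in [
    ("logistic_regression", LogisticRegression(max_iter=1_000)),
    ("random_forest", RandomForestClassifier(n_estimators=200, random_state=0)),
]:
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: test AUC = {auc:.3f}")
```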
TECHNICAL SKILLS
Languages: T-SQL, PL/SQL, SQL, C, C++, XML, HTML, DHTML, HTTP, Matlab, DAX, Python
Databases: SQL Server, MS Access, Oracle 11g/10g/9i, Teradata, Big Data, Hadoop
DWH / BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Business Intelligence Development Studio (BIDS), Visual Studio, Crystal Reports, Informatica 6.1.
Database Design Tools and Data Modeling: MS Visio, ERwin 4.5/4.0, Star Schema/Snowflake Schema modeling, Fact & Dimension tables, physical & logical data modeling, Normalization and De-normalization techniques, Kimball & Inmon Methodologies
Utilities: SQL Server Management Studio, SQL Server Enterprise Manager, SQL Server Profiler, Import & Export Wizard, Visual Studio .Net, Microsoft Management Console, Visual Source Safe 6.0, DTS, Crystal Reports, Power Pivot, ProClarity, Microsoft Office, Excel Power Pivot, Excel Data Explorer, Tableau, JIRA
PROFESSIONAL EXPERIENCE:
Confidential
Scientist
Responsibilities:
- Improving Fraud Detection using Digital Links at Amazon, Seattle.
- Scaled up Machine Learning pipelines to 4,600 processors and 35,000 GB of memory, achieving 5-minute execution.
- Deployed GUI pages by using JSP, JSTL, HTML, DHTML, XHTML, CSS, JavaScript, AJAX.
- Configured the project on WebSphere 6.1 application servers
- Designed a new Machine Learning pipeline to replace the existing production pipeline, increasing AUC from 83% to 90%.
- Handled 2+ TB of data with graphs up to 130 GB (50M nodes, 100M edges) using single-node, in-disk scaling.
- Developed a Machine Learning test-bed for robustness with 24 different model learning and feature learning algorithms.
- Through a thorough systematic search, demonstrated performance surpassing the state of the art (deep learning).
- Achieved up to 10 times more accurate predictions than existing state-of-the-art algorithms.
- Developed in-disk, huge (100GB+), highly complex Machine Learning models.
- Used SAX and DOM parsers to parse the raw XML documents
- Used RAD as Development IDE for web applications.
- Demonstrated performance comparable to other state-of-the-art deep learning models.
- Trained Long Short-Term Memory Recurrent Neural Networks (LSTM RNNs) using deep learning techniques and applied them to Problem X.
- Applied LSTM RNNs to Problem Y (see the illustrative sketch after this list).
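A minimal, hypothetical sketch of an LSTM sequence classifier along the lines of the Problem X / Problem Y bullets above; the toy data shapes and hyperparameters are illustrative assumptions, not the original models.

```python
# Illustrative only: a small Keras LSTM binary classifier on random sequences.
import numpy as np
import tensorflow as tf

# Toy data: 1,000 sequences of 50 timesteps with 16 features each (assumption).
X = np.random.rand(1000, 50, 16).astype("float32")
y = np.random.randint(0, 2, size=(1000,))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(50, 16)),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
model.fit(X, y, epochs=3, batch_size=32, validation_split=0.2)
```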
Environment: R 9.0, Informatica 9.0, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose.
Confidential
Scientist
Responsibilities:
- Manipulating, cleansing & processing data using Excel, Access, and SQL.
- Responsible for loading, extracting and validation of client data.
- Liaising with end-users and 3rd party suppliers.
- Analyzing raw data, drawing conclusions & developing recommendations.
- Writing T-SQL scripts to manipulate data for data loads and extracts.
- Developing analytical databases from complex financial source data.
- Performing daily system checks.
- Data entry, data auditing, creating data reports & monitoring all data for accuracy.
- Designing, developing and implementing new functionality.
- Monitoring the automated loading processes.
- Advising on the suitability of methodologies and suggesting improvements.
- Carrying out specified data processing and statistical techniques.
- Supplying qualitative and quantitative data to colleagues & clients.
- Using Informatica & SAS to extract, transform & load source data from transaction systems.
- Created robust data pipelines using big data technologies like Hadoop, Spark, etc. (see the illustrative sketch after this list).
- Creating statistical models using distributed and standalone models to build various diagnostic, predictive, and prescriptive solutions.
- Utilize a broad variety of statistical packages like SAS, R, MLlib, Graphs, Hadoop, Spark, MapReduce, Pig, and others.
- Refine and train models based on domain knowledge and customer business objectives
- Deliver or collaborate on delivering effective visualizations to support the client business objectives
- Communicate to your peers and managers promptly as and when required.
- Produce solid and effective strategies based on accurate and meaningful data reports and analysis and/or keen observations.
- Establish and maintain communication with clients and/or team members; understand needs, resolve issues, and meet expectations
- Developed web applications using .NET technologies; worked on bug fixes and issues arising in the production environment and resolved them promptly.
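A minimal, hypothetical PySpark sketch of the kind of batch data pipeline described above; the HDFS paths, column names, and aggregation logic are illustrative assumptions, not the original pipeline.

```python
# Illustrative only: read raw extracts, cleanse them, and write a per-client summary.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("transactions_etl").getOrCreate()

# Hypothetical raw transaction extracts landed on HDFS.
raw = spark.read.option("header", True).csv("hdfs:///data/raw/transactions/")

clean = (
    raw.dropna(subset=["client_id", "amount"])
       .withColumn("amount", F.col("amount").cast("double"))
       .filter(F.col("amount") > 0)
)

summary = clean.groupBy("client_id").agg(
    F.count("*").alias("txn_count"),
    F.sum("amount").alias("total_amount"),
)

summary.write.mode("overwrite").parquet("hdfs:///data/curated/client_summary/")
```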
Environment: SQL Server 2008R2/2005 Enterprise, SSRS, SSIS, Crystal Reports, Hadoop, Windows Enterprise Server 2000, DTS, SQL Profiler, and Query Analyzer.
Confidential
Scientist
Responsibilities:
- Data mining using state-of-the-art methods.
- Extending the company's data with third-party sources of information when needed.
- Enhancing data collection procedures to include information that is relevant for building analytic systems.
- Processing, cleansing, and verifying the integrity of data used for analysis.
- Doing ad-hoc analysis and presenting results in a clear manner
- Creating automated anomaly detection systems and constantly tracking their performance.
- Strong command of data architecture and data modelling techniques.
- Hands-on experience with commercial data mining tools such as Splunk, R, MapReduce, YARN, Pig, Hive, Flume, Oozie, Scala, HBase, HDFS, Sqoop, and Spark (machine learning tooling).
- Developed scalable machine learning solutions within a distributed computation framework (e.g. Hadoop, Spark, Storm etc.) for robustness.
- Utilizing NLP applications such as topic models and sentiment analysis to identify trends and patterns within massive data sets (see the illustrative sketch after this list).
- Knowledge of ML & statistical libraries (e.g. Scikit-learn, Pandas).
- Knowledge of building predictive models to forecast risks for product launches and operations, and to help predict workflow and capacity requirements for TRMS operations.
- Experience with visualization technologies such as Tableau.
- Drew inferences and conclusions, created dashboards and visualizations of processed data, and identified trends and anomalies.
- Generated TLFs and summary reports, ensuring on-time, quality delivery.
- Participated in client meetings, teleconferences and video conferences to keep track of project requirements, commitments made and the delivery thereof.
- Solved analytical problems and effectively communicated methodologies and results.
- Worked closely with internal stakeholders such as business teams, product managers, engineering teams, and partner teams.
- Created automated metrics using complex databases.
- Foster culture of continuous engineering improvement through mentoring, feedback, and metrics.
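A minimal, hypothetical sketch of topic modeling with scikit-learn's LDA, in the spirit of the NLP bullet above; the tiny in-line corpus and topic count are illustrative assumptions, not project data.

```python
# Illustrative only: fit a 2-topic LDA model on a toy corpus and print top terms.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "shipping delay refund requested for damaged package",
    "loved the fast delivery and great customer service",
    "payment failed twice and support never responded",
    "excellent product quality, will order again soon",
]

vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(dtm)

terms = vectorizer.get_feature_names_out()
for topic_idx, weights in enumerate(lda.components_):
    top_terms = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"topic {topic_idx}: {', '.join(top_terms)}")
```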
Environment: Erwin r9.0, Informatica 9.0, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose, Requisite Pro, Hadoop, PL/SQL, etc.
Confidential
Scientist
Responsibilities:
- Statistical Modelling with ML to derive insights from data under the guidance of a Principal Data Scientist.
- Data modeling with Pig, Hive, and Impala.
- Ingestion with Sqoop, Flume.
- Used SVN to commit the Changes into the main EMM application trunk.
- Understanding and implementation of text mining concepts, graph processing, and semi-structured and unstructured data processing.
- Worked with AJAX API calls to communicate with Hadoop through an Impala connection and SQL to render the required data; these API calls are similar to Microsoft Cognitive API calls.
- Good grasp of Cloudera and HDP ecosystem components.
- Used Elasticsearch (Big Data) to retrieve data into the application as required.
- Ran MapReduce programs on the cluster.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Developed scalable machine learning solutions within a distributed computation framework (e.g. Hadoop, Spark, Storm etc.).
- Analyzed the partitioned and bucketed data and computed various metrics for reporting.
- Involved in loading data from RDBMS and web logs into HDFS using Sqoop and Flume.
- Worked on loading the data from MySQL to HBase where necessary using Sqoop.
- Developed Hive queries for Analysis across different banners.
- Extracted data from Twitter using Java and the Twitter API; parsed JSON-formatted Twitter data and uploaded it to the database.
- Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured launched instances for specific applications to improve robustness.
- Exported the result set from Hive to MySQL using Sqoop after processing the data.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Hands-on experience working with SequenceFile, Avro, and HAR file formats and compression.
- Used Hive to partition and bucket data (see the illustrative sketch after this list).
- Experience in writing MapReduce programs with the Java API to cleanse structured and unstructured data.
- Wrote Pig scripts to perform ETL procedures on the data in HDFS.
- Created HBase tables to store data in various formats coming from different portfolios.
- Worked on improving performance of existing Pig and Hive Queries.
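A minimal, hypothetical PySpark-on-Hive sketch of the partitioning and bucketing work described above; the database, table, and column names are illustrative assumptions, not the original schema.

```python
# Illustrative only: write web-log events as a partitioned, bucketed Hive table
# and compute a simple daily metric from it.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder.appName("hive_partitioning_demo")
    .enableHiveSupport()
    .getOrCreate()
)

# Hypothetical raw web logs with "timestamp" and "user_id" fields.
events = spark.read.json("hdfs:///data/raw/web_logs/")

(
    events.withColumn("event_date", F.to_date("timestamp"))
          .write.mode("overwrite")
          .partitionBy("event_date")            # one partition per day
          .bucketBy(16, "user_id")              # 16 buckets on user_id
          .sortBy("user_id")
          .saveAsTable("analytics.web_events")  # assumes the "analytics" database exists
)

daily = spark.sql(
    "SELECT event_date, COUNT(DISTINCT user_id) AS daily_users "
    "FROM analytics.web_events GROUP BY event_date"
)
daily.show()
```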
Environment: SQL Server, Oracle 9i, MS Office, Teradata, Informatica, ER Studio, XML, Business Objects, HDFS, Teradata 14.1, JSON, Hadoop (HDFS), MapReduce, Pig, Spark, RStudio, Mahout, Java, Hive, AWS.
Confidential
Analyst Modeler
Responsibilities:
- National Highways Authority (Govt. of India) is evaluating the design for installations across the country.
- IIT Madras has installed the speed detectors across the institute for permanent speed limit enforcement.
- Developed and tested feature-tracking algorithms for Intelligent Transportation Systems computer vision.
- Developed Internet traffic scoring platform for ad networks, advertisers and publishers (rule engine, site scoring, keyword scoring, lift measurement, linkage analysis).
- Responsible for defining the key identifiers for each mapping/interface.
- Clients include eBay, Click Forensics, Cars.com, Turn.com, Microsoft, and Looksmart.
- Designed the architecture for one of the first Analytics 3.0 online platforms: all-purpose scoring with on-demand, SaaS, and API services; currently under implementation.
- Applied web crawling and text mining techniques to score referral domains, generate keyword taxonomies, and assess the commercial value of bid keywords (see the illustrative sketch after this list).
- Developed a new hybrid statistical and data mining technique known as hidden decision trees and hidden forests.
- Reverse engineering of keyword pricing algorithms in the context of pay-per-click arbitrage.
- Implementation of a Metadata Repository; maintenance of Data Quality, Data Cleanup procedures, Transformations, Data Standards, and the Data Governance program; scripts, stored procedures, triggers, and execution of test plans.
- Performed data quality checks in Talend Open Studio.
- Devised and implemented a Vehicle Speed Detector using low-power LEDs and field-tested for robustness.
- Coordinated meetings with vendors to define requirements and system interaction agreement documentation between client and vendor system.
- Automated bidding for advertiser campaigns based on either keyword or category (run-of-site) bidding.
- Created multimillion-entry bid keyword lists using extensive web crawling and identified metrics to measure the quality of each list (yield or coverage, volume, and average keyword financial value).
- Maintained the Enterprise Metadata Library with any changes or updates.
- Documented data quality and traceability for each source interface.
- Established standards and procedures.
- Generated weekly and monthly asset inventory reports.
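A minimal, hypothetical Python sketch of crawling a page and scoring candidate bid keywords by frequency, in the spirit of the web-crawling and keyword-scoring bullets above; the URL, keyword list, and scoring rule are illustrative assumptions, not the production platform.

```python
# Illustrative only: fetch a page, strip markup crudely, and count keyword hits.
import re
from collections import Counter

import requests

CANDIDATE_KEYWORDS = ["used cars", "car insurance", "auto loan"]  # hypothetical list


def score_keywords(url: str) -> dict[str, int]:
    """Count how often each candidate keyword appears in a page's text."""
    html = requests.get(url, timeout=10).text
    text = re.sub(r"<[^>]+>", " ", html).lower()  # crude tag stripping
    counts = Counter()
    for kw in CANDIDATE_KEYWORDS:
        counts[kw] = text.count(kw)
    return dict(counts)


if __name__ == "__main__":
    print(score_keywords("https://example.com"))
```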
Environment: Erwin r7.0, SQL Server 2000/2005, Windows XP/NT/2000, Oracle 8i/9i, MS-DTS, UML, UAT, SQL Loader, OOD, OLTP, PL/SQL, MS Visio, Informatica.