- 5+ years of experience in data analysis, machine learning, and data mining with large structured and unstructured data sets, including data acquisition, data validation, predictive modeling, data visualization, and web scraping. Adept in statistical programming languages such as Python and in Big Data technologies such as Hadoop and Hive.
- Analyzed formatted data using machine learning algorithms in Python with Scikit-learn.
- Experienced with deep learning techniques such as Convolutional Neural Networks and Recurrent Neural Networks using TensorFlow.
- Expertise in Python programming with various packages including NumPy, Pandas, SciPy, and Scikit-learn.
- Hands-on experience in database design using PL/SQL to write stored procedures, functions, and triggers, and strong experience writing complex queries in Oracle, SQL Server, and MySQL.
- Excellent understanding and knowledge of NoSQL databases such as HBase and Cassandra.
- Good experience using Informatica Workflow Manager for running/managing workflows and Informatica Workflow Monitor for monitoring the workflows and checking the session logs.
- Worked with Google Cloud for data storage and data analytics.
- Highly skilled in using Hadoop (Hive) for basic analysis and extraction of data in the infrastructure to provide data summarization.
- Highly skilled in using tools such as Tableau, Power BI, and Flask for creating dashboards.
- Experienced in ticketing systems such as JIRA and version control tools such as GitHub.
- Created UNIX shell scripts for Informatica pre- and post-session operations and database operations.
- Excellent communication, interpersonal, analytical, and leadership skills; intuitive quick starter with the ability to master and apply new concepts.
SDLC, Agile, Waterfall, Python, Big Data, Hadoop, Hive, TensorFlow, NumPy, Pandas, SciPy, Scikit-learn, SQL, PL/SQL, Oracle, SQL Server, MySQL, HBase, Cassandra, Informatica, Tableau, Power BI, GitHub, Machine Learning, Google Cloud, Unix, Windows
- Involved in all phases of the SDLC and acted as the main liaison between business managers and IT division.
- Built FastLoad and FastExport scripts to load data into Teradata and extract data from Teradata.
- Involved in performance tuning to optimize SQL queries.
- Used Tableau and its custom SQL feature to create dashboards and identify correlations.
- Built an internal visualization platform for clients to view historic data, compare issuers, and run analytics for different bonds and markets.
- Cleaned and processed third-party spending data into usable deliverables in a specified format using Excel macros and Python libraries such as NumPy and Matplotlib.
- Used BigQuery as a scalable, managed enterprise data warehouse for analytics, and Cloud Dataflow, a managed service based on Apache Beam, for stream and batch data processing.
- Used Stackdriver for monitoring, logging, and diagnostics of applications on Google Cloud Platform.
- Used the Pandas API to structure data in time-series and tabular formats for manipulation and retrieval.
- Generated, wrote, and ran SQL scripts to implement database changes, including table updates, adding or updating indexes, and creating views and stored procedures.
- Tested various KPIs for Tableau and SF dashboard reports.
- Used UNIX commands for file management: placing inbound files for ETL and retrieving outbound files and log files from the UNIX environment.
- Used data mapping specifications to create and execute detailed system test plans. The data mapping specifies what data is extracted from an internal data warehouse, transformed, and sent to an external entity.
- Coordinated with the DBA on database builds and table normalization and denormalization.
- Worked on data that was a combination of unstructured and structured data from multiple sources and automated the cleaning using Python scripts.
- Used Cloud Storage, an object store with integrated edge caching, to store unstructured data.
- Redefined many attributes and relationships in the reverse-engineered model and cleansed unwanted tables/columns in the Teradata database as part of data analysis responsibilities.
- Performed complex data analysis in support of ad-hoc and standing customer requests.
- Delivered data solutions in report/presentation format according to customer specifications and timelines.
- Involved in SQL development, unit testing, and performance tuning, and ensured testing issues were resolved based on defect reports.
- Developed Data Migration and Cleansing rules for the Integration Architecture (OLTP, ODS, DW).
- Wrote MySQL queries from scratch and created views in MySQL for Tableau.
- Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python to develop machine learning models, applying algorithms such as linear regression, multivariate regression, naive Bayes, random forests, K-means, and KNN for data analysis.
- Integrated MapReduce with HBase to bulk-import large amounts of data into HBase using MapReduce programs.
- Analyzed solution architecture and design documents, including source-to-target and detailed design documents needed for each track, and planned test design activities.
- Created UML diagrams, including context, business rules flow, and class diagrams.
- Drafted complex SQL queries and performed allocation testing from the front end/user interface.
- Analyzed functional and non-functional categorized data elements for data profiling and mapping from the source to the target data environment.
- Extensively used ETL methodology to support data extraction, transformation, and loading in a complex EDW using Informatica.
- Involved in data mining, transformation and loading from the source systems to the target system.
- Built compelling, interactive dashboards in Tableau that answer key business questions.
- Wrote SQL, PL/SQL, and stored procedures to implement business rules and transformations.
- Worked with various transformations such as Joiner, Expression, Lookup, Aggregator, Filter, Update Strategy, Stored Procedure, and Normalizer.
- Designed and developed pre-session, post-session, and batch execution routines using Informatica Server to run sessions.
- Analyzed, verified, and modified UNIX and Python scripts to improve data quality and performance.