Big Data Analyst Resume
Reston, VA
PROFESSIONAL EXPERIENCE:
Confidential, Reston, VA
Big Data Analyst
Responsibilities:
- Extracted, compiled, formatted, reconciled and submitted the physical energy sales data for the Federal Energy Regulatory Commission (FERC) Electric Quarterly Report (EQR). Significantly reduced report creation time by designing and developing a Microsoft Access application to merge, consolidate, and format sales data from over 88 Microsoft Excel spreadsheets. Created additional applications to format data extracted from the Allegro, ZaiNet and nMarket systems.
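As an illustration of the consolidation step above, here is a minimal pandas sketch; the actual application was built in Microsoft Access, so this is only an equivalent in Python, and the folder path and column names are hypothetical.

```python
# Illustrative only: the original consolidation was a Microsoft Access application.
# This pandas sketch shows an equivalent merge/consolidate/format step.
# The folder path and column names are hypothetical.
from pathlib import Path
import pandas as pd

SALES_DIR = Path("eqr_sales_spreadsheets")   # hypothetical folder of ~88 workbooks

frames = []
for workbook in sorted(SALES_DIR.glob("*.xlsx")):
    df = pd.read_excel(workbook)
    df["source_file"] = workbook.name        # keep provenance for reconciliation
    frames.append(df)

sales = pd.concat(frames, ignore_index=True)

# Example consolidation: one row per counterparty and delivery point.
summary = (sales.groupby(["counterparty", "delivery_point"], as_index=False)
                .agg(total_mwh=("quantity_mwh", "sum"),
                     total_revenue=("price_usd", "sum")))
summary.to_excel("eqr_consolidated_sales.xlsx", index=False)
```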
- Extracted all data from the existing billing system and associated systems as part of a new billing system implementation (Banner, 9-month project). Created queries that combined, manipulated, and exported the data to Excel workbooks. Created various data verification and compression processes.
- Designed and created back office accounting reports via the report-writer function of the ZaiNet energy trading, scheduling and risk management system.
- Created ad-hoc and production property preservation SQL reports. Imported property inspection orders and cancellations from banks and mortgage companies, and returned the inspection results.
- Created and maintained process and procedure documentation.
- Provided special reporting and ad hoc reporting (scholar/fund data/market values/book values) as needed. Researched and resolved data integrity issues.
- Initiated, compiled, and communicated the merge of Word and OneNote process and procedure documents into one resource file.
- Initiated and maintained many improvements to multiple FileMaker Endowed Scholarship databases resulting in significant time-savings and increased efficiencies:
- Created a multi-script process pulling data from multiple databases to be merged into one PDF file for endowed scholarship donor acknowledgement and reporting.
- Designed and created a FileMaker scholar thank-you letter proofing layout which increased colleague efficiency and timeliness. Created additional layouts that mirrored Dartmouth stationery, eliminating the need for letterhead stock.
- Created scripts which identified all funds supported by a household (e.g. husband and wife with multiple and separate funds) and then produced the household's scholar announcement letter.
- Merged the data from two separate scholarship databases into one database resulting in a significant maintenance time-savings.
- Performed data updates to Scholarship fund, Monitored fund, In Memory Of and Prizes/Awards databases by importing data from multiple FileMaker and Oracle database sources (Financial Aid Office, Advance, Data Warehouse and iModules).
- Provided technical and software support as well as training (FileMaker, Excel, Acrobat Pro, printing and processes) to colleagues.
- Extracted, compiled, tracked, and analyzed data to generate reports in a variety of formats (Excel, PDF, Tableau, and SAS dashboards); modeled data structures for multiple projects using Mainframe and Oracle.
- Maintained the data integrity during extraction, ingestion, manipulation, processing, analysis and storage.
- Presented more than 15 impactful time-series visualization dashboards and stories using Tableau Desktop and Server, Excel pivot tables, SQL queries, Power BI, SAS, and Visual Basic macros.
- Responsible for enhancing the data model according to business requirements.
- Developed scripts to create sequences in all databases to accommodate the extended enterprise key.
- Analyzed and implemented performance improvements for the required data and tables.
- Deployed the changes to various environments and tested them.
- Worked on QA and staging builds in TFS and merged builds as required.
- Enhanced existing models to reduce data redundancy and improve functionality.
- Worked on the parent-child key hierarchy and created sequence scripts for each level so that data migration from each database would proceed without major issues.
- Worked with the Java team to accommodate front-end changes driven by the changes made to the database tables.
- Modified PL/SQL packages to improve the performance of jobs and batch processes.
- Worked on the global master data in the production databases, analyzing column length definitions, maximum primary key values, and their differences.
- Altered tables as required to improve the performance of data processes.
- Analyzed the different levels of production data and extracted the parent-level data by application to migrate it into the new database.
- Converted the existing data to the expanded primary key column length in around 300 tables, as sketched below.
- Modified PL/SQL code, replacing the table with joins between the source tables.
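A minimal Python sketch of how the DDL for the expanded-key column changes could be generated in bulk across the roughly 300 tables; the actual work was done with Oracle SQL/PL-SQL scripts, and the table names, column name, and target precision here are hypothetical.

```python
# Illustrative sketch only: generates ALTER TABLE DDL for widening a primary key
# column across many tables. The real work used Oracle SQL/PL-SQL; the table
# names, column name, and target precision below are hypothetical.
tables = ["CUSTOMER", "ORDER_HEADER", "ORDER_LINE"]   # in practice ~300 tables
pk_column = "ENTERPRISE_KEY"
new_precision = 38                                    # expanded enterprise key length

ddl_statements = [
    f"ALTER TABLE {t} MODIFY ({pk_column} NUMBER({new_precision}));"
    for t in tables
]

with open("expand_enterprise_key.sql", "w") as out:
    out.write("\n".join(ddl_statements) + "\n")
```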
- Used SQL to extract and transform data from the Freud environment and load the structured data into the Smart Care environment, reducing service calls from 6,000 to 4,000.
- Responsible for analytical data needs, handling of complex data requests, reports, and predictive data modeling.
- Designed ad-hoc queries with SQL in Cognos ReportNet. Examined reports and presented findings in PowerPoint and Excel.
- Used anomaly and fraud detection techniques with SAS E-Miner for the Confidential client, resulting in a 22% reduction in fraudulent cases.
- Reported fraud, missed transactions, forecasts, and user behavior using Tableau in weekly cross-functional team meetings for continuous process improvement.
- Implemented Agile Scrum practices for project implementation, reducing project touch time by 300 man-hours and cutting costs by $30,000/year.
- Conducted statistical analysis to drive brand decision-making and survey development, resulting in 4 new projects from business partners.
- Evaluated performance of 300+ stores for Nielsen clients based on key metrics and identified opportunities to enable stores to meet and exceed their financial targets through increased sales.
- Created a data lake by extracting customer data from various data sources into HDFS, including data from Teradata, mainframes, RDBMS, CSV, and Excel files.
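A minimal PySpark sketch of the kind of data lake ingestion described above; the JDBC connection details, table name, and HDFS paths are hypothetical.

```python
# Illustrative PySpark sketch of landing source data in HDFS as part of a data lake.
# The JDBC connection details, table name, and HDFS paths are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("data-lake-ingest").getOrCreate()

# Relational source (e.g. Teradata/RDBMS) via JDBC
customers = (spark.read.format("jdbc")
             .option("url", "jdbc:teradata://td-host/DATABASE=SALES")
             .option("dbtable", "CUSTOMER")
             .option("user", "etl_user")
             .option("password", "********")
             .load())

# Flat-file sources (CSV exports, mainframe extracts converted to delimited files)
orders = spark.read.option("header", True).csv("hdfs:///landing/orders/*.csv")

# Persist to the lake in a columnar format
customers.write.mode("overwrite").parquet("hdfs:///datalake/raw/customers")
orders.write.mode("overwrite").parquet("hdfs:///datalake/raw/orders")
```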
- Involved in optimizing SQL scripts and designing efficient queries.
- Developed the SQL table schema for the effective storage of the customer data.
- Involved in preparing design documents and unit and integration test documents.
- Developed an internal web-scraper tool for inspecting ad hosting on websites using the google, urllib, and BeautifulSoup packages in Python.
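A minimal sketch of the scraper's fetch-and-parse core using urllib and BeautifulSoup; the target URL and the heuristic for spotting ad hosting are assumptions.

```python
# Illustrative sketch of the scraper's fetch-and-parse core (urllib + BeautifulSoup).
# The target URL and the "ads" heuristic for spotting ad hosting are assumptions.
from urllib.request import urlopen, Request
from bs4 import BeautifulSoup

def find_ad_candidates(url):
    req = Request(url, headers={"User-Agent": "ad-inspector/0.1"})
    html = urlopen(req, timeout=10).read()
    soup = BeautifulSoup(html, "html.parser")

    # Rough heuristic: external scripts and iframes whose source mentions "ads"
    candidates = []
    for tag in soup.find_all(["script", "iframe"]):
        src = tag.get("src") or ""
        if "ads" in src.lower():
            candidates.append(src)
    return candidates

if __name__ == "__main__":
    print(find_ad_candidates("https://example.com"))
```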
- Two Sigma Financial Modeling: Worked on a Kaggle challenge applying data science and business analytics tools to the financial market as a semester project. Coding was done in Python and visualizations in Power BI.
- Fraud Detection: Used anomaly detection to identify outliers in financial transactions for Confidential clients, applying the logic on the Spark framework for large-scale data processing. Detected anomalies were reported to downstream teams for further action on the client accounts.
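A minimal PySpark sketch of one simple anomaly-detection rule (a z-score cut-off on transaction amount); the column names, threshold, and paths are assumptions, not the actual detection logic used for the client.

```python
# Illustrative PySpark sketch: flag transactions whose amount is far from the mean
# (a simple z-score rule). Column names, paths, and the 3-sigma threshold are
# assumptions, not the actual detection logic used on the client engagement.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("txn-anomalies").getOrCreate()
txns = spark.read.parquet("hdfs:///datalake/raw/transactions")   # hypothetical path

stats = txns.agg(F.mean("amount").alias("mu"),
                 F.stddev("amount").alias("sigma")).collect()[0]

anomalies = txns.withColumn(
    "z_score", (F.col("amount") - F.lit(stats["mu"])) / F.lit(stats["sigma"])
).filter(F.abs(F.col("z_score")) > 3)

# Hand the flagged rows to downstream teams
anomalies.write.mode("overwrite").parquet("hdfs:///datalake/flags/txn_anomalies")
```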
- Hospital Readmission Project: Visualized the patterns in a hospital readmission dataset in Tableau and built a predictive model in R to predict readmission risk.
- Behavioral Analysis: Analyzed more than 100K patient records for early readmission risk using PySpark and the Spark Machine Learning Library (MLlib).
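A minimal PySpark MLlib sketch of the kind of readmission-risk model described above; the input path, feature columns, and label column are hypothetical.

```python
# Illustrative PySpark MLlib sketch for early-readmission risk scoring.
# The input path, feature columns, and label column are hypothetical.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("readmission-risk").getOrCreate()
patients = spark.read.parquet("hdfs:///datalake/curated/patient_records")

assembler = VectorAssembler(
    inputCols=["age", "num_prior_admissions", "length_of_stay", "num_diagnoses"],
    outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="readmitted_30d")

train, test = patients.randomSplit([0.8, 0.2], seed=42)
model = Pipeline(stages=[assembler, lr]).fit(train)
scored = model.transform(test).select("patient_id", "probability", "prediction")
```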
- Surgical Schedule Optimization: Designed optimal surgical scheduling and staff planning for the Medical College by building a generalized linear model and using the AMPL optimization tool, which helped achieve a 10% reduction in under-allocated operating hours.
- Revenue Analysis: Worked on movie revenue datasets and devised a dynamic forecasting model using stepwise regression, KNN, and ensemble techniques. The average ensemble model results were 92.3% accurate.
- Twitter Sentiment Analysis using Python and NLTK: Implemented sentiment analysis of tweets about mobile carriers using NLTK sentiment analysis and the Twitter API.
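A minimal sketch of the NLTK sentiment step, using the VADER analyzer; the sample tweets stand in for data pulled from the Twitter API.

```python
# Illustrative sketch of the NLTK sentiment step. VADER is one NLTK analyzer suited
# to short social-media text; the sample tweets below stand in for data pulled from
# the Twitter API for the mobile-carrier analysis.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

tweets = [
    "Switched carriers and my coverage is so much better now!",
    "Dropped three calls today, absolutely terrible service.",
]

for tweet in tweets:
    scores = sia.polarity_scores(tweet)          # neg/neu/pos/compound scores
    label = "positive" if scores["compound"] > 0 else "negative"
    print(f"{label:8s} {scores['compound']:+.3f}  {tweet}")
```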
- Processed and cleaned the data by treating missing values with imputation methods.
- Detected and treated outliers; ran stepwise regression and all-subsets regression to choose effective variables for the revenue model.
- Developed a predictive algorithm using a decision tree (regression tree) to implement pricing optimization, as sketched below.
- Identified optimal product prices so that items could be sold for maximum profit while sustaining demand.
- Predicted revenue using linear modeling and ran a price elasticity model to show how revenue responds when a product's price increases, which helped improve profit by 23%.
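A minimal scikit-learn sketch of a regression-tree pricing model of the kind described above, fit on synthetic data; the features and the price grid are hypothetical.

```python
# Illustrative scikit-learn sketch of a regression-tree pricing model: fit demand
# (units sold) as a function of price and a simple promo flag, then scan a price
# grid for the revenue-maximizing point. The data and features are synthetic.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
price = rng.uniform(5, 50, 500)
promo = rng.integers(0, 2, 500)
units = np.maximum(0, 200 - 3 * price + 25 * promo + rng.normal(0, 10, 500))

X = np.column_stack([price, promo])
tree = DecisionTreeRegressor(max_depth=4).fit(X, units)

# Scan candidate prices (promo off) and pick the revenue-maximizing one
grid = np.linspace(5, 50, 100)
demand = tree.predict(np.column_stack([grid, np.zeros_like(grid)]))
best = grid[np.argmax(grid * demand)]
print(f"revenue-maximizing price ~ {best:.2f}")
```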
- Applied logistic regression (GLM), linear discriminant analysis, and k-nearest neighbors to identify fraudulent customers using customer Total Pay credit card transaction data.
- Caught fraudulent activity more quickly and efficiently, leading to a drop in the cost of fraud to the customer of nearly 96% and a drop in the cost of goods sold of over 95%.
- Provided additional analyses where needed to determine inefficiencies within the department and implemented the fixes to these problems.
- Created ad-hoc Access queries to provide quick and precise answers to various customer and vendor requests.
- Worked with the loan lending department to identify and quantify potential risk factors and loan defaulters.
- Researched machine learning algorithms such as Naïve Bayes, KNN, random forest, and logistic regression, and applied them to predict loan defaulters, improving on the previous model and increasing prediction accuracy.
- Validated the results using a cost matrix calculated from the costs incurred due to false positives and false negatives, as sketched below.
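A minimal scikit-learn sketch of the model comparison and cost-matrix validation described above; the synthetic data and the unit costs for false positives and false negatives are hypothetical.

```python
# Illustrative scikit-learn sketch: compare a few classifiers for loan default and
# score them with a cost matrix built from false-positive / false-negative costs.
# The synthetic data and the unit costs below are hypothetical.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=5000, n_features=12, weights=[0.9], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

COST_FP, COST_FN = 100, 1000     # hypothetical: a missed defaulter costs 10x more

models = {
    "naive_bayes": GaussianNB(),
    "knn": KNeighborsClassifier(),
    "random_forest": RandomForestClassifier(random_state=1),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
    print(f"{name:20s} total cost = {fp * COST_FP + fn * COST_FN}")
```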
- Worked on a huge transactional data set of 40 million rows and performed exploratory data analysis.
- Converted continuous variables to dummy variables for performance improvement.
- Assisted in marketing analytics by identifying cause-effect relationships between marketing actions and financial outcomes to raise profitability.
- Conducted queries via the Partners EHR/EMR system and output the results to a SQL Server database as part of the Readmission Project.
- Developed an algorithm in Python to convert insurance-oriented ICD-9 codes into clinically meaningful disease classifications, as sketched below.
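A minimal sketch of the ICD-9 grouping idea; the code ranges follow standard ICD-9 chapter boundaries, but the actual clinically meaningful classification used on the project may differ.

```python
# Illustrative sketch of grouping ICD-9 codes into broader disease categories by
# numeric range. The ranges/labels follow standard ICD-9 chapter boundaries; the
# actual clinical classification used on the project may differ.
ICD9_CHAPTERS = [
    ((1, 139), "infectious and parasitic diseases"),
    ((140, 239), "neoplasms"),
    ((240, 279), "endocrine, nutritional, metabolic"),
    ((390, 459), "circulatory system"),
    ((460, 519), "respiratory system"),
]

def classify_icd9(code: str) -> str:
    """Map an ICD-9 code string (e.g. '428.0') to a broad disease category."""
    if code.startswith(("V", "E")):
        return "supplementary / external causes"
    major = int(float(code))                 # '428.0' -> 428
    for (lo, hi), label in ICD9_CHAPTERS:
        if lo <= major <= hi:
            return label
    return "other"

print(classify_icd9("428.0"))   # -> circulatory system (heart failure)
```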
- Conducted data analysis using logistic regression, KNN, and random forest methods to identify high readmission-risk patients, improving the accuracy (C-scores) by 30 percent.
- Extracted Twitter data using Python and performed text mining analysis with BeautifulSoup (Python) and SAS E-Miner to improve AHN facilities, which increased the occupancy rate by 12%.
- Built complex SQL reports to audit $2.5 million of pay and insurance benefits for over 150 individual records.
- Designed and developed various analytical reports from multiple data sources by blending data on a single Worksheet in Tableau Desktop.
- Involved in the planning phase of internal and external table schemas in Hive with appropriate static and dynamic partitions for efficiency, as sketched below.
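A minimal sketch of the kind of partitioned Hive schema being planned, issued here through spark.sql; the database objects, columns, and locations are hypothetical.

```python
# Illustrative sketch of a partitioned Hive schema of the kind planned above, issued
# through spark.sql. Table, column, and location names are hypothetical; the first
# insert uses a static partition, the second dynamic partitioning.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hive-ddl").enableHiveSupport().getOrCreate()

spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales_raw (
        order_id BIGINT, customer_id BIGINT, amount DOUBLE)
    PARTITIONED BY (load_date STRING)
    STORED AS PARQUET
    LOCATION 'hdfs:///datalake/raw/sales'
""")

# Static partition: the partition value is fixed in the statement.
spark.sql("""
    INSERT OVERWRITE TABLE sales_raw PARTITION (load_date = '2017-01-01')
    SELECT order_id, customer_id, amount FROM staging_sales_20170101
""")

# Dynamic partitioning: partition values come from the data itself.
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
    INSERT OVERWRITE TABLE sales_raw PARTITION (load_date)
    SELECT order_id, customer_id, amount, load_date FROM staging_sales_all
""")
```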
Environment: Oracle 11g/12c, Sybase PowerDesigner, Windows 7, SQL, PL/SQL, Toad, TFS, MS Visio