Data Analyst Resume
Atlanta
SUMMARY
- Professional with 8+ years of experience in Acquisition of correct Datasets, Data Scrubbing to mine the target data, Data Engineering to extract features utilizing Statistical Techniques, Exploratory Data Analysis with an inquisitive mind, building diverse Machine Learning Algorithms for developing Predictive Models, and designing Stunning Visualizations to help the growth of Business Profitability.
- Skilled in performing data parsing, data manipulation, and data preparation with methods including describing data contents, computing descriptive statistics, regex, split and combine, remap, merge, subset, reindex, melt, and reshape.
- Experience in using various packages in R and Python, such as ggplot2, caret, dplyr, RWeka, gmodels, Twitter, NLP, reshape2, rjson, pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and Beautiful Soup.
- Strong proficiency in Data Extraction, Data Cleaning, Data Loading, Statistical Data Analysis, Exploratory Data Analysis, Data Wrangling, Predictive Modeling using R and Python, and Data Visualization using Power BI.
- Solid experience in Deep Learning techniques with Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), max pooling, normalization, and different architectures such as AlexNet, VGG, and Darknet.
- Proficient in Machine Learning algorithms and Predictive Modeling including Linear Regression, Logistic Regression, Naive Bayes, Decision Tree, Neural Networks, Random Forest, Ensemble Models, SVM, KNN, and K-means clustering.
- Solid experience in data extraction for analysis from Oracle, MS SQL Server, MS Access, and Vertica using SQL, plus knowledge of PostgreSQL/pgAdmin and MySQL/MySQL Workbench.
- Expert in Python libraries such as NumPy and SciPy for mathematical calculations, Pandas for data pre-processing/wrangling, Matplotlib and Seaborn for data visualization, Sklearn for machine learning, Theano, TensorFlow, and Keras for deep learning, and NLTK for NLP.
- Experience and Technical proficiency in Designing, Data Modelling Online Applications, Solution Lead for Architecting Data Warehouse / Business Intelligence Applications.
- Performed Data Analysis and Data Validation by writing complex SQL queries against SQL Server databases.
- Knowledge of various reporting objects like Facts, Attributes, Hierarchies, Transformations, Filters, Prompts, Sets, and groups in Tableau.
- Created multiple table views and reports in QlikView and Tableau for business analysis.
- Experience with Data Extraction, Transformation, and Loading (ETL) using tools such as Data Transformation Services (DTS) and SSIS.
- Used Tableau to analyze and obtain insights into large datasets and to create visually compelling, actionable interactive reports and dashboards.
- Experienced in generating and documenting Metadata while designing the OLTP and OLAP systems environment.
- Excellence in handling Big Data Ecosystems like Apache Hadoop, MapReduce, Spark, HDFS Architecture, Cassandra, HBase, Sqoop, Hive, Pig, MLlib, ELT.
- Increased customer visibility by delivering real-time insights to salespeople and sales managers using Tableau, SAS, QlikView, Microsoft BI, Matplotlib, ggplot2, Bokeh, Shiny, and Dygraphs, boosting revenue by 10%.
- Skilled in System Analysis, E-R/Dimensional Data Modeling, Design, and implementing RDBMS-specific features. Regularly used JIRA and other internal issue trackers for project development.
TECHNICAL SKILLS
Application/Web Servers: JBoss, Glassfish 2.1, WebLogic, WebSphere, Apache Tomcat Server.
Machine Learning: Classification, Regression, Decision Trees, Random Forest, SVM, k-Nearest Neighbors, k-means Clustering, Hierarchical Clustering, Logistic Regression, TensorFlow, Keras.
MS-Office Package: Microsoft Office (Windows, Word, Excel, PowerPoint, Visio, Project).
Data Science: Data Modelling, Data Wrangling, Feature Selection/Engineering, Data Visualization.
Databases: Microsoft SQL Server 2014/2012/2008 R2/2008, MySQL, Oracle, DB2, Teradata, MS Access.
Tools: Anaconda, PyCharm, GitHub, Foursquare API, Power BI, PostgreSQL
R and Python Packages: dplyr, ggplot2, caret, TensorFlow, Pandas, Matplotlib, Seaborn, scikit-learn, NumPy, NLTK
Programming: Python (Pandas, NumPy, scikit-learn, Seaborn, Matplotlib, Geopy, SciPy), SQL, C.
PROFESSIONAL EXPERIENCE
Data Analyst
Confidential, Atlanta
Responsibilities:
- Analyzed the Business Requirements Specification Documents and Source to Target Mapping Documents and identified the test requirements.
- Developed predictive models on large-scale datasets to address various business problems through leveraging advanced statistical modeling, machine learning, and deep learning.
- Updated Python scripts to match data with our database stored in AWS CloudSearch, so that we would be able to assign each document a response label for further classification.
- Created XML schema definitions (XSDs) with the XMLSpy tool and converted them into Informatica metadata.
- Automated the report delivery to the management Team.
- Working knowledge of REMS (Risk Evaluation and Mitigation Strategies) to ensure that the benefits of drugs and biological products outweigh the risks.
- Prepared test cases, technical documents, and functional documents.
- Expanded existing limited code documentation with additional line REMS statements, “FDIC code” notes, run notes and caveats, likely future changes, and overall process documentation.
- Used the DataStage Designer to develop processes for extracting, cleansing, transforming, integrating, and loading data into the data warehouse database.
- Worked with datasets of varying degrees of size and complexity, including both structured and unstructured data. Piped and processed massive data streams in distributed computing environments such as Hadoop to facilitate analysis (ETL).
- Gathered business reporting/intelligence requirements (specifically, risk reporting) from SMEs, meetings, emails, complaints and concerns voiced in team meetings, change management, and detailed reads of EU, federal, and state banking regulations; added data quality standards and risks to requirements.
- Wrote SQL queries for various RDBMS such as Microsoft SQL Server, MySQL, PostgreSQL, Teradata, and Oracle, and worked with NoSQL databases such as MongoDB, HBase, and Cassandra to handle unstructured data.
- Participated in data extraction and data ingestion from different data sources into the Hadoop Data Lake by creating ETL pipelines using Pig and Hive.
- Used Python NumPy, SciPy, Pandas, Matplotlib, and stats packages to perform dataset manipulation, data mapping, data cleansing, and feature engineering. Built and analyzed datasets using R and Python.
- Used Data Quality validation techniques to validate Critical Data Elements (CDE) and identified many anomalies. Extensively worked on statistical analysis tools and adept at writing code in Advanced Excel, R, and Python.
- Added line REM documentation to explain all filters and processing steps; created overall processing documentation (“runbook”) for run conditions and use of SQL scripts within C# wrapper. Started conversion of T-SQL scripts to Oracle SQL.
- Good experience in working with Autonomous Transactions in PL/SQL.
- Built ETL processes using SSIS to bring in data from the hospital's Eagle and Allscripts products.
- Planned and coordinated the administration of PostgreSQL databases to ensure accurate, appropriate, and effective use of data, including database definition, structure, documentation, long-range requirements, and operational guidelines.
- Experience as a techno-functional resource in Oracle E-Business Suite (11i/R12) implementations, upgrades, and support, mainly in the Procure to Pay modules.
- Used the R programming language to graphically examine the data and perform data mining. Interpreted business requirements and data mapping specifications, and visualized data per the business requirements using R Shiny.
- Installed and monitored PostgreSQL databases using standard monitoring tools such as Nagios.
- Consulted with and advised staff DBAs on database management issues affecting the PostgreSQL databases.
- Studied and stayed current on features and functionality of PostgreSQL.
- Primarily used the Python Matplotlib package and Power BI to visualize and graphically analyze the data. Performed data pre-processing and split the identified dataset into training and test sets using other Python libraries.
- Experience in Data Lake implementation with Big Data technologies including HDFS, Spark, Cloudera, and Hive.
- Installed, configured, tested, monitored, upgraded, and tuned new and existing PostgreSQL databases.
- Worked with Azure Data Factory (ADF), a managed cloud service well suited to composing and orchestrating Azure data services.
- Experience in designing Star Schema and Snowflake Schema for Data Warehouses using tools like Erwin Data Modeler, PowerDesigner, and Embarcadero ER/Studio.
- Involved in data modeling using Hive for analytics, file processing using Pig, and data load management.
- Identified continuous improvement opportunities for current predictive modeling algorithms. Proactively collaborated with business partners to determine identified population segments and develop actionable plans to enable the identification of patterns related to quality, use, cost, and other variables.
- Expertise in the development of High-level design, Conceptual design, Logical and Physical design for Database, Data warehousing and many Distributed IT systems.
- Performed extensive data modeling to differentiate between the OLTP and Data Warehouse data models.
- Developed Test Cases for Deployment Verification, ETL Data Validation, Cube Testing, and Report Testing.
Environment: Informatica PowerCenter, Erwin, PL/SQL, ETL Data Validation, Cube Testing, Report Testing, OLTP, SDLC, Pandas, NumPy, Seaborn, Data Marts, MicroStrategy, UAT, Python.
Data Analyst
Confidential, Dublin
Responsibilities:
- Performed root cause analysis on smaller self-contained data analysis tasks related to assigned data processes.
- Worked to ensure high levels of data consistency between diverse source systems including flat files, XML, and SQL Database.
- Created reports and dashboards, by using D3.js and Tableau 9.x, to explain and communicate data insights, significant features, model scores, and performance of the new recommendation system to both technical and business teams.
- Updated Python scripts to match training data with our database stored in AWS CloudSearch, so that we would be able to assign each document a response label for further classification.
- Used Data Quality validation techniques to validate Critical Data Elements (CDE) and identified many anomalies. Extensively worked on statistical analysis tools and adept at writing code in Advanced Excel and Python.
- Involved in planning the Hadoop cluster infrastructure, resources capacity and build plan for Hadoop cluster installations.
- Extracted data from databases like Oracle, SQL Server, and DB2 using Informatica to load it into a single repository for data analysis.
- Excellent in Data Analysis, Data Profiling, Data Validation, Data Cleansing, Data Verification, and Data Mismatch Identification.
- Developed and ran ad hoc data queries from multiple database types to identify the system of records, data inconsistencies, and data quality issues.
- Wrote SQL Stored Procedures and Views, and coordinated and performed in-depth testing of new and existing systems.
- Provided support to Data Architect and Data Modeler in Designing and Implementing Databases for MDM using ERWIN Data Modeler Tool and MS Access.
- Architected workflows, activity hierarchies, and process flows; documented them using interface diagrams, flow charts, and specification documents.
- Worked with data compliance teams, Data governance team to maintain data models, Metadata, Data Dictionaries; define source fields and their definitions.
- Performed Data Analysis on the analytic data present in Teradata, Hadoop, and AWS using SQL and Teradata SQL Assistant.
- Expertise in configuring and creating SSIS solutions for ETL and Business Intelligence Process for Data Integration and Migration services.
- Developed and maintained data solutions that utilize Oracle DB, Microsoft SQL Server Reporting Services, and Excel.
Environment: Informatica, SAS/BASE, SAS/Access, SAS/Connect, XML, Erwin, Pivot tables, Snowflake schema, Star schema, VLOOKUP, Teradata, Python, SSRS, SSIS, UNIX, SQL, Oracle, Tableau, JIRA.
Data Modeler/Data Analyst
Confidential, Akron, OH
Responsibilities:
- Created conceptual, logical, and physical models based on requirements gathered through interviews with the business users.
- Updated existing models to integrate new functionality into an existing application. Conducted one-on-one sessions with business users to gather warehouse requirements.
- Analyzed database requirements in detail with the project stakeholders by conducting joint Requirement Development sessions.
- Developed normalized Logical and Physical database models to design the OLTP system.
- Created a dimensional model for the reporting system by identifying the required dimensions and facts using Erwin.
- Served as Business Analyst for the REMS project to convert MS Access reports to SQL Server Reporting Services (SSRS). Defined requirements through to implementation for REMS balancing controls.
- Used forward engineering to create a Physical Data Model with DDL that best suits the requirements from the Logical Data Model.
- Maintained and implemented Data Models for the Enterprise Data Warehouse using Erwin.
- Created and maintained metadata, including table and column definitions.
- Worked on PL/SQL programming of Stored Procedures, Functions, Packages, and Triggers.
- Used Model Mart of Erwin for effective model management of sharing, dividing, and reusing model information and design for productivity improvement.
- Eliminated errors in Erwin models through the implementation of Model Mart (a companion tool to Erwin that controls the versioning of models).
- Used Erwin for reverse engineering to connect to the existing database and ODS, create a graphical representation in the form of Entity Relationships, and elicit more information.
- Identified the most appropriate data sources based on an understanding of corporate data thus providing a higher level of consistency in reports being used by various levels of management.
Environment: Erwin r8, Windows XP/NT/2000, SQL Server 2008, Teradata, Oracle 11g, DB2, Informix, MS Excel, Mainframes, MS Visio, Rational Rose, RequisitePro.
Data Modeler/Data Analyst
Confidential - Los Angeles, CA
Responsibilities:
- Involved in the entire data Migration process from analyzing the existing data, cleansing, validating, translating tables, converting, and subsequent upload into the new platform.
- Worked on Performance Tuning of the database, including indexing and optimizing SQL statements.
- Implemented Forward engineering to create tables, views and SQL scripts, and mapping documents.
- Prepared data dictionaries and Source-Target Mapping documents to ease the ETL process and user's understanding of the data warehouse objects.
- Performed data manipulations using various Informatica Transformations such as Joiner, Expression, Lookup, Aggregator, Filter, Update Strategy, and Sequence Generator.
- Created documentation and test cases, worked with users for new module enhancements and testing.
- Worked with business analysts to design weekly reports using a combination of Crystal Reports.
- Identified existing data models and documented suspected designs affecting the performance of the system.
- Extracted data from databases like Oracle, SQL Server, and DB2 using Informatica to load it into a single repository for data analysis.
- Involved in the development and implementation of SSIS, SSRS, and SSAS application solutions for various business units across the organization.
Environment: Oracle SQL Developer, Oracle Data Modeler, Teradata, SSIS, Business Objects, Oracle 10g, SQL Server 2012, SQL Assistant, DataStage 8.1, DB2, Informatica PowerCenter.
Jr. Data Analyst
Confidential
Responsibilities:
- Prepared unstructured data from multiple data files in the customers' database and handled ad hoc reporting requests for customers as needed, along with business review analysis of the program data.
- Communicated with the Source code provider in case of any discrepancies.
- Extracted, Transformed, and Loaded (ETL) data to map it from disparate sources to the required target database.
- Worked on production support activities to monitor all daily, weekly, monthly, and quarterly jobs in the scheduler, fix failed workflows, and communicate with the relevant teams to get issues resolved.
- Monitored all daily, weekly, monthly, and quarterly jobs and tracked the run statistics.
- Delved into data to discover discrepancies and patterns.
- Collaborated with management and internal teams to implement and evaluate improvements.
