- 8+ years of experience as a data engineer, data analyst and business analyst, with expertise in predictive model building, data visualization and statistical analysis
- Experience building intuitive products and experiences while working alongside an excellent cross-functional team spanning Engineering, Product and Design
- Hands-on experience writing queries in SQL and Python to extract, transform and load (ETL) data from large datasets using data staging
- Strong experience with Python and its libraries (Pandas, NumPy, scikit-learn, Seaborn, Matplotlib), as well as R, for algorithm development, data manipulation, analysis and visualization
- Performed data extraction, transformation and loading (ETL) and related BI work using tools such as SQL Server Integration Services (SSIS), Reporting Services (SSRS), Analysis Services (SSAS) and Pipeline Pilot
- Expert in transforming business requirements into analytical models and designing algorithms
- Proficient in developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data to improve business performance
- Experience designing and developing SSRS reports using data from ETL loads and SSAS
- Expert in Data Visualization, Data Analysis, Business Intelligence, IT Analysis/Design, and Client/Server System Architecture
- Knowledge of designing and developing analytical reports and dashboards using guided analysis, interactive dashboard design and visualization best practices
- Worked on Long Short-Term Memory (LSTM) networks using Keras for automatic speech recognition and anomaly detection
- Used sentiment analysis to determine the emotional tone behind text and gauge the attitudes it expresses, in order to analyze the market for a product, customer service and fraudulent activities
- Strong mathematical knowledge of Linear Algebra, Probability, Statistics, Stochastic Theory, Information Theory and logarithms
- Data analysis skills including data mapping, data cleansing, logical data modelling, writing data extraction scripts and querying to report the progress of job migrations
- Proficient in writing complex SQL (stored procedures, triggers, joins and subqueries) to access and manipulate database systems such as MySQL and PostgreSQL, as well as NoSQL databases
- Experience performing analytics on structured data in Hive using HiveQL queries, views, partitioning, bucketing and UDFs
- Knowledge of extracting data from multiple sources such as Excel, Access, flat files and SQL Server to design and develop dashboards using TIBCO Spotfire
- Experience using Alteryx for data cleaning, data blending and creating more efficient datasets for reporting
- Experience using Tableau for data visualization and for designing dashboards that are published and presented as storylines on web and desktop platforms
- Proficient in the entire project life cycle and actively involved in all phases, including data acquisition, data cleaning, data engineering, feature scaling, feature engineering, statistical modelling and visualization
- Experienced working with AWS (EMR, S3, EC2, SageMaker) and GCP
Analysis Tools: Tableau, TIBCO Spotfire, Power BI, MS SSRS, MS SSAS, MS Excel (VBA, VLOOKUP, Pivot tables)
Languages: Python, R, SQL, Java, C, C++
Libraries: Pandas, NumPy, SciPy, Scikit-learn, Matplotlib, Seaborn, NLTK, TensorFlow, Keras
Database/Analytics: MySQL, PostgreSQL, NoSQL, DynamoDB, Aurora, MongoDB, Cassandra, Hive, Teradata, Vertica, Hadoop Streaming, MapReduce, SPSS, SAS, Weka, SSIS
Technical: Amazon Web Services (AWS), Spark, Git, Jenkins, Agile
Mathematical skills: Statistics, Linear Algebra, Probability
Machine Learning Algorithms: Linear Regression, Logistic Regression, Linear Discriminant Analysis (LDA), Decision Trees, Random Forests, AdaBoost, Gradient Boosting, Naïve Bayes, K-Nearest Neighbors, Hierarchical clustering, K-means clustering, Density-based clustering (DBSCAN)
Tools: MS Office, Eclipse, MATLAB, Visual Studio, MS Excel (advanced), MS Access, MS Visio, Toad for Oracle
Data Engineer/Data Analyst
Confidential, Philadelphia, PA
- Responsible for understanding and analysing business requirements to develop and debug applications using BI tools
- Interacted with users and business analysts to assess requirements and perform impact analysis
- Responsible for managing data structures and pipelines using SQL and Snowflake
- Developed Python scripts to identify vulnerabilities in SQL queries through SQL injection testing, permission checks and performance analysis
- Worked on SQL queries and PL/SQL procedures and converted them into ETL tasks
- Retrieved, manipulated, analyzed and aggregated billions of records of claims data, and performed ETL on them, from databases and a Hadoop cluster using PL/SQL
- Created quarterly Tableau dashboards for presentation, tracking profit generated and customer sentiment indicators such as call volumes and case volumes
- Responsible for designing, developing and testing Hadoop ETL using Hive on data at different stages of the pipeline
- Knowledge of using analytical features in Tableau, such as statistical functions and calculations, trend lines and forecasting, in Tableau reports
- Used Matplotlib and Seaborn in Python to visualize data, and performed feature engineering such as detecting outliers, handling missing values and interpreting variables
- Experience with the Python libraries Pandas, NumPy, Seaborn, Matplotlib, NLTK, spaCy, Scikit-learn, Keras and TensorFlow in developing end-to-end analytical models
- Created SSIS packages to load data from Oracle to SQL Server using various SSIS transformations
- Responsible for manipulating datasets and building models for document classification and OCR using Python and SQL
- Wrote SQL queries to identify and validate data inconsistencies in the data warehouse against the source system
- Created OLAP-based reports, subreports, bar charts and matrix reports using SSRS
- Worked with different data formats such as JSON, XML and CSV
- Used the AWS environment to extract, transform and load data from various data sources into Tableau for reporting
- Worked with datasets of varying size and complexity, including both structured and unstructured data
- Processed very large datasets, handling missing values, creating dummy variables and removing various kinds of noise in the data
- Collaborated with different functional teams to implement models and monitor outcomes
- Developed processes and tools to monitor and analyze model performance and data accuracy
- Performed data wrangling to clean, transform and reshape data using the Pandas library. Analyzed data using SQL and Python and presented analytical reports to management and technical teams
- Used Git for version control, tracking changes to files and coordinating work on them among multiple team members
- Participated in regular grooming sessions to discuss stories and keep report development tasks up to date using agile tools such as Rally and JIRA
Environment: NumPy, Pandas, Matplotlib, Seaborn, Scikit-Learn, Tableau, SQL, Snowflake, AWS, Amazon SageMaker, Amazon Textract, Amazon Athena, Amazon Comprehend, Microsoft Excel, SSIS, SSRS, SSAS, SAS Analytics, Python, Hive, Spark, Rally
Confidential, Raleigh, NC
- Performed end-to-end analytics from data extraction to insights development for quarterly data using Tableau
- Responsible for developing a data warehouse and ETL system using relational (SQL) and non-relational (NoSQL) databases
- Worked on a combination of structured and unstructured data from multiple sources and automated the cleaning process using Python scripts
- Worked on analytical tasks such as data gathering, exploratory analysis, data cleaning and feature engineering on large-scale, complex datasets using Python
- Used Pandas, NumPy, SciPy, Matplotlib, Seaborn, Scikit-learn and NLTK in Python at various stages of developing Python scripts
- Performed in-depth data analysis and prepared ad hoc reports using MS Excel and SQL scripts
- Involved in creating SSIS packages with various transformations such as Slowly Changing Dimension, Lookup, Aggregate, Derived Column, Conditional Split, Fuzzy Lookup, Multicast and Data Conversion
- Used SSIS to create ETL packages (.dtsx files) to validate, extract, transform and load data to data warehouse databases
- Optimized Tableau dashboards for better performance and monitoring when working with very large datasets
- Created drill-down, parameterized, linked and sub-reports, in addition to ad hoc reports, using SSRS
- Experienced in writing HiveQL queries to analyze millions of records of customer data stored in file formats such as RCFile, SequenceFile and text files, using partitioning and bucketing
- Collected historical and third-party data from different data sources, and identified outliers with box plots using Pandas and NumPy
- Implemented a recommendation model to optimize sales and marketing efforts that increased revenue by ~3%
- Read messages from Amazon SQS queues containing paths to files in Amazon S3 buckets, and used the AWS CLI to aggregate cleaned files in Amazon S3
- Performed database audits and created reports based on audit logs in Tableau
- Provided clear direction and motivation to project team members, championed communication across all levels of the organization and provided daily project status reports to management
- Developed Git hooks for local repository commits and remote repository pushes, and worked with GitHub
Environment: NumPy, Pandas, Matplotlib, Seaborn, Scikit-Learn, Tableau, SQL, Snowflake, AWS, Microsoft Excel, SSIS, SSRS, SSAS, SAS Analytics, Python, Hive, Spark.
Confidential, Dearborn, MI
- Performed data analysis using SQL to identify data discrepancies in the existing systems and, based on the analysis, worked with business stakeholders to define requirements for those attributes to support data clean-up
- Wrote complex Spark SQL queries for data analysis to meet business requirements
- Developed MapReduce/Spark Python modules for predictive analytics in Hadoop on AWS
- Worked on the Celenois module, performing data cleaning and ensuring data quality, consistency and integrity using Pandas and NumPy
- Participated in feature engineering such as feature intersection generation, feature normalization and label encoding with Scikit-learn preprocessing
- Prepared ETL mappings to extract data from multiple source systems into the target system, combining the data and performing data flow and data transformation using SSIS
- Generated and deployed ETL SSIS packages to process data to target databases.
- Created Tableau reports to verify the correctness of data flows from multiple sources into the system database, used to determine the root cause of mismatches and the percentage of inaccuracy for each source and entity
- Retrieved large datasets from database platforms such as Teradata, Hive and Snowflake to create reports in Business Objects summarizing key metrics
- Designed, developed and deployed reports in MS SQL Server using Microsoft SQL Server Reporting Services (SSRS)
- Used Spark (PySpark, Spark SQL, MLlib) on AWS to conduct real-time analysis of loan defaults
- Performed data blending and data preparation using Alteryx and SQL for Tableau consumption, and published data sources to Tableau Server
- Created multiple custom SQL queries in Teradata SQL Workbench to prepare the right data sets for Tableau dashboards.
- These queries retrieved data from multiple tables using various join conditions, enabling efficiently optimized data extracts for Tableau workbooks
Environment: MS SQL Server 2014, Teradata, ETL, SSIS, SSRS, SSAS, Alteryx, Tableau (Desktop 9.x/Server 9.x), AWS Redshift, Spark (PySpark, MLlib, Spark SQL), Hadoop 2.x, MapReduce, HDFS, SharePoint, Hive
Data Analyst/ Business Analyst
- Performed data analysis using SQL, data quality validation and data governance
- Created data reports, validated data using data quality checks, and used Informatica to profile the data before loading it into a new database for creating data visualizations with Power BI
- Collaborated with data engineers and operation team to implement ETL process, wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.
- Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from Redshift
- Explored and analyzed the customer specific features by using Spark SQL.
- Performed univariate and multivariate analysis to identify underlying patterns in the data and associations between variables
- Wrote test plans and created test cases and scenarios for JIRA items based on business requirements
- Used Python 3.x (NumPy, SciPy, Pandas, scikit-learn, Seaborn) and Spark 2.0 (PySpark, MLlib) to develop a variety of models and algorithms for analytic purposes
- Performed data testing across all environments using SQL and Tableau
- Performed various SSIS transformations on the data in the staging database, such as Data Conversion, Conditional Split, Copy Column, Merge Join and Derived Column
- Generated data extracts in Tableau by connecting to views using the Tableau MySQL connector
- Established ETL design practices for optimization and performance of packages and parameterized SSIS packages with the use of environment variables.
- Conducted analysis to assess customer purchasing behavior and determine customer value using RFM (Recency, Frequency, Monetary) analysis
- Designed rich data visualizations to model data into human-readable form with Tableau and Matplotlib.
Environment: MS SQL Server 2014, Teradata, ETL, SSIS, SSRS, SSAS, Alteryx, Tableau (Desktop 9.x/Server 9.x), Python 3.x (Scikit-Learn/SciPy/NumPy/Pandas), AWS Redshift, Spark (PySpark, MLlib, Spark SQL), Hadoop 2.x, MapReduce, HDFS, SharePoint, Hive
- Developed the Python APIs that represent the memory subsystem
- Studied the memory controller functionality and developed a test plan covering the memory controller functionalities
- Developed and ran stress tests in C covering memory controller flows, DIMM manufacturers and NUMA architecture
- Launched harassers across several threads alongside the memory stress test using the Python wrapper
- Led the team in developing the harassers in Python, which involved developing several flows and state machines in the memory controller
- Led the team in analyzing failure signatures and debugging with Python
- Debugged and root-caused failures by capturing DDR3/DDR4 traces with logic analyzers
- Ran and debugged the Python harassers in a Linux environment
Environment: C, Python, Linux
- Prepared test cases and plans as per functional specification documentation of existing software applications.
- Coordinated with software programmers to identify and resolve problems in the FSD
- Managed relational database applications using UI design services and Python
- Designed and developed Hydro platform for testing and research of water resource management models.
- Provided technical assistance for the development and execution of server and client software applications
- Worked with technical teams to perform necessary upgrades to client and server software
- Worked on the Reports module of the project as a developer on MS SQL Server 2005 (using SSRS, T-SQL, scripts, stored procedures and views)
- Combined data from multiple source systems such as Excel, Oracle, SSMS and MySQL into a common staging database using SSIS data flow and control flow tasks
- Participated in examining database applications by querying tables in SQL Server Management Studio