Data Engineer/Data Analyst Resume Philadelphia, PA - Hire IT People

SUMMARY

8+ years of experience as a data engineer, data analyst and business analyst, covering predictive model building, data visualization and statistical analysis
Experience in building intuitive products and experiences, while working alongside an excellent, cross-functional team across Engineering, Product and Design
Hands-on experience in writing queries in SQL and Python to extract, transform and load (ETL) data from large datasets using data staging
Strong experience with Python and its libraries Pandas, NumPy, scikit-learn, Seaborn and Matplotlib, and with R, for algorithm development, data manipulation, analysis and visualization
Worked on data extraction, transformation and loading (ETL) using tools such as SQL Server Integration Services (SSIS), Reporting Services (SSRS), Analysis Services (SSAS) and Pipeline Pilot
Expert in transforming business requirements into analytical models and designing algorithms
Proficient in developing data mining and reporting solutions that scale across a massive volume of structured and unstructured data to improve business performance in every aspect
Experience in designing and developing SSRS reports using data from ETL Loads, SSAS
Expert in Data Visualization, Data Analysis, Business Intelligence, IT Analysis/Design, and Client/Server System Architecture
Knowledge in developing and designing analytical reports and dashboards using guided analysis, interactive dashboard design and visual best practices
Worked on Long Short-Term Memory (LSTM) networks using Keras for automatic speech recognition and anomaly detection
Used sentiment analysis to determine the emotional tone behind a series of words and gauge the attitudes expressed, in order to analyse a product's market, customer service and fraudulent activities
Strong mathematical knowledge of linear algebra, probability, statistics, stochastic theory, information theory and logarithms
Data analysis skills including data mapping, data cleansing, logical data modelling, writing data extraction scripts and querying to report the progress of job migrations.
Proficient in writing complex SQL, including stored procedures, triggers, joins and subqueries, to access and manipulate database systems such as MySQL, PostgreSQL and NoSQL stores
Experience in performing analytics on structured data in Hive with Hive queries, views, partitioning, bucketing and UDFs using HiveQL
Knowledge in extracting data from multiple sources such as Excel, Access, flat files and SQL Server to design and develop dashboards using a TIBCO application.
Experience in using Alteryx data tool for data cleaning, data blending and to create more efficient data sets for reporting purposes.
Experience in using Tableau for data visualization and designing dashboards for publishing and presenting storyline on web and desktop platforms
Proficient in the entire project life cycle and actively involved in all the phases including data acquisition, cleaning, engineering, feature scaling, feature engineering, statistical modelling and visualization
Experienced working with AWS & GCP - using tools such as EMR, S3, EC2, SageMaker

TECHNICAL SKILLS

Analysis Tools: Tableau, TIBCO Spotfire, Power BI, MS SSRS, MS SSAS, MS Excel (VBA, VLOOKUP, Pivot tables)

Languages: Python, R, SQL, Java, C, C++

Libraries: Pandas, NumPy, SciPy, Scikit-learn, Matplotlib, Seaborn, NLTK, TensorFlow, Keras

Database/Analytics: MySQL, PostgreSQL, NoSQL, DynamoDB, Aurora, MongoDB, Cassandra, Hive, Teradata, Vertica, Hadoop Streaming, MapReduce, SPSS, SAS, Weka, SSIS

Technical: Amazon Web Services (AWS), Spark, Git, Jenkins, Agile

Mathematical skills: Statistics, Linear Algebra, Probability

Machine Learning Algorithms: Linear Regression, Logistic Regression, Linear Discriminant Analysis (LDA), Decision Trees, Random Forests, AdaBoost, Gradient Boosting, Naïve Bayes, K-Nearest Neighbors, Hierarchical clustering, K-means clustering, Density-based clustering (DBSCAN)

Testing: MS Office, Eclipse, MATLAB, Visual Studio, advanced MS Excel, MS Access, MS Visio, Toad for Oracle

PROFESSIONAL EXPERIENCE

Data Engineer/Data Analyst

Confidential, Philadelphia, PA

Responsibilities:

Responsible for understanding and analysing business requirements to develop and debug applications using BI tools
Interacted with users and business analysts to assess requirements and perform impact analysis
Responsible for managing data structures and pipelines using SQL/Snowflake.
Developed Python scripts to find vulnerabilities in SQL queries through SQL injection testing, permission checks and performance analysis
Worked on SQL queries and PL/SQL procedures and converted them to ETL tasks
Retrieved, manipulated, analyzed and aggregated billions of records of claims data from databases and a Hadoop cluster, and performed ETL on them using PL/SQL
Created Tableau dashboards for quarterly presentation showing profit generated and customer sentiment indicators such as call volumes and case volumes
Responsible for designing, developing and testing Hadoop ETL using Hive on data at different stages of the pipeline
Knowledge of Tableau's analytical features, such as statistical functions and calculations, trend lines and forecasting, in Tableau reports.
Used Matplotlib and Seaborn in Python to visualize the data and performed feature engineering such as detecting outliers, handling missing values and interpreting variables.
Experience with the Python libraries Pandas, NumPy, Seaborn, Matplotlib, NLTK, spaCy, scikit-learn, Keras and TensorFlow in developing end-to-end analytical models.
Created SSIS packages to load data from Oracle to SQL Server using various transformations in SSIS
Responsible for manipulating datasets and building models for document classification and OCR using Python and SQL
Wrote SQL queries to identify and validate data inconsistencies in the data warehouse against the source system
Created reports from OLAP, sub reports, bar charts and matrix reports using SSRS
Worked on different data formats such as JSON, XML, CSV
Used the AWS environment to extract, transform and load data from various data sources to Tableau for reporting
Worked with datasets of varying size and complexity, including both structured and unstructured data.
Processed very large datasets, handling missing values, creating dummy variables and dealing with noise in the data.
Collaborated with different functional teams to implement models and monitor outcomes
Developed processes and tools to monitor and analyse model performance and data accuracy
Performed data wrangling to clean, transform and reshape the data using the Pandas library. Analysed data using SQL and Python and presented analytical reports to management and technical teams
Used Git for version control. Tracked changes in files and coordinated work on the files among multiple team members
Participated in regular grooming sessions to discuss the stories and ensure report development tasks were up to date, using agile tools like Rally and Jira

Environment: NumPy, Pandas, Matplotlib, Seaborn, Scikit-Learn, Tableau, SQL, Snowflake, AWS, Amazon SageMaker, Amazon Textract, Amazon Athena, Amazon Comprehend, Microsoft Excel, SSIS, SSRS, SSAS, SAS Analytics, Python, Hive, Spark, Rally

Data Analyst

Confidential, Raleigh, NC

Responsibilities:

Performed end-to-end analytics from data extraction to insights development for quarterly data using Tableau
Responsible for developing the data warehouse and ETL system using relational and non-relational tools such as SQL and NoSQL
Worked on a combination of structured and unstructured data from multiple sources and automated the cleaning process using Python scripts.
Responsible for working on analytical problems such as data gathering, exploratory analysis, data cleaning and feature engineering on large-scale, complex datasets using Python.
Used Pandas, NumPy, SciPy, Matplotlib, Seaborn, Scikit-learn and NLTK in Python at various stages for developing Python scripts
Performed in-depth analysis on data and prepared ad-hoc reports using MS Excel and SQL scripts.
Involved in creating SSIS packages with various transformations like Slowly Changing Dimension, Lookup, Aggregate, Derived Column, Conditional Split, Fuzzy Lookup, Multicast and Data Conversion
Used SSIS to create ETL packages (.dtsx files) to validate, extract, transform and load data to data warehouse databases
Optimized Tableau dashboards for better performance and monitoring when dealing with large volumes of data in Tableau
Created drill-down reports, parameterized reports, linked reports and sub-reports, in addition to generating ad-hoc reports using SSRS
Experienced in scripting HiveQL queries to analyze millions of records in file formats such as RCFile, SequenceFile and text files, using partitioning and bucketing on customer data
Collected historical data and third-party data from different data sources. Also worked on outlier identification with box plots using Pandas and NumPy
Implemented a recommendation model to optimize sales and marketing efforts that increased revenue by ~3%.
Read Amazon SQS queues containing paths to files in Amazon S3 buckets. Also used the AWS CLI to aggregate clean files in Amazon S3
Performed database audits and created reports based on audit logs in Tableau
Provided clear direction and motivation to project team members, championed communication across all levels of the organization and provided daily reports to management on project status.
Developed Git hooks for local-repository commits and remote-repository pushes, and worked with GitHub.

Environment: NumPy, Pandas, Matplotlib, Seaborn, Scikit-Learn, Tableau, SQL, Snowflake, AWS, Microsoft Excel, SSIS, SSRS, SSAS, SAS Analytics, Python, Hive, Spark.

Data Analyst

Confidential, Dearborn, MI

Responsibilities:

Worked on data analysis using SQL to identify data discrepancies in the existing systems and based on the analysis worked with the business stakeholders to figure out the requirements for those attributes to support data clean-up
Wrote complex Spark SQL queries for data analysis to meet business requirements
Developed MapReduce/Spark Python modules for predictive analytics in Hadoop on AWS
Worked on the Celonis module, cleaned data and ensured data quality, consistency and integrity using Pandas and NumPy
Participated in feature engineering such as feature intersection generation, feature normalization and label encoding with scikit-learn preprocessing
Prepared ETL mappings to extract data from multiple source systems into the target system, combining the data and performing data flow and data transformation using SSIS
Generated and deployed ETL SSIS packages to process data to target databases.
Created reports in Tableau to illustrate the correctness of data flow from multiple sources to the system database, used to determine the root cause of mismatches and the percentage of inaccuracy for each source and entity
Retrieved large datasets from database platforms such as Teradata, Hive and Snowflake to create reports in Business Objects summarizing important metrics
Designed, developed and deployed reports in MS SQL Server using Microsoft SQL Server Reporting Services (SSRS)
Used the big data tools in Spark (PySpark, Spark SQL, MLlib) to conduct real-time analysis of loan default on AWS
Conducted data blending and data preparation using Alteryx and SQL for Tableau consumption, and published data sources to Tableau Server
Created multiple custom SQL queries in Teradata SQL Workbench to prepare the right data sets for Tableau dashboards.
The queries retrieved data from multiple tables using various join conditions, enabling efficient, optimized data extracts for Tableau workbooks

Environment: MS SQL Server 2014, Teradata, ETL, SSIS, SSRS, SSAS, Alteryx, Tableau (Desktop 9.x/Server 9.x), AWS Redshift, Spark (PySpark, MLlib, Spark SQL), Hadoop 2.x, MapReduce, HDFS, SharePoint, Hive

Data Analyst/Business Analyst

Confidential, Maryland

Responsibilities:

Performed data analysis using SQL, data quality validation and data governance
Responsible for creating data reports, validating data with data quality checks, and using Informatica to profile the data before loading it into a new database for creating data visualizations with Power BI
Collaborated with data engineers and the operations team to implement the ETL process; wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.
Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from Redshift.
Explored and analyzed customer-specific features using Spark SQL.
Performed univariate and multivariate analysis on the data to identify any underlying pattern in the data and associations between the variables.
Wrote test plans and created test cases and scenarios for Jira items based on business requirements
Used Python 3.x (NumPy, SciPy, pandas, scikit-learn, Seaborn) and Spark 2.0 (PySpark, MLlib) to develop a variety of models and algorithms for analytic purposes.
Performed data testing over all the environments using SQL and Tableau
Performed various SSIS transformations on the data in the staging database, such as data conversion, conditional split, copy column, merge join and derived column.
Generated data extracts in Tableau by connecting to the view using the Tableau MySQL connector.
Established ETL design practices for optimization and performance of packages and parameterized SSIS packages with the use of environment variables.
Assessed customer purchasing behavior and customer value using RFM (recency, frequency, monetary) analysis
Designed rich data visualizations to model data into human-readable form with Tableau and Matplotlib.

Environment: MS SQL Server 2014, Teradata, ETL, SSIS, SSRS, SSAS, Alteryx, Tableau (Desktop 9.x/Server 9.x), Python 3.x (Scikit-Learn/SciPy/NumPy/Pandas), AWS Redshift, Spark (PySpark, MLlib, Spark SQL), Hadoop 2.x, MapReduce, HDFS, SharePoint, Hive

Python Developer

Confidential

Responsibilities:

Developed Python APIs that represent the memory subsystem.
Studied the memory controller functionality and developed a test plan covering it
Developed and ran stress tests in C to cover the memory controller flows, DIMM manufacturers and NUMA architecture
Launched harassers across several threads alongside the memory stress test using a Python wrapper.
Led the team in developing the harassers in Python, which involved developing several flows and state machines in the memory controller.
Led the team on failure signatures and debugging using Python.
Debugged and root-caused failures by capturing DDR3/DDR4 traces using logic analyzers
Ran and debugged the Python harassers in a Linux environment

Environment: C, Python, Linux

Software Engineer

Confidential

Responsibilities:

Prepared test cases and plans as per functional specification documentation of existing software applications.
Coordinated with software programmers for identification and resolution of problems in FSD.
Managed relational database applications with UI design services and Python.
Designed and developed the Hydro platform for testing and research of water resource management models.
Provided technical assistance for development and execution of server and client software applications.
Worked with technical teams to perform necessary upgrades to client and server software.
Worked on the Reports module of the project as a developer on MS SQL Server 2005 (using SSRS, T-SQL, scripts, stored procedures and views)
Combined data from multiple source systems like Excel, Oracle, SSMS and MySQL into a common staging database using SSIS data flow and control flow tasks
Participated in examination of database applications in SQL Server Management Studio through query tables.
