Sr. Data Engineer / Data Analyst Resume
Burr Ridge, IL
PROFESSIONAL SUMMARY:
- 8+ years of experience as Data Analyst/Big Data Engineer
- Strong experience in the Software Development Life Cycle (SDLC), including Requirements Analysis, Design Specification, and Testing, in both Waterfall and Agile methodologies.
- Proficient in Data Analysis, Cleansing, Transformation, Data Migration, Data Integration, Data Import, and Data Export through use of ETL tools such as Informatica.
- Analyzed data and provided insights with R Programming and Python Pandas
- Expertise in Business Intelligence, Data warehousing technologies, ETL and Big Data technologies.
- Experience in creating ETL mappings using Informatica to move data from multiple sources like flat files and Oracle into a common target area such as a Data Warehouse.
- Experience in writing PL/SQL statements - Stored Procedures, Functions, Triggers and Packages.
- Involved in creating database objects like tables, views, procedures, triggers, and functions using T-SQL to provide definition, structure and to maintain data efficiently.
- Skilled in Tableau Desktop versions 10.x for data visualization, Reporting and Analysis.
- Developed reports, dashboards using Tableau for quick reviews to be presented to Business and IT users.
- Extensive knowledge in various reporting objects like Facts, Attributes, Hierarchies, Transformations, filters, prompts, Calculated fields, Sets, Groups, Parameters etc., in Tableau.
- Hands-on experience with different ETL tools to get data into a shape where it can be connected to Tableau through Tableau Data Extracts.
- Expertise in writing complex SQL queries; made use of indexing, aggregation, and materialized views to optimize query performance.
- Experience in utilizing SAS procedures, macros, and other SAS applications for data extraction from Oracle and Teradata.
- Cloudera certified developer for Apache Hadoop. Good knowledge of Cassandra, Hive, Pig, HDFS, Sqoop and Map Reduce.
- Experience in importing and exporting data using Sqoop between HDFS and relational database systems, and loading it into partitioned Hive tables.
- Excellent implementation knowledge of Enterprise/Web/Client Server using Java, J2EE.
- Expertise in working with Linux/Unix and shell commands on the Terminal.
- Extensively used microservices and Postman to interact with Hadoop clusters.
- Expertise with Python, Scala, and Java in the design, development, administration, and support of large-scale distributed systems.
- Experience in collection of Log Data and JSON data into HDFS using Flume and processed the data using Hive/Pig.
- Migrated on-premises data (Oracle, SQL Server, DB2, MongoDB) to Azure Data Lake Store (ADLS) using Azure Data Factory (ADF V1/V2).
- Experience working with Hortonworks and Cloudera environments.
- Expertise working with AWS cloud services like EMR, S3, Redshift, and CloudWatch for big data development.
- Good working knowledge of the Amazon Web Services (AWS) Cloud Platform, including services like EC2, S3, VPC, ELB, IAM, DynamoDB, CloudFront, CloudWatch, Route 53, Elastic Beanstalk, Auto Scaling, Security Groups, EC2 Container Service (ECS), CodeCommit, CodePipeline, CodeBuild, CodeDeploy, Redshift, CloudFormation, CloudTrail, OpsWorks, Kinesis, SQS, SNS, and SES.
- Good knowledge in implementing various data processing techniques using Apache HBase for handling the data and formatting it as required.
- Experience in using build/deploy tools such as Jenkins, Docker and OpenShift for Continuous Integration & Deployment of Microservices.
- Experienced with Docker and Kubernetes on multiple cloud providers, from helping developers build and containerize their applications (CI/CD) to deploying on public or private clouds.
- Experienced in building automated regression scripts for validation of ETL processes between multiple databases like Oracle, SQL Server, Hive, and MongoDB using Python (a minimal illustrative sketch follows this summary).
- Proficiency in SQL across several dialects (MySQL, PostgreSQL, Redshift, SQL Server, and Oracle).
- Experience in designing star schemas and snowflake schemas for Data Warehouse and ODS architectures.
- Skilled in System Analysis, E-R/Dimensional Data Modeling, Database Design and implementing RDBMS specific features.
- Well experienced in Normalization and De-Normalization techniques for optimum performance in relational and dimensional database environments.
- Good knowledge of Data Marts, OLAP, Dimensional Data Modeling with Ralph Kimball Methodology (Star Schema Modeling, Snow-Flake Modeling for FACT and Dimensions Tables) using Analysis Services.
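Illustrative note: a minimal sketch of the kind of Python ETL-validation script referenced in the automated-regression bullet above. The drivers (cx_Oracle, pyodbc), connection strings, and table pairs are assumptions for illustration, not project specifics.

# Hypothetical row-count reconciliation between a source and a target table.
import cx_Oracle   # Oracle source driver (assumed)
import pyodbc      # SQL Server target driver (assumed)

def count_rows(cursor, table):
    # Assumes table names come from the trusted, hard-coded mapping below.
    cursor.execute(f"SELECT COUNT(*) FROM {table}")
    return cursor.fetchone()[0]

def reconcile(table_pairs):
    src = cx_Oracle.connect("etl_user/secret@ora-host:1521/ORCLPDB1")
    tgt = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                         "SERVER=sql-host;DATABASE=dw;UID=etl_user;PWD=secret")
    mismatches = []
    try:
        src_cur, tgt_cur = src.cursor(), tgt.cursor()
        for src_table, tgt_table in table_pairs:
            s = count_rows(src_cur, src_table)
            t = count_rows(tgt_cur, tgt_table)
            if s != t:
                mismatches.append((src_table, tgt_table, s, t))
    finally:
        src.close()
        tgt.close()
    return mismatches

if __name__ == "__main__":
    # Table pairs are placeholders for the real source-to-target mapping.
    for issue in reconcile([("SALES.ORDERS", "dbo.ORDERS")]):
        print("Count mismatch:", issue)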
TECHNICAL SKILLS:
Big Data Tools: Hadoop Ecosystem: MapReduce, Spark 2.3, Airflow 1.10.8, NiFi 2, HBase 1.2, Hive 2.3, Pig 0.17, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hadoop 3.0
Data Modeling Tools: Erwin Data Modeler, ER Studio v17
Programming Languages: SQL, PL/SQL, and UNIX shell scripting
Methodologies: RAD, JAD, System Development Life Cycle (SDLC), Agile
Cloud Platform: AWS, Azure, Google Cloud.
Cloud Management: Amazon Web Services (AWS) - EC2, EMR, S3, Redshift, Lambda, Athena
Databases: Oracle, MySQL, SQL Server, DB2, NoSQL (MongoDB, Cassandra, HBase)
OLAP Tools: Tableau, SSAS, Business Objects, and Crystal Reports 9
ETL/Data warehouse Tools: Informatica 9.6/9.1, and Tableau.
Operating System: Windows, Unix, Sun Solaris
PROFESSIONAL EXPERIENCE:
Confidential, Burr Ridge, IL
Sr. Data Engineer / Data Analyst
Responsibilities:
- Worked with a team of developers to design, develop, and implement BI solutions for multiple projects.
- Created complex dashboards using parameters, sets, groups, and calculations to drill down and drill up in worksheets, with customization using filters and actions.
- Built data pipelines with Data Fabric jobs, Sqoop, Spark, Scala, and Kafka; in parallel, worked on the data side with Oracle and MySQL Server to design source-to-target data flows.
- Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
- Used AWS Redshift to extract, transform, and load data from various heterogeneous data sources and destinations.
- Migrate data from on-premises to AWS storage buckets
- Developed a Python script to call REST APIs and extract data to AWS S3 (an illustrative sketch of this pattern follows this list).
- Wrote BigQuery queries for data wrangling with the help of Dataflow on Google Cloud Platform (GCP).
- Worked closely on the pub/sub model as part of the Lambda architecture implemented at TCF Bank.
- Designed and implemented Spark SQL tables and Hive script jobs, scheduled with Stonebranch, and created workflows and task flows.
- Used partitioning and bucketing of data in Hive to speed up queries as part of Hive optimization.
- Wrote Spark programs to move data from an input storage location to an output location, performing data loading, validation, and transformation along the way.
- Used Scala functions and data structures (arrays, lists, maps) for better code reusability.
- Performed unit testing based on the development work.
- Developed Tableau Data Visualizations using Cross Map, Scatter Plots, Geographic Map, Heat maps, worked on different chart types based on the requirements including Pie Charts, Bar Charts, Combination Charts, Page Trails, and Density Chart.
- Developed metrics, attributes, filters, reports, dashboards and also created advanced chart types, visualizations and complex calculations to manipulate the data.
- Successfully developed a front-end web-based dashboard with trending data, animated pipeline charts, and demographic charts with drill-down functions that displayed a breakdown of Regional/State sales, utilizing data aggregation from SQL statements.
- Imported the analyzed data into Tableau and showed regression, trend, and forecast in the dashboard for the datasets considered.
- Analyzed huge volumes of data. Experience with various ETL, data warehousing tools and concepts. Created data warehouse design.
- Involved in extensive data validation by writing several complexSQLqueries and Involved in back-end testing and worked with data quality issues.
- Created source to target data mappings, business rules, and business and data definitions.
- Developed mapping for fact loading from various dimension tables.
- Worked on Dimensional and Relational Data Modeling using Star and Snowflake Schemas, OLTP/OLAP systems, and Conceptual, Logical, and Physical data modeling using Erwin.
- Performed a PoC for a Big Data solution using Hadoop for data loading and data querying.
- Wrote Pig scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
- Used Sqoop to channel data between RDBMS sources and HDFS.
- Wrote UDFs in Scala and stored procedures to meet specific business requirements, and replaced the existing MapReduce programs and Hive queries with Spark applications written in Scala.
- Involved in Normalization and De-Normalization of existing tables for faster query retrieval.
- Developed Python programs and Excel functions using VBScript to move and transform data.
- Developed and maintained a data dictionary to create metadata reports for technical and business purposes.
- Migrated on-premises data (Oracle, SQL Server, DB2, MongoDB) to Azure Data Lake Store (ADLS) using Azure Data Factory (ADF V1/V2).
- Implemented ad-hoc analysis solutions using Azure Data Lake Analytics/Store and HDInsight.
- Extensively used MS Access to pull data from various databases and integrate it.
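Illustrative note: a hedged sketch of the REST-to-S3 extraction bullet above; the endpoint URL, bucket, and key layout are hypothetical assumptions.

# Illustrative only: endpoint, bucket, and key prefix are assumptions.
import json
import datetime

import boto3
import requests

API_URL = "https://api.example.com/v1/transactions"   # hypothetical endpoint
BUCKET = "analytics-raw-zone"                          # hypothetical bucket

def extract_to_s3():
    response = requests.get(API_URL, params={"limit": 1000}, timeout=30)
    response.raise_for_status()
    payload = response.json()

    # Partition the landing key by load date so downstream Hive/Spark jobs
    # can treat it as a dated partition.
    load_date = datetime.date.today().isoformat()
    key = f"rest_extracts/load_date={load_date}/transactions.json"

    s3 = boto3.client("s3")
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(payload).encode("utf-8"))
    return key

if __name__ == "__main__":
    print("Wrote", extract_to_s3())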
Environment: Erwin 9.8, Big Data 3.0, Hadoop 3.0, AWS, Oracle 12c, PL/SQL, MongoDB, DB2, Scala, Spark SQL, PySpark, Python, Kafka 1.1, SAS, Azure SQL, MDM, Oozie 4.3, SSIS, T-SQL, ETL, HDFS, Cosmos, Pig 0.17, Sqoop 1.4, MS Access.
Confidential, Urbandale, IA
Sr. Data Engineer
Responsibilities:
- Implemented Apache Airflow for authoring, scheduling and monitoring Data Pipelines
- Designed several DAGs (Directed Acyclic Graphs) for automating ETL pipelines (an illustrative DAG sketch follows this list)
- Performed data extraction, transformation, loading, and integration in data warehouse, operational data stores and master data management
- Strong understanding of AWS components such as EC2 and S3
- Performed Data Migration to GCP
- Responsible for data services and data movement infrastructures
- Experienced in ETL concepts, building ETL solutions and Data modeling
- Worked on architecting the ETL transformation layers and writing spark jobs to do the processing.
- Aggregated daily sales team updates to send report to executives and to organize jobs running on Spark clusters
- Loaded application analytics data into data warehouse in regular intervals of time
- Designed and built infrastructure for the Google Cloud environment from scratch
- Leveraged cloud and GPU computing technologies for automated machine learning and analytics pipelines, such as AWS, GCP
- Worked on confluence and Jira
- Designed and implemented configurable data delivery pipeline for scheduled updates to customer facing data stores built with Python
- Proficient in Machine Learning techniques (Decision Trees, Linear/Logistic Regression) and Statistical Modeling
- Compiled data from various sources to perform complex analysis for actionable results
- Measured Efficiency of Hadoop/Hive environment ensuring SLA is met
- Optimized the TensorFlow model for efficiency
- Analyzed the system for new enhancements/functionalities and perform Impact analysis of the application for implementing ETL changes
- Implemented a Continuous Delivery pipeline with Docker, GitHub, and AWS
- Built performant, scalable ETL processes to load, cleanse and validate data
- Participated in the full software development lifecycle with requirements, solution design, development, QA implementation, and product support using Scrum and other Agile methodologies
- Collaborate with team members and stakeholders in design and development of data environment
- Started working with AWS for storage and handling of terabytes of data for customer BI reporting tools
- Wrote Flume configuration files for importing streaming log data into HBase with Flume.
- Imported several transactional logs from web servers with Flume to ingest the data into HDFS.
- Used Flume with a spooling directory source to load data from the local file system (LFS) into HDFS.
- Installed and configured Pig and wrote Pig Latin scripts to convert data from text files to Avro format.
- Created Partitioned Hive tables and worked on them using Hive QL.
- Loaded data into HBase using both bulk and non-bulk loads.
- Worked with the continuous integration tool Jenkins and automated JAR builds at the end of each day.
- Worked with Tableau and Integrated Hive, Tableau Desktop reports and published to Tableau Server.
- Developed MapReduce programs in Java for parsing raw data and populating staging tables.
- Experience in setting up the whole application stack, and in setting up and debugging Logstash to send Apache logs to AWS Elasticsearch.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Analyzed the SQL scripts and designed the solution to implement using Scala.
- Implemented Spark Scripts using Scala, Spark SQL to access hive tables into Spark for faster processing of data.
- Extract Transform and Load data from Sources Systems to Azure Data Storage services using a combination of Azure Data Factory, T-SQL, Spark SQL and U-SQL Azure Data Lake Analytics. Data Ingestion to one or more Azure Services - (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processing the data in Azure Data bricks.
- Tested Apache Tez for building high performance batch and interactive data processing applications on Pig and Hive jobs.
- Monitored cluster health by Setting up alerts using Nagios and Ganglia
- Adding new users and groups of users as per the requests from the client
- Working on tickets opened by users regarding various incidents, requests
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Preparing associated documentation for specifications, requirements, and testing
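Illustrative note: a minimal Airflow DAG sketch (Airflow 1.10-style imports) of the ETL scheduling pattern described above; the DAG id, schedule, task names, and callables are placeholders.

# Minimal illustrative DAG; names, schedule, and callables are assumptions.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator

def load_to_warehouse(**context):
    # Placeholder for the actual load step (e.g., a COPY into the warehouse).
    print("loading for", context["ds"])

default_args = {
    "owner": "data-engineering",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="daily_sales_etl",                 # hypothetical pipeline name
    default_args=default_args,
    start_date=datetime(2020, 1, 1),
    schedule_interval="0 6 * * *",            # daily at 06:00
    catchup=False,
) as dag:

    extract = BashOperator(
        task_id="extract_from_source",
        bash_command="echo 'extract step placeholder'",
    )

    load = PythonOperator(
        task_id="load_to_warehouse",
        python_callable=load_to_warehouse,
        provide_context=True,                 # Airflow 1.10 style
    )

    extract >> load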
Environment: HBase, MapReduce, AWS, GCP, Azure, Databricks, Data Factory, G-Cloud Function, Apache Tez, Cloud Dataflow, Jira, MySQL, Postgres, SQL Server, Python, Scala, Spark, Hive, Spark SQL
Confidential, Columbus, OH
Big Data Engineer
Responsibilities:
- Gathering data and business requirements from end users and management. Designed and built data solutions to migrate existing source data in Data Warehouse to Atlas Data Lake (Big Data)
- Performed all the Technical Data quality (TDQ) validations which include Header/Footer validation, Record count, Data Lineage, Data Profiling, Check sum, Empty file, Duplicates, Delimiter, Threshold, DC validations for all Data sources.
- Analyzed huge volumes of data and devised simple and complex Hive and SQL scripts to validate data flow in various applications. Performed Cognos report validation and made use of MHUB for validating Data Profiling & Data Lineage.
- Created Tableau dashboards using stacked bars, bar graphs, scatter plots, geographical maps, Gantt charts, etc. with the Show Me functionality, and built dashboards and stories as needed using Tableau Desktop and Tableau Server.
- Responsible for daily communications to management and internal organizations regarding status of all assigned projects and tasks.
- Executed quantitative analysis on chemical products to recommend effective combinations
- Performed statistical analysis using SQL, Python, R Programming and Excel.
- Worked extensively with Excel VBA Macros, Microsoft Access Forms
- Imported, cleaned, filtered, and analyzed data using tools such as SQL, Hive, and Pig.
- Used Python & SAS to extract, transform & load source data from transaction systems, and generated reports, insights, and key conclusions.
- Manipulated and summarized data to maximize possible outcomes efficiently
- Developed storytelling dashboards in Tableau Desktop and published them to Tableau Server, allowing end users to understand the data on the fly with quick filters for on-demand information.
- Analyzed and recommended improvements for better data consistency and efficiency
- Designed and developed data mapping procedures and the ETL process (data extraction, data analysis, and loading) for integrating data using R programming.
- Effectively communicated plans, project status, project risks, and project metrics to the project team, and planned test strategies in accordance with project scope.
- Analyzed issues and performed impact analysis for them.
- Ingested data from an Oracle database using Sqoop and Flume.
- Worked on implementing various stages of data flow in the Hadoop ecosystem: ingestion, processing, and consumption.
- Started working with AWS for storage and handling of terabytes of data for customer BI reporting tools.
- Decommissioned and added nodes in the clusters for maintenance.
- Adding new users and groups of users as per the requests from the client
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Automated all the jobs for pulling data from FTP server to load data into Hive tables using Oozie workflows.
- Working on tickets opened by users regarding various incidents, requests
- Experience in fact/dimension modeling (star schema, snowflake schema), transactional modeling, and SCDs (slowly changing dimensions)
- Monitored cluster health by Setting up alerts using Nagios and Ganglia
- Devised PL/SQL Stored Procedures, Functions, Triggers, Views and packages. Made use of Indexing, Aggregation and Materialized views to optimize query performance.
- Implemented Apache Airflow for authoring, scheduling and monitoring Data Pipelines
- Proficient in Machine Learning techniques (Decision Trees, Linear/Logistic Regression) and Statistical Modeling; skilled in data visualization with libraries such as Matplotlib and Seaborn
- Hands on experience with big data tools like Hadoop, Spark, Hive
- Experience implementing machine learning back-end pipelines with Pandas and NumPy (a brief illustrative data-preparation sketch follows this list)
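Illustrative note: a brief Pandas/NumPy sketch of the import-clean-analyze pattern described above; the file path and column names are assumptions, not actual project data.

# Illustrative cleaning and summary step; file path and columns are assumptions.
import numpy as np
import pandas as pd

def summarize_sales(path="sales_extract.csv"):
    df = pd.read_csv(path, parse_dates=["order_date"])

    # Basic cleansing: drop duplicate orders and rows missing key fields.
    df = df.drop_duplicates(subset="order_id")
    df = df.dropna(subset=["region", "amount"])

    # Filter obvious outliers before reporting (simple z-score rule).
    z = np.abs((df["amount"] - df["amount"].mean()) / df["amount"].std())
    df = df[z < 3]

    # Aggregate for a regional dashboard feed.
    return (df.groupby("region")["amount"]
              .agg(["count", "sum", "mean"])
              .sort_values("sum", ascending=False))

if __name__ == "__main__":
    print(summarize_sales())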
Environment: Hive, AWS, Hadoop, HDFS, Python, PL/SQL, SQL, Oracle, R Programming, Cognos, Tableau, NumPy, Pandas, Jira, Pig, Spark, Linux.
Confidential
Data Analyst
Responsibilities:
- Assisting in designing the overall ETL solutions including analyzing data, preparation of high level and detailed design documents, test and data validation plans and deployment strategy.
- Prepared the technical mapping specifications, process flow and error handling document.
- Developed both simple and complex mappings implementing complex business logic using variety of transformation logic like Unconnected and connected Lookups, Router, Filter, Expression, Aggregator, Joiner, Update Strategy, Unconnected and Connected Stored procedures, normalizer and more.
- Created various tasks like Pre/Post-Session Commands, Timer, Event Wait, Event Raise, Email and Command tasks.
- Experienced in writing real-time processing using Spark Streaming with Kafka (a minimal streaming sketch follows this list)
- Used Hive QL to analyse the partitioned and bucketed data and compute various metrics for reporting
- Experienced in querying data using Spark SQL on top of Spark engine
- Involved in managing and monitoring the Hadoop cluster using Cloudera Manager.
- Used Python and Shell scripting to build pipelines.
- Developed data pipelines using Sqoop, HQL, Spark, and Kafka to ingest enterprise message delivery data into HDFS.
- Developed workflows in Oozie and Airflow to automate the tasks of loading data into HDFS and pre-processing it with Pig and Hive.
- Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Extensively worked in database components like SQL, PL/SQL, Stored Procedures, Stored Functions, Packages and Triggers.
- Performed code reviews and troubleshooting of existing Informatica mappings and deployed code from development to test to production environments.
- Supporting other ETL developers, providing mentoring, technical assistance, troubleshooting and alternative development solutions
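Illustrative note: a minimal PySpark Structured Streaming sketch of the Kafka ingestion described above; the broker addresses, topic, and HDFS paths are hypothetical, and the Spark build is assumed to include the Kafka connector package.

# Illustrative Kafka-to-HDFS streaming job; brokers, topic, and paths are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (SparkSession.builder
         .appName("message-delivery-ingest")
         .getOrCreate())

# Read the (hypothetical) message-delivery topic as a stream.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
       .option("subscribe", "message-delivery")
       .option("startingOffsets", "latest")
       .load())

# Kafka delivers key/value as binary; cast the payload to string for downstream parsing.
messages = raw.select(col("value").cast("string").alias("payload"),
                      col("timestamp"))

# Land the stream in HDFS as Parquet with a checkpoint location.
query = (messages.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/raw/message_delivery")
         .option("checkpointLocation", "hdfs:///checkpoints/message_delivery")
         .outputMode("append")
         .start())

query.awaitTermination()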
Environment: HDFS, Hive, Java, Sqoop, Spark, Cloudera Manager, Splunk, Oracle, Elasticsearch, Jira, Confluence, Shell/Perl Scripting, PL/SQL, SQL Server
Confidential
Data Engineer
Responsibilities:
- Participated in testing of procedures and data utilizing PL/SQL to ensure the integrity and quality of data in the data warehouse.
- Gathered data from the help desk ticketing system and wrote ad-hoc reports, charts, and graphs for analysis.
- Worked to ensure high levels of Data consistency between diverse source systems including flat files, XML and SQL Database.
- Developed and ran ad-hoc data queries from multiple database types to identify systems of record, data inconsistencies, and data quality issues.
- Developed complex SQL statements to extract data and packaged/encrypted data for delivery to customers (an illustrative extract script follows this list).
- Provided business intelligence analysis to decision-makers using an interactive OLAP tool
- Created T-SQL statements (select, insert, update, delete) and stored procedures.
- Defined Data requirements and elements used in XML transactions.
- Created Informatica mappings using various Transformations like Joiner, Aggregate, Expression, Filter and Update Strategy.
- Performed Tableau administration using Tableau admin commands.
- Involved in defining the source to target Data mappings, business rules and Data definitions.
- Ensured the compliance of the extracts to the Data Quality Centre initiatives
- Performed metrics reporting, data mining, and trend analysis in the help desk environment using Access.
- Worked on SQL Server Integration Services (SSIS) to integrate and analyse data from multiple heterogeneous information sources.
- Built reports and report models using SSRS to enable end user report builder usage.
- Created Excel charts and pivot tables for the Ad-hoc Data pull.
- Created columnstore indexes on dimension and fact tables in the OLTP database to enhance read operations.
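Illustrative note: a hedged sketch of the ad-hoc SQL Server extract pattern described above; the connection string, query, and output file are assumptions (pandas with an Excel writer engine such as openpyxl is assumed for the workbook output).

# Illustrative ad-hoc extract; connection string, query, and filename are assumptions.
import pandas as pd
import pyodbc

CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
            "SERVER=sql-host;DATABASE=helpdesk;Trusted_Connection=yes;")

QUERY = """
SELECT ticket_id, category, opened_date, closed_date, assigned_group
FROM dbo.Tickets
WHERE opened_date >= DATEADD(month, -1, GETDATE());
"""

def export_monthly_tickets(out_path="tickets_last_month.xlsx"):
    with pyodbc.connect(CONN_STR) as conn:
        df = pd.read_sql(QUERY, conn)
    # Hand the extract to analysts as an Excel workbook for pivot-table use.
    df.to_excel(out_path, index=False)
    return len(df)

if __name__ == "__main__":
    print("Exported", export_monthly_tickets(), "rows")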
Environment: SQL, PL/SQL, T-SQL, XML, Informatica, Tableau, OLAP, SSIS, SSRS, Excel, OLTP.
