
Data Engineer Resume


Plano, TX

SUMMARY

  • Software professional with 7+ years of experience in delivering comprehensive business reports with detailed analysis for executive review to support business decisions
  • Skilled in Database Design and Development, Data Flow Diagrams, Normalization, Report Automation, SQL Reporting, Requirements Analysis, Data Quality Assurance, Data Modeling, Data Warehousing, Data Analysis, Agile/Scrum, OLAP, OLTP, Multidimensional Databases, KPIs
  • Proficient in data transformation, processing, and extraction using Python, SQL, ETL tools, and macros
  • Good experience in software development with Python using IDEs such as PyCharm, Sublime Text, and Jupyter Notebook
  • Experienced in using Python libraries such as Pandas, NumPy, SQLAlchemy, PySpark, Boto3, and Matplotlib (a brief sketch follows this list)
  • Hands-on experience in developing Spark applications using Spark tools such as RDD transformations, Spark Core, Spark MLlib, Spark Streaming, and Spark SQL
  • Good understanding of the Amazon Web Services (AWS) cloud computing platform and experience migrating applications from existing systems to AWS
  • Worked on Amazon Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism
  • Migrated data from legacy systems (SQL Server, Teradata, Cassandra) to a cloud-based Snowflake Data Warehouse on AWS
  • Experience in writing Snowflake's SnowSQL
  • Experience running Apache Hadoop, CDH, and MapR distributions, and Elastic MapReduce (EMR) on EC2
  • Expert in using the ELK Stack: Elasticsearch for deep search and data analytics, Logstash for centralized logging, log enrichment, and parsing, and Kibana for powerful data visualizations
  • Good knowledge of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts
  • Configured alerting rules and set up PagerDuty alerting for Elasticsearch, Kafka, Logstash, and various microservices in Kibana
  • Highly experienced in creating different types of Tabular Reports, Crystal Reports, Matrix Reports, Drill-Down, Cross Tab Reports and distributed reports in multiple formats using SSRS
  • Strong knowledge of creating Extract, Transform, and Load (ETL) packages in SQL Server Integration Services for data migration between various databases and building data sources for Tableau Desktop
  • Experience in the analysis, modeling, design, and development of Tableau reports and dashboards for analytics; expertise in using Tableau Server, Tableau Desktop, and Tableau Public
  • Strictly followed the PEP 8 coding standard and tested programs across test cases, using Pylint to ensure code validity and effectiveness
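
A minimal sketch of the kind of Pandas/Boto3 workflow referenced above; the bucket, key, and column names are hypothetical placeholders rather than details from any specific project.

    # Minimal sketch: read a CSV from S3 with Boto3 and wrangle it with Pandas.
    # The bucket, key, and column names below are illustrative placeholders.
    import io

    import boto3
    import pandas as pd

    s3 = boto3.client("s3")
    obj = s3.get_object(Bucket="example-bucket", Key="reports/calls.csv")  # hypothetical location
    df = pd.read_csv(io.BytesIO(obj["Body"].read()))

    # Simple cleaning pass: drop duplicate rows and fill missing durations with 0
    df = df.drop_duplicates().fillna({"call_duration": 0})
    print(df.describe())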

TECHNICAL SKILLS

Programming Languages: C, C++, Visual Basic, SQL, Python, JavaScript, UNIX Shell, HTML, and CSS

Operating System: Windows 2008/2012/2013, XP/Vista/7, Linux

Database: MS Access, MySQL, MS SQL SERVER, Oracle, Cassandra

Business Intelligence Tools: SSRS, Tableau 10, SSIS, SSMS, Query Analyzer, BIDS

Cloud Services: Amazon Web Services (AWS), Amazon EC2, Amazon S3, Snowflake, Amazon ELK, Salesforce, Big Data Technologies

IDEs: Jupyter Notebook, PyCharm, Sublime Text

Documenting and Modeling Tools: UML 2.0, MS Project, MS Office, MS Visio, MS SharePoint

Area of Expertise: SDLC, Business Analysis, Database Testing

Tools: SQL Workbench, GitHub, Rally, JIRA, HP ALM, Confluence, Airflow, PagerDuty

PROFESSIONAL EXPERIENCE

Data Engineer

Confidential, Plano, TX

Responsibilities:

  • Migrate data (Amazon Connect call metrics) from legacy systems (Cassandra) to a cloud-based Snowflake Data Warehouse on AWS
  • Prioritize the development of new tables, schemas, and data structures
  • Proactively identify new opportunities to support the business with data, aggregate various data sets to inform business decisions
  • Migrate data from legacy systems (Salesforce) to a cloud-based Snowflake Data Warehouse on AWS and develop metrics to provide data insights per business requirements
  • Develop ETL pipelines into and out of the data warehouse using a combination of Python and Snowflake's SnowSQL
  • Write SQL queries to perform data quality checks and validation in Snowflake on AWS against legacy data
  • Collaborate with the Business Intelligence Analyst to support data modeling efforts
  • Develop the presentation layer and create Tableau scorecards and dashboards using stacked bars, bar graphs, scatter plots, geographic maps, and Gantt charts with the Show Me functionality
  • Design and optimize data connections, data extracts, schedules for background tasks, and incremental refreshes for the weekly and monthly dashboard reports on Tableau Server
  • Validate results in Tableau by checking the numbers against the data in the database tables through direct queries
  • Compile interactive dashboards in Tableau Desktop and publish them to Tableau Server, enabling storytelling with quick filters for on-demand information and data insights at the click of a button
  • Schedule Snowflake reports through Apache Airflow in Python, with data visualizations delivered to an AWS S3 bucket; automatically load S3 files into Snowflake databases through Airflow
  • Create and maintain source code in GitHub, tracking changes and sharing script updates and notes
  • Develop a layer of application modules over the Python Pandas library, delivering various DataFrame visualization tools and performing data wrangling and cleaning with Pandas
  • Develop DAGs and set up the production environment for Apache Airflow as the scheduling and automation system that manages ETL and reporting (a simplified sketch follows this list)
  • Utilize Snowflake SQL and IPython Jupyter notebooks to extract, transform, clean, and load data into target tables to enable effective reporting and business intelligence functions
  • Effectively use NumPy, Pandas, SQLAlchemy, and scikit-learn packages to support the migration in Python
  • Troubleshoot, problem-solve, and performance-tune queries accessing the Snowflake data warehouse
  • Support real-time data handling by using Amazon S3 buckets to store and access process results
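
The Airflow DAG work above might look roughly like the following minimal sketch; the credentials, stage, table, and schedule are hypothetical placeholders rather than the actual pipeline.

    # Minimal sketch of an Airflow DAG that loads Snowflake and runs a row-count check daily.
    # Credentials, stage, and table names are placeholders; real values would come from a secrets store.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator
    import snowflake.connector

    def load_call_metrics(**context):
        con = snowflake.connector.connect(user="...", password="...", account="...")  # hypothetical credentials
        try:
            cur = con.cursor()
            cur.execute("COPY INTO analytics.call_metrics FROM @s3_stage/call_metrics/")  # hypothetical stage/table
            cur.execute("SELECT COUNT(*) FROM analytics.call_metrics")  # simple post-load validation
            print("rows loaded:", cur.fetchone()[0])
        finally:
            con.close()

    with DAG(
        dag_id="snowflake_call_metrics",  # illustrative DAG name
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(task_id="load_call_metrics", python_callable=load_call_metrics)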

Data Engineer

Confidential, Plano, TX

Responsibilities:

  • Design robust, reusable, and scalable data-driven solutions and data pipeline frameworks to automate the ingestion, processing, and delivery of structured and unstructured batch and real-time streaming data using Python
  • Design and develop ETL data pipelines to migrate Ab Initio jobs to Python using PySpark
  • Develop AWS Lambda functions in Python with Kinesis and Elasticsearch that invoke Python scripts to perform various transformations and analytics on large data sets in Amazon EMR
  • Develop Spark applications using Spark tools such as RDD transformations and Spark SQL on Hadoop HDFS and Apache Hadoop YARN
  • Spin up EMR clusters in AWS to run PySpark applications and SSH to the master instance to run spark-submit in client and cluster deploy modes (see the sketch after this list)
  • Iterate rapidly and work collaboratively with product owners, developers, and other members of the development team.
  • Track defects and resolve bug fixes and hotfixes by merging feature branch code into the master branch in Git source control management
  • Use Amazon Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism
  • Work with different types of data sources, such as application data and process data, and formats such as CSV, JSON, and Parquet
  • Query multiple databases, such as PostgreSQL and Teradata, for data processing
  • Implement and modify SQL queries, functions, cursors, and triggers per client requirements for data analysis and extraction
  • Participate in the requirement gathering and analysis phase of the project, documenting business requirements by conducting workshops and meetings with various business users
  • Perform the back-end testing by executing SQL queries for validating output data in the database tables
  • Automate the configuration of monitoring tools to improve visibility into the availability and utilization of AWS resources
  • Maintain AWS Data Pipeline as a web service to process and move data between Amazon S3, Amazon EMR, and Amazon RDS at specified intervals
  • Use a Python class-based approach to wrap scripts and deploy them into the production environment
  • Troubleshoot and resolve data processing issues and proactively engage in data modeling discussions
  • Write Spark programs in Python with PySpark packages for performance tuning, optimization, and data quality validation
  • Perform Erato scans to identify vulnerabilities and injection risks in the application
  • Perform Pylint scans to maintain code quality
  • Implement the ELK (Elasticsearch, Logstash, and Kibana) Stack on AWS to aggregate logs from all systems and applications, analyze the logs, and create visualizations for application and infrastructure monitoring, faster troubleshooting, security analytics, and more
  • Configure alerting rules and set up PagerDuty alerting for Elasticsearch, Logstash, and different microservices in Kibana
  • Perform ongoing system monitoring and management activities, including participation in the PagerDuty rotation
  • Build tools and processes around CI/CD pipelines involving integrations with Jenkins, testing frameworks, GitHub, and more, with end-to-end understanding of and hands-on experience with the CI/CD pipeline
  • Build Development, Quality Assurance Testing, User Acceptance Testing, and Production environments to test-run the application in multiple stages and increase its robustness
  • Work closely with teams across the business unit to create comprehensive test tools and automation frameworks
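
A minimal sketch of the kind of PySpark job described above that would be packaged and run on EMR with spark-submit (client or cluster deploy mode); the S3 paths and column names are illustrative assumptions.

    # Minimal sketch of a PySpark job run on EMR, e.g.:
    #   spark-submit --deploy-mode cluster events_job.py
    # The S3 paths and column names are illustrative placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("events_csv_to_parquet").getOrCreate()

    # Read raw CSV data, apply simple Spark SQL transformations, and write Parquet
    df = spark.read.option("header", "true").csv("s3://example-bucket/raw/events/")  # hypothetical path
    cleaned = (
        df.filter(F.col("event_type").isNotNull())
          .withColumn("event_date", F.to_date("event_ts"))
    )
    cleaned.write.mode("overwrite").partitionBy("event_date").parquet(
        "s3://example-bucket/curated/events/"  # hypothetical path
    )

    spark.stop()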

Data Engineer

Confidential, Plano, TX

Responsibilities:

  • Migrated data from legacy systems (SQL Server, Teradata) to a cloud-based Snowflake Data Warehouse on AWS
  • Refactored advanced SQL queries from the Teradata and SQL Server database environments to run exclusively in the Snowflake database environment
  • Leveraged large volumes of data by creating ETL jobs in Python to extract data from various sources to answer business-critical requirements
  • Wrote SQL queries to perform data quality checks and validation in Snowflake against legacy data (see the reconciliation sketch after this list), and migrated SAS reports to Python reports in the Snowflake data warehouse
  • Scheduled Snowflake reports through Apache Airflow in Python, with data visualizations delivered to an AWS S3 bucket; automatically loaded S3 files into Snowflake databases through Airflow
  • Created and maintained source code in GitHub, tracking changes and sharing script updates and notes
  • Developed a layer of application modules over the Python Pandas library, delivering various DataFrame visualization tools and performing data wrangling and cleaning with Pandas
  • Developed DAGs and set up the production environment for Apache Airflow as the scheduling and automation system that managed ETL and reporting
  • Utilized Snowflake SQL and IPython Jupyter notebooks to extract, transform, clean, and load data into target tables to enable effective reporting and business intelligence functions
  • Effectively used NumPy, Pandas, SQLAlchemy, and scikit-learn packages to support the migration in Python
  • Troubleshot, problem-solved, and performance-tuned queries accessing the Snowflake data warehouse
  • Supported real-time data handling by making use of Amazon S3 buckets for storing and accessing process results
  • Maintained SSN and NPI data tokenization and retokenization
  • Simplified the analytical reporting process by flattening/denormalizing the central data layer, making it more user-friendly
  • Involved in the creation of various metrics used to evaluate the performance of Confidential's auto financing sales team
  • Compiled interactive dashboards in Tableau Desktop and published them to Tableau Server, enabling storytelling with quick filters for on-demand information and data insights at the click of a button
  • Developed the presentation layer and created Tableau scorecards and dashboards using stacked bars, bar graphs, scatter plots, geographic maps, and Gantt charts with the Show Me functionality
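
The data quality checks mentioned above might look roughly like this minimal row-count reconciliation sketch between a legacy SQL Server table and its migrated Snowflake counterpart; the DSN, credentials, and table name are hypothetical placeholders.

    # Minimal sketch: compare row counts between a legacy SQL Server table and Snowflake.
    # Connection details and the table name are illustrative placeholders only.
    import pyodbc
    import snowflake.connector

    def sql_server_count(table):
        con = pyodbc.connect("DSN=legacy_dsn")  # hypothetical DSN
        try:
            return con.cursor().execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        finally:
            con.close()

    def snowflake_count(table):
        con = snowflake.connector.connect(user="...", password="...", account="...")  # hypothetical credentials
        try:
            return con.cursor().execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        finally:
            con.close()

    table = "sales.auto_finance_metrics"  # illustrative table name
    legacy, migrated = sql_server_count(table), snowflake_count(table)
    print(f"{table}: legacy={legacy}, snowflake={migrated}, match={legacy == migrated}")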

Business Intelligence Engineer

Confidential, Phoenix, AZ

Responsibilities:

  • Involved in the Requirement Analysis, Design, and Development phases of the SDLC
  • Responsible for data cleaning, feature scaling, and feature engineering using NumPy and Pandas in Python
  • Developed ETL scripts in Python to extract data from one database table and insert or update the resulting data in another database table (a simplified sketch follows this list)
  • Wrote Python and batch scripts to automate the hourly runs of the ETL scripts
  • Involved in reviewing business requirements and analyzing data sources from Excel and SQL Server for the design, development, testing, and production rollout of reporting and analysis projects
  • Designed ETL packages dealing with different data sources (SQL Server, flat files, XML, etc.) and loaded the data into target data sources by performing different kinds of transformations using SQL Server Integration Services (SSIS)
  • Created views in Tableau Desktop that were published to the internal team for review and further data analysis and customization using filters and actions
  • Created complex Tableau dashboards with drill-down, drill-across, and actions for the business steering committee
  • Compiled interactive dashboards in Tableau Desktop and published them to Tableau Server, enabling storytelling with quick filters for on-demand information and data insights at the click of a button
  • Involved hands-on in every phase of the project, from requirements analysis, scoping, and design through development (SSIS, SSRS), debugging, testing, documentation, deployment, and UAT
  • Involved in writing complex SQL queries, stored procedures, triggers, views, cursors, joins, constraints, DDL, DML, and Multidimensional Expressions (MDX) for OLAP databases
  • Involved in deploying, configuring and managing reports using Report Manager and Report Builder.
  • Created different kinds of report templates, bar graphs, and pie charts based on fiscal-year data for analysis and created reports using SQL Server Reporting Services (SSRS)
  • Performed unit testing and code reviews to validate report accuracy and data integrity and to ensure that all reporting objects created were standards-based
  • Increased query performance needed for statistical reporting by more than 25% through performance monitoring, tuning, and index optimization using Performance Monitor, Profiler, and the Index Tuning Wizard
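
A minimal sketch of the kind of hourly Python ETL step described above, moving changed rows from one SQL Server table to another; the connection strings, tables, and columns are hypothetical placeholders.

    # Minimal sketch of an hourly ETL step: extract recently changed rows from a
    # source table, stage them, and merge into a target table.
    # Connection strings, table names, and columns are illustrative placeholders.
    import pandas as pd
    from sqlalchemy import create_engine, text

    src = create_engine("mssql+pyodbc://user:pass@source_dsn")  # hypothetical source
    dst = create_engine("mssql+pyodbc://user:pass@target_dsn")  # hypothetical target

    # Extract rows modified in the last hour
    df = pd.read_sql("SELECT * FROM dbo.orders WHERE modified_at >= DATEADD(hour, -1, GETDATE())", src)

    # Load into a staging table, then merge into the target
    df.to_sql("orders_stage", dst, schema="dbo", if_exists="replace", index=False)
    with dst.begin() as conn:
        conn.execute(text(
            "MERGE dbo.orders AS t USING dbo.orders_stage AS s ON t.order_id = s.order_id "
            "WHEN MATCHED THEN UPDATE SET t.amount = s.amount "
            "WHEN NOT MATCHED THEN INSERT (order_id, amount) VALUES (s.order_id, s.amount);"
        ))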

Business Intelligence Engineer

Confidential, Houston, TX

Responsibilities:

  • Collaborated with business analysts and the project manager to gather business requirements
  • Conducted performance tuning and MS SQL Server development - SQL Profiler, SQL scripts, stored procedures, triggers, user-defined functions, and transaction analysis, with a thorough understanding of indexes and statistics (Query Optimizer)
  • Designed, developed, tested, and maintained Tableau reports and dashboards based on user requirements
  • Created dashboards utilizing geographic maps, pie charts, bar charts, heat maps, tree maps, dual-axis charts, histograms, bullets, scatter plots, and line charts in Tableau Desktop
  • Described data utilizing trend lines, line charts, and statistical techniques in Tableau Desktop
  • Scheduled reports and set up email report delivery using Tableau Server
  • Created ETL packages with different data sources (SQL Server, flat files, Excel source files, XML files, etc.) and loaded the data into destination tables by performing different kinds of transformations using SSIS/DTS packages
  • Implemented incremental loading from source systems into the data warehouse and data marts using techniques including T-SQL stored procedures (MERGE statements and checksum functions), Slowly Changing Dimensions (SCD) types 1/2/3, and Change Data Capture (CDC)
  • Developed complex reports in both SSRS and Crystal Reports based on the requirements provided, going through multiple discussions with the team
  • Created various types of reports like table, matrix, charts using SSRS
  • Wrote Python and batch scripts to automate the hourly runs of the ETL scripts
  • Developed ETL scripts in Python to extract data from one database table and insert or update the resulting data in another database table
  • Wrote Python scripts to parse XML documents and load the data into the database (a simplified sketch follows this list)
  • Created Python scripts to automate parsing of PDF documents.
  • Developed SQL queries to pull complex financial data from different tables using joins, following strict business rules defined by users
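
A minimal sketch of the XML parsing and load step mentioned above; the file name, element names, and target table are hypothetical, and SQLite stands in for the actual target database.

    # Minimal sketch: parse an XML document and load the records into a database table.
    # The file name, element names, and table are illustrative placeholders;
    # sqlite3 is used here only as a stand-in for the real target database.
    import sqlite3
    import xml.etree.ElementTree as ET

    rows = []
    for txn in ET.parse("transactions.xml").getroot().findall("transaction"):  # hypothetical structure
        rows.append((txn.findtext("id"), txn.findtext("amount"), txn.findtext("posted_date")))

    con = sqlite3.connect("reporting.db")
    with con:
        con.execute("CREATE TABLE IF NOT EXISTS transactions (id TEXT, amount TEXT, posted_date TEXT)")
        con.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    con.close()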

Jr. Software Engineer

Confidential

Responsibilities:

  • Monitored performance and optimized SQL queries for maximum efficiency and worked on client requirements
  • Collaborated with the business to gather and define business requirements, business definitions, process flows, dashboard designs, and report designs to create BI solutions
  • Performed data analysis and data profiling against source systems and the data warehouse (see the sketch after this list)
  • Developed stored procedures to generate various drill-through reports, parameterized reports, sub-reports, and linked reports using SSRS and integrated them into the front end
  • Installed and maintained Tableau Server and administered users, user groups, projects, and scheduled instances for dashboards in Tableau
  • Performed Tableau Server admin duties such as installation, configuration, security, migration, upgrades, maintenance, and monitoring
  • Created dashboards in Tableau to deliver interactive, reliable reporting and visually compelling, accurate dashboards
  • Tested, cleaned, and standardized data to meet business standards
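
A minimal sketch of a simple data-profiling pass of the kind described above, reporting row counts, null rates, and distinct counts per column; the connection string and table are hypothetical placeholders.

    # Minimal sketch of a data-profiling pass over a source table: row count,
    # null rate, distinct count, and data type per column.
    # The connection string and table name are illustrative placeholders.
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine("mssql+pyodbc://user:pass@source_dsn")  # hypothetical connection
    df = pd.read_sql("SELECT * FROM dbo.customers", engine)        # illustrative table

    profile = pd.DataFrame({
        "null_pct": df.isna().mean().round(3),
        "distinct": df.nunique(),
        "dtype": df.dtypes.astype(str),
    })
    print(f"rows: {len(df)}")
    print(profile)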
