Data Engineer / Python Developer Resume
Green Bay, WI
TECHNICAL SKILLS
Technologies: AWS, EMR, Spark, Apache Airflow, Docker
Languages: Python, R
Web Technologies: HTML, CSS, XML, JavaScript
Databases: Postgres, Redshift, Athena, DynamoDB, Oracle, MS Access, SQL Server
Platforms: Windows and Linux
Miscellaneous Tools: GitLab, Tableau, Power BI, MS Visio, OneNote, Jira, VersionOne
Domains: Agriculture, Financial, Insurance, Healthcare, Telecom
PROFESSIONAL EXPERIENCE
Confidential, Green Bay, WI
Data Engineer / Python Developer
Responsibilities:
- Gathered requirements for ongoing projects by working closely with Data Science and business teams.
- Used the Insomnia REST client to make HTTP requests (JSON input) and debug APIs.
- Wrote shell scripts to run the Flask application on Linux servers and OpenShift containers.
- Modeled all data sources ingested into a web application in MS Visio.
- Applied machine learning algorithms and neural networks to generate offers for customers.
- Implemented exception handling in Python to add logging to the application (see the logging sketch after this list).
- Ran Git filter-branch on a repository to clean up and organize its history.
- Used Python, Spark, and EMR clusters for data integration.
- Used Jira for project management and GitHub as a code repository.
- Used Power BI and Tableau for data visualization and analytics.
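A minimal sketch of the exception-handling-and-logging pattern referenced above; the logger name, payload fields, and function are illustrative assumptions rather than the production code:

```python
import logging

# Illustrative logger setup; the real application's configuration may differ.
logger = logging.getLogger("offer_app")
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)


def parse_customer_payload(payload: dict) -> dict:
    """Validate an incoming JSON payload, logging failures before re-raising."""
    try:
        return {
            "customer_id": int(payload["customer_id"]),
            "score": float(payload.get("score", 0.0)),
        }
    except (KeyError, TypeError, ValueError):
        # logger.exception records the full traceback in the application logs.
        logger.exception("Could not parse customer payload: %r", payload)
        raise


if __name__ == "__main__":
    parse_customer_payload({"customer_id": "123", "score": "0.9"})
```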
Technical Environment: Oracle, GitHub, Python, R, Linux, AWS, OpenShift
Confidential, Durham, NC
Data Engineer
Responsibilities:
- Gathered requirements for ongoing projects by working closely with subject matter experts and product selection leads across EAME and North America.
- Built Sqoop jobs to migrate data from sources such as Oracle and SQL Server to an Amazon S3 bucket.
- Created web APIs to ingest data from the S3 bucket (AWS) into the web application.
- Used AWS AppSync (GraphQL) for web API creation and data synchronization into Aurora PostgreSQL or DynamoDB.
- Used AWS Glue to catalog the S3 bucket (data lake) and load it into Athena.
- Used AWS Lambda to trigger an SQS queue for moving data to and from the S3 bucket, and AWS SNS for pub/sub topic creation (see the Lambda sketch after this list).
- Wrote Python and Spark programs on EMR clusters for data integration (see the PySpark sketch after this list).
- Built DAG jobs in Apache Airflow for scheduling (see the DAG sketch after this list).
- Created the solution design document and architecture for an enterprise project.
- Performed data engineering tasks that applied statistical analysis to a high-volume input file (30 GB), running it as a parallel process on a high-performance cluster.
- Used Gurobi optimization in Python for predictive analysis.
- Modeled all data sources ingested into a web application in MS Visio.
- Used Jira for project management and GitHub as a code repository.
- Used Power BI and Tableau for data visualization and analytics.
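A minimal sketch of the Lambda-to-SQS/SNS pattern referenced above; the queue URL, topic ARN, and event shape are illustrative assumptions:

```python
import json

import boto3

# Illustrative resource identifiers; the real queue, topic, and bucket differ.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/example-ingest-queue"
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:example-ingest-topic"

sqs = boto3.client("sqs")
sns = boto3.client("sns")


def handler(event, context):
    """Lambda entry point: fan S3 object notifications out to SQS and SNS."""
    for record in event.get("Records", []):
        body = json.dumps({
            "bucket": record["s3"]["bucket"]["name"],
            "key": record["s3"]["object"]["key"],
        })
        # Queue the object reference for the downstream migration workers.
        sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=body)
        # Publish the same event to subscribers of the pub/sub topic.
        sns.publish(TopicArn=TOPIC_ARN, Message=body)
    return {"statusCode": 200}
```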
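A minimal sketch of the kind of PySpark integration job run on the EMR clusters mentioned above; the S3 paths, column names, and transformations are illustrative assumptions:

```python
from pyspark.sql import SparkSession, functions as F

# Illustrative S3 locations; the real source and target paths differ.
SOURCE_PATH = "s3://example-bucket/raw/orders/"
TARGET_PATH = "s3://example-bucket/curated/orders/"

spark = SparkSession.builder.appName("orders-integration").getOrCreate()

# Read raw extracts landed in S3 (for example by the Sqoop jobs) and standardize them.
orders = (
    spark.read.parquet(SOURCE_PATH)
    .withColumn("order_date", F.to_date("order_date"))
    .dropDuplicates(["order_id"])
)

# Write the integrated data set back to the data lake, partitioned by date.
orders.write.mode("overwrite").partitionBy("order_date").parquet(TARGET_PATH)
```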
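A minimal sketch of an Airflow DAG for the scheduling work mentioned above (Airflow 2.x style); the DAG id, schedule, and bash commands are illustrative assumptions:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

# Illustrative defaults; the real retry policy and ownership differ.
default_args = {
    "owner": "data-engineering",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="daily_s3_to_athena_refresh",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    default_args=default_args,
    catchup=False,
) as dag:
    # Refresh the Glue catalog over the S3 data lake.
    run_glue_crawler = BashOperator(
        task_id="run_glue_crawler",
        bash_command="aws glue start-crawler --name example-data-lake-crawler",
    )
    # Submit the Spark integration job to the EMR cluster.
    submit_spark_job = BashOperator(
        task_id="submit_spark_job",
        bash_command="spark-submit s3://example-bucket/jobs/orders_integration.py",
    )
    run_glue_crawler >> submit_spark_job
```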
Technical Environment: AWS (EC2, S3, Lambda, SQS & SNS, Glue, Athena, AWS Amplify, WorkSpaces, Aurora DB, etc.), Boto3, R, SQL Server, EMR, Hadoop, Oracle, Spark, Python, JSON, XML & CSV files, GitHub, Jira.
Confidential, Richmond, VA
Data Engineer
Responsibilities:
- Leveraged the Python development environment for data analysis and report building.
- Built General Ledger reports using the 4sight reporting tool running on Apache Tomcat.
- Used Amazon Web Services to efficiently move on-premises data to the cloud.
- Replaced VBA macros with Python using PyXLL, a Python add-in for Microsoft Excel (see the PyXLL sketch after this list).
- Worked with SQL Server and Oracle database engines to write stored procedures and triggers and to query data.
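A minimal sketch of exposing a Python function to Excel through PyXLL in place of a VBA macro; the function name and formula are illustrative assumptions:

```python
from pyxll import xl_func


# Illustrative worksheet function; the VBA macros actually replaced were more involved.
@xl_func
def net_amount(gross, tax_rate):
    """Excel-callable helper, used in a sheet as =net_amount(A1, B1)."""
    return float(gross) * (1.0 - float(tax_rate))
```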
Technical Environment: AWS (EC2, S3, Redshift & CFT), R, SQL Server, Oracle, Spark, Python, Oracle Java, XML & CSV files.
Confidential, Richmond, VA
Data Analyst/Python Developer
Responsibilities:
- Automated one of the Enterprise Operational Risk Management reports as part of the Risk Analytical Solutions team.
- Used the openpyxl module in Python to format Excel files (see the openpyxl sketch after this list).
- Used the Python win32com.client library to write macros as a replacement for Visual Basic in Excel.
- Wrote Python scripts to pull data from the Redshift database, manipulate it per requirements with the necessary conditional functions, and store it in data frames.
- Loaded data from the pandas data frames into the team's user-defined space in the Redshift database using the COPY command from an AWS S3 bucket (see the Redshift COPY sketch after this list).
- Created action filters to make the Tableau dashboard interactive.
- Followed Tableau performance guidelines, such as extracting data from the database as a view rather than a custom SQL query and aggregating data to test functionality before loading the complete data set.
- Scheduled the reports for quarterly refresh on the SAMGW server.
- Played a key role in moving data to the cloud (Oracle OBIEE to AWS Redshift) as part of the HR Data Management team.
- Parsed XML files and JSON documents with Python scripts to load their data into the database (see the parsing sketch after this list).
- Extensively used Python modules such as numpy, pandas, xmltodict, pycompare, datetime, and SQLAlchemy for data analysis.
- Managed storage in AWS using S3, created volumes and configured snapshots.
- Created EC2 instances to run automated python scripts.
- Automated EC2 instance provisioning using AWS CloudFormation templates.
- Wrote Python scripts to validate and test source-to-target mapping (STTM) migration from Oracle to Redshift.
- Implemented ETL logic in Python that was originally written in Scala.
- Used Hydrograph as an ETL tool for loading the data.
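A minimal sketch of the openpyxl formatting described above; the workbook, sheet, and styling choices are illustrative assumptions:

```python
from openpyxl import load_workbook
from openpyxl.styles import Font, PatternFill

# Illustrative file and sheet names; the real report layout differs.
wb = load_workbook("risk_report.xlsx")
ws = wb["Summary"]

# Bold the header row and give it a grey background fill.
header_fill = PatternFill(start_color="DDDDDD", end_color="DDDDDD", fill_type="solid")
for cell in ws[1]:
    cell.font = Font(bold=True)
    cell.fill = header_fill

# Freeze the header row and widen the first column before saving a copy.
ws.freeze_panes = "A2"
ws.column_dimensions["A"].width = 24
wb.save("risk_report_formatted.xlsx")
```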
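A minimal sketch of pulling Redshift data into pandas and loading the result back with a COPY from S3, assuming SQLAlchemy with a psycopg2 driver and s3fs for the S3 write; the table names, bucket, and IAM role are illustrative assumptions:

```python
import pandas as pd
import sqlalchemy

# Illustrative connection string; credentials would come from a secrets store.
engine = sqlalchemy.create_engine(
    "postgresql+psycopg2://user:password@redshift-host:5439/analytics"
)

# Pull the source data into a data frame and apply the conditional logic.
frame = pd.read_sql("SELECT * FROM ops_risk.events", engine)
frame["severity"] = frame["loss_amount"].apply(lambda x: "high" if x > 100000 else "low")

# Stage the frame in S3 (pandas writes s3:// paths via s3fs), then COPY into Redshift.
frame.to_csv("s3://example-bucket/staging/events.csv", index=False)
with engine.begin() as conn:
    conn.execute(sqlalchemy.text(
        """
        COPY team_space.events_scored
        FROM 's3://example-bucket/staging/events.csv'
        IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-copy'
        CSV IGNOREHEADER 1
        """
    ))
```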
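A minimal sketch of parsing XML and JSON feeds before loading them, using xmltodict and pandas; the file names, element paths, and join key are illustrative assumptions:

```python
import json

import pandas as pd
import xmltodict

# Illustrative feed files; the real schemas and record paths differ.
with open("employees.xml") as xml_file:
    xml_doc = xmltodict.parse(xml_file.read())
employees = pd.DataFrame(xml_doc["employees"]["employee"])

with open("departments.json") as json_file:
    departments = pd.DataFrame(json.load(json_file))

# Combine both feeds into one frame to hand to the existing database load step.
combined = employees.merge(departments, on="dept_id", how="left")
print(combined.head())
```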
Technical Environment: AWS (EC2, S3, Redshift & CFT), Python, Oracle OBIEE, Hydrograph, SAMGW server, XML & CSV files, Scala.