Big Data Engineer/Python Backend Developer Resume
SUMMARY
- 8 years of professional data experience, including the Hadoop ecosystem, with an emphasis on big data solutions.
- Adept in Data Extraction, Data Cleansing, Data Manipulation, and Exploratory Data Analysis.
- Recognized expert in ETL processes used to move data between databases.
- Experience in web-based application development using Flask, Spring Boot, and web services.
- Strong interpersonal, analytical, problem solving, decision making and conflict resolution skills.
- Working knowledge of Amazon Web Services (AWS), including DynamoDB, S3, Kinesis Data Streams, and Secrets Manager.
- Proficient in Agile Scrum and Kanban methodologies.
- Self-starter, highly motivated, enthusiastic, professional team player.
- Detail-oriented professional with excellent communication and interpersonal skills.
TECHNICAL SKILLS
Programming Languages: Java, PL/SQL, Bash/Korn shell
Scripting Languages: Python, JavaScript, TypeScript, HTML, CSS
Databases: Hadoop, Oracle 12c/11g, MongoDB, SQL Server, MySQL, Presto
Big Data Tools: Sqoop, Hive, Pentaho, Cosort, Fact, Kafka, PySpark, Minio
Data Science: Pandas, NumPy, Jupyter Notebook
IDE platforms: IntelliJ, PyCharm, Visual Studio Code, Microsoft Visual Studio
DevOps: Docker, Kubernetes, GoCD
Other Tools: GIT, Team Foundation Server 2012 (TFS), Rally, Jira
PROFESSIONAL EXPERIENCE
Big Data Engineer/Python Backend Developer
Confidential
Responsibilities:
- Currently implementing parallel processing of Spark SQL queries to cut down runtime (see the first sketch following this list).
- Actively working on the Consumer Privacy Program (CPP) project, using AWS DynamoDB, S3, PySpark, and Minio to report on customer privacy data.
- Extensively used the Python pandas library to read and manipulate data from JSON and CSV files.
- Implemented multi-threading for extract queries over Hive JDBC to cut down SLA time.
- Utilized Slack's Incoming Webhooks to send alerts to Slack channels; later migrated alerting to a Microsoft Teams channel.
- Containerized the CPP Spark project as a Docker image and deployed it to an on-premises Kubernetes cluster.
- Developed REST APIs using the Python Flask framework; implemented all CRUD operations with MySQL as the data store (see the second sketch following this list).
- Created GoCD pipelines for backend services using YAML configuration.
- Used the Confidential corporate Vault to store and retrieve database, client, and AWS secrets.
- Implemented a standalone consumer app using AWS Kinesis Data Streams to load data into MongoDB, and deployed it to the Kubernetes cluster.
- Implemented UI pages using templates provided by the Portal team, built with the Angular 6 framework and TypeScript.
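A minimal PySpark sketch of the parallel Spark SQL extract pattern noted above, assuming the extracts are independent queries written out as Parquet; the query names, SQL, paths, and pool size are illustrative, not the actual project code.

```python
from concurrent.futures import ThreadPoolExecutor

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parallel-extracts").getOrCreate()

# Illustrative extract queries; the real tables and filters differ.
queries = {
    "orders": "SELECT * FROM cpp.orders WHERE load_date = current_date()",
    "profiles": "SELECT * FROM cpp.profiles WHERE load_date = current_date()",
}

def run_extract(name, sql):
    # Each thread submits its own Spark job, so the driver schedules the
    # extracts concurrently instead of running them back to back.
    spark.sql(sql).write.mode("overwrite").parquet(f"/tmp/extracts/{name}")
    return name

with ThreadPoolExecutor(max_workers=4) as pool:
    finished = list(pool.map(lambda item: run_extract(*item), queries.items()))

print("Completed extracts:", finished)
```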
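A minimal sketch of the Flask CRUD-over-MySQL service mentioned above; Flask-SQLAlchemy, the Record model, and the connection string are assumptions for illustration (real credentials came from Vault, and the actual schema differs).

```python
from flask import Flask, jsonify, request
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
# Illustrative connection string; real credentials were retrieved from Vault.
app.config["SQLALCHEMY_DATABASE_URI"] = "mysql+pymysql://user:password@localhost/cpp"
db = SQLAlchemy(app)

class Record(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(120), nullable=False)

with app.app_context():
    db.create_all()  # create the table on first run

@app.route("/records", methods=["POST"])
def create_record():
    rec = Record(name=request.json["name"])
    db.session.add(rec)
    db.session.commit()
    return jsonify(id=rec.id, name=rec.name), 201

@app.route("/records/<int:rec_id>", methods=["GET"])
def read_record(rec_id):
    rec = Record.query.get_or_404(rec_id)
    return jsonify(id=rec.id, name=rec.name)

@app.route("/records/<int:rec_id>", methods=["PUT"])
def update_record(rec_id):
    rec = Record.query.get_or_404(rec_id)
    rec.name = request.json["name"]
    db.session.commit()
    return jsonify(id=rec.id, name=rec.name)

@app.route("/records/<int:rec_id>", methods=["DELETE"])
def delete_record(rec_id):
    db.session.delete(Record.query.get_or_404(rec_id))
    db.session.commit()
    return "", 204
```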
Big Data ETL Developer
Confidential
Responsibilities:
- Developed RESTful services using Java and Spring Boot to load data from Kafka streams to Oracle.
- Gained experience with continuous integration/continuous deployment using GoCD and DevOps best practices.
- Streamlined Pentaho ETL processes to improve efficiency and reduce runtimes.
- Refactored ETL code into smaller components to improve manageability and job completion times.
- Responsible for the migration of ticketing data across large Oracle databases.
- Collaborated with business stakeholders to refine requirements and testing strategies.
- Designed and developed ETL processes using Pentaho, Cosort, Fact, and Bash/Korn shell scripts.
- Refactored large Oracle load processes to use Partition Exchange to minimize database impact.
- Provided DevOps support for all Hadoop and Pentaho ETL jobs.
- Analyzed data and created appropriately partitioned/bucketed Hive tables for optimal data processing (see the sketch after this list).
- Completed a POC to determine the viability of using Spark and Scala for ETL processing.
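A minimal sketch, kept in Python via PySpark, of the kind of partitioned/bucketed Hive table creation referenced above; the table name, columns, bucket count, and storage format are illustrative.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-ddl")
    .enableHiveSupport()  # register the table in the Hive metastore
    .getOrCreate()
)

# Illustrative table: partitioned by event date and bucketed on ticket_id so
# joins and sampling on ticket_id avoid scanning or shuffling the full table.
spark.sql("""
    CREATE TABLE IF NOT EXISTS tickets_events (
        ticket_id  BIGINT,
        event_type STRING,
        amount     DECIMAL(10, 2)
    )
    PARTITIONED BY (event_date DATE)
    CLUSTERED BY (ticket_id) INTO 32 BUCKETS
    STORED AS ORC
""")
```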
Hadoop Developer
Confidential
Responsibilities:
- Implemented a data pipeline to consume and load data into Spark from Hive and HDFS.
- Processed different data files (JSON, CSV, and flat files) in Spark using Python and Spark SQL, and stored the results in Parquet format (see the sketch after this list).
- Ingested and transformed the data using Spark SQL.
- Performed optimizations using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Used UC4 workflow engine to automate multiple Hive jobs.
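A minimal PySpark sketch of the JSON/CSV-to-Parquet flow and the map-side (broadcast) join optimization described above; the paths, schemas, and column names are illustrative.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("file-ingest").getOrCreate()

# Illustrative input paths: a large JSON event feed and a small CSV lookup file.
events = spark.read.json("/data/raw/events/*.json")
lookup = spark.read.csv("/data/raw/dim_status.csv", header=True, inferSchema=True)

# Broadcasting the small lookup table turns this into a map-side join,
# avoiding a shuffle of the large events dataset.
enriched = events.join(F.broadcast(lookup), on="status_code", how="left")

# Store the result as Parquet, partitioned by date for downstream Hive/Spark reads.
(enriched
    .withColumn("event_date", F.to_date("event_ts"))
    .write.mode("overwrite")
    .partitionBy("event_date")
    .parquet("/data/curated/events"))
```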
Hadoop Developer
Confidential
Responsibilities:
- Analyzed large data sets by running Hive queries (see the sketch after this list).
- Developed simple to complex MapReduce jobs using Hive.
- Developed MapReduce jobs to transform the data and load it into different files.
- Involved in running Hadoop jobs for processing millions of records.
- Involved in loading data from the Linux file system to HDFS.
- Responsible for managing data from multiple sources.
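A minimal sketch of running an analytical Hive query from Python; PyHive is an assumed client library (not named in this role), and the host, table, and columns are illustrative.

```python
from pyhive import hive

# Illustrative HiveServer2 host/port and table name.
conn = hive.connect(host="hiveserver2.example.com", port=10000, database="default")
cursor = conn.cursor()

# Aggregate record counts per source system; Hive compiles this into MapReduce jobs.
cursor.execute("""
    SELECT source_system, COUNT(*) AS record_count
    FROM raw_records
    WHERE load_date >= '2015-01-01'
    GROUP BY source_system
""")

for source_system, record_count in cursor.fetchall():
    print(source_system, record_count)

cursor.close()
conn.close()
```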