Big Data Engineer/Python Backend Developer Resume
SUMMARY
- 8 years of professional data experience, including the Hadoop ecosystem, with an emphasis on big data solutions.
- Adept in Data Extraction, Data Cleansing, Data Manipulation, and Exploratory Data Analysis.
- Recognized expert in ETL processes used to move data between databases.
- Experience in web-based application development using Flask, Spring Boot, and web services.
- Strong interpersonal, analytical, problem solving, decision making and conflict resolution skills.
- Working knowledge of Amazon Web Services (AWS), including DynamoDB, S3, Kinesis Data Streams, and Secrets Manager.
- Proficient in Agile Scrum and Kanban methodologies.
- Self-starter, highly motivated, enthusiastic, professional team player.
- Detail-oriented professional with excellent communication and interpersonal skills.
TECHNICAL SKILLS
Programming Languages: Java, PL/SQL, Bash/Korn shell
Scripting Languages: Python, JavaScript, TypeScript, HTML, CSS
Databases: Hadoop, Oracle 12c/11g, MongoDB, SQL Server, MySQL, Presto
Big Data Tools: Sqoop, Hive, Pentaho, Cosort, Fact, Kafka, PySpark, Minio
Data Science: Pandas, NumPy, Jupyter Notebook
IDE platforms: IntelliJ, PyCharm, Visual Studio Code, Microsoft Visual Studio
DevOps: Docker, Kubernetes, GoCD
Other Tools: GIT, Team Foundation Server 2012 (TFS), Rally, Jira
PROFESSIONAL EXPERIENCE
Big Data Engineer/Python Backend Developer
Confidential
Responsibilities:
- Currently implementing parallel processing of Spark SQL queries to cut down runtime (see the first sketch following this list).
- Actively working on the Consumer Privacy Program (CPP) project, using AWS DynamoDB, S3, PySpark, and Minio to report on customer privacy data.
- Extensively used the Python pandas library to read and manipulate data from JSON and CSV files.
- Implemented multi-threading for extract queries over Hive JDBC to cut down SLA time.
- Utilized Slack's Incoming Webhooks to send alerts to Slack channels; later migrated alerting to a Microsoft Teams channel.
- Containerized the CPP Spark project as a Docker image and deployed it to an on-premises Kubernetes cluster.
- Developed REST APIs using the Python Flask framework; implemented all CRUD operations with MySQL as the data store (see the second sketch following this list).
- Created GoCD pipelines for backend services using YAML configuration.
- Used the Confidential corporate Vault to store and retrieve database, client, and AWS secrets.
- Implemented a standalone consumer app using AWS Kinesis Data Streams to load data into MongoDB, and deployed it to the Kubernetes cluster.
- Implemented UI pages using templates provided by the Portal team, built with the Angular 6 framework and TypeScript.
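A minimal PySpark sketch of the parallel Spark SQL extract pattern noted above, assuming the extracts are independent queries written out as Parquet; the query names, SQL, paths, and pool size are illustrative, not the actual project code.

```python
from concurrent.futures import ThreadPoolExecutor

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parallel-extracts").getOrCreate()

# Illustrative extract queries; the real tables and filters differ.
queries = {
    "orders": "SELECT * FROM cpp.orders WHERE load_date = current_date()",
    "profiles": "SELECT * FROM cpp.profiles WHERE load_date = current_date()",
}

def run_extract(name, sql):
    # Each thread submits its own Spark job, so the driver schedules the
    # extracts concurrently instead of running them back to back.
    spark.sql(sql).write.mode("overwrite").parquet(f"/tmp/extracts/{name}")
    return name

with ThreadPoolExecutor(max_workers=4) as pool:
    finished = list(pool.map(lambda item: run_extract(*item), queries.items()))

print("Completed extracts:", finished)
```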
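A minimal sketch of the Flask CRUD-over-MySQL service mentioned above; Flask-SQLAlchemy, the Record model, and the connection string are assumptions for illustration (real credentials came from Vault, and the actual schema differs).

```python
from flask import Flask, jsonify, request
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
# Illustrative connection string; real credentials were retrieved from Vault.
app.config["SQLALCHEMY_DATABASE_URI"] = "mysql+pymysql://user:password@localhost/cpp"
db = SQLAlchemy(app)

class Record(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(120), nullable=False)

with app.app_context():
    db.create_all()  # create the table on first run

@app.route("/records", methods=["POST"])
def create_record():
    rec = Record(name=request.json["name"])
    db.session.add(rec)
    db.session.commit()
    return jsonify(id=rec.id, name=rec.name), 201

@app.route("/records/<int:rec_id>", methods=["GET"])
def read_record(rec_id):
    rec = Record.query.get_or_404(rec_id)
    return jsonify(id=rec.id, name=rec.name)

@app.route("/records/<int:rec_id>", methods=["PUT"])
def update_record(rec_id):
    rec = Record.query.get_or_404(rec_id)
    rec.name = request.json["name"]
    db.session.commit()
    return jsonify(id=rec.id, name=rec.name)

@app.route("/records/<int:rec_id>", methods=["DELETE"])
def delete_record(rec_id):
    db.session.delete(Record.query.get_or_404(rec_id))
    db.session.commit()
    return "", 204
```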
Big Data ETL Developer
Confidential
Responsibilities:
- Developed RESTful services using Java and Spring Boot to load data from Kafka streams to Oracle.
- Gained experience with continuous integration/continuous deployment using GoCD and DevOps best practices.
- Streamlined Pentaho ETL processes to improve efficiency and reduce runtimes.
- Refactored ETL code into smaller components to improve manageability and job completion times.
- Responsible for the migration of ticketing data across large Oracle databases.
- Collaborated with business stakeholders to refine requirements and testing strategies.
- Designed and developed ETL processes using Pentaho, Cosort, Fact, and Bash/Korn shell scripts.
- Refactored large Oracle load processes to use Partition Exchange to minimize database impact.
- Provided DevOps support for all Hadoop and Pentaho ETL jobs.
- Analyzed data and created appropriately partitioned/bucketed Hive tables for optimal data processing (see the sketch after this list).
- Completed a POC to determine the viability of using Spark and Scala for ETL processing.
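A minimal sketch, kept in Python via PySpark, of the kind of partitioned/bucketed Hive table creation referenced above; the table name, columns, bucket count, and storage format are illustrative.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-ddl")
    .enableHiveSupport()  # register the table in the Hive metastore
    .getOrCreate()
)

# Illustrative table: partitioned by event date and bucketed on ticket_id so
# joins and sampling on ticket_id avoid scanning or shuffling the full table.
spark.sql("""
    CREATE TABLE IF NOT EXISTS tickets_events (
        ticket_id  BIGINT,
        event_type STRING,
        amount     DECIMAL(10, 2)
    )
    PARTITIONED BY (event_date DATE)
    CLUSTERED BY (ticket_id) INTO 32 BUCKETS
    STORED AS ORC
""")
```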
Hadoop Developer
Confidential
Responsibilities:
- Implemented a data pipeline to consume and load data into Spark from Hive and HDFS.
- Processed different data files (JSON, CSV, and flat files) in Spark using Python and Spark SQL, and stored the results in Parquet format (see the sketch after this list).
- Ingested and transformed the data using Spark SQL.
- Performed optimizations using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Used UC4 workflow engine to automate multiple Hive jobs.
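A minimal PySpark sketch of the JSON/CSV-to-Parquet flow and the map-side (broadcast) join optimization described above; the paths, schemas, and column names are illustrative.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("file-ingest").getOrCreate()

# Illustrative input paths: a large JSON event feed and a small CSV lookup file.
events = spark.read.json("/data/raw/events/*.json")
lookup = spark.read.csv("/data/raw/dim_status.csv", header=True, inferSchema=True)

# Broadcasting the small lookup table turns this into a map-side join,
# avoiding a shuffle of the large events dataset.
enriched = events.join(F.broadcast(lookup), on="status_code", how="left")

# Store the result as Parquet, partitioned by date for downstream Hive/Spark reads.
(enriched
    .withColumn("event_date", F.to_date("event_ts"))
    .write.mode("overwrite")
    .partitionBy("event_date")
    .parquet("/data/curated/events"))
```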
Hadoop Developer
Confidential
Responsibilities:
- Analyzed large data sets by running Hive queries (see the sketch after this list).
- Developed simple to complex MapReduce jobs using Hive.
- Developed MapReduce jobs to transform the data and load it into different files.
- Involved in running Hadoop jobs for processing millions of records.
- Involved in loading data from the Linux file system to HDFS.
- Responsible for managing data from multiple sources.
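A minimal sketch of running an analytical Hive query from Python; PyHive is an assumed client library (not named in this role), and the host, table, and columns are illustrative.

```python
from pyhive import hive

# Illustrative HiveServer2 host/port and table name.
conn = hive.connect(host="hiveserver2.example.com", port=10000, database="default")
cursor = conn.cursor()

# Aggregate record counts per source system; Hive compiles this into MapReduce jobs.
cursor.execute("""
    SELECT source_system, COUNT(*) AS record_count
    FROM raw_records
    WHERE load_date >= '2015-01-01'
    GROUP BY source_system
""")

for source_system, record_count in cursor.fetchall():
    print(source_system, record_count)

cursor.close()
conn.close()
```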