
Big Data Engineer/Python Backend Developer Resume


SUMMARY

  • 8 years of professional data experience, including the Hadoop ecosystem, with an emphasis on big data solutions.
  • Adept in Data Extraction, Data Cleansing, Data Manipulation, and Exploratory Data Analysis.
  • Recognized expert in ETL processes that move data between databases.
  • Experience in web-based application development using Flask, Spring Boot, and web services.
  • Strong interpersonal, analytical, problem solving, decision making and conflict resolution skills.
  • Working knowledge of Amazon Web Services (AWS), including DynamoDB, S3, Kinesis Data Streams, and Secrets Manager.
  • Proficient in Agile Scrum and Kanban methodologies.
  • Self-starter, highly motivated, enthusiastic, professional team player.
  • Detail-oriented professional with excellent communication and interpersonal skills.

TECHNICAL SKILLS

Programming Languages: Java, PL/SQL, Bash/Korn shell

Scripting Languages: Python, CSS, JavaScript, HTML, TypeScript

Databases: Hadoop, Oracle 12c/11g, MongoDB, SQL Server, MySQL, Presto

Big Data Tools: Sqoop, Hive, Pentaho, CoSort, Fact, Kafka, PySpark, MinIO

Data Science: Pandas, NumPy, Jupyter Notebook

IDE platforms: IntelliJ, PyCharm, Visual Studio Code, Microsoft Visual Studio

DevOps: Docker, Kubernetes, GoCD

Other Tools: Git, Team Foundation Server 2012 (TFS), Rally, Jira

PROFESSIONAL EXPERIENCE

Big Data Engineer/Python Backend Developer

Confidential

Responsibilities:

  • Currently implementing parallel processing of Spark SQL queries to cut down runtime (see the parallel-query sketch after this list).
  • Actively working on the Consumer Privacy Program (CPP) project, using AWS DynamoDB, S3, PySpark, and MinIO to report on customer privacy data.
  • Extensively used the Python Pandas library to read and manipulate data from JSON and CSV files (a small example follows this list).
  • Implemented multi-threading for extract queries over Hive JDBC to cut down SLA time.
  • Utilized Slack's Incoming Webhooks to send alerts to Slack channels; later migrated the alerts to a Microsoft Teams channel.
  • Containerized the CPP Spark project as a Docker image and deployed it to an on-prem Kubernetes cluster.
  • Developed REST APIs using the Python Flask framework, implementing all CRUD operations with MySQL as the data store (a minimal Flask sketch follows this list).
  • Created GoCD pipelines for backend services using YAML configuration.
  • Used the Confidential corporate Vault to store and retrieve database, client, and AWS secrets.
  • Implemented a standalone consumer app using AWS Kinesis Data Streams to load data into MongoDB and deployed it to the Kubernetes cluster.
  • Implemented UI pages using templates provided by the Portal team, built with the Angular 6 framework and TypeScript.
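
A minimal sketch of the parallel Spark SQL approach referenced above, assuming independent extract queries that can be submitted concurrently against one SparkSession; the query names, filters, and output paths are hypothetical.

  from concurrent.futures import ThreadPoolExecutor
  from pyspark.sql import SparkSession

  spark = SparkSession.builder.appName("parallel-extracts").getOrCreate()

  # Hypothetical, independent extract queries; each is submitted as its own Spark job.
  queries = {
      "orders":    "SELECT * FROM cpp.orders    WHERE load_date = '2021-06-01'",
      "customers": "SELECT * FROM cpp.customers WHERE load_date = '2021-06-01'",
  }

  def run_query(name, sql):
      # spark.sql can be called from multiple threads; submitting the jobs
      # concurrently lets the scheduler overlap them instead of running them serially.
      df = spark.sql(sql)
      df.write.mode("overwrite").parquet(f"/tmp/extracts/{name}")
      return name

  with ThreadPoolExecutor(max_workers=4) as pool:
      futures = [pool.submit(run_query, name, sql) for name, sql in queries.items()]
      for future in futures:
          print(future.result())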
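
A small illustration of the Pandas usage described above; the file names and columns are hypothetical.

  import pandas as pd

  # Hypothetical inputs: newline-delimited JSON plus a CSV lookup file.
  requests_df = pd.read_json("privacy_requests.json", lines=True)
  lookup_df = pd.read_csv("customer_lookup.csv")

  # Basic cleansing and manipulation: normalize the join key, join, filter.
  requests_df["customer_id"] = requests_df["customer_id"].astype(str).str.strip()
  merged = requests_df.merge(lookup_df, on="customer_id", how="left")
  open_requests = merged[merged["status"] == "OPEN"]

  open_requests.to_csv("open_privacy_requests.csv", index=False)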
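
A minimal sketch of a Flask CRUD service of the kind described above, assuming Flask-SQLAlchemy on top of MySQL; the resource name, model, and connection string are hypothetical.

  from flask import Flask, jsonify, request
  from flask_sqlalchemy import SQLAlchemy

  app = Flask(__name__)
  # Hypothetical MySQL connection string.
  app.config["SQLALCHEMY_DATABASE_URI"] = "mysql+pymysql://user:password@localhost/cpp"
  db = SQLAlchemy(app)

  class PrivacyRequest(db.Model):  # hypothetical resource
      id = db.Column(db.Integer, primary_key=True)
      customer_id = db.Column(db.String(64), nullable=False)
      status = db.Column(db.String(16), default="OPEN")

  with app.app_context():
      db.create_all()  # create the table if it does not already exist

  @app.route("/requests", methods=["POST"])
  def create_request():
      payload = request.get_json()
      req = PrivacyRequest(customer_id=payload["customer_id"])
      db.session.add(req)
      db.session.commit()
      return jsonify({"id": req.id}), 201

  @app.route("/requests/<int:req_id>", methods=["GET"])
  def read_request(req_id):
      req = PrivacyRequest.query.get_or_404(req_id)
      return jsonify({"id": req.id, "customer_id": req.customer_id, "status": req.status})

  @app.route("/requests/<int:req_id>", methods=["PUT"])
  def update_request(req_id):
      req = PrivacyRequest.query.get_or_404(req_id)
      req.status = request.get_json().get("status", req.status)
      db.session.commit()
      return jsonify({"id": req.id, "status": req.status})

  @app.route("/requests/<int:req_id>", methods=["DELETE"])
  def delete_request(req_id):
      db.session.delete(PrivacyRequest.query.get_or_404(req_id))
      db.session.commit()
      return "", 204

  if __name__ == "__main__":
      app.run()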

Big Data ETL Developer

Confidential

Responsibilities:

  • Developed RESTful services using Java and Spring Boot to load data from Kafka streams to Oracle.
  • Applied continuous integration/continuous deployment using GoCD and DevOps best practices.
  • Streamlined Pentaho ETL processes to gain efficiency and faster runtimes.
  • Refactored ETL code into smaller components to improve manageability and job completion times.
  • Responsible for the migration of ticketing data across large Oracle databases.
  • Collaborated with business stakeholders to refine requirements and testing strategies.
  • Designed and developed ETL processes using Pentaho, Cosort, Fact, and Bash/Korn shell scripts.
  • Refactored large Oracle load processes to use Partition Exchange to minimize database impact.
  • Engaged in DevOps in supporting all Hadoop and Pentaho ETL jobs.
  • Analyzed data and created properly partitioned and bucketed Hive tables for optimal data processing (see the DDL sketch after this list).
  • Completed a POC to determine the viability of using Spark and Scala for ETL processing.
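
A sketch of the partitioned and bucketed Hive tables mentioned above, expressed as Hive DDL run through PySpark; the database, schema, and bucket count are hypothetical.

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.appName("hive-ddl").enableHiveSupport().getOrCreate()
  spark.sql("CREATE DATABASE IF NOT EXISTS ticketing")

  # Partitioning by sale_date lets queries prune to the days they need;
  # bucketing on venue_id reduces shuffle for joins and aggregations on that key.
  spark.sql("""
      CREATE TABLE IF NOT EXISTS ticketing.sales (
          ticket_id BIGINT,
          venue_id  INT,
          amount    DECIMAL(10, 2)
      )
      PARTITIONED BY (sale_date STRING)
      CLUSTERED BY (venue_id) INTO 32 BUCKETS
      STORED AS ORC
  """)

  # Partition pruning: only the June 2019 partitions are scanned.
  daily_totals = spark.sql("""
      SELECT sale_date, venue_id, SUM(amount) AS total_amount
      FROM ticketing.sales
      WHERE sale_date BETWEEN '2019-06-01' AND '2019-06-30'
      GROUP BY sale_date, venue_id
  """)
  daily_totals.show()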

Hadoop Developer

Confidential

Responsibilities:

  • Implemented a data pipeline to consume data from Hive and HDFS and load it into Spark.
  • Processed JSON, CSV, and flat files in Spark using Python and Spark SQL, and stored the results in Parquet format (see the ingestion sketch after this list).
  • Ingested and transformed the data using Spark SQL.
  • Performed optimizations using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Used UC4 workflow engine to automate multiple Hive jobs.
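
A compact sketch of the kind of pipeline described above: JSON and CSV inputs are read into Spark, joined with Spark SQL, and written out as Parquet; the paths, columns, and join key are illustrative.

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.appName("ingest").enableHiveSupport().getOrCreate()

  # Illustrative input paths.
  events = spark.read.json("/data/raw/events/*.json")
  reference = spark.read.option("header", True).option("inferSchema", True).csv("/data/raw/ref/*.csv")

  # Transform with Spark SQL, then persist in Parquet format.
  events.createOrReplaceTempView("events")
  reference.createOrReplaceTempView("ref")
  curated = spark.sql("""
      SELECT e.event_id, e.event_ts, r.category
      FROM events e
      JOIN ref r ON e.ref_id = r.ref_id
      WHERE e.event_ts >= '2016-01-01'
  """)

  curated.write.mode("overwrite").parquet("/data/curated/events")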

Hadoop Developer

Confidential

Responsibilities:

  • Analyzed large data sets by running Hive queries.
  • Developed simple to complex MapReduce jobs using Hive.
  • Developed MapReduce jobs for transforming the data and loading it into separate files.
  • Involved in running Hadoop jobs for processing millions of records.
  • Involved in loading data from the Linux file system to HDFS.
  • Responsible for managing data from multiple sources.
