SR. BIG DATA ENGINEER Resume San Francisco, CA - Hire IT People

SUMMARY

Big Data Engineer with 9+ years of experience in Big Data technologies and 11+ years of experience in IT.
Expertise in Hadoop, Spark, Big Data tools, various clouds (AWS, Azure, GCP) and data warehousing using on-premises and cloud services, automation tools, and ETL design process.
Experience working with both SQL and NoSQL databases.
Add value to Agile/Scrum processes such as Sprint Planning, Backlog, Sprint Retrospective, and Requirements Gathering, and provide planning and documentation for projects.
Create Spark Core ETL processes and automate them using a workflow scheduler.
Worked with AWS services such as EMR, EC2, Lambda, S3, Glue, Redshift, and Kinesis.
Well versed in the Azure environment, including ADLS Gen2, Blob Storage, Azure Data Factory (ADF), Azure Databricks, Azure SQL, and Azure Synapse for analytics.
Proficient with HDFS, Spark, Hive, Sqoop, HBase, Flume, Oozie, and Zookeeper.
Experience with Hadoop ecosystem tools such as Hive, Sqoop, MapReduce, Flume, and Oozie.
Experience handling XML files as well as Avro and Parquet SerDes.
Performance tuning at the source, target, and DataStage job levels using indexes, hints, and partitioning in DB2 and Oracle.
Hands-on experience in text analytics, developing statistical machine learning and data mining solutions for various business problems, and generating data visualizations using R, Python, and Tableau.
Use Apache Hadoop to work with Big Data and analyze large data sets efficiently.
Hands-on experience developing PL/SQL procedures and functions, creating tables, views, indexes, and stored procedures, and tuning SQL on large databases.
Skilled at bucketing, partitioning, multi-threaded computing, and streaming (Python, PySpark).
Developed and deployed complex ETL workflows using Apache Airflow and Jenkins to automate data processing and improve efficiency.
Excellent written, oral, interpersonal, and presentation skills. Able to perform at a high level, meet deadlines, and adapt to ever-changing priorities, with strong project management skills.

TECHNICAL SKILLS

IDE: Workbench, Jupyter Notebooks, Eclipse, IntelliJ, PyCharm

PROJECT METHODS: Agile, Kanban, Scrum, DevOps, Continuous Integration, Test-Driven Development

HADOOP DISTRIBUTIONS: Hadoop, Cloudera Hadoop, Hortonworks Hadoop

CLOUD PLATFORMS: Amazon AWS, Microsoft Azure

CLOUD SERVICES: Databricks, Snowflake

DATABASES AND DATA WAREHOUSES: SQL, Snowflake, MongoDB, Redshift, DynamoDB, Cassandra, HBase

PROGRAMMING LANGUAGES: Spark, Spark Streaming, Java, Python, Scala, PySpark, Django, Flask, .NET Core

SCRIPTING: Shell Scripting, Python

CONTINUOUS INTEGRATION (CI-CD): Jenkins, GitHub, Bitbucket, Jira

FILE FORMAT AND COMPRESSION: CSV, JSON, Avro, Parquet, ORC

FILE SYSTEMS: HDFS, Data Lake, S3

ETL TOOLS: Apache Nifi, Flume, Kafka, Talend, Sqoop, Oozie

DATA VISUALIZATION TOOLS: Tableau, Kibana, Python

SECURITY: Kerberos, Ranger

AWS: AWS Lambda, AWS S3, AWS RDS, AWS EMR, AWS Redshift, AWS Kinesis, AWS ELK, AWS CloudFormation, AWS IAM

PROFESSIONAL EXPERIENCE

SR. BIG DATA ENGINEER

Confidential - San Francisco, CA

Responsibilities:

Designs and implements monitoring systems that raise failure and warning alerts for various components, using serverless, Python, and AWS Lambda, among other tools.
Liaises with product users to understand their needs and …
Designs and implements ETL-based solutions, from the data ingestion stage to the presentation of useful information to users.
Creates and maintains objects used to manage data.
Creates front-end interfaces and applications that present information to users.
Writes SQL and Spark queries for data transformations.
Works with other developers on pipeline hardening to strengthen and improve existing resources.
Resolves issues raised by users of various resources on the platform.
Maintains and supports general ETL processes and data pipelines.
Communicates with the team on the steps and resources needed for development.
Helps maintain data quality within the pipelines and automates data integrity checks.
Assists in knowledge transfer and training of the new team on site.
Debugs code and improves the codebase.
Participates in pull request (PR) reviews.
Trains new team members.
Writes documentation for various features and processes used in the project.
Leads and participates in whiteboarding sessions on ideas for new features or improvements to existing ones.

CLOUD DATA ENGINEER

Confidential - San Jose, CA

Responsibilities:

Populated an AWS S3 data lake with data from various sources using AWS Kinesis.
Used Amazon EMR to process big data with tools such as Hadoop, Spark, and Hive.
Authored AWS Lambda functions to run Python scripts in response to events in S3.
Created AWS CloudFormation templates to create infrastructure in the cloud.
Created and implemented AWS IAM user roles and policies to authenticate and control access.
Implemented optimizations in Spark nodes and improved the performance of the Spark Cluster.
Processed multiple terabytes of data stored in S3 using AWS Redshift and AWS Athena.
Designed, built, and maintained a database to analyze the life cycle of checking transactions.
Developed ETL jobs in AWS Glue to extract data from S3 buckets and load it into the data mart in Amazon Redshift.
Implemented and maintained an EMR and Redshift pipeline for data warehousing.
Used Spark, Spark SQL, and Spark Streaming for data analysis and processing.
Worked under an agile methodology and contributed to the creation of user stories.
Drove conversations with non-technical stakeholders and worked closely with an offshore team.

BIG DATA ENGINEER

Confidential - Los Angeles, CA

Responsibilities:

Implemented Spark jobs and used Spark SQL for optimized analysis and data processing.
Created Spark Streaming jobs to process real-time data via Kafka and store it in HDFS.
Loaded data from multiple AWS services to AWS S3 buckets and configured bucket permissions using IAM roles.
Used Apache Kafka to ingest and process data in real-time.
Implemented partitioning, dynamic partitions, and bucketing in Hive for performance optimization.
Maintained the Hadoop cluster and performed log file analysis for error handling and access statistics for fine-tuning.
Optimized data storage on the Kafka brokers within the Kafka cluster by partitioning Kafka topics.
Authored distributed, well-documented, and testable code in Python and PySpark.
Created complex SQL queries for data aggregation and analysis.
Designed fine-tuning techniques for Spark jobs running in AWS EMR clusters.
Implemented EMR auto-scaling policies to improve the performance of the queries in the cluster.
Created Hive external tables from Parquet files stored in the data lake in S3.
Attended daily Scrum calls and contributed to the creation of user stories.

BIG DATA ENGINEER

Confidential - Hartford, CT

Responsibilities:

Ingested data from multiple sources into the HDFS data lake.
Created and analyzed Python code for unit testing and data validation.
Pre-processed data to make it available and reliable for the end-users.
Built Spark code in Python to import data from Parquet files and various other database engines.
Created Spark jobs and HiveQL queries to pull data from the database and manipulate it.
Developed Python scripts for data ingestion and performed the ETL and processing phases using Apache Hive, Spark (PySpark), and Spark SQL scripting.
Performed data processing operations such as joins, aggregations, and map-reduce using Spark in Python.
Authored Spark scripts for data processing and the creation of automated reports.
Configured a full Kafka cluster.
Created and managed topics in Kafka.
Configured the replication factor for topic partitions.
Managed consumer groups in Kafka.
Ingested data from Spark into HBase.

HADOOP DEVELOPER

Confidential - Chicago, IL

Responsibilities:

Cleansed and preprocessed data by implementing MapReduce jobs in a multi-node Hadoop cluster.
Performed aggregation functions with SQL.
Migrated MapReduce applications to Spark to improve performance.
Benchmarked Cassandra against HBase for fast ingestion.
Processed terabytes of data in real time using Spark Streaming.
Loaded data from legacy warehouses onto HDFS using Sqoop.
Built data pipeline using MapReduce scripts and Hadoop commands to store onto HDFS.
Used Oozie to orchestrate the MapReduce jobs so that data was extracted on schedule.
Created ELT jobs using Hive, Spark, and Pig to store data in HDFS.
Built graphs and plots using Python libraries such as pyplot to visualize data.
Performed analytics and provided recommendations for the business using Hive and Python scripts for ETL processing.
Developed and deployed spark-submit commands with suitable executor, core, and driver settings on the cluster.
Applied transformations such as filters and aggregations to the data using Spark in Java.

DATABASE ADMINISTRATOR

Confidential - Redmond, WA

Responsibilities:

Created and maintained databases in SQL Server 2010.
Designed and established SQL applications.
Created tables and views in the SQL database.
Supported schema changes and maintained the database to perform in optimal conditions.
Created and managed tables, views, user permissions, and access control.
Sent requests to a source REST-based API from a Scala script via a Kafka producer.
Utilized a cluster of multiple Kafka brokers to handle replication needs and allow for fault tolerance.
Wrote Hive queries in HiveQL to analyze data in the Hive warehouse.
Built the Hive views on top of the source data tables, and built a secured provisioning
Created and managed dynamic web parts.
Customized library attributes, imported and exported existing data, and managed data connections.
Provided a workflow and initiated the workflow processes.
Worked on SharePoint Designer and InfoPath Designer and developed workflows and forms.
