Cloud Data Engineer Resume CA - Hire IT People

SUMMARY:

Experienced, results-oriented, resourceful, and problem-solving Data Engineer with leadership skills. Adapted to and met the challenges of tight release dates. 9+ years of diverse experience in the Information Technology field, including development and implementation of various applications in big data environments.

TECHNICAL SKILLS:

Database: Oracle, MySQL

Operating systems: Windows, MacOS, Linux

Networking: Cisco, IPv6

Presentation software: PowerPoint

Coding: Java, C++, Bash, JUnit

Agile/Scrum collaboration tools: GitHub, MS Teams, Gantt charts

PROFESSIONAL EXPERIENCE:

Cloud Data Engineer

Confidential, MA

Responsibilities:

Experienced in automating, configuring, and deploying instances on AWS and Azure cloud environments and in data centers; familiar with EC2, CloudWatch, CloudFormation, and managing security groups on AWS.
Created automated Python scripts to convert data from different sources and to generate ETL pipelines.
Populated an AWS S3 data lake with data from various sources using AWS Kinesis.
Migrated SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlled and granted database access; and migrated on-premises databases to Azure Data Lake Store using Azure Data Factory.
Extensively used Hive optimization techniques like partitioning, bucketing, Map Join, and parallel execution.
Developed Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
Implemented solutions for ingesting data from various sources and processing data at rest using big data technologies such as Hadoop, MapReduce, HBase, and Hive.
Extensive experience with Apache Hudi datasets using insert and bulk insert operations.
Designed, developed, and maintained software installation and maintenance shell scripts.
Designed data extraction from different databases and scheduled Oozie workflows to execute daily tasks.
Developed distributed query agents for performing distributed queries against Hive
Loaded data from different sources such as HDFS or HBase into Spark DataFrames and implemented in-memory data computation to generate the output response.
Monitored resources, such as Amazon DB, CPU, and memory, using CloudWatch.
Collaborated on a Hadoop cluster (CDH) and reviewed log files of all daemons.
Used Spark SQL to achieve faster results than Hive during data analysis.
Created Hive external tables and designed data models in Hive
Developed multiple Spark Streaming and batch Spark jobs using Python

Hadoop Engineer

Confidential, CA

Responsibilities:

Installed and configured a full Kafka cluster, including topics and replicas
Created and managed topics in Kafka
Configured topic partitions and replication factors
Managed consumer groups and their communication over Kafka
Wrote Python scripts to receive requests from REST-based APIs through the Requests library and pass them to a Kafka producer.
Loaded data from multiple AWS services to AWS S3 buckets and configured bucket permissions using IAM roles
Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns
Worked on data streaming from various OLTP systems.
Performed ETL into the Hadoop Distributed File System (HDFS) and wrote Hive UDFs.
Ingested data from Spark DataFrames into HBase
Performed aggregation and windowing functions with SQL
Migrated MapReduce applications to Spark to improve performance
Benchmarked Cassandra against HBase for fast ingestion
Processed terabytes of data in real time using Spark Streaming
Created and managed code reviews
Collaborated on sprint planning and sprint grooming
Developed unit tests to verify the functionality of Spark applications
Developed scripts for collecting high-frequency log data from various sources and integrating it into HDFS using Flume; staging data in HDFS for further analysis.
Wrote producer/consumer scripts in Python to process JSON responses
Wrote streaming applications with Spark Streaming and Kafka.
Developed JDBC/ODBC connectors between Hive and Spark to transfer the newly populated DataFrames from MSSQL
Built Hive views on top of source data tables
Built and provisioned a secured Hive Metastore
Involved in loading data from the UNIX file system to HDFS.

Data Engineer

Confidential, CA

Responsibilities:

Participated in planning meetings and assisted with documentation and communication.
Moved on-premises data repositories to the AWS cloud to take advantage of reduced cost and greater scalability.
Implemented all slowly changing dimension (SCD) types using server and parallel jobs. Extensively applied error handling, testing, debugging, and performance tuning of targets, sources, and transformation logic, and used version control to promote the jobs.
Involved in loading and transforming large sets of structured, semi-structured and unstructured data.
Involved in loading data from the UNIX file system to HDFS.
Developed ETLs to pull data from various sources and transform it for reporting applications using PL/SQL
Hands-on experience extracting data from different databases and scheduling Oozie workflows to execute the tasks daily.
Used Sqoop to efficiently transfer data between relational databases and HDFS, and used Flume to stream log data from servers. Successfully loaded files into Hive and HDFS from Oracle and SQL Server using Sqoop.
Captured data and imported it into HDFS using Flume and Kafka for semi-structured data and Sqoop for existing relational databases.
Used ZooKeeper to provide coordination services to the cluster.
Used the Oozie scheduler to automate the pipeline workflow and execute jobs in a timely manner.

Junior Data Engineer

Confidential - Charlotte, NC

Responsibilities:

Administered a Hadoop cluster (CDH) and reviewed log files of all daemons.
Involved in scheduling Oozie workflow engine to run multiple Hive, Sqoop and Pig jobs.
Implemented workflows using Apache Oozie framework to automate tasks.
Used ZooKeeper for various types of centralized configuration, Git for version control, and Maven as a build tool for deploying the code.
Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig and Hive.
Configured the Fair Scheduler to allocate resources to all the applications across the cluster.
Performed maintenance, monitoring, deployments, and upgrades across the infrastructure that supports all Hadoop clusters.
Used ZooKeeper and Oozie for coordinating the cluster and scheduling workflows.
Managed jobs using Fair Scheduler to allocate processing resources.
Developed job processing scripts using Oozie workflow to run multiple Spark Jobs in sequence for processing data
Configured ZooKeeper to coordinate the servers in the cluster, maintain data consistency, and monitor services.
Automated all the jobs that pull data from an FTP server and load it into Hive tables, using Oozie workflows.
Used Oozie workflows and coordinators to integrate MapReduce workflows, including Java REST service consumption and MongoDB/Neo4j ingress, and to schedule the data flow pipeline.
