Cloud Data Engineer Resume
CA
SUMMARY:
- Experienced, results-oriented, resourceful, problem-solving data engineer with leadership skills; adapts to and meets the challenges of tight release dates.
- 9+ years of diverse experience in the Information Technology field, including development and implementation of various applications in big data environments.
TECHNICAL SKILLS:
Database: Oracle, MySQL
Operating systems: Windows, MacOS, Linux
Networking: Cisco, IPv6
Presentation software: PowerPoint
Coding: Java, C++, Bash, JUnit
Agile/Scrum collaboration tools: GitHub, MS Teams, Gantt charts
PROFESSIONAL EXPERIENCE:
Cloud Data Engineer
Confidential, MA
Responsibilities:
- Experienced in automating, configuring, and deploying instances on AWS and Azure cloud environments and in data centers; also familiar with EC2, CloudWatch, CloudFormation, and managing security groups on AWS.
- Created automated Python scripts to convert data from different sources and to generate the ETL pipelines.
- Populated an AWS S3 data lake from various data sources using AWS Kinesis.
- Experienced in migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
- Extensively used Hive optimization techniques such as partitioning, bucketing, map joins, and parallel execution.
- Experienced in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
- Implemented solutions for ingesting data from various sources and processing data at rest using big data technologies such as Hadoop, MapReduce frameworks, HBase, and Hive.
- Extensive experience with Apache Hudi datasets using insert and bulk insert operations.
- Designed, developed, and maintained software installation and maintenance shell scripts.
- Designed data extraction from different databases and scheduled Oozie workflows to execute daily tasks.
- Developed distributed query agents for running distributed queries against Hive.
- Loaded data from different sources such as HDFS and HBase into Spark DataFrames and implemented in-memory data computation to generate the output response.
- Monitored resources such as Amazon databases, CPU, and memory using CloudWatch.
- Worked on a Hadoop cluster (CDH) and reviewed the log files of all daemons.
- Used Spark SQL to obtain faster results than Hive during data analysis.
- Created Hive external tables and designed data models in Hive.
- Developed multiple Spark Streaming and batch Spark jobs using Python.
Hadoop Engineer
Confidential, CA
Responsibilities:
- Installed and configured a full Kafka cluster, including topics and replicas.
- Created and managed Kafka topics.
- Configured topic partitions and replication factors.
- Managed Kafka consumer groups.
- Wrote Python scripts to receive requests from REST-based APIs through the Requests library and forward them to a Kafka producer.
- Loaded data from multiple AWS services into AWS S3 buckets and configured bucket permissions using IAM roles.
- Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
- Worked on data streaming from various OLTP systems.
- Performed ETL to Hadoop file system (HDFS) and wrote Hive UDFs.
- Ingested data from Spark DataFrames into HBase.
- Performed aggregations and window functions with SQL.
- Migrated MapReduce applications to Spark to improve performance.
- Created a benchmark comparing Cassandra and HBase for fast ingestion.
- Processed terabytes of data in real time using Spark Streaming.
- Conducted and managed code reviews.
- Collaborated on sprint planning and sprint grooming.
- Developed unit tests to verify the functionality of Spark applications.
- Developed scripts for collecting high-frequency log data from various sources and integrating it into HDFS using Flume; staging data in HDFS for further analysis.
- Wrote producer/consumer scripts in Python to process JSON responses.
- Wrote streaming applications with Spark Streaming and Kafka.
- Developed JDBC/ODBC connectors between Hive and Spark to transfer the newly populated DataFrames from MSSQL.
- Built Hive views on top of source data tables.
- Built and provisioned a secured Hive Metastore.
- Involved in loading data from the UNIX file system into HDFS.
Data Engineer
Confidential, CA
Responsibilities:
- Participated in planning meetings and assisted with documentation and communication.
- Moved some on-premises data repositories to the cloud on Amazon AWS to take advantage of reduced cost and greater scalability.
- Implemented all SCD types using server and parallel jobs. Extensively applied error handling, testing, debugging, and performance tuning to targets, sources, and transformation logic, and used version control to promote the jobs.
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data.
- Involved in loading data from UNIX file system to HDFS.
- Developed ETL processes in PL/SQL to pull data from various sources and transform it for reporting applications.
- Hands-on experience extracting data from different databases and scheduling Oozie workflows to execute the task daily.
- Used Sqoop to efficiently transfer data between relational databases and HDFS, and used Flume to stream log data from servers. Successfully loaded files into Hive and HDFS from Oracle and SQL Server using Sqoop.
- Captured data and imported it into HDFS using Flume and Kafka for semi-structured data and Sqoop for existing relational databases.
- Used ZooKeeper to provide coordination services to the cluster.
- Used the Oozie workflow scheduler to automate the pipeline and execute jobs in a timely manner.
Junior Data Engineer
Confidential - Charlotte, NC
Responsibilities:
- Administered a Hadoop cluster (CDH) and reviewed the log files of all daemons.
- Involved in scheduling the Oozie workflow engine to run multiple Hive, Sqoop, and Pig jobs.
- Implemented workflows using Apache Oozie framework to automate tasks.
- Used ZooKeeper for centralized configuration, Git for version control, and Maven as a build tool for deploying the code.
- Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig and Hive.
- Configured the Fair Scheduler to allocate resources to all applications across the cluster.
- Performed maintenance, monitoring, deployments, and upgrades across the infrastructure that supports all Hadoop clusters.
- Used ZooKeeper and Oozie for coordinating the cluster and scheduling workflows.
- Managed jobs using Fair Scheduler to allocate processing resources.
- Developed job-processing scripts using Oozie workflows to run multiple Spark jobs in sequence for processing data.
- Configured ZooKeeper to coordinate the servers in clusters, maintain data consistency, and monitor services.
- Automated all jobs that pull data from an FTP server and load it into Hive tables using Oozie workflows.
- Used Oozie workflows and coordinators to integrate MapReduce workflows, including Java REST service consumption and MongoDB/Neo4j ingress, and to schedule the data flow pipeline.
