- Over 5 years of overall IT experience, including 2+ years in Hadoop (Cloudera Distribution CDH 4 and 5) on a 30-node cluster.
- Worked with datasets of over 60 TB.
- Extensive experience in HDFS, Sqoop, Flume, Hive, Pig, Spark, Oozie, Impala.
- Good understanding and working experience on Hadoop Distributions like Cloudera and Hortonworks.
- Experience in importing and exporting multi-terabyte volumes of data with Sqoop between relational database management systems and HDFS.
- Experience in using HiveQL to query and analyze large datasets.
- Experience in writing simple to complex Pig scripts for processing and analyzing large volumes of data.
- Querying both managed and external tables created in Hive using Impala.
- Extensive experience with big data ETL and query tools such as HiveQL and Pig Latin.
- Experience in loading logs from multiple sources into HDFS using Flume.
- Experience with the Oozie workflow engine, running workflows with Sqoop, Pig, and Hive actions.
- Experience in using the Spark API over MapReduce to perform faster analytics on data.
- Experience in creating Resilient Distributed Datasets (RDDs) from input data and writing data transformations using PySpark.
- Experienced with Spark processing components such as Spark SQL.
- Experience in Data Warehousing and ETL processes.
- Experience in processing large datasets in different forms: structured, semi-structured, and unstructured data.
- Experience in working with different file formats such as Avro, Parquet, ORC, SequenceFile, and JSON.
- Background with traditional relational databases such as MySQL and SQL Server.
- Good analytical, interpersonal, communication, and problem-solving skills, with the ability to quickly master new concepts and to work in a group as well as independently.
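The Sqoop import/export work above can be sketched as follows; the connection string, credentials, table names, and paths are hypothetical placeholders, not actual project details.

```shell
# Import a MySQL table into HDFS as Parquet (hypothetical host/db/table)
sqoop import \
  --connect jdbc:mysql://db-host:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders \
  --as-parquetfile \
  --num-mappers 4

# Export analyzed results from HDFS back to MySQL
sqoop export \
  --connect jdbc:mysql://db-host:3306/sales \
  --username etl_user -P \
  --table order_summary \
  --export-dir /data/out/order_summary
```

`--num-mappers` controls the number of parallel map tasks, which is the usual lever for moving multi-terabyte volumes efficiently.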
Hadoop Distribution: Cloudera, Hortonworks
Big Data Ecosystem: HDFS, Sqoop, Flume, Hive, Pig, Impala, Oozie, Spark
Databases: MySQL, MS SQL Server
NoSQL / Storage: HBase, AWS Redshift, S3, EMR
Languages: Java, Python
Operating System: Windows XP/7/8/10, Linux, Mac OS
Confidential, Walnut Creek, CA
- Worked on Cloudera CDH 5.4 distribution of Hadoop.
- Worked extensively with MySQL to identify required tables and views to export into HDFS.
- Responsible for moving data from MySQL into HDFS on the development cluster for validation and cleansing.
- Responsible for creating Hive tables on top of HDFS data and developing Hive queries to analyze it.
- Developed Hive tables on data using different SerDes, storage formats, and compression techniques.
- Optimized the data sets by creating Dynamic Partition and Bucketing in Hive.
- Used Pig Latin to analyze datasets and perform transformation according to requirements.
- Implemented custom Hive UDFs for comprehensive data analysis.
- Involved in loading data from local file systems to Hadoop Distributed File System.
- Experience working with Spark SQL and creating RDDs using PySpark.
- Extensive experience performing ETL on large HDFS datasets using PySpark.
- Developed ETL workflow which pushes web server logs to an Amazon S3 bucket.
- Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Sqoop scripts, Pig scripts, and Hive queries.
- Exported data from HDFS into an RDBMS using Sqoop.
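The dynamic partitioning and bucketing mentioned above follow the standard Hive pattern; a minimal sketch, with table and column names that are illustrative only:

```sql
-- Enable dynamic-partition inserts
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Partitioned, bucketed table stored as Parquet with Snappy compression
CREATE TABLE orders_part (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DOUBLE
)
PARTITIONED BY (order_date STRING)
CLUSTERED BY (customer_id) INTO 32 BUCKETS
STORED AS PARQUET
TBLPROPERTIES ('parquet.compression' = 'SNAPPY');

-- Dynamic-partition load from a staging table; the partition column
-- must come last in the SELECT list
INSERT OVERWRITE TABLE orders_part PARTITION (order_date)
SELECT order_id, customer_id, amount, order_date
FROM orders_staging;
```

Partitioning prunes whole directories at query time, while bucketing on a join key enables bucketed map joins and more efficient sampling.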
Confidential, Denver, CO
- Worked on a live 30-node Hadoop cluster running CDH 4.4.
- Worked with 20 TB of highly unstructured and semi-structured data.
- Responsible for building scalable distributed data solutions using Hadoop.
- Moved data from various file systems into HDFS using UNIX command-line utilities.
- Involved in importing and exporting data between RDBMS and HDFS using Sqoop.
- Created Hive tables on top of the loaded data and wrote Hive queries for ad hoc analysis.
- Implemented Partitioning, Dynamic Partition, and Bucketing in Hive for efficient data access.
- Performed querying of both managed and external tables created by Hive using Impala.
- Developed Pig scripts for data analysis and transformation.
- Implemented Pig as an ETL tool to perform transformations, event joins, and pre-aggregations before storing the data in HDFS.
- Developed Spark code and Spark SQL for faster testing and processing of data.
- Involved in converting Hive SQL queries into Spark transformations using Spark RDDs in Python.
- Developed UNIX shell scripts to load a large number of files into HDFS from the Linux file system.
- Implemented Oozie workflow for Sqoop, Pig and Hive actions.
- Exported the analyzed data to the relational databases using Sqoop.
- Debugged results to identify any missing data in the output.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Involved in performance tuning and fixing bugs.
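The Pig-based ETL described above (filter, event join, pre-aggregation before storing to HDFS) can be sketched as follows; the paths, schemas, and field names are hypothetical:

```pig
-- Load raw click events and user records (hypothetical paths/schemas)
clicks = LOAD '/data/raw/clicks' USING PigStorage('\t')
         AS (user_id:chararray, url:chararray, ts:long);
users  = LOAD '/data/raw/users' USING PigStorage('\t')
         AS (user_id:chararray, region:chararray);

-- Filter, event join, and pre-aggregation
valid  = FILTER clicks BY user_id IS NOT NULL;
joined = JOIN valid BY user_id, users BY user_id;
by_reg = GROUP joined BY users::region;
counts = FOREACH by_reg GENERATE group AS region, COUNT(joined) AS clicks;

STORE counts INTO '/data/out/clicks_by_region' USING PigStorage('\t');
```

Pre-aggregating in Pig before the data lands in HDFS keeps downstream Hive queries over the result small and cheap.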
- Involved in database development and creating SQL scripts.
- Involved in Requirement Study, UI Design, Development, Implementation, Code Review, Validation, Testing.
- Managed database related activities.
- Designed tables and indexes.
- Wrote SQL queries to fetch business data.
- Developed views, sequences, and indexes.
- Created joins and subqueries involving multiple tables.
- Analyzed SQL data, identified issues, and modified SQL scripts to fix them.
- Involved in troubleshooting and fine-tuning databases for performance and concurrency.
- Involved in fixing bugs and various forms of testing, including black-box and white-box testing.
- Handling issues regarding database, its connectivity and maintenance.
- Managed the priorities, deadlines, and deliverables of individual projects and related issues.
- Effectively prioritized work considering business need and urgency.
- Worked effectively and efficiently on multiple tasks and deadlines, producing high-quality results.
- Involved in performance improvement of the web application for a user-friendly experience and in resolving a critical issue in the production environment.
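The join and subquery work described above can be illustrated with a minimal, self-contained example; SQLite (via Python's standard library) and the table names stand in for the actual MySQL / SQL Server schemas, which are not part of the original text.

```python
import sqlite3

# In-memory database standing in for the production RDBMS
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO orders VALUES (101, 1, 50.0), (102, 1, 75.0), (103, 2, 20.0);
""")

# Join plus a subquery: customers whose total order amount
# exceeds the overall average order amount
cur.execute("""
SELECT c.name, SUM(o.amount) AS total
FROM customers c
JOIN orders o ON o.customer_id = c.id
GROUP BY c.name
HAVING SUM(o.amount) > (SELECT AVG(amount) FROM orders)
""")
rows = cur.fetchall()
print(rows)  # Acme totals 125.0, above the 48.33 average; Globex (20.0) is filtered out
```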