Sr. Data Engineer Resume Los Angeles, CA - Hire IT People

SUMMARY

More than 8 years of experience in the data engineering field.
Worked on data ingestion, storage, querying, processing, and analysis of big data, with hands-on experience in Hadoop ecosystem development including MapReduce, HDFS, Hive, Pig, Spark, Sqoop, Flume, and AWS.
Proficient with the Apache Spark ecosystem, including Spark development in Python (PySpark).
Experience in troubleshooting errors in Hive and MapReduce.
Hands-on experience with Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, Hive, and MapReduce concepts.
Experience importing and exporting data between relational database systems and HDFS using Sqoop.
In-depth understanding of HDFS high availability and YARN architecture, with a good understanding of workload management, scalability, and distributed platform architectures.
Well versed in job workflow scheduling and monitoring tools such as Oozie.
Extensive working experience with Amazon Web Services (AWS), using S3 for storage and EC2 for compute.
Experience with source control repositories such as SVN and Git.
Knowledge of data warehousing and ETL tools such as Informatica.
Worked on data ingestion from various sources, such as SQL Server, into HDFS using Sqoop.
Extensive experience with Spark, including RDD transformations and Spark SQL, and good exposure to core Spark concepts.
Strong knowledge of Unix/Linux shell commands.
Developed Unix shell scripts to automate the execution of HQL files and the transfer of files to the client server.
Knowledge of and working experience with Agile and Waterfall methodologies.
Supported development, testing, and operations teams during new system deployments.

TECHNICAL SKILLS

Big Data Frameworks: Hadoop (HDFS, MapReduce), Spark, Spark SQL, Hive, Impala, Sqoop, Oozie

Big Data Distributions: Cloudera, Hortonworks, Amazon EMR

Programming languages: Python, Java, Shell scripting

Operating Systems: Windows, Linux, Mac OS

Databases: Oracle, SQL Server, MySQL

Designing Tools: UML, Visio

IDEs: Eclipse, NetBeans

Web Technologies: XML, HTML, JavaScript, jQuery, JSON

Linux Experience: System Administration Tools, Puppet

Development methodologies: Agile, Waterfall

Version Control Tools: Git, CVS

Others: PuTTY, WinSCP, Data Lake, Talend, AWS

PROFESSIONAL EXPERIENCE

Sr. Data Engineer

Confidential, Los Angeles, CA

Responsibilities:

Performed data pre-processing, wrangling, and processing using various Spark transformations, actions, and built-in functions in Python and Scala.
Created Spark SQL queries as part of a data processing framework.
Used Spark SQL to handle structured data and load it into Hive tables.
Loaded JSON data with Spark SQL, created RDDs from it, and loaded the results into Hive tables.
Created DataFrames from existing Hive tables using Spark SQL with Python as part of the daily load.
Performed sort, join, aggregation, filter, and other transformations on datasets using Spark.
Handled large datasets during the ingestion process itself using partitioning, Spark's in-memory capabilities, efficient joins, and other transformations.
Used RDD and DataFrame techniques in PySpark to process data faster.
Involved in performance tuning of Spark applications, including setting the right batch interval time and tuning memory.
Migrated existing applications and developed new applications using AWS cloud services. Developed Python scripts to retrieve recent S3 keys from Elasticsearch.
Developed batch scripts to fetch data from AWS S3 storage and perform the required transformations.
Implemented Spark SQL with various data sources such as JSON, Parquet, ORC, and Hive.
Applied various transformations on DataFrames and stored the results in ORC file format using append mode.
Worked in an Agile/Scrum environment.

Data Engineer

Confidential, Long Island City, NY

Responsibilities:

Converted Hive/SQL queries into Spark transformations using Spark RDDs.
Used Spark's in-memory computing capabilities to perform advanced processing.
Developed Spark code using Python and Spark SQL for faster data processing.
Developed Oozie workflows to automate loading data into HDFS.
Involved in performance tuning of Spark applications.
Developed Spark scripts and Python functions that perform transformations and actions on datasets.

Hadoop Developer

Confidential, NY

Responsibilities:

Analyzed the design and mapping documents.
Used Sqoop to ingest data from DB2, Teradata, and SQL Server into the Hadoop layer, and loaded the data into partitioned Hive tables based on requirements.
Automated the existing Sqoop jobs using shell scripting and scheduled them in AutoSys.
Performed root cause analysis on failed jobs and, based on the findings, decided whether to restart, hold, or kill them.
Developed a shell script automation process for count, column, null-condition, and schema validation across multiple Hive tables.
Developed a shell-script-based data validation and default-value process that checks for null, space, and blank values in different Hive tables for both nullable and non-nullable columns.
Performed data cleansing activities for HDFS and local paths, and scheduled an automated process to purge old data every 15 days.
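The null, space, and blank checks described above were implemented as shell scripts against Hive tables. As a rough illustration only, the Python sketch below applies the same rule to in-memory rows: a value that is NULL, empty, or whitespace-only counts as missing, is replaced with a column default when the column is nullable, and is reported as a violation when it is not. The function names, the `(nullable, default)` schema layout, and the sample data are hypothetical and are not taken from the original process.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical shapes for illustration: a row maps column -> value,
# and a schema maps column -> (nullable, default_value).
Row = Dict[str, Optional[str]]
Schema = Dict[str, Tuple[bool, Optional[str]]]


def is_missing(value: Optional[str]) -> bool:
    """Treat NULL (None), empty strings, and whitespace-only strings as missing."""
    return value is None or value.strip() == ""


def validate_and_default(rows: List[Row], schema: Schema) -> Tuple[List[Row], List[Tuple[int, str]]]:
    """Fill defaults for missing nullable values; report missing NOT NULL values.

    Returns the cleaned rows and a list of (row_index, column) violations.
    """
    cleaned: List[Row] = []
    violations: List[Tuple[int, str]] = []
    for index, row in enumerate(rows):
        new_row = dict(row)
        for column, (nullable, default) in schema.items():
            if not is_missing(row.get(column)):
                continue
            if nullable:
                # Nullable column: substitute the configured default value.
                new_row[column] = default
            else:
                # Non-nullable column: keep the row unchanged but record the violation.
                violations.append((index, column))
        cleaned.append(new_row)
    return cleaned, violations


def counts_match(source_rows: List[Row], target_rows: List[Row]) -> bool:
    """Simple count validation between a source and a target table."""
    return len(source_rows) == len(target_rows)


# Usage with hypothetical sample data.
schema: Schema = {"id": (False, None), "city": (True, "UNKNOWN")}
rows: List[Row] = [{"id": "1", "city": " "}, {"id": "", "city": "LA"}, {"id": "3", "city": None}]
cleaned, violations = validate_and_default(rows, schema)
# cleaned    -> [{'id': '1', 'city': 'UNKNOWN'}, {'id': '', 'city': 'LA'}, {'id': '3', 'city': 'UNKNOWN'}]
# violations -> [(1, 'id')]
```

Reporting violations instead of dropping rows keeps the row counts equal, so a count check such as `counts_match` can still run after the default-value step.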

ETL Developer

Confidential

Responsibilities:

Designed and developed Informatica mappings and sessions based on business user requirements and business rules to load data from source flat files and Oracle tables into target tables.
Created mappings using transformations such as Source Qualifier, SQL, Aggregator, Expression, Lookup, Router, Filter, Update Strategy, Joiner, and Stored Procedure.
Instrumental in performance tuning of mappings and sessions at the database and Informatica levels to improve ETL load times.
Created database objects such as tables, indexes, stored procedures, database triggers, and views.
Extensively used partitioning to improve session performance in Informatica.
Wrote UNIX scripts to invoke Informatica workflows and sessions.
Wrote complex SQL scripts to replace Informatica lookups and improve performance, as the data volume was heavy.
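One common way to avoid a per-row Lookup transformation is to resolve the lookup inside the source SQL with an outer join, so the database does the matching in a single set-based query. The sketch below shows the idea with Python's built-in `sqlite3` module; the `orders` and `customers` tables, their columns, and the `'UNKNOWN'` fallback are hypothetical stand-ins, not the tables used in the original scripts.

```python
import sqlite3
from typing import List, Tuple


def load_with_join_lookup() -> List[Tuple[int, float, str]]:
    """Resolve a region 'lookup' with a LEFT JOIN instead of a per-row lookup."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical source and lookup tables with sample rows.
        conn.executescript(
            """
            CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
            CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
            INSERT INTO orders VALUES (1, 10, 5.0), (2, 20, 7.5), (3, 99, 1.0);
            INSERT INTO customers VALUES (10, 'WEST'), (20, 'EAST');
            """
        )
        # LEFT JOIN keeps unmatched orders (like a lookup that returns NULL on no match);
        # COALESCE then supplies a default value for those rows.
        query = """
            SELECT o.order_id, o.amount, COALESCE(c.region, 'UNKNOWN') AS region
            FROM orders AS o
            LEFT JOIN customers AS c ON o.customer_id = c.customer_id
            ORDER BY o.order_id
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


rows = load_with_join_lookup()
# rows -> [(1, 5.0, 'WEST'), (2, 7.5, 'EAST'), (3, 1.0, 'UNKNOWN')]
```

This substitution is only equivalent when the lookup key is unique in the lookup table (here `customer_id` is a primary key); with duplicate keys the join returns one row per match, so the source query would need to deduplicate first.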
