Sr. Data Engineer Resume

Los Angeles, CA

SUMMARY

  • More than 8 years of experience in the data engineering field.
  • Worked on ingestion, storage, querying, processing, and analysis of big data, with hands-on experience in Hadoop ecosystem development, including MapReduce, HDFS, Hive, Pig, Spark, Sqoop, Flume, and AWS.
  • Proficient with the Apache Spark ecosystem, including Spark with Python.
  • Experience in troubleshooting errors in Hive and MapReduce.
  • Hands-on experience with Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and Hive.
  • Good understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and Secondary NameNode, as well as MapReduce concepts.
  • Experience in importing and exporting data using Sqoop from relational database systems to HDFS and vice versa.
  • In-depth understanding of Hadoop architecture and its components, such as HDFS, MapReduce, high availability, and YARN, along with a good understanding of workload management, scalability, and distributed platform architectures.
  • Well versed in job workflow scheduling and monitoring tools such as Oozie.
  • Intensive working experience with Amazon Web Services (AWS), using S3 for storage and EC2 for computing.
  • Experience with source control systems such as SVN and Git.
  • Knowledge of data warehousing and ETL tools such as Informatica.
  • Worked on data ingestion using Sqoop from various sources, such as SQL Server, to HDFS.
  • Extensive experience working with Spark features such as RDD transformations and Spark SQL.
  • Good exposure to Spark concepts.
  • Strong knowledge of Unix/Linux shell commands.
  • Developed Unix shell scripts to automate executing HQL files and transferring the files to the client server.
  • Knowledge of and working experience with Agile and Waterfall methodologies.
  • Supported development, testing, and operations teams during new system deployments.

TECHNICAL SKILLS

Big Data Frameworks: Hadoop (HDFS, MapReduce), Spark, Spark SQL, Hive, Impala, Sqoop, Oozie

Big Data Distributions: Cloudera, Hortonworks, Amazon EMR

Programming Languages: Python, Java, Shell scripting

Operating Systems: Windows, Linux, Mac OS

Databases: Oracle, SQL Server, MySQL

Designing Tools: UML, Visio

IDEs: Eclipse, NetBeans

Web Technologies: XML, HTML, JavaScript, jQuery, JSON

Linux Experience: System Administration Tools, Puppet

Development Methodologies: Agile, Waterfall

Version Control Tools: Git, CVS

Others: PuTTY, WinSCP, Data Lake, Talend, AWS

PROFESSIONAL EXPERIENCE

Sr. Data Engineer

Confidential, Los Angeles, CA

Responsibilities:

  • Performed data pre-processing, wrangling, and processing using various transformations, actions, and built-in functions in Spark with Python and Scala.
  • Created Spark SQL queries as part of a processing framework.
  • Used Spark SQL to handle structured data and load it into Hive tables.
  • Created RDDs using Spark SQL to load JSON data and loaded the data into Hive tables.
  • Created DataFrames using Spark SQL with Python from existing Hive tables as part of the daily load.
  • Responsible for performing sort, join, aggregation, filter, and other transformations on datasets using Spark.
  • Handled large datasets using partitioning, Spark's in-memory capabilities, efficient joins, and other transformations during the ingestion process itself.
  • Worked with RDD and DataFrame techniques in PySpark to process data faster.
  • Involved in performance tuning of Spark applications, including setting the right batch interval and tuning memory.
  • Migrated existing applications and developed new applications using AWS cloud services. Developed Python scripts to get the recent S3 keys from Elasticsearch.
  • Developed batch scripts to fetch data from AWS S3 storage and perform the required transformations.
  • Implemented Spark SQL with various data sources such as JSON, Parquet, ORC, and Hive.
  • Applied different transformation techniques to DataFrames and stored the results in ORC file format using append mode.
  • Worked in an Agile/Scrum methodology.

Data Engineer

Confidential, Long Island City, NY

Responsibilities:

  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
  • Used the in-memory computing capabilities of Spark to perform advanced procedures.
  • Developed Spark code using Python and Spark SQL for faster data processing.
  • Developed workflows in Oozie to automate the tasks of loading data into HDFS.
  • Involved in performance tuning of Spark applications.
  • Developed Spark scripts and Python functions that perform transformations and actions on datasets.

Hadoop Developer

Confidential, NY

Responsibilities:

  • Analyzed and understood the design document and mapping document.
  • Used Sqoop to ingest data from DB2, Teradata, and SQL Server into the Hadoop layer. Loaded the data into partitioned Hive tables based on requirements.
  • Using shell scripting, automated the existing Sqoop jobs and scheduled them in AutoSys.
  • Performed root cause analysis on failed jobs and, based on the findings, decided whether to restart, hold, or kill the jobs.
  • Developed a shell script automation process for count validation, column checks, null-condition checking, and schema validation between multiple Hive tables.
  • Developed a shell script process for data validation and default values that checks for null, space, and blank values in both nullable and non-nullable columns of different Hive tables.
  • Performed data cleansing activities for HDFS and local paths. Scheduled an automated process to purge old data every 15 days.

ETL Developer

Confidential

Responsibilities:

  • Designed and developed Informatica mappings and sessions based on business user requirements and business rules to load data from source flat files and Oracle tables to target tables.
  • Created mappings using transformations such as Source Qualifier, SQL, Aggregator, Expression, Lookup, Router, Filter, Update Strategy, Joiner, and Stored Procedure.
  • Instrumental in performance tuning of mappings and sessions at the database and Informatica levels to improve ETL load times.
  • Created database objects like tables, indexes, stored procedures, database triggers, and views.
  • Extensively used partitioning concepts to improve session performance in Informatica.
  • Wrote UNIX scripts to invoke Informatica workflows and sessions.
  • Wrote complex SQL scripts to avoid Informatica lookups and improve performance, as the data volume was heavy.
