
Big Data Engineer Resume


Little Falls, NJ

TECHNICAL SKILLS

  • Big Data Technologies
  • NoSQL Databases
  • Languages
  • Java & J2EE Technologies
  • Application Servers
  • Cloud Computing Tools
  • Databases
  • Build Tools
  • Business Intelligence Tools
  • Development Tools
  • Development Methodologies

PROFESSIONAL EXPERIENCE

Big Data Engineer

Confidential - Little Falls, NJ

Responsibilities:

  • Acquired, analyzed, and documented business requirements as part of the development team.
  • Developed numerous data models for data migration from Oracle DB to Cassandra.
  • Built a common data intake layer, an ETL framework serving as an end-to-end solution for data transformation using Pig, Python, and Hive.
  • Worked extensively with Scala/Spark and SQL for data cleansing, generating DataFrames and transforming them into row DataFrames to populate the aggregate tables in Cassandra.
  • Adept at developing generic Spark-Scala methods for transformations and designing row schemas.
  • Adept at writing efficient Spark-Scala code to apply aggregation functions on DataFrames according to business logic (see the Spark-to-Cassandra sketch after this list).
  • Experienced in using the DataStax Spark Cassandra Connector to store data in the Cassandra database from Spark.
  • Worked extensively with Oracle DB and developed Sqoop jobs for data ingestion into the NoSQL database Cassandra.
  • Extracted a real-time feed using Kafka and Spark Streaming, converted it to RDDs, processed the data into DataFrames, and saved the data in Parquet format in HDFS.
  • Worked on design, optimization, multi-datacenter replication, scalability, security, and monitoring of Kafka infrastructure.
  • Assisted in designing and implementing big data solutions integrated with Java applications (messaging, web services integration, stream processing).
  • Worked closely with architects on data models and coding optimizations to build a generic data transformation framework (not client specific; it works across various client implementations) using the Kafka Streams API (KStream, KTable, GlobalKTable) and the Kafka Connect API.
  • Successfully delivered a transformation framework that supports various transformations as an end-to-end solution for a data transformation project, using a topology that integrates both the Streams DSL and the Processor API (see the Kafka Streams sketch after this list).
  • Integrated the Confluent Schema Registry into the Kafka Streams project to check schema compatibility and manage Avro schemas.
  • Used Amazon EMR to process big data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
  • Coordinated with offshore team members to write and generate test scripts and test cases for numerous user stories.
  • Communicated with business and IT leadership.
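
A minimal sketch of the DataFrame aggregation and Cassandra write pattern referenced above. The table names, columns, paths, and connection host are illustrative assumptions, not client values; the real jobs applied project-specific business logic.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object AggregateToCassandra {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("AggregateToCassandra")
      // Hypothetical Cassandra host -- a placeholder, not a project value.
      .config("spark.cassandra.connection.host", "127.0.0.1")
      .getOrCreate()

    // Hypothetical source of cleansed rows with (account_id, txn_date, amount).
    val cleansed = spark.read.parquet("hdfs:///data/cleansed/transactions")

    // Apply the aggregation logic: daily totals and counts per account.
    val aggregated = cleansed
      .groupBy(col("account_id"), col("txn_date"))
      .agg(sum("amount").as("total_amount"), count("*").as("txn_count"))

    // Write the aggregate DataFrame to Cassandra via the DataStax Spark Cassandra Connector.
    aggregated.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "analytics", "table" -> "daily_account_totals"))
      .mode("append")
      .save()
  }
}
```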
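
A minimal Kafka Streams sketch of the kind of topology described above, joining a KStream with a KTable. Topic names and the join logic are illustrative assumptions; in the real framework, Avro serdes backed by the Confluent Schema Registry would replace the plain String serdes used here for brevity.

```scala
import java.util.Properties

import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.streams.kstream.{KStream, KTable, ValueJoiner}
import org.apache.kafka.streams.{KafkaStreams, StreamsBuilder, StreamsConfig}

object TransformTopology {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // Placeholder application id and broker list -- not project values.
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "generic-transform")
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092")
    // String serdes for brevity; the real framework would use Schema Registry-backed Avro serdes.
    props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass)
    props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass)

    val builder = new StreamsBuilder()

    // Raw records arrive on a KStream; reference data is materialized as a KTable.
    val raw: KStream[String, String] = builder.stream[String, String]("raw-input")
    val ref: KTable[String, String] = builder.table[String, String]("reference-data")

    // Enrich each record with the reference value for its key, then publish downstream.
    val joiner = new ValueJoiner[String, String, String] {
      override def apply(value: String, refValue: String): String = s"$value|$refValue"
    }
    val enriched: KStream[String, String] = raw.join(ref, joiner)
    enriched.to("transformed-output")

    val streams = new KafkaStreams(builder.build(), props)
    streams.start()
    sys.addShutdownHook(streams.close())
  }
}
```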

Big Data Developer

Confidential - Rockville, MD

Responsibilities:

  • Involved in building scalable distributed data solutions using Spark and Cloudera Hadoop.
  • Explored the Spark framework to improve the performance and optimization of existing Hadoop algorithms using the Spark Core, Spark SQL, and Spark Streaming APIs.
  • Ingested data from relational databases to HDFS on a regular basis using Sqoop incremental import.
  • Involved in the development of Spark Scala applications to process and analyze text data from emails, complaints, forums, and clickstreams to achieve comprehensive customer care.
  • Extracted structured data from multiple relational data sources as DataFrames in Spark SQL.
  • Involved in schema extraction from file formats like Avro and Parquet.
  • Involved in converting data from Avro format to Parquet format and vice versa (see the Avro-to-Parquet sketch after this list).
  • Transformed the DataFrames per the requirements of the data science team.
  • Loaded the data into HDFS in Parquet and Avro formats with compression codecs like Snappy and LZO, as required.
  • Worked on the integration of the Kafka service for stream processing, website tracking, and log aggregation.
  • Built near-real-time data streaming solutions using Spark Streaming and Kafka, persisting the data in Cassandra (see the streaming sketch after this list).
  • Involved in configuring and developing Kafka producers, consumers, topics, and brokers using Java.
  • Involved in data modeling and ingesting data into Cassandra using CQL, Java APIs, and other drivers.
  • Implemented CRUD operations using CQL on top of the Cassandra file system.
  • Involved in writing Pig scripts to wrangle the log data and store it back to HDFS and Hive tables.
  • Involved in accessing Hive tables using HiveContext, transforming the data, and storing it in HBase.
  • Involved in writing HiveQL scripts on Beeline, Impala, and the Hive CLI for structured data analysis to meet business requirements.
  • Worked on building ETL pipelines using Spark Scala to move the data to S3 and HDFS.
  • Involved in creating Hive tables from a wide range of data formats like text, SequenceFile, Avro, Parquet, and ORC.
  • Analyzed the transactional data in HDFS using Hive and optimized query performance by segregating the data using clustering and partitioning.
  • Worked on a POC to compare the processing time of Impala with Spark SQL for efficient batch processing.
  • Developed Spark applications for various business logic using Scala and Python.
  • Involved in moving data between HDFS and AWS S3 using Apache DistCp.
  • Involved in pulling data from the Amazon S3 data lake and building Hive tables using HiveContext in Spark.
  • Involved in running Hive queries and Spark jobs on data stored in S3.
  • Ran short-term ad-hoc queries and jobs on data stored in S3 using AWS EMR.
  • Worked with BI (Business Intelligence) teams on generating reports and designing ETL workflows in Tableau; deployed data from various sources into HDFS and built reports using Tableau.
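
A minimal sketch, assuming hypothetical HDFS paths, of the Avro-to-Parquet conversion with Snappy compression mentioned above; the reverse direction simply swaps the read and write formats.

```scala
import org.apache.spark.sql.SparkSession

object AvroToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("AvroToParquet").getOrCreate()

    // Hypothetical HDFS paths -- placeholders only.
    val avroPath    = "hdfs:///data/raw/complaints_avro"
    val parquetPath = "hdfs:///data/curated/complaints_parquet"

    // Read Avro files (requires the spark-avro package on the classpath).
    val df = spark.read.format("avro").load(avroPath)

    // Write them back as Parquet with Snappy compression.
    df.write
      .option("compression", "snappy")
      .mode("overwrite")
      .parquet(parquetPath)
  }
}
```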
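
A minimal sketch of the near-real-time pipeline described above: Kafka records consumed with Spark Streaming and persisted to Cassandra through the DataStax connector. The topic, record layout, keyspace, and table are assumptions, not project values.

```scala
import com.datastax.spark.connector._
import com.datastax.spark.connector.streaming._
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical event shape matching an assumed Cassandra table tracking.page_views.
case class PageView(userId: String, url: String, ts: Long)

object StreamToCassandra {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("KafkaToCassandra")
      .set("spark.cassandra.connection.host", "127.0.0.1") // placeholder host
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092", // placeholder brokers
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "clickstream",
      "auto.offset.reset" -> "latest"
    )

    // Each Kafka record value is assumed to be a comma-separated line "userId,url,timestamp".
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("clickstream"), kafkaParams))

    stream
      .map(record => record.value.split(","))
      .map(f => PageView(f(0), f(1), f(2).toLong))
      // saveToCassandra comes from the DataStax Spark Cassandra Connector's streaming package.
      .saveToCassandra("tracking", "page_views", SomeColumns("user_id", "url", "ts"))

    ssc.start()
    ssc.awaitTermination()
  }
}
```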

Big Data Engineer

Confidential

Responsibilities:

  • Worked closely with business analysts to gather requirements and design a reliable, scalable distributed solution using the Hortonworks Hadoop distribution.
  • Ingested structured data from MySQL and SQL Server into HDFS as incremental imports using Sqoop; these imports are scheduled to run periodically.
  • Configured Flume agents on different web servers to ingest the streaming data into HDFS.
  • Developed Pig Latin scripts for data cleaning and loading it into HDFS, Hive tables, or HBase depending on the use case.
  • Worked on developing ETL processes to load data from multiple data sources into HDFS using Flume and Sqoop, performing structural modifications using MapReduce and Hive, and analyzing data using visualization/reporting tools.
  • Used HCatalog to move structured data between Pig relations and Hive.
  • Involved in developing Spark Scala applications using the Spark Core and Spark SQL APIs.
  • Involved in a POC to check the efficiency of Spark applications on a Mesos cluster versus a Hadoop YARN cluster.
  • Developed and implemented workflows using Apache Oozie for task automation.
  • Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into those tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
  • Tuned the performance of Hive data analysis using clustering and partitioning of data with respect to date and location (see the Hive layout sketch after this list).
  • Implemented schema extraction for Parquet and Avro file formats in Hive.
  • Implemented change data capture (CDC) using Sqoop incremental imports, Hive, Spark SQL, and HiveContext (see the CDC merge sketch after this list).
  • Used Tableau connected to HiveServer2 to generate daily reports of customer purchases.
  • Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
  • Involved in working with data formats like CSV, text, SequenceFile, Avro, Parquet, ORC, JSON, and customized Hadoop formats.
  • Exported processed data from HDFS to the DWH using Sqoop export through a staging table.
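
A minimal sketch of the partitioning and clustering layout described above, shown with Spark's DataFrameWriter and Hive support enabled; the table, columns, bucket count, and path are illustrative assumptions. The equivalent Hive DDL uses PARTITIONED BY and CLUSTERED BY ... INTO n BUCKETS.

```scala
import org.apache.spark.sql.SparkSession

object HiveLayout {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("HiveLayout")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical staging data with user_id, action, ts, log_date, location columns.
    val logs = spark.read.parquet("hdfs:///staging/web_logs")

    // Partition by date and location so filters prune directories, and bucket (cluster)
    // by user_id so joins and aggregations on user_id touch fewer files.
    logs.write
      .partitionBy("log_date", "location")
      .bucketBy(32, "user_id")
      .sortBy("user_id")
      .format("parquet")
      .mode("overwrite")
      .saveAsTable("web_logs")
  }
}
```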
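
A minimal sketch of one way to reconcile a Sqoop incremental import with the base table in Spark SQL, keeping the latest row per key. The table names, key column, and last_modified column are assumptions, not the project's actual schema.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

object CdcMerge {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("CdcMerge")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical tables: the reconciled base table and the latest Sqoop incremental import.
    val base  = spark.table("warehouse.customers")
    val delta = spark.table("staging.customers_delta")

    // Keep the most recent version of each customer_id, letting delta rows supersede base rows.
    val latestFirst = Window.partitionBy("customer_id").orderBy(col("last_modified").desc)
    val merged = base.unionByName(delta)
      .withColumn("rn", row_number().over(latestFirst))
      .filter(col("rn") === 1)
      .drop("rn")

    // Write the reconciled snapshot to a separate table (the real job staged it before swapping).
    merged.write.mode("overwrite").saveAsTable("warehouse.customers_reconciled")
  }
}
```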
