
Big Data Engineer Resume


NJ

SUMMARY:

  • 8+ years of IT experience, including 4+ years working with Big Data in Health Care, Banking, Insurance and Retail, with good knowledge of Hadoop ecosystem technologies.
  • Involved in all SDLC phases - analysis, design, development, testing, implementation and maintenance - with timely delivery against aggressive deadlines in both Agile/Scrum and Waterfall methodologies.
  • Good experience in Team Leadership with excellent Communication, Management and Presentation skills.

TECHNICAL SKILLS:

DATA INGESTION: Sqoop, Kafka, Flume

DATA PROCESSING: Spark, Hive, MapReduce

LANGUAGES: Python, Scala, Java, Shell, SQL

RELATIONAL DATABASES: MySQL, Oracle, SQL Server

NoSQL DATABASES: HBase, Cassandra

ETL: Talend, DataStage

MONITORING: Ambari, Cloudera Manager

DISTRIBUTIONS: Cloudera, Hortonworks

VERSION CONTROL: Git, SVN

BUILD TOOLS: Ant, Maven, Gradle

CLOUD: AWS EMR, EC2, S3

PROFESSIONAL EXPERIENCE:

Confidential, NJ

Big Data Engineer

Environment: Apache Spark, HDFS, Hive, HBase, Kafka, SQL, Oozie, Cloudera Manager, ZooKeeper, Cloudera.

Responsibilities:
  • Involved in the full project life cycle - from analysis to production implementation - with emphasis on identifying sources and validating source data, developing logic and transformations per requirements, and creating mappings to load data into different targets.
  • Involved in deployment phase meetings for change management.
  • Loaded real-time data into HDFS using Kafka and structured batch data using Sqoop.
  • Used Spark Streaming with Kafka to perform transformations on incoming data (see the sketch after this list).
  • Worked on reading multiple data formats on HDFS using Python and Scala.
  • Developed Spark transformation scripts using APIs such as RDD, Pair RDD, Spark SQL and DataFrames with Scala and Python.
  • Worked with NoSQL databases like HBase, creating tables to load large sets of semi-structured data.
  • Developed MapReduce jobs to process the data and create the necessary HFiles.
  • Loaded the generated HFiles into HBase for faster access to a large customer base without taking a performance hit.
  • Automated job execution using environment variables.
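
A minimal sketch of the Spark Streaming with Kafka pattern mentioned above, written in Scala against the spark-streaming-kafka-0-10 integration. The broker address, consumer group, topic name, batch interval and HDFS output path are hypothetical placeholders, not details taken from the project.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

    object KafkaToHdfs {
      def main(args: Array[String]): Unit = {
        // 30-second micro-batches; the interval is illustrative only
        val conf = new SparkConf().setAppName("KafkaToHdfs")
        val ssc = new StreamingContext(conf, Seconds(30))

        // Broker list, group id and topic name are hypothetical placeholders
        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker1:9092",
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "ingest-example",
          "auto.offset.reset" -> "latest",
          "enable.auto.commit" -> (false: java.lang.Boolean)
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          LocationStrategies.PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams)
        )

        // Simple transformation: keep non-empty messages and land them on HDFS
        stream.map(_.value)
          .filter(_.trim.nonEmpty)
          .saveAsTextFiles("hdfs:///data/raw/events/batch")

        ssc.start()
        ssc.awaitTermination()
      }
    }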

Confidential, NJ

Big Data Developer

Environment: Java, Eclipse, Hadoop, MapReduce, HDFS, Sqoop, Flume, Oozie, WinSCP, UNIX Shell Scripting, Hive, Pig, Cloudera (Hadoop distribution), AWS, EC2, EMR, S3, JIRA, Microsoft Visio, Oracle PL/SQL, HP QC, MS SharePoint, etc.

Responsibilities:
  • Worked on a live 35-node Hadoop cluster running CDH 4.4.
  • Worked with highly unstructured and semi-structured data of 30 TB (90 TB with a replication factor of 3).
  • Extracted the data from Oracle, Teradata into HDFS using Sqoop.
  • Created and maintained Sqoop (version 1.4.3) jobs with incremental loads to populate Hive external tables.
  • Used Flume to collect, aggregate, and store the log data from web servers.
  • Implemented custom Flume interceptors to filter data and defined channel selectors to multiplex the data into different sinks.
  • Developed Hive (version 0.10) scripts to meet end-user/analyst requirements for ad hoc analysis.
  • Applied partitioning and bucketing concepts in Hive and designed both managed and external tables to optimize performance.
  • Developed UDFs in Java as needed for use in Hive queries (a minimal sketch follows this list).
  • Developed Oozie workflow for scheduling and orchestrating the ETL process.
  • Good working knowledge of Amazon Web Services components such as EC2, EMR and S3.
  • Used JIRA for incident creation, bug tracking and change management process.
  • Implemented Authentication and Authorization using Kerberos, Knox and Apache Ranger.
  • Good working knowledge of Tableau.
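
The Hive UDFs in this role were written in Java; as a minimal sketch kept in Scala (also a JVM language) for consistency with the other examples, an equivalent UDF against Hive's classic UDF API could look like the following. The package, class name and masking logic are hypothetical.

    package com.example.hive.udf // hypothetical package

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Classic reflection-based Hive UDF, suitable for the Hive 0.10 era:
    // masks everything before the '@' in an e-mail address except the first character.
    class MaskEmail extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) null
        else {
          val s = input.toString
          val at = s.indexOf('@')
          if (at <= 1) input
          else new Text(s.substring(0, 1) + "***" + s.substring(at))
        }
      }
    }

Once packaged into a JAR (with the Scala runtime library also on the Hive classpath), such a UDF would be registered in a Hive session with ADD JAR and CREATE TEMPORARY FUNCTION mask_email AS 'com.example.hive.udf.MaskEmail' before use in queries.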

Confidential, NY

Hadoop Developer

Environment: Hortonworks (HDP 2.2), HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Jenkins, Maven, Oozie, ZooKeeper, Ambari, Oracle Database, MySQL, HBase, Java (jdk1.7), Tableau and CentOS

Responsibilities:
  • Extracted data from Oracle and MySQL databases into HDFS using Sqoop.
  • Optimized MapReduce jobs to use HDFS efficiently by applying Gzip, LZO, Snappy and Bzip2 compression codecs (a minimal sketch follows this list).
  • Wrote Pig scripts to transform raw data from several data sources into baseline data.
  • Created Hive tables to store the processed results and wrote Hive scripts to transform and aggregate the disparate data.
  • Drove the application from development to production using a Continuous Integration and Continuous Deployment (CI/CD) model with Maven and Jenkins.
  • Automated the process for extraction of data from warehouses and weblogs into Hive tables by developing workflows and coordinator jobs in Oozie.
  • Worked with data in multiple file formats including Avro, Parquet, Sequence files, ORC and Text/ CSV.
  • Exported the aggregated data into an RDBMS using Sqoop for creating dashboards in Tableau.
  • Utilized Agile Scrum Methodology to manage and organize the team with regular code review sessions.
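
The MapReduce jobs in this role were written in Java; the driver-side compression settings referenced above are sketched here in Scala (calling the same Hadoop Java API) to stay in one language across these examples. The job name, input/output paths and the specific choice of Snappy for map output and Gzip for final output are illustrative assumptions.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.compress.{CompressionCodec, GzipCodec, SnappyCodec}
    import org.apache.hadoop.mapreduce.Job
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

    object CompressedPassThrough {
      def main(args: Array[String]): Unit = {
        val conf = new Configuration()

        // Compress intermediate (map) output with Snappy to cut shuffle I/O
        conf.setBoolean("mapreduce.map.output.compress", true)
        conf.setClass("mapreduce.map.output.compress.codec",
          classOf[SnappyCodec], classOf[CompressionCodec])

        val job = Job.getInstance(conf, "compressed-pass-through")
        job.setJarByClass(CompressedPassThrough.getClass)

        // No mapper/reducer is set, so Hadoop's identity classes are used;
        // a real job would plug its own map/reduce logic in here.
        FileInputFormat.addInputPath(job, new Path(args(0)))
        FileOutputFormat.setOutputPath(job, new Path(args(1)))

        // Compress the final job output with Gzip
        FileOutputFormat.setCompressOutput(job, true)
        FileOutputFormat.setOutputCompressorClass(job, classOf[GzipCodec])

        System.exit(if (job.waitForCompletion(true)) 0 else 1)
      }
    }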

Confidential, NY

Hadoop Developer

Environment: Hadoop, HDFS, Java, MapReduce, Hive, Sqoop, SQL, Ambari, Ranger, ZooKeeper, Hortonworks (HDP).

Responsibilities:
  • Involved in architecture design, development and implementation of Hadoop deployment, backup and recovery systems.
  • Responsible for installation and configuration of the Hortonworks Hadoop cluster using Ambari.
  • Developed Sqoop scripts to import data from the Oracle database and handled incremental loading of the point-of-sale tables.
  • Created Hive external tables, views and scripts for transformations such as filtering, aggregation and partitioning (a minimal sketch follows this list).
  • Developed bash scripts to automate the above extraction, transformation and loading process.
  • Managed the Hadoop cluster using Ambari and created roles and user groups to control access to Ambari functions.
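
The external tables and views in this role were created with Hive scripts driven by the bash automation above; as a self-contained sketch in the same language as the other examples, equivalent DDL can also be submitted to HiveServer2 over JDBC from Scala. The host, credentials, table name, columns and HDFS location are hypothetical placeholders.

    import java.sql.DriverManager

    object CreatePosTables {
      def main(args: Array[String]): Unit = {
        // HiveServer2 JDBC endpoint; host, port and user are placeholders
        Class.forName("org.apache.hive.jdbc.HiveDriver")
        val conn = DriverManager.getConnection("jdbc:hive2://hive-host:10000/default", "etl_user", "")
        val stmt = conn.createStatement()

        // External, partitioned table over raw point-of-sale extracts landed by Sqoop
        stmt.execute(
          """CREATE EXTERNAL TABLE IF NOT EXISTS pos_sales (
            |  store_id INT,
            |  item_id  INT,
            |  quantity INT,
            |  amount   DOUBLE
            |)
            |PARTITIONED BY (sale_date STRING)
            |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
            |LOCATION '/data/raw/pos_sales'""".stripMargin)

        // A simple filtering/aggregation view over the external table
        stmt.execute(
          """CREATE VIEW IF NOT EXISTS daily_store_sales AS
            |SELECT sale_date, store_id, SUM(amount) AS total_amount
            |FROM pos_sales
            |WHERE quantity > 0
            |GROUP BY sale_date, store_id""".stripMargin)

        conn.close()
      }
    }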

Confidential, CA

Database Developer

Environment: SQL Server 2005, DDL, DML, Stored Procedures, Views, Waterfall methodology.

Responsibilities:
  • Developed SQL scripts to insert, update and delete data.
  • Wrote complex SQL statements using joins, scalar and table-valued user-defined functions, and views.
  • Worked on stored procedures and database triggers (a minimal sketch follows this list).
  • Generated database SQL scripts and deployed databases including installation and configuration.
  • Created indexes for faster query performance and views for controlling user access to data.
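
A minimal Scala/JDBC sketch of the kind of data-maintenance work described above, kept in the same language as the earlier examples even though the original work was plain T-SQL in SQL Server 2005. The connection URL, table, columns and stored procedure name are hypothetical placeholders.

    import java.sql.DriverManager

    object CustomerMaintenance {
      def main(args: Array[String]): Unit = {
        // SQL Server JDBC URL; server, database and credentials are placeholders
        val url = "jdbc:sqlserver://db-host:1433;databaseName=Sales"
        val conn = DriverManager.getConnection(url, "app_user", "app_password")
        conn.setAutoCommit(false)

        try {
          // Parameterized update (one of the insert/update/delete scripts mentioned above)
          val update = conn.prepareStatement(
            "UPDATE dbo.Customer SET Status = ? WHERE CustomerId = ?")
          update.setString(1, "ACTIVE")
          update.setInt(2, 42)
          update.executeUpdate()

          // Call a hypothetical stored procedure that archives inactive customers
          val call = conn.prepareCall("{call dbo.usp_ArchiveInactiveCustomers(?)}")
          call.setInt(1, 365) // days of inactivity
          call.execute()

          conn.commit()
        } catch {
          case e: Exception =>
            conn.rollback()
            throw e
        } finally {
          conn.close()
        }
      }
    }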
