
Big Data Engineer Resume


Lakewood, NJ

SUMMARY

  • 9 years of experience in the IT industry, with extensive experience in Java, J2EE, and Big Data technologies.
  • 4+ years of exclusive experience with Big Data technologies and the Hadoop stack.
  • Strong experience working with HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Flume, Kafka, Oozie, and HBase.
  • Good understanding of distributed systems, HDFS architecture, internal working details of MapReduce and Spark processing frameworks.
  • More than two years of hands on experience using Spark framework with Scala.
  • Good exposure to performance tuning of Hive queries, MapReduce jobs, and Spark jobs.
  • Expertise in importing and exporting data from/to traditional RDBMS using Sqoop.
  • Tuned Pig and Hive scripts by analyzing the join, grouping, and aggregation operations within them.
  • Extensively worked on HiveQL and join operations, wrote custom UDFs, and have good experience optimizing Hive queries.
  • Worked with various Hadoop distributions (Cloudera, Hortonworks, Amazon AWS) to implement and make use of them.
  • Participated in design, development and system migration of high performance metadata driven data pipeline with Kafka and Hive/Presto on Qubole, providing data export capability through API and UI.
  • Experience with data processing tasks such as collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
  • Hands-on experience with NoSQL and SQL databases.

TECHNICAL SKILLS

Big Data Ecosystem: Hadoop, Spark, Scala, MapReduce, HDFS, Hive, Pig, Sqoop, Flume, Kafka, HBase

Java Technologies: JSP, Servlets, JUnit, Spring, Hibernate

Database Technologies: MySQL, SQL Server, Oracle, MS Access

Programming Languages: Scala, Python, Java and Linux shell scripting

Operating Systems: Windows, Linux

PROFESSIONAL EXPERIENCE

Big Data Engineer

Confidential, Lakewood, NJ

Responsibilities:

  • Involved in requirements gathering and building a data lake on top of HDFS.
  • Worked on GoCD (CI/CD tool) to deploy applications and have experience with the Munin framework for Big Data testing.
  • Involved in writing UDFs in Hive.
  • Worked extensively on AWS components such as Airflow, Elastic MapReduce (EMR), Athena, and Snowflake.
  • Developed Sqoop scripts to migrate data from Oracle to the Big Data environment.
  • Extensively worked with Avro and Parquet files and converted the data between the two formats; parsed semi-structured JSON data and converted it to Parquet using DataFrames in Spark (a minimal sketch follows this list).
  • Developed a Python script to load CSV files into S3 buckets; created AWS S3 buckets, performed folder management in each bucket, and managed logs and objects within each bucket.
  • Created Hive DDL on Parquet and Avro data files residing in both HDFS and S3 buckets.
  • Created Airflow scheduling scripts in Python to automate the process of sqooping a wide range of data sets (a scheduling sketch appears after this role's Environment line).
  • Involved in file movements between HDFS and AWS S3 and worked extensively with S3 buckets in AWS.
  • Created data partitions on large data sets in S3 and DDL on partitioned data.
  • Converted all Hadoop jobs to run in EMR by configuring the cluster according to the data size.
  • Extensively used Stash (Bitbucket) for code control.
  • Monitored and troubleshot Hadoop jobs using the YARN Resource Manager and EMR job logs using Genie and Kibana.
  • Created data pipelines for different ingestion and aggregation events, and loaded consumer response data from the AWS S3 bucket into Hive external tables in HDFS to serve as a feed for Tableau dashboards.
  • Worked with different file formats like JSON, Avro, and Parquet, and compression techniques like Snappy.
  • Developed Python code for tasks, dependencies, SLA watchers, and time sensors for each job, for workflow management and automation using Airflow.
  • Developed shell scripts for adding dynamic partitions to the Hive stage table, verifying JSON schema changes in source files, and checking for duplicate files in the source location.
  • Converted Hive queries into Spark transformations using Spark RDDs.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Worked on importing metadata into Hive using Python and migrated existing tables and applications to work on the AWS cloud (S3).
  • Worked extensively on importing metadata into Hive and migrated existing tables and applications to work on Hive and the AWS cloud, making the data available in Athena and Snowflake.
  • Imported data from different sources like AWS S3 and the local file system into Spark RDDs.
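
A minimal PySpark sketch of the JSON-to-Parquet conversion and Hive DDL work described above. The bucket paths, column list, and table name (consumer_response) are hypothetical placeholders, not the actual project objects.

    from pyspark.sql import SparkSession, functions as F

    # Hypothetical S3 locations; the real buckets and prefixes were project-specific.
    RAW_JSON_PATH = "s3://example-bucket/raw/consumer_response/"
    PARQUET_PATH = "s3://example-bucket/curated/consumer_response/"

    spark = (SparkSession.builder
             .appName("json-to-parquet-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Parse semi-structured JSON into a DataFrame (Spark infers the schema) and tag a load date.
    raw_df = (spark.read.json(RAW_JSON_PATH)
              .withColumn("load_date", F.current_date().cast("string")))

    # Write the data back out as Snappy-compressed Parquet, partitioned by load date.
    (raw_df.write
        .mode("overwrite")
        .option("compression", "snappy")
        .partitionBy("load_date")
        .parquet(PARQUET_PATH))

    # Expose the curated files to Hive/Athena as an external table (illustrative columns only).
    spark.sql(f"""
        CREATE EXTERNAL TABLE IF NOT EXISTS consumer_response (
            event_id STRING,
            payload  STRING
        )
        PARTITIONED BY (load_date STRING)
        STORED AS PARQUET
        LOCATION '{PARQUET_PATH}'
    """)
    spark.sql("MSCK REPAIR TABLE consumer_response")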

Environment: Spark, AWS, EC2, EMR, Hive, SQL Workbench, Genie logs, Kibana, Sqoop, Spark SQL, Spark Streaming, Scala, Python
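
Below is a minimal Airflow sketch of the kind of Sqoop scheduling described in this section. The DAG id, Oracle connection string, table list, schedule, and SLA values are hypothetical; the real DAGs covered a much wider range of datasets and included additional sensors and watchers.

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator  # Airflow 1.x import path

    # Hypothetical defaults; owner, schedule, and SLA values were project-specific.
    default_args = {
        "owner": "data-engineering",
        "start_date": datetime(2019, 1, 1),
        "retries": 2,
        "retry_delay": timedelta(minutes=10),
        "sla": timedelta(hours=2),
    }

    dag = DAG(
        dag_id="sqoop_oracle_daily_import",  # hypothetical DAG name
        default_args=default_args,
        schedule_interval="@daily",
        catchup=False,
    )

    # Illustrative table list; the real jobs sqooped a much wider range of datasets.
    for table in ["customers", "orders"]:
        BashOperator(
            task_id=f"sqoop_import_{table}",
            bash_command=(
                f"sqoop import "
                f"--connect jdbc:oracle:thin:@//oracle-host:1521/ORCL "  # hypothetical source DB
                f"--username etl_user --password-file /user/etl/.oracle.pw "
                f"--table {table.upper()} "
                f"--target-dir /data/raw/{table} "
                f"--as-parquetfile -m 4"
            ),
            dag=dag,
        )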

Big Data Developer

Confidential, Detroit, MI

Responsibilities:

  • Integrated Kafka with Spark Streaming for real-time data processing (a streaming sketch follows this list).
  • Experience in writing Spark applications for data validation, cleansing, transformations, and custom aggregations.
  • Imported data from different sources into Spark RDDs for processing.
  • Developed custom aggregate functions using Spark SQL and performed interactive querying.
  • Worked on installing the cluster, commissioning and decommissioning DataNodes, NameNode high availability, capacity planning, and slot configuration.
  • Developed Spark applications for the entire batch processing layer using Scala.
  • Automatically scaled up EMR instances based on data volume.
  • Ran and scheduled Spark scripts in EMR pipelines.
  • Utilized Spark DataFrames and Spark SQL extensively for all processing.
  • Experience in managing and reviewing Hadoop log files.
  • Experience in Hive partitioning and bucketing, performing joins on Hive tables, and utilizing Hive SerDes such as RegEx, JSON, and Avro.
  • Exported the analyzed data to relational databases using Sqoop to generate reports for the BI team.
  • Executed tasks for upgrading cluster on the staging platform before doing it on production cluster.
  • Performed maintenance, monitoring, deployments, and upgrades across the infrastructure that supports all our Hadoop clusters.
  • Installed and configured various components of Hadoop ecosystem.
  • Optimized Hive analytics SQL queries, created tables/views, wrote custom UDFs, and implemented Hive-based exception processing.
  • Involved in moving relational database legacy tables into HDFS and HBase tables using Sqoop, and vice versa.
  • Replaced the default Derby metadata store for Hive with MySQL.
  • Supported setting up the QA environment and updating configurations for implementing Pig scripts.
  • Configured Fair Scheduler to provide fair resources to all the applications across the cluster.
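
A minimal sketch of the Kafka/Spark streaming integration described above, written against the Structured Streaming API with a hypothetical topic, broker list, and event schema. The original work may equally have used the DStream API, and the console sink stands in for the real downstream store.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    # Requires the spark-sql-kafka package on the classpath when submitting the job.
    spark = SparkSession.builder.appName("kafka-streaming-sketch").getOrCreate()

    # Hypothetical event schema; the real payloads were project-specific.
    event_schema = StructType([
        StructField("event_id", StringType()),
        StructField("status", StringType()),
        StructField("amount", DoubleType()),
    ])

    # Read the Kafka topic as a streaming DataFrame (hypothetical broker and topic names).
    raw_stream = (spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092")
        .option("subscribe", "events")
        .load())

    # Validation/cleansing: parse the JSON value and drop malformed records.
    events = (raw_stream
        .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
        .select("e.*")
        .dropna(subset=["event_id"]))

    # Custom aggregation per status; the console sink is only for illustration.
    counts = events.groupBy("status").agg(
        F.count("*").alias("events"),
        F.sum("amount").alias("total_amount"),
    )

    query = (counts.writeStream
        .outputMode("complete")
        .format("console")
        .start())
    query.awaitTermination()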

Environment: Hadoop (Cloudera stack), Hue, Spark, Kafka, HBase, HDFS, Hive, Pig, Sqoop, Oracle

Hadoop Developer

Confidential, Columbus, OH

Responsibilities:

  • Experience with AWS EMR, Spark installation, HDFS, and MapReduce architecture.
  • Participated in Hadoop Deployment and infrastructure scaling.
  • Involved in creating Hive tables and loading and analyzing data using Hive queries.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Parsed high-level design spec to simple ETL coding and mapping standards.
  • Maintained warehouse metadata, naming standards and warehouse standards for future application development.
  • Worked with Linux systems and RDBMS database on a regular basis to ingest data using Sqoop.
  • Implemented Kafka consumers to move data from Kafka partitions into Cassandra for near real-time analysis (a consumer sketch follows this list).
  • Involved in Hadoop cluster tasks such as adding and removing nodes.
  • Managed and reviewed Hadoop log files and loaded log data into HDFS using Sqoop.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries, Pig scripts, and Sqoop jobs.
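
A minimal sketch of the Kafka-to-Cassandra consumer pattern described above, using the kafka-python and cassandra-driver client libraries. The topic, broker, keyspace, table, and column names are hypothetical placeholders.

    import json

    from kafka import KafkaConsumer          # kafka-python
    from cassandra.cluster import Cluster    # cassandra-driver

    # Hypothetical topic, brokers, keyspace, and table; the real names were project-specific.
    consumer = KafkaConsumer(
        "clickstream-events",
        bootstrap_servers=["broker1:9092"],
        group_id="cassandra-loader",
        auto_offset_reset="earliest",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )

    session = Cluster(["cassandra-host"]).connect("analytics")
    insert_stmt = session.prepare(
        "INSERT INTO events (event_id, user_id, event_type, ts) VALUES (?, ?, ?, ?)"
    )

    # Drain each Kafka partition and write the records into Cassandra for near real-time queries.
    for message in consumer:
        event = message.value
        session.execute(
            insert_stmt,
            (event["event_id"], event["user_id"], event["event_type"], event["ts"]),
        )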

Environment: Hadoop (Hortonworks stack), HDFS, Oozie, Pig, Hive, MapReduce, Sqoop, Cassandra, Linux.

Hadoop Developer

Confidential, Denver, CO

Responsibilities:

  • Worked on analyzing and writing Hadoop MapReduce jobs using the Java API, Pig, and Hive.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in loading data from edge node to HDFS using shell scripting.
  • Created HBase tables to store variable data formats of PII data coming from different portfolios.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Worked with different compression techniques such as LZO, Snappy, and Bzip2 to save storage and optimize data transfer over the network.
  • Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDFs, Pig, Sqoop, ZooKeeper, and Spark.
  • Developed custom aggregate functions using Spark-SQL and performed interactive querying.
  • Used Sqoop to store the data into HBase and Hive.
  • Worked on installing cluster, commissioning & decommissioning of DataNode, NameNode high availability, capacity planning, and slots configuration.
  • Created Hive tables, dynamic partitions, and buckets for sampling, and worked on them using HiveQL.
  • Used Pig to parse the data and store it in Avro format.
  • Stored the data in tabular formats using Hive tables and Hive SerDes.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Implemented a script to transmit information from Oracle to HBase using Sqoop.
  • Implemented MapReduce programs to handle semi-structured and unstructured data such as XML, JSON, and sequence files for log files (a streaming-style sketch follows this list).
  • Fine-tuned Pig queries for better performance.
  • Involved in writing shell scripts to export log files to the Hadoop cluster through an automated process.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
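
A minimal sketch of the JSON log-file MapReduce handling described above. The production jobs were written against the Java MapReduce API; this stand-in uses Hadoop Streaming with a Python mapper/reducer (Python appears in this role's environment), and the field names and HDFS paths are hypothetical.

    #!/usr/bin/env python
    """Hadoop Streaming sketch: parse semi-structured JSON log lines and count events by type.

    Illustrative submission (hypothetical paths):
      hadoop jar hadoop-streaming.jar -files mr_event_count.py \
          -mapper "python mr_event_count.py map" -reducer "python mr_event_count.py reduce" \
          -input /logs/raw -output /logs/event_counts
    """
    import json
    import sys


    def mapper():
        # Emit (event_type, 1) for every well-formed JSON log line; skip malformed records.
        for line in sys.stdin:
            try:
                record = json.loads(line)
            except ValueError:
                continue
            print("%s\t1" % record.get("event_type", "unknown"))


    def reducer():
        # Input arrives sorted by key, so counts can be rolled up in a single pass.
        current_key, count = None, 0
        for line in sys.stdin:
            key, value = line.rstrip("\n").split("\t", 1)
            if key != current_key:
                if current_key is not None:
                    print("%s\t%d" % (current_key, count))
                current_key, count = key, 0
            count += int(value)
        if current_key is not None:
            print("%s\t%d" % (current_key, count))


    if __name__ == "__main__":
        reducer() if len(sys.argv) > 1 and sys.argv[1] == "reduce" else mapper()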

Environment: Hadoop, MapReduce, HDFS, YARN, Sqoop, Oozie, Pig, Hive, HBase, Java, Eclipse, UNIX shell scripting, Python, Hortonworks.

Java Developer

Confidential, Richmond, TX

Responsibilities:

  • Effectively interacted with team members and business users for requirements gathering.
  • Involved in analysis, design, and implementation phases of the software development lifecycle (SDLC).
  • Implemented Spring core J2EE patterns such as MVC, Dependency Injection (DI), and Inversion of Control (IoC).
  • Implemented REST Web Services with Jersey API to deal with customer requests.
  • Developed test cases using JUnit and used Log4j as the logging framework.
  • Worked with HQL and the Criteria API for retrieving data elements from the database.
  • Developed the user interface using HTML, Spring tags, JavaScript, jQuery, and CSS.
  • Developed the application using Eclipse IDE and worked under Agile Environment.
  • Designed and implemented front-end web pages using CSS, JSP, HTML, JavaScript, Ajax, and Struts.
  • Utilized the Eclipse IDE as the development environment to design, develop, and deploy Spring components on WebLogic.

Environment: Java, J2EE, HTML, JavaScript, CSS, jQuery, Spring 3.0, JNDI, Hibernate 3.0, JavaMail, Web Services, REST, Oracle 10g, JUnit, Log4j, Eclipse, WebLogic 10.3.

Java Developer

Confidential

Responsibilities:

  • Involved in various phases of Software Development Life Cycle (SDLC) such as requirements gathering, analysis, design and development.
  • Involved in overall performance improvement by modifying third party open source tools like FCK Editor.
  • Developed controllers for request handling using the Spring framework.
  • Worked with command controllers, handler mappings, and view resolvers.
  • Designed and developed application components and architectural proof of concepts using Java, EJB, JSF, Struts, and AJAX.
  • Participated in enterprise integration using web services.
  • Configured JMS, MQ, EJB, and Hibernate on WebSphere and JBoss.
  • Focused on declarative transaction management.
  • Developed XML files for mapping requests to controllers
  • Extensively used Java Collection framework and Exception handling.

Environment: Core Java, XML, Servlets, Hibernate Criteria API, Web Services, WSDL, UML, EJB, JavaScript, jQuery, Hibernate, SQL, CVS, Agile, JUnit.
