Sr. Big Data Engineer Resume
Lakewood, NJ
SUMMARY OF QUALIFICATIONS:
- IT professional with 8+ years of experience in Java, J2EE, Big Data technologies, the Hadoop stack, and the Spark framework with Scala.
- Strong experience with HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Flume, Kafka, Oozie, and HBase.
- Good understanding of distributed systems, HDFS architecture, internal working details of MapReduce and Spark processing frameworks.
- Good exposure to performance tuning of Hive queries, MapReduce jobs, and Spark jobs.
- Expertise in inbound and outbound (importing/exporting) data from/to traditional RDBMS using Sqoop.
- Extensive experience with HiveQL and join operations, writing custom UDFs, and optimizing Hive queries.
- Experience implementing and working with various Hadoop distributions (Cloudera, Hortonworks, and Amazon AWS).
- Proficient in data processing: collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
- Hands-on experience with NoSQL and SQL databases.
- Versatile team player with excellent analytical and interpersonal skills and an ability to quickly adapt to new technologies and project environments.
TECHNICAL SKILLS
Big Data Ecosystem: Hadoop, Spark, Scala, MapReduce, HDFS, Hive, Pig, Sqoop, Flume, Kafka, HBase
Java Technologies: JSP, Servlets, JUnit, Spring, Hibernate
Database Technologies: MySQL, SQL Server, Oracle, MS Access
Programming Languages: Scala, Python, Java, and Linux shell scripting
Operating Systems: Windows, Linux
PROFESSIONAL EXPERIENCE:
Confidential - Lakewood, NJ
Sr. Big Data Engineer
Roles & Responsibilities:
- Involved in requirements gathering and building data lake on top of HDFS.
- Worked with GoCD (a CI/CD tool) to deploy the application and with the Munin framework for big data testing.
- Extensively involved in writing UDFs in Hive.
- Worked on AWS components such as Airflow, Elastic MapReduce (EMR), Athena, and Snowflake.
- Developed Sqoop scripts to migrate data from Oracle to the big data environment.
- Worked extensively with Avro and Parquet files, converting data between the two formats; parsed semi-structured JSON data and converted it to Parquet using DataFrames in Spark (a minimal sketch follows this list).
- Developed a Python script to load CSV files into S3 buckets; created AWS S3 buckets, performed folder management in each bucket, and managed logs and objects within each bucket.
- Created Hive DDL on Parquet and Avro data files residing in both HDFS and S3 buckets.
- Created Airflow scheduling scripts in Python to automate the process of Sqooping a wide range of data sets.
- Involved in file movements between HDFS and AWS S3 and worked extensively with S3 buckets in AWS.
- Created data partitions on large data sets in S3 and DDL on partitioned data.
- Converted all Hadoop jobs to run in EMR by configuring the cluster according to the data size.
- Extensively used Stash (Bitbucket) Git repositories for code control.
- Monitored and troubleshot Hadoop jobs using the YARN Resource Manager and EMR job logs using Genie and Kibana.
- Created data pipelines for different events covering ingestion and aggregation, and loaded consumer response data from AWS S3 buckets into Hive external tables in HDFS to serve as feeds for Tableau dashboards.
- Worked with different file formats like JSON, Avro, and Parquet and compression techniques like Snappy.
- Developed Python code for different tasks, dependencies, SLA watchers, and time sensors for each job, for workflow management and automation using Airflow.
- Developed shell scripts for adding dynamic partitions to the Hive stage table, verifying JSON schema changes in source files, and checking for duplicate files in the source location.
- Converted Hive queries into Spark transformations using Spark RDDs.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Worked extensively with importing metadata into Hive using Python and migrated existing tables and applications to Hive and the AWS cloud (S3), making the data available in Athena and Snowflake.
- Imported data from different sources like AWS S3 and the local file system into Spark RDDs.
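For illustration, a minimal sketch of the JSON-to-Parquet conversion described above, using the Spark DataFrame API in Java; the bucket and path names are placeholders, and the production jobs were orchestrated through Airflow and EMR rather than run as a standalone class.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class JsonToParquet {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("JsonToParquet")
                .getOrCreate();

        // Read semi-structured JSON; Spark infers the schema from the records.
        // The S3 locations below are placeholders.
        Dataset<Row> events = spark.read().json("s3://example-bucket/raw/events/");

        // Write the same data out as Parquet so Hive, Athena, and Snowflake
        // external tables can be defined over the curated location.
        events.write()
              .mode(SaveMode.Overwrite)
              .parquet("s3://example-bucket/curated/events/");

        spark.stop();
    }
}
```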
Environment: Spark, AWS, EC2, EMR, Hive, SQL Workbench, Genie Logs, Kibana, Sqoop, Spark SQL, Spark Streaming, Scala, Python
Confidential - Detroit, MI
Sr. Big Data Developer
Roles & Responsibilities:
- Integrated Kafka with Spark Streaming for real-time data processing (see the sketch after this list).
- Wrote Spark applications for data validation, cleansing, transformations, and custom aggregations.
- Imported data from different sources into Spark RDDs for processing.
- Developed custom aggregate functions using Spark SQL and performed interactive querying.
- Worked on installing the cluster, commissioning and decommissioning of DataNodes, NameNode high availability, capacity planning, and slots configuration.
- Developed Spark applications for the entire batch processing using Scala.
- Automatically scaled up EMR instances based on data volume.
- Ran and scheduled Spark scripts in EMR pipelines.
- Utilized Spark DataFrames and Spark SQL extensively for all processing.
- Involved in managing and reviewing Hadoop log files.
- Worked on Hive partitioning and bucketing, performed joins on Hive tables, and utilized Hive SerDes such as RegEx, JSON, and Avro.
- Exported the analyzed data to relational databases using Sqoop to generate reports for the BI team.
- Executed tasks for upgrading cluster on the staging platform before doing it on production cluster.
- Performed maintenance, monitoring, deployments, and upgrades across the infrastructure that supports all Hadoop clusters.
- Installed and configured various components of Hadoop ecosystem.
- Optimized Hive analytics SQL queries, created tables/views, wrote custom UDFs, and implemented Hive-based exception processing.
- Involved in transferring data between legacy relational database tables and HDFS and HBase tables using Sqoop, in both directions.
- Replaced the default Derby metadata storage system (metastore) for Hive with MySQL.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig.
- Configured Fair Scheduler to provide fair resources to all the applications across the cluster.
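A minimal sketch of the Kafka-to-Spark-Streaming integration referenced in the first bullet above, shown with Spark Structured Streaming's Kafka source in Java; the broker address, topic, and output paths are placeholders, and the original applications were written in Scala and may have used the DStream API instead.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class KafkaStreamJob {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("KafkaStreamJob")
                .getOrCreate();

        // Subscribe to a Kafka topic (broker and topic names are placeholders).
        Dataset<Row> stream = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker:9092")
                .option("subscribe", "events")
                .load();

        // Treat the Kafka value as a string and apply a trivial cleansing step.
        Dataset<Row> cleaned = stream
                .selectExpr("CAST(value AS STRING) AS payload")
                .filter("payload IS NOT NULL");

        // Land the cleaned records as Parquet with checkpointing for recovery.
        StreamingQuery query = cleaned.writeStream()
                .format("parquet")
                .option("path", "hdfs:///data/streaming/events")
                .option("checkpointLocation", "hdfs:///checkpoints/events")
                .start();
        query.awaitTermination();
    }
}
```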
Environment: Hadoop (Cloudera stack), Hue, Spark, Kafka, HBase, HDFS, Hive, Pig, Sqoop, Oracle
Confidential - Columbus, OH
Sr. Hadoop Developer
Roles & Responsibilities:
- Worked on AWS EMR, Spark installation, HDFS, and MapReduce architecture.
- Participated in Hadoop Deployment and infrastructure scaling.
- Involved in creating Hive tables, and loading and analyzing data using hive queries.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
- Parsed high-level design spec to simple ETL coding and mapping standards.
- Maintained warehouse metadata, naming standards and warehouse standards for future application development.
- Worked with Linux systems and RDBMS databases on a regular basis to ingest data using Sqoop.
- Implemented Kafka consumers to move data from Kafka partitions into Cassandra for near real-time analysis (see the sketch after this list).
- Involved in Hadoop cluster task like adding and removing nodes.
- Managed and reviewed Hadoop log files and loaded log data into HDFS using Sqoop.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries, Pig scripts, and Sqoop jobs.
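A minimal sketch of the kind of Kafka consumer used to land records in Cassandra, as described above; the broker, topic, keyspace, and table names are placeholders, and the real consumers additionally handled batching, error handling, and offset management.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class KafkaToCassandra {
    public static void main(String[] args) {
        // Consumer configuration; broker address and group id are placeholders.
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092");
        props.put("group.id", "events-consumer");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             Cluster cluster = Cluster.builder().addContactPoint("cassandra-host").build();
             Session session = cluster.connect("analytics")) {

            consumer.subscribe(Collections.singletonList("events"));
            PreparedStatement insert =
                    session.prepare("INSERT INTO events (id, payload) VALUES (?, ?)");

            // Poll Kafka and write each record into Cassandra for near real-time analysis.
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    session.execute(insert.bind(record.key(), record.value()));
                }
            }
        }
    }
}
```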
Environment: Hadoop (Hortonworks stack), HDFS, Oozie, Pig, Hive, MapReduce, Sqoop, Cassandra, Linux
Confidential - Denver, CO
Hadoop Developer
Roles & Responsibilities:
- Worked on analyzing, writing Hadoop MapReduce jobs using Java API, Pig and Hive.
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in loading data from edge node to HDFS using shell scripting.
- Created HBase tables to store variable data formats of PII data coming from different portfolios.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Worked with different compression techniques such as LZO, Snappy, and Bzip2 to save storage and optimize data transfer over the network.
- Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDFs, Pig, Sqoop, Zookeeper, and Spark.
- Developed custom aggregate functions using Spark-SQL and performed interactive querying.
- Used Sqoop to store the data into HBase and Hive.
- Worked on installing the cluster, commissioning and decommissioning of DataNodes, NameNode high availability, capacity planning, and slots configuration.
- Created Hive tables, dynamic partitions, and buckets for sampling, and worked on them using HiveQL (see the sketch after this list).
- Used Pig to parse the data and stored it in Avro format.
- Stored the data in tabular formats using Hive tables and Hive SerDes.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
- Implemented a script to transmit information from Oracle to HBase using Sqoop.
- Implemented MapReduce programs to handle semi/unstructured data like XML, JSON, and sequence files for log files.
- Fine-tuned Pig queries for better performance.
- Involved in writing shell scripts for exporting log files to the Hadoop cluster through an automated process.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
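A minimal sketch of the dynamic-partition pattern used for the Hive tables above, issued here over Hive JDBC from Java; the HiveServer2 URL, table, and column names are placeholders, and in practice the equivalent HiveQL was run from scripts and Oozie workflows.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveDynamicPartitions {
    public static void main(String[] args) throws Exception {
        // Load the HiveServer2 JDBC driver; URL and database are placeholders.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        try (Connection conn =
                     DriverManager.getConnection("jdbc:hive2://hive-host:10000/default");
             Statement stmt = conn.createStatement()) {

            // Allow Hive to create partitions on the fly during the insert.
            stmt.execute("SET hive.exec.dynamic.partition=true");
            stmt.execute("SET hive.exec.dynamic.partition.mode=nonstrict");

            stmt.execute("CREATE TABLE IF NOT EXISTS logs_part (msg STRING) "
                    + "PARTITIONED BY (dt STRING) STORED AS ORC");

            // The trailing dt column in the SELECT decides each row's target partition.
            stmt.execute("INSERT INTO TABLE logs_part PARTITION (dt) "
                    + "SELECT msg, dt FROM logs_stage");
        }
    }
}
```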
Environment: Hadoop, MapReduce, HDFS, YARN, Sqoop, Oozie, Pig, Hive, HBase, Java, Eclipse, UNIX shell scripting, Python, Hortonworks
Confidential - Richmond, TX
Java Developer
Roles & Responsibilities:
- Effectively interacted with team members and business users for requirements gathering.
- Involved in analysis, design, and implementation phases of the software development lifecycle (SDLC).
- Implemented Spring and core J2EE patterns such as MVC, Dependency Injection (DI), and Inversion of Control (IoC).
- Implemented REST web services with the Jersey API to handle customer requests (see the sketch after this list).
- Developed test cases using JUnit and used Log4j as the logging framework.
- Worked with HQL and the Criteria API for retrieving data elements from the database.
- Developed the user interface using HTML, Spring tags, JavaScript, jQuery, and CSS.
- Developed the application using Eclipse IDE and worked under Agile Environment.
- Designed and implemented front-end web pages using CSS, JSP, HTML, JavaScript, Ajax, and Struts.
- Utilized the Eclipse IDE as the development environment to design, develop, and deploy Spring components on WebLogic.
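A minimal sketch of a JAX-RS resource of the kind exposed through Jersey in this role; the path and response payload are placeholders for the actual customer-facing endpoints.

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

// GET /customers/{id} returns a simple JSON payload; field names are placeholders.
@Path("/customers")
public class CustomerResource {

    @GET
    @Path("/{id}")
    @Produces(MediaType.APPLICATION_JSON)
    public String getCustomer(@PathParam("id") String id) {
        return "{\"id\": \"" + id + "\", \"status\": \"ACTIVE\"}";
    }
}
```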
Environment: Java, J2EE, HTML, JavaScript, CSS, jQuery, Spring 3.0, JNDI, Hibernate 3.0, JavaMail, Web Services, REST, Oracle 10g, JUnit, Log4j, Eclipse, WebLogic 10.3
Confidential
Java Developer
Roles & Responsibilities:
- Involved in various phases of Software Development Life Cycle (SDLC) such as requirements gathering, analysis, design and development.
- Involved in overall performance improvement by modifying third-party open-source tools like FCKeditor.
- Developed controllers for request handling using the Spring framework (see the sketch after this list).
- Worked with command controllers, handler mappings, and view resolvers.
- Designed and developed application components and architectural proof of concepts using Java, EJB, JSF, Struts, and AJAX.
- Participated in enterprise integration work using web services.
- Configured JMS, MQ, EJB, and Hibernate on WebSphere and JBoss.
- Focused on Declarative transaction management
- Developed XML files for mapping requests to controllers
- Extensively used Java Collection framework and Exception handling.
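A minimal sketch of a Spring MVC request handler illustrating the controller-to-view flow described above; the mapping, request parameter, and view name are placeholders, and the project itself used command controllers with XML handler mappings and view resolvers rather than annotations.

```java
import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RequestParam;

@Controller
public class OrderController {

    // The handler mapping routes GET /orders here; the returned view name
    // ("orderList") is resolved by the configured view resolver.
    @RequestMapping(value = "/orders", method = RequestMethod.GET)
    public String listOrders(@RequestParam(value = "status", required = false) String status,
                             Model model) {
        model.addAttribute("status", status == null ? "ALL" : status);
        return "orderList";
    }
}
```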
Environment: Core Java, XML, Servlets, Hibernate Criteria API, Web Services, WSDL, UML, EJB, JavaScript, jQuery, Hibernate, SQL, CVS, Agile, JUnit