Hadoop Big Data/Spark Developer Resume, Berkley Heights, NJ

SUMMARY:

5+ years of experience in IT and 2+ years of experience with Hadoop/Big Data ecosystems and Java technologies such as HDFS, MapReduce, Apache Pig, Hive, HBase, Spark, Kafka and Sqoop.
In-depth knowledge of Hadoop architecture and Hadoop daemons such as NameNode, Secondary NameNode, DataNode, JobTracker and TaskTracker.
Experience in writing Map Reduce programs using Apache Hadoop for analyzing Big Data.
Hands-on experience writing ad hoc queries to move data from HDFS to Hive and analyzing the data using HiveQL.
Experience importing and exporting data between relational database systems and HDFS using Sqoop.
Experience writing Hadoop jobs for analyzing data using Pig Latin commands.
Good knowledge of analyzing data in HBase using Hive and Pig.
Working knowledge of NoSQL databases such as HBase and Cassandra.
Hands-on experience setting up automated monitoring and escalation infrastructure for Hadoop clusters using Ganglia and Nagios.
Experience designing and developing POCs in Scala, deploying them on a YARN cluster, and comparing the performance of Spark with Hive and SQL/Teradata.
Good knowledge of AWS services such as EMR, EC2, EBS, S3 and RDS, which provide fast and efficient processing of Big Data.
Experience integrating BI tools such as Tableau and pulling the required data into the BI tool's memory.
Experience launching EC2 instances in Amazon EMR using the console.
Experience extending Hive and Pig core functionality by writing custom UDFs, including UDAFs and UDTFs.
Experience in administrative tasks such as installing Hadoop and ecosystem components such as Hive and Pig in distributed mode.
Experience in using Apache Flume for collecting, aggregating and moving large amounts of data from application servers.
Passionate about working in Big Data and analytics environments.
Knowledge of reporting tools such as Tableau, used for analytics on data in the cloud.
Extensive experience with SQL, PL/SQL, Shell Scripting and database concepts.
Experience with front end technologies like HTML, CSS and JavaScript.
Experience working on Windows and UNIX/Linux platforms with technologies such as Big Data, SQL, XML, HTML, Core Java and shell scripting.

TECHNICAL SKILLS:

Database: DB2, MySQL, Oracle, MS SQL Server

Languages: Core Java, Pig Latin, SQL, HiveQL, Shell Scripting and XML

APIs/Tools: NetBeans, Eclipse, MySQL Workbench, Visual Studio

Web Technologies: HTML, XML, JavaScript, CSS

Big Data Ecosystem: HDFS, Pig, MapReduce, Hive, Kafka, Sqoop, Flume, HBase

Operating System: Unix, Linux, Windows XP

Visualization Tools: Tableau, Zeppelin

Virtualization Software: VMware, Oracle VirtualBox

Cloud Computing Services: AWS (Amazon Web Services)

PROFESSIONAL EXPERIENCE:

Confidential, Berkley Heights, NJ

Hadoop Big Data/Spark Developer

Responsibilities:

Analyzed the requirements for setting up a cluster.
Developed MapReduce programs in Java for parsing the raw data and populating staging tables.
Created Hive queries to compare the raw data with EDW reference tables and perform aggregations.
Imported and exported data into HDFS and Hive using Sqoop.
Wrote Pig scripts to process the data.
Developed and designed Hadoop, Spark and Java components.
Developed Spark programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
Developed Spark code using Scala and Spark SQL for faster processing and testing.
Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
Involved in HBase setup and in storing data in HBase for further analysis.
Explored Spark to improve the performance of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN, and converted Hive queries into Spark transformations using Spark RDDs.
Created Kafka applications that monitor consumer lag within Apache Kafka clusters; these are used in production by multiple companies.
Developed Unix/Linux Shell Scripts and PL/SQL procedures.
Worked on creating real-time data streaming solutions using Apache Spark/Spark Streaming and Kafka.
Performed performance optimizations on Spark/Scala; diagnosed and resolved performance issues.
Installed and configured Hive and wrote Hive UDFs.
Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala and Python.
Involved in creating Hive tables, loading them with data and writing Hive queries in HiveQL, which run internally as MapReduce jobs.
Loaded some of the data into Cassandra for fast retrieval.
Involved in developing Spark code using Scala and Spark SQL for faster testing and processing of data, and explored optimizing it using SparkContext, Spark SQL, pair RDDs and Spark on YARN.
Exported the analyzed data to relational databases using Sqoop for visualization and for report generation by the BI team.
Extracted files from Cassandra through Sqoop, placed them in HDFS and processed them.
Implemented Big Data solutions on the Hortonworks distribution and the AWS cloud platform.
Developed Pig Latin scripts for handling data formation.
Extracted data from MySQL into HDFS using Sqoop.
Managed and monitored the Hadoop cluster using Cloudera Manager.

Environment: Hadoop, Cloudera distribution, Hortonworks distribution, AWS, EMR, Azure cloud platform, HDFS, MapReduce, DocumentDB, Kafka, Pig, Hive, Sqoop, Flume, Oozie, ZooKeeper, Core Java, Impala, HiveQL, Spark, UNIX/Linux Shell Scripting.

Confidential, Newark, CA

Big Data Developer

Responsibilities:

Responsible for building scalable distributed data solutions using Hadoop.
Involved in loading data from the Linux file system into HDFS.
Working experience in HDFS Admin Shell commands.
Experience in ETL methods for data extraction, transformation and loading in corporate-wide ETL Solutions and Data warehouse tools for reporting and data analysis.
Understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode and DataNode.
Developed Kafka producers and consumers, HBase clients, and Apache Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
Used the Spark Streaming API with Kafka to build live dashboards; worked on RDD transformations and actions, Spark Streaming, pair RDD operations, checkpointing and SBT.
Used Kafka to transfer data from different data systems to HDFS.
Migrated complex MapReduce programs into Spark RDD transformations and actions.
Involved in developing a Spark Streaming application for one of the data sources using Scala and Spark, applying transformations to the stream.
Developed a Scala script to read all the Parquet tables in a database and convert them to JSON files, and another script to load them as structured tables in Hive.
Implemented Spark RDD transformations for business analysis and applied actions on top of the transformations.
Used Spark to parse XML files, extract values from tags and load them into multiple Hive tables.
Experience with different Hadoop distributions such as Cloudera and Hortonworks.
Hands-on experience with Cassandra.
Analyzed large data sets by running Hive queries and Pig scripts.
Created Hive tables, loaded data into them and wrote Hive UDFs.
Hands-on use of Sqoop to import data from RDBMS into HDFS and export it back.
Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
Used Sqoop, Avro, Hive, Pig, Java and MapReduce daily to develop ETL, batch processing and data storage functionality.
Supported implementation and execution of MapReduce programs in a cluster environment.

Environment: Hadoop, MapReduce, Hive, Pig, HBase, Sqoop, Kafka, Cassandra, Flume, Java, SQL, Cloudera Manager, Eclipse, Unix Script, YARN.

Confidential, Columbus, OH

Hadoop Developer

Responsibilities:

Wrote MapReduce code to parse data from various sources and store the parsed data in HBase and Hive.
Integrated MapReduce with HBase to import bulk data into HBase using MapReduce programs.
Imported data from relational data sources such as Oracle and Teradata into HDFS using Sqoop.
Worked on a stand-alone as well as a distributed Hadoop application.
Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark.
Used Oozie to automate the flow of jobs and ZooKeeper for coordination in the cluster.
Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
Extensive knowledge of Pig scripts using bags and tuples, and of Pig UDFs to pre-process data for analysis.
Used Amazon EMR to process Big Data across a Hadoop cluster of virtual servers.
Used Teradata in both the Hadoop project and the ETL project.
Developed several shell scripts that act as wrappers to start these Hadoop jobs and set their configuration parameters.
Involved in writing Impala queries for faster processing of data.
Involved in developing Impala scripts for extraction, transformation and loading of data into the data warehouse.
Experienced in migrating HiveQL queries to Impala to minimize query response time.
Involved in moving log files generated from various sources to HDFS through Flume for further processing.
Involved in collecting and aggregating large amounts of log data using Apache Flume and staging the data in HDFS for further analysis.
Developed test scripts in Python, prepared test procedures, analyzed test result data and suggested improvements to the system and software.

Environment: HDFS, MapReduce, Python, CDH5, HBase, NoSQL, Hive, Pig, Hadoop, Sqoop, Impala, YARN, Shell Scripting, Ubuntu, Linux Red Hat.

Confidential

Java / Hadoop Developer

Responsibilities:

Involved in design and development phases of Software Development Life Cycle (SDLC) using Scrum methodology.
Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using the CDH4 distribution.
Collected log data from web servers and ingested it into HDFS using Flume.
Involved in creating Hive tables, loading them with data and writing Hive queries that invoke and run MapReduce jobs in the backend.
Worked on analyzing the Hadoop cluster and different big data analytics tools including MapReduce, Hive and Spark.
Developed MapReduce programs to parse the raw data and store the pre-aggregated data in partitioned tables.
Involved in the end-to-end process of Hadoop cluster installation, configuration and monitoring.
Responsible for building scalable distributed data solutions using Hadoop; involved in submitting and tracking MapReduce jobs using JobTracker.
Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
Worked with HBase to create tables and load large sets of semi-structured data coming from various sources.
Created design documents and reviewed them with the team, and assisted the business analyst/project manager in explaining them to the line of business.
Responsible for understanding the scope of the project and requirement gathering.
Involved in analysis, design, construction and testing of the application.
Developed the web tier using JSP to show account details and summary.
Designed and developed the UI using JSP, HTML, CSS and JavaScript.
Used the Tomcat web server for development purposes.
Involved in creating test cases for JUnit testing.
Used Oracle as the database and Toad for query execution; also wrote SQL scripts and PL/SQL code for procedures and functions.
Developed the application using Eclipse and used Maven as the build and deploy tool.

Environment: Hadoop, HBase, HDFS, Pig Latin, Sqoop, Hive, Java, J2EE Servlet, JSP, JUnit, AJAX, XML, JavaScript, Maven, Eclipse, Apache Tomcat, and Oracle.
