Big Data & Spark Developer Resume
Florham Park, NJ
PROFESSIONAL SUMMARY:
- 8+ years of experience in the IT industry, including 3+ years in Big Data implementing complete Hadoop solutions and 5 years of experience in Java.
- Good working experience with Apache Hadoop ecosystem components such as MapReduce, HDFS, Hive, Sqoop, Pig, Oozie, Flume, HBase, Spark, Storm, Kafka, Scala, and ZooKeeper.
- Experience writing UDFs and integrating them with Hive and Pig.
- Experience with SequenceFile, Avro, and ORC file formats and compression.
- Experience with Hadoop distributions: Cloudera and Hortonworks.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experience with job workflow scheduling and monitoring tools like Oozie and ZooKeeper.
- Extensive knowledge in using SQL Queries for backend database analysis.
- Strong knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB, including their integration with Hadoop clusters.
- Experience importing and exporting data using Sqoop between HDFS and Relational Database Systems (RDBMS).
- Led many data analysis and integration efforts involving Hadoop along with ETL.
- Hands-on experience with enterprise data lakes supporting various use cases, including analytics, processing, storage, and reporting of voluminous, rapidly changing, structured and unstructured data.
- Extensive experience with SQL, PL/SQL and database concepts.
- Transferred bulk data from RDBMS such as Teradata into HDFS using Sqoop.
- Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Well-versed in Agile and other SDLC methodologies and can coordinate with owners and SMEs.
- Worked on different operating systems like UNIX, Linux, and Windows.
- Diverse experience utilizing Java tools in business, web, and client-server environments, including Java Platform, Enterprise Edition (Java EE), Enterprise JavaBeans (EJB), JavaServer Pages (JSP), Java Servlets (including JNDI), Struts, and Java Database Connectivity (JDBC) technologies.
- Fluid understanding of multiple programming languages, including C#, C, C++, JavaScript, HTML, and XML.
- Experience in web application design using open-source MVC frameworks such as Spring and Struts.
TECHNICAL SKILLS:
Big Data Ecosystems: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Spark, Scala, HBase, Oozie, Flume, ZooKeeper
DB Languages: SQL, PL/SQL (Oracle)
Databases: Oracle 11g/10g, MySQL, Teradata, HBase, Cassandra, MongoDB
Programming Languages: Java, JavaScript, JavaBeans, JSP, C, HTML, XML, Python, Spark SQL and Scala
Frameworks: JSF, J2EE, Apache Struts
Scripting Languages: JSP & Servlets, JavaScript, Python and HTML
Tools: Eclipse, NetBeans
Application Servers: Apache Tomcat, WebSphere, Sun Java Enterprise System (JES)
Methodologies: Agile and Waterfall
PROFESSIONAL EXPERIENCE:
Confidential, Florham Park, NJ
Big Data & Spark Developer
Responsibilities:
- Involved in the complete Big Data flow of the application, from ingesting data from upstream systems into HDFS to processing and analyzing it in HDFS.
- Developed Spark APIs to import data from Teradata into HDFS and created Hive tables on top of it.
- Developed Sqoop jobs to import data in Avro file format from the Oracle database and created Hive tables on top of it.
- Created partitioned and bucketed Hive tables in Parquet file format with Snappy compression and then loaded data into the Parquet Hive tables from the Avro Hive tables (see the Spark SQL sketch after this list).
- Ran Hive scripts through Hive, Impala, and Hive on Spark, and some through Spark SQL using Python and Scala.
- Involved in performance tuning of Hive from design, storage and query perspectives.
- Developed a Flume ETL job handling data from an HTTP source with HDFS as the sink.
- Collected JSON data from the HTTP source and developed Spark APIs to perform inserts and updates in Hive tables.
- Developed Spark core and Spark SQL scripts using Scala for faster data processing.
- Developed Kafka consumer APIs in Scala for consuming data from Kafka topics (see the consumer sketch after this list).
- Involved in designing and developing tables in HBase and storing aggregated data from Hive tables.
- Integrated Hive and Tableau Desktop reports and published to Tableau Server.
- Developed shell scripts for running Hive scripts in Hive and Impala.
- Orchestrated a number of Sqoop and Hive scripts using Oozie workflows and scheduled them with the Oozie coordinator.
- Used Jira for bug tracking and Bitbucket for checking code changes in and out.
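A minimal sketch of the Avro-to-Parquet load described above, assuming Spark 2.x with Hive support enabled; the database, table, and column names (avro_db.customer_avro, parquet_db.customer_parquet) are hypothetical, and the bucketing clause of the target table's DDL is assumed to have been applied separately through the Hive CLI/Beeline:

    import org.apache.spark.sql.SparkSession

    object AvroToParquetLoad {
      def main(args: Array[String]): Unit = {
        // Spark session with Hive support so SQL statements run against the Hive metastore
        val spark = SparkSession.builder()
          .appName("AvroToParquetLoad")
          .enableHiveSupport()
          .getOrCreate()

        // Partitioned Parquet target table with Snappy compression (hypothetical schema)
        spark.sql(
          """CREATE TABLE IF NOT EXISTS parquet_db.customer_parquet (
            |  customer_id BIGINT,
            |  name STRING,
            |  balance DOUBLE)
            |PARTITIONED BY (load_date STRING)
            |STORED AS PARQUET
            |TBLPROPERTIES ('parquet.compression'='SNAPPY')""".stripMargin)

        // Allow dynamic partitions so each load_date lands in its own partition
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

        // Copy data from the Avro staging table into the Parquet table
        spark.sql(
          """INSERT OVERWRITE TABLE parquet_db.customer_parquet PARTITION (load_date)
            |SELECT customer_id, name, balance, load_date
            |FROM avro_db.customer_avro""".stripMargin)

        spark.stop()
      }
    }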
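And a sketch of a simple Kafka consumer in Scala, assuming the kafka-clients 2.x API and Scala 2.11/2.12; the broker address, group id, and topic name are hypothetical placeholders:

    import java.time.Duration
    import java.util.Properties
    import scala.collection.JavaConverters._

    import org.apache.kafka.clients.consumer.KafkaConsumer

    object TopicConsumer {
      def main(args: Array[String]): Unit = {
        // Basic consumer configuration; values here are placeholders
        val props = new Properties()
        props.put("bootstrap.servers", "broker1:9092")
        props.put("group.id", "ingest-consumers")
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

        val consumer = new KafkaConsumer[String, String](props)
        consumer.subscribe(List("events").asJava)

        try {
          while (true) {
            // Poll the topic and hand each record to downstream processing
            val records = consumer.poll(Duration.ofMillis(500))
            for (record <- records.asScala) {
              println(s"offset=${record.offset()} key=${record.key()} value=${record.value()}")
            }
          }
        } finally {
          consumer.close()
        }
      }
    }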
Environment: HDFS, Yarn, MapReduce, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark SQL, Spark Streaming, Eclipse, Oracle, Teradata, PL/SQL, Linux Shell Scripting, Cloudera.
Confidential, Florham Park, NJ
Big Data Developer
Responsibilities:
- Responsible for building scalable, distributed data solutions using Hadoop.
- Used Sqoop to import data from MySQL and various other sources into HDFS on a regular basis.
- Wrote multiple MapReduce programs for extraction, transformation, and aggregation of data from multiple file formats, including XML, JSON, CSV, and other compressed formats.
- Reviewed the HDFS usage and system design for future scalability and fault-tolerance.
- Involved in loading data from the Linux file system to HDFS.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data in various formats such as text, zip, XML, and JSON.
- Defined job flows and developed simple to complex MapReduce jobs as per the requirements.
- Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Developed Pig UDFs to manipulate data according to business requirements and also worked on developing custom Pig loaders.
- Hands-on experience setting up an HBase column-based storage repository for archiving and retro (historical) data.
- Responsible for creating Hive tables based on business requirements.
- Used an enterprise data lake to support various use cases, including analytics, processing, storage, and reporting of voluminous, rapidly changing, structured and unstructured data.
- Along with the infrastructure team, was involved in designing and developing a Kafka- and Storm-based data pipeline.
- Implemented partitioning, dynamic partitions, and buckets in Hive for efficient data access.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Involved in data modeling and sharding and replication strategies in Cassandra.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response (see the sketch after this list).
- Knowledge of handling Hive queries using Spark SQL integrated with the Spark environment.
- Exported the analyzed data into relational databases using Sqoop for visualization and to generate reports for the BI team.
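A minimal sketch of the RDD-based in-memory computation mentioned above, assuming Spark 2.x; the HDFS path, field layout, and view name are hypothetical:

    import org.apache.spark.sql.SparkSession

    object EventAggregation {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("EventAggregation")
          .getOrCreate()
        val sc = spark.sparkContext

        // Load raw pipe-delimited records from HDFS into an RDD
        val raw = sc.textFile("hdfs:///data/events/input")

        // Parse into (eventType, amount) pairs and cache for repeated in-memory use
        val parsed = raw
          .map(_.split('|'))
          .filter(_.length >= 3)
          .map(fields => (fields(1), fields(2).toDouble))
          .cache()

        // In-memory aggregation: total amount per event type
        val totals = parsed.reduceByKey(_ + _)

        // Expose the result to Spark SQL for downstream reporting
        import spark.implicits._
        totals.toDF("event_type", "total_amount")
          .createOrReplaceTempView("event_totals")
        spark.sql("SELECT event_type, total_amount FROM event_totals ORDER BY total_amount DESC")
          .show(20)

        spark.stop()
      }
    }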
Environment: Apache Hadoop 2.x, Cloudera, HDFS, MapReduce, Hortonworks, Hive, Pig, HBase, Spark, Scala, Sqoop, Kafka, Flume, Cassandra, Oracle 11g/10g, Linux, XML, MySQL.
Confidential
Big Data Developer
Responsibilities:
- Understood business needs, analyzed functional specifications, and mapped them to the design and development of MapReduce programs and algorithms.
- Optimized Hadoop MapReduce code and Hive and Pig scripts for better scalability, reliability, and performance.
- Developed Oozie workflows for application execution.
- Performed data migration from legacy RDBMS databases to HDFS using Sqoop.
- Wrote Pig scripts for data processing.
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios (see the HBase client sketch after this list).
- Implemented Hive tables and HQL queries for the reports.
- Imported data from Cassandra into HDFS using the Mongo export utility.
- Involved in developing shell scripts and automating data management for end-to-end integration work.
- Performed data validation using Hive dynamic partitioning and bucketing.
- Wrote and used complex data types for storing and retrieving data using HQL in Hive.
- Developed Hive queries to analyze reducer output data.
- Implemented ETL code to load data from multiple sources into HDFS using Pig scripts.
- Highly involved in designing the next generation data architecture for the Unstructured data.
- Developed Pig Latin scripts to extract data from the source system.
- Created and maintained technical documentation for executing Hive queries and Pig scripts.
- Involved in extracting and loading data from Hive into an RDBMS using Sqoop.
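A minimal sketch of writing a record into an HBase table, as referenced in the HBase bullet above, assuming the HBase 1.x+ client API; the table name, column family, row key, and cell values are hypothetical:

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes

    object PortfolioHBaseLoad {
      def main(args: Array[String]): Unit = {
        // Picks up hbase-site.xml from the classpath for ZooKeeper quorum details
        val conf = HBaseConfiguration.create()
        val connection = ConnectionFactory.createConnection(conf)
        try {
          val table = connection.getTable(TableName.valueOf("portfolio_events"))
          try {
            // Row key plus two cells in column family "d"
            val put = new Put(Bytes.toBytes("ACCT-001|2015-06-01"))
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("source"), Bytes.toBytes("unix_logs"))
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"), Bytes.toBytes("LOADED"))
            table.put(put)
          } finally {
            table.close()
          }
        } finally {
          connection.close()
        }
      }
    }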
Environment: HDFS, MapReduce, MySQL, Cassandra, Hive, HBase, Oozie, Pig, ETL, Hortonworks (HDP 2.0), Shell Scripting, Linux, Sqoop, Flume and Oracle 11g.
Confidential
Hadoop Developer
Responsibilities:
- Analyzed system requirements, including the Hadoop cluster and HBase.
- Moved log files to HDFS.
- Analyzed the structure of the log files.
- Wrote a MapReduce program to parse the logs and convert them into a structured key-value format (see the mapper sketch after this list).
- Inserted the structured data into an HBase table as key-value pairs.
- Analyzed the results.
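A minimal sketch of the log-parsing mapper referenced above, written in Scala against the Hadoop MapReduce API; the log layout (whitespace-separated, timestamp first) is an assumption, and the driver/job setup and the HBase-writing reducer are omitted:

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.Mapper

    // Turns a raw log line into a (timestamp, rest-of-record) key/value pair;
    // the pipe-delimited value can then be written to HBase downstream.
    class LogParseMapper extends Mapper[LongWritable, Text, Text, Text] {

      override def map(key: LongWritable, value: Text,
                       context: Mapper[LongWritable, Text, Text, Text]#Context): Unit = {
        // Assumes log lines of the form: <timestamp> <level> <message...>
        val fields = value.toString.trim.split("\\s+")
        if (fields.length >= 3) {
          context.write(new Text(fields(0)), new Text(fields.drop(1).mkString("|")))
        }
      }
    }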
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Spark, Kafka, Oozie, ZooKeeper
Confidential
Sr. Java Developer
Responsibilities:
- Gathered business requirements from Kia.com and prepared functional specifications.
- Prepared Technical specs based on the functional document.
- Followed the Agile Scrum methodology for project development.
- Integrated services with various clients using SOAP interfaces.
- Developed shell scripts for dealer data transfer to various vendors such as KBB and Edmunds.
- Provided support for new vendors integrating their applications to send leads into the system.
- Developed FreeMarker templates to send email to dealers.
- Built and deployed Java applications into multiple Unix-based environments.
- Provided recommendations on best practices and exception handling, and identified and fixed potential memory, performance, and transactional issues.
- Used the Drools rule engine to implement business validation for lead processing.
- Used Java collections (e.g., BlockingQueue, HashMap) for orchestrating the lead data.
- Wrote packages, stored procedures, and synonyms for the reporting module and handled database import/export in Postgres.
- Coordinated the effort to move the infrastructure from a dedicated environment to the Rackspace cloud.
- Worked with the SEO team to develop an algorithm to calculate the lead close rate.
- Applied design patterns and OO design concepts to improve the existing Java/JEE code base.
- Developed a lead-scoring algorithm in Java for a multithreaded environment.
- Configured Apache mod_jk for load balancing on the application server.
- Developed RESTful endpoints for the redesign of the application.
- Provided post-production support for the application.
Environment: Java 6, JSP, Spring IoC, Spring, Apache Web Server, Postgres 9, Shell Scripting, Maven, Ant, JDBC, Hibernate, XML, JBoss, UNIX, PL/SQL & Agile.
Confidential
Java Developer
Responsibilities:
- Involved in various stages of Enhancements in the Application by doing the required analysis, development, and testing.
- Prepared the high- and low-level design documents and worked on generating digital signatures.
- Created use case, class, and sequence diagrams for analysis and design of the application.
- Developed logic and code for the registration and validation of enrolling customers.
- Developed web-based user interfaces using the Struts framework.
- Handled client-side validations using JavaScript.
- Wrote SQL queries, stored procedures and enhanced performance by running explain plans.
- Involved in integration of various Struts actions in the framework.
- Used the Validation Framework for server-side validations.
- Created test cases for the Unit and Integration testing.
- The front end was integrated with the Oracle database using the JDBC API through the JDBC-ODBC bridge driver on the server side.
Environment: Java Servlets, JSP, Java Script, Web Services, XML, HTML, UML, Apache Tomcat, JDBC, Oracle, SQL.