
Big Data & Spark Developer Resume


Florham Park, NJ

PROFESSIONAL SUMMARY:

  • 8+ years of experience in the IT industry, including 3+ years of Big Data experience implementing end-to-end Hadoop solutions and 5 years of experience in Java.
  • Good working experience with Apache Hadoop ecosystem components such as MapReduce, HDFS, Hive, Sqoop, Pig, Oozie, Flume, HBase, Spark, Storm, Kafka, Scala and ZooKeeper.
  • Experienced in writing UDFs and integrating them with Hive and Pig.
  • Experience with SequenceFile, Avro and ORC file formats and compression.
  • Experience with Hadoop distributions: Cloudera and Hortonworks.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Experience with job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
  • Extensive knowledge in using SQL Queries for backend database analysis.
  • Strong knowledge of NoSQL databases such as HBase, Cassandra and MongoDB and their integration with Hadoop clusters.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems (RDBMS) and vice versa.
  • Led many data analysis and integration efforts involving Hadoop along with ETL.
  • Hands-on experience with an enterprise data lake supporting various use cases, including analytics, processing, storage and reporting of voluminous, rapidly changing structured and unstructured data.
  • Extensive experience with SQL, PL/SQL and database concepts.
  • Transferred bulk data from RDBMS systems such as Teradata into HDFS using Sqoop.
  • Experience in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java.
  • Well-versed in Agile, other SDLC methodologies and can coordinate with owners and SMEs.
  • Worked on different operating systems including UNIX, Linux and Windows.
  • Diverse experience utilizing Java tools in business, web and client-server environments, including Java Platform, Enterprise Edition (Java EE), Enterprise JavaBeans (EJB), JavaServer Pages (JSP), Java Servlets (including JNDI), Struts and Java Database Connectivity (JDBC) technologies.
  • Fluid understanding of multiple programming languages, including C#, C, C++, JavaScript, HTML, and XML.
  • Experience in web application design using open-source MVC frameworks such as Spring and Struts.

TECHNICAL SKILLS:

Big Data Ecosystems: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Spark, Scala, HBase, Oozie, Flume, ZooKeeper

DB Languages: SQL, PL/SQL

Databases: Oracle 11g/10g, MySQL, Teradata, HBase, Cassandra, MongoDB

Programming Languages: Java, JavaScript, JavaBeans, JSP, C, HTML, XML, Python, Spark SQL and Scala

Frameworks: JSF, J2EE, Apache Struts

Scripting Languages: JSP & Servlets, JavaScript, Python and HTML

Tools: Eclipse, NetBeans

Application Servers: Apache Tomcat, WebSphere, Sun Java Enterprise System (JES)

Methodologies: Agile and Waterfall

PROFESSIONAL EXPERIENCE:

Confidential, Florham Park, NJ

Big Data & Spark Developer

Responsibilities:

  • Involved in the complete Big Data flow of the application, from ingesting upstream data into HDFS to processing and analyzing the data in HDFS.
  • Developed Spark jobs to import data from Teradata into HDFS and created Hive tables on top of it (see the import sketch after this list).
  • Developed Sqoop jobs to import data in Avro file format from an Oracle database and created Hive tables on top of it.
  • Created partitioned and bucketed Hive tables in Parquet file format with Snappy compression, then loaded data into the Parquet Hive tables from the Avro Hive tables.
  • Ran the Hive scripts through Hive, Impala and Hive on Spark, and some through Spark SQL using Python and Scala.
  • Involved in performance tuning of Hive from design, storage and query perspectives.
  • Developed a Flume ETL job handling data from an HTTP source with an HDFS sink.
  • Collected JSON data from the HTTP source and developed Spark jobs that perform inserts and updates in Hive tables.
  • Developed Spark Core and Spark SQL scripts using Scala for faster data processing.
  • Developed Kafka consumers in Scala for consuming data from Kafka topics (see the consumer sketch after this list).
  • Involved in designing and developing tables in HBase and storing aggregated data from Hive tables.
  • Integrated Hive with Tableau Desktop reports and published them to Tableau Server.
  • Developed shell scripts for running Hive scripts in Hive and Impala.
  • Orchestrated a number of Sqoop and Hive scripts using Oozie workflows and scheduled them with the Oozie coordinator.
  • Used Jira for bug tracking and Bitbucket to check in and check out code changes.
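
The Teradata-to-Hive ingestion described above can be sketched as a small Spark job. This is a minimal sketch only: the connection URL, credentials and table names are illustrative placeholders rather than values from the original project, and it assumes a Spark build with Hive support and the Teradata JDBC driver on the classpath.

    import org.apache.spark.sql.{SaveMode, SparkSession}

    object TeradataToHiveImport {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("teradata-to-hive-import")
          .enableHiveSupport()                       // lets saveAsTable create Hive tables
          .getOrCreate()

        // Read the source table from Teradata over JDBC; the URL, credentials and
        // table names are illustrative placeholders only.
        val source = spark.read
          .format("jdbc")
          .option("url", "jdbc:teradata://td-host/DATABASE=sales")
          .option("driver", "com.teradata.jdbc.TeraDriver")
          .option("dbtable", "sales.orders")
          .option("user", "etl_user")
          .option("password", sys.env.getOrElse("TD_PASSWORD", ""))
          .load()

        // Land the data in HDFS as a Hive table that downstream Hive/Impala jobs can query.
        source.write
          .mode(SaveMode.Overwrite)
          .saveAsTable("staging.orders_raw")

        spark.stop()
      }
    }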
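
A minimal version of the Kafka consumer mentioned above might look like the following sketch. The broker addresses, group id and topic name are placeholders, and the poll(long) call assumes the 0.10/1.x-era consumer client (newer clients use poll(Duration)).

    import java.util.{Collections, Properties}
    import org.apache.kafka.clients.consumer.KafkaConsumer
    import scala.collection.JavaConverters._

    object JsonEventConsumer {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "broker1:9092,broker2:9092")   // placeholder brokers
        props.put("group.id", "hive-loader")                          // placeholder group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

        val consumer = new KafkaConsumer[String, String](props)
        consumer.subscribe(Collections.singletonList("json-events"))  // placeholder topic

        try {
          while (true) {
            // Poll for new records and hand each JSON payload to downstream processing.
            val records = consumer.poll(500L).asScala
            records.foreach(record => println(s"offset=${record.offset()} value=${record.value()}"))
          }
        } finally {
          consumer.close()
        }
      }
    }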

Environment: HDFS, YARN, MapReduce, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark SQL, Spark Streaming, Eclipse, Oracle, Teradata, PL/SQL, Linux Shell Scripting, Cloudera.

Confidential, Florham Park, NJ

Big Data Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Used Sqoop to import data from MySQL and other sources into HDFS on a regular basis.
  • Wrote multiple MapReduce programs for extraction, transformation and aggregation of data from multiple file formats, including XML, JSON, CSV and other compressed file formats.
  • Reviewed HDFS usage and system design for future scalability and fault tolerance.
  • Involved in loading data from the Linux file system to HDFS.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data in various formats such as text, zip, XML and JSON.
  • Defined job flows and developed simple to complex MapReduce jobs as per the requirements.
  • Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
  • Developed Pig UDFs for manipulating data according to business requirements and also developed custom Pig loaders.
  • Set up an HBase column-based storage repository for archiving and retro data.
  • Responsible for creating Hive tables based on business requirements.
  • Used an enterprise data lake to support various use cases, including analytics, processing, storage and reporting of voluminous, rapidly changing structured and unstructured data.
  • Along with the infrastructure team, involved in the design and development of a Kafka and Storm based data pipeline.
  • Implemented partitioning, dynamic partitions and buckets in Hive for efficient data access.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Involved in data modeling and in sharding and replication strategies for Cassandra.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response (see the RDD sketch after this list).
  • Knowledge of handling Hive queries using Spark SQL integrated with the Spark environment.
  • Exported the analyzed data into relational databases using Sqoop for visualization and to generate reports for the BI team.
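
The RDD-based in-memory computation mentioned above can be sketched roughly as follows. The HDFS path, field layout and aggregation logic are illustrative assumptions, not the original job.

    import org.apache.spark.{SparkConf, SparkContext}

    object EventCountJob {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("rdd-aggregation"))

        // Load delimited records from HDFS into an RDD; path and record layout are placeholders.
        val lines = sc.textFile("hdfs:///data/events/*.csv")

        // In-memory computation: parse each line and aggregate a count per key.
        val countsByKey = lines
          .map(_.split(","))
          .filter(_.length >= 2)
          .map(fields => (fields(0), 1L))
          .reduceByKey(_ + _)
          .cache()                                  // keep the aggregated RDD in memory for reuse

        countsByKey.take(20).foreach { case (key, count) => println(s"$key\t$count") }
        sc.stop()
      }
    }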

Environment: Apache Hadoop 2.x, Cloudera, HDFS, MapReduce, Hortonworks, Hive, Pig, HBase, Spark, Scala, Sqoop, Kafka, Flume, Cassandra, Oracle 11g/10g, Linux, XML, MySQL.

Confidential

Big Data Developer

Responsibilities:

  • Understood business needs, analyzed functional specifications and mapped them to the design and development of MapReduce programs and algorithms.
  • Optimized Hadoop MapReduce code and Hive and Pig scripts for better scalability, reliability and performance.
  • Developed Oozie workflows for application execution.
  • Performed data migration from legacy RDBMS databases to HDFS using Sqoop.
  • Wrote Pig scripts for data processing.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Implemented Hive tables and HQL queries for the reports.
  • Imported data from Cassandra into HDFS using an export utility.
  • Developed shell scripts and automated data management for end-to-end integration work.
  • Performed data validation using Hive dynamic partitioning and bucketing.
  • Wrote and used complex data types for storing and retrieving data using HQL in Hive.
  • Developed Hive queries to analyze reducer output data.
  • Implemented ETL code to load data from multiple sources into HDFS using Pig scripts.
  • Highly involved in designing the next-generation data architecture for unstructured data.
  • Developed Pig Latin scripts to extract data from source systems.
  • Created and maintained technical documentation for executing Hive queries and Pig scripts.
  • Involved in extracting data from Hive and loading it into an RDBMS using Sqoop.

Environment: HDFS, MapReduce, MySQL, Cassandra, Hive, HBase, Oozie, Pig, ETL, Hortonworks (HDP 2.0), Shell Scripting, Linux, Sqoop, Flume and Oracle 11g.

Confidential

Hadoop Developer

Responsibilities:

  • Analyzed the system requirements, including the Hadoop cluster and HBase.
  • Moved log files to HDFS.
  • Analyzed the structure of the log files.
  • Wrote a MapReduce program to parse the log files and convert them into a structured key-value format (see the mapper sketch after this list).
  • Inserted the structured data into an HBase table in the form of key-value pairs.
  • Analyzed the results.
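
The resume does not show the parsing program itself; a minimal mapper that turns raw log lines into structured key-value pairs could look like the sketch below. It is written in Scala against the Hadoop MapReduce API only for consistency with the other sketches here (a Java mapper would be equally typical), and the space-delimited log layout is an assumption.

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.Mapper

    // Minimal mapper that turns a raw log line into a (timestamp, rest-of-line)
    // key-value pair; the space-delimited layout assumed here is a placeholder.
    class LogParseMapper extends Mapper[LongWritable, Text, Text, Text] {

      private val outKey = new Text()
      private val outValue = new Text()

      override def map(key: LongWritable, value: Text,
                       context: Mapper[LongWritable, Text, Text, Text]#Context): Unit = {
        val line = value.toString
        val firstSpace = line.indexOf(' ')
        if (firstSpace > 0) {
          outKey.set(line.substring(0, firstSpace))      // e.g. the timestamp field
          outValue.set(line.substring(firstSpace + 1))   // remaining fields as the value
          context.write(outKey, outValue)
        }
      }
    }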

Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Spark, Kafka, Oozie, ZooKeeper

Confidential

Sr. Java Developer

Responsibilities:

  • Gathered business requirements from Kia.com and prepared functional specifications.
  • Prepared Technical specs based on the functional document.
  • Followed Agile Scrum methodology for project development.
  • Integrated services with various clients using SOAP interfaces.
  • Developed shell scripts for dealer data transfer to various vendors, e.g., KBB and Edmunds.
  • Provided support for a new vendor to integrate their application for sending leads into the system.
  • Developed FreeMarker templates to send email to dealers.
  • Built and deployed Java applications into multiple Unix based environments.
  • Provided recommendations on best practices, exception handling, and identifying and fixing potential memory, performance and transactional issues.
  • Used the Drools rule engine to implement business validations for lead processing.
  • Used Java collections (e.g., BlockingQueue, HashMap) for orchestrating the lead data (see the sketch after this list).
  • Wrote packages, stored procedures and synonyms for the reporting module and database import/export in Postgres.
  • Coordinated the effort to move the infrastructure from a dedicated environment to the Rackspace cloud.
  • Worked with the SEO team to develop an algorithm to calculate the lead close rate.
  • Applied design patterns and OO design concepts to improve the existing Java/JEE based code base.
  • Developed the lead score algorithm in Java for a multithreaded environment.
  • Configured Apache mod_jk for load balancing on the application server.
  • Developed RESTful endpoints for the redesign of the application.
  • Provided post-production support for the application.
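
As an illustration of the blocking-queue hand-off used to orchestrate lead data, a stripped-down producer/consumer sketch is shown below. The resume states this work was done in Java; the version here is in Scala only for consistency with the earlier sketches, and the Lead fields and scoring rule are invented placeholders.

    import java.util.concurrent.{ArrayBlockingQueue, Executors, TimeUnit}

    // Hypothetical lead record; the field names are placeholders, not the original model.
    case class Lead(id: String, source: String, responseMinutes: Int)

    object LeadScoringPipeline {
      def main(args: Array[String]): Unit = {
        // Bounded queue decouples lead intake from scoring via a blocking-queue hand-off.
        val queue = new ArrayBlockingQueue[Lead](1000)
        val workers = Executors.newFixedThreadPool(4)

        (1 to 4).foreach { _ =>
          workers.submit(new Runnable {
            override def run(): Unit = {
              var lead = queue.poll(5, TimeUnit.SECONDS)
              while (lead != null) {
                // Placeholder scoring rule: faster responses score higher.
                val score = math.max(0, 100 - lead.responseMinutes)
                println(s"lead=${lead.id} source=${lead.source} score=$score")
                lead = queue.poll(5, TimeUnit.SECONDS)
              }
            }
          })
        }

        // Producer side: intake threads would offer incoming leads onto the queue.
        queue.put(Lead("L-1", "web", 12))
        queue.put(Lead("L-2", "dealer", 45))

        workers.shutdown()
        workers.awaitTermination(1, TimeUnit.MINUTES)
      }
    }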

Environment: Java 6, JSP, Spring (IoC), Apache Web Server, Postgres 9, Shell Scripting, Maven, Ant, JDBC, Hibernate, XML, JBoss, UNIX, PL/SQL & Agile.

Confidential

Java Developer

Responsibilities:

  • Involved in various stages of Enhancements in the Application by doing the required analysis, development, and testing.
  • Prepared the high- and low-level design documents and worked on generating digital signatures.
  • Created use case, class and sequence diagrams for the analysis and design of the application.
  • Developed logic and code for the registration and validation of enrolling customers.
  • Developed web-based user interfaces using the Struts framework.
  • Handled client-side validations using JavaScript.
  • Wrote SQL queries, stored procedures and enhanced performance by running explain plans.
  • Involved in integration of various Struts actions in the framework.
  • Used the Validation Framework for server-side validations.
  • Created test cases for the Unit and Integration testing.
  • The front end was integrated with the Oracle database using the JDBC API through the JDBC-ODBC bridge driver on the server side.

Environment: Java Servlets, JSP, JavaScript, Web Services, XML, HTML, UML, Apache Tomcat, JDBC, Oracle, SQL.
