Big Data & Spark Developer Resume
Florham Park, NJ
PROFESSIONAL SUMMARY:
- 8+ years of experience in the IT industry, including 3+ years in Big Data implementing complete Hadoop solutions and 5 years of experience in Java.
- Good working experience with Apache Hadoop ecosystem components such as MapReduce, HDFS, Hive, Sqoop, Pig, Oozie, Flume, HBase, Spark, Storm, Kafka, Scala, and ZooKeeper.
- Experience writing UDFs and integrating them with Hive and Pig.
- Experience with SequenceFile, Avro, and ORC file formats and compression.
- Experience with Hadoop distributions: Cloudera and Hortonworks.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experience with job workflow scheduling and monitoring tools like Oozie and ZooKeeper.
- Extensive knowledge in using SQL Queries for backend database analysis.
- Strong knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB, including their integration with Hadoop clusters.
- Experience importing and exporting data using Sqoop between HDFS and Relational Database Systems (RDBMS).
- Led many data analysis and integration efforts involving Hadoop along with ETL.
- Hands-on experience with enterprise data lakes supporting various use cases, including analytics, processing, storage, and reporting of voluminous, rapidly changing, structured and unstructured data.
- Extensive experience with SQL, PL/SQL and database concepts.
- Transferred bulk data from RDBMS such as Teradata into HDFS using Sqoop.
- Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Well-versed in Agile and other SDLC methodologies and can coordinate with owners and SMEs.
- Worked on different operating systems like UNIX, Linux, and Windows.
- Diverse experience utilizing Java tools in business, web, and client-server environments, including Java Platform, Enterprise Edition (Java EE), Enterprise JavaBeans (EJB), JavaServer Pages (JSP), Java Servlets (including JNDI), Struts, and Java Database Connectivity (JDBC) technologies.
- Fluid understanding of multiple programming languages, including C#, C, C++, JavaScript, HTML, and XML.
- Experience in web application design using open-source MVC frameworks such as Spring and Struts.
TECHNICAL SKILLS:
Big Data Ecosystems: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Spark, Scala, HBase, Oozie, Flume, ZooKeeper
DB Languages: SQL, PL/SQL (Oracle)
Databases: Oracle 11g/10g, MySQL, Teradata, HBase, Cassandra, MongoDB
Programming Languages: Java, JavaScript, JavaBeans, JSP, C, HTML, XML, Python, Spark SQL and Scala
Frameworks: JSF, J2EE, Apache Struts
Scripting Languages: JSP & Servlets, JavaScript, Python and HTML
Tools: Eclipse, NetBeans
Application Servers: Apache Tomcat, WebSphere, Sun Java Enterprise System (JES)
Methodologies: Agile and Waterfall
PROFESSIONAL EXPERIENCE:
Confidential, Florham Park, NJ
Big Data & Spark Developer
Responsibilities:
- Involved in the complete Big Data flow of the application, from ingesting data from upstream systems into HDFS to processing and analyzing it in HDFS.
- Developed Spark APIs to import data from Teradata into HDFS and created Hive tables on top of it.
- Developed Sqoop jobs to import data in Avro file format from the Oracle database and created Hive tables on top of it.
- Created partitioned and bucketed Hive tables in Parquet file format with Snappy compression and then loaded data into the Parquet Hive tables from the Avro Hive tables (see the Spark SQL sketch after this list).
- Ran Hive scripts through Hive, Impala, and Hive on Spark, and some through Spark SQL using Python and Scala.
- Involved in performance tuning of Hive from design, storage and query perspectives.
- Developed a Flume ETL job handling data from an HTTP source with HDFS as the sink.
- Collected JSON data from the HTTP source and developed Spark APIs to perform inserts and updates in Hive tables.
- Developed Spark core and Spark SQL scripts using Scala for faster data processing.
- Developed Kafka consumer APIs in Scala for consuming data from Kafka topics (see the consumer sketch after this list).
- Involved in designing and developing tables in HBase and storing aggregated data from Hive tables.
- Integrated Hive and Tableau Desktop reports and published to Tableau Server.
- Developed shell scripts for running Hive scripts in Hive and Impala.
- Orchestrated a number of Sqoop and Hive scripts using Oozie workflows and scheduled them with the Oozie coordinator.
- Used Jira for bug tracking and Bitbucket for checking code changes in and out.
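A minimal sketch of the Avro-to-Parquet load described above, assuming Spark 2.x with Hive support enabled; the database, table, and column names (avro_db.customer_avro, parquet_db.customer_parquet) are hypothetical, and the bucketing clause of the target table's DDL is assumed to have been applied separately through the Hive CLI/Beeline:

    import org.apache.spark.sql.SparkSession

    object AvroToParquetLoad {
      def main(args: Array[String]): Unit = {
        // Spark session with Hive support so SQL statements run against the Hive metastore
        val spark = SparkSession.builder()
          .appName("AvroToParquetLoad")
          .enableHiveSupport()
          .getOrCreate()

        // Partitioned Parquet target table with Snappy compression (hypothetical schema)
        spark.sql(
          """CREATE TABLE IF NOT EXISTS parquet_db.customer_parquet (
            |  customer_id BIGINT,
            |  name STRING,
            |  balance DOUBLE)
            |PARTITIONED BY (load_date STRING)
            |STORED AS PARQUET
            |TBLPROPERTIES ('parquet.compression'='SNAPPY')""".stripMargin)

        // Allow dynamic partitions so each load_date lands in its own partition
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

        // Copy data from the Avro staging table into the Parquet table
        spark.sql(
          """INSERT OVERWRITE TABLE parquet_db.customer_parquet PARTITION (load_date)
            |SELECT customer_id, name, balance, load_date
            |FROM avro_db.customer_avro""".stripMargin)

        spark.stop()
      }
    }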
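And a sketch of a simple Kafka consumer in Scala, assuming the kafka-clients 2.x API and Scala 2.11/2.12; the broker address, group id, and topic name are hypothetical placeholders:

    import java.time.Duration
    import java.util.Properties
    import scala.collection.JavaConverters._

    import org.apache.kafka.clients.consumer.KafkaConsumer

    object TopicConsumer {
      def main(args: Array[String]): Unit = {
        // Basic consumer configuration; values here are placeholders
        val props = new Properties()
        props.put("bootstrap.servers", "broker1:9092")
        props.put("group.id", "ingest-consumers")
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

        val consumer = new KafkaConsumer[String, String](props)
        consumer.subscribe(List("events").asJava)

        try {
          while (true) {
            // Poll the topic and hand each record to downstream processing
            val records = consumer.poll(Duration.ofMillis(500))
            for (record <- records.asScala) {
              println(s"offset=${record.offset()} key=${record.key()} value=${record.value()}")
            }
          }
        } finally {
          consumer.close()
        }
      }
    }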
Environment: HDFS, Yarn, MapReduce, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark SQL, Spark Streaming, Eclipse, Oracle, Teradata, PL/SQL, Linux Shell Scripting, Cloudera.
Confidential, Florham Park, NJ
Big Data Developer
Responsibilities:
- Responsible for building scalable, distributed data solutions using Hadoop.
- Used Sqoop to import data from MySQL and various other sources into HDFS on a regular basis.
- Wrote multiple MapReduce programs for extraction, transformation, and aggregation of data from multiple file formats, including XML, JSON, CSV, and other compressed formats.
- Reviewed the HDFS usage and system design for future scalability and fault-tolerance.
- Involved in loading data from the Linux file system to HDFS.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data in various formats such as text, zip, XML, and JSON.
- Defined job flows and developed simple to complex MapReduce jobs as per the requirements.
- Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Developed Pig UDFs to manipulate data according to business requirements and also worked on developing custom Pig loaders.
- Hands-on experience setting up an HBase column-based storage repository for archiving and retro (historical) data.
- Responsible for creating Hive tables based on business requirements.
- Used an enterprise data lake to support various use cases, including analytics, processing, storage, and reporting of voluminous, rapidly changing, structured and unstructured data.
- Along with the infrastructure team, was involved in designing and developing a Kafka- and Storm-based data pipeline.
- Implemented partitioning, dynamic partitions, and buckets in Hive for efficient data access.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Involved in data modeling and sharding and replication strategies in Cassandra.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response (see the sketch after this list).
- Knowledge of handling Hive queries using Spark SQL integrated with the Spark environment.
- Exported the analyzed data into relational databases using Sqoop for visualization and to generate reports for the BI team.
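A minimal sketch of the RDD-based in-memory computation mentioned above, assuming Spark 2.x; the HDFS path, field layout, and view name are hypothetical:

    import org.apache.spark.sql.SparkSession

    object EventAggregation {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("EventAggregation")
          .getOrCreate()
        val sc = spark.sparkContext

        // Load raw pipe-delimited records from HDFS into an RDD
        val raw = sc.textFile("hdfs:///data/events/input")

        // Parse into (eventType, amount) pairs and cache for repeated in-memory use
        val parsed = raw
          .map(_.split('|'))
          .filter(_.length >= 3)
          .map(fields => (fields(1), fields(2).toDouble))
          .cache()

        // In-memory aggregation: total amount per event type
        val totals = parsed.reduceByKey(_ + _)

        // Expose the result to Spark SQL for downstream reporting
        import spark.implicits._
        totals.toDF("event_type", "total_amount")
          .createOrReplaceTempView("event_totals")
        spark.sql("SELECT event_type, total_amount FROM event_totals ORDER BY total_amount DESC")
          .show(20)

        spark.stop()
      }
    }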
Environment: Apache Hadoop 2.x, Cloudera, HDFS, MapReduce, Hortonworks, Hive, Pig, HBase, Spark, Scala, Sqoop, Kafka, Flume, Cassandra, Oracle 11g/10g, Linux, XML, MySQL.
Confidential
Big Data Developer
Responsibilities:
- Understood business needs, analyzed functional specifications, and mapped them to the design and development of MapReduce programs and algorithms.
- Optimized Hadoop MapReduce code and Hive and Pig scripts for better scalability, reliability, and performance.
- Developed Oozie workflows for application execution.
- Performed data migration from legacy RDBMS databases to HDFS using Sqoop.
- Wrote Pig scripts for data processing.
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios (see the HBase client sketch after this list).
- Implemented Hive tables and HQL queries for the reports.
- Imported data from Cassandra into HDFS using the Mongo export utility.
- Involved in developing shell scripts and automating data management for end-to-end integration work.
- Performed data validation using Hive dynamic partitioning and bucketing.
- Wrote and used complex data types for storing and retrieving data using HQL in Hive.
- Developed Hive queries to analyze reducer output data.
- Implemented ETL code to load data from multiple sources into HDFS using Pig scripts.
- Highly involved in designing the next generation data architecture for the Unstructured data.
- Developed Pig Latin scripts to extract data from the source system.
- Created and maintained technical documentation for executing Hive queries and Pig scripts.
- Involved in extracting and loading data from Hive into an RDBMS using Sqoop.
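A minimal sketch of writing a record into an HBase table, as referenced in the HBase bullet above, assuming the HBase 1.x+ client API; the table name, column family, row key, and cell values are hypothetical:

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes

    object PortfolioHBaseLoad {
      def main(args: Array[String]): Unit = {
        // Picks up hbase-site.xml from the classpath for ZooKeeper quorum details
        val conf = HBaseConfiguration.create()
        val connection = ConnectionFactory.createConnection(conf)
        try {
          val table = connection.getTable(TableName.valueOf("portfolio_events"))
          try {
            // Row key plus two cells in column family "d"
            val put = new Put(Bytes.toBytes("ACCT-001|2015-06-01"))
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("source"), Bytes.toBytes("unix_logs"))
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"), Bytes.toBytes("LOADED"))
            table.put(put)
          } finally {
            table.close()
          }
        } finally {
          connection.close()
        }
      }
    }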
Environment: HDFS, MapReduce, MySQL, Cassandra, Hive, HBase, Oozie, Pig, ETL, Hortonworks (HDP 2.0), Shell Scripting, Linux, Sqoop, Flume and Oracle 11g.
Confidential
Hadoop Developer
Responsibilities:
- Analyzed system requirements, including the Hadoop cluster and HBase.
- Moved log files to HDFS.
- Analyzed the structure of the log files.
- Wrote a MapReduce program to parse the logs and convert them into a structured key-value format (see the mapper sketch after this list).
- Inserted the structured data into an HBase table as key-value pairs.
- Analyzed the results.
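A minimal sketch of the log-parsing mapper referenced above, written in Scala against the Hadoop MapReduce API; the log layout (whitespace-separated, timestamp first) is an assumption, and the driver/job setup and the HBase-writing reducer are omitted:

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.Mapper

    // Turns a raw log line into a (timestamp, rest-of-record) key/value pair;
    // the pipe-delimited value can then be written to HBase downstream.
    class LogParseMapper extends Mapper[LongWritable, Text, Text, Text] {

      override def map(key: LongWritable, value: Text,
                       context: Mapper[LongWritable, Text, Text, Text]#Context): Unit = {
        // Assumes log lines of the form: <timestamp> <level> <message...>
        val fields = value.toString.trim.split("\\s+")
        if (fields.length >= 3) {
          context.write(new Text(fields(0)), new Text(fields.drop(1).mkString("|")))
        }
      }
    }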
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Spark, Kafka, Oozie, ZooKeeper
Confidential
Sr. Java Developer
Responsibilities:
- Gathered business requirements from Kia.com and prepared functional specifications.
- Prepared Technical specs based on the functional document.
- Followed the Agile Scrum methodology for project development.
- Integrated services with various clients using SOAP interfaces.
- Developed shell scripts for dealer data transfer to various vendors such as KBB and Edmunds.
- Provided support for new vendors integrating their applications to send leads into the system.
- Developed FreeMarker templates to send email to dealers.
- Built and deployed Java applications into multiple Unix-based environments.
- Provided recommendations on best practices and exception handling, and identified and fixed potential memory, performance, and transactional issues.
- Used the Drools rule engine to implement business validation for lead processing.
- Used Java collections (e.g., BlockingQueue, HashMap) for orchestrating the lead data.
- Wrote packages, stored procedures, and synonyms for the reporting module and handled database import/export in Postgres.
- Coordinated the effort to move the infrastructure from a dedicated environment to the Rackspace cloud.
- Worked with the SEO team to develop an algorithm to calculate the lead close rate.
- Applied design patterns and OO design concepts to improve the existing Java/JEE code base.
- Developed a lead-scoring algorithm in Java for a multithreaded environment.
- Configured Apache mod_jk for load balancing on the application server.
- Developed RESTful endpoints for the redesign of the application.
- Provided post-production support for the application.
Environment: Java 6, JSP, Spring IoC, Spring, Apache Web Server, Postgres 9, Shell Scripting, Maven, Ant, JDBC, Hibernate, XML, JBoss, UNIX, PL/SQL & Agile.
Confidential
Java Developer
Responsibilities:
- Involved in various stages of Enhancements in the Application by doing the required analysis, development, and testing.
- Prepared the high- and low-level design documents and worked on generating digital signatures.
- Created use case, class, and sequence diagrams for analysis and design of the application.
- Developed logic and code for the registration and validation of enrolling customers.
- Developed web-based user interfaces using the Struts framework.
- Handled client-side validations using JavaScript.
- Wrote SQL queries, stored procedures and enhanced performance by running explain plans.
- Involved in integration of various Struts actions in the framework.
- Used the Validation Framework for server-side validations.
- Created test cases for the Unit and Integration testing.
- The front end was integrated with the Oracle database using the JDBC API through the JDBC-ODBC bridge driver on the server side.
Environment: Java Servlets, JSP, Java Script, Web Services, XML, HTML, UML, Apache Tomcat, JDBC, Oracle, SQL.