
Hadoop/Spark Developer Resume


Parsippany, NJ

PROFESSIONAL SUMMARY:

  • Around 7 years of IT experience in software design and development with a strong background in Big Data analytics, Hadoop, Spark and NoSQL databases.
  • Well-versed in Hadoop ecosystem components such as Pig, Hive, Sqoop, Flume, Kafka and HBase.
  • Worked with all kinds of data: structured, semi-structured and unstructured.
  • Experienced in working with Spark RDDs and DataFrames for in-memory data processing through transformations and actions (a brief sketch follows this summary).
  • Skilled at cleaning, processing, analysis, visualization and predictive modeling of data.
  • Experienced in building highly scalable Big Data solutions using multiple Hadoop distributions (Cloudera, Hortonworks) and NoSQL platforms (HBase, MongoDB and Cassandra).
  • Experience in Software development life cycle (SDLC) for various applications including Analysis, Design, Development, Implementation, Maintenance and Support.
  • Expertise in designing and deploying Hadoop clusters and different Big Data analytic tools including Pig, Hive, HBase, Oozie, Zookeeper, Sqoop, Flume, Kafka, Spark, Impala and Cassandra in both Cloudera and Hortonworks environments.
  • Experience in the development, analysis and design of ETL methodologies across all phases of the data warehousing life cycle.
  • More than one year of hands-on experience using the Spark framework with Scala.
  • Good exposure to performance tuning of Hive queries and of MapReduce jobs within the Spark framework.
  • Experience in developing MapReduce jobs in Java for data cleaning, transformations, pre-processing and analysis.
  • Good understanding of Machine Learning, Data Mining and Algorithms.
  • Analyzed streaming data and identified important trends for further analysis using Spark Streaming.
  • Experienced in ETL methodology for performing data migration, data profiling, extraction, transformation and loading using Talend; designed data conversions from a large variety of source systems including Oracle, DB2, SQL Server and Hive, and non-relational sources such as flat files, XML and mainframe files.
  • End-to-end experience in designing and developing data visualizations using Tableau.
  • Designed, developed and implemented several list, chart, crosstab and master-detail reports as well as Tableau business intelligence reports.
  • Developed core modules in large cross-platform applications using JAVA, J2EE, Hibernate, Spring, JSP, Servlets, EJB, JDBC, JavaScript, XML and HTML.
  • Hands-on experience spinning up different AWS instances, including EC2-Classic and EC2-VPC, using CloudFormation templates.
  • Experience as a Java Developer in Web/intranet, client/server technologies using Java, J2EE, JDBC and SQL.
  • Familiar with Java virtual machine (JVM) and multi-threaded processing.
  • Strong knowledge of data modelling, effort estimation, ETL design, development, system testing, implementation and production support. Experience in resolving ongoing maintenance issues and bug fixes.
  • Ability to adapt to evolving technology and a strong sense of responsibility.
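The following is a minimal sketch of the kind of Spark RDD and DataFrame work described above. It is illustrative only: the input path, column names and local master setting are assumptions, and the SQLContext-based API reflects the Spark 1.x versions listed later in this resume.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object RddDataFrameSketch {
  def main(args: Array[String]): Unit = {
    // Local master and input path are illustrative placeholders.
    val conf = new SparkConf().setAppName("RddDataFrameSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // RDD transformations (lazy), followed by an action that triggers execution.
    val lines = sc.textFile("hdfs:///data/events/part-*")   // hypothetical path
    val counts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)                                    // transformation
    counts.take(10).foreach(println)                         // action

    // The same data as a DataFrame, filtered and ordered with the DataFrame API.
    val df = counts.toDF("word", "count")
    df.filter($"count" > 100).orderBy($"count".desc).show(10)

    sc.stop()
  }
}
```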

TECHNICAL SKILLS:

Hadoop/Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Zookeeper, Kafka, Impala, Apache Spark, Spark Streaming, Spark SQL, Hue, Ambari, Windows Azure

Programming Languages: Java, C, Scala, SQL, PL/SQL, Pig Latin, HQL

Databases: Oracle, MySQL, PostgreSQL, MongoDB, HBase, Cassandra

IDE Tools: Eclipse, IntelliJ

Frameworks: Hibernate, Spring, Struts, JUnit

Web Technologies: HTML5, CSS3, JavaScript, JSP, Servlets, JNDI, JDBC, Java Beans

Web Services: SOAP, REST, WSDL, JAXB, and JAXP

PROFESSIONAL EXPERIENCE:

Hadoop/Spark Developer

Confidential, Parsippany, NJ

Responsibilities:

  • Involved in various stages of project data flow such as control validation, data quality and change data capture.
  • Performed data mining tasks depending on business scenarios.
  • Experience with Cloudera distribution of Hadoop (CDH 5.10).
  • Experience with configuration of Hadoop Ecosystem components: Map Reduce, Hive, HBase, Pig, Sqoop, Oozie, Zookeeper, Flume, Yarn.
  • Experienced in the entire Software development life cycle (SDLC) in the project including Analysis, Design, Development, Implementation, Maintenance and Support.
  • Performed transformations, cleaning and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
  • Wrote SQL stored procedure in Hue to access the data from Hive.
  • Implemented Spark RDD transformations and actions to support business analysis using Apache Spark.
  • Loaded the data into Spark RDD and performed in-memory data computation to generate the output response.
  • Created Hive tables and integrated them per the design using the Parquet file format.
  • Handled delta processing and incremental updates using Hive.
  • Implemented dynamic partitioning in Hive to segregate the customer database by age group (a sketch follows this list).
  • Designed and developed Pig Latin scripts and Pig command-line transformations for data processing.
  • Used Kafka for logging the status of various jobs (a Kafka sketch also follows this list).
  • Involved in writing various joins in MySQL depending on client requirement.
  • Developed Hive scripts to meet analysts' requirements for analysis.
  • Stored data in Hive and enabled end users to access it through Impala.
  • Exported data from RDBMS to HDFS and vice versa using Sqoop.
  • Created partitioned and bucketed tables in Hive based on the hierarchy of the dataset.
  • Created several UDFs in Pig and Hive to give additional support for the project.
  • Used Hive data warehouse tool to analyze the data in HDFS and developed Hive queries.
  • Good understanding of Spark SQL, the Spark transformation engine and Spark Streaming.
  • Applied various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Involved in cluster maintenance and monitoring.
  • Experienced in the Scala programming language and used it extensively with Apache Spark for data processing.
  • Created and maintained Technical documentation for launching Hadoop Clusters and for executing Hive queries and Pig Scripts.
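A hedged sketch of the Hive work above (Parquet-backed tables with dynamic partitioning). The database, table and column names are hypothetical, and the HiveContext API matches the Spark 1.6 environment listed below.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveDynamicPartitionSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveDynamicPartitionSketch"))
    val hiveContext = new HiveContext(sc)

    // Allow non-strict dynamic partitioning so the partition value is derived from the data.
    hiveContext.sql("SET hive.exec.dynamic.partition = true")
    hiveContext.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    // Parquet-backed target table, partitioned by an age_group column (names are illustrative).
    hiveContext.sql(
      """CREATE TABLE IF NOT EXISTS customers_by_age (
        |  customer_id BIGINT,
        |  name        STRING)
        |PARTITIONED BY (age_group STRING)
        |STORED AS PARQUET""".stripMargin)

    // Dynamic partition insert: Hive routes each row to its age_group partition.
    hiveContext.sql(
      """INSERT OVERWRITE TABLE customers_by_age PARTITION (age_group)
        |SELECT customer_id,
        |       name,
        |       CASE WHEN age < 30 THEN 'under_30'
        |            WHEN age < 60 THEN '30_to_59'
        |            ELSE '60_plus' END AS age_group
        |FROM staging.customers""".stripMargin)

    sc.stop()
  }
}
```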
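And a minimal sketch of publishing job-status messages to Kafka, as mentioned above. The broker address, topic name and message format are placeholders, not taken from the project.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object JobStatusKafkaLogger {
  def main(args: Array[String]): Unit = {
    // Broker address and serializers; "kafka-broker:9092" is a hypothetical endpoint.
    val props = new Properties()
    props.put("bootstrap.servers", "kafka-broker:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    try {
      // Key = job name, value = a simple status payload (illustrative format).
      val record = new ProducerRecord[String, String](
        "job-status", "daily-customer-load", s"status=SUCCEEDED ts=${System.currentTimeMillis()}")
      producer.send(record).get() // block until the broker acknowledges the write
    } finally {
      producer.close()
    }
  }
}
```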

Environment: Map Reduce, Cloudera Manager 5.10, HDFS, Hive, Spark 1.6, Kafka, Scala, MySQL, Java (JDK 1.6), Eclipse.

Hadoop Developer

Confidential, Louisville, KY

Responsibilities:

  • Responsible for running Hadoop streaming jobs to process terabytes of XML-format data.
  • Created Hive tables, loaded them with data and wrote Hive queries that run internally as MapReduce jobs.
  • Optimized Hive joins for large tables and developed MapReduce code for a full outer join of two large tables.
  • Designed and developed Pig Latin scripts and Pig command-line transformations for data joins and custom processing of MapReduce outputs.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Developed MapReduce jobs in Java API to parse the raw data and store the refined data.
  • Created HBase tables for random reads/writes by MapReduce programs.
  • Loaded the data into Spark RDD and performed in-memory data computation to generate the output response.
  • Worked on tuning Hive and Pig to improve performance and solved performance issues in both types of scripts with an understanding of joins, groups and aggregations.
  • Developed Sqoop Scripts to extract data from Oracle source databases onto HDFS.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala (a sketch follows this list).
  • Involved in implementing and integrating NoSQL databases such as HBase and Cassandra.
  • Moved data from Hadoop to Cassandra using the BulkOutputFormat class.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop; handled cluster coordination through Zookeeper.
  • Implemented Cloudera Manager on the existing cluster.
  • Extensively worked with Cloudera Distribution of Hadoop, CDH 5.x.
  • Performed Data Ingestion from multiple internal clients using Apache Kafka .
  • Integrated Apache Storm with Kafka to perform web analytics. Uploaded click stream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
  • Responsible for developing, supporting and maintaining the ETL (Extract, Transform and Load) processes using Talend.
  • Developed Talend jobs and ensured the data was loaded into Hive tables and HDFS files.
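A hedged illustration of the MapReduce-to-Spark migration mentioned above: a mapper/reducer-style aggregation expressed as Spark transformations in Scala. The input path, record layout and field positions are assumptions.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object OrdersByCustomerSpark {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("OrdersByCustomerSpark"))

    // In the MapReduce version the mapper would emit (customerId, amount) and the
    // reducer would sum per key; in Spark the same logic is a map followed by reduceByKey.
    val totals = sc.textFile("hdfs:///data/orders/*")      // hypothetical input path
      .map(_.split(","))
      .filter(_.length >= 3)                               // skip malformed rows
      .map(f => (f(0), f(2).toDouble))                     // (customerId, orderAmount)
      .reduceByKey(_ + _)

    totals.saveAsTextFile("hdfs:///output/order_totals")   // hypothetical output path
    sc.stop()
  }
}
```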

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, Impala, Cassandra, Kafka, SQL, Python, Spark, Linux, Java.

Hadoop Developer

Confidential, Boston, MA

Responsibilities:

  • Worked on writing transformation/mapping MapReduce pipelines using Java.
  • Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
  • Involved in creating Hive Tables, loading with data and writing Hive queries which will invoke and run Map Reduce jobs in the backend.
  • Worked with various HDFS file formats like Avro, Sequence File, JSON and various compression formats like Snappy, bzip2.
  • Designed and implemented Incremental Imports into Hive tables.
  • Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying on the log data.
  • Involved in loading data into HBase using the HBase Shell and the HBase Client API (a sketch follows this list).
  • Worked on loading and transforming large sets of structured, semi-structured and unstructured data.
  • Loaded data from HDFS into Spark RDDs and performed various transformations and actions to process the data.
  • Involved in migrating Hive queries into Spark transformations using Data frames, Spark SQL, SQL Context, and Scala.
  • Migrated ETL jobs to Pig scripts, implementing transformations, joins and some pre-aggregations before storing the data in HDFS.
  • Implemented the workflows using Apache Oozie framework to automate tasks.
  • Worked on different file formats like Sequence files, XML files and Map files using Map Reduce Programs.
  • Developed scripts and automated data management from end to end and sync up between all the clusters.
  • Involved in the setup and benchmarking of Hadoop/HBase clusters for internal use.
  • Developed small distributed applications in our projects using Zookeeper and scheduled the workflows using Oozie.
  • Provisioned and managed a multi-tenant Cassandra cluster in a public cloud environment: Amazon Web Services (AWS) EC2 and OpenStack.
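A small sketch of writing and reading HBase rows through the client API from Scala, as referenced above. The table name, column family and qualifiers are illustrative assumptions.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseClientSketch {
  def main(args: Array[String]): Unit = {
    // Picks up hbase-site.xml from the classpath for the ZooKeeper quorum details.
    val conf = HBaseConfiguration.create()
    val connection = ConnectionFactory.createConnection(conf)
    // Table, column family and qualifier names below are hypothetical.
    val table = connection.getTable(TableName.valueOf("customer_events"))
    try {
      // Write one row keyed by a customer id.
      val put = new Put(Bytes.toBytes("cust-0001"))
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("last_event"), Bytes.toBytes("LOGIN"))
      table.put(put)

      // Read the same row back.
      val result = table.get(new Get(Bytes.toBytes("cust-0001")))
      val value = Bytes.toString(result.getValue(Bytes.toBytes("d"), Bytes.toBytes("last_event")))
      println(s"last_event = $value")
    } finally {
      table.close()
      connection.close()
    }
  }
}
```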

Environment: Hadoop, Big Data, HDFS, Map Reduce, Sqoop, Oozie, Pig, Hive, HBase, Flume, LINUX, Java, Eclipse, Cassandra, PL/SQL, Windows NT, UNIX Shell Scripting.

Java Developer

Confidential

Responsibilities:

  • Implemented the Struts framework with MVC architecture.
  • Created new JSPs for the front end using HTML, JavaScript, jQuery and Ajax.
  • Developed the presentation layer using JSP, HTML, CSS and client-side validations using JavaScript.
  • Collaborated with the ETL/ Informatica team to determine the necessary data models and UI designs to support Cognos reports.
  • Performed several data quality checks and found potential issues, designed Ab Initio graphs to resolve them.
  • Applied J2EE design patterns like Business Delegate, DAO and Singleton.
  • Deployed and tested the application using Tomcat web server.
  • Involved in coding, code reviews and JUnit testing; prepared and executed unit test cases.
  • Used JBoss for application deployment and MySQL for the database.
  • Used JUnit for both unit testing and integration testing.
  • Used Oracle Coherence for real-time cache updates, live event processing and in-memory grid computations.
  • Wrote SQL queries to fetch business data using Oracle as the database.
  • Developed the UI for customer service modules and reports using JSF, JSPs and MyFaces components.
  • Used Log4j for logging the application log of the running system to trace errors and certain automated routine functions.
  • Developed a rich user interface using HTML, JSP, AJAX, JSTL, JavaScript, jQuery and CSS.
  • Created custom tags for JSPs for maximum reusability of user interface components.
  • Tested and deployed the application on Tomcat.

Environment: Java, JSP, Hibernate, JUnit, JavaScript, Servlets, Struts, EJB, JSF, Ant, Tomcat, CVS, Eclipse, SQL Developer, Oracle.

Java Developer

Confidential

Responsibilities:

  • Designed and developed web services based on SOAP, WSDL and JAX-WS using Spring.
  • Involved in designing the XML schema used by the web services.
  • Implemented the service layer using Spring IoC and annotations, and controllers using Spring MVC.
  • Designed and developed the data layer for the client, communicating with both Oracle and Sybase at any time.
  • Designed class diagrams and sequence diagrams using Microsoft Visio 2007.
  • Migrated complex queries and stored procedures from Sybase to Oracle.
  • Developed and debugged applications using Eclipse IDE.
  • Used Oracle Coherence for real-time cache updates, live event processing and in-memory grid computations.
  • Developed tools to generate automated send views and ret views to serialize the data to the mainframe.
  • Set up custom business validations using the Struts Validator framework.
  • Involved in JUnit testing, integration testing, system testing, etc.
  • Developed and deployed Message Driven Beans to apply same adjustment for multiple air bills asynchronously.
  • Used multithreading on the client to process huge requests.
  • Implemented all the functionality using Spring IO/Spring Boot and Hibernate ORM.
  • Created different state machines to accomplish the dependent tasks individually one after another.
  • Used ASN encoding to send the data across the network and used MIIG API to talk to mainframe server.
  • Migrated functionality developed in C (a procedure-oriented language) to Java without missing business rules.
  • Created JSP page to modify log levels dynamically without restarting the server.
  • Involved in creating automated builds using ANT for the client and Maven to build/deploy onto WebLogic server.

Environment: Java, J2EE, SOAP, XML, XSD, WSDL, JAXB, SOAP UI, MDB, ANT/Maven, JUnit, SVN, Eclipse, Oracle, PL/SQL, WebLogic.
