Hadoop/Spark Developer Resume

Broadway, NY

SUMMARY

  • Talented and accomplished Software Engineer with 7+ years of IT experience in developing applications using Big Data, AWS, Java, SQL, and Spark.
  • 3+ years of experience with Big Data tools such as MapReduce, YARN, HDFS, HBase, Impala, Hive, Pig, Oozie, AWS, and Apache Spark for ingestion, storage, querying, processing, and analysis of data.
  • Performance tuning in Hive and Impala using methods including dynamic partitioning, bucketing, indexing, and file compression.
  • Hands-on experience with data ingestion tools Kafka and Flume and workflow management tools Oozie and Zena.
  • Hands-on experience handling different file formats like JSON, Avro, ORC, and Parquet and compression techniques like Snappy, zlib, and LZO.
  • Hands-on experience with Hadoop ecosystem components such as Hadoop, Spark, HDFS, YARN, Tez, Hive, Sqoop, Flume, MapReduce, Scala, Pig, Oozie, Kafka, NiFi, Storm, and HBase.
  • Experience analyzing data in NoSQL databases like HBase and Cassandra and their integration with the Hadoop cluster.
  • Hands on experience with Spark Core, Spark SQL and Data Frames/Data Sets/RDD API.
  • Experience using Kafka and Kafka brokers to initiate the Spark context and process live streaming data with RDDs.
  • Developed Java applications using various IDEs such as Spring Tool Suite and Eclipse.
  • Good knowledge in using Hibernate for mapping Java classes with database and using Hibernate Query Language (HQL).
  • Operated on Java/J2EE systems with different databases, which include Oracle, MySQL and DB2.
  • Knowledge of implementing Big Data workloads on Amazon Elastic MapReduce (Amazon EMR), which manages the Hadoop framework on dynamically scalable Amazon EC2 instances.
  • Capable of processing large sets of structured, semi-structured, and unstructured data and supporting systems application architecture.
  • Extensive development experience in Spark applications for data transformations and loading into HDFS using RDDs, DataFrames, and Datasets (see the sketch after this list).
  • Extensive knowledge of performance tuning of Spark applications and converting Hive/SQL queries into Spark transformations.
  • Hands-on experience with AWS (Amazon Web Services), using Elastic MapReduce (EMR), creating and storing data in S3 buckets, and creating Elastic Load Balancers (ELB) for Hadoop front-end web UIs.
  • Extensive knowledge of creating Hadoop clusters on multiple EC2 instances in AWS, configuring them through Ambari, and using IAM (Identity and Access Management) for creating groups and users and assigning permissions.
  • Extensive programming experience in core Java concepts such as OOP, multithreading, collections, and IO.
  • Experience using Jira for ticketing issues and Jenkins for continuous integration.
  • Extensive experience with UNIX commands, shell scripting, and setting up cron jobs.
  • Experience in software configuration management using Git.
  • Good experience using the relational databases Oracle and MySQL.
  • Able to assess business rules, collaborate with stakeholders, and perform source-to-target data mapping and design.
  • Work successfully in fast-paced environments, both independently and in collaborative team settings.
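
Below is a minimal Scala sketch of the kind of Spark transformation-and-load job summarized above: raw JSON is read, lightly cleaned with DataFrame transformations, and written back to HDFS as Snappy-compressed Parquet. The HDFS paths, column names, and job name are hypothetical and used only for illustration.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object IngestJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ingest-to-hdfs")                     // hypothetical job name
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical source path: read raw JSON events from HDFS
    val raw = spark.read.json("hdfs:///data/raw/events/")

    // Simple DataFrame transformations: drop bad records, add a load date
    val cleaned = raw
      .filter(col("event_id").isNotNull)
      .withColumn("load_date", current_date())

    // Write back to HDFS as Parquet with Snappy compression, partitioned by load date
    cleaned.write
      .mode("overwrite")
      .option("compression", "snappy")
      .partitionBy("load_date")
      .parquet("hdfs:///data/curated/events/")

    spark.stop()
  }
}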

TECHNICAL SKILLS

Hadoop/Big Data Technologies: Hadoop 3.0, HDFS, MapReduce, HBase 1.4, Apache Pig 0.17, Hive 2.3, Sqoop 1.4, Apache Impala 3.0, Oozie 4.3, Yarn, Apache Flume 1.8, Kafka 1.1, Zookeeper 3.4

Hadoop Distributions: Cloudera, Hortonworks, MapR

Cloud: AWS, Azure, Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, HDInsight, Azure Data Lake and Data Factory.

Databases: Microsoft SQL Server, MySQL, Oracle, NoSQL and HBase.

Scripting Languages: JavaScript, HTML, and Bash.

Tools: Eclipse, IntelliJ IDEA, Maven and SBT.

Platforms: Windows, Linux, and CentOS.

Programming Languages: Java, C/C++ and Scala.

Currently Exploring: Apache Kylo, NiFi, Flink, and Alluxio.

PROFESSIONAL EXPERIENCE

Confidential - Broadway, NY

Hadoop/Spark Developer

Responsibilities:

  • Actively involved in designing the Hadoop ecosystem data pipeline.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Performed SQL joins among Hive tables to get input for the Spark batch process (see the sketch after this list).
  • Worked with the data science team to build statistical models with Spark MLlib and PySpark.
  • Imported data from various sources into the Cassandra cluster using Sqoop.
  • Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Developed Oozie workflows for scheduling and orchestrating the ETL process.
  • Created data pipelines per the business requirements and scheduled them using Oozie coordinators.
  • Worked extensively on Apache NiFi, building NiFi flows for the existing Oozie jobs to handle incremental loads, full loads, and semi-structured data, ingest data from REST APIs into Hadoop, and automate all the NiFi flows to run incrementally.
  • Created NiFi flows to trigger Spark jobs and used PutEmail processors to send notifications if there are any failures.
  • Developed shell scripts to periodically perform incremental imports of data from third-party APIs into Amazon AWS.
  • Worked extensively with importing metadata into Hive using Scala and migrated existing tables and applications to work on Hive and AWS cloud.
  • Worked on creating data models for Cassandra from the existing Oracle data model.
  • Designed column families in Cassandra, ingested data from RDBMS, performed data transformations, and then exported the transformed data to Cassandra per the business requirements.
  • Used Sqoop import functionality to load historical data from RDBMS into HDFS.
  • Designed workflows and coordinators in Oozie to automate and parallelize Hive jobs on the Apache Hadoop environment from Hortonworks (HDP 2.2).
  • Configured Hive bolts and wrote data to Hive in Hortonworks as part of a POC.
  • Developed the batch scripts to fetch the data from AWS S3 storage and do required transformations in Scala using Spark framework.
  • Administered Pig, Hive, and HBase, installing updates, patches, and upgrades.
  • Responsible for importing real-time data from source systems into Kafka clusters.
  • Worked with Spark techniques such as refreshing tables, handling parallelism, and modifying Spark defaults for performance tuning.
  • Experience working with NoSQL databases such as HBase.
  • Implemented Spark RDD transformations to map business analysis and applied actions on top of those transformations.
  • Involved in migrating MapReduce jobs into Spark jobs and used Spark SQL and the DataFrames API to load structured data into Spark clusters.
  • Involved in using the Spark API over Hadoop YARN as the execution engine for data analytics using Hive, and submitted the data to the BI team for generating reports after processing and analyzing the data in Spark SQL.
  • Used version control tools like GitHub to share code among team members.
  • Involved in daily Scrum meetings to discuss development progress and was active in making Scrum meetings more productive.
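
The Hive-table join feeding the Spark batch process mentioned above can be sketched in Scala roughly as follows; the database, table, and column names are hypothetical placeholders.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hive-join-batch")                         // hypothetical job name
  .enableHiveSupport()
  .getOrCreate()

// Join two Hive tables with Spark SQL to produce the input for the batch process
val joined = spark.sql(
  """
    |SELECT c.customer_id, c.segment, t.txn_amount, t.txn_date
    |FROM analytics.customers c
    |JOIN analytics.transactions t
    |  ON c.customer_id = t.customer_id
    |WHERE t.txn_date >= '2018-01-01'
  """.stripMargin)

// Persist the joined result back to a Hive table for the downstream batch job
joined.write.mode("overwrite").saveAsTable("analytics.customer_txn_joined")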

Environment: Hadoop 3.0, Scala 2.12, Spark, SQL, HBase, Hive 2.3, PySpark, Cassandra 3.11, Oozie 4.3, Apache NiFi, AWS, Oracle 12c, RDBMS, HDFS, Hortonworks

Confidential - Bellevue, WA

Hadoop Developer (AWS with Spark)

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Used Spark Streaming APIs to perform necessary transformations and actions on the fly for building the common learner data model, which gets data from Kafka in near real time and persists it into Cassandra (see the sketch after this list).
  • Used DataStax Spark-Cassandra connector to load data into Cassandra and used CQL to analyze data from Cassandra tables for quick searching, sorting and grouping.
  • Worked on loading data into Spark RDDs and performing advanced procedures like text analytics using Spark's in-memory computation capabilities to generate the output response.
  • Developed the statistics graph using JSP, custom tag libraries, Applets, and Swing in a multi-threaded architecture.
  • Created HBase tables to store various data formats coming from different applications.
  • Executed many performance tests using the Cassandra-stress tool to measure and improve the read and write performance of the cluster.
  • Handled large datasets using partitions, broadcasts in Spark, effective and efficient joins, transformations, and other techniques during the ingestion process itself.
  • Used Kafka Streams to configure Spark Streaming to get information and then store it in HDFS.
  • Migrated an existing on-premises application to AWS, using AWS services like EC2 and S3 for small data set processing and storage; experienced in maintaining the Hadoop cluster on AWS EMR.
  • Performed the migration of Hive and MapReduce jobs from on-premises MapR to the AWS cloud using EMR.
  • Partitioned data streams using Kafka, and designed and used Kafka producer APIs to produce messages.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Performed tuning of Spark Applications for setting right Batch Interval time, correct level of Parallelism and memory tuning.
  • Ingested data from RDBMS into Hive to perform data transformations, and then exported the transformed data to Cassandra for data access and analysis.
  • Experienced in Core Java, Collection Framework, JSP, Dependency Injection, Spring MVC, RESTful Web services.
  • Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster processing of data.
  • Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for huge volumes of data.
  • Extracted the data from Teradata into HDFS/Dashboards using Spark Streaming.
  • Implemented Informatica Procedures and Standards while developing and testing the Informatica objects.
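
A minimal Scala sketch of the Kafka-to-Cassandra streaming path described above, using the spark-streaming-kafka-0-10 direct stream and the DataStax Spark-Cassandra connector; the broker address, topic, keyspace, table, and record layout are hypothetical.

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import com.datastax.spark.connector._

val conf = new SparkConf()
  .setAppName("kafka-to-cassandra")
  .set("spark.cassandra.connection.host", "cassandra-host")   // hypothetical host

val ssc = new StreamingContext(conf, Seconds(10))

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "kafka-broker:9092",                 // hypothetical broker
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "learner-model-consumer",
  "auto.offset.reset" -> "latest"
)

// Subscribe to the (hypothetical) learner-events topic
val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("learner-events"), kafkaParams))

// Parse each record value (assumed here to be "id,score") and persist it
// to a Cassandra table through the DataStax connector
stream.map(_.value.split(","))
  .map(fields => (fields(0), fields(1).toDouble))
  .foreachRDD(rdd => rdd.saveToCassandra("learner", "events", SomeColumns("id", "score")))

ssc.start()
ssc.awaitTermination()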

Environment: Hadoop 3.0, Spark 2.1, HBase, Cassandra 1.1, Kafka 0.9, JSP, HDFS, AWS, EC2, Hive 1.9, MapReduce, MapR, Java, MVC, Scala, NoSQL

Confidential - Austin, TX

Java/Hadoop Developer

Responsibilities:

  • Implemented J2EE design patterns such as DAO, Singleton, and Factory.
  • Managed connectivity using JDBC for querying/inserting & data management including triggers and stored procedures.
  • Used the Spring MVC framework to enable interactions with the JSP/view layer and implemented different design patterns with J2EE and XML technology.
  • Implemented the application using MVC architecture, integrating the Hibernate and Spring frameworks.
  • Utilized various JavaScript and jQuery libraries, Bootstrap, and Ajax for form validation and other interactive features.
  • Extensively worked on Hadoop ecosystems including Hive and Spark Streaming with the MapR distribution.
  • Upgraded the Hadoop Cluster from CDH3 to CDH4, setting up High Availability Cluster and integrating Hive with existing applications.
  • Worked on storing data in HDFS either directly or through HBase.
  • Worked on NoSQL support enterprise production and loading data into HBase using Impala and Sqoop.
  • Performed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
  • Handled importing of data from various data sources, performed transformations using Hive, PIG, and loaded data into HDFS.
  • Moved data using Sqoop between HDFS and relational database systems, including ongoing maintenance and troubleshooting.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using the Spark context, Spark SQL, DataFrames, and pair RDDs.
  • Created Hive Tables, loaded claims data from Oracle using Sqoop and loaded the processed data into target database.
  • Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
  • Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
  • Implemented the J2EE design patterns Data Access Object (DAO), Session Façade and Business Delegate.
  • Developed NiFi flows dealing with various kinds of data formats such as XML, JSON, and Avro.
  • Implemented MapReduce jobs in HIVE by querying the available data.
  • Proactively involved in ongoing maintenance, support and improvements in Hadoop cluster.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Used Cloudera Manager for installation and management of Hadoop Cluster.
  • Collaborated with business users/product owners/developers to contribute to the analysis of functional requirements.
  • Involved in converting HiveQL into Spark transformations using Spark RDDs and Scala programming (see the sketch after this list).
  • Integrated Kafka with Spark Streaming for high-throughput, reliable processing.
  • Worked on tuning Hive and Pig to improve performance and solved performance issues in both sets of scripts.
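
The HiveQL-to-Spark conversion mentioned above can be sketched in Scala as follows, showing the same aggregation expressed once with the DataFrame API and once with pair RDDs; the claims table and its columns are hypothetical.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

val spark = SparkSession.builder()
  .appName("hiveql-to-spark")
  .enableHiveSupport()
  .getOrCreate()

// Original HiveQL (illustrative):
//   SELECT state, SUM(claim_amount) FROM claims.claim_detail GROUP BY state;

val claims = spark.table("claims.claim_detail")

// DataFrame version of the aggregation
val byStateDf = claims
  .groupBy("state")
  .agg(sum("claim_amount").as("total_amount"))

// Pair-RDD version of the same aggregation (assumes claim_amount is a double)
val byStateRdd = claims.rdd
  .map(row => (row.getAs[String]("state"), row.getAs[Double]("claim_amount")))
  .reduceByKey(_ + _)

// Persist the DataFrame result to the target Hive table
byStateDf.write.mode("overwrite").saveAsTable("claims.claim_amount_by_state")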

Environment: Hadoop 3.0, Hive 2.1, J2EE, JDBC, Pig 0.16, HBase 1.1, Sqoop, NoSQL, Impala, Java, Spring, MVC, XML, Spark 1.9, PL/SQL, HDFS, JSON, Hibernate, Bootstrap, jQuery, JavaScript, Ajax

Confidential - Chicago, IL

Java Developer

Responsibilities:

  • As a Java Developer, worked on both the back-end and front-end development teams.
  • Involved in the Software Development Life Cycle (SDLC), including analysis, design, and implementation.
  • Responsible for use case diagrams, class diagrams and sequence diagrams using Rational Rose in the Design phase.
  • Developed ANT scripts that checkout code from SVN repository, build EAR files.
  • Used XML Web Services using SOAP to transfer information to the supply chain and domain expertise Monitoring Systems.
  • Used Eclipse and the Tomcat web server for developing and deploying the applications.
  • Developed REST Web Services clients to consume those Web Services as well as other enterprise-wide Web Services.
  • Used JavaScript and AJAX technologies for front-end user input validation and the Spring validation framework for back-end validation of the user interface.
  • Used both annotation-based and XML-based configuration.
  • Developed application service components and configured beans using (applicationContext.xml) Spring IOC.
  • Implemented persistence mechanism using Hibernate (ORM Mapping).
  • Developed the DAO layer for the application using Spring Hibernate Template support.
  • Used WebLogic Workshop and the Eclipse IDE to develop the application.
  • Performed the code build and deployment using Maven.
  • Implemented Spring RESTful web services that produce JSON.
  • Responsible for maintaining the code quality, coding and implementation standards by code reviews.
  • Developed the front end of the application using HTML, CSS, JSP and JavaScript.
  • Created RESTful APIs using Spring MVC.
  • Used SVN version controller to maintain the code versions.
  • Worked on web applications using open-source MVC frameworks.
  • Developed Web interface using JSP, Standard Tag Libraries (JSTL), and SpringFramework.
  • Implemented logger for debugging and testing purposes using Log4j.

Environment: JSON, HTML 4, CSS, XML, Hibernate 3.6, Eclipse, Maven, JUnit, JDBC, ANT, SOAP, Log4j

Confidential - Rochester, NY

Java Developer

Responsibilities:

  • Individually worked on all the stages of a Software Development Life Cycle (SDLC).
  • Responsible for design and implementation of various modules of the application using Struts-Spring-Hibernate architecture.
  • Created user-friendly GUI interface and Web pages using HTML, CSS and JSP.
  • Developed web components using MVC pattern under Struts framework.
  • Wrote JSPs, Servlets and deployed them on Weblogic Application server.
  • Used JSPs and HTML on the front end, Servlets as front controllers, and JavaScript for client-side validations.
  • Wrote the Hibernate mapping XML files to define the Java class-to-database table mappings.
  • Developed the UI using JSP, HTML, CSS, and AJAX, and learned how to implement jQuery, JSP, and client- and server-side validations using JavaScript.
  • Implemented MVC architecture using Spring to send and receive data between the front end and the business layer.
  • Designed, developed, and maintained the data layer using JDBC and performed configuration of the Java application framework.
  • Extensively used Hibernate in data access layer to access and update information in the database.
  • Migrated the Servlets to the Spring Controllers and developed Spring Interceptors, worked on JSPs, JSTL, and JSP Custom Tags.
  • Used Jenkins for continuous integration with SVN as version control and JUnit and Mockito for unit testing, creating design documents and test cases for development work.
  • Worked in the Eclipse IDE as the front-end development environment for insert, update, and retrieval operations against the Oracle database by writing stored procedures.
  • Responsible for writing Struts action classes and Hibernate POJO classes and integrating Struts and Hibernate with Spring to process business needs.
  • Developed the application using Servlets and JSP for the presentation layer along with JavaScript for the client side validations.
  • Wrote Hibernate classes and DAOs to retrieve and store data, and configured Hibernate files.
  • Used WebLogic for application deployment and Log4j for logging and debugging.
  • Used the CVS version control tool and ANT as the project build tool.
  • Used various Core Java concepts such as multi-threading, Exception Handling, Collection APIs to implement various features and enhancements.
  • Wrote and debugged the Maven scripts for building the entire web application.
  • Designed and developed Ajax calls to populate screens parts on demand.

Environment: Struts, HTML, CSS, JSP, MVC, Hibernate, AJAX, jQuery, Java, Jenkins, ANT, Maven
