
Hadoop/Spark Developer Resume

San Diego, CA

SUMMARY:

  • 4+ years of experience in the IT industry in Big Data technologies and Java/J2EE, including 3+ years in the Big Data/Hadoop stack and 1 year as a Java developer.
  • Well versed in designing and implementing MapReduce jobs in Java on Eclipse to solve real-world scaling problems.
  • Hands-on experience working with structured and unstructured data in file formats such as XML, JSON, and sequence files using MapReduce programs.
  • Extensive experience writing SQL queries using HiveQL to perform analytics on structured data.
  • Expertise in data load management, importing and exporting data using Sqoop and Flume.
  • Experienced in developing Spark applications using the Spark Core, Spark SQL, and Spark Streaming APIs.
  • Implemented business logic using Pig scripts and wrote custom Pig UDFs to analyze data.
  • Performed Pig operations, including joins and transformations, to aggregate and analyze data.
  • Experience managing scalable Hadoop clusters, including cluster design, provisioning, custom configuration, monitoring, and maintenance, using the Cloudera CDH and Hortonworks HDP distributions.
  • Good experience using Apache Spark, Storm and Kafka.
  • Good Knowledge on Spark framework on both batch and real time data processing.
  • Good Knowledge and experience in Spark using Python and Scala.
  • Hands-on experience creating Hive UDFs to meet requirements and to handle JSON and XML files.
  • Delivered projects (full SDLC) using big data technologies such as Hadoop, Oozie, and NoSQL.
  • Good understanding of NiFi workflows for picking up files from different locations and moving them to HDFS or sending them to Kafka brokers.
  • Knowledge of ingesting data from multiple data sources into HDFS and Hive using NiFi, including importing data from Linux servers.
  • Extensive working knowledge of setting up and running clusters, monitoring, data analytics, sentiment analysis, predictive analysis, and data presentation in the big data world.
  • Excellent understanding of NoSQL databases like HBase.
  • Excellent interpersonal and communication skills; creative, research-minded, technically competent, and result-oriented, with problem-solving and leadership skills.
  • Experience in Java programming with skills in analysis, design, testing, and deployment using technologies such as Java, J2EE, JavaScript, Data Structures, JDBC, HTML, XML, JUnit, and jQuery.
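
The MapReduce work summarized above follows the classic map / shuffle / reduce pattern. A minimal sketch of that pattern in plain Python (production jobs would implement Mapper and Reducer classes in Java on Hadoop; all names here are illustrative):

```python
from collections import defaultdict

def map_phase(lines):
    # Mapper: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Shuffle/sort: group all emitted values by key, as the framework does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

counts = reduce_phase(shuffle(map_phase(["big data big", "data lake"])))
```

The same three-stage flow applies whether the payload is word counts or log parsing; only the map and reduce bodies change.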

TECHNICAL SKILLS:

Big Data Ecosystems: MapReduce, Hive, Sqoop, Spark, Kafka, Pig, Flume, HBase, Oozie

Streaming Technologies: Spark Streaming, Storm

Scripting Languages: Python, Bash, JavaScript, HTML5, CSS3

Programming Languages: Java, Scala, SQL, PL/SQL

Java/J2EE Technologies: Servlets, JSP, JSF, JUnit, Hibernate, Log4J, EJB, JDBC, JMS, JNDI

Databases: Oracle, MySQL, HBase (NoSQL)

IDEs / Tools: Eclipse, JUnit, Maven, Ant, MS Visual Studio, NetBeans

Methodologies: Agile, Waterfall

PROFESSIONAL EXPERIENCE:

Confidential, San Diego, CA

Hadoop/Spark Developer

Responsibilities:

  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Imported the data from different sources like HDFS/HBase into Spark RDD, developed a data pipeline using Kafka to store data into HDFS. Performed real time analysis on the incoming data.
  • Worked extensively with Sqoop to import and export data between the HDFS data lake and relational database systems such as Oracle and MySQL.
  • Developed python scripts to collect data from source systems and store it on HDFS.
  • Involved in converting Hive or SQL queries into Spark transformations using Python and Scala.
  • Built Kafka Rest API to collect events from front end.
  • Built real time pipeline for streaming data using Kafka and Spark Streaming.
  • Worked on integrating Apache Kafka with the Spark Streaming process to consume data from external sources and run custom functions.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.
  • Used the DataFrame API in Scala to work with distributed collections of data organized into named columns.
  • Developed Spark jobs and Hive Jobs to summarize and transform data.
  • Performance optimization when dealing with large datasets using partitions, broadcasts in Spark, effective and efficient joins, transformations during ingestion process.
  • Used Spark for interactive queries, processing of streaming data, and integration with the HBase database for high-volume data.
  • Stored the data in tabular formats using Hive tables and Hive SerDes.
  • Implemented Partitioning, Dynamic Partitions and Bucketing in Hive for efficient data access.
  • Redesigned the HBase tables to improve the performance according to the query requirements.
  • Developed MapReduce jobs in Java to convert data files into Parquet file format.
  • Developed Hive queries for data sampling and analysis to the analysts.
  • Executed Hive queries that helped in analysis of trends by comparing the new data with existing data warehouse reference tables and historical data.
  • Developed Hive User Defined Functions in Java, compiled them into JARs, added them to HDFS, and executed them with Hive queries.
  • Worked on Sequence files, ORC files, bucketing, partitioning for Hive performance enhancement and storage improvement.
  • Developed MapReduce programs in Java to search production logs and web analytics logs for application issues.
  • Used OOZIE engine for creating workflow and coordinator jobs that schedule and execute various Hadoop jobs such as MapReduce Jobs, Hive, Spark and automating Sqoop jobs.
  • Configured Oozie workflow to run multiple Hive jobs which run independently with time and data availability.
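
The Hive partitioning, dynamic partitioning, and bucketing work above typically follows a standard HiveQL pattern; a hedged sketch (table and column names are invented for illustration, not taken from the project):

```sql
-- Illustrative HiveQL: a date-partitioned, bucketed table stored as ORC.
CREATE TABLE events (
  event_id BIGINT,
  user_id  STRING,
  payload  STRING
)
PARTITIONED BY (event_date STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS
STORED AS ORC;

-- Dynamic partitioning: Hive routes each row to its partition at insert time.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

INSERT OVERWRITE TABLE events PARTITION (event_date)
SELECT event_id, user_id, payload, event_date
FROM staging_events;
```

Partition pruning on `event_date` plus bucketing on `user_id` is what makes the "efficient data access" claim above concrete: queries filtered by date scan only the matching partitions.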

Environment: CDH5, Hue, Eclipse, CentOS Linux, HDFS, MapReduce, Kafka, Python, Scala, Java, Hive, Sqoop, Spark, Spark SQL, Spark Streaming, HBase, Oracle 10g, Oozie, Red Hat Linux.

Confidential, Minneapolis, MN

Hadoop/Spark Developer

Responsibilities:
  • Developed Spark scripts using the Scala shell as per requirements.
  • Worked with the Spark Core, Spark Streaming, and Spark SQL modules of Spark.
  • Developed multiple POCs using Spark and deployed them on the YARN cluster; compared the performance of Spark with Hive and SQL/Teradata.
  • Developed Kafka producer and consumers, Cassandra clients and Spark along with components on HDFS, Hive.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Developed the code for Importing and exporting data into HDFS and Hive using Sqoop.
  • Automated Sqoop incremental imports using Sqoop jobs and scheduled the jobs with Oozie.
  • Responsible for writing Hive Queries for analyzing data in Hive warehouse using HQL.
  • Involved in defining job flows using Oozie for scheduling jobs to manage apache Hadoop jobs.
  • Developed python and shell scripts to schedule the processes running on a regular basis.
  • Developed several advanced MapReduce programs in Java as part of functional requirements for Big Data.
  • Developed Hive User Defined Functions in Java, compiled them into JARs, added them to HDFS, and executed them from Hive queries.
  • Experienced in managing and reviewing Hadoop log files.
  • Tested and reported defects in an Agile Methodology perspective.
  • Installed Hadoop ecosystems (Hive, Pig, Sqoop, HBase, Oozie) on top of Hadoop cluster
  • Involved in importing data from SQL databases into HDFS and Hive for analytical purposes.
  • Implemented the workflows using Oozie framework to automate tasks.
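
The Oozie workflows described above can be sketched as follows (a hedged example; the workflow name, script, and properties are placeholders, not taken from the project):

```xml
<!-- Illustrative Oozie workflow: a single Hive action with
     success/failure transitions. -->
<workflow-app name="daily-hive-load" xmlns="uri:oozie:workflow:0.5">
  <start to="hive-load"/>
  <action name="hive-load">
    <hive xmlns="uri:oozie:hive-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>load_events.hql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Hive load failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

A coordinator definition on top of such a workflow is what drives the time- and data-availability triggers mentioned above.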

Environment: Hadoop, Hue, HDFS, Spark, MapReduce, Hive, Oozie, Java, Python, NoSQL, Cloudera, Linux, MySQL, SQL.

Confidential, Houston, TX

Hadoop Developer

Responsibilities:
  • Developed parser and loader MapReduce applications to retrieve data from HDFS and store it in HBase and Hive.
  • Imported unstructured data into the HDFS data lake using Flume.
  • Used Oozie to orchestrate the MapReduce jobs that extract the data on a schedule.
  • Wrote MapReduce Java programs to analyze log data for large-scale data sets.
  • Used the HBase Java API in a Java application.
  • Automated Sqoop jobs to extract data from data sources such as MySQL and push the result sets to the Hadoop Distributed File System.
  • Implemented MapReduce jobs using the Java API, Pig Latin, and HiveQL.
  • Participated in the setup and deployment of Hadoop cluster.
  • Hands on design and development of an application using Hive (UDF).
  • Imported and exported data between MySQL/Oracle and Hive using Sqoop.
  • Designed and built many applications to deal with vast amounts of data flowing through multiple Hadoop clusters, using Pig Latin and Java-based MapReduce.
  • Developed UDFs in Java when necessary to use in PIG and HIVE queries
  • Specified the cluster size and resource pool allocation for the Hadoop distribution by writing specifications in JSON format.
  • Responsible for defining the data flow within the Hadoop ecosystem and directing the team in implementing it.
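
The Pig Latin join-and-aggregate work described above can be sketched as follows (relation, path, and field names are illustrative only):

```pig
-- Illustrative Pig Latin: join two datasets, then group and aggregate,
-- as in the log-analysis jobs above.
logs  = LOAD '/data/logs'  USING PigStorage('\t') AS (user_id:chararray, url:chararray);
users = LOAD '/data/users' USING PigStorage('\t') AS (user_id:chararray, region:chararray);

joined  = JOIN logs BY user_id, users BY user_id;
grouped = GROUP joined BY users::region;
hits    = FOREACH grouped GENERATE group AS region, COUNT(joined) AS hit_count;

STORE hits INTO '/data/region_hits';
```

Custom UDFs of the kind mentioned above plug into the `FOREACH ... GENERATE` step once their JAR is registered with `REGISTER`.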

Environment: Hadoop, HDP (Hortonworks), Hive, Ambari, Zookeeper, MapReduce, Sqoop, Pig, UNIX, Java, Eclipse, Oracle, SQL Server, MySQL.

Confidential

Java Developer

Responsibilities:

  • Designed the user interfaces using JSP.
  • Developed Custom tags, JSTL to support custom User Interfaces.
  • Developed the application using Struts (MVC) Framework.
  • Implemented Business processes such as user authentication, Account Transfer using Session EJBs.
  • Used Eclipse to write the code for JSP, Servlets, Struts and EJBs.
  • Deployed the applications on the WebLogic Application Server.
  • Used Java Messaging Services (JMS) and Backend messaging for reliable and asynchronous exchange of important information such as payment status report.
  • Developed the entire Application(s) through Eclipse.
  • Worked with the WebLogic Application Server to deploy the application(s).
  • Developed the Ant scripts for preparing WAR files used to deploy J2EE components.
  • Used JDBC for database connectivity to Oracle.
  • Worked with Oracle Database to create tables, procedures, functions and select statements.
  • Used JUnit Testing, debugging, and bug fixing.
  • Used Log4J to capture logs, including runtime exceptions, and developed a framework to alert the client and production support in case of application failures.
  • Performed data-driven testing using Selenium and TestNG functions that read data from property and XML files. Involved in the CI/CD process using Git, Jenkins job creation, and Maven build and publish.
  • Used Maven to build and run the Selenium automation framework. Involved in building and deploying scripts using Maven to generate WAR, EAR and JAR files.

Environment: Eclipse, WebSphere Application Server, WebLogic 8.1, Java, J2EE, JSP, Servlets, Struts, EJB, JMS, JNDI, JDBC, Hibernate, HTML, CSS, JavaScript, AJAX, XML, JAXP, JAX-RPC, AXIS, SOAP, TOAD, Oracle 10g, JUnit, Log4J, Ant, Maven, Jenkins, CVS, UNIX Shell Scripting, Linux, Rational Unified Process (RUP).
