Hadoop/Spark Developer Resume
Cambridge, MA
PROFESSIONAL SUMMARY:
- 6+ years of experience in the IT industry across Big Data technologies and Java/J2EE, including 3+ years of experience with the Big Data/Hadoop stack and 1 year of experience as a Java developer.
- Well versed in designing and implementing MapReduce jobs in Java (Eclipse) to solve real-world scaling problems.
- Hands-on experience working with structured and unstructured data in various file formats, such as XML, JSON, and sequence files, using MapReduce programs.
- Extensive experience writing SQL queries in HiveQL to perform analytics on structured data.
- Expertise in data load management, importing and exporting data using Sqoop and Flume.
- Experienced in developing Spark applications using the Spark Core, Spark SQL, and Spark Streaming APIs.
- Implemented business logic using Pig scripts and wrote custom Pig UDFs to analyze data.
- Performed Pig operations, joins, and transformations to aggregate and analyze data.
- Experience managing scalable Hadoop clusters, including cluster design, provisioning, custom configuration, monitoring, and maintenance, on the Cloudera CDH and Hortonworks HDP distributions.
- Good experience using Apache Spark, Storm, and Kafka.
- Good knowledge of the Spark framework for both batch and real-time data processing.
- Good knowledge of and experience with Spark using Python and Scala.
- Hands-on experience creating Hive UDFs to meet requirements and to handle JSON and XML files.
- Delivered projects (full SDLC) using big data technologies such as Hadoop, Oozie, and NoSQL.
- Good understanding of NiFi workflows for picking up files from different locations and moving them to HDFS or sending them to Kafka brokers.
- Knowledgeable in ingesting data from multiple data sources into HDFS and Hive using NiFi, including importing data from Linux servers with the NiFi tool.
- Extensive working knowledge of setting up and running clusters, monitoring, data analytics, sentiment analysis, predictive analysis, and data presentation in the big data world.
- Excellent understanding of NoSQL databases such as HBase.
- Excellent interpersonal and communication skills; creative, research-minded, technically competent, and results-oriented, with problem-solving and leadership skills.
- Experience in Java programming, with skills in analysis, design, testing, and deployment using technologies such as Java, J2EE, JavaScript, data structures, JDBC, HTML, XML, JUnit, and jQuery.
TECHNICAL SKILLS:
Big Data Ecosystems: MapReduce, Hive, Sqoop, Spark, Kafka, Pig, Flume, HBase, Oozie
Streaming Technologies: Spark Streaming, Storm
Scripting Languages: Python, Bash, JavaScript, HTML5, CSS3
Programming Languages: Java, Scala, SQL, PL/SQL
Java/J2EE Technologies: Servlets, JSP, JSF, JUnit, Hibernate, Log4J, EJB, JDBC, JMS, JNDI
Databases: Oracle, RDBMS, NoSQL
IDEs / Tools: Eclipse, JUnit, Maven, Ant, MS Visual Studio, NetBeans
Methodologies: Agile, Waterfall
PROFESSIONAL EXPERIENCE:
Confidential, NY
Hadoop/Spark Developer
Responsibilities:
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Imported data from different sources such as HDFS and HBase into Spark RDDs, developed a data pipeline using Kafka to store data in HDFS, and performed real-time analysis on the incoming data.
- Worked extensively with Sqoop to import and export data between the HDFS data lake and relational database systems such as Oracle and MySQL.
- Developed Python scripts to collect data from source systems and store it in HDFS.
- Involved in converting Hive and SQL queries into Spark transformations using Python and Scala.
- Built a Kafka REST API to collect events from the front end.
- Built a real-time pipeline for streaming data using Kafka and Spark Streaming.
- Integrated Apache Kafka with the Spark Streaming process to consume data from external sources and run custom functions (a minimal sketch follows this list).
- Explored Spark and improved the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Used the DataFrame API in Scala to work with distributed collections of data organized into named columns.
- Developed Spark jobs and Hive jobs to summarize and transform data.
- Optimized performance when dealing with large datasets using partitions, Spark broadcasts, effective and efficient joins, and transformations during the ingestion process (see the broadcast-join sketch after this list).
- Used Spark for interactive queries, processing of streaming data, and integration with the HBase database for high volumes of data.
- Stored data in tabular formats using Hive tables and Hive SerDes.
- Implemented partitioning, dynamic partitions, and bucketing in Hive for efficient data access.
- Redesigned HBase tables to improve performance based on the query requirements.
- Developed MapReduce jobs in Java to convert data files into the Parquet file format.
- Developed Hive queries for data sampling and analysis for the analysts.
- Executed Hive queries that helped in analysis of trends by comparing the new data with existing data warehouse reference tables and historical data.
- Developed Hive user-defined functions in Java, compiled them into JARs, added them to HDFS, and executed them from Hive queries.
- Worked with sequence files, ORC files, bucketing, and partitioning for Hive performance enhancement and storage improvement.
- Developed MapReduce programs in Java to search production logs and web analytics logs for application issues.
- Used the Oozie engine to create workflow and coordinator jobs that schedule and execute various Hadoop jobs, such as MapReduce, Hive, and Spark jobs, and to automate Sqoop jobs.
- Configured Oozie workflows to run multiple Hive jobs independently, based on time and data availability.
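A minimal sketch of the Kafka-to-Spark-Streaming-to-HDFS pattern described above; the broker address, topic name, batch interval, and output path are hypothetical placeholders, and it assumes the spark-streaming-kafka-0-10 connector:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object KafkaToHdfs {
      def main(args: Array[String]): Unit = {
        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker1:9092",          // placeholder broker
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "events-consumer",                // placeholder consumer group
          "auto.offset.reset" -> "latest"
        )

        val ssc = new StreamingContext(new SparkConf().setAppName("KafkaToHdfs"), Seconds(30))

        // Direct stream from a hypothetical "events" topic
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

        // Persist each non-empty micro-batch to HDFS, one directory per batch time
        stream.map(_.value).foreachRDD { (rdd, time) =>
          if (!rdd.isEmpty()) rdd.saveAsTextFile(s"hdfs:///data/raw/events/${time.milliseconds}")
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }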
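Likewise, a brief illustration of the broadcast-join optimization mentioned above, assuming Spark 2.x DataFrames with Hive support; the table names, join key, partition column, and output path are hypothetical:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast

    object BroadcastJoinExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("BroadcastJoinExample")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical tables: a large fact table and a small dimension table
        val transactions = spark.table("transactions")
        val accounts     = spark.table("accounts")

        // Broadcasting the small side avoids shuffling the large table
        val joined = transactions.join(broadcast(accounts), Seq("account_id"))

        // Write the result in Parquet format, partitioned by a date column
        joined.write.mode("overwrite")
          .partitionBy("txn_date")
          .parquet("hdfs:///data/curated/transactions")
      }
    }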
Environment: CDH5, Hue, Eclipse, CentOS Linux, HDFS, MapReduce, Kafka, Python, Scala, Java, Hive, Sqoop, Spark, Spark SQL, Spark Streaming, HBase, Oracle 10g, Oozie, Red Hat Linux.
Confidential, Cambridge, MA
Hadoop/Spark Developer
Responsibilities:
- Developed Spark scripts using the Scala shell as per requirements.
- Worked with the Spark Core, Spark Streaming, and Spark SQL modules of Spark.
- Developed multiple POCs using Spark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
- Developed Kafka producers and consumers, Cassandra clients, and Spark jobs, along with components on HDFS and Hive.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala (see the sketch after this list).
- Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
- Automated Sqoop incremental imports using Sqoop jobs and scheduled those jobs with Oozie.
- Responsible for writing Hive queries in HQL for analyzing data in the Hive warehouse.
- Involved in defining job flows using Oozie to schedule and manage Apache Hadoop jobs.
- Developed Python and shell scripts to schedule processes that run on a regular basis.
- Developed several advanced MapReduce programs in Java as part of the functional requirements for big data.
- Developed Hive user-defined functions in Java, compiled them into JARs, added them to HDFS, and executed them from Hive queries.
- Experienced in managing and reviewing Hadoop log files.
- Tested and reported defects from an Agile methodology perspective.
- Installed Hadoop ecosystem components (Hive, Pig, Sqoop, HBase, Oozie) on top of the Hadoop cluster.
- Involved in importing data from SQL to HDFS and Hive for analytical purposes.
- Implemented workflows using the Oozie framework to automate tasks.
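A short sketch of the Hive-to-Spark conversion pattern referenced above, assuming Spark 2.x with Hive support; the table and column names are hypothetical, the original HiveQL appears in the comment, and the DataFrame equivalent follows:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{avg, col, count, lit}

    object HiveQueryToSpark {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HiveQueryToSpark")
          .enableHiveSupport()
          .getOrCreate()

        // HiveQL version (hypothetical):
        //   SELECT category, COUNT(*) AS cnt, AVG(amount) AS avg_amount
        //   FROM sales
        //   WHERE sale_date >= '2017-01-01'
        //   GROUP BY category;

        // Equivalent Spark DataFrame transformations
        val summary = spark.table("sales")
          .filter(col("sale_date") >= "2017-01-01")
          .groupBy("category")
          .agg(count(lit(1)).as("cnt"), avg("amount").as("avg_amount"))

        summary.show()
      }
    }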
Environment: Hadoop, Hue, HDFS, Spark, MapReduce, Hive, Oozie, Java, Python, NoSQL, Cloudera, Linux, MySQL, SQL.
Confidential, FL
Hadoop Developer
Responsibilities:
- Developed parser and loader MapReduce applications to retrieve data from HDFS and store it in HBase and Hive.
- Imported unstructured data into the HDFS data lake using Flume.
- Used Oozie to orchestrate the MapReduce jobs that extract the data on a scheduled basis.
- Wrote MapReduce Java programs to analyze log data for large-scale data sets.
- Used the HBase Java API from a Java application (a client sketch follows this list).
- Automated Sqoop jobs to extract data from different data sources, such as MySQL, and push the result sets to the Hadoop Distributed File System.
- Implemented MapReduce jobs using the Java API, as well as Pig Latin and HiveQL.
- Participated in the setup and deployment of the Hadoop cluster.
- Hands-on design and development of an application using Hive UDFs.
- Imported and exported data between MySQL/Oracle and Hive using Sqoop.
- Designed and built many applications to deal with vast amounts of data flowing through multiple Hadoop clusters, using Pig Latin and Java-based MapReduce.
- Developed UDFs in Java when necessary for use in Pig and Hive queries.
- Specified the cluster size, allocated resource pools, and configured the Hadoop distribution by writing specifications in JSON file format.
- Responsible for defining the data flows within the Hadoop ecosystem and directing the team in implementing them.
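The HBase access referenced above was written against the HBase Java API; purely as an illustration, and kept in Scala for consistency with the other sketches in this document, the same put/get client pattern looks roughly like this (table name, column family, qualifier, and row key are hypothetical):

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
    import org.apache.hadoop.hbase.util.Bytes

    object HBaseClientExample {
      def main(args: Array[String]): Unit = {
        val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = connection.getTable(TableName.valueOf("web_logs"))   // hypothetical table

        // Write one cell: row key -> column family "d", qualifier "status"
        val put = new Put(Bytes.toBytes("2017-06-01|0001"))
        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"), Bytes.toBytes("200"))
        table.put(put)

        // Read the same cell back
        val result = table.get(new Get(Bytes.toBytes("2017-06-01|0001")))
        val status = Bytes.toString(result.getValue(Bytes.toBytes("d"), Bytes.toBytes("status")))
        println(s"status = $status")

        table.close()
        connection.close()
      }
    }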
Environment: Hadoop, HDP (Hortonworks), Hive, Ambari, Zookeeper, MapReduce, Sqoop, Pig, UNIX, Java, Eclipse, Oracle, SQL Server, MySQL.
Confidential
Java Developer
Responsibilities:
- Designed the user interfaces using JSP.
- Developed custom tags and JSTL to support custom user interfaces.
- Developed the application using the Struts (MVC) framework.
- Implemented business processes such as user authentication and account transfer using session EJBs.
- Used Eclipse to write the code for JSPs, Servlets, Struts, and EJBs.
- Deployed the applications on WebLogic Application Server.
- Used Java Message Service (JMS) and backend messaging for reliable and asynchronous exchange of important information such as payment status reports.
- Developed the entire application through Eclipse.
- Worked with WebLogic Application Server to deploy the applications.
- Developed the Ant scripts for preparing WAR files used to deploy J2EE components.
- Used JDBC for database connectivity to Oracle.
- Worked with the Oracle database to create tables, procedures, functions, and select statements.
- Used JUnit for testing, debugging, and bug fixing.
- Used Log4J to capture logs, including runtime exceptions, and developed a WAR framework to alert the client and production support in case of application failures.
- Performed data-driven testing using Selenium and TestNG functions that read data from property and XML files. Involved in the CI/CD process using Git, Jenkins job creation, and Maven build and publish.
- Used Maven to build and run the Selenium automation framework. Involved in building and deploying scripts using Maven to generate WAR, EAR, and JAR files.
Environment: Eclipse, WebSphere Application Server, JSP, Servlet, HTML, JUnit, JavaScript, CSS, EJB, Hibernate, Struts, XML, JAXP, CVS, JAX-RPC, AXIS, SOAP, TOAD, AJAX, Jenkins, Maven, Log4J, UNIX, Linux, Java, J2EE.