Hadoop/Spark Developer Resume
Cambridge, MA
PROFESSIONAL SUMMARY:
- 6+ years of experience in the IT industry across Big Data technologies and Java/J2EE, including 3+ years of experience with the Big Data/Hadoop stack and 1 year of experience as a Java developer.
- Well versed in designing and implementing MapReduce jobs in Java (Eclipse) to solve real-world scaling problems.
- Hands-on experience working with structured and unstructured data in various file formats, such as XML, JSON, and sequence files, using MapReduce programs.
- Extensive experience writing SQL queries in HiveQL to perform analytics on structured data.
- Expertise in data load management, importing and exporting data using Sqoop and Flume.
- Experienced in developing Spark applications using the Spark Core, Spark SQL, and Spark Streaming APIs.
- Implemented business logic using Pig scripts and wrote custom Pig UDFs to analyze data.
- Performed Pig operations, joins, and transformations to aggregate and analyze data.
- Experience managing scalable Hadoop clusters, including cluster design, provisioning, custom configuration, monitoring, and maintenance, on the Cloudera CDH and Hortonworks HDP distributions.
- Good experience using Apache Spark, Storm, and Kafka.
- Good knowledge of the Spark framework for both batch and real-time data processing.
- Good knowledge of and experience with Spark using Python and Scala.
- Hands-on experience creating Hive UDFs to meet requirements and to handle JSON and XML files.
- Delivered projects (full SDLC) using big data technologies such as Hadoop, Oozie, and NoSQL.
- Good understanding of NiFi workflows for picking up files from different locations and moving them to HDFS or sending them to Kafka brokers.
- Knowledgeable in ingesting data from multiple data sources into HDFS and Hive using NiFi, including importing data from Linux servers with the NiFi tool.
- Extensive working knowledge of setting up and running clusters, monitoring, data analytics, sentiment analysis, predictive analysis, and data presentation in the big data world.
- Excellent understanding of NoSQL databases such as HBase.
- Excellent interpersonal and communication skills; creative, research-minded, technically competent, and results-oriented, with problem-solving and leadership skills.
- Experience in Java programming, with skills in analysis, design, testing, and deployment using technologies such as Java, J2EE, JavaScript, data structures, JDBC, HTML, XML, JUnit, and jQuery.
TECHNICAL SKILLS:
Big Data Ecosystems: MapReduce, Hive, Sqoop, Spark, Kafka, Pig, Flume, HBase, Oozie
Streaming Technologies: Spark Streaming, Storm
Scripting Languages: Python, Bash, JavaScript, HTML5, CSS3
Programming Languages: Java, Scala, SQL, PL/SQL
Java/J2EE Technologies: Servlets, JSP, JSF, JUnit, Hibernate, Log4J, EJB, JDBC, JMS, JNDI
Databases: Oracle, RDBMS, NoSQL
IDEs / Tools: Eclipse, JUnit, Maven, Ant, MS Visual Studio, NetBeans
Methodologies: Agile, Waterfall
PROFESSIONAL EXPERIENCE:
Confidential, NY
Hadoop/Spark Developer
Responsibilities:
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Imported data from different sources such as HDFS and HBase into Spark RDDs, developed a data pipeline using Kafka to store data in HDFS, and performed real-time analysis on the incoming data.
- Worked extensively with Sqoop to import and export data between the HDFS data lake and relational database systems such as Oracle and MySQL.
- Developed Python scripts to collect data from source systems and store it in HDFS.
- Involved in converting Hive and SQL queries into Spark transformations using Python and Scala.
- Built a Kafka REST API to collect events from the front end.
- Built a real-time pipeline for streaming data using Kafka and Spark Streaming.
- Integrated Apache Kafka with the Spark Streaming process to consume data from external sources and run custom functions (a minimal sketch follows this list).
- Explored Spark and improved the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Used the DataFrame API in Scala to work with distributed collections of data organized into named columns.
- Developed Spark jobs and Hive jobs to summarize and transform data.
- Optimized performance when dealing with large datasets using partitions, Spark broadcasts, effective and efficient joins, and transformations during the ingestion process (see the broadcast-join sketch after this list).
- Used Spark for interactive queries, processing of streaming data, and integration with the HBase database for high volumes of data.
- Stored data in tabular formats using Hive tables and Hive SerDes.
- Implemented partitioning, dynamic partitions, and bucketing in Hive for efficient data access.
- Redesigned HBase tables to improve performance based on the query requirements.
- Developed MapReduce jobs in Java to convert data files into the Parquet file format.
- Developed Hive queries for data sampling and analysis for the analysts.
- Executed Hive queries that helped in analysis of trends by comparing the new data with existing data warehouse reference tables and historical data.
- Developed Hive user-defined functions in Java, compiled them into JARs, added them to HDFS, and executed them from Hive queries.
- Worked with sequence files, ORC files, bucketing, and partitioning for Hive performance enhancement and storage improvement.
- Developed MapReduce programs in Java to search production logs and web analytics logs for application issues.
- Used the Oozie engine to create workflow and coordinator jobs that schedule and execute various Hadoop jobs, such as MapReduce, Hive, and Spark jobs, and to automate Sqoop jobs.
- Configured Oozie workflows to run multiple Hive jobs independently, based on time and data availability.
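A minimal sketch of the Kafka-to-Spark-Streaming-to-HDFS pattern described above; the broker address, topic name, batch interval, and output path are hypothetical placeholders, and it assumes the spark-streaming-kafka-0-10 connector:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object KafkaToHdfs {
      def main(args: Array[String]): Unit = {
        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker1:9092",          // placeholder broker
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "events-consumer",                // placeholder consumer group
          "auto.offset.reset" -> "latest"
        )

        val ssc = new StreamingContext(new SparkConf().setAppName("KafkaToHdfs"), Seconds(30))

        // Direct stream from a hypothetical "events" topic
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

        // Persist each non-empty micro-batch to HDFS, one directory per batch time
        stream.map(_.value).foreachRDD { (rdd, time) =>
          if (!rdd.isEmpty()) rdd.saveAsTextFile(s"hdfs:///data/raw/events/${time.milliseconds}")
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }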
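Likewise, a brief illustration of the broadcast-join optimization mentioned above, assuming Spark 2.x DataFrames with Hive support; the table names, join key, partition column, and output path are hypothetical:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast

    object BroadcastJoinExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("BroadcastJoinExample")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical tables: a large fact table and a small dimension table
        val transactions = spark.table("transactions")
        val accounts     = spark.table("accounts")

        // Broadcasting the small side avoids shuffling the large table
        val joined = transactions.join(broadcast(accounts), Seq("account_id"))

        // Write the result in Parquet format, partitioned by a date column
        joined.write.mode("overwrite")
          .partitionBy("txn_date")
          .parquet("hdfs:///data/curated/transactions")
      }
    }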
Environment: CDH5, Hue, Eclipse, CentOS Linux, HDFS, MapReduce, Kafka, Python, Scala, Java, Hive, Sqoop, Spark, Spark SQL, Spark Streaming, HBase, Oracle 10g, Oozie, Red Hat Linux.
Confidential, Cambridge, MA
Hadoop/Spark Developer
Responsibilities:
- Developed Spark scripts using the Scala shell as per requirements.
- Worked with the Spark Core, Spark Streaming, and Spark SQL modules of Spark.
- Developed multiple POCs using Spark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
- Developed Kafka producers and consumers, Cassandra clients, and Spark jobs, along with components on HDFS and Hive.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala (see the sketch after this list).
- Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
- Automated Sqoop incremental imports using Sqoop jobs and scheduled those jobs with Oozie.
- Responsible for writing Hive queries in HQL for analyzing data in the Hive warehouse.
- Involved in defining job flows using Oozie to schedule and manage Apache Hadoop jobs.
- Developed Python and shell scripts to schedule processes that run on a regular basis.
- Developed several advanced MapReduce programs in Java as part of the functional requirements for big data.
- Developed Hive user-defined functions in Java, compiled them into JARs, added them to HDFS, and executed them from Hive queries.
- Experienced in managing and reviewing Hadoop log files.
- Tested and reported defects from an Agile methodology perspective.
- Installed Hadoop ecosystem components (Hive, Pig, Sqoop, HBase, Oozie) on top of the Hadoop cluster.
- Involved in importing data from SQL to HDFS and Hive for analytical purposes.
- Implemented workflows using the Oozie framework to automate tasks.
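A short sketch of the Hive-to-Spark conversion pattern referenced above, assuming Spark 2.x with Hive support; the table and column names are hypothetical, the original HiveQL appears in the comment, and the DataFrame equivalent follows:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{avg, col, count, lit}

    object HiveQueryToSpark {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HiveQueryToSpark")
          .enableHiveSupport()
          .getOrCreate()

        // HiveQL version (hypothetical):
        //   SELECT category, COUNT(*) AS cnt, AVG(amount) AS avg_amount
        //   FROM sales
        //   WHERE sale_date >= '2017-01-01'
        //   GROUP BY category;

        // Equivalent Spark DataFrame transformations
        val summary = spark.table("sales")
          .filter(col("sale_date") >= "2017-01-01")
          .groupBy("category")
          .agg(count(lit(1)).as("cnt"), avg("amount").as("avg_amount"))

        summary.show()
      }
    }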
Environment: Hadoop, Hue, HDFS, Spark, MapReduce, Hive, Oozie, Java, Python, NoSQL, Cloudera, Linux, MySQL, SQL.
Confidential, FL
Hadoop Developer
Responsibilities:
- Developed parser and loader MapReduce applications to retrieve data from HDFS and store it in HBase and Hive.
- Imported unstructured data into the HDFS data lake using Flume.
- Used Oozie to orchestrate the MapReduce jobs that extract the data on a scheduled basis.
- Wrote MapReduce Java programs to analyze log data for large-scale data sets.
- Used the HBase Java API from a Java application (a client sketch follows this list).
- Automated Sqoop jobs to extract data from different data sources, such as MySQL, and push the result sets to the Hadoop Distributed File System.
- Implemented MapReduce jobs using the Java API, as well as Pig Latin and HiveQL.
- Participated in the setup and deployment of the Hadoop cluster.
- Hands-on design and development of an application using Hive UDFs.
- Imported and exported data between MySQL/Oracle and Hive using Sqoop.
- Designed and built many applications to deal with vast amounts of data flowing through multiple Hadoop clusters, using Pig Latin and Java-based MapReduce.
- Developed UDFs in Java when necessary for use in Pig and Hive queries.
- Specified the cluster size, allocated resource pools, and configured the Hadoop distribution by writing specifications in JSON file format.
- Responsible for defining the data flows within the Hadoop ecosystem and directing the team in implementing them.
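The HBase access referenced above was written against the HBase Java API; purely as an illustration, and kept in Scala for consistency with the other sketches in this document, the same put/get client pattern looks roughly like this (table name, column family, qualifier, and row key are hypothetical):

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
    import org.apache.hadoop.hbase.util.Bytes

    object HBaseClientExample {
      def main(args: Array[String]): Unit = {
        val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = connection.getTable(TableName.valueOf("web_logs"))   // hypothetical table

        // Write one cell: row key -> column family "d", qualifier "status"
        val put = new Put(Bytes.toBytes("2017-06-01|0001"))
        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"), Bytes.toBytes("200"))
        table.put(put)

        // Read the same cell back
        val result = table.get(new Get(Bytes.toBytes("2017-06-01|0001")))
        val status = Bytes.toString(result.getValue(Bytes.toBytes("d"), Bytes.toBytes("status")))
        println(s"status = $status")

        table.close()
        connection.close()
      }
    }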
Environment: Hadoop, HDP (Hortonworks), Hive, Ambari, Zookeeper, MapReduce, Sqoop, Pig, UNIX, Java, Eclipse, Oracle, SQL Server, MySQL.
Confidential
Java Developer
Responsibilities:
- Designed the user interfaces using JSP.
- Developed custom tags and JSTL to support custom user interfaces.
- Developed the application using the Struts (MVC) framework.
- Implemented business processes such as user authentication and account transfer using session EJBs.
- Used Eclipse to write the code for JSPs, Servlets, Struts, and EJBs.
- Deployed the applications on WebLogic Application Server.
- Used Java Message Service (JMS) and backend messaging for reliable and asynchronous exchange of important information such as payment status reports.
- Developed the entire application through Eclipse.
- Worked with WebLogic Application Server to deploy the applications.
- Developed the Ant scripts for preparing WAR files used to deploy J2EE components.
- Used JDBC for database connectivity to Oracle.
- Worked with the Oracle database to create tables, procedures, functions, and select statements.
- Used JUnit for testing, debugging, and bug fixing.
- Used Log4J to capture logs, including runtime exceptions, and developed a WAR framework to alert the client and production support in case of application failures.
- Performed data-driven testing using Selenium and TestNG functions that read data from property and XML files. Involved in the CI/CD process using Git, Jenkins job creation, and Maven build and publish.
- Used Maven to build and run the Selenium automation framework. Involved in building and deploying scripts using Maven to generate WAR, EAR, and JAR files.
Environment: Eclipse, WebSphere Application Server, JSP, Servlet, HTML, JUnit, JavaScript, CSS, EJB, Hibernate, Struts, XML, JAXP, CVS, JAX-RPC, AXIS, SOAP, TOAD, AJAX, Jenkins, Maven, Log4J, UNIX, Linux, Java, J2EE.