Hadoop/Spark Developer Resume

Kansas City, Missouri

SUMMARY

  • 4+ years of professional IT experience across the full project lifecycle, including requirements analysis, design, development, testing, deployment, and production support of software applications built with J2EE and Big Data technologies.
  • Experience in analyzing data using the Hadoop ecosystem, including HDFS, Hive, HiveQL, Spark, Spark Streaming, Spark SQL, MLlib, Kafka, HBase, and ZooKeeper.
  • Involved in converting Hive/SQL queries into Spark transformations using RDDs and Scala.
  • Migrated traditional MapReduce jobs to Spark jobs to improve data processing speed.
  • Experienced in WAMP (Windows, Apache, MySQL) and LAMP (Linux, Apache, MySQL) architectures.
  • Experience in working with the Hortonworks Hadoop stack and the Amazon Web Services (AWS) suite.
  • Very good understanding of Hadoop architecture and its daemons: NameNode, DataNode, ResourceManager, NodeManager, TaskTracker, and JobTracker.
  • Good knowledge of and experience in developing SOAP and REST APIs with frameworks like Django and Flask.
  • Built data warehouse and data mart solutions on Teradata and Big Data platforms.
  • Experienced in developing web services in the Java programming language.
  • Experience in developing web applications and implementing Model-View-Controller (MVC) architecture using the server-side frameworks Django, Flask, and Pyramid.
  • Hands-on experience in installing, configuring, and using ecosystem components like Hadoop MapReduce, HDFS, HBase, Oozie, Hive, HCatalog, Pig, and Flume.
  • Performed data integration between different databases and into HDFS, Hive, and HBase using Talend Integration and Talend Big Data tools.
  • Experience in using DataStage database stages such as the Oracle connector, Teradata connector, and ODBC connector.
  • Experience in designing, compiling, testing, scheduling, and running DataStage jobs.
  • Experienced in developing MapReduce programs using Apache Hadoop for working with Big Data.
  • Expertise in back-end procedure development for RDBMS and database applications using SQL and PL/SQL.
  • Good knowledge of Big Data on Azure, including Data Lake Store and Data Factory.
  • Good knowledge of Informatica; connected to Oracle through Informatica and used various transformations to perform ETL tasks.
  • Hands-on experience in writing queries, stored procedures, functions, and triggers in SQL.
  • Experienced in utilizing Java tools in business, web, and client-server environments, including Java Platform, J2EE, EJB, JSP, Java Servlets, Struts, and Java Database Connectivity (JDBC) technologies.
  • Experience in writing complex SQL queries involving inner and outer joins across multiple tables.
  • Excellent interpersonal and communication skills; creative, research-minded, technically competent, and results-oriented, with problem-solving and leadership skills.

TECHNICAL SKILLS

Languages: C, Java, Scala

Hadoop Distribution: Hortonworks, Cloudera

Hadoop Ecosystem: HDFS, MapReduce, YARN, Pig, Hive, HiveQL, HBase, Sqoop, Flume, Oozie, ZooKeeper, Cassandra, Kafka, Scala, Spark, Spark Streaming, Spark SQL, and Storm

Technologies: JSP, J2EE, JDBC, Hibernate, Spring, Ajax, RESTful web services

Development Tools (IDEs): Eclipse, NetBeans, IntelliJ

Web/Application Servers: Tomcat, WebLogic, IBM WebSphere, JBoss

Database: Oracle 11g, MS SQL Server 2008, MySQL, HBase

Platforms: Windows, Unix, Linux

Testing Tools: JUnit, JIRA

Version Control Tools: Git, GitHub

Methodologies: Agile (Scrum), Waterfall

Build Tools: Maven, Gradle

PROFESSIONAL EXPERIENCE

Confidential, Kansas City, Missouri

Hadoop/Spark Developer

Responsibilities:

  • Expertise in designing and deploying Hadoop clusters and various Big Data analytics tools, including Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, Flume, Spark, and Kafka.
  • Developed Spark code using Scala and Spark SQL for faster testing and processing of data.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Experienced with batch processing of data sources using Apache Spark.
  • Developed analytical components using Scala, Spark, YARN, and Spark Streaming.
  • Experienced with NoSQL databases like HBase, MongoDB and Cassandra.
  • Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and preprocessing.
  • Developed Kafka producers and consumers, as well as Spark and Hadoop MapReduce jobs.
  • Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Imported data from different sources such as HDFS and HBase into Spark RDDs.
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
  • Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
  • Developed Spark scripts using Scala shell commands as per the requirements.
  • Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Experience with the Oozie workflow scheduler to manage Hadoop jobs with control flows.
  • Expertise in data modeling and in data warehouse design and development.

Environment: Hadoop, HDFS, Spark, MapReduce, Pig, Hive, Sqoop, Kafka, HBase, Oozie, Flume, Scala, Java, SQL Scripting and Linux Shell Scripting.

Confidential, Naperville, IL

Spark/Java Developer

Responsibilities:

  • Developed a streaming application using Spark Streaming (2.x). The end-to-end data flow includes NiFi, Kafka, Spark Streaming, and HBase.
  • Developed Spark code using Scala and Spark SQL for faster testing and processing of data.
  • Developed Kafka producers and consumers, as well as Spark and Hadoop MapReduce jobs.
  • Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HBase.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Streamed JSON-formatted data from different Kafka topics and loaded it into HBase in real time.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Developed real-time data validation checks on the streaming data before loading it into HBase tables.
  • Worked on row key design and table design.
  • Generated daily reports on the HBase tables using the Spark HBase API, including audit batch reports and data validation reports.
  • Created time series data from the daily data to support further time series analysis and the application of machine learning techniques.

Environment: Spark, Spark Streaming, Java, Scala, HBase, Hive, Kafka, IntelliJ, NiFi, Zeppelin

Confidential, Ann Arbor, MI

Software Engineer

Responsibilities:

  • Expertise in designing and deploying Hadoop clusters and various Big Data analytics tools, including Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, Flume, Spark, and Kafka.
  • Developed Spark code using Scala and Spark SQL for faster testing and processing of data.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Experienced with batch processing of data sources using Apache Spark.
  • Developed analytical components using Scala, Spark, YARN, and Spark Streaming.
  • Experienced with NoSQL databases like HBase, MongoDB and Cassandra.
  • Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and preprocessing.
  • Involved in data extraction from Oracle, flat files, and XML files using Talend, with Java as the back-end language.
  • Wrote UNIX shell scripts in combination with Informatica sessions to process the source files and load them into the staging database.
  • Developed Kafka producers and consumers, as well as Spark and Hadoop MapReduce jobs.
  • Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Imported data from different sources such as HDFS and HBase into Spark RDDs.
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
  • Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
  • Developed Spark scripts using Scala shell commands as per the requirements.
  • Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Experience with the Oozie workflow scheduler to manage Hadoop jobs with control flows.
  • Expertise in data modeling and in data warehouse design and development.

Environment: Hadoop, HDFS, Spark, MapReduce, Pig, Hive, Sqoop, Kafka, HBase, Oozie, Flume, Scala, Java, SQL Scripting, Oracle and Linux Shell Scripting.

Confidential

Software Engineer

Responsibilities:

  • Implemented server-side programs by using Servlets and JSP.
  • Designed, developed, and validated the user interface using HTML, JavaScript, XML, and CSS.
  • Implemented MVC using Struts Framework.
  • Involved in implementing the DAO pattern for database access and used the JDBC API extensively.
  • Used XML Web services for transferring data between different applications and retrieving credit information from the credit bureau.
  • Used XML with DTDs and their references in the files. Used the JAXB API to bind XML schemas to Java classes.
  • Used JMS-MQ Bridge to send messages securely, reliably and asynchronously to WebSphere MQ, which connects to the legacy systems.
  • Tested the application functionality with JUnit Struts Test Cases.
  • Developed the GUI using JSF and Java Swing.
  • Developed a logging module using Log4J to create log files for debugging and tracing the application.
  • Used CVS for version control.
  • Extensively used ANT as a build tool. Deployed the applications on IBM WebSphere Application Server.
  • Handled database access by implementing a controller servlet.
  • Implemented PL/SQL stored procedures and triggers.
  • Used JDBC prepared statements, called from servlets, for database access.
  • Used Log4J to log application errors. Wrote test cases using JUnit.

Environment: Java 1.4, J2EE, JSP, Servlets, HTML, DHTML, XML, JavaScript, Eclipse, WebLogic, Struts, WebSphere MQ 5.3, Java SDK 1.4, MVC, Core Java, Servlet 2.2, JSP 2.0, JDBC, PL/SQL, XML Web Services, XML DTD, Apache Tomcat, ASP, Spring 1.0.2, SOAP, WSDL, Windows 2000, Oracle 9i, JUnit, CVS, ANT 1.5 and Log4J.
