
Spark Scala/Big Data Developer Resume


Columbus, OH

SUMMARY:

  • An information technology professional with 8+ years of overall IT experience, including 4 years in Big Data development.
  • In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce.
  • Experienced in Waterfall and Agile development methodologies.
  • Expertise in writing Hadoop jobs for analyzing data using Python, MapReduce, Hive, and Pig.
  • Experienced in using Scala, Spark Streaming, and Akka to process ongoing customer transactions.
  • Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
  • Experience in setting up Test, QA, and Prod environments.
  • Experienced in extending Hive and Pig core functionality by writing custom UDFs in Java.
  • Experience in developing MapReduce (YARN) jobs for cleaning, accessing, and validating data.
  • Experienced with different distributions such as Cloudera, Hortonworks, and MapR.
  • Experienced in debugging failed production jobs.
  • Experienced in managing streaming workflow operations and Hadoop jobs with Oozie workflows, scheduled through Autosys on a regular basis.
  • Experience with developing large-scale distributed applications.
  • Experience in developing solutions to analyze large data sets efficiently
  • Experience in Data Warehousing and ETL processes.
  • Expertise in deploying Hadoop, YARN, Spark, and Storm and integrating them with Cassandra, Ignite, RabbitMQ, and Kafka.
  • Strong database, SQL, ETL and data analysis skills.
  • Good understanding of Data Mining and Machine Learning techniques
  • Experienced in NoSQL databases such as HBase, Cassandra and MongoDB
  • Experienced in job workflow scheduling and coordination tools such as Oozie and ZooKeeper.
  • Knowledge of maintaining log and audit information in SQL tables; experienced in providing logging and error handling using event handlers for SSIS packages.
  • Experienced in designing, building, and deploying a multitude of applications utilizing almost all of the AWS stack (including EC2 and S3), focusing on high availability, fault tolerance, and auto-scaling.
  • Experience with Amazon Web Services, the AWS command line interface, and AWS Data Pipeline.
  • Experienced in BI tools like Tableau.
  • Excellent experience using TextMate on Ubuntu for writing Java, Scala, and shell scripts.
  • Expert in implementing advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, with code written in Scala.
  • Knowledge of importing and exporting data using Flume and Kafka.
  • Expertise in testing complex business rules created by mappings and various transformations using Informatica and other ETL tools.
  • Experienced in developing applications using Java/J2EE technologies such as Servlets, JSP, EJB, JDBC, JNDI, JMS, SOAP, REST, and Grails.
  • Experienced in developing applications using Hibernate (object/relational mapping framework).
  • Experience in writing database objects like Stored Procedures, Triggers, SQL, PL/SQL packages and Cursors for Oracle, SQL Server, DB2 and Sybase.
  • Proficient in writing build scripts using Ant & Maven.
  • Experienced in using CVS, SVN, and SharePoint for version management.
  • Proficient in unit testing applications using JUnit and MRUnit, and in logging with Log4j.
  • Able to learn and adapt quickly and to correctly apply new tools and technologies. Self-motivated, innovative, and analytical, with strong interpersonal skills; a team player. Determined and able to deliver with minimal guidance from seniors.
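Python-based Hadoop jobs like those mentioned in this summary are commonly written for Hadoop Streaming: a mapper emits tab-separated key/value lines, the framework sorts them by key, and a reducer aggregates consecutive lines that share a key. The sketch below is a hypothetical word count, not code from any project listed here; the function names are illustrative, and the local `sorted()` call stands in for the cluster's shuffle/sort so it runs without Hadoop.

```python
from itertools import groupby


def mapper(lines):
    """Map phase: emit one 'word<TAB>1' record per word, Hadoop Streaming style."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Reduce phase: sum counts per word. Input must already be sorted by key,
    which is what Hadoop guarantees after the shuffle/sort step."""
    pairs = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce on one machine (no cluster needed)."""
    return list(reducer(sorted(mapper(lines))))


print(run_local(["the cat", "the dog"]))  # prints ['cat\t1', 'dog\t1', 'the\t2']
```

On a real cluster, `mapper` and `reducer` would typically live in two small scripts that read `sys.stdin` and are passed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options.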

TECHNICAL SKILLS:

Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Scala, Spark, Storm, Kafka, RabbitMQ, ActiveMQ, ZooKeeper.

NoSQL Databases: HBase, Cassandra, CouchDB, MongoDB.

Hadoop Distributions: Cloudera, Hortonworks, MapR.

Databases and ETL: Teradata, MS SQL Server, Oracle, Informix, Sybase, Informatica, DataStage.

Languages and Java/J2EE Technologies: Java, J2EE, Spring, Hibernate, EJB, Web Services (JAX-RPC, JAXP, JAXM), JMS, JNDI, Servlets, JSP, Jakarta Struts, Python.

Application Servers: BEA WebLogic, IBM WebSphere, JBoss, Tomcat.

Design: UML, OOAD.

Web Technologies: HTML, AJAX, CSS, XHTML, XML, XSL, XSLT, WSDL, JSON, SOAP, REST, Grails.

Tools: CVS, SVN, SharePoint, ClearCase, ClearQuest, WinCVS, JUnit, MRUnit, Ant, Maven, Log4j, FrontPage.

IDEs: Eclipse, NetBeans.

Operating Systems: Linux, UNIX, Windows.

PROFESSIONAL EXPERIENCE:

Confidential, Columbus, OH

Spark Scala/Big Data Developer

Responsibilities:

  • Responsible for architecting Hadoop clusters and translating functional and technical requirements into detailed architecture and design.
  • Worked on analyzing the Hadoop cluster and different big data analytical and processing tools, including Pig, Hive, Spark, and Spark Streaming.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Used Spark 1.6.2 to build Spark applications in Scala.
  • Migrated various Hive UDFs and queries to Spark SQL for faster query execution.
  • Configured Spark Streaming to receive real-time data from Apache Kafka and store the stream data in HDFS using Scala.
  • Hands-on experience with Spark and Spark Streaming: creating RDDs and applying transformations and actions.
  • Scheduled jobs using Control-M.
  • Developed and implemented custom Hive UDFs involving date functions.
  • Used Sqoop to import data from Oracle into Hadoop.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, Pig, and Sqoop.
  • Developed scripts for data transformations using Scala.
  • Involved in developing Shell scripts to orchestrate execution of all other scripts and move the data files within and outside of HDFS.
  • Installed and configured Hive, Pig, Sqoop, and Oozie on the Hadoop cluster.
  • Used Kafka for publish-subscribe messaging as a distributed commit log, gaining experience with its speed, scalability, and durability.
  • Used Tableau to generate weekly reports for the customer.
  • Analyzed the Hadoop cluster and different Big Data analytic tools, including Pig, Hive, HBase, and Sqoop.
  • Implemented the Kerberos security authentication protocol for the existing cluster.

Technology: Spark, Spark Streaming, Akka, Kafka, Flume, Hive, HBase, Scala, Java, Pig, MapReduce, ZooKeeper, Oozie

Confidential, Bellevue, WA

Spark Scala/Big Data Developer

Responsibilities:

  • Migrated a huge volume of data from the EDW to the IDW environment.
  • Wrote MapReduce code to process and parse data from various sources and stored the parsed data in HBase and Hive using HBase-Hive integration.
  • Migrated file-source and mount-source data from RDBMS systems to Hadoop using Sqoop.
  • Exported data to RDBMS servers using Sqoop and processed that data for ETL operations.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Created a data pipeline integrating Kafka with a Spark Streaming application, using Scala to write the applications.
  • Used Spark SQL to read data from external sources and processed the data using the Scala computation framework.
  • Designed an ETL data pipeline flow to ingest data from RDBMS sources into Hadoop using shell scripts, Sqoop, packages, and MySQL.
  • Used Pig as an ETL tool for transformations, event joins, and some pre-aggregations before storing the data in HDFS.
  • Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig.
  • Involved in developing Shell scripts to orchestrate execution of all other scripts (Pig, Hive, and MapReduce) and move the data files within and outside of HDFS.
  • Handled importing data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Oracle into HDFS using Sqoop.
  • Transformed data from HBase to Hive as bulk operations.
  • Implemented a POC to migrate MapReduce jobs to Spark RDD transformations.
  • Used Spark for real-time and batch processing.
  • Active member in developing a POC on streaming data using Apache Kafka and Spark Streaming.

Technology: Hadoop, MapReduce, Hive, Pig, HBase, Cassandra, Flume, Spark, Storm, RabbitMQ, ActiveMQ, Sqoop, AccuRev, ZooKeeper, Oozie, Autosys, shell scripting.

Confidential, Winston Salem, NC

Hadoop Developer

Responsibilities:

  • Created Hive Tables, loaded retail transactional data from Teradata using Sqoop.
  • Used Cloudera Distribution for Data Transformations.
  • Loaded home mortgage data from the existing DWH tables (SQL Server) to HDFS using Sqoop.
  • Wrote Hive queries to provide a consolidated view of the mortgage and retail data.
  • Orchestrated hundreds of Sqoop scripts, Pig scripts, and Hive queries using Oozie workflows and sub-workflows.
  • Configured Oozie workflows to run multiple Hive and Pig jobs that trigger independently based on time and data availability.
  • Optimized MapReduce code and Pig scripts through performance tuning and analysis.
  • Loaded load-ready files from mainframes into Hadoop, converting the files to ASCII format.
  • Worked with Amazon AWS services such as EMR and EC2 for fast and efficient processing of Big Data.
  • Created multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Developed Perl scripts for code deployments.
  • Loaded files into Hive and HDFS from Oracle and SQL Server using Sqoop.
  • Involved in analyzing, designing, building, and testing OLAP cubes with SSAS and in adding calculations using MDX.
  • Good understanding of Kafka architecture and of designing consumer and producer applications.
  • Automated Sqoop, Hive, and Pig scripts using the Oozie workflow scheduler, maintained through the Autosys scheduler.
  • Experienced in building a computation framework in Python for a Spark POC.
  • Hands-on experience in installation, configuration, maintenance, monitoring, performance tuning, and troubleshooting of Hadoop clusters in different environments, such as development, test, and production clusters.

Technology: Hadoop, MapReduce, Hive, Pig, HBase, Cassandra, MongoDB, Sqoop, Flume, Avro, Scala, Akka, Spark, Kafka, RabbitMQ, Storm, Datameer, Teradata, SQL Server, IBM Mainframes, Perl scripts, Java 7.0, Log4j, JUnit, MRUnit, SVN, JIRA, shell scripting.

Confidential, Dallas, TX

Hadoop Developer

Responsibilities:

  • Worked with technology and business groups for Hadoop migration strategy.
  • Researched and recommended a suitable technology stack for the Hadoop migration, considering the current enterprise architecture.
  • Used Cloudera distribution for Data transformation and Data preparation.
  • Validated and made recommendations on Hadoop infrastructure and data center planning, considering data growth.
  • Transferred data to and from the cluster using Sqoop and various storage media, such as Informix tables and flat files.
  • Developed MapReduce programs and Hive queries to analyze sales patterns and the customer satisfaction index over data in various relational database tables.
  • Worked extensively with Flume for importing data from various webservers to HDFS.
  • Worked extensively on performance optimization by adopting appropriate design patterns for the MapReduce jobs, based on analysis of I/O latency, map time, combiner time, reduce time, etc.
  • Worked on ad hoc queries, indexing, replication, load balancing, and aggregation in MongoDB.
  • Developed Pig scripts in areas where extensive coding could be reduced.
  • Developed UDFs for Pig as needed.
  • Followed Agile methodology for the entire project.

Technology: Hadoop, MapReduce, Hive, Pig, MongoDB, Sqoop, Flume, Kafka, Impala, Python, Java 7.0, XML, WSDL, SOAP, Web Services, Oracle/Informix, Log4j, JUnit, SVN.
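Combiner time, listed among the MapReduce tuning metrics in this role, refers to the optional combine step that pre-aggregates each mapper's output before the shuffle. The sketch below is a hypothetical, cluster-free Python illustration (the function names and sample splits are made up, not taken from the project): because summing counts is associative and commutative, a combiner can apply the same sum locally and shrink the number of records sent to the reducers without changing the final result.

```python
from collections import Counter


def map_split(lines):
    """Map one input split to (word, 1) pairs."""
    return [(word, 1) for line in lines for word in line.split()]


def combine(pairs):
    """Combiner: sum counts per key locally, on the mapper, before the shuffle."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


def reduce_all(pairs):
    """Reducer: sum counts per key across all mappers' (possibly combined) output."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


# Two hypothetical input splits, each handled by its own mapper.
splits = [["a b a", "a"], ["b b", "a"]]

raw = [pair for split in splits for pair in map_split(split)]
combined = [pair for split in splits for pair in combine(map_split(split))]

# Same final counts either way; only the number of shuffled records differs.
print(len(raw), len(combined))  # prints 7 4
```

A combiner is only safe for operations like this sum; something like an average would need to carry partial sums and counts instead of partial results.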

Confidential, San Diego, CA

Sr. Java Developer

Responsibilities:

  • Involved in design process using UML & RUP (Rational Unified Process).
  • Developed different Components and Adapters of the integration framework using Stateless Session EJB.
  • Developed different interfaces using EJB Session Beans (Stateless) and Message Driven Beans for both synchronous and asynchronous communication.
  • Extensively interacted with SAP functional and technical teams in resolving technical and functional issues.
  • Effectively performed code refactoring to modularize the code and improve error handling and fault tolerance.
  • Provided second- and third-level production support, resolving issues related to the interfaces.
  • Used Maven to build the project, run unit tests, and deploy artifacts to the Nexus repository.
  • Developed the interfaces using Eclipse. Deployed the application in SAP Web Application Server.
  • Actively involved in managing the code with the CVS configuration management tool.
  • Worked on Unit and Integration testing of the interfaces.
  • Involved in designing test plans, test cases and overall Unit and Integration testing of system.

Technology: EJB, JSP, Struts, Web Services, JMS, JNDI, JDBC, SAP Web Application Server, Eclipse, Hibernate, SAP XI, SQL, Sybase, XML, XSD, WSDL, SOAP, RESTful, CVS, Win 2003 Server.

Confidential, Dallas, TX

Sr. Java Developer

Responsibilities:

  • Performed Code Reviews and responsible for Design, Code and Test signoff.
  • Assisted the team in development, clarifying design issues and fixing issues.
  • Involved in designing test plans, test cases and overall Unit and Integration testing of system.
  • Developed the business-tier logic using Session Beans (stateful and stateless).
  • Developed Web Services using JAX-RPC, JAXP, WSDL, JSON, SOAP, RESTful, XML to provide facility to obtain quote, receive updates to the quote, customer information, status updates and confirmations.
  • Extensively used SQL queries, PL/SQL stored procedures & triggers in data retrieval and updating of information in the Oracle database using JDBC.
  • Expert in writing, configuring and maintaining the Hibernate configuration files and writing and updating Hibernate mapping files for each Java object to be persisted.
  • Created CRUD applications using Groovy/Grails
  • Expert in writing Hibernate Query Language (HQL) queries and tuning Hibernate queries for better performance.
  • Wrote test cases using JUnit, following test-first development.
  • Used Rational Clear Case & PVCS for source control. Also used Clear Quest for defect management.
  • Wrote build files using Ant; used Maven in conjunction with Ant to manage build files.
  • Ran the nightly builds to deploy the application on different servers.

Technology: EJB, Web Services, Hibernate, Struts, JSP, JMS, JNDI, JDBC, WebLogic, SQL, PL/SQL, Oracle, Sybase, XML, XSLT, WSDL, SOAP, RESTful, Grails, UML, Rational Rose, WebLogic Workshop, OptimizeIt, Ant, JUnit, ClearCase, PVCS, ClearQuest, Win XP, Linux.
