
Spark Scala / Big Data Developer Resume


Columbus, OH

SUMMARY:

  • An information technology professional with 8+ years of overall IT experience, including 4 years in Big Data development.
  • In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce.
  • Experienced in Waterfall and Agile development methodologies.
  • Expertise in writing Hadoop jobs for analyzing data using Python, MapReduce, Hive, and Pig.
  • Experienced in using Scala, Spark Streaming, and Akka for processing ongoing customer transactions.
  • Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
  • Experience in setting up Test, QA, and Prod environment.
  • Experienced in extending Hive and Pig core functionality by writing custom UDFs using Java.
  • Experience in developing MapReduce (YARN) jobs for cleaning, accessing and validating the data. 
  • Experienced in different Hadoop distributions such as Cloudera, Hortonworks, and MapR.
  • Experienced in debugging failed production jobs.
  • Experienced in streaming workflow operations and Hadoop jobs using Oozie workflows, scheduled on a regular basis through Autosys.
  • Experience with developing large-scale distributed applications.
  • Experience in developing solutions to analyze large data sets efficiently
  • Experience in Data Warehousing and ETL processes.
  • Expertise in deploying Hadoop, YARN, Spark, and Storm, integrated with Cassandra, Ignite, RabbitMQ, and Kafka.
  • Strong database, SQL, ETL and data analysis skills.
  • Good understanding of Data Mining and Machine Learning techniques
  • Experienced in NoSQL databases such as HBase, Cassandra and MongoDB
  • Experienced in job workflow scheduling and monitoring tools like Oozie and Zookeeper.
  • Knowledge of maintaining log and audit information in SQL tables; experienced in providing logging and error handling for SSIS packages using Event Handlers.
  • Experienced in designing, building, and deploying a multitude of applications utilizing much of the AWS stack (including EC2 and S3), focusing on high availability, fault tolerance, and auto-scaling.
  • Experience with Amazon Web Services, AWS command line interface, and AWS data pipeline.
  • Experienced in BI tools like Tableau.
  • Excellent experience using TextMate on Ubuntu for writing Java, Scala, and shell scripts.
  • Expert in implementing advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala (a minimal Scala sketch follows this list).
  • Knowledge of importing and exporting data using Flume and Kafka.
  • Expertise in testing complex business rules created through mappings and various transformations using Informatica and other ETL tools.
  • Experienced in developing applications using Java/J2EE technologies such as Servlets, JSP, EJB, JDBC, JNDI, JMS, SOAP, REST, and Grails.
  • Experienced in developing applications using Hibernate (object/relational mapping framework).
  • Experience in writing database objects like Stored Procedures, Triggers, SQL, PL/SQL packages and Cursors for Oracle, SQL Server, DB2 and Sybase.
  • Proficient in writing build scripts using Ant & Maven.
  • Experienced in using CVS, SVN, and SharePoint for version control.
  • Proficient in unit testing the application using Junit, MRUnit and logging the application using Log4J.
  • Ability to learn quickly and correctly apply new tools and technologies. Self-motivated, innovative, analytical, inter-personal, and a team player, determined and able to deliver with minimal guidance from seniors.
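
As a minimal illustration of the in-memory Spark text processing mentioned above, the following Scala sketch counts term frequencies over raw text. The input path and the top-20 cutoff are hypothetical placeholders, not details drawn from any of the projects below.

    import org.apache.spark.sql.SparkSession

    object WordFrequency {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("WordFrequency")
          .getOrCreate()

        // Hypothetical input location; substitute a real HDFS path.
        val lines = spark.sparkContext.textFile("hdfs:///data/raw/notes/*.txt")

        // In-memory transformation chain: tokenize, count, rank.
        val topTerms = lines
          .flatMap(_.toLowerCase.split("\\W+"))
          .filter(_.nonEmpty)
          .map(term => (term, 1))
          .reduceByKey(_ + _)
          .sortBy(_._2, ascending = false)
          .take(20)

        topTerms.foreach { case (term, count) => println(s"$term\t$count") }
        spark.stop()
      }
    }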

TECHNICAL SKILLS:

  • Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Scala, Spark, Storm, Kafka, RabbitMQ, ActiveMQ, ZooKeeper
  • NoSQL Databases: HBase, Cassandra, CouchDB, MongoDB
  • Hadoop Distributions: Cloudera, Hortonworks, MapR
  • Databases: Teradata, MS SQL Server, Oracle, Informix, Sybase
  • ETL Tools: Informatica, DataStage
  • Java/J2EE and Languages: Java, J2EE, Spring, Hibernate, EJB, Web Services (JAX-RPC, JAXP, JAXM), JMS, JNDI, Servlets, JSP, Jakarta Struts, Python
  • Application Servers: BEA WebLogic, IBM WebSphere, JBoss, Tomcat
  • Methodologies: UML, OOAD
  • Web Technologies: HTML, AJAX, CSS, XHTML, XML, XSL, XSLT, WSDL, JSON, SOAP, REST, Grails
  • Version Control: CVS, SVN, SharePoint, ClearCase, ClearQuest, WinCVS
  • Testing/Build/Logging: JUnit, MRUnit, Ant, Maven, Log4j
  • IDEs/Editors: FrontPage, Eclipse, NetBeans
  • Operating Systems: Linux, UNIX, Windows

PROFESSIONAL EXPERIENCE:

Confidential,  Columbus, OH

Spark Scala /Big data developer

Responsibilities:

  • Responsible for architecting Hadoop clusters and translating functional and technical requirements into detailed architecture and design.

  • Worked on analyzing the Hadoop cluster and various big data analytical and processing tools, including Pig, Hive, Spark, and Spark Streaming.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Migrated various Hive UDFs and queries into Spark SQL for faster processing.
  • Configured Spark Streaming to receive real-time data from Apache Kafka and store the stream data to HDFS using Scala (see the sketch after this list).
  • Hands-on experience in Spark and Spark Streaming, creating RDDs and applying operations (transformations and actions).
  • Experienced in scheduling jobs using Control-M.
  • Developed and implemented custom Hive UDFs involving date functions.
  • Used Sqoop to import data from Oracle into Hadoop.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, Pig, and Sqoop.
  • Experienced in developing scripts for doing transformations using Scala.
  • Involved in developing Shell scripts to orchestrate execution of all other scripts and move the data files within and outside of HDFS
  • Installed and configured Hive, Pig, Sqoop, and Oozie on the Hadoop cluster.
  • Used Kafka for publish-subscribe messaging as a distributed commit log, gaining experience with its speed, scalability, and durability.
  • Used Tableau for generating reports on weekly basis to the customer.
  • Analyzed the Hadoop cluster and different Big Data analytic tools, including Pig, Hive, HBase, and Sqoop.
  • Implemented the Kerberos authentication protocol for the existing cluster.
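
A minimal sketch of the Kafka-to-HDFS Spark Streaming flow referenced above, using the spark-streaming-kafka-0-10 integration. The broker address, topic, consumer group, and output path are hypothetical placeholders.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object KafkaToHdfs {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("KafkaToHdfs")
        val ssc = new StreamingContext(conf, Seconds(30))

        // Hypothetical broker list and consumer settings; substitute real values.
        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker1:9092",
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "txn-consumer",
          "auto.offset.reset" -> "latest"
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("transactions"), kafkaParams))

        // Persist each micro-batch to HDFS as time-stamped text files.
        stream.map(_.value).saveAsTextFiles("hdfs:///data/stream/transactions")

        ssc.start()
        ssc.awaitTermination()
      }
    }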

Technology: Spark, Spark Streaming, Akka, Kafka, Flume, Hive, HBase, Scala, Java, Pig, MapReduce, ZooKeeper, Oozie

Confidential, Bellevue, WA

Spark Scala / Big Data Developer

Responsibilities:

  • Experienced in migrating huge volumes of data from the EDW to the IDW environment.

  • Wrote MapReduce code to process and parse data from various sources and store the parsed data into HBase and Hive using HBase-Hive integration.
  • Experienced in migrating data from file sources, mount sources, and RDBMS systems into Hadoop using Sqoop.
  • Exported data to RDBMS servers using Sqoop and processed that data for ETL operations.
  • Developed Pig Latin scripts to extract the data from the web server output files and load it into HDFS.
  • Experienced in creating a data pipeline integrating Kafka with a Spark Streaming application, using Scala to write the applications.
  • Used Spark SQL to read data from external sources and process it using the Scala computation framework (see the sketch after this list).
  • Designed an ETL data pipeline flow to ingest data from RDBMS sources into Hadoop using shell scripts, Sqoop packages, and MySQL.
  • Used Pig as an ETL tool to perform transformations, event joins, and some pre-aggregations before storing the data on HDFS.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig
  • Involved in developing Shell scripts to orchestrate execution of all other scripts (Pig, Hive, and MapReduce) and move the data files within and outside of HDFS
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and Extracted the data from Oracle into HDFS using Sqoop
  • Worked on transforming data from HBase to Hive as bulk operations.
  • Implemented a POC to migrate MapReduce jobs into Spark RDD transformations.
  • Used Spark for real-time and batch processing.
  • Active member in developing a POC on streaming data using Apache Kafka and Spark Streaming.
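
A minimal Scala sketch of reading an external RDBMS source through Spark SQL's JDBC connector and landing an aggregate on HDFS, as referenced above. The connection details, table, and column names are hypothetical.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object RdbmsIngest {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("RdbmsIngest").getOrCreate()

        // Hypothetical JDBC coordinates; substitute real host, schema, credentials.
        val orders = spark.read
          .format("jdbc")
          .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
          .option("dbtable", "SALES.ORDERS")
          .option("user", "etl_user")
          .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
          .load()

        // Simple Scala-side processing: daily order totals.
        val daily = orders
          .groupBy(to_date(col("ORDER_TS")).as("order_date"))
          .agg(sum("AMOUNT").as("total_amount"))

        // Land the result on HDFS for downstream Hive/ETL consumers.
        daily.write.mode("overwrite").parquet("hdfs:///data/curated/daily_orders")
        spark.stop()
      }
    }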

Technology: Hadoop, MapReduce, Hive, Pig, HBase, Cassandra, Flume, Spark, Storm, RabbitMQ, ActiveMQ, Sqoop, AccuRev, ZooKeeper, Oozie, Autosys, shell scripting.

Confidential, Winston Salem, NC

Hadoop Developer

Responsibilities:

  • Created Hive tables and loaded retail transactional data from Teradata using Sqoop.

  • Used Cloudera Distribution for Data Transformations.
  • Loaded home mortgage data from the existing DWH tables (SQL Server) to HDFS using Sqoop.
  • Wrote Hive queries to provide a consolidated view of the mortgage and retail data (see the sketch after this list).
  • Orchestrated hundreds of Sqoop scripts, Pig scripts, and Hive queries using Oozie workflows and sub-workflows.
  • Configured Oozie workflows to run multiple Hive and Pig jobs that run independently based on time and data availability.
  • Optimized MapReduce code and Pig scripts, and performed performance tuning and analysis.
  • Loaded load-ready files from mainframes into Hadoop, converting the files to ASCII format.
  • Worked on Amazon AWS concepts like EMR and EC2 web services for fast and efficient processing of Big Data. 
  • Created multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Developed PERL Scripts for code deployments.
  • Successfully loaded files into Hive and HDFS from Oracle and SQL Server using Sqoop.
  • Involved in analyzing, designing, building, and testing OLAP cubes with SSAS, and in adding calculations using MDX.
  • Good understanding of Kafka architecture and of designing consumer and producer applications.
  • Automated Sqoop, Hive, and Pig scripts using the workflow scheduler Oozie, maintained through the Autosys scheduler.
  • Experienced in building a computation framework in Python for a Spark POC.
  • Hands-on experience with installation, configuration, maintenance, monitoring, performance tuning, and troubleshooting of Hadoop clusters in different environments, such as development, test, and production clusters.
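
A minimal sketch of the consolidated-view idea referenced above, expressed through Spark's Hive support so it stays in Scala. The table and column names are hypothetical.

    import org.apache.spark.sql.SparkSession

    object ConsolidatedView {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("ConsolidatedView")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical Hive tables joining retail and mortgage data per customer.
        spark.sql(
          """CREATE OR REPLACE VIEW consolidated_customers AS
            |SELECT r.customer_id, r.retail_balance, m.mortgage_balance
            |FROM retail_accounts r
            |LEFT JOIN mortgage_accounts m ON r.customer_id = m.customer_id
          """.stripMargin)

        // Spot-check the consolidated view.
        spark.sql("SELECT * FROM consolidated_customers LIMIT 10").show()
        spark.stop()
      }
    }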

Technology: Hadoop, MapReduce, Hive, Pig, HBase, Cassandra, MongoDB, Sqoop, Flume, Avro, Scala, Akka, Spark, Kafka, RabbitMQ, Storm, Datameer, Teradata, SQL Server, IBM Mainframes, Perl scripts, Java 7.0, Log4J, JUnit, MRUnit, SVN, JIRA, shell scripting.

Confidential, Dallas, TX

Hadoop Developer

Responsibilities:

  • Worked with technology and business groups for Hadoop migration strategy.

  • Researched and recommended a suitable technology stack for the Hadoop migration, considering the current enterprise architecture.
  • Used Cloudera distribution for data transformation and data preparation.
  • Validated and advised on Hadoop infrastructure and data center planning, taking data growth into account.
  • Transferred data to and from the cluster using Sqoop, across various storage media such as Informix tables and flat files.
  • Developed MapReduce programs and Hive queries to analyze sales pattern and customer satisfaction index over the data present in various relational database tables.
  • Worked extensively with Flume for importing data from various webservers to HDFS.
  • Worked extensively on performance optimization by adopting or deriving appropriate design patterns for the MapReduce jobs through analysis of I/O latency, map time, combiner time, reduce time, etc.
  • Worked on ad hoc queries, indexing, replication, load balancing, and aggregation in MongoDB.
  • Developed Pig scripts in areas where extensive coding needed to be reduced.
  • Developed UDFs for Pig as needed (see the sketch after this list).
  • Followed Agile methodology for the entire project. 
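
A minimal sketch of a Pig EvalFunc, as referenced above. The summary notes the UDFs were written in Java; this Scala equivalent (which compiles to the same JVM bytecode Pig loads) is illustrative only, with a hypothetical package and field name.

    package com.example.pig

    import org.apache.pig.EvalFunc
    import org.apache.pig.data.Tuple

    // Trivial EvalFunc that upper-cases its first argument.
    // After packaging into a jar, usable from Pig as:
    //   REGISTER udfs.jar;
    //   DEFINE TO_UPPER com.example.pig.ToUpper();
    //   cleaned = FOREACH raw GENERATE TO_UPPER(name);
    class ToUpper extends EvalFunc[String] {
      override def exec(input: Tuple): String = {
        if (input == null || input.size() == 0) return null
        Option(input.get(0)).map(_.toString.toUpperCase).orNull
      }
    }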

Technology: Hadoop, MapReduce, Hive, Pig, MongoDB, Sqoop, Flume, Kafka, Impala, Python, Java 7.0, XML, WSDL, SOAP, Webservices, Oracle/Informix, Log4J, JUnit, SVN.

Confidential, San Diego, CA

Sr. JAVA Developer

Responsibilities:

  • Involved in design process using UML & RUP (Rational Unified Process).

  • Developed different Components and Adapters of the integration framework using Stateless Session EJB.
  • Developed different interfaces using EJB Session Beans (Stateless) and Message Driven Beans for both synchronous and asynchronous communication.
  • Extensively interacted with SAP functional and technical teams in resolving technical and functional issues.
  • Effectively performed code refactoring to modularize the code and improve error handling and fault tolerance.
  • Provided second level and third level of production support in resolving issues relating to the interfaces.
  • Used Maven to build the project, run unit tests and deployed artifacts to Nexus repository
  • Developed the interfaces using Eclipse. Deployed the application in SAP Web Application Server.
  • Actively used the configuration management tool CVS to manage the code.
  • Worked on Unit and Integration testing of the interfaces.
  • Involved in designing test plans and test cases, and in overall unit and integration testing of the system.

Technology: EJB, JSP, Struts, Webservices, JMS, JNDI, JDBC, SAP Web Application Server, Eclipse, Hibernate, SAP XI, SQL, Sybase, XML, XSD, WSDL, SOAP, RESTful, CVS, Win 2003 Server.

Confidential, Dallas, TX

Sr. JAVA Developer

Responsibilities:

  • Performed Code Reviews and responsible for Design, Code and Test signoff.

  • Assisted the team in development, clarifying design issues and fixing defects.
  • Involved in designing test plans and test cases, and in overall unit and integration testing of the system.
  • Developed the business-tier logic using Session Beans (stateful and stateless).
  • Developed Web Services using JAX-RPC, JAXP, WSDL, JSON, SOAP, RESTful, XML to provide facility to obtain quote, receive updates to the quote, customer information, status updates and confirmations.
  • Extensively used SQL queries, PL/SQL stored procedures & triggers in data retrieval and updating of information in the Oracle database using JDBC.
  • Expert in writing, configuring and maintaining the Hibernate configuration files and writing and updating Hibernate mapping files for each Java object to be persisted.
  • Created CRUD applications using Groovy/Grails
  • Expert in writing Hibernate Query Language (HQL) and tuning Hibernate queries for better performance.
  • Wrote test cases using JUnit, following test-first development.
  • Used Rational Clear Case & PVCS for source control. Also used Clear Quest for defect management.
  • Wrote build files using Ant, and used Maven in conjunction with Ant to manage builds.
  • Running the nightly builds to deploy the application on different servers.

Technology: EJB, Webservices, Hibernate, Struts, JSP, JMS, JNDI, JDBC, Weblogic, SQL, PL/SQL, Oracle, Sybase, XML, XSLT, WSDL, SOAP, RESTful, GRAILS, UML, Rational Rose, Weblogic Workshop, OptimizeIt, Ant, JUnit, ClearCase, PVCS, ClearQuest, Win XP, Linux.

Confidential

JAVA Developer

Responsibilities:

  • Involved in design and development using UML with Rational Rose.

  • Played a significant role in performance tuning and optimizing the memory consumption of the application.
  • Developed various enhancements and features using Java 5.0
  • Developed advanced server-side classes using networking, I/O, and multithreading.
  • Led the issue management team and brought significant stability to the product by reducing the bug count to single digits.
  • Designed and developed various complex and advanced user interfaces using Swing.
  • Used SAX/DOM XML parsers for parsing XML files.

Technology: Java 5.0, JFC Swing, Multi-Threading, IO, Networks, XML, JBuilder, UML, CVS, WinCVS, Ant & JUnit, Win XP, Unix.
