Spark Scala / Big Data Developer Resume
Columbus, OH
SUMMARY:
- An information technology professional with 8+ years of overall IT experience, including 4 years in Big Data development.
- In-depth understanding of Hadoop architecture and its components, such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and MapReduce.
- Experienced in Waterfall and Agile development methodologies.
- Expertise in writing Hadoop jobs for analyzing data using Python, MapReduce, Hive, and Pig.
- Experienced in using Scala with Spark Streaming and Akka to process ongoing customer transactions.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
- Experience in setting up Test, QA, and Prod environments.
- Experienced in extending Hive and Pig core functionality by writing custom UDFs using Java.
- Experience in developing MapReduce (YARN) jobs for cleaning, accessing and validating the data.
- Experienced with different Hadoop distributions, including Cloudera, Hortonworks, and MapR.
- Experienced in debugging failed production jobs.
- Experienced with streaming workflow operations and Hadoop jobs using Oozie workflows, scheduled through Autosys on a regular basis.
- Experience with developing large-scale distributed applications.
- Experience in developing solutions to analyze large data sets efficiently.
- Experience in Data Warehousing and ETL processes.
- Expertise in deploying Hadoop, YARN, Spark, and Storm, integrated with Cassandra, Ignite, RabbitMQ, and Kafka.
- Strong database, SQL, ETL and data analysis skills.
- Good understanding of Data Mining and Machine Learning techniques.
- Experienced in NoSQL databases such as HBase, Cassandra, and MongoDB.
- Experienced in job workflow scheduling and monitoring tools like Oozie and ZooKeeper.
- Knowledge of maintaining log and audit information in SQL tables; experienced in providing logging and error handling using Event Handlers for SSIS packages.
- Experienced in designing, building, and deploying a multitude of applications utilizing almost all of the AWS stack (including EC2 and S3), focusing on high availability, fault tolerance, and auto-scaling.
- Experience with Amazon Web Services, AWS command line interface, and AWS data pipeline.
- Experienced in BI tools like Tableau.
- Excellent experience using TextMate on Ubuntu for writing Java, Scala, and shell scripts.
- Expert in implementing advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala (a sketch follows at the end of this summary).
- Knowledge of importing and exporting data using Flume and Kafka.
- Expertise in testing complex business rules created by mappings and various transformations using Informatica and other ETL tools.
- Experienced in developing applications using Java/J2EE technologies such as Servlets, JSP, EJB, JDBC, JNDI, JMS, SOAP, REST, and Grails.
- Experienced in developing applications using Hibernate (object/relational mapping framework).
- Experience in writing database objects like Stored Procedures, Triggers, SQL, PL/SQL packages and Cursors for Oracle, SQL Server, DB2 and Sybase.
- Proficient in writing build scripts using Ant & Maven.
- Experienced in using CVS, SVN, and SharePoint for version management.
- Proficient in unit testing applications using JUnit and MRUnit, and in logging applications using Log4j.
- Ability to learn and adapt quickly and to correctly apply new tools and technologies. Self-motivated, innovative, analytical team player with strong interpersonal skills; determined and able to deliver with minimal guidance from seniors.
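
As a flavor of the Spark-in-Scala work referenced above, the following is a minimal, illustrative sketch of an in-memory text-processing job. The application name, input path, and tokenization rule are assumptions for illustration only, not artifacts of any project listed below.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative sketch: in-memory text analytics (term-frequency counts) in Spark/Scala.
object TextAnalyticsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("text-analytics-sketch") // hypothetical application name
      .getOrCreate()

    // Tokenize the corpus and count term frequencies with in-memory RDD operations.
    val termCounts = spark.sparkContext
      .textFile("hdfs:///data/corpus/*.txt") // hypothetical input location
      .flatMap(_.toLowerCase.split("\\W+"))
      .filter(_.nonEmpty)
      .map(term => (term, 1L))
      .reduceByKey(_ + _)

    // Report the 20 most frequent terms.
    termCounts.sortBy(-_._2).take(20).foreach(println)
    spark.stop()
  }
}
```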
TECHNICAL SKILLS:
- Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Scala, Spark, Storm, Kafka, RabbitMQ, ActiveMQ, ZooKeeper
- NoSQL Databases: HBase, Cassandra, CouchDB, MongoDB
- Hadoop Distributions: Cloudera, Hortonworks, MapR
- Databases: Teradata, MS SQL Server, Oracle, Informix, Sybase
- ETL Tools: Informatica, DataStage
- Java/J2EE: Java, J2EE, Spring, Hibernate, EJB, Webservices (JAX-RPC, JAXP, JAXM), JMS, JNDI, Servlets, JSP, Jakarta Struts, Python
- Application Servers: BEA WebLogic, IBM WebSphere, JBoss, Tomcat
- Design/Methodologies: UML, OOAD
- Web Technologies: HTML, AJAX, CSS, XHTML, XML, XSL, XSLT, WSDL, JSON, SOAP, REST, Grails
- Version Control: CVS, SVN, SharePoint, ClearCase, ClearQuest, WinCVS
- Build & Test Tools: JUnit, MRUnit, Ant, Maven, Log4j, FrontPage
- IDEs: Eclipse, NetBeans
- Operating Systems: Linux, UNIX, Windows
PROFESSIONAL EXPERIENCE:
Confidential, Columbus, OH
Spark Scala /Big data developer
Responsibilities:
- Responsible for architecting Hadoop clusters and translating functional and technical requirements into detailed architecture and design.
- Worked on analyzing the Hadoop cluster and a range of big data analytical and processing tools, including Pig, Hive, Spark, and Spark Streaming.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Migrated various Hive UDFs and queries into Spark SQL for faster processing.
- Configured Spark Streaming to receive real-time data from Apache Kafka and store the streamed data in HDFS using Scala (see the sketch after this section).
- Hands-on experience in Spark and Spark Streaming, creating RDDs and applying transformations and actions.
- Scheduled jobs using Control-M.
- Developed and implemented custom Hive UDFs involving date functions.
- Used Sqoop to import data from Oracle into Hadoop.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, Pig, and Sqoop.
- Developed scripts for data transformations using Scala.
- Involved in developing Shell scripts to orchestrate execution of all other scripts and move the data files within and outside of HDFS.
- Installed and configured Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
- Used Kafka for publish-subscribe messaging as a distributed commit log, gaining experience with its speed, scalability, and durability.
- Used Tableau to generate weekly reports for the customer.
- Analyzed the Hadoop cluster and various Big Data analytic tools, including Pig, Hive, HBase, and Sqoop.
- Implemented the Kerberos security authentication protocol for the existing cluster.
Technology: Spark, Spark Streaming, Akka, Kafka, Flume, Hive, HBase, Scala, Java, Pig, MapReduce, ZooKeeper, Oozie
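
A minimal sketch of the Kafka-to-HDFS ingestion described in this role, written against the spark-streaming-kafka-0-10 integration. The broker address, topic, consumer group, batch interval, and output path are illustrative assumptions, not details from the actual project.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

// Illustrative sketch: consume a Kafka topic with Spark Streaming and land each
// micro-batch on HDFS as text files.
object KafkaToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hdfs-sketch")
    val ssc  = new StreamingContext(conf, Seconds(30)) // hypothetical batch interval

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",          // hypothetical broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "txn-ingest",            // hypothetical consumer group
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("transactions"), kafkaParams))

    // Write each non-empty micro-batch to a time-stamped HDFS directory.
    stream.map(_.value).foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty())
        rdd.saveAsTextFile(s"hdfs:///data/raw/transactions/${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```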
Confidential, Bellevue, WA
Spark Scala / Big Data Developer
Responsibilities:
- Migrated huge volumes of data from the EDW to the IDW environment.
- Wrote MapReduce code to process and parse data from various sources, storing the parsed data in HBase and Hive using HBase-Hive integration.
- Migrated data from file sources, mount sources, and RDBMS systems to Hadoop using Sqoop.
- Exported data to RDBMS servers using Sqoop and processed it for ETL operations.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Created a data pipeline integrating Kafka with a Spark Streaming application, using Scala to write the applications.
- Used Spark SQL to read data from external sources and processed it with Scala (see the sketch after this section).
- Designed an ETL data pipeline to ingest data from RDBMS sources into Hadoop using shell scripts, Sqoop, and MySQL.
- Used Pig as an ETL tool to perform transformations, event joins, and some pre-aggregations before storing the data in HDFS.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Involved in developing Shell scripts to orchestrate execution of all other scripts (Pig, Hive, and MapReduce) and move the data files within and outside of HDFS.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Oracle into HDFS using Sqoop.
- Transformed data from HBase to Hive in bulk operations.
- Implemented a POC to migrate MapReduce jobs to Spark RDD transformations.
- Used Spark for near-real-time and batch processing.
- Actively contributed to developing a POC on streaming data using Apache Kafka and Spark Streaming.
Technology: Hadoop, MapReduce, Hive, Pig, HBase, Cassandra, Flume, Spark, Storm, RabbitMQ, ActiveMQ, Sqoop, AccuRev, ZooKeeper, Oozie, Autosys, shell scripting.
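
A minimal sketch of reading from an external source with Spark SQL and shaping the result in Scala, as described in this role. The JDBC URL, credentials, and table/column names are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative sketch: Spark SQL over a JDBC source, transformed with the DataFrame API.
object SparkSqlIngestSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-sql-ingest-sketch")
      .getOrCreate()
    import spark.implicits._

    // Read an external RDBMS table through the built-in JDBC data source.
    val orders = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://db-host:3306/sales") // hypothetical connection
      .option("dbtable", "orders")                      // hypothetical table
      .option("user", "etl_user")
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .load()

    // Light transformation before landing curated data on HDFS as Parquet.
    orders
      .filter($"status" === "COMPLETE")
      .groupBy($"customer_id")
      .count()
      .write
      .mode("overwrite")
      .parquet("hdfs:///data/curated/orders_by_customer")

    spark.stop()
  }
}
```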
Confidential, Winston Salem, NC
Hadoop Developer
Responsibilities:
- Created Hive tables and loaded retail transactional data from Teradata using Sqoop.
- Used the Cloudera distribution for data transformations.
- Loaded home mortgage data from the existing DWH tables (SQL Server) to HDFS using Sqoop.
- Wrote Hive queries to provide a consolidated view of the mortgage and retail data (see the sketch after this section).
- Orchestrated hundreds of Sqoop scripts, Pig scripts, and Hive queries using Oozie workflows and sub-workflows.
- Configured Oozie workflows to run multiple Hive and Pig jobs independently, based on time and data availability.
- Optimized MapReduce code and Pig scripts; performed performance tuning and analysis.
- Loaded load-ready files from mainframes to Hadoop, converting the files to ASCII format.
- Worked with Amazon AWS services such as EMR and EC2 for fast and efficient processing of Big Data.
- Created multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Developed Perl scripts for code deployments.
- Loaded files into Hive and HDFS from Oracle and SQL Server using Sqoop.
- Involved in analyzing, designing, building, and testing OLAP cubes with SSAS, and in adding calculations using MDX.
- Good understanding of Kafka architecture and of designing consumer and producer applications.
- Automated Sqoop, Hive, and Pig scripts using the Oozie workflow scheduler, maintained through the Autosys scheduler.
- Built a computation framework in Python for a Spark POC.
- Hands-on experience with installation, configuration, maintenance, monitoring, performance tuning, and troubleshooting of Hadoop clusters in different environments, such as development, test, and production clusters.
Technology: Hadoop, MapReduce, Hive, Pig, HBase, Cassandra, MongoDB, Sqoop, Flume, Avro, Scala, Akka, Spark, Kafka, RabbitMQ, Storm, Datameer, Teradata, SQL Server, IBM Mainframes, Perl scripts, Java 7.0, Log4J, JUnit, MRUnit, SVN, JIRA, shell scripting.
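
A minimal sketch of the kind of consolidated view over mortgage and retail data mentioned in this role, expressed here through Spark's Hive support rather than the Hive CLI; table and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative sketch: a consolidated Hive view joining retail and mortgage data.
object ConsolidatedViewSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("consolidated-view-sketch")
      .enableHiveSupport() // requires a Hive metastore on the cluster
      .getOrCreate()

    // Join the two subject areas into a single customer-centric view.
    spark.sql(
      """CREATE OR REPLACE VIEW customer_360 AS
        |SELECT r.customer_id,
        |       r.total_retail_spend,
        |       m.outstanding_principal
        |FROM   retail_txn_summary r
        |LEFT JOIN mortgage_accounts m
        |       ON r.customer_id = m.customer_id""".stripMargin)

    spark.sql("SELECT COUNT(*) FROM customer_360").show()
    spark.stop()
  }
}
```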
Confidential, Dallas, TX
Hadoop Developer
Responsibilities:
- Worked with technology and business groups on the Hadoop migration strategy.
- Researched and recommended a suitable technology stack for Hadoop migration, considering the current enterprise architecture.
- Used the Cloudera distribution for data transformation and preparation.
- Validated and made recommendations on Hadoop infrastructure and data center planning, accounting for data growth.
- Transferred data to and from the cluster using Sqoop and various storage media, such as Informix tables and flat files.
- Developed MapReduce programs and Hive queries to analyze sales patterns and a customer satisfaction index over data in various relational database tables (see the sketch after this section).
- Worked extensively with Flume for importing data from various web servers to HDFS.
- Worked extensively on performance optimization, arriving at appropriate design patterns for the MapReduce jobs by analyzing I/O latency, map time, combiner time, reduce time, etc.
- Worked on ad hoc queries, indexing, replication, load balancing, and aggregation in MongoDB.
- Developed Pig scripts in areas where extensive hand coding needed to be reduced.
- Developed UDFs for Pig as needed.
- Followed Agile methodology for the entire project.
Technology: Hadoop, MapReduce, Hive, Pig, MongoDB, Sqoop, Flume, Kafka, Impala, Python, Java 7.0, XML, WSDL, SOAP, Webservices, Oracle/Informix, Log4J, JUnit, SVN.
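
The sales-pattern analysis in this role was built with MapReduce programs and Hive queries; to stay consistent with the Scala examples above, here is the equivalent map/reduce-style aggregation sketched with Spark RDDs. The file layout, delimiter, and field positions are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative sketch: the map phase emits ((region, month), amount);
// the reduce phase sums the amounts per key.
object SalesPatternSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sales-pattern-sketch")
      .getOrCreate()

    val monthlyByRegion = spark.sparkContext
      .textFile("hdfs:///data/sales/*.csv")   // hypothetical input: region,date,sku,amount
      .map(_.split(","))
      .filter(_.length >= 4)                  // drop malformed rows
      .map(f => ((f(0), f(1).take(7)), f(3).toDouble)) // key: (region, yyyy-MM)
      .reduceByKey(_ + _)

    monthlyByRegion.take(20).foreach(println)
    spark.stop()
  }
}
```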
Confidential, San Diego, CA
Sr. JAVA Developer
Responsibilities:
- Involved in the design process using UML and RUP (Rational Unified Process).
- Developed different Components and Adapters of the integration framework using Stateless Session EJB.
- Developed different interfaces using EJB Session Beans (Stateless) and Message Driven Beans for both synchronous and asynchronous communication.
- Extensively interacted with SAP functional and technical teams in resolving technical and functional issues.
- Effectively performed code refactoring to modularize the code and improve error handling and fault tolerance.
- Provided second level and third level of production support in resolving issues relating to the interfaces.
- Used Maven to build the project, run unit tests, and deploy artifacts to the Nexus repository.
- Developed the interfaces using Eclipse and deployed the application on SAP Web Application Server.
- Actively used the configuration management tool CVS to manage the code.
- Worked on Unit and Integration testing of the interfaces.
- Involved in designing test plans, test cases and overall Unit and Integration testing of system.
Technology: EJB, JSP, Struts, Webservices, JMS, JNDI, JDBC, SAP Web Application Server, Eclipse, Hibernate, SAP XI, SQL, Sybase, XML, XSD, WSDL, SOAP, RESTful, CVS, Win 2003 Server.
Confidential, Dallas, TX
Sr. JAVA Developer
Responsibilities:
- Performed code reviews and was responsible for design, code, and test signoff.
- Assisting the team in development, clarifying on design issues and fixing the issues.
- Involved in designing test plans, test cases and overall Unit and Integration testing of system.
- Developed the business-tier logic using Session Beans (stateful and stateless).
- Developed Web Services using JAX-RPC, JAXP, WSDL, JSON, SOAP, RESTful services, and XML to support obtaining quotes and receiving quote updates, customer information, status updates, and confirmations.
- Extensively used SQL queries, PL/SQL stored procedures & triggers in data retrieval and updating of information in the Oracle database using JDBC.
- Wrote, configured, and maintained Hibernate configuration files, and wrote and updated Hibernate mapping files for each Java object to be persisted.
- Created CRUD applications using Groovy/Grails.
- Wrote Hibernate Query Language (HQL) queries and tuned them for better performance.
- Wrote test cases using JUnit, following test-first development.
- Used Rational ClearCase and PVCS for source control, and ClearQuest for defect management.
- Wrote build files using Ant and used Maven in conjunction with Ant to manage builds.
- Ran nightly builds to deploy the application on different servers.
Technology: EJB, Webservices, Hibernate, Struts, JSP, JMS, JNDI, JDBC, Weblogic, SQL, PL/SQL, Oracle, Sybase, XML, XSLT, WSDL, SOAP, RESTful, GRAILS, UML, Rational Rose, Weblogic Workshop, OptimizeIt, Ant, JUnit, ClearCase, PVCS, ClearQuest, Win XP, Linux.
Confidential
JAVA Developer
Responsibilities:
- Involved in design and development using UML with Rational Rose.
- Played a significant role in performance tuning and optimizing the memory consumption of the application.
- Developed various enhancements and features using Java 5.0.
- Developed advanced server-side classes using networking, I/O, and multithreading.
- Led the issue management team and brought significant stability to the product by reducing the bug count to single digits.
- Designed and developed complex, advanced user interfaces using Swing.
- Used SAX/DOM XML parsers for parsing XML files.
Technology: Java 5.0, JFC Swing, Multithreading, IO, Networking, XML, JBuilder, UML, CVS, WinCVS, Ant, JUnit, Win XP, Unix.