
Hadoop Developer / Spark Developer / Scala Resume


Charlotte, NC

SUMMARY

  • Around 8 years of experience in the design, analysis, and development of software applications using Big Data/Hadoop, Spark, and Java/JEE technologies.
  • Knowledge of Spark Core, Spark SQL, Spark Streaming, and machine learning using the Scala and Python programming languages.
  • Worked on open-source Apache Hadoop, Amazon EMR clusters, Cloudera Enterprise (CDH), and Hortonworks Data Platform (HDP).
  • Hands-on experience with major components of the Hadoop ecosystem, including MapReduce, HDFS, Hive, Pig, Pentaho, HBase, ZooKeeper, TaskTracker, NameNode, DataNode, Sqoop, Oozie, Cassandra, Flume, and Avro.
  • Developed various MapReduce applications to perform ETL workloads on terabytes of data.
  • Expertise in working with the Hive data warehouse infrastructure: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HQL queries (a minimal sketch follows this summary).
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
  • Good understanding of RDD operations in Apache Spark: transformations and actions, persistence/caching, accumulators, broadcast variables, and optimizing broadcasts.
  • Hands-on experience performing aggregations on data using Hive Query Language (HQL).
  • Developed MapReduce programs in Java.
  • Good experience in extending the core functionality of Hive and Pig by developing user-defined functions to provide custom capabilities to these languages.
  • Expertise in the Hadoop ecosystem components HDFS, MapReduce, Hive, Pig, Sqoop, HBase, and Flume for data analytics.
  • Hands-on experience fetching live stream data from DB2 into HBase tables using Spark Streaming and Apache Kafka.
  • Capable of processing large sets of structured, semi-structured and unstructured data sets.
  • Experience in job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
  • Expertise in writing MapReduce jobs in Java for processing large sets of structured, semi-structured, and unstructured data and storing them in HDFS.
  • Experience in developing Custom UDFs for datasets in Pig and Hive.
  • Analyzed the latest Big Data analytics technologies and their innovative applications in both business intelligence analysis and new service offerings.
  • Designed and developed shell scripts and Sqoop scripts to migrate data in and out of HDFS.
  • Designed and developed Oozie workflows to execute MapReduce jobs, Hive scripts, and shell scripts, and to send email notifications.
  • Worked on pipeline and partitioning parallelism techniques and ensured load balancing of data.
  • Deployed different partitioning methods, such as hash by field, round robin, entire, modulus, and range, for bulk data loading.
  • Hands-on experience working with input file formats such as Parquet, JSON, and Avro.
  • Worked on Extraction, Transformation, and Loading (ETL) of data from multiple sources like Flat files, XML files and Databases.
  • Hands-on experience in J2EE technologies such as Servlets, JSP, EJB, and JDBC, and in developing web service providers and consumers using SOAP and REST.
  • Used Agile Development Methodology and Scrum for the development process
  • Good Knowledge in HTML, CSS, JavaScript and web-based applications.
  • Worked extensively on the design and development of business processes using Sqoop, Pig, Hive, and HBase.
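
A minimal sketch of the partitioning-and-bucketing pattern described above, written in Scala against the Spark/Hive APIs used elsewhere on this resume; the table and column names (sales_raw, sales_curated, customer_id, load_date) are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: write a partitioned, bucketed Hive table from Spark.
object HiveLayoutSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-layout-sketch")
      .enableHiveSupport() // assumes a Hive metastore is available
      .getOrCreate()

    // Hypothetical raw table with order_id, customer_id, amount, load_date.
    val raw = spark.table("sales_raw")

    // Partition by load date (pruned at query time) and bucket by
    // customer_id (reduces shuffle when joining other bucketed tables).
    raw.write
      .partitionBy("load_date")
      .bucketBy(32, "customer_id")
      .sortBy("customer_id")
      .format("parquet")
      .saveAsTable("sales_curated")

    // Partition pruning: only the 2017-10-01 partition is scanned.
    spark.sql(
      """SELECT customer_id, SUM(amount) AS total
        |FROM sales_curated
        |WHERE load_date = '2017-10-01'
        |GROUP BY customer_id""".stripMargin).show()

    spark.stop()
  }
}
```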

TECHNICAL SKILLS

Big Data Frameworks: Hadoop, Spark, Scala, Hive, Kafka, AWS, HBase, Flume, Pig, Sqoop, MapReduce, Cloudera.

Big Data Distributions: Cloudera (CDH), Hortonworks (HDP), Amazon EMR

Programming languages: Core Java, Scala, SQL, Shell Scripting

Operating Systems: Windows, Linux (Ubuntu)

Databases: Oracle, MySQL, SQL Server; NoSQL data stores: DynamoDB, Cassandra

Development Tools: Eclipse, IntelliJ

Java Technologies: JSP, Servlets, Junit, Spring, Hibernate

Web Technologies: XML, HTML, JavaScript, jQuery, JSON

Linux Experience: System Administration Tools, Puppet, Apache

Web Services: RESTful and SOAP

Frameworks: Jakarta Struts 1.x, Spring 2.x

Development methodologies: Agile, Waterfall

Logging Tools: Log4j

Application / Web Servers: Apache Tomcat, WebSphere, Weblogic

Messaging Services: Kafka

Version Control Tools: Git, SVN, and CVS

Analytics: Tableau

PROFESSIONAL EXPERIENCE

Confidential, Charlotte, NC

Hadoop Developer / Spark Developer / Scala

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Good understanding of and experience with the Hadoop stack: internals, Hive, Pig, and MapReduce.
  • The system was initially developed in Java; the Java filtering program was restructured so that the business rule engine lives in a JAR callable from both standalone Java and Hadoop jobs.
  • Wrote MapReduce jobs to discover trends in data usage by users.
  • Involved in managing and reviewing Hadoop log files.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Imported data from different sources such as HDFS and HBase into Spark RDDs.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Involved in gathering requirements, design, development, and testing.
  • Developed PIG scripts for source data validation and transformation.
  • Designed and developed tables in HBase and stored aggregated data from Hive.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Developed Spark core and Spark-SQL scripts using Scala for faster data processing.
  • Involved in code review and bug fixing for improving the performance.
  • Implemented the workflows using Apache Oozie framework to automate tasks.
  • Implemented partitioning and bucketing in Hive for better organization of the data.
  • Worked on Big Data integration and analytics based on Hadoop, SOLR, Spark, Kafka, Storm, and webMethods technologies.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Streamed data in real time using Spark with Kafka into data stores such as DynamoDB and Cassandra (sketched after this list).
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
  • Generated final reporting data using Tableau for testing by connecting to the corresponding Hive tables through the Hive ODBC connector.
  • Strongly recommended bringing in Elasticsearch and was responsible for installing, configuring, and administering it.
  • Developed and maintained efficient Talend ETL jobs for data ingestion.
  • Worked on Talend RTX ETL tool, develop jobs and scheduled jobs in Talend integration suite.
  • Modified reports and Talend ETL jobs based on the feedback from QA testers and Users in development and staging environments.
  • Involved in migrating Hadoop jobs into higher environments such as SIT, UAT, and Prod.
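
A hedged sketch of the Kafka-to-Spark-Streaming flow described above, using the spark-streaming-kafka-0-10 direct stream; broker, topic, and group names are hypothetical, and the real job wrote results to stores such as DynamoDB/Cassandra rather than printing them:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-stream-sketch")
    val ssc = new StreamingContext(conf, Seconds(5)) // 5-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092", // hypothetical broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "usage-trends",
      "auto.offset.reset" -> "latest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Count events per user in each micro-batch, assuming the user id is
    // carried as the Kafka message key; a real job would write the result
    // to HBase/Cassandra instead of printing it.
    stream.map(record => (record.key, 1L))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```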

Environment: Cloudera, HDFS, Hive, Scala, MapReduce, Storm, Java, HBase, Pig, Sqoop, shell scripts, Oozie Coordinator, MySQL, Tableau, Elasticsearch, Talend, SFTP, Spark RDD, Kafka, Python, Hortonworks, IntelliJ, Azkaban, Ambari/Hue, Jenkins, Apache NiFi.

Confidential, Bentonville, Arkansas

Hadoop/Spark Developer

Responsibilities:

  • Maintained System integrity of all sub-components (primarily HDFS, MR, HBase, and Hive).
  • Integrated the Hive warehouse with HBase.
  • Migrated the needed data from MySQL into HDFS using Sqoop and imported various formats of flat files into HDFS.
  • Loaded data into HBase tables for a UI web application.
  • Wrote customized Hive UDFs in Java where the required functionality was too complex for built-in functions (a sketch follows this list).
  • Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and buckets.
  • Wrote HiveQL scripts to create, load, and query Hive tables.
  • Supported MapReduce programs running on the cluster.
  • Monitored system health and logs and responded to any warning or failure conditions.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala.
  • Developed Apache Spark jobs using Scala in the test environment for faster data processing, and used Spark SQL for querying.
  • Developed Scala code using monadic patterns for different calculations based on the requirements.
  • Developed and executed shell scripts to automate the jobs
  • Wrote complex Hive queries and automated them using Azkaban for hourly calculations.
  • Analyzed large amounts of data sets using Pig scripts and Hive scripts.
  • Worked with Hue for developing Hive queries and checking data in both development and production environments.
  • Developed Pig Latin scripts for extracting data.
  • Used Pig for data loading, filtering and storing the data.
  • Worked on Data Integration from different source systems.
  • Used Robo Mongo to manage and query data stored in MongoDB.
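
A sketch of a custom Hive UDF of the kind described above; the project's UDFs were written in Java, but an equivalent in Scala (to match the rest of these examples) looks like this, with the MaskId class and its masking rule purely illustrative:

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical example: mask all but the last four characters of an ID.
// Registered in Hive with something like:
//   ADD JAR mask-udf.jar;
//   CREATE TEMPORARY FUNCTION mask_id AS 'MaskId';
//   SELECT mask_id(account_id) FROM accounts;
class MaskId extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) return null
    val s = input.toString
    val keep = 4
    if (s.length <= keep) input
    else new Text("*" * (s.length - keep) + s.takeRight(keep))
  }
}
```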

Environment: Robo Mongo, Pig Latin, Hive, Azkaban, shell scripts, Scala, Spark SQL, Apache Spark, MapReduce, Hadoop, UDFs, HBase, HDFS, MySQL, Sqoop.

Confidential, Houston TX

Hadoop Developer

Responsibilities:

  • Experience in supporting and managing Hadoop clusters using the Hortonworks distribution deployed on the AWS cloud.
  • Collected and aggregated large amounts of web log data from different sources, such as web servers and mobile and network devices, using Apache Kafka.
  • Developed the ingestion framework in Python, targeting data stores such as DynamoDB and Cassandra.
  • Created RDDs and DataFrames for faster execution and performed data transformations and actions using Spark.
  • Developed optimal strategies for distributing the web log data over the cluster.
  • Implemented Hive generic UDFs to incorporate business logic into Hive queries.
  • Configured Spark Streaming to receive real-time data from Kafka for high-speed processing and stored the streamed data in HDFS.
  • Used Scala to read text, CSV, and image data from HDFS, S3, and Hive.
  • Worked on Spark SQL for faster execution of Hive queries using Spark SQL Context.
  • Implemented complex big data with a focus on collecting, parsing, managing, analyzing, and visualizing large sets of data to turn information into business insights using multiple platforms in the Hadoop ecosystem.
  • Developed a workflow in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
  • Involved in source system analysis, data analysis, and data modeling to ETL (Extract, Transform and Load).
  • Wrote Spark programs to model data for extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed formats (see the sketch after this list).
  • Imported data from the structured data source into HDFS using Sqoop incremental imports.
  • Created Hive tables, partitions and implemented incremental imports to perform ad-hoc queries on structured data.
  • Built Hive tables using list partitioning and hash partitioning and created Hive generic UDFs to process business logic with HiveQL.
  • Developed SQL scripts using Spark for handling different data sets and verifying the performance over Map Reduce jobs.
  • Supported MapReduce programs running on the cluster and wrote MapReduce jobs using the Java API.
  • Designed unit-test data models and applications for data analytics solutions on streaming data.
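
A minimal Scala sketch of the multi-format extraction and aggregation described above; the S3/HDFS paths and the shared (user_id, bytes) shape across sources are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object MultiFormatSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("multi-format-sketch").getOrCreate()

    // Hypothetical paths; S3 and HDFS URIs both work here.
    val csvLogs = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("s3a://logs-bucket/web/*.csv")

    val jsonLogs = spark.read.json("hdfs:///data/raw/mobile/*.json")

    // Normalize both sources to a shared (user_id, bytes) shape,
    // then aggregate total usage per user.
    val usage = csvLogs.select(col("user_id"), col("bytes").cast("long"))
      .union(jsonLogs.select(col("user_id"), col("bytes").cast("long")))
      .groupBy("user_id")
      .agg(sum("bytes").as("total_bytes"))

    // Write curated Parquet for downstream Hive queries.
    usage.write.mode("overwrite").parquet("hdfs:///data/curated/usage")
    spark.stop()
  }
}
```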

Environment: Hortonworks, HDFS, Hive, Sqoop, Oozie, Storm, Scala 2.11.8, Spark 2.0, Spark SQL, Spark streaming, Python, Kafka, GitHub, Kerberos, AWS, Amazon S3, Amazon EC2, Amazon EBS, Tableau.

Confidential

Java Developer

Responsibilities:

  • Full life cycle experience, including requirements analysis, high-level design, detailed design, UML, data model design, coding, testing, and creation of functional and technical design documentation.
  • Used the Spring Framework for MVC architecture with Hibernate to implement DAO code, used web services to interact with other modules, and performed integration testing.
  • Developed and implemented GUI functionality using JSP, JSTL, Tiles and AJAX.
  • Designed database and involved in developing SQL Scripts.
  • Used SQL Navigator as a tool to interact with the Oracle database and was involved in testing the application.
  • Implemented design patterns such as MVC-2, Front Controller, and Composite View, along with the Struts framework design patterns, to improve performance.
  • Designed the use case diagrams, class model, and sequence diagrams for the SDLC process of the application.
  • Implemented GUI pages by using JavaScript, HTML, JSP, CSS, and AJAX.
  • Designed and developed UI components using JSP, JMS, JSTL.
  • Deployed the project on a WebSphere application server in a Linux environment.
  • Implemented the online application by using Web Services (SOAP), JSP, Servlets, JDBC, and Core Java.
  • Implemented the Singleton, DAO, and Factory design patterns based on the application requirements.
  • Used DOM and SAX parsers to parse raw XML documents (SAX parsing is sketched after this list).
  • Tested the web services with SOAP UI tool.
  • Developed back-end interfaces using PL/SQL packages, stored procedures, functions, anonymous PL/SQL blocks, cursor management, and exception handling.
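
A short Scala sketch of the SAX-style parsing described above; the element and attribute names (order, id) and the file path are hypothetical:

```scala
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.Attributes
import org.xml.sax.helpers.DefaultHandler
import java.io.File
import scala.collection.mutable.ListBuffer

// Stream through a large XML file and collect order ids
// without loading the whole DOM into memory.
class OrderHandler extends DefaultHandler {
  val orderIds = ListBuffer.empty[String]
  override def startElement(uri: String, localName: String,
                            qName: String, attrs: Attributes): Unit = {
    if (qName == "order") orderIds += attrs.getValue("id")
  }
}

object SaxSketch {
  def main(args: Array[String]): Unit = {
    val handler = new OrderHandler
    SAXParserFactory.newInstance().newSAXParser()
      .parse(new File("orders.xml"), handler)
    handler.orderIds.foreach(println)
  }
}
```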

Environment: PL/SQL, SOAP UI, DOM/SAX parsers, XML, Core Java, JDBC, JSP, Servlets, Linux, JSTL, JMS, HTML, CSS, AJAX, SDLC, MVC-2, Struts, SQL, Tiles, DAO, UML.

Confidential

Java Developer

Responsibilities:

  • Good knowledge of database structuring and management using MySQL.
  • Strong knowledge of e-commerce modules such as Business-to-Customer, Business-to-Business, and Customer-to-Customer transactions.
  • Good knowledge of database queries, transactions, and joins.
  • Good working knowledge of SAP ABAP (Advanced Business Application Programming).
  • Implemented the user login logic using the Spring MVC framework, which encourages application architectures based on the Model-View-Controller design paradigm.
  • Used various Java, J2EE APIs including JDBC, XML, Servlets, and JSP.
  • Generated Hibernate Mapping files and created the data model using mapping files
  • Developed UI using JavaScript, JSP, HTML and CSS for interactive cross browser functionality and complex user interface
  • Developed action classes and form beans and configured the struts-config.xml
  • Provided client-side validations using Struts Validator framework and JavaScript
  • Created business logic using servlets and session beans and deployed them on Apache Tomcat server
  • Created complex SQL queries and PL/SQL stored procedures and functions for the back end (see the sketch after this list).
  • Prepared the functional, design and test case specifications
  • Performed unit testing, system testing and integration testing
  • Developed and implemented GUI functionality using JSP, JSTL, Tiles and AJAX.
  • Designed the database and was involved in developing SQL scripts.
  • Used SQL Navigator as a tool to interact with the Oracle 10g database.
  • Developed portal screens using JSP, Servlets, and Struts framework
  • Involved in writing Test plans and conducted Unit Tests using JUnit.
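
A hedged Scala/JDBC sketch of calling a PL/SQL stored procedure from the back end; the procedure signature, credentials, and connection string are hypothetical:

```scala
import java.sql.{DriverManager, Types}

object CallProcSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical Oracle connection details.
    val conn = DriverManager.getConnection(
      "jdbc:oracle:thin:@dbhost:1521:ORCL", "app_user", "secret")
    try {
      // Assumed: PROCEDURE get_order_total(p_order_id IN NUMBER,
      //                                    p_total OUT NUMBER)
      val stmt = conn.prepareCall("{ call get_order_total(?, ?) }")
      stmt.setLong(1, 42L)
      stmt.registerOutParameter(2, Types.NUMERIC)
      stmt.execute()
      println(s"Order total: ${stmt.getBigDecimal(2)}")
      stmt.close()
    } finally conn.close()
  }
}
```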

Environment: JSP, Servlets, Struts, Oracle 10g, SQL, PL/SQL, AJAX, JavaScript, session beans, HTML, CSS, Hibernate mapping files, Java, J2EE APIs, JDBC, XML, MVC, SAP.
