
Hadoop, Spark Developer Resume


Chicago, Illinois

SUMMARY

  • 8 years of overall IT experience across various sectors, including hands-on experience in Big Data analytics and development.
  • Good experience with Big Data technologies such as the Hadoop framework, MapReduce, Hive, HBase, Pig, Sqoop, Spark, Kafka, Flume, ZooKeeper, Oozie, and Storm.
  • Excellent knowledge of Hadoop ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Experienced in writing complex MapReduce programs that work with different file formats such as Text, Sequence, XML, JSON, and Avro.
  • Working experience with Cloudera and Hortonworks Data Platform using VMware Player in a CentOS 6 Linux environment.
  • Strong experience with Hadoop distributions such as Cloudera and Hortonworks.
  • Good knowledge of NoSQL databases: Cassandra, MongoDB, and HBase.
  • Expertise in database design, creation and management of schemas, and writing stored procedures, functions, and DDL/DML SQL queries.
  • Hands-on experience with Apache Camel as a Kafka producer and Spark Streaming as a Kafka consumer (see the consumer sketch after this list).
  • Worked on HBase to load and retrieve data for real-time processing.
  • Very good experience with partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
  • Good working experience using Sqoop to import data from an RDBMS into HDFS or Hive and to export data from HDFS or Hive back to the RDBMS.
  • Experience with data processing tasks such as collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
  • Worked with Big Data distributions such as Hortonworks (Hortonworks 1.7 and 2.1) with Ambari. Experience in administering and setting up authentication and security using Knox, Ranger, and Kerberos.
  • Extended Hive and Pig core functionality using custom User Defined Functions (UDF), User Defined Table-Generating Functions (UDTF), and User Defined Aggregate Functions (UDAF).
  • Experience converting SQL queries into Spark transformations using Spark RDDs, DataFrames, and Scala, and performing map-side joins on RDDs.
  • Experienced in designing, building, and deploying a multitude of applications utilizing almost all of the AWS stack (including EC2, Route 53, S3, RDS, DynamoDB, SQS, SNS, SWF, IAM, and EMR), focusing on high availability, fault tolerance, and auto-scaling.
  • Hands-on experience with Amazon Web Services: created EC2 (Elastic Compute Cloud) cluster instances, set up data buckets on S3 (Simple Storage Service), and set up EMR (Elastic MapReduce) with Hive scripts to process big data.
  • Created a multi-terabyte database using MultiLoad.
  • Extensive knowledge of shell scripting and Unix/Linux commands.
  • Familiar with data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, data mining, machine learning, and advanced data processing.
  • Experience optimizing ETL workflows.
  • Solid knowledge of Scalaz and Cats.
  • Implemented Teradata Parallel Transporter (TPT) in the data warehouse.
  • Worked with Big Data distributions like Cloudera (CDH 3 and 4) with Cloudera Manager.
  • Worked with ETL tools such as Talend and Pentaho to simplify MapReduce jobs from the front end, and used them to develop reports based on data in Hive.
  • Worked with BI tools like Tableau for report creation and further analysis from the front end.
  • Connected Tableau to Hive and generated bar charts and other visualizations based on business requirements.
  • Extensive knowledge in using SQL queries for backend database analysis.
  • Involved in unit testing of MapReduce programs using Apache MRUnit.
  • Worked on Amazon Web Services and EC2.
  • Experience in data migration from HBase to Cassandra.
  • Involved in coding all the backend mappers and reducers using Groovy.
  • Excellent Java development skills using J2EE, J2SE, Servlets, JSP, Spring, Hibernate, JDBC.
  • Experience in creating reusable transformations (Joiner, Sorter, Aggregator, Expression, Lookup, Router, Filter, Update Strategy, Sequence Generator, Normalizer, and Rank) and mappings using Informatica Designer, and processing tasks using Workflow Manager to move data from multiple sources into targets.
  • Performed query performance tuning to increase speed.
  • Experience in database development using SQL and PL/SQL and experience working on databases like Oracle 9i/10g/11g, Informix, and SQL Server.
  • Knowledge of Jenkins, Docker, and RabbitMQ.
  • Hands-on experience with the Play and Akka frameworks as well as strong experience with the Spray framework.
  • Experience working with Build tools like Maven, Ant, SBT.
  • Experienced in both Waterfall and Agile (Scrum) development methodologies.
  • Strong Problem Solving and Analytical skills and abilities to make Balanced & Independent Decisions.
  • Experience in developing service components using JDBC.
  • Experience using Scala 2.11/2.12 and Spark 2.1/2.2.
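
Below is a minimal sketch of the Spark Streaming Kafka consumer pattern referenced above, using the spark-streaming-kafka-0-10 integration that ships with Spark 2.1/2.2. The broker address, topic name, and consumer group are placeholder values, and the per-batch record count stands in for the real business logic.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaSparkStreamingConsumer {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-spark-streaming-consumer")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // Consumer settings; broker list, group id, and topic are placeholders
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "weblog-consumers",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Array("weblogs"), kafkaParams))

    // Count records in each micro-batch as a stand-in for the real processing logic
    stream.map(_.value()).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```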

TECHNICAL SKILLS

Hadoop Technologies: HDFS, YARN, Mesos, MapReduce, Hive, Pig, Sqoop, Flume, Spark, Kafka, Zookeeper, and Oozie.

NO SQL Databases: HBase, Cassandra

Languages: Java, Scala, SQL, PL/SQL, Pig Latin, HiveQL, Unix, JavaScript, Shell Scripting, Python.

Java & J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts.

Application Servers: WebLogic, WebSphere, JBoss, Tomcat.

Cloud Computing Tools: Amazon AWS.

Databases: MySQL, Oracle, DB2

Operating Systems: UNIX, Windows, LINUX.

Build Tools: Jenkins, Maven, ANT, SBT (for Scala).

Business Intelligence Tools: Tableau, Splunk

Development Tools: Microsoft SQL Studio, Toad, Eclipse, NetBeans, IntelliJ IDEA.

Development Methodologies: Agile/Scrum, Waterfall.

PROFESSIONAL EXPERIENCE

Confidential, Charlotte, North Carolina

Spark, Scala developer

Responsibilities:

  • Responsible for development of Spark SQL Scripts based on Functional Specifications
  • Responsible for Spark Streaming configuration based on the type of input source.
  • Wrote MapReduce jobs to parse web logs stored in HDFS.
  • Developed services to run MapReduce jobs on an as-needed basis.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Responsible for managing data coming from different sources.
  • Developed business logic using Scala.
  • Responsible for loading data from UNIX file systems to HDFS.
  • Used AWS services to smoothly manage applications in the cloud, including creating and modifying instances.
  • Used Scanamo with Cats to access DynamoDB.
  • Used Hammock and http4s (Cats-based libraries) for HTTP interaction with the client.
  • Checked the health and utilization of AWS resources using AWS CloudWatch.
  • Provisioned AWS S3 buckets for application backups and synced their contents with the remaining S3 backups by adding an aws s3 sync entry to the crontab.
  • Collected JSON data from an HTTP source and developed Spark APIs that perform inserts and updates on Hive tables (see the sketch after this list).
  • Implemented a Kafka consumer with Spark Streaming and Spark SQL using Scala.
  • Involved in performance tuning and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
  • Wrote MapReduce programs to convert text files into Avro and load them into Hive tables.
  • Implemented the workflows using Apache Oozie framework to automate tasks.
  • Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Developed REST APIs using Java, the Play framework, and Akka.
  • Used Pig scripts along with shell scripting.
  • Processed data (collecting, aggregating, and moving it from various sources) using Apache Flume and Kafka.
  • Performance-tuned Kafka and Storm clusters; benchmarked real-time streams.
  • Developed design documents considering all possible approaches and identifying the best of them.
  • Loaded data into HBase using both bulk and non-bulk loads.
  • Developed scripts to automate data management end to end and keep all clusters in sync.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop.
  • Imported data from different sources such as HDFS and HBase into Spark RDDs.
  • Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Implemented a Continuous Delivery pipeline with Docker, microservices, Jenkins, GitHub, Nexus, Maven, and AWS AMIs.
  • Troubleshot, debugged, and resolved Talend-specific issues while maintaining the health and performance of the ETL environment.
  • Performed complete end-to-end design and development of an Apache NiFi flow that acts as the agent between the client and the marketing team.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Involved in gathering the requirements, designing, development and testing.
  • Developed traits, case classes, etc., in Scala.
  • Configured Ranger and Knox services for authorization purposes.
  • Used the Scalatra framework for a brief period to handle HTTP requests.
  • Stored data in HBase and connected it to Elasticsearch and Solr search.
  • Developed reports using Tableau.
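
A minimal sketch of the JSON-to-Hive flow mentioned above: JSON payloads landed on HDFS are parsed into a DataFrame and appended to an existing Hive table. The HDFS path and table name are hypothetical placeholders, and the target table is assumed to already exist with a matching schema.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object JsonToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("json-to-hive")
      .enableHiveSupport()   // needed so insertInto targets the Hive metastore table
      .getOrCreate()

    // Hypothetical HDFS landing directory for the JSON collected from the HTTP source
    val events = spark.read.json("hdfs:///data/landing/events/")

    // Append the parsed records into the existing Hive table (columns must line up by position)
    events.write.mode(SaveMode.Append).insertInto("analytics.web_events")

    spark.stop()
  }
}
```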

Environment: Hive, Flume, Java, Maven, Impala, Spark 2.1/2.2, Oozie, Oracle, YARN, GitHub, JUnit, Tableau, Unix, Cloudera, Sqoop, HDFS, Tomcat, Scala, HBase.

Confidential, Chicago, Illinois

Hadoop, Spark developer

Responsibilities:

  • Consumed data from Kafka topics using Spark.
  • Designed and developed a system to collect data from multiple portals using Kafka and then process it using Spark.
  • Developed REST APIs using Scala and the Play framework to retrieve processed data from the Cassandra database.
  • Configured different topologies for the Spark cluster and deployed them on a regular basis.
  • Developed REST APIs using Java, the Play framework, and Akka.
  • Load and transform large sets of structured, semi structured and unstructured data.
  • Involved in loading data from the Linux file system to HDFS.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Implemented partitioning, dynamic partitions, and buckets in Hive (see the sketch after this list).
  • Configured various property files like core-site.xml, hdfs-site.xml, mapred-site.xml based upon the job requirement.
  • Configured, documented, and demonstrated inter-node communication between the client's Hadoop and Cassandra nodes using SSL encryption.
  • Redesigned a market risk model originally implemented in R to use MapReduce on Cloudera's Hadoop cluster.
  • Responsible for migrating the code base from the Cloudera platform to Amazon EMR and evaluated Amazon ecosystem components such as Redshift and DynamoDB.
  • Built a proof of concept for data search using Elasticsearch.
  • Wrote Groovy scripts for deployment and for REST service clients.
  • Involved in coding all the backend mappers and reducers using Groovy.
  • Along with the infrastructure team, designed and developed a Kafka- and Storm-based data pipeline.
  • This pipeline also uses Amazon Web Services EMR, S3, and RDS.
  • Launched instances with respect to specific applications.
  • Used Cats to populate data from various data sources using Extruder.
  • Loaded data from different sources (databases and files) into Hive using the Talend tool.
  • Developed and managed cloud VMs with AWS EC2 command line clients and management console.
  • Involved in performing linear regression using the Scala API and Spark.
  • Used the Talend ETL tool to develop multiple jobs and to set up workflows.
  • Used scalaz-stream along with Spark Streaming to stream data to the management team.
  • Created MapReduce programs for refined queries on big data.
  • Wrote logic using Python.
  • Hands-on experience in multithreaded programming using Akka actors.
  • Built a REST API with Akka HTTP (see the sketch after this list).
  • Involved in setting up of HBase to use HDFS.
  • Created many complex business mappings with the Talend BI tool using multiple sources.
  • Designed and developed Map Reduce jobs to process data coming in different file formats like XML, CSV, JSON.
  • Used scalaz-http for server and client development.
  • Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Generated consumer group lags from Kafka using its API.
  • Developed scripts to perform business transformations on the data using Hive and Pig.
  • Used Hive partitioning and bucketing for performance optimization of Hive tables and created around 20,000 partitions.
  • Secured the clusters by implementing Knox and Kerberos for the Hadoop and Cassandra clusters.
  • Worked on designing, implementing and managing Secure Authentication mechanism to Hadoop Cluster with Kerberos.
  • Created RDDs in Spark and extracted data from the data warehouse onto Spark RDDs.
  • Used Spark with Scala.
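
As a companion to the Hive partitioning bullets above, the sketch below creates a date-partitioned external table and loads it with a dynamic-partition insert, issued through Spark SQL with Hive support enabled. Database, table, and column names are illustrative; bucketing is added in the Hive DDL with a CLUSTERED BY ... INTO n BUCKETS clause (typically run directly in Hive).

```scala
import org.apache.spark.sql.SparkSession

object HivePartitioningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partitioning-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // External table partitioned by load date (illustrative names and location)
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS sales.orders (
        |  order_id STRING, customer_id STRING, amount DOUBLE)
        |PARTITIONED BY (load_date STRING)
        |STORED AS ORC
        |LOCATION '/warehouse/sales/orders'""".stripMargin)

    // Allow Hive to derive the partition value from the data itself
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    // Dynamic-partition insert: the partition column must be selected last
    spark.sql(
      """INSERT INTO TABLE sales.orders PARTITION (load_date)
        |SELECT order_id, customer_id, amount, load_date
        |FROM sales.orders_staging""".stripMargin)

    spark.stop()
  }
}
```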
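
The Akka HTTP bullet above refers to a plain route-based service; a minimal sketch is shown below against the Akka HTTP 10.0/10.1-era API. The route, path segment, and port are hypothetical, and the string response stands in for the real payload.

```scala
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.server.Directives._
import akka.stream.ActorMaterializer

object RestApiSketch extends App {
  implicit val system       = ActorSystem("rest-api")
  implicit val materializer = ActorMaterializer()

  // Hypothetical route: GET /records/<id> returns a plain-text payload
  val route =
    path("records" / Segment) { id =>
      get {
        complete(s"record $id")
      }
    }

  // Bind the route to a local port; a real service would also handle shutdown
  Http().bindAndHandle(route, "0.0.0.0", 8080)
}
```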

Environment: Spark, Spark SQL, Spark Streaming, Redshift, CDH5, Scala, JavaScript, Cassandra, Knox, Kerberos.

Confidential, Dallas, Texas

Hadoop Developer

Responsibilities:

  • Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive and MapReduce.
  • Involved in exploring Hadoop, MapReduce programming, and its ecosystem.
  • Involved in importing real-time data into Hadoop using Kafka and implemented the Oozie job for the daily import.
  • Created a multi-terabyte database using MultiLoad.
  • Implemented Teradata Parallel Transporter (TPT) in the data warehouse.
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Wrote User Defined Functions (UDFs) in Pig and Hive when needed.
  • Developed Pig scripts for processing data.
  • Managed workflow and scheduling for complex MapReduce jobs using Apache Oozie.
  • Worked on creating, debugging, and executing Talend mappings, sessions, and tasks.
  • Involved in creating Hive tables, loading data, and writing Hive queries.
  • Developed ETL mappings using different data sources and loaded the data from these sources into relational tables with Talend Big Data.
  • Pushed data as delimited files into HDFS using Talend Big Data Studio.
  • Written Hive queries for data analysis to meet the business requirements.
  • Created HBase tables to store various data formats of incoming data from different patient portfolios.
  • Created Pig Latin scripts to sort, group, join, and filter the enterprise-wide data.
  • Automated the History and Purge Process.
  • Built an Apache Kafka multi-node cluster and used Kafka Manager to monitor multiple clusters.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Called Unix scripts and files inside Talend jobs to execute in different environments.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Validated the data using MD5 checksums (see the sketch after this list).
  • Designed and developed REST-based microservices using Spring Boot.
  • Fine-tuned various cassandra.yaml parameters according to the version and capacity.
  • Provided daily production support to monitor and troubleshoot Hadoop/Hive jobs.
  • Involved in Configuring core-site.xml and mapred-site.xml according to the multi node cluster environment.
  • Implemented data integrity and data quality checks in Hadoop using shell scripts.
  • Used Avro and Parquet file formats for data serialization.
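
A minimal sketch of the MD5-based validation referenced above: the file is streamed through a digest and the result is compared against the checksum delivered with the extract. The file path and expected checksum arguments are placeholders for whatever the actual feed provides.

```scala
import java.io.{BufferedInputStream, FileInputStream}
import java.security.MessageDigest

object Md5Check {

  // Stream the file through the digest so large extracts do not need to fit in memory
  def md5Of(path: String): String = {
    val digest = MessageDigest.getInstance("MD5")
    val in = new BufferedInputStream(new FileInputStream(path))
    try {
      val buffer = new Array[Byte](8192)
      Iterator.continually(in.read(buffer))
        .takeWhile(_ != -1)
        .foreach(n => digest.update(buffer, 0, n))
    } finally in.close()
    digest.digest().map("%02x".format(_)).mkString
  }

  def main(args: Array[String]): Unit = {
    // args: <file to validate> <expected checksum shipped with the file>
    val Array(file, expected) = args
    val actual = md5Of(file)
    if (actual.equalsIgnoreCase(expected)) println(s"OK  $file")
    else sys.error(s"Checksum mismatch for $file: expected $expected, got $actual")
  }
}
```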

Environment: Solix EDMS, Java 1.6, Hadoop (MapReduce, Hive, Pig, Oozie, Sqoop, Flume), Eclipse Helios, Linux/Unix, Talend, Camel, Knox, Ranger, Kerberos.

Confidential

Sr. Java Developer

Responsibilities:

  • Developed Web module using Spring MVC, JSP.
  • Developed model logic using the Hibernate ORM framework.
  • Handled server-side validations.
  • Involved in Bug fixing.
  • Involved in unit testing using JUnit.
  • Writing Technical Design Document.
  • Gathered specifications from the requirements.
  • Developed the application using Spring MVC architecture.
  • Developed JSP custom tags which support custom User Interfaces.
  • Developed front-end pages using JSP, HTML and CSS.
  • Developed Agile processes using Groovy and JUnit to support continuous integration.
  • Developed core Java classes for utility classes, business logic, and test cases
  • Developed SQL queries using MySQL and established connectivity
  • Used Stored Procedures for performing different database operations
  • Autowired Java objects using Spring dependency injection.
  • Used Log4j for application logging and debugging.
  • Used the Spring framework in developing the application, which follows the Model-View-Controller (MVC) architecture.
  • Used Hibernate for interacting with the database.
  • Developed control classes for processing requests.
  • Used exception handling for handling exceptions.
  • Prepared JUnit test cases for test-driven development.
  • Designed sequence diagrams and use case diagrams for proper implementation
  • Used Rational Rose for design and implementation

Environment: JSP, HTML, CSS, JavaScript, MySQL, Spring, Hibernate, Exception Handling, UML, Rational Rose.

Confidential

Java Developer

Responsibilities:

  • Involved in Requirements analysis, design, and development and testing
  • Developed Custom tags, JSTL to support custom User Interfaces.
  • Designed the user interfaces using JSP.
  • Designed and Implemented MVC architecture using Struts Framework, Coding involves writing Action Classes/Custom Tag Libraries, JSP.
  • Involved in writing the business layer using EJB, BO, DAO, and VO.
  • Implemented Business processes such as user authentication, Account Transfer using Session EJBs.
  • Worked with Oracle Database to create tables, procedures, functions and select statements.
  • Used Log4J to capture the log that includes runtime exceptions and developed WAR framework to alert the client and production support in case of application failures.
  • Deployed the application in client's location on Tomcat Server.
  • Development carried out under Eclipse Integrated Development Environment (IDE).
  • Used JBoss for deploying various components of application
  • Used Ant for building Scripts.
  • Used JUnit for testing and checking API performance.

Environment: Java 1.6, J2EE, Struts, HTML, CSS, JavaScript, JDBC, SQL Server 2005, ANT, Log4j, JUnit, XML, JSP, JSTL, AJAX, JBoss, ClearCase.
