
Spark/Hadoop Developer Resume


Greenwood Village, CO

SUMMARY:

  • Over 9 years of professional IT experience in all phases of the Software Development Life Cycle, including hands-on experience in Java/J2EE technologies and Big Data analytics.
  • More than 5 years of work experience in ingestion, storage, querying, processing and analysis of Big Data, with hands-on experience in Hadoop ecosystem development including MapReduce, HDFS, Hive, Pig, Spark, Cloudera Navigator, Mahout, HBase, ZooKeeper, Sqoop, Flume, Oozie and AWS.
  • Extensive experience working with Teradata, Oracle, Netezza, SQL Server and MySQL databases.
  • Excellent understanding and knowledge of NoSQL databases like MongoDB, HBase, and Cassandra.
  • Strong experience working with different Hadoop distributions like Cloudera, Hortonworks, MapR and Apache distributions.
  • Experience in installing, configuring, supporting and managing Hadoop clusters using Apache and Cloudera (CDH 5.x) distributions and on Amazon Web Services (AWS).
  • Experience in Amazon AWS services such as EMR, EC2, S3, CloudFormation and Redshift, which provide fast and efficient processing of Big Data.
  • In-depth understanding/knowledge of Hadoop architecture and its various components such as HDFS, MapReduce, Hadoop 2 (Gen2) Federation, High Availability and YARN architecture, and good understanding of workload management, scalability and distributed platform architectures.
  • Good understanding of R Programming, Data Mining and Machine Learning techniques.
  • Strong experience and knowledge of real time data analytics using Storm, Kafka, Flume and Spark.
  • Experience in troubleshooting errors in HBase Shell, Pig, Hive and MapReduce.
  • Experience in installing and maintaining Cassandra by configuring the cassandra.yaml file as per the requirement.
  • Involved in upgrading existing MongoDB instances from version 2.4 to version 2.6 by upgrading the security roles and implementing newer features.
  • Responsible for performing reads and writes in Cassandra from a web application using Java JDBC connectivity.
  • Experience in extending Hive and Pig core functionality using custom UDFs and UDAFs (see the sketch after this list).
  • Debugging MapReduce jobs using Counters and MRUnit testing.
  • Expertise in writing real-time processing applications using spouts and bolts in Storm.
  • Experience in configuring various topologies in Storm to ingest and process data on the fly from multiple sources and aggregate it into a central Hadoop repository.
  • Good understanding of Spark algorithms such as classification, clustering and regression.
  • Good understanding on Spark Streaming with Kafka for real-time processing.
  • Extensive experience working with Spark tools like RDD transformations, Spark MLlib and Spark SQL.
  • Experienced in moving data from different sources using Kafka producers and consumers, and preprocessing data using Storm topologies.
  • Experienced in migrating ETL transformations to Pig Latin scripts, including transformations and join operations.
  • Good understanding of MPP databases such as HP Vertica, Greenplum and Impala.
  • Hands-on experience in implementing Sequence files, Combiners, Counters, Dynamic Partitions and Bucketing for best practice and performance improvement.
  • Good knowledge of streaming data from different data sources like log files, JMS and application sources into HDFS using Flume sources.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
  • Worked on Docker-based containerized applications.
  • Knowledge of data warehousing and ETL tools like Informatica, Talend and Pentaho.
  • Experienced in working with monitoring tools to check the status of clusters using Cloudera Manager, Ambari and Ganglia.
  • Experience with testing MapReduce programs using MRUnit and JUnit.
  • Extensive experience in middle-tier development using J2EE technologies like JDBC, JNDI, JSP, Servlets, JSF, Struts, Spring, Hibernate, EJB.
  • Expertise in developing responsive front-end components with JSP, HTML, XHTML, JavaScript, DOM, Servlets, JSF, NodeJS, Ajax, jQuery and AngularJS.
  • Extensive experience working with SOA-based architectures using REST-based web services (JAX-RS) and SOAP-based web services (JAX-WS).
  • Experience working on version control tools like SVN and Git revision control systems such as GitHub, JIRA/Mingle to track issues, and Crucible for code reviews.
  • Worked on various tools and IDEs like Eclipse, IBM Rational, Visio, Apache Ant build tool, MS Office, PL/SQL Developer and SQL*Plus.
  • Experience in different application servers like JBoss/Tomcat, WebLogic, IBM WebSphere.
  • Experience in working with Onsite-Offshore model.
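
A minimal sketch of the kind of custom Hive UDF referenced above, written in Scala against Hive's classic UDF API; the class name and behavior are hypothetical:

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Hypothetical data-quality UDF: trims whitespace and lower-cases a string column,
    // returning null for empty values so downstream filters can drop them.
    class NormalizeText extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) return null
        val cleaned = input.toString.trim.toLowerCase
        if (cleaned.isEmpty) null else new Text(cleaned)
      }
    }

Packaged into a jar, such a class would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION normalize_text AS 'NormalizeText', then called like any built-in function.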

TECHNICAL SKILLS:

Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, ZooKeeper, Spark, Solr, Storm, Drill, Ambari, Mahout, MongoDB, Cassandra, Avro, Parquet and Snappy

Hadoop Distributions: Cloudera, MapR and Hortonworks

Languages: Java, Scala, Python, JRuby, SQL, HTML, DHTML, JavaScript, XML and C/C++

NoSQL Databases: Cassandra, MongoDB and HBase

Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB and Struts

XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM), JAXB

Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery, CSS, AngularJS, ExtJS and JSON

Development / Build Tools: Eclipse, Ant, Maven, Gradle, IntelliJ, JUnit and Log4j

Frameworks: Struts, Spring and Hibernate

App/Web servers: WebSphere, WebLogic, JBoss and Tomcat

DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle

RDBMS: Teradata, Oracle 9i/10g/11g, MS SQL Server, MySQL and DB2

Operating systems: UNIX, LINUX, Mac OS and Windows Variants

Data analytical tools: R, SAS and MATLAB

ETL Tools: Tableau, Talend, Informatica, Pentaho

PROFESSIONAL EXPERIENCE:

Confidential, Greenwood Village, CO

Spark/Hadoop Developer

Responsibilities:

  • Researched and recommended a suitable technology stack for Hadoop migration, considering the current enterprise architecture.
  • Worked on importing and exporting data into and out of HDFS and Hive using Sqoop.
  • Worked on creating Hive tables and wrote Hive queries for data analysis to meet business requirements.
  • Developed Hive UDFs to handle data quality and create filtered datasets for further processing.
  • Planned the Cassandra cluster, including data sizing estimation and identifying hardware requirements based on the estimated data size and transaction volume.
  • Worked with BI teams in generating the reports on Tableau
  • Designed data models in Cassandra and worked with Cassandra Query Language (CQL).
  • Bulk loaded data into the Cassandra cluster using Java APIs.
  • Developed real-time data processing applications using Scala and Python, and implemented Apache Spark Streaming from various streaming sources like Kafka and JMS.
  • Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster processing of data.
  • Experience in transferring data from different data sources into HDFS using a message broker layer of Kafka producers, consumers and brokers.
  • Used the Spark-Cassandra Connector to load data to and from Cassandra (see the sketch after this list).
  • Worked closely with the application team to resolve issues related to Spark and CQL.
  • Created necessary keyspaces and modeled column families based on the queries
  • Used Jira for task/bug tracking
  • Used GIT for version control
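
A minimal sketch of the Spark-Cassandra Connector usage referenced above, assuming Spark 2.x with the DataStax spark-cassandra-connector on the classpath; the keyspace, table and column names are hypothetical:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("cassandra-load")
      .config("spark.cassandra.connection.host", "cassandra-host")  // hypothetical host
      .getOrCreate()

    // Read a Cassandra table into a DataFrame via the connector's data source.
    val txns = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "sales", "table" -> "transactions"))
      .load()

    // Use the Spark SQL API for the transformation work.
    txns.createOrReplaceTempView("transactions")
    val daily = spark.sql(
      "SELECT account_id, to_date(event_ts) AS day, sum(amount) AS total " +
      "FROM transactions GROUP BY account_id, to_date(event_ts)")

    // Write the aggregate back to a (hypothetical) summary table in the same keyspace.
    daily.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "sales", "table" -> "daily_totals"))
      .mode("append")
      .save()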

Environment: Hadoop, Cassandra, Spark Streaming, Kafka, Spark SQL, Scala, Shell Scripting, Java, Oracle.

Confidential, Franklin, TN

Spark/Hadoop Developer

Responsibilities:

  • Extensively used the Spark stack to develop preprocessing jobs using the RDD, Dataset and DataFrame APIs to transform the data for upstream consumption.
  • Worked on extracting and enriching HBase data between multiple tables using joins in Spark.
  • Worked on writing APIs to load the processed data to HBase tables.
  • Replaced existing MapReduce programs with Spark applications using Scala.
  • Built on-premise data pipelines using Kafka and Spark Streaming fed by the API streaming gateway REST service (see the sketch after this list).
  • Experienced in writing Sqoop scripts to import data into Hive/HDFS from RDBMS.
  • Good knowledge of the Kafka Streams API for data transformation.
  • Implemented a logging framework, the ELK stack (Elasticsearch, Logstash and Kibana), on AWS.
  • Set up Spark on EMR to process huge volumes of data stored in Amazon S3.
  • Developed Oozie workflows for scheduling and orchestrating the ETL process.
  • Used the Talend tool to create workflows for processing data from multiple source systems.
  • Created sample flows in Talend and StreamSets with custom coded jars and analyzed the performance of StreamSets and Kafka Streams.
  • Developed Hive Queries to analyze the data in HDFS to identify issues and behavioral patterns.
  • Involved in writing optimized Pig Scripts along with developing and testing Pig Latin Scripts.
  • Used Python pandas and NumPy modules for data analysis, data scraping and parsing.
  • Deployed applications using the Jenkins framework, integrating Git version control with it.
  • Participated in production support on a regular basis to support the Analytics platform.
  • Used Rally for task/bug tracking.
  • Used GIT for version control.
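
A minimal sketch of the Kafka-to-Spark Streaming leg of the on-premise pipeline referenced above, using the spark-streaming-kafka-0-10 direct stream API; broker addresses, topic name, batch interval and output path are hypothetical:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010._

    val conf = new SparkConf().setAppName("gateway-feed")
    val ssc = new StreamingContext(conf, Seconds(30))  // 30-second micro-batches (assumed)

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092,broker2:9092",  // hypothetical brokers
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "gateway-feed-consumers",
      "auto.offset.reset" -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("api-gateway-events"), kafkaParams))

    // Keep only the message payloads and land each micro-batch in HDFS for downstream jobs.
    stream.map(_.value())
      .foreachRDD { rdd =>
        if (!rdd.isEmpty()) rdd.saveAsTextFile(s"/data/raw/gateway/${System.currentTimeMillis()}")
      }

    ssc.start()
    ssc.awaitTermination()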

Environment: MapR, Hadoop, HBase, HDFS, AWS, Pig, Hive, Drill, Spark SQL, MapReduce, Spark Streaming, Kafka, Flume, Sqoop, Oozie, Jupyter Notebook, Docker, Spark, Scala, Talend, Shell Scripting, Java.

Confidential, Piscataway, NJ

Spark/Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.
  • Developed Spark jobs and Hive jobs to summarize and transform data.
  • Expertise in implementing Spark Scala applications using higher-order functions for both batch and interactive analysis requirements.
  • Experienced in developing Spark scripts for data analysis in both Python and Scala.
  • Built on-premise data pipelines using Kafka and Spark for real-time data analysis.
  • Created reports in Tableau for visualization of the datasets created, and tested native Drill, Impala and Spark connectors.
  • Encoded and decoded JSON objects using PySpark to create and modify DataFrames in Apache Spark.
  • Analyzed the SQL scripts and designed the solution to implement using Scala.
  • Implemented complex Hive UDFs to execute business logic with Hive queries.
  • Responsible for bulk loading data into HBase using MapReduce by directly creating HFiles and loading them.
  • Evaluated the performance of Spark SQL vs Impala vs Drill on offline data as part of a POC.
  • Worked on Solr configuration and customizations based on requirements.
  • Handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and then loading the data into HDFS.
  • Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
  • Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
  • Responsible for developing a data pipeline by implementing Kafka producers and consumers (see the sketch after this list).
  • Performed data analysis with HBase using Apache Phoenix.
  • Exported the analyzed data to Impala to generate reports for the BI team.
  • Developed multiple Spark jobs in PySpark for data cleaning and preprocessing.
  • Managed and reviewed Hadoop log files to resolve any configuration issues.
  • Developed a program to extract named entities from OCR files.
  • Used Gradle for building and testing the project.
  • Fixed defects as needed during the QA phase, supported QA testing, and troubleshot defects to identify their source.
  • Used Mingle and later moved to JIRA for task/bug tracking.
  • Used GIT for version control
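
A minimal sketch of the producer half of the Kafka data pipeline referenced above, using the standard Kafka Java client from Scala; the broker address, topic name and payload are hypothetical:

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")  // hypothetical broker
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("acks", "all")                        // wait for a full commit before acknowledging

    val producer = new KafkaProducer[String, String](props)

    // Each source record is keyed by an id so related events land in the same partition.
    def publish(recordId: String, payloadJson: String): Unit =
      producer.send(new ProducerRecord[String, String]("ingest-events", recordId, payloadJson))

    publish("order-123", """{"status":"created"}""")
    producer.flush()
    producer.close()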

Environment: MapR, Cloudera, Hadoop, HDFS, AWS, Pig, Hive, Impala, Drill, Spark SQL, OCR, MapReduce, Flume, Sqoop, Oozie, Storm, Zeppelin, Mesos, Docker, PySpark, Solr, Kafka, MapR-DB, Spark, Scala, HBase, ZooKeeper, Tableau, Shell Scripting, Gerrit, Java, Redis.

Confidential, Topeka, KS

Hadoop Developer

Responsibilities:

  • Analyzing the requirement to setup a cluster.
  • Worked on analyzing Hadoop cluster and different big data analytic tools including MapReduce, Hive and Spark.
  • Involved in loading data from the Linux file system, servers and Java web services using Kafka producers and partitions.
  • Implemented custom Kafka encoders for custom input formats to load data into Kafka partitions.
  • Implemented Storm topologies to pre-process data before moving it into the HDFS system.
  • Implemented Kafka high-level consumers to get data from Kafka partitions and move it into HDFS.
  • Implemented a POC to migrate MapReduce programs into Spark transformations using Spark and Scala.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Migrated complex MapReduce programs into Spark RDD transformations and actions (see the sketch after this list).
  • Implemented Spark RDD transformations to map business analysis and applied actions on top of the transformations.
  • Involved in creating Hive tables, loading them with data and writing Hive queries which run internally as MapReduce jobs.
  • Developed MapReduce programs to parse the raw data and store the pre-aggregated data in partitioned tables.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data with MapReduce, Hive and Pig.
  • Developed MapReduce programs in Java for parsing the raw data and populating staging tables.
  • Implemented Python scripts for writing MapReduce programs using the Hadoop Streaming jar.
  • Involved in using HCatalog to access Hive table metadata from MapReduce or Pig code.
  • Experience in implementing custom serializers, interceptors, sources and sinks as per the requirement in Flume to ingest data from multiple sources.
  • Experience in setting up fan-out workflows in Flume to design a V-shaped architecture that takes data from many sources and ingests it into a single sink.
  • Worked on implementing advanced procedures like text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
  • Developed Shell, Perl and Python scripts to automate and provide Control flow to Pig scripts.
  • Worked on NoSQL databases like Cassandra and MongoDB for POC purposes, storing images and URIs.
  • Implemented monitoring on all the NiFi flows to get notifications if no data flows through a flow for more than a specified time.
  • Converted unstructured data to structured data by writing Spark code.
  • Indexed documents using Apache Solr.
  • Set up SolrCloud for distributed indexing and search.
  • Created NiFi flows to trigger Spark jobs and used PutEmail processors to get notifications if there are any failures.
  • Worked closely with the Spark team on parallel computing to explore RDDs in DataStax Cassandra.
  • Integrated bulk data into the Cassandra file system using MapReduce programs.
  • Worked on MongoDB for distributed storage and processing.
  • Designed and implemented Cassandra and associated RESTful web service.
  • Implemented row-level updates and real-time analytics using CQL on Cassandra data.
  • Used Cassandra CQL with Java APIs to retrieve data from Cassandra tables.
  • Worked on analyzing and examining customer behavioral data using Cassandra.
  • Created partitioned tables in Hive, mentored analyst and SQA team for writing Hive Queries.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Involved in cluster setup, monitoring, test benchmarks for results.
  • Involved in building and deploying applications using Maven, integrated with the Jenkins CI/CD server.
  • Involved in agile methodologies, daily scrum meetings and sprint planning.
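
A minimal sketch of the kind of MapReduce-to-Spark migration referenced above: a map-then-reduce aggregation rewritten as RDD transformations and actions; the input/output paths and record layout are hypothetical:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("mr-to-spark-migration"))

    // Equivalent of the old mapper: parse each raw line into (key, value) pairs.
    val events = sc.textFile("/data/raw/events")     // hypothetical input path
      .map(_.split('\t'))
      .filter(_.length >= 3)                         // drop malformed records
      .map(fields => (fields(1), 1L))                // key by, e.g., a user id field

    // Equivalent of the old reducer: aggregate per key, then persist the result.
    val countsPerUser = events.reduceByKey(_ + _)
    countsPerUser.saveAsTextFile("/data/curated/event_counts")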

Environment: Hadoop, Cloudera, HDFS, Pig, Hive, Flume, Sqoop, NiFi, AWS Redshift, Python, Spark, Scala, MongoDB, Cassandra, Snowflake, Solr, Kubernetes, ZooKeeper, MySQL, Talend, Shell Scripting, Linux Red Hat, Java.

Confidential, Mason, OH

Hadoop Developer

Responsibilities:

  • Converted the existing relational database model to the Hadoop ecosystem.
  • Generated datasets and loaded them into the Hadoop ecosystem.
  • Worked with Linux systems and RDBMS database on a regular basis to ingest data using Sqoop.
  • Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.
  • Involved in review of functional and non-functional requirements.
  • Implemented frameworks using Java and Python to automate the ingestion flow.
  • Responsible for managing data coming from different sources.
  • Loaded the CDRs from relational DB using Sqoop and other sources to Hadoop cluster by using Flume.
  • Experience in processing large volumes of data and skills in parallel execution of processes using Talend functionality.
  • Involved in loading data from UNIX file system and FTP to HDFS.
  • Designed and implemented HIVE queries and functions for evaluation, filtering, loading and storing of data.
  • Creating Hive tables and working on them using HiveQL.
  • Developed data pipeline using Kafka and Storm to store data into HDFS.
  • Created reporting views in Impala using Sentry policy files.
  • Developed Hive queries to analyze the output data.
  • Handled cluster coordination services through ZooKeeper.
  • Collected log data from web servers and stored it in HDFS using Flume.
  • Used HIVE to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
  • Implemented several Akka actors which are responsible for loading data into Hive (see the sketch after this list).
  • Design and implement Spark jobs to support distributed data processing.
  • Supported the existing MapReduce programs running on the cluster.
  • Developed and implemented two service endpoints (end to end) in Java using the Play framework, Akka and Hazelcast.
  • Wrote the shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
  • Wrote Java code to format XML documents; upload them to Solr server for indexing.
  • Involved in Hadoop cluster tasks like adding and removing nodes without any effect on running jobs and data.
  • Developed PowerCenter mappings to extract data from various databases and flat files and load it into the data mart using Informatica.
  • Followed agile methodology for the entire project.
  • Installed and configured Apache Hadoop, Hive and Pig environment.
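
A minimal sketch of an Akka actor responsible for loading staged data into Hive, as referenced above; it assumes a SparkSession with Hive support is available, and the table name, partition column and paths are hypothetical:

    import akka.actor.{Actor, ActorSystem, Props}
    import org.apache.spark.sql.SparkSession

    // Message telling the actor which staged HDFS directory to load into which partition.
    case class LoadPartition(hdfsPath: String, loadDate: String)

    class HiveLoaderActor(spark: SparkSession) extends Actor {
      def receive: Receive = {
        case LoadPartition(path, dt) =>
          // Move the staged files into the (hypothetical) cdr_events table partition.
          spark.sql(s"LOAD DATA INPATH '$path' INTO TABLE cdr_events PARTITION (load_date = '$dt')")
      }
    }

    object IngestionApp extends App {
      val spark = SparkSession.builder()
        .appName("hive-loader")
        .enableHiveSupport()
        .getOrCreate()

      val system = ActorSystem("ingestion")
      val loader = system.actorOf(Props(new HiveLoaderActor(spark)), "hive-loader")

      // One message per staged batch; such actors let batches be queued and loaded asynchronously.
      loader ! LoadPartition("/staging/cdrs/2017-06-01", "2017-06-01")
    }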

Environment: Hadoop, Hortonworks, HDFS, Pig, Hive, Flume, Sqoop, Ambari, Ranger, Python, Akka, Play framework, Informatica, Elasticsearch, Linux (Ubuntu), Solr.

Confidential, Portland, OR

Hadoop Developer

Responsibilities:

  • Transferred purchase transaction details from legacy systems to HDFS.
  • Developed Java MapReduce programs on log data to transform it into a structured form to find user location, age group and time spent.
  • Developed Pig UDFs for manipulating the data as per the business requirements and worked on developing custom Pig loaders (see the sketch after this list).
  • Collected and aggregated large amounts of weblog data from different sources such as web servers, mobile and network devices using Apache Flume and stored the data into HDFS for analysis.
  • Worked on the Ingestion of Files into HDFS from remote systems using MFT (Managed File Transfer).
  • Experience in monitoring and managing Cassandra cluster.
  • Analyzed the weblog data using the HiveQL, integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts).
  • Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster
  • Wrote the MapReduce jobs to parse the weblogs which are stored in HDFS
  • Developed the services to run the MapReduce jobs as per the requirement basis.
  • Imported and exported data into HDFS from the Oracle database and vice versa using Sqoop.
  • Analyzed the data using Pig to extract the number of unique patients per day and the most purchased medicine.
  • Implemented workflows using the Apache Oozie framework to automate tasks.
  • Wrote UDFs for Hive and Pig that helped spot market trends.
  • Good knowledge of running Hadoop Streaming jobs to process terabytes of XML-format data.
  • Analyzed the Functional Specifications
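
A minimal sketch of a Pig eval UDF of the kind referenced above, written in Scala against Pig's Java UDF API; the function name and field layout are hypothetical:

    import org.apache.pig.EvalFunc
    import org.apache.pig.data.Tuple

    // Hypothetical UDF: bucket a purchase amount from the weblog records into a price band,
    // so downstream Pig scripts can group trends by band instead of raw amounts.
    class PriceBand extends EvalFunc[String] {
      override def exec(input: Tuple): String = {
        if (input == null || input.size() == 0 || input.get(0) == null) return null
        val amount = input.get(0).toString.toDouble
        if (amount < 10.0) "LOW"
        else if (amount < 100.0) "MEDIUM"
        else "HIGH"
      }
    }

Packaged into a jar, it would be made available to a Pig script with REGISTER and a DEFINE, then called like PriceBand(amount) in a FOREACH ... GENERATE statement.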

Environment: Hadoop, Hortonworks, Cloudera, HDFS, Pig, Hive, Tez, Accumulo, Flume, Sqoop, Oozie, Cassandra.

Confidential, Houston, TX

Java developer

Responsibilities:

  • Involved in analysis, design, development, integration and testing of application modules and followed the Agile/Scrum methodology. Participated in estimating the size of backlog items, daily Scrum, and translation of backlog items into engineering design and logical units of work (tasks).
  • Used Spring framework for implementing IOC/JDBC/ORM, AOP and Spring Security to implement business layer.
  • Developed and Consumed Web services securely using JAX-WS API and tested using SOAP UI.
  • Extensively used Action, DispatchAction, ActionForms, Struts tag libraries and Struts configuration files in Struts.
  • Extensively used the Hibernate Query Language for data retrieval from the database and processed the data in the business methods.
  • Developed pages using JSP, JSTL, Spring tags, jQuery and JavaScript, and used jQuery to make AJAX calls.
  • Used Jenkins continuous integration tool to do the deployments.
  • Worked on JDBC for database connections.
  • Worked on multithreaded middleware using socket programming to introduce a whole set of new business rules, implementing OOP design principles.
  • Involved in implementing Java multithreading concepts.
  • Developed several REST web services supporting both XML and JSON to perform tasks such as demand response management.
  • Used Servlets, Java and Spring for server-side business logic.
  • Implemented the log functionality by using Log4j and internal logging API's.
  • Used JUnit for server-side testing.
  • Used Maven build tools and SVN for version control.
  • Developed the frontend of the application using Bootstrap, AngularJS and Node.js frameworks.
  • Implemented SOA architecture using Enterprise Service Bus (ESB).
  • Designed front-end, data driven GUI using JSF, HTML4, JavaScript and CSS
  • Used IBM MQ Series as the JMS provider.
  • Responsible for writing SQL Queries and Procedures using DB2.
  • Implemented connections with Oracle and MySQL databases using Hibernate ORM. Configured Hibernate and entities using annotations from scratch.

Environment: Core Java 1.5, EJB, Hibernate 3.6, AWS, JSF, Struts, Spring 2.5, JPA, REST, JBoss, Selenium, Socket programming, DB2, Oracle 10g, XML, JUnit 4.0, XSLT, IDE, AngularJS, Node.js, HTML4, CSS, JavaScript, Apache Tomcat 5.x, Log4j.

Confidential

Java Developer

Responsibilities:

  • Designed Java Servlets and Objects using J2EE standards.
  • Involved in developing multithreading to improve CPU utilization.
  • Used multithreading to process tables simultaneously as and when user data is completed in one table.
  • Used JDBC calls in the Enterprise Java Beans to access Oracle Database.
  • Involved in developing the presentation layer using Spring MVC/Angular JS/JQuery.
  • Involved in design and development of rich internet applications using Flex, Action Script and Java.
  • Design and development of Web pages using HTML 4.0, CSS including Ajax controls and XML.
  • Created jar files and deployed in WebLogic Application Server.
  • Involved in writing the Properties, methods in the Class Modules and consumed web services.
  • Played a vital role in defining, implementing and enforcing quality practices in the team organization to ensure internal controls, quality and compliance policies and standards.
  • Used JavaScript 1.5 for custom client-side validation.
  • Involved in designing and developing the GUI for the user interface with various controls.
  • Worked with View State to maintain data between the pages of the application.

Environment: Core Java, JavaBeans, HTML 4.0, CSS 2.0, PL/SQL, MySQL 5.1, AngularJS, JavaScript 1.5, Flex, AJAX and Windows
