Spark/Hadoop Developer Resume
Waterloo, WI
SUMMARY:
- 8+ years of professional IT experience, including 4+ years of Hadoop/Big Data experience, capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
- 4+ years of experience in Hadoop components like MapReduce, Flume, Kafka, Pig, Hive, Spark, HBase, Oozie, Sqoop and Zookeeper.
- Good understanding of Hadoop architecture, including YARN and components such as HDFS, ResourceManager, NodeManager, NameNode and DataNode, as well as MRv1 and MRv2 concepts.
- Hands-on experience fetching live stream data from DB2 into HBase tables using Spark Streaming and Apache Kafka.
- Hands-on experience with various Hadoop distributions (Cloudera, Hortonworks, MapR).
- Good knowledge of creating data pipelines in Spark using Scala.
- Experience in developing Spark programs for batch and real-time processing, including Spark Streaming applications.
- Good knowledge of Spark components like Spark SQL, MLlib, Spark Streaming and GraphX.
- Expertise in writing Spark RDD transformations, actions, DataFrames and case classes for the required input data, and in performing data transformations using Spark Core (see the sketch after this summary).
- Experience in using Spark SQL with various data sources like JSON, Parquet and Hive.
- Strong knowledge of implementing data processing on Spark Core using Spark SQL and Spark Streaming.
- Hands-on experience with Spark SQL queries and DataFrames: importing data from various sources, performing transformations and read/write operations, and saving the results to output directories in HDFS.
- Worked on Spark SQL: created DataFrames by loading data from Hive tables, prepared the data and stored it in AWS S3.
- Experience in implementing real-time event processing and analytics using Spark Streaming with messaging systems such as Kafka.
- Expertise in integrating data from multiple data sources using Kafka.
- Knowledge of unifying data platforms using Kafka producers/consumers and implementing pre-processing with Storm topologies.
- Kafka deployment and integration with Oracle databases.
- Experience in data processing tasks such as collecting, aggregating and moving data from various sources using Apache Kafka.
- Worked extensively with Hadoop distributions like Cloudera and Hortonworks; good knowledge of the MapR distribution and Amazon EMR.
- Experienced in moving data from Hive tables into Cassandra for real-time analytics, and in using Cassandra Query Language (CQL) to analyze time-series data.
- Good knowledge of custom UDFs in Hive and Pig for data filtering.
- Expertise in writing Hive and Pig queries for data analysis to meet business requirements.
- Hands-on experience in using Impala for data analysis.
- Hands-on experience with the data ingestion tools Sqoop and Flume.
- Experience in importing and exporting data using Sqoop from RDBMS to HDFS and vice-versa.
- Experience with Apache NiFi, including integrating Apache NiFi with Apache Kafka.
- Hands-on experience in configuring and working with Flume to load the data from multiple sources directly into HDFS.
- Experience in configuring ZooKeeper to coordinate servers in clusters and maintain data consistency.
- Worked on NoSQL databases like HBase, Cassandra and MongoDB.
- Good knowledge of job scheduling and monitoring tools like Oozie and ZooKeeper.
- Experience in configuring various Storm topologies to ingest and process data on the fly from multiple sources and aggregate it into a central Hadoop repository.
- Experience with web UI development using jQuery, CSS, HTML, HTML5, XHTML, JavaScript.
- Experience in working with the Spring and Hibernate frameworks in Java.
- Extensive experience with databases such as Oracle, MySQL and MS SQL, and with PL/SQL scripting.
- Experience in using IDEs like Eclipse, NetBeans and IntelliJ.
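A minimal sketch of the kind of Spark work summarized above: a case class over semi-structured JSON input, a Dataset-style transformation, a Spark SQL aggregation over the same data, and Parquet output to HDFS. The paths, the Event case class and its fields, and the view name are illustrative placeholders, not details from a specific project.

```scala
import org.apache.spark.sql.SparkSession

// Case class describing the expected input records (field names are illustrative)
case class Event(id: Long, userId: String, eventType: String, ts: String)

object EventPrep {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("EventPrep").getOrCreate()
    import spark.implicits._

    // Read semi-structured JSON and map it onto the case class
    val events = spark.read.json("hdfs:///data/raw/events/").as[Event]

    // Dataset-style transformation
    val clicks = events.filter(_.eventType == "click")

    // Spark SQL over the same data
    events.createOrReplaceTempView("events")
    val daily = spark.sql(
      """SELECT userId, to_date(ts) AS day, COUNT(*) AS cnt
        |FROM events
        |GROUP BY userId, to_date(ts)""".stripMargin)

    // Persist the results as Parquet in HDFS
    clicks.write.mode("overwrite").parquet("hdfs:///data/curated/clicks/")
    daily.write.mode("overwrite").parquet("hdfs:///data/curated/daily_event_counts/")

    spark.stop()
  }
}
```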
TECHNICAL SKILLS:
Big Data Space: Hadoop, MapReduce, Pig, Hive, HBase, YARN, Kafka, Flume, Sqoop, Impala, Oozie, ZooKeeper, Spark, Ambari, Elasticsearch, Solr, MongoDB, Cassandra, Avro, Storm, Parquet, Snappy
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR, Amazon EMR
Databases & warehouses: Teradata, SQL Server, MySQL, Oracle
Java Space: Core Java, J2EE, JDBC, JNDI, JSP, EJB, Struts, Spring Boot, REST, SOAP, JMS
Languages: Python, Java, JRuby, SQL, HTML, DHTML, Scala, JavaScript, XML, C/C++
Operating systems: UNIX, Linux, Mac OS, Windows variants
RDBMS: Teradata, Oracle 9i/10g/11g, MS SQL Server, MySQL, DB2
Version controls: GIT, SVN, CVS
ETL Tools: Informatica, Talend
PROFESSIONAL EXPERIENCE:
Confidential, Waterloo, WI
Spark/Hadoop Developer
Responsibilities:
- Experienced in the design and deployment of a Hadoop cluster and different Big Data analytic tools, including Pig, Hive, HBase, Oozie, Sqoop, Kafka and Spark, with the Cloudera distribution.
- Worked on the Cloudera distribution deployed on AWS EC2 instances.
- Hands-on experience with Cloudera Hue for importing data through the GUI.
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
- Performed Data Ingestion from multiple internal clients using Apache Kafka.
- Worked on integrating Apache Kafka with Spark Streaming to consume data from external REST APIs and run custom functions.
- Involved in performance tuning of Spark jobs by using caching and taking full advantage of the cluster environment.
- Developed Spark scripts using Scala shell commands as per requirements.
- Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
- Implemented a real-time system with Kafka and ZooKeeper.
- Configured Spark Streaming to receive real-time data from Kafka and store it in HDFS (see the sketch after this list).
- Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs.
- Involved in running Hadoop streaming jobs to process terabytes of text data. Worked with different file formats such as Text, Sequence files, Avro, ORC and Parquet.
- Configured, supported and maintained all network, firewall, storage, load balancers, operating systems, and software in AWS EC2.
- Used Amazon EMR for Big Data processing on a Hadoop cluster of virtual servers backed by Amazon EC2 and S3.
- Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and XML.
- Developed PIG Latin scripts to extract the data from the web server output files and to load into HDFS.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Created Hive tables and wrote Hive queries for data analysis to meet business requirements; experienced with Sqoop for importing and exporting data from Oracle and MySQL.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Good knowledge of data manipulation, tombstones and compactions in Cassandra; well experienced in avoiding faulty writes and reads in Cassandra.
- Performed data analysis with Cassandra using Hive External tables.
- Designed the Column families in Cassandra.
- Experienced in running Hadoop streaming jobs to process terabytes of xml format data.
- Used Spark API over Hadoop YARN as execution engine for data analytics using Hive.
- Implemented YARN Capacity Scheduler on various environments and tuned configurations according to the application wise job loads.
- Configured a Continuous Integration system to execute suites of automated tests at the desired frequencies using Jenkins, Maven and Git.
- Involved in loading data from the Linux filesystem to HDFS.
- Followed Agile Methodologies while working on the project.
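A minimal sketch of the Kafka-to-HDFS Spark Streaming flow described above, using the spark-streaming-kafka-0-10 direct stream API. The broker address, topic name, consumer group and output path are placeholders, not values from the actual project.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToHdfs")
    val ssc = new StreamingContext(conf, Seconds(30)) // 30-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",          // placeholder broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "streaming-consumer",             // placeholder consumer group
      "auto.offset.reset" -> "latest"
    )

    // Direct stream from a Kafka topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Persist each non-empty micro-batch of message values to HDFS
    stream.map(_.value).foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty())
        rdd.saveAsTextFile(s"hdfs:///data/landing/events/${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```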
Environment: Hadoop, HDFS, Hive, Spark, Cloudera, AWS EC2, S3, EMR, Sqoop, Kafka, YARN, Shell Scripting, Impala, Scala, Pig, Cassandra, Oozie, Java, JUnit, Agile methods, Linux, MySQL, Elasticsearch, Kibana, Teradata.
Confidential, Flowood, MS
Data Engineer
Responsibilities:
- Worked in a multi-cluster Hadoop ecosystem environment.
- Created MapReduce programs using the Java API to filter out unnecessary records and find unique records based on different criteria.
- Optimized MapReduce programs using combiners, partitioners and custom counters to deliver the best results.
- Converted the existing relational database model to the Hadoop ecosystem.
- Installed and configured Apache Hadoop, Hive and Pig environment.
- Worked with Linux systems and RDBMS database on a regular basis so that data can be ingested using Sqoop.
- Reviewed and managed all log files using HBase.
- Designed and implemented HIVE queries and functions for evaluation, filtering, loading and storing of data.
- Created Hive tables and worked on them using HiveQL.
- Used Apache Kafka for the Data Ingestion from multiple internal clients.
- Developed data pipeline using Flume and Spark to store data into HDFS.
- Performed big data processing using Spark, AWS and Redshift.
- Involved in data acquisition, data pre-processing and data exploration for a telecommunications project in Spark.
- Performed linear regression using Spark MLlib in Scala (see the sketch after this list).
- Continuously monitored and managed the Hadoop cluster through HDP (Hortonworks Data Platform).
- Implemented Frameworks using Java and Python to automate the ingestion flow.
- Loaded CDRs into the Hadoop cluster from relational databases using Sqoop and from other sources using Flume.
- Implemented data quality checks and transformations using Flume interceptors.
- Implemented collections and the aggregation framework in MongoDB.
- Experience in processing large volumes of data and in parallel execution of processes using Talend functionality.
- Efficiently handled periodic exports of SQL data into Elasticsearch.
- Involved in loading data from UNIX file system and FTP to HDFS.
- Designed and implemented batch jobs using MR2, Pig, Hive and Tez.
- Used Apache Tez for highly optimized data processing.
- Developed Hive queries to analyze the output data.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS.
- Developed custom Pig UDFs for custom input formats, performing various levels of optimization.
- Involved in maintaining the Hadoop clusters using Nagios server.
- Used Pig to import semi-structured data coming from Avro files to make serialization faster.
- Configured highly available multi-core Solr servers using replication, request handlers, analyzers and tokenizers.
- Configured the Solr server to index different content types such as HTML, PDF, XML, XLS, DOC, DOCX and others.
- Loaded data into HBase using both bulk and non-bulk loads.
- Used Spark for fast processing of data in Hive and HDFS.
- Performed batch processing of data sources using Apache Spark and Elasticsearch.
- Used Zookeeper to provide coordination services to the cluster.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with reference tables and historical metrics.
- Wrote the Shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Worked on Reporting tools like Tableau to connect with Hive for generating daily reports.
- Utilized Agile Scrum methodology.
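A minimal sketch of the linear-regression step mentioned above, using the DataFrame-based spark.ml API in Scala. The Hive table name, feature columns and label column are illustrative placeholders rather than the project's actual CDR schema.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

object CdrRegression {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CdrRegression")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical pre-processed feature table in Hive
    val cdrs = spark.table("telecom.cdr_features")

    // Assemble numeric columns into a single feature vector
    val assembler = new VectorAssembler()
      .setInputCols(Array("call_duration", "data_usage_mb", "roaming_minutes"))
      .setOutputCol("features")

    val training = assembler.transform(cdrs)
      .withColumnRenamed("monthly_charge", "label")
      .select("features", "label")

    // Fit a regularized linear regression model
    val lr = new LinearRegression().setMaxIter(50).setRegParam(0.1)
    val model = lr.fit(training)

    println(s"Coefficients: ${model.coefficients}  Intercept: ${model.intercept}")
    spark.stop()
  }
}
```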
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Scala, Kafka, Flume, Sqoop, Hortonworks, AWS, Redshift, Oozie, ZooKeeper, Elasticsearch, Avro, Python, Shell Scripting, SQL, Talend, Spark, HBase, MongoDB, Linux, Solr, Ambari.
Confidential, Summit, NJ
Java/Hadoop Developer
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Migrated existing SQL queries to HiveQL queries to move to the big data analytical platform.
- Integrated the Cassandra file system with Hadoop using MapReduce to perform analytics on Cassandra data.
- Installed and configured a Cassandra DSE multi-node, multi-data-center cluster.
- Designed and implemented a 24-node Cassandra cluster for a single-point inventory application.
- Analyzed the performance of the Cassandra cluster using nodetool tpstats and cfstats for thread and latency analysis.
- Implemented real-time analytics on Cassandra data using the Thrift API.
- Responsible to manage data coming from different sources.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
- Loaded and transformed large data sets into HDFS using Hadoop fs commands.
- Scheduled Oozie workflow engine to run multiple Hive and Pig jobs, which independently run with time and data availability.
- Implemented UDFs and UDAFs in Java and Python for Hive to process data in ways not supported by Hive's built-in functions (see the sketch after this list).
- Applied various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop for analysis, visualization and report generation.
- Involved in writing optimized Pig scripts and in developing and testing Pig Latin scripts.
- Supported setting up and updating configurations for implementing scripts with Pig and Sqoop.
- Designed the logical and physical data models and wrote DML scripts for an Oracle 9i database.
- Used Hibernate ORM framework with Spring framework for data persistence.
- Wrote test cases in JUnit for unit testing of classes.
- Involved in developing templates and screens in HTML and JavaScript.
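A minimal sketch of a custom Hive UDF of the kind described above. It is shown in Scala for consistency with the other sketches (the project's UDFs were written in Java and Python), and the class name and masking logic are purely illustrative.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical example: mask all but the last four characters of a value,
// something Hive's built-in string functions don't provide directly.
class MaskValue extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else {
      val s = input.toString
      val masked = if (s.length > 4) "*" * (s.length - 4) + s.takeRight(4) else s
      new Text(masked)
    }
  }
}
```

After packaging the class into a jar, it would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION and then called like any built-in function in HiveQL.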
Environment: Java, HDFS, Cassandra, MapReduce, Sqoop, JUnit, HTML, JavaScript, Hibernate, Spring, Pig, Hive.
Confidential, Mountain View, CA
Hadoop Developer
Responsibilities:
- Worked on Spark SQL: created DataFrames by loading data from Hive tables, prepared the data and stored it in AWS S3 (see the sketch after this list).
- Created Sqoop jobs to import the data from DB2 to HDFS.
- Exported data using Sqoop into HDFS and Hive for report analysis.
- Used Oozie Workflow engine to run multiple Hive and Sqoop jobs.
- Wrote Hive queries for data analysis to meet the business requirements.
- Worked in Agile development environment in sprint cycles of two weeks by dividing and organizing tasks. Participated in daily scrum and other design related meetings.
- Experienced in working with Sqoop Scripts.
- Experienced in creating Hive external tables in HDFS.
- Created visual reports using Tableau.
- Performed source data transformations using Hive.
- Created partitions in Hive tables and worked on them using HQL.
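A minimal sketch of the Hive-to-S3 preparation flow described above: load a Hive table into a DataFrame, apply simple prep transformations, and write the result to S3 as partitioned Parquet. The table name, columns, bucket and path are illustrative placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HivePrepToS3 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HivePrepToS3")
      .enableHiveSupport()
      .getOrCreate()

    // Load a Hive table into a DataFrame
    val orders = spark.table("sales.orders")

    // Simple prep: drop bad records and derive a partition column
    val prepped = orders
      .filter(col("order_amount") > 0)
      .withColumn("order_date", to_date(col("order_ts")))

    // Write the prepared data to S3 as Parquet, partitioned by date
    prepped.write
      .mode("overwrite")
      .partitionBy("order_date")
      .parquet("s3a://example-bucket/prep/orders/")

    spark.stop()
  }
}
```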
Environment: Spring Tool Suite (STS), Spark, Scala, Sqoop, Bash/shell scripting, Bamboo, AWS, GitHub, Hive, MapReduce, DB2, Oozie, Python.
Confidential
Java Developer
Responsibilities:
- Participated in all the phases of the Software development life cycle (SDLC) which includes Development, Testing, Implementation and Maintenance.
- Involved in collecting client requirements and preparing the design documents.
- Implemented Spring MVC architecture and the Spring BeanFactory using IoC and AOP concepts.
- Developed Java classes to execute the business logic and collect input data from users, using Java and Oracle.
- Involved in creating scripts to create, update and delete data from the tables.
- Followed Agile methodology to analyze, define and document the application supporting functional and business requirements.
- Wrote JSPs using HTML tags to design the UI for different pages.
- Extensively used OOD concepts in overall design and development of the system.
- Developed user interface using Spring JSP to simplify the complexities of the application.
- Responsible for Development, unit testing and implementation of the application.
- Used Agile methodology to design, develop and deploy the changes.
- Extensively used tools like AccVerify, Checkstyle and Klocwork to check the code.
Environment: Java, JSP, JDBC, HTML, XSL, Spring, CSS, JavaScript, Oracle 8i, XML, WebLogic
Confidential
Java developer
Responsibilities:
- Good understanding of installing, configuring and deploying software after gathering all required requirements.
- Performed Quality Assurance testing.
- Helped design the application using the Spring MVC framework, with interactive front-end pages built in HTML, JSP, JSTL, CSS, JavaScript, jQuery and AJAX.
- Implemented JavaScript, shell scripts and JSP for front-end and server-side validations.
- Involved in writing SQL queries for fetching data from Oracle database.
- Developed a multi-tiered web application using J2EE standards.
- Used JIRA to track bugs.
- Used Apache Axis to develop web services and SOAP protocol for web services communication.
- Implemented persistence layer using Spring JDBC to store and update data in database.
- Used the Apache Tomcat application server for deploying and configuring the application.
- Used JUnit to test the persistence and service tiers; involved in unit test case preparation.
- Hands-on experience with software configuration/change control processes and tools like Subversion (SVN), Git, CVS and ClearCase.
- Built and deployed the application using Maven.
- Followed Agile and Scrum methodology.
- Involved in sprint planning, code review, and daily standup meetings to discuss the progress of the application.
Environment: HTML, Ajax, Servlets, JSP, SQL, JavaScript, CSS, XML, SOAP, Tomcat Server, Hibernate, JDBC, Maven, MVC, Agile, Git, JIRA, SVN.
