Spark/Hadoop Developer Resume
Waterloo, WI
SUMMARY:
- 8+ years of professional IT experience, including 4+ years of Hadoop/Big Data experience, capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
- 4+ years of experience in Hadoop components like MapReduce, Flume, Kafka, Pig, Hive, Spark, HBase, Oozie, Sqoop and Zookeeper.
- Good understanding of Hadoop architecture, including YARN and components such as HDFS, ResourceManager, NodeManager, NameNode and DataNode, as well as MRv1 and MRv2 concepts.
- Hands-on experience fetching live stream data from DB2 into HBase tables using Spark Streaming and Apache Kafka.
- Hands-on experience with various Hadoop distributions (Cloudera, Hortonworks, MapR).
- Good knowledge of creating data pipelines in Spark using Scala.
- Experience in developing Spark programs for batch and real-time processing, including Spark Streaming applications.
- Good knowledge of Spark components like Spark SQL, MLlib, Spark Streaming and GraphX.
- Expertise in writing Spark RDD transformations, actions, DataFrames and case classes for the required input data, and in performing data transformations using Spark Core (see the sketch after this summary).
- Experience in using Spark SQL with various data sources like JSON, Parquet and Hive.
- Strong knowledge of implementing data processing on Spark Core using Spark SQL and Spark Streaming.
- Hands-on experience with Spark SQL queries and DataFrames: importing data from various sources, performing transformations and read/write operations, and saving the results to output directories in HDFS.
- Worked on Spark SQL: created DataFrames by loading data from Hive tables, prepared the data and stored it in AWS S3.
- Experience in implementing real-time event processing and analytics using Spark Streaming with messaging systems such as Kafka.
- Expertise in integrating data from multiple data sources using Kafka.
- Knowledge of unifying data platforms using Kafka producers/consumers and implementing pre-processing with Storm topologies.
- Kafka deployment and integration with Oracle databases.
- Experience in data processing tasks such as collecting, aggregating and moving data from various sources using Apache Kafka.
- Worked extensively with Hadoop distributions like Cloudera and Hortonworks; good knowledge of the MapR distribution and Amazon EMR.
- Experienced in moving data from Hive tables into Cassandra for real-time analytics, and in using Cassandra Query Language (CQL) to analyze time-series data.
- Good knowledge of custom UDFs in Hive and Pig for data filtering.
- Expertise in writing Hive and Pig queries for data analysis to meet business requirements.
- Hands-on experience in using Impala for data analysis.
- Hands-on experience with the data ingestion tools Sqoop and Flume.
- Experience in importing and exporting data using Sqoop from RDBMS to HDFS and vice-versa.
- Experience with Apache NiFi, including integrating Apache NiFi with Apache Kafka.
- Hands-on experience in configuring and working with Flume to load the data from multiple sources directly into HDFS.
- Experience in configuring ZooKeeper to coordinate servers in clusters and maintain data consistency.
- Worked on NoSQL databases like HBase, Cassandra and MongoDB.
- Good knowledge of job scheduling and monitoring tools like Oozie and ZooKeeper.
- Experience in configuring various Storm topologies to ingest and process data on the fly from multiple sources and aggregate it into a central Hadoop repository.
- Experience with web UI development using jQuery, CSS, HTML, HTML5, XHTML, JavaScript.
- Experience in working with the Spring and Hibernate frameworks in Java.
- Extensive experience with databases such as Oracle, MySQL and MS SQL, and with PL/SQL scripting.
- Experience in using IDEs like Eclipse, NetBeans and IntelliJ.
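A minimal sketch of the kind of Spark work summarized above: a case class over semi-structured JSON input, a Dataset-style transformation, a Spark SQL aggregation over the same data, and Parquet output to HDFS. The paths, the Event case class and its fields, and the view name are illustrative placeholders, not details from a specific project.

```scala
import org.apache.spark.sql.SparkSession

// Case class describing the expected input records (field names are illustrative)
case class Event(id: Long, userId: String, eventType: String, ts: String)

object EventPrep {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("EventPrep").getOrCreate()
    import spark.implicits._

    // Read semi-structured JSON and map it onto the case class
    val events = spark.read.json("hdfs:///data/raw/events/").as[Event]

    // Dataset-style transformation
    val clicks = events.filter(_.eventType == "click")

    // Spark SQL over the same data
    events.createOrReplaceTempView("events")
    val daily = spark.sql(
      """SELECT userId, to_date(ts) AS day, COUNT(*) AS cnt
        |FROM events
        |GROUP BY userId, to_date(ts)""".stripMargin)

    // Persist the results as Parquet in HDFS
    clicks.write.mode("overwrite").parquet("hdfs:///data/curated/clicks/")
    daily.write.mode("overwrite").parquet("hdfs:///data/curated/daily_event_counts/")

    spark.stop()
  }
}
```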
TECHNICAL SKILLS:
Big Data Space: Hadoop, MapReduce, Pig, Hive, HBase, YARN, Kafka, Flume, Sqoop, Impala, Oozie, ZooKeeper, Spark, Ambari, Elasticsearch, Solr, MongoDB, Cassandra, Avro, Storm, Parquet, Snappy
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR, Amazon EMR
Databases & warehouses: Teradata, SQL Server, MySQL, Oracle
Java Space: Core Java, J2EE, JDBC, JNDI, JSP, EJB, Struts, Spring Boot, REST, SOAP, JMS
Languages: Python, Java, JRuby, SQL, HTML, DHTML, Scala, JavaScript, XML, C/C++
Operating systems: UNIX, Linux, Mac OS, Windows variants
RDBMS: Teradata, Oracle 9i/10g/11g, MS SQL Server, MySQL, DB2
Version controls: GIT, SVN, CVS
ETL Tools: Informatica, Talend
PROFESSIONAL EXPERIENCE:
Confidential, Waterloo, WI
Spark/Hadoop Developer
Responsibilities:
- Experienced in the design and deployment of a Hadoop cluster and different Big Data analytic tools, including Pig, Hive, HBase, Oozie, Sqoop, Kafka and Spark, with the Cloudera distribution.
- Worked on the Cloudera distribution deployed on AWS EC2 instances.
- Hands-on experience with Cloudera Hue for importing data through the GUI.
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
- Performed Data Ingestion from multiple internal clients using Apache Kafka.
- Worked on integrating Apache Kafka with Spark Streaming to consume data from external REST APIs and run custom functions.
- Involved in performance tuning of Spark jobs by using caching and taking full advantage of the cluster environment.
- Developed Spark scripts using Scala shell commands as per requirements.
- Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
- Implemented a real-time system with Kafka and ZooKeeper.
- Configured Spark Streaming to receive real-time data from Kafka and store it in HDFS (see the sketch after this list).
- Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs.
- Involved in running Hadoop streaming jobs to process terabytes of text data. Worked with different file formats such as Text, Sequence files, Avro, ORC and Parquet.
- Configured, supported and maintained all network, firewall, storage, load balancers, operating systems, and software in AWS EC2.
- Used Amazon EMR for Big Data processing on a Hadoop cluster of virtual servers backed by Amazon EC2 and S3.
- Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and XML.
- Developed PIG Latin scripts to extract the data from the web server output files and to load into HDFS.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Created Hive tables and wrote Hive queries for data analysis to meet business requirements; experienced with Sqoop for importing and exporting data from Oracle and MySQL.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Good knowledge of data manipulation, tombstones and compactions in Cassandra; well experienced in avoiding faulty writes and reads in Cassandra.
- Performed data analysis with Cassandra using Hive External tables.
- Designed the Column families in Cassandra.
- Experienced in running Hadoop streaming jobs to process terabytes of xml format data.
- Used Spark API over Hadoop YARN as execution engine for data analytics using Hive.
- Implemented YARN Capacity Scheduler on various environments and tuned configurations according to the application wise job loads.
- Configured a Continuous Integration system to execute suites of automated tests at the desired frequencies using Jenkins, Maven and Git.
- Involved in loading data from the Linux filesystem to HDFS.
- Followed Agile Methodologies while working on the project.
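A minimal sketch of the Kafka-to-HDFS Spark Streaming flow described above, using the spark-streaming-kafka-0-10 direct stream API. The broker address, topic name, consumer group and output path are placeholders, not values from the actual project.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToHdfs")
    val ssc = new StreamingContext(conf, Seconds(30)) // 30-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",          // placeholder broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "streaming-consumer",             // placeholder consumer group
      "auto.offset.reset" -> "latest"
    )

    // Direct stream from a Kafka topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Persist each non-empty micro-batch of message values to HDFS
    stream.map(_.value).foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty())
        rdd.saveAsTextFile(s"hdfs:///data/landing/events/${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```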
Environment: Hadoop, HDFS, Hive, Spark, Cloudera, AWS EC2, S3, EMR, Sqoop, Kafka, YARN, Shell Scripting, Impala, Scala, Pig, Cassandra, Oozie, Java, JUnit, Agile methods, Linux, MySQL, Elasticsearch, Kibana, Teradata.
Confidential, Flowood, MS
Data Engineer
Responsibilities:
- Worked in a multi-cluster Hadoop ecosystem environment.
- Created MapReduce programs using the Java API to filter out unnecessary records and find unique records based on different criteria.
- Optimized MapReduce programs using combiners, partitioners and custom counters to deliver the best results.
- Converted the existing relational database model to the Hadoop ecosystem.
- Installed and configured Apache Hadoop, Hive and Pig environment.
- Worked with Linux systems and RDBMS database on a regular basis so that data can be ingested using Sqoop.
- Reviewed and managed all log files using HBase.
- Designed and implemented HIVE queries and functions for evaluation, filtering, loading and storing of data.
- Created Hive tables and worked on them using HiveQL.
- Used Apache Kafka for the Data Ingestion from multiple internal clients.
- Developed data pipeline using Flume and Spark to store data into HDFS.
- Performed big data processing using Spark, AWS and Redshift.
- Involved in data acquisition, data pre-processing and data exploration for a telecommunications project in Spark.
- Performed linear regression using Spark MLlib in Scala (see the sketch after this list).
- Continuously monitored and managed the Hadoop cluster through HDP (Hortonworks Data Platform).
- Implemented Frameworks using Java and Python to automate the ingestion flow.
- Loaded CDRs into the Hadoop cluster from relational databases using Sqoop and from other sources using Flume.
- Implemented data quality checks and transformations using Flume interceptors.
- Implemented collections and the aggregation framework in MongoDB.
- Experience in processing large volumes of data and in parallel execution of processes using Talend functionality.
- Efficiently handled periodic exports of SQL data into Elasticsearch.
- Involved in loading data from UNIX file system and FTP to HDFS.
- Designed and implemented batch jobs using MR2, Pig, Hive and Tez.
- Used Apache Tez for highly optimized data processing.
- Developed Hive queries to analyze the output data.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS.
- Developed custom Pig UDFs for custom input formats, performing various levels of optimization.
- Involved in maintaining the Hadoop clusters using Nagios server.
- Used Pig to import semi-structured data coming from Avro files to make serialization faster.
- Configured highly available multi-core Solr servers using replication, request handlers, analyzers and tokenizers.
- Configured the Solr server to index different content types such as HTML, PDF, XML, XLS, DOC, DOCX and others.
- Loaded data into HBase using both bulk and non-bulk loads.
- Used Spark for fast processing of data in Hive and HDFS.
- Performed batch processing of data sources using Apache Spark and Elasticsearch.
- Used Zookeeper to provide coordination services to the cluster.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with reference tables and historical metrics.
- Wrote the Shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Worked on Reporting tools like Tableau to connect with Hive for generating daily reports.
- Utilized Agile Scrum methodology.
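A minimal sketch of the linear-regression step mentioned above, using the DataFrame-based spark.ml API in Scala. The Hive table name, feature columns and label column are illustrative placeholders rather than the project's actual CDR schema.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

object CdrRegression {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CdrRegression")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical pre-processed feature table in Hive
    val cdrs = spark.table("telecom.cdr_features")

    // Assemble numeric columns into a single feature vector
    val assembler = new VectorAssembler()
      .setInputCols(Array("call_duration", "data_usage_mb", "roaming_minutes"))
      .setOutputCol("features")

    val training = assembler.transform(cdrs)
      .withColumnRenamed("monthly_charge", "label")
      .select("features", "label")

    // Fit a regularized linear regression model
    val lr = new LinearRegression().setMaxIter(50).setRegParam(0.1)
    val model = lr.fit(training)

    println(s"Coefficients: ${model.coefficients}  Intercept: ${model.intercept}")
    spark.stop()
  }
}
```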
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Scala, Kafka, Flume, Sqoop, Hortonworks, AWS, Redshift, Oozie, ZooKeeper, Elasticsearch, Avro, Python, Shell Scripting, SQL, Talend, Spark, HBase, MongoDB, Linux, Solr, Ambari.
Confidential, Summit, NJ
Java/Hadoop Developer
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Migrated existing SQL queries to HiveQL queries to move to the big data analytical platform.
- Integrated the Cassandra file system with Hadoop using MapReduce to perform analytics on Cassandra data.
- Installed and configured a Cassandra DSE multi-node, multi-data-center cluster.
- Designed and implemented a 24-node Cassandra cluster for a single-point inventory application.
- Analyzed the performance of the Cassandra cluster using nodetool tpstats and cfstats for thread and latency analysis.
- Implemented real-time analytics on Cassandra data using the Thrift API.
- Responsible to manage data coming from different sources.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
- Loaded and transformed large data sets into HDFS using Hadoop fs commands.
- Scheduled Oozie workflow engine to run multiple Hive and Pig jobs, which independently run with time and data availability.
- Implemented UDFs and UDAFs in Java and Python for Hive to process data in ways not supported by Hive's built-in functions (see the sketch after this list).
- Applied various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop for analysis, visualization and report generation.
- Involved in writing optimized Pig scripts and in developing and testing Pig Latin scripts.
- Supported setting up and updating configurations for implementing scripts with Pig and Sqoop.
- Designed the logical and physical data models and wrote DML scripts for an Oracle 9i database.
- Used Hibernate ORM framework with Spring framework for data persistence.
- Wrote test cases in JUnit for unit testing of classes.
- Involved in developing templates and screens in HTML and JavaScript.
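A minimal sketch of a custom Hive UDF of the kind described above. It is shown in Scala for consistency with the other sketches (the project's UDFs were written in Java and Python), and the class name and masking logic are purely illustrative.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical example: mask all but the last four characters of a value,
// something Hive's built-in string functions don't provide directly.
class MaskValue extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else {
      val s = input.toString
      val masked = if (s.length > 4) "*" * (s.length - 4) + s.takeRight(4) else s
      new Text(masked)
    }
  }
}
```

After packaging the class into a jar, it would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION and then called like any built-in function in HiveQL.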
Environment: Java, HDFS, Cassandra, MapReduce, Sqoop, JUnit, HTML, JavaScript, Hibernate, Spring, Pig, Hive.
Confidential, Mountain View, CA
Hadoop Developer
Responsibilities:
- Worked on Spark SQL: created DataFrames by loading data from Hive tables, prepared the data and stored it in AWS S3 (see the sketch after this list).
- Created Sqoop jobs to import the data from DB2 to HDFS.
- Exported data using Sqoop into HDFS and Hive for report analysis.
- Used Oozie Workflow engine to run multiple Hive and Sqoop jobs.
- Wrote Hive queries for data analysis to meet the business requirements.
- Worked in Agile development environment in sprint cycles of two weeks by dividing and organizing tasks. Participated in daily scrum and other design related meetings.
- Experienced in working with Sqoop Scripts.
- Experienced in creating Hive external tables in HDFS.
- Created visual reports using Tableau.
- Performed source data transformations using Hive.
- Created partitions in Hive tables and worked on them using HQL.
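A minimal sketch of the Hive-to-S3 preparation flow described above: load a Hive table into a DataFrame, apply simple prep transformations, and write the result to S3 as partitioned Parquet. The table name, columns, bucket and path are illustrative placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HivePrepToS3 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HivePrepToS3")
      .enableHiveSupport()
      .getOrCreate()

    // Load a Hive table into a DataFrame
    val orders = spark.table("sales.orders")

    // Simple prep: drop bad records and derive a partition column
    val prepped = orders
      .filter(col("order_amount") > 0)
      .withColumn("order_date", to_date(col("order_ts")))

    // Write the prepared data to S3 as Parquet, partitioned by date
    prepped.write
      .mode("overwrite")
      .partitionBy("order_date")
      .parquet("s3a://example-bucket/prep/orders/")

    spark.stop()
  }
}
```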
Environment: Spring Tool Suite (STS), Spark, Scala, Sqoop, Bash/shell scripting, Bamboo, AWS, GitHub, Hive, MapReduce, DB2, Oozie, Python.
Confidential
Java Developer
Responsibilities:
- Participated in all the phases of the Software development life cycle (SDLC) which includes Development, Testing, Implementation and Maintenance.
- Involved in collecting client requirements and preparing the design documents.
- Implemented Spring MVC architecture and the Spring BeanFactory using IoC and AOP concepts.
- Developed Java classes to execute the business logic and collect input data from users, using Java and Oracle.
- Involved in creating scripts to create, update and delete data from the tables.
- Followed Agile methodology to analyze, define and document the application supporting functional and business requirements.
- Wrote JSPs using HTML tags to design the UI for different pages.
- Extensively used OOD concepts in overall design and development of the system.
- Developed user interface using Spring JSP to simplify the complexities of the application.
- Responsible for Development, unit testing and implementation of the application.
- Used Agile methodology to design, develop and deploy the changes.
- Extensively used tools like AccVerify, Checkstyle and Klocwork to check the code.
Environment: Java, JSP, JDBC, HTML, XSL, Spring, CSS, JavaScript, Oracle 8i, XML, WebLogic
Confidential
Java developer
Responsibilities:
- Good understanding of installing, configuring and deploying software after gathering all required requirements.
- Performed Quality Assurance testing.
- Helped design the application using the Spring MVC framework, with interactive front-end pages built in HTML, JSP, JSTL, CSS, JavaScript, jQuery and AJAX.
- Implemented JavaScript, shell scripts and JSP for front-end and server-side validations.
- Involved in writing SQL queries for fetching data from Oracle database.
- Developed a multi-tiered web application using J2EE standards.
- Used JIRA to track bugs.
- Used Apache Axis to develop web services and SOAP protocol for web services communication.
- Implemented persistence layer using Spring JDBC to store and update data in database.
- Used the Apache Tomcat application server for deploying and configuring the application.
- Used JUnit to test the persistence and service tiers; involved in unit test case preparation.
- Hands-on experience with software configuration/change control processes and tools like Subversion (SVN), Git, CVS and ClearCase.
- Built and deployed the application using Maven.
- Followed Agile and Scrum methodology.
- Involved in sprint planning, code review, and daily standup meetings to discuss the progress of the application.
Environment: HTML, Ajax, Servlets, JSP, SQL, JavaScript, CSS, XML, SOAP, Tomcat Server, Hibernate, JDBC, Maven, MVC, Agile, Git, JIRA, SVN.
