
Sr Hadoop/Spark Developer Resume


SUMMARY:

  • 8+ years of extensive professional IT experience, including 4+ years of Hadoop/Big Data experience, capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
  • 4+ years of experience with Hadoop Ecosystem components like MapReduce, Sqoop, Flume, Kafka, Pig, Hive, Spark, Storm, HBase, Oozie, and Zookeeper.
  • In-depth understanding of Hadoop architecture, including YARN and components such as HDFS, Resource Manager, Node Manager, Name Node, Data Node, and MR v1 & v2 concepts.
  • Good knowledge of creating data pipelines in Spark using Scala.
  • Good knowledge on Spark components like Spark SQL, MLlib, Spark Streaming and GraphX.
  • Hands-on experience fetching live stream data from DB2 into HBase tables using Spark Streaming and Apache Kafka.
  • Experience in developing Spark programs for batch and real-time processing, including Spark Streaming applications for real-time processing.
  • Strong knowledge of implementing data processing on Spark Core using Spark SQL, MLlib and Spark Streaming.
  • Expertise in writing Spark RDD transformations, actions, DataFrames and case classes for the required input data, and in performing data transformations using Spark Core.
  • Experience in using Spark SQL with various data sources like JSON, Parquet and Hive.
  • Hands-on experience working with Spark SQL queries and DataFrames: importing data from data sources, performing transformations and read/write operations, and saving the results to output directories in HDFS (a minimal sketch follows this list).
  • Expertise in integrating the data from multiple data sources using Kafka.
  • Knowledge of unifying data platforms using Kafka producers/consumers and implementing pre-processing with Storm topologies.
  • Worked extensively with Hadoop distributions like Cloudera and Hortonworks; good knowledge of the MapR distribution and Amazon EMR.
  • Wrote and implemented custom UDFs in Pig for data filtering.
  • Expertise in writing Hive and Pig queries for data analysis to meet business requirements.
  • Hands-on experience in using Impala for data analysis.
  • Hands-on experience in using the data ingestion tools Sqoop and Flume.
  • Experience in importing and exporting data using Sqoop from RDBMS to HDFS and vice-versa.
  • Hands-on experience in configuring and working with Flume to load the data from multiple sources directly into HDFS.
  • Worked on NoSQL databases like HBase, Cassandra and MongoDB.
  • Good knowledge in using job scheduling and monitoring tools like Oozie and Zookeeper.
  • Experience in configuring Zookeeper to coordinate the servers in clusters and to maintain data consistency.
  • Experience in configuring various topologies in Storm to ingest and process data on the fly from multiple sources and aggregate it into Hadoop as a central repository.
  • Hands-on experience with build and development tools like Maven, Ant, Log4j and JUnit.
  • Experience working with the Spring and Hibernate frameworks in Java.
  • Extensive experience with databases such as Oracle, MySQL and MS SQL, and with PL/SQL scripting.
  • Experience in using IDEs like Eclipse, NetBeans and IntelliJ.
  • Experience with web UI development using jQuery, CSS, HTML, HTML5, XHTML, JavaScript.
  • Working experience with Linux distributions like Red Hat and CentOS.
  • Experience with ETL concepts using Informatica PowerCenter, and with OLAP and OLTP.
  • Good knowledge of AWS components like EC2 instances, S3 and EMR.
  • Comprehensive knowledge of Software Development Life Cycle (SDLC).
  • Exposure to Waterfall, Agile and Scrum models.
  • Strengths include handling a variety of software systems, the capacity to learn and adapt to new technologies, and being an amicable, focused team player with strong personal, technical and communication skills.
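
As a minimal illustration of the Spark SQL usage summarized above, the sketch below reads a JSON source, applies a simple transformation and writes Parquet results back to HDFS. The paths, column names and aggregation are illustrative assumptions only, not code from any of the projects described here.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, count, lit}

object JsonToParquetSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("json-to-parquet-sketch")
      .getOrCreate()

    // Read semi-structured JSON from HDFS (path is illustrative)
    val events = spark.read.json("hdfs:///data/raw/events")

    // Simple transformation: keep active events and count them per day
    val daily = events
      .filter(col("status") === "ACTIVE")
      .groupBy(col("event_date"))
      .agg(count(lit(1)).as("event_count"))

    // Save the results to an HDFS output directory in Parquet format
    daily.write.mode("overwrite").parquet("hdfs:///data/curated/daily_event_counts")

    spark.stop()
  }
}
```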

TECHNICAL SKILLS:

Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, HBase, YARN, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper, Spark, Ambari, Elasticsearch, Solr, Mahout, MongoDB, Cassandra, Avro, Storm, Parquet, Snappy

Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR, Apache, Amazon EMR

Databases & Warehouses: Teradata, SQL Server, MySQL, Oracle

Java Space: Core Java, J2EE, JDBC, JNDI, JSP, EJB, Struts, Spring Boot, REST, SOAP, JMS

Languages: Python, Java, JRuby, SQL, HTML, DHTML, Scala, JavaScript, XML, C/C++

PROFESSIONAL EXPERIENCE:

Confidential

Sr Hadoop/Spark Developer

Responsibilities:

  • Expertise in designing and deploying Hadoop clusters and different Big Data analytic tools, including Pig, Hive, HBase, Oozie, Sqoop, Kafka, Spark and Impala, with the Cloudera distribution.
  • Worked on Cloudera distribution and deployed on AWS EC2 Instances.
  • Hands-on experience with Cloudera Hue to import data through the GUI.
  • Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
  • Performed Data Ingestion from multiple internal clients using Apache Kafka.
  • Implemented a real-time system with Kafka, Storm and Zookeeper.
  • Worked on integrating Apache Kafka with the Spark Streaming process to consume data from external REST APIs and run custom functions.
  • Developed and configured Kafka brokers to pipeline server log data into Spark Streaming.
  • Involved in performance tuning of Spark jobs using caching and taking full advantage of the cluster environment.
  • Developed Spark scripts using Scala shell commands as per the requirements.
  • Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
  • Configured Spark Streaming to receive real-time data from Kafka and store it in HDFS (see the sketch after this list).
  • Used data from HDFS, Hive and HBase for search and quick BI visualization in Kibana.
  • Built centralized logging to enable better debugging using Elasticsearch, Logstash and Kibana.
  • Efficiently handled periodic exporting of SQL data into Elasticsearch.
  • Processed input from multiple data sources in the same reducer using GenericWritable and multiple input formats.
  • Developed and scheduled Oozie workflows to run multiple Hive and Pig jobs.
  • Involved in running Hadoop streaming jobs to process terabytes of text data; worked with different file formats such as Text, SequenceFile, Avro, ORC and Parquet.
  • Configured, supported and maintained all network, firewall, storage, load balancer, operating system and software components in AWS EC2.
  • Worked on storing the database in S3 by connecting the Cassandra DB to the Amazon EMR File System.
  • Implemented Amazon EMR for Big Data processing on a Hadoop cluster of virtual servers backed by Amazon EC2 and S3.
  • Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and XML.
  • Developed the ad-hoc Hive and Pig queries required by business users to generate data metrics.
  • Developed Pig Latin scripts to extract the data from the web server output files and load it into HDFS.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Created Hive tables and wrote Hive queries for data analysis to meet business requirements; used Sqoop to import and export data from Oracle and MySQL.
  • Worked with NiFi to manage the flow of data from source to HDFS.
  • Used Impala to analyze data ingested into Hive tables.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala and Python.
  • Good knowledge of the MLlib framework for auto-suggestions.
  • Good knowledge of data manipulation, tombstones and compactions in Cassandra; well experienced in avoiding faulty writes and reads in Cassandra.
  • Performed data analysis with Cassandra using Hive External tables.
  • Involved in creating data models for customer data using Cassandra Query Language (CQL) on Cassandra clusters.
  • Automated SLA/OLA reports using Splunk.
  • Analyzed the SQL scripts and designed the solution to implement using PySpark.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
  • Used the Spark API over Hadoop YARN as the execution engine for data analytics using Hive.
  • Implemented the YARN Capacity Scheduler in various environments and tuned configurations according to application-wise job loads.
  • Configured a Continuous Integration system to execute suites of automated tests on desired frequencies using Jenkins, Maven and Git.
  • Implemented ETL standards utilizing proven data processing patterns with open-source tools like Talend and Pentaho for more efficient processing.
  • Involved in loading data from LINUX filesystem to HDFS.
  • Followed Agile Methodologies while working on the project.
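
Below is a minimal sketch of the Kafka-to-HDFS Spark Streaming flow described in this section, using the spark-streaming-kafka-0-10 direct stream API. The broker addresses, topic name, consumer group, batch interval and filter are illustrative assumptions rather than the production configuration.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object ServerLogStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-server-log-stream")
    val ssc  = new StreamingContext(conf, Seconds(30)) // 30-second micro-batches

    // Kafka consumer settings (values are illustrative)
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092,broker2:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "server-log-consumers",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("server-logs"), kafkaParams)
    )

    // Keep only error lines and land each micro-batch on HDFS as text files
    stream.map(_.value)
      .filter(_.contains("ERROR"))
      .saveAsTextFiles("hdfs:///data/logs/errors/batch")

    ssc.start()
    ssc.awaitTermination()
  }
}
```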

Environment: Hadoop, HDFS, Hive, Spark, Cloudera, AWS EC2, S3, EMR, Sqoop, Kafka, Spark MLlib, PySpark, YARN, Shell Scripting, Impala, Splunk, Scala, Pig, Cassandra, Oozie, Java, JUnit, Agile methods, Jenkins, Maven, Linux, MySQL, Elasticsearch, Kibana.

Confidential

Hadoop Developer

Responsibilities:

  • Worked in a multi-clustered Hadoop ecosystem environment.
  • Created MapReduce programs using the Java API that filter unnecessary records and find unique records based on different criteria.
  • Optimized MapReduce programs using combiners, partitioners and custom counters to deliver the best results.
  • Converted the existing relational database model to the Hadoop ecosystem.
  • Installed and configured Apache Hadoop, Hive and Pig environment.
  • Worked with Linux systems and RDBMS database on a regular basis so that data can be ingested using Sqoop.
  • Reviewed and managed all log files using HBase.
  • Designed and implemented Hive queries and functions for evaluation, filtering, loading and storing of data.
  • Created Hive tables and worked on them using HiveQL.
  • Used Apache Kafka for the Data Ingestion from multiple internal clients.
  • Developed data pipeline using Flume and Spark to store data into HDFS.
  • Performed Big Data processing using Spark, AWS and Redshift.
  • Involved in the process of data acquisition, data pre-processing and data exploration for a telecommunication project in Spark.
  • Involved in performing linear regression using Spark MLlib in Scala (a minimal sketch follows this list).
  • Continuously monitored and managed the Hadoop cluster through HDP (Hortonworks Data Platform).
  • Implemented Frameworks using Java and Python to automate the ingestion flow.
  • Loaded the CDRs from a relational DB using Sqoop and from other sources to the Hadoop cluster using Flume.
  • Implemented data quality checks and transformations using Flume Interceptor.
  • Implemented collections and the aggregation framework in MongoDB.
  • Experience in processing large volumes of data and in parallel execution of processes using Talend functionality.
  • Efficiently handled periodic exporting of SQL data into Elasticsearch.
  • Involved in loading data from UNIX file system and FTP to HDFS.
  • Designed and implemented batch jobs using MR2, Pig, Hive and Tez.
  • Used Apache Tez for highly optimized data processing.
  • Developed Hive queries to analyze the output data.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS.
  • Developed custom Pig UDFs and custom input formats for performing various levels of optimization.
  • Involved in maintaining the Hadoop clusters using Nagios server.
  • Used Pig to import semi-structured data coming from Avro files to make serialization faster.
  • Configured highly available multi-core Solr servers using replication, request handlers, analyzers and tokenizers.
  • Configured the Solr server to index different content types like HTML, PDF, XML, XLS, DOC, DOCX and other types.
  • Loaded data into HBase using bulk and non-bulk loads.
  • Used Spark for fast processing of data in Hive and HDFS.
  • Performed batch processing of data sources using Apache Spark and Elasticsearch.
  • Used Zookeeper to provide coordination services to the cluster.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with historical metrics tables.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions.
  • Worked on Reporting tools like Tableau to connect with Hive for generating daily reports.
  • Utilized Agile Scrum methodology.
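
The following is a minimal Scala sketch of the Spark MLlib linear regression mentioned above. The Hive table name, feature columns and label are illustrative assumptions, not the actual CDR schema from the project.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object CdrLinearRegressionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cdr-linear-regression-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Feature table prepared earlier in the pipeline (name and columns are illustrative)
    val cdrs = spark.table("telecom.cdr_features")

    // MLlib expects the numeric features assembled into a single vector column
    val assembler = new VectorAssembler()
      .setInputCols(Array("call_duration", "data_usage_mb", "roaming_minutes"))
      .setOutputCol("features")

    val training = assembler.transform(cdrs)
      .select(col("features"), col("monthly_charge").as("label"))

    // Plain linear regression with light regularization
    val model = new LinearRegression()
      .setMaxIter(50)
      .setRegParam(0.1)
      .fit(training)

    println(s"Coefficients: ${model.coefficients}, intercept: ${model.intercept}")
    println(s"Training RMSE: ${model.summary.rootMeanSquaredError}")

    spark.stop()
  }
}
```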

Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Scala, Kafka, Flume, Sqoop, Hortonworks, AWS, Redshift, Oozie, Zookeeper, Elasticsearch, Avro, Python, Shell Scripting, SQL, Talend, Spark, HBase, MongoDB, Linux, Solr, Ambari.

Confidential, Alexandria, VA

Hadoop Developer

Responsibilities:

  • Worked on importing data from various sources and performed transformations using Map Reduce and Hive to load data into HDFS.
  • Used Apache Ambari to communicate with Hadoop Eco System components.
  • Developed multiple MapReduce jobs in PIG and HIVE for data cleaning and pre-processing.
  • Hands on experience on HIVE queries and functions for evaluation, filtering, loading and storing of data.
  • Developed HiveQL queries, mappings, tables and external tables in Hive for analysis across different banners, and worked on partitioning, optimization, compilation and execution (a minimal sketch follows this list).
  • Created HBase tables to store variable data formats coming from different portfolios; performed real-time analytics on HBase using the Java API and REST API.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Involved in pivoting the HDFS data from rows to columns and columns to rows.
  • Moved all log/text files generated by various products into HDFS location.
  • Experienced in managing and reviewing the Hadoop log files.
  • Analyzed HBase data in Hive by creating external partitioned and bucketed tables.
  • Wrote MapReduce code that takes log files as input, parses the logs and structures them in tabular format to facilitate effective querying of the log data.
  • Experienced in joining different data sets using Pig join operations and performing queries with Pig scripts.
  • Used Oozie and Zookeeper for workflow scheduling and monitoring.
  • Actively involved in loading data from UNIX file system to HDFS.
  • Used Sqoop for importing and exporting data into HDFS and HIVE.
  • Used Flume to import the web logs.
  • Developed Shell scripts to automate routine DBA tasks.
  • Performed development reviews (code reviews) to ensure that code functionality met business requirements and that standards were followed.
  • Implemented test scripts for supporting test driven development and continuous integration.
  • Followed Agile Methodology.
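
As a rough sketch of the partitioned external-table work and HiveQL analysis described above, the snippet below creates an external partitioned table over log data already landed on HDFS and runs a typical reporting query. The HiveQL is the point of interest; it is wrapped in a small Scala/Spark driver only to keep all examples in one language, and the database, table, columns and path are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession

object WebLogHiveSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("weblog-hive-sketch")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("CREATE DATABASE IF NOT EXISTS analytics")

    // External partitioned table over tab-delimited log files already on HDFS
    // (database, table, columns and location are illustrative)
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS analytics.web_logs (
        ip     STRING,
        url    STRING,
        status INT,
        bytes  BIGINT
      )
      PARTITIONED BY (log_date STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
      LOCATION 'hdfs:///data/weblogs'
    """)

    // Register any partition directories (log_date=YYYY-MM-DD) added since the last run
    spark.sql("MSCK REPAIR TABLE analytics.web_logs")

    // Typical reporting query: daily server-error rate per URL
    spark.sql("""
      SELECT log_date, url,
             SUM(CASE WHEN status >= 500 THEN 1 ELSE 0 END) / COUNT(*) AS error_rate
      FROM analytics.web_logs
      GROUP BY log_date, url
    """).show(20, truncate = false)

    spark.stop()
  }
}
```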

Environment: HDFS, Map Reduce, Pig, Hive, Sqoop, Flume, HBase, Java, Maven, Hortonworks, Eclipse, Agile, Unix and Shell Scripting.

Confidential

Java Developer

Responsibilities:

  • Designed use cases for the Application as per the business requirements.
  • Involved in various phases of Software Development Life Cycle (SDLC).
  • Developed the user interfaces using Struts, JSP, JSTL, HTML, Ajax and JavaScript.
  • Involved in creation of a queue manager in WebSphere MQ along with the necessary WebSphere MQ objects required for use with WebSphere Data Interchange.
  • Experience with SOAP Web services and WSDL.
  • Used ANT scripts to automate application build and deployment processes.
  • Involved in design, development and modification of PL/SQL stored procedures, functions, packages and triggers to implement business rules in the application.
  • Used RESTful web services with MVC for parsing and processing XML data.
  • Developed ETL processes to load data from flat files, SQL Server and Access into the target Oracle database by applying business logic on transformation mapping for inserting and updating records when loaded.
  • Deployed web applications on Tomcat and JBoss server.
  • Involved in creating a user authentication page using Java Servlets.
  • Migrated data source passwords to encrypted passwords using the Vault tool in all the JBoss application servers.
  • Used the Spring Framework for dependency injection and integrated it with Hibernate.
  • Used JMS for asynchronous communication between different modules.
  • Actively involved in code reviews and in bug fixing.
  • Followed Agile software methodology for project development.

Environment: Java, J2EE, Servlets, HTML, XHTML, CSS, JavaScript, Struts 1.1, Spring, JSP, JMS, JBoss 4.0, Rest, SQL Server 2000, Ant, CVS, PL/SQL, MVC, Hibernate, Eclipse, Linux.

Confidential

Java Developer

Responsibilities:

  • Good understanding of installing, configuring and deploying software after gathering all the requirements needed.
  • Performed quality assurance testing.
  • Helped design the application using the Spring MVC framework, with front-end interactive page design using HTML, JSP, JSTL, CSS, JavaScript, jQuery and AJAX.
  • Implemented JavaScript, shell scripts and JSP for front-end and server-side validations.
  • Involved in writing SQL queries for fetching data from Oracle database.
  • Developed a multi-tiered web application using J2EE standards.
  • Used JIRA to track bugs.
  • Used Apache Axis to develop web services and SOAP protocol for web services communication.
  • Implemented persistence layer using Spring JDBC to store and update data in database.
  • Used Apache Tomcat application server for deploying and configuring application.
  • Used JUnit to test persistence and service tiers. Involved in unit test case preparation.
  • Hands-on experience with software configuration/change control processes and tools like Subversion (SVN), Git, CVS and ClearCase.
  • Built and deployed the application using Maven.
  • Followed Agile and Scrum methodology.
  • Involved in sprint planning, code review, and daily standup meetings to discuss the progress of the application.

Environment: HTML, Ajax, Servlets, JSP, SQL, JavaScript, CSS, XML, SOAP, Tomcat Server, Hibernate, JDBC, MAVEN, MVC, Agile, Git, JIRA, SVN.
