
Hadoop/Spark Developer Resume


Cranston, RI

SUMMARY

  • 8+ years of overall experience in data analysis, data modeling and implementation of enterprise-class systems spanning Big Data, data integration, object-oriented programming and advanced analytics.
  • Excellent understanding of Hadoop architecture and daemons such as HDFS, NameNode, DataNode, JobTracker and TaskTracker.
  • Hands-on experience in installing, configuring and using Hadoop ecosystem components like Hadoop, HDFS, MapReduce programming, Hive, Pig, Sqoop, HBase, Impala, Solr, Elasticsearch, Oozie, ZooKeeper, Kafka, Spark and Cassandra with Cloudera and Hortonworks distributions.
  • Created custom Solr query segments to improve search match relevance.
  • Used Solr Search & MongoDB for querying and storing data.
  • Extracted data from Cassandra through Sqoop, placed it in HDFS, and processed it.
  • Involved in converting Cassandra/Hive/SQL queries into Spark transformations using RDDs and Scala.
  • Analyzed the Cassandra/SQL scripts and designed the solution to implement them using Scala.
  • Expertise in Big Data technologies and Hadoop ecosystem tools like Flume, Sqoop, HBase, ZooKeeper, Oozie, MapReduce, Hive, Pig and YARN.
  • Extracted and updated data in MongoDB using the MongoDB import and export command-line utilities (mongoimport/mongoexport).
  • Developed collections in MongoDB and performed aggregations on the collections.
  • Hands-on experience in installation, configuration, management and deployment of Big Data solutions and the underlying infrastructure of Hadoop clusters using Cloudera and Hortonworks distributions.
  • In - depth Knowledge of Data Structures, Design and Analysis of Algorithms.
  • Good understanding of Data Mining and Machine Learning techniques.
  • Hands-on experience in various Hadoop distributions: IBM BigInsights, Cloudera, Hortonworks and MapR.
  • In-depth understanding of Spark architecture including Spark Core, Spark SQL, DataFrames, Spark Streaming and Spark MLlib.
  • Expertise in writing Spark RDD transformations, actions, DataFrames and case classes for the required input data, and performed the data transformations using Spark Core (a minimal sketch follows this list).
  • Expertise in developing Real-Time Streaming Solutions using Spark Streaming.
  • Proficient in big data ingestion and streaming tools like Flume, Sqoop, Spark, Kafka and Storm.
  • Hands-on experience in developing MapReduce programs using Apache Hadoop for analyzing Big Data.
  • Expertise in implementing ad-hoc MapReduce programs using Pig scripts.
  • Experience in importing and exporting data from RDBMS to HDFS, Hive tables and HBase by using Sqoop.
  • Experience in importing streaming data into HDFS using Flume sources and sinks, and transforming the data using Flume interceptors.
  • Exposure to Apache Kafka for developing data pipelines of logs as streams of messages using producers and consumers.
  • Experience in integrating Apache Kafka with Apache Storm and created Storm data pipelines for real time processing.
  • Knowledge of handling Kafka clusters, and created several topologies to support real-time processing requirements.
  • Hands-on experience migrating complex MapReduce programs into Apache Spark RDD transformations.
  • Design and programming experience in developing Internet applications using Java, J2EE, JSP, MVC, Servlets, Struts, Hibernate, JDBC, JSF, EJB, XML, AJAX and web-based development tools.
  • Experience with Enterprise JavaBeans (EJB) components; technical expertise and demonstrated high standards of skill in J2EE frameworks like Struts (MVC framework).
  • Proficient in using various IDEs like RAD, Eclipse, and JDeveloper.
  • Extensive experience in programming, deploying and configuring popular middle-tier J2EE application servers like BEA WebLogic 8.1, IBM WebSphere 5.0, open-source Apache Tomcat and JBoss.
  • Good experience in client web technologies like HTML, CSS, JavaScript, AJAX, Servlets, JSP, JSON, XML, JSF and AWS.
  • Experience in Software Development Life Cycle (SDLC), OOA, OOD and OOP through implementation and testing.
  • Experience in working with Hadoop in standalone, pseudo-distributed and fully distributed modes.
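A minimal sketch of the Spark-with-Scala pattern referenced above: parsing delimited records into a case class, then applying RDD transformations and actions. The record layout, field names and input path are hypothetical placeholders, not taken from any project.

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical record layout used only for illustration
    case class Order(orderId: String, customerId: String, amount: Double)

    object OrderSummary {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("OrderSummary"))

        // Transformations: parse raw delimited lines into case-class records
        val orders = sc.textFile("hdfs:///data/orders")   // assumed input path
          .map(_.split(','))
          .filter(_.length == 3)
          .map(f => Order(f(0), f(1), f(2).toDouble))

        // Transformation: total order amount per customer
        val totals = orders
          .map(o => (o.customerId, o.amount))
          .reduceByKey(_ + _)

        // Action: materialize a small sample on the driver
        totals.take(10).foreach(println)

        sc.stop()
      }
    }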

TECHNICAL SKILLS

Big Data Ecosystems: Hadoop, MapReduce, HDFS, ZooKeeper, Hive, Pig, Sqoop, Oozie, Flume, YARN, Spark

Database Languages: SQL, PL/SQL, Oracle

Programming Languages: Java, Scala

Frameworks: Spring, Hibernate, JMS

Scripting Languages: JSP, Servlets, JavaScript, XML, HTML, Python

Web Services: RESTful web services

Databases: RDBMS, HBase, Cassandra

IDEs: Eclipse, IntelliJ

Platforms: Windows, Linux, Unix

Application Servers: Apache Tomcat, WebSphere, WebLogic, JBoss

Methodologies: Agile, Waterfall

ETL Tools: Talend

PROFESSIONAL EXPERIENCE

Hadoop/Spark Developer

Confidential - Cranston, RI

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Developed job processing scripts using Oozie workflow.
  • Created Cassandra tables to store data arriving in various formats from different sources.
  • Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
  • Configured Spark Streaming to get ongoing information from Kafka and stored the streamed data in HDFS.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the data received from Kafka and persisted it into Cassandra (a minimal sketch follows this list).
  • Designed, developed data integration programs in a Hadoop environment with NoSQL data store Cassandra for data access and analysis.
  • Optimized HiveQL and Pig scripts by using execution engines such as Tez and Spark.
  • Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
  • Used various Spark transformations and actions for cleansing the input data.
  • Worked with and learned a great deal about AWS cloud services like EC2, S3, EBS, RDS and VPC.
  • Migrated an existing on-premises application to AWS; used AWS services like EC2 and S3 for processing and storing small data sets, and maintained the Hadoop cluster on AWS EMR.
  • Used DataStax Spark-Cassandra connector to load data into Cassandra and used CQL to analyze data from Cassandra tables for quick searching, sorting and grouping.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
  • Implemented Elasticsearch on top of the Hive data warehouse platform.
  • Practical experience in defining queries on JSON data using the Query DSL provided by Elasticsearch.
  • Experience in improving search focus and quality in Elasticsearch by using aggregations.
  • Worked with Elastic MapReduce (EMR) and set up the Hadoop environment on AWS EC2 instances.
  • Experience in working with Hadoop clusters using Hortonworks distributions.
  • Experienced in NoSQL databases like HBase and MongoDB, and with the Hortonworks distribution of Hadoop.
  • Load and transform large sets of structured, semi structured and unstructured data.
  • Used Spark Streaming APIs to perform transformations and actions on the fly for building a common learner data model that gets data from Kafka in near real time and persists it to Cassandra.
  • Consumed JSON messages using Kafka and processed the JSON files using Spark Streaming to capture UI updates.
  • Experienced in writing live Real-time Processing and core jobs using Spark Streaming with Kafka as a data pipe-line system.
  • Configured Spark Streaming to consume Kafka streams and store the resulting data in HDFS.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
  • Analyzed large data sets by running Hive queries and Pig scripts.
  • Helped the team increase the cluster size by adding nodes.
  • Worked extensively with Sqoop for importing metadata from Oracle.
  • Assisted in exporting analyzed data to relational databases using Sqoop.
  • Involved in creating Hive tables, and loading and analyzing data using Hive queries.
  • Designed, developed and maintained data integration programs in a Hadoop and RDBMS environment, with both traditional and non-traditional source systems as well as RDBMS and NoSQL data stores, for data access and analysis. Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
  • Wrote Hive queries and UDFs.
  • Developed Hive queries to process the data and generate the data cubes for visualization.
  • Created Pig Latin scripts to sort, group, join and filter the enterprise-wide data.
  • Implemented partitioning, dynamic partitions and buckets in Hive (sketched after the Environment line below).
  • Gained experience in managing and reviewing Hadoop log files.
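A minimal sketch of the Kafka-to-Cassandra streaming path described in this role, assuming the spark-streaming-kafka-0-10 integration and the DataStax spark-cassandra-connector. The broker address, topic, keyspace, table and event layout are hypothetical placeholders.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010._
    import com.datastax.spark.connector._

    // Hypothetical event layout; field names must match the Cassandra columns
    case class Event(id: String, payload: String)

    object KafkaToCassandra {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("KafkaToCassandra")
          .set("spark.cassandra.connection.host", "cassandra-host")  // assumed host
        val ssc = new StreamingContext(conf, Seconds(10))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "kafka-broker:9092",               // assumed broker
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "learner-model")

        // Direct stream from an assumed "events" topic
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          LocationStrategies.PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams))

        // Transform each message and persist every micro-batch to Cassandra
        stream.map(r => r.value.split('|'))
          .filter(_.length == 2)
          .map(f => Event(f(0), f(1)))
          .foreachRDD(_.saveToCassandra("analytics", "events"))      // assumed keyspace/table

        ssc.start()
        ssc.awaitTermination()
      }
    }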

Environment: Hadoop, MapReduce, Sqoop, HDFS, HBase, Hive, Pig, Oozie, Spark, Kafka, Cassandra, AWS, ElasticSearch, Java, Oracle 10g, MySQL, Ubuntu, HDP.
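A minimal sketch of the Hive partitioning and dynamic-partition work listed above, issued here through a Hive-enabled SparkSession purely to keep the examples in Scala (in practice this would often be plain HiveQL, and bucketed-table DDL is omitted). Table and column names are hypothetical.

    import org.apache.spark.sql.SparkSession

    object HivePartitioningSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HivePartitioningSketch")
          .enableHiveSupport()          // reuse the existing Hive metastore
          .getOrCreate()

        // Partitioned target table (hypothetical layout)
        spark.sql(
          """CREATE TABLE IF NOT EXISTS sales_part (
            |  order_id STRING, amount DOUBLE)
            |PARTITIONED BY (sale_date STRING)
            |STORED AS ORC""".stripMargin)

        // Allow dynamic-partition inserts
        spark.sql("SET hive.exec.dynamic.partition=true")
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

        // Dynamic-partition insert from an assumed staging table
        spark.sql(
          """INSERT INTO TABLE sales_part PARTITION (sale_date)
            |SELECT order_id, amount, sale_date FROM sales_staging""".stripMargin)

        spark.stop()
      }
    }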

Hadoop Developer

Confidential - Pittsburgh, PA

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different big data analytic tools including Pig, Hive, HBase and Sqoop.
  • Developed simple and complex MapReduce programs in Java for data analysis on different data formats.
  • Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Experience in implementing Spark RDDs in Scala.
  • Implemented different machine learning techniques using the Weka machine learning library.
  • Developed Spark programs to analyze reports using machine learning models.
  • Good exposure to development with HTML, Bootstrap and Scala.
  • Worked in AWS environment for development and deployment of custom Hadoop applications.
  • Strong experience in working with Elastic MapReduce (EMR) and setting up environments on Amazon EC2 instances.
  • Ability to spin up different AWS instances, including EC2-Classic and EC2-VPC, using CloudFormation templates.
  • Collected data using Spark Streaming from an AWS S3 bucket in near real time, performed the necessary transformations and aggregations to build the data model, and persisted the data in HDFS (sketched after the Environment line below).
  • Experienced in implementing static and dynamic partitioning in hive.
  • Experience in customizing the MapReduce framework at different levels, such as input formats, data types and partitioners.
  • Experience in using Flume to efficiently collect, aggregate and move large amounts of log data.
  • Developed custom-Writable MapReduce Java programs to load web server logs into HBase using Flume.
  • Was responsible for importing the data (mostly log files) from various sources into HDFS using Flume.
  • Implemented an API tool to handle streaming data using Flume.
  • Created Oozie workflows to automate data ingestion using Sqoop and to process incremental log data ingested by Flume using Pig.
  • Involved in migrating Hive queries and UDFs to Spark SQL (see the sketch after this list).
  • Extensively used Sqoop to import/export data between RDBMS and Hive tables, performed incremental imports, and created Sqoop jobs to track the last saved value.
  • Created Oozie workflows to run multiple Hive jobs.
  • Created a customized BI tool for the management team to perform query analytics using HiveQL.
  • Involved in data migration from an Oracle database to MongoDB.
  • Involved in migrating tables from RDBMS into Hive tables using Sqoop, and later generated visualizations from them using Tableau.
  • Created Hive generic UDFs to process business logic that varies based on policy.
  • Experienced with different compression techniques, such as LZO and Snappy, to save space and optimize data transfer over the network for Hive tables.
  • Created Hive tables, loaded them with data and wrote Hive queries, which run internally as MapReduce jobs.
  • Worked on custom Pig loaders and storage classes to work with a variety of data formats such as JSON and XML.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Designed the ETL process and created the high-level design document, including the logical data flows, source data extraction process, database staging, job scheduling and error handling.
  • Developed and designed ETL jobs using Talend Integration Suite in Talend 5.2.2.
  • Created ETL Mapping with Talend Integration Suite to pull data from Source, apply transformations, and load data into target database.
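A minimal sketch of migrating a Hive query and a small business-rule UDF onto Spark SQL, as referenced above. The table, columns and rule are hypothetical placeholders.

    import org.apache.spark.sql.SparkSession

    object HiveToSparkSql {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HiveToSparkSql")
          .enableHiveSupport()        // read the existing Hive metastore tables
          .getOrCreate()

        // Business-rule UDF that previously lived as a Hive UDF (hypothetical rule)
        spark.udf.register("risk_band", (policyType: String, premium: Double) =>
          if (policyType == "AUTO" && premium > 1000.0) "HIGH" else "STANDARD")

        // The original Hive query, now executed by Spark's engine
        val result = spark.sql(
          """SELECT policy_id, risk_band(policy_type, premium) AS band
            |FROM policies
            |WHERE year = 2016""".stripMargin)

        result.show(10)
        spark.stop()
      }
    }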

Environment: Hadoop, Cloudera (CDH 4), HDFS, Hive, HBase, Flume, Sqoop, Pig, Kafka, Java, Eclipse, Teradata, Tableau, Talend, MongoDB, Ubuntu, UNIX, and Maven.
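A minimal sketch of the near-real-time S3 pickup described above: Spark Streaming watches an S3 prefix for newly landed files, aggregates each micro-batch and persists the result to HDFS. The bucket, paths and record layout are hypothetical, and S3A credentials are assumed to be configured elsewhere.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object S3StreamToHdfs {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(
          new SparkConf().setAppName("S3StreamToHdfs"), Seconds(60))

        // Each new object under the prefix becomes part of the next micro-batch
        val lines = ssc.textFileStream("s3a://example-bucket/incoming/")

        // Transformation + aggregation: count records per event type (field 0)
        val counts = lines
          .map(_.split(','))
          .filter(_.nonEmpty)
          .map(f => (f(0), 1L))
          .reduceByKey(_ + _)

        // Persist each batch of aggregates to HDFS
        counts.saveAsTextFiles("hdfs:///data/event_counts/batch")

        ssc.start()
        ssc.awaitTermination()
      }
    }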

Hadoop Developer

Confidential - Houston, TX

Responsibilities:

  • Worked with terabytes of structured and unstructured data (240 TB including replication) coming in from multiple web sources.
  • Designed and developed entire pipeline from data ingestion to reporting tables.
  • Scrutinized Hadoop Log Files, executed performance scripts.
  • Used Cloudera Manager to monitor and manage Hadoop Cluster.
  • Created HBase tables for random reads/writes by the MapReduce programs.
  • Developed Oozie workflows to schedule and manage Sqoop, Hive and Pig jobs, orchestrating the Extract-Transform-Load (ETL) process.
  • Good understanding of Amazon web services like Elastic MapReduce (EMR), EC2.
  • Working knowledge of using MapR and Teradata in unison to optimize for high availability (HA).
  • Installed, configured and participated in Hadoop MapReduce, Pig, Hive, Oozie, Sqoop environment.
  • Involved in loading data from the Linux file system into HDFS, importing and exporting data between HDFS and Hive using Sqoop, and implementing partitioning, dynamic partitions and buckets in Hive.
  • Extended Hive and Pig core functionality by writing custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs) and User Defined Aggregate Functions (UDAFs) (a minimal UDF sketch follows this list).
  • Experienced in running Hadoop streaming jobs to process terabytes of JSON format data.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Executed Hive queries on Parquet tables stored in Hive to perform data analysis and meet the business requirements. Extracted data from Oracle and Teradata databases into HDFS using Sqoop.
  • Worked on streaming log data into HDFS from web servers using Flume.
  • Performed data cleaning, integration, transformation and reduction by developing MapReduce jobs in Java for data mining.
  • Imported and exported data into MapR-FS from various web servers.
  • Created Hive tables, loaded data into them and wrote customized Hive queries, which internally run as MapReduce jobs.
  • Performed Map-Side joins and Reduce-Side joins for large tables.
  • Defined schemas, created new relations, and performed joins, sorting and filtering on large data sets using Pig's JOIN and GROUP operators.
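A minimal sketch of a custom Hive UDF of the kind described above, written in Scala here only to keep the examples in one language (such UDFs are more commonly written in Java). The class name, function name and registration statements are hypothetical.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Simple Hive UDF that trims and upper-cases a string column; the class
    // compiles to ordinary JVM bytecode that Hive can load from a jar.
    class NormalizeCode extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) null
        else new Text(input.toString.trim.toUpperCase)
      }
    }

    // Registration from the Hive CLI (shown as a comment; jar path is hypothetical):
    //   ADD JAR /path/to/udfs.jar;
    //   CREATE TEMPORARY FUNCTION normalize_code AS 'NormalizeCode';
    //   SELECT normalize_code(product_code) FROM products;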

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, Oozie, Cloudera CDH4.5, Cloudera CDH5.1.3, SQL, LINUX, Java, J2EE, Web services, PostgreSQL, DB2.

Java Developer

Confidential - IN

Responsibilities:

  • Responsible for requirement gathering, requirement analysis, defining scope, and design.
  • Developed the use cases, class diagrams and sequence diagrams using Rational Rose.
  • Developed the user interface using JSP and HTML.
  • Wrote server-side programs using Servlets.
  • Used JavaScript for client-side validation.
  • Used HTML and AWT with Java applets to create web pages.
  • Responsible for database design, and developed stored procedures and triggers to improve performance.
  • Used Eclipse IDE for all coding in Java, Servlets and JSPs.
  • Coordinated with the QA lead on the development of the test plan, test cases, test code and actual testing; responsible for defect allocation and ensuring that the defects were resolved.
  • Used Flex styles and CSS to manage the look and feel of the application.
  • Deployed the application on the WebSphere application server.

Environment: Java 2.0, Eclipse, Apache Tomcat Web Server, JSP, JavaScript, AWT, Servlets, JDBC, HTML, FrontPage 2000, Oracle, Windows NT, CVS.

Java Developer

Confidential - IN

Responsibilities:

  • Set up the environment, including deploying the application and maintaining the web server.
  • Analyzed defects and provided fixes.
  • Developed Servlets and Java Server Pages (JSP) to route submittals to the EJB components; JavaScript handled the front-end validations.
  • Wrote Korn Shell build scripts for configuring and deployment of GUI application on UNIX machines.
  • Created Session Beans and controller Servlets for handling HTTP requests from JSP pages.
  • Used the Eclipse IDE for developing the application.
  • Used Session Facade, Singleton design patterns.
  • Developed XML files using XPath, XSLT, XPointer, DTDs and schemas, and parsed them using both SAX and DOM parsers.
  • Designed and developed XSL stylesheets using XSLT to transform XML and display the customer information on screen for the user, as well as for processing.
  • Extensively used ClearCase, the version control tool.

Environment: Java, J2EE, Oracle, RMI, ClearCase, JDBC, UNIX, JUnit, Eclipse, WebSphere 5.0, Struts, XML, XSLT, XPath, XHTML, CSS, HTTP.
