Hadoop Developer Resume

Jersey City, New Jersey

SUMMARY

  • Close to 6 years of IT experience, including analysis, design, development, implementation, and maintenance of Big Data projects using the Apache Hadoop/Spark ecosystems, and design and development of web applications using Big Data technologies.
  • Experience in analysis, design, development, and integration using Big Data Hadoop ecosystem components with Cloudera, working with various file formats such as Avro and Parquet.
  • Worked with various compression techniques such as Snappy, LZO, and GZip.
  • Experience in developing customized partitioners and combiners for effective data distributions.
  • Expertise in tuning Impala queries to handle multiple concurrent jobs and out-of-memory errors for various analytics use cases.
  • Rigorously applied transformations in Spark and R programs.
  • Expertise in using built-in Hive SerDes and developing custom SerDes.
  • Developed multiple internal and external Hive tables using dynamic partitioning and bucketing.
  • Designed and developed a full-text search feature with multi-tenant Elasticsearch, fed by real-time data collected through Spark Streaming.
  • Experience in analyzing large scale data to identify new analytics, insights, trends, and relationships with a strong focus on data clustering.
  • Wrote multiple customized MapReduce Programs for various Input file formats.
  • Experience in developing NoSQL applications using MongoDB, HBase, and Cassandra.
  • Tuned multiple Spark applications for better performance.
  • Developed data pipelines for real-time use cases using Kafka, Flume, and Spark Streaming.
  • Used Sqoop to move data from HDFS and Hive to relational database systems (RDBMS) and vice versa.
  • Developed multiple Hive views for accessing HBase table data.
  • Used complex Spark SQL programs for more efficient joins and displayed the results on a Kibana dashboard.
  • Expertise in using various formats such as Text and Parquet while creating Hive tables.
  • End-to-end, hands-on experience in ETL processes and in setting up automation to load terabytes of data into HDFS.
  • Good experience in developing applications using core Java, Collections, Threads, JDBC, Servlets, JSP, Struts, Hibernate, and XML components, using various IDEs such as Eclipse 6.0 and MyEclipse.
  • Experience in SQL programming: writing queries using joins, stored procedures, triggers, and functions, and applying query optimization techniques with Oracle, SQL Server, and MySQL.
  • Excellent team worker with good interpersonal skills and leadership qualities.
  • Excellent organizational and communication skills.
  • Excellent understanding of Agile and Scrum methodologies.

PROFESSIONAL EXPERIENCE

Hadoop Developer

Confidential, Jersey City, New Jersey

Responsibilities:

  • End-to-end involvement in data ingestion, cleansing, and transformation in Hadoop.
  • Created Hive tables, and loaded and transformed large sets of structured and semi-structured data.
  • Logical implementation and interaction with HBase.
  • Developed multiple Scala/Spark jobs for data transformation and aggregation.
  • Wrote scripts to automate application deployments and configurations, monitored YARN applications, and troubleshot and resolved cluster-related system problems.
  • Optimized existing word2vec algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Produced unit tests for Spark transformations and helper methods.
  • Implemented various output formats such as SequenceFile and Parquet in MapReduce programs, and implemented multiple output formats in the same program to match the use cases.
  • Designed and developed data pipelines using Kafka, Flume, and Spark Streaming.
  • Performed benchmarking of the NoSQL databases Cassandra and HBase.
  • Hands-on experience with Lambda architectures.
  • Created a data model for structuring and storing the data efficiently. Implemented partitioning and bucketing of tables in Cassandra.
  • Implemented test scripts to support test driven development and continuous integration.
  • Converted text files into Avro and then to Parquet format so the files could be used with other Hadoop ecosystem tools.
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts in Spark, and effective and efficient joins and transformations during the ingestion process itself.
  • Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Used HBase tables to load large sets of structured, semi-structured, and unstructured data.
  • Used Impala to read, write, and query the data in HDFS from Cassandra, and configured Kafka to read and write messages from external programs.
  • Handled the importing of data from various data sources (media, MySQL) and performed transformations using Hive and MapReduce.
  • Ran Pig scripts in Local Mode, Pseudo Mode, and Distributed Mode in various stages of testing.
  • Imported and exported data between SQL Server and HDFS and Hive using Sqoop.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Created a complete processing engine, based on the Cloudera distribution, with enhanced performance.
  • Developed data pipelines using Flume, Spark, and Hive to ingest, transform, and analyze data.
  • Wrote Scaladoc-style documentation with all code.
  • Designed and modified database tables and used HBase queries to insert and fetch data from tables.
  • Developed and supported multiple Spark programs running on the cluster.
  • Prepared technical architecture and low-level design documents.
  • Tested raw data and executed performance scripts

Environment: Linux, Eclipse, JDK 1.8.0, Hadoop 2.9.0, Flume 1.7.0, HDFS, MapReduce, Pig 0.16.0, Spark 2.0, Hive 2.0, Apache Maven 3.0.3

Confidential, New York

Hadoop Developer

Responsibilities:

  • Involved in creating Hive tables, loading them with data, and writing Hive queries to process the data.
  • Developed and maintained workflow scheduling jobs in Oozie for importing data from RDBMS to Hive.
  • Implemented Partitioning, Bucketing in Hive for better organization of the data.
  • Worked with the team on fetching live stream data from DB2 into HBase tables using Spark Streaming and Apache Kafka.
  • Developed a Spark Streaming program in Scala for importing data from Kafka topics into HBase tables.
  • Developed solutions to pre-process large sets of structured and semi-structured data in different file formats (text files, Avro data files, sequence files, XML and JSON files, ORC, and Parquet).
  • Involved in the design phase for getting live event data from the database to the front-end application using the Spark ecosystem.
  • Imported data from Hive tables and ran SQL queries over the imported data and existing RDDs using Spark SQL.
  • Responsible for loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Collected the log data from web servers and integrated into HDFS using Flume.
  • Responsible for managing data coming from different sources.
  • Extracted files from CouchDB and placed them into HDFS using Sqoop, and pre-processed the data for analysis.
  • Developed subqueries in Hive.
  • Partitioned and bucketed the imported data using HiveQL.
  • Partitioned dynamically using the dynamic-partition insert feature.
  • Moved the partitioned data into different tables as per business requirements.

Environment: Eclipse, JDK 1.8.0, Hadoop 2.8, HDFS, MapReduce, Hive 2.0, HBase, Apache Maven 3.0.3

Confidential, New York

Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Experience in job management using the Fair Scheduler, and developed job processing scripts using Oozie workflows.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly for building the common learner data model, which gets the data from Kafka in near real time and persists it into Cassandra.
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation, queries, and writing data back into the OLTP system through Sqoop.
  • Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and memory tuning.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark.
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts in Spark, and effective and efficient joins and transformations during the ingestion process itself.
  • Designed, developed, and maintained data integration programs in a Hadoop and RDBMS environment with both traditional and non-traditional source systems, as well as RDBMS and NoSQL data stores, for data access and analysis.
  • Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications, with a view to implementing the former in the project.
  • Worked on a cluster of 130 nodes.
  • Worked extensively with Sqoop for importing metadata from Oracle.
  • Analyzed the SQL scripts and designed the solution for implementation in PySpark.
  • Responsible for developing a data pipeline with Amazon AWS to extract data from weblogs and store it in HDFS.
  • Involved in creating Hive tables and in loading and analyzing data using Hive queries.
  • Developed Hive queries to process the data and generate data cubes for visualization.
  • Implemented schema extraction for Parquet and Avro file formats in Hive.
  • Good experience with Talend Open Studio for designing ETL jobs for data processing.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Good experience with continuous integration of applications using Jenkins.
  • Used reporting tools such as Tableau to connect with Hive for generating daily reports.
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
