Hadoop/Spark Developer Resume

SUMMARY

  • I have over 7 years of professional IT experience across the complete project life cycle (design, development, testing and implementation), including over 3 years in ingestion, storage, querying, processing and analysis of Big Data, with hands-on experience in the Hadoop ecosystem (YARN, HDFS) and its components: Hive, Pig, HBase, Sqoop, Hue, Kafka, Flume, Oozie, ZooKeeper, Spark, Spark SQL and Spark Streaming.
  • Worked hands-on with Hadoop clusters on Hortonworks, AWS Elastic MapReduce and Cloudera.
  • I have hands-on experience improving the performance of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • I have working experience building Spark applications using build tools such as SBT, Maven and Gradle.
  • I have good experience with different file formats such as text, SequenceFile, RCFile, ORC, Parquet, Avro and JSON, and different compression formats such as Gzip, LZO, Bzip2 and Snappy.
  • I have good knowledge of relational databases such as MySQL and Oracle, and NoSQL databases such as HBase and MongoDB.
  • Working knowledge of UNIX/Linux systems, including experience with shell scripting.
  • Working experience handling semi-structured and unstructured data from different data sources.
  • Working experience developing MapReduce programs using combiners, map-side joins, reduce-side joins, the distributed cache, compression techniques, and multiple inputs and outputs.
  • I have working experience performing ad hoc analysis on structured data using HiveQL, joins and Hive UDFs, and good exposure to counters, shuffle and sort parameters, dynamic partitions and bucketing for performance improvement.
  • I have worked with IDEs such as Eclipse and IntelliJ IDEA.
  • I have working knowledge of Java and SQL in application development and deployment.

TECHNICAL SKILLS

Big Data Associated: HDFS, MapReduce, Pig, Hive, Sqoop, Flume, HBase, Oozie, Apache Spark, Spark SQL, Spark Streaming.

Process/Data Modeling: MS Visio, UML Diagrams and ER Studio

Cluster Manager Tools: HDP Ambari, Cloudera Manager, Hue

ETL/ELT/Databases: HBase, MongoDB, Spark SQL, MS Access, Oracle, DB2, MySQL, SQL Developer, SQL Server and Toad

Languages: C, C++, Java, PL/SQL, Python, Scala

Web-Technologies: HTML, DHTML, XML, CSS

Microsoft Technologies: ASP.NET, C#.Net, VB.Net, ADO.NET, SharePoint, Word, Excel and PowerPoint.

Operating Systems: Linux, Ubuntu, RHEL, Windows XP/7/8/10.

IDE: Eclipse and IntelliJ IDEA

PROFESSIONAL EXPERIENCE

Confidential

Hadoop/Spark Developer

Responsibilities:

  • Worked with the lambda architecture to handle and process batch and real-time data.
  • Using Sqoop, ingested data from the data warehouse into HDFS.
  • Using Kafka, collected real-time streaming and log data from web applications, along with clickstream data; analyzed part of the data with Spark Streaming and stored the rest in HDFS for future use.
  • Wrote Hive queries in Hive Query Language (HiveQL) to analyze data in the Hive warehouse, and worked with Hive tables, partitioning and bucketing.
  • Performed data profiling to identify data quality and validate rules regarding data integrity and data quality as they affect business requirements.
  • Built Spark applications using SBT.
  • Used Spark SQL to process large amounts of structured data.
  • Connected to Tableau Server to publish dashboards to a central location for portal integration.
  • Created metrics, attributes, filters, reports and dashboards, along with advanced chart types, visualizations and complex calculations to manipulate the data.

Environment: Cloudera Manager, Sqoop, Java (JDK 1.8), Hive, Spark, Spark SQL, Scala, Tableau.

Confidential

Hadoop/Spark Developer

Responsibilities:

  • Ingested flat files from local Unix file systems into HDFS and, using Sqoop, ingested structured data from legacy RDBMS systems to
  • Developed code for importing and exporting data into HDFS and Hive using Sqoop.
  • Explored Spark to improve the performance of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Used the DataFrame API in Scala to work with distributed collections of data organized into named columns, and developed predictive analytics using the Apache Spark Scala APIs.
  • Worked with Apache Hadoop ecosystem components such as HDFS, Hive and Sqoop, and with Spark and Scala.
  • Wrote Hive join queries to fetch information from multiple tables, and wrote multiple MapReduce jobs to collect output from Hive.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Used Oozie workflows to run Hive jobs.
  • Extracted files through Sqoop, placed them in HDFS and processed them.

Environment: Hadoop, Spark, HDFS, Scala, Hive, Java, Spring, MapReduce, Sqoop, Spring MVC, Big Data, Spark SQL, JDBC, Oozie, Pig, Flume

Confidential

Hadoop Developer

Responsibilities:

  • Analyzed data using different big data analytics tools, including Pig, Hive and MapReduce.
  • Created Pig Latin scripts to sort, group, join and filter the enterprise-wide data.
  • Implemented Partitioning, Dynamic Partitions, and Buckets in Hive on Avro files to meet the business requirements.
  • Implemented data integrity and data quality checks using Linux scripts.
  • Used Flume to tail application log files into HDFS.
  • Involved in scheduling Hive and Pig jobs using Oozie workflows.
  • Involved in performance tuning and memory optimization of MapReduce and Hive applications.
  • Worked on end-to-end automation of the application.
  • Responsible for continuous Build/Integration with Jenkins and deployment using XL Deploy.
  • Actively involved in code reviews, bug fixes and enhancements.

Environment: Hadoop, HDFS, MySQL, Apache Hive, Pig, MapReduce, Core Java, Shell Scripting, Eclipse, Git, Jenkins.
