Hadoop/Spark Engineer Resume

Charlotte, NC

SUMMARY:

  • Active member of Q&A communities in my free time, including:
  • Stack Overflow.
  • Hortonworks Community.

TECHNICAL SKILLS:

Programming Languages: Python, Scala, SQL, Core Java.

Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Impala, Hue, Sqoop, Spark, Oozie, NiFi, Kafka, ZooKeeper, Cloudera, Hortonworks.

Databases: Oracle, MySQL, SQL Server, DB2, Teradata, Cloudant (CouchDB), MongoDB.

Scripting and Query Languages: UNIX shell scripting, SQL, PL/SQL, and HiveQL.

Operating Systems: Windows, UNIX, Linux distributions (CentOS, Ubuntu).

PROFESSIONAL EXPERIENCE:

Confidential, Charlotte, NC

Hadoop/Spark Engineer

Responsibilities:

  • Designed and created NiFi flows to pull data from RDBMS, NoSQL, S3, HTTP port, file, and clickstream sources, and migrated all existing Oozie jobs into NiFi.
  • Worked with REST APIs to pull data, process it in near real time, and store it in NoSQL databases.
  • Handled updates in Hive using transactional tables with a merge strategy, among other approaches (see the merge sketch after this list).
  • Worked on HBase for real-time lookups, built HBase-Hive tables to access HBase tables from Hive, and loaded HBase tables into Spark for batch analysis.
  • Tuned Hive tables by analyzing running jobs and business use cases so queries are served faster.
  • Designed and implemented PySpark scripts to access HBase and Hive tables and to load files using JDBC (see the JDBC sketch after this list).
  • Worked on Hive Interactive and the Spark LLAP connector to access Hive transactional tables from Spark.
  • Implemented Spark scripts in Python to rebuild complex Hive jobs for better performance and to meet our SLAs.
  • Created monitoring flows to identify failed and long-running jobs, disk usage, memory usage, and files in a directory, and to send alerts once a threshold is exceeded (see the alerting sketch after this list).
  • Created Oozie workflows using Sqoop, Hive, and Shell actions, scheduled with the Oozie coordinator to run incrementally.
  • Used incremental and delta imports on SQL Server tables with no primary keys, importing them into Hive for transformations and aggregations.
  • Exported data to RDBMS systems using Sqoop and NiFi.
  • Used partitioning, bucketing, map-side joins, and parallel execution to optimize Hive queries and tables, and worked extensively with Avro and ORC formats (see the table-layout sketch after this list).
  • Designed and implemented workload distribution in NiFi using Remote Process Groups for more parallelism.
  • Used NiFi to schedule, automate, and monitor Hive, Spark, and Shell scripts.
  • Migrated Hive jobs to Spark scripts written in Python so they run better and faster.
  • Worked on Kafka and Spark Streaming for near-real-time analysis (see the streaming sketch after this list).
  • Worked in an Agile development environment in two-week sprint cycles, dividing and organizing tasks; participated in daily scrums and other design meetings.
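
A minimal sketch of the Hive merge approach above, assuming PyHive as the client and a transactional (ORC, bucketed) target table; the host, table, and column names are hypothetical placeholders:

    from pyhive import hive

    # Connect to HiveServer2 (hypothetical host).
    conn = hive.connect(host="hiveserver2-host", port=10000, database="default")
    cursor = conn.cursor()

    # Upsert staged changes into the ACID table in a single MERGE statement.
    cursor.execute("""
        MERGE INTO customers AS t
        USING customer_updates AS s
        ON t.customer_id = s.customer_id
        WHEN MATCHED THEN UPDATE SET name = s.name, city = s.city
        WHEN NOT MATCHED THEN INSERT VALUES (s.customer_id, s.name, s.city)
    """)

    cursor.close()
    conn.close()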
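
A minimal PySpark sketch of the JDBC load above; the JDBC URL, table names, and credentials are hypothetical, and the JDBC driver jar is assumed to be on the classpath:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("jdbc-hive-load")
             .enableHiveSupport()
             .getOrCreate())

    # Read a source table over JDBC.
    orders = (spark.read.format("jdbc")
              .option("url", "jdbc:sqlserver://db-host:1433;databaseName=sales")
              .option("dbtable", "dbo.orders")
              .option("user", "etl_user")
              .option("password", "***")
              .load())

    # Join against an existing Hive table and write the result back to Hive.
    customers = spark.table("default.customers")
    (orders.join(customers, "customer_id")
           .write.mode("overwrite")
           .saveAsTable("default.enriched_orders"))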
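
The monitoring flows themselves were built in NiFi; as a hedged illustration of the same threshold-alert idea in Python, assuming the hdfs CLI is on the PATH, with hypothetical paths, addresses, and threshold:

    import subprocess
    import smtplib
    from email.message import EmailMessage

    THRESHOLD_BYTES = 500 * 1024 ** 3  # alert past 500 GB (example value)

    # `hdfs dfs -du -s` prints "<size> <disk-usage> <path>" for the directory.
    out = subprocess.check_output(
        ["hdfs", "dfs", "-du", "-s", "/data/landing"], text=True)
    used = int(out.split()[0])

    if used > THRESHOLD_BYTES:
        msg = EmailMessage()
        msg["Subject"] = f"HDFS usage alert: /data/landing at {used} bytes"
        msg["From"] = "alerts@example.com"
        msg["To"] = "oncall@example.com"
        msg.set_content("Directory usage exceeded the configured threshold.")
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(msg)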
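
A sketch of the table layout behind those optimizations, again assuming PyHive; the schema, bucket count, and session settings are hypothetical examples:

    from pyhive import hive

    cursor = hive.connect(host="hiveserver2-host", port=10000).cursor()

    # Partition by date and bucket by customer id, so date filters prune
    # partitions and joins on customer_id can use bucketed map-side joins.
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS sales_orc (
            order_id BIGINT,
            customer_id BIGINT,
            amount DOUBLE
        )
        PARTITIONED BY (order_date STRING)
        CLUSTERED BY (customer_id) INTO 32 BUCKETS
        STORED AS ORC
    """)

    # Session settings that enable the optimizations named above.
    cursor.execute("SET hive.optimize.bucketmapjoin=true")
    cursor.execute("SET hive.exec.parallel=true")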
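
A minimal PySpark Structured Streaming sketch of the Kafka pipeline (the project may equally have used the older DStream API); the broker, topic, and paths are hypothetical, and the spark-sql-kafka package is assumed to be available:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

    # Subscribe to a Kafka topic and decode the message payload.
    events = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")
              .option("subscribe", "clickstream")
              .load()
              .select(col("value").cast("string").alias("event")))

    # Land events continuously, with checkpointing for fault tolerance.
    query = (events.writeStream
             .format("parquet")
             .option("path", "/data/clickstream")
             .option("checkpointLocation", "/checkpoints/clickstream")
             .start())
    query.awaitTermination()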

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, LLAP, Oozie, S3, Java, Unix, Spark Streaming, Kafka, Python, HBase, Cloudant, MongoDB, Hortonworks, NiFi.

Confidential

SQL and Hadoop Developer

Responsibilities:

  • Developed SQL scripts/stored procedures performing joins, subqueries, nested queries, and Insert/Update/Delete operations on MS SQL database tables.
  • Wrote PL/SQL, developing and implementing stored procedures, packages, and triggers.
  • Responsible for designing advanced SQL queries, procedures, cursors, and triggers.
  • Built data connections to the database using MS SQL Server.
  • Worked on a project to extract data from XML files into SQL tables and generate data-file reports using SQL Server 2008.
  • Installed and configured Hadoop tools using Cloudera Manager.
  • Created ingestion scripts using Sqoop, with Oozie to schedule and monitor the jobs.
  • Created Hive UDFs to convert timestamps from different time zones into UTC (see the sketch after this list).
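
The original conversion was a Hive UDF; a hedged PySpark re-creation of the same timezone-to-UTC logic, with hypothetical table and column names (zoneinfo requires Python 3.9+):

    from datetime import datetime
    from zoneinfo import ZoneInfo
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    def to_utc(ts: str, tz: str) -> str:
        # Interpret 'YYYY-MM-DD HH:MM:SS' in time zone `tz`, return it in UTC.
        local = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(
            tzinfo=ZoneInfo(tz))
        return local.astimezone(ZoneInfo("UTC")).strftime("%Y-%m-%d %H:%M:%S")

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    to_utc_udf = udf(to_utc, StringType())

    # Add a normalized UTC column alongside the original local timestamp.
    df = spark.table("events").withColumn(
        "event_time_utc", to_utc_udf("event_time", "source_tz"))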

Environment: SQL Server, PL/SQL, MySQL, Visual Studio 2000/2005, Cloudera, Sqoop, Oozie, Hive.
