Hadoop/Spark Engineer Resume
Charlotte, NC
SUMMARY:
- Active member of Q&A communities in my free time; reference links as follows:
- Stack Overflow.
- Hortonworks Community.
TECHNICAL SKILLS:
Programming Languages: Python, Scala, SQL, Core Java.
Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Impala, Hue, Sqoop, Spark, Oozie, NiFi, Kafka, Zookeeper, Cloudera, Hortonworks.
Databases: Oracle, MySQL, SQL Server, DB2, Teradata, Cloudant (CouchDB), MongoDB.
Scripting and Query Languages: UNIX shell scripting, SQL, PL/SQL, and HiveQL.
Operating Systems: Windows, UNIX, Linux distributions (CentOS, Ubuntu).
PROFESSIONAL EXPERIENCE:
Confidential, Charlotte, NC
Hadoop/Spark Engineer
Responsibilities:
- Designed and created NiFi flows to pull data from RDBMS, NoSQL, S3, HTTP ports, file, and clickstream sources, and migrated all existing Oozie jobs into NiFi.
- Worked with REST APIs to pull data, process it in near real time, and store it in NoSQL databases (see the polling sketch after this list).
- Handled updates in Hive using transactional (ACID) tables with a MERGE strategy, among other approaches (see the LLAP connector sketch after this list).
- Worked on HBase for real-time lookups, built HBase-Hive mapping tables to access HBase from Hive, and loaded HBase tables into Spark for batch analysis.
- Tuned Hive tables by analyzing running jobs and business use cases so they serve queries faster.
- Designed and implemented PySpark scripts to access HBase and Hive tables and to load files over JDBC (sketched after this list).
- Worked with Hive Interactive (LLAP) and the Spark LLAP connector to access Hive transactional tables from Spark.
- Implemented Spark scripts in Python to rebuild complex Hive jobs for better performance and to meet our SLAs.
- Created monitoring flows to identify failed and long-running jobs, disk usage, memory usage, and file counts in directories, sending alerts once a threshold is exceeded (see the monitoring sketch after this list).
- Created Oozie workflows with Sqoop, Hive, and Shell actions, scheduled via an Oozie coordinator to run incrementally.
- Used incremental and delta imports on SQL Server tables that have no primary keys, importing them into Hive for transformations and aggregations.
- Exported data to RDBMS systems using Sqoop and NiFi.
- Applied partitioning, bucketing, map-side joins, and parallel execution to optimize Hive queries and tables, and worked extensively with Avro and ORC formats (see the table-layout sketch after this list).
- Designed and implemented workload distribution in NiFi using Remote Process Groups for more parallelism.
- Used NiFi to schedule, automate, and monitor Hive, Spark, and Shell scripts.
- Migrated Hive jobs to Spark scripts written in Python so they perform better and faster.
- Worked with Kafka and Spark Streaming for near-real-time analysis (see the streaming sketch after this list).
- Worked in an Agile development environment with two-week sprint cycles, dividing and organizing tasks; participated in daily scrums and other design meetings.
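A minimal sketch of the REST-to-NoSQL polling pattern above, assuming a hypothetical endpoint and MongoDB collection; the URL, database, and field names are illustrative, not the actual project values.

```python
import time
import requests
from pymongo import MongoClient

# Hypothetical endpoint and Mongo settings; adjust for the real source.
API_URL = "https://api.example.com/events"
client = MongoClient("mongodb://localhost:27017")
collection = client["ingest"]["events"]

while True:
    resp = requests.get(API_URL, params={"limit": 500}, timeout=30)
    resp.raise_for_status()
    records = resp.json()               # assumes the API returns a JSON array
    if records:
        collection.insert_many(records) # near-real-time append into NoSQL
    time.sleep(10)                      # poll interval
```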
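The transactional-table access described above, sketched with the Hortonworks Spark LLAP connector (Hive Warehouse Connector); the table and column names are hypothetical, and the connector JAR must be on the Spark classpath.

```python
from pyspark.sql import SparkSession
from pyspark_llap import HiveWarehouseSession   # Hortonworks Spark-LLAP connector

spark = SparkSession.builder.appName("hive-acid-access").getOrCreate()
hive = HiveWarehouseSession.session(spark).build()

# Read an ACID (transactional) Hive table into Spark through LLAP.
orders = hive.executeQuery("SELECT * FROM sales.orders")   # illustrative table
orders.show(10)

# Apply updates with a MERGE against the transactional table.
hive.executeUpdate("""
    MERGE INTO sales.orders AS t
    USING sales.orders_staging AS s
    ON t.order_id = s.order_id
    WHEN MATCHED THEN UPDATE SET status = s.status
    WHEN NOT MATCHED THEN INSERT VALUES (s.order_id, s.status, s.amount)
""")
```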
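A PySpark sketch of the HBase/Hive/JDBC access pattern above; the Hive-on-HBase mapping table, JDBC URL, and credentials are placeholders.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hbase-hive-jdbc")
         .enableHiveSupport()
         .getOrCreate())

# HBase rows exposed through a Hive-on-HBase mapping table (names illustrative).
profiles = spark.table("lookup.customer_profiles_hbase")

# Plain Hive table for batch analysis.
txns = spark.table("warehouse.transactions")

# Load a SQL Server table over JDBC; URL and credentials are placeholders.
jdbc_df = (spark.read.format("jdbc")
           .option("url", "jdbc:sqlserver://dbhost:1433;databaseName=sales")
           .option("dbtable", "dbo.customers")
           .option("user", "etl_user")
           .option("password", "***")
           .load())

enriched = txns.join(profiles, "customer_id").join(jdbc_df, "customer_id")
enriched.write.mode("overwrite").saveAsTable("warehouse.enriched_transactions")
```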
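One plausible shape for the job-monitoring flow above, polling the YARN ResourceManager REST API; the host, threshold, and alert hook are assumptions.

```python
import requests

RM_URL = "http://rm-host:8088/ws/v1/cluster/apps"   # placeholder ResourceManager host
MAX_RUNTIME_MS = 2 * 60 * 60 * 1000                 # flag jobs running > 2 hours

def send_alert(message):
    # Placeholder: wire this to email/Slack/pager in a real flow.
    print("ALERT:", message)

resp = requests.get(RM_URL, params={"states": "RUNNING"}, timeout=30)
resp.raise_for_status()
apps = resp.json().get("apps") or {}                # "apps" is null when nothing runs
for app in apps.get("app", []):
    if app["elapsedTime"] > MAX_RUNTIME_MS:
        send_alert(f"Long-running job: {app['name']} ({app['id']})")
```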
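A sketch of the table-layout optimizations above, issued as HiveQL through PyHive; the schema, bucket count, and connection details are illustrative.

```python
from pyhive import hive

conn = hive.Connection(host="hive-host", port=10000, username="etl_user")  # placeholders
cur = conn.cursor()

# Partitioned, bucketed ORC table (names illustrative).
cur.execute("""
    CREATE TABLE IF NOT EXISTS warehouse.events (
        event_id BIGINT,
        user_id  BIGINT,
        payload  STRING
    )
    PARTITIONED BY (event_date STRING)
    CLUSTERED BY (user_id) INTO 32 BUCKETS
    STORED AS ORC
""")

# Session-level knobs that enable map-side joins and parallel stage execution.
cur.execute("SET hive.auto.convert.join=true")
cur.execute("SET hive.exec.parallel=true")
```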
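A near-real-time pipeline sketch using Kafka with Spark Structured Streaming (the original work may have used the DStream API); the broker list and topic name are placeholders, and the spark-sql-kafka package must be on the classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

# Subscribe to a Kafka topic (broker list and topic are placeholders).
stream = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "events")
          .load())

# Kafka delivers key/value as binary; cast to strings for processing.
events = stream.select(col("key").cast("string"), col("value").cast("string"))

query = (events.writeStream
         .format("console")      # swap for a Hive/HBase sink in production
         .outputMode("append")
         .start())
query.awaitTermination()
```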
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, LLAP, Oozie, S3, Java, UNIX, Spark Streaming, Kafka, Python, HBase, Cloudant, MongoDB, Hortonworks, NiFi.
Confidential
SQL and Hadoop Developer
Responsibilities:
- Developed SQL scripts and stored procedures performing joins, subqueries, nested queries, and Insert/Update/Delete operations on MS SQL Server tables (see the pyodbc sketch after this list).
- Wrote PL/SQL, and developed and implemented stored procedures, packages, and triggers.
- Responsible for designing advanced SQL queries, procedures, cursors, and triggers.
- Built data connections to the database using MS SQL Server.
- Worked on a project to extract data from XML files into SQL tables and generate data-file reports using SQL Server 2008 (see the XML-ingest sketch after this list).
- Installed and configured Hadoop tools using Cloudera Manager.
- Created ingestion scripts using Sqoop, with Oozie to schedule and monitor the jobs.
- Created a Hive UDF to convert timestamps from different time zones into GMT/UTC (see the conversion sketch after this list).
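A small illustration of driving the stored procedures and queries above from Python with pyodbc; the connection string, procedure name, and parameters are hypothetical.

```python
import pyodbc

# Placeholder connection string for a SQL Server instance.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=dbhost;DATABASE=sales;UID=etl_user;PWD=***"
)
cur = conn.cursor()

# Execute a stored procedure that upserts a row (name/params illustrative).
cur.execute("{CALL dbo.usp_upsert_customer (?, ?)}", (42, "Acme Corp"))
conn.commit()

# A join against a subquery, in the style of the reporting scripts.
cur.execute("""
    SELECT c.customer_id, c.name, o.total
    FROM dbo.customers c
    JOIN (SELECT customer_id, SUM(amount) AS total
          FROM dbo.orders GROUP BY customer_id) o
      ON o.customer_id = c.customer_id
""")
rows = cur.fetchall()
```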
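A sketch of the XML-to-SQL extraction above using the standard library's ElementTree with pyodbc; the file name, element layout, and target table are assumptions.

```python
import xml.etree.ElementTree as ET
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=dbhost;DATABASE=reporting;UID=etl_user;PWD=***"
)
cur = conn.cursor()

# Parse the source XML (file name and element layout are illustrative).
tree = ET.parse("orders.xml")
for order in tree.getroot().iter("order"):
    cur.execute(
        "INSERT INTO dbo.orders (order_id, customer, amount) VALUES (?, ?, ?)",
        (order.get("id"), order.findtext("customer"), order.findtext("amount")),
    )
conn.commit()
```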
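The time-zone normalization above, shown here with Spark's built-in to_utc_timestamp as an analogue of the custom Hive UDF (Spark 2.4+ accepts a per-row time-zone column); table and column names are illustrative.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import to_utc_timestamp, col

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Illustrative table with a local timestamp and its time-zone name.
events = spark.table("warehouse.events_local")

# Normalize every row to UTC/GMT, mirroring the custom Hive UDF's behavior.
utc = events.withColumn(
    "event_ts_utc",
    to_utc_timestamp(col("event_ts"), col("event_tz"))
)
utc.show(5)
```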
Environment: SQL Server, PL/SQL, MySQL, Visual Studio 2000/2005, Cloudera, Sqoop, Oozie, Hive.