Hadoop Developer & Support Resume

SUMMARY:

  • Highly motivated, quality-driven technology professional with over 7.5 years of experience in data warehousing and its underlying concepts, ETL and related tools, and big data platforms in dynamic, fast-paced environments.
  • Over 3 years of experience working on large-scale Hadoop implementations.
  • Expertise in Hadoop architecture and its components, such as Hive, HDFS, MapReduce, Sqoop, Spark, Kafka, YARN, Oozie, and ZooKeeper.
  • Experience in Apache Spark integration (Spark SQL, Spark Streaming).
  • Hands-on experience with Apache Camel as a Kafka producer and Spark Streaming as a Kafka consumer.
  • Worked on numerous projects ingesting data from various sources into HDFS using Flume and Sqoop.
  • Developed frameworks to ingest data from Cassandra into HDFS.
  • Experience running queries with Impala and using BI tools to run ad hoc queries directly on Hadoop.
  • Experience in managing Hadoop clusters and services using Cloudera Manager.
  • Extensive experience designing, developing, documenting, and testing ETL jobs and mappings (server and parallel jobs) with various ETL tools to populate data warehouse and data mart tables.
  • Good knowledge of Teradata, Netezza, and data warehouse modeling, including star and snowflake schemas.
  • Experience working with Slowly Changing Dimensions and setting up Change Data Capture (CDC) mechanisms.
  • Experience in using Splunk for logging.
  • Worked with both Waterfall and Agile (Scrum) SDLC methodologies, with a clear understanding of all phases of the software development life cycle.
  • Worked closely with the bank's client managers and business analysts to drive technical solutions and designs and to provide development estimates for schedule and effort.
  • Dynamic, innovative, enthusiastic self-starter who works well both in groups and independently, takes the initiative to learn new technologies and tools quickly, and emphasizes delivering quality services.
  • Good experience working with teams on large implementations, including 7 years in the onshore/offshore model.

TECHNICAL SKILLS:

Software Tools and Applications: Hadoop, HDFS, Hive, Sqoop, Oozie, Autosys, Aginity Workbench, Splunk, JIRA

Specializations: Hadoop, Spark, Python, Netezza, Unix Shell Scripting, Java, Teradata, Data warehousing concepts

Technical Platforms & Databases: Windows, Hadoop, Netezza, Unix, Teradata

PROFESSIONAL EXPERIENCE:

Confidential - Newark, DE

Hadoop Developer & Support

Responsibilities:

  • Created file-to-Hadoop frameworks to ingest data from 20 different sources into Hadoop.
  • Used the Spark API on Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Designed and implemented Spark DataFrames to read data from HDFS.
  • Created tables in Hive and wrote Hive queries using Spark HiveContext.
  • Used the Oozie workflow engine to run multiple Hive jobs, and worked with job schedulers.
  • Debugged and troubleshot issues in the development and test environments.

Environment: Scala 2.11, Java 8, Cloudera Hadoop Distribution (CDH 5.6), Hive, Apache Spark 1.6.0, HDFS

Confidential

Hadoop Developer & Support

Responsibilities:

  • Loaded and transformed large sets of structured, semi-structured, and unstructured data into HDFS.
  • Worked extensively on the file-to-Hadoop utility and implemented schema extraction for Parquet and Avro file formats in Hive.
  • Created Hive tables, loaded and analyzed data using Hive queries, and implemented partitioning, dynamic partitions, and buckets in Hive.
  • Developed Hive queries to process the data and generate the data cubes for visualization.
  • Tuned the performance of Hive queries, including the level of parallelism and memory settings.
  • Worked with various compression codecs and file formats, such as Snappy, Parquet, Avro, and text.
  • Involved in Unit Testing, UAT and Performance Testing.

Environment: Cloudera Hadoop Distribution (CDH 5.6), Hive

Confidential

Scala/Spark/Java/Kafka Developer & Support

Responsibilities:

  • Analyzed the volume of the existing batch process and designed the Kafka topic and its partitioning.
  • Used the Producer API and created a custom partitioner to publish data to the Kafka topic.
  • Worked on a proof of concept (POC) for streaming data using Kafka and Spark Streaming.
  • Implemented a Kafka consumer with Spark Streaming and Spark SQL using Scala.
  • Validated the DStream, generated a new DStream from it, and saved the data to HDFS.
  • Used Broadcast variables to store the metadata of the event.
  • Involved in Unit Testing, UAT and Performance Testing.

Environment: Cloudera Hadoop Distribution (CDH 5.6), Apache Kafka 0.9, Hive, HDFS, Java 8, Scala 2.11, Spark Core 1.6.0, Spark Streaming 1.6.0, Apache Camel 2.16.x

One Hadoop

Confidential

Hadoop Developer & Support

Responsibilities:

  • Loaded and transformed large sets of structured, semi-structured, and unstructured data into HDFS.
  • Worked extensively with Sqoop to import metadata from Teradata and implemented schema extraction for Parquet and Avro file formats in Hive.
  • Created Hive tables, loaded and analyzed data using Hive queries, and implemented partitioning, dynamic partitions, and buckets in Hive.
  • Tuned the performance of Spark applications by setting the right batch interval, level of parallelism, and memory configuration.
  • Used reporting tools such as Tableau to connect to Hive, generate daily reports, and publish dashboards based on client requirements.

Confidential

Hadoop ETL Developer & Support

Responsibilities:

  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW tables and historical metrics.
  • Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System (HDFS) and Hive.
  • Used Sqoop to import data from RDBMS sources into HDFS/Hive tables and to export it back.
  • Reviewed all development queries and performed optimization and query performance tuning for Netezza using various techniques.
  • Coordinated production releases and provided implementation support.
  • Tasked with resolving production issues and supporting upgrades for existing applications.

Confidential

ETL Developer & Support

Responsibilities:

  • Analyzed the load process of all tables in the existing DB2 process, exported the data using DB2 export utilities, and loaded it into Teradata staging and permanent tables using a load operator chosen based on data volume.
  • Proficient in developing strategies for Extraction, Transformation, and Loading (ETL) mechanisms.
  • Expert in designing parallel jobs using stages such as Join, Merge, Lookup, Remove Duplicates, Filter, Data Set, Lookup File Set, Complex Flat File, Modify, Aggregator, and XML.
  • Expert in working with DataStage Manager, Designer, Administrator, and Director.
  • Proven track record in troubleshooting DataStage jobs and addressing production issues, including performance tuning and enhancements.
  • Expertise in UNIX shell scripting (Bash) for automating processes and scheduling DataStage jobs using wrappers.
  • Coordinated production releases and provided implementation support. Supported production readiness activities and oversaw continuous monitoring and support of code deployed to production.

Confidential

Developer & Support

Responsibilities:

  • Converted business requirements into technical designs, followed by code development and unit testing.
  • Worked extensively on the Netezza framework on the Linux platform and contributed to building a customized ELT framework using shell scripting.
  • Used NZSQL and NZLOAD scripts for day-to-day loading and migration activities.
  • Migrated the existing Teradata scripts to Netezza (BTEQ to NZSQL), keeping the business logic the same and validating the results across both systems.
  • Coordinated production releases and provided implementation support. Supported production readiness activities and oversaw continuous monitoring and support of code deployed to production.
  • Tasked with resolving production issues and supporting upgrades for existing applications.