Hadoop Developer & Support Resume

SUMMARY:

Highly motivated, quality-driven technology professional with over 7.5 years of experience in data warehousing, ETL tools, and big data platforms in dynamic, fast-paced environments.
Over 3 years of experience working on large-scale Hadoop implementations.
Expertise in Hadoop architecture and its components, such as Hive, HDFS, MapReduce, Sqoop, Spark, Kafka, YARN, Oozie, and ZooKeeper.
Experience in Apache Spark integration (Spark SQL, Spark Streaming).
Hands-on experience with Apache Camel as a Kafka producer and Spark Streaming as a Kafka consumer.
Worked on numerous projects ingesting data from various sources into HDFS using Flume and Sqoop.
Developed frameworks to ingest data from Cassandra into HDFS.
Experience running queries with Impala and using BI tools to run ad-hoc queries directly on Hadoop.
Experience in managing Hadoop clusters and services using Cloudera Manager.
Excellent experience in designing, developing, documenting, and testing ETL jobs and mappings, including server and parallel jobs, using various ETL tools to populate tables in data warehouses and data marts.
Good knowledge of Teradata, Netezza, and data warehouse modeling, including star and snowflake schemas.
Experience working with Slowly Changing Dimensions and setting up Change Data Capture (CDC) mechanisms.
Experience in using Splunk for logging.
Worked with both Waterfall and Agile (Scrum) SDLC methodologies, with a clear understanding of all phases of the Software Development Life Cycle.
Worked closely with the bank's client managers and business analysts to drive technical solutions and designs and to provide development estimates for schedule and effort.
Dynamic, innovative, and enthusiastic self-starter, able to work in groups as well as independently, with the initiative to learn new technologies and tools quickly and an emphasis on delivering quality services.
Good experience working with teams on large implementations. 7 years of experience working in an onshore/offshore model.

TECHNICAL SKILLS:

Software Tools and Applications: Hadoop, HDFS, Hive, Sqoop, Oozie, Autosys, Aginity Workbench, Splunk, JIRA

Specializations: Hadoop, Spark, Python, Netezza, Unix Shell Scripting, Java, Teradata, Data warehousing concepts

Technical Platforms & Databases: Windows, Hadoop, Netezza, Unix, Teradata

PROFESSIONAL EXPERIENCE:

Confidential - Newark, DE

Hadoop Developer & Support

Responsibilities:

Created file-to-Hadoop frameworks to ingest data from 20 different sources into Hadoop.
Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
Designed and implemented Spark DataFrames to read data from HDFS.
Created tables in Hive and wrote Hive queries using Spark HiveContext.
Worked with the Oozie workflow engine and schedulers to run multiple Hive jobs.
Involved in debugging and troubleshooting issues in development and test environments.

Environment: Scala 2.11, Java 8, Cloudera Hadoop Distribution (CDH5.6), Hive, Apache Spark 1.6.0, HDFS

Confidential

Hadoop Developer & Support

Responsibilities:

Loaded and transformed large sets of structured, semi-structured, and unstructured data into HDFS.
Worked extensively on the file-to-Hadoop utility and implemented schema extraction for Parquet and Avro file formats in Hive.
Created Hive tables, loaded and analyzed data using Hive queries, and implemented partitioning, dynamic partitions, and buckets in Hive.
Developed Hive queries to process data and generate data cubes for visualization.
Tuned the performance of Hive queries by setting the correct level of parallelism and tuning memory.
Worked with various file formats and compression codecs, including Parquet, Avro, text, and Snappy.
Involved in Unit Testing, UAT and Performance Testing.

Environment: Cloudera Hadoop Distribution (CDH5.6), Hive

Confidential

Scala/Spark/Java/Kafka Developer & Support

Responsibilities:

Analyzed the volume of the existing batch process and designed the Kafka topic and its partitions.
Used the Producer API and created a custom partitioner to publish data to the Kafka topic.
Worked on a POC for streaming data using Kafka and Spark Streaming.
Implemented a Kafka consumer with Spark Streaming and Spark SQL using Scala.
Validated the DStream, generated a new DStream, and saved the data to HDFS.
Used broadcast variables to store event metadata.
Involved in Unit Testing, UAT and Performance Testing.

Environment: Cloudera Hadoop Distribution (CDH5.6), Apache Kafka 0.9, Hive, HDFS, Java 8, Scala 2.11, Spark Core 1.6.0, Spark Streaming 1.6.0, Apache Camel 2.16.x, One Hadoop

Confidential

Hadoop Developer & Support

Responsibilities:

Loaded and transformed large sets of structured, semi-structured, and unstructured data into HDFS.
Worked extensively with Sqoop to import metadata from Teradata and implemented schema extraction for Parquet and Avro file formats in Hive.
Created Hive tables, loaded and analyzed data using Hive queries, and implemented partitioning, dynamic partitions, and buckets in Hive.
Tuned the performance of Spark applications by setting the right batch interval and the correct level of parallelism and by tuning memory.
Used reporting tools such as Tableau to connect to Hive, generate daily reports, and publish dashboards based on client requirements.

Confidential

Hadoop ETL Developer & Support

Responsibilities:

Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW tables and historical metrics.
Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Hive.
Used Sqoop to import data from RDBMS into HDFS/Hive tables and to export it back.
Reviewed all development queries and performed optimization and query performance tuning for Netezza using various techniques.
Coordinated production releases and provided implementation support.
Resolved production issues and supported upgrades for existing applications.

Confidential

ETL Developer & Support

Responsibilities:

Analyzed the load process of all tables in the existing DB2 system, exported the data using DB2 export utilities, and loaded it into Teradata staging and permanent tables using the load operator appropriate to the data volume.
Proficient in developing strategies for Extraction, Transformation and Loading (ETL) mechanism.
Expert in designing Parallel jobs using various stages like Join, Merge, Lookup, Remove duplicates, Filter, Dataset, Lookup file set, Complex flat file, Modify, Aggregator, XML.
Expert in working with DataStage Manager, Designer, Administrator, and Director.
Proven track record in troubleshooting DataStage jobs and addressing production issues, including performance tuning and enhancements.
Expertise in writing UNIX shell scripts in Bash to automate processes and schedule DataStage jobs using wrappers.
Coordinated production releases and provided implementation support. Supported production readiness activities and oversaw continuous monitoring and support of code deployed to production.

Confidential

Developer & Support

Responsibilities:

Converted business requirements into technical designs, followed by code development and unit testing.
Worked extensively on the Netezza framework on the Linux platform and contributed to building a customized ELT framework using shell scripting.
Used NZSQL and NZLOAD scripts for day-to-day loading and migration activities.
Migrated existing Teradata scripts from BTEQ to Netezza NZSQL, keeping the business logic the same and validating the results across both systems.
Coordinated production releases and provided implementation support. Supported production readiness activities and oversaw continuous monitoring and support of code deployed to production.
Resolved production issues and supported upgrades for existing applications.
