Hadoop Developer & Support Resume
SUMMARY:
- Highly motivated, quality-driven technology professional with over 7.5 years of experience in data warehousing, ETL tools, and big data platforms in dynamic, fast-paced environments.
- Over 3 years of experience working on large-scale Hadoop implementations.
- Expertise in Hadoop architecture and its components, including Hive, HDFS, MapReduce, Sqoop, Spark, Kafka, YARN, Oozie, and ZooKeeper.
- Experience in Apache Spark integration (Spark SQL, Spark Streaming).
- Hands-on experience with Apache Camel as a Kafka producer and Spark Streaming as a Kafka consumer.
- Worked on numerous projects ingesting data from various sources into HDFS using Flume and Sqoop.
- Developed frameworks to ingest data from Cassandra into HDFS.
- Experience running queries with Impala and using BI tools to run ad hoc queries directly on Hadoop.
- Experience in managing Hadoop clusters and services using Cloudera Manager.
- Extensive experience designing, developing, documenting, and testing ETL jobs and mappings (server and parallel jobs) using various ETL tools to populate data warehouse and data mart tables.
- Good knowledge of Teradata, Netezza, and data warehouse modeling, including star and snowflake schemas.
- Experience working with Slowly Changing Dimensions and setting up Change Data Capture (CDC) mechanisms.
- Experience in using Splunk for logging.
- Worked with both Waterfall and Agile (Scrum) SDLC methodologies, with a clear understanding of all phases of the software development life cycle.
- Worked closely with the bank's client managers and business analysts to drive technical solutions and design, and to provide development estimates for schedule and effort.
- Dynamic, innovative, and enthusiastic self-starter, able to work in groups as well as independently, with the initiative to learn new technologies and tools quickly and an emphasis on delivering quality services.
- Good experience working with teams on large implementations, including 7 years in the onshore/offshore model.
TECHNICAL SKILLS:
Software Tools and Applications: Hadoop, HDFS, Hive, Sqoop, Oozie, Autosys, Aginity Workbench, Splunk, JIRA
Specializations: Hadoop, Spark, Python, Netezza, Unix Shell Scripting, Java, Teradata, Data warehousing concepts
Technical Platforms & Databases: Windows, Hadoop, Netezza, Unix, Teradata
PROFESSIONAL EXPERIENCE:
Confidential - Newark, DE
Hadoop Developer & Support
Responsibilities:
- Created file-to-Hadoop frameworks to ingest data from 20 different sources into Hadoop.
- Used the Spark API on Cloudera Hadoop YARN to perform analytics on data in Hive.
- Designed and implemented Spark DataFrames to read data from HDFS.
- Created tables in Hive and wrote Hive queries using Spark HiveContext.
- Used the Oozie workflow engine and schedulers to run multiple Hive jobs.
- Involved in debugging and troubleshooting issues in development and test environments.
Environment: Scala 2.11, Java 8, Cloudera Hadoop Distribution (CDH5.6), Hive, Apache Spark 1.6.0, HDFS
Confidential
Hadoop Developer & Support
Responsibilities:
- Loaded and transformed large sets of structured, semi-structured, and unstructured data into HDFS.
- Worked extensively with the file-to-Hadoop utility and implemented schema extraction for Parquet and Avro file formats in Hive.
- Created Hive tables, loaded and analyzed data using Hive queries, and implemented partitioning, dynamic partitions, and buckets in Hive.
- Developed Hive queries to process the data and generate the data cubes for visualization.
- Tuned Hive query performance by setting the correct level of parallelism and tuning memory usage.
- Worked with various file formats and compression codecs, including Parquet, Avro, text, and Snappy.
- Involved in Unit Testing, UAT and Performance Testing.
Environment: Cloudera Hadoop Distribution (CDH5.6), Hive
Confidential
Scala/Spark/Java/Kafka developer & Support
Responsibilities:
- Analyzed the volume of the existing batch process and designed the Kafka topic and its partitioning.
- Used the Producer API and created a custom partitioner to publish data to the Kafka topic.
- Worked on a POC for streaming data using Kafka and Spark Streaming.
- Implemented a Kafka consumer with Spark Streaming and Spark SQL using Scala.
- Validated the DStream, generated a new DStream, and saved the data to HDFS.
- Used broadcast variables to store event metadata.
- Involved in Unit Testing, UAT and Performance Testing.
Environment: Cloudera Hadoop Distribution (CDH5.6), Apache Kafka 0.9, Hive, HDFS, Java 8, Scala 2.11, Spark Core 1.6.0, Spark Streaming 1.6.0, Apache Camel 2.16.x
One Hadoop
Confidential
Hadoop Developer & Support
Responsibilities:
- Loaded and transformed large sets of structured, semi-structured, and unstructured data into HDFS.
- Worked extensively with Sqoop to import metadata from Teradata and implemented schema extraction for Parquet and Avro file formats in Hive.
- Created Hive tables, loaded and analyzed data using Hive queries, and implemented partitioning, dynamic partitions, and buckets in Hive.
- Tuned Spark application performance by setting the right batch interval, the correct level of parallelism, and memory configuration.
- Used reporting tools such as Tableau to connect to Hive, generate daily reports, and publish dashboards based on client requirements.
Confidential
Hadoop ETL Developer & Support
Responsibilities:
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW tables and historical metrics.
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Hive.
- Used Sqoop to import data from RDBMS sources into HDFS/Hive tables and to export it back.
- Reviewed all development queries and performed optimization and query performance tuning for Netezza using various techniques.
- Coordinated production releases and provided implementation support.
- Tasked with resolving production issues and supporting upgrades for existing applications.
Confidential
ETL Developer & Support
Responsibilities:
- Analyzed the load process of all tables in the existing DB2 process, exported the data using DB2 export utilities, and loaded it into Teradata staging and permanent tables using the load operator appropriate to the volume.
- Proficient in developing extraction, transformation, and loading (ETL) strategies.
- Expert in designing parallel jobs using stages such as Join, Merge, Lookup, Remove Duplicates, Filter, Data Set, Lookup File Set, Complex Flat File, Modify, Aggregator, and XML.
- Expert in working with DataStage Manager, Designer, Administrator, and Director.
- Proven track record of troubleshooting DataStage jobs and addressing production issues, including performance tuning and enhancements.
- Expertise in UNIX shell scripting (bash) for automating processes and scheduling DataStage jobs using wrappers.
- Coordinated production releases and provided implementation support. Supported production readiness activities and oversaw continuous monitoring and support for code deployed to production.
Confidential
Developer & Support
Responsibilities:
- Converted business requirements into technical designs and carried out code development and unit testing.
- Worked extensively on the Netezza framework on the Linux platform and contributed to building a customized ELT framework using shell scripting.
- Used NZSQL and NZLOAD scripts for day-to-day loading and migration activities.
- Migrated existing Teradata scripts to Netezza (BTEQ to NZSQL), keeping the business logic unchanged and validating results across both systems.
- Coordinated production releases and provided implementation support. Supported production readiness activities and oversaw continuous monitoring and support for code deployed to production.
- Tasked with resolving production issues and supporting upgrades for existing applications.
