
Hadoop/Spark Developer Resume


Renton, WA

PROFESSIONAL SUMMARY:

  • Over 7 years of total IT professional experience in Big Data and Data Warehousing (ETL/ELT) technologies, including requirements gathering, data analysis, design, development, system integration testing, deployment, and documentation.
  • Hands-on experience building Big Data solutions using Hadoop, HDFS, MapReduce, Spark, Pig, Hive, Kafka, Sqoop, ZooKeeper, Flume, and Oozie.
  • Excellent knowledge of and hands-on experience with Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, the MapReduce programming paradigm, and monitoring systems.
  • Hands-on experience installing, configuring, managing, and using Hadoop ecosystem components.
  • Experience importing and exporting data with Sqoop between HDFS/Hive and relational database systems.
  • Well versed in writing and using UDFs in both Hive and Pig using Java (a minimal sketch follows this list).
  • Excellent understanding of storage concepts such as block storage, object storage, columnar storage, and compressed storage.
  • Extensive experience in Extraction, Transformation & Loading (ETL and ELT) of data from various sources into data warehouses and data marts, following industry best practices.
  • Experience with Informatica ETL for data movement, data transformations, and data loads.
  • Good working experience with a variety of relational database systems.
  • Strong understanding of and implementation experience in building data warehouses and data marts: OLTP vs. OLAP, star vs. snowflake schemas, and normalization vs. denormalization.
  • Hands-on experience building wrapper shell scripts and using shell commands for analysis.
  • Supported various reporting teams; experienced with the data visualization tool Tableau.
  • Strong in SQL, data analysis, unit testing, and debugging data quality issues.
  • Excellent communication, problem-solving, and leadership skills; creative and technically competent.
  • Focused on customer satisfaction and driving results as both a team player and an individual contributor with strong collaboration skills.
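A minimal sketch of the kind of Hive UDF referenced above. The project UDFs were written in Java; the same pattern is shown here in Scala, which compiles to JVM bytecode that Hive can load. The class name and logic are illustrative only.

```scala
import org.apache.hadoop.hive.ql.exec.UDF

// Illustrative cleansing UDF: trims whitespace and upper-cases a string field.
// The class name and behavior are hypothetical, not the actual project UDFs.
class CleanseString extends UDF {
  def evaluate(input: String): String =
    if (input == null) null else input.trim.toUpperCase
}
```

Packaged into a JAR, such a UDF would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION cleanse AS 'CleanseString', after which it can be called like any built-in function.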

TECHNICAL ENVIRONMENT/SKILLS:

Languages: C, C++, Python, Java

Big Data Tools: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Impala, Spark, Sqoop, Oozie, ZooKeeper, Kafka, Cassandra, Cloudera CDH4/CDH5, HiveQL, Pig Latin

ETL/ELT Tools: Informatica PowerCenter, Informatica PowerExchange, Teradata TPT/BTEQ/FastLoad/MultiLoad

RDBMS: Oracle, MySQL, SQL Server, DB2, Teradata

NoSQL: HBase, Cassandra

Scripting & Query Languages: Shell Scripting, SQL

Environment: Unix, Linux, Windows, Mac OS, Solaris

Metadata Tools: Informatica Metadata Manager

Scheduling Tools: Control-M, Informatica Scheduler, AutoSys

PROFESSIONAL EXPERIENCE: 

Confidential, Renton, WA

Hadoop/Spark Developer

Responsibilities:

  • Involved in building a scalable, distributed data lake system for Confidential's real-time and batch analytical needs.
  • Managed jobs using the Fair Scheduler and developed job processing scripts using Oozie workflows.
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
  • Used Spark Streaming APIs to perform on-the-fly transformations and actions for building the common learner data model, which consumes data from Kafka in near real time and persists it into Cassandra (a minimal sketch follows this list).
  • Developed Spark scripts using the Scala shell as per requirements.
  • Used the Spark API on Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark 1.6 for data aggregation and queries, writing data back into the OLTP system through Sqoop (see the DataFrame sketch after this list).
  • Tuned Spark applications by setting the right batch interval, the correct level of parallelism, and appropriate memory settings.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Performed advanced procedures such as text analytics and processing using Spark's in-memory computing capabilities with Scala.
  • Handled large datasets during the ingestion process itself using partitioning, Spark's in-memory capabilities, broadcast variables, and effective, efficient joins and transformations.
  • Migrated legacy MapReduce programs into Spark transformations using Spark and Scala.
  • Worked on a POC comparing Impala's processing time against Apache Hive's for batch applications, with the goal of adopting Impala in the project.
  • Worked extensively with Sqoop for importing metadata from Oracle.
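A minimal sketch of the near-real-time pipeline described above (Kafka to Spark Streaming to Cassandra), assuming the Spark 1.6-era receiver-based Kafka API and the DataStax spark-cassandra-connector; the topic, keyspace, table, column, and host names are hypothetical.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils
import com.datastax.spark.connector.SomeColumns
import com.datastax.spark.connector.streaming._

object LearnerEventStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("learner-event-stream")
      .set("spark.cassandra.connection.host", "cassandra-host") // placeholder host

    // The batch interval is one of the tuning knobs mentioned above.
    val ssc = new StreamingContext(conf, Seconds(10))

    // Receiver-based Kafka stream; "learner-events" is an illustrative topic name.
    val events = KafkaUtils
      .createStream(ssc, "zk-host:2181", "learner-group", Map("learner-events" -> 1))
      .map(_._2)

    // Parse CSV-style records and persist each micro-batch into a hypothetical
    // Cassandra table learner_ks.learner_events(id, event_type, ts).
    events
      .map(_.split(","))
      .filter(_.length >= 3)
      .map(f => (f(0), f(1), f(2).toLong))
      .saveToCassandra("learner_ks", "learner_events", SomeColumns("id", "event_type", "ts"))

    ssc.start()
    ssc.awaitTermination()
  }
}
```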
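A companion Spark 1.6-style sketch of the DataFrame/SQL aggregation pattern from the bullets above, using a HiveContext; the table and column names are again hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object LearnerAggregation {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("learner-aggregation"))
    val sqlContext = new HiveContext(sc)
    import sqlContext.implicits._

    // Read a Hive table registered in the metastore (name is illustrative).
    val events = sqlContext.table("learner_events")

    // DataFrame API: completions per learner.
    val summary = events
      .filter($"event_type" === "course_completed")
      .groupBy($"learner_id")
      .count()
      .withColumnRenamed("count", "completions")

    // Equivalent SQL form through the same HiveContext.
    summary.registerTempTable("learner_summary")
    sqlContext
      .sql("SELECT learner_id, completions FROM learner_summary WHERE completions > 10")
      .show()

    sc.stop()
  }
}
```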

Environment: Hadoop, Spark Core, Spark Streaming, Spark SQL, Scala, Python, Java, Kafka, Hive, Sqoop, Elasticsearch, Impala, Cassandra, Tableau, Talend, Cloudera, Oracle 10g, Linux.

Confidential, Ann Arbor, MI

Hadoop Developer

Responsibilities:
  • Observed the setup and monitoring of a scalable, distributed HDFS-based system to gain a better understanding of it, and worked closely with the team to understand business requirements and add new support features.
  • Gathered business requirements to determine feasibility and converted them into technical tasks in the design document.
  • Installed and configured Hadoop MapReduce and HDFS, developed multiple MapReduce jobs in Java, and used various UDFs for data cleaning and processing.
  • Involved in loading data from the Linux file system into HDFS.
  • Used Pig Latin and Pig scripts to process data.
  • Imported and exported data into HDFS and assisted in exporting analyzed data to an RDBMS using Sqoop.
  • Extracted data from various SQL Server instances into HDFS using Sqoop; developed custom MapReduce code, generated JAR files for user-defined functions, and integrated them with Hive to make statistical procedures accessible to the entire analysis team.
  • Implemented partitioning, dynamic partitioning, and bucketing in Hive using internal and external tables for more efficient data access (see the sketch after this list).
  • Used Hive queries to aggregate data and mine information, sorted by volume and grouped by vendor and product.
  • Performed statistical data analysis routines on the data using Java APIs.
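A minimal sketch of the partitioned, bucketed Hive table and the vendor/product aggregation described above. The HiveQL is submitted here over Hive's JDBC (HiveServer2) interface from Scala purely for illustration; the table, column, and host names are hypothetical.

```scala
import java.sql.DriverManager

object HiveVendorAggregation {
  def main(args: Array[String]): Unit = {
    // HiveServer2 JDBC connection; host and database are placeholders.
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection("jdbc:hive2://hive-host:10000/default", "", "")
    val stmt = conn.createStatement()

    // External table partitioned by load date and bucketed by vendor,
    // mirroring the partitioning/bucketing pattern described above.
    stmt.execute(
      """CREATE EXTERNAL TABLE IF NOT EXISTS sales (
        |  vendor STRING,
        |  product STRING,
        |  volume BIGINT)
        |PARTITIONED BY (load_dt STRING)
        |CLUSTERED BY (vendor) INTO 16 BUCKETS
        |STORED AS ORC
        |LOCATION '/data/sales'""".stripMargin)

    // Aggregate volume grouped by vendor and product, sorted by volume.
    val rs = stmt.executeQuery(
      """SELECT vendor, product, SUM(volume) AS total_volume
        |FROM sales
        |GROUP BY vendor, product
        |ORDER BY total_volume DESC""".stripMargin)
    while (rs.next())
      println(s"${rs.getString(1)}\t${rs.getString(2)}\t${rs.getLong(3)}")

    rs.close(); stmt.close(); conn.close()
  }
}
```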

Environment: Hadoop (HDFS/MapReduce), Pig, Hive, Sqoop, SQL, Linux, Statistical analysis.

Confidential

Hadoop Developer / ETL Informatica Developer

Responsibilities:
  • Involved in the review of functional and non-functional requirements.
  • Implemented data integration solutions with traditional ETL/ELT tools and Big Data technologies.
  • Developed ETL mappings to extract data from OLTP systems and files, apply technical and business transformations, and load the data into Oracle data marts and enterprise data warehouse systems.
  • Performed extensive data analysis, profiling, and testing activities to deliver quality data products for the business.
  • Developed reusable transformations and mapplets to automate and simplify the development of multiple mappings.
  • Involved in creating Hive tables, loading data, and running Hive queries against that data.
  • Used Sqoop to load data from Oracle into HDFS on a regular basis.
  • Wrote Hive queries for data analysis to meet business requirements.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Implemented helper classes that access HBase directly from Java using the HBase Java API to perform CRUD operations (see the sketch after this list).
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Handled time-series data in HBase, storing the data and performing time-based analytics to improve query retrieval time.
  • Launched and set up the Hadoop/HBase cluster, including configuring its various components.
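A minimal sketch of the HBase CRUD helper pattern and time-series row-key layout described above, using the HBase 1.x Java client API (called from Scala here; the original helpers were written in Java). The table name, column family, and row-key format are illustrative assumptions.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Delete, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseTimeSeriesHelper {
  def main(args: Array[String]): Unit = {
    val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
    // "metrics" table with column family "d" is a hypothetical layout.
    val table = connection.getTable(TableName.valueOf("metrics"))

    // Create/update: a composite row key of series id + timestamp keeps a series'
    // points adjacent, which supports the time-based reads mentioned above.
    val rowKey = Bytes.toBytes("sensor-001#20150101120000")
    val put = new Put(rowKey)
    put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("value"), Bytes.toBytes("42.0"))
    table.put(put)

    // Read the point back.
    val result = table.get(new Get(rowKey))
    val value = Bytes.toString(result.getValue(Bytes.toBytes("d"), Bytes.toBytes("value")))
    println(s"value = $value")

    // Delete the point.
    table.delete(new Delete(rowKey))

    table.close()
    connection.close()
  }
}
```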

Environment: ETL Informatica 9.6.1, Oracle, Teradata, Hadoop, HDFS, Hive, HBase, Linux, Toad.

Confidential

ETL Informatica Developer

Responsibilities:
  • Analyzed business requirements based on functional specifications to design the ETL methodology in technical specifications.
  • Developed data conversion/quality/cleansing rules and executed data cleansing activities such as data consolidation, standardization, and matching with Trillium for unstructured flat-file data.
  • Responsible for development, support, and maintenance of the ETL process using PowerCenter.
  • Integrated heterogeneous data sources such as Oracle, DB2, SQL Server, and flat files into the staging area.
  • Wrote SQL overrides and used filter conditions in the Source Qualifier, thereby improving mapping performance.
  • Designed and developed mappings using Source Qualifier, Expression, Lookup, Router, Aggregator, Filter, Sequence Generator, Stored Procedure, Update Strategy, Joiner, and Rank transformations.
  • Managed the metadata associated with the ETL processes used to populate the data warehouse.
  • Copied, exported, and imported mappings, sessions, worklets, and workflows from the development to the test repository and promoted them to production.
  • Worked with static, dynamic, and persistent caches in Lookup transformations for better session throughput.

Environment: Informatica PowerCenter 8.1/8.6.1, PowerExchange 8.1/8.6.1, Oracle 10g, UNIX, Windows 7, flat files, AutoSys scheduling tool, Netezza, Unix shell scripting, PL/SQL Developer, SQL*Loader, PuTTY, ORA-Excel

Confidential

Jr. Developer

Responsibilities:
  • Gathered business requirements from the Business Analyst.
  • Designed and implemented appropriate ETL mappings to extract and transform data from various sources to meet requirements.
  • Worked on loading data from several flat files to XML targets.
  • Analyzed business process workflows and assisted in the development of ETL procedures for moving data from source to target systems.
  • Created UNIX shell scripts to automate Informatica ETL sessions.
  • Monitored scheduled, running, completed, and failed sessions using the Workflow Monitor and debugged mappings for failed sessions.

Environment: Informatica 8.x, Cognos, Oracle 10g, DB2, UNIX, Toad
