Senior Consultant (Apache Hadoop & Spark) Resume

Brea, CA

SUMMARY

  • Over 3 years of experience with the Apache Hadoop ecosystem using tools such as Hive, Sqoop, Spark SQL, Spark Streaming, Cloudera CDH, and Cloudera Impala.
  • Over 16 years of work experience in the IT industry using a vast set of tools and technologies.
  • Working knowledge of AWS concepts like EC2, EMR & S3.
  • Working knowledge of Apache Hadoop distribution from Hortonworks.
  • Experience in working with large data sets.
  • Experience in high speed data processing and storage using HDFS and Spark.
  • Expertise in importing and exporting data with Sqoop between HDFS and various database systems (RDBMS, data warehouses, data lakes); a sketch of a typical import follows this list.
  • Capable of creating real-time data streaming solutions and batch-style, large-scale distributed computing applications using Apache Spark, Spark Streaming, Kafka, and Flume; a streaming sketch also follows this list.
  • Experience in installation, configuration, management, and deployment of Hadoop clusters, HDFS, MapReduce, Pig, Hive, Sqoop, Flume, Oozie, and Zookeeper.
  • Experience in NoSQL databases like Cassandra and MongoDB for extracting and storing huge volumes of data.
  • Good experience in Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes like REGEX, JSON, and Avro.
  • Extensive experience in Data Warehousing and Data Migration tools.
  • Good understanding of Dimensional Modeling.
  • Extensive experience writing Python and UNIX shell automation scripts.
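
As a minimal illustration of the Sqoop work above, the following Python automation wrapper shells out to a Sqoop 1 import. The DB2 host, database, credentials file, table, and HDFS paths are hypothetical placeholders, not details from any engagement.

    # Illustrative Python wrapper around a Sqoop 1 import; the connection
    # details, table, and paths below are hypothetical.
    import subprocess

    def sqoop_import(table, target_dir):
        """Import one DB2 table into HDFS as Parquet files."""
        subprocess.run([
            "sqoop", "import",
            "--connect", "jdbc:db2://db2host:50000/PRODDB",
            "--username", "etl_user",
            "--password-file", "/user/etl/.db2.pwd",  # keep the password off the command line
            "--table", table,
            "--target-dir", target_dir,
            "--num-mappers", "4",
            "--as-parquetfile",
        ], check=True)

    sqoop_import("F4211", "/data/raw/jde/F4211")  # F4211 = JDE sales order detail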
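
And a minimal sketch of the streaming side, assuming the direct Kafka DStream API that shipped with Spark 1.x/2.x (pyspark.streaming.kafka); the broker address and topic name are assumptions.

    # Count messages arriving on a Kafka topic in 10-second micro-batches.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="kafka-events")
    ssc = StreamingContext(sc, 10)  # 10-second batch interval

    stream = KafkaUtils.createDirectStream(
        ssc, ["events"], {"metadata.broker.list": "broker1:9092"})

    # Each element is a (key, value) pair; keep the value and count per batch.
    stream.map(lambda kv: kv[1]).count().pprint()

    ssc.start()
    ssc.awaitTermination()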

TECHNICAL SKILLS

OPEN SOURCE: Apache Hadoop CDH 4.X/5.X, MapReduce 2.0, Spark, Hive, Pig, Kafka, Flume, Sqoop, Oozie

HADOOP CLUSTER: Cloudera CDH 4.X/5.X

DATABASES: Apache Impala 2.x, Cassandra, Oracle 12.x/11.x, SQL Server, DB2 UDB 9.X

LANGUAGES/UTILITIES: Python 2.7.X/3.X, SQL, PL/SQL, C, PRO*C, UNIX shell scripting (Korn, Bash).

ETL: Informatica PowerMart/PowerCenter 5.X/6.X/7.X/8.X/9.X/10.X, SSIS.

METADATA: Informatica Metadata Manager, Business Glossary.

REPORTING TOOLS: Business Objects 5.1, 6.X; MicroStrategy 7/8/9; Informatica Data Analyzer, Tableau

PROFESSIONAL EXPERIENCE

Confidential, Brea, CA

Senior Consultant (Apache Hadoop & Spark)

Platform: Cloudera CDH 5.X (10 node cluster), Spark, Hive, Sqoop, Impala, Informatica

Responsibilities:

  • Work on project initiatives to migrate a majority of operational data from ERP to HDFS for near-real-time reporting.
  • Deploy the Apache Hadoop cluster (Cloudera distribution) and set up Hive, Oozie, Sqoop, Flume, Spark, and Impala.
  • Ingest data from RDBMS (DB2 UDB) and ERP (JDE) into HDFS using Sqoop.
  • Utilize the in-memory processing capability of Apache Spark to process data with Spark SQL and Spark Streaming through PySpark scripts.
  • Create PySpark scripts to load data from source files into RDDs, create DataFrames from the RDDs, perform transformations and aggregations, and collect the output; a sketch follows this list.
  • Work with source data in formats such as XML, JSON, Apache Parquet, and Splunk journal files.
  • Use Tableau as the data visualization tool.
  • Use the Oozie workflow engine for Hadoop job scheduling.
  • Analyze Hive queries and perform optimization as needed.
  • Ensure that the Data Warehouse is available for reporting and the SLAs are met.
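
A minimal PySpark sketch of that RDD-to-DataFrame flow, assuming the Spark 2.x API and hypothetical file paths, delimiters, and column names:

    # Load a pipe-delimited extract into an RDD, promote it to a DataFrame,
    # transform and aggregate, then collect the (small) result to the driver.
    from pyspark.sql import Row, SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("orders-agg")
             .enableHiveSupport()
             .getOrCreate())

    lines = spark.sparkContext.textFile("/data/raw/jde/orders/*.txt")
    rows = (lines.map(lambda line: line.split("|"))
                 .map(lambda f: Row(order_id=f[0], region=f[1], amount=float(f[2]))))

    orders = spark.createDataFrame(rows)
    totals = (orders.filter(F.col("amount") > 0)
                    .groupBy("region")
                    .agg(F.sum("amount").alias("total_amount")))

    for row in totals.collect():
        print(row.region, row.total_amount)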

Confidential, Newport Beach, CA

Tech Lead / Technical Manager

Platform: HDFS, Python, Sqoop, Spark, Hive, Cloudera CDH 5.X cluster (10 nodes), Informatica PowerCenter Advanced Edition, MVS, DB2 UDB 7.1, Oracle, SQL Server 2013, HP-UX, Red Hat Enterprise Linux 5.X/6.X

Responsibilities:

  • Ingest data from various sources, including mainframe, XML, JSON, and Splunk journal files, into HDFS.
  • Use the Hadoop cluster (10 nodes) to move ETL workload from Oracle and Informatica to Hadoop, first with Hive scripts and later with PySpark scripts; a sketch follows this list.
  • Perform a POC on a Cloudera cluster (6 nodes) to demonstrate the workload-offloading capabilities of Hadoop, saving money while increasing the data processing capability of the hardware infrastructure.
  • Use PySpark scripts to leverage the in-memory processing capability of Apache Spark for faster data transformation.
  • Lead the ETL team on various data gap fulfillment projects.
  • Serve as SME (subject matter expert) on ETL load strategies.
  • Maintain and Optimize ETL infrastructure.
  • Establish documentation and coding standards.
  • Manage the project to transition a data mart from unsupported hardware to a supported platform.
  • Identify the needs of the team and the sponsor.
  • Perform various activities related to Informatica server administration (installing/upgrading and configuring the Informatica server and client, backup, restore, migration, and repository promotion).
  • Perform code reviews with ETL developers.
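
A hedged sketch of one such offload step: an aggregation that previously ran as an Oracle/Informatica mapping, re-expressed as HiveQL executed through PySpark. Database, table, and column names are illustrative assumptions.

    # Rebuild one day's partition of a warehouse table from staging data.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("offload-daily-agg")
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("""
        INSERT OVERWRITE TABLE dw.daily_sales PARTITION (load_date = '2016-01-15')
        SELECT store_id,
               SUM(sale_amount)        AS total_sales,
               COUNT(DISTINCT txn_id)  AS txn_count
        FROM   staging.pos_transactions
        WHERE  txn_date = '2016-01-15'
        GROUP BY store_id
    """)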

Confidential, Chicago, IL

Tech Lead

Platform: Informatica PowerCenter & PowerMart 5.1, Sun Solaris 7.0, Oracle 9, Siebel 6

Responsibilities:

  • Use Informatica to load the data warehouse.
  • Perform various activities related to Informatica server administration (installing and configuring the Informatica server and client, backup, restore, migration, and repository promotion).
  • Fine-tune mappings for faster loads.
  • Extensively use UNIX shell scripts to clean the data files.
  • Migrate the objects between various repositories.

Confidential, Dallas, TX

Tech Lead

Platform: Informatica PowerCenter 5.1.1, Oracle 9i, DB2 UDB 7.1, Sun Solaris, Cognos, Trillium

Responsibilities:

  • Use the ETL tool Informatica to populate the data warehouse.
  • Convert PL/SQL procedures to Informatica mappings.
  • Create testing and deployment plans.
  • Work with the data cleansing tool Trillium.
  • Extensively use mapplets and reusable transformations in Informatica mappings.
  • Build complex mappings involving target load order and event-based loading.
  • Extensively use Joiner, Lookup, Router, Filter, and Normalizer transformations.
  • Extensively use PL/SQL scripts and SQL*Loader scripts.
  • Maintain the Informatica repository.
  • Use heterogeneous data sources such as flat files, VSAM files, DB2, and Oracle.
  • Use Cognos as BI tool.

Confidential

Informatica Tech Lead

Platform: Informatica PowerMart 5, DB2 UDB 7.1, PeopleSoft 8, Sun Solaris, Informix 7

Responsibilities:

  • Installation and configuration of Informatica PowerMart 5.0.
  • Creating the repository in DB2 to store the metadata.
  • Setting up the security for creating user groups and assigning privileges.
  • Set up resource and priority assignment for the Developer team.
  • Work closely with the end users and developers to develop the transformation logic to be used in Informatica.
  • Develop complex mappings & mapplets in Informatica to load the data from various sources (Siebel and Informix) using different transformations like Source Qualifier, Expression, Look up (connected and unconnected), Aggregate, Update Strategy, Joiner, Filter and Router.
  • Design and document validation rules, error handling and test strategy of the mappings.
  • Setting up batches and sessions to schedule the loads at the required frequency.
  • Performance tuning of the sessions using Server Manager Statistics and front end query tools.
  • Taking regular backups of the repository and performing metadata reporting.
  • Development and Production Support.
  • End-user and system documentation.

Confidential

Tech Lead / Analyst

Platform: Informatica PowerMart 4.6, Business Objects 5.1.1, Web Intelligence 2.6, ERwin, IBM AIX, Oracle 8i, Sybase.

Responsibilities:

  • Built and modified universes using Business Objects 5.1.1 on Oracle and Sybase databases.
  • Used Informatica Designer to create complex mappings with different transformations to move data to multiple databases.
  • Used Informatica Server Manager to create sessions and batches to run with the logic embedded in the mappings.
  • Configured user security profiles for Informatica and Business Objects.
  • Tuned the mappings to perform better using different logic to provide maximum efficiency and performance.
  • Was a lead in the Installation and maintenance of repository on Oracle Database.
  • Resolved loops by creating aliases and contexts in Business Objects universe.
  • Upgraded Business Objects from 5.0 to 5i.

Confidential

Programmer Analyst

Platform: Oracle 8, Windows NT, UNIX (IBM AIX), PL/SQL, PRO*C

Responsibilities:

  • Provide Production Support for the Data warehouse project.
  • Data transformation mappings were built using PRO*C and PL/SQL code blocks. Wrote several ETL functions in PL/SQL for aggregation and summation purposes.
  • Scripts were run through UNIX shell programs via batch scheduling.
  • Tuned report performance by optimizing SQL.
  • Resolved problems when loading data from text files into the Oracle database using SQL*Loader and PL/SQL.

Confidential

Programmer Analyst

Platform: Oracle 7.2, UNIX, PL/SQL, PRO*C

Responsibilities:

  • Requirements study and preparation of requirement specifications.
  • Database design according to specs.
  • Design of the screens.
  • Design and development of reports using PRO*C.
  • Unit Testing
  • Implementation of the system.
