
Technology Lead - Hadoop Resume

Dallas, TX

SUMMARY:

  • Over 15 years of experience in data warehousing, analytics, and ETL processes across business domains such as retail, manufacturing, insurance, and banking.
  • Proficient in the Apache Hadoop ecosystem: YARN, Spark, Pig, Hive, Flume, Sqoop, HBase, ZooKeeper, and Impala, with a strong understanding of HDFS and MapReduce architecture on Cloudera and Hortonworks distributions.
  • Strong data warehousing ETL experience using Informatica PowerCenter 9.x/8.x/7.x.
  • Experience configuring and using AWS cloud components to push, pull, and process data across different cloud storage services.
  • Strong knowledge of ER modeling and dimensional data modeling methodologies such as Star and Snowflake schemas.

TECHNOLOGIES:

Big Data Ecosystems: MapReduce, HBase, Pig, Hive, Sqoop, Spark, YARN, Storm, Flume, Kafka, Oozie, ZooKeeper, EC2, EMR, S3, Kinesis, CloudWatch.

Programming Languages: Java, C, SQL, Scala, Pig Latin, HiveQL, Shell Scripting, Python.

Database and Tools: MySQL, SQLite, Oracle, Teradata, MS SQL, MongoDB, Cassandra, DBeaver, DataStax DevCenter, SQL developer, MySQL Workbench.

ETL Tools: Informatica PowerCenter 7.x/8.x/9.x, Big Data Edition, SSIS, DTS.

Scheduling Tools: Control-M, AutoSys, IBM TWS, Apache Airflow.

Visualization/Reporting: Tableau, Pentaho.

Web Technologies: Spring, Hibernate, JSP, JavaScript, HTML, XML, JSON, Web-Services.

Dev and Build Tools: Maven, Ant, Eclipse, Scala IDE, Jira, Bitbucket, Git, Jenkins, Docker.

Methodologies and Tools: Waterfall, Agile (Scrum and Kanban), MS Project.

PROFESSIONAL EXPERIENCE:

Confidential, Dallas, TX

Technology Lead - Hadoop

Responsibilities:

  • Worked on AA to store and join customer-centric data such as clickstream, sales, and email campaigns, generating UCIDs and personalization data consumed by CXP through APIs.
  • Developed data models for personalization and product recommendations using Storm, Kafka, Hive, and Pig, providing insight into the percentage of sales penetration by associates and stores.
  • Developed scripts to migrate enterprise data from in-house infrastructure to the AWS cloud.
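The unified-customer-ID idea above can be sketched in plain Python (field names and the UUID scheme are illustrative assumptions, not the production design): records from different channels are keyed on a shared customer attribute and stamped with one stable UCID.

```python
import uuid

# Hypothetical sketch: assign a unified customer ID (UCID) by joining
# records from different channels on a shared customer key (email here).
clickstream = [{"email": "a@x.com", "page": "/shoes"}]
sales = [{"email": "a@x.com", "amount": 120.0},
         {"email": "b@x.com", "amount": 45.0}]

ucids = {}  # email -> UCID


def ucid_for(email):
    # Generate a stable UCID the first time a customer key is seen,
    # then reuse it for every later record from any channel.
    if email not in ucids:
        ucids[email] = str(uuid.uuid4())
    return ucids[email]


# Stamp every record with its UCID so downstream APIs can join on it.
for rec in clickstream + sales:
    rec["ucid"] = ucid_for(rec["email"])
```

In the real system this keying would run at scale over the Hadoop cluster, but the invariant is the same: every record for one customer carries the same UCID.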

Environment: Hadoop 2.6 (CDH 5.13, 40-node cluster), AWS cloud, Hive 1.2.1, Storm, Cassandra, Solr, CouchDB, EC2, S3, Airflow.

Confidential, Philadelphia, PA

ETL/Hadoop Developer

Responsibilities:

  • Built a Hadoop- and Informatica-based ETL and analytics system providing insight into customers' usage of Confidential products across different product lines, driving future enhancements and improvements in business and services.
  • Developed data pipelines using Spark, Kafka, Hive, Pig, and HBase to ingest customer system-usage data and financial histories into the Hadoop cluster for analysis.
  • Developed Scala scripts and UDFs using both the DataFrame/SQL API and RDD/MapReduce in Spark for data aggregation, writing results back to S3 through Sqoop.
  • Extensively used Informatica to create data ingestion jobs into HDFS with complex data file formats such as Avro and Parquet, and to evaluate dynamic mapping capabilities.
  • Implemented data quality rules using Informatica Data Quality (IDQ) to check the correctness of source files and perform data cleansing/enrichment.
  • Analyzed daily log record data and built hourly and daily aggregated reporting in Tableau.
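The hourly/daily roll-up behind that reporting can be sketched in plain Python (the log fields are made up for illustration; the production job ran on Spark): each record is bucketed by hour and counted.

```python
from collections import defaultdict
from datetime import datetime

# Illustrative sketch, not the production Spark job: roll raw log records
# up into hourly counts per product, the kind of aggregate fed to Tableau.
logs = [
    {"ts": "2016-03-01T10:05:00", "product": "modem"},
    {"ts": "2016-03-01T10:40:00", "product": "modem"},
    {"ts": "2016-03-01T11:02:00", "product": "router"},
]

hourly = defaultdict(int)
for rec in logs:
    # Truncate the timestamp to the hour to form the aggregation key.
    hour = datetime.fromisoformat(rec["ts"]).strftime("%Y-%m-%d %H:00")
    hourly[(hour, rec["product"])] += 1
```

Daily totals follow the same pattern with the key truncated to the date instead of the hour.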

Environment: Hadoop 2.7, Informatica 9.x, Hive 1.2.1, Spark 1.6, Teradata, Oracle, EC2, S3.

Confidential, San Jose, CA

Hadoop Developer

Responsibilities:

  • Worked with highly unstructured and semi-structured data exceeding 100 TB in size.
  • Developed Pig and Hive scripts for ad-hoc analysis driven by end-user, analyst, and product-manager requirements.
  • Used Informatica to validate and test the business logic implemented in mappings and fix bugs; developed reusable mapplets and transformations.
  • Managed external tables in Hive for optimized performance, loading them through Sqoop jobs.
  • Solved performance issues in Hive and Pig scripts by understanding how joins, grouping, and aggregation translate into MapReduce jobs.
  • Explored Spark to improve performance and optimize existing Hadoop algorithms using the Spark context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Worked in a Kerberos-secured Hadoop environment supported by the Cloudera team.
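One join optimization alluded to above is the map-side (replicated) join, which Hive exposes as MAPJOIN and Pig as the `'replicated'` hint: when one table fits in memory, broadcasting it to every mapper avoids the shuffle of a reduce-side join. A minimal sketch of the idea in plain Python (table contents are made up):

```python
# Sketch of a map-side (replicated) join: the small dimension table is
# held entirely in memory by each "mapper", so each sales record is
# joined locally without shuffling data between nodes.
small = {"s1": "Dallas", "s2": "Austin"}        # store_id -> city

sales = [("s1", 100), ("s2", 50), ("s1", 25)]   # (store_id, amount)

# Each mapper joins its partition of `sales` against the broadcast table.
joined = [(store, small[store], amount) for store, amount in sales]
```

In a reduce-side join, by contrast, both inputs are shuffled by the join key, which is what makes skewed or large-table joins slow.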

Environment: 32-node Hadoop 2.6 cluster, Informatica 9.x, HDFS, Flume 1.5, Sqoop 1.4.3, Hive 1.0.1, Spark 1.4, HBase, XML, JSON, Teradata, Oracle, MongoDB, Cassandra.

Confidential

Hadoop Developer

Responsibilities:

  • Migrated 100+ TB of data from different databases (Oracle, SQL Server) to Hadoop.
  • Wrote code across various applications in the Hadoop and Informatica ecosystems.
  • Extensively involved in performance tuning of Informatica ETL mappings by using caches, overriding SQL queries, and using parameter files.
  • Worked with various file formats (Avro, Parquet, and text) and SerDes, using Snappy compression.
  • Used Pig custom loaders to load different forms of data files such as XML, JSON, and CSV.
  • Designed a dynamic partitioning mechanism in Hive for optimal query performance, reducing report generation time to meet SLA requirements.
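The effect of Hive dynamic partitioning can be sketched in plain Python (table name and partition columns are hypothetical): each distinct partition-column value maps to its own directory, so queries filtered on those columns read only the matching directories instead of scanning the whole table.

```python
from collections import defaultdict

# Hypothetical sketch of the directory layout Hive's dynamic partitioning
# produces: rows are routed to year=/month= subdirectories by value.
rows = [
    {"order_id": 1, "year": "2014", "month": "01"},
    {"order_id": 2, "year": "2014", "month": "02"},
    {"order_id": 3, "year": "2014", "month": "01"},
]

partitions = defaultdict(list)
for row in rows:
    # Partition path is derived from the row's partition-column values.
    path = f"/warehouse/orders/year={row['year']}/month={row['month']}"
    partitions[path].append(row["order_id"])
```

A report filtered to `year=2014 AND month=01` then touches a single directory, which is where the reduced generation time comes from.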

Environment: Hadoop 2.2, Informatica PowerCenter 9.x, HDFS, HBase, Flume 1.4, Sqoop 1.4.3, Hive 0.13.1, Avro 1.7.4, Parquet 1.4, XML, JSON, Oracle 11g, Amazon EC2, S3.

Confidential

ETL Developer

Responsibilities:

  • Developed mappings and sessions to import, transform, and load data into target tables and flat files using Informatica PowerCenter.
  • Automated Informatica ETL jobs across different ETL design patterns.
  • Extensively used transformations such as Router, Aggregator, Source Qualifier, Joiner, Expression, and Sequence Generator via the Source Analyzer, Warehouse Designer, Mapping and Mapplet Designers, and Transformation Developer.

Environment: Informatica PowerCenter 9.x (Repository Manager, Designer, Workflow Manager, and Workflow Monitor), Oracle 11g, SeaQuest, HPDM, SQL Server, Teradata, Toad, Control-M.

Confidential

ETL Developer

Responsibilities:

  • Extensively used the Slowly Changing Dimension (SCD) technique to update the dimensional schema.
  • Processed data using transformations such as Aggregator, Router, Expression, Source Qualifier, Filter, Lookup, Joiner, Sorter, XML Source Qualifier, and Web Services Consumer (WSDL).
  • Used Informatica user-defined functions to reduce code duplication and dependencies.
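A Type 2 SCD update, the history-preserving variant of the technique named above, can be sketched in plain Python (dimension columns are hypothetical): on a change, the current row is closed out and a new current row is inserted.

```python
from datetime import date

# Minimal Type-2 SCD sketch (hypothetical columns): expire the current
# dimension row on change and append a new current version, keeping
# full history via the validity interval.
dim = [
    {"cust_id": 1, "city": "Austin", "valid_from": date(2010, 1, 1),
     "valid_to": None, "is_current": True},
]


def apply_scd2(dim, cust_id, new_city, as_of):
    for row in dim:
        if row["cust_id"] == cust_id and row["is_current"]:
            if row["city"] == new_city:
                return                    # no change: nothing to do
            row["valid_to"] = as_of       # close out the old version
            row["is_current"] = False
    dim.append({"cust_id": cust_id, "city": new_city,
                "valid_from": as_of, "valid_to": None, "is_current": True})


apply_scd2(dim, 1, "Dallas", date(2012, 6, 1))
```

In Informatica this branch-and-insert logic is typically built from Lookup, Expression, and Update Strategy transformations; the data-level outcome is the same.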

Environment: Informatica PowerCenter 8.x, Informatica PowerConnect, PowerExchange, PowerAnalyzer, Toad, Erwin, Oracle 11g/10g, Teradata V2R5, PL/SQL, ODI, Trillium 11.

Confidential

ETL Developer

Responsibilities:

  • Used SSIS as SQL Server's Extract, Transform, Load (ETL) tool to populate data from various data sources, creating packages for the application's different data-loading operations.
  • Made extensive use of Transact-SQL stored procedures and trigger scripts to create database objects.
  • Generated various reports using features such as group-by, drill-downs, drill-through, sub-reports, and parameterized reports.
  • Deployed new strategies for checksum calculation and exception population using mapplets and Normalizer transformations.
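The checksum strategy mentioned above commonly works by hashing the concatenated attribute values of each row, so a changed row can be detected with one comparison instead of comparing every column. A sketch in plain Python (column names are hypothetical):

```python
import hashlib

# Illustrative row-checksum sketch (hypothetical columns): hash the
# concatenated attribute values so changed rows are detected by a single
# checksum comparison rather than column-by-column checks.
def row_checksum(row, columns):
    joined = "|".join(str(row[c]) for c in columns)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


cols = ["name", "city", "balance"]
old = {"id": 7, "name": "Ann", "city": "Dallas", "balance": 100}
new = {"id": 7, "name": "Ann", "city": "Plano", "balance": 100}

changed = row_checksum(old, cols) != row_checksum(new, cols)
```

The delimiter guards against value-concatenation collisions ("ab"+"c" vs "a"+"bc"); the same idea carries over to T-SQL's CHECKSUM/HASHBYTES or an Informatica expression.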

Environment: SQL Server 2005, T-SQL, SSIS/DTS Designer and reporting tools, Control-M.

Confidential

Java Developer

Responsibilities:

  • Developed web applications using the Spring MVC framework, including writing actions, classes, forms, custom tag libraries, and JSP pages.
  • Worked on integrating the Spring and Hibernate frameworks using the Spring ORM module.
  • Implemented caching techniques, wrote POJO classes for storing data and DAOs to retrieve it, and handled database configuration.
