- Over 15 years of experience in Data Warehousing, Analytics, and ETL processes across business domains such as retail, manufacturing, insurance, and banking.
- Proficient in the Apache Hadoop ecosystem: YARN, Spark, Pig, Hive, Flume, Sqoop, HBase, ZooKeeper, and Impala, with a strong understanding of HDFS and MapReduce architecture on Cloudera and Hortonworks distributions.
- Strong Data Warehousing ETL experience using Informatica Power Center 9.x/8.x/7.x.
- Experience in configuring and using AWS cloud components to push, pull, and process data across different cloud storage services.
- Strong knowledge of ER modeling and dimensional data modeling methodologies such as star schema and snowflake schema.
Big Data Ecosystems: MapReduce, HBase, Pig, Hive, Sqoop, Spark, YARN, Storm, Flume, Kafka, Oozie, ZooKeeper, EC2, EMR, S3, Kinesis, CloudWatch.
Programming Languages: Java, C, SQL, Scala, Pig Latin, HiveQL, Shell Scripting, Python.
Database and Tools: MySQL, SQLite, Oracle, Teradata, MS SQL, MongoDB, Cassandra, DBeaver, DataStax DevCenter, SQL Developer, MySQL Workbench.
ETL Tools: Informatica Power Center 7.x/8.x/9.x, Informatica Big Data Edition, SSIS, DTS.
Scheduling Tools: Control-M, AutoSys, IBM TWS, Apache Airflow.
Visualization/Reporting: Tableau, Pentaho.
Dev and Build Tools: Maven, Ant, Eclipse, Scala IDE, Jira, Bitbucket, Git, Jenkins, Docker.
Methodologies and Tools: Waterfall, Agile (Scrum and Kanban), MS Project.
Confidential, Dallas, TX
Technology Lead - Hadoop
- Worked on AA to store and join customer-centric data such as clickstream, sales, and email campaigns, generating UCIDs and personalization outputs consumed by CXP through APIs (see the join sketch after this section).
- Developed data models for personalization and product recommendations using Storm, Kafka, Hive, and Pig, providing insights into the percentage of sales penetration by associates and stores.
- Developed scripts to migrate enterprise data from on-premises infrastructure to the AWS cloud.
Environment: Hadoop 2.6 (CDH 5.13, 40-node), AWS cloud, Hive 1.2.1, Storm, Cassandra, Solr, CouchDB, EC2, S3, Airflow.
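A minimal sketch of the kind of Spark join that unifies such feeds into a UCID, assuming Spark 2.x with Hive support; the table names, join key, and SHA-256 keying scheme are illustrative assumptions rather than the production design:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object UcidJoin {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ucid-join").enableHiveSupport().getOrCreate()

    // Hypothetical Hive tables holding the three customer-centric feeds.
    val clicks = spark.table("clickstream_events")
    val sales  = spark.table("pos_sales")
    val emails = spark.table("email_campaign_responses")

    // Join the feeds on a shared customer key; hashing the stable key keeps
    // the generated UCID deterministic across reruns.
    val unified = clicks
      .join(sales,  Seq("customer_key"), "outer")
      .join(emails, Seq("customer_key"), "outer")
      .withColumn("ucid", sha2(col("customer_key").cast("string"), 256))

    // Persist for the downstream personalization APIs to consume.
    unified.write.mode("overwrite").saveAsTable("customer_360.ucid_profiles")
    spark.stop()
  }
}
```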
Confidential, Philadelphia, PA
- Built a Hadoop and Informatica based ETL and analytics system providing insights into customers' usage of Confidential products across different product lines, driving future enhancements and improvements in business and services.
- Developed a data pipeline using Spark, Kafka, Hive, Pig, and HBase to ingest customer system-usage data and financial histories into the Hadoop cluster for analysis.
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark for data aggregation, writing data back into S3 through Sqoop (see the aggregation sketch after this section).
- Extensively used Informatica to create data ingestion jobs into HDFS using complex data file objects such as Avro and Parquet, and to evaluate dynamic mapping capabilities.
- Implemented data quality rules using Informatica Data Quality (IDQ) to check the correctness of source files and perform data cleansing and enrichment.
- Analyzed daily log record data and produced hourly and daily aggregate reports using Tableau.
Environment: Hadoop 2.7, Informatica 9.x, Hive 1.2.1, Spark 1.6, Teradata, Oracle, EC2, S3.
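A sketch of the DataFrame/UDF aggregation pattern described above, written against the Spark 2.x API for brevity (the Spark 1.6 code in this role would have used HiveContext); table names, columns, and the S3 path are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object UsageAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("usage-aggregation").enableHiveSupport().getOrCreate()
    import spark.implicits._

    // Hypothetical Hive table of raw usage events ingested from Kafka.
    val usage = spark.table("raw.customer_usage")

    // Small UDF normalizing free-form product-line labels before grouping.
    val normalize = udf((s: String) => if (s == null) "unknown" else s.trim.toLowerCase)

    val daily = usage
      .withColumn("product_line", normalize($"product_line"))
      .groupBy($"customer_id", $"product_line", to_date($"event_ts").as("event_date"))
      .agg(count("*").as("events"), sum($"duration_sec").as("total_duration_sec"))

    // Write the aggregates as Parquet; the s3a path assumes AWS credentials
    // and the s3a connector are configured on the cluster.
    daily.write.mode("overwrite").parquet("s3a://example-bucket/usage/daily/")
    spark.stop()
  }
}
```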
Confidential, San Jose, CA
- Worked with highly unstructured and semi-structured data sets of 100+ TB in size.
- Developed Pig and Hive scripts for ad hoc analysis to meet the requirements of end users, analysts, and product managers.
- Used Informatica to validate and test the business logic implemented in the mappings and fix bugs; developed reusable Mapplets and Transformations.
- Managed external tables in Hive, loaded through Sqoop jobs, for optimized performance.
- Solved performance issues in Hive and Pig scripts by understanding how joins, grouping, and aggregation translate into MapReduce jobs.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN (see the sketch after this section).
- Worked in a Kerberos-secured Hadoop environment supported by the Cloudera team.
Environment: 32-node Hadoop 2.6 cluster, Informatica 9.x, HDFS, Flume 1.5, Sqoop 1.4.3, Hive 1.0.1, Spark 1.4, HBase, XML, JSON, Teradata, Oracle, MongoDB, Cassandra.
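One concrete instance of that pair-RDD tuning: replacing a groupByKey-style aggregation with reduceByKey so values are combined map-side before the shuffle. The input path and record layout are made up for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PairRddTuning {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("pair-rdd-tuning"))

    // Hypothetical input: tab-separated (store_id, sale_amount) records.
    val sales = sc.textFile("hdfs:///data/sales/*.tsv")
      .map(_.split("\t"))
      .collect { case Array(store, amount) => (store, amount.toDouble) }

    // reduceByKey aggregates within each partition before shuffling,
    // which is the usual fix for slow groupByKey-based totals.
    val totals = sales.reduceByKey(_ + _)

    totals.saveAsTextFile("hdfs:///out/store_totals")
    sc.stop()
  }
}
```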
- Migrated 100+ TB of data from different databases (e.g., Oracle, SQL Server) to Hadoop.
- Wrote code for different applications across the Hadoop and Informatica ecosystems.
- Extensively involved in performance tuning of Informatica ETL mappings using caches, SQL query overrides, and parameter files.
- Worked with various file formats (Avro, Parquet, Text) and Hive SerDes, using Snappy compression.
- Used custom Pig loaders to load different data file formats such as XML, JSON, and CSV.
- Designed a dynamic partitioning mechanism in Hive for optimal query performance, reducing report generation time to meet SLA requirements (see the sketch after this section).
Environment: Hadoop 2.2, Informatica Power Center 9.x, HDFS, HBase, Flume 1.4, Sqoop 1.4.3, Hive 0.13.1, Avro 1.7.4, Parquet 1.4, XML, JSON, Oracle 11g, Amazon EC2, S3.
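A minimal sketch of the dynamic-partitioning pattern referenced above, issued through Spark SQL with Hive support so it stays in Scala like the other examples; the database and table names are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object DynamicPartitionLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("dynamic-partition-load").enableHiveSupport().getOrCreate()

    // Let Hive derive partition values from the data itself.
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    // A report table partitioned by load date: queries filtering on
    // load_date touch only the matching partitions, which is what keeps
    // report generation inside the SLA window.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS reports.daily_metrics
        |(metric STRING, value DOUBLE)
        |PARTITIONED BY (load_date STRING)
        |STORED AS PARQUET""".stripMargin)

    // The dynamic partition column must come last in the SELECT list.
    spark.sql(
      """INSERT OVERWRITE TABLE reports.daily_metrics PARTITION (load_date)
        |SELECT metric, value, load_date FROM staging.daily_metrics_raw""".stripMargin)

    spark.stop()
  }
}
```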
- Developed mappings and sessions in Informatica Power Center to import, transform, and load data into target tables and flat files.
- Automated Informatica ETL jobs for different ETL design patterns.
- Extensively used transformations such as Router, Aggregator, Source Qualifier, Joiner, Expression, and Sequence Generator through Source Analyzer, Warehouse Designer, Mapping Designer, Mapplet Designer, and Transformation Developer.
Environment: Informatica Power Center 9.x (Repository Manager, Designer, Workflow Manager, and Workflow Monitor), Oracle 11g, SeaQuest, HPDM, SQL Server, Teradata, Toad, Control-M.
- Extensively used the Slowly Changing Dimension technique for updating dimensional schemas (see the Type 2 sketch after this section).
- Processed data using various transformations such as Aggregator, Router, Expression, Source Qualifier, Filter, Lookup, Joiner, Sorter, XML Source Qualifier, and Web Services Consumer for WSDL sources.
- Used Informatica user-defined functions to reduce code dependencies.
Environment: Informatica Power Center 8.x, Informatica Power Connect, Power Exchange, Power Analyzer, Toad, Erwin, Oracle 11g/10g, Teradata V2R5, PL/SQL, ODI, Trillium 11.
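The SCD work above was done in Informatica mappings; purely to illustrate the Type 2 logic, here is a minimal Spark SQL sketch that expires changed rows and appends fresh current versions for a single tracked attribute (address). Table and column names are hypothetical, and inserts for brand-new customers are omitted for brevity:

```scala
import org.apache.spark.sql.SparkSession

object ScdType2Sketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("scd-type2").enableHiveSupport().getOrCreate()

    // Expire the current version of customers whose address changed,
    // then append a new current row for each of them.
    val snapshot = spark.sql(
      """SELECT d.customer_id, d.address, d.eff_date,
        |       CASE WHEN d.is_current AND s.address IS NOT NULL AND s.address <> d.address
        |            THEN current_date() ELSE d.end_date END AS end_date,
        |       CASE WHEN d.is_current AND s.address IS NOT NULL AND s.address <> d.address
        |            THEN false ELSE d.is_current END AS is_current
        |FROM dw.customer_dim d
        |LEFT JOIN staging.customer_updates s ON d.customer_id = s.customer_id
        |UNION ALL
        |SELECT s.customer_id, s.address, current_date() AS eff_date,
        |       CAST(NULL AS DATE) AS end_date, true AS is_current
        |FROM staging.customer_updates s
        |JOIN dw.customer_dim d
        |  ON s.customer_id = d.customer_id AND d.is_current AND s.address <> d.address""".stripMargin)

    // Writing to a side table avoids reading and overwriting
    // dw.customer_dim in the same statement.
    snapshot.write.mode("overwrite").saveAsTable("dw.customer_dim_next")
    spark.stop()
  }
}
```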
- Used SSIS, SQL Server's Extract, Transform, Load (ETL) tool, to populate data from various sources, creating packages for different data-loading operations.
- Made extensive use of Transact-SQL stored procedures and trigger scripts for creating database objects.
- Generated various reports using features such as grouping, drill-downs, drill-through, sub-reports, and parameterized reports.
- Deployed new strategies for checksum calculation and exception population using mapplets and Normalizer transformations (see the checksum sketch after this section).
Environment: SQL Server 2005, T-SQL, SSIS/DTS Designer and Reporting tools, Control-M.
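The checksum strategy above was built with Informatica mapplets; the idea itself is small enough to sketch. Below is a hypothetical Spark version (kept in Scala for consistency with the earlier examples) that derives a row-level MD5 checksum over all source columns so changed rows can be detected cheaply:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object RowChecksum {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("row-checksum").getOrCreate()

    // Hypothetical extract; concatenating the columns with a delimiter
    // and hashing them gives a stable per-row fingerprint.
    val src = spark.read.option("header", "true").csv("/data/extract/orders.csv")
    val withSum = src.withColumn("row_checksum", md5(concat_ws("|", src.columns.map(col): _*)))

    withSum.write.mode("overwrite").parquet("/data/curated/orders_checksummed")
    spark.stop()
  }
}
```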
- Developed web applications using the Spring MVC framework, including writing actions, classes, forms, custom tag libraries, and JSP pages.
- Worked on integrating the Spring and Hibernate frameworks using the Spring ORM module.
- Implemented caching techniques, wrote POJO classes for storing data and DAOs for retrieving it, and performed database configuration (see the DAO sketch after this section).
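The original implementation used Java POJOs and Hibernate-backed DAOs; as an illustration of the DAO-plus-cache shape only (written in Scala for consistency with the earlier sketches, with hypothetical types), consider:

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical domain object, the analogue of the POJOs described above.
case class Customer(id: Long, name: String)

// The DAO contract: retrieval is separated from storage details.
trait CustomerDao {
  def findById(id: Long): Option[Customer]
}

// A caching decorator over any underlying DAO (e.g. a Hibernate-backed one):
// results are memoized in a thread-safe map, mirroring the caching layer
// mentioned in the bullet above.
class CachingCustomerDao(underlying: CustomerDao) extends CustomerDao {
  private val cache = TrieMap.empty[Long, Customer]

  override def findById(id: Long): Option[Customer] =
    cache.get(id).orElse {
      val fetched = underlying.findById(id)
      fetched.foreach(c => cache.put(id, c))
      fetched
    }
}
```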