
ETL/Hadoop Developer Resume


Philadelphia, PA

SUMMARY:

  • Over 15 years of experience in Data Warehousing, Analytics, and ETL processes across business domains including retail, manufacturing, insurance, and banking.
  • Proficient in the Apache Hadoop ecosystem: YARN, Spark, Pig, Hive, Flume, Sqoop, HBase, Zookeeper, and Impala, with a strong understanding of HDFS and MapReduce architecture on Cloudera and Hortonworks distributions.
  • Strong Data Warehousing ETL experience using Informatica Power Center 9.x/8.x/7.x tools.
  • Experience using cloud components and connectors to push/pull data from different cloud storage services.
  • Strong knowledge of ER modeling and dimensional data modeling methodologies such as Star Schema and Snowflake Schema.

TECHNICAL SKILLS:

Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Pig, Hive, Sqoop, Spark, YARN, Storm, Kafka, Zookeeper, Flume, Hue, Oozie, MRUnit, Impala.

Programming Languages: Java, C, SQL, Scala, Pig Latin, HiveQL, Shell Scripting, Python.

Database and Tools: MySQL, SQLite, Oracle, Teradata, MS SQL, MongoDB, Cassandra, NoSQL, DB Visualizer, SQL Developer, MySQL Workbench.

ETL Tools: Informatica Power Center 7.x/8.x/9.x, Big Data Edition, SSIS, DTS.

Scheduling Tools: Control-M, AutoSys, IBM TWS

Visualization/Reporting: Tableau, Kibana, Zeppelin, Pentaho, Talend.

Web Technologies: Spring, Hibernate, JSP, JavaScript, HTML, XML, JSON, Web-Services.

Dev and Build Tools: Maven, Ant, Eclipse, Scala IDE, Jira, BitBucket, SVN, GIT, Telnet, Jenkins.

Methodologies and Tools: Waterfall, Agile (Scrum and Kanban), MS Project.

PROFESSIONAL EXPERIENCE:

Confidential, Philadelphia, PA

ETL/Hadoop Developer

Responsibilities:

  • Built a Hadoop- and Informatica-based ETL and analytics system providing insights into customers’ usage of Lutron products across different product lines, driving future enhancements and improvements in business and services.
  • Developed data pipelines using Spark, Kafka, Hive, Pig, and HBase to ingest customer system usage data and financial histories into the Hadoop cluster for analysis.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation, writing data back into S3 through Sqoop (see the sketch after this list).
  • Extensively used Informatica to create data ingestion jobs into HDFS using complex data file objects such as Avro and Parquet, and to evaluate dynamic mapping capabilities.
  • Implemented Data Quality rules using Informatica Data Quality (IDQ) to check the correctness of source files and perform data cleansing/enrichment.
  • Analyzed daily log record data and produced aggregated hourly and daily reporting using Tableau.
  • Environment: Hadoop 2.7, Informatica 9.x, Hive 1.2.1, Spark 1.6, Teradata, Oracle, EC2, S3.
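
A minimal sketch of the kind of Spark aggregation with a Scala UDF described above. The input path, column names, UDF logic, and output bucket are illustrative assumptions rather than the original production code, and the modern SparkSession API is used for brevity even though the role ran on Spark 1.6.

    // Hypothetical usage-aggregation job: paths, columns, and UDF logic are assumptions.
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object UsageAggregation {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("CustomerUsageAggregation").getOrCreate()
        import spark.implicits._

        // Usage events already landed in HDFS by the ingestion pipeline.
        val usage = spark.read.json("hdfs:///data/raw/usage_events/")

        // Simple UDF normalizing a product-line code.
        val normalizeLine = udf((line: String) =>
          Option(line).map(_.trim.toUpperCase).getOrElse("UNKNOWN"))

        val daily = usage
          .withColumn("product_line", normalizeLine($"product_line"))
          .groupBy($"customer_id", $"product_line", to_date($"event_ts").as("usage_date"))
          .agg(count("*").as("events"), sum($"duration_sec").as("total_duration_sec"))

        // Write the aggregates back out; S3 shown as an assumed target.
        daily.write.mode("overwrite").parquet("s3a://example-bucket/curated/daily_usage/")
        spark.stop()
      }
    }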

Confidential

Hadoop Developer

Responsibilities:

  • Worked with highly unstructured and semi-structured data of 100+ TB in size.
  • Developed Pig and Hive scripts to meet end user, analyst, and product manager requirements for ad hoc analysis.
  • Used Informatica to validate and test the business logic implemented in the mappings and fix bugs. Developed reusable Mapplets and Transformations.
  • Managed external tables in Hive for optimized performance, loading data using Sqoop jobs.
  • Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using the Spark context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN (see the sketch after this list).
  • Worked in a Kerberos-secured Hadoop environment supported by the Cloudera team.
  • Environment: 32-node Hadoop 2.6 cluster, Informatica 9.x, HDFS, Flume 1.5, Sqoop 1.4.3, Hive 1.0.1, Spark 1.4, HBase, XML, JSON, Teradata, Oracle, MongoDB, Cassandra.
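
For illustration, a compact pair-RDD aggregation of the style mentioned above; the input path and the tab-delimited record layout (userId, pageId, durationSeconds) are assumptions, not the project's actual data.

    // Illustrative pair-RDD job; paths and record layout are hypothetical.
    import org.apache.spark.{SparkConf, SparkContext}

    object DurationByUser {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("DurationByUser"))

        // Assumed tab-delimited records: userId \t pageId \t durationSeconds
        val records = sc.textFile("hdfs:///data/raw/clicks/")

        val durationsByUser = records
          .map(_.split("\t"))
          .filter(_.length == 3)
          .map(f => (f(0), f(2).toLong)) // (userId, durationSeconds)
          .reduceByKey(_ + _)            // total duration per user

        durationsByUser.saveAsTextFile("hdfs:///data/curated/duration_by_user/")
        sc.stop()
      }
    }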

Confidential

Hadoop Developer

Responsibilities:

  • Migrated 100+ TB of data from different databases (e.g., Oracle, SQL Server) to Hadoop (see the sketch after this list).
  • Wrote code in different applications of the Hadoop and Informatica ecosystems.
  • Extensively involved in performance tuning of the Informatica ETL mappings by using caches, overriding the SQL queries, and using parameter files.
  • Worked on various file formats such as Avro, SerDe, Parquet, and text, using Snappy compression.
  • Used Pig Custom Loaders to load different forms of data files such as XML, JSON and CSV.
  • Designed a dynamic partitioning mechanism in Hive for optimal query performance, reducing report generation time to meet SLA requirements.
  • Environment: Hadoop 2.2, Informatica Power Center 9.x, HDFS, HBase, Flume 1.4, Sqoop 1.4.3, Hive 0.13.1, Avro 1.7.4, Parquet 1.4, XML, JSON, Oracle 11g, Amazon EC2, S3.
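
One way such a relational-to-Hadoop copy can be sketched is with Spark's JDBC reader, writing Snappy-compressed Parquet partitioned for Hive; the connection string, credentials, table, and partition column below are hypothetical, and the original migration may equally have been driven by Sqoop or Informatica.

    // Hedged sketch only: connection details, table, and partition column are assumptions.
    import org.apache.spark.sql.SparkSession

    object OracleToParquet {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("OracleToParquet").getOrCreate()

        val orders = spark.read
          .format("jdbc")
          .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL") // assumed connection string
          .option("dbtable", "SALES.ORDERS")                      // assumed source table
          .option("user", sys.env.getOrElse("DB_USER", ""))
          .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
          .option("fetchsize", "10000")
          .load()

        // Snappy-compressed Parquet, partitioned by a date column for efficient Hive queries.
        orders.write
          .option("compression", "snappy")
          .partitionBy("ORDER_DATE")
          .parquet("hdfs:///data/warehouse/orders/")

        spark.stop()
      }
    }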

Confidential

ETL Developer

Responsibilities:

  • Developed mappings and sessions to import, transform, and load data into the respective target tables and flat files using Informatica Power Center.
  • Automated Informatica ETL jobs for different ETL design patterns.
  • Extensively used transformations like Router, Aggregator, Source Qualifier, Joiner, Expression, and Sequence Generator, working in Source Analyzer, Warehouse Designer, Mapping Designer, Mapplet Designer, and Transformation Developer.
  • Environment: Informatica Power Center 9.x (Repository Manager, Designer, Workflow Manager, and Workflow Monitor), Oracle 11g, SeaQuest, HPDM, SQL Server, Teradata, Toad, Control-M.

Confidential

ETL Developer

Responsibilities:

  • Extensively used the Slowly Changing Dimensions technique for updating the dimensional schema (see the sketch after this list).
  • Processed data using various transformations like Aggregator, Router, Expression, Source Qualifier, Filter, Lookup, Joiner, Sorter, XML Source Qualifier, and web service consumer for WSDL.
  • Used Informatica user-defined functions to reduce code dependency.
  • Environment: Informatica Power Center 8.x, Informatica Power Connect, Power Exchange, Power Analyzer, Toad, Erwin, Oracle 11g/10g, Teradata V2R5, PL/SQL, ODI, Trillium 11.
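
The Slowly Changing Dimensions work above was implemented with Informatica transformations; purely as an illustration of the Type 2 pattern, here is a minimal Spark/Scala sketch in which the dimension layout, the single tracked attribute (address), and all paths are assumptions.

    // Illustration only: dimension layout, tracked attribute, and HDFS paths are assumptions;
    // the original logic lived in Informatica mappings.
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object ScdType2Sketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("ScdType2Sketch").getOrCreate()
        import spark.implicits._

        // Current dimension rows and the incoming staging snapshot (assumed layouts).
        val dim = spark.read.parquet("hdfs:///dw/dim_customer/")
          .filter($"is_current" === true)
        val staging = spark.read.parquet("hdfs:///staging/customers/")
          .select($"customer_id", $"address".as("new_address"))

        // Customers whose tracked attribute changed since the current dimension row.
        val changed = dim.join(staging, Seq("customer_id"))
          .filter($"address" =!= $"new_address")

        // Expire the current row ...
        val expired = changed
          .select($"customer_id", $"address", $"start_date")
          .withColumn("end_date", current_date())
          .withColumn("is_current", lit(false))

        // ... and open a new current row carrying the changed attribute.
        val opened = changed
          .select($"customer_id", $"new_address".as("address"))
          .withColumn("start_date", current_date())
          .withColumn("end_date", lit(null).cast("date"))
          .withColumn("is_current", lit(true))

        // Both sides share the same column order, so a positional union is enough here.
        expired.union(opened)
          .write.mode("append").parquet("hdfs:///dw/dim_customer_updates/")

        spark.stop()
      }
    }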

Confidential

ETL Developer

Responsibilities:

  • Used SSIS as the Extract, Transform, Load (ETL) tool of SQL Server to populate data from various data sources, creating packages for different data loading operations for the application.
  • Made extensive use of Transact-SQL, stored procedures, and trigger scripts for creating database objects.
  • Generated various reports using features such as group by, drill-downs, drill-throughs, sub-reports, and parameterized reports.
  • Deployed new strategies for checksum calculations and exception population using mapplets and Normalizer transformations.
  • Environment: SQL Server 2005, T-SQL, SSIS/DTS Designer and reporting tools, Control-M.

Confidential

Java Developer

Responsibilities:

  • Developed web applications using the Spring MVC Framework, including writing actions, classes, forms, custom tag libraries, and JSP pages.
  • Worked on integration of the Spring and Hibernate frameworks using the Spring ORM module.
  • Implemented caching techniques, wrote POJO classes for storing data and DAOs to retrieve it, and handled database configurations.

Confidential

Java Developer

Responsibilities:

  • Implemented routing and shortest-path algorithms along with parsing logic for device discovery using Heart-Beat (see the sketch after this list).
  • Implemented Java Native Interface (JNI) APIs for Indus Mote to access devices dynamically through C code.
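
As an illustration of the shortest-path work described above, a compact Dijkstra sketch in Scala; the original implementation was Java/C-based, and the tiny graph in main is a made-up example.

    // Illustrative Dijkstra implementation; the example graph is hypothetical.
    import scala.collection.mutable

    object ShortestPath {
      // adjacency: node -> list of (neighbor, edge weight)
      def dijkstra(adjacency: Map[Int, List[(Int, Int)]], source: Int): Map[Int, Int] = {
        val dist = mutable.Map(source -> 0)
        // min-heap on tentative distance (first tuple element)
        val queue = mutable.PriorityQueue.empty[(Int, Int)](Ordering.by[(Int, Int), Int](_._1).reverse)
        queue.enqueue((0, source))

        while (queue.nonEmpty) {
          val (d, node) = queue.dequeue()
          if (d <= dist.getOrElse(node, Int.MaxValue)) { // skip stale queue entries
            for ((next, weight) <- adjacency.getOrElse(node, Nil)) {
              val candidate = d + weight
              if (candidate < dist.getOrElse(next, Int.MaxValue)) {
                dist(next) = candidate
                queue.enqueue((candidate, next))
              }
            }
          }
        }
        dist.toMap
      }

      def main(args: Array[String]): Unit = {
        // Tiny example topology: 1 -> 2 (weight 4), 1 -> 3 (1), 3 -> 2 (2)
        val graph = Map(1 -> List((2, 4), (3, 1)), 3 -> List((2, 2)))
        println(dijkstra(graph, 1)) // expected distances: 1 -> 0, 3 -> 1, 2 -> 3
      }
    }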
