ETL/Hadoop Developer Resume
Philadelphia, PA
SUMMARY:
- Over 15 years of experience in data warehousing, analytics, and ETL processes across business domains such as retail, manufacturing, insurance, and banking.
- Proficient in the Apache Hadoop ecosystem: YARN, Spark, Pig, Hive, Flume, Sqoop, HBase, Zookeeper, and Impala, with a strong understanding of HDFS and MapReduce architecture on Cloudera and Hortonworks distributions.
- Strong data warehousing ETL experience using Informatica PowerCenter 9.x/8.x/7.x tools.
- Experience in using cloud components and connectors to push/pull data from different cloud storage services.
- Strong knowledge of ER modeling and dimensional data modeling methodologies such as star schema and snowflake schema.
TECHNICAL SKILLS:
Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Pig, Hive, Sqoop, Spark, YARN, Storm, Kafka, Zookeeper, Flume, Hue, Oozie, MRUnit, Impala.
Programming Languages: Java, C, SQL, Scala, Pig Latin, HiveQL, Shell Scripting, Python.
Database and Tools: MySQL, SQLite, Oracle, Teradata, MS SQL, MongoDB, Cassandra, NoSQL, DB Visualizer, SQL Developer, MySQL Workbench.
ETL Tools: Informatica PowerCenter 7.x/8.x/9.x, Big Data Edition, SSIS, DTS.
Scheduling Tools: Control-M, AutoSys, IBM TWS
Visualization/Reporting: Tableau, Kibana, Zeppelin, Pentaho, Talend.
Web Technologies: Spring, Hibernate, JSP, JavaScript, HTML, XML, JSON, Web-Services.
Dev and Build Tools: Maven, Ant, Eclipse, Scala IDE, Jira, BitBucket, SVN, GIT, Telnet, Jenkins.
Methodologies and Tools: Waterfall, Agile (Scrum and Kanban), MS Project.
PROFESSIONAL EXPERIENCE:
Confidential, Philadelphia, PA
ETL/Hadoop Developer
Responsibilities:
- Built a Hadoop and Informatica based ETL and analytics system providing insights into customers' usage of Lutron products across different product lines, driving future enhancements and improvements in business and services.
- Developed data pipelines using Spark, Kafka, Hive, Pig, and HBase to ingest customer system-usage data and financial histories into the Hadoop cluster for analysis.
- Developed Scala scripts and UDFs, using both DataFrames/SQL and RDDs/MapReduce in Spark, for data aggregation, writing the results back to S3 through Sqoop.
- Extensively used Informatica to create data-ingestion jobs into HDFS using complex file formats such as Avro and Parquet, and to evaluate dynamic mapping capabilities.
- Implemented data quality rules using Informatica Data Quality (IDQ) to check the correctness of source files and perform data cleansing/enrichment.
- Analyzed daily log-record data and built hourly and daily aggregated reports using Tableau.
- Environment: Hadoop 2.7, Informatica 9.x, Hive 1.2.1, Spark 1.6, Teradata, Oracle, EC2, S3.
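The aggregation step in the pipeline above was written in Scala on Spark; purely as an illustrative sketch, the same reduce-by-key logic looks like this in plain Python (the record fields and values are hypothetical, not from the actual system):

```python
from collections import defaultdict

# Hypothetical usage records; in the actual pipeline these arrived via
# Kafka and were aggregated as Spark DataFrames/RDDs in Scala.
events = [
    {"product_line": "dimmers", "usage_hours": 4.0},
    {"product_line": "shades",  "usage_hours": 2.5},
    {"product_line": "dimmers", "usage_hours": 1.5},
]

def aggregate_usage(records):
    """Sum usage hours per product line (a reduce-by-key aggregation)."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["product_line"]] += rec["usage_hours"]
    return dict(totals)
```

In Spark the same shape would be a `groupBy("product_line").agg(sum(...))` over a DataFrame.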
Confidential
Hadoop Developer
Responsibilities:
- Worked with highly unstructured and semi-structured data of 100+ TB in size.
- Developed Pig and Hive scripts for end users, analysts, and product managers to support their ad-hoc analysis requirements.
- Used Informatica to validate and test the business logic implemented in the mappings and fix bugs; developed reusable Mapplets and Transformations.
- Managed external tables in Hive, loaded via Sqoop jobs, for optimized performance.
- Solved performance issues in Hive and Pig scripts with an understanding of how joins, grouping, and aggregation translate to MapReduce jobs.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using Spark context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Worked in a Kerberos-secured Hadoop environment supported by the Cloudera team.
- Environment: 32-node Hadoop 2.6 cluster, Informatica 9.x, HDFS, Flume 1.5, Sqoop 1.4.3, Hive 1.0.1, Spark 1.4, HBase, XML, JSON, Teradata, Oracle, MongoDB, Cassandra.
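The Hive/Pig tuning above rests on how a join compiles to MapReduce: rows from both inputs are shuffled by the join key, then paired per key on the reduce side. A minimal Python sketch of that reduce-side join (input rows are illustrative):

```python
from collections import defaultdict

def reduce_side_join(left, right, key):
    """Group both inputs by join key (the 'shuffle'), then pair rows
    within each key group (the 'reduce'), as a MapReduce equi-join would."""
    buckets = defaultdict(lambda: ([], []))
    for row in left:
        buckets[row[key]][0].append(row)
    for row in right:
        buckets[row[key]][1].append(row)
    joined = []
    for _, (ls, rs) in buckets.items():
        for l in ls:
            for r in rs:
                joined.append({**l, **r})  # inner join: only matched keys emit
    return joined
```

Skew in the key distribution makes one reduce bucket huge, which is why join-key cardinality mattered for the tuning work.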
Confidential
Hadoop Developer
Responsibilities:
- Migrated 100+ TB of data from different databases (e.g., Oracle, SQL Server) to Hadoop.
- Wrote code in different applications of the Hadoop and Informatica ecosystem.
- Extensively involved in performance tuning of Informatica ETL mappings by using caches, overriding SQL queries, and using parameter files.
- Worked on various file formats (Avro, Parquet, and text, with Hive SerDes) using Snappy compression.
- Used custom Pig loaders to load different forms of data files such as XML, JSON, and CSV.
- Designed a dynamic partitioning mechanism in Hive for optimal query performance, reducing report-generation time to within SLA requirements.
- Environment: Hadoop 2.2, Informatica Power Center 9.x, HDFS, HBase, Flume 1.4, Sqoop 1.4.3, Hive 0.13.1, Avro 1.7.4, Parquet 1.4, XML, JSON, Oracle 11g, Amazon EC2, S3.
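Hive's dynamic partitioning, used above, routes each row into a `col=value` directory derived from its partition columns; the routing logic amounts to the following (table root and column names are illustrative, not from the actual system):

```python
def partition_path(table_root, row, partition_cols):
    """Build the Hive-style partition directory (col=value/...) that a
    dynamically partitioned INSERT would write this row into."""
    parts = [f"{col}={row[col]}" for col in partition_cols]
    return "/".join([table_root] + parts)
```

Queries filtering on the partition columns then prune to the matching directories instead of scanning the whole table, which is where the report-time reduction comes from.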
Confidential
ETL Developer
Responsibilities:
- Developed mappings and sessions to import, transform, and load data into target tables and flat files using Informatica PowerCenter.
- Automated Informatica ETL jobs for different ETL design patterns.
- Extensively used transformations such as Router, Aggregator, Source Qualifier, Joiner, Expression, and Sequence Generator via the Source Analyzer, Warehouse Designer, Mapping Designer, Mapplet Designer, and Transformation Developer.
- Environment: Informatica PowerCenter 9.x (Repository Manager, Designer, Workflow Manager, and Workflow Monitor), Oracle 11g, SeaQuest, HPDM, SQL Server, Teradata, Toad, Control-M.
Confidential
ETL Developer
Responsibilities:
- Extensively used the Slowly Changing Dimensions technique for updating dimensional schemas.
- Processed data using transformations such as Aggregator, Router, Expression, Source Qualifier, Filter, Lookup, Joiner, Sorter, XML Source Qualifier, and Web Services Consumer for WSDL.
- Used Informatica user-defined functions to reduce code dependency.
- Environment: Informatica PowerCenter 8.x, Informatica PowerConnect, PowerExchange, PowerAnalyzer, Toad, Erwin, Oracle 11g/10g, Teradata V2R5, PL/SQL, ODI, Trillium 11.
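The Slowly Changing Dimensions updates above follow the common Type 2 pattern: when a tracked attribute changes, the current dimension row is expired and a new version is appended. A minimal sketch of that logic, with hypothetical column names:

```python
def scd2_apply(dim_rows, incoming, key, tracked, load_date):
    """Expire the current dimension row and append a new version
    when a tracked attribute changes (SCD Type 2)."""
    current = next((r for r in dim_rows
                    if r[key] == incoming[key] and r["is_current"]), None)
    if current and all(current[c] == incoming[c] for c in tracked):
        return dim_rows  # nothing changed; keep history as-is
    if current:
        current["is_current"] = False   # close out the old version
        current["end_date"] = load_date
    dim_rows.append({**incoming, "start_date": load_date,
                     "end_date": None, "is_current": True})
    return dim_rows
```

In PowerCenter this is typically built from a Lookup on the dimension plus an Update Strategy transformation routing rows to insert vs. update.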
Confidential
ETL Developer
Responsibilities:
- Used SSIS as the Extract, Transform, Load (ETL) tool of SQL Server to populate data from various data sources, creating packages for the application's different data-loading operations.
- Made extensive use of Transact-SQL stored procedures and trigger scripts for creating database objects.
- Generated various reports using features such as group-by, drill-down, drill-through, sub-reports, and parameterized reports.
- Deployed new strategies for checksum calculation and exception population using Mapplets and Normalizer transformations.
- Environment: SQL Server 2005, T-SQL, SSIS/DTS Designer and reporting tools, Control-M.
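One common form of the checksum strategy above is a row-level hash over concatenated column values, used to detect changed or exceptional records on load; a sketch (the column names are illustrative, and the exact hashing scheme in the actual project is an assumption here):

```python
import hashlib

def row_checksum(row, columns):
    """MD5 over pipe-joined column values: a compact change-detection key.
    Missing columns hash as empty strings so row shapes can vary."""
    payload = "|".join(str(row.get(c, "")) for c in columns)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()
```

Comparing the stored checksum against the incoming one replaces column-by-column comparison: equal checksums mean skip, differing checksums mean update (or route to the exception table).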
Confidential
Java Developer
Responsibilities:
- Developed web applications using the Spring MVC framework, including writing actions, classes, forms, custom tag libraries, and JSP pages.
- Worked on integration of the Spring and Hibernate frameworks using the Spring ORM module.
- Implemented caching techniques, wrote POJO classes for storing data and DAOs for retrieving it, and handled database configurations.
Confidential
Java Developer
Responsibilities:
- Implemented routing and shortest-path algorithms, along with parsing logic for device discovery using Heart-Beat.
- Implemented Java Native Interface (JNI) APIs for the Indus Mote to access devices dynamically through C code.
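The shortest-path work above is classically Dijkstra's algorithm (the actual routing variant used is not stated, so this is a generic sketch over a weighted adjacency-map graph):

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source over a dict-of-dicts weighted graph,
    e.g. graph["a"]["b"] = 1 means an edge a -> b of weight 1."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```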