ETL/Hadoop Developer Resume

Falls Church, VA

SUMMARY

  • Over 10 years of IT experience in data warehousing using ETL/BI tools like Informatica PowerCenter, Informatica Data Quality (IDQ) and Business Objects. Experience in business requirements gathering, defining and capturing metadata for business rules, system analysis, design, development, testing and user training associated with Business Intelligence solutions.
  • Hands-on experience in the data warehouse development life cycle using SDLC and Agile methodologies.
  • Experience in gathering requirements from business users, documenting them using the BUS architecture and implementing appropriate ETL/BI solutions.
  • Worked on loading and transforming huge sets of structured, semi-structured and unstructured data.
  • Experience in dimensional modeling and in implementing Star and Snowflake schemas and complex ETL processes using Informatica as the ETL tool.
  • Designed and developed complex ETL solutions. Adept in converting functional requirements to Technical, Mapping and Design process specification documents.
  • Developed complex ETL mappings for Data Warehouse, ODS and vendor extracts using transformations such as XML, dynamic lookup, Aggregator, SQL, Java and Transaction control transformations.
  • In-depth knowledge and experience in design, development and deployment of Big Data projects using Hadoop / Data Analytics / NoSQL / Distributed Machine Learning frameworks.
  • Developed interfaces using UNIX shell scripts to schedule sessions with the pmcmd command and automate bulk loads.
  • Extensively worked with Informatica performance tuning involving source level, target level and map level bottlenecks.
  • Experience in production support of daily/weekly ETL jobs.
  • Excellent written and oral communication skills; experienced in working with senior-level managers, business people and developers across multiple disciplines.

TECHNICAL SKILLS

ETL Tools: Informatica PowerCenter 10/9.5.1/9.1, Informatica PowerMart, Informatica MDM, Informatica Data Quality (IDQ) 9.6/9.1 and Data Profiling.

DB Tools: TOAD 12, SQL*Plus, SQL*Loader.

Scripting Languages: UNIX Shell Script, Perl, JavaScript, Python

Data Modeling Tools: Erwin, Oracle Designer.

Programming Languages: C, C++, Java, Scala, Python, HTML, SQL, PL/SQL, Pig Latin, HiveQL, UNIX, JavaScript, Visual Basic 5.0/6.0, Visual Studio, .NET.

Database: Oracle 11g/10g/9i/8i, MS SQL Server 2008/2005, DB2 UDB, Teradata, MS Access.

Operating Systems: UNIX, Linux, Sun Solaris, HP-UX, Windows NT/2000, Windows XP, Mac OS X.

Big Data Technologies: Hadoop, YARN, MapReduce, Hive, Pig, Sqoop, Flume, Spark, Impala, HBase, Apache Kafka, ZooKeeper, Oozie.

AWS Services: EC2, S3, CloudWatch, SNS, EBS

PROFESSIONAL EXPERIENCE

Confidential, Falls Church, VA

ETL/Hadoop Developer

Responsibilities:

  • Involved in analyzing partner data and designing the processes for Extracting, Transforming and Loading (ETL) the partner data.
  • Extensively worked on creating the transformations by applying complex logic for the data received from partners.
  • Analyzed the data to be loaded into Hadoop and contacted the respective source teams to get table information and connection details.
  • Involved in loading billions of records from the UNIX file system to HDFS using an ETL tool.
  • Worked on data masking by creating a pseudo ID column for each member.
  • Built a new schema from scratch in HDFS by creating separate tables for all the partners.
  • Improved performance by proposing and developing consolidated tables for each partner, each containing 3 years of data, and by implementing compression and partitioning on the tables.
  • Created Hive tables to consolidate the data received from each partner every month.
  • Developed complex UNIX scripts to automate ETL loads, perform various validations on the flat file sources and clean up log directories on the UNIX server.
  • Identified bottlenecks and proposed architectural changes and solutions to improve performance.
  • Performed data profiling on various security columns to identify uniqueness and masked the data in the columns.
  • Involved in performance tuning of the SAS queries used to extract data from HDFS for studies.
  • Analyzed data by running Hive queries to study the transactional behavior of policies and plans.
  • Good knowledge of Amazon Web Services (AWS) Cloud services like EC2, S3, EBS, RDS and VPC.

Environment: Pentaho, Hadoop, Hive, Shell Scripts (Linux), Python, DB2, SQL, AWS.
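The pseudo ID masking described above can be illustrated with a minimal Python sketch. This is not the project's actual implementation: the keyed-hash approach (HMAC-SHA256), the `SECRET_KEY` value, the 16-character truncation and the `member_id` field name are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical secret; a real pipeline would read this from a managed key store.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudo_id(member_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable pseudo ID for a member identifier using a keyed hash."""
    # Normalize so that the same member always maps to the same pseudo ID.
    normalized = member_id.strip().upper()
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncation length is an illustrative choice, not a recommendation.
    return digest[:16]


def mask_records(records, id_field="member_id"):
    """Replace the real identifier in each record with a pseudo_id column."""
    masked = []
    for record in records:
        row = dict(record)  # copy so the input record is left untouched
        row["pseudo_id"] = pseudo_id(row.pop(id_field))
        masked.append(row)
    return masked
```

Because the hash is keyed and deterministic, the same member receives the same pseudo ID in every partner table, so masked tables can still be joined on `pseudo_id`.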

Confidential, Detroit, MI

ETL/Hadoop Developer

Responsibilities:

  • Analyzed business requirements and framed the business logic for the ETL process to generate technical data and report requirements.
  • Designed and developed Star Schema and created Fact and Dimension Tables for the Warehouse, Data Marts and Business Intelligence Applications.
  • Created and modified simple and complex Informatica mappings to implement Slowly Changing Dimensions Type 2/Hybrid and to denormalize hierarchy data for loading stage, fact and dimension tables.
  • Used MD5 hash functionality and the Dynamic Lookup transformation in Informatica to improve performance when implementing Change Data Capture (CDC) to the target database.
  • Created and Configured Workflows, Worklets and Sessions to transport the data to target warehouse tables using Informatica Workflow Manager.
  • Involved in unit testing of mappings and mapplets, as well as integration testing and user acceptance testing.
  • In the second iteration of the data lake, converted the Sqoop scripts and Spark applications into Informatica applications.
  • Created ETL jobs to load data into the different zones of the data lake, namely the landing zone, discovery zone and publish zone, using Informatica BDE mappings and connectors for Hadoop.
  • Troubleshot key performance issues with data loads. Implemented novel methods of data loading to deal with the insert-only mechanism of Hive tables.
  • Created Perl scripts and modules to automate the data loads.
  • Implemented views within Impala to deal with performance-related issues in Hive.

Environment: Informatica Power Center, Informatica IDQ, Unix/Linux, Hadoop, Hive, HDFS, Sqoop, Oozie, Cloudera, Oracle/SQL & DB2.
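The MD5-based change data capture mentioned above was built in Informatica. The following Python sketch only illustrates the general idea of comparing one hash per row instead of comparing every column. The function names, the `|` delimiter and the in-memory `target_hashes` dictionary are assumptions for illustration, not the mapping logic that was actually deployed.

```python
import hashlib


def row_hash(row, columns):
    """Compute an MD5 hash over the tracked columns of a row."""
    # Join values with a delimiter; missing or None values become empty strings.
    joined = "|".join("" if row.get(c) is None else str(row[c]) for c in columns)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def classify_changes(source_rows, target_hashes, key, columns):
    """Split source rows into inserts and updates by comparing row hashes.

    target_hashes maps each existing key to the hash stored in the target.
    Rows whose hash matches the target are unchanged and are skipped.
    """
    inserts, updates = [], []
    for row in source_rows:
        current = row_hash(row, columns)
        existing = target_hashes.get(row[key])
        if existing is None:
            inserts.append(row)  # key not present in the target yet
        elif existing != current:
            updates.append(row)  # key present, but tracked columns changed
    return inserts, updates
```

Storing one hash per target row means change detection needs a single string comparison per key, which is typically cheaper than a column-by-column comparison on wide tables.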

Confidential, Detroit, MI

ETL Developer

Responsibilities:

  • Worked on the design and development of reusable ETL routines like reusable mappings, mapplets, transformations and worklets for project teams.
  • Used the pmcmd command to start, stop and ping the server from UNIX and created shell scripts to automate the process.
  • Worked with various Informatica transformations such as Joiner, Expression, Lookup, Aggregator, Filter, Update Strategy, Stored Procedure, Router and Normalizer.
  • Worked on production tickets to resolve the issues in a timely manner.
  • Involved in HDFS maintenance and loading of structured and unstructured data.

Environment: Informatica PowerCenter/Powermart 9.6/9.1, Oracle 11g/10g, TOAD 9.x, Shell Scripting, PL/SQL.
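The pmcmd automation described above was done with shell scripts. As a rough illustration in Python, the sketch below assembles a `pmcmd startworkflow` command line. The service, domain, folder and workflow names are placeholders, the `PM_USER` and `PM_PASSWORD` environment variable names are hypothetical, and the option set should be checked against the pmcmd reference for the installed PowerCenter version.

```python
import subprocess


def pmcmd_start_workflow_args(service, domain, folder, workflow,
                              user_env="PM_USER", password_env="PM_PASSWORD"):
    """Build the argument list for a blocking pmcmd startworkflow call."""
    return [
        "pmcmd", "startworkflow",
        "-sv", service,        # Integration Service name
        "-d", domain,          # domain name
        "-uv", user_env,       # environment variable holding the user name
        "-pv", password_env,   # environment variable holding the password
        "-f", folder,          # repository folder containing the workflow
        "-wait",               # return only after the workflow completes
        workflow,
    ]


def run_workflow(args):
    """Run pmcmd and return its exit code (requires pmcmd on the PATH)."""
    return subprocess.run(args, check=False).returncode
```

Passing credentials through environment variables keeps passwords out of the script and out of the process list, and a scheduler can use the returned exit code to decide whether to continue with downstream loads.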

Confidential, Boston, MA

Sr. ETL Developer

Responsibilities:

  • Interacted with the Business Analyst and DBAs for requirement gathering, business analysis and design of the data warehouse.
  • Involved in designing the ETL processes using Informatica to load data from Oracle 10g, DB2, flat files, Teradata into the target Oracle 10g database.
  • Designed and developed several simple and complex mappings for data loads and data cleansing. Extensively worked on Informatica designer and Workflow Manager.
  • Technical Lead for the Atlas project, in which reports are generated so that business users can report on differences between the conversions file and the extract file on a column-by-column basis.
  • Worked on impact analysis and the Informatica mapping modifications necessary to upgrade and enhance the IBEX and Logician systems. Developed and co-authored the ETL design and coding standards at BMC.

Environment: Informatica PowerCenter/Powermart 9.5.1/9.1/8.6/8.1, Oracle 11g/10g, TOAD 9.x, Erwin, Business Objects XI 3.1, OBIEE, Shell Scripting, PL/SQL, Sun Solaris UNIX, Windows XP, Teradata.
