
Sr. Data Warehouse Developer/SME Resume


Boston, MA

SUMMARY

  • Over 8 years of IT experience, including around 2 years of Big Data analysis using the Hadoop Distributed File System (HDFS), the MapReduce framework, and the wider Hadoop ecosystem.
  • Experience in gathering business requirements, defining and capturing metadata for business rules, and in the system analysis, design, development, testing, and user training associated with Business Intelligence solutions.
  • Expertise in building enterprise data lake solutions using open-source Hadoop and Cloudera distribution technologies.
  • Hands-on experience with the data warehouse development life cycle using SDLC and Agile methodologies.
  • Experience in gathering requirements from business users, documenting them using the data warehouse bus architecture, and implementing appropriate BI solutions.
  • Worked on loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Worked with different file formats such as XML, Sequence files, JSON, CSV, and Map files using MapReduce programs.
  • Experience in collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
  • Strong hands-on experience with HDFS, the MapReduce framework, YARN, Pig, Hive, Spark, Flume, HBase, Sqoop, Oozie, and ZooKeeper.
  • In-depth knowledge and experience in the design, development, and deployment of Big Data projects using Hadoop, data analytics, NoSQL, and distributed machine learning frameworks.
  • Experienced in working with in-memory processing in Spark: transformations, Spark SQL, MLlib, and Spark Streaming.
  • Knowledge of Hadoop security, including securing a Hadoop cluster with Kerberos.
  • Experience in dimensional modeling, implementing star and snowflake schemas, and building complex ETL processes with Informatica as the ETL tool.
  • Developed Unix shell script interfaces to schedule sessions with the pmcmd command and automate bulk loads.

TECHNICAL SKILLS

Big Data Technologies: Hadoop, YARN, MapReduce, Hive, Pig, Sqoop, Flume, Spark, Impala, Apache Kafka, ZooKeeper, Oozie.

Programming Languages: C, C++, Java, Scala, Python, HTML, SQL, PL/SQL, Pig Latin, HiveQL, Unix shell, JavaScript, Visual Basic 5.0/6.0, Visual Studio, .NET.

NoSQL Databases: HBase, Cassandra

Databases: Oracle 11g/10g/9i/8i, MS SQL Server 2008/2005, DB2 UDB, Teradata, MS Access.

Operating Systems: UNIX, Sun Solaris, HP-UX, Windows NT/2000, Windows XP, Mac OS X.

DB Tools: TOAD 9.x/7.x, SQL*Plus, SQL*Loader, Developer 2000.

Scripting Languages: Unix Shell Script, Perl, JavaScript.

ETL Tools: Informatica PowerCenter 9.5.1/9.1/8.6, Informatica PowerMart 6.2/5.1, Informatica PowerConnect, Informatica Data Quality and Data Profiling.

Data Modeling Tools: Erwin, Oracle Designer.

PROFESSIONAL EXPERIENCE

Confidential, Detroit, MI

Hadoop Developer

Responsibilities:

  • Gathered business requirements for the first implementation of the data lake environment by working with the analytics team on the various data elements.
  • Designed and developed Hive schemas and tables based on the dashboard and analytics requirements of end users and data scientists.
  • Designed shell scripts that used Sqoop to move data from RDBMS sources into the cluster's landing area; security was handled by manually setting folder-level ACLs in HDFS (a Sqoop sketch follows this list).
  • Implemented complex transformations, deduplication, and denormalization of data from various data sets by building Spark applications on the cluster.
  • Wrote, configured, and deployed Spark applications on the cluster (a spark-submit sketch follows this list); used the Spark shell for interactive data analysis when preparing test cases.
  • Processed and queried structured data sets in the cluster with Spark SQL based on data scientist and end-user input.
  • Implemented a Spark Streaming proof of concept to process a live Twitter feed into the data lake for customer sentiment analysis.
  • In the second iteration of the data lake, converted the Sqoop scripts and Spark applications into Informatica BDE applications.
  • Created ETL processes for the different data lake zones (landing, discovery, and publish) using Informatica BDE mappings and Hadoop connectors.
  • Troubleshot key performance issues with data loads and implemented alternative loading methods to work around the insert-only nature of Hive tables.
  • Tuned YARN performance based on daily analysis of job loads and their run times.
  • Created Perl scripts and modules to automate the data loads, including modules to clean up Hadoop temporary directories, move content within HDFS, and create file watchers on job completion.
  • Implemented views in Impala to address performance issues encountered in Hive.
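
The Sqoop landing-zone scripts referenced above generally followed this shape; the sketch below is illustrative only, and the connection string, credentials, table, landing path, and analytics group are assumed names, not the actual project values:

    #!/bin/bash
    # Illustrative Sqoop landing-zone load; hostnames, credentials, table and
    # path names are placeholders.
    LANDING_DIR=/data/landing/sales/orders

    sqoop import \
        --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
        --username etl_user \
        --password-file /user/etl/.db_password \
        --table ORDERS \
        --target-dir ${LANDING_DIR}/$(date +%Y%m%d) \
        --num-mappers 4 \
        --as-textfile

    # Folder-level security: grant the analytics group read/execute on the
    # landing area (requires ACLs to be enabled on the cluster).
    hdfs dfs -setfacl -R -m group:analytics:r-x ${LANDING_DIR}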
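
The Spark applications were packaged and launched on the cluster via spark-submit; a minimal launcher sketch follows, where the class name, jar path, resource settings, and HDFS paths are assumptions rather than the project's actual configuration:

    #!/bin/bash
    # Illustrative launcher for a Spark deduplication/denormalization job on YARN.
    spark-submit \
        --master yarn \
        --deploy-mode cluster \
        --class com.example.datalake.DedupJob \
        --num-executors 8 \
        --executor-memory 4g \
        --executor-cores 2 \
        /opt/datalake/jobs/dedup-job.jar \
        /data/landing/sales/orders /data/discovery/sales/orders

    # For interactive analysis and test-case preparation, the same data can be
    # explored in the Spark shell:  spark-shell --master yarn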

Environment: Hadoop, MapReduce, Hive, HDFS, Pig, Sqoop, Oozie, Cloudera (CDH3), Flume, HBase, Cassandra, J2EE, Oracle/SQL & DB2, Unix/Linux, JavaScript, Eclipse IDE.

Confidential, Detroit, MI

Hadoop Developer

Responsibilities:

  • Installed and configured Hadoop 2.0 (MapReduce and HDFS) and developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
  • Developed simple to complex MapReduce jobs using Hive and Pig to process the data.
  • Involved in data ingestion into HDFS using Sqoop and Flume from a variety of sources.
  • Designed and implemented Hive queries and functions for data analysis to meet the business requirements.
  • Involved in HDFS maintenance and loading of structured and unstructured data.
  • Responsible for managing data coming from different sources.
  • Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data.
  • Worked extensively on creating Oozie workflows to schedule Hive, MapReduce, and shell script jobs.
  • Supported MapReduce programs running on the cluster.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs (a minimal example follows this list).
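
As a minimal illustration of the Hive work described above (the table name, columns, and input path are hypothetical), a shell wrapper around the hive CLI can create a table, load landed data into it, and run a query that Hive executes as MapReduce jobs:

    #!/bin/bash
    # Illustrative only: table, schema, and paths are placeholders.
    hive -e "
    CREATE TABLE IF NOT EXISTS web_logs (
        log_ts  STRING,
        user_id STRING,
        url     STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE;

    -- Move a landed file set into the table's warehouse directory.
    LOAD DATA INPATH '/data/raw/web_logs/2015-06-01' INTO TABLE web_logs;

    -- This aggregation is compiled and executed as MapReduce jobs.
    SELECT url, COUNT(*) AS hits FROM web_logs GROUP BY url ORDER BY hits DESC LIMIT 10;
    "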

Environment: Hadoop, HDFS, Hive, Sqoop, Pig, MapReduce, Oozie, Cloudera, Flume, HBase, Java (JDK), MySQL, Ubuntu.

Confidential, Boston, MA

Sr. ETL Developer

Responsibilities:

  • Interacted with business analysts and DBAs for requirements gathering, business analysis, and design of the data warehouse.
  • Involved in designing the ETL processes using Informatica to load data from Oracle 10g, DB2, flat files, and Teradata into the target Oracle 10g database.
  • Designed and developed several simple and complex mappings for data loads and data cleansing. Extensively worked with Informatica Designer and Server Manager.
  • Created reports from the Informatica repository to summarize workflow performance (elapsed time, start time, and end time), which were used to order sessions within the workflow to fit the load window.
  • Created reports using Business Objects functionality such as combined queries, slice and dice, drill down, functions, cross tab, master/detail, and formulas.

Environment: Informatica PowerCenter/PowerMart 8.1/7.1.2/5.1.2, Oracle 10g/9i/8.x, TOAD 9.x, PeopleSoft, Erwin, Business Objects 5.1, Teradata, Hyperion, Shell Scripting, PL/SQL, PVCS, Sun Solaris UNIX, Windows XP.

Confidential, Boston, MA

Sr. Data Warehouse Developer/SME

Responsibilities:

  • Technical lead for the Atlas project, in which reports are generated so that business users can report on differences between the conversion file and the extract file on a column-by-column basis.
  • Worked on impact analysis and the Informatica mapping modifications necessary to upgrade and enhance the IBEX and Logician systems. Developed and co-authored the ETL design and coding standards at Confidential.
  • Worked with various Informatica transformations such as Joiner, Expression, Lookup, Aggregator, Filter, Update Strategy, Stored Procedure, Router, and Normalizer.
  • Worked with connected and unconnected Stored Procedure transformations for pre- and post-load sessions.
  • Designed and developed pre-session, post-session, and load execution routines using the Informatica server to run Informatica sessions.
  • Used the pmcmd command to start, stop, and ping the Informatica server from UNIX, and created shell scripts to automate the process (a sketch follows this list).
  • Experience with partitioned loading using PowerCenter against a Teradata target database.
  • Strong hands-on experience using Teradata utilities (FastLoad, MultiLoad, FastExport, TPump).
  • Designed and developed reusable ETL objects (mappings, mapplets, transformations, and worklets) for project teams.
  • Worked on production tickets to resolve issues in a timely manner.
  • Involved in unit testing of mappings and mapplets, as well as integration testing and user acceptance testing.
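
A sketch of the pmcmd automation mentioned above, assuming hypothetical Integration Service, domain, folder, and workflow names (the real values differed):

    #!/bin/bash
    # Illustrative pmcmd wrapper; service, domain, folder and workflow names
    # are placeholders, and credentials come from the environment.
    SVC="-sv IntSvc_Prod -d Domain_Prod"
    AUTH="-u ${INFA_USER} -p ${INFA_PASS}"

    # Make sure the Integration Service is reachable before starting the load.
    pmcmd pingservice ${SVC} || exit 1

    # Kick off the nightly load workflow and wait for it to complete.
    pmcmd startworkflow ${SVC} ${AUTH} -f FOLDER_DW -wait wf_nightly_load
    RC=$?

    if [ ${RC} -ne 0 ]; then
        echo "wf_nightly_load failed with return code ${RC}"
    fi
    exit ${RC}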

Environment: Informatica PowerCenter/PowerMart 9.5.1/9.1/8.6/8.1, Oracle 11g/10g, TOAD 9.x, Erwin, Business Objects XI 3.1, OBIEE, Shell Scripting, PL/SQL, Sun Solaris UNIX, Windows XP, Teradata.

Confidential, Westborough, MA

Sr. ETL/Informatica Developer

Responsibilities:

  • Analyzed business requirements and framed the business logic for the ETL process to generate technical data and report requirements.
  • Designed the ETL processes using Informatica to load data from Oracle 11g and SQL Server into the target Oracle 11g database.
  • Created a standard technical document with emphasis on implementing ETL best practices.
  • Led the ETL team in designing the ETL processes and implementing coding/data standards, assisting onsite and overseas developers for the Install Base, Common Dimensions, and Service Request subject areas.
  • Developed and scheduled workflows using the Task Developer, Worklet Designer, and Workflow Designer in Workflow Manager, and monitored the results in Workflow Monitor.

Environment: Informatica PowerCenter/PowerMart 9.1/8.6/8.1, Oracle 11g/10g, TOAD 9.x, Erwin, Business Objects 5.1, OBIEE, Shell Scripting, PL/SQL, Sun Solaris UNIX, Windows XP.
