Big Data Developer Resume
SUMMARY:
- Over 10+ years of experience with an emphasis on Big Data technologies and the development and design of data warehouse enterprise applications.
- 4+ years of experience in the IT industry designing, developing, and maintaining enterprise applications using Big Data technologies such as the Hadoop and Spark ecosystems and ETL technologies.
- Excellent understanding of Hadoop architecture and daemons such as HDFS, YARN, NameNode, DataNode, ResourceManager, and NodeManager, as well as MapReduce concepts.
- Hands-on experience in installing, configuring, and using Hadoop ecosystem components such as HDFS, MapReduce, Hive, Pig, Sqoop, HBase, Impala, Oozie, ZooKeeper, Spark, and Cassandra with the Cloudera distribution and AWS.
- Hands-on experience in various big data application phases, such as data ingestion, data analytics, and data visualization.
- In-depth understanding of Spark architecture, including Spark Core, Spark SQL, and Spark Streaming.
- Implemented Sqoop for large data transfers from RDBMS to HDFS/HBase/Hive and vice versa.
- Scheduled various ETL processes and Hive scripts by developing Oozie workflows.
- Experienced in working with structured data using HiveQL, join operations, Hive UDFs, partitioning, bucketing, and internal/external tables (see the sketch following this summary).
- Experience in handling various file formats such as Avro, SequenceFile, and Parquet.
- Proficient in NoSQL databases such as Cassandra and HBase, as well as RDBMS databases.
- Experience with various scripting languages, including UNIX shell scripting, JavaScript, and Python.
- Strong working experience in the data analysis, design, development, implementation, and testing of data warehousing, covering data conversion, data extraction, data transformation, and data loading (ETL), using Informatica PowerCenter and Informatica PowerExchange.
- Very good exposure to Oracle, MS SQL Server, IBM DB2, Teradata and Netezza.
- Strong organizational and interpersonal skills, ability to interact with people at all levels.
- Good communication and presentation skills.
- Highly motivated self-starter and quick learner with strong troubleshooting and technical skills; an excellent team player.
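A minimal sketch of the partitioned external Hive table work summarized above, expressed through Spark SQL with Scala since the resume pairs the two. The database, table, columns, staging table, and HDFS location are hypothetical placeholders, not taken from any of the projects below.

```scala
import org.apache.spark.sql.SparkSession

object PartitionedHiveTableExample {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark SQL manage partitioned external tables in the Hive metastore.
    val spark = SparkSession.builder()
      .appName("partitioned-hive-table-example")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("CREATE DATABASE IF NOT EXISTS sales_db")

    // Hypothetical external table stored as Parquet and partitioned by load date.
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS sales_db.transactions (
        |  txn_id     STRING,
        |  account_id STRING,
        |  amount     DECIMAL(18,2)
        |)
        |PARTITIONED BY (load_date STRING)
        |STORED AS PARQUET
        |LOCATION '/data/warehouse/sales_db/transactions'""".stripMargin)

    // Load one partition from a (hypothetical, previously loaded) staging table.
    spark.sql(
      """INSERT OVERWRITE TABLE sales_db.transactions PARTITION (load_date = '2024-01-01')
        |SELECT txn_id, account_id, amount FROM sales_db.transactions_stage""".stripMargin)

    // Query back per-partition row counts.
    spark.sql(
      "SELECT load_date, COUNT(*) AS txn_count FROM sales_db.transactions GROUP BY load_date"
    ).show()

    spark.stop()
  }
}
```

Partitioning by load date keeps each day's load in its own HDFS directory, so queries that filter on the partition column read only the partitions they need.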
TECHNICAL SKILLS:
Operating systems: Windows, UNIX
Data Warehousing Tools: Informatica PowerCenter, DataStage, SSIS, Talend, Podium, SAS Base.
Reporting Tools: Cognos, SSRS, Business Objects and Tableau
Data Modeling: Star-Schema, Snowflake, FACT and dimension tables, Erwin, Visio.
Databases: MS SQL Server, Oracle, Teradata, Netezza, HBase, Cassandra.
Computer Skills: MS Office - Excel, Word, Access, Outlook, PowerPoint, SharePoint.
Languages: SQL, PL/SQL, Java, R, R script, Python, Pig Latin, HiveQL.
Frameworks & Distributions: CDH, HDP, YARN, HDFS, MapReduce, Spark, Sqoop, Pig, Hive, SAS EG
PROFESSIONAL EXPERIENCE:
Confidential
Big Data Developer
Responsibilities:
- Implemented data ingestion using Sqoop and Spark, loading data from various RDBMS sources and from CSV and XML files.
- Handled data cleansing and transformation tasks using Spark with Scala and Hive, as sketched below.
- Implemented data consolidation using Spark and Hive to generate data in the required formats, applying ETL tasks such as data repair, data massaging to identify the source for audit purposes, and data filtering, and storing the results back to HDFS.
- Responsible for the design and development of Spark SQL scripts based on functional specifications.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop.
- Involved in converting Hive/SQL queries into Spark RDD transformations using Scala.
- Responsible for job management using the Fair Scheduler and developed job-processing scripts using Oozie workflows.
- Responsible for performance tuning of Spark applications: setting the right batch interval, choosing the correct level of parallelism, and tuning memory.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, and pair RDDs.
- Responsible for handling large datasets using partitioning, Spark's in-memory capabilities, broadcast variables, and effective and efficient joins and transformations during the ingestion process itself.
- Imported and exported data into HDFS, Hive, and Pig using Sqoop.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
- Worked with different file formats such as Text, SequenceFile, Avro, ORC, and Parquet.
- Responsible for loading and transforming large sets of structured, semi-structured, and unstructured data.
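A minimal sketch of the kind of Spark-with-Scala cleansing and consolidation job described above. The HDFS path, output table, and column names (customer_id, signup_date) are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_date, trim}

object IngestAndCleanse {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ingest-and-cleanse")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical raw CSV feed landed on HDFS by an upstream process.
    val raw = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/raw/customers/*.csv")

    // Basic cleansing: trim the key, normalize dates, drop rows missing the key, dedupe.
    val cleansed = raw
      .withColumn("customer_id", trim(col("customer_id")))
      .withColumn("signup_date", to_date(col("signup_date"), "yyyy-MM-dd"))
      .filter(col("customer_id").isNotNull)
      .dropDuplicates("customer_id")

    // Persist the curated data as Parquet and expose it as a Hive table for downstream queries.
    cleansed.write
      .mode("overwrite")
      .format("parquet")
      .saveAsTable("curated.customers")

    spark.stop()
  }
}
```

Writing the curated output with saveAsTable keeps it queryable from Hive while the underlying Parquet files remain on HDFS.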
Environment: Hadoop ecosystem, Scala, Hive, HBase, Java, Impala, Pig, Spark, Oozie, Oracle, UNIX, Cloudera, Flume, Sqoop, HDFS, Python, Mainframes, SAS Enterprise Guide, SQL Server, Toad, Tableau.
Confidential
Big Data Developer
Responsibilities:
- Involved in loading data from Linux file system to HDFS.
- Responsible for building scalable distributed data solutions using Hadoop cluster on Cloudera.
- Imported data using Sqoop to load data from DB2 into HDFS on a regular basis from various sources.
- Used Sqoop to import data into Hive tables from different relational databases.
- Imported and exported data between databases and HDFS using Sqoop.
- Wrote multiple MapReduce programs for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed file formats.
- Worked on Informatica PowerCenter tools: Designer, Repository Manager, Workflow Manager, and Workflow Monitor.
- Involved in creating Hive tables, loading them with data, and writing Hive queries to analyze the data.
- Validated the data using Python.
- Worked with the Oozie workflow manager to schedule Hadoop jobs, including resource-intensive jobs.
- Extensively used HiveQL to query data in Hive tables and to load data into Hive tables.
- Involved in source to target mapping and business rules associated with the ETL processes.
- Developed complex mappings in Informatica Power Center to load the data from various sources using different transformations.
- Worked closely on setting up different environments and updating configurations.
- Developed Pig UDFs for manipulating data according to business requirements and also worked on developing custom Pig loaders.
- Knowledgeable in handling Hive queries using Spark SQL integrated with the Spark environment.
- Experienced in handling large datasets using partitioning, Spark's in-memory capabilities, broadcast variables, and effective and efficient joins and transformations during the ingestion process itself.
- Worked on migrating legacy MapReduce programs into Spark transformations using Spark and Scala, as sketched below.
- Exported the analyzed data into relational databases using Sqoop for visualization and to generate reports for the BI team.
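A minimal sketch, under assumptions, of migrating a legacy MapReduce-style aggregation to Spark transformations in Scala, as mentioned above. The input path, pipe-delimited field layout (account id and amount), and output path are hypothetical illustrations.

```scala
import org.apache.spark.sql.SparkSession

object LegacyAggregationOnSpark {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("legacy-aggregation-on-spark")
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical delimited transaction feed that a legacy MapReduce job used to aggregate.
    val lines = sc.textFile("hdfs:///data/transactions/part-*")

    // Map phase: emit (accountId, amount); reduce phase: reduceByKey replaces the old reducer.
    val totalsByAccount = lines
      .map(_.split('|'))
      .filter(_.length >= 3)                          // skip malformed records
      .map(fields => (fields(0), fields(2).toDouble)) // (accountId, amount)
      .reduceByKey(_ + _)

    // Write results back to HDFS; Sqoop can then export them to an RDBMS for the BI team.
    totalsByAccount
      .map { case (account, total) => s"$account,$total" }
      .saveAsTextFile("hdfs:///data/output/account_totals")

    spark.stop()
  }
}
```

The map and reduceByKey calls mirror the mapper and reducer of the original job while keeping intermediate data in memory, which is the main gain of this kind of migration.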
Environment: Apache Hadoop 2.x, Cloudera, HDFS, MapReduce, Hortonworks, AWS, Hive, Pig, Spark, Scala, Sqoop, Kafka, Flume, HBase, Cassandra, Python, TensorFlow, Linux, XML, Talend, Informatica PowerCenter 9.6.1.
Confidential
Big Data Developer
Responsibilities:
- Responsible for creating Hive tables based on business requirements.
- Created Sqoop scripts to import and export data into HDFS.
- Created MapReduce jobs to process the daily transaction data.
- Added Pig scripts for data cleansing and preprocessing.
- Created MapReduce jobs and Pig scripts to transform and aggregate data for analysis.
- Implemented workflow management of Sqoop, Hive, and Pig scripts using Oozie.
- Involved in creating Hive tables, loading them with data, and writing Hive queries.
- Responsible for building scalable distributed data solutions using Hadoop.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Developed MapReduce jobs via Pig scripts for data cleansing and preprocessing.
- Developed mapping parameters and variables to support SQL overrides.
- Created various MapReduce jobs for performing ETL transformations on transactional and application-specific data sources.
- Analyzed the data by running Hive queries (HiveQL) and Pig scripts (Pig Latin) to study customer behavior.
- Used Informatica as an ETL tool to extract data from source systems to target systems.
- Created Hive external tables, loaded data into them, and queried the data using HiveQL.
- Analyzed log data using HiveQL to extract the number of unique visitors per day, page views, and returning visitors; see the sketch below.
- Used Oozie operational services for batch processing and for scheduling workflows dynamically.
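A minimal sketch of the kind of HiveQL log analysis described above (unique visitors per day and page views). The database/table name weblogs.page_events and its columns are hypothetical, and the query is shown submitted through Spark's Hive support purely for illustration; on this project the equivalent HiveQL would run directly in Hive.

```scala
import org.apache.spark.sql.SparkSession

object WebLogHiveAnalysis {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("web-log-hive-analysis")
      .enableHiveSupport()
      .getOrCreate()

    // Daily traffic summary over a hypothetical clickstream table registered in Hive.
    val dailyTraffic = spark.sql(
      """SELECT
        |  to_date(event_time)        AS visit_date,
        |  COUNT(DISTINCT visitor_id) AS unique_visitors,
        |  COUNT(*)                   AS page_views
        |FROM weblogs.page_events
        |GROUP BY to_date(event_time)
        |ORDER BY visit_date""".stripMargin)

    dailyTraffic.show(30, truncate = false)

    spark.stop()
  }
}
```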
Environment: Informatica PowerCenter 9.6.1, Cloudera, HDFS, Pig, Hive, Oozie, Sqoop, Oracle 11, Mainframes, Cassandra, Python, Toad, Oracle SQL Developer, PL/SQL, SQL Server 2012, XML, UNIX shell scripts.
Confidential
ETL Developer/BI
Responsibilities:
- Involved in designing, developing, and documenting the ETL (Extract, Transform, Load) strategy to populate the data warehouse from various source systems.
- Designed and developed complex mappings in Informatica to load data from various sources such as SQL Server, flat files, Oracle, and XML, using transformations such as Source Qualifier, Lookup (connected and unconnected), Expression, Aggregator, Update Strategy, Sequence Generator, Joiner, Filter, Rank, and Router.
- Designed and developed mappings; defined workflows and tasks; monitored sessions; and handled the export and import of mappings and workflows, backups, and recovery.
- Developed CDC mappings using Informatica PowerExchange 9.1.0.
- Extensively worked with the database and Informatica for performance tuning.
- Created and configured workflows, worklets, and sessions to transport the data to target warehouse SQL Server tables using Informatica Workflow Manager.
- Created Mapping Parameters and Variables.
- Worked on Database level tuning and SQL Query tuning for the Data warehouse.
- Developed PL/SQL stored procedures and UNIX shell scripts for pre- and post-session commands and batch jobs.
- Executed sessions, both sequential and concurrent, for efficient execution of mappings, and used other tasks such as Event-Wait, Event-Raise, Email, Command, and pre/post-session SQL.
- Performed extensive testing on the mappings and wrote queries in SQL to check if the data was loading to the dimension tables and fact tables properly.
Environment: Informatica PowerCenter 9.5.1, Informatica PowerExchange 9.5, Oracle 11, UNIX, Flat Files, SSIS, Mainframes, PL/SQL Developer, TOAD, XML, SQL Server, TFS.
Confidential
Informatica Developer
Responsibilities:
- Worked closely with the architect in extracting data from source systems, and designed and developed the processes to extract, transform, and load data into the Oracle database.
- Extracted data from heterogeneous source systems such as Oracle, Teradata, SQL Server, and flat files.
- Due to the inclusion of a new jurisdiction, the large data volume, and the absence of an existing data warehouse, worked on preparing the dimensional model of the warehouse using Erwin.
- Used Informatica PowerCenter Source Analyzer, Target Designer, Mapping Designer, Workflow Manager, mapplets, and reusable transformations.
- Developed mappings to load data from flat files through various data cleansing and data validation processes.
- Used almost all of the transformations, including Lookup, Aggregator, Expression, Router, Filter, Update Strategy, Stored Procedure, and Sequence Generator.
Environment: Informatica PowerCenter 8.6, Informatica PowerExchange, Oracle 10, Toad, PL/SQL, SQL Server 2008, DB2, SSIS, SSRS, XML, T-SQL, UNIX shell scripts, and Teradata.