Hadoop Developer Resume
SUMMARY
- Two years of experience in the IT industry designing, developing, and maintaining web-based applications using big data technologies such as the Hadoop and Spark ecosystems and ETL technologies.
- Excellent understanding of Hadoop architecture and daemons such as HDFS, NameNode, DataNode, JobTracker, and TaskTracker, and of MapReduce concepts.
- Hands-on experience installing, configuring, and using Hadoop ecosystem components such as Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, HBase, Impala, Solr, Elasticsearch, Oozie, ZooKeeper, Kafka, Spark, and Cassandra with the Cloudera and Hortonworks distributions.
- In-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, and Spark Streaming.
- Expertise in writing Spark RDD transformations, actions, DataFrames, and case classes for the required input data, and in performing data transformations using Spark Core.
- Experience using Spark SQL with various data sources such as JSON, Parquet, and Hive (see the first sketch after this summary).
- Extended Hive and Pig core functionality with custom User-Defined Functions (UDFs), User-Defined Table-Generating Functions (UDTFs), and User-Defined Aggregate Functions (UDAFs) (see the second sketch after this summary).
- Worked with version control tools such as Git and CI tools such as Jenkins.
- Experienced in developing Spark programs using the Scala and Java APIs.
- Implemented Sqoop for large data transfers from RDBMS to HDFS/HBase/Hive and vice versa.
- Scheduled various ETL processes and Hive scripts by developing Oozie workflows.
- Experienced in working with structured data using HiveQL, join operations, Hive UDFs, partitions, bucketing, and internal/external tables.
- Experience handling various file formats such as Avro, SequenceFile, and Parquet.
- Proficient in various NoSQL databases such as Cassandra and HBase.
- Good knowledge of Cloudera distributions and of AWS services, including Amazon Simple Storage Service (Amazon S3), Amazon EC2, and Amazon EMR.
- Strong knowledge of the Informatica ETL tool, data warehousing, and business intelligence.
- Experience developing ETL data marts for various business requirements.
- Experience developing applications using core Java.
- Good level of experience in core Java and JEE technologies such as JDBC, Servlets, and JSP.
- Hands-on experience writing SQL and PL/SQL queries.
- Good understanding of and experience with software development methodologies such as Agile and Waterfall; performed unit, regression, white-box, and black-box testing.
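Below is a brief illustrative sketch (not project code) of the Spark SQL usage summarized above: reading JSON and Parquet sources into DataFrames, applying transformations, and saving the result as a Hive table. The paths, table names, and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

// Minimal Spark SQL sketch; all paths and names are placeholders.
object SparkSqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-sql-sketch")
      .enableHiveSupport()              // allows reading/writing Hive tables
      .getOrCreate()
    import spark.implicits._

    // Read semi-structured JSON and columnar Parquet sources into DataFrames.
    val orders = spark.read.json("hdfs:///data/raw/orders.json")
    val items  = spark.read.parquet("hdfs:///data/curated/items.parquet")

    // DataFrame transformations: filter, join, and aggregate.
    val summary = orders
      .filter($"status" === "SHIPPED")
      .join(items, Seq("item_id"))
      .groupBy($"category")
      .agg(sum($"amount").as("total_amount"))

    // Persist the result as a Hive table for downstream HiveQL reporting.
    summary.write.mode("overwrite").saveAsTable("analytics.order_summary")

    spark.stop()
  }
}
```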
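A second sketch illustrates the custom Hive UDF point above: a minimal JVM UDF (written in Scala here for consistency with the other sketches). The class name, masking logic, and registration statements are hypothetical examples, not the project's actual functions.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hive calls evaluate() once per row; returning null for null input is the convention.
class MaskEmail extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.replaceAll("(?<=.).(?=[^@]*@)", "*"))  // a****@example.com
  }
}

// Registered in Hive with, for example:
//   ADD JAR mask-udf.jar;
//   CREATE TEMPORARY FUNCTION mask_email AS 'MaskEmail';
```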
TECHNICAL SKILLS
Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Sqoop, Impala, Oozie, ZooKeeper, Spark, Ambari, Mahout, MongoDB, Cassandra, Avro, Storm, Parquet and Snappy
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks
Languages: Java, Scala, SQL, HTML, JavaScript, XML and C/C++
NoSQL Databases: Cassandra and HBase
Methodologies: Agile, Waterfall
Development / Build Tools: Eclipse, Maven, IntelliJ
DB Languages: MySQL, PL/SQL and Oracle
RDBMS: Teradata, Oracle 9i, 10g, 11g, MS SQL Server, MySQL and DB2
Operating systems: UNIX, LINUX, Mac OS and Windows Variants
ETL Tools: Informatica
PROFESSIONAL EXPERIENCE
Confidential
Hadoop Developer
Responsibilities:
- Implemented the data ingestion module per the business requirements.
- Developed Java programs to move data from the landing zone to the S3 input buckets.
- Involved in the design of the data ingestion and parameter generation components.
- Worked on the architecture of the entire project.
- Developed shell scripts to implement business conditions for generating parameters for the Oozie workflow.
- Developed MapReduce programs to validate the records in every source file.
- Developed a Java program to process the source files, compare their record counts against the counts in the trigger file, and load the processed files into the S3 processed bucket (see the sketch after this list).
- Created Hive tables on the S3 processed bucket and developed Hive scripts to implement business analytics requirements.
- Automated the generation of parameters for triggering the Oozie workflow.
- Developed a shell script for automatically triggering the Oozie workflow.
- Developed an FTP script to pull source files from mainframe systems.
- Developed Oozie coordinators to schedule the data ingestion and parameter generation workflows.
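The sketch below illustrates the trigger-file record-count check described in this section, assuming a hypothetical layout in which the trigger file's first line carries the expected record count; the paths, bucket names, and argument order are placeholders rather than project details.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

import scala.io.Source

object RecordCountCheck {

  // Open a file through the Hadoop FileSystem API and hand its lines to f.
  private def withLines[T](fs: FileSystem, file: Path)(f: Iterator[String] => T): T = {
    val in = fs.open(file)
    try f(Source.fromInputStream(in).getLines())
    finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val conf        = new Configuration()
    val dataFile    = new Path(args(0))   // e.g. s3a://input-bucket/landing/orders.dat
    val triggerFile = new Path(args(1))   // e.g. s3a://input-bucket/landing/orders.trg
    val outDir      = new Path(args(2))   // e.g. s3a://processed-bucket/orders/

    val srcFs = dataFile.getFileSystem(conf)

    val expected = withLines(srcFs, triggerFile)(_.next().trim.toLong)  // declared count
    val actual   = withLines(srcFs, dataFile)(_.size.toLong)            // observed count

    if (actual == expected) {
      // Counts match: promote the file to the processed location.
      val target = new Path(outDir, dataFile.getName)
      FileUtil.copy(srcFs, dataFile, target.getFileSystem(conf), target, false, conf)
      println(s"OK: $actual records, copied to $target")
    } else {
      sys.error(s"Count mismatch: trigger declares $expected, file contains $actual")
    }
  }
}
```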
Confidential
Hadoop / Informatica Developer
Responsibilities:
- Experienced in designing and deploying a Hadoop cluster and various big data analytics tools, including Pig, Hive, Cassandra, Oozie, Sqoop, and Spark, on the Cloudera distribution.
- Developed Pig scripts to help perform analytics on JSON and XML data.
- Created Hive tables (external and internal) with static and dynamic partitions and bucketed the tables for query efficiency (see the first sketch after this list).
- Used HiveQL to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Performed data transformations by writing MapReduce and Pig scripts as per business requirements.
- Good understanding of Cassandra architecture, including replication strategies, gossip, and snitches.
- Used the DataStax Spark-Cassandra connector to load data into Cassandra and used CQL to analyze data from Cassandra tables for quick searching, sorting, and grouping (see the second sketch after this list).
- Experience with NoSQL column-oriented databases such as Cassandra and their integration with the Hadoop cluster.
- Used the Spark API over Hadoop YARN as the execution engine for data analytics with Hive.
- Work experience with cloud infrastructure such as Amazon Web Services (AWS).
- Developed Spark applications using Scala and Spark SQL for faster processing and testing.
- Developed custom UDFs in Java to extend Hive and Pig functionality.
- Imported data from RDBMS systems like MySQL into HDFS using Sqoop.
- Developed Sqoop jobs to perform incremental imports into Hive tables.
- Implemented MapReduce counters to gather metrics on good and bad records.
- Involved in loading and transforming large sets of structured and semi-structured data.
- Worked with different file formats (ORC, Parquet, Avro) and compression codecs (gzip, Snappy, LZO).
- Experienced with ORC, Avro, Parquet, RCFile, and JSON file formats; developed UDFs for Hive and Pig.
- Gained extensive hands-on experience with Amazon Web Services (AWS) cloud services such as EC2 and S3.
- Well versed in database and data warehouse concepts such as OLTP, OLAP, and star and snowflake schemas.
- Developed Pig and Hive UDFs to implement business logic for processing data per requirements.
- Developed Oozie workflows to schedule Pig, Sqoop, and Hive jobs and build data pipelines.
- Developed Informatica mappings to load data from sources such as relational tables, flat files, and Oracle tables into the target Oracle data warehouse.
- Used Informatica 9.1 to extract data from flat-file and relational sources and load it into relational (SQL Server 2008) and flat-file targets.
- Implemented a data staging layer throughout the enterprise.
- Developed sessions using the PowerCenter Workflow Manager to load data into the target database.
- Developed stored procedures and invoked them in mappings using the Stored Procedure transformation.
- Extensively worked with Informatica PowerCenter transformations such as Source Qualifier, Lookup, Filter, Expression, Router, Normalizer, Joiner, Update Strategy, Rank, Aggregator, Stored Procedure, Sorter, and Sequence Generator.
- Extensively used SQL and PL/SQL to write stored procedures, functions, packages, and triggers.
- Highly adept at creating, executing, testing, and debugging Informatica mappings, mapplets, sessions, tasks, worklets, and workflows in UNIX and Windows environments.
- Strong in data warehousing concepts and dimensional star schema and snowflake schema methodologies.
- Developed mappings, transformations, and mapplets using the Mapping Designer, Transformation Developer, and Mapplet Designer in Informatica PowerCenter.
- Experience extracting data from legacy systems, transforming it per business requirements, and loading it into target systems.
- Used the incremental aggregation technique to load data into aggregation tables for improved performance.
- Experienced in debugging Informatica mappings using the Debugger and breakpoints.
- Implemented slowly changing dimensions in various mappings.
- Optimized mappings, sessions/tasks, and source and target databases as part of performance tuning.
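The first sketch below illustrates the partitioned and bucketed Hive table pattern referenced in this section, issued through Spark SQL with Hive support; the database, table, columns, and S3 location are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession

object HiveTableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-table-sketch")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("CREATE DATABASE IF NOT EXISTS sales")

    // External table: dropping it leaves the underlying S3 files in place.
    // PARTITIONED BY enables partition pruning; CLUSTERED BY ... INTO n BUCKETS
    // groups rows by customer within each partition for joins and sampling.
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS sales.transactions (
        txn_id   STRING,
        customer STRING,
        amount   DOUBLE
      )
      PARTITIONED BY (txn_date STRING)
      CLUSTERED BY (customer) INTO 32 BUCKETS
      STORED AS ORC
      LOCATION 's3a://processed-bucket/sales/transactions'
    """)

    // A reporting query that benefits from pruning: only one partition is scanned.
    spark.sql("""
      SELECT customer, SUM(amount) AS total
      FROM sales.transactions
      WHERE txn_date = '2024-01-01'
      GROUP BY customer
    """).show()

    spark.stop()
  }
}
```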
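The second sketch shows the DataStax Spark-Cassandra connector usage mentioned above: writing a DataFrame to an existing Cassandra table and reading it back for filtering. The connector is assumed to be on the classpath, and the keyspace, table, and connection host are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object CassandraLoadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cassandra-load-sketch")
      .config("spark.cassandra.connection.host", "127.0.0.1")  // assumed contact point
      .getOrCreate()
    import spark.implicits._

    // Example data; in practice this would come from HDFS, S3, or a Hive table.
    val users = Seq(("u1", "alice", 34), ("u2", "bob", 29))
      .toDF("user_id", "name", "age")

    // Append rows into an existing Cassandra table users.profiles.
    users.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "users", "table" -> "profiles"))
      .mode("append")
      .save()

    // Read the table back for quick filtering, sorting, and grouping in Spark.
    val profiles = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "users", "table" -> "profiles"))
      .load()
    profiles.filter($"age" > 30).show()

    spark.stop()
  }
}
```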
Environment: Hadoop, Hive, HDFS, Pig, Sqoop, Oozie, Spark, Cassandra, Cloudera Distribution, Java, MySQL, AWS, Informatica PowerCenter 9.1, Oracle 10g, SQL, PL/SQL, TOAD, SQL*Plus, Salesforce, Windows, Unix