
Hadoop Developer Resume


Minnetonka, MN

PROFESSIONAL SUMMARY:

  • 3+ years of experience as a Hadoop Developer with good knowledge of the Hadoop Distributed File System and ecosystem components such as MapReduce, Spark, Pig, Impala, Hive, HBase, Zookeeper, Kafka, Sqoop, and Oozie.
  • 4+ years of experience in ETL development, including analysis, design, and development; passionate about working with Hadoop and big data technologies, big data processing, analytics, and visualization.
  • Hands-on experience with shell scripting.
  • Strong technical and functional experience across the Software Development Life Cycle (SDLC), including requirements gathering, design, and implementation; expertise in Informatica 9.x/8.x.
  • Strong experience in the analysis, design, development, testing, and implementation of Business Intelligence solutions using data warehouse/data mart design, ETL, OLTP, OLAP, BI, and client/server applications.
  • Excellent analytical, programming, and reporting skills.
  • Good knowledge of Python collections, Python scripting, and multithreading.
  • Developed Spark scripts using Python IDEs per business requirements.
  • Experience in analyzing data using Hive, Spark SQL, and Spark Streaming.
  • Good understanding of file formats such as JSON, Parquet, Avro, ORC, SequenceFile, and XML.
  • Extensive experience working with Oracle, MS SQL Server, DB2, and MySQL relational databases.
  • Strong ability to prepare and present data in a visually appealing, easy-to-understand manner using Tableau and Excel.
  • Experienced working in Agile and Waterfall SDLC methodologies.

TECHNICAL SKILLS:

Tools: Informatica PowerCenter 9.6.1/8.5.1/8.0, TOAD, SQL Developer

Cloud: AWS

Big Data: Hadoop, Cloudera, Hive, Impala, Hue, Spark, Pig, Sqoop, Kafka, HBase, Oozie

RDBMS: Oracle 11g and 10g, SQL Server 2005 and 2008

Operating Systems: Windows Server 2008/2012, UNIX, Linux

Languages: C, C++, Visual Basic, Java, Python, PySpark, SQL, HiveQL

Version Control: TFS

Defect Tracking: HP QC (ALM 11.0)

Management: Project Management using Microsoft Project.

Reporting Tools: MS Excel, PowerPivot, Power BI

PROFESSIONAL EXPERIENCE:

Hadoop Developer

Confidential, Minnetonka, MN

Responsibilities:

  • Worked on importing data from Oracle into HDFS and exporting results back using Sqoop for analysis, visualization, and report generation.
  • Performed full and incremental imports and created Sqoop jobs.
  • Implemented multiple MapReduce jobs for data processing.
  • Managed and scheduled batch jobs on the Hadoop cluster using the Oozie workflow engine.
  • Used Zookeeper to provide coordination services to the cluster.
  • Involved in loading data from the edge node to HDFS using shell scripting.
  • Created internal and external tables in Hive and merged data sets using Hive joins; involved in the integration of Hive and HBase.
  • Designed and developed Hive managed/external tables with struct, map, and array types in various storage formats.
  • Implemented partitioning and bucketing in Hive for better performance.
  • Worked with Hadoop file formats such as Parquet and compression codecs such as gzip and Snappy.
  • Built a real-time streaming data pipeline using Kafka and Spark Streaming (see the sketch below).
  • Implemented Python scripts to perform transformations and load the data into Hive.
  • Worked on Spark SQL for analyzing the data.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Implemented Spark RDD transformations and actions to support business analysis.
  • Developed PySpark code to read data from Hive, group fields, and generate XML files.
  • Implemented Spark jobs using PySpark and Spark SQL for faster testing and processing of data.
  • Involved in HDFS maintenance and loading of structured and unstructured data.

Environment: MapReduce, Hive, HDFS, Python, Pig, Sqoop, Spark, Kafka, Oozie, Cloudera, Oracle, HBase, Linux.
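
For illustration only and not part of the original engagement record: a minimal PySpark sketch of the kind of Kafka-to-HDFS streaming ingestion described above, written against Spark's Structured Streaming API. The broker address, topic name, event schema, and output paths are placeholder assumptions, not details taken from the resume.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import DoubleType, StringType, StructField, StructType

    # Hive-enabled session so parsed events can later be queried next to warehouse tables.
    # Requires the spark-sql-kafka connector package on the classpath.
    spark = (SparkSession.builder
             .appName("kafka-stream-sketch")   # hypothetical application name
             .enableHiveSupport()
             .getOrCreate())

    # Placeholder schema for the JSON events on the topic.
    event_schema = StructType([
        StructField("event_id", StringType()),
        StructField("amount", DoubleType()),
        StructField("event_ts", StringType()),
    ])

    # Subscribe to a Kafka topic (broker and topic names are assumptions).
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")
           .option("subscribe", "events_topic")
           .load())

    # Parse the JSON payload into typed columns.
    events = (raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
              .select("e.*"))

    # Append parsed records to a Parquet directory on HDFS that a Hive external table
    # can point at; the checkpoint location makes the stream restartable.
    query = (events.writeStream
             .format("parquet")
             .option("path", "/data/stream/events")
             .option("checkpointLocation", "/data/stream/_checkpoints/events")
             .outputMode("append")
             .start())

    query.awaitTermination()

Writing to a columnar landing directory rather than directly into a Hive-managed table keeps the streaming job decoupled from downstream Hive DDL changes.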

Hadoop Developer

Confidential, New York, NY

Responsibilities:

  • Extracted data from Teradata to HDFS using Sqoop.
  • Created a Hive aggregator to update the Hive table after running the data profiling job.
  • Analyzed large data sets by running Hive queries.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Involved in submitting and tracking MapReduce jobs using the JobTracker.
  • Implemented partitioning, dynamic partitioning, and bucketing in Hive (see the sketch below).
  • Used Pig scripts to transform, join, and load the optimized tables into the Hive data mart; also created Pig UDFs.
  • Developed Hive queries to process the data and generate data cubes for visualization.
  • Built reusable Hive UDF libraries for business requirements, enabling users to apply these UDFs in Hive queries.
  • Wrote a Hive UDF to sort struct fields and return a complex data type.
  • Modeled Hive partitions extensively for data separation and faster processing, following Pig and Hive best practices for tuning.
  • Exported the analyzed patterns back to Teradata using Sqoop.
  • Implemented a script to transmit sys print information from Oracle to HBase using Sqoop.
  • Exported the analyzed data to relational databases using Sqoop for visualization and report generation.
  • Involved in loading data from the local file system (Linux) to HDFS.

Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Spark, RDS, Kafka, Linux, Cloudera, Big Data, Python, SQL, NoSQL, Cassandra, Tableau, HBase.
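
For illustration only: a short PySpark sketch of the Hive partitioning and dynamic-partition loading pattern referenced above, issued through spark.sql on a Hive-enabled session. The table and column names (sales_part, sales_staging, order_date) are placeholders, and the staging table is assumed to already exist.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-partitioning-sketch")   # hypothetical application name
             .enableHiveSupport()
             .getOrCreate())

    # Allow dynamic-partition inserts into Hive tables.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    # Partitioned ORC table; bucketing (CLUSTERED BY ... INTO n BUCKETS) can be added
    # to the DDL in Hive itself, but is omitted here because Spark's support for
    # writing Hive-bucketed tables is limited.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS sales_part (
            order_id STRING,
            amount   DOUBLE
        )
        PARTITIONED BY (order_date STRING)
        STORED AS ORC
    """)

    # Dynamic-partition load: the partition value comes from the SELECT list, so a
    # single statement populates every order_date partition present in the staging data.
    spark.sql("""
        INSERT OVERWRITE TABLE sales_part PARTITION (order_date)
        SELECT order_id, amount, order_date
        FROM sales_staging
    """)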

Informatica Developer

Confidential, Monroe, Louisiana

Responsibilities:

  • Prepared ETL process flow documents based on the present process flow and business functionalities.
  • Created mappings with heterogeneous sources such as flat files and Oracle databases and created targets in Oracle using Informatica Mapping Designer.
  • Developed Mappings/Workflows/Scheduling ETL process.
  • Frequently used the import and export utilities to migrate sessions from developers' folders to the subject folder.
  • Involved in extensive data profiling using IDQ prior to data staging.
  • Developed reusable components such as mapplets for use across various modules.
  • Used the PowerExchange interface to extract legacy data.
  • Involved in design changes specific to releases.
  • Designed mapping templates to specify high-level approach.
  • Extensively worked with Informatica components such as Source Analyzer, Warehouse Designer, Transformation Developer, Mapplet Designer, Mapping Designer, Repository Manager, Workflow Manager, Workflow Monitor, Repository Server, and Informatica Server to load data from flat files and SQL Server.
  • Used Informatica IDQ to profile the source data and check data accuracy using dashboards.
  • Built Logical Data Objects (LDO) and developed various mappings, mapplets, and rules using Informatica Data Quality (IDQ) based on requirements to profile, validate, and cleanse the data.
  • Designed the mappings between sources (files and databases) to operational staging targets.
  • Used Aggregator, Sequence Generator, Lookup, Expression, Filter, Joiner, Rank, Router, and Update Strategy transformations to populate the data.
  • Designed and developed Informatica workflows/sessions to extract, transform, and load the data into Oracle.
  • Worked with different Informatica tuning issues and fine-tuned the transformations to make them more efficient in terms of performance.

Environment: Informatica PowerCenter 9, Oracle 10g, SQL*Loader, TOAD

ETL Developer

Confidential, Cleveland, Ohio

Responsibilities:

  • Designed, Developed and Supported Extraction, Transformation and Load Process (ETL) for data migration with Informatica power center.
  • Involved in extracting data from flat files and relational databases into the staging area.
  • Developed mappings/sessions using Informatica PowerCenter for initial and incremental loading.
  • Developed Informatica mappings using Aggregator transformations, SQL overrides in Lookups, source filters in Source Qualifiers, and Router transformations to manage data flow into multiple targets.
  • Created mappings that involved Slowly Changing Dimensions.
  • Created procedures to calculate loan total repayments and bonus rewards and used them in the Stored Procedure transformation.
  • Performed Informatica administration activities: created user ids, managed group privileges, controlled user access to certain software features, project folders and database connections.
  • Worked with the middleware team on transitions of Informatica administration responsibilities for non-development environments, managed application code releases between support and middleware teams, and consolidated code changes between support and project teams from parallel development efforts.

Environment: Informatica PowerCenter, SQL Server, Sybase, Oracle 10g, DB2, TOAD 8.6, UNIX AIX 5.1, Windows XP, Erwin.

Java Developer

Confidential

Responsibilities:

  • Involved in the development of use case documentation, requirement analysis, and project documentation.
  • Developed and maintained Web applications as defined by the Project Lead.
  • Developed GUI using JSP, JavaScript, and CSS.
  • Used MS Visio for creating business process diagrams.
  • Developed Action Servlet, ActionForm, and JavaBean classes to implement business logic for the Struts framework.
  • Developed Servlets and JSPs based on the MVC pattern using the Struts Action framework.
  • Developed all the tiers of the J2EE application. Developed data objects to communicate with the database using JDBC in the database tier, implemented business logic using EJBs in the middle tier, and developed Java Beans and helper classes to communicate with the presentation tier, which consists of JSPs and Servlets.
  • Used AJAX for client-side validations.
  • Applied annotations for dependency injection and for transforming POJOs/POJIs to EJBs.
  • Developed persistence layer modules using EJB Java Persistence API (JPA) annotations and the Entity Manager.
  • Involved in creating EJBs that handle business logic and persistence of data.
  • Developed Action and Form Bean classes to retrieve data and process server-side validations.
  • Involved in impact analysis of Change requests and Bug fixes.

Environment: Java 5, Struts, PL/SQL, Oracle, EJB, IntelliJ, TortoiseSVN, MS Visio, Firebug, Apache Tomcat, JSP, JavaScript, CSS
