
Talend/Hadoop Developer Resume


New York, NY

SUMMARY:

  • 8 years of experience with an emphasis on relational database and Big Data ecosystem technologies.
  • Big Data professional passionate about learning new concepts, tools, and technologies, working collaboratively, and exploring data architecture landscapes.
  • Worked for Fortune 500 companies including CGI, Cognizant, Confidential, Cisco, and Confidential.
  • Strong working experience with Big Data and Hadoop ecosystems and Java.
  • Expertise with Talend Big Data Integration and the Hadoop ecosystem, including MapReduce, HBase, Hive, Pig, Oozie, and Sqoop.
  • In-depth understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming model.
  • Experience leveraging Hadoop ecosystem components, including Pig and Hive for data analysis, Sqoop for data migration, Oozie for job scheduling, and HBase as a NoSQL data store.
  • Proficient in writing MapReduce jobs.
  • Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
  • Experience with Hadoop shell commands, writing MapReduce programs, and verifying, managing, and reviewing Hadoop log files.
  • Experience working with Flume to load log data from multiple sources directly into HDFS.
  • Experience providing support to data analysts in running Pig and Hive queries.
  • Experience in fine-tuning MapReduce jobs for better scalability and performance.
  • Experience working with Cloudera Distributions of Hadoop.
  • Expertise in writing SQL queries and stored procedures in MySQL.
  • Good knowledge of Apache Spark.
  • Experience designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle (a minimal sketch of the Spark side of such a comparison appears after this list).
  • Experience migrating data between HDFS and relational database systems using Sqoop according to client requirements.
  • Data Design and Development on Microsoft SQL Server.
  • Good Java Development skills using J2EE.
  • Self-motivated, ability to handle multiple tasks, learn and adapt quickly with new technologies.
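
For context on the Spark-vs-Hive POC noted above, the following is a minimal Scala sketch of the Spark side of such a comparison. The table name (sales), the aggregation, and the timing approach are illustrative assumptions, not the original POC code; the same HiveQL would be timed separately in Hive for the other side of the comparison.

```scala
import org.apache.spark.sql.SparkSession

object SparkVsHivePoc {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark SQL read the same warehouse tables that Hive queries
    val spark = SparkSession.builder()
      .appName("spark-vs-hive-poc")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical table and aggregation; the identical HiveQL would be timed in Hive/beeline
    val start = System.nanoTime()
    spark.sql(
      """SELECT region, SUM(amount) AS total
        |FROM sales
        |GROUP BY region""".stripMargin)
      .collect()                              // force execution so the timing is meaningful
    val elapsedSec = (System.nanoTime() - start) / 1e9

    println(f"Spark SQL elapsed: $elapsedSec%.2f s")
    spark.stop()
  }
}
```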

PROFESSIONAL EXPERIENCE:

Talend/Hadoop Developer

Confidential, MO

Responsibilities:

  • Gained a good understanding of the charter domain and the project.
  • Prepared documentation and presentations on design, testing, deployment, etc.
  • Involved in planning, data modeling, and design of the DW ETL process.
  • Identified potential opportunities for DW implementations by accessing RDBMS and Big Data environments.
  • Developed ETL mappings using XML, .csv, .txt, and JSON sources, and loaded data from these sources into relational tables with Talend Big Data.
  • Designed, developed, and deployed end-to-end data integration solutions.
  • Performed file management using Linux and HDFS commands and enforced security.
  • Implemented CDC (change data capture), complex transformations, mappings, etc.
  • Managed jobs in Tidal: deployment plans, job monitoring, troubleshooting, etc.
  • Implemented data integration processes with Talend Big Data Integration Suite 6.4.
  • Ensured data quality, data cleanliness, data analysis, best practices, performance optimization, etc.
  • Designed, developed, and deployed jobs to load data into Hadoop and Hive (a minimal Spark/Scala sketch of such a Hive load appears after this list).
  • Wrote and executed test scenarios; managed a small team.
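
The Hive loads themselves were built as Talend Big Data jobs; purely as an illustration of the equivalent load step outside Talend, here is a minimal Spark/Scala sketch. The file path, CSV layout, and target table name are hypothetical.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object CsvToHiveLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("csv-to-hive-load")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical source file; the real flow was a Talend job reading the same landing zone
    val src = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///landing/customers/customers.csv")

    // Full refresh of a hypothetical target Hive table
    src.write
      .mode(SaveMode.Overwrite)
      .saveAsTable("staging.customers")

    spark.stop()
  }
}
```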

Environment: Talend Studio 6.4, Hadoop, Hive, HDFS, HBase, Teradata, Kafka, Linux commands, PuTTY, Oracle 11g, etc.

Hadoop Developer

Confidential

Responsibilities:

  • Configured the different kinds of data sources and loaded the tables.
  • Loaded the table configurations for all the tables in a source.
  • Configured different types of ingestion, such as full load, timestamp-based incremental load, query-based incremental load, and batch-ID-based incremental load (a sketch of a timestamp-based incremental pull appears after this list).
  • Loaded table data using table groups and scheduling.
  • Developed an automation script in Python for resubmitting failed jobs.
  • Handled the different services (Governor, Hangman, MongoDB, Scheduler, REST API) provided by Infoworks.
  • Handled admin responsibilities of creating users, roles, domains, and jobs.
  • Involved in handling the HA service when the active server is down.
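
The ingestion types above were configured through Infoworks rather than hand-coded; as a hedged illustration of the idea behind a timestamp-based incremental load, the Scala/Spark sketch below pulls only rows newer than the last watermark over JDBC and appends them. The connection URL, table, watermark column, and credentials handling are all hypothetical.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object TimestampIncrementalLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("timestamp-incremental-load")
      .enableHiveSupport()
      .getOrCreate()

    // Watermark = newest timestamp already loaded (assumes the target table is non-empty)
    val lastWatermark = spark.sql("SELECT MAX(updated_at) FROM staging.orders")
      .collect()(0).getTimestamp(0)

    // Pull only rows newer than the watermark from the hypothetical source RDBMS
    val delta = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://dbhost:3306/sales")
      .option("dbtable", s"(SELECT * FROM orders WHERE updated_at > '$lastWatermark') t")
      .option("user", "etl_user")
      .option("password", sys.env("DB_PASSWORD"))
      .load()

    // Append the delta to the target Hive table
    delta.write.mode(SaveMode.Append).insertInto("staging.orders")
    spark.stop()
  }
}
```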

Environment: CDH, Hadoop, MapReduce, HDFS, Hive, Sqoop

Hadoop Developer

Confidential, CA

Responsibilities:

  • Developed Spark scripts using the Scala shell as per requirements.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Performed advanced procedures, such as text analytics and processing, using the in-memory computing capabilities of Spark with Scala.
  • Handled large datasets during the ingestion process itself using partitioning, Spark's in-memory capabilities, broadcasts, and effective and efficient joins and transformations.
  • Performed different validation services, such as match, create, and change services.
  • Wrote Spark programs in Scala for data extraction, transformation, and aggregation.
  • Collected JSON data from an HTTP source and developed Spark jobs that perform inserts and updates in Hive tables (a minimal sketch appears after this list).
  • Used Apache Maven to build and configure the application.
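
As a minimal sketch of the insert/update pattern mentioned above (not the project's actual code), the Scala job below unions newly landed JSON records with the existing Hive table, keeps the latest record per key, and rewrites the table. The paths, table names, key column (event_id), and timestamp column (event_ts) are hypothetical, and both sources are assumed to share a schema.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

object JsonUpsertToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("json-upsert-to-hive")
      .enableHiveSupport()
      .getOrCreate()

    // New records landed by the HTTP collector as JSON files (hypothetical path)
    val incoming = spark.read.json("hdfs:///landing/events/*.json")
    val existing = spark.table("analytics.events")

    // Keep the newest record per business key to emulate an upsert
    val latestPerKey = Window.partitionBy("event_id").orderBy(col("event_ts").desc)
    val merged = existing.unionByName(incoming)
      .withColumn("rn", row_number().over(latestPerKey))
      .filter(col("rn") === 1)
      .drop("rn")

    // Stage to a temp table first, since a plain Hive table cannot be overwritten while it is read
    merged.write.mode("overwrite").saveAsTable("analytics.events_tmp")
    spark.sql("INSERT OVERWRITE TABLE analytics.events SELECT * FROM analytics.events_tmp")
    spark.stop()
  }
}
```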

Environment: Hadoop, Spark, Scala

Hadoop Developer

Confidential - New York, NY

Responsibilities:

  • Created NDM jobs on the mainframe to copy daily SOR files from the mainframe to the edge node (Unix).
  • Developed DMX-h copy tasks to split the SOR file into Header, Detail, and Trailer records as part of data quality processing.
  • Created shell scripts to execute the DMX jobs in Unix using Autosys.
  • Created JIL scripts to define the Autosys jobs that trigger the shell scripts.
  • Worked with Sqoop to fetch data from RDBMS into the Hadoop cluster.
  • Built complex DMX tasks using Sort, Aggregate, and Join combinations to achieve MapReduce functionality.
  • Analyzed the existing code and designed the approach for creating the required DMX tasks, specifying which parts run on the mapper side and which on the reducer side.
  • Generated partitions while working with MapReduce jobs in DMX.
  • Created lookup tasks instead of joins for better performance when working with smaller files (the broadcast-join sketch after this list illustrates the same idea in Spark). Parameterized the components using shell scripts for reusability.
  • Created custom tasks in DMX jobs to execute shell scripts that run SQL.
  • Loaded the final Hadoop file data from each application into Teradata tables using the DMX TPT utility.
  • Developed Spark scripts using the Scala shell as per requirements.
  • Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
  • Analyzed the Hadoop cluster and different big data analytics tools, including Pig, the HBase database, and Sqoop.
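
The lookup tasks above were built in DMX-h; the Spark/Scala broadcast join below is a swapped-in illustration of the same small-file lookup idea, not the DMX implementation. The input paths, join key (branch_id), and output location are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object LookupAsBroadcastJoin {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("lookup-as-broadcast-join")
      .getOrCreate()

    // Large detail data and a small reference file (hypothetical paths and columns)
    val detail   = spark.read.option("header", "true").csv("hdfs:///data/transactions")
    val refCodes = spark.read.option("header", "true").csv("hdfs:///ref/branch_codes.csv")

    // broadcast() ships the small side to every executor, avoiding a shuffle of the big side
    val enriched = detail.join(broadcast(refCodes), Seq("branch_id"), "left")

    enriched.write.mode("overwrite").parquet("hdfs:///out/transactions_enriched")
    spark.stop()
  }
}
```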

Environment: Hadoop, Mainframe, Oracle, Linux, Hive, HDFS, DMX-h, Sqoop, Autosys, Spark, Scala

Hadoop Developer

Confidential

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Wrote multiple MapReduce programs in Java for data analysis; performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
  • Developed Pig scripts for analyzing large data sets in HDFS.
  • Collected logs from the physical machines and the OpenStack controller and integrated them into HDFS using Flume; designed and presented a plan for a POC on Impala.
  • Migrated HiveQL to Impala to minimize query response time.
  • Knowledge of handling Hive queries using Spark SQL integrated with the Spark environment.
  • Responsible for creating Hive tables, loading the structured data produced by MapReduce jobs into those tables, and writing Hive queries to further analyze the logs and identify issues and behavioral patterns.
  • Worked on SequenceFiles, RCFiles, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
  • Performed extensive data mining applications using Hive.
  • Implemented daily cron jobs that automate parallel data-loading tasks into HDFS using Autosys and Oozie coordinator jobs.
  • Analyzed and optimized RDDs by controlling partitioning for the given data.
  • Good understanding of the DAG cycle for the entire Spark application flow in the Spark application Web UI.
  • Wrote real-time processing jobs using Spark Streaming with Kafka (a minimal sketch appears after this list).
  • Used HiveQL to analyze partitioned and bucketed data and compute various metrics for reporting.
  • Queried data using Spark SQL on top of the Spark engine.
  • Managed and monitored the Hadoop cluster using Cloudera Manager.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig, Hive, and Sqoop.
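
A minimal Scala sketch of the Spark Streaming + Kafka pattern referenced above: it subscribes to a topic with the direct (kafka-0-10) API and simply counts messages per micro-batch. The broker address, topic name, group id, and 30-second batch interval are hypothetical; a real job would parse the records and persist them.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaEventCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-event-count")
    val ssc  = new StreamingContext(conf, Seconds(30))    // hypothetical batch interval

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",             // hypothetical broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "event-count-poc",
      "auto.offset.reset"  -> "latest"
    )

    // Direct stream from a hypothetical "events" topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](Seq("events"), kafkaParams)
    )

    // Count messages in each micro-batch
    stream.map(_.value).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```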

Environment: CDH, Java (JDK 1.7), Hadoop, MapReduce, HDFS, Hive, Sqoop, Flume, HBase, Pig, Oozie, Scala, Spark, Spark SQL, Spark Streaming, Linux, Shell Scripting, MySQL, Oracle 11g, PL/SQL, SQL*Plus

Hadoop Developer

Confidential

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
  • Setup and benchmarked Hadoop clusters for internal use.
  • Developed simple to complex MapReduce jobs using the Java programming language, alongside equivalent logic implemented in Hive and Pig.
  • Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
  • Analyzed the data by running Hive queries (HiveQL) and Pig scripts (Pig Latin) to study customer behavior; used UDFs to implement business logic in Hadoop.
  • Implemented business logic by writing UDFs in Java and used various UDFs from other sources (see the sketch after this list).
  • Experienced in loading and transforming large sets of structured and semi-structured data.
  • Managed and reviewed Hadoop log files; deployed and maintained the Hadoop cluster.
  • Involved in the implementation of JBoss Fuse ESB 6.1; consumed REST-based web services.
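
The UDFs on this project were written in Java for Hive; to keep all sketches here in one language, the Scala/Spark SQL snippet below registers an analogous function, which is a swapped-in illustration rather than the original Java Hive UDF. The business rule, function name, and table are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object BusinessRuleUdf {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("business-rule-udf")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical business rule: classify an order amount into a tier
    val tierOf = (amount: Double) =>
      if (amount >= 1000) "GOLD" else if (amount >= 100) "SILVER" else "BRONZE"

    // Register the function so SQL can call it, much as a Hive UDF would be used
    spark.udf.register("tier_of", tierOf)

    spark.sql("SELECT order_id, tier_of(amount) AS tier FROM sales.orders LIMIT 10").show()
    spark.stop()
  }
}
```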

Environment: Hadoop, Hive, Impala, Java, J2EE, REST services, MapReduce, JBoss Fuse ESB 6.1.

Software Developer

Confidential

Responsibilities:

  • Worked with the client and business management on gathering requirements and understanding the functional aspects of the application.
  • Understood the requirements and provided technical expertise to the project team covering database, middleware, and web technologies.
  • Designed complex applications based on business requirements and prepared Business Requirement Documents (BRDs).
  • Developed Technical Specification Documents (TSDs) after design discussions with the development team, including the technical lead, team lead, and project manager.
  • Managed offshore teams based on the project plan and geographic locations.
  • Developed proofs of concept (POCs) to support the choice of technologies for projects.
  • Used multiple frameworks (MVC, Struts, Spring, Hibernate) in project design.
  • Used multiple middleware technologies (Spring ORM, Hibernate, JPA) to implement security, transaction management/processing, etc.
  • Used various design patterns (DAO, DTO) depending on the project design.
  • Worked on multiple databases (MySQL, DB2) to manage project data.
  • Designed databases for complex projects, including complex triggers, procedures, normalization, constraints, job schedulers, etc.

Environment: Java, J2EE, MySQL, JDBC, Struts, Hibernate.
