Hadoop/Talend Developer Resume
Chicago, IL
SUMMARY:
- 7 years of IT experience as a Developer, Designer and Quality Reviewer with cross-platform integration experience using Big Data, Cloudera and Hortonworks distributions, Talend, ETL, Tez, Spark and Scala, along with relational database design, Core Java and J2EE application development, and SQL and PL/SQL.
- Strong experience with Hadoop components: Hive, Pig, HBase, Sqoop and Flume.
- Implemented big data batch processes using Hadoop MapReduce 2, YARN, Tez, Pig and Hive.
- Analyzed large and critical datasets using Hortonworks, HDFS, HBase, MapReduce, Hive, Hive UDFs, Sqoop, Oozie and Spark.
- Good working experience using Spark SQL to manipulate DataFrames in Scala (see the Spark SQL sketch at the end of this summary).
- Hands on Experience with Power BI Desktop and the Power BI Service.
- Experience working on NoSQL databases including HBase, with data access using Hive.
- Experience with a variety of data formats and protocols such as JSON and Avro.
- Wrote PowerShell scripts to automate Azure cloud system creation, including end-to-end infrastructure.
- Good working experience using Sqoop to import data into HDFS from RDBMS.
- Hands-on experience with Business Intelligence tools like Power BI and Tableau for interactive dashboards and visual analysis.
- Worked on the Spark engine creating batch jobs with incremental loads through Storm and Kafka.
- Experienced in using Flume to transfer log data files to the Hadoop Distributed File System (HDFS).
- Experience with multiple relational databases like Oracle 10g and the NoSQL database HBase.
- Leveraged the Jersey and Google Gson libraries for REST services and jQuery for REST calls. Utilized JavaScript to update content on the web page dynamically.
- Hands-on experience with Hadoop, HDFS, MapReduce and the Hadoop ecosystem (Pig, Hive, Oozie, Flume and HBase).
- Extensive experience in design, development and support of Model-View-Controller applications using the Struts and Spring frameworks.
- Ability to work effectively in cross-functional team environments and experience of providing training to business users.
- Set up Active Data Guard configurations for Oracle Database 11.2.0.3 for thirty data warehouses ranging from 2 TB to 5 TB on a 3-node RAC.
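Illustrative Scala sketch (not tied to any specific project below): a minimal Spark SQL job of the kind described in the summary, manipulating DataFrames read from Hive. It assumes Spark 2.x with Hive support enabled; the database, table, column and output path names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object CustomerAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CustomerAggregation")
      .enableHiveSupport()
      .getOrCreate()

    // Read hypothetical Hive tables into DataFrames
    val orders    = spark.table("sales.orders")
    val customers = spark.table("sales.customers")

    // Filter, join and pre-aggregate before writing to HDFS
    val summary = orders
      .filter(col("order_status") === "COMPLETED")
      .join(customers, Seq("customer_id"))
      .groupBy("customer_id", "region")
      .agg(sum("order_total").alias("total_spend"),
           count(lit(1)).alias("order_count"))

    summary.write.mode("overwrite").parquet("/data/curated/customer_summary")
    spark.stop()
  }
}
```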
PROFESSIONAL EXPERIENCE:
HADOOP/TALEND DEVELOPER,
Confidential, CHICAGO, IL
Responsibilities:
- Created Talend jobs to copy the files from one server to another and utilized Talend FTP components.
- Supported the team in building a data lake ecosystem using Hadoop technologies such as Hive, HBase, MapReduce, Pig, HDFS, Scala and Spark.
- Processed large volumes of data and executed processes in parallel using Talend functionality.
- Wrote HiveQL queries to perform transformations, event joins and pre-aggregations before storing the data in HDFS.
- Worked on various Talend components such as tMap, tFilterRow, tFileExist, tFileCopy, tFileList, tLogCatcher, tRowGenerator etc.
- Developed an archive framework using Talend to archive data from Hive partitioned and non-partitioned tables and flat files from source to target.
- Deployed code (HiveQL scripts and Talend jobs) via Jenkins and UrbanCode.
- Created Hive tables to store structured data in HDFS and processed it using HiveQL.
- Worked with different file formats like text file, Parquet and ORC for Hive querying and processing based on business logic.
- Involved in collecting and aggregating substantial amounts of datasets using Spark and staging data in HDFS for further analysis.
- Created documents such as mappings, playbooks and implementation/migration guides while delivering the project to the AMS team.
- Coordinated with the QA lead on the development of the test plan, test cases, test code and actual testing; was responsible for defect allocation and ensuring that defects were resolved.
- Used Hortonworks 2.5 and 2.7 versions and wrote shell scripts to automate the SQL-to-Spark conversion in UNIX.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Scala (see the Spark Streaming sketch at the end of this role).
- Experienced in working with various kinds of data sources such as Teradata and Oracle.
- Successfully loaded files from Teradata into HDFS, and loaded data from HDFS into Hive and ORMB.
- Proven ability to manage all stages of project development; strong problem-solving and analytical skills and the ability to make balanced, independent decisions.
- Knowledgeable in the Spark and Scala frameworks, explored for the transition from Hadoop/MapReduce to Spark.
Environment: Hortonworks, HDFS, Hive, Sqoop, Spark, Scala, Microsoft Azure SQL Server, Talend, shell scripts, RDD, archive framework, Talend FTP.
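Illustrative Scala sketch of the Spark Streaming pattern mentioned in this role (Kafka to HDFS). It assumes the spark-streaming-kafka-0-10 integration; the broker address, topic, consumer group and HDFS path are hypothetical.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToHdfs")
    val ssc  = new StreamingContext(conf, Seconds(30))

    // Hypothetical Kafka consumer configuration
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "hdfs-ingest",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Persist each micro-batch of message values to HDFS as text files
    stream.map(_.value).saveAsTextFiles("hdfs:///data/raw/events/batch")

    ssc.start()
    ssc.awaitTermination()
  }
}
```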
HADOOP/SCALA DEVELOPER
Confidential, BENTONVILLE, AR
Responsibilities:
- Involved in the Daily Scrum (Agile) including design of System Architecture, development of System Use Cases based on the functional requirements.
- Identified data sources, created source-to-target mappings, estimated storage, and provided support for Hadoop cluster setup and data partitioning.
- Involved in gathering requirements from the client and estimating timelines for developing complex Hive queries for a logistics application.
- Responsible for data ingestion into the data lake from multiple source systems using Talend Big Data.
- Gained exposure to Spark architecture and how RDDs work internally by processing data from local files, HDFS and RDBMS sources, creating RDDs and optimizing them for performance.
- Worked with different Oozie actions to design workflows, such as the Sqoop, Spark, Hive and shell actions.
- Developed daily, weekly and monthly cron jobs and crontab entries for the production and test environments.
- Developed Spark programs using Scala APIs to compare the performance of Spark with Hive and Microsoft Azure SQL Server.
- Developed Scala scripts using both the RDD and DataFrame/SQL/Dataset Spark APIs over Hortonworks Hadoop YARN to perform analytics on Hive data for aggregation, querying and writing (see the sketch at the end of this role).
- Exported the Hive tables to the Microsoft Azure platform using Sqoop export, Windows services and SQL Server 2015.
- Responsible for running the Talend jobs using the TAC (Talend Administration Center).
- Wrote Sqoop scripts to import data from RDBMS into HDFS and export data from HDFS to Microsoft SQL Server, and handled incremental loading of customer and transaction information data dynamically.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
- Experience creating reports using BI reporting tools like Tableau and Power BI.
- Prepared dashboards using calculated fields, parameters, calculations, groups, sets and hierarchies in Tableau.
- Experienced in creating triggers on the TAC server to schedule Talend jobs to run on the server.
- Analyzed business process workflows and assisted in the development of ETL procedures for moving data from source to target systems.
- Published the dashboard reports to the Power BI service for web access to the developed dashboards.
- Scheduled Oozie workflow actions to run daily using the Oozie coordinator at a UDT standard time.
Environment: Hortonworks, HDFS, Hive, Sqoop, Spark, Scala, Microsoft Azure SQL Server, Oozie, crontab, shell scripts, RDD, Oozie workflows & coordinator, Tableau and Power BI.
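Illustrative Scala sketch of the point above about using both the RDD and DataFrame/SQL APIs: the same aggregation expressed once with RDD transformations and once with Spark SQL against Hive. The input path, table and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object RddVsDataFrame {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RddVsDataFrame")
      .enableHiveSupport()
      .getOrCreate()

    // RDD API: parse pipe-delimited records and aggregate by key
    val lines = spark.sparkContext.textFile("hdfs:///data/raw/shipments")
    val rddTotals = lines
      .map(_.split('|'))
      .map(f => (f(0), f(2).toDouble))   // (store_id, shipment_weight)
      .reduceByKey(_ + _)
    rddTotals.take(5).foreach(println)

    // DataFrame/SQL API: the same aggregation against a Hive table
    val dfTotals = spark.sql(
      "SELECT store_id, SUM(shipment_weight) AS total_weight " +
      "FROM logistics.shipments GROUP BY store_id")

    dfTotals.write.mode("overwrite").saveAsTable("logistics.shipment_totals")
    spark.stop()
  }
}
```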
HADOOP DEVELOPER
Confidential, PHOENIX, AZ
Responsibilities:
- Defined, designed and developed Java applications, specially using Hadoop Map/Reduce by leveraging frameworks such as Cascading and Hive.
- Led and coordinated with the offshore development team for development and unit testing.
- Developed workflows using Oozie for running MapReduce jobs and Hive queries.
- Worked on loading log data directly into HDFS using Flume.
- Created managed service offerings for several open source tools, including Hadoop, Spark and Mesos, targeting cloud development and analytics use cases.
- Involved in extracting big data from Teradata into HDFS using Sqoop.
- Used a plugin that allows Hadoop MapReduce programs, HBase, Pig and Hive to work unmodified and access files directly.
- Involved in analyzing the existing architecture in on-premises datacenters and designing the migration of applications from on-premises to the AWS (Amazon Web Services) public cloud.
- Created tables in Teradata to export the data from HDFS using Sqoop after all the transformations.
- Designed roles and groups for users and resources using AWS Identity and Access Management (IAM).
- Worked on Git for version control, JIRA for project tracking and Jenkins for continuous integration.
- Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
- Migrated a SQL Server database to SQL Azure; developed the necessary stored procedures and created complex views using joins for robust and fast retrieval of data in SQL Server.
- Extracted files from MySQL through Sqoop, placed them in HDFS and processed them.
- Built reusable Hive UDF libraries for business requirements, enabling users to use these UDFs in Hive querying.
- Provisioned Azure Data Lake Store and Azure Data Lake Analytics, and leveraged U-SQL to write federated queries across data stored in multiple Azure services.
- Worked on debugging, performance tuning of Hive & Pig Jobs.
- Created HBase tables to store various data formats of PII data coming from different portfolios.
- Developed Pig scripts, Pig UDFs, Hive scripts and Hive UDFs to load data files into Hadoop.
- Built applications using AWS CloudFormation templates (JSON).
- Wrote entities in Scala and Java along with named queries to interact with the database.
- Wrote Scala test cases to test Scala code (see the ScalaTest sketch at the end of this role).
- Developed automated processes that run daily to check disk usage and perform cleanup of file systems on UNIX environments using shell scripting and CRON.
Environment: Hadoop, HDFS, Pig, Sqoop, Scala, Teradata, HBase, Azure, Shell Scripting, AWS, Maven, Hudson/Jenkins, Ubuntu, Red Hat.
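Illustrative ScalaTest sketch of the Scala unit testing mentioned above. It assumes ScalaTest 3.x (AnyFunSuite); the WeightParser object under test is a hypothetical example.

```scala
import org.scalatest.funsuite.AnyFunSuite

// Hypothetical code under test: parse "store_id|item|weight" records, ignoring malformed lines
object WeightParser {
  def parse(line: String): Option[(String, Double)] =
    line.split('|') match {
      case Array(store, _, weight) =>
        scala.util.Try(weight.toDouble).toOption.map(w => (store, w))
      case _ => None
    }
}

class WeightParserSuite extends AnyFunSuite {
  test("parses a well-formed record") {
    assert(WeightParser.parse("S100|PALLET|42.5").contains(("S100", 42.5)))
  }

  test("rejects a malformed record") {
    assert(WeightParser.parse("garbage").isEmpty)
  }
}
```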
Confidential, FAIRFAX, VA
Responsibilities:
- Involved in development of different modules of the application.
- Used the MVC, Singleton, DAO and DTO design patterns.
- Involved in Developing Test Cases.
- Involved in development of DAOs to access data from the database.
SOFTWARE ENGINEER,
Confidential, TELANGANA, IN
Responsibilities:
- As part of the development lifecycle, prepared class models, sequence models and flow diagrams by analyzing use cases using Rational tools.
- Reviewed and analyzed the data model for developing the presentation layer and value objects.
- Involved in developing Database access components using Spring DAO integrated with Hibernate for accessing the data.
- Made extensive use of the Struts framework for controller and view components.
- Involved in writing the exception and validation classes using Struts validation rules.
- Involved in writing validation rule classes for general server-side validations, implementing validation rules as part of the Observer J2EE design pattern.
- Developed REST APIs for user profiles and other application support services.
- Used Spring AOP and dependency injection across various modules of the project.
- Implemented Service Oriented Architecture (SOA) using JMS for sending and receiving messages while creating web services.
- Created web services using REST for data consumption and tested them with Postman.
- Developed various Java objects (POJOs) as part of persistence classes for O/R mapping.
- Developed web services using SOAP and WSDL with Axis.
- Implemented EJB (Message Driven Beans) in the Service Layer.
- Responsible for the architecture and design of the data storage tier for this third-party provider of data warehousing, data mining and analysis.
- Involved in working with JMS MQ queues (producers/consumers), sending and receiving asynchronous messages via MDBs.
- Developed SQL stored procedures and prepared statements for updating and accessing data from database.
- Used JBoss for deploying various components of the application and Maven as the build tool, and developed the build file for compiling the code and creating WAR files.
- Used CVS for version control.
- Performed Unit testing and rigorous integration testing of the whole application.
Environment: Java, J2EE, EJB, JMS, Struts, JBoss, Hibernate, JSP, JSTL, AJAX, CVS, JavaScript, HTML, XML, Maven, SQL, Oracle, SOA, SAX and DOM parsers, Web services (SOAP, WSDL), Spring