Big Data/Hadoop Developer Resume
Fort Worth, TX
SUMMARY
- Over 8 years of professional IT experience in analysis, architectural design, prototyping, development, integration, and testing of applications using Java/J2EE technologies, with good working experience in Big Data technologies.
- Experience in developing MapReduce programs using Apache Hadoop for analyzing big data as per requirements.
- Experienced in major Hadoop ecosystem projects such as Pig, Hive, and HBase, and in monitoring them with Cloudera Manager.
- Extensive experience in developing Pig Latin Scripts and using Hive Query Language for data analytics.
- Hands-on experience working with NoSQL databases including HBase and Cassandra and their integration with the Hadoop cluster.
- Experience in implementing Spark applications in Scala using higher-order functions for both batch and interactive analysis requirements.
- Good knowledge in using Apache NiFi to automate the data movement between different Hadoop systems.
- Good working experience using Sqoop to import data into HDFS from RDBMS and vice-versa.
- Good knowledge in using job scheduling and monitoring tools like Oozie and ZooKeeper.
- Experience in Hadoop administration activities such as installation and configuration of clusters using Apache, Cloudera, and AWS.
- Experienced in designing, building, and deploying a multitude of applications utilizing much of the AWS stack (including EC2, Route 53, S3, RDS, DynamoDB, SQS, IAM, and EMR), focusing on high availability, fault tolerance, and auto-scaling.
- Experienced in writing Business Requirements Document (BRD) and Functional Requirements Document (FRD).
- Experience in using Talend Big Data components to create connections to various third-party tools used for transferring, storing, or analyzing big data, such as HDFS components, Hive, Sqoop, MongoDB, and BigQuery, to quickly load, extract, transform, and process large and diverse data sets.
- Adept at creating and transforming business requirements into functional requirements and designing business models using UML diagrams - Context, Use Case, Sequence, Activity Diagrams in Enterprise Architect, MS Visio and Rational Rose.
- Extensive experience in loading and analyzing large datasets with the Hadoop framework (MapReduce, HDFS, Pig, Hive, Flume, Sqoop, Spark, Impala) and NoSQL databases like MongoDB, HBase, and Cassandra.
- Hands-on experience in configuring and administering Hadoop clusters using major Hadoop distributions like Apache Hadoop and Cloudera.
- Developed web-based applications using Python, Amazon Web Services, JavaScript, jQuery, CSS, and Model-View-Controller frameworks like Django and Flask.
- Good experience with design, coding, debugging, reporting, and data analysis using Python and Python libraries to speed up development.
- Hands-on experience with Big Data environments and technologies including Hadoop.
- Experienced in creative and effective front-end development using JSP, JavaScript, HTML5, DHTML, XHTML, Ajax, and CSS.
- Good working experience in using different Spring modules like the Spring Core Container module, Spring Application Context module, Spring MVC Framework module, and Spring ORM module in web applications.
- Used jQuery to select HTML elements, to manipulate HTML elements and to implement AJAX in Web applications. Used available plug-ins for extension of jQuery functionality.
- Working knowledge of databases such as Oracle 10g/11g/12c, Microsoft SQL Server, and DB2.
- Experience in writing numerous test cases using JUnit framework with Selenium.
- Experience in development of logging standards and mechanisms based on Log4j.
- Strong problem-solving, communication, and interpersonal skills; a good team player.
- Motivated to take independent responsibility, with the ability to contribute as a productive team member.
TECHNICAL SKILLS
Big Data/Hadoop Technologies: Hadoop, HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Spark, Kafka, Storm, Drill, ZooKeeper, and Oozie
Languages: C, C++, SAS, DTD, Schemas, JSON, Java, Scala.
NoSQL Databases: Cassandra, HBase, MongoDB.
Business Intelligence Tools: Tableau server, Tableau Reader, Tableau, Splunk, SAP Business Objects, OBIEE, SAP Business Intelligence, QlikView.
Business Applications: Microsoft Office Suite- MS Word, Excel, Outlook, PowerPoint
Development Methodologies: Agile/Scrum, UML, Design Patterns, Waterfall.
Build Tools: Jenkins, Toad, SQL Loader, Maven, ANT, RTC, RSA, Control-M, Oozie, Hue, SoapUI
Reporting Tools: MS Office (Word/Excel/PowerPoint/ Visio/Outlook), Crystal Reports XI, SSRS, Cognos 7.0/6.0.
Databases: Microsoft SQL Server 2008, 2010/2012, MySQL 4.x/5.x, Oracle 11g/12c, DB2, Teradata, Netezza
Operating Systems: All versions of Windows, UNIX, Linux, Mac OS, Sun Solaris
PROFESSIONAL EXPERIENCE
Confidential - Fort Worth, TX
Big Data/Hadoop Developer
Responsibilities:
- Responsible for installation and configuration of Hive, Pig, HBase, and Sqoop on the Hadoop cluster; created Hive tables to store the processed results in a tabular format.
- Configured Spark Streaming to receive real-time data from Apache Kafka and store the stream data in HDFS using Scala.
- Completed end-to-end design and development of an Apache NiFi flow that acts as the agent between the middleware team and the EBI team and executes the required actions.
- Developed Sqoop scripts to handle the interaction between Hive and the Vertica database.
- Processed data into HDFS by developing solutions and analyzed the data using MapReduce, Pig, and Hive to produce summary results from Hadoop for downstream systems.
- Built servers using AWS: importing volumes, launching EC2 instances, and creating security groups, auto-scaling, load balancers, Route 53, SES, and SNS in the defined virtual private cloud (VPC).
- Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
- Streamed AWS log groups into a Lambda function to create ServiceNow incidents.
- Developed Spark code using Scala and Spark SQL for faster processing and testing, and ran complex HiveQL queries on Hive tables.
- Used Hive to perform data validation on the data ingested using Sqoop and Flume, and pushed the cleansed data set into HBase.
- Scheduled several time-based Oozie workflows by developing Python scripts.
- Developed Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, UNION, SPLIT to extract data from data files to load into HDFS.
- Exported data to RDBMS servers using Sqoop and processed that data for ETL operations.
- Worked with S3 buckets on AWS to store CloudFormation templates and created EC2 instances on AWS.
- Responsible for installing Talend in multiple environments, creating projects, setting up user roles and job servers, configuring TAC options, adding and scheduling Talend jobs, handling job failures, and providing on-call support.
- Enabled a load balancer for Impala to distribute the load across all Impala daemons in the cluster.
- Designed an ETL data pipeline flow to ingest data from RDBMS sources into Hadoop using shell scripts, Sqoop, packages, and MySQL.
- Collaborated with business users to analyze current business process and partnered with them to prepare detailed test scenarios for proof of concept execution.
- Prepared and presented Business Requirement Document (BRD), System Requirement Specification (SRS) and Functional Requirement Document (FRD).
- Analyzed business requirements and segregated them into use cases. Created use case diagrams, activity diagrams, and sequence diagrams.
- Organized JAD sessions to flesh out requirements, performed use case and workflow analysis, outlined business rules, and developed domain object models.
- Delivered end-to-end architecture and implementation of client-server systems using Scala, Akka, Java, JavaScript, and related technologies on Linux.
- Optimized the Hive tables using optimization techniques like partitioning and bucketing to provide better performance.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
- Implemented Hadoop on AWS EC2, using a few instances to gather and analyze data log files.
- Worked with Spark and Spark Streaming, creating RDDs and applying transformations and actions.
- Created partitioned tables and loaded data using both static partition and dynamic partition method.
- Developed custom Apache Spark programs in Scala to analyze and transform unstructured data.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Oracle into HDFS using Sqoop.
- Used Kafka for publish-subscribe messaging as a distributed commit log, gaining experience with its speed, scalability, and durability.
- Followed a Test-Driven Development (TDD) process, with extensive experience in Agile and Scrum methodologies.
- Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala.
- Involved in cluster maintenance, cluster monitoring, and troubleshooting; managed and reviewed data backups and log files.
- Involved in the development of Talend jobs and preparation of design documents and technical specification documents.
- Analyzed the Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase, and Sqoop.
- Improved performance by tuning Hive and MapReduce.
- Researched, evaluated, and utilized modern technologies, tools, and frameworks around the Hadoop ecosystem.
Environment: HDFS, MapReduce, Talend, Hive, Apache NiFi, Sqoop, Impala, Pig, Flume, Vertica, Oozie Scheduler, Java, Shell Scripts, Teradata, Oracle, HBase, MongoDB, Cassandra, Cloudera, AWS, JavaScript, JSP, Kafka, Spark, Scala, ETL, Python.
Confidential - Dallas, TX
Big Data/Hadoop Developer
Responsibilities:
- Contributed as a member of a high-performing agile team focused on next-generation data and analytics.
- Built a Big Data analytics and visualization platform for handling high-volume batch-oriented and real-time data streams.
- Utilized Agile Scrum Methodology to help manage and organize a team with regular code review sessions.
- Configured an Apache NiFi flow for loading data from non-relational data sources into the raw access layer of HDFS.
- Built platforms and deployed cloud-based tools and solutions with AWS EMR.
- Analyzed big data using Hive and imported data from RDBMS into HDFS.
- Loaded data from different servers into AWS S3 buckets and set appropriate bucket permissions.
- Reduced the overall cost of the Amazon EMR production cluster by identifying the best configuration for running data workloads.
- Upgraded the Hadoop cluster from CDH4 to CDH5 and set up a high-availability cluster to integrate Hive with existing applications.
- Implemented complex big data solutions with a focus on collecting, parsing, managing, analyzing, and visualizing large sets of data to turn information into business insights using multiple platforms in the Hadoop ecosystem.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Provided regular user and application support for highly complex issues involving multiple components such as Hive, Impala, Spark, Kafka, and MapReduce.
- Involved in source system analysis, data analysis, and data modeling to ETL (Extract, Transform and Load).
- Facilitated and managed meeting sessions with committee of SMEs from various business areas including estimating resource and budget requirements.
- Designed and implemented basic SQL queries for QA testing, reporting, and data validation.
- Managed planning and design of the backup data center infrastructure build-out and participated in business continuity/disaster recovery planning.
- Built Hive tables using list partitioning and hash partitioning and created Hive generic UDFs to process business logic with HiveQL.
- Integrated HBase with MapReduce to move the bulk amount of data into HBase.
- Developed SQL scripts using Spark for handling different data sets and compared their performance against MapReduce jobs.
- Supported MapReduce programs running on the cluster and wrote MapReduce jobs using the Java API.
- Used the Talend Administration Center (TAC) Job Conductor to schedule ETL jobs on a daily, weekly, monthly, and yearly basis.
- Followed the organization-defined naming conventions for flat file structures, Talend jobs, and the daily batches that execute the Talend jobs.
- Built code for real-time data ingestion using Java, MapR-Streams (Kafka), and Storm.
- Designed and unit tested data models and applications for data analytics solutions on streaming data.
- Extensively used Apache Sqoop for efficiently transferring bulk data between Apache Hadoop and relational databases (Oracle, MySQL) for predictive analytics.
- Developed scripts and batch jobs to schedule various Hadoop programs; cleansed raw data and shaped it into a format that data scientists can consume to create critical insights.
- Developed storytelling dashboards in Tableau Desktop, published them to Tableau Server, and used GitHub for version control to maintain project versions.
- Optimized the mappings using various optimization techniques and debugged existing mappings using the Debugger to test and fix them.
Environment: Hadoop, Java, MapReduce, HDFS, Impala, AWS, Talend, Amazon S3, Hive, Linux, XML, Eclipse, Cloudera, CDH4/5 Distribution, Spark, Scala, HBase, MongoDB, Python, GitHub, SQL, QA, Sqoop, Oozie, DB2, SQL Server, Oracle 12c, MySQL, Apache NiFi.
Confidential - San Antonio, TX
Hadoop Developer
Responsibilities:
- Provided an application demo to the client by designing and developing a search engine, report analysis trends, and application administration prototype screens using AngularJS and Bootstrap.
- Took ownership of the complete application design for the Java components and Hadoop integration.
- In addition to normal requirements gathering, participated in business meetings with the client to gather security requirements.
- Implemented Apache NiFi to integrate Hadoop and PostgreSQL into the day-to-day usage of the SDS team's projects.
- Designed and implemented ETL jobs using BIML, SSIS, and Sqoop to move data from SQL Server to Impala.
- Assisted the architect in analyzing the existing and future systems. Prepared design blueprints and application flow documentation.
- Managed and reviewed Hadoop log files. Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Managed data coming from different sources and applications. Supported MapReduce programs running on the cluster.
- Worked with message broker systems such as Kafka. Extracted data from mainframes, fed it to Kafka, and ingested it into HBase to perform analytics.
- Wrote an event-driven link tracking system to capture user events and feed them to Kafka for pushing into HBase.
- Created MapReduce jobs to extract the contents from HBase and configured them in Oozie workflows to generate analytical reports.
- Developed JAX-RS web services using the Apache CXF framework to fetch data from Solr when the user searched for documents.
- Participated in Solr schema design and ingested data into Solr for indexing.
- Wrote MapReduce programs to organize the data and ingest it in a client-specified format suitable for analytics.
- Wrote Python scripts to optimize performance. Implemented Storm topologies to perform cleansing operations before moving data into Cassandra.
- Created Talend Development Standards. This document describes the general guidelines for Talend developers, the naming conventions to be used in the Transformations and also development and production environment structures.
- Wrote Spark applications using Scala. Created RDDs, transformations, and actions while implementing Spark applications.
- Created DataFrames using Spark SQL. Loaded data into the Cassandra NoSQL database.
- Implemented record-level atomicity on writes using Cassandra. Wrote Pig scripts to query and process the datasets to identify trend patterns by applying client-specific criteria, and configured Oozie workflows to run the jobs along with the MapReduce jobs.
- Stored the results derived from analysis in HBase and made them available for ingestion into Solr for indexing.
- Involved in integration of the Java search UI, Solr, and HDFS. Performed code deployments using Jenkins as the continuous integration tool.
- Documented all the challenges and issues involved in dealing with the security system and implemented best practices.
- Created project structures and configurations according to the project architecture and made them available to junior developers to continue their work.
- Served as onsite coordinator to deliver work to the offshore team. Involved in code reviews and application lead support activities.
Environment: Java, J2EE, Python, Cassandra, Talend, Impala, Spring 3.2, MVC, HTML5, CSS, AngularJS, RESTful services using the CXF web services framework, Spring Data, Apache NiFi, Solr 5.2.1, Pig, Hive, Apache Avro, MapReduce, Sqoop, ZooKeeper, SVN, Jenkins, Windows AD, Windows KDC, Hortonworks distribution of Hadoop 2.3, YARN, Ambari
Confidential - San Jose, CA
Hadoop Developer
Responsibilities:
- Worked on analyzing Hadoop stack and different big data analytic tools including Pig, Hive, HBase database and Sqoop.
- Implemented the Hortonworks distribution (HDP 2.1, HDP 2.2, and HDP 2.3).
- Developed MapReduce programs for some refined queries on big data.
- Created Azure HDInsight clusters and deployed a Hadoop cluster on the cloud platform.
- Used Hive queries to import data into the Microsoft Azure cloud and analyzed the data using Hive scripts.
- Used Ambari in the Azure HDInsight cluster to record and manage the data logs of the NameNode and DataNodes.
- Created Hive tables and worked with them for data analysis to meet the requirements.
- Developed a framework to load and transform large sets of unstructured data from UNIX systems into Hive tables.
- Worked with a business team in creating Hive queries for ad hoc access.
- In-depth understanding of Classic MapReduce and YARN architectures.
- Implemented Hive generic UDFs to implement business logic.
- Involved in installing and configuring security authentication using Kerberos security.
- Created and dropped users and granted and revoked permissions to users/policies as required using Ranger.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Installed and configured Pig for ETL jobs.
- Developed Pig UDFs to pre-process the data for analysis.
- Deployed a Cloudera Hadoop cluster on Azure for Big Data analytics.
- Analyzed the data by running Hive queries, Pig scripts, Spark SQL, and Spark Streaming.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Developed a Spark Streaming script that consumes topics from the distributed messaging source Kafka and periodically pushes batches of data to Spark for real-time processing.
- Extracted files from Cassandra through Sqoop and placed them in HDFS for further processing.
- Involved in creating a generic Sqoop import script for loading data into Hive tables from RDBMS.
- Involved in continuous monitoring of operations using Storm.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Implemented indexing for logs from Oozie to Elasticsearch.
- Designed, developed, unit tested, and supported ETL mappings and scripts for data marts using Talend.
Environment: Hortonworks, Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Apache Kafka, Azure, Apache Storm, Oozie, SQL, Flume, Spark, HBase, Cassandra, Informatica, Java, and GitHub.
Confidential
Java Developer
Responsibilities:
- Involved in analysis, design, and coding in a Java and J2EE environment.
- Implemented the Struts MVC framework.
- Designed, developed and implemented the business logic required for Security presentation controller.
- Set up the deployment environment on WebLogic. Developed system preferences UI screens using JSP and HTML.
- Developed UI screens using Swing components like JLabel, JTable, JScrollPane, JButtons, JTextFields, etc.
- Used JDBC to connect to Oracle database and get the results that are required.
- Designed asynchronous messaging using Java Message Service (JMS).
- Consumed web services through SOAP protocol.
- Developed web Components using JSP, Servlets and Server-side components using EJB under J2EE Environment.
- Designed JSPs using JavaBeans.
- Implemented the Struts 2.0 framework (Action and Controller classes) for dispatching requests to the appropriate class.
- Design and implementation of front-end web pages using CSS, DHTML, JavaScript, JSP, HTML, XHTML, JSTL, Ajax and Struts Tag Library.
- Designed table structure and coded scripts to create tables, indexes, views, sequence, synonyms and database triggers.
- Designed and implemented a generic parser framework using a SAX parser to parse XML documents which store SQL.
- Involved in writing Database procedures, Triggers, PL/SQL statements for data retrieving.
- Used Web 2.0 techniques to support interaction with other users and changes to website content.
- Implemented AOP and IoC concepts using the Spring 2.0 Framework.
- Used the Transaction Management API of Spring 2.0 to coordinate transactions for Java objects.
- Generated WSDL files using the Axis2 tool.
- Used CVS as the version control tool for managing module development.
- Configured and tested the application on IBM WebSphere Application Server.
- Used the Hibernate ORM tool, which automates the mapping between SQL databases and Java objects.
- Worked with XML, XPDL, BPEL, and XML parsers like DOM and SAX.
- Used XSLT to convert XML documents into XHTML and PDF documents.
- Wrote JUnit test cases for business objects and prepared code documentation for future reference and upgrades.
- Deployed applications using WebSphere Application Server and used the RAD (Rational Application Developer) IDE.
Environment: Java, J2EE, JSP, Servlets, MVC, Hibernate, Spring 3.0, Web Services, Maven 3.2.x, Eclipse, SOAP, WSDL, jQuery, JavaScript, Swing, Oracle, REST API, PL/SQL, Oracle 11g, UNIX.
Confidential
Java Developer
Responsibilities:
- Designed & developed the application using Spring Framework
- Developed class, sequence, and use case diagrams using UML in Rational Rose.
- Designed the application with reusable J2EE design patterns
- Designed DAO objects for accessing RDBMS
- Developed web pages using JSP, HTML, DHTML and JSTL
- Designed and developed a web-based client using Servlets, JSP, Tag Libraries, JavaScript, HTML and XML using Struts Framework.
- Involved in developing JSP forms.
- Designed and developed web pages using HTML and JSP.
- Designed various applets using JBuilder.
- Designed and developed Servlets to communicate between presentation and business layer.
- Used EJB as a middleware in developing a three-tier distributed application.
- Developed Session Beans and Entity beans for business and data process.
- Used JMS in the project for sending and receiving the messages on the queue.
- Developed the Servlets for processing the data on the server.
- Developed views and controllers for client and manager modules using Spring MVC and Spring Core.
- Used Spring Security for securing web-tier access.
- Implemented business logic using Hibernate.
- Developed and modified database objects as per the requirements.
- Involved in unit integration, bug fixing, acceptance testing with test cases, and code reviews.
- Interacted with customers to identify system requirements and developed Software Requirement Specifications.
- Implemented Java design patterns wherever required.
- Involved in development, maintenance, implementation and support of the System.
- Involved in initial project setup and guidelines.
- Implemented Multi-threading concepts.
- Developed test cases for unit testing using JUnit and performed integration and system testing.
- Involved in coding for the presentation layer using Struts Framework, JSP, AJAX, XML, XSLT, and JavaScript.
- Worked closely on and supported the creation of database schema objects (tables, stored procedures, and triggers) using Oracle SQL and PL/SQL.
Environment: Java/J2EE, JSP, CSS, JavaScript, AJAX, Hibernate, Spring, XML, EJB, Web Services, SOAP, Eclipse, Rational Rose, HTML, XPath, XSLT, DOM, and JDBC.