Hadoop Developer Resume
Westlake, TX
SUMMARY:
- 6+ years of extensive IT experience as a Hadoop Developer.
- Experience in installation, management and monitoring of Hadoop clusters using Cloudera Manager.
- Experience using Apache Avro as both a serialization format for persistent data and a wire format for communication between Hadoop nodes.
- Good understanding of NoSQL databases like MongoDB, Cassandra, and HBase.
- Comprehensive experience in Big Data processing using the Hadoop ecosystem, including Pig, Hive, HDFS, MapReduce (MRv1 and YARN), Sqoop, Flume, Kafka, Oozie, Zookeeper, Spark and Impala.
- Experienced with Hadoop ecosystem components such as MapReduce, Cloudera, Hortonworks, HBase, Oozie, Hive, Sqoop, Pig, Tez, Flume, Kafka, Spark SQL, Storm, Spark, Scala, MongoDB, Couchbase and Cassandra.
- Experience with Amazon Web Services (AWS), a suite of cloud-computing services that together form an on-demand computing platform.
- Experience in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java.
- Agile coaching of 6 development teams with 6-10 members each; strategized and managed implementation of a new test automation system while maintaining the quarterly release schedule.
- Good exposure to Big Data technologies and the Hadoop ecosystem, with an in-depth understanding of MapReduce and the Hadoop infrastructure.
- Experience evaluating the relationship between Amazon Redshift and other big data systems.
- Hands-on experience gathering data from different nodes into a Greenplum database and then running Sqoop incremental loads into HDFS.
- Recently started using Mahout for machine learning to identify more subtle classifiers.
- Knowledge of Data Mining and Analysis including regression models, decision trees, association rule mining, customer segmentation and hypothesis testing; proficient in R (including R packages) and SAS.
- Experienced in configuring Flume to stream data into HDFS.
- Experienced in real-time Big Data solutions using Hbase, handling billions of records.
- Extensive experience working with structured, semi-structured and unstructured data by implementing complex MapReduce programs using design patterns.
- Experience designing and developing POCs in Scala, deploying them on YARN clusters, and comparing the performance of Spark with Hive and SQL/Teradata.
- Experience in developing and designing Web Services (SOAP and Restful Web services).
- Extensive experience in writing SQL queries for Oracle, Hadoop and DB2 databases using SQL*Plus.
- Experience working with Waterfall and Agile methodologies; analyzed and synthesized results from Joint Application Development (JAD) sessions.
- Hands on experience in working with Oracle, DB2, MySQL and knowledge on SQL Server.
- Hands on experience in using SQL and PL/SQL to write Stored Procedures, Functions and Triggers.
- Ability to adapt to evolving technology, strong sense of responsibility and accomplishment.
- Well-versed with multiple operating systems such as Windows, DOS, UNIX, Linux and Sun Solaris.
- Provide direction on trouble resolution for ISP and customer interfaces. Provide technical support for WAN problems (T-1/ISDN/Frame Relay/HDLC). Excellent command of change management and coordinating deployments in distributed environments.
- Hands-on experience with RAID using volume management software such as Logical Volume Manager, Veritas Volume Manager and Solaris Volume Manager.
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, Cassandra, Impala, Oozie, Zookeeper, Redshift, Hortonworks, Greenplum, Amazon Web Services, EMR, MRUnit, Spark, Storm, Avro, RDBMS
Java & J2EE Technologies: Core Java, JDBC, Servlets, JSP, JNDI, Struts, Spring, Hibernate and Web Services (SOAP and Restful)
IDEs: Eclipse, NetBeans, MyEclipse, IntelliJ
Frameworks: MVC, Struts, Hibernate, Spring, MS Visual Studio
Programming languages: C, C++, Java, Python, Ant scripts, Linux shell scripts, R
Databases: Oracle 11g/10g/9i, MySQL, DB2, MS-SQL Server, MongoDB, Graph DB
Web Servers: WebLogic, WebSphere, AOLserver, Apache Tomcat, Apache HTTP Server
Web Technologies: HTML, XML, JavaScript, AJAX, RESTful WS, PHP, XHTML, WordPress
Network Protocols: ARP, UDP, HTTP, BGP, DNS, RPC and ICMP
ETL Tools: Informatica, Qlikview, Microsoft SQL and Cognos
WORK EXPERIENCE:
Confidential, Westlake, TX
Hadoop Developer
Responsibilities:
- Migrated HDFS data to an edge node on AWS (Amazon Web Services) and set up a Cloudera Impala environment in the cloud.
- Responsible for building scalable distributed data solutions using a Hadoop cluster environment with the Hortonworks distribution on the data nodes.
- Developed custom aggregate functions using Spark SQL and performed interactive querying.
- Created Data Marts and loaded the data using Informatica Tool.
- Installed, configured and maintained Hadoop in a multi-cluster environment on virtual systems and worked with MapReduce, HBase, Hive, Pig, Pig Latin, Sqoop, Spark, Scala, Flume, Zookeeper, etc.
- Wrote MapReduce code to process data from social feeds, which arrived in various formats such as JSON, TSV and CSV, and to load it into the database.
- Deployed AWS (Amazon Web Services) Big Data solutions using Redshift, MongoDB, Apache Hadoop, Spark and Cassandra.
- Built and maintained standard operational procedures for all needed Greenplum implementations.
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark for data aggregation and queries, and wrote data back into the RDBMS through Sqoop.
- Installed, configured, monitored and maintained the Cloudera distribution on Red Hat Enterprise Linux.
- Involved in Hadoop cluster tasks such as adding and removing nodes without any effect on running jobs or data.
- Worked on the Hortonworks big data platform. Used Kafka as a messaging system to get data from different sources.
- Worked on evaluation and analysis of the Hadoop cluster and different big data analytic tools including Pig, the HBase database and Sqoop.
- Importing and exporting data into HDFS and Hive from different RDBMS using Sqoop.
- Set up a POC Hadoop cluster on Amazon EC2 (AWS).
- Importing and exporting data into HDFS and Hive using Sqoop.
- Developing MapReduce Program to transform/process data.
- Installed and configured Flume, Hive, Pig, Sqoop, HBase on the Hadoop cluster.
- Installed and configured Hive and wrote Hive UDFs (a minimal Hive UDF sketch follows this section).
- Wrote shell scripts to automate the data flow from the local file system to HDFS and then to NoSQL databases (HBase, MongoDB) and vice versa.
- Loaded Hive data into a Greenplum database using the GPload utility for real-time aggregation.
- Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Created proofs of concept for data science (Spark SQL, Spark Streaming, Spark ML) using the Big Data ecosystem and AWS (Amazon Web Services) cloud computing (EMR).
- Configured authentication and authorization with Kerberos, Centrify and LDAP.
- Continuous monitoring and managing the Hadoop cluster using Cloudera Manager.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs.
- Load log data into HDFS using Flume. Worked extensively in creating MapReduce jobs to power data for search and aggregation.
- Implemented Spark using Scala and SparkSQL for faster testing and processing of data.
- Worked on continuous scheduling of Informatica workflows.
- Carried out Linux OS administration for Ubuntu version 14.04 and Red Hat 6.5.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Implemented Spark to migrate MapReduce jobs into Spark RDD transformations, streaming data using Apache Kafka and Spark Streaming.
- Developed Pig Latin scripts to extract data from local file system output files and load it into HDFS.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing (a minimal MapReduce sketch follows this section).
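A minimal, illustrative sketch of the kind of Java map-only cleaning job described in this section. The class name, delimiter, expected column count and input/output paths are assumptions for illustration, not taken from the actual project.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * Map-only cleaning job: keeps only well-formed delimited records
 * (expected column count) and counts the malformed ones.
 */
public class CleanRecordsJob {

    public static class CleanMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {

        private static final int EXPECTED_COLUMNS = 12; // assumed schema width

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",", -1);
            if (fields.length == EXPECTED_COLUMNS) {
                context.write(value, NullWritable.get());          // keep the clean record
            } else {
                context.getCounter("cleaning", "malformed").increment(1);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "record-cleaning");
        job.setJarByClass(CleanRecordsJob.class);
        job.setMapperClass(CleanMapper.class);
        job.setNumReduceTasks(0);                                   // map-only job
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));       // raw input in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1]));     // cleaned output in HDFS
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```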
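A minimal sketch of the kind of Hive UDF written for this project; the class name, function name and registration statements are hypothetical.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/**
 * Simple Hive UDF that trims and lower-cases a string column.
 * Registered in Hive with, e.g.:
 *   ADD JAR normalize-udf.jar;
 *   CREATE TEMPORARY FUNCTION normalize_str AS 'NormalizeString';
 */
public class NormalizeString extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;                                   // pass NULLs through unchanged
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```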
Confidential, Lakewood, NJ
Hadoop Developer
Responsibilities:
- Developed simple and complex MapReduce programs in Java for Data Analysis on different data formats.
- Created a complete processing engine based on the Hortonworks distribution, enhanced for performance.
- Worked on importing and exporting data from RDBMS into HDFS, HIVE and HBase using Sqoop.
- Responsible for creating workflows and sessions using Informatica Workflow Manager and monitoring workflow runs and statistics in Informatica Workflow Monitor.
- Developed Sqoop commands to pull the data from Teradata, Oracle and export into Greenplum.
- Sent data extracts to SAS for analytics purposes.
- Responsible for installation and configuration of Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Developed workflows using Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Experience in writing Pig Latin scripts to sort, group, join and filter the data as part of data transformation as per the business requirements.
- Implemented scripts to transmit sysprin information from Oracle to HDFS using Sqoop.
- Worked on partitioning the HIVE table and running the scripts in parallel to reduce the run time of the scripts.
- Designed and deployed scalable, highly available and fault-tolerant systems on AWS across different regions/zones.
- Load log data into HDFS using Kafka.
- Stored MapReduce program output in Amazon and developed a script to move the data to Redshift for generating a dashboard using QlikView.
- Optimized Map/Reduced Jobs to use HDFS efficiently by using various compression mechanisms.
- Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Implemented pushing the data from Hadoop to Greenplum.
- Developed analytical components using Scala, Spark and Spark Streaming.
- Extracted files from MySQL/DB2 through Sqoop, placed them in HDFS and processed them.
- Analyzed the data by performing Hive queries and running Pig scripts to study the data.
- Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources (a minimal Pig UDF sketch follows this section).
- Implemented and delivered AWS (Amazon Web Services) infrastructure projects into operational delivery.
- Continuously monitored and managed the Hadoop cluster using Ganglia.
- Implemented real-time analytics with Apache Kafka and Storm (a minimal Kafka producer sketch follows this section).
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Involved in writing queries in Spark SQL using Scala.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Developed Sqoop scripts to handle change data capture, processing incremental records between newly arrived and existing data in RDBMS tables.
- Supported setting up the QA environment and updating configuration for implementing scripts with Pig and Sqoop.
- Extracted data from DB2 and Oracle source systems and loaded into Flat files using Informatica.
- Deployed custom configured cluster on Amazon AWS.
- Wrote MapReduce jobs using Pig Latin to load data from MySQL into HDFS on a regular basis.
- Implemented testing scripts to support test driven development and continuous integration.
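A minimal sketch of the kind of Pig UDF in Java described in this section; the class name and the Pig script fragment in the comment are illustrative assumptions.

```java
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

/**
 * Simple Pig EvalFunc that upper-cases the first field of its input tuple.
 * Used from a Pig script with, e.g.:
 *   REGISTER business-udfs.jar;
 *   DEFINE TO_UPPER ToUpper();
 *   cleaned = FOREACH raw GENERATE TO_UPPER(name);
 */
public class ToUpper extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;                                   // tolerate empty/NULL input
        }
        return input.get(0).toString().toUpperCase();
    }
}
```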
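A minimal sketch of a Java Kafka producer of the kind used to feed the real-time analytics pipeline described above; the broker address, topic name and payload are placeholders, and the consuming side (Storm/Spark Streaming) is not shown.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

/**
 * Minimal producer that publishes log lines to a Kafka topic, from which a
 * Storm topology (or Spark Streaming job) can consume for real-time analytics.
 */
public class LogEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");    // placeholder broker address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // In the real pipeline the payload would come from application log files.
            producer.send(new ProducerRecord<>("app-logs", "host-01",
                    "INFO request served in 42 ms"));
        }
    }
}
```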
Java Developer
Responsibilities:
- Assisted the analysis team in performing the feasibility analysis of the project.
- Designed Use Case diagrams, Class diagrams and Sequence diagrams and Object Diagrams in the detailed design phase of the project using Rational Rose 4.0.
- Involved in the Analysis, Design, Implementation and Testing of Software Development Life Cycle (SDLC) of the project.
- Developed the presentation layer of the project using HTML, JSP 2.0, JSTL and JavaScript. Experienced in developing web-based applications using Python, Django, PHP, C++, XML, CSS, HTML, DHTML, JavaScript and jQuery.
- Developed the complete business tier using stateless and stateful session beans to EJB 2.0 standards using WebSphere Studio Application Developer (WSAD 5.0).
- Used various J2EE design patterns such as DTO, DAO, Business Delegate, Service Locator, Session Facade, Singleton and Factory.
- Used Linux OS to convert the existing application to Windows.
- Consumed Web Service for transferring data between different applications.
- Integrated Spring DAO for data access using Hibernate (a minimal DAO sketch follows this section).
- Wrote complex SQL queries, stored procedures, functions and triggers in PL/SQL.
- Configured and used Log4J for logging all the debugging and error information.
- Developed Ant build scripts for compiling and building the project.
- Used IBM WebSphere Portal and IBM WebSphere Application Server for deploying the applications.
- Used CVS Repository for Version Control.
- Created test plans and JUnit test cases and test suite for testing the application.
- Good hands-on knowledge of UNIX commands, used to inspect the log files on the server.
- Assisted in Developing testing plans and procedures for unit test, system test, and acceptance test.
- Unit test case preparation and Unit testing as part of the development.
- Used Log4J components for logging. Perform daily monitoring of log files and resolve issues.
- Created hibernate mapping files to map POJO to DB tables.
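A minimal sketch of a Hibernate-backed DAO of the kind wired through Spring, as referenced in this section; the Customer entity and its methods are hypothetical stand-ins for the real domain objects.

```java
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

/** Hypothetical entity, mapped to a table via a Hibernate mapping file. */
class Customer {
    private Long id;
    private String name;

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

/**
 * DAO built around a Hibernate SessionFactory, of the kind wired up through
 * Spring's IoC container (the SessionFactory is injected via the constructor).
 */
public class CustomerDao {
    private final SessionFactory sessionFactory;

    public CustomerDao(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    public void save(Customer customer) {
        Session session = sessionFactory.openSession();
        Transaction tx = session.beginTransaction();
        try {
            session.save(customer);                 // INSERT mapped by the hbm.xml file
            tx.commit();
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        } finally {
            session.close();
        }
    }

    public Customer findById(Long id) {
        Session session = sessionFactory.openSession();
        try {
            return (Customer) session.get(Customer.class, id);
        } finally {
            session.close();
        }
    }
}
```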
Java Developer
Responsibilities:
- Used Ajax for intensive user operations and client-side validations.
- Developed the code using object oriented programming concepts.
- Developed application service components and configured beans using Spring IoC, creation of Hibernate mapping files and generation of database schema.
- Used Web Services for creating rate summaries; used WSDL and SOAP messages for getting insurance plans from different modules and XML parsers for data retrieval.
- Used JUnit for testing the web application (a minimal JUnit sketch follows this section).
- Used JAXM for making distributed software applications communicate via SOAP and XML.
- Used DB2 as the backend database. Experienced in MVW frameworks such as Django, Angular.js, JavaScript, jQuery and Node.js. Expert knowledge of and experience in object-oriented design and programming concepts.
- Used SQL statements and procedures to fetch the data from the DB2 database.
- Involved in writing the Spring configuration XML file containing bean declarations; business classes are wired up to the front-end managed beans using the Spring IoC pattern.
- Involved in creating various Data Access Objects (DAO) for addition, modification and deletion of records using various specification files.
- Developed Ant Scripts for the build process and deployed in IBM WebSphere.
- Implemented Log4J for logging errors, debugging and tracking using logger and appender components.
- Developed the user interface using JSP, HTML, XHTML and JavaScript to simplify the complexities of the application.
- Implemented business processes such as user authentication and Transfer of Service using session EJBs.
- Involved in fixing bugs reported by the testing teams during integration and used Bugzilla for bug tracking.
- Used Tortoise CVS as version control across common source code used by developers.
- Deployed the applications on IBM WebSphere Application Server.
- Primarily responsible for design and development of Spring-based applications.
- Involved in configuring JDBC connection pooling to access the Oracle database.
- Used JUnit for unit testing the application. Developed and maintained Ant scripts.
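A minimal JUnit 4 sketch of the kind of unit test used in this project; the RatePlan helper is hypothetical and stands in for the real rate-summary logic.

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

/**
 * JUnit 4 sketch of the kind of unit test written during development.
 * RatePlan is a hypothetical helper standing in for the real rate-summary logic.
 */
public class RatePlanTest {

    /** Hypothetical class under test: applies a flat discount to a base premium. */
    static class RatePlan {
        double quote(double basePremium, double discountPercent) {
            return basePremium * (1.0 - discountPercent / 100.0);
        }
    }

    @Test
    public void quoteAppliesDiscount() {
        RatePlan plan = new RatePlan();
        assertEquals(90.0, plan.quote(100.0, 10.0), 1e-9);
    }
}
```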