Hadoop Developer/Java Developer Resume
Princeton, NJ
PROFESSIONAL SUMMARY:
- Around 7 years of professional experience in core and enterprise software development using Big Data, Java/J2EE and open source technologies.
- 3+ years of hands-on experience with Hadoop ecosystem components such as MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, ZooKeeper, Flume, NiFi and Kafka, including their installation and configuration.
- Experience with the AWS, Hortonworks and Cloudera Hadoop distributions.
- Worked with the case team (client, implementation consultants, system administrators and project manager) to gather business requirements for data migration and to identify, define, document and communicate those requirements.
- Assisted with designing, planning and managing the data migration process.
- Migrated large data sets for both front-office and back-office systems (SaaS and enterprise clients).
- In-depth knowledge of Hadoop architecture and components such as HDFS, JobTracker, NameNode, DataNode, MapReduce and YARN.
- Experience in writing Hive queries to analyze data in the Hive warehouse using Hive Query Language (HQL).
- Handled and processed both schema-oriented and non-schema-oriented data using Pig.
- Designed and developed Sqoop scripts to transfer data sets between Hadoop and RDBMS.
- Experience in extending Hive and Pig core functionality by writing custom UDFs, UDAFs and UDTFs (a minimal Hive UDF sketch follows this summary).
- Experienced in data modeling with Hive.
- Experience using NiFi processors and process groups for data flow management.
- Implemented data warehousing methodologies for ETL using Informatica Designer, Repository Manager, Workflow Manager, Workflow Monitor and Repository Server Administration Console.
- Knowledge of job workflow scheduling and coordination tools such as Oozie and ZooKeeper.
- Experience in using Flume to collect weblogs.
- Actively involved in a successful migration project, with thorough testing to ensure Ab Initio application data and processing were unaffected.
- Developed MapReduce jobs to automate data transfer from HBase.
- Handled different file formats including Parquet, Protocol Buffers, Avro, SequenceFile, JSON, XML and flat files.
- Experience working with Kafka clusters as well as Spark and Spark Streaming.
- Good knowledge of creating event-processing data pipelines using Kafka and Spark Streaming.
- Configured and maintained different topologies in a Storm cluster and deployed them on a regular basis.
- Experienced in Apache Spark for implementing advanced procedures such as text analytics and processing, using its in-memory computing capabilities with Scala.
- Experience using HCatalog with Hive and Pig.
- Involved in the ETL process using the Ab Initio tool to set up data extraction from several databases.
- Wrote a Python module to connect and view the status of an Apache Cassandra instance.
- Worked on Apache Spark, writing Python applications to convert TXT and XLS files and parse the data into JSON format.
- Loaded data into Elasticsearch from the data lake using Spark/Hive.
- Experienced in NoSQL databases such as HBase, MongoDB and Cassandra.
- Involved in deploying applications in AWS; proficient in Unix/Linux shell commands.
- Hands-on experience with AWS services such as Redshift clusters and Route 53 domain configuration.
- Involved in loading data from the local (Linux) file system to HDFS.
- Extracted large data sets, mostly from flat files and Excel sources with a minimum of 250,000 rows, to be loaded to the server; handled files with the maximum allowed 1,024 columns.
- Experienced in developing Shell scripts and Python scripts for system management.
- Involved in data modeling, sharding and replication strategies in MongoDB.
- Experience in creating custom Lucene/Solr Query components.
- Utilized Kafka for loading streaming data and performed initial processing and real-time analysis using Storm.
- Experience in developing distributed web and enterprise applications using Java/J2EE technologies (Core Java, JDK 6+).
- Experience working with Scala on Spark.
- Excellent programming skills with experience in Java, C, SQL and Python.
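The Hive UDF bullet above references the following minimal sketch. It shows the classic single-argument UDF pattern in Java; the class name CleanTextUDF and the trim/upper-case rule are illustrative assumptions, not code from any of the projects below.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Minimal Hive UDF sketch: trims and upper-cases a string column.
// The class name and the cleanup rule are illustrative assumptions.
public final class CleanTextUDF extends UDF {
    public Text evaluate(final Text input) {
        if (input == null) {
            return null; // pass NULLs through unchanged
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}
```

Once packaged into a JAR, the function would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION and then used like any built-in function in HQL.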
TECHNICAL SKILLS:
Hadoop Core Services: HDFS, MapReduce, Spark, YARN
Hadoop Distribution: Hortonworks, Cloudera
NoSQL Databases: HBase, Cassandra, MongoDB
Hadoop Data Services: Hive, Pig, Impala, Sqoop, Flume, NiFi, Kafka, Storm, Solr
Hadoop Operational Services: Zookeeper, Oozie
Programming Languages and Frameworks: Core Java, Servlets, Hibernate, Spring, Struts, Scala, Python
Databases: Oracle, MySQL, SQL Server
Application Servers: WebLogic, WebSphere, JBoss, Tomcat
Operating Systems: UNIX, Windows, LINUX
Development Tools: Microsoft SQL Studio, Eclipse, NetBeans, IntelliJ
PROFESSIONAL EXPERIENCE:
Confidential, Princeton, NJ
Hadoop Developer/Java Developer
Responsibilities:
- Developed simple and complex MapReduce programs in Java for Data Analysis on different data formats.
- Involved in data migration activities using Sqoop with JDBC drivers for MySQL.
- Developed MapReduce programs that filter out bad and unnecessary records and identify unique records based on different criteria.
- Developed a secondary-sort implementation to receive sorted values on the reduce side and improve MapReduce performance.
- Implemented custom Writables, InputFormats, RecordReaders, OutputFormats and RecordWriters for MapReduce computations to handle custom business requirements.
- Implemented MapReduce programs to classify data records into different categories based on record type.
- Responsible for managing data coming from different sources.
- Created a NiFi flow to ingest data in real time from MySQL to Salesforce using REST APIs.
- Created fan-in and fan-out multiplexing flows with Flume.
- Created ETL jobs to load JSON and server data into MongoDB and transformed MongoDB data for the data warehouse.
- Designed and developed POCs in Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
- Created Ab Initio graphs that transfer data from various sources like Oracle, flat files and CSV files to the Teradata database and flat files.
- Worked with SequenceFiles, RCFiles, map-side joins, bucketing and partitioning for Hive performance enhancement and storage improvement.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Implemented daily Oozie coordinator jobs that automate the parallel tasks of loading data into HDFS and pre-processing it with Pig.
- Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala.
- Responsible for performing extensive data summarization using Hive.
- Imported data into Spark from a Kafka consumer group using the Spark Streaming APIs (a minimal sketch follows this list).
- Developed Pig UDFs in Java and Python to pre-process the data for analysis.
- Worked with Sqoop import and export to handle large data set transfers between the Oracle database and HDFS.
- Derived and modeled facts, dimensions and aggregated facts in Ab Initio from the data warehouse star schema for billing.
- Involved in writing curl scripts and background batch and on-demand processes for indexing to Solr using the SolrJ API.
- Involved in migrating Hive queries into Spark transformations using DataFrames, Spark SQL, SQLContext and Scala.
- Developed a data pipeline using Kafka and Storm to store data into HDFS.
- Involved in deploying the applications in AWS; proficient in Unix/Linux shell commands.
- Worked with JSON for data exchange between client and server.
- Extensively used the Spring and Hibernate frameworks and implemented MVC architecture.
- Worked with Spring for RESTful services and dependency injection.
- Stored and retrieved NoSQL data in MongoDB using DAOs.
- Implemented the business logic layer for MongoDB services.
- Implemented test scripts to support test driven development and continuous integration.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
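As referenced in the Kafka/Spark Streaming bullet above, the following is a minimal sketch of consuming a Kafka topic from Spark Streaming through the Java API of the spark-streaming-kafka-0-10 integration. The broker address, consumer group and topic name are placeholders, and the per-batch record count stands in for the real processing logic.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class KafkaToSparkStreaming {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("kafka-to-spark-streaming");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092"); // placeholder broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "demo-consumer-group");   // hypothetical group id
        kafkaParams.put("auto.offset.reset", "latest");

        // Direct stream over a hypothetical "events" topic.
        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(
                                Arrays.asList("events"), kafkaParams));

        // Stand-in processing: count the records in each micro-batch.
        stream.map(ConsumerRecord::value)
              .count()
              .print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```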
Environment: Hadoop, CDH4, MapReduce, HDFS, Pig, Hive, Impala, Oozie, Java, Kafka, Storm, Linux, Maven, Oracle 11g/10g, SVN, MongoDB, Informatica.
Confidential, Waltham, MA
Hadoop Developer
Responsibilities:
- Involved in the complete software development life cycle (SDLC) of the application.
- Worked on analyzing the Hadoop cluster and different big data analytic tools including Pig, HBase and Sqoop.
- Involved in loading data from the Linux file system to HDFS.
- Developed custom, indexed search results using Apache Solr.
- Involved in creating the test plan and test modules for all Java code in these projects, and in load testing the Solr server and search service.
- Developed MapReduce jobs in Python for data cleaning and processing.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Integrated Spark with NoSQL databases such as HBase and Cassandra and with the Kafka message broker on Cloudera.
- Implemented test scripts to support test driven development and continuous integration.
- Performed performance tuning for Spark Streaming, e.g. setting the right batch interval, choosing the correct level of parallelism, selecting the right serialization and tuning memory.
- Developed multiple MapReduce jobs in Java for data cleansing and preprocessing (see the sketch after this list).
- Developed Spark jobs using Scala in test environment for faster data processing and used Spark SQL for querying.
- Created Pig Latin scripts to sort, group, join and filter enterprise-wide data to produce transformed data sets.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Worked on tuning the performance of Pig scripts.
- Mentored the analyst and test teams in writing Hive queries.
- Installed the Oozie workflow engine to run multiple MapReduce jobs.
- Migrated an existing on-premises application to AWS. Used AWS services like EC2 and S3 for small data sets.
- Hands-on experience with AWS services such as Redshift clusters and Route 53 domain configuration.
- Used CloudWatch Logs to move application logs to S3 and created alarms based on exceptions raised by applications.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
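As referenced in the data-cleansing bullet above, a minimal sketch of a map-only MapReduce job in Java that drops malformed delimited records; the comma delimiter, expected field count and counter names are assumptions for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Map-only cleansing job: keep records with the expected column count, count the rest.
public class CleanseJob {

    public static class CleanseMapper
            extends Mapper<LongWritable, Text, NullWritable, Text> {

        private static final int EXPECTED_FIELDS = 12; // hypothetical record layout

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",", -1);
            if (fields.length == EXPECTED_FIELDS) {
                context.write(NullWritable.get(), value);                    // keep good record
            } else {
                context.getCounter("cleansing", "bad_records").increment(1); // track rejects
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "data-cleansing");
        job.setJarByClass(CleanseJob.class);
        job.setMapperClass(CleanseMapper.class);
        job.setNumReduceTasks(0); // map-only
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```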
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Linux, Java, Oozie, Cassandra.
Confidential, San Jose, CA
Hadoop Developer/Java Developer
Responsibilities:
- Extensively involved in the installation and configuration of the Cloudera distribution, NameNode, Secondary NameNode, JobTracker, TaskTrackers and DataNodes.
- Installed and configured Hadoop ecosystem components such as HBase, Flume, Pig and Sqoop.
- Involved in Hadoop cluster tasks such as adding and removing nodes without affecting running jobs or data.
- Managed and reviewed Hadoop Log files.
- Loaded log data into HDFS using Flume and worked extensively on creating MapReduce jobs to power data for search and aggregation.
- Worked extensively with Sqoop for importing metadata from Oracle.
- Designed a data warehouse using Hive.
- Created partitioned tables in Hive.
- Mentored the analyst and test teams in writing Hive queries.
- Extensively used Pig for data cleansing.
- Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
- Developed Pig UDFs to pre-process the data for analysis (a minimal sketch follows this list).
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Good exposure to and knowledge of coordination services through ZooKeeper.
- Expertise in using the Spring, JSF, EJB, Hibernate and Struts frameworks.
- Expertise in using development tools such as Eclipse, MyEclipse and NetBeans.
- Excellent back-end SQL programming skills using MS SQL Server and Oracle with PL/SQL.
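As referenced in the Pig UDF bullet above, a minimal sketch of a Java EvalFunc that normalizes a text field before analysis; the class name NormalizeField and the normalization rule are illustrative assumptions.

```java
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Minimal Pig UDF sketch: lower-cases and trims the first field of the input tuple.
public class NormalizeField extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null; // pass empty/NULL tuples through
        }
        return input.get(0).toString().trim().toLowerCase();
    }
}
```

In a Pig script the packaged JAR would be REGISTERed and the function applied inside a FOREACH ... GENERATE statement.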
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, ZooKeeper, Oozie, Core Java, Spring MVC, Hibernate, UNIX Shell Scripting.
Confidential
Java Developer
Responsibilities:
- Developed documentation for new and existing programs and designed specific enhancements to the application.
- Implemented web layer using JSF.
- Implemented business layer using Spring MVC.
- Implemented report retrieval based on start date using SQL.
- Implemented session management using SessionFactory in Hibernate (see the DAO sketch after this list).
- Developed the DOs and DAOs using Hibernate.
- Hands-on experience consuming data from RESTful web services using JSON.
- RESTful web service development using Hibernate.
- Implemented a SOAP web service to validate zip codes using Apache Axis.
- Built SOAP and RESTful services.
- Wrote complex queries, PL/SQL Stored Procedures, Functions and Packages to implement Business Rules.
- Wrote a PL/SQL program to send email to a group from the back end.
- Developed scripts triggered monthly to produce the current monthly analysis.
- Scheduled Jobs to be triggered on a specific day and time.
- Modified SQL statements to increase the overall performance as a part of basic performance tuning and exception handling.
- Used cursors, arrays, tables and BULK COLLECT concepts.
- Extensively used Log4j for application logging.
- Performed UNIT testing in all the environments.
- Used Subversion as the version control system.
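As referenced in the Hibernate session-management bullet above, a minimal DAO sketch built around a single shared SessionFactory in the classic Hibernate style; it assumes a hibernate.cfg.xml with entity mappings on the classpath, and the class name GenericDao is illustrative.

```java
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;
import org.hibernate.cfg.Configuration;

// Minimal DAO sketch sharing one SessionFactory across calls (classic Hibernate style).
// Assumes hibernate.cfg.xml and entity mappings are available on the classpath.
public class GenericDao {

    private static final SessionFactory SESSION_FACTORY =
            new Configuration().configure().buildSessionFactory();

    public void save(Object entity) {
        Session session = SESSION_FACTORY.openSession();
        Transaction tx = session.beginTransaction();
        try {
            session.save(entity); // persist any mapped entity
            tx.commit();
        } catch (RuntimeException e) {
            tx.rollback();        // roll back on failure
            throw e;
        } finally {
            session.close();
        }
    }
}
```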
Environment: Java 1.4.2, Spring MVC, JMS, Java Mail API 1.3, Hibernate, HTML, CSS, JSF, JavaScript, JUnit, RAD, Web Services, UNIX
Confidential
Java/J2ee Developer
Responsibilities:
- Involved in all the phases of the life cycle of the project from requirements gathering to quality assurance testing.
- Developed Class diagrams, Sequence diagrams using Rational Rose.
- Responsible for developing rich web interface modules with Struts tags, JSP, JSTL, CSS, JavaScript, Ajax and GWT.
- Developed the presentation layer using the Struts framework and performed validations using the Struts Validator plugin.
- Created SQL scripts for the Oracle database.
- Implemented the business logic using Java, Spring transactions and Spring AOP.
- Implemented persistence layer using Spring JDBC to store and update data in database.
- Produced a web service using the WSDL/SOAP standard.
- Implemented J2EE design patterns like Singleton Pattern with Factory Pattern.
- Extensively involved in the creation of the Session Beans and MDB, using EJB 3.0.
- Used Hibernate framework for Persistence layer.
- Extensively involved in writing Stored Procedures for data retrieval and data storage and updates in Oracle database using Hibernate.
- Deployed and built the application using Maven.
- Performed unit testing using JUnit.
- Used JIRA to track bugs.
- Extensively used Log4j for logging throughout the application.
- Produced a RESTful web service with the Jersey implementation to provide customer information (a minimal sketch follows this list).
- Used SVN for source code versioning and code repository.
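As referenced in the Jersey bullet above, a minimal sketch of a JAX-RS resource that returns customer information as JSON. The resource path, the Customer fields and the stubbed lookup are illustrative assumptions, and a JSON provider (e.g. Jackson) is assumed to be registered with Jersey.

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

// Minimal JAX-RS resource sketch served by Jersey; the lookup is stubbed for illustration.
@Path("/customers")
public class CustomerResource {

    @GET
    @Path("/{id}")
    @Produces(MediaType.APPLICATION_JSON)
    public Customer getCustomer(@PathParam("id") String id) {
        // A real implementation would delegate to a DAO; a placeholder is returned here.
        return new Customer(id, "PLACEHOLDER_NAME");
    }

    // Simple bean serialized to JSON by the configured provider.
    public static class Customer {
        private String id;
        private String name;

        public Customer() {
        }

        public Customer(String id, String name) {
            this.id = id;
            this.name = name;
        }

        public String getId() {
            return id;
        }

        public String getName() {
            return name;
        }
    }
}
```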
Environment: Java (JDK 1.5), J2EE, Eclipse, JSP, JavaScript, JSTL, Ajax, GWT, Log4j, CSS, XML, Spring, EJB, MDB, Hibernate, WebLogic, REST, Rational Rose, JUnit, Maven, JIRA, SVN.