- Experienced Hadoop developer with a strong foundation in distributed storage systems such as HDFS and HBase in big data environments. Excellent understanding of the complexities associated with big data, with experience developing modules and code in MapReduce, Hive, Pig, and Spark to address those complexities.
- 7+ years of experience in analysis, architecture, design, development, testing, maintenance, and user training of software applications, including around 4 years in Big Data, Hadoop, and HDFS environments and over 3 years in Java.
- Hands-on experience installing, configuring, and using Apache Hadoop ecosystem components such as Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase, ZooKeeper, Sqoop, and Hue, and working with JSON data.
- Good knowledge of Hadoop development and its components, such as HDFS, JobTracker, TaskTracker, DataNode, NameNode, and MapReduce concepts.
- Experience installing, configuring, managing, supporting, and monitoring Hadoop clusters using distributions such as Apache and Cloudera.
- Good knowledge of Spark programming in Scala.
- Good understanding of real-time data processing using Spark.
- Efficient in writing MapReduce programs and using the Apache Hadoop MapReduce API to analyze structured and unstructured data.
- Experience in managing and reviewing Hadoop log files.
- Hands-on experience handling different file formats such as SequenceFile, CSV, XML, JSON, Avro, and Parquet.
- Experience in writing external Pig Latin scripts.
- Experience in writing UDFs in Java for Hive and Pig.
- Experience in working with Flume/Kafka to load the log data from different sources into HDFS.
- Experience in using Apache Sqoop to import and export data between HDFS and external RDBMS databases.
- Hands on experience in setting up workflow using Apache Oozie workflow engine for managing and scheduling Hadoop jobs.
- Experienced with open-source NoSQL technologies such as HBase, Cassandra, and MongoDB.
- Experience in using HCatalog for Hive, Pig, and HBase.
- Experienced with AWS components and services.
- Experienced with the Akka and Spray frameworks.
- Experienced in using Spark to improve the performance of existing Hadoop algorithms and optimize them, using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Used Swagger specifications to document REST APIs. Strong knowledge of SOAP web services.
- Very good understanding and working knowledge of object-oriented programming (OOP), Python, and Scala.
- Strong knowledge of Software Development Life Cycle and expertise in detailed design documentation.
- Extensive experience with Waterfall and Agile Scrum Methodologies.
- Experienced with agile development and Atlassian tools such as Jira, Confluence, and Bitbucket.
- Developed UML Diagrams for Object Oriented Design: Use Cases, Sequence Diagrams and Class Diagrams using Visual.
- Working knowledge of databases such as Oracle 8i/9i/10g and Microsoft SQL Server.
Hadoop Technologies and Distributions: Apache Hadoop, Cloudera Hadoop Distribution (CDH3, CDH4, CDH5), and Hortonworks Data Platform (HDP)
Hadoop Ecosystem: HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, Flume, ZooKeeper
NoSQL Databases: HBase
Programming: C, C++, Java, PL/SQL, Scala
Databases: Oracle, MySQL, SQL Server
IDE: Rational Rose, Eclipse, NetBeans
Operating Systems: Linux (RedHat, CentOS), Windows XP/7/8
Web Servers: Apache Tomcat
Cluster Management Tools: Cloudera Manager, Hortonworks, Ambari
Build Tools: Maven, Ant
Version Tools: GitHub
Confidential, Atlanta, GA
Hadoop Scala Developer
- Involved in data acquisition, data pre-processing, and data exploration for a telecommunications project in Scala.
- As part of data acquisition, used Sqoop and Flume to ingest data from servers into Hadoop using incremental imports.
- In the pre-processing phase, used Spark to remove missing data and apply transformations to create new features.
- In the data exploration stage, used Hive and Impala to gain insights into customer data.
- Used Flume, Sqoop, Hadoop, Spark, and Oozie to build the data pipeline.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and processing.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experienced in defining job flows.
- Experienced in managing and reviewing Hadoop log files.
- Experienced in running Hadoop streaming jobs to process terabytes of XML data.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Responsible for managing data coming from different sources.
- Supported MapReduce programs running on the cluster.
- Used ZooKeeper for cluster coordination services.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Used Oozie workflows to automate all jobs that pull data from the FTP server and load it into Hive tables.
Environment: Hadoop, Big Data, HDFS, MapReduce, Sqoop, Oozie, Pig, Hive, Flume, Linux, Java, Eclipse, Cloudera Hadoop distribution, Windows, and UNIX shell scripting.
Confidential, Charlotte, NC
- Developing Hive scripts to select delta (CDC) records and loading them into HBase tables using Pig scripts.
- Transforming data using Pig scripts.
- Developing MapReduce programs to count large numbers of records in HBase tables.
- Working on different Hive optimization and performance tuning techniques.
- Working on ingestion of logs into Hadoop using Flume and Kafka.
- Processing logs using Spark Streaming and loading them into Hive tables.
- Using Hive SerDe to read and write data in different formats.
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in loading data from edge node to HDFS.
- Hands-on experience with Cassandra.
- Monitoring the production clusters and handling tickets related to disk issues and service/OS-level issues on clusters.
- Involved in upgrading, adding, and removing nodes; also collected production metrics using Splunk.
- Involved in Design, Architecture and Installation of Big Data and Hadoop ecosystem components.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Creating workflows using Oozie.
- Automated Hadoop jobs using Oozie scheduler.
- Gained very good business knowledge of health insurance, claim processing, fraud suspect identification, the appeals process, etc.
Environment: Hadoop, MapReduce, HDFS, Hive, HBase, Oozie, Spark 1.6
Confidential, Jersey City, NJ
- Understood business requirements and was involved in preparing design documents according to client requirements.
- Analyzed Teradata procedures to compile information on all individual queries.
- Developed Hive queries according to business requirements.
- Developed Hive UDFs for cases where Hive provides no built-in function.
- Developed UDF for converting data from Hive table to JSON format as per client requirement.
- Implemented Dynamic partitioning and Bucketing in Hive as part of performance tuning.
- Implemented the workflow and coordinator files using Oozie framework to automate tasks.
- Involved in unit, integration, and system testing.
- Prepared all unit test case documents and flow diagrams for all scripts which are used in the project.
- Scheduled and managed jobs on a Hadoop cluster using Oozie workflows.
- Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Transformed unstructured data into structured data using Pig.
- Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
- Designed and developed Pig Latin scripts to process data in batches for trend analysis.
- Good experience with Hadoop tools such as MapReduce, Hive, and HBase.
- Worked on both external and managed Hive tables for optimized performance.
- Developed Hive scripts to meet analysts' requirements.
- Maintained data import scripts using Hive and MapReduce jobs.
- Performed data design and analysis to handle large volumes of data.
- Cross-checked data loaded into Hive tables against the source data in Oracle.
- Worked closely with QA and Operations teams to understand, design, and develop end-to-end data flow requirements.
- Used Oozie to schedule workflows.
- Stored, processed, and analyzed large datasets to extract valuable insights.
Environment: HDFS, MapReduce, Sqoop, Oozie, Pig, Hive, HBase, Flume, Linux, Java, Eclipse, Cassandra, PL/SQL, and UNIX shell scripting.
Confidential, Denver, Colorado
- Applied OOA/OOD during system analysis.
- Developed Session Beans for JSP clients.
- Configured and Deployed EAR & WAR files on WebSphere Application Server.
- Defined and designed the layers and modules of the project using OOAD methodologies and standard J2EE design patterns & guidelines
- Designed and developed all the user interfaces using JSP, Servlets and Spring framework
- Maintained the existing code base, developed in the Spring and Hibernate frameworks, by incorporating new features and fixing bugs
- Involved in fixing bugs and unit testing with test cases using JUnit framework
- Developed build and deployment scripts using Apache ANT to customize WAR and EAR files
- Developed stored procedures and triggers in PL/SQL on an Oracle database to calculate and update tables that implement business logic
- Involved in writing Hibernate Query Language (HQL) for persistence layer.
- Used Log4j for application logging and debugging
- Coordinated with offshore team for requirement transition & providing the necessary inputs required for successful execution of the project
Environment: Java SE 7, Java EE 6, JSP 2.1, Servlets 3.0, HTML, JDBC 4.0, IBM WebSphere 8.0, PL/SQL, XML, Spring 3.0, Hibernate 4.0, Oracle 12c, ANT, JavaScript and jQuery, JUnit, Windows 7, and Eclipse 3.7.
- Involved in the SDLC phases of requirement analysis, design, and development of a web-based intranet application tool using Java, Spring, and Hibernate.
- Used the Struts validator framework to automatically validate user input.
- Developed, implemented, and maintained an asynchronous, AJAX-based rich client for improved customer experience.
- Used J2EE design patterns like DAO, Value Object, Service Locator, MVC and Business Delegate.
- Developed/Customized Java Server Pages (JSP) for Customer User Interface (UI).
- Developed web tier using Struts tag libraries, CSS, HTML, XML, JSP, and Servlet.
- Developed the database tier using JDBC 2.0.
- Used CVS tools for version control.
- Involved in production support, monitoring server and error logs, and anticipating potential issues.
Environment: Java 1.6, JSP 2.1, Struts 2.1, Spring 3.0, Hibernate 4.0, Servlets 2.5, JDBC 2.0, Oracle 9i, AJAX, CSS, JSP 2.2, HTML, WebSphere 7.0, JUnit, Design patterns, Web Services.
- Designed and implemented the training and reports modules of the application using Servlets, JSP and Ajax.
- Developed custom JSP tags for the application.
- Wrote queries for fetching and manipulating data using the iBATIS ORM framework.
- Used Quartz schedulers to run the jobs sequentially at a given time.
- Implemented design patterns like Filter, Cache Manager and Singleton to improve the performance of the application.
- Implemented the reports module of the application using Jasper Reports to display dynamically generated reports for business intelligence.
- Deployed the application at the client's location on Tomcat Server.