- Over 11 years of IT experience, including four years in the Hadoop ecosystem and four years as a Java developer with strong object-oriented programming skills.
- Expertise in the design, development, and testing of web and enterprise applications using type-safe technologies such as Scala, Akka, Play Framework, and Slick.
- Experienced with Scala and Java development tools such as IntelliJ IDEA and Eclipse.
- Good knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts; responsible for writing MapReduce programs and for setting up standards and processes for Hadoop-based application design and implementation.
- Expertise with tools in the Hadoop environment, including Pig, Hive, HDFS, MapReduce, Sqoop, Spark, Kafka, YARN, Oozie, and ZooKeeper.
- Extensive work on ETL processes, including data sourcing, transformation, mapping, conversion, and loading, using Informatica.
- Expertise in developing data-driven applications using Python 2.7 and Python 3.0 with the PyCharm and Anaconda Spyder IDEs.
- Hands-on experience configuring and working with Flume to load data from multiple sources directly into HDFS.
- Expertise in developing Hive and Pig jobs, which run as MapReduce programs, to validate and cleanse data in HDFS obtained from heterogeneous data sources and make it suitable for analysis.
- Analyzed and transformed stored data by writing MapReduce jobs based on business requirements.
- Experienced in coding web services with JAX-WS (SOAP) and JAX-RS (RESTful).
- Experience developing Pig scripts and Hive Query Language (HiveQL) queries.
- Experience managing and scheduling batch jobs on a Hadoop cluster using Oozie.
- Hands-on experience working with NoSQL databases, including MongoDB and HBase.
- Experience optimizing MapReduce jobs to use HDFS efficiently through various compression mechanisms.
- Experience developing Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Hands-on experience using Sqoop to import data from RDBMS into HDFS and vice versa.
- Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
- Expertise in web technologies using Core Java, J2EE, Servlets, EJB, JSP, JDBC, JavaBeans, Apache, and design patterns.
- Detailed understanding of the Software Development Life Cycle (SDLC) and sound knowledge of project implementation methodologies, including Waterfall and Agile.
- Able to adapt to evolving technology, with a strong sense of responsibility and accomplishment.
Operating Systems: Windows 8/7/XP, Unix, Ubuntu 13.x, Mac OS X
Hadoop Eco System: Hadoop 1.x/2.x (YARN), HDFS, MapReduce, MongoDB, HBase, Hive, Impala, Pig, ZooKeeper, Sqoop, Oozie, Flume, Storm, HDP, AWS, Eclipse, Cloudera Desktop, and SVN.
APIs: Servlets, EJB, Java Naming and Directory Interface (JNDI), MapReduce, RESTful.
Development Tools: Eclipse, RAD/RSA (Rational Application Developer/Rational Software Architect), IBM DB2 Command Editor, SQL Developer, Microsoft Office Suite (Word, Excel, PowerPoint, Access), OpenOffice Suite (Writer, Calc, etc.), VMware.
Languages: Scala, Java, Java EE, JSP, Python
NoSQL Databases: HBase, Cassandra, MongoDB
Servers: WebSphere Application Server (WAS) 6.x/7.0, WebLogic 10-12c, Apache.
Confidential, Minneapolis, MN
Senior Java/Hadoop/Python Developer
- Responsible for developing efficient MapReduce programs on the AWS cloud to process more than 20 years' worth of claims data and to detect and separate fraudulent claims.
- Worked with the advanced analytics team to design fraud detection algorithms, then developed MapReduce programs to run the algorithms efficiently on very large datasets.
- Ran data formatting scripts in Python and created terabyte-scale CSV files to be consumed by Hadoop MapReduce jobs.
- Performed analysis on Kafka data, including feature selection and feature extraction, using Apache Spark machine learning and streaming libraries in Python.
- Extensively used the Akka actor architecture for scalable, hassle-free multithreading.
- Used Cloudera in an application for the vendor platform.
- Developed Python code on Vagrant machines, using Git (GitHub) and SVN for version control.
- Created Hive tables to store data in HDFS, loaded data, and wrote Hive queries that run internally as MapReduce jobs.
- Involved in building the ETL architecture and source-to-target mappings to load data into the data warehouse.
- Uploaded and processed terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop and Flume.
- Involved in cluster coordination services through ZooKeeper.
- Used Scala to code the Play and Akka components, and used Maven to build the project and generate code analysis reports. Involved in implementing programmatic transaction management using AOP.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Played a key role in installing and configuring various Hadoop ecosystem tools, such as Solr, Pig, HBase, and Cassandra.
- Developed a data pipeline using Kafka and Storm to store data in HDFS.
- Loaded streaming data using Kafka and Flume, and processed it in real time using Spark and Storm.
- Implemented various Hive optimization techniques, such as dynamic partitions, buckets, map joins, and parallel execution.
- Handled all requests to the systems using the Play Framework MVC architecture.
- Worked with pre-session and post-session Linux scripts to automate ETL jobs and perform operations such as gunzipping, removing, and archiving files.
- Created Talend jobs to copy files from one server to another using Talend FTP components.
- Created Joblets and parent-child jobs in Talend.
- Designed and developed ETL jobs in Talend Integration Suite using various transformations, according to the business requirements and ETL mapping specifications.
- Extracted meaningful data from dealer CSV files, text files, and mainframe files and generated Python pandas reports for data analysis.
- Used Python to run scripts and to generate tables and reports.
- Coordinated with the Agile team to meet all Confidential commitments.
- Worked with messaging systems such as RabbitMQ and ActiveMQ (JMS), and used the Jersey framework to implement JAX-RS.
- Designed and maintained Oozie workflows to manage the flow of jobs in the cluster.
- Parsed JSON files with Spark Core to extract the schema of production data using Spark SQL and Scala.
- Actively updated upper management with daily reports on project progress, including the classification levels achieved on the data.
ENVIRONMENT: Scala with the Akka framework, Java, J2EE, Hadoop, HDFS, Pig, NiFi, Hive, MapReduce, Sqoop, Kafka, CDH3, Cassandra, Python, Oozie, collection, AWS cloud, Storm, Ab Initio, Apache, SQL, NoSQL, Bitbucket, HBase, Flume, Spark, Solr, ZooKeeper, ETL, Talend, CentOS, Eclipse, Agile.
Confidential, Overland Park, KS
Java/Hadoop Developer
- Involved in creating Hive tables, loading data, and writing Hive queries that run internally as MapReduce jobs.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Wrote Python modules to connect to and view data in the Apache Cassandra instance.
- Involved in writing MapReduce jobs.
- Developed a RESTful web services interface to the Java-based runtime engine and accounts.
- Customized RESTful web services using the Spring RESTful API, sending JSON data packets between the front end and the middle-tier controller.
- Performed real-time data streaming using Spark with Kafka.
- Responsible for creating a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Involved in exporting processed data from Hadoop to relational databases or external systems using Sqoop, HDFS get, or copyToLocal.
- Used the Play logger during pre-load and post-load test cycles to track application performance and errors.
- Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Experienced in managing and reviewing Hadoop log files.
- Used Pig to perform transformations such as event joins, filtering bot traffic, and some pre-aggregations before storing the data in HDFS.
- Wrote Hive queries to meet the business requirements.
- Imported and exported data into HDFS and Hive using Sqoop and Kafka.
- Created various parser programs in Scala to extract data from Autosys, TIBCO, Business Objects, XML, Informatica, Java, and database views.
- Performed deployment and support of cloud services including AWS.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Experience with message-based systems using JMS and IBM MQSeries.
- Implemented workflows using the Apache Oozie framework to automate tasks.
- Developed scripts to automate end-to-end data management and synchronization across all the clusters.
ENVIRONMENT: Java, Hadoop, Scala, MapReduce, MongoDB, SQL, Apache, YARN, Hive, Pig, HBase, Oozie, Sqoop, Flume, Akka, Play, IBM MQSeries, Core Java, HDP, HDFS, Eclipse, Kafka.
Confidential, Boston, MA
- Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Involved in installing, updating, and managing the environment.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Involved in running Hadoop Streaming jobs to process terabytes of XML data.
- Participated in the requirements gathering and analysis phase of the project, documenting business requirements by conducting workshops and meetings with various business users.
- Involved in ingesting data using Sqoop and HDFS put or copyFromLocal.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig Scripts.
- Implemented test scripts to support test-driven development and continuous integration.
- Implemented SQL and PL/SQL stored procedures.
- Involved in developing shell scripts to orchestrate the execution of all other scripts (Pig, Hive, and MapReduce) and to move data files within and outside of HDFS.
- Involved in developing Hive UDFs for needed functionality that is not available out of the box in Apache Hive.
- Actively updated upper management with daily reports on project progress, including the classification levels achieved on the data.
ENVIRONMENT: Core Java, J2EE, Hadoop, MapReduce, NoSQL, Hive, Pig, Sqoop, Apache, HDP, HDFS, Eclipse.
Confidential, New York, NY
- Involved in analysis, design, and coding in a J2EE environment.
- Implemented MVC architecture using Struts, JSP, and EJBs.
- Used Core Java concepts in the application, such as multithreaded programming and thread synchronization using the wait(), notify(), and join() methods.
- Designed and programmed the presentation layer using HTML, XML, XSL, JSP, JSTL, and Ajax.
- Created cross-browser-compatible, standards-compliant CSS-based page layouts.
- Worked on Hibernate object/relational mapping related to the database schema.
- Designed, developed and implemented the business logic required for Security presentation controller.
- Used JSP and Servlet coding in a J2EE environment.
- Good experience in software configuration management using CVS, Git, and SVN.
- Designed XML files to implement most of the wiring needed for Hibernate annotations and Struts configurations.
- Responsible for developing the forms that contain employee details, and for generating reports and bills.
- Developed Web Services for data transfer from client to server and vice versa using Apache Axis, SOAP, and WSDL.
- Involved in designing class and data flow diagrams using UML with Rational Rose.
- Created and modified stored procedures, functions, triggers, and complex SQL commands using PL/SQL.
- Involved in designing ERDs (Entity Relationship Diagrams) for the relational database.
- Developed shell scripts in UNIX and procedures in SQL and PL/SQL to process data from input files and load it into the database.
- Used CVS to maintain the source code. Designed, developed, and deployed the application on WebLogic Server.
- Performed unit testing on the developed applications.
- Developed the user interface using JSP and Struts tag libraries to simplify the complexities of the application.
- Developed business logic using stateless session beans to calculate asset depreciation with the straight-line and written-down-value methods.
- Modified the Oracle database using SQL, PL/SQL, stored procedures, triggers, and views.
- Created Java classes to communicate with the database using JDBC.
- Responsible for design and implementation of various modules of the application using Struts-Spring-Hibernate architecture.
- Developed the Web Interface using Servlets, Java Server Pages, HTML, and CSS.
- Extensively used JDBC PreparedStatement to embed SQL queries in the Java code.
- Developed DAO (Data Access Objects) using Spring Framework 3.
- Developed web applications with rich internet application (RIA) features using Java applets, Silverlight, and Java.
- Deployed the application, built on the J2EE architecture model and the Struts framework, first on WebLogic, and helped migrate it to JBoss Application Server.
- Designed and developed the application using various design patterns, such as session facade, business delegate and service locator.
- Involved in designing use-case diagrams, class diagrams, and interaction diagrams using UML with Rational Rose.