- 8+ years of experience in analysis, architecture, design, development, testing, maintenance, and user training for software applications, including around 4 years in Big Data (Hadoop framework, HDFS, Hive, Pig, MapReduce, Sqoop, Oozie, MongoDB, Cassandra, AWS, ETL, and the Cloudera environment), along with experience in Java/J2EE.
- Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Excellent working experience with Hadoop distributions such as Hortonworks, Cloudera, and IBM BigInsights.
- Strong hands-on experience with Hadoop ecosystem components such as MapReduce, YARN, HDFS, Hive, Pig, HBase, Storm, Sqoop, Impala, Oozie, Kafka, Spark, and ZooKeeper.
- Expertise in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Experienced in analyzing data with Hive Query Language (HQL) and Pig Latin Script.
- Expertise in optimizing MapReduce algorithms using mappers, reducers, and combiners to deliver the best results on large datasets.
- Experienced with cloud platforms: Hadoop on Azure, AWS EMR, Cloudera Manager, and Hadoop deployed directly on EC2 (non-EMR).
- Very good experience in writing MapReduce jobs using native Java code, Pig, and Hive for various business use cases.
- Excellent hands-on experience developing Hadoop architecture for projects on Windows and Linux platforms.
- Experienced in building Big Data applications on the Microsoft Azure cloud platform, including provisioning HDInsight clusters, Azure Storage Blob accounts, ADL (Azure Data Lake) accounts, and a SQL Server instance used as the Hive metastore server, and using ADF (Azure Data Factory) to build data pipelines with jobs such as Cosmos copy, SQL copy, and Hive queries.
- Strong experience writing Pig scripts, Hive queries, and Spark SQL queries to analyze large datasets, and troubleshooting errors.
- Well versed in Relational Database Design/Development with Database Mapping, PL/SQL Queries, Stored Procedures and Packages using Oracle, DB2, Teradata and MySQL Databases.
- Excellent working experience designing and implementing complete end-to-end Hadoop infrastructure, including Pig, Hive, Sqoop, Oozie, Flume, and ZooKeeper.
- Extensive knowledge of and working experience with the Software Development Life Cycle (SDLC), Service-Oriented Architecture (SOA), Rational Unified Process (RUP), Object-Oriented Analysis and Design (OOAD), UML, and J2EE architecture.
- Experience working with SOAP and RESTful web services.
- Extensive knowledge of OOP, OOAD, UML concepts (use cases, class diagrams, sequence diagrams, deployment diagrams, etc.), SEI-CMMI, and Six Sigma.
- Proficient with frameworks and tools such as Struts, Ant, JUnit, WebSphere Studio Application Developer (WSAD 5.1), JBuilder, Eclipse, and IBM Rational Application Developer (RAD).
- Expertise in designing and coding stored procedures, triggers, cursors, and functions using PL/SQL.
- Expertise in developing XML documents with XSD validations, SAX, DOM, JAXP parsers to parse the data held in XML documents.
- Good at writing Ant scripts for development and deployment.
- Experienced with GUI/IDE tools including Eclipse, JBuilder, and WSAD 5.0.
- Expertise in using Java performance tuning tools such as JMeter and JProfiler, and Log4j for logging.
- Extensive experience using MVC (Model View Controller) architecture to develop applications with JSP, JavaBeans, and Servlets.
- Highly self-motivated, goal-oriented team player with strong analytical, debugging, and problem-solving skills; strong in object-oriented analysis and design, with diversified knowledge and the ability to learn new technologies quickly.
- Knowledge in implementing enterprise Web Services, SOA, UDDI, SOAP, JAX-RPC, XSD, WSDL and AXIS.
- Expertise in working with databases such as Oracle and SQL Server using Hibernate, SQL, PL/SQL, and stored procedures.
Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Impala, Oozie, Kafka, Spark, ZooKeeper, Storm, YARN, AWS.
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI, JavaBeans
IDEs: Eclipse, NetBeans, IntelliJ
Frameworks: MVC, Struts, Hibernate, Spring
Databases: Oracle, MySQL, DB2, Teradata, MS SQL Server.
NoSQL Databases: HBase, Cassandra, MongoDB
Web Servers: WebLogic, WebSphere, Apache Tomcat
Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP
ETL Tools: Informatica
Web Development: HTML, DHTML, XHTML, CSS, JavaScript, AJAX
XML/Web Services: XML, XSD, WSDL, SOAP, Apache Axis, DOM, SAX, JAXP, JAXB, XMLBeans.
Methodologies/Design Patterns: OOAD, OOP, UML, MVC2, DAO, Factory pattern, Session Facade
Operating Systems: Windows, AIX, Sun Solaris, HP-UX.
Confidential, Minneapolis MN
Sr. Big Data Developer
- Worked on ecosystem components such as Oozie, Sqoop, Spark, Kafka, Flume, Pig, HBase, and Hive with CDH5.
- Used Talend's advanced ETL functionality to design jobs and business models.
- Worked extensively on Apache Spark and Spark Streaming using Scala, and tuned Java garbage collectors for Apache Spark applications.
- Used AWS Data Pipeline to schedule an Amazon EMR cluster to clean and process web server logs stored in an Amazon S3 bucket.
- Worked on Data import and export from Teradata and Oracle into HDFS and Hive using Sqoop.
- Implemented custom Hive UDFs, including several written in Java, to support comprehensive data analysis.
- Loaded HBase data into a Redshift cluster using Spark Structured Streaming and Hive external tables on HBase, using INSERT OVERWRITE with S3 as data storage.
- Developed a Spark application (Spark 2.0.0, Java 8, Scala 2.11, Apache Kafka 0.8, YARN, EMRFS, HBase, Spark-HBase connector, Elasticsearch) to read transaction data and apply business rules, reporting errors and transaction summaries that feed market-basket analytics.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Performed transformations, cleaning and filtering on imported data using Hive, Map Reduce, and loaded final data into HDFS.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Imported data from various sources, performed transformations using Hive and MapReduce, loaded the data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Developed simple to complex MapReduce jobs in Java, as well as jobs implemented using Hive and Pig.
- Developed tasks and set up the required environment on AWS for running Hadoop in the cloud on various instances.
- Took part in the requirements and design phases to implement a streaming Lambda Architecture for real-time streaming using Spark and Kafka.
- Solved performance issues in Hive and Pig scripts by understanding how joins, grouping, and aggregation translate to Elastic MapReduce jobs.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, Pig, and Sqoop.
- Used Sqoop to import the data from RDBMS to Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop Components.
Environment: Big Data, Python, Talend, Teradata, Hadoop, HDFS, Pig, Hive, MapReduce, Azure, Sqoop, Spark, Kafka, Linux, Cassandra, MongoDB, Scala, Storm, Elasticsearch, SQL, PL/SQL, AWS, S3, Informatica, Redshift.
Confidential, NYC NY
Sr. Big Data Developer
- Took part in the design and architecture of Big Data solutions using the Hadoop ecosystem.
- Collaborated in identifying current problems, constraints, and root causes in data sets to find descriptive and predictive solutions with the support of Hadoop HDFS, MapReduce, Pig, Hive, and HBase, and developed reports in Tableau.
- Analyzed the Hadoop cluster using different Big Data analytic tools, including Kafka, Sqoop, Storm, Spark, Pig, Hive, and MapReduce.
- Installed, configured, and maintained Hortonworks Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
- Architected the Hadoop cluster in pseudo-distributed mode, working with ZooKeeper and Apache.
- Stored, loaded, and backed up data from HDFS to Amazon S3, and created tables in the AWS cluster with S3 storage.
- Used Big Data technologies to produce technical designs, and prepared architectures and blueprints for Big Data implementation.
- Wrote Scala programs using SparkContext.
- Provided technical assistance for configuration, administration and monitoring of Hadoop clusters.
- Loaded data from the Linux file system into HDFS, and imported and exported data into HDFS and Hive using Sqoop and Kafka.
- Implemented partitioning, dynamic partitions, and buckets in Hive, and supported MapReduce programs running on the cluster.
- Prepared presentations of solutions to Big Data/Hadoop business cases and presented them to company directors to get the go-ahead on implementation.
- Integrated Hive tables with MongoDB collections and developed a web service that queries a MongoDB collection and returns the required data to the web UI.
- Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Used Spark to create APIs in Java and Scala, and streamed data in real time using Spark with Kafka.
- Developed Hive queries, Pig scripts, and Spark SQL queries to analyze large datasets.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Scala.
- Implemented Storm builder topologies to perform cleansing operations before moving data into Cassandra.
- Transferred data between Azure HDInsight and databases using Sqoop.
- Worked on debugging, performance tuning of Hive & Pig Jobs and implemented test scripts to support test driven development and continuous integration.
- Developed enhancements to MongoDB architecture to improve performance and scalability.
- Deployed algorithms in Scala with Spark using sample datasets, and carried out Spark-based development in Scala.
- Manipulated, cleansed, and processed source data and staged it in final Hive/Redshift tables.
- Scheduled Oozie workflows to run multiple Hive and Pig jobs.
- Used Storm to consume events coming through Kafka and generate sessions and publish them back to Kafka.
- Extracted feeds from social media sites such as Facebook and Twitter using Python scripts.
- Designed end-to-end ETL workflows and jobs with the Cassandra NoSQL database as the source.
- Involved in analysis, design and development phases of the project. Adopted agile methodology throughout all the phases of the application.
- Provisioned an Azure HDInsight cluster, connected to it, uploaded data, and ran MapReduce jobs.
- Gathered and analyzed the requirements and designed class diagrams, sequence diagrams using UML.
- Wrote Scala classes to interact with the database and Scala test cases to test the Scala code.
- Carried out the J2EE Software Development Life Cycle (SDLC) of the application in web and client-server environments.
- Used Kibana, a web-based data analysis and dashboarding tool for Elasticsearch, and used Logstash to stream data from one or many inputs, transform it, and send it to one or many outputs.
Environment: Big Data, Hadoop, HDFS, Pig, Hive, MapReduce, Azure, Sqoop, Spark, Kafka, Linux, Cassandra, MongoDB, Scala, Storm, Elasticsearch, SQL, PL/SQL, AWS, S3, Informatica, Redshift.
Confidential, Miami, FL
Sr. Big Data/Hadoop Developer
- Worked with Hadoop ecosystem components such as HBase, Sqoop, ZooKeeper, Oozie, Hive, and Pig on the Cloudera Hadoop distribution.
- Worked with data serialization formats such as Avro, Parquet, JSON, and CSV to convert complex objects into sequences of bits.
- Installed and configured Apache Hadoop to test the maintenance of log files in Hadoop cluster.
- Evaluated alternative NoSQL data stores and documented a comparison of HBase vs. MongoDB.
- Made extensive use of a Sqoop data pipeline to import customer behavioral data and historical utility data from sources such as Teradata, MySQL, and Oracle into HDFS.
- Troubleshot and maintained the Hadoop core and ecosystem components (HDFS, MapReduce, Pig, ZooKeeper, YARN, Oozie, Hive, Hue, Flume, HBase).
- Used the Google Analytics API to fetch user demographics and device details and loaded them into Amazon Redshift tables to support data scientists' analysis.
- Implemented a log producer in Scala that watches for application logs, transforms incremental logs, and sends them to a Kafka and ZooKeeper based log collection platform.
- Migrated existing MapReduce programs to Spark models using Python, and used the Spark DataFrame API on the Cloudera platform to perform analytics on Hive data.
- Implemented design patterns in Scala for the application and developed quality code adhering to Scala coding standards and best practices.
- Developed a data pipeline using Flume, Sqoop, Pig, Java MapReduce, and Spark to ingest customer behavioral data and purchase histories into HDFS for analysis.
- Wrote Pig scripts for sorting, joining, filtering, and grouping data.
- Created Hive tables, loaded data, and wrote Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Imported data from various sources, performed transformations using Hive and MapReduce, loaded the data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Developed multiple Hive scripts and Redshift scripts for several workflows, and used Sqoop to efficiently transfer data between databases and HDFS.
- Implemented Partitioning, Dynamic Partitioning and Bucketing in Hive and created a Hive aggregator to update the Hive table after running the data profiling job.
- Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior.
- Developed Java MapReduce programs to analyze sample log files stored in the cluster.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Generated Scala and Java classes from the respective APIs so that they could be incorporated into the overall application.
- Developed and tested a highly configurable Apache Spark based data processing ETL framework.
- Ingested streaming data into Hadoop using Spark, the Storm framework, and Scala.
- Designed an end-to-end ETL flow for a feed with millions of records arriving daily, using the Apache tools and frameworks Hive, Pig, Sqoop, and HBase for the entire ETL workflow.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Connected the QlikView reporting tool to Redshift and generated reports.
- Used Cassandra to store the analyzed and processed data for scalability.
Environment: Hadoop, MapReduce, HDFS, HBase, Hive, Pig, MongoDB, Azure, SQL, Sqoop, Oozie, ZooKeeper, Cassandra, Teradata, MySQL, Oracle, Scala, Java, PL/SQL, Spark, UNIX Shell Scripting, AWS, EMR, S3, Redshift.
Confidential, Kennett Square PA
Sr. Java/Hadoop Developer
- Developed MapReduce jobs in Java for data cleansing and preprocessing, and moved data from Oracle to HDFS and vice versa using Sqoop.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
- Worked on Hadoop, Hive, Oozie, and MySQL customization for batch data platform setup.
- Developed MapReduce programs to cleanse data in HDFS obtained from heterogeneous data sources, making it suitable for ingestion into the Hive schema for analysis.
- Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data from various sources.
- Implemented Spark using Java and Spark SQL for processing of event data to calculate various usage metrics of the app like search relevance, active users and others.
- Collaborated with development teams to define and apply best practices for using MongoDB.
- Developed a data pipeline using Flume, Sqoop, Pig, and MapReduce to ingest workforce data into HDFS for analysis.
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing, and used Spark SQL to load tables into HDFS and run select queries on top of them.
- Exported the analyzed data to relational databases using Hive for visualization and to generate reports for the BI team.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
- Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms.
- Created internal and external Hive tables as required, defined with appropriate static and dynamic partitions for efficiency.
- Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed file formats.
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
- Implemented partitioning and bucketing in Hive for better organization of the data, and worked with different file formats and compression techniques to determine standards.
- Migrated existing ETL Pig scripts to Java Spark code to improve performance.
- Took part in various stages of the Software Development Life Cycle (SDLC) deliverables of the project using Agile software development methodology.
- Developed Hive Queries, Pig Latin scripts and Spark SQL queries to analyze large datasets.
- Extracted data from flat files and RDBMS databases into a staging area and loaded it into the data warehouse.
- Developed Hive queries and UDFs to analyze and transform data in HDFS, and developed Hive scripts to implement control table logic in HDFS.
Environment: Apache Hadoop, Pig, Hive 0.10, Sqoop, HBase, MongoDB, Oozie, Flume, MapReduce, HDFS, Linux, Cassandra, Hue, Teradata, Spark, Scala, Oracle, SQL, HCatalog, Java, Eclipse, VSS, Red Hat Linux.
Confidential, Jacksonville, FL
Sr. Java Developer
- Implemented features such as logging and user session validation using the Spring AOP module, and used Spring IoC for dependency injection.
- Developed the application on Struts MVC architecture, using Action classes, Action forms, and validations.
- Worked with the Eclipse IDE and SVN as the source code repository.
- Made extensive use of version control tools such as IBM ClearCase, MS Visual SourceSafe 6.0, and CVS Dimensions.
- Set up the UI project codebase for WAS 7.x using JSF, RichFaces, Acegi, Facelets, Hibernate, Spring, and Maven.
- Performed version control using PVCS and provided production support and resolved production issues.
- Implemented various J2EE design patterns such as Service Locator, Business Delegate, Session Facade, and Factory.
- Used JAX-RPC web services over SOAP to process the application for the customer.
- Developed JSPs and Servlets to dynamically generate HTML and display the data to the client side.
- Used various tools in the project, including Ant build scripts, JUnit for unit testing, ClearCase for source code version control, IBM Rational DOORS for requirements, and HP Quality Center for defect tracking, and followed test-driven development within Agile methodology to produce high-quality software.
- Designed and developed a SOAP web services client using Axis to send service requests to web services, and invoked web services from the application to get data.
- Took part in full life cycle object-oriented application development: object modeling, database mapping, and GUI design.
- Designed and developed Business Services using Spring Framework (Dependency Injection), Business Delegate and DAO Design Patterns.
- Consumed Web Services (WSDL, SOAP, UDDI) from third party for authorizing payments to/from customers.
- Used XML, XSLT, DTD, and XSD to display pages in HTML format for customers.
- Developed managed beans to handle business logic in the MVC architecture.
- Developed Web Services to communicate to other modules using XML based SOAP and WSDL protocols.