- 8+ years of professional experience in project development, implementation, deployment and maintenance using Java/J2EE, Big Data, Scala and Spark-related technologies.
- Hadoop Developer with 4+ years of experience designing and implementing complete end-to-end Hadoop-based data analytics solutions using HDFS, MapReduce, Spark, YARN, Kafka, Pig, Hive, Sqoop, Storm, Flume, Oozie, Impala, HBase, etc.
- Hands-on development and implementation experience in a Big Data Management Platform (BMP) using HDFS, MapReduce, Hive, Pig and other Hadoop ecosystem components as data storage and retrieval systems.
- Good experience in creating data ingestion pipelines, data transformations, data management, data governance and real time streaming at an enterprise level.
- Extensive experience in creating real-time data streaming solutions using Apache Spark/Spark Streaming and Kafka.
- Good knowledge of the Spark ecosystem and Spark architecture.
- Good knowledge of Spark Streaming.
- Good knowledge of machine learning.
- Experience developing Pig Latin and HiveQL scripts for data analysis and ETL, and extending their default functionality by writing User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) for custom, data-specific processing.
- Hands-on full life cycle implementation experience using the CDH (Cloudera) and HDP (Hortonworks Data Platform) distributions.
- In-depth understanding of Hadoop architecture and its components, such as the ResourceManager, ApplicationMaster, NameNode and DataNode, as well as HBase design principles.
- Strong knowledge of distributed systems architecture and parallel processing; in-depth understanding of the MapReduce programming paradigm and the Spark execution framework.
- Thorough understanding of partitioning and bucketing in Hive; designed both managed and external Hive tables to optimize performance.
- Experience in handling messaging services using Apache Kafka.
- Experience migrating data between RDBMS and HDFS using Sqoop.
- Experience with job workflow scheduling and coordination tools such as Oozie and ZooKeeper.
- Worked on NoSQL databases including HBase, Cassandra and MongoDB.
- Strong experience in collecting streaming data, such as log data, and storing it in HDFS using Apache Flume.
- Experience working with the Java HBase API to ingest processed data into HBase tables.
- Experience with the Oozie workflow engine to automate and parallelize Hadoop MapReduce, Hive and Pig jobs.
- Experienced in using Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Extensive knowledge in programming with Resilient Distributed Datasets (RDDs).
- Extensive experience in ETL processes consisting of data sourcing, transformation, mapping, conversion and loading.
- Solid understanding of Greenplum; proficient in creating scalable databases.
- Proficient in using Cloudera Manager, an end-to-end tool for managing Hadoop operations in a Cloudera cluster.
- Assisted in cluster maintenance, cluster monitoring, and managing and reviewing data backups and log files.
- Good experience working with cloud environments such as Amazon Web Services (AWS) EC2 and S3.
- Experienced in Java Application Development, Client/Server Applications, Internet/Intranet based applications using Core Java, J2EE patterns, Spring, Hibernate, Struts, JMS, Web Services (SOAP/REST), Oracle, SQL Server and other relational databases.
- Thorough knowledge of core Java concepts such as exceptions, collections, data structures, I/O, multithreading, and serialization and deserialization.
- Experience writing Shell scripts in Linux OS and integrating them with other solutions.
- Expert at creating UML diagrams, including use case, activity, class and sequence diagrams, using Microsoft Visio and IBM Rational Rose.
- Strong experience working with databases such as Oracle 11g/10g/9i, DB2, SQL Server 2008 and MySQL, and proficient in writing complex SQL queries.
- Good experience developing software applications using Core Java, JDBC, Servlets, JSPs, Spring and RESTful web services.
- Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
- Excellent technical and analytical skills, with a clear understanding of the design goals of ER modeling for OLTP and dimensional modeling for OLAP.
- Excellent communication, interpersonal and analytical skills and a highly motivated team player with the ability to work independently.
- Ability to learn and adapt quickly to emerging technologies and paradigms.
Hadoop/Big Data: HDFS, MapReduce, Spark, Spark SQL, YARN, Kafka, Pig, Hive, Sqoop, Storm, Flume, Oozie, Impala, HBase, Hue, ZooKeeper.
Programming Languages: C, Java, PL/SQL, Pig Latin, Python, HiveQL, Scala, SQL
Java/J2EE & Web Development Tools: Eclipse, NetBeans, SVN, Git, Ant, Maven, SoapUI
Databases: Oracle 11g/10g/9i, Microsoft Access, MS SQL
NoSQL Databases: Apache Cassandra, MongoDB, HBase
Frameworks: Struts, Spring, Spring MVC, Hibernate.
Web/Application servers: WebLogic, WebSphere, Apache Tomcat
Distributed platforms: Hortonworks, Cloudera, MapR.
Operating Systems: UNIX, Ubuntu Linux and Windows 2000/XP/Vista/7/8/10
Network protocols: TCP/IP fundamentals, LAN and WAN.
- Involved in implementing NameNode High Availability with automatic failover, using ZooKeeper services, to eliminate the NameNode as a single point of failure.
- Implemented the Capacity Scheduler to share cluster resources among the MapReduce jobs submitted by users.
- Monitored the Hadoop cluster through Cloudera Manager and implemented alerts based on error messages. Provided reports to management on cluster usage metrics and charged back customers based on their usage.
- Used Slick to query and store data in the database in idiomatic Scala, leveraging the Scala collections framework.
- Worked with the operations team to implement security on the Cloudera Hadoop cluster, moving the non-secured cluster to a secured one.
- Involved in creating Hive tables, loading them with data and writing Hive queries, which run internally as MapReduce jobs.
- Formulated procedures for installation of Hadoop patches, updates and version upgrades.
- Developed MapReduce (YARN) programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
- Designed and developed MapReduce jobs to process data arriving in different file formats such as XML, CSV and JSON.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard
- Developed Python, shell, Perl and PowerShell scripts for automation.
- Created HBase tables to store data in variable formats coming from different portfolios.
- Created Oozie workflows and super workflows to automate the data loading process.
- Assisted in loading large sets of structured, semi-structured and unstructured data into HDFS using Sqoop.
- Extensively used Pig for data cleaning and optimization.
- Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
- Analyzed user request patterns and implemented various performance optimization measures including implementing partitions and buckets in HiveQL.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Created Hive tables, dynamic partitions and buckets for sampling, and worked on them using HiveQL.
- Wrote shell scripts to export log files to the Hadoop cluster through an automated process.
- Involved in efficiently collecting and aggregating large amounts of streaming log data into Hadoop Cluster using Apache Flume.
- Able to integrate state-of-the-art Big Data technologies into the overall architecture and lead a team of developers through the construction, testing and implementation phases. Involved in requirements gathering, design, development and testing.
- Good understanding of writing Python scripts.
- Implemented MapReduce programs to handle semi-structured and unstructured data such as XML, JSON, Avro data files and sequence files for log files.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Wrote MapReduce jobs to generate reports on the number of activities created per day from data dumped from multiple sources, writing the output back to HDFS. Reviewed HDFS usage and the system design for future scalability and fault tolerance.
Environment: Apache Hadoop, HDFS, Hive, MapReduce, Cloudera, Pig, Sqoop, Kafka, Apache Cassandra, Oozie, Impala, Flume, ZooKeeper, Java, MySQL, PL/SQL and Python.
Confidential, Newark DE
- Handled importing data from various data sources, performed data control checks using Spark and loaded data into HDFS.
- Used Spark for interactive queries, processing of streaming data and integration with a popular NoSQL database for large volumes of data.
- Created end-to-end Spark-Solr applications using Scala to perform data cleansing, validation, transformation and summarization activities as required.
- Developed Scala scripts using both DataFrames/SQL/Datasets and RDDs/MapReduce in Spark for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
- Handled large datasets during the ingestion process itself using partitions, Spark in-memory capabilities, Spark broadcasts, effective and efficient joins, transformations and other techniques.
- Designed, developed and maintained data integration programs in a Hadoop and RDBMS environment, working with both traditional and non-traditional source systems as well as RDBMS and NoSQL data stores for data access and analysis.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Developed Spark scripts using Scala shell commands as required.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in HDFS.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Developed Unix shell scripts to load a large number of files into HDFS from the Linux file system.
- Involved in creating Hive tables, and loading and analyzing data using Hive queries
- Developed Hive queries to process the data and generate data cubes for visualization.
- Implemented schema extraction for Parquet and Avro file Formats in Hive.
- Designed and improved an internal search engine using Big Data technologies and Solr/Fusion.
- Migrated data from various data sources to Solr in stages, as required.
- Used Akka as a framework to create reactive, distributed, parallel and resilient concurrent applications in Scala.
- Worked extensively with Jenkins for continuous integration and end-to-end automation of all builds and deployments.
- Involved in preparing JIL (Job Information Language) scripts for AutoSys jobs.
- Used Sqoop to import the data from RDBMS to Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop Components.
- Extracted data from an Oracle database, transformed it and loaded it into a Greenplum database according to business specifications.
- Created mappings to move data from Oracle and SQL Server to a new data warehouse in Greenplum.
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data and analyzed them by running Hive queries.
- Good experience with continuous integration of applications using Jenkins.
- Migrated an existing on-premises application to AWS. Used AWS services like EC2 and S3 for small data sets.
- Responsible for developing data pipeline with Amazon AWS to extract the data from weblogs and store in HDFS.
- Used CloudWatch Logs to move application logs to S3. Created alarms based on exceptions raised by applications.
- Used Oozie for automating the end-to-end data pipelines and Oozie coordinators for scheduling the workflows.
- Implemented daily workflow for extraction, processing and analysis of data with Oozie.
Environment: Java 1.8, Scala 2.10.5, Apache Spark 1.6.0, Apache Zeppelin, Greenplum 4.3 (PostgreSQL), CDH 5.8.2, Spring 3.0.4, Maven, Hive, HDFS, YARN, MapReduce, Sqoop 1.4.3, Flume, Solr, UNIX Shell Scripting, Python 2.6, AWS, Kafka, Jenkins, Akka.
Confidential, Atlanta GA
Java / Hadoop Developer
- Loaded all data from our relational databases into Hive using Sqoop. Received four flat files from different vendors in various formats, such as text, EDI and XML.
- Involved in migrating data from existing RDBMS (Oracle and SQL Server) to Hadoop using Sqoop for processing.
- Wrote Hive join queries to fetch information from multiple tables and wrote multiple MapReduce jobs to collect output from Hive.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Involved in development on the MapReduce framework, writing queries and scheduling MapReduce jobs.
- Installed and configured Hadoop and responsible for maintaining cluster and managing and reviewing Hadoop log files.
- Developed shell, Perl and Python scripts to automate and provide control flow to Pig scripts.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Performed file system management and monitored Hadoop log files.
- Utilized Oozie workflows to run Pig and Hive jobs. Extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
- Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
- Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
- Implemented partitioning, dynamic partitions and buckets in Hive.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Involved in configuring core-site.xml and mapred-site.xml for the multi-node cluster environment.
- Used Apache Maven 3.x to build and deploy application to various environments
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions.
Environment: Apache Hadoop, HDFS, Hive, MapReduce, Cloudera, Pig, Sqoop, Kafka, Apache Cassandra, Oozie, Impala, Flume, ZooKeeper, Java, MySQL, Eclipse, PL/SQL and Python.
- Involved in analyzing requirements and preparing technical design specification documents.
- Studied the User Requirement Specification and communicated with business analysts to resolve ambiguities in the requirements document. Handled performance issues and worked on a background job that processes a large volume of records.
- Wrote SQL queries, stored procedures, and triggers to perform back-end database operations.
- Sent email alerts to the support team using BMC msend.
- Developed the application using the Struts framework, which leverages the classic Model-View-Controller (MVC) architecture. UML diagrams such as use case, class, interaction (sequence and collaboration) and activity diagrams were used.
- Developed nightly batch jobs that interfaced with external third-party state agencies.
- Involved in configuration of Spring MVC and Integration with Hibernate.
- Normalized Oracle database, conforming to design concepts and best practices.
- Used JDBC to connect to backend databases, Oracle and SQL Server 2005.
- Proficient in writing SQL queries, stored procedures for multiple databases, Oracle and SQL Server 2005.
- Applied Core Java and object-oriented concepts.
- Created a database program in SQL Server to manipulate data accumulated from internet transactions.
- Wrote servlet classes to generate dynamic HTML pages.
- Developed SQL queries and Stored Procedures using PL/SQL to retrieve and insert into multiple database schemas.
- Developed the XML Schema and web services for data maintenance and structures. Wrote JUnit test cases for unit testing of classes.
- Used the DOM and DOM functions with Firefox and the IE Developer Toolbar for IE.
- Debugged the application using Firebug to traverse the documents.
- Involved in developing web pages using HTML and JSP.
- Provided technical support for production environments by resolving issues, analyzing defects, and providing and implementing solutions.
- Involved in writing SQL Queries, Stored Procedures and used JDBC for database connectivity with MySQL Server.
- Developed the presentation layer using CSS and HTML based on Bootstrap for cross-browser support.
- Involved in analysis of the specifications from the client and actively participated in SRS Documentation.
- Developed servlets and used JDBC to retrieve data.
- Designed and developed dynamic Web pages using HTML and JSP.
- Implemented Object Relational mapping in the persistence layer using Hibernate Framework in conjunction with Spring Functionality.
- Involved in the iteration planning process under the Agile Scrum methodology.
- Analyzed and designed a scalable system based on object-oriented concepts and various J2EE design patterns. Implemented the Spring MVC architecture.
- Involved in writing PL/SQL, SQL queries.
- Implemented web services using REST, JSON and XML.
- Developed the entire application in the Spring Tool Suite IDE.
- Validated the user input using Struts Validation Framework.
- Implemented test scripts to support test driven development and continuous integration.
- Responsible for managing data coming from different sources.
- Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.
- Implemented the mechanism of logging and debugging with Log4j
- Involved in testing the Business Logic layer and Data Access layer using JUnit.
- Used Oracle DB for writing SQL scripts, PL/SQL code for procedures and functions.
- Wrote JUnit test cases to test the functionality of each method in the DAO layer. Configured and deployed WebSphere Application Server.
- Prepared technical reports and documentation manuals for efficient program development.