- 7+ years of experience in the IT industry as a Hadoop/Java Developer, including 4+ years with Big Data ecosystem technologies such as Hadoop HDFS, MapReduce, Apache Pig, Spark, Hive, Sqoop, HBase, Flume, and Oozie.
- Strong hands-on experience with the Hadoop framework and its ecosystem, including HDFS architecture, MapReduce programming, Hive, Pig, Sqoop, HBase, Zookeeper, Couchbase, Storm, Solr, Oozie, Spark, Scala, Flume, and Kafka.
- Excellent knowledge of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Experience in storing and analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java and Scala.
- Experience importing data into and exporting data out of HDFS and Hive using Sqoop.
- Integrated different data sources and performed data wrangling (cleaning, transforming, merging, and reshaping data sets) by writing Python scripts.
- Good knowledge of AWS services such as EMR and EC2, which provide fast and efficient processing of Big Data.
- Hands-on experience installing and configuring Cloudera's Apache Hadoop ecosystem components such as Flume-NG, HBase, Zookeeper, Oozie, Hive, Spark, Storm, Sqoop, Kafka, Hue, and Pig on CDH3 and CDH4 clusters.
- Architected, designed, and maintained high-performing ELT/ETL processes.
- Skilled in managing and reviewing Hadoop log files.
- Experienced in loading data into Hive partitions and creating buckets in Hive.
- Experienced in configuring Flume to stream data into HDFS.
- Experienced in building real-time Big Data solutions on HBase that handle billions of records, processing the data with the Spark Streaming API in Scala.
- Familiar with the distributed coordination service Zookeeper.
- Involved in designing and deploying a multitude of applications utilizing the entire AWS stack (including EC2, RDS, VPC, and IAM), focusing on high availability, fault tolerance, and auto-scaling.
- Experienced in implementing unified data platforms using Kafka producers/consumers and implementing pre-processing with Storm topologies.
- Good knowledge of building Apache Spark applications using Scala.
- Experience in designing and developing POCs in Scala, deploying them on a YARN cluster, and comparing the performance of Spark with Hive and SQL/Teradata.
- Experience across the Software Development Life Cycle (SDLC), including analysis, design, development, integration, and testing, in diverse areas of client-server/enterprise applications using Java and J2EE technologies.
- Performed administration, installation, upgrades, and management of Cassandra distributions.
- Strong database development skills with database servers such as Oracle, IBM DB2, and MySQL, and hands-on experience with SQL and PL/SQL. Extensive experience in backend database programming in Oracle environments using PL/SQL with tools such as TOAD.
- Solid understanding of and experience with relational databases such as MySQL and Oracle, and NoSQL databases such as HBase, MongoDB, Couchbase, and Cassandra.
- Proficient in Java, J2EE, JDBC, Collections, Servlets, JSP, Struts, Spring, Hibernate, JAXB, JSON, XML, XSLT, XSD, JMS, WSDL, WADL, REST and SOAP web services, CXF, Groovy, Grails, Jersey, Gradle, and EclipseLink.
- Good knowledge of troubleshooting and tuning the performance of Cassandra clusters, and an understanding of application-driven Cassandra data modeling.
- Good knowledge of integrating various data sources such as RDBMS, spreadsheets, text files, JSON, and XML files.
- Skilled in developing Python applications for multiple platforms, with familiarity with Python software development processes.
Confidential, Washington, DC
- Administered, provisioned, patched, and maintained Cloudera Hadoop clusters on Linux.
- Responsible for building scalable distributed data solutions using Hadoop.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Developed MapReduce pipeline jobs to process the data and create the necessary HFiles.
- Involved in loading the generated HFiles into HBase for faster access to a large customer base without a performance hit.
- Worked in an AWS environment to develop and deploy custom Hadoop applications.
- Involved in designing and creating data ingestion pipelines using technologies such as Apache Storm and Kafka.
- Developed Spark scripts using Scala shell commands as per requirements.
- Implemented discretization and binning, and performed data wrangling (cleaning, transforming, merging, and reshaping data frames) using Python.
- Created HBase tables to store PII data in various formats coming from different portfolios.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Involved in managing and reviewing Hadoop log files.
- Responsible for managing data coming from different sources.
- Involved in creating Pig relations, loading them with data, and writing Pig Latin scripts that run internally as MapReduce jobs.
- Experienced in using Pig as an ETL tool to perform transformations, joins, and some pre-aggregations before storing the data in HDFS.
- Transferred data from AWS S3 to AWS Redshift using Informatica. Involved in moving files between HDFS and AWS S3.
- Created a complete processing engine based on the Hortonworks distribution, enhanced for performance.
- Provided a batch processing solution for large volumes of unstructured data using the Hadoop MapReduce framework.
- Developed Spark code using Scala and Spark SQL for faster processing and testing.
- Used AvroSerDe to handle Avro-format data in Hive and Impala.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Wrote Python scripts to load data from CSV, JSON, MySQL, and Hive sources into the Neo4j graph database.
- Handled administration, installation, upgrades, and management of Cassandra distributions.
- Assisted in unit testing MapReduce jobs using MRUnit.
- Assisted in exporting data into Cassandra and writing column families to provide fast listing outputs.
- Used the Oozie scheduler to automate pipeline workflows and orchestrate the MapReduce jobs that extract data in a timely manner.
- Integrated Apache Kafka with Apache Storm and created Storm data pipelines for real-time processing.
- Experienced in working with the Spark ecosystem using Scala and Hive queries on different data formats such as text files and Parquet.
- Gained exposure to using Apache Kafka to develop a data pipeline that carries logs as a stream of messages using producers and consumers.
- Worked with the Hue GUI for job scheduling, file browsing, job browsing, and metastore management.
- Worked with Talend on a POC for integration of data from the data lake.
- Heavily involved in the development and implementation of the Cassandra environment.
- Followed a story-driven Agile development methodology and actively participated in daily Scrum meetings. Hands-on knowledge of core Java concepts such as exceptions, collections, data structures, I/O, multithreading, and serialization/deserialization for streaming applications.
Environment: Hadoop YARN, Spark-Core 2.0, Spark-Streaming, Spark-SQL, Scala 2.10.4, Python, Kafka 1.1.0, Hive 2.2.0, Sqoop, Amazon AWS, Oozie, Impala, Cassandra, Cloudera, MySQL, Informatica Power Center 9.6.1, Linux, Zookeeper, AWS EMR, EC2, and S3.
Confidential, Dallas, TX
- Administered, provisioned, patched, and maintained Cloudera Hadoop clusters on Linux.
- Developed Spark code using Java and Scala for faster testing and data processing.
- Experience with batch processing of data sources using Apache Spark.
- Experience with real time processing of data sources using Apache Spark Streaming.
- Used Spark Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time and persists it into HBase.
- Worked on a product team using the Agile Scrum methodology to design, develop, deploy, and support solutions that leverage the client's big data platform.
- Experienced in handling large datasets during the ingestion process itself using partitions, Spark's in-memory capabilities, broadcast variables, efficient joins, and other transformations.
- Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms.
- Involved in designing and creating data ingestion pipelines using technologies such as Apache Spark and Kafka.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
- Created external Hive tables on top of the parsed data.
- Experienced in performance tuning of Spark applications, including setting the right batch interval, the correct level of parallelism, and memory tuning.
- Automated and scheduled Sqoop, MapReduce, and Spark jobs using Unix shell scripts and Control-M.
- Created unit test documents and performed unit testing.
- Used Jira for bug tracking and QuickBuild for continuous integration.
- Worked with data serialization formats such as Avro, Parquet, JSON, and CSV to convert complex objects into sequences of bits.
- Loaded data into HBase using both bulk and non-bulk loads.
- Involved in transferring data from mainframe tables to HDFS and HBase tables using Sqoop.
- Wrote HiveQL scripts to create, load, and query tables in Hive.
- Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
- Experience in building batch and streaming applications with Apache Spark and Python.
- Used libraries built on MLlib to perform data cleaning, and used R for dataset reorganization.
- Debugged CQL queries and implemented performance enhancement practices.
- Strong knowledge of Apache Oozie for scheduling tasks.
- Practical knowledge of integrating Kafka with third-party systems such as Spark and Hadoop.
- Experience in configuring Kafka brokers, consumers, and producers for optimal performance.
- Knowledge of creating Apache Kafka consumers and producers in Java.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Practical knowledge of monitoring a Hadoop cluster using Nagios and Ganglia.
- Experience with Git for version control.
- Involved in loading data from UNIX file system to HDFS and developed UNIX scripts for job scheduling, process management and for handling logs from Hadoop.
- Reviewed technical specifications and documented them in technical design documents.
- Strong skills in agile development and Test-Driven development.
Environment: Java, Scala, Hadoop, Hortonworks, AWS, HDFS, YARN, MapReduce, Hive, Pig, Spark, Flume, Kafka, Sqoop, Oozie, Zookeeper, Oracle, Teradata, and MySQL.
Confidential, Detroit, Michigan
- Worked extensively on the Hadoop ecosystem, including Hive and Spark Streaming, on the MapR distribution.
- Implemented J2EE Design Patterns like DAO, Singleton, and Factory.
- Managed database connectivity using JDBC for querying, inserting, and data management, including triggers and stored procedures.
- Analyzed the Hadoop cluster and various big data analytics tools, including Pig, the HBase database, and Sqoop.
- Upgraded the Hadoop cluster from CDH3 to CDH4, set up a High Availability cluster, and integrated Hive with existing applications.
- Supported NoSQL in enterprise production and loaded data into HBase using Impala and Sqoop.
- Performed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Built Hadoop solutions for big data problems using MR1 and MR2 (YARN).
- Handled importing data from various sources, performed transformations using Hive and Pig, and loaded the data into HDFS.
- Transferred data from HDFS to relational database systems and vice versa using Sqoop, including maintenance and troubleshooting.
- Developed a Java/J2EE-based multithreaded application built on top of the Struts framework.
- Used the Spring MVC framework to enable interactions with the JSP/view layer, and implemented different design patterns with J2EE and XML technologies.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Created Hive Tables, loaded claims data from Oracle using Sqoop and loaded the processed data into target database.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization, and user report generation.
- Implemented the J2EE design patterns Data Access Object (DAO), Session Façade and Business Delegate.
- Developed NiFi flows dealing with various data formats such as XML, JSON, and Avro.
- Implemented MapReduce jobs in Hive by querying the available data.
- Proactively involved in ongoing maintenance, support, and improvements in Hadoop cluster.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Used Cloudera Manager for installation and management of Hadoop Cluster.
- Collaborated with business users/product owners/developers to contribute to the analysis of functional requirements.
- Implemented the application using an MVC architecture integrating the Hibernate and Spring frameworks.
- Involved in converting HiveQL queries into Spark transformations using Spark RDDs and Scala.
- Integrated Kafka with Spark Streaming for high throughput and reliability.
- Tuned Hive and Pig scripts to improve performance and resolved performance issues in both.
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, RDBMS/DB, flat files, Teradata, MySQL, CSV, Avro data files, Java, and J2EE.
Confidential, McLean, VA
- Interacted with business analysts to understand the requirements and ensure the correct modules were built to meet business needs.
- Participated in daily Scrum meetings to deliver quality enhancements on time.
- Developed UML use case diagrams, class diagrams, and sequence diagrams using Rational Software Architect.
- Integrated Spring MVC as the front-end request action controller.
- Developed web screens in JSP, JSTL, CSS and client-side validation using jQuery.
- Developed web services to enable communication between applications through SOAP over HTTP using Apache CXF.
- Configured JMS on WebSphere Server for asynchronous messaging through the implementation of Message-Driven Beans (MDBs).
- Used Spring ORM module for integration with Hibernate for persistence layer.
- Implemented the application using the concrete principles laid down by several design patterns such as Session Façade, Business Delegate, Singleton, Data Access Object, and Service Locator.
- Developed the application in J2EE Application Server environment with IBM WebSphere as deployment server with RAD as development IDE.
- Used JIRA for defect tracking and project management.
- Designed and developed XML schemas to transport and store data. XML was used to simplify data, allow for platform changes, and make data more available across the application's distributed platforms.
- Extensively used XSLT to transform XML documents to HTML.
- Developed unit and functional test cases using JUnit.
- Used Maven and Jenkins for the automated build process.
- Used the Log4j utility to log error, info, and debug messages.
- Used Rational ClearCase for version control.
- Worked efficiently on a very tight schedule to meet deadlines.