- Over 7 years of professional IT experience in software development, including 4 years of experience in ingestion, storage, querying, processing, and analysis of Big Data using Hadoop technologies and solutions.
- Excellent understanding of Hadoop architecture and the components of the Hadoop ecosystem, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, MapReduce, and YARN.
- Hands-on experience with Hadoop ecosystem components such as MapReduce, HDFS, Hive, Pig, Sqoop, Spark, Flume, ZooKeeper, Hue, Kafka, Storm, and Impala.
- Experience with Agile Methodology.
- Experienced in using Spark to improve the performance of existing Hadoop algorithms with SparkContext, Spark SQL, DataFrames, pair RDDs, and Datasets.
- Developed Kafka producers that compress and bundle many small files into larger Avro and Sequence files before writing to HDFS, to make the best use of the Hadoop block size.
- Experience in analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java.
- Experience in developing custom UDFs in Java to extend Hive and Pig Latin functionality.
- Expertise in job workflow scheduling and monitoring tools like Oozie.
- Developed simple to complex MapReduce jobs using Hive and Pig to handle files in multiple formats such as JSON, text, XML, and Sequence File.
- Worked extensively with combiners, partitioning, and the distributed cache to improve the performance of MapReduce jobs.
- Experience in working with different data sources such as flat files, XML files, log files, and databases.
- Very good understanding and working knowledge of object-oriented programming (OOP).
- Expertise in application development using Scala, RDBMS, and UNIX shell scripting.
- Experience developing Scala applications for loading and streaming data into NoSQL databases (HBase) and into HDFS.
- Worked on ingesting log data into Hadoop using Flume.
- Experience in managing and reviewing Hadoop log files.
- Experience in importing and exporting data between HDFS and relational database management systems using Sqoop.
- Collected and stored streaming data (log data) in HDFS using Apache Flume.
- Experience in optimizing queries by creating clustered and non-clustered indexes and indexed views, and by applying data modeling concepts.
- Experience with scripting languages (Scala, Pig, Python, and Shell) to manipulate data.
- Worked with relational database systems (RDBMS) such as MySQL and NoSQL database systems such as HBase; basic knowledge of MongoDB and Cassandra.
- Hands-on experience in identifying and resolving performance bottlenecks at various levels, such as sources, mappings, and sessions.
- Highly motivated, adaptive, and a quick learner.
- Ability to adapt to evolving technology; strong sense of responsibility and accomplishment.
- Hadoop, HDFS, YARN, MapReduce, Spark, Hive, Pig, Sqoop, Flume, Kafka, Storm, Oozie, ZooKeeper.
- HBase, Cassandra, MongoDB
- Cloudera Manager, Hortonworks.
- Java, Scala.
- Oracle 8i, 9i, 10g, 11g, MS SQL Server.
- TCP/IP, DNS, NIS, NIS+, NFS, AutoFS.
- CentOS, Ubuntu, Linux, Windows.
Confidential, San Francisco, CA
- Moved files between HDFS and AWS S3 and worked extensively with S3 buckets in AWS.
- Developed use cases for processing real-time streaming data using tools like Spark Streaming.
- Handled large datasets using partitions, Spark in-memory capabilities, broadcasts, and efficient joins and transformations.
- Imported required tables from RDBMS to HDFS using Sqoop and used Spark and Kafka to stream real-time data into HBase.
- Enhanced and optimized product Spark code to aggregate, group, and run data mining tasks using the Spark framework, and handled JSON data.
- Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
- Responsible for batch processing of data sources using Apache Spark.
- Developed predictive analytics using the Apache Spark Scala APIs.
- Developed MapReduce jobs using the Java API to parse the raw data and store the refined data.
- Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
- Involved in identifying job dependencies to design workflows for Oozie and YARN resource management.
- Worked on a product team using Agile Scrum methodology to design, develop, deploy, and support solutions that leverage the client's big data platform.
- Integrated Apache Storm with Kafka to perform web analytics. Uploaded clickstream data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
- Designed and coded from specifications; analyzed, evaluated, tested, debugged, documented, and implemented complex software applications.
- Tuned Hive and Pig scripts to improve performance and resolved performance issues in both, based on an understanding of how joins, grouping, and aggregation translate to MapReduce jobs.
- Created partitions and buckets based on state for further processing using bucket-based Hive joins.
- Implemented Cloudera Manager on existing cluster.
- Worked extensively with Cloudera Distribution Hadoop (CDH 5.x and CDH 4.x).
- Responsible for troubleshooting, debugging, and fixing incorrect or missing data problems in Oracle Database (MySQL).
Environment: HDFS, MapReduce, Java API, JSP, JavaBean, Pig, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark Streaming, Storm, YARN, Eclipse, Unix Shell Scripting, Cloudera.
Confidential, Dearborn, MI
- Implemented data ingestion using Sqoop and Spark, loading data from various RDBMS sources.
- Responsible for the design and development of Spark SQL scripts based on functional specifications.
- Handled data cleansing and transformation tasks using Spark with Scala and Hive.
- Involved in converting Hive queries into Spark DataFrames and Datasets using Scala.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, and pair RDDs.
- Explored using Spark to improve the performance of existing Hadoop algorithms.
- Responsible for handling large datasets using partitions, Spark in-memory capabilities, broadcasts, and efficient joins and transformations during the ingestion process itself.
- Implemented data consolidation using Spark and Hive to generate data in the required formats, applying ETL tasks for data repair, massaging data to identify its source for audit purposes, and data filtering, then storing the results back to HDFS.
- Used Spark API over Hadoop YARN to perform analytics on data in Hive.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Performed advanced procedures such as text analytics and processing, using the in-memory computing capabilities of Spark with Scala.
- Developed ETL to normalize this data and publish it in Impala.
- Responsible for job management using the Fair Scheduler; developed job processing scripts using Oozie workflows.
- Wrote a shell script to convert all Hive internal tables to external tables.
- Integrated Hive with HBase.
- Responsible for performance tuning of Spark applications, including setting the right batch interval, the correct level of parallelism, and memory tuning.
- Imported and exported data into HDFS, Hive, and Pig using Sqoop.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run as MapReduce jobs in the backend.
- Implemented workflows using the Apache Oozie framework to automate tasks.
- Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
- Worked with different file formats such as Text, Sequence files, Avro, ORC and Parquet.
- Responsible for managing data coming from different sources.
- Responsible for loading and transforming large sets of structured, semi-structured, and unstructured data.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
Environment: Scala, Hive, HBase, Flume, Java, Impala, Pig, Spark, Oozie, Oracle, YARN, JUnit, Unix, Hortonworks, Sqoop, HDFS, Python.
Confidential, Saint Louis, MO
- Ingested structured, semi-structured, and unstructured datasets into Hive and NoSQL databases like HBase, using an open source Hadoop distribution and Apache tools like Flume and Sqoop.
- Developed job flows in Oozie to automate the workflow for Pig and Hive jobs.
- Designed and built the reporting application that uses Spark SQL to fetch and generate reports on HBase table data.
- Extracted feeds from social media sites such as Facebook and Twitter using Python scripts.
- Implemented helper classes that access HBase directly from Java using the Java API.
- Integrated MapReduce with HBase to bulk import large amounts of data into HBase using MapReduce programs.
- Responsible for converting ETL operations to the Hadoop system using Pig Latin operations, transformations, and functions.
- Extracted the needed data from the server into HDFS and bulk loaded the cleaned data into HBase.
- Handled various time-series data using HBase to store data and perform time-based analytics, improving query retrieval time.
- Worked with the admin on installing and configuring MapReduce, Hive, and HDFS.
- Implemented a CDH3 Hadoop cluster on CentOS and assisted with performance tuning and monitoring.
- Used Impala to analyze data ingested into HBase and compute various metrics for reporting on the dashboard.
- Managed and reviewed Hadoop log files.
- Involved in review of functional and non-functional requirements.
Environment: Hortonworks Hadoop 2.0, EMP, Cloud Infrastructure (Amazon AWS), Java, Python, HBase, Hadoop Ecosystem, Linux, Scala.
Confidential, Philadelphia, PA
- Involved in designing and developing Hadoop MapReduce jobs using the Java Runtime Environment for batch processing to search and match scores.
- Involved in developing Hadoop MapReduce jobs for merging and appending the repository data.
- Developed applications using Hadoop Big Data technologies: Pig, Hive, MapReduce, and Oozie.
- Enabled speedy reviews and first-mover advantages by using workflows like Oozie to automate the data loading process into the Hadoop Distributed File System (HDFS), and the Pig language to preprocess the data.
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, Sqoop, and Flume).
- Worked on the Oozie workflow engine for job scheduling.
- Imported and exported large sets of data to and from HDFS using Sqoop.
- Used Java to read data from a MySQL database and transfer it to HDFS.
- Transferred log files from the log generating servers into HDFS.
- Read the generated log data from HDFS using advanced HiveQL (serialization/deserialization).
- Executed HiveQL commands on the CLI (Command Line Interface) and transferred the required output data back to HDFS.
- Worked on Hive partitioning and bucketing concepts and created Hive external and internal tables with partitions.
Environment: Hadoop, MapReduce, HDFS, Hive, SQL, Pig, ZooKeeper, MongoDB, CentOS, Cloudera Manager, Sqoop, Oozie, MySQL, HBase, Solr, Java.
Confidential, Chicago, IL
- Developed JavaScript behavior code for user interaction.
- Developed UI screens for data entry application in Java GUI.
- Implemented the project according to the Software Development Life Cycle (SDLC).
- Developed front-end screens using JSP with tag libraries and HTML pages.
- Followed coding guidelines and updated leads on status in a timely manner.
- Involved in the requirements gathering, analysis, design, development, testing, and maintenance phases of the application.
- Used core Java concepts such as Collections, Generics, exception handling, IO, and concurrency to develop business logic.
- Used JSON for serializing and deserializing data sent to or received from JSP pages.
- Worked closely with QA, business, and architecture teams to resolve defects quickly and meet deadlines.
- Ensured all open issues and/or risks were documented prior to moving to the next testing stage.
- Involved in writing integration tests and testing the workflow of the service.
- Involved in writing JUnit test cases and testing functionality, as well as smoke testing and integration testing.
- Created Style Sheets (CSS) to control the look and feel of entire site.
- Developed client-side screens using HTML.
- Used Eclipse as IDE.
- Wrote multiple MapReduce programs in Java for data analysis.
- Involved in submitting and tracking MapReduce jobs using JobTracker.
- Used HTML and CSS as view components in MVC.
- Verified that all entry/exit criteria were completed with appropriate sign-off.
- The work consisted mainly of parsing data from the source databases into the warehouse.
- Participated in Agile Scrum methodology for application development, covering analysis, design, coding, and unit and integration testing of business applications in an object-oriented environment.
- Designed UML (Unified Modeling Language) diagrams for new enhancements.
- Created requirement documents and designed the requirements using UML diagrams.
- Used Eclipse for the Development, Testing and Debugging of the application.
- Used Firebug as the debugger.
- Implemented services using core Java. Developed and deployed UI layer logic of sites using JSP.
- Extensively used Java Collection framework and Exception handling.
- Worked with OOP concepts.
- Developed the application in an Agile environment with constant changes in application scope and deadlines.
- Worked on database queries using an Oracle instance.
- Involved in integration system testing and user acceptance testing (UAT).
- Supported the application whenever production issues were encountered.
- Wrote and executed test scripts using JUnit.
- Involved in bug fixing and production support maintenance. Integrated various modules and deployed them on WebSphere.
- Developed the user interface presentation screens using HTML, XML, and CSS.
- Involved in writing complex SQL queries and stored procedures in PL/SQL to access data from the Oracle database.
Environment: Core Java, J2EE, HTML, JSP, CSS, Eclipse, SQL, PL/SQL, Design Patterns, WebSphere Application Server, Tomcat, Web Services, Oracle, XML, Firebug.