- 8+ years of experience as a solutions-oriented IT software developer, including 5+ years of web application development using Hadoop and related Big Data technologies and 3+ years using Java 2 Enterprise Edition, across all phases of the SDLC.
- Experience in analysis, design, development, and integration using Big Data/Hadoop technologies such as MapReduce, Hive, Pig, Sqoop, Oozie, Kafka, HBase, AWS, Cloudera, Hortonworks, Impala, Avro, data processing, Java/J2EE, and SQL.
- Good knowledge of Hadoop architecture and its components such as HDFS, MapReduce, JobTracker, TaskTracker, NameNode, and DataNode.
- Hands on experience in installing, configuring, and using Hadoop ecosystem components like HDFS, Hive, Spark, Scala, Spark-SQL, MapReduce, Pig, Sqoop, Flume, HBase, Zookeeper, and Oozie.
- Extensive Hadoop experience in data storage, query writing, and the processing and analysis of data.
- Experience extending Pig and Hive functionality with custom UDFs for data analysis and file processing, running Pig Latin scripts and using Hive Query Language.
- Experience working with the Amazon AWS cloud, including services such as EC2, S3, RDS, EBS, Elastic Beanstalk, and CloudWatch.
- Worked on data modelling using various machine learning (ML) algorithms via R and Python.
- Experienced in transferring data from various data sources into HDFS using Kafka.
- Experience configuring the Hive metastore with MySQL, which stores the metadata for Hive tables.
- Strong knowledge of using Flume for streaming data to HDFS.
- Good knowledge of job scheduling and monitoring tools like Oozie and ZooKeeper.
- Expertise working with various databases, writing SQL queries, stored procedures, functions, and triggers using PL/SQL and SQL.
- Experience with NoSQL column-oriented databases like Cassandra, HBase, MongoDB, and FiloDB, and their integration with Hadoop clusters.
- Strong experience troubleshooting operating systems such as Linux (Red Hat) and UNIX, maintaining cluster issues, and resolving Java-related bugs.
- Experience developing Spark jobs using Scala in test environments for faster data processing, and using Spark SQL for querying.
- Good exposure to Service Oriented Architectures (SOA) built on Web services (WSDL) using SOAP protocol.
- Well versed in OOP principles (inheritance, encapsulation, polymorphism) and core Java concepts (collections, multithreading, synchronization, exception handling).
Programming Languages: Java, J2EE, C, SQL/PLSQL, Pig Latin, Scala, HTML, XML
Hadoop: HDFS, MapReduce, HBase, Hive, Pig, Impala, Sqoop, Flume, Oozie, Spark, Spark SQL, Zookeeper, AWS, Cloudera, Hortonworks, Kafka, Avro
Scripting Languages: JavaScript, Pig Latin, Python 2.7, and Scala
RDBMS: Oracle, Microsoft SQL Server, MySQL
NoSQL: MongoDB, HBase, Apache Cassandra, FiloDB.
SOA: Web Services (SOAP, WSDL)
IDEs: MyEclipse, Eclipse, and RAD
Operating System: Linux, Windows, UNIX
Methodologies: Agile, Waterfall model.
Hadoop Testing: MRUnit, Quality Center, Hive testing
Other Tools: SVN, Apache Ant, JUnit, StarUML, TOAD, PL/SQL Developer, JIRA, Visual Source, QC
Confidential, Bowie, MD
Sr. Hadoop Developer
- Wrote multiple Spark jobs to perform data quality checks on data before files were moved to the data processing layer.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Designed and modified database tables and used HBase queries to insert and fetch data from tables.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume 1.7.0.
- Involved in deploying applications to AWS and maintained EC2 (Elastic Compute Cloud) and RDS (Relational Database Service) instances in Amazon Web Services.
- Implemented the file validation framework, UDFs, UDTFs and DAOs.
- Strong experience working in UNIX/Linux environments, writing UNIX shell scripts, Python, and Perl.
- Created reporting views in Impala using Sentry policy files.
- Built a REST web service with a Node.js server on the back end to handle requests sent from front-end jQuery AJAX calls.
- Imported and exported data between relational databases such as MySQL and HDFS/HBase using Sqoop.
- Advanced knowledge in performance troubleshooting and tuning Cassandra clusters.
- Analyzed source data to assess data quality using Talend Data Quality.
- Involved in creating Hive tables, loading with data and writing hive queries.
- Developed REST APIs using Java, the Play framework, and Akka.
- Modeled and created the consolidated Cassandra, FiloDB, and Spark tables based on data profiling.
- Used Oozie 1.2.1 operational services for batch processing and dynamic workflow scheduling, and created UDFs to store specialized data structures in HBase and Cassandra.
- Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Used Impala to read, write and query the Hadoop data in HDFS from Cassandra and configured Kafka to read and write messages from external programs.
- Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Created a complete processing engine based on the Cloudera distribution, tuned for performance.
Environment: Hadoop, HDFS, MapReduce, YARN, Hive, Pig, HBase, Oozie, Sqoop, Kafka, Flume, Oracle 11g, Core Java, FiloDB, Spark, Akka, Scala, Cloudera HDFS, Talend, Eclipse, Web Services (SOAP, WSDL), Node.js, Unix/Linux, AWS, jQuery, Ajax, Python, Perl, Zookeeper.
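The record-level data quality checks described above can be sketched as follows. This is a minimal pure-Python illustration; the production jobs used Spark with Scala, and the field names and validation rules here are hypothetical:

```python
# Sketch of record-level data-quality checks of the kind run before
# files move to the data processing layer. Field names and rules are
# hypothetical; the actual jobs were Spark/Scala.

REQUIRED_FIELDS = ("id", "event_ts", "amount")

def validate_record(record):
    """Return (is_valid, reasons) for a single dict-shaped record."""
    reasons = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            reasons.append(f"missing:{field}")
    amount = record.get("amount")
    if amount is not None:
        try:
            float(amount)
        except (TypeError, ValueError):
            reasons.append("non_numeric:amount")
    return (not reasons, reasons)

def split_by_quality(records):
    """Partition records into (clean, rejected), mirroring the
    move/quarantine decision made before the processing layer."""
    clean, rejected = [], []
    for rec in records:
        ok, reasons = validate_record(rec)
        (clean if ok else rejected).append((rec, reasons))
    return clean, rejected
```

In the Spark version, `split_by_quality` corresponds to filtering a DataFrame on the validation predicate and writing the failing rows to a quarantine path.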
Confidential, Peachtree City, GA
Hadoop/ Spark Developer
- Developed efficient MapReduce programs for filtering out unstructured data and developed multiple MapReduce jobs to perform data cleaning and preprocessing on Hortonworks.
- Implemented a data interface to get customer information using REST APIs, pre-processed the data using MapReduce 2.0, and stored it in HDFS (Hortonworks).
- Extracted files from MySQL, Oracle, and Teradata through Sqoop 1.4.6, placed them in HDFS (Cloudera distribution), and processed them.
- Worked with various HDFS file formats like Avro 1.7.6, SequenceFile, and JSON, and various compression formats like Snappy and bzip2.
- Proficient in designing row keys and schemas for the NoSQL database HBase, with knowledge of another NoSQL database, Cassandra.
- Used Hive to validate the data ingested via Sqoop and Flume, and pushed the cleansed data set into HBase.
- Good understanding of Cassandra data modeling based on applications.
- Wrote ETL jobs to read from web APIs using REST and HTTP calls and loaded the data into HDFS using Java and Talend.
- Developed Pig 0.15.0 UDFs to pre-process the data for analysis, and migrated ETL operations into the Hadoop system using Pig Latin scripts and Python 3.5.1 scripts.
- Used Pig as ETL tool to do transformations, event joins, filtering and some pre-aggregations before storing the data into HDFS.
- Troubleshot, debugged, and resolved Talend issues while maintaining the health and performance of the ETL environment.
- Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
- Used Spark to parse XML files, extract values from tags, and load them into multiple Hive tables.
- Experienced in running Hadoop streaming jobs to process terabytes of formatted data using Python scripts.
- Developed small distributed applications in our projects using Zookeeper 3.4.7 and scheduled the workflows using Oozie 4.2.0.
- Proficiency in writing the Unix/Linux shell commands.
- Developed an SCP simulator that emulates the behavior of intelligent networking and interacts with the SSF.
Environment: Hadoop, HDFS, MapReduce, YARN, Hive, Pig, HBase, Oozie, Sqoop, Kafka, Flume, Oracle 11g, Core Java, Spark, Scala, Cloudera HDFS, Talend, Eclipse, Node.js, Unix/Linux, AWS, jQuery, Ajax, Python, Perl, Zookeeper.
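The Hadoop streaming jobs mentioned above follow a simple mapper/reducer contract: lines arrive on stdin and tab-separated key/value pairs go to stdout. A minimal sketch of that shape, written as functions over iterables so the cleaning-and-counting logic is testable without a cluster (the tokenization rules are illustrative, not the production ones):

```python
# Hadoop-streaming-style mapper and reducer in pure Python.
# In a real streaming job these would read sys.stdin and print
# "key\tvalue" lines; functions over iterables keep the sketch testable.

def mapper(lines):
    """Emit (word, 1) pairs, dropping empty tokens and surrounding
    punctuation -- the 'cleaning' step of the job."""
    for line in lines:
        for token in line.strip().lower().split():
            word = token.strip(".,;:!?\"'")
            if word:
                yield (word, 1)

def reducer(pairs):
    """Sum counts per key; the streaming shuffle delivers pairs
    grouped and sorted by key, so a dict suffices here."""
    counts = {}
    for word, n in pairs:
        counts[word] = counts.get(word, 0) + n
    return counts
```

Wiring `mapper` to `sys.stdin` and printing `reducer`'s items as tab-separated lines turns this into the two scripts passed to the `hadoop-streaming` jar via `-mapper` and `-reducer`.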
Confidential, Chicago, IL
- Developed multiple Map-Reduce jobs in java for data cleaning and pre-processing.
- Ran MapReduce programs on the cluster.
- Involved in loading data from RDBMS and web logs into HDFS using Sqoop and Flume.
- Worked on loading the data from MySQL to HBase where necessary using Sqoop.
- Configured Hadoop cluster with Namenode and slaves and formatted HDFS.
- Performed Importing and exporting data from Oracle to HDFS and Hive using Sqoop
- Performed source data ingestion, cleansing, and transformation in Hadoop.
- Supported Map-Reduce Programs running on the cluster.
- Wrote Pig Scripts to perform ETL procedures on the data in HDFS.
- Used Oozie workflow engine to run multiple Hive and Pig jobs.
- Analyzed the partitioned and bucketed data and compute various metrics for reporting.
- Created HBase tables to store various data formats of data coming from different portfolios.
- Worked on improving the performance of existing Pig and Hive Queries.
- Involved in developing Hive UDFs and reused them in other requirements. Worked on performing join operations.
- Developed fingerprinting rules in Hive that helped uniquely identify a driver profile.
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Exported the result set from Hive to MySQL using Sqoop after processing the data.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behaviour.
- Used Hive to partition and bucket data.
Environment: Hadoop, MapReduce, HDFS, HBase, HDP (Hortonworks), Sqoop, Data Processing Layer, HUE, Azure, Erwin, MS Visio, Tableau, SQL, MongoDB, Oozie, UNIX, MySQL, RDBMS, Ambari, Solr Cloud, Lily HBase, Cron.
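The partition-and-bucket analysis described above can be sketched in pure Python. Hive assigns a row to bucket `hash(key) % n_buckets` within its partition; the column names below are hypothetical, and CRC32 stands in for Java's `hashCode` to keep the sketch reproducible:

```python
# Sketch of Hive-style bucketing and per-partition metric computation.
# Column names ("dt", "amt") are hypothetical; Hive itself uses Java's
# hashCode for bucketing, CRC32 is used here for determinism.
import zlib
from collections import defaultdict

def bucket_for(key, n_buckets):
    """Deterministic bucket assignment in the spirit of Hive's
    hash(key) % n_buckets clustering."""
    return zlib.crc32(str(key).encode()) % n_buckets

def metrics_by_partition(rows, partition_col, value_col):
    """Compute per-partition row counts and sums -- the kind of
    reporting metric computed over partitioned, bucketed tables."""
    stats = defaultdict(lambda: {"rows": 0, "total": 0.0})
    for row in rows:
        part = row[partition_col]
        stats[part]["rows"] += 1
        stats[part]["total"] += float(row[value_col])
    return dict(stats)
```

In Hive proper, the same aggregation is a `GROUP BY` over the partition column, with the bucketing declared once in the table DDL (`CLUSTERED BY ... INTO n BUCKETS`).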
- Worked on the Agitar tool, JUnit-generating software used to increase code coverage; code coverage was a major quality issue in Acumen at the time, and this was a critical short-term project.
- Analyzed the generated JUnit tests, added proper asserts, and made them more code-specific while increasing code coverage. This helped boost my product knowledge as well as my JUnit-writing skills, and improved code quality to a commendable level.
- Joined the EFT team in Acumen, which dealt with electronic fund transfers, ATM, credit cards, and online banking.
- Explored almost all areas of EFT and learned DWR.
- Worked on challenging aspects such as jPOS for ATM and online banking, and various logger applications for the cards.
- Worked on all the layers of the product, enhancing knowledge on Core Java.
- Gained tremendous domain knowledge in this assignment.
Environment: Core Java, Oracle, DWR, Spring MVC, Agitar, Tomcat, GlassFish, ClearCase, JIRA
- Worked on one of the most critical modules of the project from the beginning phase, including requirement gathering, analysis, design, review, and development.
- Received roughly two weeks of knowledge transfer (KT) from the module lead, who was located at another site and was later absorbed by the client.
- Took the initiative in building a new team of more than six members, running proper knowledge transfer sessions, and assigning and managing tasks with JIRA.
- Learned Backbone JS and worked with UI team on UI enhancements.
- Actively participated in daily Scrums to understand new user stories.
- Implemented new requirements after discussion with Scrum masters.
- Worked with BAs and QA to identify and fix bugs and to raise new features and enhancements.
- Recognized by the client with an appreciation certificate and client bonuses of 10k and 50k respectively.
Environment: Java/J2EE, Spring MVC, Hibernate, Oracle, Backbone.js, HTML, Tomcat, WebSphere, SVN, JIRA