- Around 7 years of experience in Analysis, Architecture, Design, Development, Testing, Maintenance and User training of software applications, including over 5 years in Big Data, Hadoop and HDFS environments and 3 years of experience in Java.
- Hands-on experience installing, configuring, and using Hadoop components such as Hadoop MapReduce, YARN, HDFS, Hive, Pig, Flume, Sqoop, Spark, ZooKeeper, Kafka, and Elasticsearch.
- Experience in developing MapReduce programs using Apache Hadoop for analyzing big data as per requirements.
- Clear understanding of Hadoop architecture and its components, including HDFS, JobTracker and TaskTracker, NameNode, DataNode, Secondary NameNode, Backup NameNode, ResourceManager and ApplicationMaster, and the MapReduce programming model.
- Hands-on experience using Sqoop to import data into HDFS from RDBMS and vice versa.
- Hands-on experience writing custom UDFs to extend Hive and Pig core functionality.
- Used different Hive SerDes such as RegexSerDe and HBase SerDe.
- Experience in analyzing data using Hive, Pig Latin, and custom MR programs in Java.
- Hands-on experience with job scheduling and monitoring tools such as Oozie and ZooKeeper.
- Experience working with different file formats such as Avro, Parquet, ORC and SequenceFile, and compression techniques such as Gzip, LZO, and Snappy in Hadoop.
- Hands-on experience extracting data from log files and copying it into HDFS using Flume.
- Wrote Hadoop test cases for validating inputs and outputs.
- Hands-on experience integrating Hive with HBase and Spark.
- Experience in NoSQL databases: MongoDB, HBase, Cassandra
- Experience in Hadoop administration activities such as installation and configuration of clusters using Cloudera 5.x/4.x and HDP.
- Good experience working with cloud environments such as Amazon Web Services (AWS): EC2, S3, EBS, RDS, and Redshift.
- Experience in Java, J2EE, Web Services, REST, SOAP, HTML and XML related technologies, demonstrating strong analytical and problem-solving skills, computer proficiency and the ability to follow projects through from inception to completion.
- Extensive experience working with Oracle, DB2, SQL Server and MySQL databases, and with core Java concepts: OOP, multithreading, collections and I/O.
- Hands-on experience with JAX-WS, JMS, JSP/Servlets, Struts, Spring, Hibernate, Apache Tomcat, WebLogic, WebSphere, JBoss, JDBC, RMI, JavaScript, Ajax, jQuery, UNIX, Linux, JSON, XML, and HTML.
- Developed applications using Java, RDBMS, and Linux shell scripting.
- Experience in complete project life cycle of Client Server and Web applications.
- Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
- Experience in scripting to automate the deployment of monitors, checks and critical system administration functions.
- Good interpersonal and communication skills, strong problem-solving skills, and the ability to explore and adopt new technologies with ease; a good team member who meets deadlines.
- Motivated to take independent responsibility, with a strong work ethic and the desire to succeed and make significant contributions to the organization.
- Hadoop 2.x/1.x: HDFS, MapReduce, YARN, Hive, Pig, ZooKeeper, Sqoop, Oozie
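The compression trade-offs noted above (Gzip vs. LZO vs. Snappy) can be illustrated with a small sketch using Python's standard-library gzip module; LZO and Snappy need third-party bindings, and the sample data here is made up:

```python
import gzip

# Sample "log" data with heavy repetition, as HDFS-bound data often has.
raw = b"2017-01-01 INFO request served\n" * 1000

compressed = gzip.compress(raw, compresslevel=6)

ratio = len(raw) / len(compressed)
print(f"raw={len(raw)} bytes, gzip={len(compressed)} bytes, ratio={ratio:.1f}x")

# Round-trip check: decompression restores the original bytes exactly.
assert gzip.decompress(compressed) == raw
```

Splittability matters as much as ratio in Hadoop: plain Gzip files cannot be split across mappers, which is one reason Snappy or LZO is often preferred for intermediate data.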
- Storm, Kafka, Spark
- HBase, Cassandra, MongoDB
- Oracle 11g/10g, IBM DB2, SQL Server, Netezza, MySQL
- Amazon Web Services (AWS)
- C, C++, JAVA/J2EE, UNIX Shell Scripting, Python
- Core Java, JSP, Servlets, JSF, JDBC/ODBC, Swing, EJB, JSTL, JMS; Frameworks: MVC, Spring 3/2.5/2, Struts 2/1, Hibernate 3
- Windows, Linux (Ubuntu, RedHat), Solaris
- Apache Tomcat, WebLogic, Web Sphere, JBoss
- Agile Scrum, UML, Design Patterns (Core Java and J2EE)
- Eclipse, NetBeans, IntelliJ IDEA, Toad, Rational Rose
Confidential, Boston, MA
Sr. Hadoop Developer
- Real-time streaming of data using Spark Streaming with Kafka.
- Developed Spark scripts using Scala as per the requirements.
- Loaded the data into Spark RDDs and performed in-memory data computation to generate the output response.
- Performed different types of transformations and actions on the RDDs to meet the business requirements.
- Worked with team members on upgrading, configuring and maintaining various Hadoop components such as Hive and HBase.
- Developed a data pipeline using Kafka, Spark and Hive to ingest, transform and analyze data.
- Involved in loading data from the UNIX file system to HDFS.
- Extracted data from Hive using Spark.
- Imported and exported data from different databases into HDFS and Hive using Sqoop.
- Used Sqoop to load existing metadata from Oracle into HDFS.
- Involved in creating Hive tables and loading and analyzing data using Hive queries.
- Responsible for creating Hive tables, working with them using HiveQL, and analyzing data for aggregation and reporting.
- Automated all jobs for extracting data from different data sources such as MySQL and pushing the result sets to the Hadoop Distributed File System.
- Implemented Map Reduce jobs using Java API. Participated in the setup and deployment of Hadoop cluster.
- Hands-on design and development of an application using Hive UDFs.
- Worked on data serialization formats (Avro, Parquet, JSON, CSV) for converting complex objects into serialized byte sequences.
- Wrote client web applications using SOAP web services.
- Involved in importing the real time data to Hadoop using Kafka and implemented the Oozie job for daily imports.
- Worked in AWS environment for development and deployment of Custom Hadoop applications.
- Installed Hadoop, MapReduce and HDFS on AWS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Wrote Elasticsearch templates for the index patterns and used Tableau for reporting purposes.
Environment: HDFS, Hive, Spark, Spark Streaming, Spark SQL, Apache Kafka, ZooKeeper, Oozie, Linux, Sqoop, Java, Scala, SOAP, REST, CDH5, AWS, Eclipse, Oracle, Git, Shell Scripting, Tableau
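The RDD transformations and actions used in this role can be sketched locally in plain Python; the generators below are illustrative stand-ins for the lazy Spark API, not PySpark itself:

```python
# Transformations are lazy: building the pipeline does no work yet.
records = range(1, 11)                           # stand-in for an RDD of numbers
doubled = (x * 2 for x in records)               # like rdd.map(lambda x: x * 2)
evens_over_ten = (x for x in doubled if x > 10)  # like .filter(lambda x: x > 10)

# Actions force evaluation, like collect() or reduce() in Spark.
result = list(evens_over_ten)   # like .collect()
print(result)                   # [12, 14, 16, 18, 20]
total = sum(result)             # like .reduce(operator.add) -> 80
```

As in Spark, nothing is computed until the "action" at the end; the chained "transformations" only describe the pipeline.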
Confidential, Richmond, VA
- Gathered the business requirements from the Business Partners and Subject Matter Experts.
- Involved with ingesting data received from various providers, on HDFS for big data operations.
- Accessed information through mobile networks and satellites from the equipment.
- Loaded and transformed large sets of structured, semi-structured and unstructured data in various formats such as text, zip, XML and JSON.
- Imported data using Sqoop to load data from Oracle to HDFS on a regular basis, or from the Oracle server to HBase, depending on requirements.
- Wrote Hive queries for data analysis to meet the business requirements.
- Created Hive tables and worked with them using HiveQL.
- Developed Scripts and Batch Job to schedule various Hadoop Programs.
- Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark.
- Enhanced and optimized product Spark code to aggregate, group and run data mining tasks using the Spark framework.
- Used Kibana, an open-source browser-based analytics and search dashboard for Elasticsearch.
- Monitored the Hadoop Cluster using Cloudera Manager.
- Gained very good business knowledge on health insurance, claim processing, fraud suspect identification, appeals process etc.
- Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and AWS cloud.
- Scheduled data refresh on Tableau Server for daily, weekly and monthly increments based on business change to ensure that the views and dashboards were displaying the changed data accurately.
- Utilized Agile Scrum Methodology to help manage and organize a team of 4 developers with regular code review sessions.
- Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
Environment: HDFS, Hive, Sqoop, Apache Spark, Tableau, Elastic Search, Kibana, Cloudera CDH 5.x, AWS
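A scheduled Sqoop import like the one described above can be sketched as a small helper that assembles the command line for a cron or Oozie job. Connection details, table and directory names are hypothetical, and the command is only printed here, not executed:

```python
import shlex

def build_sqoop_import(jdbc_url, table, target_dir, num_mappers=4):
    """Assemble a Sqoop import command for one table. In a scheduled
    job this list would be handed to subprocess.run()."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(num_mappers),
    ]

# Hypothetical connection details, for illustration only.
cmd = build_sqoop_import("jdbc:oracle:thin:@dbhost:1521/orcl",
                         "CLAIMS", "/data/raw/claims")
print(shlex.join(cmd))
```

Keeping the command assembly in one function makes it easy to loop over many source tables with the same scheduling wrapper.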
- Loaded data from different data sources (Teradata and DB2) into HDFS using Sqoop, and from there into partitioned Hive tables.
- Developed Hive UDFs to bring all customer email IDs into a structured format.
- Developed bash scripts to fetch the Tlog files from the FTP server and process them for loading into Hive tables.
- Used Storm and Kafka queues to send push messages to mobile devices.
- Overwrote the Hive data with HBase data daily to get fresh data every day, and used Sqoop to load data from DB2 into the HBase environment.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala; good experience using the Spark shell and Spark Streaming.
- Designed, developed and maintained Big Data streaming and batch applications using Storm.
- Experienced in Core Java with a strong understanding of multithreading, collections, concurrency and exception handling, as well as object-oriented analysis, design and development.
- Imported millions of structured records from relational databases using Sqoop, processed them with Spark and stored the data in HDFS in CSV format.
- Created Hive, Phoenix, HBase tables and HBase integrated Hive tables as per the design using ORC file format and Snappy compression.
- Developed UDFs in Spark using both DataFrames/SQL and RDDs for data aggregation queries, writing the results back to OLTP systems through Sqoop.
- All bash scripts were scheduled using the ResourceManager scheduler.
- Worked on Sequence files, RC files, Map side joins, bucketing, partitioning for Hive performance enhancement and storage improvement.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
Environment: Hadoop, HDFS, Sqoop, Hive, HBase, Oozie, Spark, SQL, Java, Tableau, Eclipse.
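The Hive email-normalization UDF described in this role can also be sketched as a Python streaming script usable via Hive's TRANSFORM clause; the field layout and validation rules below are assumptions for illustration:

```python
import re
import sys

# Deliberately simple email pattern; production rules would be stricter.
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")

def normalize_email(raw):
    """Lowercase and trim an email; return '' if it does not parse."""
    email = raw.strip().lower()
    return email if EMAIL_RE.match(email) else ""

def transform(stream=sys.stdin):
    """Hive streams rows as tab-separated lines on stdin, e.g.
    SELECT TRANSFORM(id, email) USING 'normalize.py' AS (id, email)."""
    for line in stream:
        cust_id, email = line.rstrip("\n").split("\t", 1)
        print(f"{cust_id}\t{normalize_email(email)}")

print(normalize_email("  John.Doe@Example.COM "))  # john.doe@example.com
```

The same logic could equally live in a Java class extending Hive's UDF base class; the streaming form just avoids a build step.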
Confidential, Atlanta, GA
- Ingested historical medical claims data into HDFS from different data sources, including databases and flat files, and processed it using Spark, Scala and Python.
- Hive external tables were used for raw data and managed tables were used for intermediate tables.
- Developed Hive Scripts (HQL) for automating the joins for different sources.
- Responsible for data analysis, validation, cleansing, collection and reporting using R.
- Worked with GIT, Jira and Tomcat in Linux/Windows Environment.
- Experienced in shell scripting and automation using crontab.
- Developed the Shell scripts for batch reports based on the given requirements.
- Coded using Teradata analytical functions and Teradata BTEQ SQL; wrote UNIX scripts to validate, format and execute the SQL.
- Developed interactive dashboards, created various Ad hoc reports for users in Tableau by connecting various data sources.
- Implemented classification using supervised learning algorithms such as Logistic Regression, Decision Trees, KNN and Naive Bayes.
- Performed exploratory data analysis and developed interactive dashboards using Tableau.
- Involved in resolving defects found in testing and production support.
- Wrote Sub Queries, Stored Procedures, Triggers, Cursors, and Functions on Oracle database.
Environment: Hadoop, Hive, MapReduce, HDFS, Sqoop, HBase, Pig, Oozie, Java, Bash, MySQL, Oracle, Windows and Linux.
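The KNN classification mentioned in this role can be illustrated with a toy pure-Python version; the actual work would use library implementations, and the data points here are made up:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k nearest training
    points (Euclidean distance). `train` is a list of
    (feature_vector, label) pairs."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Made-up 2-D points: class "a" clusters near the origin, "b" near (5, 5).
train = [((0, 0), "a"), ((1, 0), "a"), ((0, 1), "a"),
         ((5, 5), "b"), ((6, 5), "b"), ((5, 6), "b")]

print(knn_predict(train, (0.5, 0.5)))  # a
print(knn_predict(train, (5.5, 5.5)))  # b
```

Choosing an odd `k` avoids ties in two-class voting; in practice features are scaled first so no single dimension dominates the distance.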
- Involved in all the Functional requirements gathering sessions with the Business Analysts.
- Involved in writing Test Cases, Testing Application and Implementation plan.
- Designed and developed the Requirements Specification document, and modeled business processes, data structures, use case diagrams, class diagrams, activity diagrams and sequence diagrams using UML.
- Worked with the Java Collections API for handling data objects between the business layers and the frontend.
- Designed and developed user interface screens using HTML, DHTML, JSP and CSS.
- Deployed applications on JBoss server.
- Involved in the Development of Spring Framework Controllers.
- Designed CSS and XSLT style sheets for transforming XML data to PDF.
- Used XML for data transfer between various parts of the application.
- Developed web based email client to send emails from application using Java Mail API.
Environment: Core Java, HTML, JavaScript, JDBC, Servlets, JSP, EJB, JMS, JBoss, Tomcat, SQL Server, Eclipse