- 9+ years of IT experience in software design, implementation, and development, including 5+ years with Linux and Big Data Hadoop ecosystem components such as MapReduce, Sqoop, Flume, Kafka, Pig, Hive, Spark, Storm, HBase, Oozie, and Zookeeper.
- Good experience with the Hadoop framework and related technologies such as HDFS, MapReduce, Pig, Hive, HBase, Sqoop, and Oozie.
- Hands-on experience with data extraction, transformation, and load in Hive, Pig, and HBase.
- Experience in the successful implementation of an ETL solution between an OLTP and an OLAP database in support of Decision Support Systems, with expertise in all phases of the SDLC.
- Experience creating DStreams from sources such as Flume and Kafka and performing Spark transformations and actions on them.
- Experience integrating Apache Kafka with Apache Storm and creating Storm data pipelines for real-time processing.
- Integrated BI tools (Spotfire, Crystal Reports, Lumira, Tableau) with Hadoop.
- Worked on improving the performance of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, RDDs, and Spark on YARN.
- Delivery experience with major Hadoop ecosystem components such as Pig, Hive, Spark, Kafka, Elasticsearch, and HBase, with monitoring through Cloudera Manager. Extensive experience using Sqoop to import data into HDFS from RDBMS and vice versa.
- Procedural knowledge in cleansing and analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java.
- Experience building data pipelines using Kafka and Akka to handle terabytes of data.
- Hands-on experience using Solr to index files directly from HDFS for both structured and semi-structured data.
- Strong experience with RDBMS technologies such as MySQL, Oracle, Postgres, and DB2.
- Training and knowledge in Mahout and Spark MLlib for data classification, regression analysis, recommendation engines, and anomaly detection.
- Experienced in developing Spark applications using the Spark Core, Spark SQL, and Spark Streaming APIs.
- Involved in configuring and working with Flume to load data from multiple sources directly into HDFS.
- Hands-on experience with and knowledge of Lumira, regex, sed, Maven, Log4j, JUnit, and Ant.
- Hands-on experience with Hortonworks and the Cloudera Distribution of Hadoop (CDH).
- Worked with Apache NiFi flows to convert raw XML data into JSON and Avro.
- Experience in understanding security requirements for Hadoop and integrating with Kerberos authentication and authorization infrastructure.
- Experience with predictive intelligence and smooth maintenance in Spark Streaming using Conviva and Spark MLlib.
- Experience configuring deployment environments to host the application using Jetty server and WebLogic 10, with a Postgres database at the back end.
- Involved in installing the Cloudera distribution of Hadoop on Amazon EC2 instances.
- Expertise with the Spark engine, creating batch jobs with incremental load through HDFS/S3, Kinesis, sockets, AWS, etc.
- Imported data from MySQL into S3 buckets on a regular basis using Sqoop.
- Hands-on experience using BI tools such as Splunk/Hunk.
- Worked with AWS services such as EMR and EC2 for fast and efficient processing of Big Data.
- Experience with object-oriented programming (OOP) concepts using Python, C++, and PHP.
- Experience deploying Big Data solutions and the underlying Hadoop cluster infrastructure using Cloudera, MapR, and Hortonworks distributions.
- Experience importing and exporting data between HDFS and relational database systems using Sqoop.
- Analyzed SQL scripts and designed solutions for implementation in PySpark.
- Experience with MPP databases such as HP Vertica and Impala.
- Hands-on experience with SVN and GitHub.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
- Hands-on experience on projects following Agile and Waterfall methodologies.
- Experienced in integrating Hadoop clusters with the Spark engine to perform batch and GraphX operations.
- Experienced in integrating various data sources such as Java, RDBMS, shell scripting, spreadsheets, and text files.
- Performed analytics in Hive using various file formats such as JSON, Avro, ORC, and Parquet.
- Working knowledge of databases such as Oracle 8i/9i/10g, Microsoft SQL Server, DB2, and Netezza.
- Experience with NoSQL databases such as HBase, Cassandra, Redis, and MongoDB.
- Experience and hands-on knowledge of Akka and the Lift framework.
- Experience in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology; good knowledge of J2EE design patterns and core Java design patterns.
Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper, Spark, Ambari, Mahout, MongoDB, Cassandra, Avro, Storm, Parquet and Snappy.
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Apache
NoSQL Databases: Cassandra, MongoDB and HBase
Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB and Struts
XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM), JAXB
Development Methodology: Agile, Waterfall
Development / Build Tools: Eclipse, Ant, Maven, IntelliJ, JUnit and Log4j
Frameworks: Struts, Spring and Hibernate
App/Web servers: WebSphere, WebLogic, JBoss and Tomcat
DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle
RDBMS: Teradata, Oracle 9i, 10g, 11i, MS SQL Server, MySQL and DB2
Operating systems: UNIX, Linux, Mac OS and Windows variants
Data analytical tools: R and MATLAB
ETL Tools: Talend, Pentaho
Confidential, McLean, VA
Sr. Hadoop Developer
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in the end-to-end process of Hadoop jobs that used technologies such as Sqoop, Pig, Hive, Spark, and shell scripts (for job scheduling). Extracted and loaded data into the data lake environment.
- Developed Spark jobs for data cleansing and data processing of flat files.
- Worked on job management using the Fair Scheduler and developed job processing scripts using Oozie workflows.
- Worked with different file formats such as TEXTFILE, SEQUENCEFILE, AVROFILE, ORC, and PARQUET for Hive querying and processing.
- Used Spark Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra.
- Developed Spark applications in Scala and built them using SBT.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Developed Scala scripts and UDAFs using DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
- Extended Hive and Pig core functionality with custom User Defined Functions (UDF), User Defined Table-Generating Functions (UDTF), and User Defined Aggregating Functions (UDAF) written in Python.
- Tuned the performance of Spark applications by setting the right batch interval, the correct level of parallelism, and memory settings.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Handled large datasets using partitions, Spark in-memory capabilities, broadcasts in Spark, and efficient joins and transformations during the ingestion process itself.
- Designed, developed, and maintained data pipelines in a Hadoop and RDBMS environment with both traditional and non-traditional source systems, using RDBMS and NoSQL data stores for data access and analysis.
- Worked with Hadoop 2.x and Spark 2.x (Python and Scala).
- Worked on a POC comparing the processing time of Impala with that of Apache Hive for batch applications, with a view to adopting Impala in the project.
- Worked on a cluster of 400 nodes.
- Worked extensively with Sqoop for importing metadata from Oracle.
- Installed and configured Apache Hadoop on Amazon AWS (EC2).
- Involved in creating Hive tables and loading and analyzing data using Hive queries.
- Implemented schema extraction for Parquet and Avro file formats in Hive.
- Implemented partitioning, dynamic partitions, and buckets in Hive.
- Used Talend Open Studio to retrieve the data.
- Worked on continuous integration of the application using Jenkins.
- Used reporting tools such as Tableau to connect to Hive and generate daily reports.
- Collaborated with the infrastructure, network, database, application and BA teams to ensure data quality and availability.
Environment: Hadoop YARN, Spark Core, Spark Streaming, Spark SQL, Scala, Kafka, Hive, HBase, Pig, Sqoop, MapR, Amazon AWS, Impala, Cassandra, Tableau, Oozie, Jenkins, Talend, Cloudera, Oracle 12c, Red Hat Linux, Python.
Confidential, Jacksonville, Florida
- Responsible for building scalable distributed data solutions using Hadoop.
- The project ingested data generated by sensors recording car activity; Kafka collected the data from online aggregators into HDFS.
- Created Kafka producers and consumers for Spark Streaming, which receives data from the patients' different learning systems.
- Spark Streaming collects this data from Kafka in near-real-time and performs necessary transformations and aggregation on the fly to build the common learner data model.
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
- Used AWS to spin up EMR clusters to process large volumes of data stored in S3 and push it to HDFS.
- Used Spark SQL to access Hive tables from Spark for faster data processing.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Interacted with Cloudera support, logged issues in the Cloudera portal, and fixed them per the recommendations.
- Upgraded the Cloudera Hadoop ecosystems in the cluster using Cloudera distribution packages.
- Debugged and resolved major issues with Cloudera Manager by working with the Cloudera team.
- Used Apache Oozie for scheduling and managing Hadoop jobs. Extensive experience with Amazon Web Services (AWS).
- Developed a Python/Django application for Google Analytics aggregation and reporting.
- Developed and updated social media analytics dashboards on a regular basis.
- Good understanding of NoSQL databases such as HBase, Cassandra and MongoDB.
- Supported MapReduce programs running on the cluster and wrote custom MapReduce programs in Java for data processing.
- Worked with Apache NiFi for data ingestion. Triggered shell scripts and scheduled them using NiFi.
- Monitored all NiFi flows to receive notifications when no data passes through a flow for longer than a specified time.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Worked on migrating Pig scripts and MapReduce programs to the Spark DataFrames API and Spark SQL to improve performance.
- Involved in moving all log files generated from various sources to HDFS through Flume for further processing, and processed the files using Piggybank.
- Used Flume to collect, aggregate, and store web log data from sources such as web servers and mobile and network devices, and pushed it into HDFS. Used Flume to stream log data from various sources.
- Used the Avro file format compressed with Snappy in intermediate tables for faster data processing. Used the Parquet file format for published tables and created views on the tables.
- Created Sentry policy files to give business users access to the required databases and tables from Impala in the dev, test, and prod environments.
- Implemented test scripts to support test-driven development and continuous integration.
- Good understanding of ETL tools and how they can be applied in a Big Data environment.
- Involved in Agile methodologies, daily Scrum meetings, Sprint planning.
Environment: Hadoop, MapReduce, Cloudera, Spark, Kafka, HDFS, Hive, Pig, Oozie, Scala, Eclipse, Flume, Oracle, UNIX Shell Scripting.
Confidential, Atlanta, GA
- Implemented an authentication and authorization service using the Kerberos authentication protocol.
- Worked with different teams to install operating system and Hadoop updates, patches, and Cloudera version upgrades as required.
- Worked on analyzing the Hadoop cluster and different big data analytic tools including Pig, Sqoop, Hive, Spark, and Zookeeper.
- Developed a data pipeline using Flume, Pig, and Java MapReduce to ingest claim data into HDFS for analysis.
- Experience analyzing log files for Hadoop and ecosystem services and finding root causes.
- Experience monitoring and troubleshooting issues with hosts in the cluster regarding memory, CPU, OS, storage, and network.
- Configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
- Experience scheduling jobs through Oozie.
- Performed HDFS cluster support and maintenance tasks such as adding and removing nodes without affecting running nodes or data.
- Involved in Hadoop cluster administration, including adding and removing cluster nodes, cluster capacity planning, performance tuning, and cluster monitoring.
- Worked with big data developers, designers, and scientists in troubleshooting MapReduce job failures and issues with Hive and Pig.
- Involved in the setup and benchmarking of Hadoop HBase clusters for internal use.
- Developed Pig Latin scripts to extract data from the web server output files to load into HDFS.
- Exported the result set from Hive to MySQL using shell scripts.
- Actively involved in code review and bug fixing for improving the performance.
- Successful in creating and implementing complex code changes.
Environment: Hadoop, Cloudera 5.4, Java, HDFS, MapReduce, Pig, Hive, Impala, Sqoop, Flume, Kafka, Kerberos, Sentry, Oozie, HBase, SQL, Spring, Linux, Eclipse.
- Worked as a software developer for ECIL, developing a supply chain management system.
- The application involved tracking invoices, raw materials and finished products.
- Gathered user requirements and specifications.
- Developed and programmed the required classes in Java to support the user account module.
- Developed servlets to handle requests, perform server-side validation, and generate results for the user.
- Used JavaScript for client-side validations.
- Developed SQL queries to store and retrieve data from the database and used PL/SQL.
- Used the Struts framework to maintain MVC and created action forms, action mappings, DAOs, and application properties for internationalization.
- Used the Struts Validation framework for server-side business validation.
- Involved in developing business components using EJB Session Beans and persistence using EJB Entity beans.
- Involved in managing the Business Delegate to maintain decoupling between the presentation and business layers.
- Used JMS for asynchronous messaging.
- Used the Eclipse IDE to develop the application.
- Involved in fixing defects and tracking them using QC; provided support, maintenance, and customization.
- Developed customized reports and performed unit testing using JUnit.
- Used the JDBC interface to connect to the database.
- Performed User Acceptance Testing.
- Deployed and tested the web application on WebLogic application server.
- Involved in various SDLC phases like Design, Development and Testing.
- Involved in developing JSP pages using Struts custom tags, jQuery and Tiles Framework.
- Used various core Java concepts such as exception handling and the Collections API to implement features and enhancements.
- Developed server-side servlet components for the application.
- Involved in coding, maintaining, and administering Servlets and JSP components to be deployed on a WebSphere application server.
- Used automated test scripts and tools to test the application in various phases. Coordinated with Quality Control teams to fix issues that were identified.
- Implemented Hibernate ORM to map relational data directly to Java objects.
- Worked with complex SQL queries, functions, and stored procedures.
- Involved in developing the Spring Web MVC framework for the portals application.
- Implemented the logging mechanism using the Log4j framework.
- Developed REST APIs and web services.
- Wrote test cases in JUnit for unit testing of classes.
- Used Maven to build the J2EE application.
- Used SVN to track and maintain the different versions of the application.
- Involved in maintenance of different applications with onshore team.
- Good working experience processing claims in Tapestry.
- Working experience with professional billing claims.
Environment: Java, Spring Framework, Struts, Hibernate, RAD, SVN, Maven, WebSphere Application Server, Web Services, Oracle Database 11g, IBM MQ, JMS, HTML, JavaScript, XML, CSS, REST API.