- Over 5 years of experience in analysis, design, development, testing, implementation, maintenance, and enhancement of various IT projects, including Big Data experience implementing end-to-end Hadoop solutions.
- Experience in working in environments using Agile (SCRUM), RUP and Test-Driven development methodologies.
- Good knowledge of Amazon AWS services such as EMR and EC2, which provide fast and efficient processing.
- Extensive experience in installing, configuring, and using ecosystem components like Hadoop MapReduce, HDFS, Sqoop, Pig, Hive, Impala, and Spark.
- Expertise in using J2EE application servers such as IBM WebSphere and JBoss, and web servers like Apache Tomcat.
- Experience with different Hadoop distributions like Cloudera (CDH3 and CDH4) and Hortonworks (HDP).
- Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java, and in extending Hive and Pig core functionality with custom UDFs.
- Experienced in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
- Diverse experience utilizing Java tools in business, web, and client-server environments, including Java Platform, J2EE, EJB, JSP, Java Servlets, Struts, and Java Database Connectivity (JDBC) technologies.
- Good understanding of integrating various data sources like RDBMS, spreadsheets, text files, JSON, and XML files.
- Good working experience in using Sqoop to import data from RDBMS into HDFS and vice versa.
- Implemented Service Oriented Architecture (SOA) using Web Services and JMS (Java Message Service).
- Implemented J2EE Design Patterns such as MVC, Session Façade, DAO, DTO, Singleton Pattern, Front Controller and Business Delegate.
- Experienced in developing web services with XML-based technologies such as SOAP, Axis, UDDI, and WSDL.
- Experienced in MVC (Model View Controller) architecture and various J2EE design patterns like Singleton and Factory.
- Extensive experience in loading and analyzing large datasets with the Hadoop framework (MapReduce, HDFS, Pig, Hive, Flume, Sqoop, Spark, Impala) and NoSQL databases like MongoDB, HBase, and Cassandra.
- Solid understanding of Hadoop MRv1 and MRv2 (YARN) architecture.
- Good knowledge of SQL and PL/SQL for writing stored procedures and functions, and of writing unit test cases using JUnit.
- Extensive experience in Extraction, Transformation and Loading (ETL) of data from multiple sources into Data Warehouse and Data Mart.
- Strong knowledge of object-oriented analysis and design, UML modeling, classic design patterns, and J2EE patterns.
- Hands-on experience working with databases like Oracle 12g, SQL Server 2010, and MySQL.
- Hands-on experience with the modern UI stack, including HTML, CSS, mobile-friendly responsive design, and user-centric design.
- Experience in developing web-based enterprise applications using Java, J2EE, Servlets, JSP, EJB, JDBC, Spring IOC, Spring AOP, Spring MVC, Spring Web Flow, Spring Boot, Spring Security, Spring Batch, Spring Integration, Web Services (SOAP and REST), and ORM frameworks like Hibernate.
- Expertise in using XML-related technologies such as XML, DTD, XSD, XPath, XSLT, DOM, SAX, JAXP, JSON, and JAXB.
- Experience in using ANT and Maven to build and deploy projects to servers, and in using JUnit and log4j for debugging.
Hadoop Technologies: Hadoop, HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Spark, Kafka, Storm, Drill, ZooKeeper, and Oozie
Languages: HTML5, DHTML, WSDL, CSS3, C, C++, XML, R/R Studio, SAS, Schemas, JSON, Ajax, Java, Scala, Python, Shell Scripting
Big Data Platforms: Hortonworks, Cloudera.
NoSQL Databases: Cassandra, HBase, MongoDB, MariaDB
Business Intelligence Tools: Tableau Server, Tableau Reader, Tableau, Splunk, SAP Business Objects, QlikView, Amazon Redshift, Azure Data Warehouse
Development Tools: Confidential SQL Studio, IntelliJ, Eclipse, NetBeans.
Development Methodologies: Agile/Scrum, UML, Design Patterns, Waterfall.
Build Tools: Jenkins, Toad, SQL Loader, Maven, ANT, RSA, Control-M, Oozie, Hue, SOAP UI
Reporting Tools: MS Office (Word/Excel/PowerPoint/ Visio/Outlook), Crystal Reports XI, SSRS, Cognos 7.0/6.0.
Databases: Confidential SQL Server 2008, 2010/2012, MySQL 4.x/5.x, Oracle 11g, 12c, DB2, Teradata, Netezza
Operating Systems: All versions of Windows, UNIX, LINUX, Macintosh HD, Sun Solaris
Confidential, Redmond, WA
- Responsible for installation and configuration of Hive, Pig, HBase, and Sqoop on the Hadoop cluster, and created Hive tables to store the processed results in a tabular format.
- Configured Spark Streaming to receive real-time data from Apache Kafka and store the streamed data in HDFS using Scala.
- Developed Sqoop scripts to enable interaction between Hive and the Vertica database.
- Developed solutions to load data into HDFS and analyzed it using MapReduce, Pig, and Hive to produce summary results for downstream systems.
- Built servers on AWS: imported volumes, launched EC2 instances, and created security groups, auto-scaling, load balancers, Route 53, SES, and SNS in the defined virtual private connection.
- Wrote MapReduce code to process and parse data from various sources and stored the parsed data in HBase and Hive using HBase-Hive integration.
- Streamed AWS log groups into a Lambda function to create ServiceNow incidents.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data, and analyzed them by running Hive queries and Pig scripts.
- Created Managed tables and External tables in Hive and loaded data from HDFS.
- Developed Spark code using Scala and Spark SQL for faster processing and testing, and ran complex HiveQL queries on Hive tables.
- Scheduled several time-based Oozie workflows by developing Python scripts.
- Developed Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, UNION, and SPLIT to extract data from data files and load it into HDFS.
- Exported data to RDBMS servers using Sqoop and processed it for ETL operations.
- Worked with S3 buckets on AWS to store CloudFormation templates and created EC2 instances on AWS.
- Designed an ETL data pipeline to ingest data from RDBMS sources into Hadoop using shell scripts, Sqoop, package, and MySQL.
- Optimized Hive tables using techniques like partitioning and bucketing to provide better performance.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
- Implemented Hadoop on AWS EC2 using a few instances to gather and analyze data log files.
- Worked with Spark and Spark Streaming, creating RDDs and applying transformations and actions.
- Created partitioned tables and loaded data using both static and dynamic partitioning.
- Developed custom Apache Spark programs in Scala to analyze and transform unstructured data.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Oracle into HDFS using Sqoop.
- Used Kafka for publish-subscribe messaging as a distributed commit log and gained experience with its speed, scalability, and durability.
- Followed a Test-Driven Development (TDD) process, with extensive experience in Agile and Scrum methodologies.
- Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala.
- Scheduled MapReduce jobs in the production environment using the Oozie scheduler.
- Involved in cluster maintenance, cluster monitoring, and troubleshooting; managed and reviewed data backups and log files.
- Designed and implemented MapReduce jobs to support distributed processing using Java, Hive, and Apache Pig.
- Analyzed the Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase, and Sqoop.
- Improved performance by tuning Hive and MapReduce.
- Researched, evaluated, and utilized modern technologies, tools, and frameworks around the Hadoop ecosystem.
Confidential, Salt Lake, UT
- Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using CDH3 and CDH4 distributions.
- Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
- Imported data from Teradata using Sqoop with the Teradata connector.
- Created a data pipeline of MapReduce programs using chained mappers.
- Implemented optimized joins across different data sets to get top claims by state using MapReduce.
- Visualized HDFS data for the customer using a BI tool with the help of the Hive ODBC driver.
- Worked on a POC of Talend integration with Hadoop, creating Talend jobs to extract data from Hadoop.
- Used Sqoop to import data from MySQL into HDFS on a regular basis.
- Worked on social media (Facebook, Twitter, etc.) data crawling using Java and R, with MongoDB for unstructured data storage.
- Integrated the Quartz scheduler with Oozie workflows to get data from multiple data sources in parallel using a fork.
- Created partitions and buckets based on state for further processing using bucket-based Hive joins.
- Experienced with different kinds of compression techniques such as LZO, GZip, and Snappy.
- Created Hive Generic UDFs to process business logic that varies based on policy.
- Imported relational database data using Sqoop into Hive dynamic partition tables via staging tables.
- Worked on custom Pig loaders and storage classes to work with a variety of data formats such as JSON and XML.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
- Developed unit test cases using the JUnit, EasyMock, and MRUnit testing frameworks.
- Experienced in monitoring the cluster using Cloudera Manager.
Environment: Hadoop, HDFS, HBase, Spark, MapReduce, Teradata, MySQL, Java, Hive, Pig, Sqoop, Flume, Oozie, SQL, Cloudera Manager.
Confidential, Columbus, OH
- Provided an application demo to the client by designing and developing prototype screens for a search engine, report analysis trends, and application administration using AngularJS and Bootstrap JS.
- Took ownership of the complete application design for the Java part and Hadoop integration.
- Apart from the normal requirement gathering, participated in a business meeting with the client to gather security requirements.
- Assisted the architect in analyzing the existing and future systems.
- Prepared design blueprints and application flow documentation.
- Experienced in managing and reviewing Hadoop log files.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Responsible for managing data coming from different sources and applications.
- Supported MapReduce programs running on the cluster.
- Responsible for working with message broker systems such as Kafka.
- Extracted data from mainframes, fed it to Kafka, and ingested it into HBase to perform analytics.
- Wrote an event-driven link tracking system to capture user events and feed them to Kafka for pushing to HBase.
- Created MapReduce jobs to extract content from HBase and configured them in Oozie workflows to generate analytical reports.
- Developed JAX-RS web services using the Apache CXF framework to fetch data from Solr when users searched for documents.
- Participated in Solr schema design and ingested data into Solr for indexing.
- Wrote MapReduce programs to organize the data and convert it into a client-specified format suitable for analytics.
- Hands-on experience in writing Python scripts to optimize performance.
- Implemented Storm builder topologies to perform cleansing operations before moving data into Cassandra.
- Extracted files from Cassandra through Sqoop, placed them in HDFS, and processed them.
- Implemented Bloom filters in Cassandra using keyspace creation.
- Involved in writing Cassandra CQL statements.
- Good hands-on experience in developing concurrency using Spark and Cassandra together.
- Involved in writing Spark applications using Scala.
- Hands-on experience in creating RDDs, transformations, and actions while implementing Spark applications.
- Good knowledge of creating DataFrames using Spark SQL.
- Involved in loading data into the Cassandra NoSQL database.
- Implemented record-level atomicity on writes using Cassandra.
- Wrote Pig scripts to query and process the datasets to identify trend patterns by applying client-specific criteria, and configured Oozie workflows to run the jobs along with the MR jobs.
- Stored the results derived from analysis in HBase and made them available for ingestion into Solr for indexing.
- Involved in integration of the Java search UI, Solr, and HDFS.
- Involved in code deployments using the Jenkins continuous integration tool.
- Documented all the challenges and issues involved in dealing with the security system and implemented best practices.
- Created project structures and configurations according to the project architecture and made them available to junior developers to continue their work.
- Handled the onsite coordinator role to deliver work to offshore.
- Involved in code reviews and application lead support activities.
Environment: Cassandra, Spring 3.2, RESTful services using CXF web services framework, Spring Data, Solr 5.2.1, Pig, Hive, Apache Avro, MapReduce, Sqoop, ZooKeeper, SVN, Jenkins, Windows AD, Windows KDC, Hortonworks distribution of Hadoop 2.3, YARN, Ambari
- Involved in gathering system requirements for the application, worked with the business team to review the requirements, and went through the Software Requirement Specification and Architecture documents.
- Involved in intensive User Interface (UI) operations and client-side validations using the AJAX toolkit.
- Used SOAP to expose company applications as a Web Service to outside clients.
- Used a log package for debugging.
- Used ClearCase for version control.
- Ensured adherence to delivery schedules and quality processes on projects.
- Used web services to create the rate summary, used WSDL and SOAP messages to get insurance plans from different modules, and used XML parsers for data retrieval.
- Developed business components and integrated them using Spring features such as dependency injection and autowiring of components such as DAO layers and service proxy layers.
- Used Spring AOP to implement distributed declarative transactions throughout the application.
- Wrote Hibernate configuration XML files to manage data persistence.
- Used TOAD to generate SQL queries for the applications, and to see the reports from log tables.
- Involved in migrating data from Excel, flat files, Oracle, and XML files to SQL Server using the BCP and DTS utilities.
Environment: Java/J2EE, MVC architecture with CICS interaction, HTML, Axis, SOAP, Servlets, Web Services, RESTful Web Services, Sybase, Spring, DB2, RAD, Rational ClearCase, WCF, AJAX, Toad.