- Seven years of professional IT experience, including four years of comprehensive experience with Apache Hadoop ecosystem components and Spark Streaming.
- Over four years of development experience with Big Data Hadoop ecosystem components and related tools, covering data ingestion, import/export, storage, querying, pre-processing, and analysis of big data.
- Working expertise in handling terabytes of structured and unstructured data in large cluster environments.
- Experience in using SDLC methodologies like Waterfall, Agile Scrum, and TDD for design and development.
- Expertise in implementing Spark modules and tuning its performance.
- Experienced in performance tuning of Spark applications through resource allocation techniques and transformations that reduce shuffles and improve data locality.
- Expertise in Kerberos Security Implementation and securing the cluster.
- Expertise in creating Hive internal/external tables and views using a shared metastore, writing HiveQL scripts, performing data transformation and file processing, and building analytics using Pig Latin scripts.
- Expertise in writing custom UDFs in Pig & Hive Core Functionality.
- Developed, deployed and supported several Map Reduce applications in Java to handle different types of data.
- Worked with various file formats and compression codecs such as Avro, Snappy, and LZO.
- Hands-on experience with Avro and Parquet file formats, following best practices and improving performance using partitioning, bucketing, map-side joins, and indexes.
- Expert in implementing advanced procedures such as text analytics and processing, and in implementing streaming APIs using the in-memory computing capabilities of Apache Spark, written in Scala and Python.
- Experience in data load management, importing and exporting data between HDFS and relational and non-relational database systems using Sqoop, Flume, and Apache NiFi, with efficient column mappings to maintain uniformity.
- Exported data to various databases such as Teradata (Sales Data Warehouse), SQL Server, and Cassandra using Sqoop.
- Experienced in creating shell scripts to push data loads from various sources from the edge nodes onto the HDFS.
- Experienced in performing code reviews, involved closely in smoke testing sessions, retrospective sessions.
- Experience in scheduling and monitoring jobs using Oozie and Crontab.
- Experienced in Microsoft Business Intelligence tools, developing SSIS (Integration Service), SSAS (Analysis Service) and SSRS (Reporting Service), building Key Performance Indicators and OLAP cubes.
- Hands-on experience creating cubes for various client reports; configured data sources and data source views, dimensions, cubes, measures, partitions, KPIs, and MDX queries.
- Good exposure to star and snowflake schemas and data modeling across multiple data warehouse projects.
- Hands-on experience with the reporting tool Tableau, creating attractive dashboards and worksheets.
- Extensive work experience with Java/J2EE technologies such as Servlets, JSP, EJB, JDBC, JSF, Struts, Spring, SOA, AJAX, XML/XSL, Web Services (REST, SOAP), UML, Design Patterns, and XML Schemas.
- Strong experience in design and development with relational database concepts across multiple RDBMSs, including Oracle 10g, MySQL, MS SQL Server, and PL/SQL.
- Experience in Java, J2EE, Web Services, SOAP, HTML, and XML-related technologies.
- Have closely worked with the technical teams, business teams and product owners.
- Strong analytical and problem-solving skills and ability to follow through with projects from inception to completion.
- Ability to work effectively in cross-functional team environments, excellent communication and interpersonal skills.
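The Hive partitioning, bucketing, and map-side join techniques listed above can be sketched in plain Python. This is a hypothetical, simplified stand-in (the hash function and table layout are invented for illustration, not Hive's actual implementation): rows hashed on the join key land in matching buckets, so two identically bucketed tables can be joined bucket-by-bucket without a shuffle.

```python
# Sketch of Hive-style bucketing: rows with the same key hash land in the
# same bucket, so two tables bucketed identically on the join key can be
# joined bucket-by-bucket on the map side without a shuffle.
# Simplified, illustrative stand-in for Hive's real bucketing hash.

def bucket_for(key, num_buckets):
    """Assign a row key to a bucket, Hive-style: hash(key) mod buckets."""
    return hash(key) % num_buckets

def bucketize(rows, key_fn, num_buckets):
    """Split rows into num_buckets lists by the hash of their join key."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_for(key_fn(row), num_buckets)].append(row)
    return buckets

def bucketed_join(left, right, key_fn, num_buckets):
    """Join two bucketed tables bucket-by-bucket (map-side join sketch)."""
    lb = bucketize(left, key_fn, num_buckets)
    rb = bucketize(right, key_fn, num_buckets)
    out = []
    for lrows, rrows in zip(lb, rb):
        # Build a small in-memory index per bucket, like the hash table a
        # map-side join holds for the smaller table's bucket.
        index = {}
        for r in rrows:
            index.setdefault(key_fn(r), []).append(r)
        for l in lrows:
            for r in index.get(key_fn(l), []):
                out.append((l, r))
    return out
```

Because each bucket pair is independent, this is also why bucketed joins parallelize cleanly across map tasks.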
Hadoop/Big Data Technologies: HDFS, MapReduce, Sqoop, Flume, Pig, Hive, Oozie, Impala, ZooKeeper, Ambari, Storm, Spark, Kafka, Apache NiFi
NoSQL Databases: HBase, Cassandra, MongoDB
Monitoring and Reporting: Tableau, Custom Shell Scripts
Hadoop Distribution: Hortonworks, Cloudera, MapR
Build Tools: Maven, SQL Developer
Java Technologies: Servlets, JavaBeans, JDBC, Spring, Hibernate, SOAP/REST services
Databases: Oracle, MySQL, MS SQL Server, Vertica, Teradata
Analytics Tools: Tableau, Microsoft SSIS, SSAS and SSRS
IDE/Dev Tools: Eclipse 3.5, NetBeans, MyEclipse, Oracle JDeveloper 10.1.3, SOAP UI, Ant, Maven, RAD
Operating Systems: Linux, Unix, Windows 8, Windows 7, Windows Server 2008/2003
Network protocols: TCP/IP, UDP, HTTP, DNS, DHCP
Confidential, New Jersey
- Processed Big Data using a Hadoop cluster consisting of 40 nodes.
- Designed and configured Flume servers to collect data from the network proxy servers and store to HDFS.
- Loaded customer profile, spending, and credit data from legacy warehouses into HDFS using Sqoop.
- Built data pipelines using Pig and Java MapReduce to store data in HDFS.
- Applied transformations and filtered out bot traffic using Pig.
- Used Pattern matching algorithms to recognize the customer across different sources and built risk profiles for each customer using Hive and stored the results in HBase.
- Performed unit testing using MR Unit.
- Used Spark Streaming APIs to perform on-the-fly transformations and actions, building a common learner data model that consumes data from Kafka in near real time and persists it into Cassandra.
- Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark.
- Designed and developed a POC in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Consumed data from Kafka using Apache Spark.
- Performed various benchmarking steps to optimize the performance of Spark jobs and improve overall processing.
- Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive; involved in creating Hive tables, loading them with data, and writing Hive queries that invoke MapReduce jobs in the backend.
- Responsible for building scalable distributed data solutions using Hadoop.
- Installed and configured Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster.
- Set up and benchmarked Hadoop/HBase clusters for internal use.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms.
- Handled importing of data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs
Environment: Hortonworks, Apache Hadoop, Hive, Hue, Zookeeper, MapReduce, Sqoop, Crunch API, Pig 0.10 and 0.11, HCatalog, Unix, Java, JSP, Eclipse, Maven, Oracle, SQL Server, MySQL.
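The Hive/SQL-to-RDD conversion work described in this project can be illustrated without a Spark cluster: a GROUP BY / SUM query maps onto a map step that emits (key, value) pairs and a reduceByKey step that merges values per key. The following is a hypothetical, Spark-free Python sketch of that same dataflow (the `emp` data and column choices are invented for the demo).

```python
# Sketch of rewriting "SELECT dept, SUM(salary) FROM emp GROUP BY dept"
# as RDD-style transformations: map each row to a (key, value) pair,
# then reduce pairs by key. Plain-Python stand-in for Spark's
# rdd.map(...).reduceByKey(...); illustrative only.

def map_pairs(rows, key_fn, value_fn):
    """The map step: emit one (key, value) pair per input row."""
    return [(key_fn(r), value_fn(r)) for r in rows]

def reduce_by_key(pairs, reduce_fn):
    """The reduceByKey step: merge all values that share a key."""
    acc = {}
    for k, v in pairs:
        acc[k] = reduce_fn(acc[k], v) if k in acc else v
    return acc

# Hypothetical (dept, salary) rows standing in for a Hive table.
emp = [("eng", 100), ("eng", 150), ("sales", 90)]
totals = reduce_by_key(map_pairs(emp, lambda r: r[0], lambda r: r[1]),
                       lambda a, b: a + b)
```

In Spark the same shape becomes `rdd.map(lambda r: (r[0], r[1])).reduceByKey(lambda a, b: a + b)`, which is the usual translation target for aggregate Hive queries.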
- Developed parser and loader MapReduce applications to retrieve data from HDFS and store it in HBase and Hive.
- Imported unstructured data into HDFS using Flume.
- Used Oozie to orchestrate the MapReduce jobs that extract data in a timely manner.
- Wrote MapReduce Java programs to analyze log data for large-scale data sets.
- Used the HBase Java API within Java applications.
- Automated jobs that extract data from sources such as MySQL and push the result sets into HDFS on the Hortonworks distribution.
- Implemented MapReduce jobs using the Java API, Pig Latin, and HiveQL.
- Participated in the setup and deployment of the Hortonworks Hadoop cluster.
- Hands on design and development of an application using Hive (UDF).
- Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
- Supported data analysts in running Pig and Hive queries.
- Worked extensively with HiveQL and Pig Latin.
- Imported and exported data between MySQL/Oracle and Hive using Sqoop.
- Configured an HA cluster for both manual and automatic failover.
- Excellent working knowledge of Spark Core, Spark SQL, and Spark Streaming.
- Extensive experience importing and exporting data using stream-processing platforms such as Flume and Kafka.
- Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
- Designed and built many applications to handle vast amounts of data flowing through multiple Hadoop clusters, using Pig Latin and Java-based MapReduce.
- Specified cluster size, resource pool allocation, and the Hadoop distribution in JSON specification files.
- Experience writing Solr queries to search various documents.
- Defined the data flow within the Hadoop ecosystem and directed the team in implementing it.
- Utilized Agile and Scrum Methodology to help manage and organize a team of developers with regular code review sessions.
Environment: Hadoop, Hive, Zookeeper, MapReduce, Sqoop, Pig 0.10 and 0.11, JDK 1.6, HDFS, Flume, Oozie, DB2, HBase, Mahout
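The log-analysis MapReduce work in this project follows the classic mapper/reducer shape, which can be sketched in the Hadoop Streaming style where both phases process plain text records. This is a hypothetical Python stand-in for the Java jobs described; the access-log layout and field positions are assumed for illustration.

```python
# Hadoop-Streaming-style sketch of log analysis: the mapper emits an HTTP
# status code with a count of 1 per access-log line, and the reducer sums
# counts per status. Assumes a simplified
#   ip - - [timestamp] "request" status size
# log layout; a stand-in for the Java MapReduce jobs described above.

def mapper(lines):
    """Map phase: emit (status_code, 1) for each parseable log line."""
    for line in lines:
        parts = line.split()
        # In the assumed layout, the status code is the second-to-last field.
        if len(parts) >= 2 and parts[-2].isdigit():
            yield parts[-2], 1

def reducer(pairs):
    """Reduce phase: sum counts per key, like a word-count reducer."""
    totals = {}
    for key, count in pairs:
        totals[key] = totals.get(key, 0) + count
    return totals
```

In a real Hadoop Streaming job these two functions would read stdin and write tab-separated key/value lines, with the framework sorting by key between the phases.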
Jr. Hadoop Developer
- Supported and monitored Map Reduce Programs running on the cluster.
- Evaluated business requirements and prepared detailed specifications that follow project guidelines to develop programs.
- Configured the Hadoop cluster with the NameNode, JobTracker, and TaskTrackers on slave nodes, and formatted HDFS.
- Used Oozie workflow engine to schedule and execute multiple Hive, Pig and Spark jobs by passing arguments.
- Involved in creating Hive Tables, loading data and writing Hive queries to invoke and run Map Reduce jobs in the backend.
- Designed and implemented Incremental Imports into Hive tables using Sqoop and cleaning the staging tables and files.
- Involved in collecting, pre-processing, aggregating and moving data from servers to HDFS using Apache Flume.
- Developed multiple MapReduce jobs in Python for data cleaning and preprocessing.
- Exported the result set from Hive to MySQL using Sqoop after processing the data.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Hands-on experience working with SequenceFile, Avro, and HAR file formats and compression.
- Used partitioning and bucketing in Hive tables to increase performance and parallelism.
- Experience writing MapReduce programs in Python to cleanse structured and unstructured data.
- Wrote Pig Scripts to perform ETL procedures on the data in HDFS.
- Implemented Hive and Pig scripts to analyze large data sets.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data onto HBase column families.
- Created HBase tables to store data coming from different portfolios and created Bloom filters on column families for efficiency.
- Worked on improving the performance of existing Pig and Hive Queries by optimization.
- Analyzed the partitioned and bucketed data and compute various metrics for reporting.
- Deployed and worked with Apache Solr search engine server to help speed up the search of the sales and production data.
- Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of data pertaining to different states across the northern USA.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig Latin scripts and store these intermediate results for further analytics.
- Migrated ETL jobs to Pig scripts to perform the Transformations, joins and pre-aggregations before storing data onto HDFS.
- Involved in loading data from RDBMS and web logs into HDFS using Sqoop and Flume.
- Worked on loading the data from MySQL to HBase where necessary using Sqoop
- Coordinated with offshore and onsite teams to understand requirements and prepare design documents from the requirement specifications and architectural designs.
- Worked with application teams to install Operating Systems, Hadoop updates, patches, and version upgrades as required.
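The incremental Hive imports mentioned earlier in this role follow a watermark pattern that can be sketched in plain Python: remember the highest key already loaded, pull only rows beyond it, and advance the watermark, much like Sqoop's `--incremental append` with `--check-column` and `--last-value`. This is a hypothetical, simplified version; the row shapes are invented for the demo.

```python
# Sketch of incremental-import logic in the spirit of Sqoop's
# "--incremental append --check-column id --last-value N": pull only rows
# whose check column exceeds the saved watermark, then advance it.
# Simplified, illustrative stand-in for the real Sqoop job.

def incremental_import(source_rows, last_value, check_col=0):
    """Return (new_rows, new_last_value) for rows past the watermark."""
    new_rows = [r for r in source_rows if r[check_col] > last_value]
    # Advance the watermark to the largest check-column value seen,
    # keeping the old one when this batch is empty.
    new_last = max((r[check_col] for r in new_rows), default=last_value)
    return new_rows, new_last
```

Running this repeatedly with the returned watermark loads each batch exactly once, which is why the staging tables can be truncated safely after each run.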
- Used Microsoft Visio and Rational Rose to design use case diagrams, class models, sequence diagrams, and activity diagrams for the application's SDLC process.
- Configured the project on WebSphere 6.1 application servers.
- Implemented the online application using Core Java, JDBC, JSP, Servlets, EJB 1.1, Web Services, SOAP, and WSDL.
- Used Log4J logging framework to write Log messages with various levels.
- Involved in fixing bugs and minor enhancements for the front-end modules.
- Performed live demos of functional and technical reviews.
- Supported the testing team through system, integration, and UAT phases.
- Ensured quality in deliverables to the product owners and business team.
- Conducted Design reviews and Technical reviews with other project stakeholders.
- Implemented Action classes and server-side validations for account activity, registration, and transaction history.
- Designed user-friendly GUI interfaces and web pages using HTML, CSS, Struts, and JSP.
- Wrote client-side scripts using JavaScript and server-side scripts using JavaBeans.
- Involved in various stages of Enhancements in the Application by doing the required analysis, development, and testing.
- Prepared high- and low-level design documents and implemented digital signature generation.
- Created use case, class, and sequence diagrams for application analysis and design.
- Developed logic and code for registration and validation of enrolling customers.
- Developed web-based user interfaces using the Struts framework.
- Coded and deployed JDBC connectivity in the Servlets to access the Oracle database tables on Tomcat web-server.
- Involved in integration of various Struts actions in the framework.
- Used Validation Framework for Server-side Validations
- Created test cases for the Unit and Integration testing.
- Integrated the front end with the Oracle database using the JDBC API through the JDBC-ODBC bridge driver on the server side.
Environment: Java Servlets, JSP, JavaScript, XML, HTML, UML, Apache Tomcat, JDBC, Oracle, SQL, Log4j
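The servlet-to-Oracle JDBC access in this role follows the same connect / prepare / execute / fetch pattern found in Python's DB-API. This hypothetical sqlite3 sketch mirrors a prepared-statement lookup; the in-memory database and `accounts` table are invented purely for the demo.

```python
import sqlite3

# DB-API sketch mirroring the JDBC pattern used in the servlets above:
# open a connection, run a parameterized (prepared) query, read rows.
# The in-memory database and "accounts" table exist only for this demo.

def lookup_balance(conn, account_id):
    """Parameterized SELECT: the DB-API analogue of a PreparedStatement,
    avoiding SQL injection the same way bind variables do in JDBC."""
    cur = conn.execute(
        "SELECT balance FROM accounts WHERE id = ?", (account_id,))
    row = cur.fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [(1, 250.0), (2, 80.5)])
```

The `?` placeholder plays the role of JDBC's `PreparedStatement` bind parameter, keeping user input out of the SQL text.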