- Software professional with 7+ years of industry experience as a Big Data/Hadoop/Spark/Java/J2EE technical consultant.
- In-depth experience with Hadoop architecture and its components, such as HDFS, YARN, ResourceManager, NodeManager, Job History Server, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce.
- Expertise in writing Hadoop jobs to analyze data using MapReduce, Hive, and Pig running on YARN.
- Worked on real-time, in-memory tools such as Spark and Impala, and on their integration with BI tools such as Tableau.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
- Experienced in extending Hive and Pig core functionality by writing custom UDFs and MapReduce scripts in Java and Python.
- In-depth knowledge of the Big Data stack: Hadoop, MapReduce, YARN, Hive, HBase, Sqoop, Flume, Kafka, and Spark (DataFrames, Spark SQL, Spark Streaming).
- Experience with NoSQL databases such as HBase, Cassandra, and MongoDB.
- Worked on setting up Apache NiFi and performing POC with NiFi in orchestrating a data pipeline.
- Experienced with job workflow scheduling and monitoring using Oozie, and with cluster coordination using ZooKeeper.
- Experienced in developing applications using Java/J2EE technologies such as Servlets, JSP, EJB, and JDBC.
- Experienced in working with Amazon AWS services such as EC2, EMR, and S3.
- Experience in infrastructure automation using Chef and Docker.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
- Experienced with continuous-integration and build tools such as Jenkins, and with Git and SVN for version control.
- Experienced in developing applications using Hibernate (object-relational mapping framework).
- Experienced in developing Web Services using JAX-RPC, SOAP and WSDL.
- Thorough knowledge and experience of XML technologies (DOM and SAX parsers), with extensive experience in XPath, XML Schema, and XMLSpy.
- Good knowledge of AWS services such as EC2, Elastic Load Balancing, Elastic Container Service, S3, Elastic Beanstalk, CloudFront, Elastic File System, RDS, DynamoDB, DMS, VPC, Direct Connect, Route 53, CloudWatch, CloudTrail, CloudFormation, IAM, EMR, and Elasticsearch.
- Ability to learn and adapt quickly and to correctly apply new tools and technology.
- Knowledge of administrative tasks such as installing Hadoop and its ecosystem components, including Hive and Pig.
- Experience in developing solutions to analyze large data sets efficiently.
- Experience in Data Warehousing and ETL processes.
- Knowledge of star schema and snowflake modeling, fact and dimension tables, and physical and logical modeling.
- Strong database modeling, SQL, ETL and data analysis skills.
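The custom-UDF and streaming-script experience above can be sketched minimally in Python. This is an illustrative example, not code from the projects: the three-column `id, name, amount` schema and the `clean_row` helper are assumptions.

```python
def clean_row(line, delimiter="\t"):
    """Normalize one delimited record: trim whitespace, lowercase the
    name column, and drop rows that do not match the expected schema.
    The 3-column (id, name, amount) schema is purely illustrative."""
    fields = [f.strip() for f in line.rstrip("\n").split(delimiter)]
    if len(fields) != 3:
        return None
    fields[1] = fields[1].lower()
    return delimiter.join(fields)

# A Hive streaming UDF would loop over sys.stdin
# (SELECT TRANSFORM(...) USING 'python clean_rows.py');
# here we run the same logic over a small in-memory batch.
sample = ["1\t ALICE \t10.5\n", "bad\trow\n"]
cleaned = [r for r in (clean_row(s) for s in sample) if r is not None]
```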
Hadoop Core Services: HDFS, MapReduce, Spark, YARN
Hadoop Distribution: Hortonworks, Cloudera, Apache, Ambari, Cloudera Manager
Hadoop Technologies/Big Data: HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Spark, Kafka, ZooKeeper, Oozie, Elasticsearch
NoSQL Databases: HBase, Cassandra, MongoDB
Hadoop Data Services: Hive, Pig, Sqoop, Flume
Hadoop Operational Services: Zookeeper, Oozie
Monitoring Tools: Cloudera Manager
Cloud Computing Tools: Amazon AWS EC2, S3, IAM, Glacier, CloudFront, EMR
Languages: C, C++, Java/J2EE, Python, Scala, SQL, PL/SQL, Pig Latin, HiveQL, Unix Shell Scripting
Java & J2EE Technologies: Core Java, Servlets, JDBC, JNDI, Hibernate, Spring, Struts, JMS, EJB, RESTful, SOAP
Application Servers: WebLogic, WebSphere, JBoss, Tomcat
Databases: Oracle, MySQL, PostgreSQL, Teradata
Operating Systems: UNIX, Windows, LINUX
Build Tools: Jenkins, Maven, Ant
Development Tools: Microsoft SQL Studio, Toad, Eclipse, NetBeans
Development methodologies: Agile/Scrum
Visualization and Analytics Tools: Tableau, QlikView
ETL Tools: Informatica, Talend
Confidential, New York City, NY
Sr. Hadoop/Spark Developer
Responsibilities:
- Ingested data from various sources into Hadoop HDFS/Hive tables and managed data pipelines providing DaaS (Data as a Service) to business users and data scientists for analytics.
- Worked on real-time data processing using Spark/Storm and Kafka using Scala.
- Developed web services using Scala in building stream data platform.
- Used Cassandra to store billions of records to enable faster & efficient querying, aggregates & reporting.
- Worked on writing CQL queries in retrieving data from Cassandra.
- Imported and exported data into HDFS and Hive using Sqoop.
- Installed and configured ZooKeeper to coordinate and monitor cluster resources.
- Responsible for loading data files from external sources such as Oracle and MySQL into a staging area in MySQL databases.
- Worked on POCs with Apache Spark, using Scala, to evaluate Spark for the project.
- Exported the aggregated data to an RDBMS using Sqoop for creating dashboards in Tableau.
- Used AWS S3 for data storage and EMR cluster for processing various jobs.
- Primary responsibilities include building scalable distributed data solutions using Hadoop Ecosystem
- Used datameer for integration with Hadoop and other sources such as RDBMS (Oracle), SAS, Teradata and Flat files.
- Wrote Hive and Pig Scripts to analyze customer satisfaction index, sales patterns etc.
- Extended Hive and Pig core functionality by writing custom UDFs using Java.
- Wrote Scala programs using Spark/Spark-SQL to perform aggregations.
- Orchestrated Sqoop scripts, Pig scripts, and Hive queries using Oozie workflows.
- Worked on a Data Lake architecture to build a reliable, scalable analytics platform meeting batch, interactive, and online analytics requirements.
- Integrated Tableau with Hadoop data source for building dashboard to provide various insights on sales of the organization.
- Worked on Spark in building BI reports using Tableau; Tableau was integrated with Spark via Spark-SQL.
- Wrote cluster automation using Chef and Docker.
- Worked on setting up Apache NiFi and performed POC using NiFi in orchestrating data flows.
- Participated in daily scrum meetings and iterative development.
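The per-batch aggregation those Spark Streaming jobs performed can be illustrated with a pure-Python stand-in for `reduceByKey` over one micro-batch. The event names and numbers below are made up for the sketch:

```python
from collections import defaultdict

def aggregate_batch(events):
    """Sum values per key within one micro-batch, the way Spark
    Streaming's reduceByKey aggregates within a batch interval."""
    totals = defaultdict(int)
    for key, value in events:
        totals[key] += value
    return dict(totals)

batch = [("clicks", 1), ("views", 3), ("clicks", 2)]
counts = aggregate_batch(batch)
```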
Technology: Spark, Spark Streaming, Spark SQL, NiFi, Scala, Play Framework, Hadoop, MapReduce, YARN, Shark, Hive, Pig, Sqoop, Storm, Kafka, HBase, Cassandra, Tableau, Datameer, Ambari, Chef, Docker, AWS (EC2, S3, EMR), Oracle, Teradata, SAS, Java 7, Ruby, Python, PySpark (NumPy, SciPy), Log4j, JUnit, MRUnit, Jenkins, Maven, Git, SVN, JIRA.
Sr. Hadoop/Scala Developer
Responsibilities:
- Analyzed business requirements, performed gap analysis, and transformed them into detailed design specifications.
- Performed Code Reviews and responsible for Design, Code and Test signoff.
- Assigned work to team members and assisted them in development, clarifying design issues and fixing defects.
- Researched and recommended tools and technologies on the Hadoop stack, considering the workloads of the organization.
- Performed various POCs in data ingestion, data analysis, and reporting using Hadoop, MapReduce, Hive, Pig, Sqoop, Flume, and Elasticsearch.
- Installed and configured Hadoop.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Installed and configured Pig and wrote Pig Latin scripts.
- Imported/exported data using Sqoop to load data from Teradata into HDFS/Hive on a regular basis.
- Wrote Hive queries for ad-hoc business reporting.
- Experienced in defining job flows using Oozie.
- Developed Spark Streaming jobs in Scala, integrating with Kafka to build a stream-data platform.
- Processed streaming data from Kafka topics using Scala and ingested it into Cassandra.
- Setup and benchmarked Hadoop clusters for internal use.
- Involved in managing and reviewing Hadoop log files.
- Responsible for analyzing, designing, developing, coordinating, and deploying web-based applications.
- Developed Web Services using JAX-RPC, JAXP, WSDL, SOAP, XML to provide facility to obtain quote, receive updates to the quote, customer information, status updates and confirmations.
- Extensively used SQL queries, PL/SQL stored procedures & triggers in data retrieval and updating of information in the Oracle database using JDBC.
- Wrote, configured, and maintained Hibernate configuration files, and wrote and updated Hibernate mapping files for each Java object to be persisted.
- Wrote Hibernate Query Language (HQL) queries and tuned them for better performance.
- Used the design patterns such as Session Façade, Command, Adapter, Business Delegate, Data Access Object, Value Object and Transfer Object.
- Involved in application performance tuning and fixing bugs.
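The Kafka-to-Cassandra ingestion path described above can be sketched with in-memory stand-ins. The `user_id,event,ts` message layout and the partition-by-user table shape are assumptions for illustration only:

```python
def to_row(message):
    """Parse one 'user_id,event,ts' Kafka message into a row dict."""
    user_id, event, ts = message.split(",")
    return {"user_id": user_id, "event": event, "ts": int(ts)}

def ingest(messages, table):
    """Append parsed rows to an in-memory stand-in for a Cassandra
    table partitioned by user_id; a real job would issue prepared
    INSERT statements through the Cassandra driver instead."""
    for m in messages:
        row = to_row(m)
        table.setdefault(row["user_id"], []).append(row)
    return table

events = ingest(["u1,click,100", "u1,view,101", "u2,click,102"], {})
```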
Technology: Scala, Hadoop, MapReduce, YARN, Hive, Pig, Sqoop, Flume, Elasticsearch, Cloudera Manager, Java, J2EE, Web Services, Hibernate, Struts, JSP, JDBC, XML, WebLogic Workshop, Jenkins, Maven.
- Wrote MapReduce code to process log files according to rules defined in HDFS (log files generated by different devices follow different XML rules).
- Created a Hadoop design that replicates the current system design.
- Designed and developed an application to process data using Spark.
- Developed MapReduce jobs and Hive and Pig scripts for a data warehouse migration project.
- Designed and developed a system to collect data from multiple portals using Kafka and process it using Spark.
- Developed MapReduce jobs and Hive and Pig scripts for a Risk & Fraud Analytics platform.
- Developed a data ingestion platform using Sqoop and Flume to ingest Twitter and Facebook data for a Marketing & Offers platform.
- Designed and developed an automated process using shell scripting for data movement and purging.
- Installed and managed the configuration of a small multi-node Hadoop cluster.
- Installed and configured other open-source software such as Pig, Hive, Flume, and Sqoop.
- Developed programs in Java and Scala/Spark to reformat data after extraction from HDFS for analysis.
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
- Imported and exported data into Impala, HDFS, and Hive using Sqoop.
- Responsible for managing data coming from different sources.
- Implemented partitioning, dynamic partitions, and buckets in Hive for efficient data access.
- Developed Hive tables to transform, analyze the data in HDFS.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Developed simple-to-complex MapReduce jobs using Hive and Pig.
- Involved in running Hadoop Jobs for processing millions of records of text data.
- Developed the application by using the Struts framework.
- Created connection through JDBC and used JDBC statements to call stored procedures.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Developed Pig UDFs to pre-process the data for analysis.
- Implemented multiple Map Reduce Jobs in java for data cleansing and pre-processing.
- Moved all RDBMS data into flat files generated from various channels to HDFS for further processing.
- Developed job workflows in Oozie to automate the tasks of loading the data into HDFS.
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and extracted data from Teradata into HDFS using Sqoop.
- Wrote script files for processing data and loading it into HDFS.
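The MapReduce-style log processing above can be sketched as a Hadoop-Streaming-like mapper/reducer pair run in-process. The sample log lines and the status-code metric are illustrative, not from the project:

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    """Emit (status_code, 1) for one web-server log line; the status
    code is the second-to-last field in this simplified log format."""
    parts = line.split()
    if len(parts) >= 2:
        yield parts[-2], 1

def reducer(pairs):
    """Sum the counts per key, as the reduce phase would after the
    shuffle/sort step groups identical keys together."""
    for key, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield key, sum(v for _, v in group)

logs = [
    '127.0.0.1 - - [10/Oct/2016:13:55:36] "GET / HTTP/1.1" 200 2326',
    '127.0.0.1 - - [10/Oct/2016:13:55:37] "GET /x HTTP/1.1" 404 153',
    '127.0.0.1 - - [10/Oct/2016:13:55:38] "GET / HTTP/1.1" 200 2326',
]
status_counts = dict(reducer(p for line in logs for p in mapper(line)))
```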
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Java (jdk1.7), Flat files, Oracle 11g/10g, PL/SQL, SQL*PLUS, Windows NT, Sqoop.
Sr. Hadoop Developer
Responsibilities:
- Developed solutions to process data into HDFS.
- Analyzed data using MapReduce, Pig, and Hive, and produced summary results from Hadoop for downstream systems.
- Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
- Developed a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Used Sqoop to import and export data from HDFS to RDBMS and vice-versa.
- Created Hive tables and involved in data loading and writing Hive UDFs.
- Exported the analyzed data to the relational database MySQL using Sqoop for visualization and to generate reports.
- Created HBase tables to load large sets of structured data.
- Managed and reviewed Hadoop log files.
- Used Sqoop to transfer data between databases (Oracle & Teradata) and HDFS and used Flume to stream the log data from servers.
- Involved in providing inputs for estimate preparation for the new proposal.
- Worked extensively with Hive DDL and Hive Query Language (HQL).
- Developed UDF, UDAF, and UDTF functions and implemented them in Hive queries.
- Implemented Sqoop for large data-set transfers between Hadoop and RDBMSs.
- Created MapReduce jobs to convert periodic XML messages into partitioned Avro data.
- Used Sqoop extensively to import data from various systems and sources (such as MySQL) into HDFS.
- Created components like Hive UDFs for missing functionality in HIVE for analytics.
- Developed scripts and batch jobs to schedule a bundle (a group of coordinators).
- Used different file formats like Text files, Sequence Files, Avro.
- Provided cluster coordination services through ZooKeeper.
- Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Assisted in cluster maintenance, cluster monitoring, adding and removing cluster nodes, and troubleshooting.
- Installed and configured Hadoop, MapReduce, and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
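The XML-to-Avro conversion mentioned above (periodic XML messages into partitioned Avro data) can be sketched with the standard library. The `<order>` message shape and field names are assumptions, and a real job would serialize the record against an Avro schema rather than keep a plain dict:

```python
import xml.etree.ElementTree as ET

def xml_to_record(xml_text):
    """Flatten one XML message into a flat record; the date field
    would serve as the partition column when writing out Avro."""
    root = ET.fromstring(xml_text)
    return {
        "order_id": root.findtext("id"),
        "amount": float(root.findtext("amount")),
        "dt": root.findtext("date"),
    }

msg = "<order><id>42</id><amount>19.99</amount><date>2016-01-05</date></order>"
record = xml_to_record(msg)
```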
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, HBase, Shell Scripting, Oozie, Oracle 11g.
- Involved in importing data from Microsoft SQL Server, MySQL, and Teradata into HDFS using Sqoop.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS.
- Used Hive to analyze partitioned and bucketed data to compute various reporting metrics.
- Involved in creating Hive tables, loading data, and writing queries that run internally in MapReduce.
- Involved in creating Hive External tables for HDFS data.
- Solved performance issues in Hive and Pig scripts by understanding joins, grouping, and aggregation and how they translate into MapReduce jobs.
- Primary responsibilities include building scalable distributed data solutions using Hadoop Ecosystem
- Used Spark for transformations, event joins and some aggregations before storing the data into HDFS.
- Conducted data extraction, including analyzing, reviewing, and modeling based on requirements, using higher-level tools such as Hive and Pig.
- Troubleshot and resolved data quality issues and maintained a high level of accuracy in reported data.
- Developed MapReduce programs for preprocessing and cleansing data in HDFS obtained from heterogeneous data sources, making it suitable for ingestion into a Hive schema for analysis.
- Worked on the Oozie workflow to run multiple Hive and Pig jobs.
- Involved in creating Hive UDFs.
- Developed automated shell scripts to execute Hive queries.
- Involved in processing ingested raw data using Apache Pig.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Worked with different file formats (JSON, Avro, ORC, Parquet) and compression codecs (Snappy, zlib, LZ4).
- Executed HiveQL in Spark using Spark SQL.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Gained Knowledge in creating Tableau dashboard for reporting analyzed data.
- Expertise with NoSQL databases like HBase.
- Implemented Streaming data Ingestion using Kafka.
- Developed Oozie workflows for scheduling, and used Tableau for visualization.
- Involved in managing and reviewing the Hadoop log files.
- Used GitHub as the repository for committing and retrieving code, and Jenkins for continuous integration.
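The Hive-to-Spark conversion work above (SQL group-bys rewritten as RDD transformations) can be illustrated in miniature. The sales rows are made up, and plain Python stands in for the RDD API:

```python
# Hive:   SELECT dt, SUM(amount) FROM sales GROUP BY dt
# Spark:  rdd.map(lambda r: (r.dt, r.amount)).reduceByKey(add)
def reduce_by_key(pairs, fn):
    """Pure-Python stand-in for the RDD reduceByKey transformation:
    combine all values sharing a key with the given function."""
    out = {}
    for key, value in pairs:
        out[key] = fn(out[key], value) if key in out else value
    return out

rows = [("2016-01-01", 120.0), ("2016-01-01", 80.0), ("2016-01-02", 50.0)]
daily_totals = reduce_by_key(rows, lambda a, b: a + b)
```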
Environment: HDFS, Oozie, Teradata, MapReduce, Sqoop, Hive, Pig, MySQL, Eclipse, Git, GitHub, Jenkins.
- Involved in analysis and design phase of Software Development Life cycle (SDLC).
- Involved in reading and generating PDF documents using iText, and in merging PDFs dynamically.
- Involved in working with J2EE design patterns (Singleton, Factory, DAO, and Business Delegate) and Model-View-Controller architecture with JSF and Spring DI.
- The CMS and server-side interaction was developed using web services and exposed to the CMS using JSON and jQuery.
- Designed and developed a Struts-like MVC 2 web framework using the front-controller design pattern, used successfully in a number of production systems.
- Worked on the JavaMail API; developed a utility class to consume messages from the message queue and send emails to customers.
- Developed web applications using Java, J2EE, Struts, and Hibernate.
- Installed, configured, and administered WebSphere ESB v6.x.
- Used Java Message Service (JMS) for loosely coupled, reliable, asynchronous exchange of patient treatment information among J2EE components and legacy systems.
- Normalized Oracle database, conforming to design concepts and best practices.
- Used JUnit framework for unit testing and Log4j to capture runtime exception logs.
- Performed dependency injection using the Spring framework, integrated with the Hibernate and Struts frameworks.
- Created shell and Perl scripts for project maintenance and software migration; developed custom tags to simplify JSP applications.
- Applied design patterns and OO design concepts to improve the existing Java/JEE based code base.
- Used Validator framework of the Struts for client side and server side validation.
- Involved in developing web services using Apache XFire and integrating them with action mappings.
- Used JAXP for parsing and JAXB for binding.
- Deployed EJB components on WebLogic and used the JDBC API for interaction with Oracle DB.
- Involved in transformations using XSLT to prepare HTML pages from XML files.
- Wrote Ant scripts for project builds in a Linux environment.
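The XML-to-HTML transformation step above (done with XSLT in the project) can be sketched in Python for brevity. The quote document shape and the table-row template are assumptions for illustration:

```python
import xml.etree.ElementTree as ET

QUOTES = """<quotes>
  <quote id="q1"><customer>Acme</customer><total>150.00</total></quote>
  <quote id="q2"><customer>Bolt</customer><total>75.50</total></quote>
</quotes>"""

def quotes_to_html_rows(xml_text):
    """Select each quote element and render it as an HTML table row,
    mimicking the select-and-template structure of an XSLT stylesheet."""
    root = ET.fromstring(xml_text)
    return [
        "<tr><td>%s</td><td>%s</td></tr>"
        % (q.findtext("customer"), q.findtext("total"))
        for q in root.findall("quote")
    ]

rows = quotes_to_html_rows(QUOTES)
```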
Technology: Java 1.4, J2EE (EJB, JSP/Servlets, JDBC, XML), Day CMS, MyEclipse, Tomcat, Resin, Struts, iBATIS, WebLogic Server 10.3.3, DTD, XSD, XSLT, XPath, XQuery, Ant, SVN, HTML, JS, AJAX, WSDL, SOAP, REST, JAX-RS, Jersey, JAX-WS, JMS, iText, Eclipse, JUnit, StarTeam, JNDI.