- Over 7 years of experience in the IT field, involved in all phases of the software development lifecycle across multiple projects
- Very strong experience in processing and analyzing large sets of structured, semi-structured and unstructured data and in supporting systems application architecture
- Extensive experience in writing Hadoop jobs for data analysis as per the business requirements using Hive and Pig
- Expertise in creating Hive internal/external tables and views using a shared metastore
- Developed custom UDFs in Pig and Hive to extend their core functionality
- Hands-on experience in transferring incoming data from various application servers into HDFS, Hive and HBase using Apache Flume
- Stored Data in Vertica EDW
- Experience working with the Snowflake and Vertica data warehouses
- Worked extensively with Sqoop to import data from RDBMS into HDFS and export it back
- Performed Data Ingestion from multiple disparate sources and systems using Kafka
- Proficient in big data ingestion and streaming tools like Apache Flume, Sqoop, Kafka, Storm and Spark.
- Experience working with data formats such as Avro and Parquet
- Hands-on experience with Sequence files, RC files, Combiners, Counters, Dynamic Partitions and Bucketing for best practice and performance improvement
- Good experience with Amazon Elastic Block Store (EBS), including the different volume types and choosing among them based on requirements
- Experience creating real-time data streaming solutions using Apache Spark Core, Spark SQL, Kafka, Spark Streaming and Apache Storm
- Worked on Oozie to manage and schedule the jobs on Hadoop cluster
- Used a variety of AWS computing and networking services to meet the needs of applications
- Knowledge of developing analytical components using Scala
- Experience in managing and reviewing Hadoop log files
- Worked with NoSQL database HBase to create tables and store data
- Experience in setting up Hive, Pig, HBase and Sqoop on the Ubuntu operating system
- Strong experience in relational database design and development with multiple RDBMSs including Oracle 10g, MySQL and MS SQL Server, and with PL/SQL
- Proficient in using data visualization tools like Tableau, Raw and MS Excel
- Developed applications using Java, RDBMS and UNIX Shell scripting
- Experience of working on Servlets, JSP, JSF, Spring, Hibernate, JPA and JDBC
- Experience in developing web interfaces using technologies like XML, HTML, DHTML and CSS
- Implemented functions, stored procedures, triggers using PL/SQL
- Good understanding of ETL processes and Data warehousing
- Strong experience in writing UNIX shell scripts
- Working on different projects provided exposure to and a good understanding of the different phases of the SDLC
Hadoop/Big Data: Hadoop 1.x/2.x (YARN), HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Kafka, Spark, Storm, ZooKeeper, Scala, Oozie, Ambari, Tez, R
Development Tools: Eclipse, IBM DB2 Command Editor, TOAD, SQL Developer, VMware
Programming/Scripting Languages: Java, C++, Unix Shell Scripting, Python, SQL, Pig Latin, HiveQL
Databases: Oracle 11g/10g/9i, MySQL, SQL Server 2005/2008, PostgreSQL and DB2
NoSQL Databases: HBase, Cassandra, MongoDB
Visualization: Tableau, Raw and MS Excel
Frameworks: Hibernate, JSF 2.0, Spring
Version Control Tools: Subversion (SVN), Concurrent Versions System (CVS) and IBM Rational ClearCase
Methodologies: Agile/Scrum, Waterfall
Operating Systems: Windows, Unix, Linux and Solaris
- Used Sqoop to import and export data between Oracle/DB2 and HDFS for analysis
- Migrated existing MapReduce programs to Spark models using Python
- Migrated data from the data lake (Hive) into an S3 bucket
- Validated the data in the S3 bucket against the data in the data lake
- Used the Spark DataFrame API on the Cloudera platform to perform analytics on Hive data
- Designed batch processing jobs using Apache Spark that ran ten times faster than the equivalent MapReduce jobs
- Worked with the Snowflake data warehouse
- Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data
- Designed custom Spark REPL application to handle similar datasets
- Used Hadoop scripts for HDFS (Hadoop Distributed File System) data loading and manipulation
- Performed Hive test queries on local sample files and HDFS files
- Used Kafka for real time data ingestion.
- Created separate Kafka topics for the data to be read
- Read data from different topics in Kafka.
- Moved data from the S3 bucket to the Snowflake data warehouse for generating reports
- Wrote Hive queries for data analysis to meet the business requirements
- Migrated an existing on-premises application to AWS.
- Created Hive tables and worked on them using HiveQL
- Assisted in loading large sets of data (structured, semi-structured and unstructured) into HDFS
- Extensively used Pig for data cleaning and optimization
- Developed Spark SQL jobs to load tables into HDFS and run select queries on top of them
- Used AWS services like EC2 and S3 for small data sets.
- Developed the application on IntelliJ IDE
- Created DataFrames using Scala
- Developed Hive queries to analyze data and generate results
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing
- Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
- Used Scala to write code for all Spark use cases.
- Analyzed user request patterns and implemented various performance optimization measures including implementing partitions and buckets in HiveQL
- Assigned names to the columns using Scala case classes
- Developed multiple Spark SQL jobs for data cleaning
- Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting
- Created many Spark and Hive UDFs and UDAFs for functions that were not available in Hive and Spark SQL
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Implemented performance optimization techniques such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins
- Good knowledge of Spark platform parameters such as memory, cores and executors
- Used ZooKeeper in the cluster to provide concurrent access to Hive tables with shared and exclusive locking
- Developed an analytical component using Scala, Spark and Spark Streaming
- Used visualization tools such as Power View for Excel and Tableau for visualizing data and generating reports
- Worked on the NoSQL databases HBase and MongoDB
Environment: Linux, Apache Hadoop Framework, HDFS, YARN (MR2.X), Hive, HBase, AWS (S3, EMR), Scala, Spark, Sqoop
- Experienced in development on the Cloudera distribution
- As a Hadoop developer, responsible for managing the data pipelines and the data lake
- Performed Hadoop ETL using Hive on data at different stages of the pipeline
- Worked in an Agile environment using Scrum
- Imported data from different source systems with Sqoop and automated the imports with Oozie workflows
- Generated business reports from the data lake using Impala (SQL on Hadoop) as per business needs
- Automated business reports on the data lake using Bash scripts on UNIX and sent them to business owners
- Developed Spark code in Scala to cleanse the data and perform ETL at different stages of the data pipeline
- Worked in different environments like DEV, QA, Data Lake and Analytics Cluster as part of Hadoop Development.
- Snapped the cleansed data to the Analytics Cluster for business reporting
- Developed Pig and Python scripts for streaming and created Hive tables on top of the output
- Developed multiple POCs using Scala, deployed them on the YARN cluster and compared the performance of Spark and SQL
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala
- Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
- Developed Oozie workflows to run multiple Hive, Pig, Sqoop and Spark jobs
- Handled importing of data from various data sources, performed transformations using Hive, Spark and loaded data into HDFS.
- Developed Pig, Hive, Sqoop, Hadoop Streaming and Spark actions in Oozie workflows
- Supported MapReduce programs running on the cluster
- Experienced in collecting, aggregating, and moving large amounts of streaming data into HDFS using Flume and Kafka.
- Responsible for developing multiple Kafka Producers and Consumers from scratch as per the software requirement specifications.
- Good understanding of the workflow management process and its implementation
Environment: Hadoop, AWS, Java, HDFS, MapReduce, Spark, Pig, Hive, Impala, Sqoop, Flume, Kafka, HBase, Oozie, SQL scripting, Linux shell scripting, Eclipse and Cloudera
- Responsible for managing data coming from different sources; involved in HDFS maintenance and in loading structured and unstructured data
- Developed MapReduce programs to parse the raw data and store the refined data in tables
- Designed and modified database tables and used HBase queries to insert and fetch data from tables
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data from relational databases into HDFS using Sqoop imports
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS that were further used for analysis
- Used CloudWatch Logs to move application logs to S3 and created alarms based on a few exceptions raised by applications
- Created Hive tables, loaded data and wrote Hive queries that run as MapReduce jobs
- Used Oozie operational services for batch processing and scheduling workflows dynamically
- Developed and updated social media analytics dashboards on a regular basis
- Performed data mining investigations to find new insights related to customers
- Managed and reviewed Hadoop log files
- Used Vertica as the enterprise data warehouse
- Analyzed the web log data using HiveQL to extract the number of unique visitors per day
- Developed and generated insights based on brand conversations, which in turn helped drive brand awareness, engagement and traffic to social media pages
- Involved in identifying topics and trends and building context around the brand
- Involved in identifying and analyzing defects, questionable function errors and inconsistencies observed in the output
Environment: HBase, Hadoop, HDFS, MapReduce, Hive, Sqoop, Flume 1.3, Oozie, ZooKeeper, MySQL, and Eclipse
- Used XML to define the ORM mappings between the Java classes and the database
- Worked on analysis, design and coding for client development using the J2EE stack on the Eclipse platform
- Involved in creating web-based Java components such as client applets and client-side UI using JFC in Eclipse
- Developed PL/SQL stored procedures to perform complex database operations
- Used Struts in presentation tier
- Used Subversion as the version control system
- Played key role in the design and development of application using J2EE, Struts, Spring
- Involved in various phases of Software Development Life Cycle
- Configured Struts framework to implement MVC design patterns
- Designed and developed GUI using JSP, HTML, DHTML and CSS
- Generated the Hibernate XML and Java Mappings for the schemas
- Used Rational Application Developer (RAD) as Integrated Development Environment (IDE)
- Extensively used Core Java, Servlets, JSP and XML
- Used Oracle WebLogic workshop to generate the web service artifacts from the given WSDL for JAX-WS specification
Environment: Java, Struts, Servlets, Spring, Tomcat, Hibernate, HTML, JSP, XML, SQL, J2EE, JUnit, Oracle 11g, Windows
- Implemented server side programs by using Servlets and JSP
- Designed, developed and validated the user interface using HTML, JavaScript, XML and CSS
- Implemented MVC using Struts Framework
- Implemented Controller Servlet to handle the access to database
- Participated in code walkthroughs, Debugging and defect fixing
- Involved in the co-ordination of end to end production release process
- Used SVN as the version control system
- Used JDBC prepared statements called from Servlets for database access
- Designed and documented the stored procedures
- Involved in writing JUnit test cases and performed unit testing for various components
- Worked on the database interaction layer for insert, update and retrieval operations on the Oracle database by writing stored procedures
- Used Spring Framework for Dependency Injection and integrated with Hibernate
- Used Log4j to log errors in the application