- Around 5 years of experience in the IT industry, with 4+ years of experience in Big Data along with J2EE.
- Good working experience with Spark (Spark Core, Spark SQL, Spark Streaming) and Apache Hadoop ecosystem components such as MapReduce, HDFS, Hive, Sqoop, Pig, Oozie, Flume, HBase and ZooKeeper.
- Set up Apache NiFi and built NiFi dataflows to orchestrate data pipeline activities.
- Strong experience with NoSQL column-oriented databases such as HBase and Cassandra and their integration with Hadoop clusters.
- Experience writing UDFs and integrating them with Hive and Pig.
- Experience with SequenceFile, Avro and ORC file formats and compression.
- Experience with Hadoop distributions: Cloudera and Hortonworks.
- Imported and exported data to and from HDFS and Hive using Sqoop.
- Experience with job workflow scheduling and coordination tools such as Oozie and ZooKeeper.
- Extensive knowledge of SQL queries for backend database analysis.
- Experience importing and exporting data between HDFS and relational database management systems (RDBMS) using Sqoop.
- Led many data analysis and integration efforts involving Hadoop and ETL.
- Hands-on experience with enterprise data lakes supporting various use cases, including analytics, processing, storage and reporting of voluminous, rapidly changing, structured and unstructured data.
- Extensive experience with SQL, PL/SQL and database concepts.
- Transferred bulk data from RDBMSs such as Teradata and Netezza into HDFS using Sqoop.
- Experience in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java.
- Well versed in Agile and other SDLC methodologies; able to coordinate with owners and SMEs.
- Worked on different operating systems, including UNIX, Linux and Windows.
- Diverse experience using Java tools in business, web and client-server environments, including Java Platform, Enterprise Edition (Java EE), Enterprise JavaBeans (EJB), JavaServer Pages (JSP), Java Servlets (including JNDI), Struts and Java Database Connectivity (JDBC) technologies.
- Experience in web application design using open-source MVC frameworks such as Spring and Struts.
Big Data Core Services: Spark (Core, SQL, Streaming, ML), HDFS, MapReduce, YARN
Hadoop Distributions: Hortonworks, Cloudera, Apache
NoSQL Databases: HBase, Cassandra, MongoDB
Hadoop Data Services: Hive, Impala, Pig, Sqoop, Flume
Hadoop Operational Services: Zookeeper, Oozie
Monitoring Tools: Cloudera Manager, Ambari, Nagios
Cloud Computing Tools: Amazon AWS (EC2, S3, EMR)
Languages: C, Java/J2EE, Python, SQL, PL/SQL, Pig Latin, HiveQL, Unix Shell Scripting
Java & J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts
Application Servers: WebLogic, WebSphere, JBoss, Tomcat
Databases: Oracle, MySQL, PostgreSQL, Teradata, Netezza
Operating Systems: UNIX, Windows, Linux
Build Tools: Jenkins, Maven, Ant
Development Tools: Microsoft SQL Studio, Toad, Eclipse, NetBeans
Development Methodologies: Agile/Scrum
Visualization and Analytics Tools: Tableau, QlikView
Confidential, Plano, Texas
Big Data Developer
- Imported and exported data to and from HDFS and Hive using Sqoop and Kafka in batch and streaming modes.
- Used Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which reads data from Kafka in near real time and persists it into HBase.
- Analyzed the performance of Spark Streaming and batch jobs and tuned them using Spark tuning parameters.
- Enhanced and optimized Spark/Scala jobs to aggregate, group and run data mining tasks using the Spark framework.
- Installed and configured NiFi and developed various pipeline activities using processors such as the Sqoop, Kafka, HDFS and file processors.
- Created data pipelines per business requirements and scheduled them using Oozie.
- Used Hive to join multiple tables of a source system and loaded the results into Elasticsearch.
- Involved in the complete Big Data flow of the application: ingesting data from upstream systems into HDFS, processing the data in HDFS and analyzing it using several tools.
- Imported data in various formats, such as JSON, SequenceFile, text, CSV, Avro and Parquet, into the HDFS cluster with compression for optimization.
- Ingested data from RDBMS sources such as Oracle, SQL Server and Teradata into HDFS using Sqoop.
- Configured Hive and wrote Hive UDFs and UDAFs; also created static and dynamic partitions with bucketing.
- Managed and reviewed large Hadoop log files.
- Designed and created various analytical reports and automated dashboards to help users identify critical KPIs and facilitate strategic planning in the organization.
- Involved in cluster maintenance, monitoring and troubleshooting.
- Maintained technical documentation for every step of setting up the development environment and launching Hadoop clusters.
- Used BI tools such as Tableau Desktop to create dashboards for daily, weekly and monthly reports and published them to the HDFS cluster.
Environment: Spark, Scala, HBase, Cassandra, Kafka, NiFi, Hadoop, HDFS, Hive, Oozie, Sqoop, Elasticsearch, Shell Scripting, Python, Tableau, Oracle, MySQL, Teradata, Log4j, JUnit, MRUnit, Jenkins, Maven, Git, SVN, JIRA and AWS.
Confidential, Nashville, TN
Big Data Developer
- Developed Spark code using Scala, Spark SQL and Spark Streaming for faster testing and processing of data.
- Involved in data modeling and replication strategies in Cassandra.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Ran Hive queries through Spark SQL, which integrates with the Spark environment.
- Together with the infrastructure team, designed and developed a real-time data pipeline based on Kafka and Spark Streaming.
- Responsible for building scalable distributed data solutions using Hadoop.
- Used Sqoop to import data from MySQL and various other sources into HDFS on a regular basis.
- Wrote multiple MapReduce programs for extraction, transformation and aggregation of data from multiple file formats, including XML, JSON, CSV and other compressed file formats.
- Reviewed the HDFS usage and system design for future scalability and fault-tolerance.
- Loaded data from the Linux file system into HDFS.
- Loaded and transformed large sets of structured, semi-structured and unstructured data in various formats such as text, zip, XML and JSON.
- Defined job flows and developed simple to complex MapReduce jobs per requirements.
- Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
- Developed Pig UDFs for manipulating data according to business requirements and developed custom Pig loaders.
- Set up an HBase column-based storage repository for archiving and retro data.
- Responsible for creating Hive tables based on business requirements.
- Used the enterprise data lake to support various use cases, including analytics, processing, storage and reporting of voluminous, rapidly changing, structured and unstructured data.
- Implemented partitioning, dynamic partitions and buckets in Hive for efficient data access.
- Exported the analyzed data into relational databases using Sqoop for visualization and to generate reports for the BI team.
Environment: Spark, Scala, Kafka, Cassandra, Flume, Apache Hadoop 2.x, Cloudera, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Oracle 11g/10g, Linux, XML, MySQL, Jenkins, Maven, Git.
- Optimized Hadoop MapReduce code and Hive and Pig scripts for better scalability, reliability and performance.
- Developed Oozie workflows for application execution.
- Migrated data from legacy RDBMS databases to HDFS using Sqoop.
- Wrote Pig scripts for data processing.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Implemented Hive tables and HQL queries for the reports.
- Executed HiveQL in Spark using Spark SQL.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Developed shell scripts and automated data management for end-to-end integration work.
- Performed data validation using Hive dynamic partitioning and bucketing.
- Used complex data types to store and retrieve data using HQL in Hive.
- Developed Hive queries to analyze reducer output data.
- Implemented ETL code to load data from multiple sources into HDFS using Pig scripts.
- Heavily involved in designing the next-generation data architecture for unstructured data.
- Developed Pig Latin scripts to extract data from source systems.
- Created and maintained technical documentation for executing Hive queries and Pig scripts.
- Extracted data from Hive and loaded it into an RDBMS using Sqoop.
Environment: HDFS, MapReduce, MySQL, Spark, Cassandra, Hive, HBase, Oozie, Pig, ETL, Hortonworks (HDP 2.0), Shell Scripting, Linux, Sqoop, Flume, Oracle 11g, Maven, Ant, JUnit, MRUnit, SVN, JIRA.