- Well-experienced Big Data Engineer working across the big data ecosystem: ingestion, storage, querying, processing, and analysis of large datasets.
- Currently working with Apache Spark on a Verizon product.
- Working experience with Scala using IntelliJ IDEA.
- Experience in UNIX shell scripting and basic UNIX commands.
- Used Python scripting for writing workflows in Oozie and scheduling jobs.
- Experience using Hive Query Language (HiveQL) for data analytics.
- Experienced in software development, specializing in analysis, design, development, testing and support of Oracle applications with strong analytical, technical and logical skills. Involved in all stages of Software Development Life Cycle (SDLC).
- Strong ability to build complex, analytical SQL queries.
- Used Jenkins extensively for release and deployment.
- Capable of processing large sets of structured, semi-structured, and unstructured data, and of supporting systems application architecture.
- Strong knowledge of NoSQL column-oriented databases such as HBase and their integration with a Hadoop cluster.
- Experienced in job workflow scheduling and monitoring with Oozie and in cluster coordination with ZooKeeper.
- Experience in importing/exporting data between HDFS and relational database systems/mainframes using Sqoop.
- Excellent communication, interpersonal, and problem-solving skills; a strong team player who contributes to the team's success and communicates effectively at all levels of the organization.
Big Data Stack: HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Oozie, Flume, Spark, ZooKeeper, Tableau
Languages: Scala, C/C++, Python, Shell Scripting
Frameworks: Spark
Databases: NoSQL, MySQL, Oracle, MS SQL Server, MS Access
ETL Tools: Sqoop, Spark (as ETL)
Confidential, Palo Alto, CA
- Currently working with Spark and Scala for data analytics as a backend developer.
- Work extensively on HDFS for data ingestion and insertion, with dataset sizes at petabyte scale.
- Maintain an ETL framework in Spark for writing data from HDFS to Hive.
- Work extensively with SQL-on-Hadoop engines such as Hive and Impala.
- Write shell scripts to streamline the data-insertion process.
- Use a Scala-based framework for ETL.
- Proficient with UNIX commands, sed, and awk.
- Proficient in SQL querying.
- Use ZooKeeper extensively in coordinating Spark jobs.
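The HDFS-to-Hive ETL step described above can be sketched; this is a hedged illustration only (the Spark session and Hive write are omitted, and the `Event` record shape and cleaning rules are hypothetical), written against plain Scala collections so it runs standalone. In Spark, the same `flatMap`/`filter` logic would run over a Dataset instead of a `Seq`.

```scala
// Hedged sketch: the cleaning/transform stage of an HDFS-to-Hive ETL job.
// Record shape and rules are hypothetical examples, not the actual framework.
case class Event(id: Long, userId: String, amount: Double)

object EtlSketch {
  // Parse one raw CSV line into an Event; malformed rows become None
  // and are dropped, mirroring a data-cleaning filter before the Hive write.
  def parse(line: String): Option[Event] =
    line.split(",", -1) match {
      case Array(id, user, amt) =>
        try Some(Event(id.trim.toLong, user.trim, amt.trim.toDouble))
        catch { case _: NumberFormatException => None }
      case _ => None
    }

  // Drop unparseable rows and (as an example rule) non-positive amounts.
  def transform(lines: Seq[String]): Seq[Event] =
    lines.flatMap(parse).filter(_.amount > 0)

  def main(args: Array[String]): Unit = {
    val raw = Seq("1,alice,9.99", "2,bob,-1.0", "bad row", "3,carol,4.50")
    println(transform(raw).map(_.id).mkString(",")) // prints "1,3"
  }
}
```

In Spark the `transform` body would be identical in shape, which is one reason plain-collection unit tests work well for this kind of framework code.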
Environment: Apache Spark (Cloudera), HDFS, Hive, MySQL, Linux, Shell Scripting

Confidential, Portland, OR
- Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop ecosystem tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
- Installed and configured Hadoop, MapReduce, and HDFS (Hadoop Distributed File System); developed multiple MapReduce jobs in Java for data cleaning.
- Worked on installing cluster, commissioning & decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
- Implemented NameNode backup using NFS for high availability.
- Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
- Responsible for developing a data pipeline using HDInsight, Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Installed Oozie workflow engine to run multiple Hive and Pig Jobs.
- Used Sqoop to import and export data from HDFS to RDBMS and vice-versa.
- Created Hive tables and involved in data loading and writing Hive UDFs.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Worked on HBase, MongoDB, and Cassandra.
- Automated workflows using shell scripts to pull data from various databases into Hadoop.
- Responsible for creating a Solr schema from the indexer settings.
- Wrote Solr queries for various search documents.
- Deployed Hadoop Cluster in Fully Distributed and Pseudo-distributed modes.
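The "writing Hive UDFs" bullet above can be illustrated with a hedged sketch. A real Hive UDF wraps logic like this in the `evaluate()` method of a class extending `org.apache.hadoop.hive.ql.exec.UDF`; here a hypothetical phone-number-normalizing function is shown as plain Scala so it stands alone:

```scala
// Hedged sketch of a custom Hive UDF's core logic. In the real UDF this
// body would live in evaluate() on a class extending
// org.apache.hadoop.hive.ql.exec.UDF. The normalization rule (reducing a
// raw phone field to its digits) is a hypothetical example.
object NormalizePhone {
  def evaluate(raw: String): String =
    if (raw == null) null // Hive UDFs must tolerate NULL input
    else {
      val digits = raw.filter(_.isDigit)
      // keep the last 10 digits so "+1 (503) 555-0100" and "5035550100" agree
      if (digits.length >= 10) digits.takeRight(10) else digits
    }

  def main(args: Array[String]): Unit =
    println(NormalizePhone.evaluate("+1 (503) 555-0100")) // prints "5035550100"
}
```

Once packaged in a JAR, such a function would typically be registered in HiveQL with `ADD JAR` plus `CREATE TEMPORARY FUNCTION` and then used like any built-in in SELECT and GROUP BY clauses.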
Environment: Hadoop, MapReduce, Hive, HDFS, Pig, Sqoop, Oozie, Solr, Cloudera, Flume, HBase, ZooKeeper, MongoDB, Cassandra, Oracle, NoSQL, and Unix/Linux.

Confidential, Bellevue, WA
SQL DBA and Hadoop Data Engineer
- Worked on writing various Linux scripts to stream data from multiple data sources like Oracle and Teradata onto the data lake.
- Built the infrastructure that secures all data in transit on the data lake, helping ensure the utmost security of customer data.
- Extended the Hive framework with custom UDFs to meet requirements.
- Actively involved with the Hadoop administration team in debugging slow-running MapReduce jobs and applying the necessary optimizations.
- Played a key role in mentoring the team on developing MapReduce jobs and custom UDFs.
- Played an instrumental role in helping the team leverage Sqoop to extract data from Teradata.
- Imported data from different relational data sources like Oracle, Teradata to HDFS using Sqoop.
- Developed job flows in TWS to automate the workflow for extraction of data from Teradata and Oracle.
- Actively involved in building a generic Hadoop framework that enables various teams to reuse best practices.
- Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
- Helped the team in optimizing Hive queries.
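The compression bullet above can be illustrated. The job itself would enable compression through its Hadoop `Configuration` (e.g. turning on map-output compression and choosing a codec), but the space saving that motivates it can be shown with a standalone gzip round-trip; the sample "log" data below is made up:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.nio.charset.StandardCharsets.UTF_8
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

// Hedged sketch: why compressing data stored on (or shuffled through) HDFS
// saves space. A MapReduce job would instead configure a codec such as
// Snappy or Gzip in its job Configuration; this is only the principle.
object GzipDemo {
  def compress(data: Array[Byte]): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val gz = new GZIPOutputStream(bos)
    gz.write(data)
    gz.close() // flushes the gzip trailer
    bos.toByteArray
  }

  def decompress(data: Array[Byte]): Array[Byte] = {
    val in = new GZIPInputStream(new ByteArrayInputStream(data))
    val out = new ByteArrayOutputStream()
    val buf = new Array[Byte](4096)
    var n = in.read(buf)
    while (n != -1) { out.write(buf, 0, n); n = in.read(buf) }
    out.toByteArray
  }

  def main(args: Array[String]): Unit = {
    // Repetitive, log-like records compress very well.
    val raw = ("2016-01-01 INFO job started\n" * 1000).getBytes(UTF_8)
    val packed = compress(raw)
    println(s"raw=${raw.length} bytes, gzip=${packed.length} bytes")
  }
}
```

The trade-off the bullet alludes to is CPU versus I/O: splittable or fast codecs for intermediate map output, heavier codecs for cold data at rest.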
Environment: A large cluster with NameNode high availability, sized to support 400 TB of data with growth taken into consideration; secured with Kerberos, using LDAP for role-based access. Apache Hadoop, MapReduce, HDFS, Hive, Sqoop, Linux, JSON, Oracle 11g, PL/SQL.