- Around 7 years of professional IT experience in software systems development and business systems, including over 2.5 years of experience with Big Data ecosystem technologies, with a master's degree in Computer Science.
- Hands-on experience with major components of the Big Data ecosystem: Hadoop, Spark Streaming, Spark SQL, PySpark, Kafka, Hive, Pig, and HBase.
- Experience in data management and implementation of Big Data applications using the Spark and Hadoop frameworks.
- Good understanding of Spark DataFrame, RDD, and Dataset abstractions.
- Experience in analyzing data using Spark SQL, HiveQL, and Pig Latin.
- Experience building streaming applications using Spark Streaming and Kafka with minimal to no data loss or duplication.
- Experience in extending Hive functionality by writing custom UDFs in Java and Python.
- Experience in writing Map/Reduce programs to handle complex business logic.
- Experience developing Python REST APIs using Bottle and Flask.
- Good understanding of NoSQL databases like HBase and Cassandra.
- Good understanding of core Java concepts and the basics of Python and Scala.
- Excellent technical, debugging, and problem-solving capabilities, with the ability to anticipate market, competitor, and customer trends.
- Good team player with strong analytical and communication skills.
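The Map/Reduce experience above can be sketched, in simplified form, as a Hadoop Streaming job in Python. The field layout and aggregation logic here are illustrative, not taken from any actual project:

```python
from itertools import groupby

def mapper(lines):
    """Emit (key, 1) for each input record; here the key is the
    first tab-delimited field (layout is illustrative)."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields and fields[0]:
            yield fields[0], 1

def reducer(pairs):
    """Sum counts per key. In a real job, Hadoop's shuffle phase
    delivers pairs sorted by key; groupby relies on that ordering."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)

# In a real Hadoop Streaming job, mapper.py and reducer.py each read
# sys.stdin and print tab-delimited output; shown here as plain
# functions so the two stages can be composed and tested locally.
```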
Big Data Ecosystem: MapReduce, Spark, HDFS, Hive, Pig, Sqoop, HBase, Cassandra, Oozie, Zookeeper, Kafka, Solace, Storm, Elasticsearch
Operating Systems: Windows, Ubuntu, RedHat Linux, CentOS
Programming/Scripting Languages: Java, Python, Scala, Unix shell scripting
Databases/Database Languages: MySQL, Oracle 9i/11g, NoSQL (HBase, Cassandra), SQL, PL/SQL
Frameworks: Apache Hadoop, Apache Spark, Apache Storm
IDEs: Eclipse, NetBeans, Toad, Rapid SQL, Express Studio
Reporting and Analyzing Tools: Datameer, Informatica
Confidential, San Ramon, CA
- Extensively worked on building a data ingestion platform with a wide range of technologies (Kafka, PySpark, Spark Streaming, Elasticsearch, Cassandra, HDFS).
- Worked on an open-source project, adding features per business requirements to make the product fit the platform's needs.
- Designed and developed a data ingestion platform to collect data from various sources and ingest it into the targeted polyglot storage systems.
- Designed the wireframes for Data Registration and Data Catalog and developed the corresponding CRUD APIs to support the functionality.
- Developed plugins using PySpark to write/save data into polyglot storage (Hive, Elasticsearch, HDFS).
- Enhanced existing services to create and tag the open-source project's abstractions (datasets) from polyglot-persisted data.
- Developed processes to save Spark DataFrames to, and load them from, polyglot storage using the product's dataset abstraction.
- Helped build a six-node development Hadoop/Spark cluster on the Cloudera distribution to run the open-source application.
- Installed and integrated JupyterHub with the open-source product and trained data science teams to use the environment to run their models on large volumes of data, leveraging the cluster's computing power.
- Developed monitoring scripts in Python using watchdog to pull files from the landing zone and ingest them into HDFS.
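A minimal sketch of the landing-zone monitoring described above. The actual scripts used the watchdog library for filesystem events; this simplified version polls the directory instead, and the `ingest` callback (which in production would wrap something like `hdfs dfs -put`) is hypothetical:

```python
import os

def scan_landing_zone(landing_dir, seen, ingest):
    """One polling pass over the landing zone: hand any file not yet
    seen to the `ingest` callback (e.g. an `hdfs dfs -put` wrapper)."""
    new_files = []
    for name in sorted(os.listdir(landing_dir)):
        path = os.path.join(landing_dir, name)
        if os.path.isfile(path) and path not in seen:
            ingest(path)       # push the file toward HDFS
            seen.add(path)     # remember it so it is ingested only once
            new_files.append(path)
    return new_files

# A driver would call scan_landing_zone in a loop with a short sleep;
# the watchdog library replaces that loop with filesystem notifications.
```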
Confidential, Palo Alto, CA
- Extensively worked on building a data collection platform with a wide range of technologies (Kafka, Spark Streaming, Solace, Elasticsearch, HBase, Cassandra).
- Developed Kafka-Spark Streaming jobs to read real-time messages from Kafka topics, produce them to Solace topics, and write the data to HDFS with zero data loss.
- Developed Solace-Spark Streaming jobs to consume real-time messages from Solace queues with zero data loss and write them to HDFS, Elasticsearch, and HBase/M7.
- Contributed to developing a Solace utility to produce and consume messages to/from Solace topics/queues.
- Helped build a development Hadoop cluster of 8-10 nodes with the MapR distribution.
- Worked on developing REST APIs to collect clickstream/service/event log data in real time from various endpoints.
- Supported an API that collects mobile clickstream data from the Mixpanel endpoint; developed and supported Python scripts to validate the data, check data quality, and flatten JSON messages into delimited files, then used Hadoop Streaming to ingest the flattened data into HDFS through the internal data ingestion framework.
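The JSON-flattening step above can be sketched roughly as follows; the column names, delimiter, and sample message shape are illustrative, not taken from the actual pipeline:

```python
import json

def flatten(obj, prefix=""):
    """Recursively flatten nested dicts into dotted keys
    (e.g. {"props": {"os": "ios"}} -> {"props.os": "ios"})."""
    flat = {}
    for key, value in obj.items():
        full = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, full))
        else:
            flat[full] = value
    return flat

def to_delimited(message, columns, sep="|"):
    """Render one JSON message as a delimited record suitable for
    ingestion into HDFS via Hadoop Streaming; missing columns
    become empty fields so every record has the same width."""
    flat = flatten(json.loads(message))
    return sep.join(str(flat.get(col, "")) for col in columns)
```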
- Extensively worked on ingesting data from various sources into the enterprise Big Data warehouse.
- Developed Python scripts and used Hadoop Streaming to load data into Elasticsearch, feeding drill-down dashboard interfaces used extensively by business users.
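Loading documents into Elasticsearch is commonly done through its `_bulk` endpoint, which expects alternating action and source lines in NDJSON. A hedged sketch of building such a payload (the index name and documents are illustrative):

```python
import json

def bulk_payload(index, docs):
    """Build an Elasticsearch bulk-API request body: one JSON action
    line ({"index": ...}) followed by one JSON source line per doc,
    newline-delimited, with a trailing newline as the API requires."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

# The payload would then be POSTed to http://<host>:9200/_bulk with
# Content-Type: application/x-ndjson (e.g. via urllib.request).
```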
- As part of POCs, worked extensively with HiveQL and Spark SQL to analyze various categories of data, determining customer spending behavior, recommending card upgrades to customers, and deriving insights from CSP (Customer Service Profession) data.
- Developed Hive UDFs in Java and Python to implement business logic.
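Hive can stream rows through an external Python script with `SELECT TRANSFORM(...) USING 'python script.py'`. A hedged sketch of such a row transform; the masking logic and column layout are hypothetical examples, not the actual business logic:

```python
def transform(line, sep="\t"):
    """Transform one tab-delimited row streamed from Hive: mask all but
    the last 4 characters of column 2 (e.g. a card number)."""
    cols = line.rstrip("\n").split(sep)
    if len(cols) >= 2 and len(cols[1]) > 4:
        cols[1] = "*" * (len(cols[1]) - 4) + cols[1][-4:]
    return sep.join(cols)

# Wired up as a Hive TRANSFORM script, the body would be:
#   import sys
#   for line in sys.stdin:
#       print(transform(line))
```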
- Worked on various business requirements, analyzing data using Hive and loading the output into Elasticsearch for users to make business decisions.
- Worked on multiple POCs for building scalable distributed data solutions using Hadoop components.
- Used Sqoop to import data from RDBMS to HDFS and later analyzed the data using various Hadoop components.
- Extensively worked on creating Hive external and internal tables and then applied HiveQL to aggregate the data.
- Implemented partitioning and bucketing in Hive for efficient data access.
- Developed customized UDFs in Java where the required functionality was too complex for built-in functions.
- Integrated the Hive warehouse with HBase.
- Worked on MapReduce jobs to standardize and clean the data and calculate aggregations.
- Developed Pig Latin scripts to sort, group, join and filter enterprise data.
- Used Sqoop for incremental imports and exports of data between RDBMS and HDFS.
- Used Oozie as a job scheduler to run multiple Hive and Pig jobs.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
- Extensive experience in requirement gathering, design, development, testing, and implementation of enterprise data warehouse (EDW) solutions and enhancements in Waterfall/Agile software development environments.
- Loaded and maintained daily, weekly, and monthly data in tables per the retention policies.
- Involved in writing Bulk Collects, Cursors, Ref Cursors, and Exceptions within various procedures.
- Supported and modified existing code; involved in creating UNIX shell scripts.
- Involved in creation of Reporting DB from the Transaction DB
- Actively involved in optimizing and tuning SQL queries, using the explain plan to identify problem areas of a query.
- Used temp tables to reduce the number of rows in joins and to aggregate data from different sources for different reports.
- Involved in the full development cycle of Planning, Analysis, Design, Development, Testing and Implementation.
- Created database objects like tables, views, procedures using Oracle tools like Toad
- Built and modified PL/SQL Stored Procedures, Functions, Packages and Triggers to implement business rules into the application.
- Worked with SQL*Loader to extract, load, and transform data from varied sources such as flat files and Oracle.
- Generated PL/SQL scripts to create and drop database objects including tables, primary keys, views, indexes, and constraints.
- Involved in creating UNIX shell scripts.
- Used FTP commands to migrate various files, verifying that logs were created successfully in the proper directories on completion of the transfer process.