- Professional Software Developer with 10+ years of experience in the IT industry, including 4+ years in Hadoop/Big Data technologies and 2 years of extensive experience in Java, Python, database development, and data analytics.
- Experience in using Cloudera and Hortonworks distributions.
- Experience in analyzing data using Spark SQL, HiveQL, and custom MapReduce programs in Java.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
- Experience with file formats such as Avro, Parquet, RCFile, and ORC, and compression codecs such as Snappy and bzip.
- Experience with Oozie scheduler in setting up workflow jobs with actions that run Hive and Sqoop jobs.
- Hands-on experience with relational databases such as Teradata, Oracle, and MySQL.
- Strong experience in unit testing and system testing for Big Data applications.
- Hands-on experience with Spark using Scala and Python.
- Hands-on experience working with JSON files.
- Hands-on experience with the Spark architecture and its integrations, such as Spark SQL and DataFrames.
- Involved in database design, creating Tables, Views, Stored Procedures, Functions, Triggers and Indexes.
- Experience in developing complex SQL queries, including unions, multi-table joins, and views.
- Involved in creating Hive tables and in loading and analyzing data using Hive queries.
- Developed Hive queries to process the data and generate data cubes for visualization.
- Experience using Flume to collect, aggregate, and store weblog data from sources such as web servers, mobile devices, and network devices, and to push it to HDFS.
- Migrated real-time data ingestion from Flume to Spark and developed a Spark Streaming application in Java to consume data from Kafka and push it into Hive.
- Experienced in handling large datasets during the ingestion process itself using partitioning, Spark's in-memory capabilities, broadcast variables, efficient joins, transformations, and other techniques.
- Exported the analyzed data to relational databases using Sqoop for visualization and report generation by the BI team.
- Experience in data extraction, transformation, and loading (ETL) from sources such as Oracle, SQL Server, and flat files using Informatica PowerCenter.
- Experience in Object-Oriented Analysis and Design (OOAD) and development.
- Hands-on experience in application development using Java, RDBMS, Linux, and UNIX shell scripting.
- Hands-on experience with version control tools such as SVN, Bitbucket, and GitLab.
Big Data Ecosystem: HDFS, MapReduce, YARN, HBase, Pig, Hive, Sqoop, Flume, ZooKeeper, Spark, Storm, Hue, Impala, Kafka, Mahout, Oozie
Hadoop Distributions: Hortonworks Data Platform 2.3.6, Cloudera 5.0
Web Technologies: HTML, XML, CSS
Databases: SQL Server, MySQL, MongoDB, Cassandra
Operating Systems: Unix, Linux, CentOS, Windows, macOS
Languages: Java, SQL, Linux shell scripting, Python
Confidential, Irvine, CA
- Implemented job workflows using the Spark framework and UNIX scripting.
- Involved in gathering business requirements, analyzing the use case, and implementing it end to end.
- Worked closely with the Architect; enhanced and optimized the product's Spark and Python code to aggregate, group, and run data mining tasks using the Spark framework.
- Experienced in loading raw data into RDDs and validating it.
- Experienced in converting the validated RDDs into DataFrames for further processing.
- Implemented Spark SQL logic to join multiple DataFrames and generate application-specific aggregated results.
- Experienced in fine-tuning jobs for better performance in the production cluster.
- Worked entirely within Agile methodologies, using the Rally Scrum tool to track user stories and team performance.
- Worked extensively in Impala and Hue to analyze the processed data and generate the end reports.
- Experienced in working with Hive databases through Beeline.
- Analyzed and resolved production job failures in several scenarios.
- Implemented UNIX scripts to define the use case workflow, process the data files, and automate the jobs.
- Knowledge of implementing Autosys JILs to automate jobs in the production cluster.
Environment: Spark, Python, Hive, Sqoop, Oozie, Unix Scripting, Spark SQL, Impala, Hue, Beeline, Autosys, Netezza.
Confidential, Jacksonville, FL
- Responsible for analyzing large data sets and deriving customer usage patterns by developing new MapReduce programs.
- Worked extensively with Hive to analyze partitioned and bucketed data and compute various metrics for reporting.
- Worked with Hive to perform transformations, joins, and some pre-aggregations before storing the data in HDFS.
- Imported all customer-specific personal data into Hadoop from relational databases such as Netezza and Teradata using Sqoop.
- Built NiFi workflows for real-time data ingestion into Hadoop and Teradata simultaneously.
- Experienced in running queries using Impala and in using BI tools to run ad hoc queries directly on Hadoop.
- Worked with BI teams in generating reports and designing ETL workflows on Tableau.
- Developed testing scripts in Python, prepared test procedures, analyzed test result data, and suggested improvements to the system and software.
- Worked on SAS code to convert existing SAS datasets to the Hadoop environment.
- Experience in job management using the Autosys scheduler; developed job processing scripts using Oozie workflows.
Environment: Cloudera, HDFS, Hive, Sqoop, Python, Flume, Java, Shell-script, Linux, Impala, Eclipse, SAS, Tableau, MySQL.
Confidential
- Created Hive tables to store the processed results in a tabular format and wrote Hive scripts to transform and aggregate the disparate data.
- Automated the extraction of data from warehouses and weblogs into Hive tables by developing workflow jobs in Oozie.
- Worked on migrating customers from Teradata to Hadoop and was thus involved in the Teradata decommission, which in turn cut costs for the organization.
- Developed a utility to move data from production to lower lanes using DistCp.
- Experience in using Avro, Parquet, RCFile, and JSON file formats; developed UDFs for Hive and Pig.
- End-to-end (E2E) development of the ETL process: sourcing data from upstream, performing complex transformations, and exporting the data to Teradata.
- Exported the aggregated data into an RDBMS using Sqoop for creating dashboards in Tableau and developed trend analysis using statistical features.
- Scheduled volume snapshots for backup, performed root cause analysis of failures, and documented bugs and fixes for cluster downtime and maintenance.
- Utilized Agile Scrum Methodology to manage and organize the team with regular code review sessions.
Environment: Cloudera, HDFS, Hive, Sqoop, Shell-script, Linux, Impala, Teradata.
Confidential
- Responsible for gathering requirements from Business Analysts and Operational Analysts and identifying the data sources required for the requests.
- Proficient in importing/exporting large amounts of data from files to Teradata and vice versa.
- Developed the DW ETL scripts using BTEQ, stored procedures, and macros in Teradata.
- Developed scripts for loading data into the base tables in the EDW using the FastLoad, MultiLoad, and BTEQ utilities of Teradata.
- Created numerous scripts with Teradata utilities BTEQ, MLOAD and FLOAD.
- Highly experienced in Performance Tuning and Optimization for increasing the efficiency of the scripts.
- Developed reports using advanced Teradata techniques such as RANK and ROW_NUMBER.
- Worked on data verification and validation to confirm that the data generated according to the requirements is appropriate and consistent.
- Tested the database to check field-size validation, check constraints, and stored procedures, cross-verifying the field sizes defined within the application against the metadata.
- Proficient in working on Set, Multiset, Derived, Volatile Temporary tables.
- Designed and developed weekly and monthly reports for the marketing and financial departments using Teradata SQL.
- Extracted data from existing data sources and performed ad hoc queries.
Environment: Teradata V12, BTEQ, MLOAD, FLOAD, Oracle, SQL, PL/SQL, UNIX, Windows XP.
Confidential
- Involved in full life cycle development in a distributed environment using Java and the J2EE framework.
- Responsible for developing and modifying the existing service layer based on the business requirements.
- Involved in designing and developing web services using SOAP and WSDL.
- Involved in database design.
- Created tables and stored procedures in SQL for data manipulation and retrieval; performed database modifications using SQL, PL/SQL, stored procedures, triggers, and views in Oracle 9i.
- Created User Interface using JSF.
- Involved in integration testing the Business Logic layer and Data Access layer.
- Integrated JSF with JSP and used JSF Custom Tag Libraries to display the value of variables defined in configuration files.
- Involved in unit testing of the application using the JUnit framework.
- Wrote stored procedures, functions, and views to retrieve the data.
- Used Maven builds to wrap around Ant build scripts.
- Used CVS for version control of code and project documents.
- Responsible for mentoring and working with team members to ensure that standards and guidelines are followed and tasks are delivered on time.