- Around 5 years of professional experience in Information Technology, including 3 years developing Big Data and Hadoop Ecosystem applications and 2 years of extensive experience in Java technologies, database development, and data analytics.
- Expertise in Hadoop components - Hive, Hue, Pig, Sqoop, HBase, Impala, Flume, Oozie and Apache Spark.
- Experience in writing Pig Latin and HiveQL scripts and extending their functionality with User Defined Functions (UDFs).
- Hands on experience with performance optimization techniques in Hive, Impala, Spark.
- Strong exposure to various file formats (Parquet, Avro, and JSON) and compression codecs (Snappy and Gzip).
- Hands-on experience with Spark Core, Spark SQL, and the DataFrame/Dataset/RDD APIs.
- Developed applications using Spark and Scala for data processing.
- Replaced existing MapReduce jobs and Hive scripts with Spark DataFrame transformations and actions.
- Good knowledge on Spark architecture and real-time streaming using Spark.
- Hands-on experience spinning up different AWS instances, including EC2-Classic and EC2-VPC, using CloudFormation templates.
- Hands-on experience with AWS (Amazon Web Services): Elastic MapReduce (EMR), S3 storage, EC2 instances, and data warehousing.
- Fluent in core Java concepts such as I/O, multithreading, exceptions, regular expressions, collections, data structures, and serialization.
- Experience in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology; good knowledge of J2EE and core Java design patterns.
- Experience in Java, JSP, Servlets, WebLogic, WebSphere, JavaScript, Ajax, jQuery, XML, and HTML.
- Experience in writing stored procedures and complex SQL queries using relational databases like Oracle, SQL Server and MySQL.
- Knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions, and of data warehouse tools for reporting and data analysis.
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
- Well-versed in Agile/Scrum and Waterfall methodologies.
- Strong team player with good communication, analytical, presentation, and interpersonal skills.
Programming Languages: SQL, Java, J2EE, Scala and Linux shell scripting
Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Impala, Hue, Sqoop, Kafka, Oozie, Flume, Zookeeper, Spark, Cloudera and Hortonworks.
Databases & NoSQL: Oracle, Teradata, MySQL, SQL Server, DB2; familiar with NoSQL (HBase)
Scripting & Query Languages: Linux Shell scripting, SQL and PL/SQL.
Hadoop Paradigms: MapReduce, YARN, in-memory computing, high availability, and real-time streaming.
Other Tools: Eclipse, IntelliJ, SVN, GitHub, Jira, Kanban, Bitbucket.
Cloud Components: AWS (S3 buckets, EMR, EC2, CloudFormation), Azure (SQL Database & Data Factory)
Confidential, Chicago, IL
Hadoop & Spark Developer
- Involved in the complete big data flow of the application, from ingesting data from upstream sources into HDFS through processing and analyzing the data in HDFS.
- Developed a Spark API to import data from Teradata into HDFS and created Hive tables.
- Developed Sqoop jobs to import data in Avro file format from an Oracle database and created Hive tables on top of it.
- Created Partitioned and Bucketed Hive tables in Parquet File Formats with Snappy compression and then loaded data into Parquet hive tables from Avro hive tables.
- Ran Hive scripts through Hive, Impala, and Hive on Spark, and some through Spark SQL using Python and Scala.
- Involved in performance tuning of Hive from design, storage and query perspectives.
- Developed a Flume ETL job with an HTTP source and an HDFS sink.
- Collected JSON data from the HTTP source and developed Spark APIs to perform inserts and updates on Hive tables.
- Developed Spark core and Spark SQL scripts using Scala for faster data processing.
- Developed Kafka consumer APIs in Scala for consuming data from Kafka topics.
- Involved in designing and developing HBase tables and storing aggregated data from Hive tables.
- Integrated Hive and Tableau Desktop reports and published to Tableau Server.
- Developed shell scripts for running Hive scripts in Hive and Impala.
- Orchestrated a number of Sqoop and Hive scripts using Oozie workflows, scheduled with the Oozie coordinator.
- Used Jira for bug tracking and Bitbucket to check in and check out code changes.
Environment: HDFS, YARN, MapReduce, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark SQL, Spark Streaming, Eclipse, Oracle, Teradata, PL/SQL, Linux shell scripting, Cloudera.
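The partitioned and bucketed Hive tables above rely on Hive routing each row to a bucket file by hashing the clustering column modulo the bucket count. A minimal pure-Python sketch of that routing (the column values and bucket count are hypothetical; Hive's actual implementation follows Java `String.hashCode` semantics, approximated here):

```python
NUM_BUCKETS = 8  # would correspond to CLUSTERED BY (...) INTO 8 BUCKETS in the DDL

def java_string_hashcode(s: str) -> int:
    """Approximate Java's String.hashCode(), which Hive uses for string bucket keys."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Interpret the result as a signed 32-bit integer, as Java does.
    return h - 0x100000000 if h >= 0x80000000 else h

def bucket_for(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Bucket index = (hash & Integer.MAX_VALUE) % num_buckets."""
    return (java_string_hashcode(key) & 0x7FFFFFFF) % num_buckets

# Hypothetical customer keys routed to bucket files:
for key in ["cust-001", "cust-002", "cust-003"]:
    print(key, "-> bucket", bucket_for(key))
```

Because the same key always hashes to the same bucket, bucketed joins and sampling can skip reading the other bucket files.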
Confidential, Grapevine, TX
- Responsible for analyzing large data sets and deriving customer usage patterns by developing new MapReduce programs.
- Wrote MapReduce code to parse data from various sources and store the parsed data in HBase and Hive.
- Worked on combiners, partitioners, and the distributed cache to improve the performance of MapReduce jobs.
- Developed shell scripts to perform data profiling on ingested data with the help of Hive bucketing.
- Developed a Hive UDF to apply a hashing mechanism to Hive columns.
- Wrote Hive validation scripts used in a validation framework (for daily analysis through graphs presented to business users).
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Developed workflow in Oozie to automate the tasks of loading data into HDFS and pre-processing with Pig and Hive.
- Developed Python code for the MapReduce framework via Hadoop Streaming.
- Used Pig as an ETL tool for transformations, joins, and some pre-aggregations before storing the data in HDFS.
- Imported all customer-specific personal data into Hadoop using the Sqoop component from relational databases such as Netezza and Oracle.
- Developed test scripts in Python, prepared test procedures, analyzed test result data, and suggested improvements to the system and software.
- Streamed log data using Flume and performed data analytics using Hive.
- Developed a data pipeline using Kafka and Storm to store data into HDFS.
- Extracted the data from RDBMS (Oracle, MySQL & Teradata) to HDFS using Sqoop.
- Worked on a Spark POC comparing the execution times of existing MapReduce jobs written in Hive against equivalent Spark jobs.
Environment: Hadoop, MapReduce, HDFS, Pig, HiveQL, Oozie, Flume, Impala, Cloudera, MySQL, UNIX Shell Scripting, Tableau, Python, Spark.
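The Python-on-Hadoop-Streaming work above follows the classic mapper/reducer pattern: Hadoop pipes raw lines to the mapper over stdin, sorts the emitted key/value pairs by key, and pipes them to the reducer. A minimal word-count-style sketch (input lines are hypothetical; the sort here stands in for Hadoop's shuffle phase):

```python
from itertools import groupby

def mapper(lines):
    """Map phase: emit one tab-separated (word, 1) pair per token, as Streaming expects."""
    for line in lines:
        for word in line.strip().split():
            yield f"{word}\t1"

def reducer(pairs):
    """Reduce phase: input arrives sorted by key; sum the counts for each word."""
    parsed = (p.split("\t") for p in pairs)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

# Simulate the Streaming pipeline end to end: map, shuffle/sort, reduce.
lines = ["error login failed", "error timeout", "info login ok"]
shuffled = sorted(mapper(lines))  # Hadoop performs this sort between the two stages
for out in reducer(shuffled):
    print(out)
```

In a real job the two functions would live in separate scripts passed to `hadoop jar hadoop-streaming.jar -mapper ... -reducer ...`, with stdin/stdout in place of the in-memory lists.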
- Involved in full life cycle development in a distributed environment using Java and the J2EE framework.
- Designed the application by implementing Struts Framework based on MVC Architecture.
- Implemented the web service client for login authentication, credit reports, and applicant information using the Apache Axis2 web service.
- Developed framework for data processing using Design patterns, Java, XML.
- Used the lightweight container of the Spring Framework to provide architectural flexibility through Inversion of Control (IoC).
- Used the Hibernate ORM framework with the Spring Framework for data persistence and transaction management.
- Designed and developed Session beans to implement the Business logic.
- Developed EJB components deployed on the WebLogic Application Server.
- Wrote unit tests using the JUnit framework; logging was done using the Log4j framework.
- Designed and developed various configuration files for Hibernate mappings.
- Designed and documented REST/HTTP APIs, including JSON data formats and API versioning strategy.
- Developed Web Services for sending and getting data from different applications using SOAP messages.
- Actively involved in code reviews and bug fixing.
- Applied CSS (Cascading Style Sheets) across the entire site for standardization.
- Assisted QA Team in defining and implementing a defect resolution process including defect priority, and severity.
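The REST/HTTP API versioning strategy mentioned above can be illustrated with URL-path versioning, where each version keeps a stable JSON shape so old clients keep working. A hypothetical sketch (paths, field names, and record layout are all illustrative, not the actual API):

```python
import json

def applicant_v1(record):
    """v1 JSON shape: flat fields, kept unchanged for existing clients."""
    return {"name": record["name"], "ssn_last4": record["ssn"][-4:]}

def applicant_v2(record):
    """v2 JSON shape: name moved into a nested identity object."""
    return {"identity": {"name": record["name"]}, "ssn_last4": record["ssn"][-4:]}

# The version is encoded in the URL path; both versions are served side by side.
ROUTES = {"/v1/applicants": applicant_v1, "/v2/applicants": applicant_v2}

def handle(path, record):
    """Dispatch on the versioned path and serialize the response as JSON."""
    return json.dumps(ROUTES[path](record))

print(handle("/v1/applicants", {"name": "Jane Doe", "ssn": "123456789"}))
```

Breaking changes land only in a new path prefix, which is what lets the old and new JSON formats coexist during client migration.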