- Certified HDP (Hortonworks Data Platform) Spark Developer in Scala with around 6 years of experience in cross-platform Big Data technologies on the Cloudera and Hortonworks platforms.
- In-depth knowledge of the Big Data stack and Hadoop ecosystem: Hadoop, MapReduce, YARN, Hive, HBase, Sqoop, Flume, Kafka, Spark, Spark DataFrames, Spark SQL, Spark Streaming, etc.
- Experience using Spark with Scala to improve the performance and optimization of existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Knowledge of Hadoop architecture and its components (HDFS, NameNode, DataNode, JobTracker, TaskTracker) and the MapReduce programming paradigm.
- Hands-on experience with HiveQL.
- Hands-on experience in Apache Spark 1.6.2 with Scala 2.10.
- Used the Apache Spark API over a Hortonworks Hadoop YARN cluster to perform analytics on data in Hive.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response using Scala.
- Good knowledge of importing and exporting data using Sqoop between HDFS and Relational Database Systems (RDBMS).
- Wrote Hive queries to process all types of files and extract useful information.
- Extensively involved in query optimization in HiveQL.
- Development experience in IDEs such as Eclipse, IntelliJ, and STS.
- Strong database, object-oriented programming, and development skills.
- Sound knowledge of data ingestion using Kafka and Flume.
- Good experience with Spring Core and Spring AOP; performed dependency injection of Spring beans such as data-source beans.
- Excellent understanding of relational databases: created normalized databases, wrote stored procedures, and used JDBC to communicate with the database. Experienced with MySQL and SQL Server.
- Expertise in various AWS services, including compute, database, storage, networking, and security.
- Understanding of S3 and data storage buckets in AWS.
- Good knowledge of AWS services such as EC2, Elastic Load Balancing, Elastic Container Service, S3, Elastic Beanstalk, CloudFront, Elastic File System, RDS, DynamoDB, DMS, VPC, Direct Connect, Route 53, CloudWatch, CloudTrail, CloudFormation, IAM, EMR, and Elasticsearch.
- Knowledge in designing both time driven and data driven automated workflows using Oozie.
- Proficient in analyzing and translating business requirements to technical requirements and architecture.
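The pair-RDD style aggregation mentioned above can be sketched on a single node with plain Scala collections; this is a minimal illustration of the map/reduceByKey pattern, not a cluster job, and the sample data is invented.

```scala
// Single-node sketch of the pair-RDD aggregation pattern, using plain Scala
// collections instead of a live Spark cluster. In Spark the same shape would
// be rdd.flatMap(...).map(word => (word, 1)).reduceByKey(_ + _).
val lines = Seq("spark hive kafka", "spark hdfs", "hive spark")

// map phase: emit (word, 1) pairs, as a pair RDD would
val pairs = lines.flatMap(_.split("\\s+")).map(word => (word, 1))

// reduce phase: sum counts per key, the collections analog of reduceByKey
val counts: Map[String, Int] =
  pairs.groupBy(_._1).map { case (word, ws) => (word, ws.map(_._2).sum) }
```

On a cluster the same logic runs partition-by-partition, with the shuffle taking the place of `groupBy`.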
Confidential, City of Industry, CA
Spark Developer
- Experience in design and implementation of enterprise applications and web-based applications using Java and the Big Data platform.
- Extensive experience in Spring (Core/Security/Data Access), Spring MVC, SOAP & Restful Web Services.
- Implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies such as Hadoop, Map Reduce Frameworks, Hive.
- Loaded and transformed large sets of structured, semi structured and unstructured data using Hadoop/Big Data concepts.
- Worked with Apache Hadoop ecosystem components such as HDFS, Hive, Sqoop, Pig, and MapReduce, as well as Spark and Scala.
- Used Spark SQL to process large amounts of structured data.
- Worked with Spark to improve the performance and optimization of existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Used the DataFrame API in Scala to convert distributed collections of data into named columns, and developed predictive analytics using the Apache Spark Scala APIs.
- Created Hive external tables, loaded data into them, and queried the data using HiveQL.
- Wrote Hive join queries to fetch information from multiple tables and MapReduce jobs to collect Hive output; used Hive to analyze partitioned and bucketed data and compute various metrics for dashboard reporting.
- Utilized Oozie workflows to run Pig and Hive jobs; extracted files through Sqoop, placed them in HDFS, and processed them.
- Responsible for importing log files from various sources into HDFS using Flume.
- Involved in developing the MapReduce framework, writing queries, and scheduling MapReduce jobs.
- Developed code for importing and exporting data into HDFS and Hive using Sqoop.
- Used Spring JDBC Dao as a data access technology to interact with the database.
- Installed and configured Hadoop and responsible for maintaining cluster and managing and reviewing Hadoop log files.
Environment: Hadoop, Spark, HDFS, Scala, Hive, Java, Spring, MapReduce, Sqoop, Spring MVC, Big Data, Spark SQL, Spring JDBC, Oozie, Pig, Flume
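The Hive join plus per-group metric computation described in this role can be sketched with case classes and plain Scala collections; table and field names (orders, customers, region) are illustrative, not the actual schema.

```scala
// Sketch of a Hive-style join + GROUP BY metric, modeled with Scala
// collections rather than live Hive tables. Schema names are hypothetical.
case class Order(custId: Int, amount: Double)
case class Customer(id: Int, region: String)

val orders    = Seq(Order(1, 100.0), Order(2, 250.0), Order(1, 50.0))
val customers = Seq(Customer(1, "West"), Customer(2, "East"))

// join on customer id, like an INNER JOIN in HiveQL
val regionById = customers.map(c => c.id -> c.region).toMap
val joined = orders.flatMap(o => regionById.get(o.custId).map(r => (r, o.amount)))

// aggregate revenue per region, like GROUP BY region
val revenueByRegion: Map[String, Double] =
  joined.groupBy(_._1).map { case (r, xs) => (r, xs.map(_._2).sum) }
```

The broadcast-join optimization mentioned elsewhere in this resume follows the same shape: the small side (`regionById`) is shipped to every task as a lookup map.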
Confidential, Houston, TX
Spark Developer
- Handled importing of data from various data sources, performed data control checks using Spark and loaded data into HDFS.
- Used Spark for interactive queries, processing of streaming data, and integration with a popular NoSQL database for huge volumes of data.
- Created applications using Scala to perform various data cleansing, validation, transformation, and summarization activities per requirements.
- Developed Scala scripts using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
- Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts, and effective, efficient joins and transformations during the ingestion process itself.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Developed Spark scripts by using Scala shell commands as per the requirement.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in HDFS.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Developed Unix shell scripts to load a large number of files into HDFS from the Linux file system.
- Involved in creating Hive tables, and loading and analyzing data using Hive queries.
- Developed Hive queries to process the data and generate data cubes for visualization.
- Implemented schema extraction for Parquet and Avro file Formats in Hive.
Environment: Apache Hadoop, HDFS, Hive, MapReduce, Cloudera, Pig, Sqoop, Kafka, Apache Cassandra, Oozie, Impala, Flume, Zookeeper, Java, MySQL, Eclipse
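The cleansing/validation/summarization step described in this role can be sketched as follows with plain Scala collections; the CSV field layout (id, name, age) is hypothetical, and a real job would run the same per-record logic inside Spark transformations.

```scala
// Sketch of a cleanse-validate-summarize pass: parse raw CSV-style records,
// drop malformed rows, then compute a summary metric. Layout is illustrative.
val raw = Seq("1,alice,34", "2,bob,abc", "3,carol,29", "bad-row")

case class Rec(id: Int, name: String, age: Int)

// validation: keep only rows with three fields and numeric id/age
val cleaned: Seq[Rec] = raw.flatMap { line =>
  line.split(",") match {
    case Array(id, name, age) if id.forall(_.isDigit) && age.forall(_.isDigit) =>
      Some(Rec(id.toInt, name, age.toInt))
    case _ => None // malformed record, filtered out
  }
}

// summarization: average age over the valid records
val avgAge = cleaned.map(_.age).sum.toDouble / cleaned.size
```

In Spark the `flatMap` body is unchanged; only the collection becomes an RDD or Dataset.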
Confidential, Hudson, OH
Spark Developer
- Involved in implementing High Availability and automatic failover infrastructure to overcome single point of failure for Name node utilizing Zookeeper services.
- Implemented the Capacity Scheduler to share cluster resources among the MapReduce jobs submitted by users.
- Monitoring Hadoop Cluster through Cloudera Manager and Implementing alerts based on Error messages. Providing reports to management on Cluster Usage Metrics and Charge Back customers on their Usage.
- Used Slick to query and store data in the database in idiomatic Scala using the Scala collections framework.
- Involved in implementing security on the Cloudera Hadoop cluster, working with the operations team to move from a non-secured to a secured cluster.
- Involved in creating Hive tables, loading them with data, and writing Hive queries, which run internally as MapReduce jobs.
- Formulated procedures for installation of Hadoop patches, updates and version upgrades.
- Developed MapReduce (YARN) programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
- Designed and developed MapReduce jobs to process data coming in different file formats such as XML, CSV, and JSON.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Used Hive to analyze data ingested into HBase via Hive-HBase integration and compute various metrics for reporting on the dashboard.
Environment: Apache Hadoop, HDFS, Hive, MapReduce, Cloudera, Pig, Sqoop, Kafka, Oozie, Flume, Zookeeper, Java, MySQL, PL/SQL
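The heterogeneous-format cleansing jobs described in this role can be sketched as a normalization function that dispatches on the record's shape; the formats shown (a simplified JSON-like line and CSV) and the `user` field are illustrative only, and a production job would use real parsers.

```scala
// Sketch of normalizing records that arrive in mixed formats before loading
// into a Hive schema. Formats and field names are hypothetical.
def extractUser(line: String): Option[String] =
  if (line.startsWith("{")) {
    // crude JSON-ish extraction for the sketch; real jobs used proper parsers
    "\"user\"\\s*:\\s*\"([^\"]+)\"".r.findFirstMatchIn(line).map(_.group(1))
  } else line.split(",") match {
    case Array(user, _*) if user.trim.nonEmpty => Some(user.trim)
    case _ => None // empty or malformed record
  }

val mixed = Seq("""{"user": "alice", "event": "login"}""", "bob,logout", "")
val users = mixed.flatMap(extractUser)
```

Each mapper applies `extractUser` per line, so malformed inputs are dropped before they reach the Hive tables.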
Confidential
Spark Developer
- Used Spark API using Scala over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Created a Hadoop design that replicates the current system design.
- Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Developed Hive queries to pre-process the data required for running the business process.
- Created the main upload files from the Hive temporary tables.
- Created UDFs for Hive queries.
- Actively involved in design analysis, coding and strategy development.
- Developed Hive scripts for implementing dynamic partitions and buckets for history data.
- Developed Spark scripts by using Scala per the requirement to read/write JSON files.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Analyzed the SQL scripts and designed the solution to implement using Scala.
- Used the lightweight container of the Spring Framework to provide flexibility for Inversion of Control (IoC).
- Developed Spring configuration for dependency injection using Spring IoC and Spring controllers, implementing Spring MVC and IoC methodologies.
Environment: Big Data, Hadoop, Spark, Spark SQL, HDFS, Scala, Spring, Spring MVC, Spring JDBC, Spring Boot, Hive, Java, MapReduce, Sqoop, Oozie, Pig, Flume
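A UDF like the ones described in this role is, at its core, an ordinary Scala function applied row by row; the sketch below shows a single-node analog over a local collection (in Spark it would be registered via `spark.udf.register`). The email-masking logic is an invented example, not a function from the actual project.

```scala
// Single-node analog of a Hive/Spark UDF: a plain Scala function mapped over
// rows. The masking rule shown here is purely illustrative.
def maskEmail(email: String): String = email.split("@") match {
  case Array(local, domain) if local.nonEmpty =>
    s"${local.head}***@$domain" // keep first character, hide the rest
  case _ => "invalid" // no single '@' separator: flag the row
}

val rows   = Seq("alice@example.com", "bob@test.org", "not-an-email")
val masked = rows.map(maskEmail)
```

Keeping the per-row logic in a pure function like this makes it trivially unit-testable before it is registered as a UDF.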