- I have 8 years of IT experience across Big Data (Hadoop ecosystem technologies), Core Java, and SQL and PL/SQL technologies, with hands-on project experience in verticals including financial services, health care, and trade compliance.
Big Data/Hadoop Technologies: HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Spark, Kafka, ZooKeeper, Oozie, Elasticsearch
Hadoop Distribution/Monitoring: Cloudera, Hortonworks, Ambari, Cloudera Manager
NoSQL Databases: HBase, Cassandra, MongoDB
Relational Databases: Microsoft SQL Server, MySQL, Oracle, DB2
Languages: Java, Scala, SQL, PL/SQL, C, C++, Shell Scripting, Python
Java & J2EE Technologies: Core Java, JSP, Servlets, JDBC, JNDI, Hibernate, Spring, Struts, JMS, EJB, RESTful, SOAP
Web Technologies: HTML, CSS, XML, JavaScript, jQuery
Application Servers: WebLogic, WebSphere, JBoss, Tomcat
Amazon AWS: EC2, S3, IAM, Glacier, CloudFront, EMR
Operating Systems: UNIX, Linux, Windows
Build and CI Tools: Jenkins, Maven, Ant
ETL Tools: Informatica, Talend
Development Tools: Eclipse, NetBeans, IntelliJ
Development Methodologies: Agile, Waterfall
Version Control and Testing: Git, SVN, JUnit
Confidential, New York City, NY
- Worked on Hortonworks Data Platform (HDP).
- Created a data lake by extracting customer data from various sources, including RDBMS tables, CSV files, and Excel spreadsheets.
- Involved in the design and development of data transformation framework components supporting the ETL process that produces a single, complete, actionable view of each customer.
- Developed an ingestion module to ingest data into HDFS from heterogeneous data sources.
- Used Apache Hive to run MapReduce jobs over the data in HDFS.
- Built distributed in-memory applications using Spark and Spark SQL to perform analytics efficiently on large datasets.
- Used Spark transformations and actions to build both simple and complex ETL applications.
- Explored Spark to improve the performance of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Loaded real-time data into HDFS using Kafka and structured batch data using Sqoop.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Developed Spark scripts in the Scala shell per requirements.
Environment: Hadoop 2.6, Spark 1.6, Hive 1.1.0, HBase 1.2, Scala, HDFS, MapReduce, Ambari, MySQL, SQL, GitHub, Linux, Spark SQL, Kafka, Sqoop 1.4.6, AWS (S3).
Confidential - Southlake, TX
- Ingested flat files from local UNIX file systems into HDFS, and used Sqoop to ingest structured data from legacy RDBMS systems into HDFS.
- Developed code for importing data into, and exporting data out of, HDFS and Hive using Sqoop.
- Explored Spark to improve the performance of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Used the DataFrame API in Scala to work with distributed collections of data organized into named columns, and developed predictive analytics using the Apache Spark Scala APIs.
- Worked with Apache Hadoop ecosystem components such as HDFS, Hive, and Sqoop, as well as Spark, Scala, and Python.
- Coordinated with the data science team to create PySpark jobs.
- Wrote Hive join queries to fetch information from multiple tables and collect the output.
- Used Hive to analyze partitioned and bucketed data and compute metrics for dashboard reporting.
- Used the Oozie workflow scheduler to run Hive jobs.
- Extracted files through Sqoop, placed them in HDFS, and processed them.
Environment: Hadoop, Sqoop, Hive, Spark, HDFS, Scala, Python, Spark SQL, JDBC, Kafka
Confidential, Cincinnati, OH
Big Data Developer/Engineer
- Used the Spark API in Scala on Cloudera Hadoop YARN to perform analytics on data in Hive.
- Created a Hadoop design that replicates the current system design.
- Developed Scala scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark for data aggregation and queries.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Developed Hive queries to pre-process the data required for running the business process.
- Created the main upload files from Hive temporary tables.
- Actively involved in design analysis, coding and strategy development.
- Developed Hive scripts to implement dynamic partitioning and bucketing for historical data.
- Developed Spark scripts in Scala to read and write JSON files per requirements.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Analyzed the SQL scripts and designed the solution to implement using Scala.
Environment: Hadoop, MapReduce, HDFS, Hive, Sqoop, HBase, Flume, Spark, Spark Streaming, Kafka, AWS, Tableau 8, Apache
Confidential, New Orleans, LA
Big Data Developer
- Ingested Batch Files into HDFS using shell scripting.
- Used Flume to ingest near-real-time data, perform the necessary transformations and aggregations on the fly, and persist the data in Hive.
- Used Pig, Hive, and MapReduce to analyze data and extract meaningful datasets.
- Developed an Oozie workflow to orchestrate a series of Pig scripts that cleanse data during the data preparation stage, for example by removing irrelevant information or merging many small files into a handful of large, compressed files.
- Extensively used Pig to access Hive tables through HCatalog.
- Implemented exception tracking logic using Pig scripts.
- Saved the analyzed data to Hive tables for visualization and to generate reports for the BI team.
- Good understanding of ETL tools and how the ETL operations can be applied in a Big Data environment.
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Oozie, Core Java, Python, Eclipse, Flume, Cloudera, Oracle, UNIX Shell Scripting.
- Developed the application using the Struts framework, which follows the classic Model-View-Controller (MVC) architecture.
- Designed the user interfaces using JSPs, developed custom tags, and used the JSTL tag library.
- Developed various java business classes for handling different functions.
- Developed controller classes using Struts and the Tiles API.
- Involved in documentation and use case design with UML modeling, including class diagrams, sequence diagrams, and use case transaction diagrams.
- Participated in design and code reviews.
- Developed the user interface using AJAX in JSP and performed client-side validation.
- Developed JUnit test cases for all developed modules and used SVN for version control.