- 7+ years of experience in analysis, design, development, integration, testing, and maintenance of applications using Java/J2EE technologies, including 4 years of Big Data/Hadoop experience.
- Experienced in building highly scalable Big Data solutions using Hadoop and multiple distributions (Cloudera, Hortonworks) and NoSQL platforms (HBase and Cassandra).
- Expertise in Big Data architecture with the Hadoop file system and its ecosystem tools: MapReduce, HBase, Hive, Pig, Zookeeper, Oozie, Kafka, Flume, Avro, Impala, and Apache Spark.
- Hands-on experience performing data quality checks on petabytes of data.
- Solid understanding of Hadoop MRv1 and MRv2 (YARN) architecture.
- Good knowledge of AWS services such as EMR and EC2, which provide fast and efficient processing of Big Data.
- Developed, deployed, and supported several MapReduce applications in Java to handle semi-structured and unstructured data.
- Experience in writing MapReduce programs and using the Apache Hadoop API to analyze data.
- Strong experience in developing, debugging, and tuning MapReduce jobs in a Hadoop environment.
- Expertise in developing Pig and Hive scripts for data analysis.
- Hands-on experience in data mining, implementing complex business logic, optimizing queries using HiveQL, and controlling data distribution through partitioning and bucketing to improve performance.
- Experience working with Hive data and extending the Hive library with custom UDFs to query data in non-standard formats.
- Experience in performance tuning of MapReduce jobs, Pig jobs, and Hive queries.
- Ingested data from databases such as DB2 and SQL Server using Sqoop.
- Experience working with Flume to handle large volumes of streaming data.
- Good working knowledge of Hue in the Hadoop ecosystem.
- Extensive experience in migrating ETL operations to HDFS using Pig scripts.
- Good knowledge of evaluating Big Data analytics libraries (MLlib) and using Spark SQL for data exploration.
- Experienced in using Apache Ignite for handling streaming data.
- Experience in implementing a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
- Expert in designing and creating data ingestion pipelines using technologies such as Spring Integration, Apache Storm, and Kafka.
- Experience with the Oozie workflow engine, running workflow jobs with actions that launch Hadoop MapReduce and Pig jobs.
- Worked with file formats such as TEXTFILE, AVRO, and ORC for Hive querying and processing.
- Used compression (Snappy) with these file formats to make efficient use of storage in HDFS.
- Experienced with build tools such as Maven and Ant and continuous integration tools such as Jenkins.
- Hands-on experience with relational databases such as Oracle, MySQL, PostgreSQL, and MS SQL Server.
- Experience with cloud platforms such as AWS, Azure, and GCP.
- Experience using IDEs such as Eclipse 3.0, MyEclipse, RAD, and NetBeans.
- Hands-on development experience with RDBMS, including writing SQL queries, PL/SQL, views, stored procedures, and triggers.
- Participated in all Business Intelligence activities related to data warehouse, ETL and report development methodology.
- Expertise in Waterfall and Agile software development models and in project planning using Microsoft Project Planner and JIRA.
- Highly motivated, dynamic self-starter with a keen interest in emerging technologies.
BigData Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Avro, Hadoop Streaming, Zookeeper, Kafka, Impala, Apache Spark, Hue, Ambari, Apache Ignite
Hadoop Distributions: Cloudera (CDH4/CDH5), Hortonworks
Languages: Java, C, SQL, Python, PL/SQL, Pig Latin, HQL
IDE Tools: Eclipse, IntelliJ
Frameworks: Hibernate, Spring, Struts, JUnit
Operating Systems: Windows (XP, 7, 8), UNIX, Linux, Ubuntu, CentOS
Application Servers: JBoss, Tomcat, WebLogic, WebSphere, Servlets
Reporting Tools/ETL Tools: Tableau, Power View for Microsoft Excel, Informatica
Databases: Oracle, MySQL, DB2, Derby, PostgreSQL, NoSQL databases (HBase, Cassandra)
Confidential, Phoenix, AZ
- Hands-on experience with Scala, Spark, Hive, Kafka, Shell, SQL, Tableau, and Rally.
- Used Spark Streaming to collect data from Kafka in near real time, perform the necessary transformations and aggregations to build the common learner data model, and store the data in a NoSQL store (HBase).
- Installed and configured Hadoop ecosystem components such as Flume, Hive, Pig, Sqoop, Oozie, Zookeeper, Kafka, and Storm, and maintained their integrity.
- Good knowledge of Confidential internal data sources such as Cornerstone, WSDW, IDN, and SQL.
- Used Apache Kafka Connect to stream data between Apache Kafka and other systems.
- Performed advanced procedures such as text analytics using Spark's in-memory computing capabilities with Scala.
- Partitioned data streams using Kafka; designed and configured a Kafka cluster to accommodate heavy throughput.
- Responsible for fetching real-time data using Kafka and processing it using Spark Streaming with Scala.
- Implemented Hive Partitioning and Bucketing on the collected data in HDFS.
- Experienced in writing real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system.
- Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on formats such as text and CSV files.
- Implemented Sqoop jobs to import and export large volumes of data between RDBMS and Hive.
- Worked with application teams to install Hadoop updates, patches, and version upgrades as required.
- Used Kafka Connect to stream data between MapR Event Store for Apache Kafka and other storage systems.
- Used Tableau to analyze the data visually.
- Developed Scala scripts using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
- Migrated MapReduce programs into Spark transformations using Scala.
- Deployed and scheduled applications on cloud-hosted Spark/Ambari.
- Evaluated and used hosted solutions and created cloud data pipelines using Azure Data Factory.
- Created a user interface to access data from landing-zone tables and automated the SQL queries to give users flexibility.
- Tracked data as it moved across hops, with counts and amount summaries in both local currency and USD.
- Reported drops in detail, along with the reasons for and expectations of each drop.
- Enabled drill-down and drill-through on the data set at an aggregated level to understand the data better.
- Covered use cases such as product-level and country-level views to analyze the data at a granular level.
Environment: MapReduce, Scala, Spring Framework 2.1.3, Oracle 22.214.171.124, Kafka connectors, Maven 4.0, Spark, Hive SQL, Node.js v8.11.1, NoSQL, Java 1.8, Tableau, Ambari user views, Spark real-time data source, cloud platform (Azure), consumers.
- Designed the schema and data model and wrote an algorithm to store all validated data in Cassandra using Spring Data Cassandra REST.
- Standardized input merchant data, uploaded images, indexed the given data sets into search, and persisted the data in HBase tables.
- Set up Spark Streaming and a Kafka cluster and developed a Spark Streaming Kafka application.
- Developed prototype Spark applications using Spark-Core, Spark SQL, DataFrame API
- Performed data analysis using Python and handled ad hoc requests as required.
- Developed Python scripts for automating tasks.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Crawled data from 100+ sites based on ontology maintenance.
- Identified trends, opportunities, and risks to current forecasts and the next period's plan.
- Imported data into and exported data from HDFS and Hive using Sqoop.
- Supported setup of the QA environment and updated configurations for implementing scripts with Pig and Sqoop.
- Used AWS services like EC2 and S3 for small data sets.
- Generated stock alerts, price alerts, popular product alerts, and new arrivals for each user based on like, favorite, and share counts.
Environment: Cassandra, Hive, Spark (Core, SQL, MLlib, Streaming), Hadoop, MapReduce, Scala, Java, AWS, Zookeeper, Shell Scripting
Confidential - Bloomington, IL
- Developed a simulator to emit events based on the NYC DOT data file.
- Built a Kafka producer to accept events and send them to Kafka, from which a Storm spout consumes them.
- Created, altered and deleted topics using Kafka Queues when required with varying
- Set up and managed Kafka for stream processing, including broker and topic configuration and creation.
- Worked with Google Cloud Platform (GCP) services such as the Vision API and instances.
- Used the GCP Vision API to detect information in Confidential's internal data (images, vCards, etc.).
- Loaded and transformed large sets of unstructured data from UNIX systems to HDFS.
- Developed use cases and technical prototypes for implementing Pig, HDP, Hive, and HBase.
- Experienced in running Hadoop streaming jobs to process terabytes of CSV data.
- Supported MapReduce programs running on the cluster.
- Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Wrote a Storm topology to accept events from the Kafka producer and process them.
- Developed Storm bolts to emit data into HBase, HDFS, and RabbitMQ Web STOMP.
- Wrote Hive queries to map truck event data, weather data, and traffic data.
Environment: HDFS, Hive, HBase, Kafka, Storm, RabbitMQ Web STOMP, GCP, Google Maps, New York City Truck Routes from NYC DOT.
Data Engineer/ Hadoop Developer
- Participated in sprint planning as part of monthly deliveries.
- Participated in daily scrum calls and stand-up meetings as part of the Agile methodology.
- Used the VersionOne tool to update work details and working hours for each task.
- Contributed to the design of views.
- Wrote Spring configuration files and business logic based on requirements.
- Involved in code-review sessions.
- Implemented JUnit tests for the business logic of the backlog items assigned in the sprint plan.
- Implemented fixtures to execute FitNesse test tables.
- Created Jenkins CI jobs and Sonar jobs.
Environment: Core Java, Spring, Maven, XMF Services, JMS, Oracle 10g, PostgreSQL 9.2, FitNesse, Eclipse, SVN