- Skilled in predictive analytics, data pipelines, data visualization, and reporting on HDFS data using ETL, MapReduce, and other tools across the Hadoop ecosystem.
- Strong experience across the SDLC using methodologies including Agile and Waterfall.
- Strong knowledge of SQL Server BI (SSIS, SSAS, SSRS), Data Warehousing/Data Mart concepts, and Dimensional Modeling.
- Experienced in developing Python web applications on various cloud platforms, i.e., Microsoft Azure, AWS, and IBM Bluemix.
- Experience in applying Machine Learning, Predictive Analytics, and Classification in Hadoop distributed data analytics using Spark MLlib.
- Hands-on experience handling large datasets and batch processing.
- Hands-on experience analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Python.
- Experience in extracting data from multiple sources into Hadoop data pipelines and processing it using Spark, Spark Streaming, or Hive.
Big Data and Cloud: Apache Hadoop, Apache Hadoop YARN, Apache HBase, Apache Hive, Apache Kafka, Apache Pig, Apache Spark, Spark Streaming, Spark MLlib, Apache Tez, HDFS, MapReduce, Sqoop, Azure HDInsight, Hortonworks HDP, AWS, IBM Bluemix
Programming Languages: C, Java, C++, Python 2.7/3.x, Scala, R
Markup Languages: HTML, XML, CSS
Network Protocols: Bluetooth, TCP, UDP, HTTP, IP
Operating Systems: Windows, Linux, Android, OS X
Databases: MySQL, SQLite, Oracle, NoSQL, Vertica, HBase, MongoDB
Tools: Android Studio, SQLite, MySQL Server Management Studio, Wireshark, Eclipse, Tomcat, MS Office, Weka, NetBeans, MySQL Workbench, LoadRunner, JMeter, PyCharm
Big Data Intern
- Worked closely with product owners and data analysts to understand business objectives for the Big Data platform.
- Analyzed and extracted key insights from data in Hadoop data lakes and clusters for analytics.
- Identified and ingested source data from different systems into HDFS using Sqoop and preprocessed data using Pig and Hive.
- Built various Hive UDF libraries per business requirements, enabling easy analytics on top of Hive tables.
- Involved in the setup and maintenance of MS SQL databases and Vertica clusters and servers.
- Created HBase tables to store variable data formats for data analytics.
- Involved in performance tuning and optimization of long-running jobs/queries, using features such as Hive partitioning and bucketing to manage very large data volumes.
- Developed SSAS Multidimensional and tabular cubes with dimensions and fact tables.
- Created SSRS reports on top of cubes using MDX (Multi-Dimensional Expressions) queries to build measures, calculations, and KPIs that fit business requirements.
- Set up measures, attributes, measure groups, KPIs, and data partitioning for SSAS cubes, and scheduled cube processing as needed.
- Created simple and complex BI/analytics reports and dashboards (OLAP and relational).
Programmer Analyst Intern
- Implemented sensor data collection in Azure Blobs (HDFS) using HDInsight Hadoop Cluster.
- Developed data-processing workflows using PowerShell and SSIS that invoked Hive scripts.
- Configured Flume agents to move data from Kafka into HDFS.
- Performed data analysis and machine learning using Python.
- Created custom Hadoop cluster creation and deletion scripts using Microsoft Azure Automation jobs to minimize cluster runtime and cost.
- Extracted data from NoSQL Azure Table storage using RowKey and PartitionKey.
- Developed Hive tables on machine-generated JSON data and parsed the data into derived Hive tables for further analysis.
- Created BI reports (Tableau) and dashboards from HDFS data using Hive.
- Implemented triggers and automated report generation using Pig scripts, avoiding lookups and improving performance by 40%.
- Used Spark MLlib libraries to design recommendation analysis, with predictions driven by statistical analysis in R.
Software Developer Intern
- Involved in the full software design life cycle of various projects, including prototyping, proofs of concept, design, interface implementation, testing, and maintenance.
- Wrote complex SQL queries to validate data against different kinds of reports; investigated data quality issues and performed back-end testing.
- Created detailed design documents using UML (Use case, Class, Sequence, and Component diagrams).