- 10 years of extensive IT experience developing Big Data / Hadoop applications.
- Worked on open-source Apache Hadoop, Cloudera Enterprise (CDH) and Hortonworks.
- Hands-on experience with Hadoop ecosystem components: MapReduce, HDFS, Sqoop, Pig, Hive, HBase, Flume, Oozie and ZooKeeper.
- Experienced in programming with RDD transformations and actions, Spark's shared variables (accumulators and broadcast variables), higher-order functions, function literals, and collections such as Lists, Sets, Maps, Arrays, Sequences and monadic collections.
- Experienced in Spark SQL using DataFrames, Datasets, SchemaRDDs, HiveContext/SQLContext, HiveQL, UDFs and table caching for performance, working with structured and semi-structured data such as text files, Hive tables, RDDs, Parquet, RCFile, ORC, JSON and Avro; connecting through JDBC; and tuning performance by setting Spark SQL options.
- Experienced with Spark Streaming using DStreams, stateless/stateful transformations and windowed transformations, with input sources such as Apache Kafka, Apache Flume and file streams, including combining multiple input sources.
- Experienced with MLlib in Spark for implementing machine learning algorithms across categories such as feature extraction, dimensionality reduction (Principal Component Analysis and Singular Value Decomposition), and collaborative filtering and recommendation (Alternating Least Squares).
- Experienced in tuning and debugging Spark applications: changing runtime configuration values; tuning Spark's memory use by optimizing RDD storage; inspecting jobs, tasks and stages and observing application behavior and performance in Spark's built-in web UI and in driver and executor log files; tuning the level of parallelism with repartition() and coalesce(); avoiding repeated shuffles and recomputation with persist() or cache(); and using Kryo serialization.
- Expertise in developing Python scripts for data mining and system administration tasks.
- Extensive experience in all aspects of Microsoft SharePoint administration and in developing solutions with SharePoint Online (Office 365), SharePoint Server 2010, SharePoint 2007, Windows SharePoint Services (WSS 3.0), SharePoint Designer 2010/2007, InfoPath 2010/2007 and Microsoft Visual Studio 2013/2010/2008.
- Responsible for implementing documentation standards and procedures to ensure seamless integration into the global SAP environment and consistent production support practices.
- Responsible for creating and updating the global project cutover plan for all maintenance and rollout activities in the Hadoop and SAP landscape, and for reporting progress and issues to senior management.
Confidential, Houston, TX
Sr. Spark Developer
- Developed Big Data solutions built on Hadoop clusters.
- Leveraged Spark (Scala API), Spark SQL with DataFrames and Datasets, Kafka streams and Hive to capture several hundred PDF customer invoices in near real time and feed BI dashboards that help managers make effective decisions.
- Used the Log4j framework to log debugging, info and error data.
- Collaborated with the administration team on cluster coordination services through ZooKeeper.
- Installed SAP VORA on Hadoop clusters and configured the SAP HANA Spark Controller.
- Leveraged the SAP VORA graph engine to create graphs from data stored in HDFS and SAP HANA as JSG-format files in order to optimize delivery routes, support point-of-sale analysis, identify relationships between entities of key interest, view complex structures and draw knowledge graphs.
- Leveraged the SAP VORA time series engine using raw SQL to provide time-based metrics for warehouse stock replenishment and per-location sales analysis, enabling timely delivery, reducing chemical wastage from overstocking and optimizing stock distribution.
- Leveraged Python libraries to perform big data analysis and implement Machine Learning techniques.
- Developed Unix shell scripts to trigger and automate data analysis jobs at scheduled intervals.
Confidential, Jacksonville, FL
Sr. Spark Developer
- Implemented Spark jobs using Scala and Spark SQL for faster testing and processing of data.
- Streamed data in real time using Spark with Kafka.
- Worked on big data integration and analytics based on Hadoop, Solr, Spark and Kafka.
- Responsible for building scalable distributed data solutions using Hadoop.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Upgraded the Hadoop cluster from CDH3 to CDH4, set up a high-availability cluster and integrated Hive with existing applications.
- Worked on Storm real-time processing bolts that save data to Solr and HBase.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames and pair RDDs.
- Analyzed data with Hive queries and Pig scripts to understand user behavior.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Handled importing data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Teradata into HDFS using Sqoop.
- Worked extensively with Sqoop for importing metadata from Oracle.
- Configured Sqoop and developed scripts to extract data from MySQL into HDFS
- Created HBase tables to store PII data in various formats coming from different portfolios.
- Worked on cluster coordination services through ZooKeeper.
- Helped with the sizing and performance tuning of the Cassandra cluster
- Involved in the process of Cassandra data modelling and building efficient data structures.
- Trained and mentored the analyst and test teams on the Hadoop framework, HDFS, MapReduce concepts and the Hadoop ecosystem.
- Responsible for architecting Hadoop clusters.
- Assisted with the addition of Hadoop processing to the IT infrastructure.
- Performed data analysis using Hive and Pig.
Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Spark, Cloudera Manager, Storm, Cassandra, Pig, Sqoop, Oozie, ZooKeeper, Teradata, PL/SQL, MySQL, Windows, Hortonworks
Confidential, Kansas City
- Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java and Python for data cleaning and preprocessing.
- Imported data into and exported data from HDFS and Hive using Sqoop.
- Explored Spark, Kafka and Storm, along with other open-source projects, to create a real-time analytics framework.
- Developed shell-script wrappers for Hive, Pig, Sqoop and Scala jobs.
- Developed Unix shell scripts to automate Spark SQL jobs.
- Worked on monitoring log input from several data centers, ingested via Spark Streaming and analyzed in Apache Storm.
- Experienced in defining job flows.
- Experienced in managing and reviewing Hadoop log files.
- Participated in development/implementation of Hortonworks environment.
- Experienced in running Hadoop Streaming jobs to process terabytes of XML data.
- Involved in implementing and integrating NoSQL databases such as HBase and Apache Cassandra.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hive, and wrote Hive UDFs in Java and Python.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Experienced in migrating MapReduce programs into Spark transformations using Spark and Scala.
- Wrote Pig Latin scripts to process data, and wrote UDFs in Java and Python.
- Wrote MapReduce programs in Java to produce the required output.
- Wrote Hive queries for data analysis to meet business requirements.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop. Worked on cluster coordination through ZooKeeper.
Confidential, Boston, MA
SharePoint Administrator and Developer
- Gathered business user requirements and designed site layouts for various departments at CCA.
- Installed and configured SharePoint Server 2010 infrastructure.
- Configured and performed backup and restore tasks on entire SharePoint farms.
- Developed and implemented an effective SharePoint 2010 governance plan.
- Used SharePoint Designer 2010 for branding and to modify the look and feel of individual sites.
- Modified default master pages to replace the out-of-the-box (OOB) left navigation pane and apply customized styles.
- Developed custom style sheets for Content Query Web Parts on SharePoint sites according to business requirements.
- Developed synchronous and asynchronous SharePoint Event Receivers for Lists using Visual Studio 2010
- Used SharePoint Designer 2010 to construct SharePoint workflows on document libraries and lists.
- Configured Approval workflows and Collect Feedback workflows for document libraries.
- Configured SharePoint alerts on lists and document libraries to notify list members of changes.
- Configured SharePoint Search schedules and Backup jobs to perform regular crawls and backups.
- Troubleshot and debugged custom functionality used on SharePoint sites.