- 6+ years of experience in software development.
- 5+ years of experience in Big Data development.
- Extensive experience with the Cloudera Hadoop ecosystem.
- Pursuing a Data Science specialization certificate from Confidential.
Operating Systems: Windows XP, Windows 7, Red Hat Linux, CentOS, Ubuntu, Mac OS
Technologies: Hadoop, Hive, Pig, Sqoop, Oozie, Mahout, ZooKeeper, HDFS, Flume, Amazon EC2, JMS, Spring Framework, Hibernate, AJAX, PL/SQL, ExtJS 3.3
Servers: Apache, Tomcat, WebSphere Application Server, JBoss, Tornado
Development Tools: Eclipse, ArgoUML, phpMyAdmin, IBM RAD, Informatica
Relational Databases: IBM DB2, MSSQL, MySQL, Oracle
NoSQL Databases: CouchDB, MongoDB, Cassandra, HBase
Continuous Integration Tools: Jenkins
Testing frameworks: SauceLabs, Selenium, PhantomJS, JMeter
Version control: Git, TFS
Sr. Data Science Engineer - Data Science Project Lead and Architect
- Working directly with the CEO and CIO to drive the future of this project, leading discussions on how data collected in the Enterprise Data Hub can be mined with predictive analytics and machine learning to extract valuable information.
- Using clustering algorithms like K-Means with Mahout to mine user segments.
- Performed capacity planning for setting up an in-house Enterprise Data Hub.
- Architected an in-house Enterprise Data Hub to handle millions of transactional logs per day.
- Deployed and managed a Cloudera 5.4 Hadoop (YARN) cluster with an effective capacity of 140 TB across 22 data nodes.
- Implemented a data pipeline using Kafka 0.8.1.1, Flume 1.5, ZooKeeper 3.4, HiveServer2, and Sqoop 2.
- Mentored new hires on Big Data fundamentals and how our enterprise architecture is set up.
- Created custom Kafka producers in Java to handle accounting logs from 9 geographically distributed PMTA servers generating 25 million logs per month.
- Led a team of 2 Linux system administrators responsible for managing and monitoring the enterprise architecture.
- Created Hive and Linux shell scripts to run and monitor MapReduce jobs.
- Currently working on adding real-time data processing capabilities to the Enterprise Data Hub.
- Developing a hands-on video tutorial series for people who want to learn about Big Data, machine learning, and data science in their spare time, while sharpening my skills and learning something new every day.
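Mahout's distributed K-Means (used for the user segmentation above) operates on vectors stored in HDFS; as a minimal illustration of the clustering step itself, here is a plain-Java sketch. The toy "user activity" vectors, k=2, and iteration count are illustrative, not the project's actual data:

```java
import java.util.Arrays;

/** Minimal K-Means sketch: the core loop behind Mahout-style user segmentation. */
public class KMeansSketch {
    /** Returns the cluster index of each point after a fixed number of iterations. */
    static int[] cluster(double[][] points, int k, int iters) {
        double[][] centroids = new double[k][];
        // Deterministic init: seed centroids with the first k points.
        for (int i = 0; i < k; i++) centroids[i] = points[i].clone();
        int[] assign = new int[points.length];
        for (int it = 0; it < iters; it++) {
            // Assignment step: each point joins its nearest centroid
            // by squared Euclidean distance.
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                double bestD = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double d = 0;
                    for (int j = 0; j < points[p].length; j++) {
                        double diff = points[p][j] - centroids[c][j];
                        d += diff * diff;
                    }
                    if (d < bestD) { bestD = d; best = c; }
                }
                assign[p] = best;
            }
            // Update step: move each centroid to the mean of its members.
            for (int c = 0; c < k; c++) {
                double[] sum = new double[points[0].length];
                int n = 0;
                for (int p = 0; p < points.length; p++) {
                    if (assign[p] != c) continue;
                    for (int j = 0; j < sum.length; j++) sum[j] += points[p][j];
                    n++;
                }
                if (n > 0)
                    for (int j = 0; j < sum.length; j++) centroids[c][j] = sum[j] / n;
            }
        }
        return assign;
    }

    public static void main(String[] args) {
        // Toy user-activity vectors forming two obvious segments.
        double[][] users = {{1, 2}, {9, 9}, {1, 1}, {8, 9}, {2, 1}, {9, 8}};
        System.out.println(Arrays.toString(cluster(users, 2, 10)));
        // prints [0, 1, 0, 1, 0, 1]
    }
}
```

At scale, Mahout parallelizes the assignment step as a MapReduce job over the vectors in HDFS; the logic per point is the same.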
Confidential, New York, NY
Officer - Technology Analyst and Developer (DevOps Hadoop Developer)
- Proposed integrating Big Data processing technology seamlessly into our existing Sales Portal framework.
- Set up a multi-node Hadoop cluster on UNIX machines for processing large volumes of access logs and tracing data.
- Performed performance and capacity planning for the cluster.
- Created Oozie workflows for scheduling automated report generation and data ingestion.
- Responsible for analyzing and reporting enterprise security anomalies using Hive.
- Developed and executed queries in Hive to generate the reports from the raw data.
- Managed parcel distribution in CDH and monitored the health and status of services such as HDFS, MapReduce, HBase, Hive, and Impala.
- Managed multiple clusters using Cloudera Manager.
- Developed an in-house automated testing framework based on SauceLabs.
- Developed JMeter scripts to perform load tests on Mercury Portal pages.
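Report-generation workflows like the ones above are defined in an Oozie workflow.xml. A hedged sketch of a single scheduled Hive action follows; the workflow name, script name, and parameters are illustrative, not the actual project's:

```xml
<workflow-app name="daily-report" xmlns="uri:oozie:workflow:0.4">
  <start to="hive-report"/>
  <!-- Runs a HiveQL script that aggregates the day's ingested logs. -->
  <action name="hive-report">
    <hive xmlns="uri:oozie:hive-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>generate_report.hql</script>
      <param>REPORT_DATE=${reportDate}</param>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Report generation failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

A companion coordinator.xml would trigger this workflow on a daily schedule, passing in ${reportDate}.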
Confidential, New York, NY
Officer - Technology Analyst and Developer
- Delivered a Hadoop-based recommendation system for the sales team while working with the Analytics Data Advisory team.
- Headed capacity planning and cluster setup for a proof-of-concept project.
- Analyzed web logs from different LOBs with Datameer and drew correlations showing how our clients react to market news.
- Used Flume and Sqoop for importing log files from different locations into HDFS.
- Developed a system for our sales team that produces lists of potential clients likely to be interested in existing products, based on current market news.
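The log ingestion described above is largely configuration in Flume. A hedged sketch of a Flume 1.x agent tailing an application log into HDFS; the agent, source, and sink names and all paths are hypothetical:

```properties
# Flume agent sketch: tail a local log file into HDFS (names/paths illustrative).
agent1.sources = logsrc
agent1.channels = memch
agent1.sinks = hdfssink

# Source: follow the application access log as it grows.
agent1.sources.logsrc.type = exec
agent1.sources.logsrc.command = tail -F /var/log/app/access.log
agent1.sources.logsrc.channels = memch

# Channel: in-memory buffer between source and sink.
agent1.channels.memch.type = memory
agent1.channels.memch.capacity = 10000

# Sink: write events into date-partitioned HDFS directories.
agent1.sinks.hdfssink.type = hdfs
agent1.sinks.hdfssink.channel = memch
agent1.sinks.hdfssink.hdfs.path = hdfs://namenode/logs/%Y-%m-%d
agent1.sinks.hdfssink.hdfs.fileType = DataStream
```

Sqoop covers the complementary case: bulk imports from relational sources into HDFS.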
Confidential, New York, NY
Technology Analyst and Developer
- Supported the integration of Oracle GoldenGate into the sales portal servers, which provided scope for hot-hot data replication.
- Built a framework for automating Selenium test case runs on cloud infrastructure, which brought testing costs down by more than 60%.
- Formulated an algorithm to compare two very large database schemas consisting of BLOBs and CLOBs.
- The implementation of this algorithm reduced our database consistency-checking time from 3+ days to about 4 hours.
- Performed a POC to compare the benefits of PhantomJS to those of Selenium for our test cases.
- Assisted various lines of business to migrate to sales portal by developing modules in ExtJS and Liferay.
- Developed dashboards in Splunk to analyze user actions from server access logs.
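One common way to implement the LOB comparison described above is digest-based: hash each BLOB/CLOB stream and compare fixed-size digests rather than multi-gigabyte payloads, which is what makes the dramatic speedup plausible. A minimal sketch, with class and method names of my own invention:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.security.MessageDigest;

/** Sketch: compare large LOB columns by digest instead of byte-by-byte. */
public class LobCompareSketch {
    /** Streams a LOB and returns its SHA-256 digest as a hex string. */
    static String digest(InputStream lob) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] buf = new byte[8192];
        int n;
        while ((n = lob.read(buf)) > 0) md.update(buf, 0, n);
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    /** Two LOBs match iff their digests match (collision risk is negligible). */
    static boolean sameLob(InputStream a, InputStream b) throws Exception {
        return digest(a).equals(digest(b));
    }

    public static void main(String[] args) throws Exception {
        byte[] blob = "same payload".getBytes();
        System.out.println(sameLob(new ByteArrayInputStream(blob),
                new ByteArrayInputStream("same payload".getBytes()))); // prints true
        System.out.println(sameLob(new ByteArrayInputStream(blob),
                new ByteArrayInputStream("different".getBytes())));    // prints false
    }
}
```

In practice the streams would come from JDBC `Blob.getBinaryStream()` on each schema, and digests can be computed once per side and compared in bulk.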
Confidential, Baltimore, MD
Programmer - Information Technology
- Developed two new LoginAs building block features, one for logging admin activity and the other for viewing a specific user's Blackboard interface.
- Analyzed the JHU student and faculty database and generated user reports that supported meaningful suggestions for enhancing the quality of Blackboard for all users.
- Created the Course Life building block to provide a course archiving feature for JHU.
- Contributed these modules to the open source community.
Software Engineer - Associate
- Worked with Stanford University Hospital to migrate data from an OLTP database to a data warehouse, performing ETL operations using Informatica.
- Collaborated with on-site teams and business analysts to translate business requirements into high-level technical design documents.
- Developed OLAP cubes over historical hospital patient data to support the reporting needs of the BO team.