- Software Engineer with over 7 years of experience working on large enterprise data warehousing projects and Big Data solutions for banking, communications, and healthcare clients.
- 2+ years of experience in Big Data and the Hadoop ecosystem.
- Worked extensively on Hadoop platform using Pig, Hive, and Java to build highly scalable data processing pipelines.
- Solid knowledge of Hadoop architecture and MapReduce, including components such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode.
- Hands-on experience with major components of the Hadoop ecosystem, including Pig, Hive, HBase, HBase-Hive integration, ZooKeeper, and Sqoop.
- Experienced in developing MapReduce programs using Apache Hadoop for working with Big Data.
- Expertise in implementing database projects covering analysis, design, development, testing, and implementation of end-to-end IT solutions.
- Hands-on experience writing test plans/protocols during unit testing, SIT, and UAT.
- Hands-on experience productionizing Hadoop applications such as recommendation engines, time series analysis, and master data management.
- Used the CDH4 Hue Beeswax UI to run MapReduce jobs, load data into HDFS, and export MR job output to Out tables for BI reporting.
Hadoop: HDFS, Hive, Pig, Flume, Oozie, ZooKeeper, HBase, and Sqoop.
Languages: Java, Python, SQL, Pig Latin, HQL, UNIX shell scripting.
Databases: MySQL, Oracle, MS SQL Server
Confidential, Schaumburg, IL
- Exported data from Oracle database to HDFS using Sqoop and NFS mounts
- Developed HiveQL scripts to denormalize and aggregate disparate data
- Developed product profiles using Pig libraries such as DataFu, PiggyBank, and ElephantBird
- Automated workflows using shell scripts and Oozie jobs to pull data from various databases into Hadoop
- Worked on performance tuning of Pig and Hive scripts developed by other developers
- Implemented external tables, dynamic partitions, and buckets using Hive
- Worked on custom Pig Loaders and Storage classes to work with a variety of data formats such as JSON, Compressed CSV, etc.
- Worked on recommendation algorithms based on Pearson correlation and other distance measures
- Worked on Graph Data Structures to analyze product SKUs and inventory relationships
- Worked with Gephi Visualization Tool to analyze dependencies between product SKUs
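The Pearson-based similarity work above can be sketched in a few lines of pure Python; the function names and the shape of the rating vectors here are illustrative assumptions, not the production code:

```python
from math import sqrt

def pearson_similarity(a, b):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # zero variance: correlation undefined, treat as dissimilar
    return cov / sqrt(var_a * var_b)

def most_similar(target, candidates):
    """Pick the candidate SKU whose vector correlates best with the target.
    `candidates` maps a (hypothetical) SKU id to its vector."""
    return max(candidates, key=lambda sku: pearson_similarity(target, candidates[sku]))
```

In practice a recommender would swap in other distance measures (cosine, Euclidean) behind the same ranking interface, which is one reason to keep the similarity function separate from the ranking step.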
Confidential, North Chicago, IL
- Gathered requirement specifications from SMEs and Business Analysts in BR and SR meetings for the corporate workplace project; interacted with business users to build sample report layouts.
- Wrote HLDs along with RTMs tracing back to the corresponding BRs and SRs, and reviewed them with the business.
- Implemented an enterprise-level Transfer Pricing System to ensure tax-efficient supply chains and achieve entity profit targets.
- The IOP implementation involved understanding business requirements and solution design, translating the design into model construction, loading data using ETL logic, validating data, and creating several custom reports per end-user requirements.
- Installed and configured Apache Hadoop and Hive/Pig Ecosystems.
- Installed and configured Cloudera Hadoop CDH4 via Cloudera Manager in both pseudo-distributed and cluster modes.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with HiveQL.
- Created MapReduce jobs using Hive/Pig queries.
- Collected the past 5 years of TPSS data from Teradata and pushed it into HDFS using Sqoop.
- Prepared the IDP for scheduled releases with the developers and the Solution Architect.
- Designed Validation packages for the inbound data feeds coming in from different Divisions and scheduled jobs for regular data feeds coming from the Informatica ETL team.
- Designed Outbound Packages to dump IOP Processed data into the Out tables for the Data Warehouse and the Cognos BI team.
- Involved in Unit testing, System Integration testing and UAT post development.
- Provided End User training and configured reports in IOP.
- Developed a MapReduce program to parse streaming data regarding messaging objects and load it into HDFS
- Developed Hive queries to pre-process the data for analysis by imposing a read-only structure on the streamed data
- Developed workflows in Oozie to run MapReduce jobs and Hive queries
- Used Sqoop to export data into MySQL
- Worked with Agile methodologies and used Scrum in the process
- Defined job flows and managed jobs using the Fair Scheduler
- Worked on cluster coordination services through ZooKeeper
- Worked on loading log data directly into HDFS using Flume
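One way the MapReduce parsing-and-counting work described above could look, sketched in Hadoop Streaming style; the tab-delimited field layout (timestamp, message type, payload) is a hypothetical stand-in for the real messaging-object records:

```python
from itertools import groupby

def mapper(lines):
    """Map step: emit a (message_type, 1) pair per tab-delimited record.
    The field layout assumed here is illustrative only."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            yield fields[1], 1

def reducer(pairs):
    """Reduce step: sum counts per message type. Pairs must arrive sorted
    by key, which Hadoop's shuffle/sort phase guarantees between steps."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)

if __name__ == "__main__":
    # Under Hadoop Streaming this script would be passed as -mapper and
    # -reducer and would read sys.stdin; a sample record set stands in here.
    sample = ["t1\tORDER\tpayload-a", "t2\tALERT\tpayload-b", "t3\tORDER\tpayload-c"]
    for key, total in reducer(sorted(mapper(sample))):
        print(f"{key}\t{total}")
```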
Confidential, Lincolnshire, IL
- Developed logical data models, reverse engineering, and physical data models for a CRM system using ERwin and InfoSphere.
- Involved in the design and development of data migration from a legacy system using Oracle Loader and import/export tools for the OLTP system.
- Worked closely with the Data Business Analyst to ensure the process stays on track, develop consensus on data requirements, and document data element/data model requirements via the approved process and templates.
- Wrote batch programs to run Validation Packages.
- Made extensive use of Stored Procedures, Functions, Packages, and User-Defined Functions.
- Applied indexes properly to enhance the performance of individual queries and stored procedures for the OLTP system.
- Dropped and recreated indexes on tables to improve performance of the OLTP application.
- Tuned SQL queries using SHOWPLAN output and execution plans for better performance.
- Carried out full life-cycle software development processes, especially as they pertain to data movement and data integration.
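The index and execution-plan tuning above can be illustrated with a self-contained sketch; SQLite (Python's built-in `sqlite3`) stands in here for the production SQL Server/Oracle engines, and its `EXPLAIN QUERY PLAN` plays the role of SHOWPLAN:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

def plan(sql):
    """Return the engine's execution plan details for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT total FROM orders WHERE customer_id = 42"
before = plan(query)  # no index on customer_id yet: a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
after = plan(query)   # the planner now uses the new index
print(before, after)
```

Reading the plan before and after creating the index is the same drop/recreate-and-verify loop described in the bullets, just on a toy engine.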
Linux System/Database Administrator
- Installed and maintained Linux servers.
- Installed CentOS using PXE (Preboot Execution Environment) boot and the Kickstart method on multiple servers.
- Monitored system metrics and logs for problems.
- Set up crontab schedules to back up data.
- Applied Operating System updates, patches and configuration changes.
- Added, removed, and updated user account information, reset passwords, etc.
- Used JDBC to load data into MySQL.
- Maintained the MySQL server and managed authentication for the required database users.
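The log-monitoring duty above might be scripted roughly as follows; the log line format and the alert threshold are illustrative assumptions, not a description of the actual monitoring setup:

```python
import re
from collections import Counter

SEVERITY = re.compile(r"\b(ERROR|WARN|INFO)\b")

def scan_log(lines):
    """Count log lines by severity level found anywhere in the line."""
    counts = Counter()
    for line in lines:
        match = SEVERITY.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

def needs_attention(counts, error_threshold=5):
    """Hypothetical alert rule: too many ERROR lines warrants a closer look."""
    return counts.get("ERROR", 0) > error_threshold
```

A script like this would typically be run from the same crontab schedules mentioned above, emailing or paging when `needs_attention` fires.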