- Experience in requirements gathering, data analysis, data quality, data mapping, testing, and deployment of business applications.
- Experience in Big Data/Hadoop across multiple projects involving Hadoop, MapReduce, Apache Spark, HDFS, Pig, Hive, Sqoop, and UNIX.
- Strong experience in implementing data warehousing applications using ETL tools, Oracle, and UNIX.
- Extensive experience in Big Data analytics, with hands-on experience writing MapReduce jobs on the Hadoop ecosystem, including Hive and Pig.
- Strong knowledge of data extraction, data mapping, data conversion, and data loading processes using UNIX shell scripting, SQL, PL/SQL, and SQL*Loader.
- Excellent knowledge of Hadoop architecture, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Excellent knowledge of using Hive queries for structured data and Hive commands/scripts for unstructured and semi-structured data.
- Experience in managing and scheduling batch jobs on a Hadoop cluster and reviewing Hadoop log files.
- Experienced in generating SQL and PL/SQL scripts to create database objects including tables, views, primary keys, indexes, constraints, and packages.
- Good experience in creating procedures, packages, functions, triggers, views, tables, indexes, cursors, and SQL collections, and in writing and optimizing SQL queries.
- Using Excel to manipulate and analyze large amounts of data.
- Enabled speedy reviews and first-mover advantage by using an EC2 server to automate data loading from an AWS S3 bucket into HDFS and Hive; used Drill and Spark to pre-process the data and Python and Bash scripting to batch-process it.
- Analyzing data and building pivot tables, charts, and reports in Excel.
- Working on production support to catch up data, and seamlessly migrated code from Development to Testing, UAT, and Production.
- Excellent communication and coordination skills, working with system administrators, business data analysts, and the DBA team.
- Worked with R and Python to develop neural network algorithms and cluster analyses.
- Adequate knowledge of and working experience in Agile and Waterfall methodologies.
- Good knowledge and strong deployment experience in the Hadoop/Big Data ecosystem: Hadoop, Flume, Hive, HBase, Pig, HDFS, MapReduce, Linux, data lakes, etc.
- Excellent organization, analytical, and problem-solving skills, and the ability to quickly learn new technologies.
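The MapReduce paradigm mentioned above can be sketched in plain Python for a word count: a mapper emits key/value pairs, a shuffle groups them by key, and a reducer aggregates each group. This is an illustrative stand-in for what Hadoop does at scale, not Hadoop code; all function names here are hypothetical.

```python
from collections import defaultdict

def mapper(line):
    # Emit (word, 1) pairs, as a Hadoop Streaming mapper would.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Group values by key, mimicking the framework's sort/shuffle phase.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reducer(key, values):
    # Sum the counts for one key, as a word-count reducer would.
    return key, sum(values)

def word_count(lines):
    pairs = [kv for line in lines for kv in mapper(line)]
    return dict(reducer(k, v) for k, v in shuffle(pairs).items())
```

In a real cluster, the mapper and reducer run as separate tasks and the framework handles the shuffle; the logic per record is the same.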
Operating Systems: Windows 7/10, UNIX, Linux, macOS
Databases: Oracle, DB2, SQL Server, MS Access, MySQL, PostgreSQL, MongoDB
Utilities: MS Word, Excel (including macros), Access, PowerPoint
Hadoop Technologies: HDFS, Hive, Pig, Sqoop, Oozie, MapReduce, HBase
Big Data/Hadoop Developer
- Working on a project that manages data for campaigning/marketing of Confidential products.
- Imported data from an existing SQL Server database into Hive and HBase using Sqoop.
- Created reports for the BI team, using Sqoop to move data into HDFS and Hive.
- Performing data quality (DQ) checks using Spark.
- Developed multiple Hive jobs, executed as MapReduce, for data cleaning and preprocessing.
- Involved in Requirement Analysis, Design, and Development.
- Using GitHub for access control and collaboration features such as wikis and basic task-management tools for the project.
- Architected the data lake by cataloging the source data, analyzing entity relationships, and aligning the design with performance, schedule, and reporting requirements.
- Designed and developed Hadoop ETL solutions to move data into the data lake using big data tools such as Sqoop, Hive, Spark, HDFS, and Talend.
- Using OneDrive, a file hosting service, to store files and BitLocker recovery keys in the cloud, and to share and sync files.
- Using Jira for bug tracking, issue tracking, and project management.
- Actively participated in Sprint Planning and Retrospective meetings to understand the future scope and enhancements of the project.
- Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
- Loading data into Hive partitioned tables.
Environment: MapReduce, HDFS, HBase, Hive, Sqoop, Spark, Kafka, AWS, Oracle SQL Developer, MySQL Workbench, GitHub, OneDrive, SharePoint, Jira, Microsoft 365
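The DQ checks named in this role ran as Spark filters; the per-row rule logic can be sketched in plain Python. The column names below are hypothetical stand-ins, and rows are dicts purely for illustration.

```python
def dq_check(rows, required=("customer_id", "campaign_id")):
    """Flag rows that fail basic data-quality rules: a required
    field that is absent or empty. Mirrors logic typically written
    as Spark DataFrame filters; returns (row_index, missing_columns)."""
    failures = []
    for i, row in enumerate(rows):
        missing = [col for col in required if not row.get(col)]
        if missing:
            failures.append((i, missing))
    return failures
```

Rows that pass produce no entry; the failure list can be routed to a quarantine table before loading clean data into Hive.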
Big Data Developer
- Involved in the analysis, design, coding, and testing of the application, and participated in requirements discussion meetings.
- Working with the admin team on configuring the Hadoop cluster in pseudo-distributed and fully distributed modes.
- Experience in writing custom MapReduce jobs for data analysis and data cleansing.
- Involved in creating Hive tables, loading data, and writing Hive queries per requirements.
- Responsible for writing views, queries, and stored procedures to produce the desired output from various data sources.
- Implementing static partitioning, dynamic partitioning, and bucketing of data in Hive to improve performance.
- Experience with batch processing of data sources using Apache Spark and Elasticsearch.
- Implementing a continuous integration and deployment framework in a Linux environment.
- Working on the Spartan database, creating DQ checks for Hive tables and loading the data into Production.
- Experience in performance tuning of Spark applications: setting the right batch interval, choosing the correct level of parallelism, and tuning memory.
- Wrote shell scripts to pull the necessary fields from huge files generated by MapReduce jobs.
- Moved all log/text files generated by various products into an HDFS location.
- Used reporting tools like Tableau, connected to Hive, to generate daily data reports.
- Involved in code review and User Acceptance Testing (UAT) of the application.
- Understanding the process and interacting with the functional team to resolve issues in the given objects.
- Created daily status reports and provided status updates in review meetings.
- Collaborate with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
Environment: Hortonworks Data Platform 2.5, HDFS, YARN, Hive, Sqoop, Oracle 11g, Oozie, Linux Shell Scripting, Windows, and Unix
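The Hive bucketing used in this role assigns each row to a bucket by hashing the bucketing column modulo the bucket count, so joins and sampling can work bucket-by-bucket. A plain-Python sketch of that idea follows; Python's built-in `hash` stands in for Hive's hash function, so this illustrates the scheme rather than reproducing Hive's exact bucket assignment.

```python
def assign_bucket(key, num_buckets=4):
    # Hive computes hash(bucket_column) mod num_buckets; the built-in
    # hash() is an illustrative substitute, not Hive's actual hash.
    return hash(key) % num_buckets

def bucketize(keys, num_buckets=4):
    # Distribute keys into fixed buckets, as CLUSTERED BY does in Hive.
    buckets = {b: [] for b in range(num_buckets)}
    for key in keys:
        buckets[assign_bucket(key, num_buckets)].append(key)
    return buckets
```

The same key always lands in the same bucket, which is what lets Hive co-locate matching keys for bucket-map joins.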
Confidential, Harrisburg PA
- Actively took leadership of and responsibility for the entire server-side development using Spring and Hibernate, and integrated with a legacy mainframe using IBM MQ.
- Generated PDFs using the iText API.
- Implemented Spring Batch services, Spring AOP, and Spring Security modules.
- Worked on data extraction using XML technologies and stored the results in a NoSQL database (MongoDB).
- Implemented IBM DataPower Gateway for authorization and authentication as part of enterprise security.
- Developed REST-based web services with JSON.
- Configured WebSphere 8.0 for JDBC, JNDI, MQ Bus and Security Settings.
- Developed a hybrid mobile application using AngularJS, Bootstrap, HTML, CSS, Objective-C, REST web services, and JSON.
- Analyzed JSON responses to update the DOM.
- Performed Unit/Developer level testing and deployment.
Environment: Java/J2EE, REST web services, Spring, Hibernate, HTML5, CSS 3.0, JSON, AngularJS, Bootstrap, Objective-C, Xcode 6.2, DB2, MongoDB, XML technologies, IBM DataPower Gateway
- Implemented code based on project specifications.
- Created new tables within MySQL.
- Tested and documented the code.
- Provided feedback to the development group to improve Agile Project management methods.
Environment: TOAD, Microsoft Office, MS SQL Server 2008, MS SQL Server 2012, HP ALM