- 7 years of experience in IT with expertise in the Hadoop ecosystem: Hive, Hue, Pig, Sqoop, HBase, Impala, Flume, ZooKeeper, Oozie, Kafka, and Apache Spark.
- Hands-on experience in installing, configuring, and using ecosystem components such as Hadoop MapReduce, Hive, Sqoop, Pig, HDFS, HBase, ZooKeeper, and Flume.
- Experience in developing Pig scripts and Hive Query Language (HiveQL); knowledge of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN, and the MapReduce programming paradigm.
- Developed custom MapReduce programs using Apache Hadoop to perform data transformation and analysis per requirements.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs, Spark SQL.
- Worked with Splunk for generating BAU alerts.
- Experienced in performing real-time analytics on NoSQL databases such as HBase, MongoDB, and Cassandra.
- Performed Hadoop administration activities such as installation and configuration of clusters using Apache and Cloudera distributions.
- Hands-on experience with Amazon Redshift, the data warehouse product within Amazon Web Services (AWS).
- Experience in collecting log data from different sources, such as web servers and social media, using Flume and storing it on HDFS to run MapReduce jobs.
- Experience with various Business Intelligence tools and SQL databases.
- Experience in Java/J2EE technologies, database development, ETL tools, and data analytics.
- Fluent in core Java concepts such as I/O, multi-threading, exceptions, regular expressions, collections, data structures, and serialization.
- Proficient in SQL Server programming (T-SQL), Integration Services (SSIS), Analysis Services (SSAS), and Reporting Services (SSRS) in SQL Server 2012/2008 R2/2008/2005.
- Experience in Data Warehouse Development environment using Microsoft SSIS, SSAS.
- Strong knowledge on data warehousing principles like fact tables, dimensional modeling, normalization and data modeling.
- Extensive experience in maintaining stored procedures, triggers, joins, keys, and functions.
- Experienced in data loading using T-SQL, database fine-tuning and performance improvement, creating new databases, and creating, modifying, and optimizing tablespaces.
- Hands on experience architecting Full/Historical/Initial/Incremental ETL process.
- Excellent Communication, Analytical skills and Interpersonal skills. Exceptional ability to learn new concepts.
Databases: MS SQL Server 2012/2008 R2/2008/2005, Oracle 11g/10g, Microsoft Access, Teradata V13
NoSQL Databases: HBase, Cassandra, MongoDB
Big Data: Hadoop, Hive, Sqoop, Pig, Spark, HBase, Oozie, Flume, MapReduce, Splunk
Programming Languages: Java, J2EE, HTML, XML, C, C++, C#, T-SQL, Python, SPL, SAS
Configuration Management Tools: JIRA, Team Foundation Server (TFS), Rally
Operating System: Windows Server 2003/2008/2012, Windows XP Professional/Standard
Big Data/Splunk Developer
- Administered the Hadoop cluster, ensuring high availability of the NameNode, mixed-workload management, performance optimization, health monitoring, and backup of one or more nodes.
- Involved in loading data from UNIX file system to HDFS.
- Preparing Oozie workflow, Korn Shell jobs and pushing the code to Dev and prod environments.
- Implemented various analytical algorithms as MapReduce programs applied to data in HDFS.
- Worked on NoSQL databases including MongoDB, Cassandra, and HBase.
- Created tasks for incremental loads into staging tables and scheduled them to run.
- Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats including XML, JSON, CSV, and various compressed formats.
- Used Apache Kafka as a messaging system to load log data and data from UI applications into HDFS.
- Used Pig to perform data transformations, event joins, filtering, and pre-aggregations before storing the data in HDFS.
- Involved in Installing, Configuring Hadoop components using CDH 5.2 Distribution.
- Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
- Scheduling and managing cron jobs, wrote shell scripts to generate alerts.
- Worked as a Splunk/Big Data developer and used Search Processing Language (SPL) to build alert sources for different Tableau dashboards across the enterprise.
- Involved in sourcing data to Splunk indexers from big data servers.
- Implemented Hive queries in Splunk DB Connect as part of the alert-generation process using a big data source.
- Involved in migrating Teradata queries to the big data environment by mapping data and writing complex Hive queries to ensure a proper migration.
- Scheduled and monitored alerts in Splunk and designed jobs for pipelining Splunk output to SQL Server.
- Implemented SQL server structure with tables, views, stored procedures and built a data model for accommodating data from Splunk.
Environment: CDH 5.3.2, Hadoop 0.20/2.2, MapReduce, Hive, Apache Pig, Sqoop, Oozie, Scala, PySpark, Core Java, Spark, YARN, Unix, Oracle, Teradata V13.0, MS SQL Server 2012/2008 R2, T-SQL, Splunk 6.2/6.3, C#, ASP.NET, SAS 9.4, Tableau 9.2/9.3, Rally
Confidential, Cincinnati, OH
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in EDW.
- Installed and configured Hive, Pig, Oozie, and Sqoop on Hadoop cluster.
- Used Sqoop to efficiently transfer data between databases and HDFS, and used Flume to stream log data from servers.
- Pre-processed large sets of structured and semi-structured data in different formats (text files, Avro, Sequence Files, JSON records).
- Worked on both MapReduce 1 and MapReduce 2 (YARN) architectures.
- Fixed various data transformation issues in Pig, Hive, and Sqoop.
- Wrote Pig scripts to process unstructured data and create structured data for use with Hive.
- Developed Sqoop scripts to move data between Hive and RDBMS databases.
- Created Managed tables and External tables in Hive and loaded data from HDFS.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Used big data testing frameworks MRUnit and PigUnit to test raw data, and executed performance scripts.
- Improved MapReduce job performance by using combiners, partitioning, and the distributed cache.
- Analyzed Migration plans for various versions of Cloudera Distribution of Apache Hadoop to draw the impact analysis, and proposed the mitigation plan for the same.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Responsible for managing data coming from different sources.
Environment: Hadoop 0.20, Apache Pig, Hive, HBase, Sqoop, Oozie, MS SQL Server 2012/2008 R2, T-SQL, C#, ASP.NET, SSRS 2012/2008, Crystal Reports, SSIS 2012/2008, Erwin r7
Confidential, Charlotte, NC
Sr. JAVA developer
- Reviewed documentation released by internal software developers and learnt about the upcoming developments in software applications.
- Gathered business requirements, definition and design of data source and data flows.
- Created tables and triggers as needed to implement business rules.
- Involved in the full software development life cycle of the project from analysis and design to testing and deployment.
- Used the Spring MVC front-controller pattern, with the DispatcherServlet handling incoming requests.
- Designed & developed the UI application using React JS, Rest, Spring MVC, Spring Data JPA, Spring Batch, Spring Integration & Hibernate.
- Developed RESTful web services using Java, Spring Boot
- Participated in technical design reviews and actively involved in Functional Design.
- Created UML diagrams to capture architecture and application design.
- Involved in preparing use-case diagrams, sequence diagrams and class diagrams using Rational Rose, UML.
- Performed feasibility analysis, scoped projects, and worked with the project management team to prioritize deliverables and negotiate product functionality.
- Developed test cases and performed unit test using JUnit Framework.
- Built code using Eclipse and deployed it on Apache Tomcat.
- Developed and designed SSIS packages to move data from production environment to different environments within the organization.
- Designed and implemented complex SSIS packages to migrate data from multiple data sources for data analyzing, deploying, and dynamic configuring of SSIS packages.
- Used MS SSIS 2008 to manually migrate SQL Server 2005 DTS packages to SSIS 2008 packages, and maintained a combined data store for reporting purposes.
Confidential, Minneapolis, MN
SQL Server Developer (SSIS/SSRS)
- Interacted with users for business requirements gathering, documentation, and user training.
- Worked with various business groups while developing their applications, assisting in Database design, creating ER diagrams.
- Involved in designing Logical and Physical Data Models.
- Helped in developing the long term BI strategies.
- Created and managed schema objects such as tables, views, indexes, stored procedures and maintaining Referential Integrity.
- Created SSIS packages to populate data from various data sources and for data extraction from Flat Files, Excel Files and OLEDB to SQL Server.
- Configured SSIS packages using the Package Configuration Wizard to allow packages to run in different environments.
- Developed dashboard reports using SQL Server Reporting Services (SSRS), Report Model and ad-hoc reporting using Report Builder.
- Generated complex SSRS reports using multiple data providers, Global Variables, Expressions.
- Deployed reports, created report schedules and subscriptions. Managed and secured reports using SSRS.
- Administered the MS SQL Server by creating user logins with appropriate roles, monitoring the user accounts, creation of groups, granting the privileges to users and groups.
- Used BCP command and T-SQL command to bulk copy the data from Text files to SQL Server and vice versa.
Environment: MS SQL Server 2005, MS SQL server 2008, SQL Server Reporting Services (SSRS 2008), SQL Server Integration Services (SSIS), SQL profiler.
- Installation and configuration of SQL Server 2000 on Windows 2000/NT.
- Managing database objects such as tables, views, and indexes.
- Managing users, including creation/alteration and granting of system/database roles and permissions on various database objects.
- Managing Backup and restore, export and import.
- Developed DTS packages to transfer data between SQL Server and other databases and files.
- Well experienced with tools such as Query Analyzer and SQL utilities.
- Implementation of data transfer through DTS and setting guidelines to automate recovery scenarios.
- Creation and management of Database maintenance plan for various database consistency checks.
- Used SQL Profiler to monitor server performance and debug T-SQL and slow-running queries.
- Automation and execution of data transformation scripts.
- Developed Transact-SQL scripts to enforce business rules.
Environment: MS SQL Server 2000, Windows 2000, MS Access, T-SQL, DTS, VB 6.0.