- Extensive experience in Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and MapReduce concepts.
- Experience on NoSQL databases including HBase.
- Experience in using Pig, Hive, Sqoop and Cloudera Manager.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
- Experience extending Hive and Pig core functionality by writing custom UDFs.
- Experience in analyzing data using HiveQL, Pig Latin and Map Reduce.
- Working knowledge of other Cloudera Hadoop technologies (Impala, Sqoop, HDFS, Spark, Scala, etc.).
- Expertise in Apache Spark Development (Spark SQL, Spark Streaming, MLlib, GraphX, Zeppelin, HDFS, YARN, and NoSQL).
- Well versed in the installation, configuration, support and management of Big Data infrastructure and Hadoop clusters, including CDH3 and CDH4.
- Experience in managing and reviewing Hadoop log files.
- Hands-on experience with RDBMS and Linux shell scripting.
- Developed MapReduce jobs to automate transfer of data from HBase.
- Knowledge in job work-flow scheduling and monitoring tools like Oozie and Zookeeper.
- Proven ability in defining goals, coordinating teams and achieving results.
- Experience with procedures, functions, packages, views, materialized views, function-based indexes, triggers, dynamic SQL, and ad-hoc reporting using SQL.
- Hands-on experience with ETL processes.
- Knowledge of NoSQL databases such as HBase and Cassandra, and of administrative tasks such as installing Hadoop, commissioning and decommissioning nodes, and managing ecosystem components such as Flume, Oozie, Hive and Pig.
- Techno-functional responsibilities include interfacing with users, identifying functional and technical gaps, estimates, designing custom solutions, development, leading developers, producing documentation, and production support.
- Excellent interpersonal and communication skills, creative, research-minded, technically competent and result-oriented with problem solving and leadership skills.
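The Sqoop transfers mentioned above follow a standard pattern; a command-line sketch is below. The JDBC URL, credentials and table names are hypothetical placeholders, and the commands require a Hadoop cluster with Sqoop installed.

```shell
#!/bin/sh
# Sketch of the Sqoop import/export flow described above.
# All connection details and table names are illustrative only.

# RDBMS -> HDFS (data can then be loaded into Hive)
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4

# HDFS -> RDBMS (the reverse direction)
sqoop export \
  --connect jdbc:mysql://dbhost/sales \
  --username etl_user -P \
  --table orders_summary \
  --export-dir /data/out/orders_summary
```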
Database: MS SQL Server 2014/2012/2008 R2/2005 and MS Access
Database Migration: Migrated and upgraded SQL Server from 2000/2005 to 2008R2 and 2005/2008R2 to 2012
Operating System: Windows Server 2012R2/2008R2/2003 R2
Reporting Tool (SSRS): SSRS (SQL Server Reporting Services)
SSIS/DTS: Created and managed SSIS packages, upgraded DTS packages to SSIS
Data Modeling: Toad Data Modeler, ERWin
Hardware/Storage: SAN, RAID
Third Party Tools: Spotlight, LiteSpeed, WinDirStat
- Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDF, Pig, Sqoop, Zookeeper and Spark.
- Loaded and transformed large sets of structured, semi-structured and unstructured data using Hadoop/Big Data techniques.
- Created Hive tables with partitioning, dynamic partitions and buckets for sampling and efficient data access, and worked on them using HiveQL.
- Loaded data into the cluster from dynamically generated files using FLUME and from RDBMS using Sqoop.
- Imported and exported data into HDFS and Hive using Sqoop.
- Monitored and managed the Hadoop cluster using Apache Ambari
- Imported data into HDFS from various SQL databases and files using Sqoop and from streaming systems using Storm into Big Data Lake.
- Worked with NoSQL databases like HBase to create tables and store data.
- Collected and aggregated large amounts of log data using Apache Flume and staged it in HDFS for further analysis.
- Developed custom aggregate functions using Spark SQL and performed interactive querying.
- Wrote Pig scripts to store the data into HBase.
- Stored the data in tabular formats using Hive tables and Hive SerDe.
- Exported the analyzed data to Teradata using Sqoop for visualization and to generate reports for the BI team.
- Used Spark Streaming to collect data from Kafka in near real time, perform transformations and aggregations on the fly to build the common learner data model, and persist the results in a NoSQL store (HBase).
- Involved in HDFS maintenance and administration through the Hadoop Java API.
- Wrote Flume and Hive scripts to extract, transform and load data into the database.
- Used HBase as the data storage
- Participated in development/implementation of Cloudera Hadoop environment.
- Ran queries using Impala and used BI tools to run ad-hoc queries directly on Hadoop.
- Installed and configured Hive, wrote Hive UDFs, and used JUnit for unit testing MapReduce jobs.
- Worked with various data sources such as Teradata and Oracle; loaded files from Teradata into HDFS, and from HDFS into Hive and Impala.
- Developed and delivered quality services on-time and on-budget. Solutions developed by the team use Java, XML, HTTP, SOAP, Hadoop, Pig and other web technologies.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs that run independently based on time and data availability.
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
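A MapReduce parse-and-aggregate job of the kind described above can be simulated locally as a shell pipeline, the same mapper/sort/reducer shape Hadoop Streaming uses on a cluster. The input format (user ID and event type, tab-separated) is a hypothetical example.

```shell
#!/bin/sh
# Local simulation of a Hadoop Streaming job: mapper | shuffle | reducer.
# On a cluster this would run via:
#   hadoop jar hadoop-streaming.jar -mapper ... -reducer ...

printf 'u1\tclick\nu2\tview\nu1\tclick\nu1\tview\n' |
# mapper: emit the event type as key, 1 as value
awk -F'\t' '{ print $2 "\t1" }' |
# shuffle: Hadoop sorts by key between the map and reduce phases
sort |
# reducer: sum the counts per key
awk -F'\t' '{ c[$1] += $2 } END { for (k in c) print k "\t" c[k] }' |
sort
# prints: click 2 / view 2 (tab-separated)
```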
Environment: Java, Hadoop, Hive, Pig, Sqoop, Flume, HBase, Oracle 10g/11g/12c, Teradata, Cassandra, HDFS, Data Lake, Spark, MapReduce, Ambari, Cloudera, Tableau, Snappy, Zookeeper, NoSQL, Shell Scripting, Ubuntu, Solr.
- Designed Hive external tables using shared meta-store instead of derby with partitioning, dynamic partitioning and buckets.
- Responsible for building scalable distributed data pipelines using Hadoop.
- Used Apache Kafka for tracking data ingestion to Hadoop cluster.
- Wrote Pig scripts to debug Kafka hourly data and perform daily roll ups.
- Migrated data from existing Teradata systems to HDFS and built datasets on top of it.
- Built a shell-script framework to automate Hive registration, handling dynamic table creation and automatically adding new partitions to tables.
- Set up and benchmarked Hadoop/HBase clusters for internal use.
- Developed simple to complex MapReduce programs.
- Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Developed Oozie workflows that chain Hive/MapReduce modules for ingesting periodic/hourly input data.
- Wrote Pig & Hive scripts to analyze the data and detect user patterns.
- Implemented Device based business logic using Hive UDFs to perform ad-hoc queries on structured data.
- Stored and loaded data between HDFS and Amazon S3, and backed up namespace data to NFS filers.
- Prepared Avro schema files for generating Hive tables, and wrote shell scripts to execute Hadoop commands in a single run.
- Continuously monitored and managed the Hadoop cluster by using Cloudera Manager.
- Worked with administration team to install operating system, Hadoop updates, patches, version upgrades as required.
- Developed ETL pipelines to source data to Business intelligence teams to build visualizations.
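The partition-registration helper described above can be sketched as a small shell function that generates the Hive DDL for a given table and date. Table and path names are hypothetical; in production the output would be piped to `hive -e` or beeline rather than printed.

```shell
#!/bin/sh
# Sketch of the Hive-registration framework: emit the DDL that registers
# one day's partition for a table. Names and paths are illustrative only.

add_partition_ddl() {
    table="$1"
    dt="$2"
    echo "ALTER TABLE ${table} ADD IF NOT EXISTS PARTITION (dt='${dt}') LOCATION '/data/${table}/dt=${dt}';"
}

# Example: register the 2016-01-01 partition of a hypothetical table
add_partition_ddl clickstream 2016-01-01
```

`IF NOT EXISTS` makes the statement idempotent, so the script can safely be re-run by a scheduler without failing on partitions that were already added.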
Environment: Cloudera Manager, Map Reduce, HDFS, Pig, Hive, Sqoop, Apache Kafka, Oozie, Teradata, Avro, Java (JDK 1.6), Eclipse.
Senior SQL Server Database Analyst
- Install, configure and maintain SQL Server 2014/2012/2008 R2/2008 and 2005.
- Responsible for software installation, database creation, support and administration of SQL Server databases.
- Develop, test, and deploy database backup and recovery procedures for all Development, Test and Production environments.
- Monitor and optimize performance issues at the application and database level.
- Assist with developing SQL Server security procedures for user creation and privilege granting; work with the project team to establish procedures for creating application-specific users, profiles and privileges.
- Database encryption/cryptography - data encryption for PII (Personally Identifiable Information), financial data and other sensitive data.
- Perform SQL Server monitoring using Spotlight and tuning to help ensure database meets performance requirements as defined by the business application owner.
- Migrate and upgrade SQL Server from 2005/2008 R2 to 2012/2014.
- Participate in SQL Server Patching.
- Analyze daily reports on backup shares and CIFS shares, review "Needs review" items, and take necessary actions.
- Develop, test, and deploy non-production environment refresh or clone procedures. Negotiate reasonable frequencies with the developers.
- Estimate storage requirements for TEST, QA and Production databases (and other environments - training, stress as needed).
- Imported/exported MS ACCESS/Excel data into SQL Server using Import/Export Wizard.
- Provide 24x7 on-call support for Production database issues from a rotating pool of SQL Server DBAs.
- Attend capacity-planning meetings, provide recommendations to management, and review existing reports on disk storage for various servers and databases, as well as management reports covering database growth rates, contingency planning, and application tuning statistics.
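The backup procedures described above typically reduce to a scripted T-SQL call; a minimal sketch, run via sqlcmd, is below. The server, database and backup-share names are hypothetical placeholders, and a real deployment would schedule this through SQL Server Agent.

```shell
#!/bin/sh
# Sketch of a nightly full-backup job (requires a reachable SQL Server
# instance; names are illustrative only).

sqlcmd -S PRODSQL01 -E -Q "
BACKUP DATABASE [AppDB]
TO DISK = N'\\\\backupshare\\sql\\AppDB_full.bak'
WITH COMPRESSION, CHECKSUM, INIT;
"
```

`CHECKSUM` lets the backup be verified later with `RESTORE VERIFYONLY`, and `COMPRESSION` (available from SQL Server 2008 onward) shrinks the backup file on the share.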
Environment: Windows Server 2012 R2, 2008 R2, SQL Server 2014, 2012, 2008 R2, 2005, AD (Active Directory), SQL LiteSpeed, Spotlight, SQL Profiler
SQL Server DBA
- Installed, configured and maintained SQL Server 2012, 2008 R2 and 2005.
- Implemented SQL Server security and Object permissions by creating Groups, Logins and users by granting appropriate permission on Server level, Database level as well as Object level.
- Performed database backup, Automated Backup process, Restored Databases for UAT, Dev and Test purposes.
- Created, modified, and managed database objects (Stored Procedures, Functions, Triggers, TSQL and views).
- Data Modeling using Toad Data Modeler and SQL Server Data Diagram - utilized both Backward and Forward Engineering.
- Created SSIS Packages to Move data from Database to flat files (Text file, Excel file and CSV file).
- Responsible for index creation, index removal, index modification, file group modifications, and adding scheduled jobs to re-index and update statistics in databases.
- Scheduled and maintained new and existing jobs to automate different database related activities including backup, monitoring database health, disk space, backup verification, database statistics, rebuild or reorganize indexes.
- Performance tuning of Databases proactively, worked with developers for TSQL/SQL Performance tuning.
- Used DBCC to check physical and logical consistency of database.
- Created jobs, set up alerts, and monitored and troubleshot jobs.
Work Environment: Windows Server 2012 R2, 2008 R2, IIS 7.5, SQL Server 2012, 2008 R2, AD (Active Directory), and applications in J2EE, ASP, ASP.net
SQL Server DBA
Confidential, New York City, NY
Responsibilities:
- Installed, configured and maintained SQL Server 2005 and 2008 R2 Enterprise Edition
- Performed daily tasks including backup and restore using SQL Server Management Studio, and automated them using SQL Server Agent
- Installed and administered SQL Server Reporting Services (SSRS) and deployed reports
- Extensively used tools like SQL Profiler, Index Tuning Advisor and Windows Performance Monitor for monitoring and tuning MS SQL Server performance
- Enforced database security by managing user privileges, and ensured efficient resource management by assigning roles to users
- Migrated SQL Server 2000 to 2008 R2 using both in-place and side-by-side upgrade processes, and documented the upgrade process
- Used Performance Monitor/Profiler to resolve deadlocks and long-running queries
- Monitored Event Viewer, SQL error logs and Log File Viewer for software- and hardware-related errors
- Managed Alerts, Operators and Jobs through SQL Server Agent
- Configured and maintained Database Mirroring, Log shipping and Replication
Work Environment: Windows Server 2008 R2, 2003 R2, IIS 7.5, 6.5, Apache, Tomcat 5.5, SQL Server 2008 R2, 2005, 2000, ASP, ASP.net, PHP, AD (Active Directory)