- Overall 8+ years of progressive experience in the IT industry, with proven expertise in implementing software solutions, database design, development, and business intelligence on the Confidential SQL Server 2016/2014/2012/2008 suite, as well as recent versions of the Big Data ecosystem and related technologies.
- Well versed in data extraction (Azure Blob) and ingestion of raw data for cleaning, processing, and trend and sentiment analysis.
- Expertise in using cloud-based managed services for data warehousing/analytics in Confidential Azure (Azure Data Lake Analytics, Azure Data Lake Storage, Azure Data Factory, Azure Table Storage, U-SQL, Stream Analytics, HDInsight, etc.).
- 5 years of experience in batch analytics using the Hadoop environment, including MapReduce, HDFS, Hive, Pig, Spark, ZooKeeper, and Sqoop.
- In-depth understanding of Hadoop architecture and its components, such as Resource Manager, Node Manager, Application Master, NameNode, and DataNode.
- Experience in importing and exporting data using Sqoop from Relational Database Systems to HDFS and vice-versa.
- Hands on experience with SQL Server 2012 Always On High Availability Groups.
- Extensive experience in developing stored procedures, views, and complex queries on SQL Server 2014, 2012, and 2008.
- Highly proficient in the use of SQL Server and T-SQL in constructing tables, user functions, views, indexes, user profiles, relational database models and data integrity, SQL joins and query writing.
- Job workflow scheduling and monitoring using tools like Azure Data Factory.
- Good experience in Cloudera, Hortonworks & Apache Hadoop distributions.
- Worked with relational database systems (RDBMS) such as MySQL, MSSQL, Oracle.
- Assisted with performance tuning and monitoring.
- Used Shell Scripting to move log files into HDFS.
- Hands-on experience creating RDDs and DataFrames for the required input data and performing data transformations using Spark.
- Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, writing data back into RDBMS through Sqoop.
- Good understanding of processing real-time data using Spark.
- Imported data from different sources such as HDFS into Spark RDDs.
- Experience in writing MapReduce jobs in Python for complicated queries.
- Experienced with different file formats such as Parquet, CSV, Text, Sequence, and JSON.
- Good knowledge of data modelling and data mining to model data as per business requirements.
- Involved in unit testing of MapReduce programs using Apache MRUnit.
- Good knowledge of Python and Bash scripting.
- Expert in Data Visualization development using Tableau to create complex and innovative dashboards.
- Generated ETL reports using Tableau and created statistics dashboards for Analytics.
- Working knowledge of Agile and waterfall development models.
- Worked with version control, bug tracking, and code review systems such as Git and Jira.
Big data/Hadoop Ecosystem: HDFS, MapReduce, Hive, Sqoop, Spark, HCatalog, ADF, HDInsight, Jupyter, Ambari.
Programming Languages: Scala, SQL, PL/SQL, Linux shell scripts, PowerShell scripts
Databases: Oracle 11g/10g, DB2, MS SQL Server, MySQL, Teradata.
Operating System: Ubuntu (Linux), Win 95/98/2000/XP, Mac OS, Red Hat
Methodologies: Agile/Scrum, Rational Unified Process and Waterfall
Distributed platforms: Hortonworks, Cloudera, MapR
Monitoring tools: Ambari, Ganglia, Nagios.
Confidential, Redmond, WA
Big data consultant
- Understand and articulate business requirements and translate them into technical solutions
- Analyze and profile current data to ensure no requirements are missed
- Work closely with Data Science team to provide datasets necessary for experiments and creating machine learning models
- Create data integration and technical solutions for Azure Data Lake Analytics, Azure Data Lake Storage, Azure Data Factory, Azure SQL databases and Azure SQL Data Warehouse for providing analytics and reports for improving marketing strategies.
- Create linked services to connect to Azure Storage, on-premises SQL Server and Azure HDInsight
- Configured Azure SQL Database with Azure Storage Explorer and with SQL Server
- Creating ADL, Azure Storage, ADB and resource groups in the Azure portal
- Preparing and uploading the source data via Azure Storage Explorer.
- Leverage Azure Storage accounts via PowerShell or Azure Storage Explorer for storing different types of data in Blob, Table, Queue, and File storage.
- Create ADF accounts for the resource groups
- Create Input and Output data sets to manage the data type, data structures, availability and policy of input and output files
- Created U-SQL scripts for transform activities and developed complex queries to transform data from multiple sources and write the output to Azure SQL Data Warehouse.
- Customizing extractors, reducers, processors to extract, reduce, and process data files such as txt, csv, parquet, JSON and structured data.
- Monitoring the Pipeline and Data Slices for ADF life cycle
- Manage ADF Custom activities.
- Work with on-premises and Azure SQL databases and Azure SQL Data Warehouse.
- Usage of Key Vault for storing and retrieving security related information.
- Worked on flexible metadata driven architecture.
- Created Power BI reports on data in ADLS, the Tabular Model, pipeline status, and SQL Server.
- Perform technical walk-through as needed to communicate design/coded solution and to seek input from team lead and members
- Experience working in an agile team environment
- Understanding of code promotion, code turnover to operations, version control, and QA procedures
- Conduct code walkthroughs, peer reviews, and produce technical documentation
- Work with quality assurance team to build test plans to validate that new systems meet business requirements
- Created multiple Power BI reports capable of generating deep analytical insights.
- Design and develop or improve data warehouses, ETL packages, multi-dimensional OLAP cubes, data mining models, performance dashboards, and reports.
- Created Queries by importing data from different sources
- Created relations between tables.
- Published the reports to the web.
- Created snapshots on the report server so that reports would not be affected by the constantly changing database environment.
- Validated the reports in the dashboard and shared them with team members.
- Exposure to COSMOS.
Environment: Hadoop, HDFS, Spark, Scala, Hive, Git, Hortonworks, Azure HDInsight, Azure Storage (Blobs, Tables, Queues), Azure Data Factory, Azure Data Warehouse, Azure portal, Power BI, Visual Studio, SSMS, SQL Server 2016.
Confidential, San Jose, CA
Sr. Hadoop /Spark Developer
- Worked on Hadoop technologies such as Hive, Sqoop, and Spark SQL, and on Big Data testing.
- Developed automated scripts to ingest around 200 TB of data from Teradata in bi-weekly refreshes.
- Developed Hive scripts for end user / analyst requirements for adhoc analysis.
- Used partitioning and bucketing in Hive and designed both managed and external tables for optimized performance
- Solved performance issues in Hive and Pig scripts by understanding how joins, grouping, and aggregation translate to MapReduce jobs.
- Extensively used Apache Sqoop for efficiently transferring bulk data between Apache Hadoop and relational databases (Oracle) for product level forecast.
- Tuned Hive to improve performance. Developed UDFs in Java as needed for use in Hive queries.
- Extracted the data from Teradata into HDFS using Sqoop.
- Created Sqoop job with incremental load to populate Hive External tables.
- Developed TWS workflow for scheduling and orchestrating the ETL process.
- Created Tableau Dashboards with interactive views, trends and drill downs along with user level security
- Used Impala to read, write, and query Hadoop data in HDFS.
- Functional, non-functional and performance testing of key systems prior to cutover to AWS
- Developed Spark programs for the application to achieve faster data processing than standard MapReduce programs.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Configured Hadoop system files to accommodate new sources of data and updated the existing Hadoop cluster configuration
- Involved in gathering business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Worked on importing and exporting data from different databases like Oracle, Teradata into HDFS and Hive using Sqoop.
- Actively participated in code reviews and meetings and resolved technical issues.
Environment: Hadoop, MapReduce, HDFS, Spark, Scala, Hive, SBT, Pig, UNIX, Python, Git, Hortonworks, Oozie.
Sr. Hadoop Developer
- Imported data from different relational data sources like RDBMS, Teradata to HDFS using Sqoop.
- Imported bulk data into HBase Using Map Reduce programs.
- Designed and implemented Incremental Imports into Hive tables.
- Used the REST API to access HBase data to perform analytics.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Experienced in working with various kinds of data sources such as Teradata and Oracle. Successfully loaded files from Teradata to HDFS, and from HDFS into Hive and Impala.
- Experienced in running queries using Impala and used BI tools to run ad-hoc queries directly on Hadoop.
- Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
- Developed wrappers using shell scripting for Hive, Pig, Sqoop, and Scala jobs.
- Developed Unix shell scripts to automate Spark SQL.
- Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala.
- Worked on loading and transforming large sets of structured, semi-structured, and unstructured data.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Developed Java RESTful web services to upload data from local storage to Amazon S3, list S3 objects, and perform file manipulation operations.
- Configured a 20-30 node (Amazon EC2 spot instance) Hadoop cluster to transfer the data from Amazon S3 to HDFS and HDFS to Amazon S3 and also to direct input and output to the Hadoop MapReduce framework.
- Experienced in managing and reviewing the Hadoop log files.
- Successfully ran all Hadoop MapReduce programs on Amazon Elastic MapReduce framework by using Amazon S3 for input and output.
- Involved in Data Asset Inventory to gather, analyze, and document business requirements, functional requirements, and data specifications for Member Retention from SQL and Hadoop sources.
- Resolved performance issues and limited queries from workbooks connected to live databases by using the data extract option in Tableau.
- Designed and developed Dashboards for Analytical purposes using Tableau.
- Designed and implemented facts, dimensions, measure groups, measures and OLAP cubes using dimensional data modeling standards in SQL Server 2008 that maintained data
- Creating and Designing OLAP using SAS OLAP Cube Studio.
- Designing Source, Job, Target using SAS OLAP Cube Studio and SAS/DIS.
- Analyzing OLAP Using SAS OLAP Viewer and SAS Dataset using SAS/EG.
- Migrated ETL jobs to Pig scripts to perform transformations, joins, and some pre-aggregations before storing the data in HDFS.
- Worked with Avro Data Serialization system to work with JSON data formats.
- Worked on different file formats like Sequence files, XML files and Map files using Map Reduce Programs.
- Involved in unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
- Exported data from HDFS environment into RDBMS using Sqoop for report generation and visualization purpose.
- Worked on Oozie workflow engine for job scheduling.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Pig scripts.
Environment: Hadoop, HDFS, Pig 0.10, Hive, AWS, MapReduce, Sqoop, Java Eclipse, SQL Server, Shell Scripting.
Confidential, SFO, CA
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Imported and exported data into and out of HDFS and Hive using Sqoop.
- Designed and developed Big Data analytics platform for processing customer viewing preferences and social media comments using Java, Hadoop, Hive and Pig.
- Integrated Hadoop into traditional ETL, accelerating the extraction, transformation, and loading of massive structured and unstructured data.
- Analyzed the Hadoop cluster and various Big Data components, including Pig, Hive, Spark, databases, and Sqoop.
- Experienced in defining job flows.
- Developed and executed custom MapReduce programs, Pig Latin scripts, and HQL queries.
- Used Hadoop FS scripts for HDFS (Hadoop Distributed File System) data loading and manipulation.
- Performed Hive test queries on local sample files and HDFS files.
- Developed and optimized Pig and Hive UDFs (User-Defined Functions) to implement the functionality of external languages as and when required.
- Extensively used Pig for data cleaning and optimization.
- Developed Hive queries to analyze data and generate results.
- Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
- Analyzed business requirements and cross-verified them against the functionality and features of NoSQL databases such as HBase and Cassandra to determine the optimal database.
- Load and transform large sets of structured, semi structured and unstructured data.
- Installed and configured Apache Hadoop, Hive and Pig environment on the prototype server. Configured SQL database to store Hive metadata.
- Loaded unstructured data into the Hadoop Distributed File System (HDFS).
- Created ETL jobs to load Twitter JSON data and server data into MongoDB and transferred the MongoDB data into the data warehouse.
- Responsible for managing data coming from different sources. Responsible for implementing MongoDB to store and analyze unstructured data.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Implemented a CDH3 Hadoop cluster on CentOS.
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Java (JDK 1.6), SQL, Cloudera Manager, Sqoop, Flume, Cassandra, Oozie, Eclipse
- Participated in requirement gathering and converting the requirements into technical specifications.
- Created UML diagrams like use cases, class diagrams, interaction diagrams, and activity diagrams.
- Created business logic using Servlets and POJOs and deployed them on WebLogic Server.
- Wrote complex SQL queries and stored procedures.
- Developed the XML Schema for the data maintenance and structures.
- Implemented the web service client for login authentication, credit reports, and applicant information using Apache Axis2.
- Developed and implemented custom data validation stored procedures for metadata summarization for the data warehouse tables, for aggregating telephone subscribers switching data, for identifying winning and losing carriers, and for identifying value subscribers.
- Identified an issue and developed a procedure to correct it, improving the quality of critical tables by eliminating the possibility of entering duplicate data into the data warehouse.
- Designed and implemented SQL based tools, stored procedures and functions for daily data volume and aggregation status
- Responsible for managing data coming from different sources.
- Developed MapReduce algorithms.
- Gained experience with NoSQL databases.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hive and wrote Hive UDFs.