Big Data Consultant Resume
Redmond, WA
PROFESSIONAL SUMMARY
- 15+ years of IT experience in roles including Big Data Consultant, Software Developer, and Database Administrator on mission-critical projects.
- 4+ years of relevant Big Data experience, providing ongoing support for all functional and non-functional aspects of the Big Data competency on Big Data analytics projects for a Tier-1 telco, such as Charging Business Analytics and an Operational Data Store/Data Hub based on the ALDM (telco-specific data model); currently at Confidential working on Microsoft internal/external data analytics projects.
- Working on Big Data design and development activities: Spark/Hive code development and deployment, using NoSQL stores, across multiple non-production (NFT/UAT) and production clusters.
- Solid experience in data modeling and ETL processing.
- Expertise in Hadoop ecosystem tools such as Hive, Pig, and Sqoop.
- Hands-on experience with NoSQL databases such as HBase/Phoenix.
- Working experience in data mapping, data ingestion, and transforming and storing large, real-time data sets in Hadoop clusters.
- Hands-on experience working with large volumes of structured and semi-structured data.
- Handled HDP, Cloudera, and Azure HDInsight distributions on clusters ranging from 4 to 20 nodes in development/testing (non-production) environments.
- Working experience with a large 50-node HDP Hadoop cluster processing 800 TB of data in a production environment.
- Experience with multi-node big data environments on Hortonworks and Cloudera distributions and Microsoft Azure HDInsight, covering end-to-end functional and non-functional aspects of projects.
- Hands-on expertise in developing applications on Apache Spark, equally proficient in Python and Scala, including processing of streaming data; worked with the Kafka messaging system.
- Used Livy Server for scheduling Spark jobs and, in one use case, for sharing Spark RDDs across sessions.
- Very good understanding of machine learning algorithms for both supervised and unsupervised learning; research and experiment with the latest analytics and data technologies.
- Design and develop automated, complex, and efficient ETL processes to match multiple large-scale datasets against one another.
- Hands-on experience with Kafka: build data pipelines by writing custom Python Kafka producers and consumers and configuring Kafka Connect, with near-real-time PySpark streaming processing.
- Currently at Confidential, converting the complex machine learning pipelines of several data analytics projects from pure-Python Pandas DataFrames (built for Azure Machine Learning Studio) to Apache PySpark, including performance tuning of the Spark jobs (sketched below).
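A minimal sketch of this kind of Pandas-to-PySpark conversion; the input path, column names, and group-by aggregation below are illustrative assumptions rather than project specifics:

    from pyspark.sql import SparkSession, functions as F

    # Original Pandas step (pure Python, single machine), shown as a comment:
    #   daily = df.groupby(["device_id", "event_date"], as_index=False)["reading"].mean()

    spark = SparkSession.builder.appName("pandas-to-pyspark").getOrCreate()

    # Equivalent PySpark step, executed in a distributed fashion on the cluster
    sdf = spark.read.parquet("/data/curated/readings")  # hypothetical input path
    daily = (sdf.groupBy("device_id", "event_date")
                .agg(F.avg("reading").alias("avg_reading")))

    # Partitioning the output by date keeps files manageable and enables partition pruning downstream
    daily.write.mode("overwrite").partitionBy("event_date").parquet("/data/output/daily_avg")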
TECHNICAL SKILLS
Big Data Skills/Tools: MapReduce, HDFS, Hadoop ecosystem (Sqoop, Hive, Pig, ZooKeeper, Apache Spark, Scala, HBase, YARN, Kerberos, Kafka); HDP, Cloudera, and Microsoft Azure HDInsight distributions; Livy Server.
Hardware & Operating Systems: Windows, UNIX, Solaris, HP-UX, Red Hat 6.x
Database Skills/Tools: Oracle, MySQL, PostgreSQL, SQL Server, in-memory DB (TimesTen), GoldenGate replication.
Programming Languages & Tools: C#, PL/SQL, Developer 2000/Designer 2000, Unix shells, Python, Scala, Java, GitHub.
Engagement Experience: Architecture and Design, Database Migration, Database maintenance, Logical and Physical Data Modeling, Project Planning, Agile (SCRUM) Methodologies, Technical Architect, MS Project.
PROFESSIONAL EXPERIENCE
Big Data Consultant
Confidential, Redmond, WA
Responsibilities:
- Project Description: Currently working at Confidential as a Big Data consultant for Microsoft in the Data and Decision Sciences Group, a software engineering team that handles Big Data analytics projects for Microsoft's internal and external customers. Microsoft internal projects include Real Estate & Facilities Management (peak/average attendance, optimum utilization of Microsoft offices).
- Data is collected across all continents from various sources such as Lenel (badge data), NIS, and SCCM (device data); support the Data Science group in building the machine learning models and all development activities of the project, including creating the data pipelines for ETL in Azure Data Factory and performing data verification.
- Another important internal project is Information Security & Risk Management: retrieving asset intelligence data for Microsoft security services applications, analyzing Windows Security event logs from around 99999 servers across continents, and converting various machine learning models involving complex logic into working analytics outputs on this data in an HDInsight environment on the Microsoft Azure cloud.
- End-to-end design, ETL creation, and implementation for the performance-improvement effort on the Cyber Security Big Data analytics project, migrating it from SQL Server to a Hive/HBase/Phoenix/C# integration environment; improved the frontend customer experience from multiple seconds to milliseconds. Developing applications for Apache Spark in PySpark and Scala.
- Hands-on experience with Kafka: build data pipelines by writing custom Python Kafka producers and consumers and configuring Kafka Connect, with near-real-time PySpark streaming processing (see the sketch after this list).
- Currently at Confidential, converting the complex machine learning pipelines of several data analytics projects from pure-Python Pandas DataFrames (built for Azure Machine Learning Studio) to Apache PySpark, including performance tuning of the Spark jobs.
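A minimal sketch of such a pipeline, assuming the kafka-python client and an illustrative broker address, topic name, and event schema (all hypothetical):

    import json

    from kafka import KafkaProducer  # kafka-python client
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StringType, StructField, StructType

    # Producer side: publish JSON events to a Kafka topic
    producer = KafkaProducer(
        bootstrap_servers="broker:9092",  # placeholder broker address
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    producer.send("device-events", {"device_id": "d-001", "reading": "42"})
    producer.flush()

    # Consumer side: near-real-time processing with PySpark Structured Streaming
    # (assumes the spark-sql-kafka connector is available on the cluster)
    spark = SparkSession.builder.appName("kafka-stream").getOrCreate()
    schema = StructType([
        StructField("device_id", StringType()),
        StructField("reading", StringType()),
    ])
    events = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "device-events")
              .load()
              .select(from_json(col("value").cast("string"), schema).alias("e"))
              .select("e.*"))
    query = (events.writeStream.format("parquet")
             .option("path", "/data/streams/device_events")
             .option("checkpointLocation", "/checkpoints/device_events")
             .start())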
Environment: Microsoft Azure HDInsight (C#, HDFS, Hive, Pig, Sqoop, HBase, Apache Spark, Phoenix, YARN, Power BI, Cosmos - Microsoft's proprietary multi-PB big data store, Scope scripting).
Confidential
Hadoop Developer
Environment: Hortonworks (HDP), HDFS, Hive, Sqoop, HBase, Tableau, MySQL/Oracle, shell scripting, Apache Spark, Python.
Responsibilities:
- Used Hive schemas to define the data.
- Designed and created Hive external tables with partitioning, using a shared metastore instead of the embedded Derby metastore (see the sketch after this list).
- Integrated, maintained, and optimized data feeds from multiple sources to enable advanced analytics capabilities; developed Python scripts for various routine maintenance activities.
- Migrated the project's data from different sources such as Oracle and MySQL into Hive using Sqoop data extracts.
- Responsible for technical design and review of data dictionary.
- Designed and developed Complex Data Pipelines from various sources of unstructured/structured data.
- Understanding the flow of the data; use case analysis.
- Translated complex non-functional and technical requirements into detailed designs; created the HLAD and installation documents.
- ETL/ELT development for complex data pipelines and ML models in PySpark.
- Wrote numerous Hive queries.
- Design file schemas with efficient use of partitioning, compression, and optimal file formats.
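A minimal sketch of the external-table and file-schema design described in this list, issued through Spark SQL against the shared Hive metastore; the database, table, column names, and paths are illustrative only:

    from pyspark.sql import SparkSession

    # enableHiveSupport() points Spark at the shared Hive metastore
    # rather than a local embedded Derby metastore.
    spark = (SparkSession.builder
             .appName("hive-external-table")
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("CREATE DATABASE IF NOT EXISTS analytics")

    # External, partitioned table stored as compressed ORC;
    # dropping the table leaves the underlying HDFS data in place.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS analytics.badge_events (
            badge_id   STRING,
            building   STRING,
            event_time TIMESTAMP
        )
        PARTITIONED BY (event_date STRING)
        STORED AS ORC
        LOCATION '/data/warehouse/badge_events'
        TBLPROPERTIES ('orc.compress' = 'SNAPPY')
    """)

    # Register a newly landed partition so queries can see it
    spark.sql("""
        ALTER TABLE analytics.badge_events
        ADD IF NOT EXISTS PARTITION (event_date = '2018-01-01')
    """)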
Confidential
Hadoop Administrator/Developer
Environment: Cloudera, Hive, Sqoop, Oracle, shell scripting, HBase, Python.
Responsibilities:
- Designed and created Hive external tables with partitioning, using a shared metastore instead of the embedded Derby metastore.
- Integrated, maintained, and optimized data feeds from multiple sources to enable advanced analytics capabilities; developed Python scripts for various routine maintenance activities.
- HDFS capacity of 40 TB on clusters of 5 to 10 nodes, each node with 24 cores and 128 GB RAM.
- Responsible for technical design and review of data dictionary.
- Understanding the flow of the data; use case analysis.
- Cluster installation and maintenance, with routine support for the Big Data competency.
- Migrated data from an Oracle data warehouse into HDFS using Sqoop and prepared the migration strategy for large data volumes (see the sketch after this list).
- Extensive knowledge of Sqoop ETL jobs from different RDBMS sources; automated Sqoop jobs to refresh data from the various sources.
- Troubleshoot cluster and query issues, evaluate query plans, and optimize schemas and queries.
- Design file schemas with efficient use of partitioning, compression, and optimal file formats.
- Significantly reduced the turnaround time for delivering small multi-node HDP (Hortonworks Data Platform) clusters to development and testing teams for unit/system testing of BDA projects, by automating clone-based cluster builds so that a working cluster, with infrastructure and BDA code sanity checks, was available to the teams within a day; previously it took three days to deliver a single BDA environment. The same strategy was applied to the Cloudera-based clusters.
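A minimal sketch of one such automated Sqoop import, driven from Python; the JDBC URL, credentials file, source table, and HDFS paths below are placeholders:

    import subprocess

    # Sqoop import from an Oracle source table into HDFS.
    # --num-mappers and --split-by control how the copy is parallelized.
    sqoop_cmd = [
        "sqoop", "import",
        "--connect", "jdbc:oracle:thin:@//oradw-host:1521/DWPRD",  # placeholder JDBC URL
        "--username", "etl_user",
        "--password-file", "/user/etl/.oracle_pwd",  # password kept in HDFS, not on the command line
        "--table", "DW.SUBSCRIBER_USAGE",            # placeholder source table
        "--target-dir", "/data/raw/subscriber_usage",
        "--num-mappers", "8",
        "--split-by", "USAGE_ID",
    ]
    subprocess.run(sqoop_cmd, check=True)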
Confidential
DBA Lead
Responsibilities:
- Activities for the Confidential &T site involved performing all major database upgrades during off-peak business hours; developing Unix shell scripts for various DB utilities for the CRM, AMSS, Telegence, and Enabler products and the Dobson conversion project; production support; SQL tuning; database monitoring using Precise and scripts; script reviews and upgrade activities for production; and Oracle consultation.
- Led a large DBA team providing support for hundreds of production, UAT, lab, and development databases, with instances ranging from 25 GB to 40+ TB.
- Maintained and monitored around one thousand Oracle databases totaling up to 850 TB in the OLTP environment.
Environment: Unix, Oracle, TimesTen, WebLogic.
Confidential
Dy. Manager
Responsibilities:
- Team lead in the data center for the Core Banking Project (full branch computerization); worked on the Core Banking Project (enterprise-wide solution, B@ncs24 from FNS), the centralized web-based solution connecting all branches of State Bank of India, built on a .NET Framework frontend with Oracle 9i as the centralized database on Unix (HP-UX 11i, Superdome).
- Involved in data mapping and data migration projects; managed around 100 test, development, and production databases in Unix and Windows environments, with sizes from multiple MB to multiple TB; performed routine DBA activities covering all aspects of core DBA work.
- Developed projects in Developer 2000 (Forms/Reports) and PL/SQL packages, procedures, and functions: Mutual Welfare Scheme, Branch Performance Report/Budget, Personal Data System, Standard Data Systems, and Management Information System for Loans & Advances.
- Delegated coding and testing work to team members; involved in data modeling, design and creation of the database and schema objects, writing packages, stored procedures and functions, backup/restore, security aspects, and data migration (SQL*Loader) for the projects.
- Involved in the full application and database life cycle of the project: system study, analysis, and design.
- Worked on various back-office banking projects using COBOL on Unix.
Environment: Unix, Cobol, Oracle, Dev 2000, Designer 2000.