Big Data Intern Resume
Dallas, TX
SUMMARY:
- Around 4 years of experience in data warehousing, database design, and modeling, with a strong emphasis on Big Data
- Expertise in building traditional data warehouses using the Ralph Kimball model, including facts and dimensions based on the star schema
- Experience in building data warehouse constructs such as conformed dimensions, role-playing dimensions, time dimensions, factless fact tables, aggregate fact tables, and bridge tables
- Expertise in implementing ETL for batch processes, micro-batch ETL for near real-time systems, and streaming ETL for Enterprise Application Integration to process real-time data into the data warehouse
- Good knowledge of handling medical data and building systems that comply with HIPAA requirements
- Worked on integrating different sources of data through one common data pipeline and transforming the data to conform to the data warehouse standard of a single version of the truth
- Good exposure to building ETL on Big Data frameworks using Hive, Pig, and Sqoop to handle terabytes of data
- Experience in building data models and databases for unstructured and log data
- Good understanding of log processing methodologies using Apache Flume and Hive
- Strong experience with RDBMS technologies such as SQL stored procedures, functions, triggers, and database security
- Expertise in tuning and optimizing SQL queries by reducing logical reads and building smart indexes using newer features such as clustered columnstore indexes (see the sketch after this list)
- Experience in tuning stored procedures by removing costly I/O operations such as table-valued functions, recursive cursors, and implicit conversions
- Expertise in building data warehouses on Apache Hive, including tuning Hive queries for better report performance
- Worked on migrating legacy reports from SSRS to Tableau and QlikView and on building dashboards and user stories
- Experience with Atlassian Jira and with GitHub, SVN, and other version control systems
- Good knowledge of SAP BusinessObjects: building universes, resolving chasm and fan traps, and building production-quality reports
- Knowledge of building predictive models using R and Python
- Expertise in translating functional requirements and problems into robust and scalable solutions
- In-depth understanding of the SDLC (requirement analysis, design, development, testing, and maintenance)
- Strong experience with Agile project management methodologies, including implementation of Scrum techniques
- Ability to make critical decisions and handle clients from diverse demographics
- Adept at mapping client requirements, custom-designing solutions, and troubleshooting complex information systems
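For illustration, a minimal T-SQL sketch of the clustered columnstore approach mentioned above; the table, column, and index names are hypothetical, not from any production schema:

    -- Hypothetical fact table; columnstore storage compresses the data and
    -- batch-processes scans, cutting logical reads for aggregate queries.
    CREATE TABLE dbo.FactSales (
        DateKey     INT            NOT NULL,
        ProductKey  INT            NOT NULL,
        StoreKey    INT            NOT NULL,
        SalesAmount DECIMAL(18, 2) NOT NULL,
        Quantity    INT            NOT NULL
    );

    CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales ON dbo.FactSales;

    -- Compare logical reads before and after the index with STATISTICS IO.
    SET STATISTICS IO ON;
    SELECT DateKey, SUM(SalesAmount) AS TotalSales
    FROM dbo.FactSales
    GROUP BY DateKey;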
TECHNICAL SKILLS:
Programming Languages: SQL, T-SQL, PL/SQL, R, Shell Scripting, Java, HiveQL, Pig Latin, MDX
Databases: SQL Server 2012-2016, Oracle, MongoDB
Big Data Technologies: Hadoop (MapReduce, YARN), Hive, Pig, Sqoop, Oozie, Impala, Kafka, ZooKeeper
Big Data Distributions: Hortonworks Sandbox, Cloudera
Design Tools: CA Erwin, Microsoft Visio
Reporting Tools: Tableau, QlikView, SSRS, SAP Business Objects
ETL Frameworks: MS SSIS, Appworx
Operating Systems: Unix, Linux
Other Tools: Microsoft Project, MS Office
PROFESSIONAL EXPERIENCE:
Confidential, Dallas, TX
Big Data Intern
Environment: Cloudera Hadoop Distribution - Linux, Hive, Pig, MS SQL Server, Flume, Sqoop
Responsibilities:
- Designed and developed a data warehouse using Hive, Pig, Sqoop, and Flume to support massive datasets of 20 terabytes of data for each hospital facility (see the sketch after this list)
- Developed a reporting database on Hive that handles 2,000+ user queries per second
- Collaborated with doctors and healthcare experts to gather requirements and built a system to support real-time patient analysis and decision making
- Designed and developed data pipelines to gather data from smart sensors, transform it using Pig scripts, and load it into the data warehouse on the Hadoop framework to support decision making and health monitoring
- Developed micro-batch ETL to facilitate speedy reviews and first-mover advantage, using Oozie to automate data loading into HDFS and Pig to pre-process the data
- Migrated the existing data warehouse to Hadoop, enabling insights that were not possible on traditional RDBMS systems and improving decision making, which increased profits by 16%
- Worked on building document analysis systems that enabled storing doctors' notes and telephone conversations with patients
- Core member of the competitive advantage team, building algorithms and systems that surface critical patterns and flags from data, helping the business outsmart its competition
- Developed and managed micro-batch ETL that supported the near real-time reporting systems
- Troubleshot issues in existing ETL pipelines and ensured their smooth functioning
- Developed technical documents, provided training and mentored junior interns
- Created numerous custom reports for pathologists using complex Hive queries and Tableau integration
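A minimal HiveQL sketch of the kind of partitioned warehouse layout used for the sensor data described above; the table, column, and partition names are hypothetical, not the actual hospital schema:

    -- Hypothetical external table over pre-processed sensor files landed in HDFS.
    CREATE EXTERNAL TABLE IF NOT EXISTS sensor_readings (
        patient_id  STRING,
        sensor_type STRING,
        reading     DOUBLE,
        reading_ts  TIMESTAMP
    )
    PARTITIONED BY (facility STRING, reading_date STRING)
    STORED AS ORC
    LOCATION '/data/warehouse/sensor_readings';

    -- Each micro-batch registers a new partition after Pig pre-processing writes the files.
    ALTER TABLE sensor_readings ADD IF NOT EXISTS
        PARTITION (facility = 'example_facility', reading_date = '2016-01-01');

    -- Partition pruning keeps reporting queries fast at multi-terabyte scale.
    SELECT sensor_type, AVG(reading) AS avg_reading
    FROM sensor_readings
    WHERE facility = 'example_facility' AND reading_date = '2016-01-01'
    GROUP BY sensor_type;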
Confidential, Dallas, TX
Student Worker
Environment: Tableau, SSRS, QlikView
Responsibilities:
- Created the “Wow” experience for clients and customers through seamless and trustworthy service
- Collaborated with managers to optimize inventory movement, reducing stock-outs and delayed deliveries
- Involved in planning and allocating resources across clients, ensuring an optimal workload on each resource
- Assisted the store manager with fund management and financial planning, improving profits by 8%
- Developed a database to store customer reviews and performed sentiment analysis on them
- Trained senior management on how to use Tableau and build ad-hoc reports and dashboards
- Designed, developed, implemented, and supported QlikView dashboards. Integrated data sources and databases with QlikView, and designed and developed data models and backend queries for presenting data
- Developed and implemented measurements for various marketing campaigns
Confidential
Programmer
Environment: Yardi Voyager, Hortonworks Hadoop Distribution - Linux, Hive, Pig, MS SQL Server, Flume, Sqoop
Responsibilities:
- Involved in requirement gathering, design, coding and testing phases of product development
- Worked in an Agile methodology and conducted daily Scrum stand-up meetings to get updates from the team
- Collaborated with Scrum masters to develop backlog and sprint plans, streamline ticket creation, and run sprint retrospective meetings, which resulted in delivering products to clients before the deadline
- Worked on backlog planning and execution; assisted the project manager in developing plans, assigning resources, developing project cost estimates, managing budgets, assigning workloads, and leveling resources using Microsoft Project
- Developed Gantt charts, Fishbone Diagram, Work Breakdown Structure (WBS) to facilitate smooth completion of project deliverables
- Analyzed underlying data for potential discrepancies, investigated errors, and performed data scrubbing
- Performed DevOps on ETL to ensure accuracy and integrity, troubleshot ETL issues, and supported ETL enhancement and maintenance, including analyzing query execution plans and optimizing queries for faster performance
- Designed and developed a data warehouse to support log aggregation, real-time log analysis, and processing for analytics (a simplified sketch follows this list)
- Developed all aspects of the data warehouse for the real estate application process to support business intelligence and reporting
- Trained stakeholders and higher management to effectively use the tools built on the existing data warehouse infrastructure
- Ensured coding best practices were applied and data integrity was maintained through code compliance, security, and change management
- Developed Pig scripts to automate the process of cleaning and transforming unstructured raw data for analysis
- Performance-tuned Hive queries and the data warehouse, which reduced report refresh time by 32%
- Designed and developed a scalable data warehouse on Hive to support processing of 1 TB of data
- Integrated and migrated traditional RDBMS systems and NoSQL databases into the Hadoop ecosystem, which lowered operating costs by 12% and facilitated the analysis of unstructured data
- Developed strategies for managing risks for small and medium businesses in the real estate market, which reduced operating costs by 18%
- Designed and developed ETL to pull data from various systems, integrate and load the data for analytics and business intelligence
- Translated business requirements into technical design specifications using Visio and Erwin to build the Data Warehouse
- Designed the micro-batch ETL process to extract data from the CRM, aggregate it into 23 pre-defined segments, and load it into specific data marts on a near real-time basis
- Troubleshot and maintained the existing ETL packages built on SSIS, developed bug fixes for the existing ETL, and tuned queries
- Collaborated with stakeholders and clients to finalize project milestones and deliverables
- Created interactive reports using visualizations available in Tableau and QlikView
- Extracted, analyzed & created reports which helped prioritize business decisions for top management
- Created 3 executive dashboards depicting sales and inventory data during the stint, dealing with data volumes ranging from 20 to 50 million records
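A minimal HiveQL sketch of the kind of partitioned log table and aggregation behind the log analysis work above; the table, columns, and values are hypothetical, not the production schema:

    -- Hypothetical partitioned log table; ORC storage plus date partitions keep scans small.
    CREATE TABLE IF NOT EXISTS app_logs (
        log_ts    TIMESTAMP,
        log_level STRING,
        component STRING,
        message   STRING
    )
    PARTITIONED BY (log_date STRING)
    STORED AS ORC;

    -- Daily error counts per component; restricting to one partition avoids a full-table
    -- scan, the same pruning idea used when tuning the report queries.
    SELECT component, COUNT(*) AS error_count
    FROM app_logs
    WHERE log_date = '2016-03-15' AND log_level = 'ERROR'
    GROUP BY component
    ORDER BY error_count DESC;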
Confidential
Software Engineer
Responsibilities:
- Created and maintained logical & physical Data Models for the production and Business Intelligence Data Warehouse
- Created entity relationship diagrams and multidimensional data models, reports and diagrams for marketing
- Slashed ETL load time by 40% through customizations and by tuning and optimizing SQL queries in the retail data warehouse nightly chain
- Provided on-call support for the production system and resolved issues that arose in the nightly ETL run
- Developed a custom ETL pipeline using shell scripts and SQL queries to extract data from a 3rd-party warehouse management system and integrate it into the main ETL pipeline
- Developed complex SQL queries to perform data cleaning, transformation, and data quality checks during the ETL process (see the sketch after this list)
- Collaborated with users to understand their data use cases and converted them into design metrics/data models
- Worked on data warehouse/data mart, ODS, OLTP, and OLAP implementations, covering project scoping, analysis, requirements gathering, data modeling, effort estimation, ETL design, system testing, implementation, and production support
- Developed SQL queries, stored procedures, and functions to support the ad-hoc reporting systems on SQL Server
- Designed mappings to perform ETL from OLTP to OLAP systems subject to cross-project constraints
- Improved report processing time by 33% through report framework automation and SQL query tuning
- Cleaned data warehouse data, which resulted in rectifying $30M in sales
- Improved and maintained the existing dimensional models and schema designs, improving performance by 12%
- Developed SQL scripts to clean and transform sales data before loading into the data warehouse, resulting in rectifying $3M in sales
- Optimized SQL queries and tuned the database, resulting in 18% faster data retrieval
- Experienced in Agile project management; successfully worked in Scrum teams and across the SDLC
- Responsible for code review of all SQL code before deploying to production
- Spearheaded database performance tuning, recovery, cloning, table partitioning and disk space management
- Implemented an organized data environment to manage expenses across all decisions, utilizing SDLC and Agile practices
- Designed and developed a meta-schema to allow Arabic language capability without changing existing functionality
- Worked with a team of developers to design, develop, and implement a BI solution for sales, product, and customer KPIs
- Initiated fine-tuning mechanisms for the database and queries to complete given jobs and tasks in optimal time
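A minimal T-SQL sketch of the kind of cleaning and data quality checks referenced above for the nightly ETL; the staging, error, and fact table names are hypothetical, not the actual production code:

    -- Hypothetical names throughout. Rows with missing keys or negative amounts are
    -- routed to an error table for review instead of being loaded.
    INSERT INTO etl.SalesLoadErrors (OrderId, ErrorReason, LoadDate)
    SELECT s.OrderId,
           CASE WHEN s.CustomerId IS NULL THEN 'Missing customer key'
                ELSE 'Negative sales amount' END,
           GETDATE()
    FROM staging.Sales AS s
    WHERE s.CustomerId IS NULL OR s.SalesAmount < 0;

    -- Only clean rows are loaded, with text fields trimmed and standardized on the way in.
    INSERT INTO dw.FactSales (OrderId, CustomerId, SalesAmount, Region)
    SELECT s.OrderId,
           s.CustomerId,
           s.SalesAmount,
           UPPER(LTRIM(RTRIM(s.Region)))
    FROM staging.Sales AS s
    WHERE s.CustomerId IS NOT NULL AND s.SalesAmount >= 0;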