Big Data Intern Resume
Dallas, TX
SUMMARY:
- Around 4 years of experience in data warehousing, database design, and modeling, with a strong emphasis on Big Data
- Expertise in building traditional data warehouses using the Ralph Kimball model, including facts and dimensions based on the star schema
- Experience in building data warehouse constructs such as conformed dimensions, role-playing dimensions, time dimensions, factless fact tables, aggregate fact tables, and bridge tables
- Expertise in implementing ETL for batch processes, micro-batch ETL for near real-time systems, and streaming ETL for Enterprise Application Integration to process real-time data into the data warehouse
- Good knowledge of handling medical data and building systems that comply with HIPAA requirements
- Worked on integrating different sources of data through one common data pipeline and transforming the data to conform to the data warehouse standard of a single version of the truth
- Good exposure to building ETL on Big Data frameworks using Hive, Pig, and Sqoop to handle terabytes of data
- Experience in building data models and databases for unstructured and log data
- Good understanding of log processing methodologies using Apache Flume and Hive
- Strong experience with RDBMS technologies such as SQL stored procedures, functions, triggers, and database security
- Expertise in tuning and optimizing SQL queries by reducing logical reads and building smart indexes using newer features such as clustered columnstore indexes (see the sketch after this list)
- Experience in tuning stored procedures by removing costly I/O operations such as table-valued functions, recursive cursors, and implicit conversions
- Expertise in building data warehouses on Apache Hive, including tuning Hive queries for better report performance
- Worked on migrating legacy reports from SSRS to Tableau and QlikView and on building dashboards and user stories
- Experience with Atlassian Jira and with GitHub, SVN, and other version control systems
- Good knowledge of SAP BusinessObjects: building universes, resolving chasm and fan traps, and building production-quality reports
- Knowledge of building predictive models using R and Python
- Expertise in translating functional requirements and problems into robust and scalable solutions
- In-depth understanding of the SDLC (requirement analysis, design, development, testing, and maintenance)
- Strong experience with Agile project management methodologies, including implementation of Scrum techniques
- Ability to make critical decisions and handle clients from diverse demographics
- Adept at mapping client requirements, custom-designing solutions, and troubleshooting complex information systems
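For illustration, a minimal T-SQL sketch of the clustered columnstore approach mentioned above; the table, column, and index names are hypothetical, not from any production schema:

    -- Hypothetical fact table; columnstore storage compresses the data and
    -- batch-processes scans, cutting logical reads for aggregate queries.
    CREATE TABLE dbo.FactSales (
        DateKey     INT            NOT NULL,
        ProductKey  INT            NOT NULL,
        StoreKey    INT            NOT NULL,
        SalesAmount DECIMAL(18, 2) NOT NULL,
        Quantity    INT            NOT NULL
    );

    CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales ON dbo.FactSales;

    -- Compare logical reads before and after the index with STATISTICS IO.
    SET STATISTICS IO ON;
    SELECT DateKey, SUM(SalesAmount) AS TotalSales
    FROM dbo.FactSales
    GROUP BY DateKey;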
TECHNICAL SKILLS:
Programming Languages: SQL, T-SQL, PL/SQL, R, Shell Scripting, Java, HiveQL, Pig Latin, MDX
Databases: SQL Server 2012-2016, Oracle, MongoDB
Big Data Technologies: Hadoop (MapReduce, YARN), Hive, Pig, Sqoop, Oozie, Impala, Kafka, ZooKeeper
Big Data Distributions: Hortonworks Sandbox, Cloudera
Design Tools: CA Erwin, Microsoft Visio
Reporting Tools: Tableau, QlikView, SSRS, SAP Business Objects
ETL Frameworks: MS SSIS, Appworx
Operating Systems: Unix, Linux
Other Tools: Microsoft Project, MS Office
PROFESSIONAL EXPERIENCE:
Confidential, Dallas, TX
Big Data Intern
Environment: Cloudera Hadoop Distribution - Linux, Hive, Pig, MS SQL Server, Flume, Sqoop
Responsibilities:
- Designed and developed a data warehouse using Hive, Pig, Sqoop, and Flume to support massive datasets of 20 terabytes of data for each hospital facility (see the sketch after this list)
- Developed a reporting database on Hive that handles 2,000+ user queries per second
- Collaborated with doctors and healthcare experts to gather requirements and built a system to support real-time patient analysis and decision making
- Designed and developed data pipelines to gather data from smart sensors, transform it using Pig scripts, and load it into the data warehouse on the Hadoop framework to support decision making and health monitoring
- Developed micro-batch ETL to facilitate speedy reviews and first-mover advantage, using Oozie to automate data loading into HDFS and Pig to pre-process the data
- Migrated the existing data warehouse to Hadoop, enabling insights that were not possible on traditional RDBMS systems and improving decision making, which increased profits by 16%
- Worked on building document analysis systems that enabled storing doctors' notes and telephone conversations with patients
- Core member of the competitive advantage team, building algorithms and systems that surface critical patterns and flags from data, helping the business outsmart its competition
- Developed and managed micro-batch ETL that supported the near real-time reporting systems
- Troubleshot issues in existing ETL pipelines and ensured their smooth functioning
- Developed technical documents, provided training and mentored junior interns
- Created numerous custom reports for pathologists using complex Hive queries and Tableau integration
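A minimal HiveQL sketch of the kind of partitioned warehouse layout used for the sensor data described above; the table, column, and partition names are hypothetical, not the actual hospital schema:

    -- Hypothetical external table over pre-processed sensor files landed in HDFS.
    CREATE EXTERNAL TABLE IF NOT EXISTS sensor_readings (
        patient_id  STRING,
        sensor_type STRING,
        reading     DOUBLE,
        reading_ts  TIMESTAMP
    )
    PARTITIONED BY (facility STRING, reading_date STRING)
    STORED AS ORC
    LOCATION '/data/warehouse/sensor_readings';

    -- Each micro-batch registers a new partition after Pig pre-processing writes the files.
    ALTER TABLE sensor_readings ADD IF NOT EXISTS
        PARTITION (facility = 'example_facility', reading_date = '2016-01-01');

    -- Partition pruning keeps reporting queries fast at multi-terabyte scale.
    SELECT sensor_type, AVG(reading) AS avg_reading
    FROM sensor_readings
    WHERE facility = 'example_facility' AND reading_date = '2016-01-01'
    GROUP BY sensor_type;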
Confidential, Dallas, TX
Student Worker
Environment: Tableau, SSRS, QlikView
Responsibilities:
- Created the “Wow” experience for clients and customers through seamless and trustworthy service
- Collaborated with managers to optimize inventory movement, reducing stock-outs and delayed deliveries
- Involved in planning and allocating resources across clients, ensuring an optimal workload on each resource
- Assisted the store manager with fund management and financial planning, improving profits by 8%
- Developed a database to store customer reviews and performed sentiment analysis on them
- Trained senior management on how to use Tableau and build ad-hoc reports and dashboards
- Designed, developed, implemented, and supported QlikView dashboards. Integrated data sources and databases with QlikView, and designed and developed data models and backend queries for presenting data
- Developed and implemented measurements for various marketing campaigns
Confidential
Programmer
Environment: Yardi Voyager, Hortonworks Hadoop Distribution - Linux, Hive, Pig, MS SQL Server, Flume, Sqoop
Responsibilities:
- Involved in requirement gathering, design, coding and testing phases of product development
- Worked in an Agile methodology and conducted daily Scrum stand-up meetings to get updates from the team
- Collaborated with Scrum masters to develop backlog and sprint plans, streamline ticket creation, and run sprint retrospective meetings, which resulted in delivering products to clients before the deadline
- Worked on backlog planning and execution; assisted the project manager in developing plans, assigning resources, developing project cost estimates, managing budgets, assigning workloads, and leveling resources using Microsoft Project
- Developed Gantt charts, Fishbone Diagram, Work Breakdown Structure (WBS) to facilitate smooth completion of project deliverables
- Analyzed underlying data for potential discrepancies, investigated errors, and performed data scrubbing
- Performed DevOps on ETL to ensure accuracy and integrity, troubleshot ETL issues, and supported ETL enhancement and maintenance, including analyzing query execution plans and optimizing queries for faster performance
- Designed and developed a data warehouse to support log aggregation, real-time log analysis, and processing for analytics (a simplified sketch follows this list)
- Developed all aspects of the data warehouse for the real estate application process to support business intelligence and reporting
- Trained stakeholders and higher management to effectively use the tools built on the existing data warehouse infrastructure
- Ensured coding best practices were applied and data integrity was maintained through code compliance, security, and change management
- Developed Pig scripts to automate the process of cleaning and transforming unstructured raw data for analysis
- Performance-tuned Hive queries and the data warehouse, which reduced report refresh time by 32%
- Designed and developed a scalable data warehouse on Hive to support processing of 1 TB of data
- Integrated and migrated traditional RDBMS systems and NoSQL databases into the Hadoop ecosystem, which lowered operating costs by 12% and facilitated the analysis of unstructured data
- Developed strategies for managing risks for small and medium businesses in the real estate market, which reduced operating costs by 18%
- Designed and developed ETL to pull data from various systems, integrate and load the data for analytics and business intelligence
- Translated business requirements into technical design specifications using Visio and Erwin to build the Data Warehouse
- Designed the micro-batch ETL process to extract data from the CRM, aggregate it into 23 pre-defined segments, and load it into specific data marts on a near real-time basis
- Troubleshot and maintained the existing ETL packages built on SSIS, developed bug fixes for the existing ETL, and tuned queries
- Collaborated with stakeholders and clients to finalize project milestones and deliverables
- Created interactive reports using visualizations available in Tableau and QlikView
- Extracted, analyzed & created reports which helped prioritize business decisions for top management
- Created 3 executive dashboards depicting sales and inventory data during the stint, dealing with data volumes ranging from 20 to 50 million records
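A minimal HiveQL sketch of the kind of partitioned log table and aggregation behind the log analysis work above; the table, columns, and values are hypothetical, not the production schema:

    -- Hypothetical partitioned log table; ORC storage plus date partitions keep scans small.
    CREATE TABLE IF NOT EXISTS app_logs (
        log_ts    TIMESTAMP,
        log_level STRING,
        component STRING,
        message   STRING
    )
    PARTITIONED BY (log_date STRING)
    STORED AS ORC;

    -- Daily error counts per component; restricting to one partition avoids a full-table
    -- scan, the same pruning idea used when tuning the report queries.
    SELECT component, COUNT(*) AS error_count
    FROM app_logs
    WHERE log_date = '2016-03-15' AND log_level = 'ERROR'
    GROUP BY component
    ORDER BY error_count DESC;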
Confidential
Software Engineer
Responsibilities:
- Created and maintained logical & physical Data Models for the production and Business Intelligence Data Warehouse
- Created entity relationship diagrams and multidimensional data models, reports and diagrams for marketing
- Slashed ETL load time by 40% through customizations and by tuning and optimizing SQL queries in the retail data warehouse nightly chain
- Provided on-call support for the production system and resolved issues that arose in the nightly ETL run
- Developed a custom ETL pipeline using shell scripts and SQL queries to extract data from a 3rd-party warehouse management system and integrate it into the main ETL pipeline
- Developed complex SQL queries to perform data cleaning, transformation, and data quality checks during the ETL process (see the sketch after this list)
- Collaborated with users to understand their data use cases and converted them into design metrics/data models
- Worked on data warehouse/data mart, ODS, OLTP, and OLAP implementations, covering project scoping, analysis, requirements gathering, data modeling, effort estimation, ETL design, system testing, implementation, and production support
- Developed SQL queries, stored procedures, and functions to support the ad-hoc reporting systems on SQL Server
- Designed mappings to perform ETL from OLTP to OLAP systems subject to cross-project constraints
- Improved report processing time by 33% through report framework automation and SQL query tuning
- Cleaned data warehouse data, which resulted in rectifying $30M in sales
- Improved and maintained the existing dimensional models and schema designs, improving performance by 12%
- Developed SQL scripts to clean and transform sales data before loading into the data warehouse, resulting in rectifying $3M in sales
- Optimized SQL queries and tuned the database, resulting in 18% faster data retrieval
- Experienced in Agile project management; successfully worked in Scrum teams and across the SDLC
- Responsible for code review of all SQL code before deploying to production
- Spearheaded database performance tuning, recovery, cloning, table partitioning and disk space management
- Implemented an organized data environment to manage expenses across all decisions, utilizing SDLC and Agile practices
- Designed and developed a meta-schema to allow Arabic language capability without changing existing functionality
- Worked with a team of developers to design, develop, and implement a BI solution for sales, product, and customer KPIs
- Initiated fine-tuning mechanisms for the database and queries to complete given jobs and tasks in optimal time
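A minimal T-SQL sketch of the kind of cleaning and data quality checks referenced above for the nightly ETL; the staging, error, and fact table names are hypothetical, not the actual production code:

    -- Hypothetical names throughout. Rows with missing keys or negative amounts are
    -- routed to an error table for review instead of being loaded.
    INSERT INTO etl.SalesLoadErrors (OrderId, ErrorReason, LoadDate)
    SELECT s.OrderId,
           CASE WHEN s.CustomerId IS NULL THEN 'Missing customer key'
                ELSE 'Negative sales amount' END,
           GETDATE()
    FROM staging.Sales AS s
    WHERE s.CustomerId IS NULL OR s.SalesAmount < 0;

    -- Only clean rows are loaded, with text fields trimmed and standardized on the way in.
    INSERT INTO dw.FactSales (OrderId, CustomerId, SalesAmount, Region)
    SELECT s.OrderId,
           s.CustomerId,
           s.SalesAmount,
           UPPER(LTRIM(RTRIM(s.Region)))
    FROM staging.Sales AS s
    WHERE s.CustomerId IS NOT NULL AND s.SalesAmount >= 0;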