Sr. Data Analyst, Data Architect, Big Data Team Lead Resume
California / Washington, D.C.
TECHNICAL SKILLS:
Database Design & Development: Enterprise Data Lake (EDL), SQL Server, Oracle, NoSQL (MongoDB), Data Mapping, Data Mining, Data Profiling
Big Data: Hadoop, Hive, Sqoop, Accumulo, Amazon AWS (Redshift, S3, Elasticsearch, EMR), Python
ETL: SSIS, Pentaho
Application Development Tools: Java, C#, Visual Basic, JavaScript, Active Server Pages (ASP), .NET Framework, SSIS, HTML, XML, JSON, DQL, Documentum API, DFC, Docbasic, WDK, Atlassian JIRA and Confluence
Methodology: Waterfall, Agile, Scrum
OS/Server: Win 2003/2k/NT/XP/Vista, Application Servers - Oracle/Tomcat/IBM WebSphere 7.0/JBoss, IIS, C/UNIX, Linux (CentOS, Ubuntu)
Content Management: Documentum 4i (Windows), Documentum 5.3/6.5 SP2/6.7, Captiva
PROFESSIONAL EXPERIENCE:
Confidential - California / Washington, D.C.
Sr. Data Analyst, Data Architect, Big Data Team Lead
Responsibilities:
- Managed a team at a fast-paced startup doing Big Data product development.
- Implemented two workflows in the Zyudly Data Platform. The simple workflow is the most basic workflow the platform offers: it ingests data, curates it, and writes it to a data store (AWS DynamoDB), separating valid and invalid records with the rule engine. Ingestion uses Spark Streaming, which provides methods to create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them as text files (see the sketch after this list).
- The Zyudly Data Platform also supports a custom workflow for cases that do not fit the simple workflow's criteria. A custom workflow can use different data stores - HDFS, Hive, Amazon EMR, Amazon S3, Amazon Redshift, Elasticsearch, and flat files, to name a few. The Spark uber JAR and the other JAR files supporting ingestion, curation, and other tasks are uploaded to the Oozie share lib folder.
- Zyudly supports three pipelines - a batch pipeline, a real-time pipeline, and an analytics pipeline. The analytics pipeline is a work in progress and currently uses Redshift for analytics.
- Participated in many dashboard designs, suggesting appropriate technologies. Have a good perspective on what users look for in web pages, especially in data-rich categories.
- Implemented the Scrum framework and instilled the discipline of timely JIRA/Confluence updates with stories, subtasks, and the associated documents, artifacts, and screenshots.
- Performed initial analysis on complex data and developed a road map for the ETL process of ingestion, curation, data quality, and analytics.
- Participated in technical discussions on integrating with Amazon Web Services. The product currently supports S3, Redshift, and Elasticsearch integration.
- Designed an Enterprise Data Lake to store all the data in its native format. When needed, filtered data sets or summarized results can then be sent to business users for further analytics and data science work.
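Illustrative sketch (not code from the engagement): a minimal Java Spark Streaming job that monitors a landing directory for new files, in the spirit of the simple workflow above. The application name, directory path, and validity predicate are hypothetical placeholders; the platform's rule engine would supply the real valid/invalid rules.

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    public class FileIngestStream {
        public static void main(String[] args) throws InterruptedException {
            // One micro-batch per minute; app name is a placeholder.
            SparkConf conf = new SparkConf().setAppName("simple-workflow-ingest");
            JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(60));

            // Watch a Hadoop-compatible directory; each new file is read as lines of text.
            JavaDStream<String> records = ssc.textFileStream("hdfs:///data/landing"); // hypothetical path

            // Stand-in validity check; the rule engine would replace this predicate.
            JavaDStream<String> valid = records.filter(line -> !line.trim().isEmpty());
            valid.print();

            ssc.start();
            ssc.awaitTermination();
        }
    }

From here, each micro-batch of valid records could be written to the target store (for example DynamoDB), while rejected records are routed elsewhere for review.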
Confidential - Washington, D.C.
Sr. Data Analyst, Data Architect, Business Analyst, Solutions Architect
Responsibilities:
- Gathered requirements from the client and created a robust business process.
- Analyzed complex source databases, understood the business processes, and developed solutions for migrating the data into a Big Data ecosystem.
- Developed a prototype model using Pentaho ETL (PDI) and wrote transformation scripts in core Java to facilitate the migration effectively without compromising data quality. Documented the entire process step by step for knowledge transfer on this complex project. Proved that Pentaho can reduce costs in scenarios where only a simple data migration is needed and can be used efficiently for many of the 22 agencies whose data we planned to migrate.
- Used Sqoop to transfer data from about 150 tables to the Hadoop system, where developers performed further ingestion and curation and pushed the data into Accumulo using MapReduce 2.0 (a Sqoop sketch follows this list).
- Analyzed the data in Pervasive databases, mapped the data from the tables to the columns provided by the business users, provided feedback, reviewed the mappings against the real data, and wrote extensive SQL queries and Java code to extract the unique IDs needed by the Big Data Accumulo system. Provided fully curated data sets to the developers, enabling much easier processing than the complex processing performed before.
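A minimal sketch of the kind of per-table Sqoop import used for the ~150 tables, invoked here through Sqoop 1.x's embeddable entry point (org.apache.sqoop.Sqoop.runTool), which is equivalent to running "sqoop import" from the command line. The JDBC URL, credentials file, table name, and target directory are hypothetical.

    import org.apache.sqoop.Sqoop;

    public class TableImport {
        public static void main(String[] args) {
            String[] importArgs = {
                "import",
                "--connect", "jdbc:oracle:thin:@//dbhost:1521/SRC",   // hypothetical source DB
                "--username", "etl_user",
                "--password-file", "/user/etl/.dbpass",               // hypothetical HDFS path
                "--table", "CUSTOMER",                                // one of the ~150 tables
                "--target-dir", "/data/raw/customer",                 // HDFS landing directory
                "--num-mappers", "4"                                  // parallel map tasks
            };
            // Returns 0 on success, non-zero on failure.
            System.exit(Sqoop.runTool(importArgs));
        }
    }

In practice the table name and target directory would be parameterized and the import looped over the full table list.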
Confidential - Arlington, VA
Sr. Data Analyst, Business Analyst, Solutions Architect, Team Lead
Responsibilities:
- Performed data analysis on closed-bank databases (hosting approximately 1 petabyte of data, of which 60 TB was structured) and presented the data, tailored to user needs, in a more user-friendly and legible manner.
- Extensively created SSIS packages for migrating large data sets, applying varying transformation rules.
- Carried out an extensive data-mapping process by examining the raw production data and working out its meaning, since for many banks no data dictionary or data-modeling diagrams were captured after the FDIC closed the bank.
- Performed data mining and data profiling per user requirements using scripts, intuition, and at times manual scanning of documents for the specific closed-bank data users needed to close litigation.
- Created monthly and quarterly PowerPoint reports for upper management, which were used for budget allocation. Collected statistics - such as how many users accessed which bank's data and how many litigation scripts ran in the background - using SQL queries, and wrote Java code to create the reports.
- Gathered requirements and created business processes as needed to accommodate client’s needs.
- Gathered knowledge about different Big Data tools and solutions. Installed MongoDB, uploaded files, and wrote queries to retrieve data as part of a prototype model (see the MongoDB sketch after this list).
- Installed and configured Cloudera and performed data quality checks and data analysis using Hive on data from many closed banks (a Hive query sketch also follows this list).
- Performed a multi-functional role maintaining and updating code written in C#, SQL queries, and scripts.
- Analyzed Java-based open-source tools and made the necessary modifications to the Java source code to meet in-house requirements. Created reports using SQL Server Reporting Services.
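A minimal sketch of the MongoDB prototype work (load a document, then query it back) using the official MongoDB Java driver; the database, collection, and field names are hypothetical.

    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import org.bson.Document;
    import static com.mongodb.client.model.Filters.eq;

    public class MongoPrototype {
        public static void main(String[] args) {
            // Connect to a locally installed MongoDB instance.
            try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
                MongoCollection<Document> docs =
                        client.getDatabase("prototype").getCollection("bankDocs"); // hypothetical names
                // Upload one record, then retrieve it with a filter query.
                docs.insertOne(new Document("bank", "Sample Bank").append("docType", "loanFile"));
                docs.find(eq("bank", "Sample Bank"))
                    .forEach(d -> System.out.println(d.toJson()));
            }
        }
    }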
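And a sketch of the kind of Hive data-quality check described above, run over JDBC against HiveServer2 on the Cloudera cluster; the host, database, table, and column names are hypothetical placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQualityCheck {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:hive2://cdh-edge:10000/bankdata"); // hypothetical HiveServer2 endpoint
                 Statement stmt = conn.createStatement();
                 // Count total rows and missing key values in a single pass.
                 ResultSet rs = stmt.executeQuery(
                         "SELECT COUNT(*) AS total_rows, "
                       + "SUM(CASE WHEN customer_id IS NULL THEN 1 ELSE 0 END) AS null_ids "
                       + "FROM customer")) {
                while (rs.next()) {
                    System.out.printf("rows=%d, null ids=%d%n",
                            rs.getLong("total_rows"), rs.getLong("null_ids"));
                }
            }
        }
    }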
Confidential - Washington, D.C.
EMC Documentum Architect/Administrator
Responsibilities:
- Installed, configured, and documented the policies and procedures for Documentum 6.5 SP2 and its associated components, which support users in about 130 offices around the world.
- Participated in architectural discussions with various teams and, based on the requirements, provided the necessary Documentum components.
- Analyzed and fixed a critical, long-unresolved issue with syncing users from Active Directory, which had been failing to import all the users. Additionally, proposed and configured a Federation, which tremendously reduced network traffic.
Confidential, Arlington, VA
Sr. Software Development Analyst/Documentum Administrator, Solutions Architect
Responsibilities:
- Lead Developer for Confidential, a web-based request and document tracking system that seamlessly integrates Documentum as a content management system.
- Provided technical oversight for software engineering activities and drove alignment with the FDIC SDLC methodology. Developed requirements and technical design specifications, and coded, tested, and debugged software. Conducted data modeling using Erwin/ERX and designed system interfaces. Served as Lead Developer for the successful implementation of DRRTRAK Version 1.0.
- Systems Analyst/Lead Developer for the web-based Appointment Scheduling System (APSS) Version 1.0 for the DRR Dallas CSC Call Center. Orchestrated the design, development, and implementation of the system in less than 60 days in response to an urgent requirement involving a bank closing. Integrated the system with DRR CSC's Unix-based IVR (Interactive Voice Response) system. Obtained Section 508 compliance.
- Systems Administrator for Documentum and for applications based on the Documentum 4i and 5.x platforms, working on Windows and UNIX. Created and introduced road maps for conversions/upgrades, docbase configuration, DocApps, jobs, and ad hoc scripts using DQL/API/DFC, as well as ad hoc reports. Provided engineering application architecture and standards assistance to development teams throughout the life cycle. Performed a range of routine Documentum maintenance tasks, such as monitoring, troubleshooting, and resolving Documentum jobs. Conducted capacity planning and monitored production docbases and Documentum activities while ensuring consistent environment availability. Collaborated with the infrastructure/application team to manage server issues.
- Responded to a user-reported loss of critical data in the FDIC production system. Delivered a series of instructions and process analyses that resulted in the identification of the issues. Created a unique scheme to effectively recover the documents and locate the data within 8 hours.