Data Architect Resume
Wilmington, DE
SUMMARY
- Solutions Architect, Technical Lead and hands-on developer with 10+ years of extensive experience in business procedures, design strategies, application development and workflow implementations.
- 3+ years of strong working experience with Big Data Analytics and Hadoop Ecosystems.
- Experience with Hadoop components: YARN, Hive, Pig, HBase, Zookeeper, Sqoop, Oozie and Flume.
- Experience in Hadoop architecture and various components of the Hadoop ecosystem such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, MapReduce and YARN.
- Strong Working experience with Cloudera (CDH) & Hortonworks (HDP) distributions.
- Hadoop administration experience. Built multi-node clusters using Cloudera Manager and Apache Ambari.
- Built end-to-end Data-ingestion Framework on a multi-tenant environment.
- Working experience with Voltage encryption in Hadoop environment.
- Experience in installing and configuring Multi-node Hadoop environment on AWS.
- Working experience with DevOps using Amazon Web Services (AWS) components.
- Experience in working with Spark libraries using Python, R and Scala.
- Proficient in writing HiveQL queries and Pig scripts.
- Experience in Machine Learning and Statistical Modelling techniques.
- Extensive experience working with different file formats, compression techniques, setting up batch jobs, ETL/ELT pipelines, DevOps, automation using Chef on AWS.
- Experience in working with various data science libraries such as scikit-learn, statsmodels, NumPy, SciPy, Pandas, Matplotlib and IPython. Strong experience in Python and R.
- Experience working with data visualizations using R, Python and QlikView.
- Experience with the Hadoop MapReduce framework and data ingestion techniques.
- Experience with Python-based MapReduce frameworks such as Pydoop and mrjob.
- Extended Hive and Pig core functionality by writing custom UDFs.
- Experience in working with Flume to load log data from multiple sources directly into HDFS.
- Experience in designing both time-driven and data-driven automated workflows using Oozie.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
- Knowledge of Teradata and Greenplum to Hadoop (HDFS and Hive) data migration.
- Experience with web crawling and web scraping.
- Hands-on experience with natural language processing using the NLTK module.
- Working experience with NoSQL databases such as Cassandra, MongoDB, HBase and Neo4j.
- Experience in creating different visualizations using plots, histograms, heat maps and highlight tables.
- Proficiency in Prototype, Module Design, and Integration.
- Expert knowledge of version control, SDLC and Agile methodologies.
TECHNICAL SKILLS
Operating Systems: UNIX (Linux), Windows, Z/OS
Databases: SQL, HIVE, Impala, DB2, IMS DB, VSAM, MySQL
NoSQL Databases: Cassandra, MongoDB, HBase
Programming Languages: Python, R, COBOL, CSL, RG, PL/I, Core Java, Scala
Scripting Languages: Apache Pig, UNIX
Tools/IDE: Eclipse, PuTTY, Xpediter, File-AID, Abend-AID, PyCharm, Easytrieve, ChangeMan, CA-7, RStudio, WinSCP, Jira, SVN
Big Data and Data Science: MapReduce, Python, Pig, Hive, YARN, Flume, Sqoop, Oozie, Apache Spark, Ambari, Machine Learning, NLP, R, AWS, Web Scraping, Data Modelling, Gephi, Data Visualization, Voltage
PROFESSIONAL EXPERIENCE
Confidential, Wilmington, DE
Data Architect
Responsibilities:
- Designed and developed an end-to-end data ingestion and ETL tool written in Python.
- Created and implemented the data layer for the multi-tenancy topology, where ClickFox, TDD (Technical Data Discovery), Data Operations and MAP reside as tenants.
- Integrated the ingestion tool with external data sources, including relational databases such as Oracle, Sybase and MySQL, FTP sites, Teradata and Greenplum. The tool uses a variety of techniques such as Sqoop, SQL*Plus, TDCH, TPT and scp for ingestion.
- Automated data ingestion from various internal and external sources into the discovery environment for data science needs.
- Used SparkR for data exploration needs
- Developed data quality tools in Spark to monitor and flag issues between source and refined destinations.
- Integrated Voltage encryption into Hadoop environments for data security needs.
- Wrote data equality routines using Spark to confirm that data after decryption remains the same as before encryption.
- Designed the security framework of the project in the multi-tenant CDH cluster.
Technologies Used: Cloudera 5.x, Voltage, Spark, PIG, Hive, Sqoop, Control-M, Scala, Shell Scripting, Oracle, Teradata, Greenplum, Python, Core Java
Confidential, Wilmington, DE
Data Architect
Responsibilities:
- Defined and designed architectural framework, data strategy, tools, and technologies for the marketing platform in the Big Data space for the Bank.
- Architected, designed and developed the code deployment and automation process on AWS using Chef scripts
- Worked on DevOps aspects by automating the infrastructure and application code using CloudFormation Templates (JSON-based CFTs), Chef, Jenkins and uDeploy on the AWS platform.
- Merged all legacy ING-Direct bank data with Confidential and pushed it to the Hadoop data lake by setting up the ETL pipelines.
- Extracted data from different sources (Teradata, Oracle, Google DoubleClick, etc.), transformed it according to the business use case and loaded it into Hadoop.
- Developed Hive scripts, Pig scripts, Unix shell scripts and Spark programs in Scala for all ETL loading processes, converting the files into Parquet in the Hadoop file system.
- Developed various ETL transformation scripts using Hive to create refined datasets for analytics use cases.
- Automated workflows using shell scripts and Control-M jobs to pull data from various databases into Hadoop.
- Developed scripts in Spark to import and export data between Cassandra, Teradata and Hadoop.
- Created RDDs/DataFrames in Spark using Scala and applied several transformations to make the data ready for Cassandra loads.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
Technologies Used: Cloudera 5.x, AWS, Chef, CFTs, DevOps, Hadoop, Spark, PIG, Hive, Sqoop, Control-M, Scala, Shell Scripting, Cassandra, Oracle, Teradata
Confidential, Moorestown, NJ
Manager/Application Architect
Responsibilities:
- Designed the architecture of the project and selected the technology stack.
- Installed and configured Multi-node Hadoop environment (HDP2.2) on AWS.
- Installed Apache Spark on HDP stack.
- Loaded the STB data into HDFS using Flume from an S3 bucket in AWS.
- Imported and exported data into HDFS and Hive using Sqoop.
- Developed multiple MapReduce jobs in Java for data cleaning and processing.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Worked on a custom MapReduce program to load data from HDFS to HBase.
- Integrated HBase with Hive as per the requirements.
- Worked on custom Hive UDFs to meet the requirements.
- Populated the data in DataMart tables.
- Created SchemaRDDs to access Hive tables via Spark for better query efficiency.
- Tuned Hadoop and Spark jobs through better memory management, serialization and efficient staging.
- Automated all jobs that pull data from the FTP server and load it into Hive tables, using Oozie workflows.
- Unit tested to validate the results.
Technologies Used: Hadoop MapReduce, HIVE, HBase, Java, Flume, Oozie, Ambari (Used for installing and managing Hortonworks distribution), YARN, QlikView, Revo R, Apache Spark, AWS, and S3
Confidential, Chicago, IL
Project Lead
Responsibilities:
- Actively involved in the design of architecture of the application.
- Created meaningful first and last names from a separate database of first and last names.
- Added valid area codes and phone numbers and assigned the state based on them.
- Found the latitude and longitude based on the area code and generated 5000 latitude/longitude pairs for each user.
- Sliced and diced the data to find meaningful data, pulling the area code and phone number from the address wherever possible.
- Used the K-means algorithm to find the centroid of the user ad clicks and eliminate outlier points.
- Called the Google API to pull zip codes and worked around the limit of 2500 records imposed by the API.
Technologies Used: Hadoop Map Reduce, Hive, Oracle, SQL, D3.js, Python, R, Impala
Confidential, Chicago, IL
Project Lead
Responsibilities:
- Designed the application and guided the team.
- User-ID authentication on Twitter.
- Pulled the serialized data (implemented for Twitter) using Flume.
- Called the Twitter REST API via Python to pull past tweets.
- Processed the data in Hive for analysis.
- Plotted graphs using R.
- Automatic tweet-pushing system.
Technologies Used: Flume, Hive, REST, Map Reduce, JSON, Python, R, Web Scraping
Confidential
Project Lead
Responsibilities:
- Cleansed the raw CDR (call detail record) files using Python. An alternative approach using Java MapReduce was also followed.
- Identified caller and called numbers that meet certain standards.
- Created an adjacency list: exploded nodes for community detection and marked unnecessary nodes for deletion.
- Filtered the exploded adjacency list after omitting nodes marked for deletion.
- Visualized the nodes using D3.js.
Technologies Used: Hadoop, PIG, Core Java, Python, MongoDB, D3.js, MapReduce, Pydoop, Flume, Sqoop
Confidential
Project Lead
Responsibilities:
- Designed and built the application from scratch.
- Analyzed the unstructured data.
- Data cleansing
- Text analytics using various techniques
Technologies Used: Hive, PIG, Python, NLTK, Matplotlib, R, SQL
Confidential
Project Lead
Responsibilities:
- Planning, monitoring and tracking of progress of work items performed by different modules.
- Application maintenance and enhancements, coordinating with development and QA teams.
- Release coordination with PM team for a smooth and timely production release.
- Addressing customer escalations and ensuring customer satisfaction.
- Status reporting and leading various project items to effective closure.
- Coordination with business partners.
Technologies Used: Core Java, Python, COBOL, DB2, Mainframes, CSL RG, Unix
Confidential
Senior Consultant
Responsibilities:
- Application maintenance, enhancements and 24x7 production support
- Preparing High Level Design Document and Detailed Design Document
- Interacting with the client on business decisions/issues
- Reviewing client deliverables
- Documenting existing processes in the system
- Abend analysis and suggesting permanent fixes to reduce emergencies
- Tracking deliverables on timelines
- Early availability of financials for reporting to stakeholders
Technologies Used: COBOL, PL/I, VSAM, DB2, IMS-DB, JCL, CICS
Confidential
System Analyst
Responsibilities:
- Development and enhancements, application maintenance
- SME for USO (Universal Service Order), RBE (Rating and Billing Engine) and MPS (Message Processing System), which are important subsystems of the billing system.
- Primary analyst responsible for ATTOMECS, a subsystem that generates revenue from error usage.
- Requirement gathering and walkthroughs, design documents, preparation of test plans, development, unit test plans and results, reviews, and preparation of implementation plans
- Generating the call detail and summary reports requested by customers on an ad hoc basis.
- Working on production abends, which typically include data exceptions, system errors and other program user abends.
Technologies Used: COBOL, VSAM, DB2, IMS-DB, JCL, CICS, Easytrieve, Stored Procedures, XML
