Sr. Big Data Architect Resume

Long Beach, CA

PROFESSIONAL SUMMARY:

  • 9+ years of experience as a Big Data Architect, Data Modeler and Data Analyst, including designing, developing and implementing data models for enterprise-level applications and systems.
  • Architect, design and develop a Big Data solutions practice, including setting up the Big Data roadmap and building the supporting infrastructure and Big Data team.
  • Experienced in using distributed computing architectures such as AWS products (e.g., EC2, Redshift, EMR and Elasticsearch), Hadoop, Python and Spark, and in the effective use of MapReduce, SQL and Cassandra to solve big data problems.
  • Experience designing and developing POCs using Scala, Spark SQL and the MLlib library, deployed on a YARN cluster.
  • Expertise in Big Data storage using the Hadoop Distributed File System (HDFS).
  • Experience with Scala, Python, R, and Spark.
  • Experience querying data in Cassandra clusters using CQL (Cassandra Query Language).
  • Expertise in architecting Big Data solutions covering data ingestion and data storage.
  • Expertise in distributed processing frameworks such as MapReduce, Spark and Tez.
  • Expertise in Big Data tools such as MapReduce, Hive SQL, Hive PL/SQL, Impala, Pig, Spark Core, YARN and Sqoop.
  • Expertise in Big Data ingestion/integration tools such as Flume and Kafka.
  • Expertise in Big Data workflow design using Oozie.
  • Provide technical thought leadership on Big Data strategy, adoption, architecture and design, as well as data engineering and modeling.
  • Expertise in Big Data architectures such as Hadoop distributed systems (Azure, Hortonworks, Cloudera), MongoDB and NoSQL.
  • Hands-on experience with Hadoop/Big Data technologies for storage, querying, processing and analysis of data.
  • Experience importing data from various sources into Cassandra clusters using Java APIs.
  • Experience with Big Data, including Big Data in the cloud, Master Data Management and Data Governance.
  • Strong familiarity with data management and data governance best practices.
  • Expertise in NoSQL databases such as HBase and MongoDB.
  • Expertise in ETL tools (Informatica, ODI) and experience with the BI tool Tableau.
  • Architected and implemented a Portfolio Recommendation Analytics Engine using Hadoop MapReduce, Oozie, Spark SQL, Spark MLlib and Cassandra.
  • Excellent understanding of Hadoop architecture and underlying framework including storage management.
  • Experienced with NoSQL databases (HBase, Cassandra and MongoDB), including database performance tuning and data modeling.
  • Experience using PL/SQL to write stored procedures, functions and triggers.
  • Excellent technical and analytical skills with a clear understanding of the design goals of ER modeling for OLTP and dimensional modeling for OLAP.
  • Knowledge of and experience with job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
  • Knowledge of configuring and managing Cloudera’s Hadoop platform, including CDH3 and CDH4 clusters.
  • Experience in development of Big Data projects using Hadoop, Hive, HDP, Pig, Flume, Storm and MapReduce open-source tools/technologies.
  • Architecting, solutioning and modeling DI (Data Integrity) platforms using Sqoop, Flume, Kafka, Spark Streaming, Spark MLlib and Cassandra.
  • Strong expertise in Amazon Web Services, including EC2, DynamoDB, S3, Kinesis and other services.
  • Expertise in data analysis, design and modeling using tools such as Erwin.
  • Experienced in using Hadoop ecosystem components such as MapReduce, Hive, Sqoop and Oozie.
  • Expert in Amazon EMR, Spark, Kinesis, S3, Boto3, Elastic Beanstalk, ECS, CloudWatch, Lambda, ELB, VPC, ElastiCache, DynamoDB, Redshift, RDS, Athena, Zeppelin and Airflow.
  • Experienced in testing data in HDFS and Hive for each data transaction.
  • Experienced in importing and exporting data between HDFS and relational database systems using Sqoop.
  • Strong experience working with databases such as Oracle 12c/11g/10g/9i, DB2, SQL Server 2 and MySQL, and proficiency in writing complex SQL queries.
  • Experienced in using database tools such as SQL Navigator and TOAD.
  • Experienced in using Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Experienced in using Flume to transfer log data files to the Hadoop Distributed File System (HDFS).

TECHNICAL SKILLS:

Hadoop/Big Data: MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Sqoop, Oozie, Flume, Scala, Akka, Kafka, Storm, MongoDB

Languages: PL/SQL, Pig Latin, HQL, R, Python, XPath, Spark

Analysis/Modeling Tools: Erwin 9.6/9.5, Sybase PowerDesigner, Oracle Designer, ER/Studio 9.7

Cloud Platform: AWS, EC2, S3, SQS, Azure, MapR

NoSQL Databases: Cassandra, MongoDB

Database Tools: Microsoft SQL Server 2014/2012, Teradata 15/14, Oracle 12c/11g/10g, MS Access, PostgreSQL, Netezza

OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9.

ETL Tools: Pentaho, Informatica PowerCenter 9.6

Operating Systems: UNIX, Ubuntu Linux, CentOS, Sun Solaris, Windows

Tools & Software: TOAD, MS Office, BTEQ, SQL Assistant

Other Tools: SQL*Plus, SQL*Loader, MS Project, MS Visio; have also worked with C++, UNIX and PL/SQL.

PROFESSIONAL EXPERIENCE:

Confidential, Long Beach, CA

Sr. Big Data Architect

Responsibilities:

  • Implemented solutions for ingesting data from various sources and processing Data-at-Rest utilizing Big Data technologies such as Hadoop, MapReduce frameworks, HBase and Hive on a cloud architecture.
  • Implemented a Big Data ecosystem (Hive, Impala, Sqoop, Flume, Spark, Lambda) on a cloud architecture.
  • Experience with BI reporting using AtScale OLAP for Big Data.
  • Extensive ETL testing experience using Informatica 9.x/8.x, Talend and Pentaho.
  • Designed and developed a real-time stream processing application using Spark, Kafka, Scala and Hive to perform streaming ETL and apply machine learning.
  • Developed Big Data solutions focused on pattern matching and predictive modeling.
  • Experience implementing AWS solutions using services such as EC2, S3, RDS, Redshift and VPC.
  • Worked as a Hadoop consultant on MapReduce, Pig, Hive and Sqoop.
  • Configured performance tuning and monitoring of Cassandra read and write processes for fast I/O operations and low latency.
  • Worked with Spark and Python.
  • Worked with Apache Hadoop ecosystem components such as HDFS, Hive, Sqoop, Pig and MapReduce.
  • Responsible for provisioning the Kubernetes environment and deploying Dockerized applications by developing manifests.
  • Made extensive use of the Informatica Debugger to identify and fix issues.
  • Ingested flat files received via the ECG FTP tool, and files received from Sqoop, into the UHG Data Lake (Hive and HBase) using Data Fabric functionality.
  • Managed multiple ETL development teams for business intelligence and master data management initiatives.
  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Led architecture and design of data processing, warehousing and analytics initiatives.
  • Involved in business requirement analysis and technical design sessions with business and technical staff to develop end-to-end ETL solutions.
  • Analyzed the Hadoop cluster and different Big Data components, including Pig, Hive, Spark, HBase, Kafka, Elasticsearch, databases and Sqoop.
  • Involved in business requirement analysis and technical design sessions with business and technical staff to develop end-to-end big data analytical solutions.
  • Worked with AWS to implement client-side encryption, as DynamoDB did not support encryption at rest at the time.
  • Used the DataFrame API in Scala to work with distributed collections of data organized into named columns.
  • Troubleshot, debugged and resolved Talend issues while maintaining the health and performance of the ETL environment.
  • Performed data profiling and transformation on the raw data using Pig and Python.
  • Experienced with batch processing of data sources using Apache Spark.
  • Developed predictive analytics using the Apache Spark Scala APIs.
  • Implemented cluster settings for multiple Denodo nodes and configured load balancing to improve performance.
  • Extracted data from MySQL and AWS Redshift into HDFS using Sqoop.
  • Enhanced a traditional data warehouse based on a star schema, updated data models, and performed data analytics and reporting using Tableau.
  • Created Hive external tables, loaded data into them and queried the data using HQL.
  • Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
  • Worked extensively on the Java persistence layer during an application migration to Cassandra, using Spark to load data to and from the Cassandra cluster.
  • Created dashboards in Tableau, and in Elasticsearch with Kibana.
  • Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra).
  • Responsible for importing log files from various sources into HDFS using Flume.
  • Designed and developed pre-session, post-session and batch execution routines using Informatica Server to run Informatica sessions.
  • Implemented continuous integration and deployment (CI/CD) through Jenkins for Hadoop jobs.
  • Wrote Hadoop jobs to analyze data using Hive and Pig, accessing text files, sequence files and Parquet files.
  • Loaded data from different sources (databases and files) into Hive using Talend.
  • Integrated Oozie logs into a Kibana dashboard.
  • Developed Spark code using Scala and Spark SQL for faster testing and data processing.
  • Imported millions of records of structured data from relational databases using Sqoop import for processing with Spark, and stored the data in HDFS in CSV format.
  • Used Spark SQL to process large volumes of structured data.
  • Assigned names to columns using Scala case classes.
  • Implemented a Spark GraphX application to analyze guest behavior for data science segments.

Environment: Spark, YARN, Hive, Pig, Scala, Python, Hadoop, AWS, DynamoDB, Kibana, EMR, Redshift, NoSQL, Sqoop, MySQL.

Confidential, Indianapolis, IN

Big Data Architect

Responsibilities:

  • Architected, designed and developed business applications and data marts for the Marketing and IT departments to facilitate departmental reporting.
  • Utilized AWS services with a focus on big data architecture/analytics, enterprise data warehouse and business intelligence solutions to ensure optimal architecture, scalability, flexibility, availability and performance, and to provide meaningful and valuable information for better decision-making.
  • Experience in data cleansing and data mining.
  • Designed AWS architecture and cloud migration using AWS EMR, DynamoDB, Redshift and event processing with Lambda functions.
  • Provided a variety of data intake mechanisms, ingestion of data into the data lake, and post-ingestion transformations.
  • Ingested data into Hadoop (Hive/HDFS) from different data sources.
  • Created Hive external tables to stage data and then moved the data from staging to main tables.
  • Created and used reusable mapplets and transformations using Informatica PowerCenter.
  • Published REST APIs to fetch data from Elasticsearch clusters for client-based applications to search patients and claims using multi-field indexes.
  • Implemented a proof of concept deploying this product on Amazon Web Services (AWS).
  • Involved in migrating data from existing RDBMSs (Oracle and SQL Server) to Hadoop using Sqoop for processing.
  • Modified Cassandra YAML configuration files to set properties such as cluster name, node addresses, seed provider, replication factors, memtable size and flush times.
  • Loaded all data from the relational databases into Hive using Sqoop. Also received four flat files from different vendors, each in a different format (e.g., text, EDI and XML).
  • The objective of this project was to build a cloud-based data lake in AWS using Apache Spark and to provide visualization of the ETL orchestration using the CDAP tool.
  • Submitted Talend jobs for scheduling using the Talend scheduler available in the Admin Console.
  • Developed proofs of concept to determine feasibility and to evaluate Big Data products.
  • Wrote Hive join queries to fetch information from multiple tables, and wrote multiple MapReduce jobs to collect output from Hive.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Designed and developed ETL routines using Informatica PowerCenter; within the Informatica mappings, made extensive use of Lookups, Aggregators, Rank transformations, stored procedures, functions, SQL overrides in Lookups, source filters in Source Qualifiers, and Routers to manage data flow into multiple targets.
  • Worked on infrastructure provisioning and configuration for AWS cloud and on-premises environments.
  • Wrote Perl scripts covering data feed handling, implementing MarkLogic, and communicating with web services through the SOAP::Lite module and WSDL.
  • Used data modeling best practices such as a partition-per-query strategy for good Cassandra cluster performance, and denormalized data for better read performance.
  • Used Hive to analyze data ingested into HBase via Hive-HBase integration and computed various metrics for reporting on the dashboard.
  • Involved in developing the MapReduce framework, writing queries and scheduling MapReduce jobs.
  • Developed code for importing and exporting data into HDFS and Hive using Sqoop.
  • Installed and configured Hadoop and responsible for maintaining cluster and managing and reviewing Hadoop log files.
  • Created context variables and groups to run Talend jobs against different environments such as Dev, Test and Prod.
  • Developed Shell, Perl and Python scripts to automate and provide control flow to Pig scripts.
  • Designed the Redshift data model and performed Redshift performance analysis and improvements.
  • Worked on configuring and managing disaster recovery and backup for Cassandra data.
  • Performed file system management and monitoring of Hadoop log files.
  • Utilized Oozie workflows to run Pig and Hive jobs. Extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
  • Used Flume to collect, aggregate and store web log data from different sources such as web servers, mobile and network devices, and pushed it to HDFS.
  • Implemented partitioning, dynamic partitions and buckets in Hive.
  • Developed customized classes for serialization and deserialization in Hadoop.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.

Environment: Pig, Sqoop, Kafka, Apache Cassandra, Oozie, Impala, AWS EMR, Redshift, Flume, Apache Hadoop, HDFS, Hive, MapReduce, ZooKeeper, MySQL, Eclipse, DynamoDB, PL/SQL and Python.

Confidential, Plano, TX

Sr. Data Modeler / Analyst

Responsibilities:

  • Involved in different phases of the development life cycle, including analysis, design, coding, unit testing, integration testing, review and release, as per the business requirements.
  • Wrote HiveQL and managed the Hive Metastore server to control various advanced activities.
  • Participated in performance management and tuning for stored procedures, tables and database servers.
  • Created logical data models for staging, ODS and data mart, as well as the time dimension.
  • Developed the design and process flow to ensure that the process is repeatable.
  • Performed analysis of the existing source systems (transaction database).
  • Involved in maintaining and updating the Metadata Repository with details on the nature and use of applications/data transformations to facilitate impact analysis.
  • Created DDL scripts using ER Studio and source-to-target mappings to bring the data from source to the warehouse.
  • Integrated Hadoop frameworks/technologies such as Hive and HBase to further operational and analytical experience.
  • Created stored procedures to communicate with the SQL database.
  • Installed and configured Hive and wrote Hive UDFs.
  • Designed the ER diagrams, logical model (relationships, cardinality, attributes and candidate keys) and physical database (capacity planning, object creation and aggregation strategies) for Oracle and Teradata.
  • Performed data cleaning and data preparation tasks to convert data into a meaningful data set using R.
  • Analyzed large data sets (structured and unstructured) using Hive queries, R programming and Pig scripts.
  • Used the Spark shell for interactive data analysis and processing, using Spark SQL to query structured data.
  • Worked on importing and cleansing high-volume data from various sources such as Teradata, Oracle, flat files and MS SQL Server.
  • Worked extensively with ER Studio for multiple operations across Atlas Copco in both OLAP and OLTP applications.
  • Generated comprehensive analytical reports by running SQL queries against current databases to conduct data analysis.
  • Wrote PL/SQL statements and stored procedures in DB2 for extracting and writing data.
  • Designed logical and physical data models, metadata and a data dictionary using Erwin for both OLTP and OLAP based systems.
  • Coordinated all teams to centralize metadata management updates and follow the standard naming and attribute standards for data and ETL jobs.
  • Finalized the naming standards for data elements and ETL jobs and created a data dictionary for metadata management.
  • Wrote and executed SQL queries to verify that data had been moved from the transactional system to the DSS, data warehouse and data mart reporting systems in accordance with requirements.

Environment: ER Studio, SQL Server, SQL Server Analysis Services, SSIS, Oracle 11g, Hive, Pig, Spark, Scala, R, Business Objects XI, Rational Rose, DataStage, MS Office, MS Visio, SQL, Crystal Reports 9

Confidential

Data Analyst

Responsibilities:

  • Worked with Business users during requirements gathering and prepared Conceptual, Logical and Physical Data Models.
  • Designed Star and Snowflake Data Models for Enterprise Data Warehouse using ERWIN.
  • Extensively used Erwin's reverse engineering feature to keep the data model in sync with production.
  • Wrote PL/SQL statements, stored procedures and triggers in DB2 for extracting and writing data.
  • Responsible for the development and maintenance of Logical and Physical data models, along with corresponding metadata, to support Applications.
  • Attended and participated in information and requirements gathering sessions.
  • Translated business requirements into working logical and physical data models for Data warehouse, Data marts and OLAP applications.
  • Created and maintained the Logical Data Model (LDM) for the project, including documentation of all entities, attributes, data relationships, primary and foreign key structures, allowed values, codes, business rules, glossary terms, etc.
  • Performed logical data modeling and physical data modeling (including reverse engineering) using the Erwin data modeling tool.
  • Validated and updated the appropriate LDMs against process mappings, screen designs, use cases, the business object model and the system object model as they evolved and changed.
  • Created business requirement documents and integrated the requirements and underlying platform functionality.
  • Excellent knowledge and experience in Technical Design and Documentation.
  • Used forward engineering to create a physical data model with DDL that best suits the requirements from the Logical Data Model.
  • Involved in preparing the design flow for the DataStage objects to pull data from various upstream applications, perform the required transformations and load the data into various downstream applications.
  • Experience in developing dashboards and client specific tools in Microsoft Excel and Power Point.
  • Designed and implemented business intelligence to support sales and operations functions to increase customer satisfaction.

Environment: Oracle 9i, NZSQL, Erwin 8.0, ER/Studio 6.0/6.5, Toad 8.6, Informatica 8.0, IBM OS/390 (V6.0), DB2 V7.1, PL/SQL, Solaris 9/10, Windows Server 2003 & 2008.
