
Sr. Big Data Architect Resume


Malvern, PA

PROFESSIONAL SUMMARY:

  • 8 years of IT experience with extensive knowledge in Software Development Life Cycle (SDLC) involving Requirements Gathering, Design, Architecting, Analysis, Development, Maintenance, Implementation and Testing.
  • Proficient in Big Data architecture, data warehousing, Hadoop, data integration, master data management, data migration, operational data stores and BI reporting projects, with a deep focus on design, development and deployment of BI and data solutions using custom, open source and off-the-shelf BI tools.
  • Experience querying data in Cassandra clusters using CQL (Cassandra Query Language).
  • Expertise in architecting Big Data solutions spanning data ingestion and data storage.
  • Architected, designed and developed a Big Data solutions practice, including setting up the Big Data roadmap and building the supporting infrastructure and team to deliver Big Data solutions.
  • Expertise in Big Data storage with the Hadoop Distributed File System (HDFS).
  • Experience with Scala, Python, R, and Spark.
  • Expertise in distributed processing frameworks such as MapReduce, Spark and Tez.
  • Expertise in Big Data tools such as MapReduce, Hive SQL, Hive PL/SQL, Impala, Pig, Spark Core, YARN and Sqoop.
  • Expertise in Big Data ingestion/integration tools such as Flume and Kafka.
  • Expertise in Big Data workflow design tools such as Oozie.
  • Provide technical thought leadership on Big Data strategy, adoption, architecture and design, as well as data engineering and modeling.
  • Experience importing data from various sources into Cassandra clusters using Java APIs.
  • Experience with Big Data on premises and in the cloud, master data management and data governance.
  • Strong familiarity with data management, data governance, and best practices
  • Expertise in NoSQL databases such as HBase and MongoDB.
  • Expertise in ETL tools (Informatica, ODI) and experience with the BI tool Tableau.
  • Architected and implemented a portfolio recommendation analytics engine using Hadoop MapReduce, Oozie, Spark SQL, Spark MLlib and Cassandra.
  • Excellent understanding of Hadoop architecture and underlying framework including storage management.
  • Experienced with NoSQL databases (HBase, Cassandra and MongoDB), database performance tuning and data modeling.
  • Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
  • Excellent technical and analytical skills with clear understanding of design goals of ER modeling for OLTP and dimension modeling for OLAP.
  • Experience in development of Big Data projects using open source tools and technologies such as Hadoop, Hive, HDP, Pig, Flume, Storm and MapReduce.
  • Architected, designed and modeled DI (Data Integrity) platforms using Sqoop, Flume, Kafka, Spark Streaming, Spark MLlib and Cassandra.
  • Strong expertise in Amazon AWS services including EC2, DynamoDB, S3 and Kinesis.
  • Expertise in data analysis, design and modeling using tools such as Erwin.
  • Expertise in Big Data architectures such as Hadoop distributed systems (Azure, Hortonworks, Cloudera), MongoDB and NoSQL.
  • Hands-on experience with Hadoop/Big Data technologies for storage, querying, processing and analysis of data.
  • Experienced in using various Hadoop components such as MapReduce, Hive, Sqoop and Oozie.
  • Expert in Amazon EMR, Spark, Kinesis, S3, Boto3, Elastic Beanstalk, ECS, CloudWatch, Lambda, ELB, VPC, ElastiCache, DynamoDB, Redshift, RDS, Athena, Zeppelin and Airflow.
  • Experienced in testing data in HDFS and Hive for each transaction of data.
  • Experienced in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
  • Strong Experience in working with Databases like Oracle 12C/11g/10g/9i, DB2, SQL Server 2008 and MySQL and proficiency in writing complex SQL queries.
  • Experienced in using database tools like SQL Navigator, TOAD.
  • Experienced in using Spark to improve the performance and optimization of existing Hadoop algorithms with SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN (a minimal sketch follows this summary).
  • Experienced in using Flume to transfer log data files to Hadoop Distributed File System (HDFS)
  • Knowledge and experience in job work-flow scheduling and monitoring tools like Oozie and Zookeeper.
  • Knowledge of configuring and managing Cloudera's Hadoop platform, including CDH3 and CDH4 clusters.
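
As referenced above, a minimal sketch of the Spark SQL/DataFrame style used to replace multi-stage Hadoop MapReduce aggregations. Table names, columns and the date filter are hypothetical placeholders, not taken from any specific project.

```scala
import org.apache.spark.sql.SparkSession

object TransactionRollup {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("transaction-rollup")
      .enableHiveSupport()               // read tables registered in the Hive metastore
      .getOrCreate()

    // One cached scan replaces repeated passes over the raw data.
    val txns = spark.table("staging.transactions")   // hypothetical Hive table
      .filter("txn_date >= '2016-01-01'")
      .cache()

    // The aggregation is expressed once in Spark SQL instead of chained MapReduce jobs.
    txns.createOrReplaceTempView("txns")
    val daily = spark.sql(
      """SELECT txn_date, account_id, SUM(amount) AS total_amount
        |FROM txns
        |GROUP BY txn_date, account_id""".stripMargin)

    daily.write.mode("overwrite").saveAsTable("mart.daily_account_totals")
    spark.stop()
  }
}
```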

SKILLS:

Languages: PL/SQL, Pig Latin, HQL, R, Python, XPath, Spark

Hadoop/Big Data: MapReduce, HDFS, Hive, Pig, HBase, Zookeeper, Sqoop, Oozie, Flume, Scala, Akka, Kafka, Storm, MongoDB

Analysis/ Modeling Tools: Erwin 9.6/9.5, Sybase Power Designer, Oracle Designer, ER/Studio 9.7

Cloud Platform: AWS, EC2, S3, SQS.

NoSQL Databases: Cassandra, MongoDB

Database Tools: Microsoft SQL Server 2014/2012, Teradata 15/14, Oracle 12c/11g/10g, MS Access, PostgreSQL, Netezza

OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9.

ETL Tools: Pentaho, Informatica PowerCenter 9.6, etc.

Operating Systems: UNIX, Ubuntu Linux, Windows, CentOS, Sun Solaris.

Tools & Software: TOAD, MS Office, BTEQ, SQL Assistant

Other Tools: TOAD, SQL*Plus, SQL*Loader, MS Project, MS Visio, MS Office; have also worked with C++, UNIX, PL/SQL, etc.

PROFESSIONAL EXPERIENCE:

Confidential, Malvern, PA

Sr. Big Data Architect

Responsibilities:

  • Implemented a Big Data ecosystem (Hive, Impala, Sqoop, Flume, Spark, Lambda) with a cloud architecture.
  • Developed Big Data solutions focused on pattern matching and predictive modeling.
  • Experience in BI reporting with AtScale OLAP for Big Data.
  • Implemented solutions for ingesting data from various sources and processing data at rest using Big Data technologies such as Hadoop, the MapReduce framework, HBase and Hive.
  • Designed and developed a real-time stream processing application using Spark, Kafka, Scala and Hive to perform streaming ETL and apply machine learning (see the streaming sketch after this list).
  • Experience in AWS, implementing solutions using services such as EC2, S3, RDS, Redshift and VPC.
  • Worked as a Hadoop consultant on MapReduce, Pig, Hive and Sqoop.
  • Configured performance tuning and monitoring for Cassandra read and write processes to achieve fast I/O operations and low latency.
  • Worked with Spark and Python.
  • Worked with Apache Hadoop ecosystem components such as HDFS, Hive, Sqoop, Pig and MapReduce.
  • Responsible for provisioning Kubernetes environments and deploying Dockerized applications by developing manifests.
  • Made extensive use of the Informatica Debugger to identify and fix issues.
  • Ingested flat files received via the ECG FTP tool and files received from Sqoop into the UHG data lake (Hive and HBase) using Data Fabric functionality.
  • Managed multiple ETL development teams for business intelligence and master data management initiatives.
  • Identified query duplication, complexity and dependencies to minimize migration effort.
  • Worked with Oracle, Hortonworks HDP clusters, Attunity Visibility, Cloudera Navigator Optimizer, AWS Cloud and DynamoDB.
  • Extensive ETL testing experience using Informatica 9.x/8.x, Talend and Pentaho.
  • Explored Spark for improving the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Led architecture and design of data processing, warehousing and analytics initiatives.
  • Involved in business requirement analysis and technical design sessions with business and technical staff to develop end to end ETL solutions.
  • Worked on analyzing Hadoop clusters and different Big Data components including Pig, Hive, Spark, HBase, Kafka, Elasticsearch and Sqoop.
  • Involved in business requirement analysis and technical design sessions with business and technical staff to develop end-to-end Big Data analytical solutions.
  • Worked with AWS to implement client-side encryption, as DynamoDB did not support encryption at rest at the time.
  • Used the DataFrame API in Scala to work with distributed collections of data organized into named columns.
  • Troubleshot, debugged and resolved Talend issues while maintaining the health and performance of the ETL environment.
  • Performed data profiling and transformation on raw data using Pig and Python.
  • Experienced with batch processing of data sources using Apache Spark.
  • Developed predictive analytics using the Apache Spark Scala APIs.
  • Implemented cluster settings for multiple Denodo nodes and created load balancing to improve performance.
  • Created Hive external tables, loaded data into them and queried the data using HQL.
  • Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
  • Worked extensively on the Java persistence layer during application migration to Cassandra, using Spark to load data to and from the Cassandra cluster.
  • Created dashboards in Tableau and on Elasticsearch with Kibana.
  • Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra).
  • Responsible for importing log files from various sources into HDFS using Flume
  • Designed and developed pre-session, post-session and batch execution routines using Informatica Server to run Informatica sessions.
  • Implemented continuous integration and deployment (CI/CD) through Jenkins for Hadoop jobs.
  • Wrote Hadoop jobs for analyzing data using Hive and Pig, accessing text, sequence and Parquet files.
  • Experience with different Hadoop distributions such as Cloudera (CDH4 and CDH 5.9), Hortonworks Data Platform (HDP) and MapR.
  • Loaded data from different sources (databases and files) into Hive using Talend.
  • Experience integrating Oozie logs into Kibana dashboards.
  • Extracted data from MySQL and AWS Redshift into HDFS using Sqoop.
  • Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
  • Imported millions of structured records from relational databases using Sqoop, processed them with Spark and stored the data in HDFS in CSV format.
  • Created Azure Data Factories for data acquisition.
  • Used Spark SQL to process large amounts of structured data.
  • Assigned names to columns using case classes in Scala.
  • Implemented Spark GraphX application to analyze guest behavior for data science segments.
  • Enhanced the traditional data warehouse based on a star schema, updated data models, and performed data analytics and reporting using Tableau.
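
As referenced in the streaming bullet above, a minimal sketch of a Spark Structured Streaming ETL job reading from Kafka and landing partitioned Parquet for Hive. Broker addresses, the topic name, the event schema and all paths are hypothetical placeholders, and the spark-sql-kafka connector is assumed to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object ClickstreamStreamingEtl {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("clickstream-streaming-etl")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical JSON event schema for the incoming Kafka messages.
    val schema = new StructType()
      .add("event_time", TimestampType)
      .add("user_id", StringType)
      .add("action", StringType)

    // Read the Kafka topic as an unbounded DataFrame and parse the JSON payload.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")   // placeholder broker
      .option("subscribe", "clickstream")                   // placeholder topic
      .load()
      .select(from_json(col("value").cast("string"), schema).as("e"))
      .select("e.*")

    // Streaming ETL: derive a partition column, then append Parquet files that Hive can query.
    val query = events
      .withColumn("event_date", to_date(col("event_time")))
      .writeStream
      .format("parquet")
      .option("path", "/data/lake/clickstream")             // placeholder output path
      .option("checkpointLocation", "/checkpoints/clickstream")
      .partitionBy("event_date")
      .start()

    query.awaitTermination()
  }
}
```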

Environment: Spark, YARN, Hive, Pig, Scala, Python, Hadoop, AWS, DynamoDB, Kibana, Cloudera, EMR, Redshift, NoSQL, Sqoop, MySQL.

Confidential, Chicago, IL

Big Data Architect

Responsibilities:

  • Architected, Designed and Developed Business applications and Data marts for Marketing and IT department to facilitate departmental reporting.
  • Ingested data into Hadoop (Hive/HDFS) from different data sources.
  • Created Hive external tables to stage data and then moved the data from staging to the main tables.
  • Utilized AWS services with a focus on Big Data architecture, analytics, enterprise data warehousing and business intelligence solutions to ensure optimal architecture, scalability, flexibility, availability and performance, and to provide meaningful and valuable information for better decision making.
  • Experience in data cleansing and data mining.
  • Designed AWS architecture and cloud migration covering AWS EMR, DynamoDB, Redshift and event processing using Lambda functions.
  • Provided a variety of data intake mechanisms, ingesting data into the data lake and performing post-ingestion transformations.
  • Created and used reusable mapplets and transformations using Informatica PowerCenter.
  • Published REST APIs to fetch data from Elasticsearch clusters for client-facing applications to search patients and claims by multi-field indexes.
  • Modified cassandra.yaml files to set configuration properties such as cluster name, node addresses, seed provider, replication factors, memtable size and flush times.
  • Created high-level and detailed data models for Azure SQL databases and NoSQL databases, as well as the use of storage for logging and data movement between the on-premises data warehouse and cloud VNets.
  • Cluster management and analytics in Cloudera and Hortonworks.
  • Loaded all data from relational databases into Hive using Sqoop; also received four flat files from different vendors, each in a different format (text, EDI and XML).
  • The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark and to provide visualization of the ETL orchestration using the CDAP tool.
  • Experience in submitting Talend jobs for scheduling using the Talend scheduler available in the Admin Console.
  • Ran proofs of concept to determine feasibility and evaluate Big Data products.
  • Wrote Hive join queries to fetch information from multiple tables and wrote multiple MapReduce jobs to collect output from Hive.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Designed and developed ETL routines using Informatica PowerCenter; within the Informatica mappings, made extensive use of Lookup, Aggregator and Rank transformations, stored procedures and functions, SQL overrides in Lookups, source filters in Source Qualifiers, and data flow management into multiple targets using Routers.
  • Worked in AWS Cloud and on-premises environments with infrastructure provisioning and configuration.
  • Worked on writing Perl scripts covering data feed handling, implementing MarkLogic, and communicating with web services through the SOAP::Lite module and WSDL.
  • Used data modeling best practices such as a partition-per-query strategy and denormalizing data for better read performance of the Cassandra cluster.
  • Used Hive to analyze data ingested into HBase via the Hive-HBase integration and computed various metrics for reporting on the dashboard.
  • Involved in developing the MapReduce framework, writing queries and scheduling MapReduce jobs.
  • Developed code for importing and exporting data into HDFS and Hive using Sqoop.
  • Installed and configured Hadoop and responsible for maintaining cluster and managing and reviewing Hadoop log files.
  • Created context variables and groups to run Talend jobs against different environments such as Dev, Test and Prod.
  • Developed Shell, Perl and Python scripts to automate and provide Control flow to Pig scripts.
  • Designed the Redshift data model and performed Redshift performance improvements and analysis.
  • Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.
  • Worked on configuring and managing disaster recovery and backup on Cassandra Data.
  • Performed File system management and monitoring on Hadoop log files.
  • Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
  • Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
  • Implemented partitioning, dynamic partitions and buckets in Hive (see the partition-load sketch after this list).
  • Developed customized classes for serialization and deserialization in Hadoop.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Implemented a proof of concept deploying this product in Amazon Web Services (AWS).
  • Involved in migration of data from existing RDBMSs (Oracle and SQL Server) to Hadoop using Sqoop for data processing.
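
As referenced in the Hive partitioning bullet above, a minimal sketch of a dynamic-partition load issued through Spark SQL with Hive support. The database, table and column names are hypothetical placeholders; bucketing is omitted here because Hive-compatible bucketed writes carry extra constraints.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitionLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partition-load")
      .enableHiveSupport()
      .getOrCreate()

    // Dynamic-partition settings must be enabled before the INSERT.
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    // Hypothetical target table, partitioned by load date.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS warehouse.orders (
        |  order_id BIGINT,
        |  customer_id BIGINT,
        |  amount DOUBLE)
        |PARTITIONED BY (load_date STRING)
        |STORED AS ORC""".stripMargin)

    // Dynamic partitions: load_date is derived from the data instead of being hard-coded per run.
    spark.sql(
      """INSERT OVERWRITE TABLE warehouse.orders PARTITION (load_date)
        |SELECT order_id, customer_id, amount,
        |       date_format(order_ts, 'yyyy-MM-dd') AS load_date
        |FROM staging.orders_raw""".stripMargin)

    spark.stop()
  }
}
```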

Environment: Pig, Sqoop, Kafka, Apache Cassandra, Oozie, Impala, Cloudera, AWS EMR, Redshift, Flume, Apache Hadoop, HDFS, Hive, MapReduce, Zookeeper, MySQL, Eclipse, DynamoDB, PL/SQL and Python.

Confidential, New York City NY

Sr. Data Analyst / Modeler

Responsibilities:

  • Participated in performance management and tuning for stored procedures, tables and database servers.
  • Created logical data models for the Staging, ODS and Data Mart layers, including the Time dimension.
  • Developed the design and process flow to ensure that the process is repeatable.
  • Performed analysis of the existing source systems (transactional databases).
  • Involved in maintaining and updating the metadata repository with details on the nature and use of applications and data transformations to facilitate impact analysis.
  • Created DDL scripts using ER Studio and source to target mappings to bring the data from source to the warehouse.
  • Designed the ER diagrams, logical model (relationships, cardinality, attributes and candidate keys) and physical database (capacity planning, object creation and aggregation strategies) for Oracle and Teradata.
  • Worked on importing and cleansing high-volume data from various sources such as Teradata, Oracle, flat files and MS SQL Server.
  • Worked extensively on ER Studio for multiple Operations across Atlas Copco in both OLAP and OLTP applications.
  • Generated comprehensive analytical reports by running SQL queries against current databases to conduct data analysis.
  • Produced PL/SQL statements and stored procedures in DB2 for extracting as well as writing data.
  • Designed logical and physical data models, metadata and the data dictionary using Erwin for both OLTP and OLAP based systems.
  • Coordinated all teams to centralize metadata management updates and follow standard naming and attribute standards for data and ETL jobs.
  • Finalized the naming standards for data elements and ETL jobs and created a data dictionary for metadata management.
  • Wrote and executed SQL queries to verify that data had been moved from the transactional system to the DSS, data warehouse and data mart reporting systems in accordance with requirements (a reconciliation sketch follows this list).
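
As referenced in the verification bullet above, a minimal reconciliation sketch comparing row counts between a source system and the warehouse over JDBC. Connection strings, credentials and table names are hypothetical placeholders, and the appropriate JDBC drivers are assumed to be on the classpath.

```scala
import java.sql.DriverManager

object RowCountCheck {
  // Run a COUNT(*) against one table and return the result.
  def count(url: String, user: String, pass: String, table: String): Long = {
    val conn = DriverManager.getConnection(url, user, pass)
    try {
      val rs = conn.createStatement().executeQuery(s"SELECT COUNT(*) FROM $table")
      rs.next()
      rs.getLong(1)
    } finally conn.close()
  }

  def main(args: Array[String]): Unit = {
    // Placeholder endpoints for the transactional source and the warehouse target.
    val src = count("jdbc:oracle:thin:@src-host:1521/OLTP", "etl_ro", "***", "SALES.ORDERS")
    val tgt = count("jdbc:teradata://dw-host/DATABASE=EDW", "etl_ro", "***", "EDW.ORDERS")

    if (src == tgt) println(s"Row counts match: $src")
    else println(s"Row count mismatch: source=$src target=$tgt")
  }
}
```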

Environment: ER Studio, SQL Server 2008, SQL Server Analysis Services, SSIS, Oracle 10g, Business Objects XI, Rational Rose, DataStage, MS Office, MS Visio, SQL, Crystal Reports 9

Confidential

Data Modeler

Responsibilities:

  • Extensively used the reverse engineering feature of Erwin to keep the data model consistent with production.
  • Worked with business users during requirements gathering and prepared conceptual, logical and physical data models.
  • Designed Star and Snowflake Data Models for Enterprise Data Warehouse using ERWIN.
  • Wrote PL/SQL statement, stored procedures and Triggers in DB2 for extracting as well as writing data.
  • Responsible for the development and maintenance of Logical and Physical data models, along with corresponding metadata, to support Applications.
  • Attended and participated in information and requirements gathering sessions.
  • Translated business requirements into working logical and physical data models for Data warehouse, Data marts and OLAP applications.
  • Created and maintained Logical Data Model (LDM) for the project. Includes documentation of all entities, attributes, data relationships, primary and foreign key structures, allowed values, codes, business rules, glossary terms, etc.
  • Performed logical data modeling and physical data modeling (including reverse engineering) using the Erwin data modeling tool.
  • Validated and updated the appropriate LDMs to reflect process mappings, screen designs, use cases, the business object model and the system object model as they evolved and changed.
  • Created business requirement documents and integrated the requirements and underlying platform functionality.
  • Excellent knowledge and experience in Technical Design and Documentation.
  • Used forward engineering to create a physical data model with DDL that best suits the requirements from the Logical Data Model.
  • Involved in preparing the design flow for the DataStage objects to pull data from various upstream applications, perform the required transformations and load the data into various downstream applications.
  • Experience in developing dashboards and client-specific tools in Microsoft Excel and PowerPoint.
  • Designed and implemented business intelligence to support sales and operations functions to increase customer satisfaction.

Environment: Oracle 9i, NZSQL, Erwin 8.0, ER/Studio 6.0/6.5, Toad 8.6, Informatica 8.0, IBM OS/390 (V6.0), DB2 V7.1, PL/SQL, Solaris 9/10, Windows Server 2003 & 2008.
