Sr. Big Data Architect Resume

Bronx, NY

SUMMARY:

  • Over 9 years of experience as a Data Analysis, Data Modeling, Data Architecture, and Big Data/Hadoop professional in applied information technology.
  • Proficient in Data Architecture, Data Warehousing, Big Data/Hadoop, Data Integration, Master Data Management, Data Migration, Operational Data Store, and BI Reporting projects, with a deep focus on design, development, and deployment of BI and data solutions using custom, open-source, and off-the-shelf BI tools.
  • Experienced in technical consulting and end-to-end delivery covering architecture, data modeling, data governance, and design, development, and implementation of solutions.
  • Experience in Apache Spark, Spark Streaming, Spark SQL, and NoSQL databases such as HBase, Cassandra, and MongoDB.
  • Expertise in configuring monitoring and alerting tools, such as AWS CloudWatch, according to requirements.
  • Proficiency in Big Data Practices and Technologies like HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Oozie, Flume, Spark, Kafka.
  • Data Warehousing: full life-cycle project leadership, business-driven requirements gathering, capacity planning, feasibility analysis, enterprise and solution architecture, design, construction, data quality, profiling and cleansing, source-target mapping, gap analysis, data integration/ETL, SOA, ODA, data marts, Inmon/Kimball methodology, data modeling for OLTP, canonical modeling, and dimensional modeling for data warehouse star/snowflake design.
  • Experience in BI/DW solutions (ETL, OLAP, data marts), Informatica, and BI reporting tools such as Tableau and QlikView; also experienced in leading teams of application, ETL, and BI developers and testing teams.
  • Experience in development, support, and maintenance of ETL (Extract, Transform, and Load) processes using Talend Integration Suite.
  • Worked on Informatica PowerCenter tools: Designer, Repository Manager, and Workflow Manager.
  • Proficiency in multiple databases including MongoDB, Cassandra, MySQL, Oracle, and MS SQL Server.
  • Extensive use of Talend ELT, database, data set, HBase, Hive, Pig, HDFS, and Sqoop components.
  • Experience in installation, configuration, and administration of Informatica PowerCenter 8.x/9.1 Client/Server.
  • Experienced with the Hadoop ecosystem and Big Data components including Apache Spark, Scala, Python, HDFS, MapReduce, and Kafka.
  • Expertise in reading and writing data from and to multiple source systems such as Oracle, HDFS, XML, delimited, Excel, positional, and CSV files (see the read/write sketch following this summary).
  • Experience in Business Intelligence (BI) project development and implementation using the MicroStrategy product suite, including MicroStrategy Desktop/Developer, Web, Architect, OLAP Services, Administrator, and Intelligence Server.
  • Logical and physical database design (tables, constraints, indexes, etc.) using Erwin, ER Studio, Toad Data Modeler, and SQL Modeler.
  • Good understanding and hands on experience with AWS S3 and EC2.
  • Good experience with the programming languages Python and Scala.
  • Experience in performance tuning of Informatica (sources, mappings, targets, and sessions) and tuning SQL queries.
  • Excellent knowledge of creating reports in SAP BusinessObjects, including Webi reports for multiple data providers.
  • Created and maintained UDB DDL for databases, tablespaces, tables, views, triggers, and stored procedures; resolved lock escalations, lock waits, and deadlocks.
  • Experience working with business intelligence and data warehouse software, including SSAS, Pentaho, Cognos, Amazon Redshift, and Azure Data Warehouse.
  • Extensive ETL testing experience using Informatica 9.x/8.x, Talend, and Pentaho.
  • Experience in Dimensional Data Modeling, Star/Snowflake schema, FACT & Dimension tables.
  • Experience with relational (3NF) and dimensional data architectures. Experience in leading cross-functional, culturally diverse teams to meet strategic, tactical and operational goals and objectives.
  • Good exposure to BI reporting with MicroStrategy 8i/9i and Tableau, SQL programming, and RDBMSs: Teradata, Oracle, and SQL Server.
  • Expertise on Relational Data modeling (3NF) and Dimensional data modeling.
  • Experience in developing MapReduce programs using Apache Hadoop for analyzing big data as per requirements.
  • Practical understanding of the Data modeling (Dimensional & Relational) concepts like Star-Schema Modeling, Snowflake Schema Modeling, Fact and Dimension tables.
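
To illustrate the multi-source read/write work noted in the summary (see the bullet on reading and writing across source systems), here is a minimal Spark/Scala sketch that loads a delimited extract with an explicit schema and persists it as Parquet. The paths, column names, and delimiter are assumptions for illustration, not details from the resume.

```scala
// Minimal sketch: read a pipe-delimited extract with an explicit schema and
// persist it as Parquet. Paths, columns, and the delimiter are hypothetical.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

object DelimitedToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("delimited-to-parquet").getOrCreate()

    val schema = StructType(Seq(
      StructField("account_id", StringType, nullable = false),
      StructField("branch",     StringType, nullable = true),
      StructField("balance",    DoubleType, nullable = true)))

    val accounts = spark.read
      .option("sep", "|")                    // delimiter of the source extract
      .option("header", "true")
      .schema(schema)
      .csv("/data/extracts/accounts.txt")    // placeholder path

    accounts.write.mode("overwrite").parquet("/data/curated/accounts")
    spark.stop()
  }
}
```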

TECHNICAL SKILLS:

Big Data/Hadoop: MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Sqoop, Oozie, Flume, Scala, Akka, Kafka, Storm, MongoDB.

Data Analysis/ Modeling Tools: Erwin R6/R9, Rational System Architect, IBM Infosphere Data Architect, ER Studio and Oracle Designer.

Database Tools: Microsoft SQL Server 12.0, Teradata 15.0, Oracle 11g/9i/12c, and MS Access

BI Tools: Tableau 7.0/8.2, Tableau Server 8.2, Tableau Reader 8.1, SAP BusinessObjects, Crystal Reports

Packages: Microsoft Office 2010, Microsoft Project 2010, SAP, Microsoft Visio, SharePoint Portal Server

Version Control Tools: VSS, SVN, CVS.

Project Execution Methodologies: Agile, Ralph Kimball and Bill Inmon data warehousing methodologies, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD).

ETL/Data Warehouse Tools: Informatica 9.6/9.1/8.6.1/8.1, SAP BusinessObjects XI R3.1/XI R2, Web Intelligence, Talend, Pentaho.

Tools: OBIEE 10g/11g/12c, SAP ECC 6 EHP5, GoToMeeting, DocuSign, InsideSales.com, SharePoint, MATLAB.

AWS: EC2, S3, SQS.

Operating System: Windows, Unix, Sun Solaris

RDBMS: Microsoft SQL Server 14.0, Teradata 15.0, Oracle 12c/11g, and MS Access.

Other Tools: TOAD, SQL*Plus, SQL*Loader, MS Project, MS Visio, and MS Office; have also worked with C++, UNIX, and PL/SQL.

PROFESSIONAL EXPERIENCE:

Confidential, Bronx, NY

Sr. Big Data Architect

Responsibilities:

  • Designed, architected, and implemented complex projects dealing with considerable data volumes (GB to PB scale) and high complexity.
  • Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark for data aggregation and queries, and wrote data back into the RDBMS through Sqoop (see the aggregation sketch following this list).
  • Implemented solutions for ingesting data from various sources and processing the data at rest utilizing Big Data technologies such as Hadoop, MapReduce frameworks, HBase, and Hive.
  • Implemented a Big Data ecosystem (Hive, Impala, Sqoop, Flume, Spark, Lambda) with a cloud architecture.
  • Generated metadata and created Talend ETL jobs and mappings to load the data warehouse and data lake.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
  • Used Talend for Big Data integration with Spark and Hadoop.
  • Used Microsoft Windows Server and authenticated the client-server relationship via the Kerberos protocol.
  • Experience in BI reporting with AtScale OLAP for Big Data.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.
  • Designed data analysis and visualization using Big SQL, DSM, and IBM BigSheets.
  • Designed and developed a real-time stream processing application using Spark, Kafka, Scala, and Hive to perform streaming ETL and apply machine learning (see the streaming sketch following this list).
  • Identified query duplication, complexity, and dependencies to minimize migration effort.
  • Experience in AWS, implementing solutions using services such as EC2, S3, RDS, Redshift, and VPC.
  • Worked with Talend for fast data integration tasks.
  • Worked as a Hadoop consultant on MapReduce, Pig, Hive, and Sqoop.
  • Worked with Spark and Python.
  • Worked with Apache Hadoop ecosystem components such as HDFS, Hive, Sqoop, Pig, and MapReduce.
  • Led architecture and design of data processing, warehousing, and analytics initiatives.
  • Worked with AWS to implement client-side encryption, as DynamoDB did not support encryption at rest at the time.
  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Used the DataFrame API in Scala to work with distributed collections of data organized into named columns.
  • Performed data profiling and transformation on the raw data using Pig and Python.
  • Experienced with batch processing of data sources using Apache Spark.
  • Developed predictive analytics using Apache Spark Scala APIs.
  • Involved in big data analysis using Pig and user-defined functions (UDFs).
  • Created Hive external tables, loaded data into them, and queried the data using HQL (see the Hive table sketch following this list).
  • Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
  • Implemented an enterprise-grade platform (MarkLogic) for ETL from the mainframe to NoSQL (Cassandra).
  • Responsible for importing log files from various sources into HDFS using Flume.
  • Worked on tools Flume, Storm and Spark.
  • Assigned names to the columns using case classes in Scala.
  • Enhanced the traditional data warehouse based on a star schema, updated data models, and performed data analytics and reporting using Tableau.
  • Expert in writing business analytics scripts using Hive SQL.
  • Implemented continuous integration and deployment (CI/CD) through Jenkins for Hadoop jobs.
  • Wrote Hadoop jobs for analyzing data using Hive and Pig, accessing text, sequence, and Parquet files.
  • Experience with different Hadoop distributions such as Cloudera (CDH3 and CDH4), Hortonworks Data Platform (HDP), and MapR.
  • Experience integrating Oozie logs into a Kibana dashboard.
  • Extracted data from MySQL and AWS Redshift into HDFS using Sqoop.
  • Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
  • Imported millions of structured records from relational databases using Sqoop, processed them with Spark, and stored the data in HDFS in CSV format.
  • Developed a Spark Streaming application to pull data from the cloud into Hive tables.
  • Used Spark SQL to process large amounts of structured data.
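
A minimal sketch of the Scala/Spark aggregation pattern referenced above: a UDF plus DataFrame/Spark SQL aggregation whose result is written back to an RDBMS. The resume used Sqoop for the export step; a direct JDBC write is shown here as a simpler stand-in, and all table names, paths, and connection details are hypothetical.

```scala
// Minimal sketch only -- not the project's actual code. Aggregates a
// hypothetical Hive table with a UDF + DataFrame operations and writes the
// result to an RDBMS over JDBC (the resume's export step used Sqoop).
import org.apache.spark.sql.{SparkSession, functions => F}

object DailyAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-aggregation")
      .enableHiveSupport()                 // source data lives in Hive
      .getOrCreate()
    import spark.implicits._

    // Example UDF: bucket transaction amounts into coarse ranges.
    val amountBucket = F.udf((amount: Double) =>
      if (amount < 100) "small" else if (amount < 1000) "medium" else "large")

    val txns = spark.table("warehouse.transactions")   // hypothetical table

    val daily = txns
      .withColumn("bucket", amountBucket($"amount"))
      .groupBy($"txn_date", $"bucket")
      .agg(F.count(F.lit(1)).as("txn_count"), F.sum($"amount").as("total_amount"))

    // Simplified stand-in for the Sqoop export: a direct JDBC write.
    daily.write
      .format("jdbc")
      .option("url", "jdbc:mysql://db-host:3306/reporting")   // placeholder
      .option("dbtable", "daily_txn_summary")                 // placeholder
      .option("user", sys.env.getOrElse("DB_USER", "reporting"))
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .mode("overwrite")
      .save()

    spark.stop()
  }
}
```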
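
A minimal sketch of the real-time streaming ETL described above, using Spark Structured Streaming to read JSON events from Kafka and land them as Parquet under a Hive table location. The broker address, topic name, schema, and paths are assumptions rather than project details, and the machine learning step is omitted.

```scala
// Minimal sketch only. Requires the spark-sql-kafka connector on the
// classpath; broker, topic, schema, and paths are placeholders.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{StringType, StructType, TimestampType}

object StreamingEtl {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("streaming-etl").getOrCreate()

    // Assumed JSON payload schema for the Kafka messages.
    val schema = new StructType()
      .add("event_id", StringType)
      .add("event_type", StringType)
      .add("event_time", TimestampType)

    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")   // placeholder
      .option("subscribe", "events")                        // placeholder topic
      .load()

    val parsed = raw
      .select(from_json(col("value").cast("string"), schema).as("e"))
      .select("e.*")
      .filter(col("event_type").isNotNull)

    // Landing Parquet under a Hive external table's location keeps the data
    // queryable from Hive/Spark SQL; the ML scoring step is omitted here.
    val query = parsed.writeStream
      .format("parquet")
      .option("path", "/warehouse/streaming/events")        // placeholder path
      .option("checkpointLocation", "/checkpoints/events")  // placeholder path
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}
```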
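
A minimal sketch of the Hive external table pattern referenced above, issued through Spark SQL so it stays in Scala; the database name, columns, and HDFS location are illustrative only.

```scala
// Minimal sketch only; database, columns, and HDFS location are illustrative.
import org.apache.spark.sql.SparkSession

object HiveExternalTable {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-external-table")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("CREATE DATABASE IF NOT EXISTS logs")

    // External table over files already landed in HDFS (e.g., by Sqoop/Flume).
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS logs.web_events (
        |  event_id STRING,
        |  user_id  STRING,
        |  url      STRING,
        |  ts       TIMESTAMP)
        |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        |STORED AS TEXTFILE
        |LOCATION '/data/raw/web_events'""".stripMargin)

    // Query the external table with HQL.
    spark.sql(
      """SELECT url, COUNT(*) AS hits
        |FROM logs.web_events
        |GROUP BY url
        |ORDER BY hits DESC
        |LIMIT 20""".stripMargin).show()

    spark.stop()
  }
}
```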

Environment: Spark, YARN, Hive, Pig, Scala, Python, Hadoop, AWS, DynamoDB, Kibana, Cloudera, EMR, JDBC, Redshift, NoSQL, Sqoop, MySQL, star schema, Flume, Oozie.

Confidential, Houston, TX

Big Data Architect / Data Engineer

Responsibilities:

  • Gathered the business requirements from the Business Partners and Subject Matter Experts.
  • Involved in installing Hadoop Ecosystem components.
  • Responsible for Big data initiatives and engagement including analysis, brainstorming, POC, and architecture.
  • Wrote Sqoop scripts for moving data between Relational DBs, HDFS, and S3 storage and automated the workflow using Oozie.
  • Ran proofs of concept to determine feasibility and evaluate Big Data products.
  • Wrote Hive join queries to fetch information from multiple tables and multiple MapReduce jobs to collect output from Hive (see the join sketch following this list).
  • Created high-level and detailed data models for Azure SQL databases and NoSQL databases, as well as storage used for logging and data movement between the on-premises data warehouse and cloud VNets.
  • Designed both 3NF data models for ODS, OLTP, and OLAP systems and dimensional data models using star and snowflake schemas.
  • Developed and designed data integration and migration solutions in Azure.
  • Implemented Spark GraphX application to analyze guest behavior for data science segments.
  • Ingested data into Hadoop/Hive/HDFS from different data sources.
  • Used Hive to analyze data ingested into HBase via Hive-HBase integration and computed various metrics for reporting on the dashboard.
  • Enhanced the traditional data warehouse based on a star schema, updated data models, and performed data analytics and reporting using Tableau.
  • Involved in developing the MapReduce framework, writing queries, and scheduling MapReduce jobs.
  • Developed code for importing and exporting data into HDFS and Hive using Sqoop.
  • Installed and configured Hadoop; responsible for maintaining the cluster and managing and reviewing Hadoop log files.
  • Wrote ETL jobs to read from web APIs using REST/HTTP calls and load the data into HDFS using Talend.
  • Generated comprehensive analytical reports by running SQL queries against current databases to conduct data analysis.
  • Developed Shell, Perl and Python scripts to automate and provide Control flow to Pig scripts.
  • Imported data frequently from MySQL to HDFS using Sqoop (see the ingestion sketch following this list).
  • Created Hive tables and worked on them using HiveQL.
  • Developed logistic regression and clustering models to predict usage consumption and exceptions for a utility company on Azure HDInsight with Spark using R.
  • Used many Azure services, including Azure ADFS, Azure AD, Azure AD B2C, SAML 2.0, OAuth 2.0, Data Factory, Service Bus, Application Insights, Redis Cache, Azure SQL DB, Multi-Factor Authentication, Traffic Manager, Storage, Azure App Service, Azure ASE, and more.
  • Designed the ETL process using Talend to load from sources to targets through data transformations.
  • Designed the Redshift data model and performed Redshift performance analysis and improvements.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Worked on configuring and managing disaster recovery and backup for Cassandra data.
  • Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
  • Used Flume to collect, aggregate, and store web log data from different sources such as web servers and mobile and network devices, and pushed it to HDFS.
  • Involved in migration of data from existing RDBMSs (Oracle and SQL Server) to Hadoop using Sqoop for data processing.
  • Advised the ETL and BI teams with design and architecture of the overall solution.
  • Created DDL scripts using ER Studio and source to target mappings to bring the data from source to the warehouse.
  • Worked with the Development, Storage, and Network teams on installation and administration of MongoDB in the enterprise IT environment.
  • Wrote and executed various MySQL database queries from Python using the Python MySQL connector and the MySQLdb package.
  • Developed Talend ESB services and deployed them on ESB servers on different instances.
  • Designed and implemented the MongoDB schema.
  • Created custom aggregations of client deliverables to meet requests using Python and pandas.
  • Developed shell scripts in a UNIX environment to support scheduling of the Talend jobs.
  • Effectively used Informatica parameter files to define mapping variables, workflow variables, FTP connections, and relational connections.
  • Implemented Change Data Capture technology in Talend in order to load deltas to the data warehouse.
  • Finalized naming standards for data elements and ETL jobs and created a data dictionary for metadata management.
  • Produced PL/SQL statements and stored procedures in DB2 for extracting as well as writing data.
  • Wrote and executed SQL queries to verify that data had been moved from the transactional system to the DSS, data warehouse, and data mart reporting systems in accordance with requirements.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs for data cleaning and preprocessing.
  • Designed and developed various reports and dashboards that can be easily accessed through Tableau.
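
The MySQL-to-HDFS ingestion referenced above used Sqoop; as a rough analogue in the same Scala/Spark stack used elsewhere in this resume, the sketch below reads a table over JDBC in parallel and lands it in HDFS as Parquet. The connection URL, credentials, table, partition column, and paths are placeholders.

```scala
// Minimal sketch only; the original work used Sqoop, this shows a Spark JDBC
// read as a rough analogue. URL, credentials, table, columns, and paths are
// placeholders, and the MySQL JDBC driver must be on the classpath.
import org.apache.spark.sql.SparkSession

object MySqlToHdfs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("mysql-to-hdfs").getOrCreate()

    // Assumed table "orders" with a numeric order_id and a date column order_date.
    val orders = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://db-host:3306/sales")
      .option("dbtable", "orders")
      .option("user", sys.env.getOrElse("DB_USER", "etl"))
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .option("partitionColumn", "order_id")   // parallelize the read
      .option("lowerBound", "1")
      .option("upperBound", "10000000")
      .option("numPartitions", "8")
      .load()

    orders.write
      .mode("overwrite")
      .partitionBy("order_date")
      .parquet("/data/landing/orders")          // placeholder HDFS path

    spark.stop()
  }
}
```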
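
A minimal sketch of the Hive multi-table join pattern referenced above, executed through Spark SQL; the schema, table names, and filter date are illustrative assumptions.

```scala
// Minimal sketch only; schema, table names, and the filter date are assumed.
import org.apache.spark.sql.SparkSession

object HiveJoinReport {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-join-report")
      .enableHiveSupport()
      .getOrCreate()

    // Join two Hive tables and aggregate, entirely in HQL via Spark SQL.
    val report = spark.sql(
      """SELECT c.region,
        |       SUM(o.amount) AS total_amount,
        |       COUNT(*)      AS order_count
        |FROM sales.orders o
        |JOIN sales.customers c ON o.customer_id = c.customer_id
        |WHERE o.order_date >= '2017-01-01'
        |GROUP BY c.region""".stripMargin)

    report.write.mode("overwrite").saveAsTable("sales.region_summary")
    spark.stop()
  }
}
```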

Environment: Pig, Sqoop, Kafka, Apache Cassandra, Oozie, Impala, Cloudera, AWS, AWS EMR, Redshift, Flume, Apache Hadoop, HDFS, Hive, MapReduce, Cassandra, ZooKeeper, MySQL, SQL Server 2008, SQL Server Analysis Services, Tableau, Oracle 12c, Eclipse, DynamoDB, PL/SQL, and Python.

Confidential, Plano, TX

Sr. Data Analyst / Modeler

Responsibilities:

  • Analyzed the business requirements by dividing them into subject areas and understood the data flow within the organization
  • Worked very closely with data architects and the DBA team to implement data model changes in the database in all environments.
  • Developed data marts for the base data in star and snowflake schemas and was involved in developing the data warehouse for the database.
  • Developed enhancements to the MongoDB architecture to improve performance and scalability.
  • Created DDL scripts for implementing data modeling changes; created Erwin reports in HTML and RTF formats depending on the requirement, published the data model in the model mart, created naming convention files, and coordinated with DBAs to apply the data model changes.
  • Developed predictive analytics using Apache Spark Scala APIs.
  • Assigned names to the columns using case classes in Scala (see the case-class sketch following this list).
  • Experienced with batch processing of data sources using Apache Spark.
  • Worked on Unit Testing for three reports and created SQL Test Scripts for each report as required.
  • Involved in developing the MapReduce framework, writing queries, and scheduling MapReduce jobs.
  • Involved in data modeling to define the table structure in the MDM system.
  • Implemented an enterprise-grade platform (MarkLogic) for ETL from the mainframe to NoSQL (Cassandra).
  • Designed and developed a data mapping application for 30+ disparate source systems (COBOL, MS SQL Server, Oracle, and mainframe DB2) using MS Access and UNIX Korn shell scripts.
  • Extensively used Erwin as the main tool for modeling along with Visio.
  • Installed, configured, and administered JBoss 4.0 server in various environments.
  • Worked on the Metadata Repository (MRM) to keep definitions and mapping rules up to the mark.
  • Established and maintained comprehensive data model documentation including detailed descriptions of business entities, attributes, and data relationships.
  • Developed the conceptual and logical data models and transformed them into physical schemas using Erwin.
  • Performed data cleaning and data manipulation activities using the NZSQL utility.
  • Analyzed the physical data model to understand the relationships between existing tables.
  • Created a list of domains in Erwin and worked on building up the data dictionary for the company.
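
A minimal Scala sketch of the case-class column-naming step mentioned above: parsing raw delimited records from HDFS into a typed Dataset whose column names come from the case class fields. The path and field names are hypothetical.

```scala
// Minimal sketch only; the HDFS path and field names are hypothetical.
import org.apache.spark.sql.SparkSession

// Column names in the resulting Dataset come from these case class fields.
case class Customer(customerId: String, name: String, city: String, segment: String)

object CaseClassColumns {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("case-class-columns").getOrCreate()
    import spark.implicits._

    // Raw comma-delimited lines from HDFS, mapped onto the case class.
    val customers = spark.sparkContext
      .textFile("/data/raw/customers")
      .map(_.split(","))
      .filter(_.length >= 4)
      .map(f => Customer(f(0).trim, f(1).trim, f(2).trim, f(3).trim))
      .toDS()

    customers.groupBy("segment").count().show()
    spark.stop()
  }
}
```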

Environment: Erwin r8.2, Oracle SQL Developer, Oracle Data Modeler, Teradata 14, SSIS, Business Objects, SQL Server, ER/Studio, Windows, MS Excel.

Confidential, New York City, NY

Data Analyst/Modeler

Responsibilities:

  • Developed the logical data models and physical data models that capture current state/future state data elements and data flows using ER Studio.
  • Delivered dimensional data models using ER/Studio to bring the Employee and Facilities domain data into the Oracle data warehouse.
  • Performed analysis of the existing source systems (transaction databases).
  • Involved in maintaining and updating the metadata repository with details on the nature and use of applications/data transformations to facilitate impact analysis.
  • Created DDL scripts using ER Studio and source to target mappings to bring the data from source to the warehouse.
  • Designed the ER diagrams, logical model (relationships, cardinality, attributes, and candidate keys), and physical database (capacity planning, object creation, and aggregation strategies) for Oracle and Teradata.
  • Worked on importing and cleansing of data from various sources such as Teradata, Oracle, flat files, and MS SQL Server with high-volume data.
  • Designed logical and physical data models, metadata, and the data dictionary using Erwin for both OLTP- and OLAP-based systems.
  • Reverse Engineered DB2 databases and then forward engineered them to Teradata using ER Studio.
  • Part of the team conducting logical data analysis and data modeling JAD sessions; communicated data-related standards.
  • Involved in meetings with SMEs (subject matter experts) to analyze the multiple sources.
  • Involved in writing and optimizing SQL queries in Teradata.
  • Worked on importing and cleansing of data from various sources such as Teradata, Oracle, flat files, and SQL Server 2005 with high-volume data.
  • Wrote and executed SQL queries to verify that data has been moved from transactional system to DSS, Data warehouse, data mart reporting system in accordance with requirements.
  • Worked extensively on ER Studio for multiple Operations across Atlas Copco in both OLAP and OLTP applications.
  • Used forward engineering to create a physical data model with DDL that best suits the requirements from the Logical Data Model.
  • Worked with the DBA to convert logical Data models to physical Data models for implementation.
  • Involved in preparing the design flow for the DataStage objects to pull the data from various upstream applications, do the required transformations, and load the data into various downstream applications.

Environment: Oracle Data Modeler, Business Objects, Erwin r8.2, Oracle SQL Developer, SQL Server 2008, Teradata, ER/Studio, SSIS, Windows, MS Excel.

Confidential

Data Analyst

Responsibilities:

  • Responsible for the development and maintenance of Logical and Physical data models, along with corresponding metadata, to support Applications.
  • Worked with business users during requirements gathering and prepared conceptual, logical, and physical data models.
  • Created conceptual, logical and physical data models using best practices and company standards to ensure high data quality and reduced redundancy.
  • Wrote PL/SQL statements, stored procedures, and triggers in DB2 for extracting as well as writing data.
  • The project involved production, test, and development administration and support for the client's existing DB2 UDB platform running DB2 UDB v9.1 and v8.2 on servers under various operating systems.
  • Attended and participated in information and requirements gathering sessions.
  • Translated business requirements into working logical and physical data models for Data warehouse, Data marts and OLAP applications.
  • Designed star and snowflake data models for the enterprise data warehouse using Erwin.
  • Created and maintained Logical Data Model (LDM) for the project. Includes documentation of all entities, attributes, data relationships, primary and foreign key structures, allowed values, codes, business rules, glossary terms, etc.
  • Validated and updated the appropriate LDM's to process mappings, screen designs, use cases, business object model, and system object model as they evolve and change.
  • Excellent knowledge and experience in Technical Design and Documentation.
  • Used forward engineering to create a physical data model with DDL that best suits the requirements from the Logical Data Model.
  • Involved in preparing the design flow for the DataStage objects to pull the data from various upstream applications, do the required transformations, and load the data into various downstream applications.
  • Performed logical data modeling and physical data modeling (including reverse engineering) using the Erwin data modeling tool.

Environment: Oracle 9i, PL/SQL, Solaris 9/10, Windows Server, NZSQL, Erwin, Toad, Informatica, IBM OS/390 (V6.0), DB2 V7.
