
Bigdata Architect Resume


SUMMARY

  • 16+ years of experience architecting, installing, configuring, analyzing, designing, integrating, re-engineering, and developing highly sophisticated software systems, including 4+ years in the Big Data space and 6+ years in Data Warehousing and Business Intelligence.
  • Broad understanding of Big Data technologies, analytics, data warehousing concepts, Business Intelligence, cloud platforms, and support. Demonstrable knowledge of Hadoop, Spark, MapReduce (MR), HDFS, HBase, Hive, Sqoop, Flume, Ambari, and Oozie.
  • Installed and configured Hadoop in the cloud on AWS EC2 and created multi-node clusters. Good knowledge of and experience with Linux (Mint, Ubuntu) and Pig scripting. Experienced with the Hortonworks distribution.
  • Extensively worked on IBM BigInsights 4.2.1, using Big SQL, BigSheets, and Adaptive MapReduce to demonstrate high availability and fault tolerance against client requirements. Supported Big SQL performance tuning, with an understanding of analyzing statistics, WLM capabilities, access plans, the optimizer, and Big SQL benchmarks.
  • Hands-on experience implementing security plans using AD groups, LDAP, and batch IDs; used Kerberos, Knox, and Ganglia.
  • Hadoop cluster administration, including adding and removing nodes from the cluster. Very good understanding of the NameNode, DataNode, Secondary NameNode, YARN (ResourceManager, NodeManager, WebAppProxy), and MapReduce Job History Server.
  • Good awareness of Elasticsearch, Kibana, IBM PureData (Netezza), IBM Watson, and IBM Bluemix, and a good technical understanding of several Big Data distributions including Cloudera, MapR, and HDInsight.
  • Involved in pre-sales activity and product evaluation; able to recommend the technology stack and target architecture within the Hadoop ecosystem.
  • Experience in Data Warehousing (ETL/ELT), designing and developing applications using Informatica and IBM DataStage. Good understanding of OLTP and OLAP concepts.
  • Proficient in Business Intelligence using SAP BusinessObjects (BO/BI): designing universes and preparing dashboard reports.
  • Project management experience: worked through the initiating, planning, executing, and monitoring & controlling phases, mostly on IT projects across a wide range of domains and applications. Extensively worked on IT Service Management (ITSM) projects.
  • Hands-on with project schedule, risk, resource, scope, and integration management. Experience with the SDLC (Software Development Life Cycle), Agile/Scrum, and the ITIL process.
  • Stay current on emerging tools, techniques, and technologies. Strong knowledge of dimensional data modeling (star and snowflake schemas) and of design/modeling tools such as Embarcadero, Erwin, and PowerDesigner.

TECHNICAL SKILLS

Big Data Frameworks: IBM BigInsights 4.2, Hortonworks (HDP 2.1), Cloudera (CDH 5, Impala 1.3), HDInsight, MapR

ETL Tools: Syncsort, Informatica 9.x/8.x, Informatica PowerExchange, IBM DataStage

NoSQL: HBase, Cassandra, MongoDB

BI Tools: SAP BO (WebI, DeskI), QlikView

Big Data Tools & Technologies: Hadoop 2.4.6 (HDFS, MapReduce), Pig, Hive, Spark, Ambari, Flume, Sqoop, Kafka, Storm, Knox, Ganglia, and ZooKeeper

Analytics: R, Statistical Modelling, Predictive Analytics

RDBMS: DB2, MS-SQL Server, and MySQL

OS: Windows, Linux / Unix, Kerberos Security

Data Modeling: Dimensional Data Modeling, Star Schema Modeling, Snowflake Modeling, Fact and Dimension Tables, Physical and Logical Data Modeling

Languages / Tools: Java, Eclipse, MPP, CA-7, Endevor, File-AID, Xpediter, ChangeMan, InfoView, Infoman, HP Service Desk, File Manager, Sysdebug, TSO/ISPF, SPUFI, MS Visio

Mainframe: COBOL, JCL, VSAM, CICS

PROFESSIONAL EXPERIENCE

Confidential

BigData Architect

Responsibilities:

  • Designed end-to-end Hadoop solutions covering data ingestion, storage, processing, analytics, and reporting; assessed the requirements and selected the right set of Hadoop tools.
  • Mapped the current data structure to the future state with a realistic transition plan and roadmap that integrates with the strategic objectives of the company.
  • Designed tools to measure data accuracy, latency, etc., and worked with the tools team to ensure implementation and delivery. Evangelized the Confidential architecture across functional teams.
  • Developed validation frameworks and proactive monitoring solutions to detect data ingestion failures in the big data platform and apply appropriate remedies.
  • Analyzed and extracted key insights and patterns from Blizzard's rich collections of petabytes of gameplay and operational data. Designed and developed innovative predictive models for applications such as matchmaking, recommendation systems, and targeted marketing.
  • Provided guidance on machine-learning technology to build and train predictive analytics applications such as classification, recommendation, and personalization systems; used R analytics to develop the algorithms.
  • Worked closely with data and systems engineers to deploy and maintain models seamlessly on production systems.
  • Performed POCs on emerging Big Data technologies using most of the distributions available in the market, such as MapR, HDInsight, IBM BigInsights, and HDP.
  • Used NoSQL databases such as Cassandra and Apache HBase, with some CouchDB; also evaluated SQL Server 2016 and gained familiarity with PolyBase.
  • Worked on areas such as data ingestion into HBase, data visualization using MSBI and Tableau, and hands-on data analytics using R. Ran a POC on using Spark Streaming instead of InfoSphere Streams (a minimal sketch follows this list).
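A minimal PySpark (DStream) sketch of the kind of Spark Streaming POC mentioned in the last bullet above; the host, port, and comma-separated event layout are illustrative assumptions, not the actual gameplay feed.

```python
# Hypothetical Spark Streaming POC sketch: count gameplay events per game mode
# in 10-second micro-batches. Host, port, and field layout are assumptions.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="GameplayStreamPOC")
ssc = StreamingContext(sc, batchDuration=10)

# Assume events arrive as comma-separated lines on a TCP socket,
# with the game mode in the second field.
events = ssc.socketTextStream("stream-host", 9999)

mode_counts = (events
               .map(lambda line: (line.split(",")[1], 1))
               .reduceByKey(lambda a, b: a + b))

mode_counts.pprint()  # print a sample of each micro-batch to the driver log

ssc.start()
ssc.awaitTermination()
```

A socket source is used here only to keep the sketch self-contained; in an actual POC the same transformation logic could be pointed at the streaming source being compared against InfoSphere Streams.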

Environment: IBM BigInsights 4.2, Hortonworks (HDP 2.1), MapR, Big SQL, BigSheets, Hadoop 2.7 (HDFS, MapReduce), Pig 0.8, HBase, Cassandra, Hive 0.13, Ambari, Syncsort, Flume, Sqoop, ZooKeeper, R (Revolution Analytics), Oracle 10g, DB2, JSON, CSV, and Oozie

Confidential - Johns Creek, GA

BigData Architect

Responsibilities:

  • Involved in implementing the full lifecycle of a Hadoop solution, including requirements analysis, platform selection, technical architecture design, application design and development, testing, and deployment.
  • Responsible for managing data (cleansing, trimming, validating correctness, adding missing headers, fixing missing values, data integration, data profiling, mapping, and data quality) coming from sources such as RDBMSs and CSV files into the Hadoop ecosystem. Worked on a multi-node Hadoop cluster. Supported MapReduce programs and Apache Spark logic.
  • Supported the implementation of the PDW application using IBM BigInsights 4.0 and Adaptive MapReduce to demonstrate high availability and fault tolerance against client requirements. Used BigSheets and Big SQL as appropriate.
  • Supported Big SQL performance tuning, with an understanding of analyzing statistics, WLM capabilities, access plans, the optimizer, and Big SQL benchmarks.
  • Created internal and external tables in Hive; created, executed, and tuned complex queries in Hive/Big SQL for ad hoc access; and reviewed results with business and application teams for accuracy.
  • Stored and retrieved files and tables on a multi-node Hadoop cluster; loaded tables using different formats (ORC, Parquet, and Avro) with Snappy compression.
  • Involved in loading data from the UNIX file system into HDFS, creating Hive and Big SQL tables, and loading them with data; queries against them run internally by invoking MapReduce.
  • Ingested RDBMS data into the Hadoop cluster's HDFS using Sqoop, and ingested data from CSV and other file sources using Flume (see the ingestion sketch after this list).
  • Created the overall security plan for business users and developers accessing the Hadoop clusters, set up the batch jobs, and created the deployment plan. Hands-on implementation of the security plan using AD groups, LDAP, and batch IDs; used Kerberos, Knox, and Ganglia.
  • Assisted with administration of an 18-node Hadoop cluster, including adding and removing nodes. Very good understanding of the NameNode, DataNode, Secondary NameNode, YARN (ResourceManager, NodeManager, WebAppProxy), and MapReduce Job History Server.
  • Obtained complete knowledge of the physical database schema and prepared the files needed to load each table designated to receive files from operational systems rather than direct input. Implemented consistent batch load logic to run the periodic batch within a short time window.
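A minimal sketch of the Sqoop-to-Hive ingestion pattern described above, driven from Python through the standard sqoop and hive command-line clients: pull the source table into a staging directory, expose it as an external Hive table, then convert it to a Snappy-compressed ORC table. All table names, paths, and the DB2 JDBC URL are hypothetical placeholders, not the client's actual objects.

```python
# Hypothetical ingestion sketch: Sqoop a DB2 table into HDFS, then build
# staging and ORC tables in Hive. Names, paths, and the JDBC URL are placeholders.
import subprocess

# 1. Pull the source table into HDFS (Sqoop writes delimited text by default).
subprocess.run([
    "sqoop", "import",
    "--connect", "jdbc:db2://db-host:50000/SALESDB",
    "--username", "batch_id", "--password-file", "/user/batch_id/.pw",
    "--table", "SALES_TXN",
    "--target-dir", "/data/staging/sales_txn",
    "--fields-terminated-by", ",",
    "--num-mappers", "4",
], check=True)

# 2. External staging table over the Sqoop output, then a Snappy ORC copy for queries.
hive_ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS stg_sales_txn (
  txn_id BIGINT, account_id BIGINT, amount DECIMAL(12,2), txn_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/staging/sales_txn';

CREATE TABLE IF NOT EXISTS sales_txn
STORED AS ORC TBLPROPERTIES ('orc.compress'='SNAPPY')
AS SELECT * FROM stg_sales_txn;
"""
subprocess.run(["hive", "-e", hive_ddl], check=True)
```

The staging/convert split is deliberate: Sqoop's delimited text output lands in the external staging table, and the ORC conversion step is what applies the columnar format and Snappy compression mentioned in the bullets.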

Environment: IBM BigInsights 4.0.0, Cloudera (CDH 5), Hortonworks (HDP 2.1), Big SQL, BigSheets, Hadoop 2.4.6 (HDFS, MapReduce), Pig 0.8, Hive 0.13, Ambari, Syncsort, Flume, Sqoop, ZooKeeper, R (Revolution Analytics), Oracle 10g, DB2, JSON, CSV, Oozie, and Data Studio

Confidential - NJ

DWBI / Hadoop Technical Lead /Architect

Responsibilities:

  • Involved in implementing the full lifecycle of a Hadoop solution, including requirements analysis, platform selection, technical architecture design, application design and development, testing, and deployment.
  • Responsible for managing data (cleansing, trimming, validating correctness, data integration, and data quality) coming from sources such as RDBMSs and CSV files into the Hadoop ecosystem. Worked on a multi-node Hadoop cluster.
  • Participated in designing and building scalable infrastructure and platforms to collect and process very large amounts of structured and unstructured data, including near-real-time data.
  • Involved in implementing the Hortonworks big data distribution for high availability and fault tolerance against client requirements.
  • Created internal and external tables in Hive, created and executed complex queries in Hive for ad hoc access, and reviewed results with business and application teams for accuracy.
  • Hands-on with the Hadoop ecosystem (Apache Pig, Oozie for job scheduling, HDFS, and MapReduce); experienced in installing and configuring these tools in a Hadoop cluster; able to troubleshoot and tune logic and queries.
  • Sourced data from the operational systems, applied the business transformation rules, and prepared database-loadable files for the Data Warehouse / Hadoop environment.
  • Assisted with Hadoop cluster administration, including adding and removing nodes. Very good understanding of the NameNode, DataNode, Secondary NameNode, YARN (ResourceManager, NodeManager, WebAppProxy), and MapReduce Job History Server.
  • Knowledge of the data sources, transformation rules, and uses of the data for the Confidential area, as well as the workload limitations of the Data Warehouse / Hadoop system for that area. Participated in design sessions chaired by the data administration team and/or IT personnel where source-to-target transformation decisions were made.
  • Installed and configured Hadoop in the cloud on AWS EC2 and created multi-node clusters.
  • Installed and configured Pig and Hive on the multi-node cluster (1 master and 3 slaves) and executed Pig scripts.
  • Configured the five daemons: NameNode, DataNode, JobTracker, TaskTracker, and Secondary NameNode.
  • Parsed the Twitter streaming messages, extracted selected metadata from each tweet, and stored it in HDFS.
  • Validated junk/special characters and words against the GNU spell-check library using Pig scripts.
  • Split the messages into keywords and rated each keyword against the SentiWordNet (SWN) library using Pig.
  • Assigned each tweet a total rating based on all its positive and negative keywords using Pig.
  • Published the results in the preferred graphs and charts.
  • Coded MapReduce programs as Python scripts and incorporated the required logic (a minimal mapper/reducer sketch follows this list).
  • Implemented a proof of concept for the system on the Hortonworks stack (HDFS, Pig, Hive), based on an existing framework built on Oracle 10g/11i databases with RAC. Worked on the programming model with XML, JSON, and CSV file formats.
  • Designed and helped build Hadoop clusters and implemented the migration of customer and vertical-market data from Oracle to the Hadoop environment through the Hortonworks distribution. Used Sqoop to bring over the smaller set of dimension tables from the Oracle EDW required for analytics processing.
  • Designed and developed Pig code for ETL/ELT to process the customer and transaction data according to business rules.
  • Worked with internal and external Hive tables; discussed requirements with business teams and created Hive queries for ad hoc access.
  • Involved in loading data from the UNIX file system into HDFS, creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Created MapReduce jobs to power data for search and aggregation.
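A minimal Hadoop Streaming sketch (Python mapper and reducer) in the spirit of the tweet keyword-scoring job described above. The sentiment lookup is a small stub standing in for the SentiWordNet library, and the tab-separated record layout is an assumption for illustration only.

```python
#!/usr/bin/env python
# mapper.py - emit one (tweet_id, keyword_score) pair per matched keyword.
# The SENTIMENT dict is a stub; the real job rated keywords via SentiWordNet.
import sys

SENTIMENT = {"great": 1.0, "love": 0.8, "bad": -0.7, "lag": -0.5}

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")  # assumed layout: tweet_id <TAB> text
    if len(fields) < 2:
        continue
    tweet_id, text = fields[0], fields[1]
    for word in text.lower().split():
        score = SENTIMENT.get(word)
        if score is not None:
            print("%s\t%s" % (tweet_id, score))
```

```python
#!/usr/bin/env python
# reducer.py - sum the per-keyword scores into a total rating per tweet
# (Hadoop Streaming delivers the mapper output sorted by key).
import sys

current_id, total = None, 0.0
for line in sys.stdin:
    tweet_id, score = line.rstrip("\n").split("\t")
    if tweet_id != current_id:
        if current_id is not None:
            print("%s\t%.2f" % (current_id, total))
        current_id, total = tweet_id, 0.0
    total += float(score)
if current_id is not None:
    print("%s\t%.2f" % (current_id, total))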

Environment: Hortonworks distribution, Ubuntu 13.0, Hadoop 2.0.1 (HDFS, MapReduce), Pig 0.8, Hive 0.9.0, Sqoop, Oracle 10g, Informatica 9.0.1, UNIX, XML, JSON, CSV, and Oozie.

Confidential

DWBI / Project Tech Lead

Responsibilities:

  • Transition PMO: Involved in implementing the governance, processes, and procedures to manage communication, change, and escalation during the transition period.
  • Maintained and participated in reviews of project management plans covering communication management, risk management, issue management, quality assurance, deliverable acceptance, requirements management, configuration management, and change control.
  • Worked on the end-to-end ETL solution for Confidential. Provided the high-level spec design to the BI team members. Extensively helped the team work with Workflow Manager, Workflow Monitor, and Worklet Designer to create, edit, and run workflows and tasks.
  • Enhanced the performance of Informatica sessions processing large data files by using partitions and increasing the block size, data cache size, and target-based commit interval.
  • Worked with the data modeler and data architect to reach a final decision on the data approach and dimensional model to be adopted. Used massively parallel processing (MPP).
  • Software management migration/removal: Managed the transfer of Windows Server Update Services (WSUS) from HP to Confidential and the removal of the CAE agent.
  • Tuned Informatica mappings and sessions for optimum performance. Assisted in creating and maintaining several custom reports for the client using Cognos.

Environment: Informatica PowerCenter 9.0.1 (Source Analyzer, Warehouse Designer, Mapping Designer, Mapplet Designer, Transformations, Workflow Manager, Workflow Monitor), DataStage, SAP BusinessObjects, Cognos, QlikView, Erwin 3.5, PL/SQL, Oracle 10g/9i, Windows 2000, Project Management, MS Project.

Confidential - Plano, TX

SAP BO-BI / PMO / Technical PM

Responsibilities:

  • Worked on designing universes for ad hoc query and canned reporting; hands-on experience developing complex reports using Business Objects.
  • Maintained and helped develop Desktop Intelligence, Web Intelligence, Web Intelligence Rich Client, and Crystal Reports documents using different data sources (SQL and DB2).
  • Prepared reports for the dashboard, which required maintaining and tracking the DAIR log, working the action plan toward closure of each DAIR item, and maintaining the change log.
  • Mapped the reporting requirements from business terms into SAP objects. Created and maintained OLAP universes sourced from BEx queries.
  • Supported the writing and editing of SAP BEx queries using Query Designer. Wrote test scripts for accuracy, formatting, and performance.
  • Schedule management: Extensively worked in Microsoft Project defining the tasks for the portal rollout and worked with the PMs on all sub-projects to assign resources to tasks.

Environment: Informatica PowerCenter 8 (Source Analyzer, Warehouse Designer, Mapping Designer, Mapplet Designer, Transformations, Workflow Manager, Workflow Monitor), SAP BusinessObjects, Cognos, QlikView, Erwin 3.5, PL/SQL, Oracle 10g/9i, Windows 2000, Project Management, MS Project.

Confidential, WA

PMO / Project Analyst/ Technical PM

Responsibilities:

  • Worked on the end-to-end ETL solution for Confidential. Provided the high-level spec design to the BI team members.
  • Risk & issue management: Worked with the PMs on all tracks to identify the risks under their tracks; contributed to the qualitative and quantitative analyses.
  • Identified weekly variances and any activity that needed to be escalated to the project managers of all tracks to determine the cause of the variance(s), the anticipated impacts, and the planned corrective action, which could be a mitigation or contingency plan.
  • Tuned Informatica mappings and sessions for optimum performance. Assisted in creating and maintaining several custom reports for the client using Cognos.
  • Prepared reports for the dashboard requiring earned value analysis, generated by calculating the Schedule Performance Index (see the note after this list); conceptual knowledge of baselines, re-baselining, and interim baselines.
  • Generated bug reports and identified change requests and minor change requests for the internal releases. Responsible for work scheduling for the approved (M)CRs and routine auditing in TFS.
  • Enterprise architecture (requirements, technical, project management): Performed extensive work on the traceability matrix with the help of the Enterprise Architect (EA) tool.
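As context for the earned value figures mentioned above (standard earned value definitions, not project-specific data): Schedule Performance Index SPI = EV / PV and Cost Performance Index CPI = EV / AC, where EV is earned value, PV is planned value, and AC is actual cost; an SPI below 1 indicates the work is behind schedule.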

Environment: Informatica PowerCenter 8 (Source Analyzer, Warehouse Designer, Mapping Designer, Mapplet Designer, Transformations, Workflow Manager, Workflow Monitor), Cognos, DataStage, Erwin 3.5, PL/SQL, Oracle 10g/9i, Windows 2000.
