Big Data Architect Resume
New Jersey
SUMMARY
- Resourceful, high-energy enterprise Big Data architect with 17+ years of expertise in architecture definition of large distributed systems, technical consulting, project management, and technology implementation across Big Data solutions, Hadoop, data warehousing, data management, and application integration
- Experienced in Big Data solutions and Hadoop implementations for the banking and finance, communications, logistics, and retail industries
- 5+ years architecting, administering, designing, and deploying large-scale Big Data solutions using various Hadoop ecosystem components and NoSQL databases
- 1+ years of experience on cloud platforms (AWS)
- Proficient with Apache Kafka and Apache Spark
- Experienced in installing, configuring, and testing Hadoop ecosystem components; upgraded several Hadoop clusters to the next stable version
- Proficient in data architecture, data warehouse, Big Data/Hadoop, data integration, operational data store, and BI reporting projects, with a deep focus on the design, development, and deployment of BI and data solutions using open-source and off-the-shelf BI tools such as Platfora, Tableau, Splunk, and Hunk
- 11+ years of experience using Oracle, SQL Server, Teradata, SQL, PL/SQL scripts, and shell scripting
- 8+ years of experience in data warehousing and ETL, pulling data from various sources into data warehouses and data marts using Informatica PowerCenter 8.x/7.x/6.x
- Dimensional data modeling of star and snowflake schemas with fact and dimension tables using data modeling tools such as Erwin and Oracle Designer; managed technical specifications and test plans and created project implementation documents
- Strong knowledge of logical/physical data models with normalized/denormalized databases in both OLTP and OLAP environments using different modeling tools
- Recommend new technologies, policies, and processes that benefit the organization and address deficiencies in the project or organization
TECHNICAL SKILLS
Big Data Frameworks: Hadoop, HDFS, Ambari, Cloudera Manager, Hive, Pig, Impala, Spark (Spark Core, Spark SQL, Spark Streaming), PySpark, MapReduce, HCatalog, Sqoop, Oozie, Flume, Apache Kafka, Apache Solr, Jupyter, Zeppelin
Cloud Tools: AWS (EMR, S3, EC2, Kinesis, DynamoDB, Athena, CloudFormation, VPC, Load Balancer, Aurora, IAM, CloudWatch), CloudFoundry
NoSQL: HBase, Cassandra
Distributions: Apache, Cloudera Distribution, Hortonworks Distribution, Informatica Big Data Edition
Data Analysis: Hive, Pig, Python
Data Warehousing: Informatica PowerCenter, Informatica Big Data Edition, Talend, Pentaho Data Integration, SSIS, SAP BW
Databases: MySQL, Oracle, Greenplum, SQL Server, Oracle Applications DBA, SQL, PL/SQL, Sybase, DB2, Teradata (MPP), Teradata Aster, Teradata Loom, FastExport, TPT
Data Integration: Apache NiFi, Streamsets, Pentaho Kettle, Talend, Attunity
Data Modelling: Erwin, Microsoft Visio, Oracle Designer 2000, Enterprise Architect (EA), Logical, Physical and Relational Modeling, ER Diagrams, Dimensional Data Modeling (Star & Snowflake Schema)
Languages: Java, Python, Shell Scripting, XML, SQL/PL-SQL, C, Pro*C, Pro*COBOL, AWK, Perl
O/S: Linux (RHEL, SUSE), Sun Solaris, HP-UX, Windows, Mainframes
Virtualization: VMWare, Virtual Box, Cloud Foundry
Visualization: Platfora, Splunk, Hunk, Apache Zeppelin, Tableau, Datameer, Paxata
Version Controlling: Git, PVCS, CVS, Subversion
Scheduling: Oozie, Autosys, Control-M, Crontab
Others: CI/CD, DbVisualizer, TOAD, SQL Workbench, Aqua Data Studio, Toad for MySQL, Eclipse, XMLSpy, Spring Tool Suite, JavaScript, VBScript
PROFESSIONAL EXPERIENCE
Big Data Architect
Confidential, New Jersey
Responsibilities:
- Big Data Architect responsible for data architecture, data lake design, and Hadoop and BI requirements, and for defining the strategy, technical architecture, implementation plan, development, management, and delivery of Big Data applications and solutions
- Hands-on experience with Docker containers and Kubernetes
- Built a data lake using AWS services; monitored and optimized the data lake
- Migrated the on-premises data lake to the AWS cloud platform
- Ingested a wide variety of structured, unstructured, and semi-structured data into the Big Data ecosystem through batch processing (Sqoop) and near-real-time streaming using Apache NiFi, Kafka, and Flume
- Developed a high-speed BI layer on the Hadoop platform with Kafka, Apache Spark, and Python
- Designed, architected, and developed data analytics and data management solutions with PySpark
- Developed Spark Streaming jobs that consume 5,000 lease messages per second from Kafka and store the streamed data in HBase and HDFS (a minimal sketch follows this role's Environment line)
- Installed and configured a NiFi cluster and completed end-to-end design and development of Apache NiFi flows that ingest data from various sources into the data lake (Hive ORC and HBase tables) and Splunk in near real time and in batch
- Installed, designed, configured, and developed an Apache Kafka solution to ingest Multi-Channel Message Delivery Platform (MDP) data from rsyslog to Solr
- Ingested Abuse and DMCA emails into Solr using Flume and Morphlines; later migrated the ingestion to the Java API
- Installed, configured, and developed Solr collections for Abuse and DMCA that collect and index all generated emails in real time and present them in a single interface
- Administered and maintained the Hadoop cluster and its ecosystem; upgraded Hortonworks Hadoop from 2.3.4 to 3.0
- Configured Apache Ranger to manage access policies for files, folders, databases, tables, and columns
- Evaluated integration tools and created POCs to demonstrate them to stakeholders
- Configured mountable HDFS, enabling users to access the HDFS file system like a traditional Linux file system
Environment: AWS (EMR, S3, EC2, Athena, Kinesis, CloudWatch, Aurora, VPC, CloudFormation), Hortonworks Hadoop, Ambari, HDFS, Spark (Spark Core, Spark SQL, Spark Streaming), PySpark, Hive, LLAP, Pig, Kafka, Flume, Sqoop, HBase, Solr, Splunk, Zeppelin, PyCharm, Jupyter, Apache NiFi, StreamSets, Python, Java, Eclipse, Spring Tool Suite, Maven, and UNIX Shell Scripting.
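Illustrative sketch (not part of the original project record): a minimal PySpark Structured Streaming job of the kind described in the Kafka-to-HBase/HDFS bullet above. The broker address, topic name, message schema, and HDFS paths are placeholders, and the HBase sink is only indicated in a comment.

# Minimal PySpark Structured Streaming sketch: consume lease messages from Kafka
# and land them on HDFS. Broker, topic, schema, and paths are illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("lease-stream-ingest").getOrCreate()

# Hypothetical message layout; the real lease schema would differ.
lease_schema = StructType([
    StructField("lease_id", StringType()),
    StructField("account_id", StringType()),
    StructField("event_time", TimestampType()),
    StructField("payload", StringType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder broker
       .option("subscribe", "lease-events")                # placeholder topic
       .option("startingOffsets", "latest")
       .load())

leases = (raw.select(from_json(col("value").cast("string"), lease_schema).alias("m"))
          .select("m.*"))

# Write micro-batches to HDFS as ORC; an HBase sink would typically be added via
# foreachBatch and an HBase-Spark connector (omitted here).
query = (leases.writeStream
         .format("orc")
         .option("path", "hdfs:///data/lake/leases")       # placeholder path
         .option("checkpointLocation", "hdfs:///chk/leases")
         .trigger(processingTime="10 seconds")
         .start())

query.awaitTermination()

A job like this is typically launched with spark-submit and requires the Spark-Kafka integration package on the classpath.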
Big Data Solution Architect
Confidential, New Jersey
Responsibilities:
- Collaborate with stakeholders on requirements and implementation approaches for addressing demand and challenges
- Act as internal subject matter expert in the evolving landscape of enterprise analytics. Report on and research emerging market trends and risks inherent in current solutions and architectures. Also, maintain a domain roadmap that can adapt to changes in business demands.
- Developed the roadmap to deploy and implement leading-edge analytics solutions, including hardware, software, end-user tools, and other data services
- Assist end-users in analytic development activities such as research, evaluations, and prototyping
Big Data Architect
Confidential, New Jersey
Responsibilities:
- Demonstrated personal expertise by participating as a Big Data SME
- Designed and led a team (offshore and onsite) to successfully develop and deliver Big Data projects
- Responsible for managing the development and deployment of Hadoop applications
- Secured the Hadoop cluster through Kerberos KDC installation and OpenLDAP integration
- Installed and managed the Cloudera distribution of Hadoop for POC applications, and implemented proofs of concept on the Hadoop stack and various Big Data analytics tools
- Worked closely with customers, at both technical and user levels, to design and produce solutions; held discussions with vendors, planned the use cases, and demoed the products
- Used StreamSets to seamlessly transfer data into and out of Hadoop with minimal coding
- Built a framework to read reference data from a JMS queue using Kafka, Flume, and Spark
- Installed, developed, and deployed high-performance, large-scale data analytics solutions using Apache Spark; explored the options available in Spark to determine the best fit for each application (see the sketch after this role's Environment line)
- Involved in upgrading Cloudera CDH 5.5 to 5.7.1
- Wrote Spark programs in Java and Scala, and Java UDFs for Hive and Pig
- Installed and configured HBase and ingested CPPD data into HBase using Flume
- As part of the EAP platform, designed, architected, and developed the following applications: a) Bullseye, b) AML (Anti-Money Laundering), c) CPPD (Customer Predictive & Preventative Dissatisfaction), d) FFS (Financial Full Suite)
Environment: Cloudera Hadoop, Cloudera Manager, HDFS, Hive, Spark, Spark on Hive, Spark SQL, Pig, Impala, Kafka, Flume, Sqoop, HBase, Talend, Platfora, Datameer, Paxata, StreamSets, HDFS storage formats (JSON, Parquet, RC, ORC, Avro), Python, Java, Scala, Autosys, Jenkins, Eclipse, Maven, Amazon AWS EMR & EC2, and UNIX Shell Scripting.
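Illustrative sketch of the kind of Spark exploration referenced above: the same aggregation run through the DataFrame API and Spark SQL, with the result persisted in two of the columnar formats listed in this Environment. Paths and column names are hypothetical.

# Minimal PySpark batch sketch: read raw JSON, aggregate via the DataFrame API and
# Spark SQL, and persist the result as Parquet and ORC for comparison.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("format-poc").getOrCreate()

events = spark.read.json("hdfs:///raw/events")  # hypothetical source path

# DataFrame API version of the aggregation.
daily = (events.groupBy("event_date", "channel")
         .agg(F.count("*").alias("event_count")))

# Equivalent Spark SQL version, handy when comparing query plans with EXPLAIN.
events.createOrReplaceTempView("events")
daily_sql = spark.sql("""
    SELECT event_date, channel, COUNT(*) AS event_count
    FROM events
    GROUP BY event_date, channel
""")

# Persist in columnar formats to compare footprint and scan performance.
daily.write.mode("overwrite").partitionBy("event_date").parquet("hdfs:///curated/daily_parquet")
daily.write.mode("overwrite").partitionBy("event_date").orc("hdfs:///curated/daily_orc")

Comparing the physical plans (daily.explain()) and the on-disk size of the Parquet and ORC outputs is one simple way to weigh the options Spark offers for a given workload.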
Big Data Architect
Confidential, St. Louis
Responsibilities:
- Responsible for planning and managing next-generation “Big-Data” system architectures
- Responsible for managing the development and deployment of Hadoop applications
- Provided subject matter expertise and demonstrable hands-on delivery experience with popular Hadoop distribution platforms such as Hortonworks and Cloudera
- Installed and managed both the Hortonworks and Cloudera distributions of Hadoop
- Successfully upgraded Hortonworks from 2.0 to 2.1 with minimal downtime; added nodes and backed up data, the Hive metastore, and scripts using Chef and SuperPuTTY
- Added high availability for the NameNode, ZooKeeper, and ResourceManager
- Analyzed various source systems of structured and unstructured data; designed data architecture solutions for scalability, high availability, and fault tolerance
- Deployed Splunk apps, extracted data from Splunk using the Splunk Hadoop connector, and created dashboards and reports
- Developed custom components in Informatica Big Data Edition
- Architected, designed, and implemented high-performance, large-volume data integration processes, NoSQL databases, storage, and other back-end services in fully virtualized environments
- Worked closely with customers, at both technical and user levels, to design and produce solutions; held discussions with vendors, planned the use cases, and demoed the products
- Set architectural vision and direction across a matrix of teams
- Improved Hive query performance using Spark on Hive, Tez, and vectorization (example settings after this role's Environment line)
Environment: Hortonworks Hadoop, Cloudera Hadoop, Ambari, Cloudera Manager, HDFS, Hive, Spark on Hive, HCatalog, Pig, Flume, Sqoop, Splunk, Hunk, Informatica BDM, Cassandra, Couchbase, HBase, Pentaho Kettle, Talend, Tableau, Teradata Loom, Aster, and UNIX Shell Scripting.
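Illustrative sketch of the Hive tuning noted above: the standard session settings that switch the execution engine to Tez and enable vectorized execution. PyHive is used here purely as a convenient client and is an assumption; the same SET statements can be issued from Beeline or the Hive CLI.

# Apply Tez and vectorization settings for a HiveServer2 session via PyHive
# (host, port, user, and the sample query are placeholders).
from pyhive import hive

conn = hive.Connection(host="hiveserver2.example.com", port=10000, username="etl")
cursor = conn.cursor()

for setting in (
    "SET hive.execution.engine=tez",
    "SET hive.vectorized.execution.enabled=true",
    "SET hive.vectorized.execution.reduce.enabled=true",
):
    cursor.execute(setting)

# Example query run under the tuned session.
cursor.execute("SELECT account_id, SUM(amount) FROM txns GROUP BY account_id")
rows = cursor.fetchall()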
Lead of Big Data Operations Team
Confidential
Responsibilities:
- Developed MapReduce and Hive programs to parse raw data (structured, semi-structured, and unstructured) and populate the refined data into Greenplum and Hadoop (a minimal mapper sketch follows this section)
- Implemented Big Data solutions to analyze email logs for compliance and discovery
- Developed multiple MapReduce/Hive/Pig jobs for data cleaning and preprocessing
- Validated and made recommendations on Hadoop infrastructure and data center planning in light of data growth; assisted with data capacity planning and node forecasting
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems
- Conducted POCs on encryption tools such as Voltage and Protegrity, helping the organization choose the best encryption tool
- Conducted POCs to determine the best ETL tool for handling very large data volumes; candidates included Informatica Big Data Edition, Pentaho, DMExpress, DataStage Big Data edition, and Talend
- Evaluated data masking tools
- Architected the data flow from source systems to Hadoop and Greenplum so that data scientists can easily run analytical queries
- Served as a technical advisor and educated the team on new trends and features
- As part of the Data Operations team, handled the data needs of the projects below
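Illustrative sketch for the raw-log parsing bullets above, written as a Hadoop Streaming mapper in Python (Hadoop Streaming is an assumption here; the bullets refer to MapReduce/Hive/Pig jobs in general). The pipe-delimited field layout is hypothetical.

#!/usr/bin/env python
# Minimal Hadoop Streaming mapper: parse raw email-log lines and emit
# tab-separated key/value pairs for the shuffle phase. The assumed layout
# (timestamp|sender|recipient|status) is a placeholder.
import sys

def parse(line):
    parts = line.rstrip("\n").split("|")
    if len(parts) != 4:
        return None  # drop malformed records
    timestamp, sender, recipient, status = parts
    return sender.lower(), status

for line in sys.stdin:
    parsed = parse(line)
    if parsed:
        sender, status = parsed
        print("{}\t{}".format(sender, status))

A mapper like this would run under the Hadoop Streaming jar, paired with a reducer or feeding a Hive external table for downstream cleaning.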