Big Data Architect Resume
New Jersey
SUMMARY
- Resourceful, high-energy enterprise Big Data architect with 17+ years of expertise in architecture definition of large distributed systems, technical consulting, project management, and technology implementation across Big Data solutions, Hadoop, data warehousing, data management, and application integration
- Experienced in Big Data solutions and Hadoop implementations for the banking and finance, communications, logistics, and retail industries
- 5+ years architecting, administering, designing, and deploying large-scale Big Data solutions using various Hadoop ecosystem components and NoSQL databases
- 1+ years of experience on cloud platforms (AWS)
- Proficient with Apache Kafka and Apache Spark
- Experienced in installing, configuring, and testing Hadoop ecosystem components; upgraded several Hadoop clusters to the next stable version
- Proficient in data architecture, data warehouse, Big Data/Hadoop, data integration, operational data store, and BI reporting projects, with a deep focus on the design, development, and deployment of BI and data solutions using open-source and off-the-shelf BI tools such as Platfora, Tableau, Splunk, and Hunk
- 11+ years of experience using Oracle, SQL Server, Teradata, SQL, PL/SQL scripts, and shell scripting
- 8+ years of experience in data warehousing and ETL, pulling data from various sources into data warehouses and data marts using Informatica PowerCenter 8.x/7.x/6.x
- Dimensional data modeling of star and snowflake schemas with fact and dimension tables using data modeling tools such as Erwin and Oracle Designer; managed technical specifications and test plans and created project implementation documents
- Strong knowledge of logical/physical data models with normalized/denormalized databases in both OLTP and OLAP environments using different modeling tools
- Recommend new technologies, policies, and processes that benefit the organization and address deficiencies in the project or organization
TECHNICAL SKILLS
Big Data Frameworks: Hadoop, HDFS, Ambari, Cloudera Manager, Hive, Pig, Impala, Spark (Spark Core, Spark SQL, Spark Streaming), PySpark, MapReduce, HCatalog, Sqoop, Oozie, Flume, Apache Kafka, Apache Solr, Jupyter, Zeppelin
Cloud Tools: AWS (EMR, S3, EC2, Kinesis, DynamoDB, Athena, CloudFormation, VPC, Load Balancer, Aurora, IAM, CloudWatch), CloudFoundry
NoSQL: HBase, Cassandra
Distributions: Apache, Cloudera Distribution, Hortonworks Distribution, Informatica Big Data Edition
Data Analysis: Hive, Pig, Python
Data Warehousing: Informatica PowerCenter, Informatica Big Data Edition, Talend, Pentaho Data Integration, SSIS, SAP BW
Databases: MySQL, Oracle, Greenplum, SQL Server, Oracle Applications DBA, SQL, PL/SQL, Sybase, DB2, Teradata (MPP), Teradata Aster, Teradata Loom, FastExport, TPT
Data Integration: Apache NiFi, Streamsets, Pentaho Kettle, Talend, Attunity
Data Modelling: Erwin, Microsoft Visio, Oracle Designer 2000, Enterprise Architect (EA), Logical, Physical and Relational Modeling, ER Diagrams, Dimensional Data Modeling (Star & Snowflake Schema)
Languages: Java, Python, Shell Scripting, XML, SQL/PL-SQL, C, Pro*C, Pro*COBOL, AWK, Perl
O/S: Linux (RHEL, SUSE), Sun Solaris, HP-UX, Windows, Mainframes
Virtualization: VMWare, Virtual Box, Cloud Foundry
Visualization: Platfora, Splunk, Hunk, Apache Zeppelin, Tableau, Datameer, Paxata
Version Controlling: Git, PVCS, CVS, Subversion
Scheduling: Oozie, Autosys, Control-M, Crontab
Others: CI/CD, DbVisualizer, TOAD, SQL Workbench, Aqua Data Studio, Toad for MySQL, Eclipse, XMLSpy, Spring Tool Suite, JavaScript, VBScript
PROFESSIONAL EXPERIENCE
Big Data Architect
Confidential, New Jersey
Responsibilities:
- Big Data Architect responsible for data architecture, data lake design, and Hadoop and BI requirements, and for defining the strategy, technical architecture, implementation plan, development, management, and delivery of Big Data applications and solutions
- Hands-on experience with Docker containers and Kubernetes
- Built a data lake using AWS services; monitored and optimized the data lake
- Migrated the on-premises data lake to the AWS cloud platform
- Ingested a wide variety of structured, unstructured, and semi-structured data into the Big Data ecosystem through batch processing (Sqoop) and near-real-time streaming using Apache NiFi, Kafka, and Flume
- Developed a high-speed BI layer on the Hadoop platform with Kafka, Apache Spark, and Python
- Designed, architected, and developed data analytics and data management solutions with PySpark
- Developed Spark Streaming jobs that consume 5,000 lease messages per second from Kafka and store the streamed data in HBase and HDFS (a minimal sketch follows this role's Environment line)
- Installed and configured a NiFi cluster and completed end-to-end design and development of Apache NiFi flows that ingest data from various sources into the data lake (Hive ORC and HBase tables) and Splunk in near real time and in batch
- Installed, designed, configured, and developed an Apache Kafka solution to ingest Multi-Channel Message Delivery Platform (MDP) data from rsyslog to Solr
- Ingested Abuse and DMCA emails into Solr using Flume and Morphlines; later migrated the ingestion to the Java API
- Installed, configured, and developed Solr collections for Abuse and DMCA that collect and index all generated emails in real time and present them in a single interface
- Administered and maintained the Hadoop cluster and its ecosystem; upgraded Hortonworks Hadoop from 2.3.4 to 3.0
- Configured Apache Ranger to manage access policies for files, folders, databases, tables, and columns
- Evaluated integration tools and created POCs to demonstrate them to stakeholders
- Configured mountable HDFS, enabling users to access the HDFS file system like a traditional Linux file system
Environment: AWS (EMR, S3, EC2, Athena, Kinesis, CloudWatch, Aurora, VPC, CloudFormation), Hortonworks Hadoop, Ambari, HDFS, Spark (Spark Core, Spark SQL, Spark Streaming), PySpark, Hive, LLAP, Pig, Kafka, Flume, Sqoop, HBase, Solr, Splunk, Zeppelin, PyCharm, Jupyter, Apache NiFi, StreamSets, Python, Java, Eclipse, Spring Tool Suite, Maven, and UNIX Shell Scripting.
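Illustrative sketch (not part of the original project record): a minimal PySpark Structured Streaming job of the kind described in the Kafka-to-HBase/HDFS bullet above. The broker address, topic name, message schema, and HDFS paths are placeholders, and the HBase sink is only indicated in a comment.

# Minimal PySpark Structured Streaming sketch: consume lease messages from Kafka
# and land them on HDFS. Broker, topic, schema, and paths are illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("lease-stream-ingest").getOrCreate()

# Hypothetical message layout; the real lease schema would differ.
lease_schema = StructType([
    StructField("lease_id", StringType()),
    StructField("account_id", StringType()),
    StructField("event_time", TimestampType()),
    StructField("payload", StringType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder broker
       .option("subscribe", "lease-events")                # placeholder topic
       .option("startingOffsets", "latest")
       .load())

leases = (raw.select(from_json(col("value").cast("string"), lease_schema).alias("m"))
          .select("m.*"))

# Write micro-batches to HDFS as ORC; an HBase sink would typically be added via
# foreachBatch and an HBase-Spark connector (omitted here).
query = (leases.writeStream
         .format("orc")
         .option("path", "hdfs:///data/lake/leases")       # placeholder path
         .option("checkpointLocation", "hdfs:///chk/leases")
         .trigger(processingTime="10 seconds")
         .start())

query.awaitTermination()

A job like this is typically launched with spark-submit and requires the Spark-Kafka integration package on the classpath.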
Big Data Solution Architect
Confidential, New Jersey
Responsibilities:
- Collaborate with stakeholders on requirements and implementation approaches for addressing demand and challenges
- Act as internal subject matter expert in the evolving landscape of enterprise analytics. Report on and research emerging market trends and risks inherent in current solutions and architectures. Also, maintain a domain roadmap that can adapt to changes in business demands.
- Developed the roadmap to deploy and implement leading-edge analytics solutions, including hardware, software, end-user tools, and other data services
- Assist end-users in analytic development activities such as research, evaluations, and prototyping
Big Data Architect
Confidential, New Jersey
Responsibilities:
- Demonstrated personal expertise by participating as a Big Data SME
- Designed and led a team (offshore and onsite) to successfully develop and deliver Big Data projects
- Responsible for managing the development and deployment of Hadoop applications
- Secured the Hadoop cluster through Kerberos KDC installation and OpenLDAP integration
- Installed and managed the Cloudera distribution of Hadoop for POC applications, and implemented proofs of concept on the Hadoop stack and various Big Data analytics tools
- Worked closely with customers, at both technical and user levels, to design and produce solutions; held discussions with vendors, planned the use cases, and demoed the products
- Used StreamSets to seamlessly transfer data into and out of Hadoop with minimal coding
- Built a framework to read reference data from a JMS queue using Kafka, Flume, and Spark
- Installed, developed, and deployed high-performance, large-scale data analytics solutions using Apache Spark; explored the options available in Spark to determine the best fit for each application (see the sketch after this role's Environment line)
- Involved in upgrading Cloudera CDH 5.5 to 5.7.1
- Wrote Spark programs in Java and Scala, and Java UDFs for Hive and Pig
- Installed and configured HBase and ingested CPPD data into HBase using Flume
- As part of the EAP platform, designed, architected, and developed the following applications: a) Bullseye, b) AML (Anti-Money Laundering), c) CPPD (Customer Predictive & Preventative Dissatisfaction), d) FFS (Financial Full Suite)
Environment: Cloudera Hadoop, Cloudera Manager, HDFS, Hive, Spark, Spark on Hive, Spark SQL, Pig, Impala, Kafka, Flume, Sqoop, HBase, Talend, Platfora, Datameer, Paxata, StreamSets, HDFS storage formats (JSON, Parquet, RC, ORC, Avro), Python, Java, Scala, Autosys, Jenkins, Eclipse, Maven, Amazon AWS EMR & EC2, and UNIX Shell Scripting.
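Illustrative sketch of the kind of Spark exploration referenced above: the same aggregation run through the DataFrame API and Spark SQL, with the result persisted in two of the columnar formats listed in this Environment. Paths and column names are hypothetical.

# Minimal PySpark batch sketch: read raw JSON, aggregate via the DataFrame API and
# Spark SQL, and persist the result as Parquet and ORC for comparison.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("format-poc").getOrCreate()

events = spark.read.json("hdfs:///raw/events")  # hypothetical source path

# DataFrame API version of the aggregation.
daily = (events.groupBy("event_date", "channel")
         .agg(F.count("*").alias("event_count")))

# Equivalent Spark SQL version, handy when comparing query plans with EXPLAIN.
events.createOrReplaceTempView("events")
daily_sql = spark.sql("""
    SELECT event_date, channel, COUNT(*) AS event_count
    FROM events
    GROUP BY event_date, channel
""")

# Persist in columnar formats to compare footprint and scan performance.
daily.write.mode("overwrite").partitionBy("event_date").parquet("hdfs:///curated/daily_parquet")
daily.write.mode("overwrite").partitionBy("event_date").orc("hdfs:///curated/daily_orc")

Comparing the physical plans (daily.explain()) and the on-disk size of the Parquet and ORC outputs is one simple way to weigh the options Spark offers for a given workload.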
Big Data Architect
Confidential, St. Louis
Responsibilities:
- Responsible for planning and managing next-generation “Big-Data” system architectures
- Responsible for managing the development and deployment of Hadoop applications
- Provided subject matter expertise and demonstrable hands-on delivery experience with popular Hadoop distribution platforms such as Hortonworks and Cloudera
- Installed and managed both the Hortonworks and Cloudera distributions of Hadoop
- Successfully upgraded Hortonworks from 2.0 to 2.1 with minimal downtime; added nodes and backed up data, the Hive metastore, and scripts using Chef and SuperPuTTY
- Added high availability for the NameNode, ZooKeeper, and ResourceManager
- Analyzed various source systems of structured and unstructured data; designed data architecture solutions for scalability, high availability, and fault tolerance
- Deployed Splunk apps, extracted data from Splunk using the Splunk Hadoop connector, and created dashboards and reports
- Developed custom components in Informatica Big Data Edition
- Architected, designed, and implemented high-performance, large-volume data integration processes, NoSQL databases, storage, and other back-end services in fully virtualized environments
- Worked closely with customers, at both technical and user levels, to design and produce solutions; held discussions with vendors, planned the use cases, and demoed the products
- Set architectural vision and direction across a matrix of teams
- Improved Hive query performance using Spark on Hive, Tez, and vectorization (example settings after this role's Environment line)
Environment: Hortonworks Hadoop, Cloudera Hadoop, Ambari, Cloudera Manager, HDFS, Hive, Spark on Hive, HCatalog, Pig, Flume, Sqoop, Splunk, Hunk, Informatica BDM, Cassandra, Couchbase, HBase, Pentaho Kettle, Talend, Tableau, Teradata Loom, Aster, and UNIX Shell Scripting.
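Illustrative sketch of the Hive tuning noted above: the standard session settings that switch the execution engine to Tez and enable vectorized execution. PyHive is used here purely as a convenient client and is an assumption; the same SET statements can be issued from Beeline or the Hive CLI.

# Apply Tez and vectorization settings for a HiveServer2 session via PyHive
# (host, port, user, and the sample query are placeholders).
from pyhive import hive

conn = hive.Connection(host="hiveserver2.example.com", port=10000, username="etl")
cursor = conn.cursor()

for setting in (
    "SET hive.execution.engine=tez",
    "SET hive.vectorized.execution.enabled=true",
    "SET hive.vectorized.execution.reduce.enabled=true",
):
    cursor.execute(setting)

# Example query run under the tuned session.
cursor.execute("SELECT account_id, SUM(amount) FROM txns GROUP BY account_id")
rows = cursor.fetchall()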
Lead of Big Data Operations Team
Confidential
Responsibilities:
- Developed MapReduce and Hive programs to parse raw data (structured, semi-structured, and unstructured) and populate the refined data into Greenplum and Hadoop (a minimal mapper sketch follows this section)
- Implemented Big Data solutions to analyze email logs for compliance and discovery
- Developed multiple MapReduce/Hive/Pig jobs for data cleaning and preprocessing
- Validated and made recommendations on Hadoop infrastructure and data center planning in light of data growth; assisted with data capacity planning and node forecasting
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems
- Conducted POCs on encryption tools such as Voltage and Protegrity, helping the organization choose the best encryption tool
- Conducted POCs to determine the best ETL tool for handling very large data volumes; candidates included Informatica Big Data Edition, Pentaho, DMExpress, DataStage Big Data edition, and Talend
- Evaluated data masking tools
- Architected the data flow from source systems to Hadoop and Greenplum so that data scientists can easily run analytical queries
- Served as a technical advisor and educated the team on new trends and features
- As part of the Data Operations team, handled the data needs of the projects below
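Illustrative sketch for the raw-log parsing bullets above, written as a Hadoop Streaming mapper in Python (Hadoop Streaming is an assumption here; the bullets refer to MapReduce/Hive/Pig jobs in general). The pipe-delimited field layout is hypothetical.

#!/usr/bin/env python
# Minimal Hadoop Streaming mapper: parse raw email-log lines and emit
# tab-separated key/value pairs for the shuffle phase. The assumed layout
# (timestamp|sender|recipient|status) is a placeholder.
import sys

def parse(line):
    parts = line.rstrip("\n").split("|")
    if len(parts) != 4:
        return None  # drop malformed records
    timestamp, sender, recipient, status = parts
    return sender.lower(), status

for line in sys.stdin:
    parsed = parse(line)
    if parsed:
        sender, status = parsed
        print("{}\t{}".format(sender, status))

A mapper like this would run under the Hadoop Streaming jar, paired with a reducer or feeding a Hive external table for downstream cleaning.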