
Big Data Architect Resume

Tempe, AZ

SUMMARY

  • 7+ years of strong experience in Big Data architecture and Data Engineering using the Hadoop ecosystem.
  • Big Data Architect at Confidential, working closely with business and technical teams.
  • Formulated architectural plans for mitigation purposes.
  • Supported integration of reference architectures and standards.
  • Utilized Big Data technologies for producing technical designs.
  • Prepared architectures and blueprints for Big Data implementation.
  • Evaluated and documented use cases and proof of concepts.
  • Created architecture components with cloud and visualization methodologies.
  • Providing advisory services and thought leadership on modernizing analytics environments leveraging cloud-based platforms and big data technologies, including integration with existing data and analytics platforms and tools (EDWs, data integration, BI platforms, etc.)
  • Defining, designing, and implementing data access patterns for multiple analytical and operational workloads, across on-premises and cloud-based platforms
  • Creating data management solutions covering data security, data privacy, metadata management, multi-tenancy, and mixed workload management across Hadoop and NoSQL platforms, spanning on-premises and cloud-based deployments
  • Operating a data warehouse environment
  • Securing data and data solutions
  • Operating data solutions in a cloud environment
  • Big Data analytics, Hadoop, data warehouses, NoSQL, etc.
  • Experience with AWS data technologies and infrastructure
  • Holds overall architect responsibilities including roadmaps, leadership, planning, technical innovation, security, IT governance, etc.
  • Coordinated with project teams as outlined in the Agile methodology, providing guidance in implementing solutions at various stages of projects
  • Adopted innovative architectural approaches to leverage in-house data integration capabilities consistent with the architectural goals of the enterprise
  • Experienced in development of Big Data projects using HDFS, Spark, Java, Scala, Python, Hive, Pig, Impala, Sqoop, Kafka, Oozie, YARN, MapReduce, HBase, Flume, Cassandra, the ELK stack, and the AWS stack.
  • Developed Spark scripts using the Scala shell as per requirements.
  • Performed a POC comparing the processing time of Impala with Apache Hive for batch applications, in order to adopt the former in the project.
  • Developed technical presentations and proposals, and delivered customer presentations
  • Evaluated new technologies, executed proofs of concept, and developed specialized algorithms
  • Experience in developing applications using Big Data, Java, and J2EE technologies.
  • Extensive knowledge of Java, J2EE, Servlets, JSP, JDBC, EJB/MDB, JMS, Struts, the Spring Framework, and web services development.
  • Adept in Spring and Hibernate, with expertise in developing JavaBeans.
  • Working knowledge of WebLogic server clustering.
  • Proficient in various web-based technologies like HTML, XML, XSLT, and JavaScript.
  • Expertise in unit testing using JUnit.
  • Implemented error logging and debugging using Log4J.
  • Strong knowledge of creating and reviewing data models built in RDBMSs such as Oracle 10g and MySQL.
  • Experience in working with versioning tools like Git, TFS, CVS, and ClearCase.
  • Goal-oriented, organized team player with good interpersonal skills; thrives in group environments as well as individually.
  • Strong business and application analysis skills with excellent communication and professional abilities.
  • Good experience in Shell programming.
  • Developed Scala code to run Spark jobs on a Hadoop HDFS cluster (a representative sketch follows this list).
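
The last bullet above references Spark jobs written in Scala against HDFS. The following is a minimal, illustrative sketch of that kind of batch job; it is not taken from any specific engagement, and the object name, HDFS paths, and column names are hypothetical placeholders.

    import org.apache.spark.sql.SparkSession

    // Minimal sketch of a Spark batch job submitted to a Hadoop/HDFS cluster.
    // Paths and column names are hypothetical placeholders.
    object OrdersByRegion {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("orders-by-region")
          .getOrCreate()

        // Read raw CSV data that Sqoop or an upstream feed landed on HDFS.
        val orders = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("hdfs:///data/raw/orders")

        // Simple aggregation; a production job would add validation and partitioning.
        val byRegion = orders.groupBy("region").count()

        // Write the result back to HDFS as Parquet for downstream Hive/Impala access.
        byRegion.write.mode("overwrite").parquet("hdfs:///data/curated/orders_by_region")

        spark.stop()
      }
    }

A job like this would typically be packaged as a JAR and launched on YARN with spark-submit.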

TECHNICAL SKILLS

Languages: Java, Java script, Python, SQL, XML, HTML, Scala

J2EE Technologies: Servlets/JSP, Java Beans, JDBC, JMS, EJB, web services, GWT

Databases: Oracle 10g, DB2, MongoDB (TOAD as a client tool)

Big Data Technologies: Hadoop, Hive, Impala, MapReduce, Solr, Spark, Kafka, Sqoop, Elasticsearch (ELK)

Cloud services: Amazon EMR, S3, AWS Glue, Athena, Presto

NoSQL: Cassandra, HBase

EAI Technologies: Oracle SOA, BPEL, TIBCO BW, TIBCO EMS, Apache Camel

COTS: Oracle OSM 7.2.2

Application Servers: Tomcat 6, WebLogic 12.x, JBoss 6.x, WildFly

Frameworks: Struts 1.2, Spring, Hibernate, Axis2, JAX-WS, Play, Akka

Operating Systems: Linux, UNIX, Windows 98/NT/2000/XP/Vista

Java IDE: Eclipse, EditPlus, and JDeveloper

Configuration tools: Git, VSS, Clear Case, StarTeam, SVN

Design Tools: Microsoft Visio

Testing Tools: SOAPUI

PROFESSIONAL EXPERIENCE

Big Data Architect

Confidential, Tempe, AZ

Responsibilities:

  • Developed and implemented platform architecture as per established standards.
  • Worked closely with business and technical architects to eliminate multiple data producers and make orchestration the single data producer.
  • Played a key role in creating the National Bridge design to replace Marchex, which Confidential had contracted for 5 years.
  • Formulated architectural plans for mitigation purposes.
  • Supported integration of reference architectures and standards.
  • Utilized Big Data technologies for producing technical designs.
  • Prepared architectures and blueprints for Big Data implementation.
  • Evaluated and documented use cases and proof of concepts.
  • Created architecture components with cloud and visualization methodologies.
  • Evaluated and documented source system from RDBMS and other data sources.
  • Developed process frameworks and supported data migration on Hadoop systems.
  • Principal Solutions Architect responsible for Modern Data Architecture, Hadoop, Big Data, data and BI requirements, and for defining the strategy, technical architecture, implementation plan, management, and delivery of Big Data applications and solutions.
  • Providing technical thought leadership on Big Data strategy, adoption, architecture and design, as well as data engineering and modeling.
  • Working with product owners, business SMEs, and data ingestion and reporting architects to identify requirements and consolidate an enterprise data model consistent with business processes
  • Creating proofs of concept from scratch illustrating how these data integration techniques can meet specific business requirements, reducing cost and time to market.
  • Responsible for assessing applications, Big Data, Hadoop, data, and BI requirements; defining the strategy, technical architecture, implementation plan, and delivery of the data warehouse; and establishing the long-term strategy and technical architecture as well as the short-term scope for a multi-phased Big Data applications and data warehouse effort.
  • Implemented a Data Lake Analytics platform leveraging AWS cloud and Hadoop technologies to provide a centralized data repository (hub) for Big Data analytics and platforms (see the sketch after this list).
  • Implemented Big Data solutions on the AWS cloud platform
  • Involved in projects implementing Data Lake Architecture, Big Data Analytics and Modern Data warehouse applications.
  • Provided technical mentorship to teams and support management in identifying and implementing competency development measures
  • Collaborated with Project Managers, Developers, and business staff to develop products and services
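
As an illustration of the Data Lake Analytics pattern referenced above, here is a hedged sketch of one possible ingestion step on EMR: raw JSON is read from an S3 landing zone, lightly standardized, and written back to a curated zone as partitioned Parquet that Athena/Glue or Hive on EMR could query. The bucket names, prefixes, and column names are hypothetical placeholders rather than details of the actual platform.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, to_date}

    // Sketch of a data-lake ingestion step on EMR (all names are placeholders).
    object LakeIngest {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("lake-ingest").getOrCreate()

        // Raw zone: JSON events landed by upstream producers.
        val raw = spark.read.json("s3://example-raw-zone/events/")

        // Light standardization before publishing to the curated zone.
        val curated = raw.withColumn("event_date", to_date(col("event_ts")))

        // Partitioned Parquet in S3 so Athena, Glue, or Hive can query it directly.
        curated.write
          .mode("append")
          .partitionBy("event_date")
          .parquet("s3://example-curated-zone/events/")

        spark.stop()
      }
    }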

Environment: Hadoop, Salesforce, AWS, Map Reduce, HDFS, Hive, Cassandra, Sqoop, Oozie, SQL, Kafka, Spark, Scala, Java, GitHub, Talend Big Data Integration, Solr, Impala.

Sr. Hadoop/Java/J2EE Developer

Confidential

Responsibilities:

  • Application development in multiple languages wif multiple APIs
  • Providing advisory services and thought leadership on modernizing analytics environments leveraging cloud-based platforms and big data technologies, including integration with existing data and analytics platforms and tools (EDWs, data integration, BI platforms, etc.)
  • Defining, designing, and implementing data access patterns for multiple analytical and operational workloads, across on-premises and cloud-based platforms
  • Securing data and data solutions
  • Operating data solutions in a cloud environment
  • Big Data analytics, Hadoop, Data Warehouses, NoSQL, etc
  • Experience with AWS data technologies and infrastructure
  • Working knowledge of Spark RDD, DataFrame API, Dataset API, Data Source API, Spark SQL, and Spark Streaming.
  • Performed Spark join optimizations; troubleshot, monitored, and wrote efficient code using Scala.
  • Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
  • Worked on analyzing the Hadoop stack and different big data analytic tools including Pig, Hive, HBase, and Sqoop.
  • Developed a Spark Streaming script that consumes topics from Kafka and periodically pushes batches of data to Spark for real-time processing (see the sketch after this list).
  • Experienced in implementing the Cloudera distribution.
  • Created Hive tables and worked on them for data analysis to meet the requirements.
  • Developed a framework to load and transform large sets of unstructured data from UNIX systems into Hive tables.
  • Used Spark DataFrame operations to perform required validations on the data and to perform analytics on Hive data.
  • Experienced in working with Elastic MapReduce (EMR).
  • Developed Map Reduce programs for some refined queries on big data.
  • In-depth understanding of classic MapReduce and YARN architecture.
  • Worked with the business team in creating Hive queries for ad hoc access.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Implemented Hive UDFs to encapsulate business logic.
  • Analyzed the data by performing Hive queries, SQL and Spark Streaming.
  • Responsible for developing multiple Kafka Producers and Consumers from scratch as per the software requirement specifications.
  • Involved in creating generic Sqoop import script for loading data into Hive tables from RDBMS.
  • Designed column families in Cassandra; ingested data from RDBMS, performed data transformations, and exported the transformed data to Cassandra as per the business requirement.
  • Extracted files from Cassandra through Sqoop and placed in HDFS for further processing.
  • Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System to pre-process the data.
  • Created detailed AWS security groups which behaved as virtual firewalls that controlled the traffic allowed to reach one or more AWS EC2 instances.
  • Involved in creating a data lake by extracting customers' data from various data sources to HDFS, including data from Excel, databases, and log data from servers.
  • Created HBase tables to store variable data formats coming from different portfolios; performed real-time analytics on HBase using the Java API and REST API.
  • Performed test run of the module components to understand the productivity.
  • Wrote Java programs to retrieve data from HDFS and provide REST services.
  • Used HUE for running Hive queries. Created partitions according to day using Hive to improve performance.
  • Involved in Agile methodologies, daily Scrum meetings, Sprint planning.
  • Experience in using version control tools like GitHub to share code snippets among team members.
  • Worked on Maven 3.3.9 for building and managing Java-based projects. Hands-on experience with Linux and HDFS shell commands. Worked on Kafka for message queuing solutions.
  • Generated Java APIs for retrieval and analysis on NoSQL databases such as HBase and Cassandra.
  • Wrote an HBase client program in Java and web services.
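
The Spark Streaming bullet above describes consuming Kafka topics and processing them in micro-batches. Below is a hedged sketch of that pattern using Spark Structured Streaming, assuming the spark-sql-kafka connector is on the classpath; the broker addresses, topic name, and HDFS paths are hypothetical placeholders.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    // Sketch of a Kafka-to-HDFS streaming consumer (all names are placeholders).
    object KafkaToHdfs {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("kafka-to-hdfs").getOrCreate()

        // Subscribe to a Kafka topic; records arrive as key/value byte arrays.
        val stream = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
          .option("subscribe", "orders-events")
          .load()
          .select(col("value").cast("string").as("payload"))

        // Micro-batches are appended to HDFS as Parquet; the checkpoint directory
        // lets the query resume from the last committed Kafka offsets.
        val query = stream.writeStream
          .format("parquet")
          .option("path", "hdfs:///data/streaming/orders")
          .option("checkpointLocation", "hdfs:///checkpoints/orders")
          .start()

        query.awaitTermination()
      }
    }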

Environment: Java, J2EE, JSP, Spring, REST, Hadoop, Hive, Linux, DataStax Cassandra, Tomcat 6, log4j, Eclipse, Spark, Scala, SVN, DB2, JAXB, Kafka, Parquet, EMR, S3, Athena, Glue, QuickSight, ELK, Control-M.

Hadoop Developer

Confidential, Atlanta, GA

Responsibilities:

  • Working on moving the on-premises Hadoop environment to Amazon EMR, with S3 as optional storage.
  • Implemented web service calls for different data integrations.
  • Implemented a POC for publishing analytics data in a Cassandra column family.
  • Implemented a POC to publish data in a web-based dashboard (D3.js) by calling REST web services for different data integrations.
  • Implemented an aggregation solution using Spark, Cassandra, and Tableau (see the sketch after this list).
  • Preparing Design Documents (Request-Response Mapping Documents, Hive Mapping Documents).
  • Implemented a POC ETL solution using Spark, Cassandra, Alteryx, and Tableau.
  • Worked with the DevOps team to quickly set up an AWS Hadoop environment.
  • Implemented a POC aggregation solution using Spark, HBase, and Tableau; worked with ingestion teams to define the ingestion process.
  • Developed applications using Java, J2EE, JSP, and Spring
  • Involved in writing Map/Reduce jobs using Java.
  • POCs on R, Python, and SparkML to create data analytics reports.
  • Designed and implemented a stream filtering system on top of Apache Kafka to reduce stream size.
  • Worked on a POC to compare the processing time of Impala with Apache Hive for batch applications, in order to adopt the former in the project.
  • Involved in requirement and design activities.
  • Involved in writing DAO calls for Cassandra.
  • Designed and developed RESTful web services.
  • Read messages from a Kafka queue using Spark Streaming.
  • Involved in system and manual testing while integrating with different data integration projects.
  • Involved in build and deployment activities and process definitions.
  • Wrote Hive queries and shell scripts for data integration. Worked as a member of the Big Data team on deliverables like design, construction, unit testing, and deployment.
  • Involved in writing shell scripts for executing hive queries, loading data files into Hive tables.
  • Involved in gathering requirements and design.
  • Initial setup to receive data from external source.
  • Analysis and design on production views.
  • Involved in writing various user defined functions as per the requirements.
  • Translation of functional and technical requirements into detailed architecture and design
  • Responsible for managing data coming from various sources.
  • Implemented a POC for real-time data processing using Kafka-Spark integration (with Scala), publishing to Elasticsearch.
  • Experienced in analyzing data with Hive and Spark using Scala.
  • Did a POC using the Play and Akka frameworks.
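
As a hedged illustration of the Spark/Cassandra/Tableau aggregation referenced above, the sketch below reads raw rows from Cassandra, rolls them up with Spark, and writes the aggregate back to a Cassandra table that a reporting tool could query. It assumes the DataStax spark-cassandra-connector is on the classpath; the keyspace, tables, columns, and host are hypothetical placeholders.

    import org.apache.spark.sql.SparkSession

    // Sketch of a Spark aggregation over Cassandra tables (names are placeholders).
    object CallAggregates {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("call-aggregates")
          .config("spark.cassandra.connection.host", "cassandra-host")
          .getOrCreate()

        // Source table of raw call records.
        val calls = spark.read
          .format("org.apache.spark.sql.cassandra")
          .options(Map("keyspace" -> "analytics", "table" -> "calls_raw"))
          .load()

        // Daily rollup per campaign.
        val daily = calls.groupBy("campaign_id", "call_date").count()

        // Persist the rollup for the reporting layer.
        daily.write
          .format("org.apache.spark.sql.cassandra")
          .options(Map("keyspace" -> "analytics", "table" -> "calls_daily"))
          .mode("append")
          .save()

        spark.stop()
      }
    }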

Environment: Hadoop, HDFS, Pig, Hive, Map Reduce, Sqoop, Spark, Kafka, Linux, Cloudera, Java APIs, Java Collections, SQL, NoSQL, HBase, MongoDB.

Sr. Hadoop/Java/J2EE Developer

Confidential, McLean, VA

Responsibilities:

  • Applied a solid understanding of Hadoop HDFS, MapReduce, and other ecosystem projects.
  • Installed and configured the Hadoop cluster; worked with the Cloudera support team to fine-tune the cluster. Developed a custom file system plugin for Hadoop so it can access files on the Hitachi Data Platform.
  • The plugin allows Hadoop MapReduce programs, HBase, Pig, and Hive to work unmodified and access files directly. The plugin also provided data locality for Hadoop across host nodes and virtual machines.
  • Worked on analyzing the Hadoop cluster using different big data analytic tools including Kafka (version 0.8.2.2), Pig (0.12.0), Hive (version 0.10.0), and MapReduce (MR1 and MR2).
  • Collecting and aggregating large amounts of log data using Apache Flume (version 1.5.0) and staging data in HDFS for further analysis.
  • Implemented various MapReduce jobs in custom environments and updated HBase tables by generating Hive queries.
  • Streamed data in real time using Spark (version 1.4.0) with Kafka (version 0.8.2.2).
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala (see the sketch after this list).
  • Worked on tuning the performance of Pig queries and was involved in loading data from the Linux file system to HDFS. Imported and exported data into HDFS using Sqoop (version 1.4.3) and Kafka.
  • Supported MapReduce programs running on the cluster. Gained experience in managing and reviewing Hadoop log files. Involved in scheduling the Oozie (version 4.0.0) workflow engine to run multiple Pig jobs.
  • Automated all the jobs for pulling data from the FTP server and loading it into Hive tables using Oozie workflows. Involved in using HCatalog to access Hive table metadata from MapReduce or Pig code.
  • Computed various metrics using Java MapReduce to calculate metrics that define user experience, revenue, etc.
  • Used NoSQL databases including HBase and MongoDB. Exported result sets from Hive to MySQL using shell scripts.
  • Implemented SQL and PL/SQL stored procedures. Actively involved in code review and bug fixing to improve performance.
  • Developed screens using JSP, DHTML, CSS, AJAX, JavaScript, Struts, Spring, Java, and XML.
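
The Spark Streaming bullet above (Spark 1.4.0 with Kafka 0.8.2.2) describes receiving real-time data from Kafka and storing it to HDFS with Scala. Below is a hedged sketch of that flow using the DStream API and the Kafka 0.8 direct stream, assuming the spark-streaming-kafka artifact is on the classpath; the broker list, topic, and output path are hypothetical placeholders.

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.kafka.KafkaUtils
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Sketch of a Spark 1.x DStream consumer writing Kafka payloads to HDFS.
    object KafkaStreamToHdfs {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("kafka-stream-to-hdfs")
        val ssc = new StreamingContext(conf, Seconds(30)) // 30-second micro-batches

        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
        val topics = Set("log-events")

        // Direct stream: Spark tracks Kafka offsets itself rather than via ZooKeeper.
        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, topics)

        // Keep only the message payload and roll it into timestamped HDFS directories.
        stream.map(_._2).saveAsTextFiles("hdfs:///data/streaming/log-events")

        ssc.start()
        ssc.awaitTermination()
      }
    }
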
Environment: Hadoop, Map Reduce, HDFS, Hive, Cassandra, Sqoop, Oozie, SQL, Kafka, Spark, Scala, Java, AWS, GitHub, Talend Big Data Integration, Solr, Impala.
