
Sr. Big Data Architect Resume


Washington, DC

SUMMARY

  • 10 years of IT application experience, including systems analysis, architecture, design, development, testing, implementation, maintenance, business analysis, process improvement, and migration.
  • Architected and implemented a Portfolio Recommendation Analytics Engine using Hadoop MapReduce, Oozie, Spark SQL, Spark MLlib, and Cassandra.
  • Experienced in the Hadoop Big Data ecosystem, including MapReduce, MapReduce 2, YARN, Flume, Sqoop, Hive, Apache Spark, and Scala.
  • Excellent understanding of Hadoop architecture and its underlying framework, including storage management.
  • Expertise in Big Data tools such as MapReduce, HiveQL, Hive HPL/SQL, Impala, Pig, Spark Core, YARN, and Sqoop.
  • Expertise in distributed processing frameworks such as MapReduce, Spark, and Tez.
  • Expertise in architecting big data solutions covering data ingestion and data storage.
  • Experienced with NoSQL databases (HBase, Cassandra, and MongoDB), including database performance tuning and data modeling.
  • Experience in using PL/SQL to write stored procedures, functions, and triggers.
  • Excellent technical and analytical skills with a clear understanding of the design goals of ER modeling for OLTP and dimensional modeling for OLAP.
  • Architected, designed, and modeled DI (Data Integrity) platforms using Sqoop, Flume, Kafka, Spark Streaming, Spark MLlib, and Cassandra.
  • Expertise in NoSQL databases such as HBase and MongoDB.
  • Strong expertise in Amazon AWS services including EC2, DynamoDB, S3, and Kinesis.
  • Expertise in data analysis, design, and modeling using tools such as Erwin.
  • Hands-on experience with Hadoop/Big Data technologies for the storage, querying, processing, and analysis of data.
  • Expert in Amazon EMR, Spark, Kinesis, S3, Boto3, Elastic Beanstalk, ECS, CloudWatch, Lambda, ELB, VPC, ElastiCache, DynamoDB, Redshift, RDS, Athena, Zeppelin, and Airflow.
  • Strong experience in front-end technologies such as JSP, HTML5, jQuery, JavaScript, and CSS3.
  • Experienced in importing and exporting data between HDFS and relational database systems using Sqoop.
  • Strong experience working with databases such as Oracle 12c/11g/10g, DB2, SQL Server 2008, and MySQL, with proficiency in writing complex SQL queries.
  • Experienced in using database tools such as SQL Navigator and TOAD.
  • Experienced in using Spark to improve the performance and optimization of existing Hadoop algorithms with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN (see the sketch after this list).
  • Experienced with Akka for building high-performance, reliable distributed applications in Java and Scala.
  • Knowledge and experience in job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
  • Knowledge of configuring and managing Cloudera's Hadoop platform, including CDH3 and CDH4 clusters.
  • Expertise in big data architectures such as Hadoop distributions (Azure, Hortonworks, Cloudera), MongoDB, and NoSQL.
  • Experienced in developing web-based GUIs using JavaScript, JSP, HTML, jQuery, XML, and CSS.
  • Experienced in developing enterprise applications with J2EE/MVC architecture on application and web servers such as JBoss and Apache Tomcat 6.0/7.0/8.0.
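
Illustrative sketch (not part of the original resume): a minimal, hypothetical Scala example of the Spark optimization pattern referenced in the summary, expressing a MapReduce-style word count with a pair RDD and then the same aggregation through Spark SQL/DataFrames; the application name and HDFS path are placeholders.

    import org.apache.spark.sql.SparkSession

    object WordCountSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("WordCountSketch").getOrCreate()
        val sc = spark.sparkContext
        import spark.implicits._

        // Pair-RDD version of the classic MapReduce word count
        val counts = sc.textFile("hdfs:///data/input/logs")   // placeholder path
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)

        // The same aggregation expressed through Spark SQL / DataFrames
        val countsDf = counts.toDF("word", "cnt")
        countsDf.createOrReplaceTempView("word_counts")
        spark.sql("SELECT word, cnt FROM word_counts ORDER BY cnt DESC LIMIT 20").show()

        spark.stop()
      }
    }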

TECHNICAL SKILLS

Languages: Java, J2EE, PL/SQL, Pig Latin, HQL, R, Python, XPath, Spark

Hadoop/Big Data: MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Sqoop, Oozie, Scala, Akka, Kafka, Storm, MongoDB.

Java/J2EE Technologies: JDBC, JavaScript, JSP, Servlets, jQuery

NoSQL Databases: Cassandra, MongoDB

Web Technologies: HTML, DHTML, XML, XHTML, JavaScript, CSS, XSLT, AWS

Web/Application Servers: Apache Tomcat 6.0/7.0/8.0, JBoss

Frameworks: MVC, Struts, Spring, Hibernate.

Operating Systems: UNIX, Ubuntu Linux, Windows, CentOS, Sun Solaris.

Network protocols: TCP/IP fundamentals, LAN and WAN.

Databases: Oracle 12c/11g/10g, Microsoft Access, MS SQL

PROFESSIONAL EXPERIENCE

Confidential - Washington DC

Sr. Big Data Architect

Responsibilities:

  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark 2.0.0 for data aggregation and queries, and for writing data back into the RDBMS through Sqoop (see the sketch after this list).
  • Developed Spark code using Scala and Spark SQL/Streaming for faster processing of data.
  • Developed Oozie 3.1.0 workflow jobs to execute Hive 2.0.0, Sqoop 1.4.6, and MapReduce actions.
  • Provided thought leadership for the architecture and design of Big Data analytics solutions for customers, actively driving Proof of Concept (POC) and Proof of Technology (POT) evaluations and implementing Big Data solutions.
  • Developed numerous MapReduce jobs in Scala 2.10.x for data cleansing, and analyzed data in Impala 2.1.0.
  • Worked on the implementation and maintenance of the Cloudera Hadoop cluster, including data ingestion.
  • Created a data pipeline with processor groups and multiple processors in Apache NiFi for flat-file and RDBMS sources as part of a POC using Amazon EC2.
  • Built Hadoop solutions for big data problems using MR1 and MR2 on YARN.
  • Loaded data from different sources such as HDFS or HBase into Spark RDDs and implemented in-memory data computation to generate the output response.
  • Worked with Azure Queues, Blobs, and Containers to persist data.
  • Implemented Azure APIM modules for public-facing, subscription-based authentication.
  • Developed complete end-to-end big data processing in the Hadoop ecosystem.
  • The objective of this project was to build a data lake as a cloud-based solution on AWS using Apache Spark and to provide visualization of the ETL orchestration using the CDAP tool.
  • Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
  • Ran proofs of concept to determine feasibility and evaluate Big Data products.
  • Architected, designed, and developed business applications and data marts for the Marketing and IT departments to facilitate departmental reporting.
  • Implemented the Big Data solution using Hadoop, Hive, and Informatica 9.5.1 to pull and load data into HDFS.
  • Pulled data from the data lake (HDFS) and massaged it with various RDD transformations.
  • Actively involved in design, new development, and SLA-based support tickets for BigMachines applications.
  • Developed Shell, Perl, and Python scripts to automate and provide control flow for Pig scripts.
  • Designed the Redshift data model and performed Redshift performance improvements and analysis.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Worked on configuring and managing disaster recovery and backups for Cassandra data.
  • Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
  • Used Flume to collect, aggregate, and store web log data from different sources such as web servers and mobile and network devices, and pushed it to HDFS.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Implemented a proof of concept deploying this product in Amazon Web Services (AWS).
  • Involved in migrating data from the existing RDBMSs (Oracle and SQL Server) to Hadoop using Sqoop for processing.
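
Illustrative sketch (hypothetical, not taken from the project): a minimal Spark 2.x Scala job in the spirit of the data-aggregation bullet above; the Hive table, columns, and JDBC connection details are placeholders, and for brevity the aggregate is written back over Spark's JDBC writer rather than Sqoop.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object PortfolioAggregation {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("PortfolioAggregation")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical Hive staging table of position-level records
        val positions = spark.table("staging.positions")

        // Roll up market value per account
        val perAccount = positions
          .groupBy("account_id")
          .agg(sum("market_value").as("total_market_value"),
               count("position_id").as("position_count"))

        // Write the aggregate back to the relational store over JDBC
        // (Oracle JDBC driver assumed on the classpath; credentials are placeholders)
        perAccount.write
          .format("jdbc")
          .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
          .option("dbtable", "ANALYTICS.ACCOUNT_SUMMARY")
          .option("user", "etl_user")
          .option("password", "etl_password")
          .mode("append")
          .save()

        spark.stop()
      }
    }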

Environment: Pig, Sqoop, Kafka, Apache Cassandra, Oozie, Impala, Cloudera, AWS, AWS EMR, Redshift, Flume, Apache Hadoop, HDFS, Hive, MapReduce, ZooKeeper, MySQL, Eclipse, DynamoDB, PL/SQL, and Python.

Confidential - Englewood C

Sr. Big Data Architect

Responsibilities:

  • Developed predictive analytics using Apache Spark Scala APIs.
  • Implemented an enterprise-grade platform (MarkLogic) for ETL from the mainframe to NoSQL (Cassandra).
  • Assigned names to each of the columns using the case class option in Scala.
  • Experienced with batch processing of data sources using Apache Spark.
  • Implemented a Spark GraphX application to analyze guest behavior for data science segments.
  • Enhanced the traditional data warehouse based on a star schema, updated data models, and performed data analytics and reporting using Tableau.
  • Responsible for importing log files from various sources into HDFS using Flume.
  • Expert in writing business analytics scripts using Hive SQL.
  • Implemented continuous integration and deployment (CI/CD) through Jenkins for Hadoop jobs.
  • Worked on writing Hadoop jobs for analyzing data using Hive and Pig, accessing text-format files, sequence files, and Parquet files.
  • Experience with different Hadoop distributions such as Cloudera (CDH3 and CDH4), Hortonworks (HDP), and MapR 2.
  • Experience integrating Oozie logs into a Kibana dashboard.
  • Extracted data from MySQL and AWS Redshift into HDFS using Sqoop.
  • Developed Spark code using Scala and Spark SQL for faster testing and data processing.
  • Imported millions of structured records from relational databases using Sqoop import for processing with Spark, and stored the data in HDFS in CSV format.
  • Developed a Spark Streaming application to pull data from the cloud into Hive tables.
  • Used Spark SQL to process large amounts of structured data.
  • Developed Big Data solutions focused on pattern matching and predictive modeling.
  • Implemented the Big Data ecosystem (Hive, Impala, Sqoop, Flume, Spark, Lambda) with a cloud architecture.
  • Worked on the Big Data transition using NoSQL, Python, the Hadoop ecosystem, Cloudera CDH 5.1.0, Datameer, Spark, Hive, and Sqoop.
  • Experience in BI reporting with AtScale OLAP for Big Data.
  • Implemented solutions for ingesting data from various sources and processing data at rest using Big Data technologies such as Hadoop, MapReduce frameworks, HBase, and Hive.
  • Designed and developed a real-time stream processing application using Spark, Kafka, Scala, and Hive to perform streaming ETL and apply machine learning (see the sketch after this list).
  • Experience in AWS, implementing solutions using services such as EC2, S3, RDS, Redshift, and VPC.
  • Worked as a Hadoop consultant on MapReduce/MapReduce 2, Pig, Hive, and Sqoop.
  • Worked with Spark and Python.
  • Worked with Apache Hadoop ecosystem components such as HDFS, Hive, Sqoop, Pig, and MapReduce.
  • Identified query duplication, complexity, and dependencies to minimize migration effort.
  • Technology stack: Oracle, Hortonworks HDP cluster, Attunity Visibility, Cloudera Navigator Optimizer, AWS Cloud, and DynamoDB.
  • Explored Spark for improving the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Led the architecture and design of data processing, warehousing, and analytics initiatives.
  • Involved in business requirement analysis and technical design sessions with business and technical staff to develop end-to-end ETL solutions.
  • Worked with AWS to implement client-side encryption, as DynamoDB did not support at-rest encryption at the time.
  • Used the DataFrame API in Scala to work with distributed collections of data organized into named columns.
  • Performed data profiling and transformation on the raw data using Pig, Python, and Java.
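
Illustrative sketch (hypothetical): a minimal streaming-ETL job along the lines of the Spark/Kafka/Hive bullet above, using Spark Structured Streaming to read a Kafka topic and append Parquet files that a Hive external table can be defined over; the broker, topic, field layout, and paths are placeholders, and the spark-sql-kafka package is assumed to be on the classpath.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object ClickstreamStreamingEtl {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("ClickstreamStreamingEtl")
          .enableHiveSupport()
          .getOrCreate()
        import spark.implicits._

        // Read raw events from Kafka
        val raw = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "clickstream")
          .load()

        // Minimal parsing: treat the Kafka value as a CSV line of (user_id, page, ts)
        val events = raw.selectExpr("CAST(value AS STRING) AS line")
          .select(split($"line", ",").as("cols"))
          .select($"cols"(0).as("user_id"), $"cols"(1).as("page"), $"cols"(2).as("ts"))

        // Append each micro-batch as Parquet under a path a Hive external table points at
        val query = events.writeStream
          .outputMode("append")
          .format("parquet")
          .option("path", "/warehouse/clickstream_events")
          .option("checkpointLocation", "/tmp/checkpoints/clickstream")
          .start()

        query.awaitTermination()
      }
    }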

Environment: AWS EMR, Redshift, Flume, DynamoDB, PL/SQL, Python, Apache Hadoop, HDFS, Hive, MapReduce, Apache Cassandra, Pig, Sqoop, Kafka, Oozie, Impala, Cloudera, ZooKeeper, MySQL, Eclipse.

Confidential, Atlanta,GA

Big Data Architect

Responsibilities:

  • Involved in developing the MapReduce framework, writing queries, and scheduling MapReduce jobs.
  • Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
  • Installed and configured Hadoop; responsible for maintaining the cluster and managing and reviewing Hadoop log files.
  • Developed Shell, Perl, and Python scripts to automate and provide control flow for Pig scripts.
  • Designed the Redshift data model and performed Redshift performance improvements and analysis.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Worked on configuring and managing disaster recovery and backups for Cassandra data.
  • Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
  • Used Flume to collect, aggregate, and store web log data from different sources such as web servers and mobile and network devices, and pushed it to HDFS.
  • Implemented partitioning, dynamic partitions, and buckets in Hive (see the sketch after this list).
  • Implemented a proof of concept deploying this product in Amazon Web Services (AWS).
  • Involved in migrating data from the existing RDBMSs (Oracle and SQL Server) to Hadoop using Sqoop for processing.
  • Architected, designed, and developed business applications and data marts for the Marketing and IT departments to facilitate departmental reporting.
  • Implemented the Big Data solution using Hadoop, Hive, and Informatica 9.5.1 to pull and load data into HDFS.
  • Pulled data from the data lake (HDFS) and massaged it with various RDD transformations.
  • Actively involved in design, new development, and SLA-based support tickets for BigMachines applications.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark 2.0.0 for data aggregation and queries, and for writing data back into the RDBMS through Sqoop.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster processing of data.
  • Developed Oozie 3.1.0 workflow jobs to execute Hive 2.0.0, Sqoop 1.4.6, and MapReduce actions.
  • Provided thought leadership for the architecture and design of Big Data analytics solutions for customers, actively driving Proof of Concept (POC) and Proof of Technology (POT) evaluations and implementing Big Data solutions.
  • Developed numerous MapReduce jobs in Scala 2.10.x for data cleansing, and analyzed data in Impala 2.1.0.
  • Worked on the implementation and maintenance of the Cloudera Hadoop cluster, including data ingestion.
  • Created a data pipeline with processor groups and multiple processors in Apache NiFi for flat-file and RDBMS sources as part of a POC using Amazon EC2.
  • Built Hadoop solutions for big data problems using MR1 and MR2 on YARN.
  • Loaded data from different sources such as HDFS or HBase into Spark RDDs and implemented in-memory data computation to generate the output response.
  • Developed complete end-to-end big data processing in the Hadoop ecosystem.
  • The objective of this project was to build a data lake as a cloud-based solution on AWS using Apache Spark and to provide visualization of the ETL orchestration using the CDAP tool.
  • Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
  • Ran proofs of concept to determine feasibility and evaluate Big Data products.
  • Wrote Hive join queries to fetch information from multiple tables and wrote multiple MapReduce jobs to collect output from Hive.
  • Worked in AWS Cloud and on-premise environments with infrastructure provisioning and configuration.
  • Worked on writing Perl scripts covering data feed handling, implementing MarkLogic, and communicating with web services through the SOAP::Lite module and WSDL.
  • Used Hive to analyze data ingested into HBase via Hive-HBase integration and computed various metrics for reporting on the dashboard.
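
Illustrative sketch (hypothetical): Hive dynamic partitioning and a join expressed through Spark SQL with Hive support, in line with the partitioning and Hive-join bullets above; database, table, and column names are placeholders.

    import org.apache.spark.sql.SparkSession

    object HivePartitioningExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HivePartitioningExample")
          .enableHiveSupport()
          .getOrCreate()

        // Allow dynamic partition inserts
        spark.sql("SET hive.exec.dynamic.partition = true")
        spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

        // Partitioned target table
        spark.sql("""
          CREATE TABLE IF NOT EXISTS sales.orders_part (
            order_id BIGINT,
            customer_id BIGINT,
            amount DOUBLE
          )
          PARTITIONED BY (order_date STRING)
          STORED AS ORC
        """)

        // Dynamic-partition insert, joining raw orders with a dimension table;
        // the partition column comes last in the SELECT list
        spark.sql("""
          INSERT OVERWRITE TABLE sales.orders_part PARTITION (order_date)
          SELECT o.order_id, c.customer_id, o.amount, o.order_date
          FROM sales.orders_raw o
          JOIN sales.customers c ON o.customer_id = c.customer_id
        """)

        spark.stop()
      }
    }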

Environment: Apache Hadoop, HDFS, Hive, MapReduce, Apache Cassandra, Pig, Sqoop, Kafka, MySQL, Eclipse, Oozie, Impala, AWS EMR, Redshift, Flume, DynamoDB, PL/SQL, Python, Cloudera, ZooKeeper.

Confidential, Chicago, IL

Big Data Architect

Responsibilities:

  • Collaborated with data science teams in understanding data points, acquiring them, and loading them into HDFS from disparate data sources.
  • Implemented an ETL process to aggregate sales data from retail sites, social media, blogs, and reports, to curate and validate it, and to build pipelines.
  • Crawled clusters, catalogued fields, and inferred meaning for each field to later crowdsource and propagate via automation and provision to Hive tables.
  • Implemented an item categorization engine for cataloguing.
  • In the data curation process, used natural language processing tools and computer vision for processing SKUs (Stock Keeping Units).
  • Worked with Spark Streaming for anomaly detection, brand detection, and dynamic taxonomy.
  • Implemented Kafka to decouple data pipelines and help data flow in and out of Kafka through main producers, Spark streaming engines on HDFS, NoSQL databases, and the analytics warehouse (see the sketch after this list).
  • Developed multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats including XML, JSON, and CSV.
  • Developed Scala scripts, DataFrames, and RDDs in Spark for data aggregation, queries, and writing data into the HBase database; loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Worked extensively with Spark core APIs such as Spark SQL, MLlib, and GraphX.
  • Developed Apache Spark Scala APIs to make data easily accessible to the data science teams, using DataFrames and Datasets.
  • Developed and ran MapReduce jobs on YARN and Hadoop clusters to produce daily and monthly reports.
  • Debugged and troubleshot issues with UDFs in Hive; scheduled and managed jobs on the Hadoop cluster using Oozie workflows.
  • Continuously tuned Hive UDFs and queries for faster execution by employing partitioning and bucketing.
  • Expertise working with Hibernate for mapping Java objects to the relational database, and with SQL queries to fetch, insert, and update data in the database.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Performance-tuned Spark jobs using higher-level Spark core APIs for code optimization.
  • Wrote Naive Bayes and decision tree models for customer data classification.
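
Illustrative sketch (hypothetical): a minimal Kafka producer in Scala matching the decoupled-pipeline bullet above; the broker address, topic, and payload are placeholders.

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    object SkuEventProducer {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "broker1:9092")
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

        val producer = new KafkaProducer[String, String](props)
        try {
          // Publish a catalog event; downstream Spark Streaming jobs and the analytics
          // warehouse consume from the same topic, decoupled from this producer
          val record = new ProducerRecord[String, String](
            "catalog-events", "sku-12345",
            """{"sku":"12345","brand":"Acme","category":"home"}""")
          producer.send(record)
        } finally {
          producer.flush()
          producer.close()
        }
      }
    }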

Environment: Hortonworks Hadoop distribution, HDFS, Spark core APIs (Spark SQL, MLlib, Streaming, GraphX), Pig, Hive, Oozie, Flume, Sqoop, Linux, Kafka. NoSQL databases: HBase.

Confidential, Minneapolis, MN

Big Data/ Java Developer

Responsibilities:

  • Involved in creating Hive tables and loading and analyzing data using Hive queries. Developed simple to complex MapReduce jobs using Hive and Pig.
  • Responsible for managing data coming from different sources, with storage and processing in Hue covering all Hadoop ecosystem components.
  • Experience working with Red Gate SQL Response and SQL Data Compare for monitoring servers and SQL objects.
  • Experience with SnapManager on a project to back up and restore databases. Involved in system design and development in Core Java using collections and multithreading. Moved data between different AWS services and on-premise data sources using AWS Data Pipeline.
  • Experience running MapReduce programs on the Amazon Elastic MapReduce framework, using Amazon S3 for input and output.
  • Experience with different Hadoop distributions such as Cloudera (CDH3 and CDH4), Hortonworks (HDP), and Elastic MapReduce (EMR).
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data. Created EC2 virtual servers and RDS instances in a VPC. Used the Teradata FastLoad/MultiLoad utilities to load data into tables.
  • Automated builds upon check-in with Jenkins CI (Continuous Integration). Implemented UDFs in Java for Hive to process data in ways that could not be handled with Hive's built-in functions.
  • Developed simple to complex UNIX shell/Bash scripts in the framework development process. Involved in writing Flume and Hive scripts to extract, transform, and load data into the database.
  • Used Oozie to orchestrate the MapReduce jobs and worked with HCatalog to open up access to Hive's metastore. Used RESTful web services for sending and getting data from different applications.
  • Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive. Responsible for creating and running unit test cases, reviewing others' test cases, testing other modules against those cases, and closing defects found during unit testing.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data. Used Mahout MapReduce to parallelize a single iteration. Responsible for implementing the application system with Core Java and the Spring framework.
  • Set up Amazon EMR; installed, managed, and monitored the Hadoop cluster using Cloudera Manager.
  • Experience developing Pig Latin and HiveQL scripts for data analysis. Experience with various performance optimizations such as using the distributed cache for datasets, partitioning, and query optimization in Hive.
  • Wrote automation scripts to monitor HDFS and HBase through cron jobs. Used AWS CloudFormation templates to deploy other AWS services. Secured the environment using AWS VPC.
  • Translated business processes into data mappings for building the data warehouse. Created Parquet Hive tables with complex data types corresponding to the Avro schema (see the sketch after this list).
  • Processed data with Hive and Teradata, and developed web applications using Java and Oracle SQL. Developed Hive/Pig/Redshift scripts and hooked them into Oozie for automation.
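
Illustrative sketch (hypothetical): creating a Parquet Hive table with complex data types and appending Avro data into it, in line with the Parquet/Avro bullet above; table, column, and path names are placeholders, the spark-avro package is assumed to be on the classpath, and the Avro field order is assumed to match the table.

    import org.apache.spark.sql.SparkSession

    object ParquetComplexTypesExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("ParquetComplexTypesExample")
          .enableHiveSupport()
          .getOrCreate()

        // Nested struct/array/map columns mirroring the kind of complex types
        // an Avro record schema would define
        spark.sql("""
          CREATE TABLE IF NOT EXISTS warehouse.customer_orders (
            customer_id BIGINT,
            name STRING,
            address STRUCT<street: STRING, city: STRING, zip: STRING>,
            phone_numbers ARRAY<STRING>,
            attributes MAP<STRING, STRING>
          )
          STORED AS PARQUET
        """)

        // Read landed Avro files and append them into the Parquet table
        val avroDf = spark.read.format("com.databricks.spark.avro")
          .load("/data/landing/customer_orders_avro")
        avroDf.write.mode("append").insertInto("warehouse.customer_orders")

        spark.stop()
      }
    }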

Environment: Cloudera Hadoop Distribution 5.6, Core Java, Jenkins, Teradata, Apache Parquet, Git, UNIX/Linux shell scripting, HBase.

Confidential

Java Developer

Responsibilities:

  • Developed solutions for requirements, enhancements, and defects. Involved in requirements, design, development, and system testing. Developed UI screens using JSP and Servlets.
  • Developed J2EE backing beans, action classes, action mappings, application facades, and Hibernate classes to retrieve and submit data using the JSF framework.
  • Implemented the JSF package with the MVC framework. Created multiple view-access levels for access control between administrators and adjusters.
  • Developed and utilized J2EE services, Servlets, and JSP components. Implemented action classes to encapsulate the business logic. Used the Struts framework for developing applications.
  • Gathered business requirements and wrote functional specifications and detailed design documents.
  • Wrote MapReduce jobs for aggregation, joins, and analytics.
  • Built an analytics back-end for US stocks to generate daily dashboards for corporate events, stocks on the move, price alerts based on predetermined cutoffs, etc. Market data was sourced from Google Finance using web APIs.
  • Mentored a group of application developers, assigned responsibilities, elaborated use cases, and managed project schedules and module targets.
  • Created the UI using JSP, JSF, JavaScript, and jQuery. Worked on JavaScript to add dynamic content to pages; utilized CSS for the front end.
  • Worked on the modernization of a legacy, outsourced UI. Technologies used were Backbone.js and Node.js.
  • Built and deployed Java applications using an MVC architecture with Struts 2; designed and developed Servlets and JSPs for the controller and view layers respectively, where Servlets processed requests and transferred control to the appropriate JSP.
  • Worked on developing the controller layer using the MVC Model 2 framework. Enhanced application performance by introducing multithreading using the thread-state model and priority-based thread scheduling in Java.
  • Used stored procedures and database triggers at all levels. Communicated across the team about processes, goals, guidelines, and delivery of items.
  • Used iBATIS to populate data from the database. Used various design patterns such as Singleton, Facade, Command, Factory, and DAO.
  • Used Object-Oriented Analysis and Design (OOA/D) for deriving objects and classes. Retrieved data from the back-end database using a DataSource with JDBC drivers.

Environment: J2EE (Java 1.4, JSP, Servlets), Eclipse, MS SQL Server, T-SQL, Struts Framework, WebLogic, Tomcat Web Server, XML, JDBC, JNDI, ANT, Windows XP, JavaScript, UML, Hortonworks Hadoop Distribution
