
Big Data/Hadoop Solution Architect/Sr. Developer Consultant Resume

SUMMARY

  • Skilled in Big Data/Hadoop projects like HDFS, MapReduce, HBase, ZooKeeper, Oozie, Hive, Pig, Flume, Phoenix, Storm, Kafka, Spark, Shark, SparkSQL, Spark Streaming
  • Hadoop cluster design, installation and configuration on Azure, AWS EC2 and on-premise clusters using RHEL 6.5 and CentOS 6
  • Skilled and experienced in designing search solutions using Solr and SiLK
  • Experience using data visualization/BI tools such as Tibco Spotfire, Jaspersoft, Google Visualization, Tableau and Pentaho
  • Experience handling different file formats: Parquet, Protobuf (Protocol Buffers), Apache Avro, SequenceFile, JSON, XML and flat files.
  • Skilled in Hadoop administration: installing, configuring, tuning, managing and designing Hadoop clusters.
  • Skilled in working with HDP 2.2, Ambari 1.7.0 and Cloudera Manager (CDH 4 & 5)
  • Good understanding of analytic tools like Tableau, Jaspersoft and Pentaho
  • Skilled in Object Oriented Analysis and Design (UML using Rational Rose), database analysis and design (ERWIN and Oracle Designer), and web application development using Java, J2EE, JSPs, Servlets, AJAX, JavaScript, Ext JS, JSON, XML and HTML.
  • Skilled in web technologies like Groovy, Grails, GORM, Spring Security
  • Confidential HDP 2.2 cluster design, installation and configuration.
  • Big Data/Hadoop projects like HDFS, MapReduce, HBase, ZooKeeper, Oozie, Hive, Pig, Flume, Phoenix, Storm, Kafka, Spark, Shark, SparkSQL, Spark Streaming, Solr
  • HBase data modeling and rowkey design to accommodate heavy reads and writes and avoid region hotspotting.
  • Data visualization using Tibco Spotfire, Jaspersoft, Tableau, Pentaho and Google Visualization APIs
  • Groovy on Grails 2.2, Spring MVC, MyBatis 3, Struts 1.3, Hibernate, Ext JS, and J2EE design patterns.
  • Build scripts using Ant, Maven.
  • Testing tools like JProbe 6.0 and Mercury’s Test Director and Load Runner.
  • Scrum, RUP, Waterfall, and Agile methodology.
  • Mobile application development using Android 2.2 on Eclipse.
  • Mobile web application using Sencha Touch 1.0

TECHNICAL SKILLS

Big Data - Hadoop: MapReduce, Storm, Kafka, Spark, Spark Streaming, HDFS, HBase, Phoenix, Hive, Impala, Flume, ZooKeeper, Oozie, Avro, Parquet, Protobuf (Protocol Buffers), Sqoop

Hadoop Distributions: Confidential HDP 2.1, 2.2, Cloudera 4 & 5 on Azure, AWS and on-premise setup

Search Technologies: Solr, SiLK, ElasticSearch, Kibana

Hadoop Security: Knox, Ranger, AD (Active Directory), Kerberos, LDAP, Encryption in flight and at rest, Centrify, SED (Self Encrypted Disks)

Languages/Scripting: Java 1.7, Scala, Groovy 2, Perl, Shell Scripts, PL/SQL, Phoenix

Database: HBase, Phoenix, Cassandra, MongoDB, PostgreSQL, Oracle, MySQL, MS SQL Server, RRD, Actian Vortex

Frameworks: Spring, Spring MVC, Grails, Hibernate, Struts

Web Technologies: JSP, Servlets, J2EE, Ext JS, AJAX, JSON, XML, ChartDirector, EJB 3.0, JDBC, JNDI, XSL, Web Services, iBatis/MyBatis, Hibernate, JUnit, WSDL, SOAP, Spring Security.

Business Intelligence: Tibco Spotfire, Jaspersoft, Google Visualization, Tableau, Pentaho, QlikView

Operating Systems: Linux, FreeBSD, Ubuntu

Application Servers: Jakarta Tomcat, WebLogic, WebSphere

Design Tools: Rational Rose, MS Visio

Tools: IntelliJ IDEA, Eclipse, Tableau, Pentaho, Tibco Spotfire, Jaspersoft, Talend, QlikView

Version Control Systems: SVN (Subversion), Git, CVS

System Testing Tools: JProbe, Mercury Load Runner, JUnit

PROFESSIONAL EXPERIENCE

Confidential

Big Data/Hadoop Solution Architect/Sr. Developer Consultant

Responsibilities:

  • Reviewed the Hadoop cluster design in the current environment to manage resource utilization for different groups. The requirement was to accommodate multiple tenants and allocate resources based on priority and usage; the recommendation was to use hierarchical queues with the YARN Capacity Scheduler.
  • Conducted interviews for environment requirements and mapped them back to the business requirements. The recommendation was to have Dev, Test and DR (cold/warm) clusters; the DR cluster recommendations were based on enterprise DR policies.
  • Reviewed the existing data lake, captured both functional and non-functional requirements, and provided recommendations on:
  • A data ingestion framework to handle real-time and batch payloads, with the following features:
  • Source profile registration
  • Schema management
  • Data quality checks
  • Audit data pipeline/ingestion activity
  • Job scheduling and coordination
  • Ingestion UI and management console.
  • Notification services for open communication channel between registered applications
  • Support for REST APIs
  • Data security recommendations covering administration, authentication, authorization and data encryption, introducing tools such as Knox and Ranger and integrating with Kerberos and enterprise-level Active Directory.
  • Data governance requirements analysis, recommending tools such as Falcon and Atlas. This includes recommendations on:
  • Defining data security and data access policies
  • Tracking quality at every point in the business process
  • Tracking data at every point of its use during its lifecycle
  • Data delivery, presentation and usability.

Technology: Confidential HDP 2.3 - Hadoop, MS Excel, Hive, HDFS, WebHDFS, WebHCat, ZooKeeper, Java, Shell Scripting

Confidential

Big Data/Hadoop Solution Architect/Sr. Developer Consultant

Responsibilities:

  • Hadoop cluster design: Architected and designed an HDP 2.2.6 cluster; initiated and executed installation in an Azure environment.
  • TLog index and search
  • Installed and configured Solr 4.10.2 in cloud mode.
  • Configured to store indexes and data into HDFS.
  • Transformed and validated TLog data using the xsltproc and xmllint utilities
  • Loaded XML files into Solr using SimplePostTool
  • Designed search queries over REST and integrated them with a .NET search application (see the query sketch after this list).
  • WebLog Analytics
  • Real-time log streaming into HDFS using Flume NG
  • Created Hive tables on log data
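
A minimal sketch in Java of the kind of REST query used against the TLog index, assuming a Solr 4.x core queried over its select handler; the host, core name ("tlog") and field names are placeholders, and a production client would parse the JSON response and add error handling.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

/** Minimal sketch of querying a Solr 4.x core over its REST select handler. */
public class TlogSearchClient {

    public static String search(String solrBaseUrl, String core, String query) throws Exception {
        // Build a standard /select request; wt=json asks Solr for a JSON response.
        String url = solrBaseUrl + "/" + core + "/select?q="
                + URLEncoder.encode(query, "UTF-8")
                + "&rows=20&wt=json";

        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");

        StringBuilder body = new StringBuilder();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line);
            }
        }
        return body.toString();   // raw JSON; a real client would parse docs/facets
    }

    public static void main(String[] args) throws Exception {
        // Host, core and field names are placeholders for illustration only.
        System.out.println(search("http://solr-host:8983/solr", "tlog",
                "storeId:1234 AND txnDate:[NOW-7DAYS TO NOW]"));
    }
}
```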

Technology: HDP 2.2.6 - Hadoop, MS Excel, Hive, HDFS, WebHDFS, WebHCat, ZooKeeper, Java, Shell Scripting

Confidential

Big Data/Hadoop Solution Architect/Sr. Developer Consultant

Responsibilities:

  • Hadoop cluster design: Architected and designed an HDP 2.2.4 cluster; initiated and executed installation in an Azure environment.
  • Designed a solution for loading and transforming data using WebHDFS, Hive on Tez, WebHCat and Pig.
  • Loaded policy and fund files produced by model runs on an HPC (MS High Performance Computing) cluster into HDFS using the WebHDFS REST APIs (see the upload sketch after this list).
  • Created Hive external STAGE tables and generated dynamic partitions on the loaded files.
  • Moved STAGE data into an ORC, Snappy-compressed Hive managed table (see the HiveQL sketch after this list).
  • Transformed and processed data using HiveQL on Tez.
  • PowerShell scripting to execute WebHDFS and WebHCat REST APIs.
  • Data visualization using QlikView: Integrated QlikView and Excel with Hive over ODBC and with Phoenix using an ODBC-JDBC bridge solution.
  • Security: Installed and configured Knox and Ranger. Hive tables are secured using ACLs. Worked on integrating HDP with the existing AD using Centrify.
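
A minimal sketch of the two-step WebHDFS CREATE call used for the file loads above, in Java; the NameNode host/port, HDFS path, local file name and user are placeholders for illustration.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

/** Minimal sketch of the two-step WebHDFS CREATE call used to push model-output files into HDFS. */
public class WebHdfsUploader {

    public static void upload(String nameNodeUrl, String hdfsPath, String localFile, String user) throws Exception {
        // Step 1: ask the NameNode where to write; it answers with a 307 redirect to a DataNode.
        URL createUrl = new URL(nameNodeUrl + "/webhdfs/v1" + hdfsPath
                + "?op=CREATE&overwrite=true&user.name=" + user);
        HttpURLConnection nn = (HttpURLConnection) createUrl.openConnection();
        nn.setRequestMethod("PUT");
        nn.setInstanceFollowRedirects(false);          // we want the Location header, not auto-follow
        String dataNodeLocation = nn.getHeaderField("Location");
        nn.disconnect();

        // Step 2: stream the file bytes to the DataNode URL returned above.
        HttpURLConnection dn = (HttpURLConnection) new URL(dataNodeLocation).openConnection();
        dn.setRequestMethod("PUT");
        dn.setDoOutput(true);
        try (InputStream in = new FileInputStream(localFile); OutputStream out = dn.getOutputStream()) {
            byte[] buf = new byte[64 * 1024];
            int n;
            while ((n = in.read(buf)) > 0) {
                out.write(buf, 0, n);
            }
        }
        System.out.println("HTTP " + dn.getResponseCode());   // 201 Created on success
    }

    public static void main(String[] args) throws Exception {
        // Host, path and user are placeholders for illustration only.
        upload("http://namenode-host:50070", "/data/stage/policy/policy_run.csv",
               "policy_run.csv", "hdfsuser");
    }
}
```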
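
And a minimal HiveQL-over-JDBC sketch of the STAGE-to-ORC movement with dynamic partitions; the HiveServer2 host, database, table and column names are assumptions for illustration, not the actual schema.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

/** Minimal sketch: move STAGE data into an ORC/Snappy managed table with dynamic partitions over Hive JDBC. */
public class StageToOrcLoader {

    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // HiveServer2 host, database, table and column names are placeholders only.
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://hive-host:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {

            // External STAGE table pointing at the files loaded through WebHDFS.
            stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS policy_stage ("
                    + " policy_id STRING, fund_id STRING, value DOUBLE, run_date STRING)"
                    + " ROW FORMAT DELIMITED FIELDS TERMINATED BY ','"
                    + " LOCATION '/data/stage/policy'");

            // Managed target table stored as ORC with Snappy compression, partitioned by run date.
            stmt.execute("CREATE TABLE IF NOT EXISTS policy ("
                    + " policy_id STRING, fund_id STRING, value DOUBLE)"
                    + " PARTITIONED BY (run_date STRING)"
                    + " STORED AS ORC TBLPROPERTIES ('orc.compress'='SNAPPY')");

            // Allow dynamic partitioning, then move STAGE rows into the ORC table on Tez.
            stmt.execute("SET hive.exec.dynamic.partition.mode=nonstrict");
            stmt.execute("SET hive.execution.engine=tez");
            stmt.execute("INSERT OVERWRITE TABLE policy PARTITION (run_date)"
                    + " SELECT policy_id, fund_id, value, run_date FROM policy_stage");
        }
    }
}
```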

Technology: Hortonworks HDP 2.2.4 - Hadoop, QlikView, MS Excel, Hive, Pig, HDFS, WebHDFS, WebHCat, ZooKeeper, Knox, Ranger, Java, Spark, Shell Scripting, Centrify, Actian

Confidential, Houston TX

Big Data/Hadoop Solution Architect/Sr. Developer Consultant

Responsibilities:

  • Hadoop cluster design: Architected and designed an HDP 2.0 cluster; initiated and executed installation and configuration on AWS and in an on-premise setup.
  • ETL: Developed an ETL solution using Talend big data integration.
  • Real-time event processing, data analytics and monitoring for high rate of penetration.
  • HBase data modeling using Phoenix SQL: Designed HBase tables with Phoenix for time-series and depth-sensor data. Used salted buckets to distribute data evenly across region servers. Used immutable and secondary local indexes for access patterns on column qualifiers that are not part of the primary key (see the Phoenix sketch after this list).
  • Data visualization using Tibco Spotfire: Integrated Spotfire with Hive over ODBC and with Phoenix using an ODBC-JDBC bridge solution.
  • Jaspersoft reports and charts: Integrated Jaspersoft using the Phoenix JDBC driver and generated charts and reports.
  • Security: Installed and configured Knox and Ranger.
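
A minimal sketch of the Phoenix DDL pattern described above (salted, immutable table plus a local secondary index), run over the Phoenix JDBC driver; the ZooKeeper quorum, table and column names are placeholders for illustration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

/** Minimal sketch of a salted, immutable Phoenix table with a local secondary index. */
public class SensorPhoenixSchema {

    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");
        // ZooKeeper quorum, table and column names are placeholders only.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             Statement stmt = conn.createStatement()) {

            // SALT_BUCKETS prepends a hash byte to the rowkey so writes spread evenly across region servers;
            // IMMUTABLE_ROWS=true suits append-only time-series data and keeps index maintenance cheap.
            stmt.execute("CREATE TABLE IF NOT EXISTS SENSOR_READING ("
                    + " WELL_ID VARCHAR NOT NULL,"
                    + " TS DATE NOT NULL,"
                    + " DEPTH DOUBLE,"
                    + " ROP DOUBLE,"
                    + " CONSTRAINT PK PRIMARY KEY (WELL_ID, TS))"
                    + " SALT_BUCKETS = 16, IMMUTABLE_ROWS = true");

            // Local secondary index for queries on a column that is not part of the primary key.
            stmt.execute("CREATE LOCAL INDEX IF NOT EXISTS IDX_DEPTH ON SENSOR_READING (DEPTH)");

            // Upserts go through the same connection; Phoenix defaults to autoCommit=false.
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPSERT INTO SENSOR_READING (WELL_ID, TS, DEPTH, ROP) VALUES (?, ?, ?, ?)")) {
                ps.setString(1, "WELL-042");
                ps.setDate(2, new java.sql.Date(System.currentTimeMillis()));
                ps.setDouble(3, 10432.5);
                ps.setDouble(4, 85.2);
                ps.executeUpdate();
            }
            conn.commit();
        }
    }
}
```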

Technology: Hortonworks HDP 2.2 - Hadoop, Kafka, Storm, HBase, Phoenix, HDFS, ZooKeeper, Knox, Ranger, Java, Spark, Shell Scripting, Tibco Spotfire, Jaspersoft, JSON, Protobuf/Protocol Buffers, JUnit

Confidential

Big Data/Hadoop Solution Architect/Sr. Developer Consultant

Responsibilities:

  • HBase data modeling and rowkey design: Designed HBase tables for time-series data. Designed the rowkey to avoid region hotspotting and accommodate the desired read access/query patterns; used FuzzyRowFilter for fast key searches across HBase regions (see the scan sketch after this list).
  • HBase using Phoenix SQL: Designed HBase tables with Phoenix for time-series and depth-sensor data. Used salted buckets to distribute data evenly across region servers. Used immutable and secondary local indexes for access patterns on column qualifiers that are not part of the primary key.
  • Real-time data streaming: Designed and developed a solution for ESB pump real-time data ingestion using Kafka, Storm and HBase. This involved Kafka and Storm cluster design, installation and configuration.
  • Real-time event monitoring: Developed a Storm monitoring bolt for validating pump tag values against high/low and high-high/low-low thresholds from preloaded metadata (see the bolt sketch after this list).
  • Utilities: Developed a utility for loading the pump tag metadata used for warning and error generation.
  • Benchmarked a Kafka producer at 1 million messages/second: Designed and configured a Kafka cluster to accommodate a heavy throughput of 1 million messages per second. Used the Kafka producer 0.8.3 APIs to produce messages (see the producer sketch after this list).
  • Installed and configured Phoenix on HDP 2.1. Created views over HBase tables and used SQL queries to retrieve alerts and metadata.
  • Data query and visualization: Used the HBase APIs to get and scan event data stored in HBase.
  • Implemented the Douglas-Peucker decimation algorithm to reduce data size for a given epsilon (see the sketch after this list).
  • Implemented a 9-point smoothing algorithm to smooth metric values and generate a smoother pattern for visualization.
  • Predictive analytics: Designed an application for predictive analytics on the maintenance and life cycle of high-tech, sophisticated equipment.
  • Hadoop cluster design: Cluster design, installation and configuration using the HDP 2.1 stack on Azure, HDP 2.2 on AWS and HDP 2.2 in an on-premise setup.
  • Security: Installed and configured Knox and Ranger.
  • Architected and designed big data solutions using the HDP 2.2 Hadoop stack.
  • Developed a Kafka producer to produce 1 million messages per second.
  • Developed a Kafka spout plus HBase and enrichment bolts for data ingestion into HBase using the HBase client APIs and the Phoenix SQL skin.
  • Designed, installed and configured an HDP 2.2 Hadoop cluster along with Knox and Ranger for security requirements; configured the Kafka and Storm clusters to handle the load and optimized them for the desired throughput.
  • Configured security using Knox and Ranger.
  • Cluster design for HDP 2.2 on AWS, Azure and on-premise setup.
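
A minimal sketch of a FuzzyRowFilter scan of the kind described above, using the HBase 1.x client API; the ZooKeeper quorum, table name and the assumed 16-byte rowkey layout are illustrative only.

```java
import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.FuzzyRowFilter;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.Pair;

/** Minimal sketch of a FuzzyRowFilter scan over a time-series events table. */
public class EventScanner {

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "zk-host");   // placeholder quorum

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table events = connection.getTable(TableName.valueOf("events"))) {

            // Assumed rowkey layout: 4-byte salt/sensor prefix + 8-byte timestamp + 4-byte tag id.
            // Fix the tag id portion and let the rest vary: mask byte 0 = must match, 1 = wildcard.
            byte[] fuzzyKey  = new byte[16];
            byte[] fuzzyMask = new byte[16];
            Arrays.fill(fuzzyMask, 0, 12, (byte) 1);                 // ignore prefix + timestamp
            System.arraycopy(Bytes.toBytes(42), 0, fuzzyKey, 12, 4); // tag id 42 must match

            Scan scan = new Scan();
            scan.setFilter(new FuzzyRowFilter(
                    Arrays.asList(new Pair<byte[], byte[]>(fuzzyKey, fuzzyMask))));

            try (ResultScanner scanner = events.getScanner(scan)) {
                for (Result row : scanner) {
                    System.out.println(Bytes.toStringBinary(row.getRow()));
                }
            }
        }
    }
}
```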
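
A minimal sketch of a high-throughput Kafka producer along the lines of the benchmark above, using the newer org.apache.kafka.clients producer API; the broker list, topic name, payload and tuning values are placeholders.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

/** Minimal sketch of a Kafka producer tuned for throughput (large batches, small linger, compression). */
public class PumpEventProducer {

    public static void main(String[] args) {
        Properties props = new Properties();
        // Broker list, topic and tuning values are placeholders for illustration only.
        props.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092");
        props.put("acks", "1");                       // trade durability for throughput
        props.put("batch.size", "65536");             // large batches amortize request overhead
        props.put("linger.ms", "5");                  // wait a few ms to fill batches
        props.put("compression.type", "snappy");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            for (long i = 0; i < 1_000_000L; i++) {
                byte[] payload = ("{\"tag\":\"PUMP-7\",\"value\":" + (i % 100) + "}").getBytes();
                // Fire-and-forget send; the producer batches and sends asynchronously.
                producer.send(new ProducerRecord<>("pump-events", "PUMP-7", payload));
            }
            producer.flush();
        }
    }
}
```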
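
A minimal sketch of the threshold-validation bolt described above, against the Storm 0.9.x (backtype.storm) API; the tuple field names and the threshold map passed to the constructor are assumptions standing in for the preloaded tag metadata.

```java
import java.util.HashMap;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

/** Minimal sketch of a bolt that validates pump tag values against preloaded thresholds. */
public class TagThresholdBolt extends BaseBasicBolt {

    // tag name -> {lowLow, low, high, highHigh}; assumed to be built by the metadata-loading
    // utility before topology submission and shipped with the serialized bolt.
    private final HashMap<String, double[]> thresholds;

    public TagThresholdBolt(HashMap<String, double[]> thresholds) {
        this.thresholds = thresholds;
    }

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        String tag = input.getStringByField("tag");      // assumed tuple field names
        double value = input.getDoubleByField("value");
        double[] t = thresholds.get(tag);
        if (t == null) {
            return;                                       // unknown tag: nothing to validate
        }
        if (value <= t[0] || value >= t[3]) {
            collector.emit(new Values(tag, value, "ERROR"));    // low-low / high-high breach
        } else if (value <= t[1] || value >= t[2]) {
            collector.emit(new Values(tag, value, "WARNING"));  // low / high breach
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("tag", "value", "severity"));
    }
}
```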
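
A minimal sketch of the decimation and 9-point smoothing utilities mentioned above; points are assumed to be (x, y) pairs, and the smoothing window simply shrinks near the ends of the series.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal sketch of Douglas-Peucker decimation and 9-point smoothing for visualization prep. */
public class SeriesReducer {

    /** Recursive Douglas-Peucker: keep only points farther than epsilon from the current chord. */
    public static List<double[]> decimate(List<double[]> pts, double epsilon) {
        if (pts.size() < 3) {
            return new ArrayList<>(pts);
        }
        int index = -1;
        double maxDist = 0;
        double[] first = pts.get(0), last = pts.get(pts.size() - 1);
        for (int i = 1; i < pts.size() - 1; i++) {
            double d = perpendicularDistance(pts.get(i), first, last);
            if (d > maxDist) {
                maxDist = d;
                index = i;
            }
        }
        List<double[]> result = new ArrayList<>();
        if (maxDist > epsilon) {
            // Split at the farthest point and simplify both halves.
            List<double[]> left = decimate(pts.subList(0, index + 1), epsilon);
            List<double[]> right = decimate(pts.subList(index, pts.size()), epsilon);
            result.addAll(left.subList(0, left.size() - 1));   // drop duplicated split point
            result.addAll(right);
        } else {
            result.add(first);
            result.add(last);
        }
        return result;
    }

    private static double perpendicularDistance(double[] p, double[] a, double[] b) {
        double dx = b[0] - a[0], dy = b[1] - a[1];
        double len = Math.hypot(dx, dy);
        if (len == 0) {
            return Math.hypot(p[0] - a[0], p[1] - a[1]);
        }
        return Math.abs(dy * p[0] - dx * p[1] + b[0] * a[1] - b[1] * a[0]) / len;
    }

    /** 9-point centered moving average; the window shrinks near the ends of the series. */
    public static double[] smooth9(double[] values) {
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            int from = Math.max(0, i - 4), to = Math.min(values.length - 1, i + 4);
            double sum = 0;
            for (int j = from; j <= to; j++) {
                sum += values[j];
            }
            out[i] = sum / (to - from + 1);
        }
        return out;
    }
}
```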

Technology: Hortonworks HDP 2.2 - Hadoop, Kafka, Storm, HBase, Phoenix, HDFS, ZooKeeper, Knox, Ranger, Java, Spark, Shell Scripting, JSON, Protobuf/Protocol Buffers, JUnit

Confidential

Big Data/Hadoop Consultant - Sr. Developer & Solution Architect

Responsibilities:

  • Architected and designed big data solutions for a new system on the Cloudera 5 Hadoop ecosystem. Used Cloudera Manager and Hue for design and development.
  • Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
  • Developed Storm spout and bolts for data ingestion to HBase and Hive.
  • Integrated JUnit, MRUnit, Bamboo and Cobertura code coverage
  • Big Data - Analytics: This team focused on processing web application events and web logs. It also processed transactional data from a PostgreSQL RDBMS. Web application logs are used for crash reporting and flow tracking.
  • Configured a Flume streaming agent to load events (JSON format) in real time from log files into HBase.
  • Used Sqoop to import transactional data from PostgreSQL into Hive.
  • Worked on a MapReduce job for converting HBase JSON data into tuple format on HDFS.
  • A map-only job is scheduled to filter, sort and aggregate events from HBase to HDFS. This produces a flat file of more fine-grained data, which is then piped into another MapReduce job to generate data for Cognos analytical reports (see the map-only sketch after this list).
  • MapReduce jobs are scheduled using the Oozie workflow and coordinator engine to use data from steps 2 and 4 and produce data for report generation.
  • Debugged MapReduce jobs using job history logs and task syslogs. Used the Log4j logging APIs and Counters for debugging failed jobs.
  • Created Hive managed and external tables, UDFs and HiveQL (see the UDF sketch after this list).
  • Generated MRUnit tests for MapReduce jobs (see the test sketch after this list).
  • Supported production jobs as and when needed.
  • Fine-tuned and enhanced the performance of MapReduce jobs.
  • Web application: This application was developed using Groovy, Grails, GORM, RESTful web services and GSP. Worked on modeling using GORM, designing and developing RESTful web services, the admin console and security (authentication and authorization) using Spring Security.
  • Reviewed and estimated Scrum user stories and created tasks in JIRA. Performed analysis and design using UML.
  • Involved in the design of Big Data/Hadoop components such as HBase, Hive and Oozie.
  • Developed Big Data/Hadoop MapReduce jobs along with HDFS, HBase, Hive and Flume projects.
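
A minimal sketch of a map-only job that scans HBase and writes flat records to HDFS, as in the pipeline above; the table name, column family/qualifier and output layout are assumptions for illustration.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

/** Minimal sketch of a map-only job that reads events from HBase and writes flat records to HDFS. */
public class EventExportJob {

    public static class EventMapper extends TableMapper<NullWritable, Text> {
        @Override
        protected void map(ImmutableBytesWritable key, Result row, Context context)
                throws IOException, InterruptedException {
            byte[] json = row.getValue(Bytes.toBytes("d"), Bytes.toBytes("event"));  // placeholder cf:qualifier
            if (json == null) {
                return;                                    // filter: skip rows without an event payload
            }
            // Emit a tab-separated (rowkey, payload) record for the downstream report job.
            context.write(NullWritable.get(),
                    new Text(Bytes.toStringBinary(key.get()) + "\t" + Bytes.toString(json)));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "event-export");
        job.setJarByClass(EventExportJob.class);

        Scan scan = new Scan();
        scan.setCaching(500);
        scan.setCacheBlocks(false);                        // recommended for MapReduce scans

        TableMapReduceUtil.initTableMapperJob("events", scan,
                EventMapper.class, NullWritable.class, Text.class, job);
        job.setNumReduceTasks(0);                          // map-only: mapper output goes straight to HDFS
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileOutputFormat.setOutputPath(job, new Path(args[0]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```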
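
A minimal sketch of a simple Hive UDF of the kind referenced above; the function name and normalization logic are illustrative only. It would be registered with ADD JAR and CREATE TEMPORARY FUNCTION before use in HiveQL.

```java
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/** Minimal sketch of a simple Hive UDF registered alongside the managed/external tables. */
@Description(name = "normalize_event_type",
             value = "_FUNC_(str) - lower-cases and trims an event type string")
public class NormalizeEventTypeUdf extends UDF {

    // Hive calls evaluate() once per row; returning null propagates SQL NULL.
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```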
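
A minimal MRUnit sketch for a mapper test as described above; EventFilterMapper, its input format and the expected output record are hypothetical stand-ins for the real job.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

/** Minimal sketch of an MRUnit test; the mapper below is a hypothetical stand-in for the real one. */
public class EventFilterMapperTest {

    /** Hypothetical mapper: keeps only "crash" events, keyed by session id. */
    public static class EventFilterMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String line = value.toString();                 // naive JSON field extraction, illustration only
            if (line.contains("\"type\":\"crash\"")) {
                String session = line.replaceAll(".*\"session\":\"([^\"]+)\".*", "$1");
                String ts = line.replaceAll(".*\"ts\":(\\d+).*", "$1");
                ctx.write(new Text(session), new Text("crash\t" + ts));
            }
        }
    }

    private MapDriver<LongWritable, Text, Text, Text> mapDriver;

    @Before
    public void setUp() {
        mapDriver = MapDriver.newMapDriver(new EventFilterMapper());
    }

    @Test
    public void emitsCrashEventsKeyedBySession() throws IOException {
        mapDriver
            .withInput(new LongWritable(0),
                       new Text("{\"session\":\"abc\",\"type\":\"crash\",\"ts\":1408500000}"))
            .withOutput(new Text("abc"),
                        new Text("crash\t1408500000"))
            .runTest();   // fails the JUnit test if actual output differs from the expected record
    }
}
```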
