Big Data/Hadoop Solution Architect/Sr. Developer Consultant Resume
Milwaukee, WI
SUMMARY:
- An experienced Big Data - Hadoop Architect/Sr. Developer with varying levels of expertise across Hadoop projects such as HBase, Phoenix, Storm, Kafka, Hive, Flume, ZooKeeper, Oozie, Pig and Ranger. Good experience writing Talend, Spark, SparkSQL and Hadoop MapReduce jobs.
- Over 14 years of experience in Oil and Natural Gas, Telecom, Security, Healthcare, Insurance, Finance and Retail domains.
- Creative and innovative; proficient in verbal and written communication; can initiate work and play a pivotal role in a team; quick learner, always eager to learn new technologies.
- Currently assisting multiple customers in the Oil and Natural Gas industry with their big data efforts, rolling out big data solutions that transform, load and process large, diverse datasets using NoSQL and the Hadoop ecosystem.
- Skilled in Big Data/Hadoop projects like HDFS, MapReduce, HBase, ZooKeeper, Oozie, Hive, Pig, Flume, Phoenix, Storm, Kafka, Spark, Shark, SparkSQL, Spark Streaming
- Hadoop cluster design, installation and configuration on Azure, AWS EC2 and on-premise clusters using RHEL 6.5 and CentOS 6
- Skilled and experienced in designing search solutions using Solr and SiLK
- Experience using data visualization/ BI tools like Tibco Spotfire, Jaspersoft, Google Visualization, Tableau, Pentaho
- Handling different file formats such as Parquet, Protobuf (Protocol Buffers), Apache Avro, SequenceFile, JSON, XML and flat files.
- Skilled in Hadoop administration: installing, configuring, tuning, managing and designing Hadoop clusters.
- Skilled working with HDP 2.2, Ambari 1.7.0, Cloudera Manager CDM- 4 & 5
- Good understanding of analytic tools like Tableau, Jaspersoft and Pentaho
- Skilled in Object Oriented Analysis and Design (UML using Rational Rose), database analysis and design (ERWIN and Oracle Designer), and web application development using Java, J2EE, JSPs, Servlets, AJAX, JavaScript, Ext JS, JSON, XML and HTML.
- Skilled in web technologies like Groovy, Grails, GORM, Spring Security
- Experienced in:
- Confidential HDP 2.2 cluster design, installation and configuration.
- Big Data/Hadoop projects like HDFS, MapReduce, HBase, ZooKeeper, Oozie, Hive, Pig, Flume, Phoenix, Storm, Kafka, Spark, Shark, SparkSQL, Spark Streaming, Solr
- HBase data modeling and rowkey design to accommodate heavy reads and writes and avoid region hotspotting.
- Data visualization using Tibco Spotfire, Jaspersoft, Tableau, Pentaho and Google Visualization APIs
- Groovy on Grails 2.2, Spring MVC, MyBatis 3, Struts 1.3, Hibernate, Ext JS, and J2EE design patterns.
- Build scripts using Ant, Maven.
- Testing tools like JProbe 6.0 and Mercury’s TestDirector and LoadRunner.
- Scrum, RUP, Waterfall, and Agile methodology.
- Mobile application development using Android 2.2 on Eclipse.
- Mobile web application using Sencha Touch 1.0
AREAS OF PROFICIENCY:
- Architect and design Big Data/Hadoop solutions on projects like HDFS, MapReduce, Storm, Kafka, HBase, Phoenix, Hive, Pig, Flume, Spark, Shark, SparkSQL, Spark Streaming on Azure, AWS and on-premise Hadoop clusters.
- Web Application and Mobile Web Application software engineering architecture, design and development.
- Object Oriented Analysis and Design.
- UML based designing with Rational Rose
- 3/n Tier Business Logic Implementation
- Scrum Agile Software Development.
TECHNICAL SKILLS:
Big Data - Hadoop: MapReduce, Storm, Kafka, Spark, Spark Streaming, HDFS, HBase, Phoenix, Hive, Impala, Flume, ZooKeeper, Oozie, Avro, Parquet, Protobuf (Protocol Buffers), Sqoop
Hadoop Distributions: Confidential HDP 2.1, 2.2, Cloudera 4 & 5 on Azure, AWS and on-premise setup
Search Technologies: Solr, SiLK, ElasticSearch, Kibana
Hadoop Security: Knox, Ranger, AD (Active Directory), Kerberos, LDAP, Encryption in flight and at rest, Centrify, SED (Self Encrypted Disks)
Languages/Scripting: Java 1.7, Scala, Groovy 2, Perl, Shell Scripts, PL/SQL, Phoenix
Database: HBase, Phoenix, Cassandra, MongoDB, PostgreSQL, Oracle, MySQL, MS SQL Server, RRD, Actian Vortex
Frameworks: Spring, Spring MVC, Grails, Hibernate, Struts
Web Technologies: JSP, Servlets, J2EE, Ext JS, AJAX, JSON, XML, ChartDirector, EJB 3.0, JDBC, JNDI, XSL, Web Services, iBatis/MyBatis, Hibernate, JUnit, WSDL, SOAP, Spring Security.
Business Intelligence: Tibco Spotfire, Jaspersoft, Google Visualization, Tableau, Pentaho, QlikView
Operating Systems: Linux, FreeBSD, Ubuntu
Application Servers: Jakarta Tomcat, WebLogic, WebSphere
Design Tools: Rational Rose, MS Visio
Tools: IntelliJ IDEA, Eclipse, Tableau, Pentaho, Tibco Spotfire, Jaspersoft, Talend, QlikView
Version Control Systems: SVN (Subversion), Git, CVS
System Testing Tools: JProbe, Mercury LoadRunner, JUnit
PROFESSIONAL EXPERIENCE:
Confidential, Milwaukee, WI
Big Data/Hadoop Solution Architect/Sr. Developer Consultant
Responsibilities:
- Reviewed the Hadoop cluster design in the current environment to manage resource utilization for different groups. The requirement was to accommodate multiple tenants and allocate resources based on priority and usage. The recommendation was to use hierarchical queues with the YARN Capacity Scheduler.
- Conducted interviews for environment requirements and mapped them back to the business requirements. The recommendations were to have Dev, Test and DR (cold/warm) clusters. The DR cluster recommendations were based on enterprise DR policies.
- Reviewed the existing data lake, captured both functional and non-functional requirements, and provided recommendations on:
- Data ingestion framework to handle real-time and batch payloads. The framework's features include:
- Source profile registration
- Schema management
- Data quality checks
- Audit data pipeline/ingestion activity
- Job scheduling and coordination
- Ingestion UI and management console.
- Notification services for open communication channel between registered applications
- REST API support
- Data security recommendations covering administration, authentication, authorization and data encryption, introducing tools like Knox and Ranger and integrating with Kerberos and enterprise-level Active Directory.
- Analyzed data governance requirements and recommended tools like Falcon and Atlas. This includes recommendations on:
- Define data security and data access policies
- Tracking quality at every point in the business process
- Tracking data at every point of its use during its lifecycle
- Data delivery, presentation and usability.
Operating System: Red Hat Linux
Technology: Confidential HDP 2.3 - Hadoop, MS Excel, Hive, HDFS, WebHDFS, WebHCat, ZooKeeper, Java, Shell Scripting
Tools: Ambari 2.1, IntelliJ IDEA, Git, MS Excel, MS Word, vi, vim, PuTTY, ORC file formats.
Confidential, Oklahoma City
Big Data/Hadoop Solution Architect/Sr. Developer Consultant
Responsibilities:
- Hadoop Cluster design: Architected and designed HDP 2.2.6 cluster, initiated and executed installation in Azure environment.
- TLog index and search
- Installed and configured Solr 4.10.2 in cloud mode.
- Configured to store indexes and data into HDFS.
- Transform and validate tlog data using xsltproc and xmllint utilities
- Load xml files into Solr using SimplePostTool
- Designed search queries over REST and integrated them with a .NET search application.
- WebLog Analytics
- Real-time log streaming into HDFS using Flume NG
- Created Hive tables on log data
Operating System: CentOS 6.6
Technology: Confidential HDP 2.2.6 - Hadoop, MS Excel, Hive, HDFS, WebHDFS, WebHCat, ZooKeeper, Java, Shell Scripting
Tools: Ambari 2.1, HDP 2.2.6, IntelliJ IDEA, Git, MS Excel, MS Word, vi, vim, PuTTY, ORC file formats.
Confidential, Raleigh NC
Big Data/Hadoop Solution Architect/Sr. Developer Consultant
Responsibilities:
- Hadoop Cluster design: Architected and designed HDP 2.2.4 cluster, initiated and executed installation in Azure environment.
- Designed a solution for loading and transforming data using WebHDFS, Hive on Tez, WebHCat and Pig.
- Loaded policy and funds files produced by model runs on an HPC (MS High Performance Computing) cluster into HDFS using WebHDFS REST APIs
- Created Hive external STAGE tables and generated dynamic partitions on loaded files.
- Transform and process data using Hive QL on Tez.
- PowerShell scripting to execute WebHDFS and WebHCat REST APIs
- Data visualization using QlikView: Integrated QlikView and Excel with Hive over ODBC and Phoenix using ODBC-JDBC bridge solution.
- Security: Installed and configured Knox and Ranger. Hive tables are secured using ACLs. Working on integration of HDP with the existing AD using Centrify.
Operating System: CentOS 6.6
Technology: Hortonworks HDP 2.2.4 - Hadoop, QlikView, MS Excel, Hive, Pig, HDFS, WebHDFS, WebHCat, ZooKeeper, Knox, Ranger, Java, Spark, Shell Scripting, Centrify, Actian
Database: Oracle
Tools: Ambari 2.0, HDP 2.2.4, QlikView, WebHDFS, WebHCat, IntelliJ IDEA, Git, MS Excel, MS Word, vi, vim, PuTTY, ORC file formats.
Confidential, Houston TX
Big Data/Hadoop Solution Architect/Sr. Developer Consultant
Responsibilities:
- Hadoop Cluster design: Architected and designed HDP 2.0 cluster; initiated and executed installation and configuration in AWS and on-premise setups.
- ETL: Developed ETL solution using Talend big data integration.
- Real-time event processing, data analytics and monitoring for high rate of penetration.
- Designed and implemented data streaming solution for drilling data using Kafka, Storm and HBase.
- This involves directory monitoring using the Java WatchService API.
- Designed and developed a Storm monitoring bolt that validates pump tag values against high/low and high-high/low-low limits from preloaded metadata.
- This metadata is auto-trained based on the alert event values. The analysed data is then fed back to the system and dynamically distributed across all monitoring bolts.
- Designed HBase tables with Phoenix for time series and depth sensor data. Used salted buckets to evenly distribute data across region servers. Used immutable and local secondary indexes for access patterns that filter on column qualifiers not part of the primary key.
- Data visualization using Tibco Spotfire: Integrated Spotfire with Hive over ODBC and Phoenix using ODBC-JDBC bridge solution.
- Jaspersoft reports and charts: Integrated Jaspersoft using Phoenix JDBC driver and generated charts and reports.
- Security: Installed and configured Knox and Ranger.
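The limit check performed by the monitoring bolt described above can be sketched in a few lines. This is an illustrative Python sketch only, not the production Storm bolt (which was written in Java): the names TagLimits, classify, validate and the sample tag "pump_pressure" with its limit values are hypothetical, and treating a value equal to a limit as a breach is an assumption.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TagLimits:
    """Alert limits for one pump tag (hypothetical structure)."""
    low_low: float
    low: float
    high: float
    high_high: float


def classify(value: float, limits: TagLimits) -> str:
    """Return the alert level for a reading, checking the most severe limits first."""
    if value >= limits.high_high:
        return "HIGH_HIGH"
    if value >= limits.high:
        return "HIGH"
    if value <= limits.low_low:
        return "LOW_LOW"
    if value <= limits.low:
        return "LOW"
    return "NORMAL"


# Stand-in for the preloaded metadata; tag name and numbers are made up.
LIMITS: Dict[str, TagLimits] = {
    "pump_pressure": TagLimits(low_low=10.0, low=20.0, high=80.0, high_high=95.0),
}


def validate(tag: str, value: float, metadata: Dict[str, TagLimits] = LIMITS) -> str:
    """Look up the limits for a tag and classify the reading against them."""
    limits = metadata.get(tag)
    if limits is None:
        return "UNKNOWN_TAG"
    return classify(value, limits)
```

In a streaming topology, a check of this kind would run once per incoming tuple, with the metadata map refreshed whenever the limits are retrained.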
Operating System: Red Hat Linux 6.5
Technology: Hortonworks HDP 2.2 - Hadoop, Kafka, Storm, HBase, Phoenix, HDFS, ZooKeeper, Knox, Ranger, Java, Spark, Shell Scripting, Tibco Spotfire, Jaspersoft, JSON, Protobuf/Protocol Buffers, JUnit
Database: Apache Derby
Tools: Ambari 1.7.0, IntelliJ IDEA, Git, MS Excel, MS Word, JIRA, Confluence, vi, vim, PuTTY
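The salted-bucket idea used in the time series table design for this engagement can be illustrated with a short sketch. Phoenix applies salting itself when a table is created with the SALT_BUCKETS option; the Python below only mimics the concept. The function name salted_row_key, the key layout (sensor id, separator, zero-padded timestamp), the bucket count of 8 and the use of MD5 are all assumptions made for illustration and do not reproduce Phoenix's internal hash.

```python
import hashlib

SALT_BUCKETS = 8  # hypothetical bucket count


def salted_row_key(sensor_id: str, timestamp_ms: int, buckets: int = SALT_BUCKETS) -> bytes:
    """Prefix a time-series row key with a one-byte salt derived from the key itself.

    Monotonically increasing timestamps would otherwise send consecutive
    writes to the same region; the salt spreads them over `buckets` key ranges.
    """
    natural_key = f"{sensor_id}|{timestamp_ms:013d}".encode("utf-8")
    digest = hashlib.md5(natural_key).digest()
    salt = digest[0] % buckets  # deterministic, so a reader can recompute it
    return bytes([salt]) + natural_key
```

Because the salt is a function of the full key, a point lookup can recompute it, while a range scan over time has to fan out across all buckets and merge the results. That trade-off is the usual price of avoiding write hotspots.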
Confidential, Houston, TX
Big Data/Hadoop Solution Architect/Sr. Developer Consultant
Responsibilities:
- Hadoop Cluster design: Architected and designed HDP 2.0 cluster; initiated and executed installation and configuration in AWS and on-premise setups.
Operating System: Red Hat Linux 6.5
Technology: Hortonworks HDP 2.2 - Hadoop, Kafka, Storm, HBase, Phoenix, HDFS, ZooKeeper, Knox, Ranger, Java, Spark, Shell Scripting, JSON, Protobuf/Protocol Buffers, JUnit
Tools: Ambari 1.7.0, MS Excel, MS Word, JIRA, Confluence, vi, vim, PuTTY
Confidential, Houston, TX
Big Data/Hadoop Solution Architect/Sr. Developer Consultant
Responsibilities:
- Architected and designed big data solutions using the HDP 2.2 Hadoop stack.
- Developed Kafka producer to produce 1 million messages per second.
- Developed Kafka spout, HBase and enrichment bolts for data ingestion to HBase using the HBase Client APIs and the Phoenix SQL skin.
- Designed, installed and configured an HDP 2.2 Hadoop cluster along with Knox and Ranger for security requirements; configured the Kafka and Storm clusters to handle the load and tuned them to reach the desired throughput.
- Configured security using Knox and Ranger.
- Cluster design for HDP 2.2 on AWS, Azure and on-premise setup.
Operating System: Red Hat Linux 6.5
Technology: Hortonworks HDP 2.2 - Hadoop, Kafka, Storm, HBase, Phoenix, HDFS, ZooKeeper, Knox, Ranger, Java, Spark, Shell Scripting, JSON, Protobuf/Protocol Buffers, JUnit
Database: MySQL
Tools: Ambari 1.7.0, Hue, IntelliJ IDEA, Git, MS Excel, MS Word, JIRA, Confluence, vi, vim, PuTTY
Confidential, Boston, MA
Big Data/Hadoop Consultant - Sr. Developer & Solution Architect
Responsibilities:
- Inbound/Outbound file tracking and ETL: Confidential provides data analytic tools and data management service for various hospital clients. This system does ETL for different EMR and HL7 data files.
- Streaming data to Hadoop using Kafka
- Data ingestion to HBase and Hive using Storm bolts.
- Perform file and record level validation.
- Light data transformation to Avro to achieve a standard structure for fast processing and compact storage. These Avro events are posted to Kafka topics.
- Storing file metadata in HBase for tracking, e.g. file name, number of records etc.
- Used the Camus framework, a map-only Kafka consumer, to read data from Kafka topics and generate Avro files on HDFS.
- Dynamic Avro schema generation for new EMR files.
- Create external Hive tables on Avro files.
- Create Sqoop jobs to export data from Hive to the Oracle stage environment.
- Dedupe HL7: Confidential receives many duplicate segments as part of different HL7 messages, which leads to duplicate records.
- Parsing HL7 messages using IBM parsers into different segment files.
- HL7 data is produced to Kafka and then landed into HBase and Hive using a Storm spout and bolts.
- Spark jobs run on the HBase dataset. Different segment RDDs are joined to produce logical data, to which the dedupe logic is then applied.
- After dedupe, this dataset is stored to HDFS and exposed using Hive external table.
- A Sqoop job then exports the deduped data from Hive to the Oracle stage environment.
- Architected and designed big data solutions for the new system on the Cloudera 5 Hadoop ecosystem. Used Cloudera Manager and Hue for design and development.
- Developed Kafka producer and consumers, HBase clients, Spark and Hadoop MapReduce jobs along with components on HDFS, Hive.
- Developed Storm spout and bolts for data ingestion to HBase and Hive.
- Integrated JUnit, MRUnit, Bamboo and Cobertura code coverage
Operating System: Linux, Mac OS X
Technology: Cloudera 5 - Hadoop, MapReduce, HDFS, HBase, Hive, Kafka, Storm, ZooKeeper, Oozie, Java, Spark, Shark, SparkSQL, Shell Scripting, Avro, Parquet, JUnit, MRUnit.
Database: Oracle
Tools: Cloudera Manager, Hue, UML, IntelliJ IDEA, SVN, MS Excel, MS Word, Hudson, JIRA, Confluence, vi, vim, PuTTY
Confidential, Alpharetta, GA
Senior Big Data/Hadoop Consultant
Responsibilities:
- Big Data - Analytics: The team focuses on processing web application events and web logs. It also processes transactional data from a PostgreSQL RDBMS. Web application logs are used for crash reporting and flow tracking.
- Configured Flume streaming agent to load events (JSON format) real time from log files to HBase.
- Used Sqoop to import transactional data from PostgreSQL into Hive.
- Worked on MapReduce job for converting HBase JSON data into tuple format on HDFS.
- A map-only job is scheduled to filter, sort and aggregate events from HBase to HDFS. This produces a flat file of more fine-grained data, which is then piped into another MapReduce job to generate data for Cognos analytical reports.
- MapReduce jobs are scheduled using the Oozie workflow and coordinator engines to consume the data from the Sqoop import and the map-only job and produce data for report generation.
- Debugging MapReduce jobs using job history logs and task syslogs. Used Log4j logging APIs and Counters for debugging failed jobs.
- Created Hive managed and external tables, UDFs and HiveQL queries.
- Generated MRUnit tests for MapReduce jobs.
- Supporting production jobs as and when needed.
- Fine-tuning and enhancing the performance of MapReduce jobs.
- Web Application: This application is developed using Groovy, Grails, GORM, RESTful web services, GSP. I worked on modeling using GORM, designing and developing RESTful web services, the admin console and security (authentication and authorization) using Spring Security.
- Review and estimate SCRUM user stories, create tasks in JIRA. Analysis and design using UML.
- Involved in design of Big Data-Hadoop components like HBase, Hive and Oozie
- Develop Big Data-Hadoop MapReduce jobs along with HDFS, HBase, Hive, Flume projects.
- Also developed a web application using Groovy, Grails, GORM, RESTful web services, GSP, AngularJS, Git, JIRA, Shell Scripts, JUnit, Mockito, Jenkins
Operating System: Linux, Mac OS X
Technology: Cloudera Hadoop, MapReduce, HDFS, HBase, Hive, Flume, ZooKeeper, Oozie, MongoDB, Java, J2EE, Groovy, Grails, GORM, RESTful web services, GSP, AngularJS, Git, Shell Scripting, JSON
Database: PostgreSQL 9.2, MongoDB
Application Server: Apache Tomcat 7
Tools: Cloudera Manager, UML, IntelliJ IDEA, JUnit, Git, MS Excel, MS Word, Jenkins, Hudson, JIRA, Confluence, vi, vim, PuTTY, Tableau
Confidential, Alpharetta GA
Senior Java Hadoop Developer
Responsibilities:
- Review functional requirements and create application design.
- The application design includes business modeling using UML. Use Case, Sequence, Class and Deployment diagrams depict the flow of the application.
- Develop application using Spring MVC, Java, J2EE, Web Services.
- Developed MapReduce jobs for analytic reports for a given profile.
Operating System: Linux
Technology: Big Data-Hadoop MapReduce, HDFS, HBase, ZooKeeper, Java, J2EE, Spring 3.0, Spring MVC, Web Services, SOAP, WSDL, XML, JAXB, JSON, AJAX, JavaScript, jQuery, Eclipse Juno, Java 1.6, UML, Maven.
Database: HBase, Oracle 10g
Application Server: Apache Tomcat 6
Tools: UML, Eclipse JUNO, JUnit, Subversion, MS Excel, MS Word, Tableau
Confidential, Atlanta GA
Senior Java Developer and Designer
Responsibilities:
- Create application design based on the high level design and system requirements supplied by requirements team.
- The application design includes business modeling using UML. Use Case, Sequence, Class and Deployment diagrams depict the flow of the application.
- Develop application using Spring MVC, Android, Java, J2EE, Web services, Struts and Hibernate framework.
- Develop stored procedures using PL/SQL on Oracle; develop DAO components using the Hibernate ORM framework for accessing configuration and transactional data from the Oracle database.
- Generate build scripts using Ant
- Accomplished proof of concept for development of mobile application using Android 2.2 platform.
Operating System: Oracle Solaris 10, Ubuntu 9.10
Technology: Spring 3.0, Hibernate 4, Struts 1.2, Web Services, SOAP, WSDL, XML, JSON, AJAX, Android 2.2 with Eclipse, Java 1.6, J2EE, EJB 3.0, UML, Ant and Maven, PL/SQL, jQuery.
Database: Oracle 10g.
Application Server: BEA WebLogic 10.3
Tools: MS Excel, MS Word, UML, Eclipse Helios, JUnit, Subversion
