Big Data Architect Resume

Cincinnati, OH

PROFESSIONAL SUMMARY:

  • Around 10 years of experience in software development and analysis. Practical experience in building industry-specific Java applications and implementing Big Data technologies such as Apache Hadoop and the NoSQL database DataStax Cassandra.
  • Configured, tuned and set up tools such as Apache Kafka, Pig, Sqoop, Hive, HBase and Oozie in various Hadoop distributions for industry-specific needs.
  • Implemented Java APIs and created custom Java programs to make full use of Hadoop and its related tools.
  • Collaborated with lines of business to understand the needs for building the organization's Big Data environment. Implemented the tools and technologies for creating clusters and developing data archives and data warehouses. Orchestrated deployment of the environment by working with enterprise-wide teams.
  • Experienced in installing, configuring and administering Hadoop clusters across major distributions, and in optimizing performance by tuning clusters for better results.
  • Worked with multiple distributions of Hadoop, including the enterprise versions of IBM BigInsights, Cloudera (CDH4, CDH5, CDH6), Hortonworks HDP (v2.1, v2.2) and open-source Apache Hadoop.
  • Architected and implemented both on-premise and cloud deployments of the real-time messaging system Kafka. Hands-on experience installing Kafka distributions from Apache, Confluent and Amazon Web Services.
  • Analyzed traditional storage systems to design and implement the required data ingestion procedures for Hadoop clusters, using migration tools such as Sqoop and Talend.
  • Performed data modeling and analysis on HDFS and NoSQL databases. Provided cluster tuning to derive optimal results.
  • Created scripts for performing data analysis with Pig, Hive and Impala. Generated reports, extracts and statistics on the distributed data in the Hadoop cluster. Built Java APIs for retrieval and analysis on NoSQL databases such as HBase, Cassandra and MongoDB.
  • Built MapReduce code in Java customized to the requirement. Created User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) in Pig and Hive (a minimal UDF sketch follows this summary). Used HCatalog for simple query execution. Wrote custom code and built JAR files for functionality unavailable in Pig and Hive, using Maven to build and package the JARs for custom tasks.
  • Developed Spark applications using Scala to ease future Hadoop transitions. Implemented text search analysis using Lucene and Elasticsearch. Gained hands-on experience implementing Kafka as a persistent messaging system.
  • Created applications in core Java and built applications requiring database access and constant connectivity, such as client-server models using JDBC, JSP, Spring and Hibernate. Implemented web services for network-related applications in Java.
  • Created front-end user interfaces using HTML, CSS and JavaScript along with validation techniques. Implemented the Ajax toolkit for validation within the GUI. Worked with image-editing tools such as Photoshop and Adobe Lightroom.
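
A minimal sketch of the kind of Hive UDF mentioned in the summary, assuming the classic org.apache.hadoop.hive.ql.exec.UDF API; the class name and normalization logic are hypothetical, for illustration only:

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical UDF: trims and lower-cases a string column.
    public final class NormalizeText extends UDF {
        public Text evaluate(Text input) {
            if (input == null) return null;  // pass NULLs through unchanged
            return new Text(input.toString().trim().toLowerCase());
        }
    }

Built into a JAR with Maven, a UDF like this would be registered in Hive with ADD JAR followed by CREATE TEMPORARY FUNCTION before use in queries.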

TECHNICAL SUMMARY:

Big Data Technologies: IBM BigInsights, Cloudera (CDH4/CDH5/CDH6), Hortonworks HDP (v2.1/v2.2), Apache Hadoop and DataStax Cassandra.

Tools in Big Data: HDFS, MapReduce, Hadoop2/YARN, Pig, Hive, HBase, Sqoop, Flume, Oozie, Hue, Lucene, Elasticsearch, Kafka, Impala, Spark, ZooKeeper, Mahout, Azkaban.

Languages: Java, Python, PHP, Scala (Spark), Pig Latin, HQL, SQL, PL/SQL, C++, C#, C# .Net, ASP .Net (3.5 and 4), Ajax toolkit (3.5 and 4).

Web Technologies: HTML, XML, CSS, JavaScript, JSP, JDBC, Maven, AJAX

Reporting Tools/ ETL Tools: Tableau, Power View for Microsoft Excel, Informatica

Operating Systems: Windows, Linux, Unix, Red Hat Enterprise Linux (RHEL), Ubuntu, CentOS.

Frameworks: Spring, Hibernate, JUnit

Databases: Oracle 9i, NoSQL databases (Cassandra and HBase)

IDE/Tools: Eclipse, NetBeans

PROFESSIONAL EXPERIENCE:

Confidential, Cincinnati, OH

Big Data Architect

Environment: Kafka - Confluent and Amazon Managed Streaming for Apache Kafka (MSK)

Responsibilities:

  • Designed end-to-end architecture for standing up Kafka clusters
  • Participated in vendor talks to understand the product details
  • Stood up clusters on both the Confluent and Amazon Managed Streaming for Apache Kafka (MSK) platforms
  • Worked with network teams to create VPC-peered clusters connecting to cloud applications
  • Worked with the Cloud Engineering team to create Transit Gateway and Service Rail connections
  • Created topics for multiple teams to publish and subscribe to messages
  • Set up Access Control Lists (ACLs) on the topics for production and consumption (see the sketch after this list)
  • Worked with the development team on Terraform scripts to automate cluster creation on MSK
  • Created a Git repository for sharing the common code base
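
As a hedged illustration of the topic and ACL provisioning above, the sketch below uses the standard Kafka AdminClient API; the broker address, topic name, partition counts and principal are placeholder assumptions:

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;
    import org.apache.kafka.common.acl.*;
    import org.apache.kafka.common.resource.*;

    import java.util.List;
    import java.util.Properties;

    public class TopicProvisioner {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                // Create a topic for a publishing/consuming team.
                admin.createTopics(List.of(new NewTopic("team-orders", 12, (short) 3))).all().get();

                // Allow the team's service principal to consume from the topic.
                AclBinding readAcl = new AclBinding(
                    new ResourcePattern(ResourceType.TOPIC, "team-orders", PatternType.LITERAL),
                    new AccessControlEntry("User:orders-consumer", "*",
                                           AclOperation.READ, AclPermissionType.ALLOW));
                admin.createAcls(List.of(readAcl)).all().get();
            }
        }
    }

On MSK the same provisioning would typically be driven from the Terraform pipeline mentioned above rather than run by hand.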

Big Data Architect - Developer

Confidential

Environment: Kafka, Spark, HBase, ElasticSearch

Responsibilities:

  • Architected end-to-end ingestion and processing of data from the source systems to the consumption layer
  • Worked with the development team to design multiple data streams for real-time, intra-day and batch processing through Kafka queues
  • Integrated Kafka Connect connectors to push data from the mainframe into Kafka topics
  • Worked with network teams to create the firewall rules required to connect data sources and Kafka queues
  • Collaborated with enterprise-wide teams to gather approvals for production release
  • Completed the governance process on time to remove potential roadblocks
  • Connected Kafka queues to Spark Streaming to cleanse the data and load it onto HDFS (see the sketch after this list)
  • Stored the processing results in columnar format in HBase for intra-day batch file comparison
  • Indexed the data for Elasticsearch consumption via Java programs from the consumption layer/systems
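
A minimal sketch of the Kafka-to-HDFS leg described above, assuming Spark Structured Streaming with placeholder broker, topic and path names; the project's actual cleansing rules are not reproduced here:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import static org.apache.spark.sql.functions.col;

    public class KafkaToHdfs {
        public static void main(String[] args) throws Exception {
            SparkSession spark = SparkSession.builder().appName("kafka-to-hdfs").getOrCreate();

            // Subscribe to the ingestion topic (names are illustrative).
            Dataset<Row> raw = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker1:9092")
                .option("subscribe", "ingest-topic")
                .load();

            // Cleanse: cast the payload to string and drop empty records.
            Dataset<Row> cleansed = raw
                .selectExpr("CAST(value AS STRING) AS payload", "timestamp")
                .filter(col("payload").isNotNull());

            // Land the cleansed stream on HDFS; the checkpoint lets the
            // stream restart without losing or duplicating records.
            cleansed.writeStream()
                .format("parquet")
                .option("path", "hdfs:///data/landing/ingest")
                .option("checkpointLocation", "hdfs:///chk/ingest")
                .start()
                .awaitTermination();
        }
    }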

Big Data Developer/Architect

Confidential

Environment: IDAA DB2, Spark, HDFS, Hive, Talend, Custom and Proprietary ETL tools and parsers

Responsibilities:

  • Worked with the data organization to understand the nature of the data and the primary data sources
  • Architected the zones and layers per the datasets and query complexity for the end-to-end solution design
  • Provided ingestion mechanisms through Spark and Talend for ETL
  • Modeled the data to fit the Financial Services Logical Data Model (FSLDM) to provide a layer for ingestion into IDAA DB2
  • Processed the data from the common structure into the integrated layer comprising HDFS and IDAA DB2
  • Scoped the file types required to keep IDAA tables and Hadoop data files in sync
  • Transformed the data from the integrated layer into the access layer hosted on DB2
  • Provided delta-capture mechanisms for incremental changes from source systems, with reload and retry support (see the sketch after this list)
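
As an illustration of the delta-capture step, the sketch below pulls only rows changed since a watermark and appends them to the integrated layer; the table, column, driver and connection details are assumptions, not taken from the project:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SaveMode;
    import org.apache.spark.sql.SparkSession;

    public class DeltaCapture {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("delta-capture").getOrCreate();
            String lastWatermark = args.length > 0 ? args[0] : "1970-01-01 00:00:00";

            // Push the incremental predicate down to DB2 as a subquery.
            Dataset<Row> delta = spark.read().format("jdbc")
                .option("url", "jdbc:db2://db2host:50000/FSDB")
                .option("driver", "com.ibm.db2.jcc.DB2Driver")
                .option("dbtable",
                    "(SELECT * FROM INTEGRATED.TXN WHERE UPDATED_TS > '" + lastWatermark + "') t")
                .option("user", "etl_user")
                .option("password", "***")
                .load();

            // Append the delta to HDFS; a failed run is simply retried
            // with the same watermark (reload and retry).
            delta.write().mode(SaveMode.Append)
                .parquet("hdfs:///data/integrated/txn");
        }
    }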

Big Data Developer

Confidential

Environment: Spark, Python, Amazon Web Services - EC2 cluster

Responsibilities:

  • Formatted and loaded vendor geospatial data into AWS EC2 cloud storage using direct data loads for cost reduction
  • Created an EC2 instance with larger EBS storage for temporary files
  • Worked with network teams to provide firewall access for connecting from on-premise systems to the AWS cluster
  • Created and tuned Spark scripts to handle the heavy I/O operations on large geospatial data
  • Sized EBS and other storage requirements so programs would not freeze due to space constraints
  • Accessed the data in the cloud to run Spark programs that calculate the ideal spot to launch an office/branch in a given vicinity
  • Created Spark scripts to search over grids, counting the possibilities and eliminating locations that are currently occupied or unavailable (see the sketch after this list)
  • Integrated CloudWatch to monitor cluster performance and set up alerts for near-full disk and memory usage
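
A hedged sketch of the grid search above: it buckets points into roughly 1 km cells, drops cells that already contain an occupied location, and ranks the rest by candidate count. The schema, cell size and paths are illustrative assumptions:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import static org.apache.spark.sql.functions.*;

    public class GridSearch {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("grid-search").getOrCreate();

            // Vendor points: lat, lon, occupied flag (assumed schema).
            Dataset<Row> points = spark.read().parquet("s3a://geo-bucket/points/");

            Dataset<Row> ranked = points
                // ~0.01 degrees is roughly 1 km; floor() assigns a grid cell.
                .withColumn("cellX", floor(col("lon").divide(0.01)))
                .withColumn("cellY", floor(col("lat").divide(0.01)))
                .groupBy("cellX", "cellY")
                .agg(count(lit(1)).alias("candidates"),
                     max(col("occupied").cast("int")).alias("anyOccupied"))
                // Eliminate grids that are currently occupied / not available.
                .filter(col("anyOccupied").equalTo(0))
                .orderBy(col("candidates").desc());

            ranked.show(20);
        }
    }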

Big Data Developer

Confidential

Environment: Cloudera Hadoop, Sqoop, Hive, Hue, Tableau, Red Hat Enterprise Linux (RHEL)

Responsibilities:

  • Created Sqoop ingestion processes to extract the data from RDBMS sources
  • Configured batch ETL jobs to process the data via the data pipeline
  • Ingested data into HDFS and created HCatalog tables so queries work seamlessly against the data lake
  • Wrote Hive queries and HiveQL scripts to process the data and provide condensed data sets
  • Used partitioning and bucketing of the data to improve query response rates (see the sketch after this list)
  • Developed Java programs to refine the data; created landing and processed zones in HDFS, with transformation rules implemented in Talend
  • Queried the data from Tableau over a JDBC/ODBC connection with the Hive connector to provide visualization reports
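
To illustrate the partitioning and bucketing approach, the sketch below issues HiveQL through the standard Hive JDBC driver; the host, database, table and column names are hypothetical:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class HiveLayout {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hiveserver:10000/default", "etl", "");
                 Statement st = conn.createStatement()) {

                // Partition by ingest date so queries prune whole directories;
                // bucket by customer_id to speed up joins and sampling.
                st.execute(
                    "CREATE TABLE IF NOT EXISTS sales_curated (" +
                    "  customer_id BIGINT, amount DECIMAL(12,2)) " +
                    "PARTITIONED BY (ingest_date STRING) " +
                    "CLUSTERED BY (customer_id) INTO 32 BUCKETS " +
                    "STORED AS ORC");

                // Load a day's data from the landing zone table.
                st.execute("SET hive.exec.dynamic.partition.mode=nonstrict");
                st.execute(
                    "INSERT INTO sales_curated PARTITION (ingest_date) " +
                    "SELECT customer_id, amount, ingest_date FROM sales_landing");
            }
        }
    }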

Confidential

Big Data / Java Developer

Environment: Apache Hadoop, Cloudera Manager CDH4, CentOS, Java, MapReduce, Eclipse Indigo, Pig, Hive, Sqoop, Oozie, SQL, JUnit.

Responsibilities:

  • Installed and configured Apache Hadoop cluster for data storage
  • Installed and configured PIG, Hive, Sqoop and Oozie workflow for multiple jobs
  • Created simple and complex MapReduce jobs to pre-process data before ingestion into the Hadoop cluster
  • Created MapReduce jobs using Pig scripts and Hive queries (a minimal hand-written job sketch follows this list)
  • HBase was used initially while benchmarking
  • Created Java application that analyzes data using Monte Carlo simulations.
  • Optimized PIG and Hive UDFs to work efficiently with the data sampling
  • Migrated ETL processes from Oracle to Hive for performing aggregate functions
  • Monitored Hadoop cluster with Cloudera manager
  • Implemented Oozie Workflow to run multiple PIG scripts and Hive queries
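
For illustration, a minimal hand-written MapReduce job of the kind described above: it counts records per key in tab-delimited input. The input layout and paths are assumptions:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class KeyCount {
        public static class KeyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
            private static final LongWritable ONE = new LongWritable(1);
            @Override
            protected void map(LongWritable off, Text line, Context ctx)
                    throws IOException, InterruptedException {
                String[] cols = line.toString().split("\t");
                if (cols.length > 0 && !cols[0].isEmpty()) {
                    ctx.write(new Text(cols[0]), ONE);  // emit (key, 1)
                }
            }
        }

        public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
            @Override
            protected void reduce(Text key, Iterable<LongWritable> vals, Context ctx)
                    throws IOException, InterruptedException {
                long sum = 0;
                for (LongWritable v : vals) sum += v.get();
                ctx.write(key, new LongWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "key-count");
            job.setJarByClass(KeyCount.class);
            job.setMapperClass(KeyMapper.class);
            job.setCombinerClass(SumReducer.class);  // safe: sum is associative
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }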

Java Developer / Java Web Application Developer

Confidential

Environment: Java, J2EE, NetBeans, Oracle 2010, Crystal Reports, Windows XP

Responsibilities:

  • Programmed in Java to create interfaces and file uploads for quote images
  • Created JavaScript to validate customer inputs on the web application
  • Implemented CAPTCHA to deter scammers
  • Implemented client- and server-side validations to check input data using custom scripts
  • Optimized Java code for low-latency processing
  • Created JDBC/ODBC connections to Oracle for input and retrieval of customer data (a minimal sketch follows this list)
  • Created timeout logic for projects that are bid- and time-based
  • Created code to push daily Crystal Reports
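
A minimal JDBC sketch in the spirit of the bullets above; the connection string, schema and sample data are hypothetical:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class CustomerDao {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost:1521/ORCL", "app", "***")) {

                // Insert a customer record (prepared statements avoid injection).
                try (PreparedStatement ins = conn.prepareStatement(
                        "INSERT INTO customers (id, name) VALUES (?, ?)")) {
                    ins.setLong(1, 42L);
                    ins.setString(2, "Jane Doe");
                    ins.executeUpdate();
                }

                // Read it back.
                try (PreparedStatement sel = conn.prepareStatement(
                         "SELECT name FROM customers WHERE id = ?")) {
                    sel.setLong(1, 42L);
                    try (ResultSet rs = sel.executeQuery()) {
                        while (rs.next()) System.out.println(rs.getString("name"));
                    }
                }
            }
        }
    }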

Confidential

Senior Developer

Environment: C#.Net, ASP.Net (v3.5), Visual Studio 2008 Professional, SQL Server 2008, Windows XP

Responsibilities:

  • Created ASP.NET pages for different portions of the website
  • Created Master Pages to keep the schema and styling of the website consistent
  • Created the database schema for querying the required data under concurrent operations on the database
  • Used the AJAX Control Toolkit for validation of customer input
  • Created ADO.NET/ODBC connections to the SQL Server database to retrieve user and payment information
  • Integrated a payment gateway API for payments toward the courses users registered for
  • Programmed a built-in emailing system to notify users of their new and upcoming courses

Confidential

Business Process Associate

Environment: Troubleshooting Tool - Astrix, Logging Tool - Avira, Windows XP

Responsibilities:

  • Provided necessary phone support for customers
  • Probed customers on the how and what of their activity to understand the cause of the issue
  • Remoted into customers' devices to make solutions easier for them
  • Helped customers understand the cause of the issue and provided tips to avoid running into other issues
