Big Data Architect Resume

Cincinnati, OH

PROFESSIONAL SUMMARY:

  • Around 10 years of experience in software development and analysis. Practical experience in building industry-specific Java applications and implementing Big Data technologies such as Apache Hadoop and the NoSQL database DataStax Cassandra.
  • Configured, tuned and set up tools such as Apache Kafka, Pig, Sqoop, Hive, HBase and Oozie in various Hadoop distributions for industry-specific needs.
  • Implemented Java APIs and created custom Java programs to make full use of Hadoop and its related tools.
  • Collaborated with lines of business to understand the needs for building the organization's Big Data environment. Implemented the tools and technologies for creating clusters and developing data archives and data warehouses. Orchestrated deployment of the environment by working with enterprise-wide teams.
  • Experienced in installing, configuring and administering Hadoop clusters across major distributions, and in optimizing performance by tuning clusters for better results.
  • Worked with multiple distributions of Hadoop, including the enterprise versions of IBM BigInsights, Cloudera (CDH4, CDH5, CDH6), Hortonworks HDP (v2.1, v2.2) and open-source Apache Hadoop.
  • Architected and implemented both on-premise and cloud deployments of the real-time messaging system Kafka. Hands-on experience installing Kafka distributions from Apache, Confluent and Amazon Web Services.
  • Analyzed traditional storage systems to design and implement the required data ingestion procedures for Hadoop clusters, using migration tools such as Sqoop and Talend.
  • Performed data modeling and analysis on HDFS and NoSQL databases. Provided cluster tuning to derive optimal results.
  • Created scripts for performing data analysis with Pig, Hive and Impala. Generated reports, extracts and statistics on the distributed data in the Hadoop cluster. Built Java APIs for retrieval and analysis on NoSQL databases such as HBase, Cassandra and MongoDB.
  • Built MapReduce code in Java customized to the requirement. Created User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) in Pig and Hive (a minimal UDF sketch follows this summary). Used HCatalog for simple query execution. Wrote custom code and built JAR files for functionality unavailable in Pig and Hive, using Maven to build and package the JARs for custom tasks.
  • Developed Spark applications using Scala to ease future Hadoop transitions. Implemented text search analysis using Lucene and Elasticsearch. Gained hands-on experience implementing Kafka as a persistent messaging system.
  • Created applications in core Java and built applications requiring database access and constant connectivity, such as client-server models using JDBC, JSP, Spring and Hibernate. Implemented web services for network-related applications in Java.
  • Created front-end user interfaces using HTML, CSS and JavaScript along with validation techniques. Implemented the Ajax toolkit for validation within the GUI. Worked with image-editing tools such as Photoshop and Adobe Lightroom.
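
A minimal sketch of the kind of Hive UDF mentioned in the summary, assuming the classic org.apache.hadoop.hive.ql.exec.UDF API; the class name and normalization logic are hypothetical, for illustration only:

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical UDF: trims and lower-cases a string column.
    public final class NormalizeText extends UDF {
        public Text evaluate(Text input) {
            if (input == null) return null;  // pass NULLs through unchanged
            return new Text(input.toString().trim().toLowerCase());
        }
    }

Built into a JAR with Maven, a UDF like this would be registered in Hive with ADD JAR followed by CREATE TEMPORARY FUNCTION before use in queries.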

TECHNICAL SUMMARY:

Big Data Technologies: IBM BigInsights, Cloudera (CDH4/CDH5/CDH6), Hortonworks HDP (v2.1/v2.2), Apache Hadoop and DataStax Cassandra.

Tools in Big Data: HDFS, MapReduce, Hadoop2/YARN, Pig, Hive, HBase, Sqoop, Flume, Oozie, Hue, Lucene, Elasticsearch, Kafka, Impala, Spark, ZooKeeper, Mahout, Azkaban.

Languages: Java, Python, PHP, Scala (Spark), Pig Latin, HQL, SQL, PL/SQL, C++, C#, C# .Net, ASP .Net (3.5 and 4), Ajax toolkit (3.5 and 4).

Web Technologies: HTML, XML, CSS, JavaScript, JSP, JDBC, Maven, AJAX

Reporting Tools/ ETL Tools: Tableau, Power View for Microsoft Excel, Informatica

Operating Systems: Windows, Linux, Unix, Red Hat Enterprise Linux (RHEL), Ubuntu, CentOS.

Frameworks: Spring, Hibernate, JUnit

Databases: Oracle 9i, NoSQL databases (Cassandra and HBase)

IDE/Tools: Eclipse, NetBeans

PROFESSIONAL EXPERIENCE:

Confidential, Cincinnati, OH

Big Data Architect

Environment: Kafka - Confluent and Amazon Managed Streaming for Apache Kafka (MSK)

Responsibilities:

  • Designed end-to-end architecture for standing up Kafka clusters
  • Participated in vendor talks to understand the product details
  • Stood up clusters on both the Confluent and Amazon Managed Streaming for Apache Kafka (MSK) platforms
  • Worked with network teams to create VPC-peered clusters connecting to cloud applications
  • Worked with the Cloud Engineering team to create Transit Gateway and Service Rail connections
  • Created topics for multiple teams to publish and subscribe to messages
  • Set up Access Control Lists (ACLs) on the topics for production and consumption (see the sketch after this list)
  • Worked with the development team on Terraform scripts to automate cluster creation on MSK
  • Created a Git repository for sharing the common code base
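
As a hedged illustration of the topic and ACL provisioning above, the sketch below uses the standard Kafka AdminClient API; the broker address, topic name, partition counts and principal are placeholder assumptions:

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;
    import org.apache.kafka.common.acl.*;
    import org.apache.kafka.common.resource.*;

    import java.util.List;
    import java.util.Properties;

    public class TopicProvisioner {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                // Create a topic for a publishing/consuming team.
                admin.createTopics(List.of(new NewTopic("team-orders", 12, (short) 3))).all().get();

                // Allow the team's service principal to consume from the topic.
                AclBinding readAcl = new AclBinding(
                    new ResourcePattern(ResourceType.TOPIC, "team-orders", PatternType.LITERAL),
                    new AccessControlEntry("User:orders-consumer", "*",
                                           AclOperation.READ, AclPermissionType.ALLOW));
                admin.createAcls(List.of(readAcl)).all().get();
            }
        }
    }

On MSK the same provisioning would typically be driven from the Terraform pipeline mentioned above rather than run by hand.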

Big Data Architect - Developer

Confidential

Environment: Kafka, Spark, HBase, ElasticSearch

Responsibilities:

  • Architected end-to-end ingestion and processing of data from the source systems to the consumption layer
  • Worked with the development team to design multiple data streams for real-time, intra-day and batch processing through Kafka queues
  • Integrated Kafka Connect connectors to push data from the mainframe into Kafka topics
  • Worked with network teams to create the firewall rules required to connect data sources and Kafka queues
  • Collaborated with enterprise-wide teams to gather approvals for production release
  • Completed the governance process on time to remove potential roadblocks
  • Connected Kafka queues to Spark Streaming to cleanse the data and load it onto HDFS (see the sketch after this list)
  • Stored the processing results in columnar format in HBase for intra-day batch file comparison
  • Indexed the data for Elasticsearch consumption via Java programs from the consumption layer/systems
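
A minimal sketch of the Kafka-to-HDFS leg described above, assuming Spark Structured Streaming with placeholder broker, topic and path names; the project's actual cleansing rules are not reproduced here:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import static org.apache.spark.sql.functions.col;

    public class KafkaToHdfs {
        public static void main(String[] args) throws Exception {
            SparkSession spark = SparkSession.builder().appName("kafka-to-hdfs").getOrCreate();

            // Subscribe to the ingestion topic (names are illustrative).
            Dataset<Row> raw = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker1:9092")
                .option("subscribe", "ingest-topic")
                .load();

            // Cleanse: cast the payload to string and drop empty records.
            Dataset<Row> cleansed = raw
                .selectExpr("CAST(value AS STRING) AS payload", "timestamp")
                .filter(col("payload").isNotNull());

            // Land the cleansed stream on HDFS; the checkpoint lets the
            // stream restart without losing or duplicating records.
            cleansed.writeStream()
                .format("parquet")
                .option("path", "hdfs:///data/landing/ingest")
                .option("checkpointLocation", "hdfs:///chk/ingest")
                .start()
                .awaitTermination();
        }
    }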

Big Data Developer/Architect

Confidential

Environment: IDAA DB2, Spark, HDFS, Hive, Talend, Custom and Proprietary ETL tools and parsers

Responsibilities:

  • Worked with the data organization to understand the nature of the data and the primary data sources
  • Architected the zones and layers per the datasets and query complexity for the end-to-end solution design
  • Provided ingestion mechanisms through Spark and Talend for ETL
  • Modeled the data to fit the Financial Services Logical Data Model (FSLDM) to provide a layer for ingestion into IDAA DB2
  • Processed the data from the common structure into the integrated layer comprising HDFS and IDAA DB2
  • Scoped the file types required to keep IDAA tables and Hadoop data files in sync
  • Transformed the data from the integrated layer into the access layer hosted on DB2
  • Provided delta-capture mechanisms for incremental changes from source systems, with reload and retry support (see the sketch after this list)
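
As an illustration of the delta-capture step, the sketch below pulls only rows changed since a watermark and appends them to the integrated layer; the table, column, driver and connection details are assumptions, not taken from the project:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SaveMode;
    import org.apache.spark.sql.SparkSession;

    public class DeltaCapture {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("delta-capture").getOrCreate();
            String lastWatermark = args.length > 0 ? args[0] : "1970-01-01 00:00:00";

            // Push the incremental predicate down to DB2 as a subquery.
            Dataset<Row> delta = spark.read().format("jdbc")
                .option("url", "jdbc:db2://db2host:50000/FSDB")
                .option("driver", "com.ibm.db2.jcc.DB2Driver")
                .option("dbtable",
                    "(SELECT * FROM INTEGRATED.TXN WHERE UPDATED_TS > '" + lastWatermark + "') t")
                .option("user", "etl_user")
                .option("password", "***")
                .load();

            // Append the delta to HDFS; a failed run is simply retried
            // with the same watermark (reload and retry).
            delta.write().mode(SaveMode.Append)
                .parquet("hdfs:///data/integrated/txn");
        }
    }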

Big Data Developer

Confidential

Environment: Spark, Python, Amazon Web Services - EC2 cluster

Responsibilities:

  • Formatted and loaded vendor geospatial data into AWS EC2 cloud storage using direct data loads for cost reduction
  • Created an EC2 instance with larger EBS storage for temporary files
  • Worked with network teams to provide firewall access for connecting from on-premise systems to the AWS cluster
  • Created and tuned Spark scripts to handle the heavy I/O operations on large geospatial data
  • Sized EBS and other storage requirements so programs would not freeze due to space constraints
  • Accessed the data in the cloud to run Spark programs that calculate the ideal spot to launch an office/branch in a given vicinity
  • Created Spark scripts to search over grids, counting the possibilities and eliminating locations that are currently occupied or unavailable (see the sketch after this list)
  • Integrated CloudWatch to monitor cluster performance and set up alerts for near-full disk and memory usage
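
A hedged sketch of the grid search above: it buckets points into roughly 1 km cells, drops cells that already contain an occupied location, and ranks the rest by candidate count. The schema, cell size and paths are illustrative assumptions:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import static org.apache.spark.sql.functions.*;

    public class GridSearch {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("grid-search").getOrCreate();

            // Vendor points: lat, lon, occupied flag (assumed schema).
            Dataset<Row> points = spark.read().parquet("s3a://geo-bucket/points/");

            Dataset<Row> ranked = points
                // ~0.01 degrees is roughly 1 km; floor() assigns a grid cell.
                .withColumn("cellX", floor(col("lon").divide(0.01)))
                .withColumn("cellY", floor(col("lat").divide(0.01)))
                .groupBy("cellX", "cellY")
                .agg(count(lit(1)).alias("candidates"),
                     max(col("occupied").cast("int")).alias("anyOccupied"))
                // Eliminate grids that are currently occupied / not available.
                .filter(col("anyOccupied").equalTo(0))
                .orderBy(col("candidates").desc());

            ranked.show(20);
        }
    }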

Big Data Developer

Confidential

Environment: Cloudera Hadoop, Sqoop, Hive, Hue, Tableau, Red Hat Enterprise Linux (RHEL)

Responsibilities:

  • Created Sqoop ingestion processes to extract the data from RDBMS sources
  • Configured batch ETL jobs to process the data via the data pipeline
  • Ingested data into HDFS and created HCatalog tables so queries work seamlessly against the data lake
  • Wrote Hive queries and HiveQL scripts to process the data and provide condensed data sets
  • Used partitioning and bucketing of the data to improve query response rates (see the sketch after this list)
  • Developed Java programs to refine the data; created landing and processed zones in HDFS, with transformation rules implemented in Talend
  • Queried the data from Tableau over a JDBC/ODBC connection with the Hive connector to provide visualization reports
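
To illustrate the partitioning and bucketing approach, the sketch below issues HiveQL through the standard Hive JDBC driver; the host, database, table and column names are hypothetical:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class HiveLayout {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hiveserver:10000/default", "etl", "");
                 Statement st = conn.createStatement()) {

                // Partition by ingest date so queries prune whole directories;
                // bucket by customer_id to speed up joins and sampling.
                st.execute(
                    "CREATE TABLE IF NOT EXISTS sales_curated (" +
                    "  customer_id BIGINT, amount DECIMAL(12,2)) " +
                    "PARTITIONED BY (ingest_date STRING) " +
                    "CLUSTERED BY (customer_id) INTO 32 BUCKETS " +
                    "STORED AS ORC");

                // Load a day's data from the landing zone table.
                st.execute("SET hive.exec.dynamic.partition.mode=nonstrict");
                st.execute(
                    "INSERT INTO sales_curated PARTITION (ingest_date) " +
                    "SELECT customer_id, amount, ingest_date FROM sales_landing");
            }
        }
    }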

Confidential

Big Data / Java Developer

Environment: Apache Hadoop, Cloudera Manager CDH4, CentOS, Java, MapReduce, Eclipse Indigo, Pig, Hive, Sqoop, Oozie, SQL, JUnit.

Responsibilities:

  • Installed and configured Apache Hadoop cluster for data storage
  • Installed and configured PIG, Hive, Sqoop and Oozie workflow for multiple jobs
  • Created simple and complex MapReduce jobs to pre-process data before ingestion into the Hadoop cluster
  • Created MapReduce jobs using Pig scripts and Hive queries (a minimal hand-written job sketch follows this list)
  • HBase was used initially while benchmarking
  • Created Java application that analyzes data using Monte Carlo simulations.
  • Optimized PIG and Hive UDFs to work efficiently with the data sampling
  • Migrated ETL processes from Oracle to Hive for performing aggregate functions
  • Monitored Hadoop cluster with Cloudera manager
  • Implemented Oozie Workflow to run multiple PIG scripts and Hive queries
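
For illustration, a minimal hand-written MapReduce job of the kind described above: it counts records per key in tab-delimited input. The input layout and paths are assumptions:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class KeyCount {
        public static class KeyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
            private static final LongWritable ONE = new LongWritable(1);
            @Override
            protected void map(LongWritable off, Text line, Context ctx)
                    throws IOException, InterruptedException {
                String[] cols = line.toString().split("\t");
                if (cols.length > 0 && !cols[0].isEmpty()) {
                    ctx.write(new Text(cols[0]), ONE);  // emit (key, 1)
                }
            }
        }

        public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
            @Override
            protected void reduce(Text key, Iterable<LongWritable> vals, Context ctx)
                    throws IOException, InterruptedException {
                long sum = 0;
                for (LongWritable v : vals) sum += v.get();
                ctx.write(key, new LongWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "key-count");
            job.setJarByClass(KeyCount.class);
            job.setMapperClass(KeyMapper.class);
            job.setCombinerClass(SumReducer.class);  // safe: sum is associative
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }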

Java Developer / Java Web Application Developer

Confidential

Environment: Java, J2EE, NetBeans, Oracle 2010, Crystal Reports, Windows XP

Responsibilities:

  • Programmed in Java to create interfaces and file uploads for quote images
  • Created JavaScript to validate customer inputs on the web application
  • Implemented CAPTCHA to deter scammers
  • Implemented client- and server-side validations to check input data using custom scripts
  • Optimized Java code for low-latency processing
  • Created JDBC/ODBC connections to Oracle for input and retrieval of customer data (a minimal sketch follows this list)
  • Created timeout logic for projects that are bid- and time-based
  • Created code to push daily Crystal Reports
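
A minimal JDBC sketch in the spirit of the bullets above; the connection string, schema and sample data are hypothetical:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class CustomerDao {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost:1521/ORCL", "app", "***")) {

                // Insert a customer record (prepared statements avoid injection).
                try (PreparedStatement ins = conn.prepareStatement(
                        "INSERT INTO customers (id, name) VALUES (?, ?)")) {
                    ins.setLong(1, 42L);
                    ins.setString(2, "Jane Doe");
                    ins.executeUpdate();
                }

                // Read it back.
                try (PreparedStatement sel = conn.prepareStatement(
                         "SELECT name FROM customers WHERE id = ?")) {
                    sel.setLong(1, 42L);
                    try (ResultSet rs = sel.executeQuery()) {
                        while (rs.next()) System.out.println(rs.getString("name"));
                    }
                }
            }
        }
    }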

Confidential

Senior Developer

Environment: C#.Net, ASP.Net (v3.5), Visual Studio 2008 Professional, SQL Server 2008, Windows XP

Responsibilities:

  • Created ASP.NET pages for different portions of the website
  • Created Master Pages to keep the schema and styling of the website consistent
  • Created the database schema for querying the required data under concurrent operations on the database
  • Used the AJAX Control Toolkit for validation of customer input
  • Created ADO.NET/ODBC connections to the SQL Server database to retrieve user and payment information
  • Integrated a payment gateway API for payments toward the courses users registered for
  • Programmed a built-in emailing system to notify users of their new and upcoming courses

Confidential

Business Process Associate

Environment: Troubleshooting Tool - Astrix, Logging Tool - Avira, Windows XP

Responsibilities:

  • Provided necessary phone support for customers
  • Probed customers on the how and what of their activity to understand the cause of the issue
  • Remoted into customers' devices to make solutions easier for them
  • Helped customers understand the cause of the issue and provided tips to avoid running into other issues
