
Big Data Solution Architect Resume


St. Louis, MO

SUMMARY

  • 13 years of IT experience, including 5 years 9 months in Big Data and Hadoop technologies, working across the ingestion, storage, querying, processing, and analysis layers of big data.
  • Expertise in the Hadoop technology stack, including HDFS, MapReduce programming, Sqoop, Hive, HBase, Impala, ZooKeeper, Cassandra, MongoDB, Oozie, Pig, Flume, Kafka, Spark Core, Spark Streaming, Spark SQL, Scala, Python, Storm, Avro, Elasticsearch, Big SQL, Text Analytics, JAQL, etc.
  • Experienced software developer and architect providing strategic yet structured, detailed solutions for clients and stakeholders.
  • Demonstrated ability to jump-start new projects and impact ongoing development with a wide variety of technologies, bringing solutions through the full life cycle into production on time, within budget, and at the expected quality.
  • Recent experience combines hands-on coding, durable design and architecture, and repeatable integration and delivery processes tailored to particular project needs.
  • Experience in tuning Hadoop clusters for good processing performance. Experience in data integration between ETL tools (Diyotta, Pentaho) and Hadoop.
  • Familiar with the NoSQL big-data databases HBase and MongoDB.
  • Experience includes writing complex queries in HiveQL.
  • Implemented workflows in Oozie using Sqoop, MapReduce, Hive, Spark, and other Java and shell actions. Worked in both Scala and Java for Spark application development.
  • Skilled in integrating Spark with Hive, HBase, and Cassandra.
  • Skilled in Spark Streaming design and implementation.
  • In-depth knowledge of SequenceFile, ORC, Avro, and Parquet file formats.
  • Extended Hive and Pig core functionality by writing custom UDFs.
  • Expertise in working with SerDes in Hive to handle different file formats and to integrate with HBase.
  • Strong experience with the Lambda Architecture for designing and architecting real-time streaming applications and batch-style, large-scale distributed computing applications using tools such as Spark Streaming, Spark SQL, MLlib, Kafka, Flume, MapReduce, and Hive.
  • Proficient in Cassandra Data Modeling and Analysis and CQL (Cassandra Query Language).
  • Experience in using Flume to load transactional and application log files into HDFS.
  • Good Knowledge in creating event processing data pipelines using Kafka and Storm.
  • Experienced in implementing unified data platforms using Kafka producers/consumers and implementing pre-processing with Spark Streaming.
  • Hands-on experience in dealing with compression codecs such as Snappy, LZO, and BZIP2.
  • Experience in writing MRUnit tests to verify the correctness of MapReduce programs.
  • Experience in installing, configuring and maintaining the Hadoop Cluster including YARN configuration using Apache, Cloudera, Hortonworks, IBM Big Insights and MapR Distributions.
  • Supported 200+ node clusters in different environments for different applications.
  • Knowledge of ETL methods for data extraction, transformation and loading in corporate-wide ETL Solutions and Data Warehouse tools for reporting and data analysis.
  • Strong analytical skills with the ability to quickly understand clients' business needs. Involved in meetings to gather information and requirements from clients. Led teams and handled onsite/offshore coordination.
  • Experience in developing middleware modules in Spring Boot and deploying them through Docker and Rancher.
  • Extensively worked on Database Applications using DB2, Oracle, MySQL, and PL/SQL.
  • Strong experience as a senior Java Developer in Web/intranet, Client/Server technologies using Java, J2EE, Angular, Java Script, Servlets, JSP and Spring Boot.
  • Experience in Object Oriented Analysis, Design (OOAD) and development of software using UML Methodology, good knowledge of Java and J2EE design patterns.
  • Experience in Agile Engineering practices. Excellent interpersonal and communication skills, creative, research-minded, technically competent and result-oriented with problem solving and leadership skills.

TECHNICAL SKILLS

Languages: Java, Scala, Python, C, C++, UNIX, SQL, Shell Script

Big Data & Hadoop Ecosystem: Hadoop, MapReduce, YARN, Pig, Hive, HBase, Sqoop, Impala, Oozie, ZooKeeper, Spark Core, Spark SQL, Spark Streaming, Spark MLlib, R, Ambari, Cassandra, Elasticsearch, Amazon EMR and S3, Avro, Parquet, Maven, Ant, Snappy, Bzip2, Big SQL.

Event Processing/Streaming: Flume, Spark Streaming, Kafka, Storm

Hadoop Distributions: Cloudera, MapR, Hortonworks, Pivotal HD, IBM Big Insights, Amazon EMR and S3.

Databases: Oracle 11g/10g/9i/8i, Microsoft SQL Server 2005/2008, MySQL, DB2, Flat Files.

NoSQL Databases: HBase, Cassandra, MongoDB

Web Technologies: Angular, JavaScript, CSS, HTML, AJAX, XML, XSLT, Bootstrap

Web Services: RESTful

Java/J2EE Technologies: Java Beans, Spring Boot, Hibernate, JSP, Servlets, JDBC, Docker, Rancher, Kubernetes, Jenkins

Cloud Computing: Amazon Web Services (EC2, EMR, S3)

Web/Application Servers: JBoss 4.x/6.4, Tomcat 5.x/6.x/7.x/8.x

Build and Build Management Tools: Maven, Ant, SBT, NPM

Database Tools: TOAD, SQL Developer, SQL Workbench, Teradata SQL Assistant

Operating Systems: UNIX/Linux, Solaris 10, CentOS, Ubuntu, Windows XP/7/8

Development Tools: IntelliJ, Eclipse, NetBeans, RSA for modeling, TOAD, PuTTY, vi editor, Microsoft Office tools, JIRA

PROFESSIONAL EXPERIENCE

Confidential - St. Louis, MO

Big Data Solution Architect

Responsibilities:

  • Designed the system using cutting-edge technologies such as Spark, Hadoop, and Kafka to analyze real-time data.
  • Used the Spark DataFrames implementation for migrating existing SQL functionality into Spark jobs.
  • Developed a streaming application to pilot the stream processing from Kafka to Hadoop.
  • Implemented extraction, transformation, and enrichment steps in Spark Streaming jobs.
  • Tuned Spark Streaming to improve performance.
  • Integrated Kafka with Spark Streaming to do real-time data ingestion and ETL on top of it, as sketched below this list.
  • Implemented and called Spark SQL jobs within the Spark Streaming process.
  • Implemented a Kafka job to stream data from Isilon to Hadoop to ingest switch network records.
  • Used Kafka for real-time data ingestion and processing.
  • Tuned Kafka jobs to improve the performance and bandwidth of messages ingested into Hadoop.
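The Kafka-to-Hadoop streaming ingestion described above can be illustrated with a minimal Spark Structured Streaming sketch in Scala; the broker list, topic name, and HDFS paths below are placeholders for illustration, not the original project code.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.current_timestamp

object KafkaToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-hdfs-ingestion")
      .getOrCreate()

    // Read the raw event stream from Kafka (brokers and topic are illustrative).
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
      .option("subscribe", "switch_network_records")
      .load()

    // Extraction and light enrichment: pull the message payload and stamp the ingest time.
    val enriched = raw
      .selectExpr("CAST(value AS STRING) AS record")
      .withColumn("ingest_ts", current_timestamp())

    // Land micro-batches on HDFS as Parquet for downstream Hive/Spark SQL jobs.
    val query = enriched.writeStream
      .format("parquet")
      .option("path", "/data/landing/switch_network_records")
      .option("checkpointLocation", "/checkpoints/switch_network_records")
      .start()

    query.awaitTermination()
  }
}
```

Further transformation or Spark SQL enrichment steps would slot in between the read and the write, for example by joining the stream against reference DataFrames before the output stage.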

Environment: HDP 2.5, Ambari, HDFS, Spark SQL, Flume, Spark Streaming, Spark Core, MongoDB, Scala, Python, Oozie, Kafka.

Confidential

Sr. Big Data Developer

Responsibilities:

  • Worked extensively on creating Hive scripts and performing aggregations at different levels.
  • Worked on ETL using HBase, Hive on Hadoop.
  • Managed data coming from different sources.
  • Very good understanding of Partitions, Bucketing concepts in Hive and designed both Managed and External tables in Hive for optimized performance.
  • Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation, and how they translate to MapReduce jobs.
  • Knowledge in performance troubleshooting and tuning Hadoop clusters.
  • Job management using Capacity scheduler Queues mechanism.
  • Designed a data warehouse using Hive.
  • Worked on different data sources such as Teradata, SQL Server, Flat files etc.
  • Developed PIG scripts to do certain data enrichments as specified by business users.
  • Involved in the end-to-end process of implementing and scheduling Hadoop and Spark jobs using technologies such as Sqoop, Pig, Hive, MapReduce, Scala, and shell scripts.
  • Created HBase tables to store audit data.
  • Evaluated Oozie for workflow orchestration in the automation of Hive jobs.
  • Captured data from existing relational database into HDFS using Sqoop.
  • Experienced in writing Hive UDFs for certain date conversions.
  • Worked on performance tuning to improve job performance.
  • Developed a POC using Spark SQL with Scala; a minimal sketch of this kind of job appears after this list.
  • Experienced in managing and reviewing Hadoop log files.
  • Implemented test scripts to support test driven development and continuous integration.
  • Gained very good business knowledge on Telecom domain.
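As an illustration of the Spark SQL POC with Scala mentioned above, the sketch below queries a partitioned Hive table and writes back an aggregate; the database, table, and column names are hypothetical and serve only to show partition pruning and aggregation.

```scala
import org.apache.spark.sql.SparkSession

object SparkSqlAggregationPoc {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark SQL read the managed/external tables defined in the metastore.
    val spark = SparkSession.builder()
      .appName("spark-sql-aggregation-poc")
      .enableHiveSupport()
      .getOrCreate()

    // Filtering on the partition column (load_date) prunes the scan to the
    // matching HDFS directories instead of reading the whole table.
    val dailyAgg = spark.sql(
      """SELECT region, COUNT(*) AS event_cnt, SUM(duration_sec) AS total_duration_sec
        |FROM telecom_db.usage_events
        |WHERE load_date = '2016-10-01'
        |GROUP BY region""".stripMargin)

    // Persist the aggregate as a Hive table for downstream reporting.
    dailyAgg.write.mode("overwrite").saveAsTable("telecom_db.usage_daily_agg")

    spark.stop()
  }
}
```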

Environment: HDP 2.3/2.4/2.5, Ambari, HDFS, Hive, Spark, Scala, Oozie, Sqoop, and Linux shell scripting.

Confidential

Big Data Developer/ Architect

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
  • Worked on moving all log files generated from various sources to HDFS for further processing.
  • Developed workflows in Oozie using custom MapReduce, Pig, Hive, and Sqoop actions.
  • Tuned the cluster for optimal performance to process these large data sets.
  • Worked hands-on with the ETL process; handled importing data from various data sources and performed transformations.
  • Built reusable Hive UDF libraries for business requirements, which enabled users to use these UDFs in Hive querying. Wrote a Hive UDF for date conversions per time zone; a minimal sketch follows this list.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Developed workflow and coordinators in Bedrock tool to automate tasks of loading data into HDFS and preprocessing with Hive in Avro format.
  • Gained very good business knowledge on processing claims data, detecting claim frauds, improve pharma services, marketing strategies etc.
  • Worked on data migration from SQL Server to HDFS framework.
  • Responsible for cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
  • Used Maven for building jar files of MapReduce programs and deployed to Cluster.
  • Modelled Hive partitions extensively for data separation and faster data processing and followed Pig and Hive best practices for tuning.
  • Involved in moving all log files generated from various sources to HDFS for further processing using Flume.
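The per-time-zone date conversion UDF mentioned above can be sketched against Hive's classic UDF API; the class name, input representation (epoch milliseconds), and output format here are assumptions for illustration, not the original library code.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import java.text.SimpleDateFormat
import java.util.{Date, TimeZone}

// Converts an epoch-milliseconds string into a timestamp string rendered in the
// requested time zone, e.g. to_zoned_ts('1455897600000', 'America/Chicago').
class ToZonedTimestamp extends UDF {
  def evaluate(epochMillis: String, timeZone: String): String = {
    if (epochMillis == null || timeZone == null) return null
    val formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
    formatter.setTimeZone(TimeZone.getTimeZone(timeZone))
    formatter.format(new Date(epochMillis.toLong))
  }
}
```

Packaged into a jar, a function like this would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called from HiveQL queries.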

Environment: MapReduce, Pig, HiveQL, MySQL, HBase, HDFS, Hive, Eclipse (Kepler), Hadoop, Oracle 11g, PL/SQL, SQL*Plus, Toad 9.6, Flume, Sqoop, Unix, Avro, HL7, JSON, XML, Pivotal HD, MapR, Hortonworks.

Confidential

Big Data Developer

Responsibilities:

  • Developed scripts and batch jobs to schedule various functional modules.
  • Imported and exported data from different databases such as Oracle and Teradata into HDFS and Hive using Sqoop.
  • Wrote Hive and Pig scripts to perform data cleaning, filtering, and validation before applying analytics.
  • Involved in business requirements clarification by coordinating with business analysts.
  • Analyzed requirements by conducting workshops with the business users.
  • Supported the application for up to two months after production deployment.
  • Acted as team lead, responsible for developing and debugging MapReduce programs, Pig scripts, and Hive queries. Installed and configured an 18-node Hadoop cluster in AWS.
  • Developed optimized Pig scripts that reduced the execution time of traditional batch jobs by 80%. Developed Java APIs for Pig invocations; a minimal sketch follows this list.
  • Optimized the team's Hadoop MapReduce code and Hive scripts for better scalability, reliability, and performance.
  • Used Hive for processing and querying over data and used MySQL for storing metadata information.
  • Implemented Hive tables and HQL Queries for the reports.
  • Involved in the organization-level Hadoop Centre of Excellence, devising strategies to improve in-house Hadoop competency.
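The Java APIs for Pig invocations mentioned above can be sketched with Pig's PigServer API (usable from Scala as well); the script statements and HDFS paths are placeholders rather than the original batch jobs.

```scala
import org.apache.pig.{ExecType, PigServer}

object PigInvocationSketch {
  def main(args: Array[String]): Unit = {
    // Run Pig in MapReduce mode against the cluster configuration on the classpath.
    val pig = new PigServer(ExecType.MAPREDUCE)

    // Register the cleansing pipeline one statement at a time.
    pig.registerQuery("raw = LOAD '/data/raw/cdr' USING PigStorage('|');")
    pig.registerQuery("clean = FILTER raw BY $0 IS NOT NULL;")

    // Materialize the cleaned relation back to HDFS.
    pig.store("clean", "/data/clean/cdr")

    pig.shutdown()
  }
}
```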

Environment: Java, CDH 4.x, Hadoop, MapReduce, Hive, Pig, Sqoop, shell scripting, Amazon EMR and S3.

Confidential

Responsibilities:

  • Technical Lead and System Manager for the team of 12.
  • Implemented a utility framework for GUI and validation; designed the ORM from the domain to the relational model using Hibernate configuration; implemented and configured the business components using Spring IoC; implemented Spring AOP for middleware services.
  • Responsible for design and development of new requirements and modifying existing components.
  • As a system manager, stayed in constant contact with the client to understand the technical difficulties they faced and to provide appropriate solutions.
  • Presented prototypes/demos to clients and gave regular status updates to the clients/PM.

Environment: Java, J2EE, JSP, Servlets, Java Beans, JavaScript, JDBC, JBoss Server 5.1.0, Spring 3.0, Hibernate 3.6.x, Oracle, CVS, Solaris 10, Eclipse 3.x, and JBuilder 2007.

Confidential

IT Analyst

Responsibilities:

  • Implemented utility plug-in for GUI widget creation and validation.
  • Created proofs of concept for each delivery and presented them to the client before the project started.
  • As a module lead, stayed in constant contact with the client to understand the technical difficulties they faced and to provide appropriate solutions.
  • Presented demos to clients and gave regular status updates to the clients/PM.

Environment: Core Java, Eclipse Plug-in Development, Spring 2.5, Eclipse 3.4.x/3.5.x, IBM ClearCase, Citrix, X-Session, and Solaris 10

Confidential

Software Engineer

Responsibilities:

  • Learned new technologies such as CTI (Computer Telephony Integration), DI (Desktop Integration), and Eclipse RCP, in addition to J2EE technologies, all of which were challenging.
  • Developed a batch script, much appreciated by the client, that made it easier to deploy the UI product daily to 5 environments during the SIT (System Integration Test) and PIT (Process Integration Test) phases. Involved in client interactions.

Environment: Eclipse RCP, Core Java, JAX-WS 2.1, Jacozoom, Web Services, WSDL, SOAP, XML, CTI (Computer Telephony Integration), Visual Basic 6.0, ANT, WAS (IBM Application Server), WPS (IBM Process Server), Eclipse 3.1.x & 3.2.x, RSA (UML Modeling), Visual SourceSafe, and Subversion
