Sr. Big Data Architect Resume

Waltham, MA

SUMMARY

  • Over 10 years of hands-on experience in the IT industry, including deployment of Hadoop ecosystem and Google Cloud components such as MapReduce, YARN, Sqoop, Flume, Pig, Hive, BigQuery, and Bigtable, with experience in Spark, Storm, Scala, and Python.
  • Experience with GCP Dataproc, GCS, Cloud Functions, and BigQuery.
  • Extensive knowledge of the Software Development Life Cycle (SDLC) and Software Testing Life Cycle (STLC), and of delivery methodologies such as Agile Scrum, Kanban, Spiral, and RUP.
  • Strong knowledge of Power BI, importing data from sources such as SQL Server, Azure SQL DB, SQL Server Analysis Services (Tabular Model), and MS Excel.
  • Strong knowledge of writing complex queries in Teradata, DB2, Oracle, and PL/SQL.
  • Experience using Sqoop to import data from relational databases into HDFS, along with Pig, Hive, SQL scripting, UNIX shell scripting, and Autosys; knowledge of Hadoop cluster setup and experience in Cloudera environments.
  • Excellent experience with Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Experience installing, configuring, and using Hadoop ecosystem components such as MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, and Flume.
  • Good exposure to Apache Hadoop MapReduce programming, Pig scripting, distributed applications, and HDFS.
  • In-depth knowledge of Hadoop cluster architecture and monitoring, with a solid understanding of data structures and algorithms.
  • Experience managing and reviewing Hadoop log files, and knowledge of NoSQL databases such as MongoDB, HBase, and Cassandra.
  • Experience importing and exporting data with Sqoop between HDFS and relational database systems.
  • Experience in Java, JSP, Servlets, EJB, WebLogic, WebSphere, Hibernate, Spring, JBoss, JDBC, RMI, JavaScript, Ajax, jQuery, XML, and HTML.
  • Familiar with data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, data mining, machine learning, and advanced data processing; experience optimizing ETL workflows.
  • Experience creating visual reports, graphical analyses, and dashboards using Tableau and Data Studio.
  • Good experience with version control using Git; extensive knowledge of GitHub and Bitbucket.
  • Implemented application functionality using Spring Boot and Hibernate ORM, and built Java EE components with Spring MVC, Spring IoC, Spring Transactions, and Spring Security.
  • Built on-premises data pipelines using Kafka and Spark Streaming, consuming feeds from an API gateway REST streaming service (see the sketch after this list).
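
The following is a minimal, illustrative sketch of such a Kafka-to-Spark-Streaming ingestion pipeline in PySpark, not the original implementation; the broker address, topic name, record schema, and HDFS paths are hypothetical placeholders, and the Kafka connector package is assumed to be on the Spark classpath.

    # Minimal sketch: Kafka -> Spark Structured Streaming -> HDFS (PySpark).
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StructField, StringType, LongType

    spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

    # Placeholder schema for the JSON messages arriving from the gateway feed.
    event_schema = StructType([
        StructField("event_id", StringType()),
        StructField("payload", StringType()),
        StructField("ts", LongType()),
    ])

    # Read the raw byte stream from Kafka and parse the JSON value column.
    events = (
        spark.readStream
             .format("kafka")
             .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder broker
             .option("subscribe", "api_gateway_events")           # placeholder topic
             .load()
             .select(from_json(col("value").cast("string"), event_schema).alias("e"))
             .select("e.*")
    )

    # Land parsed events on HDFS as Parquet, with checkpointing for recovery.
    query = (
        events.writeStream
              .format("parquet")
              .option("path", "hdfs:///data/raw/events")          # placeholder path
              .option("checkpointLocation", "hdfs:///chk/events")
              .start()
    )
    query.awaitTermination()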

TECHNICAL SKILLS

Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Zookeeper, Hive, Pig, Sqoop, Cassandra, Oozie, Flume, Talend

Programming Languages: Java, C/C++, eVB, Assembly Language

Web and Scripting Technologies: JSP & Servlets, PHP, JavaScript, XML, HTML, Python, Bash

Databases: NoSQL, Oracle, Teradata, DB2, PL/SQL

UNIX Tools: Apache, Yum, RPM

Tools: Eclipse, JDeveloper, JProbe, CVS, Ant, MS Visual Studio

Platforms: Windows (2000/XP), Linux, Solaris, AIX, HP-UX

Application Servers: Apache Tomcat 5.x/6.0, JBoss 4.0

IDEs: NetBeans, Eclipse, WSAD, RAD

Methodologies: Agile, UML, Design Patterns

PROFESSIONAL EXPERIENCE

Confidential, Waltham, MA

Sr. Big Data Architect

Responsibilities:

  • Participated in client meetings to configure and set up the tools required on the cluster.
  • Analyzed files received from upstream systems and advised the development team on implementation based on the business logic.
  • Built data pipelines in Airflow on GCP for ETL jobs using various Airflow operators (see the sketch after this list).
  • Used the Cloud SDK in GCP Cloud Shell to configure Dataproc, Cloud Storage, BigQuery, and DQF services.
  • Performed data quality checks and maintained file data throughout the process, from preprocessing to BigQuery loading.
  • Resolved several issues encountered while loading data into Google Cloud.
  • Implemented the application using the Spring Boot framework and handled security with Spring Security.
  • Used a microservice architecture with Spring Boot based services interacting through a combination of REST calls and Apache Kafka message brokers, and worked with a Kafka cluster coordinated by ZooKeeper.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Attended data mapping workshops to understand the relationships between the data to be transferred, massaged, and loaded into Hive tables.
  • Created data models for customer data using the Amazon Redshift query language.
  • Implemented zones in the UNIX environment for moving data, ensuring data security between teams.
  • Participated in data masking meetings to identify critical elements and implemented the corresponding masking logic in code.
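
Below is a minimal, illustrative sketch of the kind of Airflow DAG on GCP described above, using Google provider operators to run a Dataproc job and then load files from GCS into BigQuery; the project, bucket, cluster, dataset, and table names are hypothetical placeholders, not the actual pipeline.

    # Minimal sketch: Dataproc preprocessing followed by a GCS -> BigQuery load.
    from datetime import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator
    from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

    PYSPARK_JOB = {
        "reference": {"project_id": "my-project"},        # placeholder project
        "placement": {"cluster_name": "etl-cluster"},      # placeholder cluster
        "pyspark_job": {"main_python_file_uri": "gs://my-bucket/jobs/preprocess.py"},
    }

    with DAG(
        dag_id="gcs_to_bigquery_etl",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        # Run the preprocessing / data-quality step on Dataproc.
        preprocess = DataprocSubmitJobOperator(
            task_id="preprocess_on_dataproc",
            job=PYSPARK_JOB,
            region="us-central1",
            project_id="my-project",
        )

        # Load the cleaned files from GCS into BigQuery.
        load = GCSToBigQueryOperator(
            task_id="load_into_bigquery",
            bucket="my-bucket",
            source_objects=["clean/part-*.csv"],
            destination_project_dataset_table="my-project.analytics.events",
            source_format="CSV",
            write_disposition="WRITE_TRUNCATE",
        )

        preprocess >> load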

Confidential, WASHINGTON, D.C.

Sr. Big Data Engineer / Hadoop Developer

Responsibilities:

  • Provided design recommendations to stakeholders that improved review processes and resolved technical problems.
  • Imported data from SQL Server and Azure SQL DB into Power BI to generate reports.
  • Used Power BI and Power Pivot to develop data analysis prototypes, and used Power View and Power Map to visualize reports.
  • Implemented solutions for ingesting data from various sources and processing data at rest using big data technologies such as Hadoop, the MapReduce framework, HBase, and Hive.
  • Worked in a multi-cluster environment and set up the Cloudera Hadoop ecosystem.
  • Designed and developed a real-time stream processing application using Spark, Kafka, Scala, and Hive to perform streaming ETL and apply machine learning.
  • Improved the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Imported millions of structured records from relational databases using Sqoop, stored them in HDFS in CSV format, and processed them with Spark (see the sketch after this list).
  • Enhanced the traditional data warehouse based on a star schema, updated data models, and performed data analytics and reporting using Tableau.
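
A minimal sketch of processing Sqoop-landed CSV files from HDFS with Spark SQL and persisting the result to Hive, as referenced above; the HDFS path, column names, and Hive table are hypothetical placeholders, and a Hive metastore is assumed to be available.

    # Minimal sketch: read Sqoop CSV output from HDFS, aggregate, write to Hive.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("sqoop-csv-to-hive")
        .enableHiveSupport()          # assumes a configured Hive metastore
        .getOrCreate()
    )

    # Sqoop import typically lands delimited files under a per-table HDFS directory.
    orders = spark.read.csv(
        "hdfs:///user/etl/sqoop/orders",   # placeholder Sqoop target directory
        header=False,
        inferSchema=True,
    ).toDF("order_id", "customer_id", "amount", "order_date")

    orders.createOrReplaceTempView("orders_stg")

    # Aggregate with Spark SQL and persist the result as a Hive table for reporting.
    daily_revenue = spark.sql("""
        SELECT order_date, SUM(amount) AS revenue
        FROM orders_stg
        GROUP BY order_date
    """)
    daily_revenue.write.mode("overwrite").saveAsTable("analytics.daily_revenue")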

Confidential, SAN FRANCISCO BAY AREA, CA

Sr. BigData Engineer / NoSQL Developer

Responsibilities:

  • Installed and configured MapReduce, Hive, and HDFS; implemented a CDH3 Hadoop cluster on CentOS and assisted with performance tuning and monitoring.
  • Performed debugging, tuning, and performance enhancement of the NoSQL database platform.
  • Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX systems, NoSQL stores, and a variety of portfolios (see the sketch after this list).
  • Wrote Hadoop jobs for analyzing data using Hive and Pig, accessing text files, sequence files, and Parquet files.
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
  • Responsible for day-to-day health checks, administration support, and maintenance for existing and new NoSQL databases.
  • Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
  • Created reports for the BI team, using Sqoop to move data into HDFS and Hive.
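
A minimal sketch of creating and loading an HBase table of the kind described above, assuming an HBase Thrift gateway and the happybase Python client; the host, table, column families, and row keys are hypothetical placeholders.

    # Minimal sketch: create an HBase table and load/scan rows via happybase.
    import happybase

    connection = happybase.Connection("hbase-thrift-host")   # placeholder Thrift server

    # One column family for raw portfolio attributes, one for derived fields.
    connection.create_table(
        "portfolio_events",
        {"raw": dict(max_versions=1), "derived": dict(max_versions=3)},
    )

    table = connection.table("portfolio_events")

    # Row keys and cell values in HBase are plain bytes.
    table.put(
        b"acct42#2023-07-01",
        {
            b"raw:source": b"unix_feed",
            b"raw:payload": b'{"symbol": "ABC", "qty": "100"}',
            b"derived:quality_flag": b"ok",
        },
    )

    # Scan back everything for one account prefix.
    for key, data in table.scan(row_prefix=b"acct42#"):
        print(key, data)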

Confidential, DEERFIELD, IL

Sr. Data Engineer / NoSQL DBA

Responsibilities:

  • Leveraged Hadoop and HDP to analyze massive amounts of clickstream data and identified the most efficient path for customers making an online purchase.
  • Analyzed Hadoop clusters using big data analytic tools including Pig, Hive, and MapReduce.
  • Conducted in-depth research on Hive to analyze partitioned and bucketed data.
  • Responsible for data modeling, design and implementation, software installation and configuration, database backup and recovery, and database connectivity and security.
  • Developed Oozie workflows to automate loading data into HDFS and running Pig jobs for data pre-processing.
  • Used Flume to collect, aggregate, and store web log data from sources such as web servers, mobile devices, and network devices, and pushed it to HDFS.
  • Used Hive to analyze data ingested into HBase via the Hive-HBase integration and computed various metrics for dashboard reporting.
  • Developed an ETL framework using Python and Hive (including daily runs, error handling, and logging) to glean useful data and improve vendor negotiations (see the sketch after this list).
  • Performed cleaning and filtering on imported data using Hive and MapReduce.
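
A minimal sketch of a daily Python-plus-Hive ETL step with logging and error handling, in the spirit of the framework described above; it assumes a HiveServer2 endpoint and the PyHive client, and the host, database, tables, and query are hypothetical placeholders.

    # Minimal sketch: one daily Hive load with logging and error handling.
    import logging
    import sys
    from datetime import date

    from pyhive import hive

    logging.basicConfig(level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s")
    log = logging.getLogger("daily_etl")


    def run_daily_load(run_date: date) -> None:
        conn = hive.Connection(host="hiveserver2-host", port=10000, database="vendor_dw")
        cursor = conn.cursor()
        try:
            # Rebuild one daily partition of the cleaned table from the raw feed.
            cursor.execute(
                """
                INSERT OVERWRITE TABLE vendor_spend PARTITION (ds = %(ds)s)
                SELECT vendor_id, SUM(amount)
                FROM raw_invoices
                WHERE ds = %(ds)s AND amount IS NOT NULL
                GROUP BY vendor_id
                """,
                {"ds": run_date.isoformat()},
            )
            log.info("Loaded vendor_spend partition for %s", run_date)
        finally:
            cursor.close()
            conn.close()


    if __name__ == "__main__":
        try:
            run_daily_load(date.today())
        except Exception:
            log.exception("Daily ETL failed")
            sys.exit(1)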

Confidential - TAMPA, FL

Hadoop Developer

Responsibilities:

  • Installed and set up a Hadoop cluster on Amazon EC2 using Apache Whirr for a POC.
  • Worked on analyzing the Hadoop cluster and various big data analytic tools, including Pig, the HBase database, and Sqoop.
  • Built scalable distributed data solutions using Hadoop and installed/configured Flume, Hive, Pig, Sqoop, and HBase on the Hadoop cluster.
  • Implemented a nine-node CDH3 Hadoop cluster on Red Hat Linux.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes and NameNodes, and recovery capacity planning with slot configuration.
  • Involved in loading data from the UNIX file system to HDFS (see the sketch after this list).
  • Implemented best income logic using Pig scripts.
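
A minimal sketch of loading files from the UNIX file system into HDFS, as referenced above, by shelling out to the standard hdfs dfs CLI from Python; the local and HDFS paths are hypothetical placeholders.

    # Minimal sketch: copy local CSV files into an HDFS landing directory.
    import subprocess
    from pathlib import Path

    LOCAL_DIR = Path("/var/exports/daily")        # placeholder local landing directory
    HDFS_DIR = "/data/landing/daily"              # placeholder HDFS target directory


    def put_files_to_hdfs() -> None:
        # Create the target directory if it does not exist yet.
        subprocess.run(["hdfs", "dfs", "-mkdir", "-p", HDFS_DIR], check=True)

        for local_file in sorted(LOCAL_DIR.glob("*.csv")):
            # -put fails if the file already exists in HDFS, which doubles as a
            # cheap guard against loading the same file twice.
            subprocess.run(
                ["hdfs", "dfs", "-put", str(local_file), HDFS_DIR],
                check=True,
            )
            print(f"Loaded {local_file} into {HDFS_DIR}")


    if __name__ == "__main__":
        put_files_to_hdfs()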
