
Big Data Architect Resume


  • 10+ years of experience in SDLC with key emphasis on trending Big Data technologies: Spark, Scala, Spark MLlib, Hadoop, and Tableau on Cloudera / Hortonworks / MapR / AWS / GCP.
  • Extensive experience in data modeling, data architecture, data warehousing & business intelligence concepts.
  • Architect, design & develop Big Data Solutions practice, including setting up the Big Data roadmap and building the supporting infrastructure and team to provide Big Data.
  • Architecting and implementing a Portfolio Recommendation Analytics Engine using Hadoop MR, Oozie, Spark SQL, Spark MLlib.
  • Excellent understanding of Hadoop architecture and underlying framework, including storage management.
  • Expertise in architecting Big Data solutions using Data ingestion, Data Storage
  • Experienced in working on NoSQL databases - HBase & MongoDB, database performance tuning & data modeling.
  • Extensive knowledge in architecture design of Extract, Transform, Load environment using Java, Spark, BigQuery.
  • Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
  • Experience in integration of various data source definitions like SQL Server, Oracle, Sybase, ODBC connectors & Flat Files.
  • Experience in handling huge volumes of data in/out from Teradata / IBM DB2 / Big Data.
  • Experience in development of Big Data projects using Hadoop, Hive, HDP and MapReduce open source tools/technologies.
  • Experience in Amazon AWS EC2, DynamoDB, S3 and other services.
  • Expertise in data analysis, design and modeling using tools like Erwin.
  • Expertise in Big Data architecture (AWS, GCP, Hortonworks, Cloudera), distributed systems, MongoDB, NoSQL.
  • Hands-on experience with Hadoop / Big Data related technologies in storage, querying, processing and analysis of data.
  • Experienced in using various Hadoop infrastructures such as MapReduce, Hive, Sqoop, and Oozie.
  • Experience in Amazon EMR, Spark, Kinesis, S3, ECS, CloudWatch, Lambda, Athena, Zeppelin & Airflow.
  • Experienced in testing data in HDFS and Hive for each transaction of data.
  • Strong experience in working with databases like Oracle 12c/11g/10g/9i, DB2, SQL Server 2008 and MySQL, and proficiency in writing complex SQL queries.
  • Experienced in using database tools like SQL Navigator, TOAD.
  • Experienced with Spark, improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrame, Pair RDDs, Spark on YARN.
  • Extensive knowledge in programming with Resilient Distributed Datasets (RDDs).
  • Experienced in using Flume to transfer log data files to Hadoop Distributed File System (HDFS).
  • Knowledge and experience in job workflow scheduling and monitoring tools like Oozie and ZooKeeper.
  • Good experience in Shell programming.
  • Worked on Machine Learning models using Python / AWS.
  • Trained in Machine Learning landscape of Statistics, Python, Jupyter, Matplotlib, PyPlot, Tableau, Streamlit, RShiny.


J2EE/JEE Application Components: Java 1-8, J2EE 2-4, AJAX, Jersey Framework, Spring Boot/Core, MVC, Servlets, JSP, JavaScript, XML, XSL, HTML, CSS, jQuery

J2EE Containers and Services: IBM WAS 5.1 / 6.0, BEA Weblogic 5.1 / 6, Apache, JNDI and JDBC

Databases: IBM DB2 8, Oracle 7/8/9/10g/11g, MS SQL Server 2000/2005, Sybase, Informix and MySQL

Developer Tools: NetBeans, Eclipse, IntelliJ IDEA, PL/SQL Developer, Maven, JIRA, SharePoint, Autosys, and Oracle Developer

Business Analysis/Reporting/Other Tools: MagicDraw, Visio, Actuate 4.x/5.x/6.x/7.x

Operating Systems: UNIX (Solaris, HP-UX, Red Hat Linux), Windows 3. / NT / 2000 / XP, Autosys

Big Data: Hadoop MRv1/MRv2, Hive, HBase, Drill, Solr, Spark, Sqoop, Pig, Scala, Amazon EMR, Amazon Redshift, AWS Athena, AWS SageMaker, AWS Lambda, Apache Airflow, Apache Kafka, Apache Kudu and ZooKeeper

Distributions: Hortonworks, MapR, Cloudera, AWS and GCP

Machine Learning: Scala MLlib/ML, PySpark, Python, Anaconda, Jupyter, NumPy, scikit-learn, Matplotlib, Pandas and Spyder

Languages and Systems (Working Knowledge): Talend Studio, C, C++, Python, R, MongoDB and Tableau



Big Data Architect


  • Worked on the end-to-end workflow of each job, from requirements gathering through Development, Unit Testing, Code Review, and creating ESP Scheduler jobs using Airflow in Test, UAT and PROD environments.
  • Worked on design/development of Database Ingestion jobs that load data from Teradata into BigQuery.
  • Optimization techniques were used to fix Spark performance issues when running on the cluster.
  • Global configuration files were used to drive the memory and various other requirements of individual Spark jobs.
  • Airflow DAGs were developed to chain Spark and Query Ingestion jobs.
  • A CI/CD pipeline was implemented using Jenkins, followed by Airflow job triggers.
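
As an illustration of how a global configuration file can drive the memory and other requirements of individual Spark jobs, here is a minimal sketch in plain Python. The config layout, job names, and values are hypothetical assumptions, not details of the original project; only the `spark-submit` flags shown are standard options.

```python
import json

# Hypothetical global configuration: the keys, job names, and values are
# illustrative assumptions, not taken from the original project.
GLOBAL_CONFIG = json.loads("""
{
  "defaults": {"driver_memory": "2g", "executor_memory": "4g",
               "executor_cores": 2, "num_executors": 4},
  "jobs": {
    "teradata_ingest": {"executor_memory": "8g", "num_executors": 10},
    "daily_report": {}
  }
}
""")


def resolve_settings(config, job_name):
    """Merge the shared defaults with the overrides for one job."""
    settings = dict(config["defaults"])
    settings.update(config["jobs"].get(job_name, {}))
    return settings


def spark_submit_args(config, job_name, app_path):
    """Build a spark-submit argument list from the resolved settings."""
    s = resolve_settings(config, job_name)
    return [
        "spark-submit",
        "--driver-memory", s["driver_memory"],
        "--executor-memory", s["executor_memory"],
        "--executor-cores", str(s["executor_cores"]),
        "--num-executors", str(s["num_executors"]),
        app_path,
    ]
```

A scheduler task could then call `spark_submit_args(GLOBAL_CONFIG, "teradata_ingest", "ingest.py")` and pass the result to a process launcher, so tuning a job means editing the config rather than the job code.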


Big Data Architect


  • Working on User stories with the Agile model of development using JIRA.
  • Working on design/development of multiple Spark jobs in Scala that ingest 2 TB of data from Oracle.
  • Working on Transformation jobs that read Parquet data files from HDFS and write the final output as a CSV file stored on HDFS.
  • One copy of the Allocated data was stored in Impala tables for BI consumption; the other was used for building ML models.
  • Used the crunched-down Parquet files as input data to analytics developed using Apache Spark ML and MLlib packages and PySpark.
  • As part of the OCR pipeline, implemented text mining to convert words and phrases in unstructured data into numerical values.
  • Used machine learning and statistical modeling techniques to develop and evaluate algorithms to improve performance, quality, data management and accuracy.
  • Developed a K-means Clustering approach on SageMaker using Python to create a prediction model of Customer Cash reserve as part of SEC Rule 15c3-3(e) from the mined text data stored in AWS S3.
  • A separate linear regression model was built on SageMaker using Python and tested on Beta/Dev data (presently on QA data) covering Broker-Trader firm/customer data, to predict missing values such as Allocation codes.
  • A few optimization techniques were also used as part of the Machine Learning model development.
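
The text-mining step mentioned above, turning words and phrases into numerical values, can be sketched with simple bag-of-words counts. This is an assumed, simplified technique chosen for illustration; the resume does not state which vectorization method was actually used, and the sample text is made up.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents):
    """Map each distinct token to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(text, vocabulary):
    """Return the token-count vector for one document."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


# Made-up sample text standing in for OCR output.
docs = ["Cash reserve deposit", "Reserve account cash cash"]
vocab = build_vocabulary(docs)
vectors = [vectorize(d, vocab) for d in docs]
```

Each document becomes a fixed-length numeric vector over the shared vocabulary, which is the kind of input a clustering or regression model can consume.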
