Big Data Engineer - Consultant Resume
SUMMARY
- Senior J2EE/Spark Big Data Engineer and Cloud Developer with 14 years of hands-on experience designing and developing enterprise-level solutions in J2EE, the Big Data technology stack, and cloud platforms.
- Strong experience in Java, J2EE, Spring, RESTful web services, SQL, and development of user interface services.
- Strong experience in Kafka, Spark Streaming, Spark DataFrames, Spark SQL (analytical and aggregate functions), and the Scala programming language.
- Good knowledge of Spark performance tuning, including capacity scheduling (YARN queues), dynamic resource allocation, persisting data in memory, repartitioning based on resource availability, and increasing parallelism by reading data with more partitions (a configuration sketch follows this summary).
- Good working knowledge of AWS EMR, S3, and Kinesis Streams, with solid experience migrating applications from on-premises setups to Big Data environments in the AWS Cloud.
- Good working experience with NoSQL stores such as HBase, Big Data stores such as Hive and HDFS, and traditional databases including PostgreSQL, Oracle, Sybase 12.5 (ASE), and MySQL.
- Experience in continuous integration with automated build and deployment using Maven and Jenkins.
- Working experience with Elasticsearch, Logstash, and Kibana.
- Worked throughout the entire software development lifecycle, including requirements gathering, analysis, design, development, testing, implementation, and post-implementation support.
- Working experience in Hortonworks and Cloudera environments.
- Good experience in design and development of applications using Spark/Scala on various platforms, including Databricks, Cloudera, and Hortonworks.
- Good experience in system design; developed a Spring Boot project integrated with Spark and Scala.
- Capable of processing large sets of structured, semi-structured, and unstructured data, and of supporting systems and application architecture.
- Worked in both traditional Waterfall and Agile Scrum methodologies.
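The following is a minimal configuration sketch of the Spark tuning approach summarized above, assuming a YARN cluster with the external shuffle service available; the application name, queue name, executor bounds, HDFS path, and partition count are illustrative placeholders, not values from any of the projects.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object TuningSketch {
  def main(args: Array[String]): Unit = {
    // Dynamic resource allocation on a YARN capacity-scheduler queue
    // (queue name "analytics" and executor bounds are illustrative).
    val spark = SparkSession.builder()
      .appName("parser-job")
      .config("spark.yarn.queue", "analytics")
      .config("spark.dynamicAllocation.enabled", "true")
      .config("spark.dynamicAllocation.minExecutors", "2")
      .config("spark.dynamicAllocation.maxExecutors", "40")
      .config("spark.shuffle.service.enabled", "true") // required for dynamic allocation on YARN
      .getOrCreate()

    // Repartition for higher parallelism and keep the hot dataset in cluster memory
    // before expensive joins/aggregations (path and partition count are placeholders).
    val events = spark.read
      .parquet("/data/raw/events")
      .repartition(200)
      .persist(StorageLevel.MEMORY_ONLY)

    println(events.count())
    spark.stop()
  }
}
```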
TECHNICAL SKILLS
- Big Data Technologies: Hadoop MapReduce, Hive, Spark, Spark Streaming, Spark SQL, Kafka, Parquet, Sqoop, ZooKeeper, Scala, Elasticsearch, HBase, Phoenix, Hortonworks Data Platform, Ambari, Tableau.
- Java Technologies: Java, J2EE, Spring, Maven, SBT, Groovy, Gradle, ELK (Elasticsearch, Logstash, and Kibana), RESTful Web Services, Hibernate, JAXB, JMS, JSP & Servlets
- Scripting Languages: JavaScript, XML, HTML, Python, jQuery, AngularJS
- Databases: PostgreSQL, Sybase 12.5 (ASE), MySQL, Oracle, NoSQL (HBase)
- Development Tools: Eclipse, IntelliJ IDEA, NetBeans, Ant, SBT, Maven, Gradle, Embedded Jetty, Rapid SQL, DbVisualizer, SQuirreL SQL, JBoss, jBPM, WebSphere Application Server, WebSphere MQ, Tomcat, VSS, CVS, Git, JUnit, Visio, JIRA, QuickBuild, Fisheye, Crucible, Jenkins
- Platforms: Windows, Linux, Cloudera (CDH 5.3), Hortonworks Data Platform
- Cloud: AWS (EMR, S3, Kinesis)
- Related Skills: UML, JUnit, Mockito, GMock, Log4j/SLF4J
- Methodologies: Agile Scrum, UML, Design Patterns
PROFESSIONAL EXPERIENCE
Confidential
Big Data Engineer - Consultant
Responsibilities:
- LogViewer Analytics Tool - a parsing and analytics tool in which incoming unstructured binary data is parsed and processed into JSON that is stored in HBase. The same data is later served by microservices for analysis in the LogViewer Analytics tool (web UI).
- Created microservices using Spark and Scala as a Spring Boot project to retrieve data for the LogViewer Analytics tool and display it in various formats (millions of records sampled for graphical analysis, data returned by filter criteria, detailed view of a single record). Deployed the Spring application in the Big Data environment as a Spark job.
- Used Spark SQL and UDFs extensively to parse and analyze unstructured (schemaless) data and stored the results as JSON in HBase (see the first sketch after this list).
- Good experience using the HBase client APIs to create, update, and select data in HBase.
- Used the Phoenix client API over HBase to write SQL queries, with good experience developing UDFs for Phoenix Query Server (see the second sketch after this list).
- Good experience in performance tuning of data storage and retrieval in HBase tables through Phoenix queries and HBase configuration changes that avoid overloading region servers in HDP.
- Good knowledge of capacity scheduling and dynamic resource allocation techniques in Spark for tuning Spark job performance.
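First sketch: a minimal, hypothetical example of the Spark SQL UDF parsing and HBase write described above. The delimiter-based parse rule, the HDFS path, the HBase table name ("log_events"), the column family ("d"), and the row key are illustrative assumptions, not the project's actual schema.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object ParseToHBase {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("parse-to-hbase").getOrCreate()

    // Hypothetical parse rule: turn a raw schemaless line into a small JSON document.
    val toJson = udf { raw: String =>
      val fields = raw.split('|')
      s"""{"device":"${fields(0)}","event":"${fields(1)}","payload":"${fields.drop(2).mkString(",")}"}"""
    }

    val parsed = spark.read.textFile("/data/raw/logviewer") // illustrative HDFS path
      .toDF("raw")
      .withColumn("json", toJson(col("raw")))

    // Write each partition to HBase with the client API
    // (table "log_events" and family "d" are placeholders).
    parsed.select("json").rdd.foreachPartition { rows =>
      val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
      val table = conn.getTable(TableName.valueOf("log_events"))
      rows.foreach { row =>
        val json = row.getString(0)
        val put = new Put(Bytes.toBytes(json.hashCode.toString)) // illustrative row key
        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("json"), Bytes.toBytes(json))
        table.put(put)
      }
      table.close()
      conn.close()
    }
    spark.stop()
  }
}
```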
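Second sketch: querying HBase data through the Phoenix client API over JDBC; the ZooKeeper quorum in the connection string and the LOG_EVENTS table/view are assumptions.

```scala
import java.sql.DriverManager

object PhoenixQuerySketch {
  def main(args: Array[String]): Unit = {
    // Phoenix JDBC connection via the cluster ZooKeeper quorum (host is illustrative).
    val conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181:/hbase")
    val stmt = conn.createStatement()
    // Hypothetical Phoenix view mapped over the HBase table from the previous sketch.
    val rs = stmt.executeQuery(
      "SELECT DEVICE, COUNT(*) AS EVENTS FROM LOG_EVENTS GROUP BY DEVICE ORDER BY EVENTS DESC LIMIT 10")
    while (rs.next()) {
      println(s"${rs.getString("DEVICE")} -> ${rs.getLong("EVENTS")}")
    }
    rs.close(); stmt.close(); conn.close()
  }
}
```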
Confidential
Big Data Engineer - Consultant
Responsibilities:
- DMAT (Device Monitoring Analytics Tool) is Confidential's project for network engineers to monitor the network key parameter indices of various mobile devices and to run analytics on the collected data.
- Worked on developing a parsing solution using Spark and Scala in which drive-route data collected from different mobile devices is parsed.
- DMAT Post Processor - a parsing solution on the Big Data technology stack that parses unstructured binary data from Android mobile devices: the data is converted to sequence file format, parsed to identify various key parameters, stored in Hive as a staging layer, and pushed to Elasticsearch after applying formulas that derive the network key parameter indices.
- Heavily used Spark SQL and Spark UDFs to retrieve data from HDFS and manipulate it across layers; also involved in performance tuning of Spark jobs, including capacity scheduling and dynamic resource allocation.
- As part of performance tuning, implemented scheduled batch processing of different modules, persisted data in cluster memory, repartitioned datasets based on resource availability (when Elasticsearch was a bottleneck), and increased parallelism by reading data with more partitions.
- Designed table partitioning in Hive and improved performance by storing the data in Parquet format (binary, column-oriented storage); see the sketch after this list.
- The DMAT Post Processor parsing solution was developed with Spark (DataFrames and SQL more than RDDs), Scala, HDP, Hive, PostgreSQL, and Elasticsearch; the key parameter indices are stored in Elasticsearch so that network engineers can analyze the data in the monitoring tool (web UI).
- Analyzed and computed key parameter indices using Spark SQL on data stored in Hive in Parquet format.
- Handled unstructured and structured (CSV, PostgreSQL) data for analysis using Spark SQL and the Hive datastore to create reports (for the reporting dashboard in the DMAT tool).
- Experience with Hortonworks Data Platform for deploying and monitoring Spark applications and data stores such as Hive and HBase.
- Extensive experience in performance tuning of Spark applications.
- Involved in migrating the Big Data technology stack to AWS using EMR, S3, and Kinesis.
- Created a Spark cluster with HDFS and Hive on AWS EMR and used S3 as the data store for binary files from Android mobile devices. Used Kinesis streams to land application logs in S3.
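A minimal sketch of the Hive staging layer with Parquet partitioning and a Spark SQL aggregation of the kind used to derive key parameter indices; the database, table, and column names, the partition date, and the averaged metrics are illustrative placeholders rather than the project's real KPI formulas.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, col}

object StagingToKpi {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dmat-post-processor")
      .enableHiveSupport()
      .getOrCreate()

    // Stage parsed drive-route records in Hive as Parquet, partitioned by collection date
    // (database, table, and column names are placeholders).
    val parsed = spark.table("staging.parsed_records")
    parsed.write
      .mode("overwrite")
      .format("parquet")
      .partitionBy("collect_date")
      .saveAsTable("staging.drive_route_events")

    // Compute an example key parameter index per device for one partition with Spark SQL
    // aggregates; the averaged columns stand in for the real KPI formulas.
    val kpi = spark.table("staging.drive_route_events")
      .where(col("collect_date") === "2018-06-01")
      .groupBy("device_id")
      .agg(avg("rsrp").as("avg_rsrp"), avg("throughput_kbps").as("avg_throughput"))

    kpi.show(20, truncate = false)
    spark.stop()
  }
}
```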
Confidential
Big Data Engineer - Consultant
Responsibilities:
- DMAT (Device Monitoring Analytics Tool) is Confidential's project for network engineers to monitor the network key parameter indices of various mobile devices and to run analytics on the collected data.
- Developed a backend solution using Spring, Java, Elasticsearch, and PostgreSQL to parse the files retrieved from Android mobile devices and store the extracted data in Elasticsearch.
- Developed REST APIs for the monitoring tool (web UI) to retrieve data from Elasticsearch and PostgreSQL.
- Developed a Kafka-Elasticsearch connector to move data from Kafka into Elasticsearch.
- Created visualizations in Kibana and Tableau and embedded the graphical representations in the Device Monitoring tool.
- Very good experience writing Elasticsearch queries for data retrieval in REST APIs (a query sketch follows this list).
- Developed microservices in Spring Boot and consumed the services in the DMAT web UI.
- Wrote several batch processing jobs to transform and process data from semi-structured input formats and push it into Elasticsearch.
- Wrote Elasticsearch queries for the ArcGIS service-oriented interceptor layers that render maps on the web UI.
- Fixed performance issues in Kafka by partitioning connector streams and adjusting the Elasticsearch queue capacity accordingly.
- Experience creating test cases using Mockito and working with the Maven build tool.
- Experience developing multi-threaded Java post-processing code to convert data from Android devices to JSON format.
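A minimal sketch, written in Scala against the Elasticsearch high-level REST client, of the kind of filtered query served through the REST APIs above; the host, index name ("device-events"), and field names are assumptions, not the project's actual mapping.

```scala
import org.apache.http.HttpHost
import org.elasticsearch.action.search.SearchRequest
import org.elasticsearch.client.{RequestOptions, RestClient, RestHighLevelClient}
import org.elasticsearch.index.query.QueryBuilders
import org.elasticsearch.search.builder.SearchSourceBuilder

object DeviceEventSearch {
  def main(args: Array[String]): Unit = {
    val client = new RestHighLevelClient(
      RestClient.builder(new HttpHost("localhost", 9200, "http")))

    // Bool query: match one device and restrict to the last day
    // (index "device-events" and field names are illustrative).
    val source = new SearchSourceBuilder()
      .query(QueryBuilders.boolQuery()
        .must(QueryBuilders.termQuery("deviceId", "SM-G960"))
        .filter(QueryBuilders.rangeQuery("timestamp").gte("now-1d/d")))
      .size(100)

    val response = client.search(new SearchRequest("device-events").source(source), RequestOptions.DEFAULT)
    response.getHits.getHits.foreach(hit => println(hit.getSourceAsString))

    client.close()
  }
}
```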
Confidential
Technical Consultant
Responsibilities:
- Worked on creating an analysis model for a retail client using Spark Streaming and Cassandra. Worked on the Pricing Optimization and Inventory Management (Order Management) modules to set the right prices and promotions for products and to automate order processing in the inventory.
- Developed a log analysis project using Kafka, Spark Streaming, and Cassandra: ingested real-time log files into Spark Streaming through Kafka and stored the log analysis report in Cassandra. The Kafka messaging queue passes the logs to the Spark Streaming component via a Spark Streaming receiver; the processing logic is applied, the required metrics are calculated on the cluster, and the results are stored in Cassandra (a pipeline sketch follows this list).
- Developed Spark SQL/DataFrame code in Scala to perform analytical data processing logic and stored the results in Cassandra.
- Expertise in setting up multi-node Hadoop ecosystems, with working proficiency in Sqoop, Flume, Hive, and Impala.
- Ingested data from an Oracle database into HDFS using Sqoop.
- Loaded data from HDFS, performed transformations using Spark SQL and DataFrames, and stored the results back to HDFS using Spark.
- Created tables in the Hive metastore for a given schema and improved query performance by creating partitioned tables in the Hive metastore.
- Experienced in the Agile Scrum methodology.
- Freelance experience with Confidential in 2015, transforming data from HDFS into an HBase database using Spark SQL.
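A minimal sketch of the Kafka to Spark Streaming to Cassandra pipeline described above. It uses the kafka-0-10 direct stream (rather than a receiver) and the Spark Cassandra connector; the broker, topic, keyspace, table, and the simple ERROR-count metric are illustrative assumptions.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import com.datastax.spark.connector.SomeColumns
import com.datastax.spark.connector.streaming._

object LogAnalysisStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("log-analysis")
      .set("spark.cassandra.connection.host", "cassandra-host") // illustrative host
    val ssc = new StreamingContext(conf, Seconds(30))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker:9092", // illustrative broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "log-analysis",
      "auto.offset.reset" -> "latest")

    // Read the log topic as a direct stream ("app-logs" is a placeholder topic).
    val logs = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("app-logs"), kafkaParams))

    // Count ERROR lines per 30-second batch and persist the metric to Cassandra
    // (keyspace "logs" and table "error_counts" are placeholders).
    logs.map(_.value)
      .filter(_.contains("ERROR"))
      .count()
      .map(errors => ("all-services", System.currentTimeMillis(), errors))
      .saveToCassandra("logs", "error_counts", SomeColumns("service", "ts", "errors"))

    ssc.start()
    ssc.awaitTermination()
  }
}
```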