Big Data / Hadoop Developer Resume
New York
PROFESSIONAL SUMMARY:
- Around 5 years of IT experience across various domains with Big Data, the Hadoop ecosystem, and Java/J2EE technologies.
- Strong hands-on experience in Spark Core and Spark SQL, with good knowledge of Spark Streaming and Spark machine learning using Scala and PySpark.
- Solid understanding of RDD operations in Apache Spark, i.e., transformations and actions, persistence (caching), accumulators, broadcast variables, and broadcast optimization (illustrative sketch after this summary).
- Tuned Spark RDD parallelism techniques to improve the performance and optimization of Spark jobs on the Hadoop cluster.
- In-depth understanding of Apache Spark job execution components such as the DAG, lineage graph, DAG scheduler, task scheduler, stages, and tasks.
- Hands-on experience in the testing and implementation phases of big data technologies.
- Experience with the Apache NiFi sandbox instance.
- Solid understanding of open-source monitoring tools such as Apache Ambari.
- Analyzed and developed transformation logic for handling large sets of structured, semi-structured, and unstructured data using Spark and Spark SQL.
- Hands-on experience working with input file formats such as ORC, Parquet, JSON, and Avro.
- Hands-on experience with NoSQL databases HBase and Cassandra.
- Good working knowledge of creating Hive tables and using HQL for data analysis to meet business requirements.
- Experience in managing and reviewing Hadoop log files.
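A minimal Scala sketch of the RDD concepts listed above (transformations vs. actions, caching, and broadcast variables). The input path, column layout, and lookup map are hypothetical placeholders, not code from any of the projects below.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: illustrates transformations, actions, caching, and broadcast variables.
object RddBasicsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("rdd-basics").getOrCreate()
    val sc = spark.sparkContext

    // Broadcast a small lookup table once per executor instead of shipping it with every task
    val countryNames = sc.broadcast(Map("US" -> "United States", "IN" -> "India"))

    val lines = sc.textFile("hdfs:///data/events.csv")      // lazy: nothing runs yet

    // Transformations only build the lineage graph; they execute when an action is called
    val byCountry = lines
      .map(_.split(","))
      .filter(_.length > 1)
      .map(cols => (countryNames.value.getOrElse(cols(1), "Unknown"), 1L))
      .reduceByKey(_ + _)
      .cache()                                              // persist for reuse across actions

    // Actions trigger execution of the DAG
    println(s"distinct countries: ${byCountry.count()}")
    byCountry.saveAsTextFile("hdfs:///output/events_by_country")

    spark.stop()
  }
}
```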
TECHNICAL SKILLS:
Big Data Frameworks: Hadoop, Spark, Scala, Hive, Kafka, AWS, Cassandra, HBase, Flume, Pig, Sqoop, MapReduce
Big Data Distributions: Hortonworks, Amazon EMR, Cloudera
Programming Languages: Java, Scala, Python, SQL, Shell Scripting
Operating Systems: Windows, Linux
Databases: Oracle, SQL Server
Development Tools (IDEs): IntelliJ, Eclipse
Java Technologies: JSP, Servlets, JUnit, Spring, Hibernate
Web Technologies: XML, HTML, JavaScript, JVM, jQuery, JSON
Linux Experience: System Administration Tools
Web Services: Web Service (RESTful and SOAP)
Development methodologies: Agile
Logging Tools: Log4j
Application / Web Servers: CherryPy, Apache Tomcat, WebSphere
Messaging Services: Flume, Kafka
Version Tools: Git, SVN
PROFESSIONAL EXPERIENCE:
Confidential, New York
Big Data / Hadoop Developer
Responsibilities:
- Analyzed the Integrated Data Facility (IDF) platform and made recommendations to the Scala source code for storing large amounts of data in Parquet format in S3 buckets.
- Debugged defects such as count mismatches and missing tables on S3.
- Analyzed nightly, daily, weekly, and monthly Spark jobs triggered using Autosys on the IDF platform.
- Used EMR edge nodes in development and QA environments for debugging issues.
- Analyzed Spears Automated Surveillance, a rule-based monitoring system built on top of the IDF platform.
- Developed various test case scenarios based on rules provided by the business using the Spark Scala APIs; worked closely with business analysts and enterprise architects to understand those rules.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Processed datasets in S3 in several semi-structured formats such as JSON, XML, and CSV.
- Wrote Spark SQL in Scala to extract data from AWS S3, transform it, and load it back to AWS S3.
- Worked extensively with AWS components such as EC2, S3, and Elastic MapReduce (EMR).
- Used Spark SQL to load JSON data, create a schema RDD, and load it into Hive tables (illustrative sketch after this section).
- Developed Spark scripts using Scala as per requirements.
Environment: Spark Core, Spark SQL, Scala, Hive, S3, EMR, Autosys, Oracle, Sqoop
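A minimal Scala sketch of the S3 JSON extract / Spark SQL transform / Parquet and Hive load flow described in this section. Bucket names, paths, column names, and the Hive table name are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: read semi-structured JSON from S3, transform with Spark SQL,
// write Parquet back to S3 and into a Hive table.
object IdfJsonToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("idf-json-to-parquet")
      .enableHiveSupport()            // allows writing into Hive-managed tables
      .getOrCreate()

    // Extract: read JSON from an S3 prefix
    val raw = spark.read.json("s3://example-idf-bucket/incoming/events/")

    // Transform: Spark SQL over a temporary view
    raw.createOrReplaceTempView("events")
    val curated = spark.sql(
      """SELECT event_id, event_type, CAST(event_ts AS timestamp) AS event_ts
        |FROM events
        |WHERE event_id IS NOT NULL""".stripMargin)

    // Load: Parquet back to S3, plus a Hive table for downstream queries
    curated.write.mode("overwrite").parquet("s3://example-idf-bucket/curated/events/")
    curated.write.mode("overwrite").saveAsTable("idf.curated_events")

    spark.stop()
  }
}
```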
Confidential, California
Hadoop and Data Science Platform Engineer
Responsibilities:
- Performed benchmarking of federated queries in Spark and compared their performance by running the same queries on Presto.
- Defined Spark configurations to optimize federated queries by tuning the number of executors, executor memory, and executor cores (see the sketch after this section).
- Created partitioned and bucketed Hive tables for data analysis.
- Successfully migrated data from Hive to a MemSQL database via the Spark engine, the largest table being 1.2 TB.
- Ran benchmarking queries on the MemSQL database and measured the performance of each query.
- Compared the performance of each benchmark query across solutions such as Spark, Teradata, MemSQL, Presto, and Hive (on the Tez engine) by creating bar graphs in Numbers.
- Migrated data from Teradata to MemSQL using Spark by persisting DataFrames to MemSQL.
- Provided a solution using Hive and Sqoop (to export/import data) for faster data loads, replacing the traditional ETL process with HDFS for loading data into target tables.
- Developed Spark scripts using Scala as per the requirements.
- Developed Java code using both RDDs and DataFrames/SQL/Datasets in Spark 1.6 and Spark 2.1 for data aggregation, queries, and writing data.
- Used Grafana to analyze Spark executor usage for different queues on different clusters.
- Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcast variables, efficient joins, and transformations during the ingestion process itself.
- Developed Hive queries to process the data and generate data cubes for visualization.
- Converted SQL code to Spark code using Java and Spark SQL/Streaming for faster testing and processing of data; imported and indexed data from HDFS for secure searching, reporting, analysis, and visualization in Splunk.
- Worked extensively with Hive, SQL, Scala, Spark, and shell scripting.
- Completed assigned Radar items on time and stored the code in a Git repository.
- Tested the Python, R, Livy, and Teradata JDBC interpreters by executing sample paragraphs.
- Performed CI/CD builds of Zeppelin, Azkaban and Notebook using Ansible.
- Built a new version of Zeppelin by applying Git patches and updating the artifacts using Maven.
- Wrote shell scripts to determine the status of various components in the data science platform.
- Performed data copying activities in a distributed environment using Ansible.
Environment: Spark Core, Spark SQL, MemSQL, Presto, Teradata, Hive, Apache Zeppelin, Maven, GitHub, IntelliJ, Nginx, Redis, Monit, Linux, Shell Scripting, Ansible.
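A minimal Scala sketch of the executor tuning and the Hive-to-MemSQL migration described above. Executor sizes, the JDBC URL, credentials, and table names are hypothetical placeholders; MemSQL is reached here through its MySQL-compatible JDBC driver.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: tune executor resources, read a Hive table, persist the DataFrame to MemSQL over JDBC.
object HiveToMemsql {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-memsql")
      .config("spark.executor.instances", "20")   // number of executors (placeholder value)
      .config("spark.executor.memory", "8g")      // executor memory (placeholder value)
      .config("spark.executor.cores", "4")        // cores per executor (placeholder value)
      .enableHiveSupport()
      .getOrCreate()

    // Read the source table from Hive
    val df = spark.table("warehouse.large_fact_table")

    // Persist the DataFrame to MemSQL via JDBC
    df.write
      .format("jdbc")
      .option("url", "jdbc:mysql://memsql-host:3306/analytics")
      .option("driver", "com.mysql.jdbc.Driver")
      .option("dbtable", "large_fact_table")
      .option("user", "etl_user")
      .option("password", sys.env.getOrElse("MEMSQL_PASSWORD", ""))
      .mode("append")
      .save()

    spark.stop()
  }
}
```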
Confidential, Orlando, Florida
Big Data Developer
Responsibilities:
- Extracted the data from Teradata into HDFS using Sqoop.
- Used Flume to collect, aggregate, and store web log data from different sources such as web servers, mobile, and network devices, and pushed it into HDFS.
- Implemented MapReduce programs on log data to transform it into a structured form and extract user information.
- Wrote Pig scripts to transform raw data from several data sources into baseline data.
- Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, and visit duration (see the sketch after this section).
- Extensive work on the ETL process consisting of data transformation, data sourcing, mapping, conversion, and loading using Informatica.
- Extensively used ETL processes to load data from flat files into the target database, applying business logic in transformation mappings to insert and update records during the load.
- Created Talend ETL jobs to read data from an Oracle database and import it into HDFS.
- Worked with data serialization formats for converting complex objects into sequences of bits using the Avro, RC, and ORC file formats.
- Modified stored procedures to set up the control table used to generate the package ID, batch ID, and status for each batch.
- Created DDLs for the migration tables residing in MySQL and MSSQL databases.
- Performed batch processing on large sets of data.
- Built an Apache NiFi flow for migrating data from MSSQL and MySQL databases to the staging table.
- Performed transformations on large data sets using the Apache NiFi Expression Language.
- Unit tested the migration of MySQL and MSSQL tables using the NiFi flow.
- Used DBeaver to connect to the different databases across different sources.
- Ran queries to verify the data types of the columns being migrated to the staging table.
- Responsible for monitoring data from source to target.
- Populated the staging tables in the MySQL database without any data mismatch errors.
- Worked in an Agile methodology with VersionOne, attending scrums and sprint planning.
Environment: Apache Hadoop, Hortonworks HDP 2.0, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBase, Oozie, Teradata, Avro, Java, Talend, Linux, Apache NiFi, MySQL, MSSQL, Agile, DBeaver.
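A sketch of the web-log aggregation described in this section. The original work ran HiveQL directly; the same style of query is expressed here through Spark SQL in Scala to keep these sketches in one language, and the table and column names are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: unique visitors per day, page views, and average visit duration from a web-log table.
object WebLogAnalysisSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("weblog-analysis")
      .enableHiveSupport()
      .getOrCreate()

    val daily = spark.sql(
      """SELECT to_date(event_ts)            AS log_date,
        |       COUNT(DISTINCT visitor_id)   AS unique_visitors,
        |       COUNT(*)                     AS page_views,
        |       AVG(visit_duration_seconds)  AS avg_visit_duration
        |FROM weblogs.page_events
        |GROUP BY to_date(event_ts)
        |ORDER BY log_date""".stripMargin)

    daily.show(30, truncate = false)
    spark.stop()
  }
}
```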
Confidential - Omaha, NE
Hadoop Developer
Responsibilities:
- Imported data from our relational data stores to Hadoop using Sqoop.
- Created various MapReduce jobs for performing ETL transformations on the transactional and application specific data sources.
- Wrote Pig scripts and executed them using the Grunt shell.
- Worked on the conversion of existing MapReduce batch applications for better performance.
- Performed big data analysis using Pig and user-defined functions (UDFs).
- Worked on loading tables into Impala for faster retrieval using different file formats.
- The system was initially developed in Java; the Java filtering program was restructured so the business rule engine lives in a JAR that can be called from both Java and Hadoop.
- Created Reports and Dashboards using structured and unstructured data.
- Upgraded the operating system and/or Hadoop distribution as new versions were released, using Puppet.
- Performed joins, group-bys, and other operations in MapReduce using Java and Pig.
- Processed the output from Pig and Hive and formatted it before writing to the Hadoop output file.
- Used Hive definitions to map the output file to tables.
- Set up and benchmarked Hadoop/HBase clusters for internal use.
- Wrote data ingesters and MapReduce programs.
- Reviewed the HDFS usage and system design for future scalability and fault-tolerance
- Wrote MapReduce/HBase jobs.
- Worked with HBase, a NoSQL database.
Environment: Apache Hadoop 2.x, MapReduce, HDFS, Hive, Pig, HBase, Sqoop, Flume, Linux, Java 7, Eclipse, NoSQL
Confidential
Java Developer
Responsibilities:
- Worked with the business community to define business requirements and analyze the possible technical solutions.
- Requirement gathering, Business Process flow, Business Process Modeling and Business Analysis.
- Extensively used UML and Rational Rose for designing to develop various use cases, class diagrams and sequence diagrams.
- Used JavaScript for client-side validations, and AJAX to create interactive front-end GUI.
- Developed application using Spring MVC architecture.
- Developed custom tags for a table utility component.
- Used various Java, J2EE APIs including JDBC, XML, Servlets, and JSP.
- Designed and implemented the UI using Java, HTML, JSP and JavaScript.
- Designed and developed web pages using Servlets and JSPs, and used XML/XSL/XSLT as the repository.
- Involved in Java application testing and maintenance in development and production.
- Involved in developing the customer form data tables; maintained customer support and customer data in MySQL database tables.
- Mentored specific projects in applying the new SDLC based on the Agile Unified Process, especially from the project management, requirements, and architecture perspectives.
- Designed and developed Views, Model and Controller components implementing MVC Framework.
Environment: JDK 1.3, J2EE, JDBC, Servlets, JSP, XML, XSL, CSS, HTML, DHTML, JavaScript, UML, Eclipse 3.0, Tomcat 4.1, MySQL.