Big Data Developer Resume
State Street, MA
PROFESSIONAL SUMMARY:
- 10+ years of professional experience involving project development, implementation, deployment and maintenance using Core Java and Big Data related technologies.
- Hadoop Developer with 4+ years of working experience in designing and implementing complete end-to-end Hadoop-based data analytical solutions using HDFS, MapReduce, Spark, Scala, YARN, Pig, Hive, Sqoop, Storm, Flume, Oozie, and HBase.
- Good knowledge of Hadoop architecture and expertise in using various components such as HDFS, YARN, Spark, MapReduce, Hive, Impala, HBase, and Kafka.
- Good experience in creating data ingestion pipelines, data transformations, data management, data governance and real-time streaming at an enterprise level.
- Experienced in using Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, pair RDDs, and Spark on YARN.
- Experience developing Pig Latin and HiveQL scripts for data analysis and ETL purposes, and extending the default functionality by writing User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) for custom, data-specific processing.
- In-depth understanding of Hadoop architecture and its various components such as ResourceManager, ApplicationMaster, NameNode, and DataNode.
- Worked with Big Data Spark applications on cloud through Amazon Web Services (AWS) EMR and S3.
- Experience in developing Pig scripts and Hive Queries.
- Managed and scheduled batch jobs on a Hadoop cluster using Oozie.
- Hands-on experience working with NoSQL databases including MongoDB and HBase.
- Experience in optimizing Spark jobs by using different optimization techniques.
- Participated in multiple big data POCs to evaluate different architectures, tools and vendor products.
- Hands-on experience in using Sqoop to import data into HDFS from RDBMS and vice versa.
- Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
- Expertise in Web technologies using Core Java, J2EE, Servlets, EJB, JSP, JDBC, Java Beans, and Design Patterns.
- Strong knowledge of the architecture of distributed systems and parallel processing, with in-depth understanding of the MapReduce programming paradigm and the Spark execution framework.
- Expert in working with the Hive data warehouse tool: creating tables, distributing data by implementing partitioning and bucketing, and writing and optimizing HiveQL queries.
- Experience with migrating data to and from RDBMS and unstructured sources into HDFS using Sqoop.
- Experience in job workflow scheduling and monitoring tools like Oozie and Zookeeper.
- Worked on NoSQL databases including HBase and MarkLogic.
- Experienced with performing CRUD operations using HBase Java Client API.
- Experience in working with the Java HBase API for ingesting processed data into HBase tables.
- Extensive experience in ETL process consisting of data transformation, data sourcing, mapping, conversion and loading.
- Experience in implementing continuous delivery pipelines with Maven and Ant.
- Profound knowledge of core Java concepts such as exceptions, collections, data structures, I/O, multi-threading, serialization and deserialization.
- Experience writing Shell scripts in Linux OS and integrating them with other solutions.
- Strong Experience in working with Databases like SQL Server 2008 and MySQL and proficiency in writing complex SQL queries.
- Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
- Excellent communication, interpersonal and analytical skills and a highly motivated team player with the ability to work independently.
TECHNICAL SKILLS
- Hadoop/Big Data: HDFS, MapReduce, Spark, Scala, Yarn, PIG, HIVE, Sqoop, Storm, Flume, Oozie, HBase, Hue, Zookeeper.
- Programming Languages: Java, PL/SQL, Pig Latin, HiveQL, Scala, SQL
- Development Tools: Eclipse, SVN, Git, Maven
- APIs: REST, EJB, Java Naming and Directory Interface (JNDI)
- Databases: MS SQL
- NoSQL Databases: Apache HBase, MarkLogic
- Distributed platforms: Hortonworks, Cloudera
- Cloud Services: AWS S3, AWS EMR (Elastic MapReduce), DynamoDB
- Operating Systems: Linux and Windows
- JAVA/J2EE Technologies: Servlets, JSP, JDBC, EJB, JAXB, JMS, JAX-RPC, JAX-WS, JAX-RS, Apache CXF.
- Web Technologies: HTML, CSS, JavaScript, jQuery, Ajax, Backbone.js, Node.js, Ext JS
PROFESSIONAL EXPERIENCE
Confidential, State Street MA
Big Data Developer
Responsibilities
- Developed the different components using Spark and Java.
- Designed and created Hive tables to store the processed result.
- Created multiple StreamSets pipelines for end-to-end integration.
- Built multiple data pipelines using Pig scripts for processing data for specific applications.
- Used different file formats such as Parquet, Avro, and ORC for storing and retrieving data in Hadoop.
- Started exploring the AWS stack (EMR and Redshift) for migrating current on-premises applications to the cloud.
- Used Spark Streaming for consuming event-based data from Kafka and joined this data set with existing Hive table data to generate performance indicators for an application.
- Imported data from AWS S3 into Spark DataFrames and performed transformations and actions on them.
- Used Spark for interactive queries, processing of streaming data, and integration with a popular NoSQL database for large volumes of data.
- Streamed data in real time using Kafka with Spark.
- Used the Spark Cassandra Connector to load data to and from Cassandra.
- Handled importing data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and then loaded the data into HDFS.
- Exported the analyzed data to the relational databases using Sqoop, to further visualize and generate reports for the BI team
- Collecting and aggregating large amounts of log data using Flume and staging data in HDFS for further analysis
- Analyzed the data by performing Hive queries (Hive QL)
- Developed analytical queries on different tables using Spark SQL to find insights, and built data pipelines that data scientists consume for applying ML models.
- Writing the script files for processing data and loading to HDFS/Hive.
- Fully involved in the requirement analysis phase.
- Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
- Wrote queries using Spark SQL and the Spark Dataset API.
- Developed manual validation test cases involving data sampling, data completeness and data quality.
- Created automated scripts to drop and create the entire DB schema.
- Used the Spark API on the Cloudera Data Platform with YARN to perform analytics on data in Hive.
- Experienced in performance tuning of Spark applications, including setting the right batch interval, the correct level of parallelism, and memory tuning.
Confidential, MA
Big Data Developer
Technical Environment: Spark, Scala, Oozie, HBase, Hive, Java, Shell script, Git, Cloudera, Hadoop, MapReduce, HDFS, Cassandra, Python, SQL, Cloudera Manager, Pig, Sqoop, Zookeeper, MongoDB, PL/SQL, MySQL
Responsibilities
- Developed the different components using Spark and Scala.
- Created Hive tables to store the processed results in a tabular format.
- Wrote script files for processing data and loading it into HDFS.
- Wrote HDFS CLI commands.
- Developed UNIX shell scripts for creating reports from Hive data.
- Fully involved in the requirement analysis phase.
- Responsible for building scalable distributed data solutions using Hadoop
- Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
- Exported the result set from Hive to SQL using Shell scripts.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs
- Developed manual validation test cases involving data sampling, data completeness and data quality.
- Developed Spark scripts by using Scala shell commands as per the requirement.
- Used the Spark API on the Hortonworks Data Platform with YARN to perform analytics on data in Hive.
- Experienced in performance tuning of Spark applications, including setting the right batch interval, the correct level of parallelism, and memory tuning.
Confidential
Big Data Developer
Technical Environment: Storm, MarkLogic, Shell/Python scripting, MapReduce, Git, XML, JDK 1.6/1.7, HDFS, Hadoop, Pig, Sqoop, Cassandra, Spark, Kafka, Hive, Java, Oracle, Eclipse, Linux
Responsibilities
- Developed the different components (spouts and bolts) for processing trade store data using Storm.
- Developed the code base to stream data from a topic through Storm spouts and bolts to the database.
- Identified errors in the logs and rescheduled or resumed jobs, killing and restarting the topology when needed.
- Configured, deployed and maintained a single-node Storm cluster in the DEV environment.
- Developed solutions to pre-process large sets of structured and semi-structured data in different file formats (text files, Avro data files, Sequence files, XML and JSON files, ORC and Parquet).
- Wrote queries to retrieve data from the MarkLogic database.
- Worked on XML configuration for the MarkLogic and production boxes.
- Wrote JUnit test cases for the developed components.
- Automated manual tasks using shell scripts and Java.
- Provided post-deployment support.
- Handled issues and incidents in production.
- Set up the Geneos system for monitoring alerts.
- Set up work schedules using Oozie.
