
Big Data Developer Resume


Bloomington, IL

SUMMARY

  • Experienced in facilitating streaming data using Kafka and Storm.
  • Strong knowledge of Hive and Pig core functionality, extended with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs) and User Defined Aggregating Functions (UDAFs) for Hive.
  • Good experience moving data in and out of Hadoop from RDBMS, NoSQL and UNIX systems using Sqoop and other traditional data movement technologies.
  • Experience integrating the Quartz scheduler with Oozie workflows to pull data from multiple sources in parallel using fork actions.
  • Experience in OLTP and OLAP design, development, testing, implementation and support of enterprise Data warehouses.
  • Hands-on experience configuring and working with Flume to load data from multiple sources directly into HDFS.
  • Experience fully configuring Flume agents for all types of log data, storing it via an Avro sink in Parquet file format, and developing a two-tier architecture connecting channels between Avro sinks and sources.
  • Experienced in job scheduling and monitoring using Oozie and ZooKeeper.
  • Experience with Scala and Spark, improving the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, pair RDDs and Spark on YARN.
  • Experience querying Parquet files by loading them into Spark DataFrames from a Zeppelin notebook (a brief sketch follows this summary).
  • Experienced in developing MapReduce-style jobs using Scala in the Spark shell.
  • Good knowledge of tuning Spark jobs by adjusting configuration properties and using broadcast variables.
  • Good knowledge of Spark architecture and real-time streaming using Spark.
  • Hands-on experience with different Spark libraries, i.e., Spark Core, Spark SQL, Spark Streaming and Spark GraphX.
  • Experience productionizing Apache NiFi for dataflows with significant processing requirements and for controlling dataflow security.
  • Deployed NiFi templates to environments via the NiFi REST API integrated with other automation tools.
  • Experience with a variety of Amazon Web Services (AWS), i.e., S3, EMR, Elasticsearch (Solr), EC2.
  • In-depth understanding of the strategy and practical implementation of AWS cloud-specific technologies including EC2, S3, VPC, Redshift, EMR, Spark and Glue.
  • Experienced in installing, configuring, supporting and managing Hadoop clusters using Cloudera and Hortonworks distributions, cloud storage and Amazon Web Services (AWS).
  • Good understanding of Azure cloud services: Azure SQL, DocumentDB, Data Lake Analytics and Event Hubs.
  • Good understanding of NoSQL databases and hands-on experience writing applications against HBase, Cassandra and MongoDB.
  • Good knowledge of building data analysis tools using R/SAS programming and their libraries.
  • Experience in Object Oriented concepts, Multithreading and Java/Scala.
  • Proficient in development of applications using Java and J2EE technologies with experience in JSP, Servlets, Struts and Hibernate frameworks.
  • Experienced in building highly scalable Big-data solutions using Hadoop distributed platforms i.e., Cloudera.
  • Exposure to Hadoop Distributed Platforms i.e., Hortonworks and MapR.
  • Solid understanding of data structures, algorithms and object-oriented design concepts, and UML (use case, sequence and class diagrams) with Rational Rhapsody.
  • Good programming knowledge of machine learning algorithms, i.e., supervised and unsupervised learning, using Python machine learning libraries.
  • Good experience using Git for version control and extensive knowledge of GitHub.
  • Experience using databases such as MySQL, MS SQL Server, DB2 and Oracle 9i/10g/11g.
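
The Spark points above (Parquet DataFrames, Spark SQL and broadcast variables) can be illustrated with a minimal Scala sketch; the HDFS path, table and column names are hypothetical, and a Spark 1.6-era SQLContext is assumed.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object ParquetBroadcastSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("parquet-broadcast-sketch"))
        val sqlContext = new SQLContext(sc)

        // Load Parquet files into a DataFrame and query them with Spark SQL (path is hypothetical).
        val events = sqlContext.read.parquet("hdfs:///data/events")
        events.registerTempTable("events")
        val daily = sqlContext.sql("SELECT event_date, COUNT(*) AS cnt FROM events GROUP BY event_date")

        // Tuning example: broadcast a small lookup map once to every executor
        // instead of shipping it with every task.
        val countryNames = sc.broadcast(Map("US" -> "United States", "DE" -> "Germany"))
        val byCountry = events.rdd
          .map(row => (countryNames.value.getOrElse(row.getAs[String]("country"), "unknown"), 1L))
          .reduceByKey(_ + _)

        daily.show()
        byCountry.take(10).foreach(println)
        sc.stop()
      }
    }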

TECHNICAL SKILLS

Big Data Technologies: HDFS, MapReduce, Pig, Hive, Impala, YARN, Oozie, ZooKeeper, Apache Spark, Apache NiFi, Apache Storm, Apache Kafka, Sqoop, Flume.

Streaming Technologies: Spark Streaming, Kafka, Flume.

Machine Learning Algorithms: Supervised Learning, Unsupervised Learning, Deep Learning, NLP, Recommender Systems.

NoSQL Databases: HBase, MongoDB, Cassandra.

Java Technologies: Java, J2EE, JDK 1.4/1.5/1.6/1.7/1.8, JDBC, Hibernate, XML, JSP 1.2/2, Servlets, EJB, JMS, Java Beans, AJAX, JNDI.

Frameworks: MVC, Struts, Hibernate, Spring Framework, Spring Boot.

Databases: Netezza, SQL Server, MySQL, ORACLE, DB2.

Programming Languages: C, C++, Java, J2EE, JDBC, JUnit, Log4j, C#, Python, Scala, Android, PL/SQL, HQL, Unix, Shell Scripting.

Scripting Languages: Python, Perl, Windows PowerShell.

Web Services: JAX-RS, Restlet, JAX-WS, SOAP.

IDE Development Tools: Eclipse, NetBeans, Visual Studio, Xcode, Android Studio, IntelliJ IDEA (JetBrains).

Web Development: HTML, JavaScript, jQuery, AJAX, Bootstrap, AngularJS, Node.js.

PROFESSIONAL EXPERIENCE

Confidential, BLOOMINGTON, IL.

Big Data Developer

Responsibilities:

  • Involved in loading data from the Linux file system to HDFS.
  • Implemented Spark using Scala, utilizing DataFrames, the Spark SQL API and pair RDDs for faster processing of data.
  • Involved in designing the HBase row key used to store text data.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Loaded data into Spark RDDs and performed in-memory computation for faster response, and implemented Spark SQL queries on data formats such as text, CSV and XML files.
  • Responsible for gathering requirements, process workflow, data modelling, architecture and design and led application development using Scrum.
  • Created Directories in HDFS according to the date using Scala code.
  • Integrated Maven build and designed workflows to automate the build and deploy process.
  • Collected and aggregated large amounts of policy data in XML from sources such as web servers using Kafka, intercepted the received data with Apache Flume, and stored it in HDFS for analysis.
  • Used Scala libraries to process XML data stored in HDFS and wrote the processed output back to HDFS.
  • Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL databases for huge volumes of data.
  • Used Hue for running Hive queries and created daily partitions in Hive to improve performance.
  • Used Kafka for log aggregation, gathering physical log files off servers and placing them in a central location such as HDFS for processing.
  • Worked with various HDFS file formats like Avro, Sequence File and various compression formats like Snappy, bzip2.
  • Developed UDFs to pre-process the data and compute various metrics for reporting in both Pig and Hive.
  • Used Spark Streaming to collect data from Kafka in near real time and perform the necessary transformations and aggregations on the fly to build the common learner data model.
  • Created Sqoop jobs to handle incremental loads from RDBMS into HDFS and applied Spark transformations and actions on the loaded data.
  • Implemented Spark scripts using Scala and Spark SQL to read Hive tables into Spark for faster processing of data.
  • Wrote Spark applications in Scala that interacted with the MySQL database through SQLContext and accessed Hive tables through HiveContext (a brief sketch follows this list).
  • Developed Python code to validate input XML files, separating bad records before ingestion into Hive and the data lake.
  • Explored Spark for improving the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
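
A minimal Scala sketch of the Spark work described in this role, reading a Hive table through HiveContext and MySQL through the JDBC data source; the table, column names and connection details are placeholders, and a Spark 1.x HiveContext with the MySQL JDBC driver on the classpath is assumed.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object PolicyAnalyticsSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("policy-analytics-sketch"))
        val hiveContext = new HiveContext(sc)

        // Access an existing Hive table through HiveContext (table and columns are hypothetical).
        val policies = hiveContext.sql("SELECT policy_id, state, premium FROM policies_raw")
        policies.cache() // keep in memory for repeated interactive queries

        // Pull reference data from MySQL through the JDBC data source (URL and credentials are placeholders).
        val agents = hiveContext.read.format("jdbc").options(Map(
          "url"      -> "jdbc:mysql://dbhost:3306/insurance",
          "dbtable"  -> "agents",
          "user"     -> "etl_user",
          "password" -> "***"
        )).load()
        println("agents loaded: " + agents.count())

        // Aggregate in memory and persist the result back to Hive.
        val premiumByState = policies.groupBy("state").avg("premium")
        premiumByState.write.mode("overwrite").saveAsTable("policy_premium_by_state")

        sc.stop()
      }
    }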

Tools: Cloudera, HDFS, Spark, Spark SQL, Hive, Pig, Sqoop, PuTTY, HaaS (Hadoop as a Service), Apache Kafka, AWS, Maven, Java, Scala, SQL, Linux, YARN, Agile Methodology.

Confidential

Big Data Developer

Responsibilities:

  • Developed Hive (version 0.12.0) scripts to analyze data, categorizing mobile numbers into different segments so that promotions could be offered to customers based on their segment.
  • Prepared an ETL pipeline with Sqoop, Pig and Hive to frequently bring in data from the source and make it available for consumption.
  • Developed solutions to process data into HDFS and analyzed the data using MapReduce, Pig and Hive to produce summary results from Hadoop for downstream systems.
  • Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, Hive and Sqoop. Responsible for building scalable distributed data solutions using Hadoop.
  • Used the Oozie workflow engine to manage independent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive and Sqoop, as well as system-specific jobs.
  • Developed ETL scripts based on technical specifications/Data design documents.
  • Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts.
  • Reported the data to analysts for further tracking of trends across various consumers.
  • Worked on sequence files, map-side joins, bucketing and partitioning for Hive performance enhancement and storage improvement.
  • Wrote MapReduce jobs to generate reports on the number of activities created per day from data dumped from multiple sources, with the output written back to HDFS.
  • Worked with ORC, Avro, Parquet, RCFile and JSON file formats; developed UDFs in Hive and Pig and used Sqoop to import files into Hadoop.
  • Used Spark SQL to load JSON data, create SchemaRDDs and load them into Hive tables, and handled structured data with Spark SQL (see the sketch after this list).
  • Worked with Apache Solr to implement indexing and wrote custom Solr query segments to optimize search.
  • Expert knowledge on MongoDB NoSQL data modeling, tuning and backup.
  • Used Zookeeper to manage coordination among the clusters.
  • Developed Custom Input Format, Record Reader, Mapper, Reducer, Partitioner as part of developing end to end Hadoop applications.
  • Followed the Agile-Scrum project development methodology, taking part in daily scrum meetings and sprint meetings.
  • Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data.
  • Assisted in Cluster maintenance, Cluster Monitoring and Troubleshooting, Manage and review data backups and log files.
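
As a brief illustration of the Spark SQL/Hive step called out above, the Scala sketch below loads JSON, registers it for SQL, and saves a segmented result to a Hive table; the input path, columns and segmentation rule are hypothetical, and a Spark 1.x HiveContext is assumed.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object JsonToHiveSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("json-to-hive-sketch"))
        val hiveContext = new HiveContext(sc)

        // Infer the schema from the JSON files and expose them to Spark SQL (path is hypothetical).
        val subscribers = hiveContext.read.json("hdfs:///landing/subscribers/*.json")
        subscribers.registerTempTable("subscribers_stage")

        // Segment mobile numbers with a query and persist the result as a Hive table.
        val segments = hiveContext.sql(
          """SELECT mobile_number,
            |       CASE WHEN monthly_spend > 50 THEN 'premium' ELSE 'standard' END AS segment
            |  FROM subscribers_stage""".stripMargin)
        segments.write.mode("overwrite").saveAsTable("subscriber_segments")

        sc.stop()
      }
    }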

Tools: Hortonworks, HDFS, MapReduce, Pig, Mesos, AWS, Hive, Sqoop, Scala, Flume, Mahout, HBase, Spark, Spark SQL, YARN, Java, Maven, Git, Cloudera, MongoDB, Eclipse and Shell Scripting.

Confidential, SAN FRANCISCO, CA.

J2EE/Hadoop Developer

Responsibilities:

  • Responsible for gathering business and functional requirements for the development and support of in-house and vendor developed applications.
  • Played key role in design and development of new application using J2EE, Servlets, and Spring technologies/frameworks using Service Oriented Architecture (SOA).
  • Wrote Action classes, Request Processor, Business Delegate, Business Objects, Service classes and JSP pages.
  • Developed validation using Spring's Validator interface and used Spring Core and Spring MVC to develop the applications and access data.
  • Designed and developed various PL/SQL blocks and stored procedures in the DB2 database.
  • Developed data mapping to create a communication bridge between various application interfaces using XML and XSL.
  • Developed the application under JEE architecture; designed and developed dynamic, browser-compatible user interfaces using JSP, custom tags, HTML, CSS and JavaScript.
  • Worked with approximately 100 TB of structured and semi-structured data with a replication factor of 3.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Performed various optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Extensively used Hive/HQL queries to query and search for strings in Hive tables in HDFS.
  • Used Flume extensively in gathering and moving log data files from Application Servers to a central location in Hadoop Distributed File System (HDFS).
  • Involved in analyzing the Cassandra database and comparing it with other open-source NoSQL databases to find which one better suited the current requirements.
  • Created HBase tables to store various formats of data coming from different portfolios (a brief sketch follows this list).
  • Involved in complete Implementation lifecycle, specialized in writing custom MapReduce, Pig and Hive programs.
  • Worked with Apache Solr to implement indexing and wrote custom Solr query segments to optimize search.
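
A minimal Scala sketch of writing to one of the HBase tables mentioned above, assuming the HBase 1.x client API; the table name, column family and row-key layout are hypothetical.

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes

    object HBaseWriteSketch {
      def main(args: Array[String]): Unit = {
        val conf = HBaseConfiguration.create() // picks up hbase-site.xml from the classpath
        val connection = ConnectionFactory.createConnection(conf)
        try {
          val table = connection.getTable(TableName.valueOf("portfolio_events")) // hypothetical table
          // Composite row key: portfolio id plus event date, keeping a portfolio's rows together.
          val put = new Put(Bytes.toBytes("portfolio42|2016-03-01"))
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes("""{"amount": 120.5}"""))
          table.put(put)
          table.close()
        } finally {
          connection.close()
        }
      }
    }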

Tools: Java, UNIX, HDFS, Pig, Hive, Spark, Scala, MapReduce, Flume, Sqoop, HBase, Cassandra, Cloudera Distribution, YARN, Shell scripting, Spring MVC, Oracle 11g, J2EE, JDBC, Servlets, JSP, XML, Design Patterns, CSS, HTML, JavaScript 1.2.

Confidential

Jr. Java Developer

Responsibilities:

  • Actively involved in the analysis, definition, design, implementation and deployment of full Software Development Life Cycle (SDLC) of the project.
  • Used Hibernate, an Object Relational Mapping (ORM) solution, to map data from the MVC model to the Oracle relational data model with an SQL-based schema.
  • Implemented RESTful web services using Jersey for JAX-RS (a brief sketch follows this list).
  • Designed and implemented application using JSP, Spring, Hibernate, JDBC, SQL, ANT, JMS, Oracle.
  • Used an object storage container to store secured files and retrieved them through an API using Amazon Web Services (AWS).
  • Developed various UML diagrams like use cases, class diagrams, interaction diagrams (sequence and collaboration) and activity diagrams.
  • Used multithreading (concurrency) to improve overall performance and applied the Singleton design pattern in the Hibernate utility class.
  • Implemented SOA with web services using SOAP, WSDL, UDDI and XML via the Apache CXF framework and Apache Commons. Worked on parsing the XML files using DOM/SAX parsers.
  • Involved in fixing bugs in various modules raised by the testing teams during the integration testing phase.
  • Used the JUnit framework for unit testing of the application and Log4j to capture logs including runtime exceptions. Used CVS for version control throughout implementation of the application.
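
A minimal sketch of a JAX-RS resource of the kind served by Jersey as noted above, written in Scala for consistency with the other sketches; the resource path and JSON payload are illustrative only.

    import javax.ws.rs.{GET, Path, PathParam, Produces}
    import javax.ws.rs.core.MediaType

    // GET /accounts/{id} returns a small JSON document for the requested account.
    @Path("/accounts")
    class AccountResource {

      @GET
      @Path("/{id}")
      @Produces(Array(MediaType.APPLICATION_JSON))
      def getAccount(@PathParam("id") id: String): String =
        s"""{"id": "$id", "status": "ACTIVE"}"""
    }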

Tools: Java, JSP, HTML, CSS, Ubuntu Operating System, JavaScript, AJAX, Servlets, Struts, Hibernate, EJB.
