
Big Data Developer Resume


Hillsboro, OR

SUMMARY

  • Overall 6+ years of professional IT experience in software development and requirement analysis in Agile work environments, with Big Data ecosystem experience in the ingestion, storage, querying, processing and analysis of Big Data.
  • Experience in dealing with Apache Hadoop components like HDFS, MapReduce, Hive, HBase, Pig, Sqoop, Oozie, Mahout, Python, Spark, Cassandra and MongoDB.
  • Experience in developing data processing tasks using PySpark, such as reading data from external sources, merging data, performing data enrichment and loading into target data destinations.
  • Good understanding/knowledge of Hadoop architecture and its various components such as HDFS, Job Tracker, Task Tracker, NameNode, DataNode, Secondary NameNode and MapReduce concepts.
  • Experienced in managing NoSQL databases on large Hadoop distributions such as Cloudera, Hortonworks HDP and MapR M series.
  • Used shell scripts to drive ETL processes, calling SQL or pmcmd commands, handling pre- and post-ETL steps such as file validation, zipping, massaging and archiving of source and target files, and used UNIX scripting to manage file systems.
  • Experienced in developing Hadoop integrations for data ingestion, data mapping and data processing capabilities.
  • Worked with various data sources such as flat files and RDBMS systems (Teradata, SQL Server 2005, Netezza and Oracle). Extensive work in ETL processes consisting of data transformation, data sourcing, mapping and conversion.
  • Hands-on experience in installing, configuring and using Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Zookeeper, Storm, Spark, Kafka and Flume.
  • Strong understanding of data modelling and experience with data cleansing, data profiling and data analysis.
  • Designed and implemented Apache Spark Streaming applications using Python and Scala.
  • Experience in ETL (DataStage) analysis, design, development, testing and implementation of ETL processes, including performance tuning and query optimization of databases.
  • Experience in extracting source data from sequential files, XML files and Excel files, transforming it and loading it into the target data warehouse.
  • Strong experience with Java/J2EE technologies such as Core Java, JDBC, JSP, JSTL, HTML, JavaScript, JSON.
  • Proficiency in programming with different IDEs like Eclipse and NetBeans.
  • Involved in database design, creating Tables, Views, Stored Procedures, Functions, Triggers and Indexes.
  • Good understanding of Service-Oriented Architecture (SOA) and web service standards like XML, XSD, WSDL and SOAP.
  • Good knowledge of scalable, secure cloud architecture based on Amazon Web Services (leveraging AWS cloud services such as EC2, CloudFormation, VPC and S3).
  • Good knowledge of Hadoop cluster architecture and monitoring the cluster.
  • In-depth understanding of data structures and algorithms.
  • Experience in managing and troubleshooting Hadoop-related issues.
  • Expertise in setting up standards and processes for Hadoop-based application design and implementation.
  • Experience in importing and exporting data using Sqoop from Relational Database Systems to HDFS and vice-versa.
  • Experience in managing Hadoop clusters using Cloudera Manager.
  • Hands-on experience with VPN, PuTTY, WinSCP, Unviewed, etc.
  • Excellent communication and interpersonal skills; flexible and adaptive to new environments, self-motivated, a team player and a positive thinker who enjoys working in multicultural environments.
  • Analytical, organized and enthusiastic to work in a fast-paced, team-oriented environment.
  • Expertise in interacting with business users, understanding their requirements and providing solutions to match them.

TECHNICAL SKILLS

Big Data Ecosystem: Hadoop, MapReduce, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper, Pig, Spark, Ambari, Mahout, MongoDB, Cassandra, Avro, Storm, Parquet and Snappy.

Hadoop Distributions: Cloudera, Hortonworks, Apache.

Languages: Java, Python, SQL, Scala, JavaScript and PySpark

NoSQL Databases: Cassandra, MongoDB and HBase

Databases: MySQL, PostgreSQL and Oracle; PL/SQL

Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB and Struts

Methodology: Agile, Waterfall

Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery, CSS, AngularJS, Ext JS, JSON and Node.js

Development/Build Tools: Eclipse, Ant, Maven, IntelliJ, JUnit and Log4j

Frameworks: Struts, Spring and Hibernate

App/Web servers: WebSphere, WebLogic, JBoss and Tomcat

Operating Systems: Windows 2000/2003/2008/2012, Red Hat Linux

Third Party Tools: Outline Extractor, SQL Developer, PuTTY, WinSCP

PROFESSIONAL EXPERIENCE

Confidential, Hillsboro, OR

Big Data Developer

Responsibilities:

  • Created an end-to-end data pipeline for data movement between different servers using Azure Data Factory.
  • Involved in the requirement-gathering phase of the SDLC and helped the team by breaking the complete project into modules with the help of my team lead.
  • Designed and developed ETL code using Informatica mappings to load data from heterogeneous source systems like flat files, XML files, MS Access files and Oracle into the Oracle staging area, then into the data warehouse and finally into Data Mart tables for reporting.
  • Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
  • Involved with heavy batch processing of files from internal and external sources, processing records into the Databricks/Spark environment using PySpark (see the sketch after this list). Supported the client's CI/CD pipeline for the data they load with Kafka.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Specified the cluster size, resource pool allocation and Hadoop distribution by writing the specifications in JSON format.
  • Responsible for documenting use cases, solutions and recommendations.
  • Hands-on experience configuring the Capacity Scheduler.
  • Involved in the development of the Hadoop system and in improving multi-node Hadoop cluster performance.
  • Primarily involved in the data migration process using Azure, integrating with a GitHub repository.
  • Performed transformations, cleaning and filtering on imported data using Hive, MapReduce and Impala, and loaded the final data into HDFS.
  • Imported data using Sqoop to load data from Oracle to HDFS on a regular basis, or from the Oracle server to HBase, depending on requirements.
  • Developed pipelines to analyze large datasets combining Spark with established modeling tools.
  • Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Created and maintained various shell and Python scripts for automating processes, optimized MapReduce code and Pig scripts, and performed performance tuning and analysis.
  • Utilized XML and XSL Transformation for dynamic web-content and database connectivity.
  • Involved in loading data from the UNIX file system to HDFS. Involved in designing schemas, writing CQL queries and loading data using Cassandra.
  • Worked on importing and exporting data between relational database systems and HDFS using Sqoop.
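
A minimal PySpark sketch of the kind of batch enrichment and load described above; the paths, table names and column names are hypothetical placeholders, not the actual project code.

    # Hypothetical PySpark batch job: read raw files, enrich them against a
    # reference table, and write the result back to HDFS.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("batch-enrichment").getOrCreate()

    # Incoming batch files (CSV here; real feeds varied by source).
    orders = spark.read.option("header", "true").csv("hdfs:///landing/orders/")

    # Reference data previously landed in HDFS (e.g. via a Sqoop import from Oracle).
    customers = spark.read.parquet("hdfs:///refdata/customers/")

    # Basic cleaning and enrichment: drop incomplete records, join on customer_id,
    # and derive a load date for downstream partitioning.
    enriched = (
        orders.dropna(subset=["order_id", "customer_id"])
              .join(customers, on="customer_id", how="left")
              .withColumn("load_date", F.current_date())
    )

    # Persist to the curated zone, partitioned by load date.
    enriched.write.mode("append").partitionBy("load_date").parquet("hdfs:///curated/orders/")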

Environment: Spark 2.3, Hive 2.3, Pig 0.17, SQL, HBase, Sqoop 2.0, Apache Flume 1.8, Cassandra 3.11, Zookeeper 3.4, Python, MapReduce MRv2, Hortonworks

Confidential, Frisco, TX

Big Data Developer

Responsibilities:
  • Worked on Apache Solr, which was used as the indexing and search engine.
  • Configured Spark Streaming to receive real-time data from Apache Kafka and store the stream data in HDFS using Scala (see the sketch after this list).
  • Worked with source systems to design and then develop ETL to load data.
  • Exported the data using Sqoop to RDBMS servers and processed that data for ETL operations.
  • Designed the projects using MVC architecture providing multiple views using the same model and thereby providing efficient modularity and scalability.
  • Experienced in designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Responsible for coding MapReduce programs and Hive queries, and for testing and debugging the MapReduce programs.
  • Managed and supported enterprise Data Warehouse operations and big data advanced predictive application development using Cloudera and Hortonworks HDP.
  • Built a web portal using JavaScript that makes a REST API call to Elasticsearch and retrieves the row key.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data in various formats like text, zip, XML and JSON.
  • Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames and loaded the data into Cassandra.
  • Involved in the process of data acquisition, data pre-processing and data exploration for a telecommunication project in Scala.
  • Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
  • Specified the cluster size, resource pool allocation and Hadoop distribution by writing the specifications in JSON format.
  • Imported weblogs and unstructured data using Apache Flume and stored the data in a Flume channel.
  • Exported event weblogs to HDFS by creating an HDFS sink which directly deposits the weblogs in HDFS.
  • Used RESTful web services with MVC for parsing and processing XML data.
  • Built the automated build and deployment framework using Jenkins, Maven etc.
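
A minimal PySpark Structured Streaming sketch of the Kafka-to-HDFS flow described above (the work itself was done in Scala with Spark Streaming); the broker address, topic name and paths are hypothetical, and the spark-sql-kafka connector package is assumed to be on the classpath.

    # Hypothetical PySpark Structured Streaming job: consume a Kafka topic and
    # land the raw events on HDFS with checkpointing.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

    # Subscribe to the Kafka topic carrying the real-time feed.
    events = (
        spark.readStream.format("kafka")
             .option("kafka.bootstrap.servers", "broker1:9092")
             .option("subscribe", "events-topic")
             .load()
    )

    # Kafka delivers key/value as binary; keep the payload as a string column.
    payload = events.select(F.col("value").cast("string").alias("raw_event"))

    # Write the stream to HDFS as Parquet, checkpointing for fault tolerance.
    query = (
        payload.writeStream.format("parquet")
               .option("path", "hdfs:///streams/events/")
               .option("checkpointLocation", "hdfs:///checkpoints/events/")
               .start()
    )
    query.awaitTermination()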

Environment: Spark 2.3, Hive 2.3, Pig 0.17, SQL, HBase, Cassandra 3.11, Zookeeper 3.4, Python, MapReduce MRv2, Hortonworks, Hadoop, HDFS, YARN, Sqoop.

Confidential, Bothell, WA

Big Data Developer

Responsibilities:

  • Installed and configured Hortonworks Distribution Platform (HDP 2.4) on Amazon EC2 instances with 100 nodes.
  • Designed workflows in Jenkins to automate and parallelize ingestion and import jobs (Sqoop, Spark, JSCAPE, Datameer, etc.) on the Apache Hadoop environment from Hortonworks (HDP 2.4).
  • Developed scripts for build, deployment, maintenance, and related tasks using Jenkins, Maven and Bash.
  • Configured YARN Queue Manager to accept multiple applications by setting User limit factor.
  • Developed automated scripts to import data from S3 to Datameer.
  • Created FTP jobs (JSCAPE) to import data from Coremetrics detailed files into the data lake (HDFS/S3) through an edge node.
  • Installed and configured flume agents to consume demand orders data.
  • Provided ad-hoc queries and data metrics to the Business Users using Datameer, Hive and Redshift.
  • Helped develop a RESTful API to provide access to data in Elasticsearch, Cassandra and HDFS.
  • Developed Java RESTful web services to upload data from local storage to Amazon S3, list S3 objects and perform file manipulation operations.
  • Used Spark Streaming to collect the data from Flume in real time, perform the necessary transformations and aggregations on the fly to build the use case, and persist the data in HDFS.
  • Developed Scala scripts using both DataFrames/SQL and RDDs in Spark for data aggregation and queries.
  • Developed Spark code and Spark SQL/Streaming jobs for data processing.
  • Ingested data from a Cassandra database into the data lake (AWS S3/HDFS) using Spark (see the sketch after this list).
  • Responsible for monitoring the health of the cluster.
  • Debugged and proposed solutions to power users and business users for data extractions and analytics using Datameer and HDFS files.
  • Utilized market basket analysis to discover and understand customer purchasing behavior.
  • Used Git/SVN version control tools.
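
A minimal PySpark sketch of the Cassandra-to-data-lake ingestion noted above; it assumes the DataStax spark-cassandra-connector package is available, and the host, keyspace, table and bucket names are hypothetical placeholders.

    # Hypothetical PySpark job: read a Cassandra table through the connector and
    # persist it to the data lake (S3 shown here; HDFS works the same way).
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("cassandra-to-datalake")
        .config("spark.cassandra.connection.host", "cassandra-host")
        .getOrCreate()
    )

    # Read the source table via the Cassandra data source.
    demand_orders = (
        spark.read.format("org.apache.spark.sql.cassandra")
             .options(keyspace="sales", table="demand_orders")
             .load()
    )

    # Write to the data lake as Parquet.
    demand_orders.write.mode("overwrite").parquet("s3a://datalake-bucket/demand_orders/")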

Environment: Hadoop, Adobe Analytics, HDFS, YARN, Sqoop 2.0, Spark, Hive, AWS, Jenkins, Maven, Cassandra, RESTful, SQL, RDD, Agile, Scala 2.12, Apache Flume 1.8

Confidential

Hadoop Developer

Responsibilities:

  • Worked on importing and exporting data between relational database systems and HDFS using Sqoop.
  • Developed a common framework to import the data from Teradata to HDFS and to export to Teradata using Sqoop.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data, including Avro, sequence files and XML files.
  • Involved in gathering the requirements, designing, development and testing.
  • Utilized the Apache Hadoop environment from Cloudera.
  • Integrated MapReduce with HBase to import bulk amounts of data into HBase using MapReduce programs.
  • Imported the log data from different servers into HDFS using Flume and developed MapReduce programs for analyzing the data.
  • Performed operation using Partitioning pattern in MapReduce to move records into different categories.
  • Integrated HiveServer2 with Tableau using the Hortonworks Hive ODBC driver for auto-generation of Hive queries for non-technical business users.
  • Integrated data from multiple sources (SQL Server, DB2, Teradata) into the Hadoop cluster and analyzed the data using Hive-HBase integration.
  • Performed various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins (see the sketch after this list).
  • Worked on Hue interface for querying the data.
  • Used Hive to produce results quickly based on the reports that were requested.
  • Developed Pig UDFs to pre-process the data for analysis.
  • Developed Pig scripts for data analysis and extended Pig's functionality by developing custom UDFs.
  • Worked on custom Pig Loaders and Storage classes to work with a variety of data formats such as JSON and compressed CSV.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
  • Oozie and Zookeeper were used to automate the flow of jobs and coordination in the cluster respectively.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
  • Worked with SCRUM team in delivering agreed user stories on time for every Sprint.
  • Experience in setting up and governing a Redshift cluster.
  • Used Git/SVN version control tools.
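
A minimal sketch of the Hive optimizations above (partitioning, bucketing and a map-side join), expressed in PySpark with Hive support for consistency with the other sketches; database, table and column names are hypothetical placeholders.

    # Hypothetical PySpark sketch: broadcast (map-side) join with a small lookup
    # table, then persist the result as a partitioned, bucketed Hive table.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = (
        SparkSession.builder.appName("hive-tuning-sketch")
        .enableHiveSupport()
        .getOrCreate()
    )

    logs = spark.table("raw_db.server_logs")        # large fact table
    codes = spark.table("ref_db.error_codes")       # small lookup table

    # Broadcasting the small table keeps the join map-side and avoids a shuffle.
    categorized = logs.join(broadcast(codes), on="error_code", how="left")

    # Partitioning by day and bucketing by host enables partition pruning and
    # faster bucketed joins downstream.
    (categorized.write.mode("overwrite")
                .partitionBy("log_date")
                .bucketBy(32, "host")
                .sortBy("host")
                .saveAsTable("analytics_db.server_logs_categorized"))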

Environment: Java, Hadoop, HDFS, Hive, HBase, Pig, SQOOP, Oozie, MySQL, MapReduce, Linux, Eclipse, Zookeeper, Cloudera.

Confidential

Java Developer

Responsibilities:

  • Designed and developed server-side components (DAO, Session Beans) using J2EE.
  • Worked with Core Java concepts like Collections Framework, multithreading, memory management.
  • Involved in the design, development and deployment of the application using Java/J2EE technologies.
  • Developed web components using JSP, Servlets and JDBC, and coded JavaScript for AJAX and client-side data validation.
  • Created Use Case Diagrams, Class Diagrams, Activity Diagrams during the design phase.
  • Used Jenkins for continuous integration.
  • Imported data from various Sources transformed and loaded into Data Warehouse Targets using Informatica Power Center.
  • Designed and developed front end user interface using HTML and Java Server Pages (JSP) for customer profile setup.
  • Used JUnit for testing modules.
  • Created custom exceptions and implemented exception handling using try, catch and finally blocks.
  • Developed user interface using JSP, JavaScript and CSS Technologies.

Environment: Java, J2EE, Servlets, JSP, SQL, PL/SQL, HTML, JavaScript, CSS, Eclipse, Oracle, MySQL, IBM WebSphere, JIRA.
