Spark/Scala Developer Resume
Naperville, IL
SUMMARY
- Over 7 years of professional experience as a software developer in the design, development, deployment, and support of large-scale distributed systems.
- Experience in Big Data technologies such as the Hadoop framework, MapReduce, Hive, HBase, Pig, Sqoop, Spark, Kafka, Flume, ZooKeeper, and Oozie.
- Excellent understanding of Hadoop architecture and its components, including HDFS, Job Tracker, Task Tracker, NameNode, and DataNode (Hadoop 1.x); YARN concepts such as ResourceManager and NodeManager (Hadoop 2.x); and the Hadoop MapReduce programming paradigm.
- Sound knowledge of NoSQL databases, including Cassandra and HBase.
- Used Sqoop to import data from RDBMS into HDFS/Hive and to export data from HDFS/Hive back to RDBMS.
- Created frameworks in Scala and Java for processing data pipelines through Spark.
- Extended Hive and Pig core functionality with custom User Defined Functions (UDFs).
- Extensive experience in the design, installation, configuration, and management of Apache Hadoop clusters, the Hadoop ecosystem, and Spark with Scala.
- Extensive experience in data ingestion, transformation, and analytics using the Apache Spark framework and Hadoop ecosystem components.
- Expert in the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
- Experience using Kafka clusters for data integration and secured cloud platforms such as AWS; performed data summarization, querying, and analysis of large datasets stored on HDFS and the Amazon S3 filesystem using Hive Query Language (HiveQL).
- Good knowledge of Spark architecture and real-time streaming using Spark.
- Strong knowledge of implementing data processing on Spark Core using Spark SQL and Spark Streaming.
- Hands-on experience with Spark SQL queries: importing data from data sources, performing transformations and read/write operations, and saving results to an output directory in HDFS (a minimal sketch follows this summary).
- Expertise in integrating data from multiple sources using Kafka.
- Strong experience in Hadoop development and in testing big data solutions using the Cloudera and Hortonworks distributions and Amazon Web Services (AWS).
- Strengths include handling a variety of software systems and the capacity to learn and adapt to new technologies.
- Excellent problem-solving, analytical, and interpersonal skills.
- Ability to handle multiple tasks and work independently as well as in a team.
- Experience in active development as well as onsite coordination in web-based, client/server, and distributed architectures using Java and J2EE, including Web Services, Spring, Struts, Hibernate, and JSP/Servlets.
- Strong knowledge of software quality processes and the Software Development Life Cycle (SDLC).
- Good working knowledge of application servers such as Tomcat and WebLogic 8.0.
- Extensively worked with Java development tools, including Eclipse Galileo 3.5, Eclipse Helios 3.6, Eclipse Mars 4.5, and WSAD 5.1.2.
- Quick learner with the ability to work in a team as well as individually and to meet deadlines.
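A minimal sketch of the Spark SQL workflow described above (read from a source, transform, write the results back to HDFS). The application name, input/output paths, and column names (eventType, eventTime) are hypothetical and only illustrate the pattern.

```scala
// Minimal sketch only: paths, column names, and the "events" dataset are hypothetical.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DailyEventSummary {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-event-summary")
      .enableHiveSupport()          // lets Spark SQL read/write Hive tables as well
      .getOrCreate()

    // Import data from a source (here: JSON files landed on HDFS)
    val events = spark.read.json("hdfs:///data/raw/events")

    // Perform transformations with the Spark SQL / DataFrame API
    val summary = events
      .filter(col("eventType").isNotNull)
      .groupBy(col("eventType"), to_date(col("eventTime")).as("event_date"))
      .agg(count("*").as("event_count"))

    // Save the results to an output directory in HDFS
    summary.write.mode("overwrite").parquet("hdfs:///data/curated/event_summary")

    spark.stop()
  }
}
```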
TECHNICAL SKILLS
Big Data Ecosystem: Hadoop 0.22.0, MapReduce, HDFS, HBase, ZooKeeper, Hive, Pig, Sqoop, Cassandra, Oozie, Azkaban, Apache Solr
Java/J2EE: Java 6, Ajax, Log4j, JSP 2.1, Servlets 2.3, JDBC 2.0, XML, JavaBeans
Methodologies: Agile, UML, Design Patterns
Frameworks: Struts, Hibernate, Spring
Database: Oracle 10g, PL/SQL, MySQL
Application Servers: Apache Tomcat 5.x/6.0, JBoss 4.0
Web Tools: HTML, JavaScript, XML, XSL, XSLT, XPath, DOM
IDE/Testing Tools: NetBeans, Eclipse
Scripts: Bash, ANT, SQL, HiveQL, Shell Scripting
Testing API: JUnit
PROFESSIONAL EXPERIENCE
Spark/Scala Developer
Confidential - Naperville, IL
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Managed jobs using the Fair Scheduler and developed job-processing scripts using Oozie workflows.
- Used Spark Streaming APIs to perform transformations and actions on the fly for building the common learner data model, which consumes data from Kafka in near real time and persists it into Cassandra (see the Kafka-to-Cassandra sketch following this role).
- Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
- Developed Spark scripts using Scala shell commands as per requirements.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
- Performance-tuned Spark applications by setting the right batch interval, the correct level of parallelism, and appropriate memory settings.
- Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze logs produced by the Spark cluster.
- Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark.
- Handled large datasets during the ingestion process itself using partitioning, Spark's in-memory capabilities, broadcast variables, and effective and efficient joins and transformations.
- Designed, developed, and maintained data integration programs in Hadoop and RDBMS environments, with both traditional and non-traditional source systems and with RDBMS and NoSQL data stores, for data access and analysis.
- Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications, with a view to adopting the former in the project.
- Worked on a cluster of 80 nodes.
- Worked extensively with Sqoop for importing metadata from Oracle.
- Analyzed SQL scripts and designed solutions implemented using PySpark.
- Involved in creating Hive tables and loading and analyzing data using Hive queries.
- Developed Hive queries to process the data and generate data cubes for visualization.
- Implemented schema extraction for Parquet and Avro file Formats in Hive.
- Good experience with Talend Open Studio for designing ETL jobs for data processing.
- Implemented partitioning, dynamic partitions, and buckets in Hive (see the Hive partitioning sketch following this role).
- Involved in file movements between HDFS and AWS S3.
- Extensively worked with S3 buckets in AWS.
- Good experience with continuous integration of applications using Jenkins.
- Used reporting tools such as Tableau, connected to Hive, for generating daily data reports.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
Environment: Hadoop YARN, Spark 1.6, Spark Streaming, Spark SQL, Scala, Python, Kafka, Hive, Sqoop 1.4.6, Elasticsearch, Impala, Cassandra, Tableau, Talend, Oozie, Jenkins, Cloudera, AWS S3, Oracle 12c, Linux.
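A minimal sketch of the Kafka-to-Cassandra streaming pipeline described in this role, written in the Spark 1.6-era DStream style. The broker address, topic name, keyspace/table, the LearnerEvent fields, and the comma-delimited message format are all hypothetical, and it assumes the spark-streaming-kafka and spark-cassandra-connector libraries are on the classpath with matching column names in the target table.

```scala
// Minimal sketch only: topic, keyspace/table, and the LearnerEvent schema are hypothetical.
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils
import com.datastax.spark.connector._

case class LearnerEvent(learnerId: String, course: String, score: Double)

object LearnerStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("learner-stream")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val ssc = new StreamingContext(conf, Seconds(10))   // batch interval

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("learner-events"))

    // Parse each Kafka message value into the learner data model and persist to Cassandra
    stream.map { case (_, value) =>
        val Array(id, course, score) = value.split(',')
        LearnerEvent(id, course, score.toDouble)
      }
      .foreachRDD(rdd => rdd.saveToCassandra("learner_ks", "learner_events"))

    ssc.start()
    ssc.awaitTermination()
  }
}
```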
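The Hive partitioning and bucketing work could look roughly like the following. The database, table, and column names are hypothetical, and issuing the HiveQL through a Spark 1.6 HiveContext is only one possible way of running it.

```scala
// Minimal sketch only: database, table, and column names are hypothetical.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HivePartitioningJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hive-partitioning"))
    val hiveContext = new HiveContext(sc)

    // Allow dynamic partition inserts and bucketed writes
    hiveContext.sql("SET hive.exec.dynamic.partition = true")
    hiveContext.sql("SET hive.exec.dynamic.partition.mode = nonstrict")
    hiveContext.sql("SET hive.enforce.bucketing = true")

    // Partitioned and bucketed target table stored as Parquet
    hiveContext.sql(
      """CREATE TABLE IF NOT EXISTS sales.orders_part (
        |  order_id BIGINT, customer_id BIGINT, amount DOUBLE)
        |PARTITIONED BY (order_date STRING)
        |CLUSTERED BY (customer_id) INTO 32 BUCKETS
        |STORED AS PARQUET""".stripMargin)

    // Dynamic-partition insert from a staging table
    hiveContext.sql(
      """INSERT OVERWRITE TABLE sales.orders_part PARTITION (order_date)
        |SELECT order_id, customer_id, amount, order_date
        |FROM sales.orders_staging""".stripMargin)
  }
}
```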
Confidential - Charlotte, NC
Hadoop Developer
Responsibilities:
- Analyzed large data sets by running Hive queries and Pig scripts
- Worked with the Data Science team to gather requirements for various data mining projects
- Involved in creating Hive tables and loading and analyzing data using Hive queries
- Developed simple to complex MapReduce jobs using Hive and Pig
- Involved in running Hadoop jobs for processing millions of records of text data
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing (see the data-cleaning mapper sketch following this role)
- Involved in loading data from LINUX file system to HDFS
- Responsible for managing data from multiple sources
- Extracted files from CouchDB through Sqoop, placed them in HDFS, and processed them.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Responsible for managing data coming from different sources.
- Assisted in exporting analyzed data to relational databases using Sqoop
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Linux, Big Data
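A minimal sketch of a data-cleaning mapper of the kind described in this role. The jobs here were written in Java, so this Scala version only illustrates the pattern; the pipe delimiter and the expected field count are assumptions.

```scala
// Minimal sketch: the delimiter and expected field count are hypothetical.
import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
import org.apache.hadoop.mapreduce.Mapper

class CleaningMapper extends Mapper[LongWritable, Text, NullWritable, Text] {
  private val ExpectedFields = 12   // assumed record width

  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, NullWritable, Text]#Context): Unit = {
    val line = value.toString.trim
    // Keep only non-empty, well-formed pipe-delimited records; drop everything else
    if (line.nonEmpty && line.split('|').length == ExpectedFields) {
      context.write(NullWritable.get(), new Text(line))
    }
  }
}
```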
Confidential - Cambridge, MA
Hadoop Developer
Responsibilities:
- Oversaw design work to develop technical solutions from analysis documents.
- Imported data from DB2 into HDFS using Sqoop.
- Developed MapReduce jobs using Java API.
- Installed and configured Pig and also wrote Pig Latin scripts.
- Wrote MapReduce jobs using Pig Latin.
- Developed workflow using Oozie for running MapReduce jobs and Hive Queries.
- Worked on Cluster coordination services through Zookeeper.
- Worked on loading log data directly into HDFS using Flume.
- Involved in loading data from LINUX file system to HDFS.
- Responsible for managing data from multiple sources.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Responsible for managing data coming from different sources.
- Assisted in exporting analyzed data to relational databases using Sqoop.
- Implemented JMS for asynchronous auditing purposes.
- Created and maintained Technical documentation for launching Cloudera Hadoop Clusters and for executing Hive queries and Pig Scripts
- Experience with CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters
- Experience in defining, designing, and developing Java applications, especially using Hadoop MapReduce and leveraging frameworks such as Cascading and Hive.
- Developed monitoring and performance metrics for Hadoop clusters.
- Documented designs and procedures for building and managing Hadoop clusters.
- Strong experience troubleshooting operating system issues, cluster issues, and Java-related bugs.
- Imported and exported data between HDFS/Hive and relational databases and Teradata using Sqoop.
- Involved in creating, upgrading, and decommissioning Cassandra clusters.
- Worked on the Cassandra database to analyze how data is stored.
- Successfully loaded files into Hive and HDFS from MongoDB/Solr.
- Automated deployment, management, and self-serve troubleshooting of applications.
- Defined and evolved the existing architecture to scale with growth in data volume, users, and usage.
- Designed and developed a Java API (Commerce API) that provides functionality to connect to Cassandra through Java services.
- Installed and configured Hive and wrote Hive UDFs (see the UDF sketch following this role).
- Experience managing CVS and migrating to Subversion.
- Experience managing development time, bug tracking, project releases, development velocity, release forecasting, and scheduling.
Environment: Hadoop, HDFS, Hive, Flume, Sqoop, HBase, Pig, Eclipse, MySQL, Ubuntu, ZooKeeper, Java (JDK 1.6).
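A minimal Hive UDF sketch along the lines of the UDF bullet in this role; the class name and the trim/lower-case normalization rule are hypothetical.

```scala
// Minimal sketch of a Hive UDF; the name and normalization rule are hypothetical.
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Normalizes free-text fields: trims whitespace and lower-cases the value.
class NormalizeText extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.trim.toLowerCase)
  }
}
```

Such a UDF would be packaged as a JAR and registered in Hive with ADD JAR followed by CREATE TEMPORARY FUNCTION before being used in queries.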