Spark Developer Resume
Minneapolis, Minnesota
SUMMARY:
- 7 years of IT experience, including development and implementation of Hadoop, data warehousing solutions, and Java applications.
- Experience working with the Hadoop ecosystem using components such as HDFS, MapReduce, Hive, HBase, Sqoop, and Impala on the Cloudera distribution, with good knowledge of Hortonworks.
- Hands-on experience writing Sqoop scripts to import data from multiple RDBMS sources into HDFS.
- Experience in Hive partitioning and bucketing, performing joins on Hive tables, and implementing Hive SerDes.
- Good knowledge of writing Spark applications in PySpark and Scala using DataFrames (see the DataFrame sketch following this summary).
- Hands-on experience designing and developing Spark applications in Scala and PySpark to compare the performance of Spark with Hive and SQL/Oracle.
- Software developer in core Java application development, client/server applications, and internet/intranet database applications; developed, tested, and implemented applications using J2EE, JDBC, JSP, Servlets, Web Services, Oracle, PL/SQL, and relational databases.
- Experience developing Kafka producers and consumers for streaming millions of events per second (a producer sketch also follows this summary).
- Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
- Working experience building RESTful web services and RESTful APIs.
- Solid design skills using Java design patterns and the Unified Modeling Language (UML).
- Strong understanding of real-time streaming technologies Spark and Kafka.
- Strong understanding of logical and physical database models and entity-relationship modeling.
- Replaced existing MapReduce jobs and Hive scripts with Spark SQL and Spark data transformations for more efficient data processing.
- Strong understanding of the Java Virtual Machine and multithreading.
- Experience in writing complex SQL queries, creating reports and dashboards.
- Excellent analytical, communication, and interpersonal skills, along with a positive attitude.
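A minimal sketch of the kind of Scala DataFrame application described in the DataFrames bullet above; the table and column names (sales, region, amount) are illustrative assumptions rather than details from an actual engagement:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object SalesAggregation {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("SalesAggregation")
          .enableHiveSupport()   // read the same tables Hive would, for a like-for-like comparison
          .getOrCreate()

        // Read an existing Hive table and aggregate with the DataFrame API
        val sales = spark.table("sales")
        val totalsByRegion = sales
          .groupBy(col("region"))
          .agg(sum(col("amount")).alias("total_amount"))

        totalsByRegion.write.mode("overwrite").saveAsTable("sales_totals_by_region")
        spark.stop()
      }
    }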
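A hedged sketch of a Kafka producer of the sort mentioned in the Kafka bullet above, using the standard Java client from Scala; the broker address, topic name, and payload shape are placeholders:

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    object EventProducer {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "localhost:9092")   // placeholder broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("acks", "1")                              // trade some durability for throughput

        val producer = new KafkaProducer[String, String](props)
        try {
          (1 to 1000).foreach { i =>
            // Fire-and-forget sends keep latency low at high event rates
            producer.send(new ProducerRecord[String, String]("events", s"key-$i", s"""{"event_id":$i}"""))
          }
        } finally {
          producer.close()
        }
      }
    }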
TECHNICAL SKILLS:
Programming/Scripting Languages: Scala, PySpark, Core Java, Python, SQL
Big Data: Hadoop, MapReduce, HDFS, Hive, Sqoop, Spark, Kinesis, Kafka
Other Tools: Microsoft Office tools, VSTS, VMware, Git, NoSQL, Oracle, MySQL, Apache Cassandra, HBase
Big Data Ecosystem: HDFS, Oozie, ZooKeeper, Spark SQL, Spark Streaming, Hue, Ambari, Impala
File Formats: TXT, XML, JSON, Avro, Parquet, ORC
Cloud Computing: AWS
Visualization and Reporting Tools: Tableau, Microsoft Power BI
PROFESSIONAL EXPERIENCE:
Confidential - Minneapolis, Minnesota
Spark Developer
Responsibilities:
- Working as a Big Data Engineer on Hortonworks distribution. Responsible for Data Ingestion, Data Cleansing, Data Standardization and Data Transformation.
- Working with Hadoop 2.x version and Spark 2.x (Python and Scala).
- Involved in extracting data from various data sources into Hadoop HDFS, including data from SFTP servers, GCS, and AWS S3 buckets.
- Worked on creating Hive managed and external tables based on the requirement.
- Implemented partitioning and bucketing on Hive tables for better performance (see the table-definition sketch after this list).
- Used Spark SQL to process the data on the Spark engine.
- Worked on Spark to improve performance and optimize existing Hadoop algorithms using Spark SQL and Scala.
- Worked on Google Cloud BigQuery to execute queries and analyze data quickly.
- Worked with various file formats such as Parquet, JSON, and ORC.
- Developed an end-to-end ETL pipeline using Spark SQL and Scala on the Spark engine (see the pipeline sketch after this list).
- Worked with external vendors or partners to onboard external data into Confidential GCS buckets.
- Worked on Oozie to develop workflows to automate ETL data pipeline.
- Worked with visualization tools such as Google Data Studio and internal tools such as Domo.
- Used Spark for interactive queries, processing of streaming data, and integration with NoSQL databases for huge volumes of data.
- Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts, effective and efficient joins, transformations, and other operations during the ingestion process itself (a broadcast-join sketch follows this list).
- Developed custom ETL solutions, batch processing, and real-time data ingestion pipelines to move data in and out of Hadoop using PySpark and shell scripting.
- Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, and DataFrames.
- Worked with Sqoop import and export functionalities to handle large data set transfer between Oracle databases and HDFS.
- Developed Spark jobs to clean data obtained from various feeds to make it suitable for ingestion into Hive tables for analysis.
- Imported data from various sources into Spark RDD for analysis.
- Configured Oozie workflow to run multiple Hive jobs which run independently with time and data availability.
- Utilized Hive tables and HQL queries for daily and weekly reports. Worked on complex data types in Hive like Structs and Maps.
- Imported data from AWS S3 into Spark RDD, performed transformations and actions on RDDs.
- Created Cassandra tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Supported code/design analysis, strategy development and project planning.
- Created reports for the BI team using Sqoop to export data into HDFS and Hive.
- Assisted with data capacity planning and node forecasting.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
- Designed ETL processes using Informatica to load data from flat files, Oracle, and Excel files into the Confidential Oracle Data Warehouse database.
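A sketch of the partitioned and bucketed Hive table definition referenced above, issued through Spark's Hive support; the table name, columns, and bucket count are illustrative assumptions:

    import org.apache.spark.sql.SparkSession

    object CreatePartitionedTable {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("CreatePartitionedTable").enableHiveSupport().getOrCreate()

        // Partition by ingest date and bucket by customer_id so date-range scans
        // prune partitions and joins on customer_id can use bucketed map-side joins.
        spark.sql(
          """
            |CREATE TABLE IF NOT EXISTS orders_curated (
            |  order_id     BIGINT,
            |  customer_id  BIGINT,
            |  amount       DOUBLE
            |)
            |PARTITIONED BY (ingest_date STRING)
            |CLUSTERED BY (customer_id) INTO 32 BUCKETS
            |STORED AS ORC
            |""".stripMargin)

        spark.stop()
      }
    }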
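A minimal sketch of the end-to-end Spark SQL/Scala ETL pipeline referenced above, assuming JSON files landed in a placeholder S3 path and a Parquet-backed Hive target table:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object IngestAndTransform {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("IngestAndTransform").enableHiveSupport().getOrCreate()

        // Extract: raw JSON landed in a cloud bucket (placeholder path)
        val raw = spark.read.json("s3a://example-landing-bucket/orders/")

        // Transform: basic cleansing and standardization
        val cleaned = raw
          .filter(col("order_id").isNotNull)
          .withColumn("amount", col("amount").cast("double"))
          .withColumn("ingest_date", current_date().cast("string"))
          .dropDuplicates("order_id")

        // Load: write into a partitioned, Parquet-backed Hive table
        cleaned.write
          .mode("append")
          .partitionBy("ingest_date")
          .format("parquet")
          .saveAsTable("orders_parquet")

        spark.stop()
      }
    }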
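A short sketch of the broadcast-join pattern mentioned above for efficient joins during ingestion; the paths and column names are assumptions:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast

    object BroadcastJoinSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("BroadcastJoinSketch").getOrCreate()

        val transactions = spark.read.parquet("s3a://example-bucket/transactions/") // large fact data (placeholder path)
        val storeLookup  = spark.read.parquet("s3a://example-bucket/stores/")       // small dimension table

        // Broadcasting the small side avoids shuffling the large dataset,
        // which is the main cost of joins performed during ingestion.
        val enriched = transactions.join(broadcast(storeLookup), Seq("store_id"), "left")

        enriched.write.mode("overwrite").parquet("s3a://example-bucket/enriched/")
        spark.stop()
      }
    }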
Environment: Spark, Spark SQL, Hive, Oozie, Sqoop, Flume, Java, Scala, PySpark, Shell scripting, Tableau.
Confidential - Atlanta, Georgia
Hadoop developer
Responsibilities:
- Involved in review of functional and non-functional requirements.
- Utilized Sqoop, Kafka, Flume, and Hadoop File System APIs for implementing data ingestion pipelines.
- Worked on real-time streaming and performed transformations on the data using Kafka and Spark Streaming.
- Migrated existing on-premise application to AWS and used AWS services like EC2 and S3 to process and store small data sets.
- Experienced in maintaining the Hadoop cluster on AWS EMR.
- Worked on importing metadata into Hive and migrating existing tables and applications to work on Hive and AWS cloud.
- Involved in managing and reviewing Hadoop log files. Designed and implemented an ETL framework using Java.
- Worked on real-time streaming and performed transformations on the data using Kafka and Flume.
- Developed Spark Streaming jobs in Scala to consume data from Kafka topics, transform the data, and insert it into HBase tables (see the streaming sketch after this list).
- Used Sqoop to import data from MySQL into HDFS on a regular basis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Wrote Hive queries for data analysis to meet the business requirements.
- Created Hive tables and worked on them using HiveQL.
- Experienced in defining job flows.
- Gained good experience with MySQL and Cassandra databases.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Developed a custom filesystem plug-in for Hadoop so it can access files on the data platform.
- Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
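A hedged sketch of a Spark Streaming job of the kind referenced above, consuming a Kafka topic and writing to an HBase table; the broker, topic, table, and column family names are placeholders:

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

    object KafkaToHBase {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("KafkaToHBase"), Seconds(10))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "localhost:9092",          // placeholder broker
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "events-consumer",
          "auto.offset.reset"  -> "latest"
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          LocationStrategies.PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams)
        )

        stream.foreachRDD { rdd =>
          rdd.foreachPartition { records =>
            // One HBase connection per partition, reused for all records in it
            val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
            val table = conn.getTable(TableName.valueOf("events"))
            records.foreach { rec =>
              val rowKey = Option(rec.key()).getOrElse(java.util.UUID.randomUUID().toString)
              val put = new Put(Bytes.toBytes(rowKey))
              put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes(rec.value()))
              table.put(put)
            }
            table.close()
            conn.close()
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }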
Environment: Hadoop, MapReduce, HDFS, Hive, Java, Hortonworks and Cloudera distributions, HBase, Linux, XML, MySQL, ETL, Kafka, YARN, Drill, Cassandra
Confidential
Hadoop Developer
Responsibilities:
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System.
- Performed ETL operations using Hive to transform transactional data into de-normalized form.
- Created ad-hoc reports by gathering requirements from different teams.
- Utilized Hive user defined functions to analyze the complex data to find specific user behavior.
- Analyzed data using HiveQL to derive metrics such as game duration, daily active users (DAU), and weekly active users (WAU) (see the metrics sketch after this list).
- Implemented Hive generic UDFs to incorporate business logic into Hive queries.
- Worked with the admin team to assist in adding/removing cluster nodes, cluster monitoring, and troubleshooting.
- Exported data to relational databases using Sqoop for visualization and to generate reports.
- Created machine learning and statistical models (SVM, CRF, HMM) to assess gamer performance.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Managed and reviewed Hadoop log files.
- Tested raw data and executed performance scripts.
- Shared responsibility for administration of Hadoop and Hive.
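A sketch of the DAU/WAU-style HiveQL referenced above; it is issued through Spark's Hive support here only to keep the examples in one language (in the role itself the equivalent queries would run directly in Hive), and the game_events table and its columns are assumptions:

    import org.apache.spark.sql.SparkSession

    object ActiveUserMetrics {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("ActiveUserMetrics").enableHiveSupport().getOrCreate()

        // Daily active users: distinct players per calendar day
        val dau = spark.sql(
          """
            |SELECT to_date(event_time)     AS activity_date,
            |       COUNT(DISTINCT user_id) AS dau
            |FROM game_events
            |GROUP BY to_date(event_time)
            |""".stripMargin)

        // Weekly active users: distinct players per week of the year
        val wau = spark.sql(
          """
            |SELECT year(event_time)        AS activity_year,
            |       weekofyear(event_time)  AS activity_week,
            |       COUNT(DISTINCT user_id) AS wau
            |FROM game_events
            |GROUP BY year(event_time), weekofyear(event_time)
            |""".stripMargin)

        dau.show()
        wau.show()
        spark.stop()
      }
    }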
Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, PL/SQL, MySQL, Sqoop, Linux, XML.
Confidential
Core Java/J2EE Developer
Responsibilities:
- Troubleshot various software issues using debugging processes and coding techniques.
- Provided high-level customer support to remote clients using a support e-ticketing system.
- Performed system administration for hosting server and client software.
- Developed RESTful web services and RESTful APIs.
- Developed screens using Java, HTML, DHTML, CSS, JSP and JavaScript.
- Designed the database for the application.
- Implemented all validations and performed testing.
- Implemented and managed SQL database for use in background for security and internal proprietary processes.
- Diagnosed and corrected errors within Java/HTML/PHP code to allow for connection and utilization of proprietary applications.
- Provided end-user support and administrative functions, including password and account management.
- Developed a PL/SQL view function in the Oracle 9i database for the get-available-date module.
- Used Quartz schedulers to run jobs sequentially within the given time (see the scheduler sketch after this list).
- Used JSP and JSTL Tag Libraries for developing User Interface components.
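A hedged sketch of Quartz-based scheduling as mentioned above, written in Scala to keep the examples in one language; the job class and 30-minute interval are illustrative assumptions:

    import org.quartz.{Job, JobBuilder, JobExecutionContext, SimpleScheduleBuilder, TriggerBuilder}
    import org.quartz.impl.StdSchedulerFactory

    // Illustrative job; in the real application this would wrap the batch task to run
    class AvailabilityRefreshJob extends Job {
      override def execute(context: JobExecutionContext): Unit = {
        println("refreshing available dates...")   // placeholder for the actual work
      }
    }

    object SchedulerSketch {
      def main(args: Array[String]): Unit = {
        val scheduler = StdSchedulerFactory.getDefaultScheduler

        val job = JobBuilder.newJob(classOf[AvailabilityRefreshJob])
          .withIdentity("availabilityRefresh")
          .build()

        // Fire every 30 minutes; successive runs can be kept sequential by
        // annotating the job class with @DisallowConcurrentExecution.
        val trigger = TriggerBuilder.newTrigger()
          .withIdentity("availabilityRefreshTrigger")
          .startNow()
          .withSchedule(SimpleScheduleBuilder.simpleSchedule()
            .withIntervalInMinutes(30)
            .repeatForever())
          .build()

        scheduler.start()
        scheduler.scheduleJob(job, trigger)
      }
    }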
Environment: JDK 5.0, JavaScript, HTML, DHTML, XML, Struts, JSP, Servlets, JNDI, J2EE, Tomcat, Oracle, RESTful API.
Confidential
Java/J2EE Developer
Responsibilities:
- Analyzed and reviewed client requirements and design.
- Worked on testing, debugging and troubleshooting all types of technical issues.
- Good knowledge of OOP concepts.
- Used JDBC for database connectivity and manipulation.
- Used Eclipse for the Development, Testing and Debugging of the application.
- Worked as a Java/J2EE backend developer creating the Maven web application project.
- Used a DOM parser to parse the XML files (see the parsing sketch after this list).
- Used the Log4j framework for logging debug, info, and error data.
- Used WinSCP to transfer files from the local system to other systems.
- Performed Test Driven Development (TDD) using JUnit.
- Used JProfiler for performance tuning.
- Built the application using Maven and deployed it on WebSphere Application Server.
- Gathered and collected information from various programs, analyzed time requirements and prepared documentation to change existing programs.
- Used SOAP for exchanging XML based messages.
- Used Microsoft Visio for developing use case diagrams, sequence diagrams, and class diagrams in the design phase.
- Developed Custom Tags to simplify the JSP code.
- Designed UI screens using JSP and HTML.
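A short sketch of DOM parsing as mentioned above, again in Scala for consistency with the other examples; the file name and element/attribute names are placeholders:

    import java.io.File
    import javax.xml.parsers.DocumentBuilderFactory

    object XmlConfigReader {
      def main(args: Array[String]): Unit = {
        // Parse the whole document into an in-memory DOM tree
        val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        val doc = builder.parse(new File("config.xml"))    // placeholder file name
        doc.getDocumentElement.normalize()

        // Walk every <property> element and print its name attribute and text content
        val nodes = doc.getElementsByTagName("property")   // placeholder element name
        (0 until nodes.getLength).foreach { i =>
          val element = nodes.item(i).asInstanceOf[org.w3c.dom.Element]
          println(s"${element.getAttribute("name")} = ${element.getTextContent.trim}")
        }
      }
    }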
Environment: Java, HTML, Servlets, JSP, Hibernate, JUnit, Oracle DB, SQL, Jasper Reports, iReport, Maven, Jenkins.